
ECHET96 Article 014

Database mining for heterocycles: are structures of small heterocycles generated by a computer program present in databases?

Kurt Varmuza (a), Ulrich Jordis (b) and Günther Wolf (b)

(a) Technical University Vienna, Inst. General Chemistry, Department of Chemometrics, Lehargasse 4/152, A-1060 Vienna, Austria
(b) Technical University Vienna, Institute of Organic Chemistry, Department of Computer Applications in Organic Chemistry, Getreidemarkt 9/154, A-1060 Vienna, Austria

Abstract

Computer-generated structures (MOLGEN) were converted via ISIS/Draw for Beilstein Commander and searched using the implicit free sites option. The results from this exhaustive list of 3-, 4- and 5-membered heterocycles are used for the creation of "badlists" for spectral interpretation or database statistics. Tables of "no-hit" (sub)structures serve as a challenge for theoretical as well as synthetic chemists.

Keywords

structure generator, substructure search, badlist, chemical structure database

Introduction

Chemical structures of heterocyclic rings can be defined by elemental composition, the relative positions of the ring atoms, and the number and position of multiple bonds. Prohibitive strain energies prevent some topologically possible rings from occurring as substructures in libraries of chemical compounds. General restrictions on the existence of ring structures [1] are based on Bredt's rule or energy calculations [2].

This article reports search results using an exhaustive set of strictly defined ring substructures containing three, four or five ring atoms (C, N, O) as substructure input for the database Beilstein Crossfire. These results of course cannot describe ring stabilities in general, but they are a useful and interesting foundation for the application of other methods.

Knowledge of whether a particular ring system exists "in the real world" is especially valuable in computer-assisted structure elucidation. The only systematic, and in some sense exhaustive and controlled, approach to structure elucidation of organic compounds is still the one based on the DENDRAL project [3]: an isomer generator program is fed with the molecular (brutto) formula of the unknown, a "goodlist" (substructures that must be present), a "badlist" (substructures that must be absent), and other structural restrictions. A complete and redundancy-free set of all topological molecular structures is built from these data. In available software systems [4-7], NMR and IR data are used to derive structural restrictions. Recently, mass spectral data have also been used [8]. Usually, a permanent badlist is applied that lists substructures considered impossible according to current chemical experience. Knowledge of the existence of sometimes exotic ring structures is helpful for compiling such a permanent badlist.

The search results presented may also be of interest for synthesis planning for new compounds or simply serve as a challenge for synthesising new heterocyclic ring systems.

This work was restricted to 3-, 4-, and 5-membered rings, the elements carbon, nitrogen, and oxygen, and the database Beilstein Online.

Systematic generation of heterocyclic rings

The isomer generator software MOLGEN [9, 10] was used for an exhaustive and redundancy-free generation of 3-, 4- and 5-membered rings with the general formula CcNnOoHh. The restrictions applied for isomer generation were:

  1. c + n + o = r

    c, n, o, h: number of C-, N-, O-, H-atoms

    r: ring size (3, 4, 5)

    All possible combinations of c, n, and o have been used, including zero values.

  2. h = hmax, hmax - 2, hmax - 4, ... (h > 0) with hmax = 2c + n

    Thereby all possible values for the number of double bond equivalents are considered. Only rings containing at least one hydrogen atom were generated. For substructure searches all hydrogens were replaced by implicit free sites (any substitution possible, including hydrogen).

  3. Bond types allowed were: single, double, triple. No restrictions for bond types were applied in substructure search.
  4. Only the valencies 4, 3, 2 for C, N, O, resp., have been used for ring generation. No restrictions for valencies were applied in substructure search.
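
Restrictions 1 and 2 fix the set of molecular formulae that had to be entered into the generator. The following is a minimal sketch of that bookkeeping, not part of MOLGEN; the function names ring_formulas and as_text are hypothetical. It lists every (c, n, o, h) combination for a given ring size. Whether each formula actually yields ring isomers is decided by the structure generator, not by this listing.

```python
from itertools import product


def ring_formulas(r):
    """Yield (c, n, o, h) for ring size r according to restrictions 1 and 2."""
    for c, n in product(range(r + 1), repeat=2):
        o = r - c - n                 # restriction 1: c + n + o = r
        if o < 0:
            continue
        h_max = 2 * c + n             # restriction 2: hmax = 2c + n
        # h = hmax, hmax - 2, hmax - 4, ... while h > 0
        for h in range(h_max, 0, -2):
            yield c, n, o, h


def as_text(c, n, o, h):
    """Format a formula in the order C, N, O, H used in the text."""
    parts = []
    for symbol, count in (("C", c), ("N", n), ("O", o), ("H", h)):
        if count == 1:
            parts.append(symbol)
        elif count > 1:
            parts.append(f"{symbol}{count}")
    return "".join(parts)


if __name__ == "__main__":
    for r in (3, 4, 5):
        formulas = [as_text(*f) for f in ring_formulas(r)]
        print(r, len(formulas), " ".join(formulas))
```

Compositions with hmax = 0 (for example three oxygen atoms) produce no formula, in line with the requirement of at least one hydrogen atom.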

Molecular (brutto) formulae were entered manually into MOLGEN. The output was a Molfile (*.SDF) containing all isomers for a given formula. After a slight format correction, these structural data could be used directly for substructure searches in Beilstein Crossfire.

A check for duplicate structures in the final files was performed using the program TOSIM [11, 12].

Table 1 gives an overview of the generated ring structures.

Table 1 Characterization of the generated ring structures.

                                                              3-membered rings  4-membered rings  5-membered rings
No. of possible rings                                         21                63                200
No. of rings containing a double bond (between any atoms)     10                38                143
No. of rings containing a triple bond                         2                 7                 32
No. of rings in which all atoms are topologically different   5                 28                121
No. of nitrogen-containing rings                              14                48                169
No. of oxygen-containing rings                                7                 28                107
No. of nitrogen- and oxygen-containing rings                  4                 20                87
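
The counts in Table 1 can be cross-checked with a small brute-force enumeration. The sketch below is not MOLGEN or TOSIM. It assumes a simplified model: a single ring of r atoms (C, N, O) with ring bonds of order 1 to 3, valencies 4, 3, 2, all free valencies filled by hydrogen, and at least one hydrogen overall. Duplicates are removed by choosing the smallest representation among all rotations and reflections of the ring. The names transforms, enumerate_rings and summarize are hypothetical.

```python
from itertools import product

# Valencies used for ring generation (C = 4, N = 3, O = 2).
VALENCE = {"C": 4, "N": 3, "O": 2}


def transforms(atoms, bonds):
    """Return all 2r rotations and reflections of a ring.

    atoms[i] is ring atom i; bonds[i] is the order of the bond between
    atom i and atom (i + 1) % r.
    """
    r = len(atoms)
    forms = []
    for shift in range(r):
        a = atoms[shift:] + atoms[:shift]
        b = bonds[shift:] + bonds[:shift]
        forms.append((a, b))
        # Walking the ring in the opposite direction reverses the atoms;
        # the ring-closing bond (last entry) stays the ring-closing bond.
        forms.append((a[::-1], b[-2::-1] + b[-1:]))
    return forms


def enumerate_rings(r):
    """Return the set of distinct rings of size r as canonical forms."""
    rings = set()
    for atoms in product("CNO", repeat=r):
        for bonds in product((1, 2, 3), repeat=r):
            # Free valencies = valency minus the two ring bond orders.
            free = [VALENCE[atoms[i]] - bonds[i - 1] - bonds[i] for i in range(r)]
            if min(free) < 0 or sum(free) == 0:
                continue  # valency exceeded, or no hydrogen (h > 0 required)
            rings.add(min(transforms(atoms, bonds)))
    return rings


def summarize(r):
    """Count the ring classes listed in Table 1 for ring size r."""
    rings = enumerate_rings(r)
    return {
        "possible": len(rings),
        "double bond": sum(2 in b for _, b in rings),
        "triple bond": sum(3 in b for _, b in rings),
        # A ring without any symmetry has no two equivalent atoms.
        "all atoms different": sum(
            transforms(a, b).count((a, b)) == 1 for a, b in rings
        ),
        "nitrogen": sum("N" in a for a, _ in rings),
        "oxygen": sum("O" in a for a, _ in rings),
        "nitrogen and oxygen": sum("N" in a and "O" in a for a, _ in rings),
    }


if __name__ == "__main__":
    print(summarize(3))
```

For r = 3 this simplified model gives the values of the first column of Table 1 (21, 10, 2, 5, 14, 7, 4). Agreement with the MOLGEN results for 4- and 5-membered rings is not asserted here.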


Table 2 Summary of search results.

                                                  3-rings         4-rings         5-rings
No. of possible rings                             21              63              200
No. of rings not found                            2               24              86
No. of rings found with 1 to 5 examples           3               12              21
No. of structures for most frequent ring          123621 cpds.    72103 cpds.     (search for cyclopentane did not go to completion in Crossfire)
No. of structures for second most frequent ring   71233 cpds.     68867 cpds.     317346 cpds.
No. of structures for third most frequent ring    15567 cpds.     18678 cpds.     296002 cpds.
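
The share of generated rings without any hit follows directly from the first two rows of Table 2. A minimal sketch of this arithmetic is given below; the helper name no_hit_share is hypothetical.

```python
# Values taken from Table 2 (ring size -> count).
POSSIBLE = {3: 21, 4: 63, 5: 200}
NOT_FOUND = {3: 2, 4: 24, 5: 86}


def no_hit_share(ring_size):
    """Percentage of generated rings of this size that were not found."""
    return 100.0 * NOT_FOUND[ring_size] / POSSIBLE[ring_size]


if __name__ == "__main__":
    for size in POSSIBLE:
        print(f"{size}-membered rings: {NOT_FOUND[size]} of {POSSIBLE[size]} "
              f"not found ({no_hit_share(size):.1f} %)")
```

This gives about 9.5 % for 3-membered, 38.1 % for 4-membered and 43.0 % for 5-membered rings.
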
The following tables summarize the search results for 3-, 4- and 5-membered heterocycles.





Experimental

The structure files produced by MOLGEN are in MDL MOL-file format. They were renamed from *.SDF to *.MOL and imported into ISIS/Draw. Beilstein Commander (Ver. 1.0) was set to use ISIS/Draw as the preferred editor. Individual structures were then copied from ISIS/Draw to the Beilstein Commander and the query options set to implicit free sites.
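
If single-structure files are needed, the multi-structure output can be split with a few lines of code. The sketch below is not the procedure used in this work, in which the files were renamed and handled in ISIS/Draw. It relies only on two conventions of the MDL file formats: each molfile block ends with a line starting with "M  END", and records in an SD file are separated by a line containing "$$$$". The function names and the output file naming scheme are hypothetical, and the unspecified "slight format correction" is not reproduced.

```python
from pathlib import Path


def split_sdf(sdf_text):
    """Split the text of an SD file into single-structure molfile strings."""
    molfiles, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":       # SD record delimiter
            current = []                 # drop data items after "M  END"
            continue
        current.append(line)
        if line.startswith("M  END"):    # last line of the molfile block
            molfiles.append("\n".join(current) + "\n")
            current = []
    return molfiles


def write_molfiles(sdf_path, out_dir):
    """Write each record of sdf_path as <stem>_001.mol, <stem>_002.mol, ..."""
    sdf_path, out_dir = Path(sdf_path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = split_sdf(sdf_path.read_text())
    for i, record in enumerate(records, start=1):
        (out_dir / f"{sdf_path.stem}_{i:03d}.mol").write_text(record)
    return len(records)
```

A call such as write_molfiles("c2nh5.sdf", "queries") would then produce one query file per generated isomer; both names are placeholders.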

References

  1. Nuzillard J.M., Quick method for anti-Bredt structure detection, J. Chem. Inf. Comput. Sci. 1994, 34, 723.
  2. Maier W.F., Schleyer P. von Rague, Evaluation and prediction of the stability of bridgehead olefins, J. Am. Chem. Soc. 1981, 103, 1891.
  3. Gray, N.A.B. Computer-Assisted Structure Elucidation, Wiley, New York, 1986.
  4. Funatsu K., Sasaki S.I., Recent advances in the automated structure elucidation system, CHEMICS. Utilization of two-dimensional NMR spectral information and development of peripheral functions for examination of candidates, J. Chem. Inf. Comput. Sci. 1996, 36, 190.
  5. Christie, B. D.; Munk, M. E. Structure Generation by Reduction: A New Strategy for Computer-Assisted Structure Elucidation, J. Chem. Inf. Comput. Sci. 1988, 28, 87.
  6. Kalchhauser, H.; Robien, W., CSEARCH: A Computer Program for Identification of Organic Compounds and Fully Automated Assignment of Carbon-13 Nuclear Magnetic Resonance Spectra, J. Chem. Inf. Comput. Sci. 1985, 25, 103.
  7. Thiele, H. X-PERT: A New Expert System for Structure Elucidation. In Software Development in Chemistry, Moll, R., Ed., Springer, Berlin, 1995, vol. 9, pp. 305-317.
  8. Varmuza K., Werther W., Mass spectral classifiers for supporting systematic structure elucidation, J. Chem. Inf. Comput. Sci. 1996, 36, 323.
  9. Benecke Ch., Grund R., Hohberger R., Kerber A., Laue R., Wieland T., MOLGEN, isomer generator software, vers. 3 (1995), running under MS-Windows, University of Bayreuth, Department of Mathematics II, Germany. MOLGEN is available from the authors.
  10. Benecke Ch., Grund R., Hohberger R., Kerber A., Laue R., Wieland T., MOLGEN, a generator of connectivity isomers and stereoisomers for molecular structure elucidation, Anal. Chim. Acta, 1995, 314, 141.
  11. Scsibrany H., Varmuza K., TOSIM. PC-Software for the Investigation of Topological Similarities in Molecules, Software Development in Chemistry, ed. Jochum C., 1994, vol. 8, pp. 235-249, Gesellschaft Deutscher Chemiker, Frankfurt am Main.
  12. Varmuza K., Scsibrany H., Cluster Analysis of Chemical Structures, based on Binary Molecular Descriptors and Principal Component Analysis, Software Development in Chemistry, ed. Moll R., 1995, vol. 9, pp. 81-90, Gesellschaft Deutscher Chemiker, Frankfurt am Main.