ECHET96 Article 014
Database mining for heterocycles: are structures of small heterocycles generated by a computer program present in databases?
(a) Technical University Vienna, Inst. General Chemistry, Department of Chemometrics, Lehargasse 4/152, A-1060 Vienna, Austria
(b) Technical University Vienna, Institute of Organic Chemistry, Department of Computer Applications in Organic Chemistry, Getreidemarkt 9/154, A-1060 Vienna, Austria
Computer-generated structures (MOLGEN) were converted via ISIS/Draw for Beilstein Commander and searched using the implicit free sites option. The results from this exhaustive list of 3-, 4- and 5-membered heterocycles are used for the creation of "badlists" for spectral interpretation and for database statistics. Tables of "no-hit" (sub)structures serve as a challenge for theoretical as well as synthetic chemists.
Keywords: structure generator, substructure search, badlist, chemical structure database
Chemical structures of heterocyclic rings can be defined by elemental composition, the relative positions of the ring atoms, and the number and position of multiple bonds. Prohibitive strain energies preclude some topologically possible rings from being present as substructures in libraries of chemical compounds. General restrictions for the existence of ring structures are based on Bredt's rule [1] or energy calculations [2].
This article reports search results using an exhaustive set of strictly defined ring substructures containing three, four or five ring atoms (C, N, O) as substructure input for the database Beilstein Crossfire. These results cannot, of course, describe ring stabilities in general, but they are a useful foundation for the application of other methods.
Knowledge of whether a particular ring system exists "in the real world" is especially valuable in computer-assisted structure elucidation. The only systematic, and in some sense exhaustive and controlled, approach to structure elucidation of organic compounds is still the one based on the DENDRAL project [3]: an isomer generator program is fed with the molecular (gross) formula of the unknown, a "goodlist" (substructures that must be present), a "badlist" (substructures that must be absent), and other structural restrictions. A complete and redundancy-free set of all topological molecular structures is built from these data. In available software systems [4-7], NMR and IR data are used to derive structural restrictions. Recently, mass spectral data have also been used [8]. Usually, a permanent badlist is used that contains substructures considered impossible according to current chemical experience. Knowledge of the existence of sometimes exotic ring structures is helpful for compiling a permanent badlist.
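The role of the goodlist and badlist can be illustrated with a minimal sketch. It assumes that each candidate structure has already been annotated with the set of substructures it contains; real systems perform graph-based substructure matching instead. The function name, the candidate names, and the fragment labels are hypothetical and chosen only for illustration.

```python
def filter_candidates(candidates, goodlist=(), badlist=()):
    """Keep candidates that contain every goodlist fragment
    and none of the badlist fragments.

    candidates: dict mapping a candidate name to an iterable of
    fragment labels (hypothetical, pre-computed annotations).
    """
    good, bad = set(goodlist), set(badlist)
    kept = []
    for name, fragments in candidates.items():
        fragments = set(fragments)
        if good <= fragments and not (bad & fragments):
            kept.append(name)
    return kept


# Hypothetical candidates and fragment labels, for illustration only.
candidates = {
    "isomer_1": {"carbonyl", "oxirane"},
    "isomer_2": {"carbonyl", "ring_with_triple_bond"},
    "isomer_3": {"oxirane"},
}
survivors = filter_candidates(
    candidates,
    goodlist=["carbonyl"],
    badlist=["ring_with_triple_bond"],
)
```

In this toy setting a permanent badlist is simply a `badlist` argument that is reused for every unknown.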
The search results presented may also be of interest for synthesis planning for new compounds or simply serve as a challenge for synthesising new heterocyclic ring systems.
This work was restricted to 3-, 4-, and 5-membered rings, the elements carbon, nitrogen, and oxygen, and the database Beilstein Online.
Systematic generation of heterocyclic rings
The isomer generator software MOLGEN [9, 10] was used for an exhaustive and redundancy-free generation of 3-, 4- and 5-membered rings with the general formula C_c N_n O_o H_h. The following restrictions were applied for isomer generation:
- c + n + o = r, where c, n, o, h are the numbers of C, N, O, and H atoms and r is the ring size (3, 4, 5). All possible combinations of c, n, and o were used, including zero values.
- h = hmax, hmax - 2, hmax - 4, ... (h > 0), with hmax = 2c + n. Thereby all possible values for the number of double bond equivalents are considered. Only rings containing at least one hydrogen atom were generated. For substructure searches all hydrogens were replaced by implicit free sites (any substitution possible, including hydrogen).
- Bond types allowed were: single, double, triple. No restrictions for bond types were applied in substructure search.
- Only the valencies 4, 3, 2 for C, N, O, respectively, were used for ring generation. No restrictions for valencies were applied in substructure search.
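The formula restrictions listed above can be sketched in a few lines of Python. This is only an illustration of the enumeration of molecular formulae, not of MOLGEN itself: it lists the (c, n, o, h) combinations that satisfy c + n + o = r and h = hmax, hmax - 2, ... (h > 0), and it does not check whether a ring isomer actually exists for each formula. The function name is an assumption; the double bond equivalents are computed as 1 + (hmax - h)/2, i.e. one for the ring plus one for each pair of hydrogens removed.

```python
from itertools import product


def ring_formulas(r):
    """Yield (c, n, o, h, dbe) for formulae C_c N_n O_o H_h of ring size r."""
    for c, n in product(range(r + 1), repeat=2):
        o = r - c - n
        if o < 0:
            continue  # c + n already exceeds the ring size
        h_max = 2 * c + n  # saturated ring: CH2, NH and O ring members
        # h = h_max, h_max - 2, ... as long as h > 0
        for h in range(h_max, 0, -2):
            dbe = 1 + (h_max - h) // 2  # the ring itself plus multiple bonds
            yield c, n, o, h, dbe


for c, n, o, h, dbe in ring_formulas(3):
    print(f"C{c} N{n} O{o} H{h}  (double bond equivalents: {dbe})")
```

For r = 3 this yields, among others, C3H6 (cyclopropane, one double bond equivalent); the all-oxygen combination is skipped because it contains no hydrogen.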
Molecular (gross) formulae were entered manually into MOLGEN. The output was a Molfile (*.SDF) containing all isomers for a given formula. After a slight format correction these structural data could be used directly for substructure searches in Beilstein Crossfire.
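The format correction itself is not described here. As a hedged illustration of handling such a multi-structure file, the sketch below splits SD-file text into individual molfile records, relying only on two conventions of the MDL format: records are separated by a line containing `$$$$`, and each connection table ends with an `M  END` line. The function name is an assumption.

```python
def split_sdf(text):
    """Split SD-file text into a list of single-structure molfile strings."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # SD-file record delimiter
            if current:
                records.append(current)
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):
        records.append(current)  # last record without trailing delimiter

    molfiles = []
    for record in records:
        # Keep the connection table only; drop data items after "M  END".
        for i, line in enumerate(record):
            if line.startswith("M  END"):
                record = record[: i + 1]
                break
        molfiles.append("\n".join(record) + "\n")
    return molfiles
```

Each returned string could then be written to its own *.MOL file for import into a structure editor.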
A check for duplicate structures in the final files was performed using the program TOSIM [11, 12].
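How TOSIM detects duplicates is not described here. For simple monocyclic rings, a duplicate check can be sketched with a different, much simpler technique: each ring is written as a cyclic sequence of ring atoms and ring bond orders, and the smallest of all rotations and reflections of that sequence serves as a canonical key. The representation (a string of element symbols plus a tuple of bond orders, where `bonds[i]` joins `atoms[i]` and the next atom) and the function names are assumptions made for this sketch.

```python
def canonical_ring(atoms, bonds):
    """Return a canonical key for a monocyclic ring.

    atoms: string of element symbols around the ring, e.g. "CCN"
    bonds: tuple of bond orders; bonds[i] joins atoms[i] and atoms[i + 1],
           the last entry closes the ring.
    """
    k = len(atoms)
    # Walking the ring in the opposite direction reverses the atoms and
    # shifts the bonds so that each bond still follows its starting atom.
    reflected = (atoms[::-1], bonds[-2::-1] + bonds[-1:])
    forms = []
    for a, b in ((atoms, bonds), reflected):
        for s in range(k):  # every possible starting atom
            forms.append(tuple(zip(a[s:] + a[:s], b[s:] + b[:s])))
    return min(forms)


def unique_rings(rings):
    """Drop rings whose canonical key has been seen before."""
    seen, unique = set(), []
    for atoms, bonds in rings:
        key = canonical_ring(atoms, bonds)
        if key not in seen:
            seen.add(key)
            unique.append((atoms, bonds))
    return unique


rings = [
    ("CCN", (1, 2, 1)),  # ring with a C=N double bond
    ("NCC", (2, 1, 1)),  # the same ring, written from the nitrogen
    ("CCN", (2, 1, 1)),  # ring with a C=C double bond
]
distinct = unique_rings(rings)
```

The sketch ignores substituents and stereochemistry, which is sufficient for the bare ring skeletons discussed in this article.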
Table 1 gives an overview of the generated ring structures.
Table 1 Characterization of the generated ring structures.
| No. of possible rings
|No. of rings containing a double bond (between any atoms)
|No. of rings containing a triple bond
|No. of rings in which all atoms are topologically different
|No. of nitrogen-containing rings
|No. of oxygen-containing rings
|No. of nitrogen- and oxygen-containing rings
Table 2 Summary of search results.
The following tables summarize the search results for 3-, 4- and 5-membered heterocycles.
|No. of possible rings
|No. of rings not found
|No. of rings found with 1 to 5 examples
|No. of structures for 1st most frequent ring
||search for cyclopentane did not go to completion in Crossfire!|
|No. of structures for second most frequent ring
|No. of structures for third most frequent ring
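The quantities summarized in Table 2 can be derived from a list of hit counts per ring query, as in the following sketch. The function name and the example counts are placeholders invented for illustration; they are not search results from this work.

```python
def summarize_hits(hits):
    """Summarize substructure search results.

    hits: dict mapping a ring identifier to the number of database
    structures found for it (0 means no hit).
    """
    counts = sorted(hits.values(), reverse=True)
    return {
        "possible_rings": len(hits),
        "not_found": sum(1 for v in hits.values() if v == 0),
        "found_1_to_5": sum(1 for v in hits.values() if 1 <= v <= 5),
        "top_three_counts": counts[:3],
    }


# Placeholder counts, not taken from the article.
example = {"ring_a": 0, "ring_b": 3, "ring_c": 120, "ring_d": 5, "ring_e": 7}
summary = summarize_hits(example)
```

Rings with zero hits are the candidates for a badlist or for the "no-hit" tables mentioned in the abstract.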
The structure files produced by MOLGEN are in MDL MOL-file format. They were renamed from *.SDF to *.MOL and imported into ISIS/Draw. ISIS/Draw was set as the preferred editor in Beilstein Commander (Ver. 1.0). Individual structures were then copied from ISIS/Draw to Beilstein Commander, and the query options were set to implicit free sites.
References
[1] Nuzillard J.M., Quick method for anti-Bredt structure detection, J. Chem. Inf. Comput. Sci. 1994, 34, 723.
[2] Maier W.F., Schleyer P. von Rague, Evaluation and prediction of the stability of bridgehead olefins, J. Am. Chem. Soc. 1981, 103, 1891.
[3] Gray N.A.B., Computer-Assisted Structure Elucidation, Wiley, New York, 1986.
[4] Funatsu K., Sasaki S.I., Recent advances in the automated structure elucidation system, CHEMICS. Utilization of two-dimensional NMR spectral information and development of peripheral functions for examination of candidates, J. Chem. Inf. Comput. Sci. 1996, 36, 190.
[5] Christie B.D., Munk M.E., Structure Generation by Reduction: A New Strategy for Computer-Assisted Structure Elucidation, J. Chem. Inf. Comput. Sci. 1988, 28, 87.
[6] Kalchhauser H., Robien W., CSEARCH: A Computer Program for Identification of Organic Compounds and Fully Automated Assignment of Carbon-13 Nuclear Magnetic Resonance Spectra, J. Chem. Inf. Comput. Sci. 1985, 25, 103.
[7] Thiele H., X-PERT: A New Expert System for Structure Elucidation, in Software Development in Chemistry, ed. Moll R., Springer, Berlin, 1995, vol. 9, pp. 305-317.
[8] Varmuza K., Werther W., Mass spectral classifiers for supporting systematic structure elucidation, J. Chem. Inf. Comput. Sci. 1996, 36, 323.
[9] Benecke Ch., Grund R., Hohberger R., Kerber A., Laue R., Wieland T., MOLGEN, isomer generator software, vers. 3 (1995), running under MS-Windows, University of Bayreuth, Department of Mathematics II, Germany. MOLGEN is available from the authors.
[10] Benecke Ch., Grund R., Hohberger R., Kerber A., Laue R., Wieland T., MOLGEN, a generator of connectivity isomers and stereoisomers for molecular structure elucidation, Anal. Chim. Acta 1995, 314, 141.
[11] Scsibrany H., Varmuza K., TOSIM. PC-Software for the Investigation of Topological Similarities in Molecules, in Software Development in Chemistry, ed. Jochum C., Gesellschaft Deutscher Chemiker, Frankfurt am Main, 1994, vol. 8, pp. 235-249.
[12] Varmuza K., Scsibrany H., Cluster Analysis of Chemical Structures, based on Binary Molecular Descriptors and Principal Component Analysis, in Software Development in Chemistry, ed. Moll R., Gesellschaft Deutscher Chemiker, Frankfurt am Main, 1995, vol. 9, pp. 81-90.