CHEMMOL, a free evaluation program for converting MDL-molfiles from 2D to 3D

Roger Sayle & Bernard Blessington

Glaxo Research & Development (GRD), Gunnels Wood Rd., Stevenage, Hertfordshire, SG1 2NY and Bradford University, Department of Pharmaceutical Chemistry, Bradford BD7 1DP, UK


ABSTRACT

The method of depicting stereochemistry within MDL-molfiles, probably the most widely used format for organic chemistry, is described, and examples are shown of problems that arise when such files are processed by several commercial 3D modelling/display software packages. A program (chemmol.exe) is provided for free evaluation. It converts a 2D molfile into a 3D version, referred to as display coordinates, which is still accepted by 2D drawing programs but can also be successfully converted to 3D output for visualisation. A selection of test files, a discussion of CHEMMOL's performance and a brief comparison with existing software are provided. This approach is discussed and offered as a preliminary solution to one aspect of the widespread problem of incompatible chemical structure file formats.
Introduction:

Organic chemists communicate most naturally using 2D structure drawings, and even if these are only pictures they are far better than text. However, the use of electronic chemical structure files opens up a range of fascinating and valuable tools for organic chemists which require only a PC (or Mac). Software for structure and sub-structure building and searching of 2D/3D compound and reaction databases; electronic structure exchange using MIME-compatible Email; three-dimensional modelling; quantum computational methods; and correlation with spectroscopic, chromatographic and biological databases are all readily available and increasingly easy to use with Windows interfaces.

Redrawing structures afresh within each software package is often done, but it is labour intensive and can easily generate transcription errors. A better method is electronic structure copying, but it does demand accurate interpretation and exchange of chemical structures. Serious errors can arise, so organic chemists should be aware of current problem areas and contribute to the adoption, implementation and, in particular, the evaluation of seamless and accurate exchange standards. Millions of chemical structures and reactions are now being offered via databanks, so this should not just be left to computer specialists. One problem area, stereochemistry, forms the main theme of this paper.

The MDL-molfile (.mol) has been dominant for small molecule structure handling by computer because of its extensive use over many years, its implementation by many different software vendors, and its open and comprehensive specification, which provides capacity for expansion and development. Because it is capable of both three-dimensional (3D) and two-dimensional (2D) representation of molecules, it is specifically suited to the common "dot/wedge" representation of stereochemistry used by organic chemists. This feature is essential for organic chemistry and gives the .mol file format a clear advantage for small molecules over the Brookhaven Protein Databank (.pdb) format, which has been the norm for biomolecules.

In reality one version of the .mol has been used mainly for 2D representations and the .pdb has been preferred for 3D. However, both have suffered, the .mol probably most, because no controlling body has responsibility for regulating their implementation (1). In recent years different varieties of .mol have been produced which differ only slightly, but sufficiently to make them unreadable or, probably worse, incorrectly readable when processed by a different vendor's software. Organic chemists need to be aware of this problem. MDL's Affinity Group is trying to rationalise this by offering free ISIS Draw software to promote the ISIS sketch (.skc) file standard (and its text equivalent, TGF), which is a development of the molfile and has many of the bond character and atom coordinate features of the .mol file at its heart, plus the capability of also incorporating drawing information. Hopefully these efforts and other similar ones will soon produce a much needed rationalisation of structure file use amongst 2D users. A pressing need is to clarify and regulate how superatoms, sometimes called group formulae or atom strings, are represented and incorporated into the final molecular representation.

However, a fundamental problem still exists for .mol, .tgf and .skc when 3D software, typically found with modelling packages, is used to read 2D files. This arises because of the way in which stereochemistry, which is both accurately and clearly depicted within the 2D structures by dotted and wedged or thickened bonds, is converted to a 3D representation. Problems can vary from complete loss of stereochemical information, which has the saving grace of at least being easy to recognise, to the subtle and apparently random insertion of new chirality or the inversion of existing chirality. These chemical configuration errors can make complete nonsense of subsequent molecular modelling or matching and are especially insidious when they occur, as they often do, without any warning notice. The detection of such serious errors is not easy, requiring the eye of a very experienced organic chemist, even for the relatively simple 3D representations included in this paper.

Demonstration of Errors Arising During Structure Exchange:

* Three types of molecule, a steroid, a penicillin and a sugar, are used as simple examples. Perhaps the easiest way is to click on the figs. 1-18 hyperlinks to see pictures of the screen output we have produced of these test structures using various programs.

** If, however, you wish to do more, you can click on the various (filename.mol) hyperlinks to examine these same files on your WWW viewer, if it is configured for RasMol or ISIS Draw.

*** Alternatively, if you have ChemX, ChemWindow, RasWin, ISIS Draw, Hyperchem or other software capable of importing .mol files, you can click here to find how to download a copy of Chemmol.exe plus the above molfiles, which you can then manipulate yourself on your own computer.


To start this illustration, look at the picture of a steroid molecule (fig. 1) produced and chemically checked with ChemWindow, and then see what happens when the same molfile is imported into RasWin and rotated using the vertical and horizontal scroll bars. Two pictures, before (fig. 2) and after horizontal rotation through almost 90 degrees (fig. 3), are shown. Exactly the same result, total and obvious loss of stereochemistry, happens when the same molfile is read by Hyperchem.

Now look at the drawing of one stereoisomer of 6-aminopenicillanic acid from ChemWindow (fig. 4) and then examine carefully the pictures of the screen produced when this molfile is read by ChemX. Two pictures (fig. 5 and fig. 6) are shown of the ChemX screen, showing different rotated views of the resultant 3D molecule. The misrepresented trans hydrogen stereochemistry of the beta-lactam ring is not quite so easy to see. ChemX also has a facility to depict Cahn-Ingold-Prelog (R/S) designations (fig. 7) for each chiral centre. Correct handling of this topic, which is even more complex, must be left for future discussion, but the display is clearly not right.

More complex errors can be seen when the disaccharide, as read and chemically checked with ChemWindow (fig. 8), is transferred as a molfile (disacch1.mol) into RasWin or ChemX (see fig. 9 and fig. 10 for two screen views). Both the carbon skeleton and the stereochemistry are misread. The skeleton misread, showing pentoses rather than the intended hexoses, arises because this particular molfile contains the CH2OH superatom. This superatom actually prevents Hyperchem from reading this molfile at all.

A second molfile (disac1s.mol) for the same structure, produced from disacch1.mol using the "make stick structure" command in ChemWindow (fig. 11), can be read by all the above programs and produces correct hexose carbon skeletons. However, the stereochemistry errors still persist. RasWin and Hyperchem again lose all stereochemistry, whilst ChemX produces stereochemistry (fig. 12) different from that originally drawn in ChemWindow.

Software and Hardware

CPSS (for Microsoft DOS) and ISIS (for Microsoft Windows) are combination chemical drawing/database software products of Molecular Design Inc (2132 Farallon Drive, San Leandro, CA, US), whilst Chem-X is a combined chemical drawing/modelling and database program from Chemical Design Ltd. (Cromwell Park, Chipping Norton, Oxon, UK). CHEMWINDOW v3.1 is a drawing package from Softshell Int. (715 Horizon Drive, Grand Junction, CO, US). KEKULE v2.0a is a chemical drawing and scan conversion program from PSI Int. (810 Gleneagles Court, Towson, MD, US). HYPERCHEM is a modelling suite marketed by Autodesk up to release 3 and by Hypercube (Waterloo, Ontario, Canada) from release 4. RASWIN is a free Windows version of a multi-molecular-format display package produced by Roger A. Sayle (Glaxo Group Research, Greenford, UK), and WINGIF v1.4 (SuperSet Software Corp., P.O. Box 50476, Provo, UT, US) is capable of generating GIF files from chemical drawing packages for incorporation into WWW html documents.

An IBM PC clone (486DX, 66 MHz, 16 MB RAM, 900 MB HD) running Microsoft Windows 3.11 (Microsoft, Microsoft Place, Winnersh, Wokingham, Berkshire RG11 5TP, UK) was used for all evaluation work.

CHEMMOL.exe

A prototype program, CHEMMOL, has been written to address such stereochemistry misreads and to answer the question:

What molfile version can be read and displayed by both 2D and 3D software?

Chemmol tries to generate a molfile containing Display Coordinates, corresponding to the literal z-axis displacement, above or below the plane of drawing, for all bonds arising from chirogenic atoms. Chemmol was intended for retrospective use, to convert existing molfiles, many hundreds of thousands of which exist in the large databanks of pharmaceutical and agrochemical companies and are also components of both .sdf and .rxn files in the majority of chemical databases. A logical and easier future development would be to generate such display coordinate information at the time stereochemical information is first introduced into 2D drawings.
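
The idea of display coordinates can be illustrated with a minimal Python sketch. It is not the CHEMMOL source, whose algorithm is not given in this paper. It assumes a fixed-column V2000 molfile, the customary bond stereo codes (1 for a wedged bond, 6 for a dotted bond, with the narrow end at the first atom of the bond) and an arbitrary displacement dz. The function name add_display_coordinates and the default value of dz are hypothetical.

```python
def add_display_coordinates(molfile_text, dz=0.5):
    """Return a copy of a V2000 molfile with z offsets taken from wedge/dot bonds."""
    lines = molfile_text.splitlines()
    n_atoms = int(lines[3][0:3])   # counts line: number of atoms, columns 1-3
    n_bonds = int(lines[3][3:6])   # counts line: number of bonds, columns 4-6
    atom_start = 4                 # atom block follows 3 header lines + counts line
    bond_start = atom_start + n_atoms

    # Assumed stereo codes for single bonds: 1 = wedge (up), 6 = dotted (down).
    z_sign = {1: +1.0, 6: -1.0}

    for bond_line in lines[bond_start:bond_start + n_bonds]:
        far_atom = int(bond_line[3:6])                # atom at the wide end of the bond
        stereo = int(bond_line[9:12].strip() or "0")  # blank field means "not stereo"
        if stereo not in z_sign:
            continue
        idx = atom_start + far_atom - 1
        atom_line = lines[idx]
        # z occupies columns 21-30 of the atom line; x and y are left untouched,
        # so a 2D drawing program still sees the original picture.
        z = float(atom_line[20:30]) + z_sign[stereo] * dz
        lines[idx] = atom_line[:20] + f"{z:10.4f}" + atom_line[30:]

    return "\n".join(lines) + "\n"
```

This sketch moves only the atom at the wide end of each stereo bond. Atoms further along a substituent stay in the plane of the drawing, which is one reason a real converter needs more care than is shown here.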

To illustrate this, you are invited to use RasWin and/or ISIS Draw to look again at the ChemWindow drawings of the previous molfiles (betameth.mol, fig. 1; 6apa6.mol, fig. 4; disac1s.mol, fig. 11) and compare them with their Display Coordinate files (dcbetmet.mol, dc6apa6.mol, dcdisa1s.mol), which have been produced with Chemmol. Pictures of the structures for these display coordinate files are shown using Hyperchem (figs. 13, 14, 15) or RasWin (figs. 16, 17, 18).

Other examples and the opportunity to convert and exchange your own files are detailed below.

Discussion & Development:

A surprisingly large number of 2D MDL molfiles, representing different molecular structure types, have been found by us to generate chemical and stereochemical errors when examined by 3D viewers. Some errors arise because the .mol files contain superatoms. These usually cause a loss of atoms and bonds during conversion and are relatively easy to identify visually. A useful future facility would be a molecular formula checker (similar to the status line display in KEKULE) that automatically checks that the input and converted files match.
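
A check of this kind might be sketched as below. This is only an illustration, assuming fixed-column V2000 molfiles. It counts the element symbols listed in the atom block and ignores explicit hydrogens, since a modelling package may add them during conversion. The function names are hypothetical, and implicit hydrogens, which do not appear in the file, are not counted at all.

```python
from collections import Counter


def atom_symbol_counts(molfile_text):
    """Count the atom symbols listed in the atom block of a V2000 molfile."""
    lines = molfile_text.splitlines()
    n_atoms = int(lines[3][0:3])                 # counts line, columns 1-3
    atom_lines = lines[4:4 + n_atoms]            # atom block starts at line 5
    # The atom symbol occupies columns 32-34 of each atom line.
    return Counter(line[31:34].strip() for line in atom_lines)


def same_heavy_atom_formula(original_text, converted_text):
    """True if both files list the same non-hydrogen atoms in the same numbers."""
    def heavy(counts):
        return {sym: n for sym, n in counts.items() if sym != "H"}

    return heavy(atom_symbol_counts(original_text)) == heavy(
        atom_symbol_counts(converted_text)
    )
```

A converted file that has lost atoms, as described above for files containing superatoms, would make same_heavy_atom_formula return False.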

The term .mol is used to describe files for several different kinds of structures. These range from simple 2D structures, through rough 3D approximations and 3D computed structures optimised by molecular mechanics or semi-empirical methods, to experimentally established atomic coordinates (for example those derived from X-ray studies). The MDL-molfile format allows two dimension indicator characters at line 2 of the header block, which have, by custom, been set as 2D (or frequently left blank) or 3D. We would have liked to insert the letters DC, meaning display coordinates, in this position to indicate that the file produced by CHEMMOL was a rough conversion which faithfully reproduces the stereochemistry information (configuration) encoded in the bond block of the originating 2D .mol file. Because this could cause problems for other programs during their reading of .mol files, we decided not to use this code and just to use 3D. However, this is exactly the kind of development issue which requires the controlling influence of a body dedicated to the implementation and development of chemical molecular file formats. Our temporary solution is to prefix the converted file names with DC. So, for example, dctaxol.mol is the name of the converted, rough approximation 3D file generated via CHEMMOL from the input 2D file taxol.mol.
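
For readers who wish to inspect their own files, the following Python sketch reads and sets the dimension characters and applies the DC file name prefix. It assumes the published V2000 header layout, in which the two dimension characters occupy columns 21-22 of line 2, after the user initials, program name and date/time fields. The function names are hypothetical and this is not how CHEMMOL itself is implemented.

```python
def get_dimension_code(molfile_text):
    """Return the dimension characters ('2D', '3D' or '' if blank) from header line 2."""
    header = molfile_text.splitlines()[1]
    return header[20:22].strip()


def set_dimension_code(molfile_text, code="3D"):
    """Return a copy of the molfile with the dimension characters set to `code`."""
    lines = molfile_text.splitlines()
    header = lines[1].ljust(22)                  # pad short or blank header lines
    lines[1] = header[:20] + code[:2].ljust(2) + header[22:]
    return "\n".join(lines) + "\n"


def dc_filename(name):
    """Apply the DC prefix convention: taxol.mol becomes dctaxol.mol."""
    return "dc" + name
```

Under DOS, a name longer than six characters before the extension would still have to be shortened by hand to fit the eight-character limit once the prefix is added; the sketch does not attempt this.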

DC files can be visualised by RasMol and/or processed by several commercial modelling programs, which give optimised 3D molecular structures that can then be saved as .mol or .pdb files. Once this step has been taken, however, the resultant structure, unlike the corresponding DC molfile, cannot be used to show the original 2D structure with ChemWindow or ISIS. Again we have used a naming convention, since we would prefer to reserve 3D solely for structures based on experimental X-ray data. We prefix the output file names with CC, standing for computer calculated, though again we would have liked to incorporate this designation into the dimension characters in the resultant header blocks. Several examples have been included to illustrate this. All dc-prefixed files were produced using CHEMMOL, and cc-prefixed files were produced from the corresponding dc version with HYPERCHEM (Release 4) using its "add hydrogen" and "build" commands. They can all be examined using RasMol.

These same files can be examined on your own computer by activating the appropriate hyperlink in the following sequence of mol files. The way your viewer is configured for MDL-molfiles will determine what pictures you see. (betameth.mol, 6apa6.mol, disacch1.mol, disac1s.mol, dcbetmet.mol, dc6apa6.mol, dcdisa1s.mol, ccbetmet.mol, cc6apa6.mol, ccdisa1s.mol)

Software and File Handling Details:

CHEMMOL is a simple, small, executable DOS program which will be offered free for individual exploratory use, initially on PCs. A copy of Chemmol, installation and operating instructions, and a test series of original and converted mol files will be made available (click here). Further information can be obtained by contacting Dr. Bernard Blessington by Email (b.blessington@brad.ac.uk) or Dr. Roger Sayle (ras32425@ggr.co.uk).

Results:

We would appreciate it if users would report (b.blessington@brad.ac.uk) the number of successful conversions and, if possible, the actual molfiles. More importantly, please supply details of any molecular conversions which are deemed incorrect or partially incorrect. (We know of some, including the disaccharide above, so no prizes are on offer.)

WARNING

This version of CHEMMOL is a prototype, so users should clearly recognise its limitations. No responsibility is taken for any consequences arising from its use. It can only handle "simple molfiles" containing explicit bonds, so structures containing superatoms, character strings or R groups need to be redrawn as simple molfiles. The "draw stick structure" command in CHEMWINDOW v3.1 could be used, for example, to do this.

References:

(1) This conclusion was clearly highlighted during the Email discussion session of Paper 56 in the first Electronic Conference on Computational Chemistry (ECCC1) at Northern Illinois University (ECCC1-information).

