Proceedings of the Second Electronic Computational Chemistry Conference, 1995.

Advanced VRML Based Chemistry Applications: A 3D Molecular Hyperglossary

Omer Casher, Christopher Leach, Christopher S. Page and Henry S. Rzepa

Department of Chemistry, Imperial College of Science, Technology and Medicine, London, England, SW7 2AY

Abstract: A test bed for what we have termed a 3D Molecular Hyperglossary has been developed. This is illustrated with the results of a three-dimensional search of the Cambridge Structural Database for intermolecular interactions between chlorinated aryl hydrogen atoms and oxygen centres. The results have been represented as a scatter diagram of points about a well defined molecular centre, and encoded in Virtual Reality Modelling Language (VRML) to enable the user to interrogate what amounts to a three-dimensional hyperglossary index using a VRML browser. Individual regions of the molecular hyperglossary can be inspected by activating hyperlinks contained within the VRML descriptors. These hyperlinks point to three-dimensional VRML encoded diagrams that were automatically generated from the original crystallographic coordinates. The individual molecular diagrams can in turn contain hyperlinks to other relevant information about the molecule, such as links to electronic journal articles or conferences, and other electronic documentation. The implications for use in electronic journals and conferences are discussed.


Introduction

We have previously discussed[1] the chromatographic and structural properties of 3-chloro-9, 13-di-n-butylaminium-1-hydroxypropyl-6-trifluoromethylphenanthrene chloride, the 1-H derivative of the anti-malarial drug halofantrine.

Figure 1 Crystal structure of 3-chloro-9, 13-di-n-butylaminium-1-hydroxypropyl-6-trifluoromethylphenanthrene chloride

The crystal structure (Figure 1) of this molecule showed a short intermolecular C-H...O contact of 2.28Å between the 1-H aromatic proton and the hydroxy group attached to the chiral centre. The lack of chiral chromatographic separation in this particular compound was attributed to this specific interaction, which in turn was thought to result from enhanced acidity of the 1-H proton due to inductive effects arising from a "W" interaction with a meta chlorine ring substituent, and a long-range, seven-bond, anti-periplanar interaction along the carbon framework with a remote CF3 substituent.

Since chlorinated aromatics occur in a remarkable number of modern drugs, we were intrigued as to how unique the particular structural features of this compound were, and in particular whether there was any statistical preference for meta or para direction for the hydrogen bond formation. We envisaged conducting a series of searches of the Cambridge Structural Database[2] to investigate these aspects. This approach was first demonstrated in a series of influential papers in the early 1980s by Taylor, Kennard and Versichel[3], in their discussions of the geometry of the N-H...O=C hydrogen bond.

Different functional group classes show clear preferences for particular hydrogen bond patterns, often in spite of other unpredictable, non-specific lattice forces. It is desirable, however, not only to be able to view, but to interrogate the results of a search, normalised in a coordinate frame defined by the search fragment. Examination of the proposed structures may reveal certain trends in the distribution, or certain shortcomings in the search criteria. This additional information allows one to build up more and more sophisticated queries that either support or rule out a given hypothesis. From these observations, it should ultimately be possible to construct an axiomatic hierarchy describing the hydrogen bond formation.

Such a search applied to chlorobenzene results in a three-dimensional distribution of points in space, each representing individual entries in the database. The position of each point is calculated relative to the ring centroid by defining the centroid-Cl vector as a "y-axis" and one other vector in the plane of the ring. In this paper, we report our solution to the problems of displaying such a complex scatter diagram and embedding pertinent information within it, and describe the implications that it may have in representing such information in electronic journals and other electronic media.
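
The normalisation into a fragment-based coordinate frame can be sketched as follows. This is a minimal illustration, not the software used for this work: the function names, the choice of a right-handed frame and the use of a caller-supplied in-plane atom to fix the second vector are all assumptions made for the example.

```python
import math

def sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(ai / n for ai in a)

def fragment_frame(ring_atoms, cl_atom, plane_atom):
    """Return (centroid, (x, y, z)) for one chlorobenzene fragment.

    y points from the ring centroid to Cl, z is the ring normal and
    x lies in the ring plane. plane_atom (hypothetical argument) is any
    ring atom that is not collinear with the centroid-Cl vector.
    """
    n = len(ring_atoms)
    centroid = tuple(sum(atom[i] for atom in ring_atoms) / n for i in range(3))
    y = unit(sub(cl_atom, centroid))
    z = unit(cross(sub(plane_atom, centroid), y))
    x = cross(y, z)  # unit length because y and z are orthonormal
    return centroid, (x, y, z)

def to_fragment_coords(point, centroid, axes):
    """Express a crystallographic (Cartesian) position in the fragment frame."""
    d = sub(point, centroid)
    return tuple(dot(axis, d) for axis in axes)
```

Applying to_fragment_coords to the oxygen position of every hit places all contacts in a common frame, so that they can be overlaid in a single scatter diagram.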


Procedure

Database queries for interactions between an oxygen and a chlorinated aryl moiety were defined by a contact of between 0.0 and 5.5Å to the centre of the ring. Angular criteria were set so that contacts not lying in the same sextant as an ortho, meta or para hydrogen were excluded. The Quest software was also instructed to record a number of parameters, such as the O...centroid and O...H contact distances and the polar orientation of the interaction, using the DEFINE keyword, as these could be used for subsequent sub-searches of the data. It is important to note, however, that the scatterplot is derived from the raw crystallographic data rather than from these parameters, as additional information is required in the subsequent analysis. The data resulting from these searches were subjected to processing using software that we developed specifically for this purpose.
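
The distance and sextant criteria can be illustrated with a short sketch that operates on a probe position already expressed in the fragment frame (ring centroid at the origin, Cl along +y, ring normal along z). It is not the Quest query itself. The sextant boundaries at 30°, 90° and 150° from the C-Cl direction are an assumption, and the ± 20° out-of-plane limit is borrowed from the caption of Figure 3.

```python
import math

def classify_contact(x, y, z, max_dist=5.5, max_elevation_deg=20.0):
    """Return 'ortho', 'meta', 'para' or None for one probe position.

    None means the contact is rejected: too far from the centroid,
    too far out of the ring plane, or in the sextant of the C-Cl bond.
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0 or r > max_dist:
        return None
    # Angle between the centroid-probe vector and the ring plane
    elevation = math.degrees(math.asin(abs(z) / r))
    if elevation > max_elevation_deg:
        return None
    # In-plane angle measured from the centroid-Cl direction (0 deg = Cl)
    phi = abs(math.degrees(math.atan2(x, y)))
    if phi <= 30.0:
        return None  # sextant occupied by chlorine, no hydrogen here
    if phi <= 90.0:
        return "ortho"
    if phi <= 150.0:
        return "meta"
    return "para"
```

Because the absolute value of the in-plane angle is used, the two ortho sextants (and the two meta sextants) are treated as equivalent.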

To probe the origins of specific ring interactions, the molecular electrostatic potential (MEP) for chlorobenzene was calculated at the ab initio 6-311G(d, p) basis set level using the Becke, Lee, Yang and Parr (BLYP) density functional as implemented in the Gaussian 94[4] system, and represented as a difference map by subtraction of the values obtained for benzene itself. Chlorobenzene was optimised to default tolerances, and the MEP map was subsequently generated by invoking the cube(80, potential) keyword. The nosymm keyword was included to prevent reorientation of the molecule during the cube computations. The optimised chlorobenzene z-matrix was also used for the benzene cube calculation, the chlorine centre being replaced by a hydrogen and the corresponding C-H bond length being set to 1.09Å. The grid dimensions were explicitly matched to the chlorobenzene calculation through use of the cube(cards, potential) keyword, and the difference map was created using the G94 cubman utility. This 3D distribution function was visualised using custom modules developed as part of our Explorer EyeChem[5] module suite and superimposed upon the crystal cluster diagram.
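
The reason for matching the grids explicitly is that a difference map is only meaningful when both potentials are sampled at identical points. The sketch below shows what the subtraction amounts to for two grids that have already been read into memory. It is not the cubman utility, and the grid description (origin, point counts, step vectors) is a hypothetical layout chosen for the example.

```python
def difference_map(spec_a, values_a, spec_b, values_b):
    """Subtract grid B from grid A point by point.

    spec_* is a tuple (origin, counts, step_vectors). Both grids must
    share it exactly, otherwise values at the same index would refer
    to different points in space.
    """
    if spec_a != spec_b:
        raise ValueError("grids differ in origin, point counts or step vectors")
    if len(values_a) != len(values_b):
        raise ValueError("grids hold different numbers of values")
    return [a - b for a, b in zip(values_a, values_b)]
```

With the chlorobenzene potential as grid A and the benzene potential as grid B, positive values mark regions that chlorine substitution has made more electropositive.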


Virtual Reality Modelling Language (VRML)

VRML is most simply envisaged as a three-dimensional extension to the two-dimensional ASCII character set. In the latter, a single byte of information suffices to encode the quite complex shape of a letter, numeral or other character in the standard ASCII set. A local program (word processor, editor, World-Wide Web browser) serves to convert this byte of information to a pleasing on-screen representation of the character. Thus a text file, which contains symbolic representations of some very specific 2D objects (the ASCII characters), is a particularly concise way of transmitting information. In contrast, encoding the actual shapes of the characters as a bit-mapped image would result in far larger files.

In VRML, a set of three-dimensional objects, such as spheres or cylinders, can be allocated a size, texture, colour and position, and represented in a 3D space using a visualisation program. The basic structure for this representation is based on the Open Inventor Object Library, which includes code for rendering such objects on a computer. In the same way that a text file is a highly compact method of transmitting information where the task of screen rendering is performed locally, so VRML is a very efficient method for transmitting visually complex 3D information. Again a VRML file is far more compact than a bit-mapped 3D MPEG animation file or even a bit-mapped 2D image.

Just as Hypertext Markup Language (HTML) introduced the concept of hyperlinks in a collection of ASCII characters, so a VRML file can contain in-lined and anchored URLs to 3D objects. In our implementation[6], we wrote programs to automatically generate such a VRML file to represent the scatterplot data from a structural database query. Other specific requirements were to include ball-and-stick molecular diagrams and the results from theoretical calculations such as those noted above. These were obtained using our EyeChem suite of modules, which are themselves based on Open Inventor, and which therefore required little modification to produce VRML encoded representations. Additional EyeChem modules were built to read and manipulate the output from the Cambridge Structural Database and to generate the VRML files containing any necessary hyperlinks automatically.

The additional modules, which we collectively call the EyeQ suite, read the FDAT, JNL, MODEL FRAG and TABLE Quest output files. The user selects from a sample fragment which atoms are to be used to define the centroid, y-axis, plane and probe. Where multiple atoms are selected, the mid-point is calculated on a per structure basis. Additional constraints may be included on those parameters that have been specifically included in the TABLE file. By linking together TABLE readers in an appropriate fashion, it is possible to perform AND, OR and NOT operations and thus carry out quite sophisticated sub-searches without having to redefine the Quest query. The software automatically determines whether there are mirror-symmetry planes in the reference fragment and gives the user the option of placing all of the hits into one quadrant and hence of reflecting this quadrant in all planes. A further module normalises, based on the probe radius, the position of the contact relative to the Connolly surface, and permits the exclusion of any point that does not penetrate it.
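
The logical combination of TABLE constraints and the symmetry reflection can be sketched as below. This is an illustration only, not the EyeQ modules: the column names are hypothetical, and the two mirror planes of chlorobenzene are assumed to be x = 0 and z = 0 in a frame with the ring in the z = 0 plane and the C-Cl bond along y.

```python
def in_range(key, lo, hi):
    """Constraint on one tabulated parameter of a hit."""
    return lambda row: lo <= row[key] <= hi

def AND(*preds):
    return lambda row: all(p(row) for p in preds)

def OR(*preds):
    return lambda row: any(p(row) for p in preds)

def NOT(pred):
    return lambda row: not pred(row)

def fold_and_reflect(points):
    """Fold every hit into the x >= 0, z >= 0 quadrant, then reflect
    that quadrant through both mirror planes."""
    out = set()
    for x, y, z in points:
        fx, fz = abs(x), abs(z)
        for sx in (1, -1):
            for sz in (1, -1):
                out.add((sx * fx, y, sz * fz))
    return sorted(out)
```

A sub-search such as AND(in_range("O_centroid", 0.0, 5.5), NOT(in_range("O_H", 3.0, 10.0))) can then be applied to each row of the table without rerunning the database query.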

Figure 2 The map used to generate the scatterplot shown in Figure 3. The VRML conversion modules have been omitted for the sake of clarity.

Once a satisfactory scatterplot has been obtained, the system will automatically convert this to VRML, attaching links to the corresponding VRML structure files and HTML references that are generated from the FDAT and JNL files.
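
As an illustration of the kind of file produced at this stage, the sketch below writes a VRML 1.0 scene in which each hit is a small sphere wrapped in a WWWAnchor node pointing at its structure file. It is a minimal example, not the EyeQ conversion module; the input layout, sphere radius, reference codes and URLs are hypothetical.

```python
def scatterplot_vrml(hits, radius=0.1):
    """Build VRML 1.0 text with one hyperlinked sphere per hit.

    hits: iterable of (refcode, (x, y, z), url) tuples.
    """
    lines = ["#VRML V1.0 ascii", "Separator {"]
    for refcode, (x, y, z), url in hits:
        lines += [
            "  WWWAnchor {",
            f'    name "{url}"',
            f'    description "{refcode}"',
            # The inner Separator keeps each translation local to its sphere
            "    Separator {",
            f"      Translation {{ translation {x:.3f} {y:.3f} {z:.3f} }}",
            f"      Sphere {{ radius {radius:.2f} }}",
            "    }",
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Selecting a sphere in a VRML browser then follows the URL held in the name field of the enclosing WWWAnchor.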

Taken together, these various tools allowed us to construct a three-dimensional equivalent to the two-dimensional hyperglossary we have previously described[7], in which hyperlinks serve to establish connections between various molecular data.


Results and Discussion

Figure 3 Scatterplot showing oxygen contacts to the centroid of chlorobenzene in range 0.0 to 5.5Å within ± 20° of the plane. Those contacts that penetrate the Connolly surface are highlighted. The Connolly surface itself and the MEP difference map fitted to an energy of ± 0.0135 Hartrees are also shown.

The software used to generate the scatterplot enables the user to perform limited sub-searches of the data based on parameters defined within Quest, such as the probe-centroid distance and angular ranges. The main scatterplot shown in Figure 3 thus shows all contacts in the range 0.0 to 5.5Å from the centroid and within ± 20° of the plane. The points have been reflected in both planes of symmetry. The points that are highlighted are the positions of the atomic centres where the probe has crossed the Connolly surface. These points have also been normalised against the Connolly surface, which is shown by the dots surrounding the chlorobenzene moiety. The semi-transparent surface closest to the chlorobenzene is the molecular electrostatic potential difference map[8], fitted to an energy of ± 0.0135 Hartrees (± 8.471 kcal/mol).
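
The conversion quoted for the contour level can be checked with the standard factor of 1 Hartree ≈ 627.5095 kcal/mol (a value that is not stated in the text):

```latex
0.0135\,E_\mathrm{h} \times 627.5095\ \frac{\mathrm{kcal\,mol^{-1}}}{E_\mathrm{h}} \approx 8.471\ \mathrm{kcal\,mol^{-1}}
```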

On the basis of the difference map, which clearly shows an enhanced electropositive character at the meta and ortho positions of the ring, one might predict that aromatic C-H...O interactions as analysed from the Cambridge Database would also display a preference for clustering at the ortho and meta positions over the para. Interestingly, what is observed is ortho > para > meta. The lack of correlation between the MEP and the x-ray database is the subject of an ongoing study and will be discussed elsewhere[9].

The remainder of this article shall thus be restricted to a general discussion of the method of presenting these results and of the implications that this may have for future methods of scientific communication.


The Implications for Electronic Journals

Figure 3 shows a broad overview of the three-dimensional distribution of points around the aromatic nucleus. Several datasets, produced by different queries but related by the same common axes, may be projected together, and can be identified by the use of size or colour. Similarly, it is possible to draw the reader's attention to particularly interesting structures, such as those in the example that impinge on the Connolly surface. Integrated into this scatter diagram is a ball-and-stick model of chlorobenzene to establish the reference frame. We have shown elsewhere[10] that VRML can be a particularly useful way of rendering computed molecular properties. Thus in a single diagram we can illustrate several different angles on the same problem, obtained by quite different methods. Since the total number of points is quite large, we have selected a sub-set defined by a particular region of space surrounding the chlorobenzene. Each such region contains a hyperlink to a sub-set of data, which the user can select by clicking in the appropriate area (Figure 4).

Figure 4 Clicking in a given region of the scatterplot brings up a sub-set of the points to simplify selection.

Figure 5 Each point is then hyperlinked to the corresponding structure...

Figure 6 ...and associated literature.

Each point in the scatterplot holds additional relevant information that is accessible via hyperlinks. So, by selecting one of these points, a unique VRML file will be downloaded, containing the appropriate molecular structure and enabling the user to inspect specific features interactively (Figure 5). Each VRML file contains a link to appropriate references such as a link to an electronic journal article. This can be accessed and viewed via a conventional HyperText Markup Language browser, such as Netscape Navigator (Figure 6). In this way, the researcher has access to the complete set of results, provided by a seamless and transparent mechanism.


Future Applications and Directions

Several different collections of objects can be gathered together in one "scene", and the entire collection can be thought of as a three-dimensional molecular collaboratory. In particular, each separate scene can originate from a different laboratory and a different geographical location. The browser will integrate these scenes into a composite diagram. Just as an electronic conference such as ECCC-2 brings together a collection of documents, largely comprising two-dimensional "hypertext" and images that are held on different servers around the world, so one can imagine creating a similar three-dimensional metaphor for complex visual objects. Instrumental results in one laboratory, for example, might be combined with theoretical simulations in another to produce a composite scene.
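
In VRML 1.0 such a composite can be expressed with WWWInline nodes, each of which asks the browser to fetch a scene from its own server and merge it into the current one. The sketch below generates a file of this kind; the function name and the URLs are hypothetical and are given purely for illustration.

```python
def composite_scene(urls):
    """Build a VRML 1.0 scene that in-lines one remote scene per URL."""
    lines = ["#VRML V1.0 ascii", "Separator {"]
    for url in urls:
        # Each WWWInline is resolved by the browser when the scene is loaded
        lines.append(f'  WWWInline {{ name "{url}" }}')
    lines.append("}")
    return "\n".join(lines) + "\n"

print(composite_scene([
    "http://lab-a.example.org/crystal_contacts.wrl",
    "http://lab-b.example.org/mep_difference.wrl",
]))
```

For the combined picture to be meaningful, the contributing scenes must share a common coordinate frame, as the datasets overlaid in the scatterplot do.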

Before VRML can become a truly general descriptive tool for chemical objects, the collection of primitive objects will have to be enriched and a mechanism for including simple text descriptors will have to be incorporated. Developments such as Molecular Inventor hold much promise in this regard. VRML support in standard chemical drawing packages such as Chem3D is also an encouraging sign. One area where development might also be expected is a combination of a secure scripting language such as JAVA with the basic VRML descriptors. This would allow the objects in a "scene" to be choreographed, allowing time-dependent phenomena to be described; a sort of virtual molecular ballet! Other metaphors found in the basic HTML/HTTP interaction of the World-Wide Web, such as FORMS for user responses, will need to be translated to the VRML scenario. Perhaps when compound document architectures finally mature, the transition between a more conventional HTML based document and a VRML "scene" will vanish as far as user perception is concerned.

We are clearly at the very earliest stages of exploring how a 3D document might influence scientific collaboration and information dissemination. The next priority must be to develop models for how individual "scenes" in such a world might be indexed and searched for. Clearly, a three-dimensional equivalent of the keyword search needs to be developed. Perhaps we are also some way from user acceptance of such models. There is still a very marked resistance to reading information directly from a computer screen rather than from a printed sheet of paper. The idea of navigating through a three-dimensional world is an even more dramatic departure from convention, and the implications are only now being realised. However, we venture to predict that, within a few years, the idea of a 3D scientific journal may yet come to be accepted.


Acknowledgements

We thank SmithKline Beecham Pharmaceuticals for a studentship (to CSP) and Dr Mike Webb for many helpful discussions and advice.


References

1. P. Camilleri, D. S. Eggleston, H. S. Rzepa and M. L. Webb, J. Chem. Soc., Chem. Commun., 1135, 1994.

2. F. H. Allen, J. E. Davies, J. J. Galloy, O. Johnson, O. Kennard, C. F. Macrae, E. M. Mitchell, G. F. Mitchell, J. M. Smith and D. G. Watson, J. Chem. Inf. Comput. Sci., 31, 187, 1991.
For further information, contact the Cambridge Crystallographic Data Centre.

3. R. Taylor and O. Kennard, J. Am. Chem. Soc., 104, 5063, 1982;
R. Taylor, O. Kennard and W. Versichel, J. Am. Chem. Soc., 105, 5761, 1983;
R. Taylor, O. Kennard and W. Versichel, J. Am. Chem. Soc., 106, 244, 1984;
R. Taylor, O. Kennard and W. Versichel, Acta Cryst. B (Str. Sci.), 40, 280, 1984;
R. Taylor and O. Kennard, Acc. Chem. Res., 17, 320, 1984.

4. Gaussian 94, Revision C.2, M. J. Frisch, G. W. Trucks, H. B. Schlegel, P. M. W. Gill, B. G. Johnson, M. A. Robb, J. R. Cheeseman, T. Keith, G. A. Petersson, J. A. Montgomery, K. Raghavachari, M. A. Al-Laham, V. G. Zakrzewski, J. V. Ortiz, J. B. Foresman, J. Cioslowski, B. B. Stefanov, A. Nanayakkara, M. Challacombe, C. Y. Peng, P. Y. Ayala, W. Chen, M. W. Wong, J. L. Andres, E. S. Replogle, R. Gomperts, R. L. Martin, D. J. Fox, J. S. Binkley, D. J. Defrees, J. Baker, J. J. P. Stewart, M. Head-Gordon, C. Gonzalez, and J. A. Pople, Gaussian, Inc., Pittsburgh PA, 1995.

5. O. Casher, S. Green and H. S. Rzepa, J. Mol. Graphics, 12, 226, 1994.

6. O. Casher and H. S. Rzepa, Computer Graphics, 29, 52, 1995.

7. C. Leach, P. Murray-Rust and H. S. Rzepa, Electronic Conference on Trends in Organic Chemistry, (Eds H. S. Rzepa, J. M. Goodman and C. Leach), June 1995.

8. M. D. Ryan, in "Modeling the Hydrogen Bond", (Ed D. A. Smith), ACS Symposium Series 569, 36, 1993.

9. C. S. Page, H. S. Rzepa and M. L. Webb, Paper in Preparation.

10. G. A. Suñer, O. Casher and H. S. Rzepa, Electronic Conference on Trends in Organic Chemistry, (Eds H. S. Rzepa, J. M. Goodman and C. Leach), June 1995.