The Web in Chemistry: Hyperactive Molecules.

Henry S. Rzepa* and Christopher Leach

Department of Chemistry, Imperial College of Science, Technology and Medicine, London, SW7 2AY, U.K. URL: http://www.ch.ic.ac.uk/talks/www94-chicago-paper.html

A paper to be presented at the Second International Conference on the World-Wide-Web, Chicago, October 17-20, 1994.

Introduction

Chemistry is in many ways an ideal subject for the application of hypermedia concepts.[1] More than ten million molecules have been well documented in the scientific literature, providing a particularly fertile database for isolating three dimensional structural themes and relationships and associating these with the large diversity of measurable molecular properties. Traditionally, chemists have been forced to use the printed medium almost exclusively for their molecular communication, and have hence been forced to adopt remarkably creative but nevertheless contrived formalisms for representing their subject on paper.

The last 15 years or so have seen a sea change in the way the tangled mass of molecular information is searched and delivered to users, with the gradual introduction of computer networks and on-line databases enabling text-based bibliographic searches to be carried out on the "worktop". Graphical object based searches (in the jargon, "sub-structure searches") are a more recent innovation, a recognition that most chemists think in terms of icons rather than text. Molecules, after all, are collections of simple objects such as atoms whose relationships to each other (i.e. bonds) define the molecule, and collections of these molecules and their three dimensional relationships often define the macroscopic properties and activity of chemicals. Chemistry and molecular biology really are "virtual reality" subjects par excellence, and it turns out that much of the infrastructure available in the World-Wide-Web system is remarkably well suited to delivering molecular information in a more innovative and instantly productive form than the more traditional mechanisms. In this presentation, we will first outline some additional chemically oriented components that we have added to the infrastructure, illustrate these with a specific example, and then discuss the implications of this new medium for both chemical research and teaching.

Chemical MIME types.

The seven existing primary MIME (Multipurpose Internet Mail Extensions) types are well suited for the delivery of text, a variety of two dimensional images and multi-media video and sound. However, there is little recognition in these existing definitions of what we might describe as "virtual reality" MIME types, enabling the specification of three dimensional objects and their relationships. As it happens, such definitions have been around for some time in chemistry. For example, the protein data bank or "pdb" definition[2] was specifically created in the early 1970s to enable virtual display and navigation around large molecules such as proteins, carbohydrates and oligonucleotides (DNA). We have identified a number of precisely specified molecular definitions and proposed[3] that these be collectively identified via a new primary MIME type to be known as chemical. This would enable molecular data to be delivered chemically intact via the Web metaphor. We have also identified a number of freely distributable existing graphical packages, or "helpers", that will recognise and process this data and produce on-screen navigable displays of what we have termed "hyperactive molecules". These include programs such as Rasmol,[4] XMol,[5] EyeChem[6] and MAGE.[7] Many commercial programs are also suitable.
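The association between a chemical MIME type and a helper program can be pictured as two lookup tables. The following Python fragment is a minimal sketch only: the subtype name chemical/x-pdb, the file extensions and the helper command names are illustrative assumptions, and a real browser would normally take the type from the server's Content-Type header rather than from the file extension.

```python
import posixpath
from urllib.parse import urlparse

# File extension -> chemical MIME type.
# Assumption: the subtype names and extensions are illustrative only.
CHEMICAL_TYPES = {
    ".pdb": "chemical/x-pdb",
    ".csml": "chemical/x-csml",
}

# MIME type -> helper command.
# Assumption: these command names are hypothetical placeholders.
HELPERS = {
    "chemical/x-pdb": "rasmol",
    "chemical/x-csml": "csml-helper",
}


def resolve(url):
    """Return (mime_type, helper) for a URL, or (None, None) if unknown."""
    extension = posixpath.splitext(urlparse(url).path)[1].lower()
    mime_type = CHEMICAL_TYPES.get(extension)
    return mime_type, HELPERS.get(mime_type)


if __name__ == "__main__":
    print(resolve("http://example.org/molecules/diol.pdb"))
    print(resolve("http://example.org/index.html"))
```

Keeping the two tables separate mirrors the division of labour in the text: the MIME type says what the data is, and the local configuration decides which helper displays it.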

Chemical Structure Markup Language (CSML).[8]

The concept that text in an HTML document can have attributes defined such as size, color, font and style specifications, and orientation within a two dimensional space is of course very familiar. It struck us that atoms, bonds and other molecular properties can be assigned similar visual attributes. Just as the HTML specification is used to define the markup of text and simple 2D images in Web browsers, so molecular viewers can be thought of as formatters of chemical structural information. Existing programs such as Rasmol do not currently implement any transport protocol such as HTTP, and we have therefore used the following combination to achieve markup of chemical structures. A hyperlink is inserted into text or 2D images in an HTML document, with a URL corresponding to a file containing a set of CSML commands. These indicate how specified atoms or collections of atoms within a molecule will be rendered on the screen display.[9] This file is allocated a MIME type chemical/x-csml, and associated with a local helper script which reads the CSML instructions and passes them to, for example, a running Rasmol process. Local chemically cognisant computations based on the marked-up molecules can thus be enabled, involving perhaps the calculation of interatomic distances, molecular weights of specified residues, isotope patterns, or even more elaborate evaluations of molecular wavefunctions, energies and other derived properties.
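The role of the helper script can be sketched as follows. This is not the actual helper, and it assumes nothing about the real CSML vocabulary, which is defined elsewhere.[8] It treats each non-blank line as an opaque command and forwards it to a writable stream standing in for the input of a running viewer. The "#" comment convention and the RasMol-style example lines are assumptions made purely for illustration.

```python
import io


def read_commands(text):
    """Return the non-blank lines of a command file, skipping '#' comments.

    Assumption: the '#' comment convention is hypothetical.
    """
    commands = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line and not line.startswith("#"):
            commands.append(line)
    return commands


def forward(commands, sink):
    """Write each command, newline-terminated, to the viewer's input stream."""
    for command in commands:
        sink.write(command + "\n")
    sink.flush()
    return len(commands)


if __name__ == "__main__":
    # Illustrative RasMol-style lines, not real CSML syntax.
    example = "# highlight the whole molecule\nselect all\n\nspacefill\n"
    viewer_input = io.StringIO()  # stands in for a viewer's stdin pipe
    count = forward(read_commands(example), viewer_input)
    print(count, repr(viewer_input.getvalue()))
```

In practice the sink might be the stdin pipe of a viewer started with subprocess.Popen; a StringIO is used here so that the sketch runs without any viewer installed.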

Currently, the function of the Web browser and the molecular visualiser are separated, with CSML and its helper script serving as the communication channel between them. In the future, we expect that compound document architectures such as OpenDoc will enable molecular functionality and CSML markup to be integrated into the operation of the Web browser itself.

The Solvation of cis Cyclohexane-1,3-diol: An application of Hyperactive Molecules.

The following two paragraphs are couched in the two dimensional based description of a chemical problem, typical of how the science is presented in most existing paper bound journals or books.

Although small, cis cyclohexane-1,3-diol has several quite subtle molecular properties, which present an interesting communicational challenge. This molecule can exist in two rapidly interconverting molecular shapes or "conformations", known as di-equatorial and di-axial. The precise concentration of each form is significantly influenced by the solvent used to dissolve the molecule.[10]

The preference of the molecule for either of these conformations can be used as a specific indicator of the molecular energetics and the influence of the surrounding environment on those energetics. Another feature which needs to be discussed, which as it happens is of paramount importance in understanding the shapes and activities of many enzymes, is a phenomenon known as hydrogen bonding, defined in this case as an interaction between the small molecular subunit O-H...O. Only the di-axial conformation of cyclohexane-1,3-diol can indulge in any intramolecular hydrogen bonding, whilst the shape of the di-equatorial isomer precludes this.

Such a description of chemistry in action will of course be largely familiar to trained molecular scientists, who are used to interpreting this symbolic language in their version of three dimensional "virtual chemical" reality. Even so, most chemists would traditionally at this stage rush off to a set of plastic "molecular models" and start constructing cyclohexane-1,3-diol in order that they fully appreciate the above discussion. The following stages are used to convert this into a description couched in terms of Web based hyperactive molecules;

We are now ready to move on to the next stage of the research project, which is to study the energy of the di-axial conformation of cyclohexane-1,3-diol as a function of the geometrical variables describing the precise orientation of the two O-H groups in the molecule.[11] This is achieved using a quantum mechanically based program such as MOPAC, and the derived energy can be represented as an isometric projection of the two geometry dimensions.
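A two-variable energy scan of this kind can be organised as a grid keyed by the pair of angles. The sketch below shows only the bookkeeping. The function placeholder_energy is a hypothetical stand-in with no chemical meaning; in a real study each grid point would be a separate quantum mechanical calculation. The 30 degree step is likewise an arbitrary assumption.

```python
import itertools
import math


def placeholder_energy(phi1_deg, phi2_deg):
    """Stand-in for a quantum mechanical energy call (NOT real chemistry)."""
    p1 = math.radians(phi1_deg)
    p2 = math.radians(phi2_deg)
    return math.cos(p1) + math.cos(p2) + 0.5 * math.cos(p1 - p2)


def scan_surface(energy, step_deg=30):
    """Evaluate energy(phi1, phi2) on a regular grid covering 0-360 degrees."""
    angles = range(0, 360, step_deg)
    return {(a, b): energy(a, b) for a, b in itertools.product(angles, angles)}


surface = scan_surface(placeholder_energy)
# Grid point with the lowest energy of the placeholder function.
lowest = min(surface, key=surface.get)

if __name__ == "__main__":
    print(len(surface), "grid points; lowest placeholder energy at", lowest)
```

Storing the surface as a dictionary keyed by (angle, angle) keeps each energy tied to the geometry that produced it, which is what allows a point on the map to be linked back to a displayable structure.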

The next requirement was to isolate the effect of hydrogen bonding on the features of this energy map. We have achieved this in a novel way by applying the COSMO continuum solvation model to the MOPAC energy Hamiltonian to simulate the effect of a solvent, defined in terms of the dielectric constant ε of the solvent, varied over the range 1-80. The idea here was that those orientations of cyclohexane-1,3-diol in which an intramolecular O-H...O hydrogen bond is present would have a smaller solvation energy than those conformations where such a feature was absent. The effect was anticipated to grow as the solvent dielectric increased from 1 (a gas phase non-solvating environment) to 80 (corresponding to the highly solvating water environment). The issue then is how to present this complex set of information in a manner that is easily comprehended by a reader. The information comprises a calculated solvation energy as a function of two molecular geometrical variables and a solvent dielectric. An additional 3N-8 geometrical parameters (N = number of atoms present in the molecule) may need to be presented visually at interesting points in the energy map. There is a great deal of information here, and any attempt to present it on the printed page leads to complexity and loss of clarity for the reader.
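The expected growth of the solvent effect with dielectric can be illustrated with the dielectric scaling factor commonly quoted for COSMO, f(ε) = (ε - 1)/(ε + 1/2). That this exact form applies to the calculations reported here is an assumption; the sketch shows only how such a factor rises from zero in the gas phase towards one in a water-like medium.

```python
def cosmo_scaling(epsilon):
    """Dielectric scaling factor f(eps) = (eps - 1) / (eps + 0.5).

    Assumption: this is the commonly quoted COSMO form; the text itself
    does not state which scaling was used.
    """
    if epsilon < 1.0:
        raise ValueError("dielectric constant must be >= 1")
    return (epsilon - 1.0) / (epsilon + 0.5)


if __name__ == "__main__":
    # Sample values spanning the 1-80 range discussed in the text.
    for eps in (1.0, 2.0, 10.0, 40.0, 80.0):
        print(f"epsilon = {eps:5.1f}  f = {cosmo_scaling(eps):.3f}")
```

Because the factor saturates quickly, most of the change in solvation energy would be expected at low dielectric values, which is worth bearing in mind when choosing the values of ε to sample.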

How can the multimedia Web and hyperactive molecule metaphors help clarify and communicate the science? Our solution was as follows.

Discussion.

It is easy to identify many limitations in the paper based journals and textbooks in which chemistry is currently presented. Chemical structures have had to be presented in static two dimensional representations, or as highly symbolic text based descriptions. Much information, such as molecular coordinates, is regularly discarded by authors of such papers because there is no effective and cheap method available for its publication. There is no possibility for the reader to interact with the data; only the author's point of view can be visualised. The infrastructure described above has the potential for changing much of this, including the face of scientific journals and the way in which chemistry as a subject is taught and popularised. This in turn raises important issues. Few chemists are yet fully aware of these implications for how chemistry as a subject is taught and communicated. The next five years or so will be an interesting time indeed.

Acknowledgements: We thank Benjamin Whitaker (Leeds), Mark Winter (Sheffield), Peter Murray-Rust, Roger Sayle and Martin Hargreaves (Glaxo), and Glaxo Research and Development (Greenford) for a studentship and funding.

Footnotes

1 For a review of the applications to chemistry, see H. S. Rzepa, "Chemistry and the World-Wide-Web", in "Chemistry and the Internet", Ed. S. Bachrach, ACS Publications, 1995, to be published.

2 The Protein Data Bank, Chemistry Department, Brookhaven National Laboratory, Upton, NY.

3 H. S. Rzepa and P. Murray-Rust, Internet Draft: Chemical Mime Type, May-November 1994. See ftp://cnri.reston.va.us/internet-drafts/draft-rzepa-chemical-mime-type-00.txt

4 R. Sayle, Rasmol: A Molecular Visualisation System.

5 XMol, Minnesota SuperComputer Center, Minneapolis, MN, USA; see http://www.arc.umn.edu/GVL/Software/xmol/XMol.html

6 O. Casher, H. S. Rzepa and S. Green, J. Mol. Graphics, 1994, in press.

7 D. C. Richardson and J. S. Richardson, Protein Science, 1992, 1, 3; D. C. Richardson and J. S. Richardson, Trends in Biochem. Sci., 1994, 19, 135-8.

8 O. Casher, G. Chandramohan, M. Hargreaves, P. Murray-Rust, H. S. Rzepa and B. J. Whitaker, J. Chem. Soc., Perkin Trans. 2, submitted for publication.

9 This is distinct from the proposed chemistry SGML dtd (T. Tallant, personal communication, Oak Ridge National Laboratory), which is to be used for two dimensional markup of chemistry in a manner suitable for eventual printing.

10 R. J. Abraham, E. J. Chambers and W. A. Thomas, J. Chem. Soc., Perkin Trans. 2, 1993, 1061.

11 O. Casher, C. Leach and H. S. Rzepa, paper submitted to the First Electronic computational Chemistry Conference; http://www.ch.ic.ac.uk/eccc.html. For details of the conference, see S. Bachrach, http://hackberry.chem.niu.edu:70/0/webpage.html

12 For example Global Instructional Chemistry, Ed. H. S. Rzepa; http://www.ch.ic.ac.uk/GIC/