Towards an Integrated Chemistry Information Environment: Introduction

1. Introduction

Although experiments in on-line provision of chemical information and journals can be traced back to the early 1970s, such services were introduced into most chemistry departments only in the mid 1980s, often via only a single low bandwidth "point of presence". The use of such services in taught courses has been much more variable, largely because significant, and in a teaching environment unmanageable, costs were associated with these services. During this period, the standard user interface was mostly restricted to either the 24 line by 80 character telnet "VT100" terminal mode, or Tektronix 4014 vector graphics mode. There was little integration between various information services from the point of view of query formulation or chemical structure definition. Equally variable was the quality of documentation and on-line help, too often depending purely on the user having access to printed material.

Other significant limitations were the lack of integration of information sources into other laboratory-based exercises or molecular modelling themes and, on a wider scale, the failure to incorporate related projects being conducted in other teaching faculties around the world.

Recently, solutions to such problems have evolved in both a commercial and an educational context. The commercial model is an interesting one, in that it must of necessity evolve around robust and affordable charging models. For example, Current Science has recently launched the "BioMedNet" club, which offers an environment in which subscribers can browse electronic journals, perform keyword searches, and have access to other network resources in a self-consistent and "user-friendly" manner. This club makes use of standard software such as a World-Wide Web client and assumes Internet connectivity to provide access. A rather different, and very much more proprietary, model is the "SciFinder" interface to the Chemical Abstracts database, representing the latest stage in the 15 year evolution of on-line services provided by this organisation. SciFinder in its current state of development is very much a closed turn-key client-single server system which does not appear to offer a viable model for the implementation of any local teaching resources. Moreover, the current cost of subscribing to such a service would represent a very significant increase in most library or teaching budgets, at a time when these budgets are under severe pressure to contract. The focus of SciFinder on the commercial sector means that this charging model fits with difficulty into any teaching environment.

A quite different open approach is rapidly evolving in many teaching institutions and is based on a client-multiple server model known as the World-Wide Web system, which you are using to view this document. The Web originated in 1989 at the European Laboratory for Particle Physics (CERN) with the first definition of HTML (Hypertext Markup Language) and a transport protocol called HTTP (Hypertext Transfer Protocol). The participation of the National Center for Supercomputing Applications (NCSA) in 1993 introduced a Web client called Mosaic, which allowed a combination of text and two dimensional images to be used to create a cohesive environment for describing various information services. The real technical innovation of the Web over earlier hypertext systems was the introduction of a global resource locator known as a URL (Uniform Resource Locator), which allows a section of text or a graphic to be seamlessly linked to other relevant documents or resources anywhere on the Internet.

Starting in September 1993, a significant chemical presence began to build up using these technologies [1]. The "critical mass" was probably achieved in 1995, when for the first time it became possible to devise an experiment in molecular information retrieval which could be completely integrated, not merely on a local but on a global scale, with other chemical resources [2]. In this article, we describe our own implementation of an experiment in molecular information retrieval in a teaching environment which takes into account the increasing molecular richness and diversity of the Internet.


[ Abstract |1: Introduction | 2: The Basic Chemical Web Technologies|MIME |CSML |VRML | Java | CML | The Virtual Chemistry Library | The Future | Acknowledgements and References | What's New ]

2. The Basic Chemical Web Technologies

The last two years have seen the introduction of a number of World-Wide Web clients designed to display documents written in Hypertext Markup Language (HTML). Programs such as NCSA Mosaic, Microsoft Internet Explorer, Netscape Navigator or Apple Cyberdog allow the two dimensional page metaphor used in traditional chemistry texts to be translated to on-line form. However, there is no reason that we should continue to be bounded by two dimensions. From the outset, we considered it necessary to develop an infrastructure for linking documents written in HTML with chemically specific datafiles, which could be processed in an explicitly molecular manner by the user. In this section, we discuss various methods that we have evolved over the last two years for more closely integrating chemistry into the traditional "document".

2.1 Chemical MIME

Our first solution to this problem was to adopt a mechanism derived from mail handling programs called MIME or Multipurpose Internet Mail Extensions [3]. This mechanism is integrated in a generic manner into most HTML browsers. Our particular implementation of this was termed chemical MIME [4]. It enables a browser to pass on any documents of an explicitly chemical nature to a program of the user's choice present on their computer. This means that HTML documents can contain hyperlinks to chemical data, which can then be displayed in a visual manner that a generic HTML browser cannot itself provide. A typical example is a hyperlink to a "pdb" file containing 3D molecular coordinates, which can be displayed using an external program such as RasMol or as an "in-lined" molecule using Chemscape Chime (Figure 1).
Figure 1. Adenosine Triphosphate displayed using Chemical MIME.
If you have installed Chemscape Chime as a "plug-in" to Netscape 2.0, this molecule should appear as a rotating image. If you are using other WWW clients, and have configured the chemical MIME type as chemical/x-pdb, clicking on the thumbnail image will activate the molecule.
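The dispatch step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the browser's actual mechanism: the `chemical/x-pdb` type is the one named in the text, while the `HELPERS` table, the `rasmol` command name and the `helper_for` function are assumptions introduced for the example.

```python
import mimetypes

# chemical/x-pdb is the MIME type named in the text; registering it here
# overrides any other mapping the local system may have for ".pdb".
mimetypes.add_type("chemical/x-pdb", ".pdb")

# Hypothetical helper table: MIME type -> program chosen by the user.
HELPERS = {"chemical/x-pdb": "rasmol"}  # assumed command name


def helper_for(filename):
    """Return (mime_type, helper_command); helper_command is None if no helper is configured."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type, HELPERS.get(mime_type)


if __name__ == "__main__":
    print(helper_for("atp.pdb"))
```

A browser configured in this way would hand a hyperlinked "pdb" file to the external program, and fall back to its default behaviour for any type with no helper entry.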


2.2 Chemical Structure Markup Language

Whilst such an approach is capable of adding a rich seam of chemical content to a document, there are specific limitations which soon become apparent. Because programs such as RasMol or derived plug-ins such as Chime cannot themselves resolve hyperlinks, the chemically specific document becomes something of a cul-de-sac into which further hyperlinks cannot easily be inserted. For example, one might wish to have a hyperlink in the master HTML document which, when invoked, highlights one specific atom or functional group in the molecule display. To accomplish this, it is necessary to establish subsequent communication from the original HTML document to the chemical display window. Our original solution to this specific problem was to develop what we termed "CSML" or chemical-structure-markup-language [3b], achieving communication between the HTML browser and RasMol using a feature built into the Unix version of RasMol, by which a script can communicate with a running RasMol process. By this means, we were able to associate peaks in a 2D NMR spectrum displayed in an HTML document with the individual protons responsible, highlighted in a molecule display window (Figure 2). The user could navigate around the spectrum using a device known as an "image-map", identifying individual proton pairs as they went. Subsequently, the CSML mechanism has also been integrated into the Chemscape Chime plug-in, and applied to a "molecule-of-the-month" current awareness collection at Imperial College. Such annotation provides a powerful new teaching tool for use on the Web.
Figure 2. The Partial 2D NOESY Proton NMR Spectrum of the oligonucleotide CGCGTTTTCGCG illustrating the Application of CSML
This represents a "clickable map", locally resolvable if you use Netscape V2, or remotely if you use other browsers. Activate the molecule first, and then spectral cross peaks to "annotate" the RasMol view by highlighting selected protons.
c1-c1
c1-g2
g2-g2
g2-c3
c3-c3
c3-g4
g4-g4
g4-t5
t5-t5
t5-t6
t7-t7   
t7-t8   
t8-t8   
t8-c9   
c9-c9   
c9-g10  
g10-g10 
g10-c11 
c11-c11 
c11-g12 
Because "clickable maps" cannot link to "in-lined" molecule displays, the list above represents an alternative way of achieving the same result. If you have Chemscape Chime installed, clicking on these links will highlight individual pairs of protons in the Chemscape Chime display associated with the list of proton contacts.
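To make the annotation step concrete, the following Python sketch turns a contact label from the list above into residue numbers and then into a RasMol-style `select` command. It is not the CSML implementation: the function names are hypothetical, the sequence is taken from the Figure 2 caption, and the command selects whole residues rather than the individual protons, whose atom names are not given in the text.

```python
import re

SEQUENCE = "CGCGTTTTCGCG"  # oligonucleotide sequence from the Figure 2 caption

_LABEL = re.compile(r"^([acgt])(\d+)-([acgt])(\d+)$")


def parse_contact(label):
    """Split a label such as 'c1-g2' into residue numbers, checking each base against SEQUENCE."""
    match = _LABEL.match(label.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised contact label: {label!r}")
    residues = []
    for base, number in ((match.group(1), match.group(2)),
                         (match.group(3), match.group(4))):
        index = int(number)
        if not 1 <= index <= len(SEQUENCE):
            raise ValueError(f"residue {index} is outside the sequence")
        if SEQUENCE[index - 1].lower() != base:
            raise ValueError(f"residue {index} is not {base.upper()}")
        residues.append(index)
    return tuple(residues)


def rasmol_select(label):
    """Build a residue-level 'select' command (an assumed, simplified atom expression)."""
    numbers = sorted(set(parse_contact(label)))
    return "select " + ",".join(str(n) for n in numbers)


if __name__ == "__main__":
    print(rasmol_select("c1-g2"))
```

Checking each label against the sequence catches mistyped links before any command reaches the molecule display.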


2.3 Virtual Reality Modelling Language

The information display mechanisms described thus far represent essentially a one-directional communication between a hyperlinked document and a molecular visualiser. There is no capability for reversing the direction, from a "marked-up" molecule to HTML or other documents. Two recent developments offer solutions to this problem. During 1995, a three dimensional object description language called VRML or Virtual Reality Modelling Language was introduced. If HTML is thought of as a language used to choreograph the two dimensional ASCII character set, then VRML would correspond to a similar description of a set of three dimensional objects such as spheres, cylinders and other primitive graphical objects. A VRML browser can display these objects in 3D space, and the user can navigate around in this space. Unlike a custom display program such as RasMol, VRML browsers also fully support the hyperlink concept via URLs. Thus a molecule described using VRML can have hyperlinks associated with various atoms, or larger groups, and a bidirectional information flow between, say, an HTML and a VRML document can now be achieved, with each invoking the other. As with RasMol, the VRML scene can be rendered in either a separate window, or as an in-line image using a "plug-in".
Figure 3. Dimethyl sulfate encoded in VRML, containing embedded hyperlinks associated with individual atoms and bonds illustrating the hydrolysis of this species.
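As a rough illustration of how hyperlinks can be attached to individual atoms, the Python sketch below writes a VRML 1.0 scene in which each atom is a sphere wrapped in a `WWWAnchor` node. It is a minimal sketch under stated assumptions, not the file used for Figure 3: the `atoms_to_vrml` function, the coordinates, colours, radii and `example.org` URLs are all placeholders invented for the example and do not describe the real geometry of dimethyl sulfate.

```python
def atoms_to_vrml(atoms):
    """Return VRML 1.0 text with one hyperlinked sphere per atom."""
    lines = ["#VRML V1.0 ascii", "Separator {"]
    for atom in atoms:
        x, y, z = atom["xyz"]
        r, g, b = atom["rgb"]
        url = atom["url"]
        label = atom["label"]
        radius = atom["radius"]
        lines += [
            "  WWWAnchor {",
            f'    name "{url}"',
            f'    description "{label}"',
            # The inner Separator stops this translation affecting later atoms.
            "    Separator {",
            f"      Translation {{ translation {x} {y} {z} }}",
            f"      Material {{ diffuseColor {r} {g} {b} }}",
            f"      Sphere {{ radius {radius} }}",
            "    }",
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


# Placeholder atoms: illustrative values only, not real molecular coordinates.
EXAMPLE_ATOMS = [
    {"label": "S", "xyz": (0.0, 0.0, 0.0), "rgb": (1.0, 0.8, 0.2),
     "radius": 0.5, "url": "http://example.org/sulfur.html"},
    {"label": "O", "xyz": (1.5, 0.0, 0.0), "rgb": (1.0, 0.0, 0.0),
     "radius": 0.4, "url": "http://example.org/oxygen.html"},
]

if __name__ == "__main__":
    print(atoms_to_vrml(EXAMPLE_ATOMS))
```

Selecting a sphere in a VRML browser would then follow the URL given in the `name` field, which is the reverse direction of information flow discussed above.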


2.4 Java Applets

Currently, VRML in version 1.0 supports no chemical semantics, i.e. bonds and atoms are not explicitly identified as such, and hence the way they are displayed cannot be changed. Java is a programming language which allows the molecular display code (e.g. RasMol), the display data (e.g. a pdb file) and the hyperlink communication to be built into a single file, or "applet". The applet window is in-lined into the main body of the HTML document. Furthermore, two or more Java applets can establish mutual communication, such that a 2D NMR spectrum can be associated with a 3D rotatable model of the corresponding molecule, with appropriate atoms again highlighted. Thus Java allows small compact applets to be written by users for a specific task. In this, it does not necessarily supersede a specialised display program such as RasMol, and all three mechanisms outlined above have their particular roles to play in the creation of a rich chemical environment for the user.
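The kind of mutual communication described here can be sketched as a small publish/subscribe arrangement. The example is written in Python rather than Java and does not use any applet API; the class names, the peak table and the atom numbers are hypothetical and serve only to show the pattern of a spectrum component notifying a molecule component.

```python
class SelectionBus:
    """Minimal hub through which cooperating display components exchange selections."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def publish(self, atom_ids):
        for callback in self._listeners:
            callback(tuple(atom_ids))


class MoleculeView:
    """Stands in for a 3D display; records which atoms should be highlighted."""

    def __init__(self):
        self.highlighted = ()

    def on_selection(self, atom_ids):
        self.highlighted = atom_ids


class SpectrumView:
    """Stands in for a 2D spectrum; a click on a peak publishes its atoms."""

    def __init__(self, bus, peaks):
        self._bus = bus
        self._peaks = peaks  # hypothetical table: peak label -> atom numbers

    def click(self, peak_label):
        self._bus.publish(self._peaks[peak_label])


if __name__ == "__main__":
    bus = SelectionBus()
    molecule = MoleculeView()
    bus.subscribe(molecule.on_selection)
    spectrum = SpectrumView(bus, {"c1-g2": (1, 2)})
    spectrum.click("c1-g2")
    print(molecule.highlighted)
```

Neither component needs to know about the other; both depend only on the agreed message, here a tuple of atom numbers.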

Because Java is highly customisable, and also secure, several other issues come to the fore which the community will need to solve. The first is the recognition that two or more Java applets may need to intercommunicate; chemical standards will have to be created to allow this to happen easily and seamlessly. Secondly, some mechanism for indexing the action and content of a Java applet will need to be created. Such issues also apply to the VRML concepts outlined above. We envisage the major thrust of such work coming from the commercial software developers, but perhaps with an impartial standards body set up to attempt to control the evolution.


Figure 4. A Molecule Rendered using a Java Applet
If your WWW client is "Java-aware", the image you see should be rotatable. If your client does not support Java, a simple static image will be present.



2.5 Chemical Markup Language

The technologies covered thus far relate to the visualisation and interpretation of molecular coordinate data, with spectral data represented as simple bit-mapped images. However, the variety of disciplines and techniques that chemistry covers is enormous, so it's not surprising that information exchange between different types of molecular datafile is difficult.

It is generally accepted that the best way to tackle these problems is through the use of markup languages. You are reading a markup language (HTML) at the moment! Markup languages add meta-information to a document to tell the recipient more about it. In this spirit, we have started to develop what is termed Chemical Markup Language, or CML [5].

CML consists of three parts (in ascending hierarchy):

These are quite general, so that markup might appear as
<X.VAR TITLE="Heat of Evaporation" REL="glossary" HREF="/chem/theor?deltahevap" UNITS="kilocalorie/mole">34.12</X.VAR>
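To show how a recipient might use such markup, the Python sketch below reads the element above and separates the numerical value from its meta-information. It assumes the fragment can be treated as well-formed XML and parsed with the standard library; a real CML processor may handle the markup differently, and the `read_variable` function is a hypothetical name introduced for the example.

```python
import xml.etree.ElementTree as ET

# The markup fragment quoted in the text, unchanged.
MARKUP = (
    '<X.VAR TITLE="Heat of Evaporation" REL="glossary" '
    'HREF="/chem/theor?deltahevap" UNITS="kilocalorie/mole">34.12</X.VAR>'
)


def read_variable(markup):
    """Return the value and its descriptive attributes as a dictionary."""
    element = ET.fromstring(markup)
    return {
        "tag": element.tag,
        "title": element.get("TITLE"),
        "value": float(element.text),
        "units": element.get("UNITS"),
        "link": element.get("HREF"),
        "relation": element.get("REL"),
    }


if __name__ == "__main__":
    print(read_variable(MARKUP))
```

The number arrives together with its title, units and a glossary link, so a receiving program does not have to guess what 34.12 refers to.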

The most important result of this is that a very large body of current chemical information can be encoded with CML. CML documents can have a very flexible structure and have already been used to describe precisely:

In the future, we expect mechanisms such as this to achieve a closer integration of virtual chemistry libraries.