Towards an Integrated Chemistry Information Environment: Discussion

3.1 A collection of On-line Chemical Information Sources

A virtual chemistry information environment must, to a significant extent, still depend on a variety of technologies, both old and new. Here we describe a model that was first used in an undergraduate environment in October 1994. The library consists of an integrated series of information sources, in which information is presented to the user in a variety of forms: simple text, text in the form of HTML with integrated hyperlinks, "forms" through which the user can provide feedback, 2D images, 3D molecular datasets displayed as embedded images, 3D "scenes" in VRML with embedded hyperlinks to further information, and documents containing Java "applets" in which chemical semantics can be coded. Clearly, this concept is still at a very early stage, and most of the information sources currently available implement very little of this technology. The following section is a snapshot of what is available in early 1996, and can be expected to evolve very rapidly as more people develop specific content.

Each section in this virtual library is associated with an icon, as listed in Table 1. A single mouse click on this icon initiates a connection to the data source or technique described. To obtain a fully functioning library, you must first configure the Helper applications of your Web client.
Table 1. The Virtual Chemistry Library
| Start Point | Information Source | Program Invoked (if any) | Information about the Supplier |
| Claris Works | Starting the Electronic Notebook | Claris Works | Claris Works |
|  | Search the World using Lycos |  |  |
| World-Wide Web Pointers | A guided tour of the World-Wide Web Information system |  | The World-Wide Web Organisation |
| BIDS | The BIDS Science Citation database | Telnet | Bath Information Delivery System |
| CAS | The CAS On-line system | Telnet or SciFinder | Chemical Abstracts |
| Silver Platter WebSPIRS | The "WebSpirs" bibliographic database system |  | Silver Platter |
| Fisher | Safety information/chemical availability |  | The Fisher Catalogue |
| Daylight | The World Drug Index and Savant |  | Daylight Information systems |
| CambridgeSoft | The ChemFinder system |  | CambridgeSoft |
| Beilstein Crossfire | The Crossfire system | Beilstein Commander | Beilstein |
| MDLI ISIS | The ISIS/Base reaction database | ISIS/Draw and Base | MDL Information Systems |
| CCDC | The Cambridge Crystal Structure database | X-Window Server | CCDC |
| The Virtual 3D library | The Virtual 3D Library | VRML and/or Chime plug-in | The vchemlib project |
| CLIC | Electronic Conferences and Journals |  | The CLIC Project |


3.2 Navigating the Library

The visitor to the virtual library enters by a "door" provided by a World-Wide Web client such as Netscape, Mosaic or Internet Explorer. They collect the "shopping trolley" by opening a "notebook" such as Claris Works, in which they collect bibliographic and graphical information as they proceed with their search.

The visitor starts with simple "one-dimensional" keyword searches to retrieve simple bibliographic information. Modern implementations of such keyword search systems use the HTML "forms" interface, which offers a self-contained user interface. The Lycos search system offers perhaps the most comprehensive general catalogue of the Internet, and allows two or more keywords to be combined using boolean-like logic. Lycos indexing is performed by an automatic search robot, which is programmed to look specifically at the content of HTML documents. In particular, only content enclosed by <html> and </html> tags is indexed, which excludes detailed indexing of any document with purely chemical content, such as a 3D coordinate file. A more recent global search index, Alta Vista, introduces some novel features deriving from the structure of the Web itself. For example, one can search for documents which "cite" a particular URL (Uniform Resource Locator), in much the same way that the Science Citation Index can be used to follow a "thread" of chemical information.
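
The indexing behaviour described above can be illustrated with a toy sketch. It is not how Lycos or Alta Vista are actually implemented; the documents and example.org URLs are hypothetical. It indexes only text enclosed in <html>...</html> (so a bare coordinate file is skipped), answers a boolean AND query, and keeps a reverse-link index that supports a "which documents cite this URL" search.

```python
import re
from collections import defaultdict

# Hypothetical documents fetched by a robot: URL -> raw text.
DOCUMENTS = {
    "http://example.org/penicillin.html":
        "<html><body>Penicillin to cephalosporin conversion. "
        'See <a href="http://example.org/beta-lactam.html">beta-lactams</a>.'
        "</body></html>",
    "http://example.org/beta-lactam.html":
        "<html><body>Beta-lactam antibiotics include penicillin.</body></html>",
    # A bare 3D coordinate file has no <html> wrapper, so it is never indexed.
    "http://example.org/penicillin.pdb":
        "ATOM      1  N   PEN     1       0.000   0.000   0.000",
}

HTML_CONTENT = re.compile(r"<html[^>]*>(.*)</html>", re.IGNORECASE | re.DOTALL)
LINK = re.compile(r'href="([^"]+)"', re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")


def build_indexes(documents):
    """Return (keyword -> URLs, target URL -> citing URLs) for HTML documents only."""
    words = defaultdict(set)
    cited_by = defaultdict(set)
    for url, raw in documents.items():
        match = HTML_CONTENT.search(raw)
        if match is None:
            continue  # mimic a robot that looks only at HTML content
        content = match.group(1)
        for target in LINK.findall(content):
            cited_by[target].add(url)
        text = TAG.sub(" ", content).lower()
        for word in re.findall(r"[a-z0-9-]+", text):
            words[word].add(url)
    return words, cited_by


def search_all(words, *keywords):
    """Boolean AND: URLs that contain every keyword."""
    sets = [words.get(k.lower(), set()) for k in keywords]
    return set.intersection(*sets) if sets else set()


words, cited_by = build_indexes(DOCUMENTS)
print(sorted(search_all(words, "penicillin", "cephalosporin")))
print(sorted(cited_by["http://example.org/beta-lactam.html"]))
```

Searching for "atom" returns nothing, because the coordinate file was never indexed; this is the limitation the text describes for chemical content.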

Other, more chemically specific bibliographic searches are offered to the visitor, including the Science Citation Index itself via a UK-wide national licensing scheme referred to as "BIDS", demonstration files on the CAS On-line system, more specific databases such as the Fisher catalogue, and "samplers" of the WebSpirs system from Silver Platter. The results of such searches are normally presented as text-based documents, and there is little the user can do with such information other than copy and paste the text into a word processor.

To achieve keyword retrieval in a chemical context, one currently has to enter a turn-key system designed for this task. Examples include the Daylight system, which allows retrieval of structural information using SMILES strings as the search term, and the CambridgeSoft ChemFinder system, which allows a variety of chemical and keyword search terms to be specified. Both these services offer small "sampler" databases via a "forms" interface for the user to experiment with. Here, the results of a search are not only presented in graphical form, generated in real time by background programs and scripts, but can also be acquired as 2D or 3D coordinate and connectivity data using the MIME mechanism referred to previously. This allows the user to open a separate molecular window using local programs and, if necessary, to save the information on their local disk. Another MIME implementation involves acquiring reaction or 3D query definitions from a remote site, and using these to search a locally implemented database (or one operating a local client/remote server system). We have implemented this with the MDLI ISIS/Base system, to search for synthetic transformations of penicillins using a search definition saved in the "TGF" format; with the Beilstein Crossfire system, to search for molecular properties; and with the Cambridge Crystal Structure Database.
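
The server side of the MIME mechanism can be sketched as a CGI program whose only special feature is its Content-Type header: because the type is chemical/x-pdb rather than text/html, a suitably configured client hands the data to a molecular viewer instead of displaying it. This is a minimal sketch under assumptions; the coordinate lines are an illustrative fragment rather than a validated PDB file, and the Daylight and ChemFinder services were not necessarily written this way.

```python
import sys

# Illustrative coordinate data standing in for a search result (hypothetical).
SAMPLE_PDB = (
    "HETATM    1  C1  MOL     1       0.000   0.000   0.000\n"
    "HETATM    2  O1  MOL     1       1.230   0.000   0.000\n"
    "END\n"
)


def cgi_response(body, mime_type="chemical/x-pdb"):
    """Format a CGI response: header line(s), a blank line, then the body.

    The client never inspects the body to decide what to do with it;
    the Content-Type header alone selects the helper program.
    """
    return f"Content-Type: {mime_type}\n\n{body}"


if __name__ == "__main__":
    sys.stdout.write(cgi_response(SAMPLE_PDB))
```

Swapping the mime_type argument (for example to chemical/x-mdl-tgf for a reaction query) is all that changes when a different local program should be launched.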

These types of search need various local tools and programs to help with the searching, and these more specialised programs are invoked from hyperlinks via MIME definitions. In addition, a "telnet" terminal emulator is required for bibliographic searches which do not support the "forms" interface.
Table 2. MIME Definitions for Activating Chemical Search Programs
| MIME Type | Program Activated |
| application/x-claris | Claris Works |
| chemical/x-pdb | RasMol or Chime |
| chemical/x-mdl-tgf | ISIS Draw/Base |

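
The client-side behaviour that the MIME definitions rely on can be sketched as a lookup from media type to helper. This is a minimal sketch, not how Netscape or Mosaic were implemented; the program names simply mirror Table 2 and stand in for whatever command launches each helper on a given machine. Media types are case-insensitive and may carry parameters, so both are normalised before the lookup.

```python
# Helper table mirroring Table 2 (program names are placeholders).
HELPERS = {
    "application/x-claris": "Claris Works",
    "chemical/x-pdb": "RasMol",
    "chemical/x-mdl-tgf": "ISIS/Draw",
}


def choose_helper(content_type):
    """Return the helper for a Content-Type header value, or None if unknown."""
    # Drop parameters such as "; charset=us-ascii" and ignore case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return HELPERS.get(media_type)


print(choose_helper("Chemical/X-PDB; charset=us-ascii"))
```

Returning None for an unknown type leaves the decision (display, save to disk, or prompt the user) to the caller, which is roughly what a browser does when no helper is configured.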
One way of avoiding the need for a plethora of additional "helper" programs is to provide the service via "cgi-bin" (Common Gateway Interface) programs held on a central server. This route has been strongly advocated by Weininger, and is illustrated by the Daylight CGI interface. Such a service was also a component of the ECTOC conference, where it was used to create a hyperglossary of molecular coordinates and other information.

A hyperglossary enables auxiliary information to be stored to complement conferences and electronic journals. It can hold chemical information, whether textual, 2D or even 3D. Once entered into the hyperglossary, the information is readily accessible to the rest of the community in a structured and indexed form. Because the system is electronic, its contents can be indexed for easy searching, and searches can range from simple text keywords to a complex substructure search engine, depending on the sophistication of the server holding the information. The hyperglossary within ECTOC was a collection of the molecules discussed during the conference. Another example of a hyperglossary in action is the one that supplemented the Internet course on The Principles of Protein Structure, which held a collection of terms relating to proteins. Both of these hyperglossaries allowed users to add their own contributions.
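
A hyperglossary of the kind described above can be sketched as a single CGI-style handler that either adds an entry or runs a keyword search, driven by the query string of a "forms" submission. This is a hypothetical illustration, not the ECTOC implementation: the action, term, text and q parameter names are assumptions, entries live in an in-memory dictionary rather than a server database, and only simple text keyword search is shown (substructure search would need a chemistry toolkit).

```python
from html import escape
from urllib.parse import parse_qs

# In-memory stand-in for the server-side glossary store (hypothetical).
GLOSSARY = {}


def handle(query_string):
    """?action=add&term=...&text=...  or  ?action=search&q=..."""
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    if params.get("action", "search") == "add":
        term = params.get("term", "").strip()
        if not term:
            return "Status: 400 Bad Request\nContent-Type: text/plain\n\nmissing term\n"
        # Contributions from any user are stored under a case-folded key.
        GLOSSARY[term.lower()] = (term, params.get("text", ""))
        return "Content-Type: text/plain\n\nadded\n"
    query = params.get("q", "").lower()
    hits = [
        (term, text)
        for key, (term, text) in sorted(GLOSSARY.items())
        if query and (query in key or query in text.lower())
    ]
    items = "".join(f"<li><b>{escape(t)}</b>: {escape(x)}</li>" for t, x in hits)
    return f"Content-Type: text/html\n\n<html><body><ul>{items}</ul></body></html>\n"


handle("action=add&term=Penicillin+G&text=A+beta-lactam+antibiotic")
print(handle("action=search&q=lactam"))
```

Escaping entry text before embedding it in HTML matters here because, as the text notes, any user can contribute entries.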

Future developments in this area include presenting the molecular data as a hyperlinked set of coordinate files encoded in the VRML format, with additional functionality provided by Java applets and scripts that illustrate specific topics in chemistry.
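
The idea of hyperlinked VRML coordinate files can be sketched by generating a scene in which each atom is a sphere wrapped in an Anchor node, so that clicking an atom follows a URL. This is a minimal sketch under assumptions: it emits VRML 2.0 (VRML97) syntax, whereas many 1996 browsers understood only VRML 1.0 and its WWWAnchor node; the glossary URLs are hypothetical; and the radii and colours are arbitrary display choices.

```python
# Water, with approximate coordinates in angstroms; each atom links to a hypothetical page.
ATOMS = [
    ("O", 0.000, 0.000, 0.000, "http://example.org/glossary/oxygen.html"),
    ("H", 0.757, 0.586, 0.000, "http://example.org/glossary/hydrogen.html"),
    ("H", -0.757, 0.586, 0.000, "http://example.org/glossary/hydrogen.html"),
]

# Display radius and RGB colour per element (illustrative values only).
STYLE = {"O": (0.6, "1 0 0"), "H": (0.3, "1 1 1")}


def atom_node(element, x, y, z, url):
    """One clickable atom: Anchor > Transform > Shape(Sphere)."""
    radius, colour = STYLE[element]
    return (
        f'Anchor {{ url "{url}" description "{element}" children [\n'
        f"  Transform {{ translation {x:.3f} {y:.3f} {z:.3f} children [\n"
        f"    Shape {{ appearance Appearance {{ material Material {{ diffuseColor {colour} }} }}\n"
        f"            geometry Sphere {{ radius {radius} }} }}\n"
        f"  ] }}\n"
        f"] }}\n"
    )


def vrml_scene(atoms):
    """Complete VRML 2.0 file: header line followed by one Anchor per atom."""
    return "#VRML V2.0 utf8\n" + "".join(atom_node(*atom) for atom in atoms)


print(vrml_scene(ATOMS))
```

Served with the model/vrml MIME type, such a file would open in a VRML browser or plug-in, and each atom becomes a hyperlink into further information.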


3.3 Application of these Techniques at Imperial College

Starting in October 1994, students in an advanced organic chemistry laboratory have had access to this integrated chemical information environment as part of the "techniques" component of their course. In the laboratory they will have converted a penicillin derivative to a cephalosporin, and purified it using chromatography. They are asked to perform a well-rounded literature search on this synthetic conversion and purification, and to find any information on these or related compounds that relates to safety, 3D structure, etc.

During the course, the students have to select 8 techniques out of a menu of 19 available. The "IT" technique has proved very popular, with the majority opting to do it. Typically, the students will spend about 1 week of laboratory time learning the various techniques, and many continue to use the techniques during subsequent research projects and indeed Ph.D. programs.

The comment made most often by students concerns the confusion caused by the different user interfaces used for defining chemical structures and keyword searches, and the difficulty of transporting information across the divides between different programs. In 1996 we anticipate that some of these concerns will slowly be addressed as the suppliers of chemical information move to integrate their own user interfaces with those enabled by the Web. At a time of rapidly evolving Web technology, applying standards is inevitably difficult, but there are small signs that this is becoming possible. However, the system may not reach full maturity for several years yet.


3.4 The Future

The virtual chemistry library as currently available has many tantalising hints of what might be possible, but clearly much still remains to be achieved. At present, the library as implemented in April 1996 still requires computers with prodigious amounts of memory to support the many different program windows that the user might need to open during a session. The custom programs listed in Table 2 are currently capable of little inter-communication beyond simple cut-and-paste operations on text, so the user is still faced with a variety of user interfaces, both for bibliographic keyword searches and for substructure drawing. Some standards do exist. For example, a search query in one system can produce results in the form of a SMILES string that can be used to initiate further searches in other systems. Both HTML and VRML define ways of encoding information which can be cross-referenced or hyperlinked to achieve a coherent theme. Java may provide a transparent interface for incorporating heuristic algorithms and other functionality which can operate on molecular information, spectra, etc., and deliver the results to the user in customised form.
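
Passing a SMILES string from one system to another over the Web has one practical pitfall: SMILES uses characters such as "=", "#", "(", "[" and "+" that have special meanings in a URL ("#" starts a fragment, "+" is read as a space in form data). A minimal sketch, assuming two hypothetical CGI search services that accept a smiles form parameter, shows the string being percent-encoded on the way out and recovered intact on the way in.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoints standing in for two different search services.
SERVICE_A = "http://chem-a.example.org/cgi-bin/search"
SERVICE_B = "http://chem-b.example.org/cgi-bin/query"


def query_url(base, smiles):
    """Build a GET URL carrying a SMILES string as a percent-encoded form field."""
    return base + "?" + urlencode({"smiles": smiles})


def smiles_from_url(url):
    """What the receiving CGI program does: decode the parameter again."""
    return parse_qs(urlparse(url).query)["smiles"][0]


aspirin = "CC(=O)Oc1ccccc1C(=O)O"   # result of a search in system A
url_b = query_url(SERVICE_B, aspirin)  # used to start a search in system B
print(url_b)
print(smiles_from_url(url_b) == aspirin)
```

Without the encoding, a nitrile such as C#N would be cut short at the "#", and the "+" in a charged atom such as [NH4+] would arrive as a space.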

We envisage that modularity in the software components will gradually be achieved. Perhaps the highest priority now must be to create a structure-editing interface as a "plug-in" (or an OpenDoc "part" for the Cyberdog WWW client), which would enable a much more seamless interface to be constructed. The recent announcement by Tripos of a Java applet called Sketch and Fetch appears to be the first such product to integrate seamlessly into a Web page, although the extent of its modularity is not yet known. Both Java and VRML are now integrated into this environment, so the grand synthesis is gradually coming together.

Given the richness of the tools available now or in the near future, the prospects for developing exciting new methods of presenting the molecular sciences look very good. Over the next year or two, many of these mechanisms will mature into products that bring a sea change in the way molecular "information technology" is both used and taught.