The Application of Chemical Multipurpose Internet Mail Extensions (Chemical MIME) Internet Standards to Electronic Mail and World-Wide Web information exchange

Henry S. Rzepa (a), Peter Murray-Rust (b) and Benjamin J. Whitaker (c)

(a) Department of Chemistry, Imperial College, London, SW7 2AY; E-mail: rzepa@ic.ac.uk
(b) Virtual School of Molecular Science, Nottingham
(c) School of Chemistry, Leeds.

Summary: The global adoption of a proposed Internet standard based on chemical primary Multipurpose Internet Mail Extensions (chemical MIME) media type is reviewed. Examples of the configuration of this standard for use with Internet based electronic mail and World-Wide Web clients are shown. The long term objectives of the integration and inter-operability of chemical information across the boundaries of electronic journals, conferences, virtual courses, databases, modelling and information handling tools and other newly emerging tools for scientific communication based on the Internet are set out.

Introduction

The development of Internet-based document and information delivery systems during the last four years has been rapid.1 This review will focus on one aspect, the chemical application of an Internet standard known as MIME (Multipurpose Internet Mail Extensions) to the World-Wide Web and to electronic mail. The focus of attention has predominantly been on the creation and delivery of chemically oriented World-Wide Web-based documents, which has in turn introduced concepts such as the use of structured and interlinked document collections via Hypertext Markup Language (HTML). Electronic mail remains arguably more widely used than the World-Wide Web, but despite this, the basic mechanism has altered little over the last four years, perhaps because e-mail continues to be regarded as a temporary and informal communication medium, not well suited for the precisely defined exchange of structured information in a subject area such as chemistry. Unlike the World-Wide Web, e-mail continues to be predominantly used to exchange loosely structured messages based on ASCII text and rarely if ever containing explicit markup (chemical or otherwise) or easily machine-parsable semantics. It is also a mechanism in which the recipient cannot easily choose the time and place to receive the information, in contrast to the Web where the user has control over when a document can be "pulled".

The purpose of this article is to review the chemical application of MIME standards on the Web, to introduce the use of MIME in electronic mail and the World-Wide Web, to show how a transparent integration of e-mail and Web based exchanges of chemical information might be achieved, and to present our manifesto for how we believe future development should proceed.

Multipurpose Internet Mail Extensions (MIME)

In 1992, Borenstein and Freed proposed a simple protocol4 for electronic mail termed MIME, which was subsequently adopted as a standard by the Internet Engineering Task Force (IETF). This standard involves two components. The first defines how binary computer files must be encoded to achieve so-called 7-bit transparency for compatibility with most text-based Internet mail routers, and is not discussed further here. The second component defines a standard mechanism whereby computer files can be associated with an e-mail message via appropriate headers and delimiters, and allows the appropriate processing of such enclosures by mail handling programs in the possession of the e-mail recipient. Borenstein and Freed envisaged that whilst the main component of an e-mail message could remain informal and unstructured, the MIME mechanism would allow structured and well defined attached data files to be handled separately. These data files were to be known as media types, and in the original proposal, a number of such primary media types were defined, each sufficiently generic that default handling schemes could, at least in principle, be applied to their content. Thus it is clearly apparent that different processing and display mechanisms are required for the primary media types TEXT, IMAGE, AUDIO and VIDEO. The APPLICATION media type has less well defined boundaries, and tends to be used for the resolution of proprietary data types defined by the developers of software applications. Most recently, the MODEL media type has been added to allow the processing of numerical and symbolic data for 3D models.

The MIME protocol also defines a secondary media type header which allows the definition of more specific information on the expected content of a message attachment. For example, IMAGE/JPEG defines a raster type image file in the specific standard format defined by the Joint Photographic Experts Group. The two level mechanism also allows a separate name space to be defined for each primary media type.
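The two-level naming can be illustrated with a minimal Python sketch. The function name split_media_type is our own and is not part of any MIME library; the sketch assumes only that the primary and secondary types are separated by a single "/" and are compared without regard to case.

```python
def split_media_type(media_type: str) -> tuple[str, str]:
    """Split 'primary/secondary' into lower-cased (primary, secondary)."""
    primary, sep, secondary = media_type.strip().partition("/")
    if not sep or not primary or not secondary:
        raise ValueError(f"not a two-level media type: {media_type!r}")
    # Media type names are matched case-insensitively, so normalise here.
    return primary.lower(), secondary.lower()


print(split_media_type("IMAGE/JPEG"))
print(split_media_type("chemical/x-pdb"))
```

Because the secondary name is only interpreted within its primary type, a name such as x-pdb need be unique only among the chemical types.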

In early 1994, we considered5 how the MIME mechanism could be used to allow the exchange of standard (ratified or de facto) chemical data types using either e-mail mechanisms or the then emerging medium of the World-Wide Web. Whilst many of the so-called chemical legacy formats are not always fully documented and specified in the literature, and some such as the Brookhaven protein databank format have spawned a number of variants and mutations over the years, we nevertheless felt that the concept of "chemical" as a new primary MIME media type would have a number of distinct advantages.

Firstly, it was apparent that none of the original or subsequently proposed primary media types would allow any sensible component of default handling of implicit chemical information contained in a data file. Secondly, the MIME mechanism operates by assigning three or four letter filename extensions to the data files, and hence each primary type must operate within a closely regulated name space convention. By assigning a primary type CHEMICAL, this name space could be delegated to the community that defines the media type, rather than the less manageable Internet community as a whole. Finally, the adoption of CHEMICAL as a primary media type was seen as the first step in achieving a closer integration between the exchange of chemical information via document server systems such as the World-Wide Web and the exchange of the same data types using electronic mail mechanisms.

Chemical MIME Types

In the four years or more that have elapsed since the original proposal for chemical MIME types, their use via the World-Wide Web has become common, but their application with electronic mail much less so. Listed in Table 1 are the chemical MIME types which as far as we are aware have actually been used to a greater or lesser extent during this period. Suggestions for appropriate programs capable of processing and/or displaying the molecular content are included in the table. A description of the full definitive list will be published elsewhere.6 Proposals for additional chemical MIME types should be addressed to the present authors in the first instance.

These MIME types can be further sub-divided into three categories.

  1. Chemical MIME types which have been configured for Web (HTTP) document servers operating on an Internet-wide scale, i.e. associated with publicly published documents. Such configuration is normally accomplished via a privileged account, and the use of standard types is essential so that different servers allow documents of the same type to be accessed by remote users in an identical manner. The precise manner in which any individual server is configured may differ, but a typical entry in a "mime.types" configuration file might appear as follows:

    chemical/x-mdl-molfile mol

    This simply serves as an instruction to the server that any document associated with a filename extension .mol is issued upon request with a document header containing a specification of the MIME type as chemical/x-mdl-molfile

  2. It is common for Intranet systems, i.e. those associated with documents which are only accessible in a controlled private environment, to define additional non-standard MIME types for local use. The responsibility for coordinating the use of such private types lies entirely within the organisation, and is to be contrasted with the use of public types, for which articles such as this serve to co-ordinate globally.
  3. The configuration of user software for MIME is accomplished quite differently from that for servers. A number of the MIME types listed in Table 1 in fact derive from so-called "plug-ins" which can be used to enhance the basic capability of World-Wide Web browser and email software, and which remove much of the burden of installation of the MIME mechanism from the user. An alternative is for the user to pro-actively specify that a designated "helper" program be used to resolve the chemical document. In some cases, such as the Netscape Communicator program, the same software package can be used for handling both Web documents and email messages, and the user's configuration for both is handled via a single plug-in installation process. For other programs, such as stand-alone email clients, the user will have to carry out the configuration process explicitly.
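The server-side mapping described in the first category can be sketched in Python as follows. This is an illustration rather than the code of any particular server: the "#" comment syntax, the fallback type application/octet-stream, the function names and the file name aspirin.mol are all assumptions made for the example.

```python
MIME_TYPES_TEXT = """\
# type                    extension(s)
chemical/x-mdl-molfile    mol
chemical/x-pdb            pdb
"""


def load_mime_types(text: str) -> dict[str, str]:
    """Map each filename extension to its MIME type."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            table[ext.lower()] = mime_type
    return table


def content_type_header(filename: str, table: dict[str, str]) -> str:
    """Return the Content-type header a server might issue for filename."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    # Fall back to a generic type when the extension is not configured.
    return "Content-type: " + table.get(ext, "application/octet-stream")


table = load_mime_types(MIME_TYPES_TEXT)
print(content_type_header("aspirin.mol", table))
```

The point of using standard public types is that every server configured with the same table issues the same header for the same kind of document.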

Table 1. Chemical MIME Media Types in use during 1994-1998.

Type                          Filename extension  Description                                       Source of possible program or plug-in
chemical/x-cdx                cdx                 ChemDraw eXchange file                            http://www.camsoft.com/plugins/ (a)
chemical/x-cif                cif                 Crystallographic Interchange Format               http://www.crystalmaker.co.uk/
chemical/x-chem3d             cdx                 Chem3D file                                       http://www.camsoft.com/download_conet.html#3d
chemical/x-cmdf               cmdf                CrystalMaker Data format                          http://www.crystalmaker.co.uk/
chemical/x-cml                cml                 Chemical Markup Language                          http://www.venus.co.uk/omf/cml/
chemical/x-daylight-smiles    smi                 Daylight SMILES                                   http://www.daylight.com/
chemical/x-csml               csml, csm           Chemical Style Markup Language                    http://www.mdli.com/chemscape/chime/ (a)
chemical/x-galactic-spc       spc                 SPC format for spectral and chromatographic data  http://www.galactic.com/galactic/Data/spcvue.htm
chemical/x-gaussian-input     gau                 Gaussian Input format                             http://www.mdli.com/chemscape/chime/ (a)
chemical/x-gaussian-cube      cub                 Gaussian Cube (Wavefunction) format               http://www.mdli.com/chemscape/chime/ (a)
chemical/x-isostar            istr, ist           IsoStar Library of intermolecular interactions    http://www.ccdc.cam.ac.uk/
chemical/x-jcamp-dx           jdx, dx             JCAMP Spectroscopic Data Exchange Format          http://www.mdli.com/chemscape/chime/ (a)
chemical/x-kinemage           kin                 Kinetic (Protein Structure) Images                http://www.faseb.org/protein/kinemages/MageSoftware.html
chemical/x-mdl-molfile        mol                 MDL Molfile                                       http://www.mdli.com/chemscape/chime/ (a)
chemical/x-mdl-rxnfile        rxn                 MDL Reaction format                               http://www.mdli.com/chemscape/chime/ (a)
chemical/x-mdl-tgf            tgf                 MDL Transportable Graphics Format                 http://www.mdli.com/chemscape/chime/ (a)
chemical/x-macmolecule        mcm                 MacMolecule File Format                           http://www.molvent.com/
chemical/x-macromodel-input   mmd, mmod           MacroModel Molecular Mechanics                    http://www.columbia.edu/cu/chemistry/mmod/mmod.html (a)
chemical/x-mopac-input        mop                 MOPAC Input format                                http://www.mdli.com/chemscape/chime/ (a)
chemical/x-pdb                pdb                 Protein DataBank                                  http://www.mdli.com/chemscape/chime/ (a)
chemical/x-xyz                xyz                 Co-ordinate Animation format                      http://www.mdli.com/chemscape/chime/ (a)
chemical/x-vmd                vmd                 Visual Molecular Dynamics                         http://www.ks.uiuc.edu/Research/vmd/

(a) MIME type supported via a Browser plug-in.

Application of chemical MIME using Client Software

An overview of how MIME can be applied to the transport of specific chemical data types using the two principal Internet mechanisms of e-mail and the Web is illustrated in Scheme 1.

Scheme 1: Internet-based document and data flow, illustrating how MIME headers can be used to structure information exchange.

The data-flow diagram shows that three, and perhaps four, distinct data storage areas are used on any individual user's computer file system. These include the general user file area, an area specified by the user for receipt of e-mail attachments, a temporary area associated with the Web-client cache if specified by the user and finally a Web document collection area if the user has specified a personal web-server or has access to a central web server. Chemical MIME at least in part provides one mechanism for achieving self-consistency in the handling of chemical files across these four file areas.

To illustrate this process more specifically, a distinction is first drawn between user-owned data files initially residing on a local filebase which are to be exported to a remote user, and the process of files being acquired remotely and imported into a local filebase by the user.

Receipt of chemical files using Client Software

A Web client makes an HTTP request to a Web server configured to support chemical MIME types, which results in the response shown below the request line:

GET /atp.pdb HTTP/1.0

HTTP/1.0 200 Document follows
Date: Mon, 30 Mar 1998 13:54:40 GMT
Server: NCSA/1.5.2
Last-modified: Fri, 19 Aug 1994 15:46:58 GMT
Content-type: chemical/x-pdb
Content-length: 2916

The received MIME type is resolved via a suitable internal look-up table available to the Web client which maps the MIME types to an application program or plug-in capable of parsing, processing and/or displaying the chemical data, in this case a simple PDB format file.
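The client-side resolution step can be sketched in Python. This is a minimal illustration under stated assumptions: the HANDLERS table and the program names in it ("pdb-viewer", "molfile-viewer", "save-to-disk") are hypothetical placeholders, not the configuration of any real browser. HTTP response headers share the "Name: value" layout of mail headers, so the standard library mail parser is used here to read them.

```python
from email import message_from_string

# Response headers as received, without the status line.
RAW_HEADERS = """\
Date: Mon, 30 Mar 1998 13:54:40 GMT
Server: NCSA/1.5.2
Last-modified: Fri, 19 Aug 1994 15:46:58 GMT
Content-type: chemical/x-pdb
Content-length: 2916

"""

# Hypothetical look-up table: MIME type -> helper program or plug-in.
HANDLERS = {
    "chemical/x-pdb": "pdb-viewer",
    "chemical/x-mdl-molfile": "molfile-viewer",
}


def resolve_handler(raw_headers: str, default: str = "save-to-disk") -> str:
    """Pick the program registered for the Content-type of a response."""
    headers = message_from_string(raw_headers)
    mime_type = headers.get_content_type()  # lower-cased 'type/subtype'
    return HANDLERS.get(mime_type, default)


print(resolve_handler(RAW_HEADERS))
```

A type that is absent from the table falls through to the default action, which mirrors the way a Web client typically offers to save a document it cannot display.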

If an e-mail client is used to make a request to an e-mail relay, a related set of headers is received:

Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="============_-1320854989==_============"
Date: Mon, 30 Mar 1998 15:18:23 +0100
To: recipient@somewhere
From: "Sender" 
Subject: Illustration of  chemical MIME headers
Status: O

--============_-1320854989==_============
Content-Type: text/plain; charset="us-ascii"

This message contains a chemical attachment

--============_-1320854989==_============
Content-Type: chemical/x-pdb; name="ferrocene.pdb"
Content-Disposition: attachment; filename="ferrocene.pdb"
Content-Transfer-Encoding: base64

Q09NUE5EICAgIGZlcnJvY2VuZS5...

The e-mail program can be used to extract the appropriate component of the multipart message attachment (in this example separated by the unique string 1320854989), decoding it if necessary from the base64 scheme adopted to ensure 7-bit transparency of the file, and to save the file to the user's filebase in a segregated area identified for such attachments. If the user wishes to view the contents of the attachment, a mapping between the MIME types and a suitable application program can be achieved either via a specific look-up table associated with the e-mail client, or by invoking a Web-client to perform this task.
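The construction and extraction of such a multipart message can be sketched with the Python standard library email package. This is an illustration under stated assumptions, not the behaviour of any particular mail client: the attachment bytes are a short placeholder rather than a real PDB file, and the boundary string is generated by the library instead of being the one shown above.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Placeholder file contents; a real PDB file would be read from disk.
pdb_bytes = b"COMPND    ferrocene\nEND\n"

# Sender side: plain-text body plus one chemical/x-pdb attachment.
outgoing = EmailMessage()
outgoing["Subject"] = "Illustration of chemical MIME headers"
outgoing.set_content("This message contains a chemical attachment\n")
outgoing.add_attachment(
    pdb_bytes, maintype="chemical", subtype="x-pdb", filename="ferrocene.pdb"
)
wire_bytes = outgoing.as_bytes()  # the bytes attachment is base64-encoded

# Recipient side: parse the message, locate attachments, undo the encoding.
incoming = message_from_bytes(wire_bytes, policy=policy.default)
attachments = [
    (part.get_content_type(), part.get_filename(), part.get_payload(decode=True))
    for part in incoming.iter_attachments()
]

for mime_type, filename, payload in attachments:
    print(mime_type, filename, len(payload), "bytes")
```

The recovered MIME type, rather than the filename extension alone, is then available as the key for selecting a suitable program.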

Transmission of chemical files using Client Software

The standard mechanism is to mount the data files in a Web document database and to map the filename extensions to MIME headers, which are then sent as part of the Web-server HTTP protocols to a remote Web or e-mail client program (see above). The alternative is to configure an email client to perform this task. If the email client is part of an integrated Web suite, then no specific configuration need be employed (e.g. for Netscape Communicator). Stand-alone email clients may need specification of the supported MIME types, further details of which are given below.

Application of chemical MIME to Electronic Mail handling.

The application of MIME for handling message attachments is commonly restricted to specifying very common document types such as word processor documents. Whilst it is now quite common to receive email attachments of this type, this has the distinct disadvantage that any chemical information is surrounded by the word-processing wrappers, and it can be very difficult indeed to identify this chemical content other than by visually reading the document within the appropriate word processor application. In effect, the meta-data used to describe the contents of the attachment may only comprise the name of the document, together with non-standard and perhaps informal text descriptors in the text of the message.

A much superior mechanism is to attach any specifically chemical data files as separate attachments, and to identify these via the chemical MIME mechanism. Because this method of attachment handling has not gained widespread recognition within the chemical community, we include here some specific details of how to set the mechanism up for three typical email environments. A significant problem that still remains with the MIME mechanism is how to achieve reconciliation between attached documents, and the informal meta-data descriptors that were contained in the message bodies. We discuss this issue later in this article.

Example 1. Chemical MIME Handling using the Unix Pine E-Mail Client (V 3.9)

This mechanism in fact constitutes the original Unix-based method developed by Borenstein and Freed4 to test their MIME proposal. For outgoing e-mail messages, the chemical MIME headers are added according to a look-up table present in the user's home directory called .mime.types. A typical entry is as follows:

chemical/x-pdb 	 pdb

For incoming e-mail messages, the association of a document MIME type with a program suitable for its resolution is accomplished using a look-up table present in the user's home directory called .mailcap:

chemical/x-pdb; netscape %s
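The interplay of the two look-up tables can be sketched in Python. This is a minimal illustration and not the code used by Pine: the two file contents are held in strings, the mailcap parser handles only the type and command fields, the path saved/ferrocene.pdb is hypothetical, and the viewer command is constructed but never executed.

```python
import io
import mimetypes
import shlex

MIME_TYPES_FILE = "chemical/x-pdb \t pdb\n"     # as in ~/.mime.types
MAILCAP_FILE = "chemical/x-pdb; netscape %s\n"  # as in ~/.mailcap

# Outgoing mail: filename extension -> MIME type.
type_db = mimetypes.MimeTypes()
type_db.readfp(io.StringIO(MIME_TYPES_FILE))
outgoing_type, _encoding = type_db.guess_type("ferrocene.pdb")


def load_mailcap(text: str) -> dict[str, str]:
    """Map MIME type -> command template (first two ';' fields only)."""
    entries = {}
    for line in text.splitlines():
        fields = line.split(";")
        if line.lstrip().startswith("#") or len(fields) < 2:
            continue  # skip comments, blank lines and malformed entries
        entries[fields[0].strip().lower()] = fields[1].strip()
    return entries


def viewer_command(mime_type: str, path: str, mailcap: dict[str, str]) -> list[str]:
    """Substitute the saved file path for %s in the command template."""
    tokens = shlex.split(mailcap[mime_type])
    return [path if token == "%s" else token for token in tokens]


mailcap = load_mailcap(MAILCAP_FILE)
command = viewer_command(outgoing_type, "saved/ferrocene.pdb", mailcap)
print(outgoing_type, command)
```

The first table is consulted when a message is sent and the second when it is received, so the MIME type is the only information the two sides need to share.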

Example 2. Chemical MIME Handling using the Eudora E-Mail Client.

Eudora is a popular stand-alone e-mail client available for both Windows and MacOS operating systems (but not Unix). Versions 3 and 4 of this program allow hyperlink-style resolution of an enclosed message attachment by a program designated by the recipient. Unlike a Web client such as Netscape, where the chemical MIME types are simply defined on all three major platforms by adding an appropriate plug-in such as Chime, the configuration of Eudora both for sending and receiving chemical attachments is operating system dependent. On MacOS, a chemical MIME plug-in9 is placed in the same folder as the Eudora application. To achieve the equivalent functionality on Windows 95/98/NT, the file Eudora.ini present in the application folder must have an entry of the following type added for each of the MIME types required:

both=pdb,pdb,TEXT,chemical,x-pdb

When receiving e-mail messages which include a chemical MIME attachment, users will have to specify an appropriate program to resolve the attachment. This has to be done only once for each MIME type. It can be done, for example, by adding the filename extension appropriate for each type of MIME attachment via the Windows Registry, or by specifying this within the email program.

Example 3. Chemical MIME Handling using Netscape Communicator illustrating Integration of Web and E-Mail Clients.

Netscape Communicator (at the time of writing at version 4.04) represents, inter alia, an integrated Web client (Navigator) and an e-mail client (Messenger). Configuration of chemical MIME types can be accomplished in two generic ways. The simplest is via the Netscape plug-in mechanism. Several plug-ins7 offer support for chemical MIME types (Table 1). Such plug-ins are installed by placing the executable file into the appropriate plug-in directory to automatically configure both the web and e-mail client components of Netscape with the supported MIME types. This automatic mechanism can also be over-ridden by a user configuration option which will allow additionally defined or redefined chemical MIME types to be associated with other specific programs for processing any individual data type.

In operation, the application of chemical MIME is almost entirely transparent to the user. Any chemical data set defined by the MIME types which is received by the Web client Navigator will be displayed as either an in-lined model using an appropriate chemical plug-in or in an external window using a user specified program. We note here that all incoming data files can also be saved in the Netscape client local disk cache, where in principle the chemical MIME labelling could be used to create a persistently stored chemical database using suitable software. A chemical attachment received by the e-mail client Messenger can be passed to the browser window for resolution as above, the MIME headers being processed internally between Messenger and Navigator, as opposed to externally via the file system and the filename extensions.

When Netscape Messenger is used to send a chemical e-mail attachment to an e-mail relay, the user selects the appropriate filename, and Messenger will insert the appropriate MIME headers by mapping the filename extensions. This mapping would be done automatically using the extensions defined by e.g. the Chime plug-in, or again via a user specified configuration.

The Netscape implementation is the only one that works transparently across Unix, Windows and MacOS client-based operating systems. One test of operating system transparency is for the test originator to attach a simple chemical co-ordinate file8 to an e-mail message and to send this to a remote recipient. The entire process is then reversed by the recipient retrieving the received file from their e-mail attachments folder (Scheme 1) and sending it back to the original sender. The process will be regarded as successful if the test file received back is identical with the originally sent file, and can be suitably and automatically resolved by both parties via an appropriate 3D coordinate display program or plug-in using either e-mail or Web clients.
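The identity check at the end of this round trip can be sketched in Python. This is an illustration under stated assumptions: the transfer function merely stands in for one mail hop by applying and removing base64 encoding, the file contents are a placeholder, and the use of a SHA-256 digest for the comparison is our choice rather than part of the test as described.

```python
import base64
import hashlib


def digest(data: bytes) -> str:
    """Fingerprint of a file, used to compare the sent and returned copies."""
    return hashlib.sha256(data).hexdigest()


def transfer(data: bytes) -> bytes:
    """Stand-in for one mail hop: base64-encode for transport, then decode."""
    encoded = base64.encodebytes(data)  # 7-bit safe text, wrapped into lines
    return base64.decodebytes(encoded)


original = b"COMPND    test molecule\nEND\n"  # placeholder, not a real data set
returned = transfer(transfer(original))       # out to the recipient and back

print("identical" if digest(returned) == digest(original) else "altered")
```

In a real test the two digests would be computed from the file as sent and the file as received back, so that any alteration introduced by a client or relay, for example to line endings, is detected.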

Alternatives to chemical MIME.

to be written by Peter.

Conclusion.

During the period 1970-1994, chemical applications of the Internet were largely based on a set of generic transmission protocols, such as terminal (Telnet), file transfer (FTP), e-mail transfer (SMTP) and document handling systems (HTTP). Few open standards were developed during this period which could be used to explicitly label chemical content, and very little inter-operability existed in the transmission mechanisms that the chemical community could take advantage of. We believe that the future must lie in the convergence of the newer Internet technologies with more traditional uses of the Internet such as electronic mail and access to remote chemical substance databases such as Chemical Abstracts and Beilstein, together with the deployment of new genres such as electronic conferences,2 the increasing use of "chemically activated" electronic versions of scholarly journals3 in the area of chemical sciences, and the greater availability of modelling and analysis tools which make explicit use of the Internet.1 Such convergence in turn will enable new applications of the Internet based on so-called "resource discovery" methods to develop to the point that one could truly state that the whole of the chemical Internet would be greater than the sum of its parts.

Acknowledgements

Funding from the UK JISC e-Lib programme for the CLIC project, and the JISC JTAP programme for the VChemLab project is gratefully acknowledged.

Notes and References

  1. H. S. Rzepa, P. Murray-Rust and B. J. Whitaker, Chem. Soc. Revs, 1997, 1-10; H. S. Rzepa, "Internet-based Computational Chemistry Tools", in Encyclopaedia of Computational Chemistry, Wiley, 1998, in press.
  2. For examples of the application of chemical MIME to electronic conferencing, C. Leach and H. S. Rzepa (Eds), ECTOC-1, Royal Society of Chemistry, 1996; ECHET96, 1997; ECTOC-3, 1998. The conferences are on-line at http://www.ch.ic.ac.uk/ectoc/.
  3. An example of the use of chemical MIME to integrate a variety of chemical data types into the body of an electronic journal is the CLIC Electronic Journal Project; D. James, B. J. Whitaker, C. Hildyard, H. S. Rzepa, O. Casher, J. M. Goodman, D. Riddick and P. Murray-Rust, New. Rev. Information Networking, 1996, 61. For the project itself, see http://chemcomm.clic.ac.uk/. For details of how a "chemically enhanced" article was prepared, see O. Casher and H. S. Rzepa, in Proc. E. Conf. Trends in Organomet. Chem.: ECTOC-3 (Eds H. S. Rzepa and C. Leach), Royal Society of Chemistry, 1998. ISBN (CD-ROM) 0-85404-889-8.
  4. N. Borenstein and N. Freed, "MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies", Internet RFC 1521, Bellcore, Innosoft, September 1993.
  5. H. S. Rzepa, B. J. Whitaker and M. J. Winter, J. Chem. Soc., Chem. Commun., 1994, 1907; H. S. Rzepa, Comp. Networks and ISDN Systems, 1994, 27, 317-318; H. S. Rzepa, Chem. Design Auto. News, 1994, 9, 1; O. Casher, G. Chandramohan, M. Hargreaves, C. Leach, P. Murray-Rust, R. Sayle, H. S. Rzepa and B. J. Whitaker, J. Chem. Soc., Perkin Trans 2, 1995, 7; H. S. Rzepa, in "The Internet: A Guide for Chemists", Ed. S. Bachrach, American Chemical Society, 1995; M. J. Winter, H. S. Rzepa and B. J. Whitaker, Chem. Brit., 1995, 685; A. N. Davies, Spectroscopy Europe, 1996, 8, 42; H. S. Rzepa, Science Progress, 1996, 79, 97; B. J. Whitaker, H. S. Rzepa, Proc. Int. Chem. Inf. Conf. (Ed. H. Collier), 1995, 62-71; H. S. Rzepa, O. Casher and B J. Whitaker, Proc. Int. Chem. Inf. Conf. (Ed. H. Collier), 1996, 141-148; H. S. Rzepa, W. Locke, P. Murray-Rust and B. J. Whitaker in Perspect. Protein Eng. '96, (Ed. M. J. Geisow), 1996, Paper No. 19; H. S. Rzepa, P. Murray-Rust and B. J. Whitaker, Chem. Intl., 1997, 19, 17.
  6. H. S. Rzepa, P. Murray-Rust and B. J. Whitaker, Pure & App Chemistry, to be submitted. The latest information is available on-line at http://www.ch.ic.ac.uk/chemime/
  7. T. Maffett and B. van Vliet, MDL Information systems. URL: http://www.mdli.com/chemscape/chime/
  8. A simple test molecule is available at http://www.ch.ic.ac.uk/rzepa/jcics/molecule.pdb A site for testing an extended set of chemical MIME types is available at http://www-dsed.llnl.gov/documents/tests/chem.html
  9. This plug-in is available at http://www.ch.ic.ac.uk/rzepa/jcics/chemical10.hqx
  10. P. Murray-Rust in Proc. E. Conf. Trends in Organomet. Chem.: ECTOC-3 (Eds H. S. Rzepa and C. Leach), Royal Society of Chemistry, 1998. ISBN (CD-ROM) 0-85404-889-8. A fully working version of JUMBO is included on the CD-ROM.
  11. A. P. Tonge and H. S. Rzepa, to be published.
  12. http://www.w3.org/TR/WD-rdf-syntax/