The Application of Chemical Multipurpose Internet Mail Extensions
(Chemical MIME) Internet Standards to Electronic Mail and World-Wide
Web information exchange
Henry S. Rzepa (a), Peter Murray-Rust (b) and Benjamin J. Whitaker (c)
(a) Department of Chemistry, Imperial College, London, SW7
2AY; E-mail: rzepa@ic.ac.uk
(b) Virtual School of Molecular Science, Nottingham
(c) School of Chemistry, Leeds.
Summary: The global adoption of a proposed Internet standard based on a chemical
primary Multipurpose Internet Mail Extensions (chemical MIME) media type is
reviewed. Examples of the configuration of this standard for use with
Internet-based electronic mail and World-Wide Web clients are shown.
The long-term objectives are set out: the integration and
inter-operability of chemical information across the boundaries of
electronic journals, conferences, virtual courses, databases,
modelling and information handling tools, and other newly emerging
Internet-based tools for scientific communication.
Introduction
The development of Internet-based document and information delivery systems
during the last four years has been rapid [1]. This review will focus on
one aspect, the chemical application of an Internet standard known as MIME
(Multipurpose Internet Mail Extensions) to the World-Wide Web and to electronic
mail. Attention has so far focused predominantly on the creation and delivery
of chemically oriented World-Wide Web-based documents, which has in turn
introduced concepts such as structured and interlinked document collections
built with Hypertext Markup Language (HTML). Electronic mail remains
arguably more widely used than the World-Wide Web, but despite this, the basic
mechanism has altered little over the last four years, perhaps because e-mail
continues to be regarded as a temporary and informal communication medium, not
well suited for the precisely defined exchange of structured information in a
subject area such as chemistry. Unlike the World-Wide Web, e-mail continues to
be predominantly used to exchange loosely structured messages based on ASCII text,
rarely if ever containing explicit markup (chemical or otherwise) or easily
machine-parsable semantics. It is also a mechanism in which the recipient cannot easily
choose the time and place to receive the information, in contrast to the Web where
the user has control over when a document is "pulled".
The purpose of this article is to review the chemical application
of MIME standards on the World-Wide Web, to introduce their use in electronic
mail, to show how a transparent integration of e-mail and Web-based
exchanges of chemical information might be achieved, and to present
our manifesto for how we believe future development should proceed.
Multipurpose Internet Mail Extensions (MIME)
In 1992, Borenstein and Freed proposed a simple protocol [4]
for electronic mail termed MIME, which was subsequently adopted as a
standard by the Internet Engineering Task Force (IETF). This standard
involves two components. The first defines how binary computer files
must be encoded to achieve so-called 7-bit transparency for
compatibility with most text-based Internet mail routers, and is not discussed
further here. The second component defines a standard mechanism
whereby computer files can be associated with an e-mail message via
appropriate headers and delimiters, and allows the appropriate
processing of such enclosures by mail handling programs in the
possession of the e-mail recipient. Borenstein and Freed envisaged
that whilst the main component of an e-mail message could remain
informal and unstructured, the MIME mechanism would allow structured
and well defined attached data files to be handled separately. These
data files were to be known as media types, and in the original
proposal, a number of such primary media types were defined, each
sufficiently generic that default handling schemes could, at least in
principle, be applied to their content. Thus different processing and
display mechanisms are clearly required for the
primary media types TEXT, IMAGE, AUDIO and VIDEO. The APPLICATION
media type has less well defined boundaries, and tends to be used for
the resolution of proprietary data types defined by the developers of
software applications. Most recently, the MODEL media type has been
added to allow the processing of numerical and symbolic data for
3D models.
The MIME protocol also defines a secondary media type header which
allows the definition of more specific information on the expected
content of a message attachment. For example, IMAGE/JPEG defines a
raster type image file in the specific standard format defined by the
Joint Photographic Experts Group. The two level mechanism also allows
a separate name space to be defined for each primary media type.
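The two-level naming scheme can be illustrated with a short Python sketch. It is only an illustration, not part of the MIME specification, and the function name parse_media_type is our own. It splits a media type into its primary and secondary parts, discarding any trailing parameters and normalising to lower case, since media type names are matched without regard to case.

```python
def parse_media_type(value: str) -> tuple[str, str]:
    """Split a media type such as 'IMAGE/JPEG' into (primary, secondary)."""
    # Drop any parameters that follow the type, e.g. '; charset="us-ascii"'.
    essence = value.split(";", 1)[0].strip()
    primary, slash, secondary = essence.partition("/")
    if not (slash and primary and secondary):
        raise ValueError(f"not a two-level media type: {value!r}")
    # Media type names are case-insensitive, so normalise to lower case.
    return primary.lower(), secondary.lower()


if __name__ == "__main__":
    for header in ("IMAGE/JPEG", 'text/plain; charset="us-ascii"'):
        print(header, "->", parse_media_type(header))
```

Because the secondary name is only meaningful together with its primary type, the pair returned here acts as a key into a separate name space for each primary type.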
In early 1994, we considered [5] how the MIME mechanism
could be used to allow the exchange of standard (ratified or de
facto) chemical data types using either e-mail mechanisms or the
then emerging medium of the World-Wide Web. Whilst many of the
so-called chemical legacy formats are not always fully documented and
specified in the literature, and some such as the Brookhaven protein
databank format have spawned a number of variants and mutations over
the years, we nevertheless felt that the concept of
"chemical" as a new primary MIME media type would have a
number of distinct advantages.
Firstly, it was apparent that none of the original or subsequently
proposed primary media types would allow any sensible component of
default handling of implicit chemical information contained in a
data file. Secondly, the MIME mechanism operates by assigning three or
four letter filename extensions to the data files, and hence each
primary type must operate within a closely regulated name space
convention. By assigning a primary type CHEMICAL, this name space could
be delegated to the community that defines the media type, rather than
the less manageable Internet community as a whole. Finally, the
adoption of CHEMICAL as a primary media type was seen as the first
step in achieving a closer integration between the exchange of
chemical information via document server systems such as the
World-Wide Web and the exchange of the same data types using electronic
mail mechanisms.
Chemical MIME Types
In the four years or more that have elapsed since the original proposal for chemical MIME
types, their use via the World-Wide Web has become common, but their
application with electronic mail much less so. Listed in Table 1 are the chemical
MIME types which, as far as we are aware, have actually been used to a greater or
lesser extent during this period. Suggestions for appropriate programs capable of
processing and/or displaying the molecular content are included in the table. A description of the full definitive list will be published
elsewhere [6]. Proposals for additional chemical MIME types should be
addressed to the present authors in the first instance.
These MIME types can be further
sub-divided into three categories.
- Chemical MIME types which have been configured for Web (HTTP) document servers operating on
an Internet-wide scale, i.e. associated with publicly published documents. Such configuration
is normally accomplished via a privileged account, and the use of standard types is essential
so that different servers allow documents of the same type to be accessed by remote users
in an identical manner. The precise manner in which any individual server is
configured may differ, but a typical entry in a "mime.types" configuration file might
appear as follows:
chemical/x-mdl-molfile mol
This simply serves as an instruction to the server that any document with the filename
extension .mol is issued upon request with a document header specifying
the MIME type as chemical/x-mdl-molfile.
- It is common for Intranet systems, i.e. those associated with documents which are only
accessible in a controlled private environment, to define additional non-standard MIME
types for local use. The responsibility for coordinating the use of such private types lies
entirely within the organisation, in contrast to public types, whose use
articles such as this serve to coordinate globally.
- The configuration of user software for MIME is accomplished quite
differently from that for servers. A number of the MIME types listed in Table 1
in fact derive from so-called "plug-ins" which can be used to enhance
the basic capability of World-Wide Web browser and e-mail software, and which
remove much of the burden of installing the MIME mechanism from the user.
An alternative is for the user to specify explicitly that a designated "helper" program be
used to resolve the chemical document. In some cases, such as the Netscape
Communicator program, the same software package can be used for handling both Web documents and
e-mail messages, and the user's configuration for both is handled
via a single plug-in installation process. For other programs, such as
stand-alone e-mail clients, the user will have to carry out the configuration process
explicitly.
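The server-side mapping described in the first item of the list above, from filename extension to MIME type, can be sketched in Python using the standard mimetypes module, which reads the same "type extension" line format. This is a minimal sketch and not the configuration of any particular server; the helper names load_chemical_types and content_type_for, and the fallback type application/octet-stream, are our own assumptions.

```python
import io
import mimetypes

# Two entries in the "mime.types" format: a media type followed by its extension(s).
MIME_TYPES_TEXT = """\
chemical/x-mdl-molfile mol
chemical/x-pdb pdb
"""


def load_chemical_types(text: str) -> mimetypes.MimeTypes:
    """Build a look-up table from text written in the mime.types format."""
    table = mimetypes.MimeTypes()
    table.readfp(io.StringIO(text))
    return table


def content_type_for(table: mimetypes.MimeTypes, filename: str,
                     default: str = "application/octet-stream") -> str:
    """Return the MIME type a server would announce for this filename."""
    mime_type, _encoding = table.guess_type(filename)
    return mime_type or default


if __name__ == "__main__":
    table = load_chemical_types(MIME_TYPES_TEXT)
    for name in ("aspirin.mol", "atp.pdb"):
        print(name, "->", content_type_for(table, name))
```

A private Intranet type, as in the second item, would simply be one more line in the text passed to load_chemical_types.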
chemical/x-daylight-smiles smi
Table 1. Chemical MIME Media Types in use during 1994-1998.
(a) MIME type supported via a Browser plug-in.
Application of chemical MIME using Client Software
An overview of how MIME can be applied to the transport of specific
chemical data types using the two principal Internet mechanisms of
e-mail and the Web is illustrated in Scheme 1.
Scheme 1: Internet-based document and data flow, illustrating how MIME
headers can be used to structure information exchange.
The data-flow diagram shows that three, and
perhaps four, distinct data storage areas are used on any individual
user's computer file system. These include the general user file area,
an area specified by the user for receipt of e-mail attachments, a
temporary area associated with the Web-client cache if specified by
the user and finally a Web document collection area if the user has
specified a personal web-server or has access to a central web server.
Chemical MIME at least in part provides one mechanism for achieving
self-consistency in the handling of chemical files across these four file areas.
To illustrate this process more specifically, a distinction
is first drawn between user-owned data files initially residing on a local
filebase which are to be exported to a remote user, and
files acquired remotely and imported into a local filebase by the user.
Receipt of chemical files using Client Software
A Web client makes an HTTP request to a Web server configured to support
chemical MIME types, which results in
the response shown:
GET /atp.pdb HTTP/1.0
HTTP/1.0 200 Document follows
Date: Mon, 30 Mar 1998 13:54:40 GMT
Server: NCSA/1.5.2
Last-modified: Fri, 19 Aug 1994 15:46:58 GMT
Content-type: chemical/x-pdb
Content-length: 2916
The received MIME type is resolved via
a suitable internal look-up table available to the Web client which maps the MIME types to an
application program or plug-in capable of parsing, processing and/or
displaying the chemical data, in this case a simple PDB format file.
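The look-up step performed by the Web client can be sketched as follows. This is an illustrative sketch only: the handler names in HANDLERS and the fallback "save-to-disk" are hypothetical placeholders, not the names of real programs or of any browser's internal table. The header block is the one shown above, without the status line.

```python
from email.parser import Parser

# Hypothetical look-up table: MIME type -> name of a helper program or plug-in.
HANDLERS = {
    "chemical/x-pdb": "pdb-viewer",
    "chemical/x-mdl-molfile": "molfile-viewer",
}

RESPONSE_HEADERS = """\
Date: Mon, 30 Mar 1998 13:54:40 GMT
Server: NCSA/1.5.2
Last-modified: Fri, 19 Aug 1994 15:46:58 GMT
Content-type: chemical/x-pdb
Content-length: 2916
"""


def resolve_handler(header_block: str, default: str = "save-to-disk") -> str:
    """Map the Content-type header of a response to a handler name."""
    headers = Parser().parsestr(header_block, headersonly=True)
    # Header names are matched case-insensitively; no header means no mapping.
    if headers["Content-type"] is None:
        return default
    return HANDLERS.get(headers.get_content_type(), default)


if __name__ == "__main__":
    print(resolve_handler(RESPONSE_HEADERS))
```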
If an e-mail client is used to make a
request to an e-mail relay, a related set of headers is received:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="============_-1320854989==_============"
Date: Mon, 30 Mar 1998 15:18:23 +0100
To: recipient@somewhere
From: "Sender"
Subject: Illustration of chemical MIME headers
Status: O
--============_-1320854989==_============
Content-Type: text/plain; charset="us-ascii"
This message contains a chemical attachment
--============_-1320854989==_============
Content-Type: chemical/x-pdb; name="ferrocene.pdb"
Content-Disposition: attachment; filename="ferrocene.pdb"
Content-Transfer-Encoding: base64
Q09NUE5EICAgIGZlcnJvY2VuZS5...
The e-mail program can be used to extract the appropriate component of the
multipart message attachment (in this example separated by the unique string
1320854989), decoding it if necessary from the base64 scheme
adopted to ensure 7-bit transparency of the file, and to save the file to the
user's filebase in a segregated area identified for such attachments. If the user
wishes to view the contents of the attachment, a mapping between the MIME types
and a suitable application program can be achieved either via a specific look-up
table associated with the e-mail client, or by invoking a Web-client to perform
this task.
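The extraction and decoding step described above can be sketched with the Python standard email package. This is a sketch under stated assumptions rather than a description of how any particular e-mail client works: the PDB content in PDB_BYTES is a two-line placeholder, not the real ferrocene file, and the function name chemical_attachments is our own.

```python
import base64
from email import message_from_string, policy

# Placeholder file content, used only so that the example is self-contained.
PDB_BYTES = b"COMPND    ferrocene\nEND\n"
PAYLOAD = base64.b64encode(PDB_BYTES).decode("ascii")
BOUNDARY = "============_-1320854989==_============"

RAW_MESSAGE = (
    "Mime-Version: 1.0\n"
    f'Content-Type: multipart/mixed; boundary="{BOUNDARY}"\n'
    "Subject: Illustration of chemical MIME headers\n"
    "\n"
    f"--{BOUNDARY}\n"
    'Content-Type: text/plain; charset="us-ascii"\n'
    "\n"
    "This message contains a chemical attachment\n"
    f"--{BOUNDARY}\n"
    'Content-Type: chemical/x-pdb; name="ferrocene.pdb"\n'
    'Content-Disposition: attachment; filename="ferrocene.pdb"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    f"{PAYLOAD}\n"
    f"--{BOUNDARY}--\n"
)


def chemical_attachments(raw: str) -> list[tuple[str, str, bytes]]:
    """Return (filename, MIME type, decoded bytes) for each chemical/* part."""
    message = message_from_string(raw, policy=policy.default)
    found = []
    for part in message.walk():
        if part.get_content_maintype() == "chemical":
            # decode=True reverses the base64 transfer encoding.
            data = part.get_payload(decode=True)
            found.append((part.get_filename(), part.get_content_type(), data))
    return found


if __name__ == "__main__":
    for filename, mime_type, data in chemical_attachments(RAW_MESSAGE):
        print(filename, mime_type, len(data), "bytes")
```

The decoded bytes could then be written to the attachments area and passed to whichever program the look-up table names for chemical/x-pdb.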
Transmission of chemical files using Client Software
The standard mechanism is to mount
the data files in a Web document database and to map the filename
extensions to MIME headers, which are then sent as part of the
Web-server HTTP protocols to a remote Web or e-mail client program (see above).
The alternative is to configure an email client to perform this task. If the
email client is part of an integrated Web suite, then no specific
configuration need be employed (e.g. for Netscape Communicator). Stand-alone
email clients may need specification of the supported MIME types, further details
of which are given below.
Application of chemical MIME to Electronic Mail handling.
The application of MIME for handling message attachments is commonly
restricted to specifying very common document types such as word
processor documents. Whilst it is now quite common to receive email
attachments of this type, this has the distinct disadvantage that any chemical
information is surrounded by the word-processing wrappers, and it can be
very difficult indeed to identify this chemical content other than by
visually reading the document within the appropriate word processor
application. In effect, the meta-data used to describe the contents of the attachment
may only comprise the name of the document, together with non-standard and perhaps
informal text descriptors in the text of the message.
A much superior mechanism is to attach any specifically chemical data files
as separate attachments, and to identify these via the chemical MIME mechanism.
Because this method of attachment handling has not gained widespread recognition
within the chemical community, we include here some specific details of how
to set the mechanism up for three typical email environments. A significant problem
that still remains with the MIME mechanism is how to achieve reconciliation between
attached documents, and the informal meta-data descriptors that were contained
in the message bodies. We discuss this issue later in this article.
Example 1. Chemical MIME Handling using the Unix Pine E-Mail Client (V 3.9)
This mechanism in fact constitutes the original Unix-based method
developed by Borenstein and Freed [4] to test their MIME proposal. For
outgoing e-mail messages, the chemical MIME headers are added according to a look-up
table called .mime.types in the user's home directory. A typical
entry is as follows:
chemical/x-pdb pdb
For incoming e-mail messages, the association of a document MIME type with a
program suitable for its resolution is accomplished using a look-up table
called .mailcap in the user's home directory:
chemical/x-pdb; netscape %s
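How a .mailcap entry of this kind is turned into a command can be sketched in a few lines of Python. This is a deliberately minimal sketch, not Pine's implementation: it handles only the "type; command" form shown above with a bare %s placeholder for the attachment filename, and the function names are our own.

```python
import shlex


def parse_mailcap_line(line: str) -> tuple[str, str]:
    """Split 'type; command' into a lower-cased MIME type and a command template."""
    mime_type, _semicolon, command = line.partition(";")
    return mime_type.strip().lower(), command.strip()


def build_command(template: str, filename: str) -> list[str]:
    """Replace each bare %s in the template with the attachment filename."""
    # An argument list avoids any need for shell quoting of the filename.
    return [filename if word == "%s" else word for word in shlex.split(template)]


if __name__ == "__main__":
    mime_type, template = parse_mailcap_line("chemical/x-pdb; netscape %s")
    print(mime_type, build_command(template, "ferrocene.pdb"))
```

The resulting list could be handed to a process launcher; the sketch stops short of executing anything.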
Example 2. Chemical MIME Handling using the Eudora E-Mail Client.
Eudora is a popular stand-alone e-mail client available for both
Windows and MacOS operating systems (but not Unix). Versions 3 or 4 of
this program allow hyperlink-style resolution of an enclosed message
attachment by a program designated by the recipient. Unlike a Web client such as Netscape,
where the chemical MIME types are simply defined on all three major
platforms by adding an appropriate plug-in such as Chime, the
configuration of Eudora both for sending and receiving chemical
attachments is operating system dependent. On MacOS, a chemical MIME
plug-in [9] is placed in the same folder as the Eudora
application. To achieve the equivalent functionality on Windows 95/98/NT, the
file Eudora.ini present in the application folder must have an entry
of the following type added for each of the MIME types required:
both=pdb,pdb,TEXT,chemical,x-pdb
When receiving e-mail messages which include a chemical MIME
attachment, users will have to specify an appropriate program to
resolve the attachment. This has to be done only once for each MIME
type, for example by registering the filename
extension appropriate for each type of MIME attachment in the Windows
Registry, or by specifying the program within the e-mail client.
Example 3. Chemical MIME Handling using Netscape Communicator
illustrating Integration of Web and E-Mail Clients.
Netscape Communicator (at the time of writing at version 4.04)
represents, inter alia, an integrated Web client (Navigator) and an
e-mail client (Messenger). Configuration of chemical MIME types can be
accomplished in two generic ways. The simplest is via the Netscape plug-in
mechanism. Several plug-ins [7] offer
support for chemical MIME types (Table 1). Such plug-ins are installed by placing the executable file into the
appropriate plug-in directory to automatically configure both
the web and e-mail client components of Netscape with the supported MIME
types. This automatic mechanism can also be over-ridden by a user configuration option
which will allow additionally defined or redefined chemical MIME types
to be associated with other specific programs for processing any
individual data type.
In operation, the application of chemical MIME is almost entirely
transparent to the user. Any chemical data set defined by the MIME
types which is received by the Web client Navigator will be displayed
as either an in-lined model using an appropriate chemical plug-in or in an
external window using a user specified program. We note here that
all incoming data files can also be saved in the Netscape client local
disk cache, where in principle the chemical MIME labelling could be
used to create a persistently stored chemical database using suitable
software. A chemical attachment received by the e-mail client
Messenger can be passed to the browser window for resolution as
above, the MIME headers being processed internally between Messenger and
Navigator, as opposed to externally via the
file system and the filename extensions.
When Netscape Messenger is used to send a chemical
e-mail attachment to an e-mail relay, the user selects the
file, and Messenger inserts the corresponding MIME headers by
mapping the filename extension. This mapping is
done automatically using the extensions defined by e.g. the Chime plug-in,
or again via a user-specified configuration.
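The sending step, in which a filename extension is mapped to MIME headers, can be sketched with the Python standard library. This illustrates the general technique only and makes no claim about how Messenger is implemented; the table contents, the function names and the placeholder file content are our own assumptions.

```python
import mimetypes
from email.message import EmailMessage

# Extension table of the kind a plug-in or user configuration would supply.
EXTENSION_TABLE = mimetypes.MimeTypes()
EXTENSION_TABLE.add_type("chemical/x-pdb", ".pdb")
EXTENSION_TABLE.add_type("chemical/x-mdl-molfile", ".mol")


def attach_by_extension(message: EmailMessage, filename: str, data: bytes) -> str:
    """Attach data, choosing its MIME type from the filename extension."""
    mime_type, _encoding = EXTENSION_TABLE.guess_type(filename)
    if mime_type is None:
        mime_type = "application/octet-stream"  # assumed fallback
    maintype, subtype = mime_type.split("/", 1)
    # Binary attachments are base64 encoded by default for 7-bit transport.
    message.add_attachment(data, maintype=maintype, subtype=subtype,
                           filename=filename)
    return mime_type


def build_message(filename: str, data: bytes) -> EmailMessage:
    """Create a short text message carrying one attached file."""
    message = EmailMessage()
    message["Subject"] = "Illustration of chemical MIME headers"
    message.set_content("This message contains a chemical attachment")
    attach_by_extension(message, filename, data)
    return message


if __name__ == "__main__":
    # Placeholder content standing in for a real co-ordinate file.
    print(build_message("ferrocene.pdb", b"COMPND    ferrocene\nEND\n"))
```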
The Netscape implementation is the only one that works transparently
across Unix, Windows and MacOS client-based operating systems. One
test of operating system transparency is for the test originator to
attach a simple chemical co-ordinate file [8] to an e-mail
message and to send this to a remote recipient. The entire process is
then reversed by the recipient retrieving the received file from their
e-mail attachments folder (Scheme 1) and sending it back to the
original sender. The process will be regarded as successful if the
test file received back is identical with the originally sent file,
and can be suitably and automatically resolved by both parties via an
appropriate 3D coordinate display program or plug-in using either
e-mail or Web clients.
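The identity check at the heart of this round-trip test can be simulated locally as a sketch. Real transport through mail relays and different operating systems is not modelled here; the functions send and receive are hypothetical stand-ins that only serialise and parse a message with the Python email package, and the test content is a placeholder.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def send(data: bytes, filename: str) -> bytes:
    """Serialise a message carrying data as a chemical/x-pdb attachment."""
    message = EmailMessage()
    message.set_content("Round-trip test of a chemical attachment")
    message.add_attachment(data, maintype="chemical", subtype="x-pdb",
                           filename=filename)
    return message.as_bytes()


def receive(wire: bytes) -> bytes:
    """Parse a serialised message and return its first attachment, decoded."""
    message = message_from_bytes(wire, policy=policy.default)
    part = next(message.iter_attachments())
    return part.get_payload(decode=True)


def round_trip_ok(original: bytes, filename: str = "molecule.pdb") -> bool:
    """Send to a recipient, send back, and compare with the original bytes."""
    at_recipient = receive(send(original, filename))
    back_at_sender = receive(send(at_recipient, filename))
    return back_at_sender == original


if __name__ == "__main__":
    print(round_trip_ok(b"COMPND    test molecule\r\nEND\r\n"))
```

A byte-for-byte comparison is the strictest form of the test; it would also catch line-ending changes introduced by a client that treats the attachment as text.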
Conclusion.
During the period 1970-1994, chemical applications of the Internet were
largely based on a set of generic transmission protocols, such as terminal
(Telnet), file transfer (FTP), e-mail transfer (SMTP) and document handling
systems (HTTP). Few open standards were developed during this period which could
be used to explicitly label chemical content, and very little inter-operability
existed in the transmission mechanisms that the chemical community could take
advantage of. We believe that the future must lie in the convergence of the
newer Internet technologies with more traditional uses of the Internet such as
electronic mail and access to remote chemical substance databases such as
Chemical Abstracts and Beilstein, together with the deployment of new genres such as
electronic conferences [2], the increasing use of "chemically
activated" electronic versions of scholarly journals [3] in the area
of chemical sciences, and the greater availability of modelling and analysis
tools which make explicit use of the Internet [1]. Such convergence in
turn will enable new applications of the Internet based on so-called
"resource discovery" methods to develop to the point that one
could truly state that the whole of the chemical Internet would be greater than
the sum of its parts.
Acknowledgements
Funding from the UK JISC e-Lib programme for the CLIC project, and the
JISC JTAP programme for the VChemLab project is gratefully acknowledged.
Notes and References
1. H. S. Rzepa, P. Murray-Rust and B. J. Whitaker, Chem. Soc.
Revs, 1997, 1-10; H. S. Rzepa, "Internet-based
Computational Chemistry Tools", in Encyclopaedia of Computational
Chemistry, Wiley, 1998, in press.
2. For examples of the application of chemical MIME to electronic
conferencing, C. Leach and H. S. Rzepa (Eds), ECTOC-1, Royal
Society of Chemistry, 1996; ECHET96, 1997; ECTOC-3, 1998. The
conferences are on-line at http://www.ch.ic.ac.uk/ectoc/.
3. An example of the use of chemical MIME to integrate a variety of
chemical data types into the body of an electronic journal is the CLIC
Electronic Journal Project; D. James, B. J. Whitaker, C. Hildyard, H.
S. Rzepa, O. Casher, J. M. Goodman, D. Riddick and P. Murray-Rust,
New. Rev. Information Networking, 1996, 61. For the project itself, see http://chemcomm.clic.ac.uk/. For details of
how a "chemically enhanced" article was prepared, see
O. Casher and H. S. Rzepa, in Proc. E. Conf. Trends in
Organomet. Chem.: ECTOC-3 (Eds H. S. Rzepa and C.
Leach), Royal Society of Chemistry, 1998. ISBN (CD-ROM) 0-85404-889-8.
4. N. Borenstein and N. Freed, "MIME (Multipurpose Internet
Mail Extensions) Part One: Mechanisms for Specifying and Describing
the Format of Internet Message Bodies", Internet RFC 1521,
Bellcore, Innosoft, September 1993.
5. H. S. Rzepa, B. J. Whitaker and M. J. Winter, J.
Chem. Soc., Chem. Commun., 1994, 1907; H. S. Rzepa,
Comp. Networks and ISDN Systems, 1994, 27,
317-318; H. S. Rzepa, Chem. Design Auto.
News, 1994, 9, 1; O. Casher, G.
Chandramohan, M. Hargreaves, C. Leach, P. Murray-Rust, R. Sayle, H. S.
Rzepa and B. J. Whitaker, J. Chem.
Soc., Perkin Trans 2, 1995, 7; H. S. Rzepa, in "The
Internet: A Guide for Chemists", Ed. S. Bachrach, American
Chemical Society, 1995; M. J. Winter, H. S. Rzepa and B. J. Whitaker,
Chem. Brit., 1995, 685;
A. N. Davies, Spectroscopy Europe, 1996, 8, 42;
H. S. Rzepa, Science
Progress, 1996, 79, 97; B. J. Whitaker, H. S. Rzepa,
Proc. Int. Chem. Inf. Conf. (Ed. H. Collier), 1995,
62-71; H. S. Rzepa, O. Casher and B. J. Whitaker, Proc. Int.
Chem. Inf. Conf. (Ed. H. Collier), 1996, 141-148; H. S. Rzepa,
W. Locke, P. Murray-Rust and B. J. Whitaker in Perspect. Protein
Eng. '96, (Ed. M. J. Geisow), 1996, Paper No. 19; H. S. Rzepa,
P. Murray-Rust and B. J. Whitaker, Chem. Intl.,
1997, 19, 17.
6. H. S. Rzepa, P. Murray-Rust and B. J. Whitaker, Pure & App
Chemistry, to be submitted. The latest information is available
on-line at
http://www.ch.ic.ac.uk/chemime/
7. T. Maffett and B. van Vliet, MDL Information Systems. URL: http://www.mdli.com/chemscape/chime/
8. A simple test molecule is available at
http://www.ch.ic.ac.uk/rzepa/jcics/molecule.pdb and a site for testing
an extended set of chemical MIME types is available at
http://www-dsed.llnl.gov/documents/tests/chem.html
9. This plug-in is available at
http://www.ch.ic.ac.uk/rzepa/jcics/chemical10.hqx
10. P. Murray-Rust in Proc. E. Conf. Trends in
Organomet. Chem.: ECTOC-3 (Eds H. S. Rzepa and C.
Leach), Royal Society of Chemistry, 1998. ISBN (CD-ROM) 0-85404-889-8.
A fully working version of JUMBO is included on the CD-ROM.
11. A. P. Tonge and H. S. Rzepa, to be published.
12. http://www.w3.org/TR/WD-rdf-syntax/