Network Working Group                                        H. Rzepa
Internet Draft                  Imperial College, London, SW7 2AY, UK.
                                                       P. Murray-Rust
                      Glaxo Group Research, Greenford, Middlesex, UK.
                                                          B. Whitaker
                                                 School of Chemistry, 
                                    University of Leeds, LS2 9JT, UK.
Category: Standards Track                               February 1995

          A Chemical Primary Content Type for 
          Multipurpose Internet Mail Extensions.

Status of this Memo


          This document is  an  Internet-Draft.   Internet-Drafts  are
          working  documents  of  the  Internet Engineering Task Force
          (IETF), its areas, and its working groups.  Note that  other
          groups  may  also  distribute working documents as Internet-
          Drafts.

          Internet-Drafts are draft documents valid for a  maximum  of
          six  months  and  may  be updated, replaced, or obsoleted by
          other documents at any time.  It  is  inappropriate  to  use
          Internet-Drafts as reference material or to cite them other
          than as ``work in progress.''

          To learn the current status of  any  Internet-Draft,  please
          check  the  ``1id-abstracts.txt''  listing  contained in the
          Internet-Drafts Shadow Directories on  ds.internic.net  (US
          East  Coast),  nic.nordu.net  (Europe), ftp.isi.edu (US West
          Coast), or munnari.oz.au (Pacific Rim).
          
          The proceedings of the chemical MIME discussion group are
          archived at URL: http://www.ch.ic.ac.uk/hypermail/chemime/

Abstract

The purpose of this Internet Draft is to propose an update to Internet
RFC 1521 to include a new primary content-type to be known as
chemical. RFC 1521[1] describes mechanisms for specifying and
describing the format of Internet Message Bodies via
content-type/subtype pairs. We believe that chemical defines a
fundamental type of content with unique presentational and
processing aspects. We outline the typical expected uses of such a
content type and propose a number of chemical sub-types. This
document updates IETF Internet Draft
draft-rzepa-chemical-mime-type-00.txt in which this specific proposal
was made, incorporates suggestions received during the initial
discussion period and indicates scientific support for and uptake of
this proposal[2-7].

Rzepa, Murray-Rust and Whitaker                               [Page 1]

 Expires 09/20/95 draft-rzepa-chemical-mime-01.txt      March 1995


Table of Contents:

1.  Intended Audience for this Document.
2.  The Need for a Chemical Primary Content Type.
3.  A Proposed Core Set of Chemical Media Types.
4.  Consultation Mechanisms. 
5.  References and Citations.
6.  Security Considerations Section
7.  Authors' Addresses.
8.  Suggested Registration Mechanism for New Chemical/Subtype Values


1. Intended Audience for this Document.

This is directed at anyone who is concerned with implementing
Electronic Mail, Gopher, World-Wide Web and other information
services supporting MIME (RFC 1521) in a local environment where
chemical media information needs to be processed. We do not expect
the "average" scientist to concern themselves with the details of
defining specific Media types, but we would expect them to have
access to local knowledge, or to specific examples and
implementations of chemical Media types. This Internet Draft is a
discussion document for an agreed definition, intended eventually to
form a standard accepted extension to RFC 1521. We are also
targeting developers of chemically cognisant and MIME-compliant
software who may wish to include default types in their
configurations.

2. The Need for a Chemical Primary Content Type.

The following quote by MIT Media Lab Director Nicholas Negroponte
appeared in the Scientific American Special Issue 1995, p. 102:

"In the long run, model-based image transmission and encoding are
better than transmission of pictures alone.  Mathematical models of
a scene can describe the spatial relations of the objects in it and
maneuver them through space.  The idea of capturing a picture with
a camera is obsolete if one can instead capture a realistic model
from which the receiver can generate any picture.  For instance,
from a real-time model of a baseball game, a fan watching at home
could get the view from anywhere in the ballpark -- including the
perspective of the baseball."

In this paper, we present a case for recognising that chemicals form a
well-bounded, standardised and accepted implementation of this type
of mathematical model, but with their own unique features. To promote
the implementation, use  and development of such a model, we propose
that a new primary media type of chemical be introduced.

2.1  Chemical Data Types are Well Defined and Widely Used Already.

In excess of thirteen million chemicals are currently known to
science, and many others as yet unknown have been speculated upon. A
number of highly reliable, cross-referenced and indexed collections
of molecular information have long been available in printed form,
and since the 1970s these have been globally accessible in digital
form on-line from organisations such as STN (Scientific and Technical
Networks) or via locally implemented databases. During this period a
number of well defined and documented formats for encoding chemical
information have become accepted and widely used by molecular
scientists. In recent years, the development of the Internet as the
prime delivery mechanism of chemical content has accelerated. In
addition to e-mail, mechanisms such as Gopher and the World-Wide Web
system are being widely adopted by the chemistry and biology
communities. Most recently, chemical electronic journals and extended
learning courses in these subjects based on these delivery mechanisms
have begun to appear.

2.2 The Role of Primary and Secondary MIME Content Types.

Central to all these developments is the MIME concept as defined in
Internet RFC 1521 [1]. This defines standards for the inclusion of
message bodies in e-mail messages and other information systems such
as the World-Wide Web. A two level mechanism exists comprising top
level and sub-types. The MIME top-level types exist to allow mail
gateways and other agents such as World-Wide Web clients to do
filtering and/or conversion properly, and to allow user agents to
have a default behavior for certain classes of objects. Neither the
currently accepted primary content types nor the existing sub-types
make any explicit provision for actions on message bodies
containing chemical information. It is our intention in this Internet
Draft to suggest a mechanism for handling such information in a
manner that is consistent with, and extensible within, the existing
MIME guidelines.
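
The two-level mechanism described above can be sketched in a few lines
of Python. This is an illustration only and is not taken from RFC 1521:
the handler table and the handler names in it are hypothetical, and a
real user agent would hold this information in its own configuration.
The sketch shows an exact type/sub-type match being tried first, with
the top-level type supplying the default behaviour for a class of
objects.

```python
# Hypothetical handler table; the handler names are illustrative only.
HANDLERS = {
    "chemical/pdb": "render-3d-structure",
    "chemical/*": "generic-molecule-viewer",
    "image/*": "image-viewer",
}


def choose_handler(content_type, handlers=HANDLERS):
    """Return a handler name for content_type, or None if there is none.

    An exact type/sub-type entry wins; otherwise the entry for the
    top-level type (written here as "type/*") is used as the default.
    """
    # Drop any parameters such as "; charset=us-ascii" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    top_level, _, subtype = media_type.partition("/")
    if not top_level or not subtype:
        raise ValueError(f"not a type/subtype pair: {content_type!r}")
    if media_type in handlers:
        return handlers[media_type]
    return handlers.get(top_level + "/*")
```

With such a table, an agent that has never seen a particular chemical
sub-type can still fall back to the default for the "chemical" class
instead of treating the body as opaque data.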

Central to our argument is that the media content of chemical
information is quite different from "image", "audio" or "video" media
types, and that the manner in which the information is likely to be
processed and presented is also fundamentally different. Existing
top-level types were chosen very carefully to reflect the
capabilities required for the media and presentation, and not the
specific subject matter. In this context, we argue that a "chemical"
type will have generic media and presentation requirements that are
not fulfilled by the image, audio or video content types currently
defined. For example, the default presentation behaviour for chemical
content is expected to result in the rendering of a molecular scene
or interpretation in either two or three dimensions, with some degree
of navigation through chemical objects being possible, and with
control over how individual objects (atoms for example) are
displayed.

A degree of semantic interpretation in the process is essential.
Chemical information can be intrinsically three dimensional, and
normally comprises a collection of well defined objects.  We envisage
most, if not all, chemical media types to carry very substantial
semantic content, e.g. structural information relating to the
individual objects and their relationships. For example, the
presentation of a carbon atom might require consideration of the
valency at that atom, or depth cueing to represent the
"stereochemistry" at that point. Further elaborations include model
rotation, geometry calculations, extraction of numerical information,
calculation of associated properties such as wavefunctions and their
subsequent presentation. For these reasons, we consider the
existing "image" type to be too restrictive for the presentational
needs of the molecular science community, and believe that its use
would actually inhibit further development. An image is
intrinsically taken to be a two-dimensional representation -
normally a 2D bitmap rather than an object collection. In this
context, we note that there is no uniquely definable way of
specifying how any chemical content must be presented as a
two-dimensional image.

In three dimensional terms, several excellent "helper" programs for
achieving a default process with chemical content are readily
available to the community and have become widely used in the last
few years. It can be said therefore that default actions on 3D
chemical information have achieved a certain level of de facto
status, which would be consolidated by this proposal.

More elaborate and truly innovative forms of processing of the content
are also envisaged, including association with scientific
instruments, presentation to automated synthesis robots and
harvesting via indexing agents searching for structural features or
themes in the chemical content. We note for example the rapidly
developing area of advanced nanotechnology, where synthesis of
arbitrary chemicals could indeed be performed by such machinery,
potentially by acquiring the blueprints from network sources.
Presentation of the media content in this context is quite different
in nature from a display on a computer screen, and serves to
emphasize the fundamentally different nature of chemical media types
from other primary content types.

RFC 1521 also defines a primary content type known as "application"
to be used for data which do not fit in any of the other categories,
particularly in the context of data to be processed by mail-based
uses of application programs. Specific applications cited were, e.g.,
the processing of PostScript or portable document format files. These
are specific examples of precisely defined presentation formats with
faithful reproduction of the original intent and no default semantic
parsing of higher level structures.  With "chemical" content types,
the presentation requirements are based much more on the existence of
well defined generic semantic content based on chemical attributes,
which have an easily specified default treatment, and which we argue
need a well defined separate treatment. Features unique to chemical
content are molecular markup styles, where individual objects within
the content can be highlighted in a manner similar to that defined in,
say, the Hypertext Markup Language (HTML), or where objects can be
viewed from a variety of perspectives (for example space-fill, ribbon
or wireframe presentation of chemical structures).

The boundary conditions of a chemical type can be specified, and an
initial minimal set is defined below. The over-riding concern is to
preserve this semantic interpretation of the presentation in any
default treatment of the message body. We believe that such actions
are not best implemented using the "application" type but should be
allocated to a separate "chemical" type to convey the generic and
semantic nature of the content, rather than necessarily the subject
content.

3. A Proposed Core Set of Chemical Media Types.

We are proposing in this draft to focus on a small number of existing
standards that cover the span of chemical information, which are in
wide use and which are clearly defined in the scientific literature.
In no sense are they necessarily superior to other file types, but
each has a specific use which a significant proportion of the
community regard as serving a purpose. By encoding these formats
within a chemical content definition, we hope to enhance the
presentation and perception of such information for the end user.

CXF and MIF are included as examples of modern, semantically rich and
structured datatypes. The CXF format is presented as essentially an
exchange format, although the long term goal is its adoption as a
general purpose molecular description format. The MIF format by way
of contrast is a user-extensible and human-readable format which is
also characterised by its compact nature.  We think that both formats
will have their areas of application, and should be included here. A
number of other definitions are so-called "legacy" formats which are
generally accepted as likely to continue in use for some years, if
only because a considerable amount of software has yet to be
developed to allow migration away from them. We believe such formats
will play
a major role in new initiatives in electronic journal publishing,
on-line conferences and other keynote areas. Because of the variety
of existing chemical definitions, we think it unlikely the chemistry
community will ever agree upon a single common standard and that a
mechanism such as MIME is likely to have a useful purpose in the
foreseeable future.

Table 1 lists the proposed chemical sub-type names, together with
suggested filename qualifiers. The first qualifier listed for each
sub-type is deliberately kept to three characters for DOS
compatibility.


Table 1.

Primary/sub-type           Suggested qualifier(s)  Reference

chemical/cxf                 cxf                   [8] 
chemical/mif                 mif                   [9] 
chemical/pdb                 pdb                   [10] 
chemical/cif                 cif                   [11] 
chemical/mdl-molfile         mol                   [12]
chemical/mdl-sdf             sdf                   [12] 
chemical/mdl-rdf             rdf                   [12] 
chemical/mdl-rxn             rxn                   [12] 
chemical/embl-dl-nucleotide  emb, embl             [13] 
chemical/genbank             gen                   [14]
chemical/ncbi-asn1           asn                   [14]
chemical/gcg8-sequence       gcg                   [15]
chemical/daylight-smiles     smi                   [16] 
chemical/rosdal              ros                   [17] 
chemical/macromodel-input    mmd, mmod             [18] 
chemical/mopac-input         mop                   [19] 
chemical/gaussian-input      gau                   [20]
chemical/jcamp-dx            jdx                   [21]
chemical/kinemage            kin                   [4]
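
As an illustration of how Table 1 might be loaded into a local
configuration, the following Python sketch registers each sub-type and
its qualifiers with the standard library "mimetypes" module. The names
CHEMICAL_TYPES and register_chemical_types are hypothetical and are
not part of this proposal. The sketch assumes that overriding any
existing local mapping for the same filename qualifier is acceptable.

```python
import mimetypes

# Table 1 transcribed as sub-type -> suggested filename qualifiers.
CHEMICAL_TYPES = {
    "chemical/cxf": ["cxf"],
    "chemical/mif": ["mif"],
    "chemical/pdb": ["pdb"],
    "chemical/cif": ["cif"],
    "chemical/mdl-molfile": ["mol"],
    "chemical/mdl-sdf": ["sdf"],
    "chemical/mdl-rdf": ["rdf"],
    "chemical/mdl-rxn": ["rxn"],
    "chemical/embl-dl-nucleotide": ["emb", "embl"],
    "chemical/genbank": ["gen"],
    "chemical/ncbi-asn1": ["asn"],
    "chemical/gcg8-sequence": ["gcg"],
    "chemical/daylight-smiles": ["smi"],
    "chemical/rosdal": ["ros"],
    "chemical/macromodel-input": ["mmd", "mmod"],
    "chemical/mopac-input": ["mop"],
    "chemical/gaussian-input": ["gau"],
    "chemical/jcamp-dx": ["jdx"],
    "chemical/kinemage": ["kin"],
}


def register_chemical_types(db=None):
    """Add the Table 1 entries to a MimeTypes database and return it."""
    if db is None:
        db = mimetypes.MimeTypes()
    for media_type, qualifiers in CHEMICAL_TYPES.items():
        for qualifier in qualifiers:
            # Replaces any mapping the database already holds for
            # this filename qualifier.
            db.add_type(media_type, "." + qualifier)
    return db
```

A caller could then ask the returned database for the type of a file,
for example db.guess_type("crambin.pdb"), where the file name is an
arbitrary example.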


4.  Consultation Mechanisms. 

Our original suggestion for a primary chemical Media content type was
made on the Computational Chemistry discussion list (CCL) and on
newsgroups associated with molecular sciences (chemistry and biology)
in February 1994. Subsequently, the first Internet-Draft archived as
draft-rzepa-chemical-mime-type-00.txt was available for discussion
during the period May-October, 1994. Its existence became widely
known and subsequently a number of working "proofs-of-concept" of
chemical Media types have become available[2-7]. A discussion list
exists, and can be joined by sending a message to
listserver@ic.ac.uk with the one line message

subscribe chemime your name.

Consolidated proceedings of this discussion group are available at
http://www.ch.ic.ac.uk/hypermail/chemime/ and copies of relevant
Internet Drafts and RFCs are available at
http://www.ch.ic.ac.uk/internet/

Our proposal has proved extremely effective already in focusing on how
a consensus on handling the presentation and processing of this type
of media content can be achieved. We also have been delighted with
the way in which the chemical community have accepted and used the
Internet-Draft mechanism for developing standards and achieving such
consensus on a wider scale, having hitherto not been active in
defining Internet standards.  Many major bodies (learned societies,
data providers, chemical information vendors, etc.) have participated
in discussion and none have seen the approach as unsuitable. We are
therefore extending the discussion to capitalise on this. That
section of the biological community which is related to molecular
information has participated in these discussions, and we are not
aware of any moves to define a separate type for biological content.
We nevertheless suggest that chemical content should not extend
further into biology than that which requires the particular
presentation style and default actions associated with the chemical
content. We would expect that further chemical sub-content types may
be proposed by sections of the community in the future, if
appropriate by the mechanism indicated in section 8 of this
document.

5. References and Citations.

[1] Borenstein N., and N. Freed, "MIME (Multipurpose Internet Mail
Extensions) Part One:  Mechanisms for Specifying and Describing the
Format of Internet Message Bodies", RFC 1521, Bellcore, Innosoft,
September 1993.

[2]  Fitzgerald P., "Molecules-R-Us Interface to the Brookhaven Data
Base", Computational Molecular Biology Section, National Institutes
of Health, USA; see http://www.nih.gov/htbin/pdb for further
details; Peitsch M.C, Wells T.N.C., Stampf D.R., Sussman S. J., "The
Swiss-3D Image Collection And PDB-Browser On The Worldwide Web",
Trends In Biochemical Sciences, 1995, 20, 82.

[3] "Proceedings of the First Electronic Computational Chemistry
Conference", Eds. Bachrach, S. M.,  Boyd D. B., Gray S. K, Hase W.,
Rzepa H.S, ARInternet: Landover, Nov. 7- Dec. 2, 1994, in press;
Bachrach S. M, J. Chem. Inf. Comp. Sci., 1995, in press.

[4] Richardson D.C., and Richardson J.S., Protein Science, 1992, 1, 3;
Richardson D.C., and Richardson J.S., Trends in Biochem. Sci.,
1994, 19, 135.

[5] Rzepa H. S., Whitaker B. J., and Winter M. J., "Chemical
Applications of the World-Wide-Web", J. Chem. Soc., Chem. Commun.,
1994, 1907;  Casher O., Chandramohan G., Hargreaves M., Murray-Rust
P., Sayle R., Rzepa H.S., and Whitaker B. J., "Hyperactive Molecules
and the World-Wide-Web Information System", J. Chem. Soc., Perkin
Trans 2, 1995, 7; Baggott J., "Biochemistry On The Web", Chemical &
Engineering News, 1995, 73, 36; Schwartz A.T, Bunce D.M, Silberman
R.G, Stanitski C.L, Stratton W.J, Zipp A.P, "Chemistry In Context -
Weaving The Web", Journal Of Chemical Education,  1994, 71, 1041.

[6] Rzepa H.S., "WWW94 Chemistry Workshop", Computer Networks and ISDN
Systems, 1994, 27, 317 and 328.

[7] Nelson S.D., "Email MIME test page", Lawrence Livermore National
Laboratory, 1994. See http://www-dsed.llnl.gov/documents/WWWtest.html
and http://www-dsed.llnl.gov//documents/tests/email.html

[8] Steckert T., and Mockus J., "Standard Chemical eXchange Format".
Chemical Abstracts Service, September, 1994.

[9] Allen F. H., Barnard A., Cook A., and Hall S. R., "Molecular
Information File", J. Chem. Inf. Comp. Sci., 1995, in press;
Hall S.R, Spadaccini N, "The Star File - Detailed Specifications",
Journal Of Chemical Information And Computer Sciences, 1994, 34, 505.

[10] Koetzle T.F, Abola E.E, Bernstein F.C, Callaway J.A, Christian
J.J, Deroski B.R, Esposito P.A, Forman A, Mccarthy J.E, Skora J.G,
"The Protein Data-Bank - Present Status And Future-Plans", Abstracts
Of Papers Of The American Chemical Society 1991, 202, 31-Cinf. See
also "The Protein Data Bank: A Computer-Based Archival File for
Macromolecular Structures", Journal of Molecular Biology, 1977, 112,
pp. 535-542. See http://www.pdb.bnl.gov/ for the most recent
information.

[11] Hall S. R., Allen F.H., and Brown I.D., "The Crystallographic
Information File (cif) - A new standard archive file for
crystallography", Acta. Cryst. 1991, A47, 655.

[12] Dalby A, Nourse J.G, Hounshell W.D, Gushurst A.K.I, Grier D.L,
Leland B.A, Laufer J., "Description of Several Chemical Structure
File Formats Used by Computer Programs Developed at Molecular Design
Limited", J. Chem. Inf. Comput. Sci., 1992, 32(3), 244.

[13] Rice C.M., Fuchs R., Higgins D.G., Stoehr P.J., Cameron G.N.,
"EMBL nucleotide Format EMBL Data Library", Nucleic Acids Research ,
1993, 21, 2967.

[14] Benson D.A, Boguski M, Lipman D.J, Ostell J, "Genbank", Nucleic
Acids Research 1994, 22, 3441; Benson D., Lipman D.J., Ostell J.,
"GENbank", Nucleic Acids Research , 1993, 21, 2963.

[15] Eberhardt, N.L., "The GCG format for sequence information",
Biotechniques, 1992, 13, 914-917.

[16] Weininger D., "The SMILES Format", J. Chem. Inf. Comput. Sci.,
1988, 28, 31.

[17] Barnard J., Jochum C., Welford S., "Rosdal - A Universal
Structure Substructure Representation For Pc-Host Communication",
Abstracts Of Papers Of The American Chemical Society, 1988, 196, 13.

[18] Mohamadi F.,  Richards N.J.G., Guida W.C.,  Liskamp R.,  Lipton
M., Caufield C., Chang G., Hendrickson T., Still W.C., "Macromodel -
an integrated software system for Modeling Organic and Bioorganic
Molecules using Molecular Mechanics", Journal of Computational
Chemistry, 1990, 11, 440.

[19] Stewart J.J.P., "MOPAC: A Semiempirical Molecular-Orbital
Program", Journal of Computer-aided Molecular Design, 1990, 4, 1.

[20] Frisch M.J., Trucks G.W., Head-Gordon M., Gill P.M.W., Wong M.W.,
Foresman J.B., Johnson B.G., Schlegel H.B., Robb M.A., Replogle E.S.,
Gomperts G., Andres J.L., Raghavachari K.,  Binkley J.S., Gonzalez
C., Martin R.L., Fox D.J., Defrees D.J.,  Baker J., Stewart J.J.P.,
and Pople J.A., "Gaussian 92", Gaussian, Inc., Pittsburgh PA, 1992.

[21] Davies A.N., Lampen P., "JCAMP-DX for NMR", Applied
Spectroscopy, 1993, 47, 1093-1099; Rutledge D.N., Mcintyre P., "A
proposed European Implementation of the JCAMP-DX Format",
Chemometrics and Intelligent Laboratory Systems, 1992, 16, 95-101;
Grasselli J.G., "JCAMP-DX, A standard format for exchange of
infrared-spectra in computer readable form", Pure and Applied
Chemistry, 1991, 63, 1781-1792.

6.  Security Considerations Section

The security implications of chemical MIME types do not differ from
those relevant to RFC 1521. We reproduce the section in RFC 1521 for
reference here: ``Security issues are discussed in Section 7.4.2 and
in Appendix F. Implementors should pay special attention to the
security implications of any mail content-types that can cause the
remote execution of any actions in the recipient's environment. In
such cases, the discussion of the application/postscript content-type
in Section 7.4.2 may serve as a model for considering other
content-types with remote execution capabilities.''

7. Authors' Addresses.

Dr H S Rzepa, Department of Chemistry, Imperial College, London, SW7
2AY Phone: +44 171 594 5774 Fax +44 71 594 5804. 
E-Mail: rzepa@ic.ac.uk

Dr P. Murray-Rust, Glaxo Group Research, Stevenage, Herts, UK
E-Mail: pmr1716@ggr.co.uk

Dr B. J. Whitaker, School of Chemistry, University of Leeds, UK.
E-Mail: benw@chemistry.leeds.ac.uk

8.  Suggested Registration Mechanism for New Chemical/Subtype Values

This is explained in RFC 1590, "Media Type Registration Procedure",
from which the following is quoted. This document should be consulted
for further detail. It is strongly recommended that prior to
submissions, the discussions of the chemical MIME group are
consulted.  Details of how to do this are given in section 4 of this
document.

'' Send a proposed Media Type (content-type/subtype) to the
"ietf-types@cs.utk.edu" mailing list.  This mailing list has been
established for the sole purpose of reviewing proposed Media Types.
Proposed content-types are not formally registered and must use the
"x-" notation for the subtype name.

The intent of the public posting is to solicit comments and feedback
on the choice of content-type/subtype name, the unambiguity of the
references with respect to versions and external profiling
information, the choice of which OIDs to use, and a review of the
security considerations section.  It should be noted that the
proposed Media Type does not need to make sense for every possible
application.  If the Media Type is intended for a limited or specific
use, this should be noted in the submission.


After two weeks, submit the proposed Media Type to the IANA for
registration.  The request and supporting documentation should be sent
to "iana@isi.edu".  Provided a reasonable review period has elapsed,
the IANA will register the Media Type, assign an OID under the IANA
branch, and make the Media Type registration available to the
community.  ''
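
The rule quoted above, that proposed types which are not yet formally
registered must use the "x-" notation for the sub-type name, can be
illustrated with a short Python sketch. It is an example only: the
function names experimental_name and attach_chemical are hypothetical,
and the sketch assumes the standard library "email" package is used to
build the message.

```python
from email.message import EmailMessage


def experimental_name(media_type):
    """Return media_type with an "x-" prefix on the sub-type name."""
    top_level, _, subtype = media_type.partition("/")
    if not top_level or not subtype:
        raise ValueError(f"not a type/subtype pair: {media_type!r}")
    # Leave the name alone if it already carries the prefix.
    if not subtype.lower().startswith("x-"):
        subtype = "x-" + subtype
    return f"{top_level}/{subtype}"


def attach_chemical(msg, data, media_type, filename):
    """Attach data (bytes) to msg under the unregistered "x-" name."""
    maintype, _, subtype = experimental_name(media_type).partition("/")
    msg.add_attachment(
        data, maintype=maintype, subtype=subtype, filename=filename
    )
    return msg
```

For a body proposed as chemical/pdb, the attachment would therefore be
labelled chemical/x-pdb until the registration has been completed.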
