Network Working Group                                        H. Rzepa
Internet Draft                  Imperial College, London, SW7 2AY, UK.
                                                       P. Murray-Rust
                      Glaxo Group research, Greenford, Middlesex, UK.
                                                          B. Whitaker
                                                 School of Chemistry, 
                                    University of Leeds, LS2 9JT, UK.		
Category: Standards Track                               February 1995

          A Chemical Primary Content Type for 
          Multipurpose Internet Mail Extensions.

Status of this Memo


          This document is  an  Internet-Draft.   Internet-Drafts  are
          working  documents  of  the  Internet Engineering Task Force
          (IETF), its areas, and its working groups.  Note that  other
          groups  may  also  distribute working documents as Internet-
          Drafts.

          Internet-Drafts are draft documents valid for a  maximum  of
          six  months  and  may  be updated, replaced, or obsoleted by
          other documents at any time.  It  is  inappropriate  to  use
          Internet-Drafts as reference material or to cite them other
          than as ``work in progress.''

          To learn the current status of  any  Internet-Draft,  please
          check  the  ``1id-abstracts.txt''  listing  contained in the
          Internet-Drafts Shadow Directories on  ds.internic.net  (US
          East  Coast),  nic.nordu.net  (Europe), ftp.isi.edu (US West
          Coast), or munnari.oz.au (Pacific Rim).
          
          Discussions of the chemical MIME group are archived at URL: 
          http://www.ch.ic.ac.uk/hypermail/chemime/

Abstract

The purpose of this Internet Draft is to propose an update to Internet
RFC 1521 to include a new primary content-type to be known as
chemical. RFC 1521 [1] describes mechanisms for specifying and
describing the format of Internet Message Bodies via
content-type/subtype pairs. We believe that chemical defines a
fundamentally distinct type of content with unique presentational and
processing aspects. We outline the typical expected uses of such a
content type and propose a number of chemical sub-types. This
document updates IETF Internet Draft
draft-rzepa-chemical-mime-type-00.txt in which this specific proposal
was made, incorporates suggestions received during the initial
discussion period and indicates scientific support and uptake for
this proposal.

Rzepa, Murray-Rust and Whitaker                               [Page 1]

 Expires 09/03/draft-rzepa-chemical-mime-01.txt      March 1995


Table of Contents:

1.  Intended Audience for this Document.
2.  The Need for a Chemical Primary Content Type.
3.  A Proposed Core Set of Chemical Media Types.
4.  Consultation Mechanisms. 
5.  References.
6.  Security Considerations Section
7.  Authors' Addresses.
8.  Suggested Mechanism for Registration of New Content-type/subtype Values


1. Intended Audience for this Document.

This is directed at anyone who is concerned with implementing
Electronic Mail, Gopher, World-Wide Web and other information
services supporting MIME (RFC 1521) in a local environment where
specifically chemical information is processed. We do not expect the
"average" chemist or molecular biologist to concern themselves with
the details of defining specific Media types, but we would expect
them to have access to local knowledge, or to specific examples and
implementations of chemical Media types. This Internet Draft is
intended to set out a discussion document for a standard definition
such that local implementors can comply with the proposed standards.
We are also targeting developers of chemically cognisant and
MIME-compliant software who may wish to include default types in their
configurations.

2. The Need for a Chemical Primary Content Type.

The following quote by MIT Media Lab Director Nicholas Negroponte
appeared in the Scientific American Special Issue 1995, p. 102:

"In the long run, model-based image transmission and encoding are
better than transmission of pictures alone.  Mathematical models of
a scene can describe the spatial relations of the objects in it and
maneuver them through space.  The idea of capturing a picture with
a camera is obsolete if one can instead capture a realistic model
from which the receiver can generate any picture.  For instance,
from a real-time model of a baseball game, a fan watching at home
could get the view from anywhere in the ballpark -- including the
perspective of the baseball"

In this paper, we present a case for recognising that chemicals form a
well-bounded and standardised example of this type of model, and that
further applications not envisaged above are enabled by considering
chemical as a new primary media type.

2.1  Chemical Data Types are Well Defined and Widely Used Already.

In excess of thirteen million chemicals are currently known to
science, and many others as yet unknown have been speculated upon. A
number of highly reliable, cross-referenced and indexed collections
of molecular information have long been available in printed form,
and since the 1970s these have been globally accessible in digital
form on-line from organisations such as STN (the Scientific and
Technical Information Network) or via locally implemented databases.
During this period a
number of well defined and documented formats for encoding chemical
information have become accepted and widely used by molecular
scientists. In recent years, the development of the Internet as the
prime delivery mechanism of chemical content has accelerated. In
addition to e-mail, mechanisms such as Gopher and the World-Wide Web
system are being widely adopted by the chemistry and biology
communities. Most recently, chemical electronic journals and extended
learning courses in these subjects based on these delivery mechanisms
have begun appearing.

2.2 The Role of Primary and Secondary MIME Content Types.

Central to all these developments is the MIME concept as defined in
Internet RFC 1521 [1]. This defines standards for the inclusion of
message bodies in e-mail messages and other information systems such
as the World-Wide Web. A two-level mechanism exists, comprising
top-level types and sub-types. The MIME top-level types exist to allow
mail gateways and other agents such as World-Wide Web clients to do
filtering and/or conversion properly, and to allow user agents to
have a default behavior for certain classes of objects. Neither the
currently accepted primary content types nor the existing sub-types
contain any explicit proposals for actions on message bodies
containing chemical information. It is our intention in this Internet
Draft to suggest a mechanism for handling such information in a manner
that is consistent with, and extensible within, the existing MIME
guidelines.

Central to our argument is that the media content of chemical
information is quite different from "image", "audio" or "video" media
types, and that the manner in which the information is likely to be
processed and presented is also fundamentally different. Existing
top-level types were chosen very carefully to reflect the
capabilities required for the media and presentation, and not the
specific subject matter. In this context, we argue that a "chemical"
type will have generic media and presentation requirements that are
not fulfilled by the image, audio or video content types currently
defined. For example, the default presentation behaviour for chemical
content is expected to result in the rendering of a molecular scene
or interpretation in either two or three dimensions, with some degree
of navigation through chemical objects being possible, and with
control over how individual objects (atoms for example) are
displayed.
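
The two-level mechanism and the idea of a default behaviour chosen
from the top-level type alone can be sketched as follows. This is a
minimal illustration and not part of the proposal: the function names
and the action strings are hypothetical, and the fallback to the
"application" action for an unrecognised top-level type is our reading
of the RFC 1521 guidance on unknown types.

```python
# Hypothetical default actions keyed by top-level type. The strings are
# illustrative only; they are not taken from RFC 1521 or from this draft.
DEFAULT_ACTIONS = {
    "text": "display as text",
    "image": "render as a 2D bitmap",
    "audio": "play as sound",
    "video": "play as moving images",
    "application": "pass to an external program or save to disk",
    "chemical": "render a navigable 2D or 3D molecular scene",
}


def split_content_type(header_value):
    """Return (type, subtype) in lower case, ignoring any parameters."""
    media_type = header_value.split(";", 1)[0].strip().lower()
    top_level, slash, subtype = media_type.partition("/")
    if not slash or not top_level or not subtype:
        raise ValueError("not a type/subtype pair: %r" % header_value)
    return top_level, subtype


def default_action(header_value):
    """Choose a default action from the top-level type alone."""
    top_level, _subtype = split_content_type(header_value)
    # Assumption: unknown top-level types fall back to "application".
    return DEFAULT_ACTIONS.get(top_level, DEFAULT_ACTIONS["application"])


if __name__ == "__main__":
    print(default_action("chemical/pdb"))
```

The point of the sketch is that a user agent which has never seen a
particular chemical sub-type can still select a sensible default from
the top-level type.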

A degree of semantic interpretation in the process is essential. For
example, the presentation of a carbon atom might require
consideration of the valency at that atom, or depth cueing to
represent the "stereochemistry" at that point. We considered the
existing "image" type to be too restrictive for the presentational
needs of the molecular science community. An image is a
two-dimensional representation - generally a 2D bitmap. Chemical
information can be intrinsically three dimensional, and normally
comprises a collection of well defined objects.  We envisage most, if
not all, chemical media types to carry very substantial semantic
content, e.g. structural information relating to the individual
objects and their relationships. We also recognise that there is no
uniquely definable way of specifying how any chemical content must be
presented as an image. Nevertheless, several excellent "helper"
programs for achieving a default process are readily available to the
community and have become widely used in the last few years. It can
be said therefore that default actions on 3D chemical information
have achieved a certain level of de facto status, which would be
consolidated by this proposal.

More elaborate and truly innovative forms of processing of the content
are also envisaged, including association with scientific
instruments, presentation to automated synthesis robots and
harvesting via indexing agents searching for structural features or
themes in the chemical content. Both the diversity and the novelty of
these forms of presentation again distinguish chemical types from
other existing primary types.

RFC 1521 also defines a primary content type known as "application"
to be used for data which do not fit in any of the other categories,
particularly in the context of data to be processed by mail-based
uses of application programs. Specific applications cited were, for
example, the processing of PostScript or portable document format
files. These are specific examples of precisely defined presentation
formats with faithful reproduction of the original intent and no
default semantic parsing of higher level structures.  With "chemical"
content types, the presentation requirements are based much more on
the existence of well defined generic semantic content based on
chemical attributes, which have an easily specified default treatment,
and which we argue need a well defined separate treatment. Features
unique to chemical content are molecular markup-styles where
individual objects within the content can be highlighted in a manner
similar to that defined in, say, HyperText Markup Language (HTML), or
where objects can be viewed from a variety of perspectives (for
example space-fill, ribbon or wireframe presentation of chemical
structures).

The boundary conditions of a chemical type can be specified, and an
initial minimal set is defined below. The over-riding concern is to
preserve this semantic interpretation of the presentation in any
default treatment of the message body. We believe that such actions
are not best implemented using the "application" type but should be
allocated to a separate "chemical" type.
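
As an illustration of labelling a message body with the proposed type
rather than with "application", the following sketch uses the Python
standard library email package to attach a small structure file as
chemical/pdb. The coordinate record, subject line and filename are
made up for the example, and the sub-type name is the one proposed in
this draft, not a registered value.

```python
from email.message import EmailMessage

# Illustrative single-atom fragment in a PDB-like layout; not a real entry.
PDB_TEXT = (
    "HETATM    1  O   HOH A   1       0.000   0.000   0.000\n"
    "END\n"
)


def build_message(pdb_text, filename):
    """Build a multipart message carrying one chemical/pdb body part."""
    msg = EmailMessage()
    msg["Subject"] = "Structure for discussion"
    msg.set_content("A structure is attached.")
    # The top-level type and sub-type are given separately; the library
    # joins them into the Content-Type header of the attached part.
    msg.add_attachment(
        pdb_text.encode("ascii"),
        maintype="chemical",
        subtype="pdb",
        filename=filename,
    )
    return msg


message = build_message(PDB_TEXT, "water.pdb")
attachment = next(message.iter_attachments())

if __name__ == "__main__":
    print(attachment.get_content_type())
```

A receiving agent that recognises the "chemical" top-level type could
then hand the part to a molecular viewer instead of treating it as
opaque application data.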


3. A Proposed Core Set of Chemical Media Types.

We are proposing in this draft to focus on a small number of existing
standards that cover the span of chemical information, which are in
wide use and which are clearly defined in the scientific literature.
In no sense are they necessarily superior to other file types, but
each has a specific use which a significant proportion of the
community regard as serving a purpose. By encoding these formats
within a chemical content definition, we hope to enhance the
processing and perception of such information for the end user.

CXF and MIF are included as examples of modern, semantically rich and
structured datatypes. The CXF format is presented as essentially an
exchange format, although the long term goal is its adoption as a
general purpose molecular description format. The MIF format, by way
of contrast, is a user-extensible and human readable format which is
also characterised by its compact nature.  We think that both formats
will have their areas of application, and should be included here. A
number of other definitions are so-called "legacy" formats which are
generally accepted as likely to continue in use for some years, if
only because a considerable amount of software has yet to be
developed to allow migration to newer formats. We believe such
formats will play a major role in new initiatives in electronic
journal publishing, on-line conferences and other keynote areas.
Because of the variety of existing chemical definitions, we think it
unlikely that the chemistry community will ever agree upon a single
common standard, and likely that a mechanism such as MIME will serve
a useful purpose for the foreseeable future.

Table 1 lists the proposed chemical sub-type names, together with
suggested filename qualifiers. The first qualifier listed for each
sub-type is deliberately kept to three characters for DOS
compatibility.


Table 1.

Primary/sub-type           Suggested qualifier(s)  Reference

chemical/cxf                 cxf                   [8] 
chemical/mif                 mif                   [9] 
chemical/pdb                 pdb                   [10] 
chemical/cif                 cif                   [11] 
chemical/mdl-molfile         mol                   [12]
chemical/mdl-sdf             sdf                   [12] 
chemical/mdl-rdf             rdf                   [12] 
chemical/mdl-rxn             rxn                   [12] 
chemical/embl-dl-nucleotide  emb, embl             [13] 
chemical/genbank             gen                   [14]
chemical/gcg8-sequence       gcg                   [15]
chemical/daylight-smiles     smi                   [16] 
chemical/rosdal              ros                   [17] 
chemical/macromodel-input    mmd, mmod             [18] 
chemical/mopac-input         mop                   [19] 
chemical/gaussian-input      gau                   [20]
chemical/jcamp-dx            jdx                   [21]
chemical/kinemage            kin                   [22]
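
A local configuration based on Table 1 might be sketched as a simple
lookup from filename qualifier to proposed type. The entries below are
copied from the table; the function name is hypothetical, and the type
names are proposals in this draft, not registered values.

```python
import os.path

# Filename qualifier -> proposed primary/sub-type, copied from Table 1.
SUBTYPE_BY_QUALIFIER = {
    "cxf": "chemical/cxf",
    "mif": "chemical/mif",
    "pdb": "chemical/pdb",
    "cif": "chemical/cif",
    "mol": "chemical/mdl-molfile",
    "sdf": "chemical/mdl-sdf",
    "rdf": "chemical/mdl-rdf",
    "rxn": "chemical/mdl-rxn",
    "emb": "chemical/embl-dl-nucleotide",
    "embl": "chemical/embl-dl-nucleotide",
    "gen": "chemical/genbank",
    "gcg": "chemical/gcg8-sequence",
    "smi": "chemical/daylight-smiles",
    "ros": "chemical/rosdal",
    "mmd": "chemical/macromodel-input",
    "mmod": "chemical/macromodel-input",
    "mop": "chemical/mopac-input",
    "gau": "chemical/gaussian-input",
    "jdx": "chemical/jcamp-dx",
    "kin": "chemical/kinemage",
}


def guess_chemical_type(filename):
    """Return the proposed type for a filename, or None if unknown."""
    qualifier = os.path.splitext(filename)[1].lstrip(".").lower()
    return SUBTYPE_BY_QUALIFIER.get(qualifier)


if __name__ == "__main__":
    print(guess_chemical_type("crambin.pdb"))
```

The lookup is case-insensitive on the qualifier, since DOS filenames
are commonly upper case.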


4.  Consultation Mechanisms. 

Our original suggestion for a primary chemical Media content type was
made on the Computational Chemistry discussion list (CCL) and on
newsgroups associated with molecular sciences (chemistry and biology)
in February 1994. Subsequently, the first Internet-Draft, archived as
draft-rzepa-chemical-mime-type-00.txt, was available for discussion
during the period May-October, 1994. Its existence became widely
known and subsequently a number of working "proofs-of-concept" of
chemical Media types have become available [2-7]. A discussion list
exists, and can be joined by sending a message to
listserver@ic.ac.uk with the one line message

subscribe chemime your name.

Consolidated proceedings of this discussion group are available at
http://www.ch.ic.ac.uk/hypermail/chemime/ and copies of relevant
Internet Drafts and RFCs are available at
http://www.ch.ic.ac.uk/internet/

Our proposal has proved extremely effective already in focusing on how
a consensus on handling the presentation and processing of this type
of media content can be achieved. We also have been delighted with
the way in which the chemical community have accepted and used the
Internet-Draft mechanism for developing standards and achieving such
consensus on a wider scale, having hitherto not been active in
defining Internet standards.  Many major bodies (learned societies,
data providers, chemical information vendors, etc.) have participated
in discussion and none have seen the approach as unsuitable. We are
therefore extending the discussion to capitalise on this. That
section of the biological community which is related to molecular
information has participated in these discussions, and we are not
aware of any moves to define a separate type for biological content.
We nevertheless suggest that chemical content should not extend
further into biology than that which requires the particular
presentation style and default actions associated with the chemical
content. We would expect that further chemical sub-types may be
proposed by sections of the community in the future, if appropriate
by the mechanism indicated in Section 8 of this document.

5. References.

[1] Borenstein N., and N. Freed, "MIME (Multipurpose Internet Mail
Extensions) Part One:  Mechanisms for Specifying and Describing the
Format of Internet Message Bodies", RFC 1521, Bellcore, Innosoft,
September 1993.

[2]  Fitzgerald, P., Computational Molecular Biology Section, National
Institutes of Health, USA.

[3] Bachrach S., (Conference Organiser), First Electronic Computational
Chemistry Conference, November, 1994.

[4] Richardson D.C., and Richardson J.S., Protein Science, 1992, 1, 3;
Richardson D.C., and Richardson J.S., Trends in Biochem. Sci., 1994,
19, 135.

[5] Rzepa H. S., Whitaker B. J., and Winter M. J., "Chemical
Applications of the World-Wide-Web", J. Chem. Soc., Chem. Commun.,
1994, 1907.

[6]  Casher O., Chandramohan G., Hargreaves M., Murray-Rust P., Sayle
R., Rzepa H.S., and Whitaker B. J., "Hyperactive Molecules and the
World-Wide-Web Information System", J. Chem. Soc., Perkin Trans 2,
1995, 7.

[7] Rzepa H.S.,  WWW94 Chemistry Workshop, Computer Networks and ISDN
Systems, 1994, 27, 317-8. See also page 328.

[8] Steckert T., and Mockus J., "Standard Chemical eXchange Format".
Chemical Abstracts Service, September, 1994.

[9] Allen F. H., Barnard A., Cook A., and Hall S. R., "Molecular
Information File", J. Chem. Inf. Comput. Sci., 1995, in press.

[10] Bernstein F.C., "The Protein Data Bank: A Computer-Based Archival
File for Macromolecular Structures", Journal of Molecular Biology,
1977, 112, pp. 535-542. See gopher://pdb.pdb.bnl.gov/ for the most
recent information.

[11] Hall R., Allen F.H., and Brown I.D., "The Crystallographic
Information File (cif) - A new standard archive file for
crystallography", Acta. Cryst. 1991, A47, 655.

[12] Dalby A., "The MDL Molfile, the concatenated MDL Molfile and the
MDL reactions datafile", J. Chem. Inf. Comput. Sci., 1992, 32, 244.

[13] Rice C.M., Fuchs R., Higgins D.G., Stoehr P.J., Cameron G.N.,
"EMBL nucleotide Format EMBL Data Library", Nucleic Acids Research,
1993, 21, 2967-2971.

[14] Benson D., Lipman D.J., Ostell J., "GENbank", Nucleic Acids
Research, 1993, 21, 2963.

[15] Eberhardt, N.L., "The GCG format for sequence information",
Biotechniques, 1992, 13, 914-917.

[16] Weininger D., "The SMILES Format", J. Chem. Inf. Comput. Sci.,
1988, 28, 31.

[17] Barnard J., Jochum C., Welford S., "ROSDAL - A Universal
Structure Substructure Representation for PC-Host Communication",
Abstracts of Papers of the American Chemical Society, 1988, 196, pp. 13.

[18] Mohamadi F.,  Richards N.J.G., Guida W.C.,  Liskamp R.,  Lipton
M., Caufield C., Chang G., Hendrickson T., Still W.C., "Macromodel - an
integrated software system for Modeling Organic and Bioorganic
Molecules using Molecular Mechanics", Journal of Computational
Chemistry, 1990, 11, 440-467.

[19] Stewart J.J.P., "MOPAC a Semiempirical Molecular-Orbital Program",
Journal of Computer-aided Molecular Design, 1990, 4, 1-45.

[20] Frisch M.J., Trucks G.W., Head-Gordon M., Gill P.M.W., Wong M.W.,
Foresman J.B., Johnson B.G., Schlegel H.B., Robb M.A., Replogle E.S.,
Gomperts G., Andres J.L., Raghavachari K.,  Binkley J.S., Gonzalez C.,
Martin R.L., Fox D.J., Defrees D.J.,  Baker J., Stewart J.J.P., and
Pople J.A., "Gaussian 92", Gaussian, Inc., Pittsburgh PA, 1992.

[21] Davies A.N., Lampen P., "JCAMP-DX for NMR", Applied Spectroscopy,
1993, 47, 1093-1099; Rutledge D.N., Mcintyre P., "A proposed European
Implementation of the JCAMP-DX Format", Chemometrics and Intelligent
Laboratory Systems, 1992, 16, 95-101; Grasselli J.G., "JCAMP-DX, A
standard format for exchange of infrared-spectra in computer readable
form", Pure and Applied Chemistry, 1991, 63, 1781-1792.

[22]  Richardson D.C.,  and Richardson J.S., "The Kinemage format", 
Protein Science, 1992,  1, 3;  Richardson D.C.,  and Richardson J.S.,  
Trends in Biochem. Sci. 1994, 19, 135.


6.  Security Considerations Section

The security implications of chemical MIME types do not differ from
those relevant to RFC 1521. We reproduce the section in RFC 1521 for
reference here: "Security issues are discussed in Section 7.4.2 and in
Appendix F. Implementors should pay special attention to the security
implications of any mail content-types that can cause the remote
execution of any actions in the recipient's environment. In such cases,
the discussion of the application/postscript content-type in Section
7.4.2 may serve as a model for considering other content-types with
remote execution capabilities."

7. Authors' Addresses.

Dr H S Rzepa, Department of Chemistry, Imperial College, London, SW7
2AY Phone: +44 171 594 5774 Fax +44 71 589 3869. 
E-Mail: rzepa@ic.ac.uk

Dr P. Murray-Rust, Glaxo Group Research, Greenford, Middlesex, UK
E-Mail: pmr1716@ggr.co.uk

Dr B. J. Whitaker, School of Chemistry, University of Leeds, UK.
E-Mail: benw@chemistry.leeds.ac.uk


8.  Suggested Mechanism for Registration of New Content-type/subtype Values

This is explained in detail in RFC 1590, "Media Type Registration
Procedure". Send a proposed Media Type (content-type/subtype) to the
"ietf-types@cs.utk.edu" mailing list.  This mailing list has been
established for the sole purpose of reviewing proposed Media Types.
Proposed content-types are not formally registered and must use the
"x-" notation for the subtype name.

The intent of the public posting is to solicit comments and feedback on
the choice of content-type/subtype name, the unambiguity of the
references with respect to versions and external profiling information,
the choice of which OIDs to use, and a review of the security
considerations section.  It should be noted that the proposed Media
Type does not need to make sense for every possible application.  If
the Media Type is intended for a limited or specific use, this should
be noted in the submission.
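
The "x-" rule for proposed sub-type names stated above can be sketched
as a small helper. This is only an illustration under our own
assumptions: the function name is hypothetical, and the resulting
names (for example chemical/x-pdb) are simply what the rule would
produce for the sub-types in Table 1, not values defined elsewhere in
this document.

```python
def provisional_media_type(top_level, subtype):
    """Return type/subtype with an "x-" prefix on the sub-type name.

    Sub-types that already carry the prefix are left unchanged, so the
    function can be applied more than once without harm.
    """
    subtype = subtype.strip().lower()
    if not subtype.startswith("x-"):
        subtype = "x-" + subtype
    return top_level.strip().lower() + "/" + subtype


if __name__ == "__main__":
    print(provisional_media_type("chemical", "pdb"))
```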


After two weeks, submit the proposed Media Type to the IANA for
registration.  The request and supporting documentation should be sent
to "iana@isi.edu".  Provided a reasonable review period has elapsed,
the IANA will register the Media Type, assign an OID under the IANA
branch, and make the Media Type registration available to the
community.

It is strongly recommended that, prior to such submissions, the
discussions of the chemical MIME group be consulted.  Details of how
to do this are given in section 4 of this document.
