Network Working Group                                        H. Rzepa
Internet Draft                 Imperial College, London, SW7 2AY, UK.
                                                       P. Murray-Rust
                      Glaxo Group Research, Greenford, Middlesex, UK.
                                                          B. Whitaker
               School of Chemistry, University of Leeds, LS2 9JT, UK.
Category: Standards Track                               February 1995

A Chemical Primary Content Type for Multipurpose Internet Mail Extensions.

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ds.internic.net (US East Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).

Discussions of the chemical MIME group are archived at URL: http://www.ch.ic.ac.uk/hypermail/chemime/

Abstract

The purpose of this Internet Draft is to propose an update to Internet RFC 1521 to include a new primary content-type to be known as chemical. RFC 1521 [1] describes mechanisms for specifying and describing the format of Internet Message Bodies via content-type/subtype pairs. We believe that chemical defines a fundamentally new type of content with unique presentational and processing aspects. We outline the typical expected uses of such a content type and propose a number of chemical sub-types.

This document updates IETF Internet Draft draft-rzepa-chemical-mime-type-00.txt, in which this specific proposal was made, incorporates suggestions received during the initial discussion period, and indicates scientific support and uptake for this proposal.
Rzepa, Murray-Rust and Whitaker [Page 1] Expires 09/03/draft-rzepa-chemical-mime-01.txt March 1995

Table of Contents:

   1. Intended Audience for this Document.
   2. The Need for a Chemical Primary Content Type.
   3. A Proposed Core Set of Chemical Media Types.
   4. Consultation Mechanisms.
   5. References.
   6. Security Considerations Section
   7. Authors' Addresses.
   8. Suggested Registration Mechanism for New chemical/subtype Values

1. Intended Audience for this Document.

This document is directed at anyone who is concerned with implementing Electronic Mail, Gopher, World-Wide Web and other information services supporting MIME (RFC 1521) in a local environment where specifically chemical information is processed. We do not expect the "average" chemist or molecular biologist to concern themselves with the details of defining specific Media types, but we would expect them to have access to local knowledge, or to specific examples and implementations of chemical Media types. This Internet Draft is intended as a discussion document for a standard definition, such that local implementors can comply with the proposed standards. We are also targeting developers of chemically cognisant and MIME compliant software who may wish to include default types in their configurations.

2. The Need for a Chemical Primary Content Type.

The following quote by MIT Lab Director Nicholas Negroponte appeared in the Scientific American Special Issue 1995, p. 102:

"In the long run, model-based image transmission and encoding are better than transmission of pictures alone. Mathematical models of a scene can describe the spatial relations of the objects in it and maneuver them through space. The idea of capturing a picture with a camera is obsolete if one can instead capture a realistic model from which the receiver can generate any picture.
For instance, from a real-time model of a baseball game, a fan watching at home could get the view from anywhere in the ballpark -- including the perspective of the baseball."

In this paper, we present a case for recognising that chemicals form a well-bounded and standardised example of this type of model, and that further applications not envisaged above are enabled by considering chemical as a new primary media type.

2.1 Chemical Data Types are Well Defined and Widely Used Already.

In excess of thirteen million chemicals are currently known to science, and many others as yet unknown have been speculated upon. A number of highly reliable, cross-referenced and indexed collections of molecular information have long been available in printed form, and since the 1970s these have been globally accessible in digital form on-line from organisations such as STN (Scientific and Technical Networks) or via locally implemented databases. During this period a number of well defined and documented formats for encoding chemical information have become accepted and widely used by molecular scientists.

In recent years, the development of the Internet as the prime delivery mechanism of chemical content has accelerated. In addition to e-mail, mechanisms such as Gopher and the World-Wide Web system are being widely adopted by the chemistry and biology communities. Most recently, chemical electronic journals and extended learning courses in these subjects based on these delivery mechanisms have begun appearing.

2.2 The Role of Primary and Secondary MIME Content Types.

Central to all these developments is the MIME concept as defined in Internet RFC 1521 [1]. This defines standards for the inclusion of message bodies in e-mail messages and other information systems such as the World-Wide Web. A two level mechanism exists comprising top level and sub-types.
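As a rough illustration of this two-level mechanism, the following Python sketch reads the Content-Type header of a message and selects a default behaviour from the top-level type alone. It is a minimal sketch under stated assumptions, not part of this proposal: the action strings, the sample message and the function names are hypothetical, and "chemical/pdb" is simply one of the sub-type names proposed later in this document.

```python
from email import message_from_string

# Hypothetical default behaviours, keyed by MIME top-level type.
# These labels are illustrative assumptions, not defined by RFC 1521.
DEFAULT_ACTIONS = {
    "text": "show as text",
    "image": "show as a picture",
    "audio": "play as sound",
    "video": "play as a moving picture",
    "chemical": "pass to a molecular viewer",
}
FALLBACK_ACTION = "save to a file"

# A made-up message carrying one of the proposed chemical sub-types.
RAW_MESSAGE = (
    "From: sender@example.org\n"
    "To: recipient@example.org\n"
    "MIME-Version: 1.0\n"
    "Content-Type: chemical/pdb\n"
    "\n"
    "(coordinate records would follow here)\n"
)


def split_content_type(raw_message):
    """Return (top-level type, sub-type) from the Content-Type header."""
    message = message_from_string(raw_message)
    return message.get_content_maintype(), message.get_content_subtype()


def default_action(raw_message):
    """Choose a default behaviour from the top-level type only."""
    top_level, _subtype = split_content_type(raw_message)
    return DEFAULT_ACTIONS.get(top_level, FALLBACK_ACTION)


if __name__ == "__main__":
    print(split_content_type(RAW_MESSAGE))
    print(default_action(RAW_MESSAGE))
```

The point of the sketch is that an agent which does not recognise a particular sub-type can still fall back on a sensible behaviour for the whole class of objects.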
The MIME top-level types exist to allow mail gateways and other agents such as World-Wide Web clients to do filtering and/or conversion properly, and to allow user agents to have a default behavior for certain classes of objects. Neither the currently accepted primary content types nor the existing sub-types contain any explicit proposals for actions on message bodies containing chemical information. It is our intention in this Internet draft to suggest a mechanism for handling such information in a manner that is consistent with, and extensible within, the existing MIME guidelines.

Central to our argument is that the media content of chemical information is quite different from the "image", "audio" or "video" media types, and that the manner in which the information is likely to be processed and presented is also fundamentally different. Existing top-level types were chosen very carefully to reflect the capabilities required for the media and presentation, and not the specific subject matter. In this context, we argue that a "chemical" type will have generic media and presentation requirements that are not fulfilled by the image, audio or video content types currently defined.

For example, the default presentation behaviour for chemical content is expected to result in the rendering of a molecular scene or interpretation in either two or three dimensions, with some degree of navigation through chemical objects being possible, and with control over how individual objects (atoms for example) are displayed. A degree of semantic interpretation in the process is essential. For example, the presentation of a carbon atom might require consideration of the valency at that atom, or depth cueing to represent the "stereochemistry" at that point.

We considered the existing "image" type to be too restrictive for the presentational needs of the molecular science community.
An image is a two-dimensional representation, generally a 2D bitmap. Chemical information can be intrinsically three dimensional, and normally comprises a collection of well defined objects. We envisage most, if not all, chemical media types to carry very substantial semantic content, e.g. structural information relating to the individual objects and their relationships.

We also recognise that there is no uniquely definable way of specifying how any chemical content must be presented as an image. Nevertheless, several excellent "helper" programs for achieving a default process are readily available to the community and have become widely used in the last few years. It can be said therefore that default actions on 3D chemical information have achieved a certain level of de facto status, which would be consolidated by this proposal.

More elaborate and truly innovative forms of processing of the content are also envisaged, including association with scientific instruments, presentation to automated synthesis robots and harvesting via indexing agents searching for structural features or themes in the chemical content. Both the diversity and the novelty of these forms of presentation again distinguish chemical types from other existing primary types.

RFC 1521 also defines a primary content type known as "application", to be used for data which do not fit in any of the other categories, particularly in the context of data to be processed by mail-based uses of application programs. Specific applications cited include the processing of PostScript or portable document format files. These are specific examples of precisely defined presentation formats with faithful reproduction of the original intent and no default semantic parsing of higher level structures.
With "chemical" content types, the presentation requirements are based much more on the existence of well defined generic semantic content based on chemical attributes, which have an easily specified default treatment, and which we argue need a well defined separate treatment. Features unique to chemical content are molecular markup-styles, where individual objects within the content can be highlighted in a manner similar to that defined in, say, the HyperText Markup Language, or where objects can be viewed from a variety of perspectives (for example space-fill, ribbon or wireframe presentation of chemical structures). The boundary conditions of a chemical type can be specified, and an initial minimal set is defined below. The over-riding concern is to preserve this semantic interpretation of the presentation in any default treatment of the message body. We believe that such actions are not best implemented using the "application" type but should be allocated to a separate "chemical" type.

3. A Proposed Core Set of Chemical Media Types.

We are proposing in this draft to focus on a small number of existing standards that cover the span of chemical information, which are in wide use and which are clearly defined in the scientific literature. In no sense are they necessarily superior to other file types, but each has a specific use which a significant proportion of the community regard as serving a purpose. By encoding these formats within a chemical content definition, we hope to enhance the processing and perception of such information for the end user.

CXF and MIF are included as examples of modern, semantically rich and structured datatypes. The CXF format is presented as essentially an exchange format, although the long term goal is its adoption as a general purpose molecular description format.
The MIF format, by way of contrast, is a user-extensible and human readable format which is also characterised by its compact nature. We think that both formats will have their areas of application, and should be included here.

A number of other definitions are so called "legacy" systems, which are generally accepted as likely to continue in use for some years, if only because a considerable amount of software has yet to be developed to allow migration to newer formats. We believe such formats will play a major role in new initiatives in electronic journal publishing, on-line conferences and other keynote areas. Because of the variety of existing chemical definitions, we think it unlikely the chemistry community will ever agree upon a single common standard, and that a mechanism such as MIME is likely to have a useful purpose for the foreseeable future.

Table 1 lists the proposed chemical sub-type names, together with one or more suggested filename qualifiers. The first qualifier listed for each sub-type is deliberately kept to three characters for DOS compatibility.

Table 1.

   Primary/sub-type              Suggested qualifier(s)   Reference
   ----------------------------  -----------------------  ---------
   chemical/cxf                  cxf                      [8]
   chemical/mif                  mif                      [9]
   chemical/pdb                  pdb                      [10]
   chemical/cif                  cif                      [11]
   chemical/mdl-molfile          mol                      [12]
   chemical/mdl-sdf              sdf                      [12]
   chemical/mdl-rdf              rdf                      [12]
   chemical/mdl-rxn              rxn                      [12]
   chemical/embl-dl-nucleotide   emb, embl                [13]
   chemical/genbank              gen                      [14]
   chemical/gcg8-sequence        gcg                      [15]
   chemical/daylight-smiles      smi                      [16]
   chemical/rosdal               ros                      [17]
   chemical/macromodel-input     mmd, mmod                [18]
   chemical/mopac-input          mop                      [19]
   chemical/gaussian-input       gau                      [20]
   chemical/jcamp-dx             jdx                      [21]
   chemical/kinemage             kin                      [22]

4. Consultation Mechanisms.

Our original suggestion for a primary chemical Media content type was made on the Computational Chemistry discussion list (CCL) and on newsgroups associated with molecular sciences (chemistry and biology) in February 1994.
Subsequently, the first Internet-Draft, archived as draft-rzepa-chemical-mime-type-00.txt, was available for discussion during the period May-October, 1994. Its existence became widely known and subsequently a number of working "proofs-of-concept" of chemical Media types have become available [2-7].

A discussion list exists, and can be joined by sending a message to listserver@ic.ac.uk with the one line message

   subscribe chemime your name

Consolidated proceedings of this discussion group are available as:

   http://www.ch.ic.ac.uk/hypermail/chemime/

Copies of relevant Internet drafts and RFCs are available on

   http://www.ch.ic.ac.uk/internet/

Our proposal has already proved extremely effective in focusing on how a consensus on handling the presentation and processing of this type of media content can be achieved. We have also been delighted with the way in which the chemical community, having hitherto not been active in defining Internet standards, has accepted and used the Internet-Draft mechanism for developing standards and achieving such consensus on a wider scale. Many major bodies (learned societies, data providers, chemical information vendors, etc.) have participated in discussion and none have seen the approach as unsuitable. We are therefore extending the discussion to capitalise on this.

That section of the biological community which is related to molecular information has participated in these discussions, and we are not aware of any moves to define a separate type for biological content. We nevertheless suggest that chemical content should not extend further into biology than that which requires the particular presentation style and default actions associated with the chemical content.
We would expect that further chemical sub-content types may be proposed by sections of the community in the future, if appropriate by the mechanism indicated in section 8 of this document.

5. References.

[1] Borenstein N., and Freed N., "MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies", RFC 1521, Bellcore, Innosoft, September 1993.

[2] Fitzgerald, P., Computational Molecular Biology Section, National Institutes of Health, USA.

[3] Bachrach S., (Conference Organiser), First Electronic Computational Chemistry Conference, November, 1994.

[4] Richardson D.C., and Richardson J.S., Protein Science, 1992, 1, 3; Richardson D.C., and Richardson J.S., Trends in Biochem. Sci., 1994, 19, 135.

[5] Rzepa H.S., Whitaker B.J., and Winter M.J., "Chemical Applications of the World-Wide-Web", J. Chem. Soc., Chem. Commun., 1994, 1907.

[6] Casher O., Chandramohan G., Hargreaves M., Murray-Rust P., Sayle R., Rzepa H.S., and Whitaker B.J., "Hyperactive Molecules and the World-Wide-Web Information System", J. Chem. Soc., Perkin Trans 2, 1995, 7.

[7] Rzepa H.S., WWW94 Chemistry Workshop, Computer Networks and ISDN Systems, 1994, 27, 317-8. See also page 328.

[8] Steckert T., and Mockus J., "Standard Chemical eXchange Format", Chemical Abstracts Service, September, 1994.

[9] Allen F.H., Barnard A., Cook A., and Hall S.R., "Molecular Information File", J. Chem. Inf. Comput. Sci., 1995, in press.

[10] Bernstein F.C., "The Protein Data Bank: A Computer-Based Archival File for Macromolecular Structures", Journal of Molecular Biology, 1977, 112, pp. 535-542. See gopher://pdb.pdb.bnl.gov/ for the most recent information.

[11] Hall R., Allen F.H., and Brown I.D., "The Crystallographic Information File (cif) - A new standard archive file for crystallography", Acta Cryst., 1991, A47, 655.
[12] Dalby A., "The MDL Molfile, the concatenated MDL Molfile and the MDL reactions datafile", J. Chem. Inf. Comput. Sci., 1992, 32, 244.

[13] Rice C.M., Fuchs R., Higgins D.G., Stoehr P.J., Cameron G.N., "EMBL nucleotide Format EMBL Data Library", Nucleic Acids Research, 1993, 21, 2967-2971.

[14] Benson D., Lipman D.J., Ostell J., "GENbank", Nucleic Acids Research, 1993, 21, 2963.

[15] Eberhardt N.L., "The GCG format for sequence information", Biotechniques, 1992, 13, 914-917.

[16] Weininger D., "The SMILES Format", J. Chem. Inf. Comput. Sci., 1988, 28, 31.

[17] Barnard J., Jochum C., Welford S., "Rosdal - A Universal Structure Substructure Representation For Pc-Host Communication", Abstracts of Papers of the American Chemical Society, 1988, 196, pp. 13.

[18] Mohamadi F., Richards N.J.G., Guida W.C., Liskamp R., Lipton M., Caufield C., Chang G., Hendrickson T., Still W.C., "Macromodel - an integrated software system for Modeling Organic and Bioorganic Molecules using Molecular Mechanics", Journal of Computational Chemistry, 1990, 11, 440-467.

[19] Stewart J.J.P., "MOPAC a Semiempirical Molecular-Orbital Program", Journal of Computer-aided Molecular Design, 1990, 4, 1-45.

[20] Frisch M.J., Trucks G.W., Head-Gordon M., Gill P.M.W., Wong M.W., Foresman J.B., Johnson B.G., Schlegel H.B., Robb M.A., Replogle E.S., Gomperts G., Andres J.L., Raghavachari K., Binkley J.S., Gonzalez C., Martin R.L., Fox D.J., Defrees D.J., Baker J., Stewart J.J.P., and Pople J.A., "Gaussian 92", Gaussian, Inc., Pittsburgh PA, 1992.

[21] Davies A.N., Lampen P., "JCAMP-DX for NMR", Applied Spectroscopy, 1993, 47, 1093-1099; Rutledge D.N., Mcintyre P., "A proposed European Implementation of the JCAMP-DX Format", Chemometrics and Intelligent Laboratory Systems, 1992, 16, 95-101; Grasselli J.G., "JCAMP-DX, A standard format for exchange of infrared-spectra in computer readable form", Pure and Applied Chemistry, 1991, 63, 1781-1792.
[22] Richardson D.C., and Richardson J.S., "The Kinemage format", Protein Science, 1992, 1, 3; Richardson D.C., and Richardson J.S., Trends in Biochem. Sci., 1994, 19, 135.

6. Security Considerations Section

The security implications of chemical MIME types do not differ from those relevant to RFC 1521. We reproduce the section in RFC 1521 for reference here:

"Security issues are discussed in Section 7.4.2 and in Appendix F. Implementors should pay special attention to the security implications of any mail content-types that can cause the remote execution of any actions in the recipient's environment. In such cases, the discussion of the application/postscript content-type in Section 7.4.2 may serve as a model for considering other content-types with remote execution capabilities."

7. Authors' Addresses.

Dr H S Rzepa, Department of Chemistry, Imperial College, London, SW7 2AY
Phone: +44 171 594 5774 Fax +44 71 589 3869.
E-Mail: rzepa@ic.ac.uk

Dr P. Murray-Rust, Glaxo Group Research, Greenford, Middlesex, UK
E-Mail: pmr1716@ggr.co.uk

Dr B. J. Whitaker, School of Chemistry, University of Leeds, UK.
E-Mail: benw@chemistry.leeds.ac.uk

8. Suggested Mechanism for Registration of New Content-type/subtype Values

This is explained in detail in RFC 1590, "Media Type Registration Procedure".

Send a proposed Media Type (content-type/subtype) to the "ietf-types@cs.utk.edu" mailing list. This mailing list has been established for the sole purpose of reviewing proposed Media Types. Proposed content-types are not formally registered and must use the "x-" notation for the subtype name. The intent of the public posting is to solicit comments and feedback on the choice of content-type/subtype name, the unambiguity of the references with respect to versions and external profiling information, the choice of which OIDs to use, and a review of the security considerations section.
It should be noted that the proposed Media Type does not need to make sense for every possible application. If the Media Type is intended for a limited or specific use, this should be noted in the submission.

After two weeks, submit the proposed Media Type to the IANA for registration. The request and supporting documentation should be sent to "iana@isi.edu". Provided a reasonable review period has elapsed, the IANA will register the Media Type, assign an OID under the IANA branch, and make the Media Type registration available to the community.

It is strongly recommended that, prior to such submissions, the discussions of the chemical MIME group be consulted. Details of how to do this are given in section 4 of this document.
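For implementors, the sub-type names and filename qualifiers of Table 1, together with the "x-" notation required of proposed types, might be captured in a small lookup such as the following Python sketch. It is an illustration only: the function names are hypothetical, and the placement of the "x-" prefix directly before the sub-type name is our reading of the registration procedure rather than something this document specifies.

```python
from pathlib import PurePath

# Filename qualifier -> proposed content type, transcribed from Table 1.
QUALIFIER_TO_TYPE = {
    "cxf": "chemical/cxf",
    "mif": "chemical/mif",
    "pdb": "chemical/pdb",
    "cif": "chemical/cif",
    "mol": "chemical/mdl-molfile",
    "sdf": "chemical/mdl-sdf",
    "rdf": "chemical/mdl-rdf",
    "rxn": "chemical/mdl-rxn",
    "emb": "chemical/embl-dl-nucleotide",
    "embl": "chemical/embl-dl-nucleotide",
    "gen": "chemical/genbank",
    "gcg": "chemical/gcg8-sequence",
    "smi": "chemical/daylight-smiles",
    "ros": "chemical/rosdal",
    "mmd": "chemical/macromodel-input",
    "mmod": "chemical/macromodel-input",
    "mop": "chemical/mopac-input",
    "gau": "chemical/gaussian-input",
    "jdx": "chemical/jcamp-dx",
    "kin": "chemical/kinemage",
}


def guess_chemical_type(filename):
    """Return the proposed content type for a filename, or None if unknown."""
    qualifier = PurePath(filename).suffix.lower().lstrip(".")
    return QUALIFIER_TO_TYPE.get(qualifier)


def provisional_name(content_type):
    """Return the not-yet-registered form, with "x-" before the sub-type.

    Assumption: the prefix is placed immediately before the sub-type name.
    """
    top_level, subtype = content_type.split("/", 1)
    if subtype.startswith("x-"):
        return content_type
    return f"{top_level}/x-{subtype}"


if __name__ == "__main__":
    for name in ("enzyme.pdb", "spectrum.jdx", "notes.txt"):
        print(name, guess_chemical_type(name))
    print(provisional_name("chemical/pdb"))
```

A plain dictionary is used here so that the table can be inspected and extended without depending on any particular mail or web client configuration format.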