Network Working Group                                           H. Rzepa
Internet Draft                    Imperial College, London, SW7 2AY, UK.
                                                          P. Murray-Rust
                         Glaxo Group Research, Greenford, Middlesex, UK.
                                                             B. Whitaker
                  School of Chemistry, University of Leeds, LS2 9JT, UK.
Category: Standards Track                                  February 1995

   A Chemical Primary Content Type for Multipurpose Internet Mail Extensions.

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ds.internic.net (US East Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).

Discussions of the chemical MIME discussion group are archived at URL: http://www.ch.ic.ac.uk/hypermail/chemime/

Abstract

The purpose of this Internet Draft is to propose an update to Internet RFC 1521 to include a new primary content-type to be known as chemical. RFC 1521 [1] describes mechanisms for specifying and describing the format of Internet Message Bodies via content-type/subtype pairs. We believe that chemical defines a fundamental type of content with unique presentational and processing aspects. We outline the typical expected uses of such a content type and propose a number of chemical sub-types.
This document updates IETF Internet Draft draft-rzepa-chemical-mime-type-00.txt in which this specific proposal was made, incorporates suggestions received during the initial discussion period and indicates scientific support for and uptake of this proposal [2-7].

Rzepa, Murray-Rust and Whitaker                                 [Page 1]

Expires 09/20/95        draft-rzepa-chemical-mime-01.txt      March 1995

Table of Contents:

1. Intended Audience for this Document.
2. The Need for a Chemical Primary Content Type.
3. A Proposed Core Set of Chemical Media Types.
4. Consultation Mechanisms.
5. References and Citations.
6. Security Considerations Section
7. Authors' Addresses.
8. Suggested Registration Mechanism for New Chemical/Subtype Values

1. Intended Audience for this Document.

This document is directed at anyone who is concerned with implementing Electronic Mail, Gopher, World-Wide Web and other information services supporting MIME (RFC 1521) in a local environment where chemical media information needs to be processed. We do not expect the "average" scientist to be concerned with the details of defining specific Media types, but we would expect them to have access to local knowledge, or to specific examples and implementations of chemical Media types. This Internet Draft is a discussion document for an agreed definition, intended eventually to form a standard accepted extension to RFC 1521. We are also targeting developers of chemically cognisant and MIME compliant software who may wish to include default types in their configurations.

2. The Need for a Chemical Primary Content Type.

The following quote by MIT Media Lab Director Nicholas Negroponte appeared in the Scientific American Special Issue 1995 p.102: "In the long run, model-based image transmission and encoding are better than transmission of pictures alone. Mathematical models of a scene can describe the spatial relations of the objects in it and maneuver them through space.
The idea of capturing a picture with a camera is obsolete if one can instead capture a realistic model from which the receiver can generate any picture. For instance, from a real-time model of a baseball game, a fan watching at home could get the view from anywhere in the ballpark -- including the perspective of the baseball."

In this document, we present a case for recognising that chemicals form a well-bounded, standardised and accepted implementation of this type of mathematical model, but with their own unique features. To promote the implementation, use and development of such a model, we propose that a new primary media type of chemical be introduced.

2.1 Chemical Data Types are Well Defined and Widely Used Already.

In excess of thirteen million chemicals are currently known to science, and many others as yet unknown have been speculated upon. A number of highly reliable, cross-referenced and indexed collections of molecular information have long been available in printed form, and since the 1970s these have been globally accessible in digital form on-line from organisations such as STN (the Scientific and Technical Information Network) or via locally implemented databases. During this period a number of well defined and documented formats for encoding chemical information have become accepted and widely used by molecular scientists.

In recent years, the development of the Internet as the prime delivery mechanism of chemical content has accelerated. In addition to e-mail, mechanisms such as Gopher and the World-Wide Web system are being widely adopted by the chemistry and biology communities. Most recently, chemical electronic journals and extended learning courses in these subjects based on these delivery mechanisms have begun appearing.

2.2 The Role of Primary and Secondary MIME Content Types.
Central to all these developments is the MIME concept as defined in Internet RFC 1521 [1]. This defines standards for the inclusion of message bodies in e-mail messages and other information systems such as the World-Wide Web. A two-level mechanism exists, comprising top-level types and sub-types. The MIME top-level types exist to allow mail gateways and other agents such as World-Wide Web clients to do filtering and/or conversion properly, and to allow user agents to have a default behavior for certain classes of objects. Neither the currently accepted primary content types nor the existing sub-types contain any explicit proposals for actions on message bodies containing chemical information. It is our intention in this Internet draft to suggest a mechanism for handling such information in a manner that is consistent with, and extensible within, the existing MIME guidelines.

Central to our argument is that the media content of chemical information is quite different from "image", "audio" or "video" media types, and that the manner in which the information is likely to be processed and presented is also fundamentally different. Existing top-level types were chosen very carefully to reflect the capabilities required for the media and presentation, and not the specific subject matter. In this context, we argue that a "chemical" type will have generic media and presentation requirements that are not fulfilled by the image, audio or video content types currently defined. For example, the default presentation behaviour for chemical content is expected to result in the rendering of a molecular scene or interpretation in either two or three dimensions, with some degree of navigation through chemical objects being possible, and with control over how individual objects (atoms for example) are displayed. A degree of semantic interpretation in the process is essential.
Chemical information can be intrinsically three dimensional, and normally comprises a collection of well defined objects. We envisage most, if not all, chemical media types to carry very substantial semantic content, e.g. structural information relating to the individual objects and their relationships. For example, the presentation of a carbon atom might require consideration of the valency at that atom, or depth cueing to represent the "stereochemistry" at that point. Further elaborations include model rotation, geometry calculations, extraction of numerical information, calculation of associated properties such as wavefunctions and their subsequent presentation.

For these reasons, we consider the existing "image" type to be too restrictive for the presentational needs of the molecular science community, and believe that its use would actually inhibit further development. An image is intrinsically taken to be a two-dimensional representation, normally a 2D bitmap rather than an object collection. In this context, we note that there is no uniquely definable way of specifying how any chemical content must be presented as a two dimensional image. In three dimensional terms, several excellent "helper" programs for achieving a default process with chemical content are readily available to the community and have become widely used in the last few years. It can be said therefore that default actions on 3D chemical information have achieved a certain level of de facto status, which would be consolidated by this proposal.

More elaborate and truly innovative forms of processing of the content are also envisaged, including association with scientific instruments, presentation to automated synthesis robots and harvesting via indexing agents searching for structural features or themes in the chemical content.
We note for example the rapidly developing area of advanced nanotechnology, where synthesis of arbitrary chemicals could indeed be performed by such machinery, potentially by acquiring the blueprints from network sources. Presentation of the media content in this context is quite different in nature from a display on a computer screen, and serves to emphasize the fundamentally different nature of chemical media types from other primary content types.

RFC 1521 also defines a primary content type known as "application" to be used for data which do not fit in any of the other categories, particularly in the context of data to be processed by mail-based uses of application programs. Specific applications cited were e.g. the processing of PostScript or portable document format files. These are specific examples of precisely defined presentation formats with faithful reproduction of the original intent and no default semantic parsing of higher level structures. With "chemical" content types, the presentation requirements are based much more on the existence of well defined generic semantic content based on chemical attributes, which have an easily specified default treatment, and which we argue need a well defined separate treatment. Features unique to chemical content are molecular markup-styles where individual objects within the content can be highlighted in a manner similar to that defined in, say, the Hypertext Markup Language (HTML), or where objects can be viewed from a variety of perspectives (for example space-fill, ribbon or wireframe presentation of chemical structures).

The boundary conditions of a chemical type can be specified, and an initial minimal set is defined below. The over-riding concern is to preserve this semantic interpretation of the presentation in any default treatment of the message body.
We believe that such actions are not best implemented using the "application" type but should be allocated to a separate "chemical" type to convey the generic and semantic nature of the content, rather than necessarily the subject content.

3. A Proposed Core Set of Chemical Media Types.

We are proposing in this draft to focus on a small number of existing standards that cover the span of chemical information, which are in wide use and which are clearly defined in the scientific literature. In no sense are they necessarily superior to other file types, but each has a specific use which a significant proportion of the community regards as serving a purpose. By encoding these formats within a chemical content definition, we hope to enhance the presentation and perception of such information for the end user.

CXF and MIF are included as examples of modern, semantically rich and structured datatypes. The CXF format is presented as essentially an exchange format, although the long term goal is its adoption as a general purpose molecular description format. The MIF format, by way of contrast, is a user-extensible and human readable format which is also characterised by its compact nature. We think that both formats will have their areas of application, and should be included here. A number of other definitions are so called "legacy" systems which are generally accepted as likely to continue in use for some years, if only because a considerable amount of software has yet to be developed to allow such migrations. We believe such formats will play a major role in new initiatives in electronic journal publishing, on-line conferences and other keynote areas.

Because of the variety of existing chemical definitions, we think it unlikely the chemistry community will ever agree upon a single common standard and that a mechanism such as MIME is likely to have a useful purpose in the foreseeable future.
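As an illustration only, the sub-type names and filename qualifiers proposed in Table 1 of this section could be held in a simple lookup that a MIME-aware program consults when labelling a file. The sketch below is written in Python, which this draft does not prescribe; the names CHEMICAL_TYPES and guess_chemical_type and the example filenames are assumptions introduced here, not part of the proposal.

```python
import os

# Filename qualifier -> proposed chemical sub-type, transcribed from Table 1.
# The dictionary and function names are hypothetical, chosen for this sketch.
CHEMICAL_TYPES = {
    "cxf": "chemical/cxf",
    "mif": "chemical/mif",
    "pdb": "chemical/pdb",
    "cif": "chemical/cif",
    "mol": "chemical/mdl-molfile",
    "sdf": "chemical/mdl-sdf",
    "rdf": "chemical/mdl-rdf",
    "rxn": "chemical/mdl-rxn",
    "emb": "chemical/embl-dl-nucleotide",
    "embl": "chemical/embl-dl-nucleotide",
    "gen": "chemical/genbank",
    "asn": "chemical/ncbi-asn1",
    "gcg": "chemical/gcg8-sequence",
    "smi": "chemical/daylight-smiles",
    "ros": "chemical/rosdal",
    "mmd": "chemical/macromodel-input",
    "mmod": "chemical/macromodel-input",
    "mop": "chemical/mopac-input",
    "gau": "chemical/gaussian-input",
    "jdx": "chemical/jcamp-dx",
    "kin": "chemical/kinemage",
}


def guess_chemical_type(filename):
    """Return the proposed chemical media type for a filename, or None."""
    # Take the text after the last dot and compare case-insensitively,
    # since DOS-style filenames are often upper case.
    qualifier = os.path.splitext(filename)[1].lstrip(".").lower()
    return CHEMICAL_TYPES.get(qualifier)


if __name__ == "__main__":
    for name in ("crambin.pdb", "caffeine.MOL", "notes.txt"):
        print(name, "->", guess_chemical_type(name))
```

A lookup keyed on the qualifier, rather than on the sub-type, allows two qualifiers (for example emb and embl) to share one sub-type without further logic. Files whose qualifier is not listed simply yield no chemical type, leaving any other local defaults to apply.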
Table 1 lists the proposed chemical sub-type names, together with suggested filename qualifiers. The primary qualifiers are deliberately kept to three characters for DOS compatibility.

Table 1.

Primary/sub-type               Suggested qualifier(s)    Reference
chemical/cxf                   cxf                       [8]
chemical/mif                   mif                       [9]
chemical/pdb                   pdb                       [10]
chemical/cif                   cif                       [11]
chemical/mdl-molfile           mol                       [12]
chemical/mdl-sdf               sdf                       [12]
chemical/mdl-rdf               rdf                       [12]
chemical/mdl-rxn               rxn                       [12]
chemical/embl-dl-nucleotide    emb, embl                 [13]
chemical/genbank               gen                       [14]
chemical/ncbi-asn1             asn                       [14]
chemical/gcg8-sequence         gcg                       [15]
chemical/daylight-smiles       smi                       [16]
chemical/rosdal                ros                       [17]
chemical/macromodel-input      mmd, mmod                 [18]
chemical/mopac-input           mop                       [19]
chemical/gaussian-input        gau                       [20]
chemical/jcamp-dx              jdx                       [21]
chemical/kinemage              kin                       [4]

4. Consultation Mechanisms.

Our original suggestion for a primary chemical Media content type was made on the Computational Chemistry discussion list (CCL) and on newsgroups associated with molecular sciences (chemistry and biology) in February 1994. Subsequently, the first Internet-Draft, archived as draft-rzepa-chemical-mime-type-00.txt, was available for discussion during the period May-October, 1994. Its existence became widely known and subsequently a number of working "proofs-of-concept" of chemical Media types have become available [2-7].

A discussion list exists, and can be joined by sending a message to listserver@ic.ac.uk with the one-line message:

   subscribe chemime your name

Consolidated proceedings of this discussion group are available as:

   http://www.ch.ic.ac.uk/hypermail/chemime/

Copies of relevant Internet drafts and RFCs are available on:

   http://www.ch.ic.ac.uk/internet/

Our proposal has proved extremely effective already in focusing on how a consensus on handling the presentation and processing of this type of media content can be achieved.
We have also been delighted with the way in which the chemical community has accepted and used the Internet-Draft mechanism for developing standards and achieving such consensus on a wider scale, having hitherto not been active in defining Internet standards. Many major bodies (learned societies, data providers, chemical information vendors, etc.) have participated in the discussion and none has seen the approach as unsuitable. We are therefore extending the discussion to capitalise on this.

That section of the biological community which is related to molecular information has participated in these discussions, and we are not aware of any moves to define a separate type for biological content. We nevertheless suggest that chemical content should not extend further into biology than that which requires the particular presentation style and default actions associated with the chemical content.

We would expect that further chemical sub-content types may be proposed by sections of the community in the future, if appropriate by the mechanism indicated in section 8 of this document.

5. References and Citations.

[1] Borenstein N., and N. Freed, "MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies", RFC 1521, Bellcore, Innosoft, September 1993.

[2] Fitzgerald P., "Molecules-R-Us Interface to the Brookhaven Data Base", Computational Molecular Biology Section, National Institutes of Health, USA; see http://www.nih.gov/htbin/pdb for further details; Peitsch M.C, Wells T.N.C., Stampf D.R., Sussman S. J., "The Swiss-3D Image Collection And PDB-Browser On The Worldwide Web", Trends In Biochemical Sciences, 1995, 20, 82.

[3] "Proceedings of the First Electronic Computational Chemistry Conference", Eds. Bachrach, S. M., Boyd D. B., Gray S. K, Hase W., Rzepa H.S, ARInternet: Landover, Nov. 7- Dec.
2, 1994, in press; Bachrach S. M, J. Chem. Inf. Comp. Sci., 1995, in press.

[4] Richardson D.C., and Richardson J.S., Protein Science, 1992, 1, 3; Richardson D. C., and Richardson J.S., Trends in Biochem. Sci., 1994, 19, 135.

[5] Rzepa H. S., Whitaker B. J., and Winter M. J., "Chemical Applications of the World-Wide-Web", J. Chem. Soc., Chem. Commun., 1994, 1907; Casher O., Chandramohan G., Hargreaves M., Murray-Rust P., Sayle R., Rzepa H.S., and Whitaker B. J., "Hyperactive Molecules and the World-Wide-Web Information System", J. Chem. Soc., Perkin Trans 2, 1995, 7; Baggott J., "Biochemistry On The Web", Chemical & Engineering News, 1995, 73, 36; Schwartz A.T, Bunce D.M, Silberman R.G, Stanitski C.L, Stratton W.J, Zipp A.P, "Chemistry In Context - Weaving The Web", Journal Of Chemical Education, 1994, 71, 1041.

[6] Rzepa H.S., "WWW94 Chemistry Workshop", Computer Networks and ISDN Systems, 1994, 27, 317 and 328.

[7] Nelson S.D., "Email MIME test page", Lawrence Livermore National Laboratory, 1994. See http://www-dsed.llnl.gov/documents/WWWtest.html and http://www-dsed.llnl.gov//documents/tests/email.html

[8] Steckert T., and Mockus J., "Standard Chemical eXchange Format". Chemical Abstracts Service, September, 1994.

[9] Allen F. H., Barnard A., Cook A., and Hall S. R., "Molecular Information File", J. Chem. Inf. Comp. Sci., 1995, in press; Hall S.R, Spadaccini N, "The Star File - Detailed Specifications", Journal Of Chemical Information And Computer Sciences, 1994, 34, 505.

[10] Koetzle T.F, Abola E.E, Bernstein F.C, Callaway J.A, Christian J.J, Deroski B.R, Esposito P.A, Forman A, Mccarthy J.E, Skora J.G, "The Protein Data-Bank - Present Status And Future-Plans", Abstracts Of Papers Of The American Chemical Society 1991, 202, 31-Cinf.
See also "The Protein Data Bank: A Computer-Based Archival File for Macromolecular Structures", Journal of Molecular Biology, 1977, 112, pp. 535-542. See http://www.pdb.bnl.gov/ for the most recent information.

[11] Hall S. R., Allen F.H., and Brown I.D., "The Crystallographic Information File (cif) - A new standard archive file for crystallography", Acta Cryst. 1991, A47, 655.

[12] Dalby A, Nourse J.G, Hounshell W.D, Gushurst A.K.I, Grier D.L, Leland B.A, Laufer J., "Description of Several Chemical Structure File Formats Used by Computer Programs Developed at Molecular Design Limited", J. Chem. Inf. Comput. Sci., 1992, 32(3), 244.

[13] Rice C.M., Fuchs R., Higgins D.G., Stoehr P.J., Cameron G.N., "EMBL nucleotide Format EMBL Data Library", Nucleic Acids Research, 1993, 21, 2967.

[14] Benson D.A, Boguski M, Lipman D.J, Ostell J, "Genbank", Nucleic Acids Research 1994, 22, 3441; Benson D., Lipman D.J., Ostell J., "GENbank", Nucleic Acids Research, 1993, 21, 2963.

[15] Eberhardt, N.L., "The GCG format for sequence information", Biotechniques, 1992, 13, 914-917.

[16] Weininger D., "The SMILES Format", J. Chem. Inf. Comput. Sci., 1988, 28, 31.

[17] Barnard J., Jochum C., Welford S., "Rosdal - A Universal Structure Substructure Representation For Pc-Host Communication", Abstracts Of Papers Of The American Chemical Society, 1988, 196, 13.

[18] Mohamadi F., Richards N.J.G., Guida W.C., Liskamp R., Lipton M., Caufield C., Chang G., Hendrickson T., Still W.C., "Macromodel - an integrated software system for Modeling Organic and Bioorganic Molecules using Molecular Mechanics", Journal of Computational Chemistry, 1990, 11, 440.

[19] Stewart J.J.P., "MOPAC: A Semiempirical Molecular-Orbital Program", Journal of Computer-aided Molecular Design, 1990, 4, 1.
[20] Frisch M.J., Trucks G.W., Head-Gordon M., Gill P.M.W., Wong M.W., Foresman J.B., Johnson B.G., Schlegel H.B., Robb M.A., Replogle E.S., Gomperts G., Andres J.L., Raghavachari K., Binkley J.S., Gonzalez C., Martin R.L., Fox D.J., Defrees D.J., Baker J., Stewart J.J.P., and Pople J.A., "Gaussian 92", Gaussian, Inc., Pittsburgh PA, 1992.

[21] Davies A.N., Lampen P., "JCAMP-DX for NMR", Applied Spectroscopy, 1993, 47, 1093-1099; Rutledge D.N., Mcintyre P., "A proposed European Implementation of the JCAMP-DX Format", Chemometrics and Intelligent Laboratory Systems, 1992, 16, 95-101; Grasselli J. G., "JCAMP-DX, A standard format for exchange of infrared-spectra in computer readable form", Pure and Applied Chemistry 1991, 63, 1781-1792.

6. Security Considerations Section

The security implications of chemical MIME types do not differ from those relevant to RFC 1521. We reproduce the relevant section of RFC 1521 here for reference:

''Security issues are discussed in Section 7.4.2 and in Appendix F. Implementors should pay special attention to the security implications of any mail content-types that can cause the remote execution of any actions in the recipient's environment. In such cases, the discussion of the application/postscript content-type in Section 7.4.2 may serve as a model for considering other content-types with remote execution capabilities.''

7. Authors' Addresses.

Dr H S Rzepa, Department of Chemistry, Imperial College, London, SW7 2AY
Phone: +44 171 594 5774
Fax: +44 71 594 5804
E-Mail: rzepa@ic.ac.uk

Dr P. Murray-Rust, Glaxo Group Research, Stevenage, Herts, UK
E-Mail: pmr1716@ggr.co.uk

Dr B. J. Whitaker, School of Chemistry, University of Leeds, UK.
E-Mail: benw@chemistry.leeds.ac.uk

8. Suggested Registration Mechanism for New Chemical/Subtype Values

This is explained in RFC 1590, "Media Type Registration Procedure", from which the following is quoted.
This document should be consulted for further detail. It is strongly recommended that, prior to submission, the discussions of the chemical MIME group be consulted. Details of how to do this are given in section 4 of this document.

''Send a proposed Media Type (content-type/subtype) to the "ietf-types@cs.utk.edu" mailing list. This mailing list has been established for the sole purpose of reviewing proposed Media Types. Proposed content-types are not formally registered and must use the "x-" notation for the subtype name.

The intent of the public posting is to solicit comments and feedback on the choice of content-type/subtype name, the unambiguity of the references with respect to versions and external profiling information, the choice of which OIDs to use, and a review of the security considerations section. It should be noted that the proposed Media Type does not need to make sense for every possible application. If the Media Type is intended for a limited or specific use, this should be noted in the submission.

After two weeks, submit the proposed Media Type to the IANA for registration. The request and supporting documentation should be sent to "iana@isi.edu". Provided a reasonable review period has elapsed, the IANA will register the Media Type, assign an OID under the IANA branch, and make the Media Type registration available to the community.''
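To make the "x-" convention quoted above concrete, the following minimal sketch labels a message body with an unregistered chemical sub-type. It uses the email package of the Python standard library, which is not part of this proposal; the function name build_chemical_message, the sub-type x-example-format, the filename and the payload are all hypothetical placeholders. The check on the prefix is an assumption of the sketch: it treats every sub-type passed in as a not-yet-registered proposal.

```python
from email.message import EmailMessage


def build_chemical_message(payload, subtype, filename):
    """Wrap raw bytes in a MIME entity labelled chemical/<subtype>."""
    # Assumption for this sketch: the sub-type is a new, unregistered
    # proposal, so it must carry the "x-" prefix quoted from RFC 1590.
    if not subtype.startswith("x-"):
        raise ValueError("unregistered sub-types must start with 'x-'")
    msg = EmailMessage()
    msg["Subject"] = "Proposed chemical media type example"
    # For bytes content the standard library base64-encodes the body and
    # writes the Content-Type header as maintype/subtype.
    msg.set_content(payload, maintype="chemical", subtype=subtype,
                    filename=filename)
    return msg


if __name__ == "__main__":
    # Placeholder payload; a real message would carry an actual data file.
    example = build_chemical_message(b"placeholder chemical data\n",
                                     "x-example-format", "example.xef")
    print(example["Content-Type"])
    print(example["Content-Disposition"])
```

A receiving agent that recognises only the top-level type could still apply a default chemical action to such a body, which is the behaviour the two-level MIME scheme is intended to allow.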