This document is available on-line as http://www.ch.ic.ac.uk/chemime/iupac.html
If you have any comments, please mail any of the principal authors of this document
(rzepa@ic.ac.uk, p.murray-rust@mail.cryst.bbk.ac.uk, benw@chemistry.leeds.ac.uk)
or send a comment to the discussion forum (chemime@ic.ac.uk).
The Chemical MIME Project
Henry S. Rzepa (a), Peter Murray-Rust (b) and Benjamin Whitaker (c)
August, 1996
(a) Department of Chemistry, Imperial College, London.
(b) Department of Pharmacology, University of Nottingham.
(c) School of Chemistry, University of Leeds.
Contents.
- Background and History to the Project
- Why do we Need Chemical Internet Standards?
- Chemical MIME Types included in the May - October 1995 IETF draft
- New types Proposed since the Original IETF Draft
- The Chemime Discussion list archives
- Uptake of Chemical MIME Usage (Alta Vista Statistics)
- A List of Projects Utilising Chemical MIME
- Software which supports chemical MIME media types directly
- Background articles and other information about chemical MIME
1. Background and History to the Project
Prior to 1992, the "Internet" was essentially a matrix of computer networks
bound by a common "network" protocol and used predominantly as a computer
file transfer mechanism and electronic mail carrying mechanism. Standards were
in place, but they tended to be generic ones dealing with technical issues. No
explicit chemical standards were in place. Around 1993, two new mechanisms were
introduced on the Internet.
(a) Electronic mail evolved from purely text-based communication to systems where
"attachments" to the message could be included. For the first time, it became possible
to include attachments which could have chemical content.
(b) A mechanism for document delivery called the World-Wide Web was introduced.
Here too, a document could be associated with chemical content via a device known
as a "hyperlink".
It became obvious during 1993 that the enormous potential for exchange
of structured information that the Internet now offered would have to be
matched by globally accepted standards for such information.
At this stage, we considered that "chemical" information represented a potentially definable
class of "media type" that had certain unique characteristics that would require
particular handling by the recipient of such information. We started a
project in January 1994 which we called "Chemical MIME". This was first announced during the
Chemistry workshop at the First WWW International Conference, held at CERN in May 1994.
Our intention was to establish a set of standard "headers"
that would unambiguously identify "chemical content" in Internet
Electronic Mail message bodies and World-Wide Web documents.
We originally identified a small number of relatively
standard file types containing chemical information, which together
with the addition of a chemical MIME header, would enable the content
to be sensibly processed by the recipient of the information.
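As a minimal sketch of what such a header looks like in practice, the following Python fragment uses the standard library email package to attach a file labelled chemical/x-pdb to a message, then shows how a recipient could pick out the chemical parts by their media type. The addresses, the file name water.pdb and the one-atom PDB fragment are hypothetical and serve only as an illustration; this is not code from the project itself.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Hypothetical one-atom PDB fragment, for illustration only.
PDB_TEXT = (
    "HETATM    1  O   HOH A   1       0.000   0.000   0.000  1.00  0.00           O\n"
    "END\n"
)


def build_message(pdb_text: str) -> EmailMessage:
    """Compose a mail message carrying a chemical/x-pdb attachment."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"      # hypothetical address
    msg["To"] = "recipient@example.org"     # hypothetical address
    msg["Subject"] = "Water molecule"
    msg.set_content("The structure is attached.")
    # The Content-Type header of this part becomes "chemical/x-pdb".
    msg.add_attachment(
        pdb_text.encode("ascii"),
        maintype="chemical",
        subtype="x-pdb",
        filename="water.pdb",
    )
    return msg


def chemical_parts(raw: bytes) -> list:
    """Return (media type, file name) for every chemical/* attachment."""
    msg = message_from_bytes(raw, policy=policy.default)
    return [
        (part.get_content_type(), part.get_filename())
        for part in msg.iter_attachments()
        if part.get_content_maintype() == "chemical"
    ]


raw = build_message(PDB_TEXT).as_bytes()
found = chemical_parts(raw)
```

A receiving mail program could use the media type returned by chemical_parts to decide which helper application should open the attachment.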
Essentially, this was an addition to the MIME standard which had been
proposed and ratified via a body called the IETF (Internet Engineering
Task Force) in 1993. We initially approached the IETF with our
proposal, via a discussion document called an Internet Draft. The
first version of this was published during May-October 1994, and a
second revised version during April-September 1995. These two
proposals each expired after six months of discussion. In July
1995, we presented our case in person at the IETF meeting in
Stockholm. Out of this meeting there emerged several conclusions.
- The IETF mechanisms for discussing
and accepting such enhancements of the MIME standards were not functioning efficiently
at that time.
- The IETF committees were focused largely on Electronic Mail
handling and as a result, discussion centered on a relatively
non-productive attempt at semantic distinction between "chemistry" as
a "subject type" and "chemical" as a "media type". Here, the
sociological question of whether all subject disciplines would rush to
register their own discipline as a MIME type was the perceived danger.
- Application of MIME to the
World-Wide Web was viewed by some purists as departing from the original strict interpretation
of the MIME standard, and the interaction between the two communities appeared strained
and at the time unproductive.
Meanwhile, the original chemical MIME proposals were widely disseminated throughout the chemical
communities, and have been widely adopted via various electronic forums. The
current discussion document proposes a set of chemical MIME
standards for endorsement by the IUPAC committees.
2. Why do we Need Chemical Internet Standards?
Effective information exchange takes place when everyone uses the same tools.
There are a number of ways this can happen, varying from careful planning over
several years to the adoption of a de facto approach that everyone
uses. When organisations try to develop informatics tools without the general
knowledge and consent of the community great tensions usually result, and
it is our intention to try to facilitate progress with as little conflict as
possible.
Communities are usually suspicious of organisations that 'go it alone' in
developing informatics tools and this often results in competing systems
developed under a veil of secrecy; there is a built-in disadvantage to
those outside the developer's organisation. This is evidenced by the
flame-wars that are common on many public newsgroups and discussion
groups at present.
Chemistry has inherited a large number of legacy approaches to information
and whilst these are useful for some subsets of the discipline, we feel
strongly that the tools of the future will only come through public debate and
cooperation. Also, however, we need a variety of ideas and approaches so it
is valuable to see which ones 'evolve' as well as being planned. If a
subcommunity finds a useful de facto standard, that may well be worthy
of recognition as such; but it may also need careful tailoring so that it
interfaces well with other areas. This can only come through public activity.
Many tools developed by single organisations in a competitive situation are
not future-proof; i.e. they may not be interpretable in a few years'
time and the information may be effectively lost. This is particularly likely
for binary files, but may also happen when numbers or abbreviations are used.
Examples of this are common, and it would be presumptuous to guess which
products will still be supported in the future.
Terms are often given different semantics or used with default units. It is
therefore important to agree with the rest of the community how a term is
to be interpreted, and ideally there should be algorithms to convert to related
terms.
Guidelines
We propose that those developing informatics standards commit to the following
guidelines:
General
- All developers of informatics tools commit to these guidelines; there
should be no reason to depart from them without warning and discussion.
- Organisations should be prepared, where appropriate, to enter the
standards creation process in a constructive manner.
- Chemical information exchanged in the public arena should follow these
guidelines.
- All documentation required to interpret the syntax and
semantics of the information in a document should be publicly accessible.
- Documents should be validatable against this documentation.
- Where possible, developers should make code available for reading and
writing documents in a given format.
- Where possible, distributors of information should use the chemical/*
MIME approach described below.
- New technology opens new ways of managing information, some of which
may challenge or undermine the guidelines here. All developers should
try to anticipate this by raising such issues with the community, and not
pre-empt the communal view by inappropriate use of technology.
Chemical/* MIME
A proposal for classifying and regulating the types of chemical document has
been submitted to the IETF. A number of existing file types were proposed
which have met with wide acceptance in the molecular community. Until the IETF
or other body ratifies the proposal, the following guidelines for the use
of MIME types are proposed:
- The number of types should be strictly limited. (Usually each type
requires specific software to read it, thus placing a burden on readers).
- MIME types should not be used or developed as a means of gaining a market
monopoly; it is required that the description of a document's format be made
public and reading/writing software is encouraged.
- New versions of existing types must not be introduced without prior
discussion and acceptance. This includes clear versioning procedures
and detailed documentation of revisions.
- New subtypes will be discouraged if there is already a subtype which
is capable of carrying that information. chemical/* should not be
seen as simply a means of authenticating current legacy systems.
- chemical/* should not be used as a means of adding
apparent authority to an organisation's project or system if this has not
already been discussed. The use of chemical/* in the public arena
in ways not sanctioned is deprecated.
- Developers are urged to use markup languages where possible (e.g. CIF,
ASN.1, CML) and to define the semantics of use (e.g. through glossaries of
terms). These definitions should be
public, and if possible equivalences to other definitions should be given.
3. Chemical MIME Types included in the May - October 1995 draft as part of a "standards track" process.
The following list of chemical MIME types forms the main body of
the IETF Internet draft valid during the period May - October 1995.
Type | Filename extension
---|---
chemical/x-cxf | cxf
chemical/x-mif | mif
chemical/x-pdb | pdb
chemical/x-cif | cif
chemical/x-mdl-molfile | mol
chemical/x-mdl-sdf | sdf
chemical/x-mdl-rdf | rdf
chemical/x-mdl-rxn | rxn
chemical/x-embl-dl-nucleotide | emb, embl
chemical/x-genbank | gen
chemical/ncbi-asn1-binary | val
chemical/x-gcg8-sequence | gcg
chemical/x-daylight-smiles | smi
chemical/x-rosdal | ros
chemical/x-macromodel-input | mmd, mmod
chemical/x-mopac-input | mop
chemical/x-gaussian-input | gau
chemical/x-jcamp-dx | jdx
chemical/x-kinemage | kin
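The table above can be turned into a working extension-to-type lookup. The following is a minimal sketch, assuming Python and its standard mimetypes module; the names CHEMICAL_TYPES and register_chemical_types and the example file names are our own illustrative choices, and the type strings are copied from the table as listed.

```python
import mimetypes

# Media types and filename extensions copied from the table above.
CHEMICAL_TYPES = {
    "chemical/x-cxf": ["cxf"],
    "chemical/x-mif": ["mif"],
    "chemical/x-pdb": ["pdb"],
    "chemical/x-cif": ["cif"],
    "chemical/x-mdl-molfile": ["mol"],
    "chemical/x-mdl-sdf": ["sdf"],
    "chemical/x-mdl-rdf": ["rdf"],
    "chemical/x-mdl-rxn": ["rxn"],
    "chemical/x-embl-dl-nucleotide": ["emb", "embl"],
    "chemical/x-genbank": ["gen"],
    "chemical/ncbi-asn1-binary": ["val"],  # written as it appears in the table
    "chemical/x-gcg8-sequence": ["gcg"],
    "chemical/x-daylight-smiles": ["smi"],
    "chemical/x-rosdal": ["ros"],
    "chemical/x-macromodel-input": ["mmd", "mmod"],
    "chemical/x-mopac-input": ["mop"],
    "chemical/x-gaussian-input": ["gau"],
    "chemical/x-jcamp-dx": ["jdx"],
    "chemical/x-kinemage": ["kin"],
}


def register_chemical_types(db=None):
    """Add every chemical/* mapping to a MimeTypes database and return it."""
    if db is None:
        db = mimetypes.MimeTypes()
    for mime_type, extensions in CHEMICAL_TYPES.items():
        for ext in extensions:
            # add_type expects the extension with its leading dot.
            db.add_type(mime_type, "." + ext)
    return db


db = register_chemical_types()
guessed, _encoding = db.guess_type("caffeine.mol")      # hypothetical file name
macromodel_exts = sorted(db.guess_all_extensions("chemical/x-macromodel-input"))
```

Registering into a separate MimeTypes instance, rather than the module-wide default, keeps the chemical mappings from silently overriding other uses of the same extension; rdf, for instance, is also commonly associated with a non-chemical type.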
4. New types Proposed since the Original IETF Draft
5. The Chemime Discussion list archives.
Since November 1994, a discussion list has been active for people to
discuss various aspects of the proposals. Users can
subscribe by sending a message to listserver@ic.ac.uk with the content
subscribe chemime your name
The discussions of this forum are archived under
http://www.ch.ic.ac.uk/hypermail/chemime/
6. Uptake of Chemical MIME Usage (Alta Vista Statistics)
The following URL fragments were used in Alta Vista
(http://www.altavista.digital.com/)
searches, entered in the Advanced Query feature
after the keyword "link:" (e.g. link:www.ch.ic.ac.uk). Such a search reports an
estimated or actual count of the number of pages that link to a particular URL.
The searches were performed on July 23, 1996.
URL Used for Alta Vista Search | Number of other documents with a Hyperlink to this page | Comment on page
---|---|---
"www.ch.ic.ac.uk/chemical_mime.html" | 550 | One of the two original project pages used to illustrate the use of chemical MIME types.
"chem.leeds.ac.uk/Project/MIME.html" | 400 | The second MIME project page
"www.ch.ic.ac.uk/chemime/chemime2.html" | 250 | The original IETF Chemical MIME-types standards document
"www.ncbi.nlm.nih.gov" | 8000 | NCBI
"www.pdb.bnl.gov" | 3000 | Brookhaven
"www.prosci.uci.edu" | 2000 | The Electronic Journal Protein Science
"structbio.nature.com" | 900 | Nature
7. A List of Projects Utilising Chemical MIME
- Original Examples at Imperial and Leeds Universities.
- NCBI Project at NIH
- Brookhaven Protein Databank
- Molecules R Us facility at the NIH
- Protein Science E-Journal
- Journal of Molecular Modelling
- Nature Science Journal
- Electronic Conferences in Trends in Organic Chemistry: ECTOC-1 and ECHET96
- Klotho Project at WUSTL.
- Project CORINA at Erlangen University
- ChemFinder Project by CambridgeSoft
- Demos by Daylight Software.
- Molecule-of-the-Month Collections
- Chemical and Drug Structure Display at the NIH
8. Software which supports chemical MIME media types directly
- Chemscape Chime by MDLI: A Netscape plug-in.
9. Background articles and other information about chemical MIME.
- Antony N. Davies, "Internet Chemical MIME", Spectroscopy Europe, 1996, 8(1), 42.
- H. S. Rzepa, B. J. Whitaker and M. J. Winter, J. Chem. Soc., Chem. Commun., 1994, 1907.
- O. Casher, G. Chandramohan, M. Hargreaves, C. Leach, P. Murray-Rust, R. Sayle, H. S. Rzepa and B. J. Whitaker, J. Chem. Soc., Perkin Trans 2, 1995, 7.
- S. M. Bachrach, P. Murray-Rust, H. S. Rzepa and B. J. Whitaker, Network Science, March, 1996.
- Maryilyn Dunker, Indiana University, Chemical Information Viewers: A Collection of programs that can be used with chemical MIME datasets.
- Scott Nelson, Lawrence Livermore National Laboratory, A test page for checking your MIME Configurations