Standards in Chemical Information
DRAFT
Effective information exchange takes place when everyone uses the same tools.
There are a number of ways this can happen, varying from careful planning over
several years to the adoption of a de facto approach that everyone
uses. When organisations try to develop informatics tools without the general
knowledge and consent of the community, great tensions usually result, and
it is our intention to try to facilitate progress with as little conflict as
possible.
Communities are usually suspicious of organisations that 'go it alone' in
developing informatics tools, and this often results in competing systems
developed under a veil of secrecy; there is a built-in disadvantage to
those outside the developer's organisation. This is evidenced by the
flame-wars that are common on many public newsgroups and discussion
groups at present.
Chemistry has inherited a large number of legacy approaches to information
and whilst these are useful for some subsets of the discipline, we feel
strongly that the tools of the future will only come through public debate and
cooperation. At the same time, we need a variety of ideas and approaches, so
it is valuable to see which ones 'evolve' as well as which are planned. If a
subcommunity finds a useful de facto standard, that may well be worthy
of recognition as such; but it may also need careful tailoring so that it
interfaces well with other areas. This can only come through public activity.
Many tools developed by single organisations in a competitive situation are
not future-proof; i.e. they may not be interpretable in a few years'
time and the information may be effectively lost. This is particularly likely
for binary files, but may also happen when undocumented numeric codes or
abbreviations are used. Examples of this are common, and it would be
presumptuous to guess which products will still be supported in the future.
Terms are often given different semantics or used with default units. It is
therefore important to agree with the rest of the community how a term is
to be interpreted, and ideally there should be algorithms to convert to related
terms.
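The following is a minimal sketch, assuming a simple published conversion table, of what agreeing on a term's units and providing an algorithm to convert between related terms might look like. It is written in Python and attaches an explicit unit to every value. The table `LENGTH_UNITS_IN_METRES`, the `Length` class and its `to` method are hypothetical names chosen for illustration. They are not part of any existing chemical standard or library.

```python
from dataclasses import dataclass

# Hypothetical glossary entry: each length unit mapped to its size in metres.
# Publishing a table like this makes the meaning of a unit explicit.
LENGTH_UNITS_IN_METRES = {
    "m": 1.0,
    "nm": 1e-9,
    "angstrom": 1e-10,
    "pm": 1e-12,
}


@dataclass(frozen=True)
class Length:
    value: float
    unit: str  # always stated explicitly; never an implied default

    def to(self, target: str) -> "Length":
        """Convert to another unit listed in the (hypothetical) glossary."""
        if self.unit not in LENGTH_UNITS_IN_METRES:
            raise ValueError(f"unknown unit: {self.unit!r}")
        if target not in LENGTH_UNITS_IN_METRES:
            raise ValueError(f"unknown unit: {target!r}")
        # Convert to metres first, then into the requested unit.
        metres = self.value * LENGTH_UNITS_IN_METRES[self.unit]
        return Length(metres / LENGTH_UNITS_IN_METRES[target], target)


bond = Length(1.54, "angstrom")  # a typical C-C single bond length
print(bond.to("nm"))
print(bond.to("pm"))
```

The design choice is that a value is never stored without its unit. A reader therefore never has to guess a default, and an unknown unit is rejected rather than silently misinterpreted.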
Guidelines
We propose that those developing informatics standards commit to the following
guidelines:
General
- All developers of informatics tools should commit to these guidelines and
should not depart from them without prior warning and discussion.
- Organisations should be prepared, where appropriate, to enter the
standards creation process in a constructive manner.
- Chemical information exchanged in the public arena should follow these
guidelines.
- All documentation required to interpret the syntax and
semantics of the information in a document should be publicly accessible.
- It should be possible to validate documents against this documentation.
- Where possible, developers should make code available for reading and
writing documents in a given format.
- Where possible, distributors of information should use the chemical/*
MIME approach described below.
- New technology opens new ways of managing information, some of which
may challenge or undermine the guidelines here. All developers should
try to anticipate this by raising such issues with the community, rather than
pre-empting the communal view by inappropriate use of technology.
Chemical/* MIME
A proposal for classifying and regulating the types of chemical document has
been submitted to the IETF. It includes a number of existing file types
which have met with wide acceptance in the molecular community. Until the IETF
or another body ratifies the proposal, the following guidelines for the use
of MIME types are proposed:
- The number of types should be strictly limited, since each type usually
requires specific software to read it, which places a burden on readers.
- MIME types should not be used or developed as a means of gaining a market
monopoly; the description of a document's format must be made public, and
the provision of reading and writing software is encouraged.
- New versions of existing types must not be introduced without prior
discussion and acceptance. This includes clear versioning procedures
and detailed documentation of revisions.
- New subtypes are discouraged if there is already a subtype which
is capable of carrying that information. chemical/* should not be
seen simply as a means of authenticating current legacy systems.
- chemical/* should not be used as a means of adding apparent authority
to an organisation's project or system if this has not already been
discussed with the community. Use of chemical/* in the public arena
in ways that have not been sanctioned is deprecated.
- Developers are urged to use markup languages and structured notations where
possible (e.g. CIF, ASN.1, CML) and to define the semantics of use (e.g.
through glossaries of terms). These definitions should be public, and if
possible equivalences to other definitions should be given.
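The following is a minimal sketch of why a strictly limited set of types helps readers. It uses Python's standard `email.message.Message` class to parse a Content-Type value and sort it into one of three cases:

- a chemical subtype the reader knows how to handle;
- an unrecognised chemical/* subtype;
- something that is not chemical at all.

The set `KNOWN_CHEMICAL_SUBTYPES` is an assumption made for illustration. It is not the list of types in the submitted proposal, and `classify` is a hypothetical helper.

```python
from email.message import Message

# Illustrative subtypes only: an assumption for this sketch, not the
# list of types in the chemical/* proposal.
KNOWN_CHEMICAL_SUBTYPES = {"x-pdb", "x-xyz", "x-mdl-molfile", "x-cif", "x-cml"}


def classify(content_type: str) -> str:
    """Return 'known', 'unknown-chemical' or 'not-chemical' (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_type() lower-cases the value, drops parameters such as
    # charset, and falls back to text/plain for a malformed type.
    if msg.get_content_maintype() != "chemical":
        return "not-chemical"
    if msg.get_content_subtype() in KNOWN_CHEMICAL_SUBTYPES:
        return "known"
    # A chemical/* subtype the reader has no software for.
    return "unknown-chemical"


for header in ("chemical/x-pdb", "chemical/x-myvendor-binary", "text/plain"):
    print(header, "->", classify(header))
```

Every new subtype adds to the "unknown" case for every reader until software for it is written and distributed. This is the burden the first guideline above is intended to limit.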
Peter Murray-Rust February 27, 1996.