ATOMS
ATOMS is used for describing either single atoms or, more commonly, a
list of atoms in a 'molecule' or 'compound' when it is contained within
MOL.
ATOMS is the heart of MOL.DTD and (in MOL) represents an atom-centred
description with optional bonds (BONDS). (This is perhaps driven by my
background as an inorganic crystallographer, where bonds are a personal
'opinion'!)
Elemental identity, atomic
positions and spacegroup are often necessary and sufficient to describe
what the substance is. Many theoretical chemists would agree that,
with the addition of the total electron count, everything else is opinion.
Like the rest of CML, ATOMS does not dictate how a molecule is described,
and an author can create whatever atomic (or bond) properties they wish.
There is a set of BUILTINs to cover most common cases, but others can be
added by using REL="glossary" and HREF= in the
ATOMS attributes. The incentive to use BUILTINs is that they will be
recognised by the postprocessing software, whilst for a glossary item
the author will have to write code themselves.
ATOMS may have CONVENTION and DICTNAME attributes and these can be used
to resolve problems of convention (e.g. 'charge', 'valence'). It is assumed
that the contained ARRAYs use the same convention unless overridden.
ATOMS and BONDS can be used to give the molecular formula
(connectivity) by the use of attributes such as formal ligand count, number
of attached hydrogen atoms, formal charge, etc. Where possible, however,
we recommend that FORMULA is used since standardisation is likely to
be clearer in that format. The current conventions (SMILES and MOL) could be
expanded to include others.
ATOMS/BONDS may be difficult to relate to FORMULA. Where ATOMS
represents coordinate data, this might relate to multiple copies of a
molecule (as in crystallography where an asymmetric unit can contain
several identical molecules and all the coordinates must be included so that
the crystal structure can be recreated.) A related problem is where some of
the atomic coordinates are not determined, a frequent occurrence in some
techniques. Hydrogen atoms represent a particular problem - CML does not
lay down rules as to how these are used.
ATOMS and BONDS are linked by the ATID attribute of XVAR within ATOMS,
and the ATID1 or ATID2 of BONDS.
An ATID need not be an
integer, and could be a construct such as CA15. If the tables are edited or
modified, it is important to keep them consistent and to make sure
that ATIDs are always unique.
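A consistency check of this kind could look like the following minimal sketch. It assumes the ATIDs and the bond end points (ATID1, ATID2) have already been read into plain Python lists; the function name `check_atids` and this in-memory layout are illustrative assumptions, not part of CML.

```python
from collections import Counter


def check_atids(atids, bonds):
    """Return a list of problems found; an empty list means consistent.

    atids: ATID strings, one per atom (hypothetical in-memory form).
    bonds: (ATID1, ATID2) pairs, one per bond.
    """
    problems = []
    # Every ATID must be unique within the ATOMS table.
    for atid, count in Counter(atids).items():
        if count > 1:
            problems.append(f"duplicate ATID {atid!r} ({count} times)")
    # Both ends of every bond must point at an existing atom.
    known = set(atids)
    for atid1, atid2 in bonds:
        for end in (atid1, atid2):
            if end not in known:
                problems.append(
                    f"bond {atid1}-{atid2} refers to unknown ATID {end!r}"
                )
    return problems


atoms = ["C1", "C2", "O1"]
bonds = [("C1", "C2"), ("C2", "O1")]
print(check_atids(atoms, bonds))  # consistent tables: no problems reported
print(check_atids(atoms + ["C1"], bonds + [("O1", "H9")]))
```

Running such a check after every edit of the tables is one simple way to catch a dangling bond reference before it reaches the postprocessing software.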
The content model is simple: an optional description (XHTML), followed
by a number of (column) arrays all of length equivalent to the number of atoms.
Each ARRAY corresponds to an atomic attribute. The semantics of the
attribute is given by one of two mechanisms:
- hardcoded: A number of key attributes describe
what a molecule is rather than our opinion or calculations.
- links to glossaries (use of HREF with REL='glossary').
The actual enumeration of the attributes is given in a file
../mol-arr-bui.ent
and this is definitive, rather than what is written below (although hopefully
they are in sync!). In many cases it is difficult to decide whether something
is a number or ID or a type. The file contains:
- ATOMNO. (optional) serial number. An atom must have some way of being identified uniquely
in a CML document. This is used for addressing from other parts of the
document, and required in certain fields (e.g. ATNO1, etc.). There are two
main ways of doing this: serial number, and IDs. CML caters for both.
An atom may be given a serial number which must be a positive unique integer,
but the atoms need not be ordered. If ATOMNO is NOT given, the
atoms are assumed to be numbered from 1...NATOMS in their occurrence in
the ARRAY container. This is potentially fragile, however, and it's best
to include explicit ATOMNOs.
- ATID (optional) unique id. (See ATOMNO).
ATID has no implied semantics and could be
C123A (e.g. CCDC), GLU13CA (e.g. PDB), etc. ATID may be used in a variety
of places (such as ATID1, PARIDS, ZMATIDS, etc.) and the processing software
should be able to make these crossreferences. ATIDs may not have internal
whitespace.
- ELSYM ELEMENT symbol. The element symbol is very important
and takes precedence over other methods of specifying the element (such as
number). It MUST relate to a standard table of elements and has at most 2 letters.
Additionally allowed elements are: D, T (hydrogen); * (any atom); ? (unknown);
DD (dummy); EP (electron pair); E (electron).
- ELNO ELEMENT (atomic) number. The atomic number is
subordinate to the element symbol, and cannot deal with dummy, etc.
- X2 2-D X-coordinate. The X-coordinate of an atom in a
conventional chemical structure diagram. This is in arbitrary units and
will have no relation to the 3-D coordinates.
- Y2 2-D Y-coordinate. The corresponding Y-coordinate.
- X3 3-D X-coordinate (Cartesian, A). The 3-dimensional
Cartesian coordinate of the atom in Angstrom units. Note that without an
orthogonalisation matrix it is normally impossible to recreate Fractional
coordinates from Cartesian ones, where this is meaningful.
- Y3 3-D Y-coordinate (Cartesian, A).
The corresponding Y-coordinate.
- Z3 3-D Z-coordinate (Cartesian, A).
The corresponding Z-coordinate.
- XF 3-D X-coordinate (Fractional).
The X-coordinate of an atom in fractions of the corresponding unit cell length.
Fractional coordinates only have meaning for a molecule located in a
unit cell (CRYST). They are required if the symmetry operations of the
unit cell are to be applied to the molecule. Cartesian coordinates can
be obtained from Fractional with an orthogonalisation matrix, but there
are several conventions and you must state which one is used.
- YF 3-D Y-coordinate (Fractional).
The corresponding Y-coordinate.
- ZF 3-D Z-coordinate (Fractional).
The corresponding Z-coordinate.
- ZL Z-matrix length. The molecular geometry can be
represented by internal coordinates (bond lengths, valence angles and torsional
angles.) Note that these do not have to involve atoms bonded in the
conventional way. Each atom requires the ATOMNOs or ATIDs of three other
atoms (Q1, Q2 and Q3). The position of the current atom is such that
length ATID-Q3 is ZL, angle ATID-Q3-Q2 is ZA and torsion ATID-Q3-Q2-Q1 is
ZT. Some atoms may require some or all of Q1, Q2 or Q3 to be dummy atoms.
- ZA Z-matrix angle. See above.
- ZT Z-matrix torsion. See above.
- ZMATNOS Three ATOMNOs defining the coordinates (see ZL) as
a space-separated string (e.g. "3 5 12").
- ZMATIDS Three ATIDs defining the coordinates (see ZL) as
a space-separated string (e.g. "C3 N5A H12'").
- DISORD Disorder code. Application dependent at present.
- ATTYP atom type. Atom type is a subjective concept. At
present this is used for PDB atom types such as
"SG" "CG1" etc.
- CHAIN Chain ID. Application dependent at present.
- RESID Residue ID. Application dependent at present.
- RESNAM Residue Name. Application dependent at present.
- RESTYP Residue Type. Application dependent at present.
- OCC Occupancy. At present this is application dependent.
(There is often confusion between atoms which are not at full occupancy, and
atoms on symmetry elements. The present value is the value of the
occupancy after any symmetry elements have been applied.)
- TOTL Total number of ligands. When ATOMS is being used to
describe connectivity, the formal number of ligands may be useful. This
is what might appear in a chemical structure diagram and may bear no relation
to the proximity of atoms in 3-Dimensional space.
It is often conventional to split the ligands into hydrogen atoms and others
because many chemical structure diagrams and many connection tables are
hydrogen-suppressed. Note that bridging hydrogens (as in electron-deficient
compounds) and isotopically substituted hydrogen atoms may need explicit
inclusion here.
- NONH number of NON-H ligands. See above.
- NUMH TERMINAL hydrogen count. See above.
- PARITY ATOM parity (-1,0,+1). We strongly recommend that
stereochemistry and chirality are approached through the use of chiral volumes
rather than descriptors such as CIP or annotations to a 2-Dimensional diagram.
For a chiral volume, 4 atoms must be described, and it is most common that
these represent 4 ligands to a central atom such as C. However some or all of
the atoms (including the central one) may be dummy atoms as, for example, in
the biphenyls where a dummy atom could be placed halfway between the benzene
rings. Since only the sign of the volume is required, accurate placement of
dummy atoms is unimportant.
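The chiral volume of a tetrahedron with 4 vertices at X1,
X2, X3, X4, where Xi = (xi, yi, zi), is one sixth of the determinant:
\[
V = \frac{1}{6}
\begin{vmatrix}
1 & 1 & 1 & 1 \\
x_1 & x_2 & x_3 & x_4 \\
y_1 & y_2 & y_3 & y_4 \\
z_1 & z_2 & z_3 & z_4
\end{vmatrix}
\]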
The four atoms representing the corners of the tetrahedron (PID1-PID4)
must be specified. For atoms without described parity, these fields
should be NULL.
- PARNOS 3 or 4 ATOMNOs defining the 'ligands'. See above.
- PARIDS 3 or 4 ATIDs defining the 'ligands'. See above.
(Either PARNOS or PARIDS must be given.)
- FORMCHARGE FORMAL atom charge. The formal (integral)
charge on the atom,
used only to determine the chemical identity of the molecule. The sum of the
formal charges should represent the charge on the molecule.
- ISOTOPE Isotope number. The isotope number of the element.
This will normally be an integer, rather than an accurate atomic mass.
- C13 C13 chemical shift. Under review.
- PDBTYPE Under review (?obsolete).
- SYMOP Under review.
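The chiral volume used for PARITY can be computed as in the sketch below. Subtracting the first column of the 4x4 determinant from the other three reduces it to a 3x3 triple product, which is what the code evaluates. The function names, the tolerance `tol`, and the mapping of the sign of the volume directly onto the values -1, 0, +1 are assumptions made for illustration.

```python
def chiral_volume(p1, p2, p3, p4):
    """Signed volume of the tetrahedron p1..p4, each an (x, y, z) tuple.

    Equal to one sixth of the triple product (p2-p1) . ((p3-p1) x (p4-p1)),
    which has the same value as the 4x4 determinant form divided by 6.
    """
    ax, ay, az = (p2[i] - p1[i] for i in range(3))
    bx, by, bz = (p3[i] - p1[i] for i in range(3))
    cx, cy, cz = (p4[i] - p1[i] for i in range(3))
    triple = (ax * (by * cz - bz * cy)
              - ay * (bx * cz - bz * cx)
              + az * (bx * cy - by * cx))
    return triple / 6.0


def parity(p1, p2, p3, p4, tol=1e-9):
    """Sign of the chiral volume: -1, 0 or +1 (assumed mapping)."""
    volume = chiral_volume(p1, p2, p3, p4)
    if abs(volume) <= tol:
        return 0  # the four points are (nearly) coplanar
    return 1 if volume > 0 else -1


origin, ex, ey, ez = (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(parity(origin, ex, ey, ez))  # one ordering of the four vertices
print(parity(origin, ey, ex, ez))  # swapping two vertices flips the sign
```

Because only the sign is used, the result is insensitive to moderate errors in the positions of dummy atoms, provided the four points stay clearly non-coplanar.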
Note that an XVAR can contain pointers to other objects, so that if you need
(say) to have multipoles attached to atoms, they can be set up elsewhere
(for example in an XLIST) and XVAR TYPE="ADDRESS" can be used to point to them.
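Returning to the fractional coordinates (XF, YF, ZF) described above, the conversion to Cartesian coordinates can be sketched as follows. This is one common orthogonalisation convention (a along x, b in the xy plane), chosen here purely as an assumption; CML does not prescribe it, and the function names and the way the cell parameters are passed in are illustrative.

```python
import math


def orthogonalisation_matrix(a, b, c, alpha, beta, gamma):
    """3x3 matrix taking fractional to Cartesian coordinates.

    Assumed convention: a along x, b in the xy plane. Cell lengths are in
    Angstrom and cell angles in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Cell volume divided by (a * b * c).
    v = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * v / sg],
    ]


def frac_to_cart(matrix, frac):
    """Apply the orthogonalisation matrix to one (XF, YF, ZF) triple."""
    return tuple(
        sum(matrix[i][j] * frac[j] for j in range(3)) for i in range(3)
    )


cell = orthogonalisation_matrix(5.0, 6.0, 7.0, 90.0, 100.0, 90.0)
print(frac_to_cart(cell, (0.25, 0.5, 0.75)))
```

A different convention (for example c along z) gives a different matrix for the same cell, which is why the convention in use has to be stated alongside the coordinates.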
Content Model
The generic content model. This is interpreted as:
- An optional XHTML description.
- A number of equal length ARRAYs holding atomic properties (see above).
- Optional/repeatable XVAR/ARRAY/XLIST for additional content qualifying
the atomic information. Examples might be:
- One or more ARRAYs describing the translation or rotation that has been
or is to be applied to this set of coordinates.
Note that ATOMS can be used for single atoms with contained XVAR to hold their
properties.
Note that any ARRAY can have a CONVENTION attribute, so that different
ways of holding information can be identified.
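The column-oriented layout (equal-length ARRAYs, one per atomic property) can be turned into one record per atom as in the sketch below. It assumes the ARRAYs have already been parsed into a Python mapping from BUILTIN name to a list of values; that mapping and the function name `atom_rows` are hypothetical. It also applies the default numbering 1...NATOMS when no ATOMNO column is present.

```python
def atom_rows(columns):
    """Turn equal-length column ARRAYs into one dict per atom.

    columns: mapping of BUILTIN name -> list of values (hypothetical form).
    """
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f"ARRAYs differ in length: {sorted(lengths)}")
    natoms = lengths.pop() if lengths else 0
    # Without ATOMNO, atoms are numbered 1..NATOMS in order of occurrence.
    atomnos = list(columns.get("ATOMNO", range(1, natoms + 1)))
    if len(set(atomnos)) != natoms or any(
        not isinstance(n, int) or n < 1 for n in atomnos
    ):
        raise ValueError("ATOMNO values must be unique positive integers")
    rows = []
    for i in range(natoms):
        row = {name: values[i] for name, values in columns.items()}
        row["ATOMNO"] = atomnos[i]
        rows.append(row)
    return rows


for row in atom_rows({"ELSYM": ["C", "O"], "X3": [0.0, 1.2]}):
    print(row)
```

Rejecting unequal column lengths early is a deliberate choice in this sketch: a short column would otherwise silently shift properties onto the wrong atoms.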
Content
- array -- A very flexible matrix/array/geometry container.
- xhtml -- A hypertext container for use in TecML and CML.
- xlist -- A very flexible generic list/tree/table container.
- xvar -- A generic, flexible, container for scalar information.
ATTRIBUTES
CONTENT DECLARATION
- Tag Minimization
Open Tag: REQUIRED
Close Tag: REQUIRED
Parent Elements
- cml -- A toplevel DTD encompassing HTML 2.0, TecML and MOL.
- mol -- Toplevel container for molecular information.