ATOMS is used for describing either single atoms or, more commonly, a list of atoms in a 'molecule' or 'compound' when it is contained within MOL.

ATOMS is the heart of MOL.DTD and (in MOL) represents an atom-centred description with optional bonds (BONDS). (This is perhaps driven by my background as an inorganic crystallographer, where bonds are a personal 'opinion'!) Elemental identity, atomic positions and spacegroup are often necessary and sufficient to describe what the substance is. Many theoretical chemists would agree that, with the addition of the total electron count, everything else is opinion.

Like the rest of CML, ATOMS does not dictate how a molecule is described, and an author can create whatever atomic (or bond) properties they wish. There is a set of BUILTINs to cover most common cases, but others can be added by using REL="glossary" and HREF= in the ATOMS attributes. The incentive to use BUILTINs is that they will be recognised by the postprocessing software, whilst for a glossary item the author will have to write the code themselves.

ATOMS may have CONVENTION and DICTNAME attributes, and these can be used to resolve problems of convention (e.g. 'charge', 'valence'). It is assumed that the contained ARRAYs use the same convention unless overridden.

ATOMS and BONDS can be used to give the molecular formula (connectivity) by the use of attributes such as formal ligand count, number of attached hydrogen atoms, formal charge, etc. Where possible, however, we recommend that FORMULA is used since standardisation is likely to be clearer in that format. The current conventions (SMILES and MOL) could be expanded to include others.

ATOMS/BONDS may be difficult to relate to FORMULA. Where ATOMS represents coordinate data, this might relate to multiple copies of a molecule (as in crystallography, where an asymmetric unit can contain several identical molecules and all the coordinates must be included so that the crystal structure can be recreated). A related problem arises where some of the atomic coordinates are not determined, a frequent occurrence in some techniques. Hydrogen atoms represent a particular problem - CML does not lay down rules as to how these are used.
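One way to relate a coordinate list to a formula is sketched below in Python. It is only an illustration, not part of CML: the function name copies_of_formula, the plain lists and dicts used as input, and the choice to ignore hydrogen atoms by default are all assumptions made for the example.

```python
from collections import Counter

def copies_of_formula(elements, formula, ignore=("H",)):
    """Return k if the atom list holds exactly k copies of the formula.

    elements: element symbols, one per atom in the coordinate list
              (hypothetical input, not a CML structure).
    formula:  dict of element symbol -> count for ONE molecule.
    ignore:   elements left out of the comparison, e.g. hydrogen atoms
              that were not located (an assumption of this sketch).
    Returns None when no whole number of copies fits.
    """
    observed = Counter(e for e in elements if e not in ignore)
    expected = {e: n for e, n in formula.items() if e not in ignore and n > 0}
    if not expected or set(observed) != set(expected):
        return None
    ratios = set()
    for symbol, per_molecule in expected.items():
        k, remainder = divmod(observed[symbol], per_molecule)
        if remainder:
            return None
        ratios.add(k)
    # Every element must imply the same number of copies.
    return ratios.pop() if len(ratios) == 1 else None

# Two methanol molecules in the asymmetric unit, only one hydrogen located.
methanol = {"C": 1, "H": 4, "O": 1}
print(copies_of_formula(["C", "O", "H", "C", "O"], methanol))
```

A result of None does not mean the data are wrong, only that the atom list and the formula cannot be reconciled by this simple count.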

ATOMS and BONDS are linked by the ATID attribute of XVAR within ATOMS, and the ATID1 or ATID2 of BONDS. This need not be an integer, and could be a construct such as CA15. If the tables are edited or modified, it will be important to make sure that consistency is maintained and that ATIDs are always unique.
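The consistency requirement can be checked mechanically once the identifiers have been read out of the tables. The following Python sketch assumes that the ATID values and the (ATID1, ATID2) pairs are already available as plain lists; the function name check_atids and the sample identifiers other than CA15 are hypothetical, and no CML parsing is attempted.

```python
from collections import Counter

def check_atids(atom_ids, bonds):
    """Return a list of problems; an empty list means the tables agree.

    atom_ids: ATID values, one per atom, in table order.
    bonds:    (ATID1, ATID2) pairs, one per bond.
    Both inputs are hypothetical plain-Python stand-ins for the tables.
    """
    problems = []
    counts = Counter(atom_ids)
    # Every ATID must occur exactly once.
    for atid, n in counts.items():
        if n > 1:
            problems.append(f"duplicate ATID {atid!r} ({n} occurrences)")
    # Every bond end must point at an existing atom.
    known = set(counts)
    for i, (atid1, atid2) in enumerate(bonds):
        for label, atid in (("ATID1", atid1), ("ATID2", atid2)):
            if atid not in known:
                problems.append(f"bond {i}: {label} {atid!r} matches no atom")
    return problems

atoms = ["N1", "CA15", "C2"]
bonds = [("N1", "CA15"), ("CA15", "C2")]
print(check_atids(atoms, bonds))
```

Running a check of this kind after each edit of the tables is cheap, and it catches both duplicated identifiers and bonds left pointing at deleted atoms.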

The content model is simple: an optional description (XHTML), followed by a number of (column) arrays, all of length equal to the number of atoms. Each ARRAY corresponds to an atomic attribute. The semantics of the attribute are given by one of two mechanisms: a BUILTIN, or a glossary entry referenced with REL="glossary" and HREF=.

The actual enumeration of the attributes is given in a file ../mol-arr-bui.ent, and this is definitive, rather than what is written below (although hopefully they are in sync!). In many cases it is difficult to decide whether something is a number or ID or a type. The file contains:

Note that an XVAR can contain pointers to other objects, so that if you need (say) to have multipoles attached to atoms, they can be set up elsewhere (for example in an XLIST) and XVAR TYPE="ADDRESS" can be used to point to them.

Content Model

The generic content model. This is interpreted as:

Note that any ARRAY can have a CONVENTION attribute, so that different ways of holding information can be identified.
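Because every column array must have one entry per atom, a mismatch in lengths is an easy error to detect before any further processing. The Python sketch below assumes the arrays have already been read into a dict of lists; the function names check_columns and atom_rows and the column names "ATID" and "element" are hypothetical and are not taken from the DTD.

```python
def check_columns(columns):
    """Return the common column length, i.e. the number of atoms.

    columns: dict of attribute name -> list of values, one per atom
             (a hypothetical stand-in for the ARRAYs inside ATOMS).
    Raises ValueError if the columns differ in length.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return next(iter(lengths.values()), 0)

def atom_rows(columns):
    """Turn the column arrays into one dict per atom."""
    n_atoms = check_columns(columns)
    return [
        {name: values[i] for name, values in columns.items()}
        for i in range(n_atoms)
    ]

water = {
    "ATID": ["O1", "H1", "H2"],
    "element": ["O", "H", "H"],
}
for row in atom_rows(water):
    print(row)
```

The column form matches how the data are stored, while the row form is often more convenient when working atom by atom.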



Tag Minimization

Parent Elements
