molecule[el.molecule]

A container for atoms, bonds and submolecules

molecule is a container for atoms, bonds and submolecules, along with properties such as crystal and non-builtin properties. It should contain either molecule children or *Array children for atoms and bonds. A molecule can be empty (e.g. when only its name, id, etc. are known).

"Molecule" need not represent a chemically meaningful molecule. It can contain atoms with bonds (as in the solid state) and it could simply carry a name (e.g. "taxol") without formal representation of the structure. It can contain "sub molecules", which are often discrete subcomponents (e.g. guest-host).

Molecule can contain a <list> element holding data related to the molecule. The list can contain string, float and integer values and other nested lists.

example

<molecule id="dummyId">
  <atomArray>
    <atom id="a1" elementType="C" 
      hydrogenCount="0" x2="6.1964" y2="8.988"/>
    <atom id="a2" elementType="C" 
      hydrogenCount="0" x2="6.1964" y2="7.587"/>
    <atom id="a3" elementType="C" 
      hydrogenCount="2" x2="4.983" y2="6.887"/>
<!-- omitted -->
    <atom id="a28" elementType="C" 
      hydrogenCount="3" x2="15.777" y2="6.554"/>
    <atom id="a29" elementType="O" 
      hydrogenCount="0" x2="13.388" y2="6.188"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a2 a3" order="1"/>
    <bond atomRefs2="a3 a4" order="1"/>
<!-- omitted -->
    <bond atomRefs2="a11 a15" order="1"/>
    <bond atomRefs2="a12 a18" order="1">
      <stereo>W</stereo>
    </bond>
    <bond atomRefs2="a2 a19" order="1">
      <stereo>W</stereo>
    </bond>
    <bond atomRefs2="a5 a20" order="2"/>
    <bond atomRefs2="a17 a21" order="1"/>
    <bond atomRefs2="a21 a22" order="1"/>
<!-- omitted -->
    <bond atomRefs2="a10 a9" order="1"/>
    <bond atomRefs2="a16 a29" order="2"/>
  </bondArray>
</molecule>
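
As an illustration of how a consumer might read this structure, the following Python sketch lists the atoms and bonds of a small molecule with the standard-library xml.etree.ElementTree module. The two-atom document and the function name read_molecule are assumptions made for this illustration, not part of the schema. Namespaces and submolecules are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-atom document, written only for this illustration.
CML = """
<molecule id="m1">
  <atomArray>
    <atom id="a1" elementType="C" hydrogenCount="3"/>
    <atom id="a2" elementType="O" hydrogenCount="1"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
  </bondArray>
</molecule>
"""

def read_molecule(xml_text):
    """Return (atoms, bonds) for a molecule without submolecules."""
    mol = ET.fromstring(xml_text)
    # Map each atom id to its element type.
    atoms = {a.get("id"): a.get("elementType")
             for a in mol.findall("atomArray/atom")}
    # atomRefs2 holds two whitespace-separated atom ids.
    bonds = [tuple(b.get("atomRefs2").split()) + (b.get("order"),)
             for b in mol.findall("bondArray/bond")]
    return atoms, bonds

atoms, bonds = read_molecule(CML)
print(atoms)
print(bonds)
```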

Content Model

(stm:metadataList*,formula?,name*,symmetry?,crystal?,(molecule*|(atomArray,bondArray?,electron*,length*,angle*,torsion*)),(stm:list|stm:scalar)*)

dictRef[att.dictRef]

A string referencing a dictionary, units, convention or other metadata.

The namespace is optional but recommended where possible.

Note: this convention is only used within STMML and related languages; it is NOT a generic URI.

example

<list>
<!-- dictRef is of namespaceRefType -->
  <scalar dictRef="chem:mpt">123</scalar>  
<!-- error -->
  <scalar dictRef="mpt23">123</scalar>  
</list>

[xsd:string]

Pattern: [A-Za-z][A-Za-z0-9_]*(:[A-Za-z][A-Za-z0-9_]*)?
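
The pattern can be tried out directly. The Python sketch below copies it from the line above and uses re.fullmatch, which mirrors the whole-string matching of XSD patterns. The helper name is_namespace_ref is an assumption for this illustration. Read literally, the pattern makes the prefix optional, so an unprefixed value such as "mpt23" is accepted by the pattern itself.

```python
import re

# Pattern copied from the schema text above.
NAMESPACE_REF = re.compile(r"[A-Za-z][A-Za-z0-9_]*(:[A-Za-z][A-Za-z0-9_]*)?")

def is_namespace_ref(value):
    """True if the whole string matches the namespaceRefType pattern."""
    return NAMESPACE_REF.fullmatch(value) is not None

for candidate in ["chem:mpt", "mpt23", "chem:", "2chem:mpt", "chem:mpt:x"]:
    print(candidate, is_namespace_ref(candidate))
```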

A reference to a dictionary entry.

Elements in data instances such as scalar may have a dictRef attribute to point to an entry in a dictionary. To avoid excessive use of (mutable) filenames and URIs we recommend a namespace prefix, mapped to a namespace URI in the normal manner. In this case, of course, the namespace URI must point to a real XML document containing entry elements and validated against STMML Schema.

Where there is concern about the dictionary becoming separated from the document the dictionary entries can be physically included as part of the data instance and the normal XPointer addressing mechanism can be used.

This attribute can also be used on dictionary elements to define the namespace prefix.

example

<scalar dataType="xsd:float" title="surfaceArea" 
  dictRef="cmlPhys:surfArea" 
  xmlns:cmlPhys="http://www.xml-cml.org/dict/physical"
  units="units:cm2">50</scalar>
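
A minimal Python sketch of the prefix lookup described above: it splits a dictRef into prefix and entry name and maps the prefix to its namespace URI. The function name resolve_dict_ref is hypothetical, and the sketch collects every prefix declared in the document without tracking scope, which is a simplification. Fetching and validating the dictionary behind the URI is not attempted.

```python
import io
import xml.etree.ElementTree as ET

XML = (
    '<scalar dataType="xsd:float" title="surfaceArea" '
    'dictRef="cmlPhys:surfArea" '
    'xmlns:cmlPhys="http://www.xml-cml.org/dict/physical">50</scalar>'
)

def resolve_dict_ref(xml_text):
    """Return (namespace URI or None, entry name) for the root's dictRef."""
    prefixes = {}
    root = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload      # a declaration such as xmlns:cmlPhys="..."
            prefixes[prefix] = uri
        elif root is None:
            root = payload             # the first "start" event is the root element
    prefix, _, name = root.get("dictRef").rpartition(":")
    if not prefix:
        return None, name              # unprefixed reference: nothing to resolve
    return prefixes.get(prefix), name

print(resolve_dict_ref(XML))
```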

example

<stm:list xmlns:stm="http://www.xml-cml.org/schema/stmml">
  <stm:observation>
    <p>We observed <object count="3" dictRef="#p1"/> 
      constructing dwellings of different material</p>
  </stm:observation>
  <stm:entry id="p1" term="pig">
    <stm:definition>A domesticated animal.</stm:definition>
    <stm:description>Predators include wolves</stm:description>
    <stm:description class="scientificName">Sus scrofa</stm:description>
  </stm:entry>
</stm:list>

convention[att.convention]

A string referencing a dictionary, units, convention or other metadata.

The namespace is optional but recommended where possible.

Note: this convention is only used within STMML and related languages; it is NOT a generic URI.

example

<list>
<!-- dictRef is of namespaceRefType -->
  <scalar dictRef="chem:mpt">123</scalar>  
<!-- error -->
  <scalar dictRef="mpt23">123</scalar>  
</list>

[xsd:string]

Pattern: [A-Za-z][A-Za-z0-9_]*(:[A-Za-z][A-Za-z0-9_]*)?

A reference to a convention

There is no controlled vocabulary for conventions, but the author must ensure that the semantics are openly available and that there are mechanisms for implementation. The convention is inherited by all the subelements, so that a convention for molecule would by default extend to its bond and atom children. This can be overwritten if necessary by an explicit convention.

It may be useful to create conventions with namespaces (e.g. iupac:name). Use of convention will normally require non-STMML semantics, and should be used with caution. We would expect that conventions prefixed with "ISO" would be useful, such as ISO8601 for dateTimes.

There is no default, but the conventions of STMML or the related language (e.g. CML) will be assumed.

example

<bond convention="fooChem" order="-5"
   xmlns:fooChem="http://www.fooChem/conventions"/>
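
The inheritance rule described above (a convention applies to all subelements unless one of them states its own) can be sketched in Python as follows. The document, the convention name barChem and the function name effective_conventions are assumptions for this illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the molecule sets a convention, the bondArray overrides it.
DOC = """
<molecule id="m1" convention="fooChem">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="O"/>
  </atomArray>
  <bondArray convention="barChem">
    <bond id="b1" atomRefs2="a1 a2" order="2"/>
  </bondArray>
</molecule>
"""

def effective_conventions(root):
    """Map each element id to the convention in force for it (None if unset)."""
    result = {}

    def walk(element, inherited):
        # An explicit attribute overrides whatever was inherited.
        current = element.get("convention", inherited)
        if element.get("id") is not None:
            result[element.get("id")] = current
        for child in element:
            walk(child, current)

    walk(root, None)
    return result

print(effective_conventions(ET.fromstring(DOC)))
```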

title[att.title]

A title on an element.

No controlled value.

example

<action title="turn on heat" start="T09:00:00" convention="xsd"/>

id[att.id]

A unique ID for an element

This is not formally of type ID (an XML NAME which must start with a letter and contain only letters, digits and .-_:). It is recommended that IDs start with a letter, and contain no punctuation or whitespace. The function generate-id() in XSLT will generate semantically void unique IDs.

It is difficult to ensure uniqueness when documents are merged. We suggest namespacing IDs, perhaps using the containing elements as the base. Thus mol3:a1 could be a useful unique ID. However this is still experimental.

[xsd:string]

Pattern: [A-Za-z0-9_-]+(:[A-Za-z0-9_-]+)?

An attribute providing a unique ID for an element
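
The following Python sketch checks candidate IDs against the pattern above and builds a namespaced ID of the mol3:a1 kind. The helper names is_valid_id and namespaced_id are assumptions for this illustration. Note that the pattern as written accepts an ID starting with a digit, even though a leading letter is recommended.

```python
import re

# Pattern copied from the schema text above.
ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+(:[A-Za-z0-9_-]+)?")

def is_valid_id(value):
    """True if the whole string matches the idType pattern."""
    return ID_PATTERN.fullmatch(value) is not None

def namespaced_id(container_id, local_id):
    """Prefix a local ID with the ID of its containing element."""
    candidate = f"{container_id}:{local_id}"
    if not is_valid_id(candidate):
        raise ValueError(f"not a valid ID: {candidate!r}")
    return candidate

for candidate in ["a1", "mol3:a1", "1a", "a 1", "a.1", "mol3:a1:x"]:
    print(candidate, is_valid_id(candidate))
print(namespaced_id("mol3", "a1"))
```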

ref[att.ref]

A reference to an existing element

A reference to an existing element in the document. The target of the ref attribute must exist. The test for validity will normally occur in the element's appinfo.

Any DOM Node created from this element will normally be a reference to another Node, so that if the target node is modified the dereferenced content is modified as well. At present there are no deep copy semantics hardcoded into the schema.

BASE: idType

A unique ID for an element

[xsd:string]

Pattern: [A-Za-z0-9_-]+(:[A-Za-z0-9_-]+)?

A reference to an element of given type

ref modifies an element into a reference to an existing element of that type within the document. This is similar to a pointer and it can be thought of as a strongly typed hyperlink. It may also be used for "subclassing" or "overriding" elements.

example

<cml>
  <molecule id="m1">
    <atomArray>
      <atom elementType="N"/>
      <atom elementType="O"/>
    </atomArray>
  </molecule>
  <html:p>The action of <molecule ref="#m1"/> on cardiac muscle ...</html:p>
</cml>
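
One way a processor might follow such a reference is sketched below in Python. The function name dereference is hypothetical, and the html:p wrapper is replaced by a plain p element so that the snippet needs no namespace declarations. The lookup is restricted to elements with the same name as the referring element, reflecting the "strongly typed" description, and it returns the target node itself, not a copy.

```python
import xml.etree.ElementTree as ET

DOC = """
<cml>
  <molecule id="m1">
    <atomArray>
      <atom elementType="N"/>
      <atom elementType="O"/>
    </atomArray>
  </molecule>
  <p>The action of <molecule ref="#m1"/> on cardiac muscle ...</p>
</cml>
"""

def dereference(root, element):
    """Return the element that `element` refers to, or `element` if it has no ref."""
    ref = element.get("ref")
    if ref is None:
        return element
    target_id = ref.lstrip("#")   # the example above writes the target as "#m1"
    for candidate in root.iter(element.tag):   # same element type only
        if candidate.get("id") == target_id:
            return candidate
    raise LookupError(f"no <{element.tag}> with id {target_id!r}")

root = ET.fromstring(DOC)
pointer = root.find("p/molecule")
target = dereference(root, pointer)
print(target.get("id"), len(target.findall("atomArray/atom")))
```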

formula[]

A concise representation for a molecular formula

This MUST adhere to a whitespaced syntax so that it is trivially machine-parsable. Each element is followed by its count, and the string is optionally ended by a formal charge. NO brackets or other nesting is allowed.

example

<stm:list xmlns:stm="http://www.xml-cml.org/schema/stmml">
  <formula id="methane" concise="C 1 H 4"/>
  <formula id="chloroacetate" concise="Cl 1 H 2 C 2 O 2 -1"/>
  <formula id="sodiumSulfate">
    <formula concise="H 2 O 1" count="10"/>
    <formula concise="Na 1 +1" count="2"/>
    <formula concise="S 1 O 4 -2"/>
  </formula>
</stm:list>

[xsd:string]

Pattern: \s*([A-Z][a-z]?\s+[1-9][0-9]*)(\s+[A-Z][a-z]?\s+[1-9][0-9]*)*(\s+[-|+]?[0-9]+)?\s*

The formula attribute should only be used for simple formulae (i.e. without brackets or other nesting, for which the formula child should be used). The attribute might be used as a check on the child elements or for ease of representation.
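
Because the syntax is whitespace-separated, a concise formula can be parsed with a simple split. The Python sketch below validates a string against the pattern above and returns the element counts and the formal charge. The function name parse_concise is an assumption for this illustration. The sign class in the pattern, [-|+], also admits a literal "|", which this sketch rejects when converting the charge to an integer.

```python
import re

# Pattern copied from the schema text above.
CONCISE = re.compile(
    r"\s*([A-Z][a-z]?\s+[1-9][0-9]*)(\s+[A-Z][a-z]?\s+[1-9][0-9]*)*"
    r"(\s+[-|+]?[0-9]+)?\s*"
)

def parse_concise(formula):
    """Return (element counts, formal charge) for a concise formula string."""
    if CONCISE.fullmatch(formula) is None:
        raise ValueError(f"not a concise formula: {formula!r}")
    tokens = formula.split()
    charge = 0
    if len(tokens) % 2 == 1:       # an odd token count means a trailing charge
        charge = int(tokens.pop())
    counts = {}
    for symbol, number in zip(tokens[0::2], tokens[1::2]):
        counts[symbol] = counts.get(symbol, 0) + int(number)
    return counts, charge

print(parse_concise("C 1 H 4"))
print(parse_concise("Cl 1 H 2 C 2 O 2 -1"))
```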

count[]

The count for the molecule

No formal default but assumed to be 1. Fractional values are allowed to describe variable stoichiometry.
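
As a sketch of how a count might be applied, the Python snippet below sums element counts over nested formula elements, weighting each child by its count and treating a missing count as 1. It assumes count behaves on formula children as it does in the sodium sulfate example in the formula section; the function name totals is hypothetical. Fraction is used so that fractional counts such as "0.5" stay exact.

```python
from fractions import Fraction
import xml.etree.ElementTree as ET

DOC = """
<formula id="sodiumSulfate">
  <formula concise="H 2 O 1" count="10"/>
  <formula concise="Na 1 +1" count="2"/>
  <formula concise="S 1 O 4 -2"/>
</formula>
"""

def totals(formula):
    """Sum element counts over child formulas, weighting each by its count."""
    result = {}
    for child in formula.findall("formula"):
        weight = Fraction(child.get("count", "1"))   # a missing count is taken as 1
        tokens = child.get("concise").split()
        if len(tokens) % 2 == 1:
            tokens.pop()                             # drop the trailing formal charge
        for symbol, number in zip(tokens[0::2], tokens[1::2]):
            result[symbol] = result.get(symbol, 0) + weight * int(number)
    return result

print(totals(ET.fromstring(DOC)))
```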

chirality[]

The chirality of the complete system

This is being actively investigated by an IUPAC committee (2002), so the convention is likely to change. No formal default.

Allowed values

enantiomer
racemate
unknown
other

formalCharge[el.atom.formalCharge]

The formal charge on an atom

Used for electron bookkeeping. This has no relation to its calculated (fractional) charge.

example

<atomArray>
  <atom id="a1" elementType="N" formalCharge="+1"/>
  <atom id="a2" elementType="O" formalCharge="-1"/>
</atomArray>

[xsd:integer]

The formalCharge on the molecule

NOT the calculated charge or oxidation state. This attribute should be used when it is impossible or artificial to assign charges to each atom, as in coordination complexes. It is then required that all atom formalCharge attributes are omitted. No formal default, but assumed to be zero if omitted. It may become good practice to include it.
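
The rule above, that atom formalCharge attributes must be omitted when the molecule carries its own formalCharge, can be checked mechanically. The Python sketch below is one possible check; the function name molecule_charge is hypothetical, and treating atoms without the attribute as neutral when summing is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

def molecule_charge(mol):
    """Return the net formal charge of a molecule element."""
    atom_charges = [a.get("formalCharge") for a in mol.iter("atom")]
    explicit = [int(c) for c in atom_charges if c is not None]
    if mol.get("formalCharge") is not None:
        # The text above requires atom charges to be omitted in this case.
        if explicit:
            raise ValueError("formalCharge given on both the molecule and its atoms")
        return int(mol.get("formalCharge"))
    # Assumption: atoms without the attribute contribute zero.
    return sum(explicit)

zwitterion = ET.fromstring(
    '<molecule><atomArray>'
    '<atom id="a1" elementType="N" formalCharge="+1"/>'
    '<atom id="a2" elementType="O" formalCharge="-1"/>'
    '</atomArray></molecule>'
)
print(molecule_charge(zwitterion))
```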

spinMultiplicity[el.atom.spinMultiplicity]

The spin multiplicity for the molecule

This attribute gives the spin multiplicity of the molecule and is independent of any atomic information. No default, and it may take any positive integer value (though values are normally between 1 and 5).

symmetryOriented[el.atom.symmetryOriented]

Is the molecule oriented to the symmetry?

No formal default, but a molecule is assumed to be oriented according to any <symmetry> children. This is required for crystallographic data, but some systems for isolated molecules allow specification of arbitrary Cartesian or internal coordinates, which must be fitted or refined to a prescribed symmetry. In this case the attribute value is false.