Biomolecular Sequence. This is intended to cover only those molecules where the chemical identity is an important aspect, and is not intended to intrude into genome structure, etc. It also covers only 'simple' types of sequence (PROTein, DNA, RNA, CARBohydrate). CML will not (at present) provide a comprehensive list of monomers, and there is only very limited support for covalently modified molecules, although this will be a major role for CML. (The MOL TYPE=FRAGMENT may be used to describe small molecules for attachment to proteins, although at present this can only be done if the atoms are explicit, as in PDB.)
In general, therefore, SEQUENCE should only be used for 'normal' proteins, small stretches of DNA or RNA without 'unusual' components, and carbohydrates which can be represented by a simple linear text string. It is unsuitable for cyclic molecules, modified bases, unusual amino acids, branched saccharides, etc. The chain termination is also unlikely to be well defined (e.g. is the end a monophosphate? is the N-terminus acetylated?). Covalent modifications may be described textually (e.g. 'glycosylated').
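As a rough illustration of the 'simple sequence' restriction, the following Python sketch checks whether a string uses only standard one-letter residue codes before it is stored in SEQUENCE. It is not part of CML: the function name is_simple_sequence and the alphabets are assumptions made for this example (CML itself does not supply a monomer list), and carbohydrates are left out because no comparable one-letter alphabet is assumed for them.

```python
# Hypothetical alphabets for illustration only; CML does not define a monomer list.
STANDARD_ALPHABETS = {
    "PROT": frozenset("ACDEFGHIKLMNPQRSTVWY"),  # the 20 standard amino acids
    "DNA": frozenset("ACGT"),
    "RNA": frozenset("ACGU"),
}


def is_simple_sequence(sequence: str, mol_type: str) -> bool:
    """Return True if `sequence` contains only standard residues for `mol_type`.

    Whitespace is ignored and case is folded. Unknown molecule types and
    empty sequences are rejected rather than guessed at.
    """
    alphabet = STANDARD_ALPHABETS.get(mol_type.upper())
    if alphabet is None:
        return False
    residues = "".join(sequence.split()).upper()
    return bool(residues) and all(residue in alphabet for residue in residues)


if __name__ == "__main__":
    for text, kind in [("acgt tgca", "DNA"), ("ACGU", "DNA"), ("MKVLA", "PROT")]:
        print(kind, text, is_simple_sequence(text, kind))
```

A check of this kind can only flag unusual residue symbols. It cannot detect cyclic or branched molecules, or an ill-defined chain terminus, since none of these is visible in a linear text string.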
SEQUENCE supports CONVENTION/DICTNAME which should allow precise management of macromolecular data entries.
There is a BUILTIN=STRAND option for XVAR, which could be used as follows: