Molecular Hyperglossaries

Glossaries are very important in CML as a way of resolving the semantics of objects such as XVAR. Ideally every marked-up object should either be hardcoded into CML or point to an entry in a glossary. As CML develops, these glossaries can hold not only textual descriptions but also data (e.g. molecular coordinates or physical quantities) and methods. The first step in this process is the use of scientific Units in a glossary.

Several molecular glossaries have been or are being marked up, and their current status can be found in the Virtual Hyperglossary Project. Here we shall give a small number of simple examples.

Crystallographic CIF files

The International Union of Crystallography (IUCr) has sponsored the development of a set of dictionaries (a combination of glossaries and data dictionaries). As a demonstration we use the Core dictionary (1993 version). This glossary consists of about 240 terms organised into categories, and a short Java/CML program has parsed it into a table of contents (ciftoc.cml). The main body of the TOC is a list of terms arranged under categories, with an XLIST containing each group. The terms themselves are not part of the TOC document but are held in separate files, as in this typical entry (atom_site_aniso_type_symbol.cml). Its components are XVARs, an XHTML element for the definition, and some ADMIN nodes. The key components are the ID and TERM (held in XVARs) and the definition (held in XHTML). The owner of the entry ("sourceIdentifier") is held in an ADMIN node.
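The step of turning a flat list of categorised terms into a TOC can be sketched as follows. This is not the Java/CML program mentioned above; it is a minimal Python illustration that assumes each term arrives as a (category, term, filename) triple, that each category becomes an XLIST carrying the category name in a TITLE attribute, and that the root element is called CML. The root name, the TITLE-on-XLIST convention and the sample data (including the category assignments) are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_toc(terms):
    """Group (category, term, filename) triples into a TOC tree.

    Each category becomes an XLIST; each term becomes an XVAR pointer
    to its separate entry file (TYPE="URL", MIME="chemical/x-cml").
    """
    by_category = defaultdict(list)
    for category, term, filename in terms:
        by_category[category].append((term, filename))

    root = ET.Element("CML")  # hypothetical root element name
    for category in sorted(by_category):
        # Assumption: the category name is stored in a TITLE attribute.
        xlist = ET.SubElement(root, "XLIST", TITLE=category)
        for term, filename in sorted(by_category[category]):
            pointer = ET.SubElement(xlist, "XVAR", TYPE="URL",
                                    MIME="chemical/x-cml", TITLE=term)
            pointer.text = filename  # the pointer is the element's content
    return root


# Illustrative data only; category assignments are assumptions.
terms = [
    ("atom_site", "atom_site_label", "atom_site_label.cml"),
    ("cell", "cell_length_a", "cell_length_a.cml"),
    ("atom_site", "atom_site_aniso_type_symbol",
     "atom_site_aniso_type_symbol.cml"),
]
toc = build_toc(terms)
print(ET.tostring(toc, encoding="unicode"))
```

Keeping the entries as separate files and putting only pointers in the TOC means the TOC stays small and each entry can be maintained independently.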

The TOC contains a list of pointers to these entries, of the form:

<XVAR TYPE="URL" MIME="chemical/x-cml" TITLE="someTitle">someFilename</XVAR>
In CML, TYPE="URL" indicates that the content of the XVAR is pointer information, and that the type of the target is given by the MIME attribute (here chemical/x-cml, i.e. the target is a CML file). The target can be any file type, although the application program may not always be able to process it. Declaring the type explicitly is much safer than relying on file suffixes.
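To show why a declared MIME type is safer than a suffix, here is a minimal Python sketch of pointer resolution. It collects every XVAR with TYPE="URL" and picks an action from its MIME attribute alone. The handler table and its action strings are hypothetical stand-ins for what a viewer such as JUMBO might do; they are not JUMBO's actual behaviour.

```python
import xml.etree.ElementTree as ET

# Hypothetical handler table: MIME type -> what a viewer would do with it.
HANDLERS = {
    "chemical/x-cml": "parse as CML",
    "chemical/x-pdb": "load as PDB molecule",
}


def resolve_pointers(xml_text):
    """Return (title, target, action) for every XVAR with TYPE="URL".

    The action is chosen from the MIME attribute, never from the suffix
    of the target filename.
    """
    root = ET.fromstring(xml_text)
    results = []
    for xvar in root.iter("XVAR"):
        if xvar.get("TYPE") != "URL":
            continue  # ordinary XVAR, not a pointer
        target = (xvar.text or "").strip()
        # Unknown or missing MIME types are reported, not guessed.
        action = HANDLERS.get(xvar.get("MIME"), "cannot process")
        results.append((xvar.get("TITLE"), target, action))
    return results


toc = """<XLIST>
  <XVAR TYPE="URL" MIME="chemical/x-cml" TITLE="someTitle">someFilename</XVAR>
  <XVAR TYPE="URL" MIME="chemical/x-pdb" TITLE="other">data.txt</XVAR>
  <XVAR TITLE="note">not a pointer</XVAR>
</XLIST>"""

for title, target, action in resolve_pointers(toc):
    print(title, target, action)
```

The second pointer targets a file called data.txt; a suffix-based program would treat it as plain text, whereas the MIME attribute correctly routes it to a molecule handler.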

JUMBO resolves these pointers when it displays the TOC; in this view some of the terms have been unhidden.

Clicking on a term brings up the corresponding entry (TERMENTRY) as shown. (CML has reserved TERMENTRY as an ELEMENT because of the importance of terminology in resolving semantics.) TERMENTRY can contain enough components for any terminological application, but it should not be used as an encyclopedia or database component.
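Reading the key components out of a TERMENTRY can be sketched in Python as below. The exact markup of the CIF entries is not reproduced here, so the layout in the sample is an assumption: the XVARs and the ADMIN node are told apart by a hypothetical TITLE attribute ("ID", "TERM", "sourceIdentifier"), and the definition is taken as the text of the XHTML element. The placeholder values follow the someTitle/someFilename style used earlier.

```python
import xml.etree.ElementTree as ET


def read_term_entry(xml_text):
    """Extract ID, TERM, definition and owner from a TERMENTRY.

    Assumes (hypothetically) that XVARs and ADMIN nodes are labelled
    by a TITLE attribute.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "TERMENTRY":
        raise ValueError("expected a TERMENTRY element, got " + root.tag)

    # ID and TERM are held in XVARs.
    fields = {x.get("TITLE"): (x.text or "").strip() for x in root.findall("XVAR")}

    # The definition is held in XHTML; collapse its text and whitespace.
    xhtml = root.find("XHTML")
    definition = " ".join("".join(xhtml.itertext()).split()) if xhtml is not None else ""

    # The owner of the entry is held in an ADMIN node.
    owner = None
    for admin in root.findall("ADMIN"):
        if admin.get("TITLE") == "sourceIdentifier":
            owner = (admin.text or "").strip()

    return {"id": fields.get("ID"), "term": fields.get("TERM"),
            "definition": definition, "owner": owner}


entry = """<TERMENTRY>
  <XVAR TITLE="ID">atom_site_aniso_type_symbol</XVAR>
  <XVAR TITLE="TERM">_atom_site_aniso_type_symbol</XVAR>
  <XHTML><P>(definition text)</P></XHTML>
  <ADMIN TITLE="sourceIdentifier">someOwner</ADMIN>
</TERMENTRY>"""

print(read_term_entry(entry))
```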

Molecules in Glossaries - PPS

Since the target of a CML URL pointer can be any type of file, glossaries can also carry molecular information. In the Principles of Protein Structure virtual course we developed a hyperglossary of protein structure terms, which I have now converted to CML. Part of the TOC is shown, together with the entry for tryptophan (tryptophan.ent):

So far this is similar to the CIF glossary, but the pointers now lead to a wider range of objects. Some point to external URLs (e.g. the Klotho database), and two point to local objects (trp.pdb and trp.mol). Because these are given specific chemical/ MIME types, they can be imported and viewed in JUMBO as hyperactive molecules.
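The decision a viewer faces with the tryptophan entry can be sketched as a small Python function: local targets with a chemical/ MIME type can be imported as molecules, while external URLs are simply followed. The MIME types chosen for trp.pdb and trp.mol (chemical/x-pdb, chemical/x-mdl-molfile) and the example.org address standing in for the Klotho entry are assumptions for illustration, not values taken from the actual glossary.

```python
from urllib.parse import urlparse


def plan_for(target, mime):
    """Decide what a viewer might do with one pointer target."""
    # A URL scheme marks an external target; a bare filename has none.
    is_external = urlparse(target).scheme in ("http", "https", "ftp")
    # Only the top-level MIME type matters here: chemical/* means molecular data.
    is_chemical = mime.split("/", 1)[0] == "chemical"
    if is_chemical and not is_external:
        return "import and view as molecule"
    if is_external:
        return "follow external link"
    return "open with default viewer"


# Hypothetical pointers modelled on the tryptophan entry.
pointers = [
    ("http://example.org/klotho/tryptophan", "text/html"),
    ("trp.pdb", "chemical/x-pdb"),
    ("trp.mol", "chemical/x-mdl-molfile"),
]
for target, mime in pointers:
    print(target, "->", plan_for(target, mime))
```

Testing the top-level MIME type rather than enumerating every chemical format means a new chemical/ type is recognised as molecular even before a dedicated viewer exists for it.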

By adding TYPEs and other constraints to hyperlinks, CML makes it possible to represent complex information architectures, and makes the required software easier to develop and test.

© Peter Murray-Rust, 1996, 1997