JUMBO Frequently Asked Questions

Introduction

JUMBO is an object-oriented browser for documents and files in molecular sciences. It's based on Java and XML (SGML), so it will also browse XML documents of any nature. It's very closely linked to Chemical Markup Language (CML), and its prime purpose is to help in the development of that approach.

This FAQ was prepared (Feb 1997) for the V0.1 release of JUMBO. A previous version (Jan 1997) was mounted on the WWW; this one supersedes it.

Answers

What is JUMBO?
JUMBO is the 'Java Universal Molecular Browser for Objects'. (If you are not a molecular scientist, then the 'M' stands for Markup.)

JUMBO is written entirely in Java and so is effectively portable to any platform that supports Java.

JUMBO is UNIVERSAL in that it can, in principle, browse any molecular or scientific information. (It can even browse Shakespeare.) To avoid being accused of selling snake-oil (I know there are no free lunches), I'll explain what this means below.

JUMBO is MOLECULAR because I have written specific input tools for many of the commonest molecular applications. There is also a wide range of processing tools for molecular information (2-D and 3-D rendering, sequences, etc.). JUMBO is MARKUP because it will read any XML (eXtensible Markup Language) file, and this is where Shakespeare comes in.

JUMBO is a BROWSER. It reads and displays information. It also has a search capability (still primitive as of Feb 1997) and can transform certain types of information. It can output certain file types (e.g. XML/CML). It is not, however, a replacement for conventional browsers and will normally interoperate with them.

JUMBO supports OBJECTS. Everything inside JUMBO is an object and has associated methods, either default or supplied by subclasses. When XML develops a means of passing objects over the WWW (very soon), JUMBO will be able to download those objects.

What disciplines does it cover?
JUMBO is developed primarily to interact with CML documents and therefore supports the disciplines CML covers. CML has already been used to manage documents and information in several disciplines; see the CML FAQ for more information.
JUMBO will also read any XML file and display it as a TOC. However, unless the file is mainly text-based, you will need to provide specific subclasses to make more sense of it.
How does JUMBO work?
Understanding how JUMBO works internally will help you use it effectively. It will help if you are familiar with tools like the Mac interface and graphical file browsers (such as those in Windows).
JUMBO reads information and stores it as an object. An object can contain smaller objects, and so on. For example, a MOL can contain an ATOMSNode, which itself contains ARRAYNodes in which the coordinates, charges, etc. are held. The information is organised as a Tree, with Nodes. We frequently refer to containing nodes as Parents or Ancestors and contained nodes as Children. A family tree is a useful analogy, but remember that a Node can only have one Parent.
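The sketch below shows such a tree in Java. It assumes nothing about JUMBO's real classes: the name TreeNode and its methods are hypothetical. Whenever a node is added somewhere new, it is detached from its old parent, so the one-Parent rule always holds. The sketch also refuses to make a node its own ancestor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch (not JUMBO's API): a document tree in which each node has at most one parent.
class TreeNode {
    private final String name;                 // element name, e.g. "MOL" or "ATOMS"
    private TreeNode parent;                   // null for the root
    private final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) { this.name = name; }

    String getName() { return name; }
    TreeNode getParent() { return parent; }
    List<TreeNode> getChildren() { return Collections.unmodifiableList(children); }

    // Adds a child; a node already attached elsewhere is first detached from its old parent.
    void addChild(TreeNode child) {
        // Refuse to create a cycle (a node may not become a descendant of itself).
        for (TreeNode p = this; p != null; p = p.parent) {
            if (p == child) throw new IllegalArgumentException("node cannot contain its own ancestor");
        }
        if (child.parent != null) {
            child.parent.children.remove(child);
        }
        child.parent = this;
        children.add(child);
    }

    // Ancestors in order: parent first, root last.
    List<TreeNode> getAncestors() {
        List<TreeNode> ancestors = new ArrayList<>();
        for (TreeNode p = parent; p != null; p = p.parent) {
            ancestors.add(p);
        }
        return ancestors;
    }
}
```

For example, after mol.addChild(atoms) and atoms.addChild(array), array.getAncestors() returns ATOMS followed by MOL.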
Conventional software is often command-driven (e.g. 'convert coordinates from fractional to orthogonal'). Objects carry their own 'methods' and apply them when requested. For example, most objects have a display() method, which is customised for that type of object. The method can be tailored so that it recognises features of the object (through a process() routine) and supplies appropriate methods. So a MOLNode works out whether it has 2-D or 3-D coordinates, and the appropriate display() is used.
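The sketch below illustrates the idea. MolNode, addAtom(), display2D() and display3D() are hypothetical names, not JUMBO's API. process() inspects the coordinates, and display() then chooses between a 2-D and a 3-D presentation, so the caller never has to decide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: process() inspects the object; display() picks a presentation to match.
class MolNode {
    // Each atom is {x, y} for 2-D data or {x, y, z} for 3-D data.
    private final List<double[]> coords = new ArrayList<>();
    private int dimension = 0;   // 0 until process() has run

    void addAtom(double... xyz) { coords.add(xyz.clone()); }

    // Treats the molecule as 3-D only if every atom has three coordinates.
    void process() {
        boolean all3d = !coords.isEmpty();
        for (double[] c : coords) {
            if (c.length < 3) { all3d = false; break; }
        }
        dimension = all3d ? 3 : 2;
    }

    // The caller always asks for display(); the object chooses the appropriate method itself.
    String display() {
        if (dimension == 0) process();
        return dimension == 3 ? display3D() : display2D();
    }

    private String display2D() { return "2-D diagram of " + coords.size() + " atoms"; }
    private String display3D() { return "3-D view of " + coords.size() + " atoms"; }
}
```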
Complex objects can examine other objects they may be related to and use methods common to both. So SEQUENCE and FEATURE are separate objects but they are related and JUMBO can map one onto the other.
One attraction of OO software is that objects can carry their own Help routines, and this provides a measure of context-dependent help. Therefore, if you use the Help on a window, it will normally refer to the type of Object. Similarly, each object has an icon (MOL has a ball-and-stick icon), and this is a useful way of examining the structure of the Tree.
What do I need to run JUMBO?
JUMBO is written in Java and will run in any Java environment. However, these environments vary and can be confusing.
Do I have to know SGML/XML/CML?
You are most unlikely to need to know SGML or XML, unless you are actively developing CML. You only need to know CML if you are using it for transfer of information and you are involved in developing that process. If you are using JUMBO to browse CML or non-CML files, you don't need to know anything.
How can I create CML files/documents?
JUMBO will read in many current non-CML files and convert them internally to CML. If you are using the TOC, they can be saved by dragging the CML node to the floppyDisk icon. It's also possible to save subtrees by dragging the appropriate subnode, and this is a useful way of dividing up complex files.
If you wish to write a Foo2CML converter, please contact me and I'll be delighted to help. If you wish your program to output CML, then I'll be happy to advise you.
Can I edit CML documents with JUMBO?
Yes! This is one of the first tools for editing molecular documents. In the TOC you can edit the document tree, for example by adding new child nodes. (At present there is no attribute editor, but I'll make one.)
The software guarantees that the resulting file will be well-formed, but not necessarily valid. (It checks whether a new child would be invalid and rejects the edit if so, but it doesn't yet check the validity of the whole document. I hope someone else will write that.) This facility is most likely to be used for constructing compound documents and is not yet robust. (It cannot, of course, check whether the result of editing is sensible!)
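The sketch below shows the kind of check described here. It assumes a simple table of allowed child elements for each parent; the rules in ALLOWED are illustrative, not taken from the CML DTD, and EditableNode is a hypothetical name. Edits only ever attach nodes to a tree, so the result stays well-formed. The table lets a single edit be rejected when the new child is invalid, but it does not check the validity of the whole document (element order, required children, attributes).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an editable tree that rejects a child the content model does not allow.
class EditableNode {
    // Allowed child element names per parent element name (illustrative rules, not CML's DTD).
    static final Map<String, Set<String>> ALLOWED = Map.of(
        "CML", Set.of("MOL", "XLIST"),
        "MOL", Set.of("ATOMS"),
        "ATOMS", Set.of("ARRAY"));

    final String name;
    final List<EditableNode> children = new ArrayList<>();

    EditableNode(String name) { this.name = name; }

    // Returns false, leaving the tree unchanged, if the new child is invalid at this position.
    boolean addChild(EditableNode child) {
        Set<String> allowed = ALLOWED.getOrDefault(name, Set.of());
        if (!allowed.contains(child.name)) {
            return false;
        }
        children.add(child);
        return true;
    }
}
```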
Does JUMBO understand aromaticity?
This is typical of a wide range of questions about the level of detail and the algorithmic support that JUMBO provides.
Molecular science has many ways of representing concepts, some of which map onto each other and some of which conflict. JUMBO does not try to provide a single 'correct' view, nor does it attempt to translate between different representations. For example, aromaticity can be defined by alternating single/double bonds (Kekulé), by 'aromatic' bonds (e.g. -5 in CCDC files) or by 'aromatic atoms' (e.g. SMILES uses lowercase, as in 'c1ccccc1' for benzene). These are not always reconcilable either at the algorithmic or the human level.
JUMBO does its best to display the information usefully. Ultimately this may mean translating some of these concepts into a JUMBO-centric approach. (Remember that CML does not specify how this type of information should be carried, and allows individual freedom.) It is unlikely that JUMBO, as a core tool, will convert from one representation to another. However, JUMBO makes it much easier for developers to get handles on these components, so that you can extract molecules with a known convention and process them appropriately.
There is a difference between chemical perception and display. Two groups might both agree on the same definition of aromaticity, but one wish to display it in Kekulé form and the other with a central ring. This comes closer to the idea of stylesheets.
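Here is an illustration of recognising a convention rather than converting it. This hypothetical sketch (RingConvention and its enums are not part of JUMBO) reports whether the bonds around a ring are written in Kekulé form or as 'aromatic' bonds. For anything else it deliberately returns UNKNOWN instead of guessing.

```java
import java.util.List;

// Hypothetical sketch: report which aromaticity convention a ring's bond labels follow,
// without converting between conventions.
class RingConvention {
    enum Bond { SINGLE, DOUBLE, AROMATIC }
    enum Convention { KEKULE, AROMATIC_BONDS, UNKNOWN }

    // ringBonds lists the bonds in order around the ring.
    static Convention classify(List<Bond> ringBonds) {
        int n = ringBonds.size();
        if (n < 3) return Convention.UNKNOWN;
        if (ringBonds.stream().allMatch(b -> b == Bond.AROMATIC)) {
            return Convention.AROMATIC_BONDS;
        }
        // Kekule form: single and double bonds alternate all the way round, so the ring must be even.
        if (n % 2 != 0) return Convention.UNKNOWN;
        for (int i = 0; i < n; i++) {
            Bond here = ringBonds.get(i);
            Bond next = ringBonds.get((i + 1) % n);
            boolean alternates = (here == Bond.SINGLE && next == Bond.DOUBLE)
                              || (here == Bond.DOUBLE && next == Bond.SINGLE);
            if (!alternates) return Convention.UNKNOWN;
        }
        return Convention.KEKULE;
    }
}
```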
How does JUMBO use chemical/ MIME?
There are about 20-30 molecular filetypes in use that are regularly stamped with chemical/x- (e.g. chemical/x-pdb). These differ in how consistently they are defined (and in whether a definition is available at all!). If there is a well-defined format for a file, it is fairly straightforward to write a reader for it. I have done this for about 12 types, ranging from small molecules, through crystallography, to protein sequences and theoretical calculations. None of these has been verified against a formal specification, but many work well in practice.
I am extremely keen to cooperate with anyone who is formally responsible for a molecular file type. For very simple ones (e.g. only atoms and bonds) it is not a huge amount of work to write a reader, but for large outputs (e.g. MOPAC) it is very substantial. [I have included partial implementations of some of these].
JUMBO must take a strict approach to chemical/ types; for example, chemical/x-pdb will only parse files that are valid against the PDB documentation. In this case (and maybe in others) I have relaxed this by providing a separate type, chemical/x-pdb-fuzzy, which attempts to read a PDB file and make whatever sense it can of it (normally only a few ATOM cards).
If there is an implementation of a chemical/ filetype and you receive a file with this stamp from a server, then you can configure JUMBO as a helper application in the normal way (at present you will need the JDK, but I'm working on how it can be integrated into a browser).
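The difference between the strict chemical/x-pdb reader and the chemical/x-pdb-fuzzy one can be sketched as follows. This is a hypothetical FuzzyPdbReader, not JUMBO's implementation. It reads the atom name and the x, y, z coordinates from their fixed PDB columns and keeps every ATOM record it can parse. Anything else is skipped, where a strict reader would reject the file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a "fuzzy" PDB reader: keep whatever ATOM records can be understood,
// skip everything else instead of rejecting the whole file.
class FuzzyPdbReader {
    record Atom(String name, double x, double y, double z) {}

    static List<Atom> read(List<String> lines) {
        List<Atom> atoms = new ArrayList<>();
        for (String line : lines) {
            // PDB fixed columns (1-based): atom name 13-16, x 31-38, y 39-46, z 47-54.
            if (!line.startsWith("ATOM") || line.length() < 54) continue;
            try {
                atoms.add(new Atom(
                    line.substring(12, 16).trim(),
                    Double.parseDouble(line.substring(30, 38).trim()),
                    Double.parseDouble(line.substring(38, 46).trim()),
                    Double.parseDouble(line.substring(46, 54).trim())));
            } catch (NumberFormatException e) {
                // A strict reader would reject the file here; the fuzzy reader just moves on.
            }
        }
        return atoms;
    }
}
```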
How can I use my *.foo file with JUMBO?
If *.foo is one of the types in the chemical/* MIME list, there is a good chance that a full or partial converter has been written. If not, one will need to be written. I shall give an example of how to do this soon, but the main task is to identify the information components within your *.foo file, such as molecules, scalar data, text, citations, arrays, tables, graphs, dates, URLs, etc. You must then write a parser that reads one of your files and extracts this information. For each of these components there is a simple routine that allows you to poke the object into a CML object in JUMBO's memory.
At this stage you will need to think about the logical structure of your information. What data belongs to this molecule (e.g. a date)? Should all the annotations be in a separate section (e.g. an XLIST?). You will also find that you start creating your own DICTNAMEs for information and so you should draw up a glossary of all those terms used (if you have a user manual this should be in it anyway).
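Here is a minimal sketch of the parsing stage. It assumes an invented *.foo layout with keyword lines such as "TITLE ..." and "ATOM <element> <x> <y> <z>"; both FooParser and this format are hypothetical. Each line is sorted into a component list, and anything not recognised is kept as plain text rather than lost. Poking the components into CML objects would then use JUMBO's own routines, which are not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical first stage of a Foo2CML converter. The *.foo layout assumed here
// ("TITLE <text>" and "ATOM <element> <x> <y> <z>" lines) is invented for illustration.
class FooParser {
    record Atom(String element, double x, double y, double z) {}

    final List<String> titles = new ArrayList<>();
    final List<Atom> atoms = new ArrayList<>();
    // Lines we cannot interpret yet are kept as text rather than thrown away.
    final List<String> unparsed = new ArrayList<>();

    void parse(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;
            String[] f = trimmed.split("\\s+");
            Atom atom = f[0].equals("ATOM") ? toAtom(f) : null;
            if (f[0].equals("TITLE")) {
                titles.add(trimmed.substring("TITLE".length()).trim());
            } else if (atom != null) {
                atoms.add(atom);
            } else {
                unparsed.add(line);
            }
        }
    }

    // Returns null if the fields do not form a valid "ATOM element x y z" line.
    private static Atom toAtom(String[] f) {
        if (f.length != 5) return null;
        try {
            return new Atom(f[1], Double.parseDouble(f[2]),
                    Double.parseDouble(f[3]), Double.parseDouble(f[4]));
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```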
If you are in charge of the generation of *.foo files (e.g. they are output by your software or instrument) consider adding a CML option to the system. This is much easier than writing a parser and is not a lengthy process.
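To give an idea of how small such an output option can be, here is a hypothetical writer. The element names MOL, ATOMS and ARRAY come from this FAQ. The TITLE attributes and the layout of the arrays are assumptions made for illustration; take the real markup from the CML documentation. Whatever the markup, any XML output must escape special characters correctly, so the sketch escapes &, <, > and double quotes.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of an XML/CML output option for your own program.
// Element names follow this FAQ; the attributes and array layout are illustrative assumptions.
class SimpleCmlWriter {
    // Escapes characters that may not appear literally in XML text or attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String writeMolecule(String title, List<String> elements, List<double[]> xyz) {
        StringBuilder sb = new StringBuilder();
        sb.append("<MOL TITLE=\"").append(escape(title)).append("\">\n");
        sb.append("  <ATOMS>\n");
        sb.append("    <ARRAY TITLE=\"elements\">")
          .append(escape(String.join(" ", elements))).append("</ARRAY>\n");
        // One space-separated array per coordinate axis.
        for (int axis = 0; axis < 3; axis++) {
            StringBuilder values = new StringBuilder();
            for (double[] p : xyz) {
                if (values.length() > 0) values.append(' ');
                values.append(String.format(Locale.ROOT, "%.4f", p[axis]));
            }
            sb.append("    <ARRAY TITLE=\"").append("xyz".charAt(axis)).append("\">")
              .append(values).append("</ARRAY>\n");
        }
        sb.append("  </ATOMS>\n");
        sb.append("</MOL>\n");
        return sb.toString();
    }
}
```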
Note also that you don't have to convert every piece of information initially. In some cases it can be held as text (XVAR or XHTML) until you have decided what to do with it (an example is the REMARK cards in the CML version of PDB). But the more markup you add, the more valuable it will be to your readers.
I am not able (and I'm not the right person) to write these converters! However I will give active help to anyone interested in enabling JUMBO to read a ChemFOO file. Once it's done, JUMBO can output the result in CML so that any ChemFOO file can be converted into CML (though the reverse is not normally true unless the CML file was created directly from a ChemFOO file). If the ChemFOO file is simple (e.g. it contains a 'small molecule') it shouldn't be a major effort. If it's 100 pages of FORTRAN output from a simulation or theoretical calculation it's still straightforward but will need to be planned as a project (possibly with the OMF).
Can JUMBO draw 2-D molecular structures?
CML can hold 2-D molecular information in a variety of ways. JUMBO is able to use 2-D coordinates to draw diagrams, and a tool to draw 2-D diagrams from a connection table is under way. At present JUMBO cannot convert SMILES to a connection table (because I'm still planning the architecture of the classes). Remember that JUMBO is not intended to duplicate systems that already exist. It's likely that the 2-D display will evolve to support atom and bond picking, because these will be essential for markup, and it might evolve some simple editing tools for the same reason.
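Drawing from 2-D coordinates is largely a matter of fitting them into the window. This is a hypothetical sketch; Diagram2D and toScreen are not JUMBO names. It scales both axes by the same factor so the aspect ratio is kept, leaves a margin, and flips y because screen coordinates grow downwards.

```java
// Hypothetical sketch: fit 2-D molecular coordinates into a width x height drawing area.
class Diagram2D {
    // coords[i] = {x, y}; returns pixel positions {px, py} for each atom.
    static int[][] toScreen(double[][] coords, int width, int height, int margin) {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] c : coords) {
            minX = Math.min(minX, c[0]); maxX = Math.max(maxX, c[0]);
            minY = Math.min(minY, c[1]); maxY = Math.max(maxY, c[1]);
        }
        // Guard against a zero span (a single atom, or all atoms on one line).
        double spanX = Math.max(maxX - minX, 1e-9);
        double spanY = Math.max(maxY - minY, 1e-9);
        // One scale for both axes keeps bond angles undistorted.
        double scale = Math.min((width - 2.0 * margin) / spanX, (height - 2.0 * margin) / spanY);
        int[][] pixels = new int[coords.length][2];
        for (int i = 0; i < coords.length; i++) {
            pixels[i][0] = (int) Math.round(margin + (coords[i][0] - minX) * scale);
            // Screen y grows downwards, so measure molecular y up from the bottom margin.
            pixels[i][1] = (int) Math.round(height - margin - (coords[i][1] - minY) * scale);
        }
        return pixels;
    }
}
```

Take coordinates (0,0), (2,0) and (0,1) in a 220 x 220 area with a 10-pixel margin. The x range limits the scale to 100 pixels per unit, so the atoms land at (10,210), (210,210) and (10,110).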
Can JUMBO draw 3-D molecular structures?
CML can hold 3-D molecular information in a variety of ways. JUMBO can use some of these to create single molecules, but does not yet apply crystal or molecular symmetry. I shall probably develop a simple packing diagram generator, since I need that to test CML, but it's unlikely to generate a crystallochemical unit or calculate intermolecular contacts. Remember that JUMBO is not intended to duplicate systems that already exist. It's likely that JUMBO will evolve to support the simple interrogation of molecular components (e.g. ATOMS and BONDS can be selected and used for creating further markup). I am not going to add rendering to JUMBO, but others are welcome to subclass it for that purpose.
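Crystallographic data usually come as fractional coordinates. These must be converted to orthogonal (Cartesian) coordinates before a molecule can be built. Below is a sketch of the standard conversion, assuming the common convention that a lies along x and b lies in the xy-plane. CellGeometry is a hypothetical name, and no symmetry operations are applied.

```java
// Hypothetical sketch: fractional -> orthogonal coordinates for a unit cell
// (a along x, b in the xy-plane). Cell angles are in degrees.
class CellGeometry {
    static double[] toOrthogonal(double[] frac, double a, double b, double c,
                                 double alphaDeg, double betaDeg, double gammaDeg) {
        double ca = Math.cos(Math.toRadians(alphaDeg));
        double cb = Math.cos(Math.toRadians(betaDeg));
        double cg = Math.cos(Math.toRadians(gammaDeg));
        double sg = Math.sin(Math.toRadians(gammaDeg));
        // v = V / (abc): the volume of a cell with unit edges and the same angles.
        double v = Math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg);
        double fx = frac[0], fy = frac[1], fz = frac[2];
        // Rows of the upper-triangular orthogonalisation matrix applied to (fx, fy, fz).
        return new double[] {
            a * fx + b * cg * fy + c * cb * fz,
            b * sg * fy + c * (ca - cb * cg) / sg * fz,
            c * v / sg * fz
        };
    }
}
```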
Can I search CML documents with JUMBO?
Yes! JUMBO has an architecture that supports Structured Document queries. There is a standard language for this (SDQL), and it's likely that JUMBO and other XML applications will support it. At present there is a rather crude prototype and no very good way of viewing the output, but that will change.
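Structured queries are possible because the document is a tree. The simplest case is collecting every element with a given name, shown here with a hypothetical QueryNode class. This is not SDQL. It is only the kind of depth-first walk in document order that a real query engine is built on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: find every node with a given element name, in document order.
class QueryNode {
    final String name;
    final List<QueryNode> children = new ArrayList<>();

    QueryNode(String name, QueryNode... kids) {
        this.name = name;
        children.addAll(List.of(kids));
    }

    List<QueryNode> findAll(String target) {
        List<QueryNode> hits = new ArrayList<>();
        collect(this, target, hits);
        return hits;
    }

    // Pre-order depth-first walk: a node is visited before its children.
    private static void collect(QueryNode node, String target, List<QueryNode> hits) {
        if (node.name.equals(target)) hits.add(node);
        for (QueryNode child : node.children) collect(child, target, hits);
    }
}
```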
Can I run CML from Netscape/MSIE, etc?
There are several possibilities.
Browsers recognise the MIME type of a document and can be configured to launch an appropriate helper application. It's common for a browser to launch RasMol when it gets a file of type "chemical/x-pdb", and it would be simple to configure your browser to recognise "chemical/x-cml" and launch JUMBO or some other tool.
It would be possible to write (or convert) a JUMBO plugin. I don't intend to do this myself - offers?
Many browsers are java-enabled. This means that if the chemical/x-cml file comes from a server which also has the JUMBO *.class files, you can view the files automatically in your browser with no effort! I shall provide an example of this and I hope to get collaborators who provide other CML-viewable information. Because JUMBO can convert other file types, it can be used to view a wide range of molecular data files. For some of these - such as Quantum Chemical calculations - there are no current viewers.
The WWW is moving towards the use of SGML and I expect browsers to become more SGML/XML-aware. CML is ideally placed to take advantage of this.
What does JUMBO NOT do?
JUMBO will attempt to support all the requirements of CML, and therefore cannot do anything that CML can NOT do (such as parsable mathematics).
JUMBO cannot support some other aspects of CML at present (e.g. rendering hypertext and tables). These will come as soon as I either find time or find code I can bolt on. Internal hyperlinking is also not yet robust. Some Elements (e.g. SEQUENCE) may not have all the features you would expect.
Are there any known bugs?
Zillions! (Actually rather more than that.)
Bugs come in various categories, and there are many in each, but that will do for starters.
How is JUMBO likely to develop?
JUMBO has been written primarily to develop CML and prototype applications. It also has an important role in developing the classes for each CML Element. (For example, if I add a FIGURE Element, there will need to be a FIGURENode to test the idea out.) These classes have both display and processing functionality. At present the two are slightly intertwined, but I will probably separate them as far as possible. For each class there is a process() and a display() method. The first deals with things like counting subelements (atoms, etc.) and converting them between different approaches (e.g. CT to SMILES). The second should be largely independent. My own interest is in the data structures rather than rendering, so if you want to draw attractive molecules, subclass the display() method of MOL.
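Here is a sketch of the split between process() and display(). The hypothetical Mol and FancyMol classes stand in for JUMBO's MOL node; they are not its real classes. The subclass replaces only display() and inherits process() unchanged.

```java
import java.util.List;

// Hypothetical sketch: process() interprets the data, display() presents it.
class Mol {
    protected final List<String> atomElements;
    protected int atomCount;

    Mol(List<String> atomElements) { this.atomElements = List.copyOf(atomElements); }

    // Processing: count the atoms (a real node would do much more).
    void process() { atomCount = atomElements.size(); }

    // Default, deliberately plain display.
    String display() { return "MOL with " + atomCount + " atom(s)"; }
}

// A developer interested in rendering overrides only display(); process() is inherited unchanged.
class FancyMol extends Mol {
    FancyMol(List<String> atomElements) { super(atomElements); }

    @Override
    String display() {
        return "Rendered " + String.join("-", atomElements) + " (" + atomCount + " atom(s))";
    }
}
```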

JUMBO is offered as a tool for developing CML and related approaches, and I'm extremely keen that this is done collaboratively. I believe this is now possible.
Can I develop my own applications with JUMBO?
Yes. The exciting thing about Java is that you can develop your code without having the source code of JUMBO. You simply subclass the appropriate classes. For example, if you have a CHEMFOO filetype and want to write a reader for it, you subclass ChemTree like this:
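package mypackage;

// ChemTree and StringList are JUMBO classes, so JUMBO must be on the classpath.
public class ChemFOO extends ChemTree {

    // Builds the tree from a ChemFOO file that has already been read into lines.
    // (If ChemTree has no no-argument constructor, call the appropriate super(...) first.)
    public ChemFOO(String filename, StringList lines) {
        // ...my code: parse the lines and add the information to the tree...
    }

    // Interprets the tree once it has been built (e.g. counting atoms).
    public void process() {
        // ...more code...
    }
}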
I will give some examples of how to do this. JUMBO will have an API that you can access and this will lead to rapid development of your own algorithms because you don't have to worry about input/output and other housekeeping.
Are there restrictions on the use of JUMBO?
There is no charge for JUMBO, but it is NOT in the public domain. That means that you may not alter documents in this distribution, nor distribute them to third parties without permission. You may, of course, point them to this page, and this should be used as the definitive reference.
JUMBO consists of a set of Java classes and these may be freely used over the Internet. I intend that their distribution is managed by the Open Molecule Foundation and the intention is that they will be free, but not in the public domain. The classes may not be redistributed without permission but the OMF is actively looking at ways of doing this which will be beneficial to the community. If you wish to include the classes in a product, please contact me.
If you wish to mount the system on your server, there will be a distribution kit, which I hope will be free. The API for the classes will be published. You may therefore extend the classes by standard mechanisms without needing to have source code. This is one of the great benefits of Java and means that the community can rely on a single, stable, core on which they can build. If the extensions are widely valuable it may be possible to incorporate them in future versions.
There will be a community of committed developers who will have access to the source code. This is likely to be managed through the OMF.

© Peter Murray-Rust, 1996, 1997