Chemistry and the WWW

Henry S Rzepa, Benjamin J Whitaker, and Mark J Winter

Department of Chemistry, Imperial College, London, SW7 2AY, UK
School of Chemistry, University of Leeds, Leeds LS2 9JT, UK
Department of Chemistry, The University of Sheffield, Sheffield S3 7HF, UK


The World Wide Web Internet system offers great potential to chemists for both the dissemination of research results and for teaching.


The purpose of this article is to acquaint chemists with some basic features of the World Wide Web (WWW), to instil the notion that there is something in the WWW for the chemical community, and to encourage participation. It is not in any way a definitive description of the WWW.

Hypertext network

By now many will be aware of the existence of a computer network called the Internet, and perhaps to some extent of the World Wide Web (WWW). The WWW was first conceived at the European Laboratory for Particle Physics (CERN) in 1989 and first implemented in late 1990. It is all about accessing information on remote computers. Until relatively recently, accessing information on other Internet-linked computers has been perceived as only for wireheads typing out bizarre and cryptic textual incantations on a keyboard through the night until the early hours. If that was ever true, it is no longer completely true.

Early mechanisms for transferring files from remote computers involved a method called file transfer protocol (FTP), whose operation was not easy for most people. A further disadvantage is that the contents of a text file cannot be inspected without transferring the whole file to the user's own computer. Another method of file access called GOPHER is more sophisticated and allows the user to read text files, at least, during transfer, as well as allowing the transfer of graphics files, application files, and formatted text files. These days, implementations of FTP and GOPHER on sophisticated microcomputers such as the Macintosh, using programs such as Fetch, Anarchie, and TurboGopher, are superb, with these programs taking care of all the difficult commands for the user.

At the heart of the WWW system is another file transfer method called HTTP (Hypertext Transfer Protocol). At its simplest, the WWW system is a 'web' of linked 'hypertext' documents. By hypertext, we mean here that clicking the mouse on a highlighted piece of text in a document displayed on a computer screen causes the computer to retrieve and display some new document. Its beauty is its interface. Once the user's computer is linked in some way to the Internet (this is likely to involve the novice seeking help), the user can make very effective use of the WWW system with little more knowledge of computers than how to manipulate a mouse.

The new document might reside on a completely different computer to the first document. This basic concept of hypertext is now extended so that the WWW has evolved into an on-line hypermedia system. Hypermedia documents are hypertext documents that contain, in addition to text, embedded graphics, movie clips, and sound clips.

Servers and browsers

In order to read documents on the WWW, the user's computer must be connected to the Internet. In many institutions such connections will already exist, while the home user may have to resort to a modem connection. The user also requires a computer program called a browser in order to view the WWW documents. Good examples of such programs are NCSA Mosaic and Netscape Navigator. These are available for many hardware systems (such as Macintosh, UNIX workstations, or PC). For educational users at least, some of these programs possess the endearing quality that, while copyrighted, a license to use them is often free. In addition to coping with hypertext documents, many of these programs are also capable of transferring files by FTP and GOPHER (and other important methods not discussed here) using the same 'click-and-get' interface.

The documents which the user 'reads' are provided, or 'served', by software called 'server' programs mounted on remote computers. These programs may be mounted on any of several hardware platforms such as UNIX workstations, Macintoshes, or even PCs. There is no need for the browser program to reside on a similar piece of hardware to the server program. This is a key feature - the transparency of the interface between different pieces of hardware.

Click and go

A document being read by a user will normally contain one or more 'hotspots' (usually colour coded). Clicking the mouse cursor on a hotspot causes something to happen. Typically, the user is taken automatically to a different document. This is the basic concept of 'hypertext' and will be familiar to those who have used 'HyperCard' or 'ToolBook' on Macs and PCs. The key feature of a WWW hypertext document is its linking via these hotspots to other documents which might well be written by completely different authors in a different country. In a sense, hotspot linking in this fashion is rather like having instant access to another document referenced in a footnote of the first document.

There are other possibilities, however. Perhaps clicking on the hotspot causes a file to be transferred (downloaded) automatically to the user's computer. More advanced facilities allow the user to search a database for documents containing specific words chosen by the user. The user is required to fill in boxes in an on-line 'form' with a short piece of text and a remote computer program does the rest. Even when searching databases on another continent, the search normally takes only a few seconds. The user need not know how the search works, and may not even know in which country the search computer is located.
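As a rough illustration of what happens when such a form is submitted, the sketch below shows one common way for a browser to pack the contents of the boxes into the address it sends to the remote program. The server address (example.org) and the field name 'words' are hypothetical, chosen only for illustration, and no real search service is contacted.

```python
from urllib.parse import urlencode, parse_qs


def build_search_url(base_url, fields):
    """Append the form fields to the address as an encoded query string."""
    return base_url + "?" + urlencode(fields)


def read_fields(url):
    """What the remote program might do: recover the fields from the address."""
    query = url.split("?", 1)[1]
    # parse_qs returns a list of values per field; keep the first of each.
    return {name: values[0] for name, values in parse_qs(query).items()}


# Hypothetical server address and field name, for illustration only.
url = build_search_url("http://example.org/search", {"words": "periodic table"})
print(url)
print(read_fields(url))
```

The user sees none of this encoding; the browser and the remote program handle it between them.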

It is not uncommon to read suggestions that in some way on-line hypertext books will replace printed books. This seems somewhat unlikely as a near-future scenario, simply on grounds of convenience. However, on-line hypertext documents do have some advantages over the printed word. Hypertext documents contain text, but can also contain embedded graphics (such as reaction schemes), movie clips (perhaps for animating molecular vibrations), and even sound clips (good for commentaries). If nothing else, these embellishments can make the document more interesting and hopefully informative. Hypertext documents containing these features are referred to as hypermedia documents. At the user's choice, the graphics are displayed automatically. The sounds and movie clips are activated with a single mouse click on an appropriate hotspot. Display and utilization of these features is expensive in terms of transfer times since sound and graphic files are often large.

HTML

No programming experience is required in order to construct documents to be placed on a WWW server. Documents on the WWW are written in a 'markup language' called HTML (Hypertext Markup Language). These documents are plain text files which any word or text processor is capable of writing to disk. The hotspots and various formatting options such as headings or emphasized text are special plain text strings (called tags or elements) contained between <angle> brackets within the document. A number of on-line tutorials and guides for writing HTML are available.

For instance, the start of a major heading (heading level 1) is signalled by <H1> and terminated by </H1>. A short example of HTML text is given in Fig. 1. The tags are recognized by the browser program when it comes across them, and dealt with in an appropriate fashion. So, headings might appear larger and bolder than normal body text, for instance. The tags are not displayed to the user; they are instructions to the browser program on how to display the file.

Figure 1. How a short segment of HTML is displayed upon the screen
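To make the division of labour between tags and browser concrete, here is a minimal sketch in Python of a toy 'browser' that reads a fragment of HTML, hides the tags, and applies its own display rules. The rules chosen (capitals for a level 1 heading, asterisks around emphasized text) and the sample fragment are assumptions made for illustration; they are not how any real browser renders a page.

```python
from html.parser import HTMLParser


class TinyRenderer(HTMLParser):
    """Collects the text of a document and applies a crude rule per tag."""

    def __init__(self):
        super().__init__()
        self.lines = []      # rendered output, one entry per run of text
        self.open_tags = []  # tags currently open (names arrive in lower case)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            self.open_tags.remove(tag)

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if "h1" in self.open_tags:
            text = text.upper()        # stand-in for 'larger and bolder'
        elif "em" in self.open_tags:
            text = "*" + text + "*"    # stand-in for emphasized text
        self.lines.append(text)


def render(html_source):
    """Return the displayed text runs for a fragment of HTML."""
    parser = TinyRenderer()
    parser.feed(html_source)
    parser.close()
    return parser.lines


sample = "<H1>Chemistry and the WWW</H1><P>Tags are <EM>not</EM> shown.</P>"
for line in render(sample):
    print(line)
```

A different browser could apply quite different rules to the same file, which is why the appearance of a WWW document is not fixed by its author.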

The publisher of a book has complete control over the appearance of the publication, but this is not the case for a WWW publication. This feature of the WWW takes a little getting used to. While the text content is fixed, the browser, not the publisher, has control over the appearance of the document in terms of, for instance, the text font, font sizes, some colours, and heading formatting.

URL

One of the most important constructions in HTML is the anchor tag - a text string which defines a hotspot and contains information referred to as a 'URL'. The URL (Uniform Resource Locator) uniquely specifies an object (such as another file) on the Internet. In effect, the URL is the address of a document on the Internet. The form of a URL is shown in Fig. 2.

Figure 2. The structure of a typical URL

The part of the URL before the colon (HTTP in this case) specifies an access method or protocol. The part of the URL after the two slashes and before the first single slash is the address of the machine on which the target document is held. The remaining part is the directory address of the file on that machine.

In the case of this URL, the browser would retrieve and display the 'home page' of the Department of Chemistry at the University of Sheffield. The URLs 'gopher://acsinfo.acs.org/1' and 'gopher://jchemed.chem.wisc.edu/1' represent the gopher addresses of the American Chemical Society gopher server and the Journal of Chemical Education gopher server. While the documents at these addresses can be read by WWW browser programs, gopher documents are not hypertext documents. The URL 'telnet://bids.ac.uk/' activates an interactive TELNET session to the BIDS system at Bath. The URL 'ftp://ftp.shef.ac.uk/pub/uni/academic/A-C/chem/' gives a list of files available by FTP at the Sheffield Macintosh Archive of Chemistry Software.
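The three parts of a URL described above can be picked apart mechanically. The following short Python sketch, offered only as an illustration, splits some of the URLs quoted in this article into access method, machine address, and directory address. The function name describe_url and the dictionary keys are our own labels, not standard terminology.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the three parts discussed in the text."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,  # before the colon
        "machine": parts.netloc,   # between the two slashes and the next slash
        "path": parts.path,        # the remainder: location on that machine
    }


for url in (
    "ftp://ftp.shef.ac.uk/pub/uni/academic/A-C/chem/",
    "gopher://acsinfo.acs.org/1",
    "telnet://bids.ac.uk/",
):
    print(describe_url(url))
```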

HTML version 3

The markup language HTML is an evolving standard. It seems that the next version will be version 3. Standards for it are not yet fixed, making the task of those who write browser programs more difficult. Currently, most browsers support only some features of HTML version 3. The chemist has always placed great demands upon conventional typesetters, requiring a variety of special and Greek characters, not to mention sub- and superscripts and more exotic symbols. Standards have not been agreed for these requirements as yet, and most are not implemented. However, some browser programs such as NCSA Mosaic do at least support sub- and superscripts. At this stage, only a limited number of Greek characters are supported by browser programs, while the implementation of equations is still at a discussion stage (see Box 1). It is only a question of time before these features are fully implemented.

Box 1 is here

Fortunately, it is not always necessary to retype existing documents in order to create HTML files. There are many utilities that are capable of converting existing, properly constructed word-processor files into the required HTML format. For instance, a number of word processors such as Microsoft Word read and write a format called RTF (rich text format). Programs (called filters) such as rtftohtml convert, as the name suggests, such files into HTML format. Filters also exist to convert LaTeX files to HTML.

Chemistry on the WWW

So, what is there in all this for the chemist? The possibilities for chemists are numerous, profound, and barely perceived. Currently, well over 200 Departments of Chemistry around the world have some kind of WWW site and are listed at http://www.shef.ac.uk/uni/academic/A-C/chem/chemistry-www-sites.html. Most sites are in the USA, England, and Germany. At least 25 chemistry departments in older universities in the UK have their own sites. Arguably, most of these were developed by and are maintained by enthusiastic individuals as a side-line to their normal tasks of research and teaching. Most sites are relatively simple and typically advertise, often elegantly, details of undergraduate courses, postgraduate opportunities, and academic staff research interests. This is clearly useful to both the reader and the department providing the information. However, there are some innovative and interesting chemical uses of the WWW.

A natural use of hypertext is in teaching. There are a number of preliminary efforts in this direction around the world, for instance at Duke University, the University of Leeds, 'The Virtual Classroom' at Rensselaer Polytechnic Institute, and, perhaps in particular, Virginia Tech. A useful list of resources is maintained at: http://www-hpcc.astro.washington.edu/scied/science.html.

The University of Sheffield's contribution to chemistry on the WWW is WebElements - an evolving periodic table database developed in Sheffield and now 'mirrored' at nine other sites (three in the USA, two in Germany, one each in Austria, Brazil, Australia, and China) around the world. The WebElements home page presents a periodic table (Fig. 3) which, when an element is clicked upon, gives the user information on that element.

Figure 3. The Periodic Table on the WWW

The user can also view attractive and informative graphical representations of the data (Figures 4 and 5). These representations were created originally using MacElements, a periodic table database program running on a Macintosh, but can be viewed on any other hardware system. While not yet implemented, one can envisage a situation in the near future where such graphical representations are created on-line (for a set of elements chosen by the browser) at the server computer for transmission to the browser. WebElements also illustrates the concept of clickable graphics. When the user is presented with a periodic table such as the representation in Figures 4 and 5, clicking on an element will take the user to a file containing data for that element. Clickable graphics also offer interesting and interactive ways to unfold the complexities of reaction sequences (Box 2).

Figure 4. Trend in covalent radius as displayed by WebElements on the WWW

Figure 5. Trend in melting point as displayed by WebElements on the WWW

Box 2 is here

WebElements already offers some other interactive features, currently two simple on-line calculation services - isotope patterns and element percentages. The user fills in a simple on-screen box with a chemical formula and the browser program then requests that the server program executes the calculation. In turn, the server program requests a slave program (in this case a component of MacElements, running on a Macintosh) to execute the calculation and to return the result via an automatically generated HTML document (Fig. 6). All the work is hidden from the user. These calculation services were developed originally as proof-of-concept devices and can clearly be extended in scope. Figure 6 also demonstrates NCSA Mosaic's support for subscript characters.

Figure 6. An isotope pattern calculated over the WWW
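The kind of calculation such a service performs can be sketched in a few lines. The Python example below computes element percentages by mass for a simple formula. It is an illustration under stated assumptions and not the MacElements code: only four elements are tabulated (with rounded standard atomic weights), formulae with brackets are rejected, and the function names are our own.

```python
import re

# Rounded standard atomic weights for a few elements; extend as required.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Return {symbol: count} for a simple formula such as 'C6H12O6'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def element_percentages(formula):
    """Return {symbol: percentage by mass}; unknown symbols raise KeyError."""
    counts = parse_formula(formula)
    masses = {s: ATOMIC_WEIGHTS[s] * n for s, n in counts.items()}
    total = sum(masses.values())
    return {s: 100.0 * m / total for s, m in masses.items()}


for symbol, percent in element_percentages("C6H12O6").items():
    print(f"{symbol}: {percent:.2f}%")
```

On a server, a routine of this kind would be called with the text from the user's form and its result wrapped in HTML before being returned to the browser.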

A number of sites, particularly in the USA (for instance Duke University, Virginia Tech, the University of California at Berkeley, and the Edison project at Columbia University), are using the WWW as a shop window for chemistry multimedia projects which, once obtained by the user, are designed to run on a local machine.

Chemistry is a very visual subject. A WWW site called the Chemist's Art Gallery at http://www.csc.fi/lul/chem/graphics.html in Finland gives hypertext links to many examples of visualizations in chemistry from various groups around the world. One desirable aim of the chemist would be the transfer of small files containing molecule data (such as coordinates) so that they can be processed and manipulated on the user's machine. This is part of the aim of the chemical MIME project based at the Departments of Chemistry at the University of Leeds and Imperial College - a mechanism (Boxes 3 and 4) that has been proposed to enable the intelligent handling of molecular information using 'helper' programs.

Box 3 is here

Box 4 is here
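The idea of handing molecular files to 'helper' programs can be sketched as a pair of lookup tables: one from file type to a chemical content type, and one from content type to the helper to launch. In the Python sketch below the type names follow the 'chemical/x-...' pattern as we understand the proposal, and the helper name 'molecule-viewer' is a placeholder; both tables should be read as assumptions for illustration, not as a definitive list.

```python
import os

# Assumed mapping from file extension to chemical content type.
CHEMICAL_TYPES = {
    ".pdb": "chemical/x-pdb",
    ".xyz": "chemical/x-xyz",
}

# Placeholder helper names; a real browser would be configured by the user.
HELPERS = {
    "chemical/x-pdb": "molecule-viewer",
    "chemical/x-xyz": "molecule-viewer",
}


def choose_helper(filename):
    """Return (content type, helper) for a file, or (None, None) if unknown."""
    extension = os.path.splitext(filename)[1].lower()
    content_type = CHEMICAL_TYPES.get(extension)
    return content_type, HELPERS.get(content_type)


print(choose_helper("caffeine.pdb"))
print(choose_helper("notes.txt"))
```

In practice the server announces the content type along with the file, and the browser consults its own table of helpers in much this way.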

The WWW makes an ideal interface for providing gathered information to interested individuals on given topics. As examples, everything you ever wished to know about crystallography is to be found at http://www.unige.ch/crystal/crystal_index.html, the WWW Virtual Library: Crystallography site in Switzerland. The University of York's Department of Chemistry NMR service maintains a useful WWW site. Some of the information is special to local users but external users will find a good list of NMR-related software and information.

What next?

The only way to discover more about the WWW is to browse the WWW for a while. One useful and sophisticated feature of the WWW is the ability to search the WWW for documents whose names or content contain specific text. Both the BBC site at http://www.bbcnc.org.uk/babbage/iap.html and the University of Sheffield's Department of Information Studies provide good interfaces to and explanations of these 'search engines'. Start off by entering the single word 'chemistry' as a target piece of text. Browsing around the WWW can be as addictive as browsing around any good book shop, and perhaps far more time consuming.

This article is on the WWW

This article is available as a WWW document on-line with the URL: http://www.shef.ac.uk/uni/academic/A-C/chem/www-publications/chem-in-brit-95.html. You are encouraged to read the document on-line. The on-line version contains active links embedded within the text whose positions in this article are indicated as underlined text. It also contains a dynamic appendix consisting of a list of 'highlights' on the WWW, to which innovative chemistry links will be added as the authors become aware of their existence.

Readers wishing to obtain computer software or advice related to the WWW are requested respectfully to consult their institution's computer advisory service and not the authors of this article. If you already have access to the WWW, you should have a look at the WWW FAQ (frequently asked questions) at http://sunsite.unc.edu/boutell/faq/www_faq.html and at the 'YAHOO' site at http://akebono.stanford.edu/yahoo/Computers/World_Wide_Web/.

