The CLIC Consortium. A Flagship Chemistry Electronic Journal

Annual Report

May 1995 - July 1996

H. S. Rzepa (Project Director), D. James (Project Manager), J. M. Goodman (Cambridge), B. J. Whitaker (Leeds).


The consortium is a partnership between the Royal Society of Chemistry and the chemistry departments at the Universities of Leeds and Cambridge and at Imperial College. Each partner has a well-defined role in the project. In this report we present an overview of the activities and progress of the project, with a summary of the contributions from the individual sites, including each site's analysis of what it has learnt from the process of implementation, together with interim evaluation results. The report concludes with a statement of our objectives for the next phase of the project.

1. Activities and Progress.

The first stage of the project, which includes the first two designated deliverable milestones in our original proposal, had the following major activities.

1.1 The project started with the appointment of Omer Casher (Imperial), Christopher Hildyard (Leeds) and David Riddick (Cambridge). A dedicated SGI Unix server, together with disk sub-systems and ATM communications technology, was purchased for the project. The server is currently located at Imperial College and is accessible as http://chemcomm.clic.ac.uk/. Six computer systems for libraries, and licences for various software products, were also purchased. These purchases correspond to those indicated in the original proposal.

1.2 The initial activity was to establish a distributed and scalable infrastructure for handling an electronic journal on a regular basis. This has involved developing SGML technologies for processing material derived from the existing printed journal and converting it, in as flexible a manner as possible, to HTML content for distribution through a Web server. This component of the project is led by Leeds University.

1.3 The second theme was the development of server technologies for creating, indexing, hyperlinking and maintaining a scalable document database, using the Hyperwave server. In addition to these generic technologies, we have started a development program for delivering active molecular content as part of the journal, based on the chemical MIME standards we have promoted and on the VRML and Java technologies. This component of the project is led by Imperial College.
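
The chemical MIME approach rests on labelling each molecular file with a media type, so that the reader's browser can pass it to a suitable helper application or plug-in. A minimal Python sketch of the server side is given below. The extension-to-type table is an assumption for illustration (types from the chemical/x- family, plus x-world/x-vrml, the type commonly used for VRML at the time) and is not the project's actual server configuration.

```python
import os

# Illustrative extension-to-type table (an assumption, not the project's
# actual server configuration).
CHEMICAL_MIME_TYPES = {
    ".pdb": "chemical/x-pdb",
    ".xyz": "chemical/x-xyz",
    ".mol": "chemical/x-mdl-molfile",
    ".wrl": "x-world/x-vrml",
}


def content_type(filename: str) -> str:
    """Return the media type a server would declare for a molecular file."""
    _, ext = os.path.splitext(filename)
    # Unknown extensions fall back to a generic binary type.
    return CHEMICAL_MIME_TYPES.get(ext.lower(), "application/octet-stream")


def response_header(filename: str) -> str:
    """Build the HTTP header line that tells the browser which viewer to use."""
    return f"Content-Type: {content_type(filename)}"


if __name__ == "__main__":
    print(response_header("caffeine.PDB"))  # Content-Type: chemical/x-pdb
```

A browser configured with a PDB viewer would hand a file served with the chemical/x-pdb type to that viewer instead of displaying it as text.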

1.4 The third theme, based at the RSC offices in Cambridge, was the development of macros, style sheets, software and other training materials for authors, and the evaluation of various commercial authoring packages already available. This component also involved liaising with commercial chemistry software vendors over software that would be of use in the journal, and marketing products resulting from the activities of the CLIC consortium, such as the ECTOC CD-ROM. This component of the project is led by Cambridge University and the Royal Society of Chemistry in Cambridge.

1.5 All four sites have been involved in raising awareness in the community by means of three focused activities: electronic chemistry conferences, e-mail discussion lists, and one-day discussion meetings for the chemical community (the Chemistry Webmaster meetings). These activities also form one part of our structured evaluation program of user response to the delivery of chemical information in this form.

1.6 As part of the structured evaluation approach to the project, we have established CLIC centres in six libraries across the three university sites. The structured evaluation program has been developed by the Computer Based Learning Unit at Leeds University.

1.7 Members of the CLIC consortium are involved in an initiative to form what we call the "Open Molecule Foundation" (OMF). This will be an independently funded organisation. The intention of OMF is to promote the development of methods for enhancing the inter-operability of molecular information. Projects associated with OMF include the development of CML (Chemical Markup Language, a formal SGML DTD specifically for chemical content), the principal author of which is Peter Murray-Rust, who also attends CLIC project meetings. OMF also promotes the development of object-oriented class libraries and programming languages such as Java in the area of molecular science. We expect this activity to result directly in software that can be used in conjunction with the CLIC project.

Outputs and Deliverables.

1.8 To use the infrastructure described in 1.1-1.7 to deliver graphically enhanced contents pages for the journal.

1.9 To deliver complete issues of Chemical Communications using on-the-fly conversion to HTML from an SGML database. Issues 3 and 4 have been converted. Items 1.8 and 1.9 constitute two of the formal milestones for our project. These deliverables are available via the project server: http://chemcomm.clic.ac.uk/ and via the project description pages on http://www.ch.ic.ac.uk/clic/

1.10 We have converted one "keynote" article, enhanced with active chemical content, and a further five articles have been identified for conversion in the near future. This activity is part of our development program for years two and three of the CLIC project. These articles are available at http://chemcomm.clic.ac.uk/

1.11 ECTOC-1 (Electronic Conference on Trends in Organic Chemistry) was organised as an awareness-raising event in July 1995, and was edited jointly by Imperial College and Cambridge University. It achieved international prominence and appears to be regarded as a seminal event in electronic conferencing. Of the 75 articles from 13 countries submitted to the conference, 66 were formally abstracted by Chemical Abstracts, and a CD-ROM of the proceedings is now on sale via the Royal Society of Chemistry. A second conference, ECHET96 (Electronic Conference on Heterocyclic Chemistry), attracted widespread support from the USA, Japan and Europe, with 120 submitted articles including 12 keynote articles from internationally recognised chemists. Both conferences are considered outstanding successes in raising general awareness.

1.12 Two Chemistry Webmasters days were organised, in November 1995 and June 1996, and a third is planned for December 1996. Each attracted around 80 attendees, and both are regarded as successful events. The CLIC project was presented in a talk on each occasion, and there was ample opportunity for demonstrations and informal interaction. We regard the attendees as the key people who will raise awareness of the CLIC product in chemistry departments and the chemical industry. In conjunction with these meetings an e-mail discussion list (chemweb@ic.ac.uk) has been established; it now has more than 200 subscribers, with a significant international following (see http://www.ch.ic.ac.uk/hypermail/chemweb/).

1.13 A number of scholarly articles and talks discussing the CLIC project have been presented. These include:

(a) H. S. Rzepa, "The Future of Electronic Journals in Chemistry", Trends in Analytical Chemistry, 1995, 14, 464.

(b) B. J. Whitaker and H. S. Rzepa, "Chemical Publishing on the Internet", Conference on Chemical Information, Nimes, France, October, 1995.

(c) D. James, B. J. Whitaker, C. Hildyard, H. S. Rzepa, O. Casher, J. M. Goodman, D. Riddick and P. Murray-Rust, "The Case for Content Integrity in Electronic Chemistry Journals: The CLIC Project", New Review of Information Networking, 1996, 61-70.

(d) O. Casher and H. S. Rzepa, "The Molecular Object Toolkit: A New Generation of VRML Visualisation tools for use in Electronic Journals", Proceedings of the 14th UK Eurographics Conference, March, 1996.

(e) S. M. Bachrach, P. Murray-Rust, H. S. Rzepa, B. J. Whitaker, "Publishing Chemistry on the Internet", Network Science, 1996, 2 (3).

(f) C. J. Hildyard and B. J. Whitaker, "Chemical Publishing on the Internet: Electronic Journals - Who Needs Them?", Chemistry On-line Conference, London, December 1996.

(g) H. S. Rzepa, O. Casher and B. J. Whitaker, "A Paradigm Shift in Chemistry Electronic Publishing", Conference on Chemical Information, Nimes, France, October, 1996.

2. Learning from the Process of Implementation.

2.1 Leeds University. As anticipated, HTML "standards" are still something of a moving target, although there seems to be a determined effort to focus on HTML 3.2. We have learnt that it would be unwise to rely on HTML as the definitive mark-up for an electronic journal, and have instead concentrated on developing text conversion tools from ISO-standard SGML. These tools still allow us to use the Web as a text delivery mechanism by converting documents held in the SGML archive into HTML on demand. The advantage is that we can quickly accommodate future revisions and enhancements of HTML without compromising the integrity of the archive. The conversion is invoked by a CGI script, which runs a CoST/Perl translation program. In the course of this work we unearthed a number of deficiencies in the SGML being produced by the RSC's typesetters (which have now largely been corrected). It also became clear that the ISO 12083 DTD for scientific documents lacks some required functionality; this will be addressed in year 2. These experiences bear directly on sustainability and on the development of the SGML DTD (see Section 3.5 below).
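
The on-demand conversion described above can be illustrated with a deliberately small Python sketch. It assumes a normalised SGML instance (explicit start and end tags, no attributes, no entity references) and uses hypothetical element names rather than those of the RSC or ISO 12083 DTDs; a real converter such as the CoST/Perl program must handle full SGML, which a regular expression cannot.

```python
import re

# Hypothetical element-to-tag table; not the RSC or ISO 12083 element set.
ELEMENT_MAP = {
    "ARTICLE": "div",
    "TITLE": "h1",
    "AUTHOR": "p",
    "PARA": "p",
    "CHEMNAME": "em",
}

# Matches a start tag <NAME> or an end tag </NAME> without attributes.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


def sgml_to_html(sgml: str, mapping: dict = ELEMENT_MAP) -> str:
    """Rename each SGML element to its HTML counterpart.

    Elements missing from the mapping lose their tags but keep their text.
    """
    def rename(match):
        slash, name = match.group(1), match.group(2).upper()
        html = mapping.get(name)
        if html is None:
            return ""
        return f"<{slash}{html}>"

    return TAG.sub(rename, sgml)


if __name__ == "__main__":
    print(sgml_to_html("<ARTICLE><TITLE>Test</TITLE><PARA>Text</PARA></ARTICLE>"))
    # <div><h1>Test</h1><p>Text</p></div>
```

Because the element-to-tag table is the only place where HTML appears, a change in HTML practice means editing the table while the archived SGML is left untouched. A CGI wrapper would read the requested instance from the archive, pass it to `sgml_to_html`, and write a `Content-Type: text/html` header, a blank line and the result to standard output.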

2.2 Cambridge University and the Royal Society of Chemistry. A number of points have come to light during the implementation of the project plan:

(a) Author (and editor) feedback suggests that style sheets and rigid instructions for authors would not be practicable.
(b) The amount of manual processing involved in producing an electronic version of a printed journal is considerable.
The conclusions to be drawn from these points are:
(c) To convert a manuscript to SGML, a thorough yet flexible conversion routine is required.
(d) Tables, maths and images need to be defined in the DTD to reduce manual processing.
(e) While typesetter-processed SGML is satisfactory, the RSC needs to be in control of the conversion process in order to implement these changes quickly and to know precisely how the SGML is being generated at every point. Therefore, in parallel with typesetter-derived SGML, further work on in-house SGML processing is required. If the RSC can generate SGML routinely itself, rather than relying on its suppliers, the process can more readily be incorporated into routine production alongside other useful developments such as electronic refereeing and on-screen editing. Most importantly, the CLIC project team will determine what is required in the RSC's SGML DTD to enable the sustainable delivery of electronic journals on the WWW.

2.3 Imperial. The prime focus here was on establishing a robust document handling technology that could be scaled up as required and could form the basis for the implementation strategy in year 3. Our initial focus was on server solutions provided by Netscape, but we soon realised that hyperlink maintenance was in fact a major problem that needed to be addressed. Following initial experiments with the Harvest server, we settled on the Hyper-G (now Hyperwave) solution to this problem. We have experienced significant difficulty in obtaining support for this product from the University of Graz, where it originates. To better understand the development strategy for this product, three of us (DJ, HSR and OC) visited the Graz development team, and we are now in the process of joining the Hyperwave Consortium to gain access to the latest information. Whatever the future of Hyperwave as a Web solution, we believe that we have gained invaluable experience in maintaining and indexing a structured document collection.
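
The hyperlink maintenance problem arises because an ordinary Web server stores links inside the documents themselves, so when a document is moved or withdrawn nothing records which other documents pointed to it. Servers such as Hyperwave keep links in a separate database instead. The Python sketch below illustrates the idea with a hypothetical link table keyed by invented document identifiers; it is not Hyperwave's interface.

```python
from collections import defaultdict


def dangling_links(links: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (source, target) pairs whose target document does not exist.

    `links` maps each existing document id to the ids it links to.
    """
    existing = set(links)
    return sorted(
        (source, target)
        for source, targets in links.items()
        for target in targets
        if target not in existing
    )


def backlinks(links: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert the link table so each document lists the pages that point to it."""
    incoming: dict[str, set[str]] = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            incoming[target].add(source)
    return dict(incoming)


if __name__ == "__main__":
    # Invented identifiers for a contents page and two articles.
    links = {
        "issue3/contents": {"issue3/a1", "issue3/a2"},
        "issue3/a1": {"issue2/a7"},  # issue2/a7 has been withdrawn
        "issue3/a2": set(),
    }
    print(dangling_links(links))          # [('issue3/a1', 'issue2/a7')]
    print(backlinks(links)["issue3/a1"])  # {'issue3/contents'}
```

With the incoming links available, withdrawing an article can be accompanied by a list of every page that must be updated, rather than relying on readers to report broken links.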

Our second focus has been on developing new formats for integrating complex information into the journal. This has centred on VRML as a 3D descriptor, and we have been closely involved in the evolution towards VRML 2.0 (the Moving Worlds proposal). We have learnt that the VRML standards process is not as closely synchronised with the activities of the W3C as we would wish, and that the hardware requirements of VRML currently exceed the hardware available to many potential readers of the journal. We are currently optimising the VRML files and content to address this problem.
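
The report does not say which optimisations are being applied to the VRML files. One common way of reducing their size, shown here purely as an illustration, is to lower the numeric precision of coordinates, since molecular models rarely need more than two or three decimal places on screen. The Python sketch below assumes plain-text VRML and rewrites only numbers written with a decimal point.

```python
import re

# Decimal numbers such as 1.234567 or -0.5; the lookahead skips the mantissa
# of numbers in exponent form (e.g. 1.5e-3), which are left unchanged.
DECIMAL = re.compile(r"-?\d+\.\d+(?![eE\d])")


def reduce_precision(vrml: str, digits: int = 3) -> str:
    """Round every decimal number in VRML text to `digits` places and drop
    trailing zeros, so that less text has to be downloaded."""
    def shorten(match):
        text = f"{float(match.group()):.{digits}f}"
        if "." in text:
            text = text.rstrip("0").rstrip(".")
        # Rounding a tiny negative value can leave "-0"; normalise it.
        return "0" if text == "-0" else text

    return DECIMAL.sub(shorten, vrml)


if __name__ == "__main__":
    print(reduce_precision("translation 1.234567 -0.000123 2.5", digits=2))
    # translation 1.23 0 2.5
```

The precision chosen is a trade-off: too few digits visibly distorts bond lengths and angles, so any such reduction would need checking against the rendered model.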

We are also developing alternative technologies based on the object-oriented Java language, which we believe will form a major thrust of the CLIC project in its second half. To help focus on standards in this area, we have been proactive in setting up the "Open Molecule Foundation", with which CLIC will collaborate closely.

We identified early on that a rapid-response mechanism for evaluating the technologies and styles that the electronic journal will eventually use was essential. Our solution was to organise two electronic conferences, through which an enormous amount of experience has been gained in areas such as how authors submit articles in electronic form, how an on-line refereeing process works, how articles gain from a four-week discussion period (as an alternative paradigm to conventional refereeing), how Chemical Abstracts can process electronic materials, the problems of subsequent CD-ROM production, and the problems of limited transatlantic bandwidth, which results in poor response for the end user. As a result, we have raised the priority of developing a strategy for international mirroring of the electronic journal.

3. Interim Evaluation Results.

3.1 Delivery of the Journal. Graphically enhanced contents pages have been on-line for about six months, in three alternative forms on which readers were invited to comment. Some 50 replies have been received, containing valuable feedback. The principal conclusion was that readers were prepared to wait a little longer to get more comprehensive on-screen displays. In the last two months, one "enhanced" keynote article has been available for inspection and comment, and the responses have been largely favourable. Where the products attracted criticism, we have identified the causes as:

a) Inadequate initial documentation and training for readers on how to actually use the product
b) Inadequate quality of materials received from authors
c) Limitations of the HTML standard.

3.2 Mobilisation. We believe that we have made a successful start in mobilising the community towards using the CLIC product through the awareness-raising forums (Chemistry Webmasters), the two electronic conferences, the two one-day meetings and the e-mail discussion list. In addition, articles have been published in Chemistry in Britain (the current-awareness publication of the RSC) and Chemical and Engineering News (the equivalent publication of the American Chemical Society), and a German-language article has been published for the Austrian Chemical Society. Finally, information packs for the journal editors have been prepared and distributed.

3.3 Cultural Change. Within a community as large as chemistry, it is difficult to assess what proportion of potential readers are receptive to evaluating an electronic journal. Our experience with authors, both of the journal and of the associated conferences, is that some authors resolutely refuse to modify their standard method of preparing a manuscript, whilst others enthusiastically provide high-quality materials for inclusion. For example, of the 120 articles submitted to the ECHET96 conference, some 45 were prepared by the authors in electronic form to a reasonable or high standard, and the authors of some 30 more were willing to edit their contributions after initial processing by the editors. ECHET96 derived from a special interest group of the Royal Society of Chemistry, which would normally hold a conference with perhaps 20-30 discussion papers. By this criterion, ECHET96 succeeded in mobilising this community and inducing cultural change on a very short time scale.

The high level of attendance at the Chemistry Webmasters meetings also indicated significant interest from the community.

3.4 Cost Effectiveness/Value-Added. The CLIC e-journal is currently at the stage of establishing and testing various mechanisms that will form the basis for a sustainable process in the future. This is being done with relatively small resources compared with those devoted to the printed version of the journal, and there is little doubt that it forms a cost-effective supplement to the printed version. The most striking success is in the perceived value-added component. The inclusion of 3D models within the body of the CLIC journal has been possible because of a) our development of MIME standards to accomplish this and b) our presentation of the CLIC project at an early stage to MDLI, a commercial company that specialises in chemical databases. This has resulted in the production of a Netscape plug-in called Chime, itself based on the standard visualisation program RasMol. We have thus been able to add value to the electronic journal early, in a way that to a significant extent anticipates our stated deliverables for year 3 of the project. We are now focusing on other value-added aspects such as the delivery of analytical data, mathematical markup, numerical information, various forms of usage statistics, and index searching.

3.5 Sustainability. This aspect, which is central to our policy, is being pursued through the adoption of SGML-derived technologies, together with the development of parsing and conversion tools and the use of scalable servers such as Hyperwave. Adherence to standards is another important component of sustainability, and we are striving to adopt standards (or proposed standards) such as HTML 3.2, VRML 2.0 and those emerging in the Java area wherever possible.

Our eventual aim is the production of a self-sustaining electronic journal. The tools we develop for SGML to HTML conversion must therefore involve no more than the minimum of human intervention (ideally none). At present this is not possible because the Document Type Definition (DTD) is inadequately specified. In year 2 of the project we plan to concentrate our efforts in this area, particularly on the "added value" components that we have demonstrated in the feature articles (which are currently hand-edited). Central to sustainability is the on-the-fly conversion of archived SGML instances, which removes the need for a parallel HTML archive. A semantically rich, chemistry-specific DTD is being developed to represent the diversity of chemical data as completely as possible. This allows automatic conversion tools to be developed that are flexible enough to allow rapid, efficient upgrading of the electronic journal to reflect the inevitable changes in HTML and/or browser technology.
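
Unattended conversion requires a conversion rule for every element that can occur in the archive. A minimal Python sketch of a check that lists the elements in an SGML instance for which no rule exists (and which would therefore still need hand editing) is given below; the element names, including SPECTRUM, are hypothetical, and the tag matching assumes explicit start tags.

```python
import re

# Start tags such as <PARA> or <FIG ID=f1>; end tags and declarations
# (</PARA>, <!DOCTYPE ...>) do not match because a letter must follow "<".
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^>]*>")


def unmapped_elements(sgml: str, rules: set[str]) -> set[str]:
    """Return the element names in an SGML instance that have no conversion
    rule, i.e. those that would still require hand editing."""
    found = {name.upper() for name in START_TAG.findall(sgml)}
    return found - {rule.upper() for rule in rules}


if __name__ == "__main__":
    instance = "<ARTICLE><PARA>Text</PARA><SPECTRUM ID=s1>data</SPECTRUM></ARTICLE>"
    print(unmapped_elements(instance, {"ARTICLE", "PARA"}))  # {'SPECTRUM'}
```

Run across the whole archive, a check of this kind gives a simple measure of how far the DTD and conversion rules still fall short of fully automatic delivery.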

3.6 Demand/Performance and Future Scenarios. Although many publishers now claim to offer an electronic version of their journals, for the most part this comprises either index pages only or exact replicas of the printed form. Responses from users at conferences, at the one-day meetings and via e-mail discussion lists reinforce our belief that demand for such products will be driven largely by the value-added components they offer and by the simplest possible end-user installation requirements. As demand increases, delivery performance will become a major issue. We intend to address this by investigating mirroring solutions in the USA and elsewhere, and by looking at subject-specific caching solutions based on the UK HENSA caching site. Finally, we regard end-user software installation via Java as a high priority.

4. Future Development and Main Objectives.

4.1 We will move from developing the basic infrastructure to sustaining the ongoing production of the electronic journal. This will involve extending the working RSC DTD to include the value-added molecular components we have already demonstrated in the keynote articles and the conferences (Leeds University and the RSC). Our development work on the chemistry-specific CML DTD will continue through association with the Open Molecule Foundation, as will a strategy for integrating the two threads in the final stage of the CLIC project (RSC).

4.2 We will concentrate on the complex issues of indexing not merely the text-based content of the journal (an intrinsic feature of the Hyperwave server) but also its chemistry-based content. To this end, the project will focus in the next stage on implementing an arbitrary DTD in the Hyperwave server. To facilitate this, we expect to make further visits to Graz to work closely with the development team there (Imperial, Cambridge, Leeds). We are also exploring modular servers based on Java, such as Jigsaw and Jeeves, to see how "CGI" functionality can be made scalable and sustainable.
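
Indexing chemistry-based content means indexing by what an element represents, not just by the words it contains. The Python sketch below is a minimal illustration, assuming a hypothetical FORMULA element and leaf elements with no nested markup; it builds an index keyed on (element, value) pairs and says nothing about how Hyperwave stores its own indexes.

```python
import re
from collections import defaultdict

# A leaf element with matching start and end tags and no nested markup,
# e.g. <FORMULA>C8H10N4O2</FORMULA>.
LEAF = re.compile(r"<([A-Za-z][A-Za-z0-9]*)>([^<]*)</\1>")


def build_index(docs: dict[str, str], fields: set[str]) -> dict[tuple[str, str], set[str]]:
    """Map (element name, content) pairs to the ids of the documents that
    contain them, for the chosen chemistry-specific elements only."""
    wanted = {field.upper() for field in fields}
    index: dict[tuple[str, str], set[str]] = defaultdict(set)
    for doc_id, sgml in docs.items():
        for name, content in LEAF.findall(sgml):
            if name.upper() in wanted:
                index[(name.upper(), content.strip())].add(doc_id)
    return dict(index)


if __name__ == "__main__":
    docs = {
        "a1": "<PARA>Synthesis of <FORMULA>C8H10N4O2</FORMULA></PARA>",
        "a2": "<FORMULA>C8H10N4O2</FORMULA> and <FORMULA>H2O</FORMULA>",
    }
    index = build_index(docs, {"FORMULA"})
    print(sorted(index[("FORMULA", "C8H10N4O2")]))  # ['a1', 'a2']
```

A query for a formula then returns every article mentioning that compound as a formula, which a plain full-text index cannot distinguish from the same string appearing elsewhere in the text.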

4.3 We will concentrate on implementing Java and VRML based value-added components. A demonstration of Java technology to display spectral data is already available, and we are actively developing VRML 2.0 solutions to the display of more complex visual data (Imperial).

4.4 We will continue to develop tools for analysing usage statistics and reader-profiles and the use of persistent client states in this context (Imperial).
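
Usage statistics of this kind can be derived from the Web server's access log. The Python sketch below is a minimal example, assuming the server writes the NCSA Common Log Format (widely used by servers of the period, although the report does not say which format the project's server uses); it counts successful requests per document, and the sample log lines are invented.

```python
import re
from collections import Counter

# NCSA Common Log Format:
# host ident authuser [date] "method path protocol" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)')


def requests_per_document(lines) -> Counter:
    """Count successful (2xx) GET requests for each requested path."""
    counts: Counter = Counter()
    for line in lines:
        match = CLF.match(line)
        if match is None:
            continue  # skip malformed or truncated lines
        method, path, status = match.group(3), match.group(4), int(match.group(5))
        if method == "GET" and 200 <= status < 300:
            counts[path] += 1
    return counts


if __name__ == "__main__":
    # Invented sample lines.
    log = [
        'reader1.example.ac.uk - - [10/Jul/1996:10:00:00 +0100] "GET /issue3/a1.html HTTP/1.0" 200 5120',
        'reader2.example.ac.uk - - [10/Jul/1996:10:05:00 +0100] "GET /issue3/a1.html HTTP/1.0" 200 5120',
        'reader2.example.ac.uk - - [10/Jul/1996:10:06:00 +0100] "GET /missing.html HTTP/1.0" 404 0',
    ]
    print(requests_per_document(log))  # Counter({'/issue3/a1.html': 2})
```

The host field (group 1) could be grouped in the same way to build simple reader profiles, subject to the usual caveat that proxies and caches hide individual readers.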

4.5 Charging and Authentication. In the first year, the electronic version of the journal has been made freely available to any institution that already subscribes to the printed version. In year 2, the RSC will propose and evaluate various charging models for access to the enhanced electronic version of ChemComm. Various charging and authentication mechanisms will also be investigated, including Java-based solutions.
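
The report does not say how subscribing institutions are recognised. One common approach, sketched here in Python purely as an assumption, is to check the client's IP address against address ranges registered by each subscribing institution; the ranges below are reserved documentation networks standing in for real campus networks.

```python
import ipaddress

# Hypothetical subscriber ranges; the reserved documentation networks below
# stand in for the address ranges of real subscribing institutions.
SUBSCRIBER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_subscriber(client_address: str, networks=SUBSCRIBER_NETWORKS) -> bool:
    """Return True if the client's address lies inside a subscriber's range."""
    try:
        address = ipaddress.ip_address(client_address)
    except ValueError:
        return False  # malformed address: deny access
    return any(address in network for network in networks)


if __name__ == "__main__":
    print(is_subscriber("192.0.2.17"))   # True
    print(is_subscriber("203.0.113.5"))  # False
```

Address checks need no reader login, but they fail for readers working off campus or behind shared proxies, which is one reason to investigate other mechanisms alongside them.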

4.6 Security: Security issues raised by areas such as Java and persistent client states need to be addressed in the context of the journal's operation by the Royal Society of Chemistry. In part, this will be facilitated by our affiliation with the Open Molecule Foundation and our proposed membership of the Hyperwave Consortium.

4.7 Object-relational database management systems such as Illustra are viewed as holding great promise for the CLIC project, and we intend to evaluate such products for potential use by the project.

4.8 Electronic refereeing: Although a small percentage of authors contribute manuscripts on disk, most manuscripts are currently received on paper, and refereeing and editing are carried out on paper copy. If the refereeing and editorial processes could be handled using manuscripts supplied by authors either on disk or via the Internet, publication times could be reduced. Receiving authors' manuscripts electronically is also expected to facilitate the generation of SGML.

4.9 RSC and CD-ROM: It is expected that electronic journals and electronic conferences will be archived and distributed on CD-ROM. Technologies to achieve this will be evaluated at the RSC.

4.10 Working Relationships with Commercial Developers: The close relationship with the Open Molecule Foundation will continue. In addition, we hope to liaise with software developers such as MDLI, SGI and SUN, and to interact where possible with other publishers to develop new tools for chemical electronic journals.

4.11 Implementation Strategy. In connection with the sustainability issues noted above, we are actively working with the RSC to develop an implementation strategy for 1998.