CLIC Project Management Meeting.

Progress Report: November 1, 1995

http://www.ch.ic.ac.uk/clic/pr/imperial.html

H. S. Rzepa and O. Casher, Imperial College.


1. Document Management. Any electronic journal will comprise a blend of real documents and virtual documents created on the fly. Managing the permanent documents in a scalable manner and ensuring hyperlink integrity are major current issues. We have investigated two products in this regard: Hyper-G and Harvest.

Hyper-G is a server in its own right, in which document clusters are an inherent part of the design. We have mounted a Hyper-G server at Imperial College and conducted experiments on its performance and characteristics. We hope to visit Graz to discuss our specific requirements in this area with the Hyper-G development team, with the possibility of collaboration. We also hope to evaluate a commercial product, SiteMill, which likewise promises to structure document collections on a Web server.

Harvest offers a facility to integrate separate Web servers so that documents across different sites are automatically synchronised. The port from a Sun to an SGI proved problematic, but the system is now compiled and working. We intend to test this system by synchronising progress reports from the separate CLIC sites. The interaction between Harvest and, for example, SiteMill or Hyper-G needs to be evaluated.

2. Netsite Commerce. This secure server has been running in stable mode for some two months now. Experiments in secure transfer of documents are on-going.

3. Access Statistics. We have developed programs to analyse the server log files for access statistics, including a profile across the time of day and the day of the week, the types of file accessed, and the sites accessing them. The results are presented in either HTML or VRML format as "on-the-fly" documents, which also serve as models for other aspects of in-situ document generation.
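As an illustration of the kind of analysis described in item 3, the following is a minimal sketch only, not the programs actually in use. It assumes the server writes its access log in the NCSA Common Log Format; the function names summarise and to_html_table and the sample host names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime
from html import escape
from os.path import splitext

# Assumed log layout (Common Log Format):
# host ident user [dd/Mon/yyyy:hh:mm:ss zone] "METHOD path PROTOCOL" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def summarise(lines):
    """Count requests by hour, weekday, file extension and requesting host."""
    stats = {"hour": Counter(), "weekday": Counter(),
             "type": Counter(), "host": Counter()}
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        # Use the server-local clock time; the zone offset is ignored.
        stamp = datetime.strptime(match.group("time").split()[0],
                                  "%d/%b/%Y:%H:%M:%S")
        path = match.group("path").split("?")[0]
        extension = splitext(path)[1].lower() or "(none)"
        stats["hour"][stamp.hour] += 1
        stats["weekday"][WEEKDAYS[stamp.weekday()]] += 1
        stats["type"][extension] += 1
        stats["host"][match.group("host")] += 1
    return stats


def to_html_table(title, counter):
    """Render one counter as a simple HTML table, generated on the fly."""
    rows = "".join(
        f"<tr><td>{escape(str(key))}</td><td>{count}</td></tr>"
        for key, count in counter.most_common()
    )
    return f"<h2>{escape(title)}</h2><table>{rows}</table>"


if __name__ == "__main__":
    sample = [
        'host1.example.org - - [01/Nov/1995:14:03:22 +0000] '
        '"GET /clic/pr/imperial.html HTTP/1.0" 200 1234',
        'host2.example.org - - [01/Nov/1995:14:10:00 +0000] '
        '"GET /models/caffeine.pdb HTTP/1.0" 200 2048',
    ]
    result = summarise(sample)
    print(to_html_table("Requests by file type", result["type"]))
```

The same counters could equally be written out as VRML geometry (for example, one bar per hour) rather than as an HTML table; only the rendering function would change.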

4. Models for Complex Data Presentation. We currently favour associating a small number of "3D" file formats with the early stages of the electronic journal component. The handling of "pdb" and "tgf" files is easily resolved, and was indeed thoroughly tested via the ECTOC Conference. We anticipate that the "CIF" file format will serve primarily an archival purpose, since no good "CIF" viewers currently exist (Query Synopsis?). After extensive evaluation, we propose here that VRML be actively considered by the journal for representing more complex visual properties. An example might be a crystal structure in which key elements are hyperlinked with the text-based discussion. VRML thus provides the display component, and is a natural evolution of the "pdb" format into electronic publishing.
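To make the hyperlinked crystal structure idea in item 4 concrete, here is a hedged sketch of how atom coordinates might be written out as a VRML scene in which selected atoms link into the text. It assumes VRML 1.0 and its WWWAnchor node, and assumes the links run from the 3D object to the article; the element colours, radii, coordinates and target URLs are all hypothetical and purely illustrative.

```python
# Hypothetical display table: RGB colour (0..1) and sphere radius per element.
ELEMENT_STYLE = {
    "Na": ((0.6, 0.4, 0.9), 1.0),
    "Cl": ((0.1, 0.9, 0.1), 1.2),
}
DEFAULT_STYLE = ((0.7, 0.7, 0.7), 1.0)


def atom_node(symbol, x, y, z, url=None):
    """Return VRML 1.0 text for one atom, wrapped in a WWWAnchor if url is given."""
    colour, radius = ELEMENT_STYLE.get(symbol, DEFAULT_STYLE)
    body = [
        "Separator {",
        f"  Material {{ diffuseColor {colour[0]} {colour[1]} {colour[2]} }}",
        f"  Translation {{ translation {x} {y} {z} }}",
        f"  Sphere {{ radius {radius} }}",
        "}",
    ]
    if url is None:
        return body
    # Selecting the sphere in a VRML browser follows the link named here.
    return ["WWWAnchor {", f'  name "{url}"'] + ["  " + line for line in body] + ["}"]


def to_vrml(atoms):
    """Build a complete VRML 1.0 scene from (symbol, x, y, z[, url]) tuples."""
    lines = ["#VRML V1.0 ascii", "Separator {"]
    for atom in atoms:
        lines.extend("  " + line for line in atom_node(*atom))
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative coordinates; the anchor target is a hypothetical article URL.
    scene = to_vrml([
        ("Na", 0.0, 0.0, 0.0, "article.html#sodium-site"),
        ("Cl", 2.8, 0.0, 0.0),
    ])
    print(scene)
```

Each atom sits in its own Separator so that its Translation and Material do not leak into the next atom, which is the usual way of isolating state in VRML 1.0.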

5. Chemistry Markup Language. Led by Peter Murray-Rust, the development of an SGML DTD for specifically identified areas of chemistry has made excellent progress. The DTD is based on HTML 2.0, and features data validation and transformation through glossary-based code. We need to plan how this DTD should be strategically integrated with existing DTDs based on modifications of ISO 12083.

6. Indexing. Whilst the indexing of SGML documents and text-based material is well understood, the indexing of more complex data presentations is much less developed. Two particular challenges are indexing 2D images and indexing 3D objects encoded in, say, VRML. We are beginning to identify the work needed in this area. CML is expected to play a major role in the overall problem of indexing a chemical journal in which a proportion of the semantic content is represented not by text but by complex datasets.

