Conference access statistics and the browsing habits of ECHET96 readers

Christopher Leach and Henry S. Rzepa
Department of Chemistry, Imperial College of Science, Technology and Medicine, London SW7 2AY.

One of the most characteristic features of the World-Wide Web is the degree of fine detail one can obtain regarding transactions between the server and a remote client. It thus provides a natural mechanism for attempting to assess the reaction of an audience to a collection of material such as that found in an international conference. In this article, we present an analysis of the ECHET96 accesses.

The logs record not only when a particular document was accessed but also the machine and the browser the user used to read it. The popularity of articles and facilities within the conference can be monitored and compared from this information. Various statistics have been compiled from the access logs to show how the conference was used.

We note that this analysis does not include accesses via the North American mirror site, nor does it include statistics for the 34 articles provided by authors from their own servers. For this reason, we estimate that the analysis below relates to only around 60% of the conference totals.

General properties of the conference

A total of 120 articles and poster abstracts were initially submitted to the conference editors and accepted by the scientific committee. Of these, 17 are not present in these conference proceedings, either because (a) the authors requested they not be included or (b) no full article following the abstract was received. The analysis below includes the access statistics derived from the articles present in June 1996 (rather than those present on this CD-ROM). Three contributions relating to post-conference analysis, of which this is one, were added after the conference.

Conference Theme                                  Number of Articles
Reaction Mechanisms and Conformational Analysis   12
Molecular Modelling and Databases                 8
New Synthetic Methodology                         59
Keynotes                                          12
Stereochemical Reactions                          29
Total                                             120
Table 1. Distribution of articles in the different conference themes during June 1996.

The authors were allowed to submit their articles in various ways. Thirty-four authors chose to use their own World-Wide Web servers. This had the advantages that the authors could update their own articles easily and that the network load on the conference server was reduced, but it meant that we could not compile accurate access statistics for these papers. There was also the risk that the author's server was on an unreliable network.

Most of the authors (59% - 72/123) chose to use the forms mechanism to submit their abstract while registering for the conference as authors. The other popular method (18% - 22/123) of sending abstracts was by e-mail in various formats, including Microsoft Word documents and HTML. When the full articles were submitted, the e-mail approach was more popular (55% - 68/123) than the forms method (15% - 19/123). It appeared that authors were more comfortable sending multiple files as e-mail attachments than relying on the forms system and on the extraction of files from uploaded archives.

Accesses to the conference as a whole

The conference ran for four weeks, from 24 June 1996 until 22 July 1996. During this time there were 71,498 accesses by remote users, that is, those accessing from outside the Chemistry Department at Imperial College. The heavy testing of the conference by the conference editors and the range of computer systems held within the department would have skewed the access results. For this reason, accesses from within the department have been excluded. In fact, remote accesses during the conference were responsible for 97.3% of the total, so excluding local accesses does not significantly influence the final results. The number of bytes downloaded showed the same proportion between remote and local users: 246 Mbytes against 6.5 Mbytes.
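The proportions quoted above can be checked with a few lines of Python. This is only a sketch: it assumes that the total of 73,492 accesses given in Table 2 counts local as well as remote accesses, which the text does not state explicitly, and the variable names are illustrative.

```python
# Figures taken from the text and from the Total row of Table 2.
# Assumption: the Table 2 total counts local as well as remote accesses.
remote_accesses = 71_498
total_accesses = 73_492

remote_mbytes = 246.0
local_mbytes = 6.5


def percentage(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


remote_access_share = percentage(remote_accesses, total_accesses)
remote_byte_share = percentage(remote_mbytes, remote_mbytes + local_mbytes)

print(f"Remote share of accesses: {remote_access_share:.1f}%")
print(f"Remote share of bytes:    {remote_byte_share:.1f}%")
```

Under that assumption the two shares agree to within a fraction of a percentage point, which is what the text means by "the same proportion".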

One way to classify the accesses to the conference is by the type of file retrieved: an HTML file, one of the several picture file types, or molecular coordinates. The number of pictures in the conference was much greater than the number of HTML files, so the number of accesses to these files ought to reflect this.

Table 2. Accesses to the conference divided into file type
File type        Accesses  % of accesses  Files  % of files  Average accesses per file
gif              43629     59.4%          1101   59.0%       39.63
GIF              2826      3.8%           121    6.5%        23.36
jpg              242       0.3%           14     0.7%        17.29
JPG              73        0.1%           4      0.2%        18.25
Pictures         46770     63.6%          1240   66.4%       37.72
htm              3148      4.3%           190    10.2%       16.57
html             14533     19.8%          363    19.4%       40.04
HTM              87        0.1%           9      0.5%        9.67
Articles         17768     24.2%          562    30.1%       31.62
pdb              1824      2.5%           59     3.2%        30.92
xyz              68        0.1%           6      0.3%        11.33
Chemical         1892      2.6%           65     3.5%        29.11
cgi-bin          3821      5.2%           na     na          na
Unaccounted for  3241      4.4%
Total            73492                    1867

Most authors had inlined pictures in their articles. These were often downloaded when the document was retrieved over the internet. The average number of accesses per file normalises for the difference between the numbers of pictures and HTML documents. Unfortunately some HTML documents had more pictures associated with them than others, which skewed the averages. The picture access average was greater than the HTML average, indicating that the documents that were accessed had a larger proportion of associated pictures than the conference average.
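A tally of the kind shown in Table 2 could be produced along the following lines. This is a minimal sketch, not the procedure actually used for ECHET96: it assumes the server wrote NCSA Common Log Format lines, and the host names and paths in the sample are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: NCSA Common Log Format, e.g.
# host - - [24/Jun/1996:09:15:02 +0100] "GET /path/file.gif HTTP/1.0" 200 1234
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+)')


def tally_by_extension(log_lines):
    """Return {extension: (accesses, distinct files, average accesses per file)}."""
    accesses = defaultdict(int)
    files = defaultdict(set)
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # skip malformed lines
        path = match.group(1).split("?", 1)[0]  # drop any query string
        # Keep the case of the extension: Table 2 lists gif and GIF separately.
        ext = PurePosixPath(path).suffix.lstrip(".") or "(none)"
        accesses[ext] += 1
        files[ext].add(path)
    return {
        ext: (accesses[ext], len(files[ext]), accesses[ext] / len(files[ext]))
        for ext in accesses
    }


# Hypothetical sample lines, for illustration only.
sample_lines = [
    'host1.example.org - - [24/Jun/1996:09:15:02 +0100] "GET /echet96/paper1/index.html HTTP/1.0" 200 5120',
    'host1.example.org - - [24/Jun/1996:09:15:03 +0100] "GET /echet96/paper1/fig1.gif HTTP/1.0" 200 20480',
    'host2.example.org - - [24/Jun/1996:10:01:44 +0100] "GET /echet96/paper1/fig1.gif HTTP/1.0" 200 20480',
    'host2.example.org - - [24/Jun/1996:10:02:10 +0100] "GET /echet96/paper2/mol.pdb HTTP/1.0" 200 8192',
]
summary = tally_by_extension(sample_lines)
for ext, (hits, n_files, average) in sorted(summary.items()):
    print(f"{ext:8s} {hits:6d} {n_files:6d} {average:8.2f}")
```

Dividing accesses by the number of distinct files is what allows picture and HTML totals to be compared despite the very different numbers of files in each category.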

Weekly Accesses to ECHET96

Figure 1. Weekly accesses to ECHET96 from January 1996

The number of local accesses to the conference pages was so small that they have been excluded from further discussion. There were three important peaks in the weekly accesses, at weeks 17, 22 and 25. Week one was the first week of January 1996.
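Weekly totals such as those plotted in Figure 1 can be obtained by assigning each access date to a week number. The sketch below assumes that week 1 begins on 1 January 1996 and that every week spans seven days; the article does not state its exact numbering convention, so week numbers from this sketch may differ by one from those used in the text.

```python
from collections import Counter
from datetime import date

# Assumption: week 1 begins on 1 January 1996; the article's own
# convention is not stated and may differ.
FIRST_DAY = date(1996, 1, 1)


def week_number(day: date, first_day: date = FIRST_DAY) -> int:
    """Return the 1-based week index of `day`, counted from `first_day`."""
    return (day - first_day).days // 7 + 1


def weekly_counts(access_dates):
    """Count accesses per week for an iterable of datetime.date values."""
    return Counter(week_number(d) for d in access_dates)


# Hypothetical access dates, for illustration only.
counts = weekly_counts([date(1996, 1, 1), date(1996, 1, 7), date(1996, 1, 8)])
for week, hits in sorted(counts.items()):
    print(week, hits)
```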

Table 3. Milestones for ECHET96
Week  Milestone
6     Chemweb Announcement
17    Abstracts in
18    Abstracts Refereed
22    Full versions in
25    ECHET96 Conference Starts
33    Deadline for updates
42    End Refereeing
56    End Changes to CD-ROM version

The milestones show clearly that the increased activity in weeks 17 and 22 was due to authors uploading and checking their articles before the two major deadlines for the abstracts and the full articles. The largest activity occurred in week 25, at the start of the conference. This activity was divided into three parts:

  1. Authors uploading and checking their articles
  2. Participants reading the papers
  3. Accesses to the rest of the conference

Authors uploading and checking their articles

Figure 2. Accesses to the ECHET96 cgi-bin programs

Cgi-bin requests to the server were very useful in that they could not be cached on the participants' machines. They therefore provided an accurate method of determining how the conference was used. Several programs were written for ECHET96 to administer various parts of the conference.

The counter program was activated whenever anyone looked at the main conference page. It created a new counter at the bottom of the page indicating how many people had visited the conference. Unfortunately it was not updated if the user had the automatic image loading facility of their browser switched off. Otherwise it was a very good indication of conference usage. It had the same characteristics as the general accesses to the conference, assuming that everyone went through the main conference page when they started a new conference session.
Authors had to use the abstract submission program when they registered for the conference and uploaded their abstract. This program was only used in the weeks leading up to the two deadlines for authors to submit their abstracts and the main articles.
A replacement program took the place of cgi-bin/abstract_submit after the first deadline in week 17, to prevent new authors submitting brand new articles. Its main function was to allow authors to update their articles during the weeks before the conference.
Users searched the conference by keyword with the search program. Its access statistics reflect the general usage during the conference, when the general accesses were high.
There was a facility for authors to add their own keywords to their article. The service supplied by pursuit-echet96 was very general, since the algorithm used to create the indexes had no understanding of chemical relevance. Unfortunately the keyword facility was not used as often or as completely as hoped. The keywords would have been used to cluster the papers.
Unique to this conference was a system that allowed participants to submit a photo of themselves or of their research group. This was used a little just before the conference started. The system allowed the pictures to be vetted before they became public. There was no way of knowing whether a picture was of the person in question, but questionable submissions could be excluded by this procedure.

Participants reading the papers

Figure 3. Accesses to ECHET96 papers

There was a very small peak in week 17, when the abstracts were submitted. Until week 22, the accesses to the papers reflected the number of complete articles available in the conference. As expected, the majority of the conference accesses were to the paper directories, since this was where the majority of the conference files were.

There was some elevated activity recently due to the deadlines for the final versions of articles for the CD-ROM version.

Accesses to the rest of the conference

Figure 4. Accesses to ECHET96 administrative pages

These pages contained the information on how to write and submit articles and how to participate in the conference. They were available from the start of the conference to encourage people to contribute and participate. The accesses were at the random noise level before and after the conference, and the graph showed the same peaks as the previous graphs at weeks 17, 22 and 25.

Outside the conference period the accesses fluctuated randomly, like noise. These accesses came from people who were either new to the web and wanted a quick browse through the site to see what was there, or the few who bothered to come back to see if there were any updates. This conference is quite unusual in that the documents have not changed since the conference. It is therefore unlikely that people will come back to check whether there is anything new, and it was possible to see the life cycle of an unchanged document. Before the conference the documents were accessed by the participants and the authors as the conference was being built. At the start of the conference the readers assumed that the documents would no longer change, so the accesses started to fall as people finished reading the final versions. Seven weeks later the documents were of no interest except to random browsers on the internet.

Hourly Accesses to the conference

Figure 5. Hourly Accesses to ECHET96

The accesses during the conference were also viewed by the hour. Figure 5 shows that during the first week of the conference the accesses reached up to 800 hits an hour. The weekends showed very little activity, appearing as the blue troughs. Each week the majority of the accesses occurred between 09:00 and 17:00 BST, indicating that most of the users were from Europe. The American accesses have not been included, as most of them occurred on the mirror site in America.
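The hourly pattern described above can be summarised by counting accesses per hour of the day, keeping weekdays and weekends apart. The following is a sketch only: it assumes the timestamps have already been parsed from the log and are expressed in the server's local time (BST), and the function names and sample values are illustrative.

```python
from collections import Counter
from datetime import datetime


def hourly_profile(timestamps):
    """Return (weekday_counts, weekend_counts), each keyed by hour 0-23."""
    weekday_counts, weekend_counts = Counter(), Counter()
    for ts in timestamps:
        # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
        target = weekend_counts if ts.weekday() >= 5 else weekday_counts
        target[ts.hour] += 1
    return weekday_counts, weekend_counts


def office_hours_share(counts, start=9, end=17):
    """Fraction of the accesses in `counts` made from `start` up to `end` o'clock."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    inside = sum(n for hour, n in counts.items() if start <= hour < end)
    return inside / total


# Hypothetical timestamps: three on Monday 24 June 1996, one on Saturday 29 June.
sample = [
    datetime(1996, 6, 24, 9, 15),
    datetime(1996, 6, 24, 14, 0),
    datetime(1996, 6, 24, 20, 30),
    datetime(1996, 6, 29, 11, 0),
]
weekday_counts, weekend_counts = hourly_profile(sample)
print(f"Weekday office-hours share: {office_hours_share(weekday_counts):.2f}")
```

A high office-hours share in one time zone is the kind of evidence used above to infer where most readers were located.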

Browsers and machines used in ECHET96

When the participants accessed the conference they left behind in the logs information about their machine, operating system and type of browser. The absolute monthly statistics followed the same shape as the weekly statistics, peaking in June.

Figure 6. Browsers used in ECHET96

It is important to know which browsers were being used to view the conference, to see whether advanced technologies such as frames and plugins could be viewed. Fortunately 70% of the users were using Netscape 2.x, which allowed them to view the frames and plugins in the conference. Unfortunately none of the Javascript was being used until October, when Netscape 3.x became the preferred browser.

Figure 7. Machines used to access ECHET96

Since August, Windows 3.x/95 users have outnumbered Macintosh users. Windows 95 has only recently become the most popular operating system, as more people upgrade from Windows 3.x. The operating system affects what the participants can view: the latest technologies usually come out for Windows 95 before the Macintosh and Windows 3.x. During the conference the majority of the participants were more likely to use a Macintosh computer than a Windows-based machine, indicating that chemists were more likely to use Macintoshes, whereas general computer usage on the internet is predominantly Windows 95.
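Browser and machine categories of the kind shown in Figures 6 and 7 are typically derived from the User-Agent string that each browser sends with a request. The sketch below illustrates one way of doing this; the classification rules and the example strings are assumptions for illustration and are not taken from the ECHET96 logs.

```python
import re

# Matches strings of the form "Mozilla/2.02 (Macintosh; I; PPC)".
UA_RE = re.compile(r"^Mozilla/(\d+)\.\d+\S*\s*\(([^)]*)\)")


def classify_user_agent(user_agent: str):
    """Return (browser, platform) labels for a User-Agent string."""
    match = UA_RE.match(user_agent)
    if match is None:
        return ("Other", "Unknown")
    major, details = match.group(1), match.group(2)
    # Browsers that imitate Netscape announce themselves as "compatible".
    if "compatible" in details:
        browser = "Other"
    else:
        browser = f"Netscape {major}.x"
    if "Mac" in details:
        platform = "Macintosh"
    elif "Win95" in details or "Windows 95" in details:
        platform = "Windows 95"
    elif "Win16" in details or "Windows 3" in details:
        platform = "Windows 3.x"
    elif "X11" in details:
        platform = "Unix"
    else:
        platform = "Other"
    return (browser, platform)


# Illustrative User-Agent strings.
for ua in [
    "Mozilla/2.02 (Macintosh; I; PPC)",
    "Mozilla/3.0 (Win95; I)",
    "Mozilla/2.0 (compatible; MSIE 3.0; Windows 95)",
    "Lynx/2.5 libwww-FM/2.14",
]:
    print(ua, "->", classify_user_agent(ua))
```

Counting the resulting labels per month would give tallies comparable in form to those plotted in the two figures.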


Various types of statistical analysis can be done on the conference, or on individual articles or files, to see how users use the conference. It should be possible to follow a user's route through a session, seeing what they visit first and how they reach other sections using the navigation systems set out in the conference. If multiple sessions were mapped out, it would be possible to work out the best routes through the information. This could be used to improve the flow of information through the rest of the conference. Unfortunately, caching makes it difficult to trace complete sessions, as not all accesses are served by the server; some are served from the local file cache.
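Route tracing of the kind proposed above could start by grouping log entries into sessions. The following is a minimal sketch under stated assumptions: a session is taken to be a run of requests from one host with no pause longer than 30 minutes (the article does not define a session), each host is treated as one user, and the host names and paths are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a pause of more than 30 minutes between requests from the
# same host starts a new session.
SESSION_GAP = timedelta(minutes=30)


def build_sessions(accesses, gap=SESSION_GAP):
    """Group (host, timestamp, path) records into per-host sessions.

    Returns a list of sessions, each being the ordered list of paths that
    one host requested without a pause longer than `gap`.
    """
    by_host = defaultdict(list)
    for host, timestamp, path in accesses:
        by_host[host].append((timestamp, path))

    sessions = []
    for host in sorted(by_host):
        previous = None
        for timestamp, path in sorted(by_host[host]):
            if previous is None or timestamp - previous > gap:
                sessions.append([])  # start a new session
            sessions[-1].append(path)
            previous = timestamp
    return sessions


# Hypothetical records, for illustration only.
accesses = [
    ("a.example.org", datetime(1996, 6, 24, 9, 0), "/echet96/index.html"),
    ("a.example.org", datetime(1996, 6, 24, 9, 5), "/echet96/paper1/index.html"),
    ("b.example.org", datetime(1996, 6, 24, 9, 2), "/echet96/index.html"),
    ("a.example.org", datetime(1996, 6, 24, 14, 0), "/echet96/paper2/index.html"),
]
sessions = build_sessions(accesses)
for route in sessions:
    print(" -> ".join(route))
```

Routes reconstructed this way are only as complete as the log: pages served from a local cache leave no entry, which is the limitation noted above.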

It is very important to keep an eye on the information produced by the access logs, as it indicates how users are using the web site and can help to improve the workings of the site. The availability of browser information allows the use of advanced HTML techniques to be monitored, since not all users have access to the latest browsing technology.