A group of short C programs which simplify the transfer of data from sets of Gaussian output files to spreadsheet software operating on a PC platform.

Laurence P. Cuffe, Noel J. Fitzpatrick.

Department of Chemistry, University College Dublin, Belfield, Dublin 4, Ireland.

We describe a series of small programs developed to explore a medium-sized data set consisting of log files from approximately 2000 Gaussian jobs. These programs partly automate the process of abstracting data from sets of Gaussian calculations. This reduces the opportunity for human error and simplifies the task of analysing large sets of MO data.

Introduction

As part of an investigation into the structures and stabilities of a series of related small molecules we generated approximately 2000 output or log files using the Gaussian suite of programs. These came from a variety of platforms, and we describe here the set of tools we developed to analyse this data in a systematic way.

The Components

There are three components:

A search tool DoZout which runs under UNIX and searches all log files in a given directory for those which result from converged jobs and produces a summary file for each such job.

A generation tool DoZcom which runs under UNIX and searches a directory for all such summary files and produces for each of them a .com file based on the converged geometry and whatever method and basis set were used to generate the original log file.

An analysis and summarisation tool opthit which runs on a PC, reads in a file consisting of the concatenated output files produced by DoZout, and produces a file consisting of one-line summaries of each output file in a form suitable for importation into Microsoft Excel (tm).

Detailed description of these components

DoZout

This is written in C and should run on any UNIX system.

This program uses the UNIX utility grep to find all log files in the current directory which come from jobs that completed successfully. It then spawns a series of sub-processes to extract the final distance matrix, the zero-point energy (if calculated) and the archive entry along with any associated quotations. It places this information into a file with the same file name as the original file except that .log is replaced by .out.
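The two bookkeeping steps described above can be sketched in C as follows. This is only an illustration, not the DoZout source: the function names log_completed and summary_name are hypothetical, the scan is done in C rather than through grep, and the marker text "Normal termination" is assumed to be what identifies a successfully completed job.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the file contains the completion marker, 0 if it does not,
 * and -1 if the file cannot be opened. The marker string is an assumption. */
int log_completed(const char *path)
{
    char line[512];
    int found = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, "Normal termination") != NULL) {
            found = 1;
            break;
        }
    }
    fclose(fp);
    return found;
}

/* Build the summary file name by replacing a trailing ".log" with ".out".
 * Returns 0 on success, -1 if the name does not end in ".log" or the
 * buffer is too small. */
int summary_name(const char *log, char *out, size_t n)
{
    size_t len = strlen(log);

    if (len < 4 || strcmp(log + len - 4, ".log") != 0 || len + 1 > n)
        return -1;
    memcpy(out, log, len - 4);
    strcpy(out + len - 4, ".out");
    return 0;
}
```

A caller would test each candidate file with log_completed and, for those that pass, use summary_name to decide where the extracted data should be written.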

There are some parameters in the file which can be changed to adapt to the type of log file we are scanning. The parameter Zlen sets the number of lines of the distance matrix that we save. The atom types for which bond lengths are exported are also listed explicitly.

The code DoZout.c

Code for the spawned process: sz.c

The line of code

execl("/users/larry/sZ","sZ",CurrentLine,CurrentLine,(char *)0);

should be changed so that /users/larry/ is replaced by the path to the sZ executable (which should be compiled separately) on your system.

DoZcom.c

This program converts summary (.out) files to com files.

When upgrading the method or the basis set for a calculation based on a previously converged job, it is convenient to have a .com file which incorporates a Z-matrix based on the final geometry of the prior job. This then requires minimal modification for the new job. These .com files are also useful in enabling us to use XMOL to visualise the results of our work.
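As an illustration of what such a .com file contains, the sketch below writes a minimal Gaussian input: a route line, a blank line, a title, a blank line, the charge and multiplicity, the Z-matrix, and a terminating blank line. It is not taken from sZcom.c; the function name write_com and its parameters are hypothetical, and the Z-matrix text is assumed to end with a newline.

```c
#include <stdio.h>

/* Write a minimal Gaussian input file to fp.
 * route : route line, e.g. "# HF/6-31G* Opt"
 * title : free-text title line
 * zmat  : Z-matrix text, one atom per line, ending with a newline
 * Returns 0 on success, -1 on a write error. */
int write_com(FILE *fp, const char *route, const char *title,
              int charge, int mult, const char *zmat)
{
    int rc = fprintf(fp, "%s\n\n%s\n\n%d %d\n%s\n",
                     route, title, charge, mult, zmat);
    return rc < 0 ? -1 : 0;
}
```

To move a converged job to a new method or basis set, only the route string passed to such a routine needs to change; the geometry is carried over unchanged.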

This program operates in a manner similar to that of the previous code. It uses grep to build a list of all the files in the current directory with an extension of .out and then uses this list to spawn sub-processes which generate .com files based on the information contained in each of the .out files. Spawning a sub-process rather than making a procedure call ensures that if the sub-process crashes on inappropriate input, such as being fed an a.out file, it does not terminate the calling program.
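The isolation provided by a spawned process can be sketched with the standard POSIX calls fork, execl and waitpid. This is a minimal sketch under our own assumptions, not the code of DoZcom.c: the function name spawn_and_wait is hypothetical, and the argument layout simply mirrors the execl call quoted in this paper.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at `path` with two arguments and wait for it to finish.
 * Returns the child's exit status, 127 if the program could not be
 * executed, or -1 if fork/waitpid failed or the child was killed by a
 * signal. A crash in the child never propagates to the caller. */
int spawn_and_wait(const char *path, const char *name,
                   const char *arg1, const char *arg2)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                 /* fork failed */
    if (pid == 0) {
        /* Child: replace this process image with the helper program. */
        execl(path, name, arg1, arg2, (char *)0);
        _exit(127);                /* only reached if execl failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;                     /* child terminated abnormally */
}
```

The calling loop can inspect the return value, report a failed file, and carry on with the next name in the list.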

DoZcom.c

sZcom.c

Again, if you wish to use this code, note that the line of code

execl("/users/larry/sZcom","sZcom",CurrentLine,CurrentLine,(char *)0);

should be changed so that /users/larry/ is replaced by the path to the sZcom executable (which should be compiled separately) on your system.

opthit

This program condenses a series of summary (.out) files and produces output in a form suitable for importation into Excel.

For importation, Excel uses the end-of-line character to separate rows. To separate columns you can use any designated character; we chose to use a comma. The code is basic scanning and pattern matching, with the exception of the part which deals with the distance matrix.

As atoms can be placed in the Z-matrix in arbitrary order, the distance matrix is not standard in its layout. We wished to compare bond lengths etc. as other parameters of the molecules changed, and so we standardised the sequence in which this data was exported. To this end the atom type is used as an index into a matrix of interatomic distances which is built up as the distance matrix is scanned. Encountering a line with no atom type in the required position indicates the end of the distance matrix and triggers the program to print out the interatomic distances in standard sequence.
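The idea of indexing by atom type can be sketched as follows. This is an illustration only, not the code of opthit.c: the list of atom types, the function names, and the restriction to a single block of at most five columns are all our assumptions, and each row is assumed to hold a row number, an element symbol, and the lower-triangle distances.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NTYPES   4   /* number of atom types exported (hypothetical) */
#define MAXATOMS 5   /* single column block only, for simplicity */

static const char *TYPES[NTYPES] = {"H", "C", "N", "O"}; /* export order */
static double dist[NTYPES][NTYPES]; /* distances indexed by atom type */
static int row_type[MAXATOMS];      /* atom type of each row read so far */
static int nrows;

void reset_matrix(void)
{
    memset(dist, 0, sizeof dist);
    nrows = 0;
}

static int type_index(const char *sym)
{
    int i;
    for (i = 0; i < NTYPES; i++)
        if (strcmp(sym, TYPES[i]) == 0)
            return i;
    return -1;
}

/* Store one row of the lower-triangular matrix. Returns 1 if the line was
 * a matrix row, 0 if no known atom type was found in the expected position,
 * which signals the end of the distance matrix. */
int scan_row(const char *line)
{
    char sym[8];
    char *end;
    const char *p;
    int n, used = 0, t, j;

    if (sscanf(line, "%d %7s%n", &n, sym, &used) != 2)
        return 0;
    if (!isalpha((unsigned char)sym[0]))
        return 0;                       /* e.g. a column header line */
    t = type_index(sym);
    if (t < 0 || nrows >= MAXATOMS)
        return 0;
    row_type[nrows] = t;
    p = line + used;
    for (j = 0; j <= nrows; j++) {      /* row k carries k+1 distances */
        double d = strtod(p, &end);
        if (end == p)
            break;
        dist[t][row_type[j]] = dist[row_type[j]][t] = d;
        p = end;
    }
    nrows++;
    return 1;
}

/* Write the type-pair distances in fixed order, each followed by a comma. */
void format_standard(char *buf, size_t n)
{
    int i, j;
    size_t len = 0;

    buf[0] = '\0';
    for (i = 0; i < NTYPES; i++)
        for (j = i + 1; j < NTYPES; j++) {
            int w = snprintf(buf + len, n - len, "%.6f,", dist[i][j]);
            if (w < 0 || (size_t)w >= n - len)
                return;
            len += (size_t)w;
        }
}
```

Because the distances are stored by atom type rather than by row number, two jobs whose Z-matrices list the same atoms in different orders yield the same comma-separated sequence.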

Although this method depends on all atoms being unique species, it could easily be modified to use an atom type and number (C1, C2, etc.) as indices. In that case, however, more care would be required in constructing the Z-matrix, as atoms of the same type would have to occur in a fixed order of input if Excel is to be used to compare equivalent bond lengths.

Other code details: as each field in our output record is terminated by a comma, we initialise each field to a sequence of blank characters terminated by a comma. Then, when scanning the input, if the corresponding variable is detected using the string function strstr(), we copy the corresponding data into the relevant field character by character.
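The field handling just described might look like the sketch below. It is not the opthit.c source: the field width, the function names, and the choice to stop copying at a backslash (the separator used in Gaussian archive entries) or at the end of the line are our assumptions.

```c
#include <string.h>

#define FIELD_WIDTH 16   /* hypothetical fixed width of one output field */

/* Initialise a field to blanks terminated by a comma.
 * `field` must have room for FIELD_WIDTH + 2 characters. */
void init_field(char *field)
{
    memset(field, ' ', FIELD_WIDTH);
    field[FIELD_WIDTH] = ',';
    field[FIELD_WIDTH + 1] = '\0';
}

/* If `key` occurs in `line`, copy the text that follows it into the field
 * character by character, stopping at a backslash, a newline, the end of
 * the string, or the field width. Returns 1 if the key was found. */
int fill_field(char *field, const char *line, const char *key)
{
    const char *p = strstr(line, key);
    int i;

    if (p == NULL)
        return 0;
    p += strlen(key);
    for (i = 0; i < FIELD_WIDTH && p[i] != '\0' &&
                p[i] != '\\' && p[i] != '\n'; i++)
        field[i] = p[i];
    return 1;
}
```

A field whose key never appears in the input keeps its blanks and its comma, so every record has the same number of columns when it reaches the spreadsheet.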

opthit.c

Usage

The code is used as follows

1 All the .log files that are to be compared and analysed should be transferred to one directory. A copy of DoZout should also be present in this directory; when it is run it will generate a .out file for each .log file it finds that represents a successfully completed job.

2 If you wish to generate .com files corresponding to these output files you may now run DoZcom.

3 Prepare a file for analysis by opthit. Do this by concatenating all the .out files into one big summary file, called allout.txt in the following example. This can be done with the following command: pr *.out > allout.txt

4 Transfer this file to a PC using FTP or any other convenient method.

5 Run opthit with allout.txt as input and, say, test.txt as output.

This will produce a text file test.txt which you can now import into Excel.

Do this by starting Windows, running Excel, and using the File Open command: first change the column delimiter to a `,' (this is under the Text button), then import the file test.txt.

Summary

A group of tools has been developed in the C language which simplify extraction and analysis of the results of large numbers of Gaussian molecular modelling jobs. Although the code given relates to Gaussian output, it could easily be modified to use the results of other MO packages. Two of the tools developed run on UNIX systems.

The first generates a set of summary files for all log files in the current directory which correspond to successfully completed jobs. The second uses these summary files to produce a set of .com files which have the same keywords as the original but with the original geometry replaced by the final optimised geometry of the original job. A third tool, which runs on a PC, extracts data from these summary files and formats it in a form suitable for importation into an Excel spreadsheet.

References

G90, G92, G94, Copyright Gaussian, Inc., Carnegie Office Park, Building 6, Pittsburgh, PA 15106, U.S.A.