Overview of Molecular Modelling

Molecular modelling is a very diverse subject, ranging from the acquisition and subsequent display of molecular coordinates through to highly accurate (i.e. better than experiment) numerical simulation using theoretically derived functions. Depending on the context and the rigour, the subject is also often referred to as "molecular graphics", "molecular visualisation", "computational chemistry", "computational quantum chemistry" or "theoretical chemistry". A related area known as "molecular simulation" applies molecular modelling techniques to describing and understanding the statistical behaviour and properties of collections of molecules on a "macroscopic" scale. "Molecular dynamics" deals with the time-dependent properties of collections of molecules, and uses many of the techniques of molecular modelling and statistical mechanics. Both of these last two areas are beyond the scope of this lecture course.

Six Characteristic Features and Classifications

  1. Molecular Scales. Molecular modelling spans an enormous range of molecule size.
  2. Molecular Coordinates. Molecular modellers were amongst the first to take full advantage of on-line databases and information sources, and were the first chemists to adapt to the modern Internet. 3D atom coordinates are an essential feature of many modelling methods. Accurate or approximate 3D coordinates can be obtained from several experimental sources:
    1. Coordinate Systems: For N atoms in a system to be modelled, at least 3N-6 coordinates are needed to specify the system geometrically. These coordinates can be either Cartesian XYZ (giving 3N coordinates, of which 6 are normally redundant, corresponding to translations and rotations of the molecule) or so-called internal or "Z-matrix" coordinates, in which each atom is located by a distance, an angle and a dihedral angle relative to previously defined atoms:

      H
      O  0.96  1
      O  1.4   2   111   1
      H  0.96  3   111   2   90    1
      

      Some simple modelling methods (e.g. Hückel theory) need only the atom connectivity, and not the geometric information. Other modelling methods abandon the atom as the smallest unit whose coordinates need to be known, and use larger-scale approximations such as protein backbone positions, or even spherical or ellipsoidal approximations to whole molecules. For specialised cases (where group theoretical information is used or required, e.g. to speed up calculations), symmetry-adapted coordinates can be specified using exact symmetry restrictions (GaussView is a program that can symmetrize a coordinate set). A Web site for handling coordinate symmetry even allows you to determine the symmetry group from XYZ coordinates that you provide.
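
The four-line Z-matrix shown above can be converted to Cartesian coordinates by placing each atom relative to the atoms it references. The following is a minimal sketch only: the function name `zmatrix_to_cartesian`, the tuple layout of each row and the choice of placing the first bond along the z axis are assumptions made for illustration, and angles are taken to be in degrees and distances in Angstroms.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def _scale(a, s):
    return tuple(x * s for x in a)

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    return _scale(a, 1.0 / math.sqrt(_dot(a, a)))

def distance(a, b):
    d = _sub(a, b)
    return math.sqrt(_dot(d, d))

def zmatrix_to_cartesian(rows):
    """Rows are (symbol, r, bond_ref, angle, angle_ref, dihedral, dihedral_ref),
    truncated for the first three atoms; references are 1-based as in the text."""
    xyz = []
    for i, row in enumerate(rows):
        if i == 0:
            xyz.append((0.0, 0.0, 0.0))          # first atom at the origin
            continue
        r = row[1]
        b = xyz[row[2] - 1]                      # atom the new atom is bonded to
        if i == 1:
            xyz.append(_add(b, (0.0, 0.0, r)))   # second atom along z (assumed)
            continue
        theta = math.radians(row[3])
        a = xyz[row[4] - 1]                      # atom defining the bond angle
        bc = _unit(_sub(b, a))
        if i == 2:
            # No dihedral yet: put the third atom in the xz plane (assumed).
            m, n, phi = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 0.0
        else:
            phi = math.radians(row[5])
            d = xyz[row[6] - 1]                  # atom defining the dihedral
            n = _unit(_cross(_sub(a, d), bc))    # normal to the d-a-b plane
            m = _cross(n, bc)                    # in-plane, perpendicular to a-b
        step = _add(_scale(bc, -r * math.cos(theta)),
                    _add(_scale(m, r * math.sin(theta) * math.cos(phi)),
                         _scale(n, r * math.sin(theta) * math.sin(phi))))
        xyz.append(_add(b, step))
    return xyz

# The Z-matrix from the text.
H2O2_ZMATRIX = [
    ("H",),
    ("O", 0.96, 1),
    ("O", 1.4, 2, 111.0, 1),
    ("H", 0.96, 3, 111.0, 2, 90.0, 1),
]

coords = zmatrix_to_cartesian(H2O2_ZMATRIX)
for (symbol, *_), (x, y, z) in zip(H2O2_ZMATRIX, coords):
    print(f"{symbol} {x:9.4f} {y:9.4f} {z:9.4f}")
```

The sign convention for the dihedral angle differs between programs, so the output should be checked against the target program before use. Note that the 4 atoms here need 3N-6 = 6 internal values, which is exactly what the Z-matrix carries.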

    2. Coordinate File Types: Historically, various computer file formats were developed to describe these coordinates, of which the best known are the "Molfile", the "PDB" and the "XYZ" formats. The first two are really database formats, not modelling formats, and can lead to difficulties for small-molecule modellers. The "XYZ" file is used almost entirely for animating molecular vibrations.

      h2o2.mol
        4  3  0  0  0                 1 V2000
          0.1332    0.6883    2.1950 O   0  0  0  0  0
          0.2562    0.6410    0.9013 O   0  0  0  0  0
          0.8290    1.3074    2.5089 H   0  0  0  0  0
          0.2935   -0.3133    0.6690 H   0  0  0  0  0
        1  2  1  6  0  0
        1  3  1  0  0  0
        2  4  1  0  0  0
      M  END
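
Reading such a Molfile amounts to reading the counts line, then that many atom lines and bond lines. The sketch below is illustrative only: `parse_molfile` is a hypothetical helper, it locates the counts line by its `V2000` tag rather than by position (the excerpt above does not show the full header), and it splits fields on whitespace, which is an assumption that breaks for files with 100 or more atoms where the fixed-width count fields run together.

```python
# The excerpt from the text, reproduced as a string.
MOLFILE = """h2o2.mol
  4  3  0  0  0                 1 V2000
    0.1332    0.6883    2.1950 O   0  0  0  0  0
    0.2562    0.6410    0.9013 O   0  0  0  0  0
    0.8290    1.3074    2.5089 H   0  0  0  0  0
    0.2935   -0.3133    0.6690 H   0  0  0  0  0
  1  2  1  6  0  0
  1  3  1  0  0  0
  2  4  1  0  0  0
M  END
"""

def parse_molfile(text):
    """Return (atoms, bonds) from a V2000 Molfile string.

    atoms: list of (symbol, x, y, z); bonds: list of (atom1, atom2, order),
    with 1-based atom numbers as written in the file.
    """
    lines = text.splitlines()
    # Find the counts line by its version tag (assumption, see lead-in).
    start = next(i for i, line in enumerate(lines) if "V2000" in line)
    fields = lines[start].split()
    n_atoms, n_bonds = int(fields[0]), int(fields[1])

    atoms = []
    for line in lines[start + 1:start + 1 + n_atoms]:
        x, y, z, symbol = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))

    bonds = []
    first_bond = start + 1 + n_atoms
    for line in lines[first_bond:first_bond + n_bonds]:
        a1, a2, order = (int(f) for f in line.split()[:3])
        bonds.append((a1, a2, order))
    return atoms, bonds

atoms, bonds = parse_molfile(MOLFILE)
for symbol, x, y, z in atoms:
    print(symbol, x, y, z)
print(bonds)
```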
      

      The PDB format contains much more information about bio-molecules (note that atom coordinates are specified to only 3 decimal places, in Angstroms).

      SEQRES   1 A  467  GLY ALA MET ALA SER SER VAL LEU VAL THR GLN GLU PRO          
      SEQRES   2 A  467  GLU ILE GLU LEU PRO ARG GLU PRO ARG PRO ASN GLU GLU                                                              
      HET    COA    101      48                                                       
      HETNAM     COA COENZYME A                                                       
      HETNAM     MAH 3-HYDROXY-3-METHYL-GLUTARIC ACID                                 
      FORMUL   5  COA    4(C21 H36 N7 O16 P3 S1)                                      
      HELIX    1   1 PRO A  444  LEU A  449  1                                   6    
      HELIX    2   2 SER A  463  LYS A  474  1                                  12    
      SHEET    1   A 4 LYS A 549  ALA A 556  0                                        
      SHEET    2   A 4 VAL A 530  LEU A 546 -1  N  GLY A 539   O  MET A 555           
      CISPEP   1 GLY A  542    PRO A  543          0         0.61                     
      CRYST1   75.297  130.182   92.547  90.00 106.48  90.00 P 1 21 1      8          
      ATOM      1  N   PRO A 439      -7.194 -13.702  30.538  1.00 76.06           N  
      ATOM      8  N   ARG A 440      -7.440 -15.246  28.234  1.00 76.37           N 
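
Unlike the Molfile, PDB records are fixed-width, so fields are best read by column position rather than by splitting on spaces. The sketch below reads the two ATOM records above; `parse_atom_record` is a hypothetical helper, and the column ranges used are those of the standard PDB ATOM record as the editor understands them, so they should be checked against the format specification before being relied on.

```python
# The two ATOM records from the excerpt, with their original column spacing.
PDB_LINES = [
    "ATOM      1  N   PRO A 439      -7.194 -13.702  30.538  1.00 76.06           N",
    "ATOM      8  N   ARG A 440      -7.440 -15.246  28.234  1.00 76.37           N",
]

def parse_atom_record(line):
    """Slice one ATOM/HETATM line into named fields (0-based Python slices)."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),        # Angstroms, 3 decimal places
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }

pdb_atoms = [parse_atom_record(line)
             for line in PDB_LINES
             if line.startswith(("ATOM", "HETATM"))]

for atom in pdb_atoms:
    print(atom["res_name"], atom["res_seq"], atom["name"],
          atom["x"], atom["y"], atom["z"])
```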
      

      A more modern example is CML (Chemical Markup Language), an extensible XML-based format that can carry as much (molecular modelling) information as is needed:

      <cml:molecule xmlns:cml="http://www.xml-cml.org/schema/cml2/core">
      <cml:metadataList title="generated automatically from Openbabel">
      <cml:metadata name="dc:creator" content="OpenBabel version 1-100.1"/>
      <cml:metadata name="dc:description" content="CCSD(T)//CCSD/6-31G(d) Gaussian 03 optimised geometries"/>
      </cml:metadataList>
      <cml:atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14" elementType="C C O O C C C C H H H H H H" formalCharge="0 0 0 0 0 0 0 0 0 0 0 0 0 0" x3="0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000100" y3="0.675900 -0.675900 -1.705300 1.705300 -1.701000 1.701000 -0.722800 0.722800 1.111600 -1.111600 -1.147000 1.147000 2.739700 -2.739700" z3="-1.572000 -1.572000 -0.678200 -0.678200 0.682200 0.682200 1.617300 1.617300 -2.568500 -2.568500 2.622800 2.622800 1.006400 1.006400"/>
      <cml:bondArray atomRef1="a1 a1 a1 a2 a2 a3 a4 a5 a5 a6 a6 a7 a7 a8" atomRef2="a2 a4 a9 a3 a10 a5 a6 a7 a14 a8 a13 a8 a11 a12" order="2 1 1 1 1 1 1 2 1 2 1 1 1 1"/>
      </cml:molecule>
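
Because CML is XML, any standard XML parser can read it. The sketch below uses Python's `xml.etree.ElementTree` on an abbreviated fragment: only the first three atoms and the two bonds between them are kept from the example above, purely to keep the listing short, and `parse_cml` is a hypothetical helper that handles only this array-style layout.

```python
import xml.etree.ElementTree as ET

# Namespace URI as declared in the example above.
CML_NS = {"cml": "http://www.xml-cml.org/schema/cml2/core"}

# Abbreviated fragment: atoms a1-a3 and their bonds from the example above.
CML_TEXT = """<cml:molecule xmlns:cml="http://www.xml-cml.org/schema/cml2/core">
<cml:atomArray atomID="a1 a2 a3" elementType="C C O" x3="0.000000 0.000000 0.000000" y3="0.675900 -0.675900 -1.705300" z3="-1.572000 -1.572000 -0.678200"/>
<cml:bondArray atomRef1="a1 a2" atomRef2="a2 a3" order="2 1"/>
</cml:molecule>"""

def parse_cml(text):
    """Return ({atom_id: (element, x, y, z)}, [(ref1, ref2, order), ...])."""
    root = ET.fromstring(text)

    # Each attribute of atomArray is a space-separated column of values.
    atom_array = root.find("cml:atomArray", CML_NS)
    ids = atom_array.get("atomID").split()
    elements = atom_array.get("elementType").split()
    xs = [float(v) for v in atom_array.get("x3").split()]
    ys = [float(v) for v in atom_array.get("y3").split()]
    zs = [float(v) for v in atom_array.get("z3").split()]
    atoms = {i: (e, x, y, z)
             for i, e, x, y, z in zip(ids, elements, xs, ys, zs)}

    bond_array = root.find("cml:bondArray", CML_NS)
    bonds = list(zip(bond_array.get("atomRef1").split(),
                     bond_array.get("atomRef2").split(),
                     (int(o) for o in bond_array.get("order").split())))
    return atoms, bonds

cml_atoms, cml_bonds = parse_cml(CML_TEXT)
print(cml_atoms)
print(cml_bonds)
```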
      

      The advantage of such modern formats is that e.g. molecular coordinates and properties can be embedded in a variety of delivery systems, including podcasts!

  3. Molecular Visualisation. Once 3D coordinates are available, they can be visualised, which is an important aid to interpreting the results of molecular modelling.
  4. Molecular Structure Analysis. Once a visual model is available, simple "heuristics" can be applied. These can range from detecting close contacts due to e.g. hydrogen bonding, bond lengths and the pattern of bond length alternation (e.g. aromaticity), to stereoelectronic effects such as atom antiperiplanarity or ring planarity. This corresponds to the simple ideas of e.g. arrow pushing which organic chemists tend to develop (and carry around in their heads). IsoSTAR is a "data mining" method which can detect structural patterns in a large number of related structures.
  5. Molecular Structure and Property Prediction
  6. Molecular Reactivity
  7. The next two features are largely beyond the scope of this lecture course:
    1. Molecular Solvation and Condensed Phase Properties. Supermolecule and condensed phase models of specific and bulk solvation effects.
    2. Molecular Dynamics and Simulations. Theories of free energies and reaction kinetics.
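
The geometric heuristics mentioned under Molecular Structure Analysis (bond lengths, close contacts, antiperiplanarity) reduce to distances and dihedral angles computed from the 3D coordinates. The sketch below applies them to the hydrogen peroxide coordinates from the Molfile example. The helper names and the 30 degree tolerance used to call an arrangement "antiperiplanar" are illustrative assumptions, not a standard definition.

```python
import math

# Coordinates (Angstroms) copied from the h2o2.mol example in the text.
H2O2 = [
    ("O", (0.1332, 0.6883, 2.1950)),
    ("O", (0.2562, 0.6410, 0.9013)),
    ("H", (0.8290, 1.3074, 2.5089)),
    ("H", (0.2935, -0.3133, 0.6690)),
]

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def distance(p, q):
    d = _sub(p, q)
    return math.sqrt(_dot(d, d))

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range -180..180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project the two outer bonds onto the plane perpendicular to the axis.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def is_antiperiplanar(phi_degrees, tolerance=30.0):
    """True if the dihedral is within `tolerance` of 180 (assumed cut-off)."""
    return abs(abs(phi_degrees) - 180.0) <= tolerance

o1, o2, h3, h4 = (xyz for _, xyz in H2O2)
print(f"O-O distance: {distance(o1, o2):.4f}")
phi = dihedral(h3, o1, o2, h4)   # H3 is bonded to O1, H4 to O2
print(f"H-O-O-H dihedral: {phi:.1f}, antiperiplanar: {is_antiperiplanar(phi)}")
```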

Typical Molecular Modelling Software Tools

The "tools of the trade" have gradually evolved from physical models (Dreiding, CPK, etc.) and calculators, through programmable computers (starting around 1956 with the introduction of Fortran, the first scientific programming language), computers as visualisation aids (around 1970-) and computers running commercially written analysis "packages" such as Sybyl (around 1984-), to, most recently, integration using Internet-based tools and workbenches (1994-) built on languages such as HTML, JavaScript, Java and C++. A forum for discussing such tools, and other general queries, is the Computational Chemistry List (CCL).

A typical selection of molecular modelling teaching tools available within the department is listed below.

  1. Mercury: A (free) Crystallographic unit cell viewer and editor.
  2. Jmol: A (free) Web-browser applet that can display molecules, and some of their properties such as surfaces, spectra, vibrations, etc.
  3. Ghemical: An open-source molecular editing and molecular mechanics program. Superseded by Avogadro.
  4. ChemDraw/ChemBio3D: Molecule editor and 3D geometry molecular mechanics/quantum mechanics optimisation and display tool (STEREO ENABLED)
  5. Gaussview+Gaussian 03: Ab initio quantum mechanics editors and programs.
  6. DS Viewer Pro (STEREO ENABLED)
  7. VMD visualisation of molecular dynamics (STEREO ENABLED)

(c) H. S. Rzepa 1998-2009. No reproduction rights granted to this material without permission.