Coordinate Systems: For a system of N atoms, at least 3N-6 coordinates are needed to specify the geometry (3N-5 if the molecule is linear). These coordinates can be either Cartesian (XYZ), giving 3N coordinates of which 6 are normally redundant because they correspond to translations and rotations of the whole molecule, or so-called internal ("Z-matrix") coordinates, in which each atom is placed by a bond length, a bond angle and a dihedral angle relative to previously defined atoms, for example:
H
O 0.96 1
O 1.4 2 111 1
H 0.96 3 111 2 90 1
Some simple modelling methods (e.g. Hückel theory) need only the atom connectivity, not the geometry. Other methods abandon the atom as the smallest unit whose coordinates need to be known and use coarser approximations, such as protein backbone positions or even spherical or ellipsoidal approximations to whole molecules. For specialised cases (where group-theoretical information is used or required, e.g. to speed up calculations), symmetry-adapted coordinates can be specified using exact symmetry restrictions (GaussView is a program that can symmetrize a coordinate set). A Web site for handling coordinate symmetry even allows you to determine the symmetry group by providing XYZ coordinates.
Coordinate File Types: Historically, various computer file formats were developed to describe these coordinates, of which the best known are the "Molfile", the "PDB" and the "XYZ" formats. The first two are really database formats rather than modelling formats, and can lead to difficulties for small-molecule modellers. The "XYZ" file is used almost entirely for animating molecular vibrations.
h2o2.mol
4 3 0 0 0 1 V2000
0.1332 0.6883 2.1950 O 0 0 0 0 0
0.2562 0.6410 0.9013 O 0 0 0 0 0
0.8290 1.3074 2.5089 H 0 0 0 0 0
0.2935 -0.3133 0.6690 H 0 0 0 0 0
1 2 1 6 0 0
1 3 1 0 0 0
2 4 1 0 0 0
M END
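As an illustration of how little is needed to read the listing above, here is a minimal sketch of a reader for the atom and bond blocks. It assumes whitespace-separated fields exactly as printed here; real Molfiles use fixed-width columns (and normally carry three header lines before the counts line), so a production reader should slice by column instead. The function name `parse_molfile` is hypothetical.

```python
H2O2_MOL = """h2o2.mol
4 3 0 0 0 1 V2000
0.1332 0.6883 2.1950 O 0 0 0 0 0
0.2562 0.6410 0.9013 O 0 0 0 0 0
0.8290 1.3074 2.5089 H 0 0 0 0 0
0.2935 -0.3133 0.6690 H 0 0 0 0 0
1 2 1 6 0 0
1 3 1 0 0 0
2 4 1 0 0 0
M END
"""

def parse_molfile(text):
    """Return (atoms, bonds) from a V2000 listing with whitespace-separated fields."""
    lines = text.splitlines()
    # The counts line is identified by its trailing version tag.
    start = next(i for i, line in enumerate(lines)
                 if line.rstrip().endswith("V2000"))
    counts = lines[start].split()
    n_atoms, n_bonds = int(counts[0]), int(counts[1])

    atoms = []
    for line in lines[start + 1:start + 1 + n_atoms]:
        x, y, z, symbol = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))

    bonds = []
    first_bond = start + 1 + n_atoms
    for line in lines[first_bond:first_bond + n_bonds]:
        i, j, order = (int(v) for v in line.split()[:3])
        bonds.append((i, j, order))  # 1-based atom numbers and bond order
    return atoms, bonds

atoms, bonds = parse_molfile(H2O2_MOL)
for i, j, order in bonds:
    print(atoms[i - 1][0], atoms[j - 1][0], order)
```

Unlike the Z-matrix, the connectivity here is stored explicitly in the bond block rather than implied by the geometry.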
The PDB format contains much more information about bio-molecules (note that atom coordinates are specified to only 3 decimal places, in Angstroms).
SEQRES 1 A 467 GLY ALA MET ALA SER SER VAL LEU VAL THR GLN GLU PRO
SEQRES 2 A 467 GLU ILE GLU LEU PRO ARG GLU PRO ARG PRO ASN GLU GLU
HET COA 101 48
HETNAM COA COENZYME A
HETNAM MAH 3-HYDROXY-3-METHYL-GLUTARIC ACID
FORMUL 5 COA 4(C21 H36 N7 O16 P3 S1)
HELIX 1 1 PRO A 444 LEU A 449 1 6
HELIX 2 2 SER A 463 LYS A 474 1 12
SHEET 1 A 4 LYS A 549 ALA A 556 0
SHEET 2 A 4 VAL A 530 LEU A 546 -1 N GLY A 539 O MET A 555
CISPEP 1 GLY A 542 PRO A 543 0 0.61
CRYST1 75.297 130.182 92.547 90.00 106.48 90.00 P 1 21 1 8
ATOM 1 N PRO A 439 -7.194 -13.702 30.538 1.00 76.06 N
ATOM 8 N ARG A 440 -7.440 -15.246 28.234 1.00 76.37 N
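The three-decimal-place limit follows from the fixed-column layout of the ATOM record, in which each coordinate occupies an 8-character field. The sketch below writes and reads one such record using the standard column positions (x in columns 31-38, y in 39-46, z in 47-54); the listing above has lost its column alignment, so the record is regenerated from its values rather than copied. The helper names `format_atom` and `parse_atom` are hypothetical, and only the fields shown in the ATOM lines above are handled.

```python
def format_atom(serial, name, res_name, chain, res_seq,
                x, y, z, occupancy, b_factor, element):
    """Write one ATOM record in fixed columns (coordinates as 8.3f fields)."""
    return ("ATOM  {:5d} {:<4s} {:>3s} {:1s}{:4d}    "
            "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}          {:>2s}"
            ).format(serial, name, res_name, chain, res_seq,
                     x, y, z, occupancy, b_factor, element)

def parse_atom(line):
    """Read an ATOM record by column position, not by splitting on spaces."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }

# First ATOM record of the listing, rebuilt with its column alignment
record = format_atom(1, " N", "PRO", "A", 439,
                     -7.194, -13.702, 30.538, 1.00, 76.06, "N")
print(record)
print(parse_atom(record)["xyz"])

# Anything beyond three decimals is lost on writing
print(parse_atom(format_atom(2, " C", "ALA", "A", 1,
                             1.23456, 0.0, 0.0, 1.0, 0.0, "C"))["xyz"][0])
```

Slicing by column matters because adjacent fields can touch with no space between them (for example, large negative coordinates), in which case splitting on whitespace would misread the record.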
A more modern example is CML (Chemical Markup Language), an extensible XML-based format that can carry as much (molecular modelling) information as is needed:
<cml:molecule xmlns:cml="http://www.xml-cml.org/schema/cml2/core">
  <cml:metadataList title="generated automatically from Openbabel">
    <cml:metadata name="dc:creator" content="OpenBabel version 1-100.1"/>
    <cml:metadata name="dc:description" content="CCSD(T)//CCSD/6-31G(d) Gaussian 03 optimised geometries"/>
  </cml:metadataList>
  <cml:atomArray
    atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14"
    elementType="C C O O C C C C H H H H H H"
    formalCharge="0 0 0 0 0 0 0 0 0 0 0 0 0 0"
    x3="0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000100"
    y3="0.675900 -0.675900 -1.705300 1.705300 -1.701000 1.701000 -0.722800 0.722800 1.111600 -1.111600 -1.147000 1.147000 2.739700 -2.739700"
    z3="-1.572000 -1.572000 -0.678200 -0.678200 0.682200 0.682200 1.617300 1.617300 -2.568500 -2.568500 2.622800 2.622800 1.006400 1.006400"/>
  <cml:bondArray
    atomRef1="a1 a1 a1 a2 a2 a3 a4 a5 a5 a6 a6 a7 a7 a8"
    atomRef2="a2 a4 a9 a3 a10 a5 a6 a7 a14 a8 a13 a8 a11 a12"
    order="2 1 1 1 1 1 1 2 1 2 1 1 1 1"/>
</cml:molecule>
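Because CML is XML, it can be read with a general-purpose XML parser rather than a format-specific one. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the array-style layout shown above (space-separated `atomID`, `elementType`, `x3`, `y3`, `z3` and bond attributes). The sample is cut down to the first two atoms and their bond from the listing, and the function name `read_cml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace prefix mapping for the CML 2 core schema used in the listing
CML_NS = {"cml": "http://www.xml-cml.org/schema/cml2/core"}

SAMPLE = """<cml:molecule xmlns:cml="http://www.xml-cml.org/schema/cml2/core">
  <cml:atomArray atomID="a1 a2" elementType="C C"
                 x3="0.000000 0.000000"
                 y3="0.675900 -0.675900"
                 z3="-1.572000 -1.572000"/>
  <cml:bondArray atomRef1="a1" atomRef2="a2" order="2"/>
</cml:molecule>"""

def read_cml(xml_text):
    """Return ({atom_id: (element, (x, y, z))}, [(id1, id2, order), ...])."""
    root = ET.fromstring(xml_text)

    atom_array = root.find(".//cml:atomArray", CML_NS)
    ids = atom_array.get("atomID").split()
    elements = atom_array.get("elementType").split()
    # Each of x3, y3, z3 is one column; zip them into per-atom (x, y, z) tuples.
    coords = zip(*(map(float, atom_array.get(axis).split())
                   for axis in ("x3", "y3", "z3")))
    atoms = {i: (el, xyz) for i, el, xyz in zip(ids, elements, coords)}

    bond_array = root.find(".//cml:bondArray", CML_NS)
    bonds = list(zip(bond_array.get("atomRef1").split(),
                     bond_array.get("atomRef2").split(),
                     map(int, bond_array.get("order").split())))
    return atoms, bonds

atoms, bonds = read_cml(SAMPLE)
print(atoms)
print(bonds)
```

Elements the reader does not know about, such as the metadata list, are simply ignored, which is what makes the format extensible in practice.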
The advantage of such modern formats is that e.g. molecular coordinates and properties can be embedded in a variety of delivery systems, including podcasts!
A typical selection of molecular modelling teaching tools available within the department is listed below.