Scalemic is the term used by Eliel to describe any non-racemic chiral compound. Synthetic chemists imply it when they describe a synthetic product with an observable enantiomeric excess or ee (which can range from close to 0% to almost 100%). There are two cheminformatics questions of interest to me:
- How many non-trivial scalemic molecules have been reported in the literature (let’s assume their ee is significantly greater than 0%)?
  - The distribution function for the ee of these molecules would be most interesting!
- Of those, how many have the absolute configuration of the predominant enantiomer established with high confidence?
  - Or, to put this another way, how many may prove to be mis-assigned?
Note the careful qualification in the above questions. By non-trivial, I mean compounds whose scalemic attributes persist in solution for a chemically useful duration. That could be taken to mean configurationally stable chiral molecules, rather than those that might be conformationally chiral (an example of a trivial scalemic molecule would be the twist-boat conformation of cyclohexane, which, having D2 symmetry, is dissymmetric, but which would only retain its scalemic property for a trivially short timescale).
What are the boundary values? Here are some:
- As I write this, CAS records 61,257,703 chemical substances. Needless to say (unless I missed it), the answer to my first question is not to be found there.
- Beilstein (Reaxys) records 1,126,995 compounds as having one or more reported chiroptical properties (which is the most direct way of establishing that a molecule is scalemic, although strictly, having say an optical rotation of 0° does not necessarily mean the molecule is not scalemic). We have no way of knowing how many scalemic molecules have had no chiroptical measurement made (but one would hope it's a small proportion). Perhaps that is a good answer to question 1?
  - Of these, 1,097,094 relate to optical rotatory power, 17,515 to optical rotatory dispersion and 62,248 to electronic circular dichroism.
- It is more difficult to answer how many of these 1,126,995 substances have a firmly established absolute configuration. Measuring a chiroptical property does NOT in itself establish the absolute configuration. Doing so is a fascinating exercise in sequential logical argument, and how one does it has changed quite a lot over time. And what might I mean by high confidence? An older assignment (made say > 40 years ago) might be less confident than one established in 2011 (fortunately, we can probably trust the absolute configurations of the amino acids!). A bit of a can of worms, nevertheless. But it interests me because it is a good example of what the semantic web is supposed to be all about.
- The Cambridge crystallographic database reports 560,307 entries, of which 72,340 are in chiral space groups (in which a chiral molecule can crystallise) and exhibit no disorder or other errors. We do not know how many of these are non-trivial, since all manner of small (and low energy) distortions can create a species that is chiral in the solid state but would not persist for a chemically useful duration in solution (it might, for example, immediately racemize and become non-scalemic).
- The Flack parameter has been used since 1983 for enantiomorph estimation (a value of ~≤ 0.10(10) would be considered meaningful). This could in principle provide an answer of known confidence to my question 2 above (but would not address the issue of non-triviality).
  - The challenge now is to quantify how many compounds have a meaningful reported Flack parameter (presumably a subset of the 72,340?).
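
To put the boundary values above into proportion, and to show what a "meaningful" Flack parameter check might look like, here is a minimal Python sketch. The counts are the ones quoted above. The function names (`percent`, `parse_su`, `looks_meaningful`) are hypothetical, and the two cut-offs of 0.10 are only my reading of the "~≤ 0.10(10)" guideline, not an established criterion. The parser assumes the usual crystallographic convention that the digits in parentheses are the standard uncertainty in units of the last quoted decimal place.

```python
import re

# Counts quoted in the text (correct at the time of writing).
CAS_SUBSTANCES = 61_257_703
REAXYS_CHIROPTICAL = 1_126_995
CSD_ENTRIES = 560_307
CSD_CHIRAL_ORDERED = 72_340


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


def parse_su(text: str) -> tuple[float, float]:
    """Parse 'value(su)' notation, e.g. '0.10(10)' -> (0.10, 0.10).

    Assumes the bracketed digits are the standard uncertainty in units
    of the last quoted decimal place of the value.
    """
    match = re.fullmatch(r"\s*(-?\d+)(?:\.(\d+))?\((\d+)\)\s*", text)
    if match is None:
        raise ValueError(f"not in value(su) form: {text!r}")
    whole, frac, su_digits = match.groups()
    decimals = len(frac) if frac else 0
    value = float(f"{whole}.{frac}") if frac else float(whole)
    su = int(su_digits) / 10**decimals
    return value, su


def looks_meaningful(flack: str, max_value: float = 0.10,
                     max_su: float = 0.10) -> bool:
    """Hypothetical filter: both cut-offs are assumptions, not a standard."""
    value, su = parse_su(flack)
    tolerance = 1e-12  # guard against floating-point noise at the boundary
    return value <= max_value + tolerance and su <= max_su + tolerance


if __name__ == "__main__":
    print(f"Reaxys chiroptical / CAS: "
          f"{percent(REAXYS_CHIROPTICAL, CAS_SUBSTANCES):.2f}%")
    print(f"CSD chiral, ordered / CSD: "
          f"{percent(CSD_CHIRAL_ORDERED, CSD_ENTRIES):.2f}%")
    for reported in ("0.10(10)", "-0.02(4)", "0.5(3)"):
        print(reported, parse_su(reported), looks_meaningful(reported))
```

On these numbers, compounds with a reported chiroptical property are a little under 2% of all CAS substances, and the ordered entries in chiral space groups are roughly 13% of the crystallographic entries. Applying something like `looks_meaningful` to the reported Flack parameters of those 72,340 entries would still leave the question of non-triviality untouched.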
Let me declare one personal interest. Over the last four years or so, we have been asked to confirm the absolute configuration of around eight scalemic molecules. After a detailed study, we concluded three were mis-assigned. Now this in no way implies anything about what the answer to question 2 above might be! But it does make one think!