{"id":24723,"date":"2022-03-01T14:16:13","date_gmt":"2022-03-01T14:16:13","guid":{"rendered":"https:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=24723"},"modified":"2022-03-03T10:39:05","modified_gmt":"2022-03-03T10:39:05","slug":"raw-data-the-evolution-of-fair-data-and-crystallography","status":"publish","type":"post","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24723","title":{"rendered":"Raw data: the evolution of FAIR data and crystallography."},"content":{"rendered":"<div class=\"kcite-section\" kcite-section-id=\"24723\">\n<p>Scientific data in chemistry has come a long way in the last few decades. Originally entangled in scientific articles in the form of tables of numbers or diagrams, it was (partially) disentangled into supporting information when journals became electronic in the late 1990s.<span id=\"cite_ITEM-24723-0\" name=\"citation\"><a href=\"#ITEM-24723-0\">[1]<\/a><\/span> The next phase was the introduction of data repositories in the early noughties. Now associated with innovative commercial companies such as Figshare and later the non-commercial Zenodo, such repositories have also spread to institutional forms such as the earlier SPECTRa project of 2006<span id=\"cite_ITEM-24723-1\" name=\"citation\"><a href=\"#ITEM-24723-1\">[2]<\/a><\/span> and are still evolving.<span id=\"cite_ITEM-24723-2\" name=\"citation\"><a href=\"#ITEM-24723-2\">[3]<\/a><\/span> Perhaps the best known, and certainly one of the oldest, examples of curated structural data in chemistry is the CCDC (Cambridge Crystallographic Data Centre) CSD (Cambridge Structural Database), which has been operating for more than 55 years now, starting even before the online era! 
Curation here is the important context, since there you will find crystal diffraction data which has been refined into a structural model, firstly by the authors reporting the structure and then by the CSD, who, amongst other operations, validate the associated data using a utility called <a href=\"http:\/\/checkcif.iucr.org\">CheckCIF<\/a>.<span id=\"cite_ITEM-24723-3\" name=\"citation\"><a href=\"#ITEM-24723-3\">[4]<\/a><\/span> What perhaps is not realised by most users of this data source is that the original or &#8220;raw&#8221; data, as obtained from an X-ray diffractometer and from which the CSD data is derived, is not actually available from the CSD. This primary form of crystallographic data is the topic of this post.<\/p>\n<p>Most chemical data now emerges from an instrument, where it is already partially processed internally before being offered. Such raw\/primary data is perhaps best known in the form of NMR information, where it is offered by the instrument in the form of an FID or free induction decay.\u00a0Its transformation from this form into what all chemists know as a spectrum requires further software processing, including other operations such as peak integration. It is this processed spectrum that has traditionally been offered as part of a scientific article (often only in visual or peak-listed form) and rarely has the\u00a0FID form been made available to anyone interested. It is important to state that the transformation to spectrum also incurs significant loss of data. An interesting project led by the editors of two organic chemistry journals<span id=\"cite_ITEM-24723-4\" name=\"citation\"><a href=\"#ITEM-24723-4\">[5]<\/a><\/span>,<span id=\"cite_ITEM-24723-5\" name=\"citation\"><a href=\"#ITEM-24723-5\">[6]<\/a><\/span> had the aim of encouraging the submission of FAIR data to the journals, although in fact the project concentrated on the submission of raw NMR data. 
As it turned out, only a very small proportion of all the submissions to these journals over the period of a year actually provided such data (~113 datasets) in the form of ZIP archives,<sup>\u2021<\/sup> each containing anywhere between one and ~100 actual sets of raw NMR data. One should make the point that raw data is not necessarily FAIR data. The latter requires rich metadata describing the data to become findable, accessible, interoperable and reusable (FAIR), and such metadata was not actually generated as part of this publisher project.\u00a0<\/p>\n<p>Here I will take a closer look at potentially FAIR raw data in the area of crystallography. This project is perhaps less well known than the previous one,<span id=\"cite_ITEM-24723-4\" name=\"citation\"><a href=\"#ITEM-24723-4\">[5]<\/a><\/span>,<span id=\"cite_ITEM-24723-5\" name=\"citation\"><a href=\"#ITEM-24723-5\">[6]<\/a><\/span> hence the present post strives to make it better known. As with NMR, a useful starting point is to describe the various stages in the lifecycle of crystal data.<\/p>\n<ol>\n<li>A crystal is mounted in the diffractometer and X-ray diffraction images are recorded. These are considered the raw data, and as with most instruments, their form is determined both by the instrument itself and the software used to start the refinement process into a molecular structure.<\/li>\n<li>This refinement then assigns a space group to the data and derives so-called structure factors or <em>hkl<\/em> data. This data can now be captured in a much more standard form known as a CIF (crystallographic information file) and is nowadays the format that is deposited with the CSD.<\/li>\n<li>A reduced form of the CIF file, containing a subset of the information but lacking the <em>hkl<\/em> data, is much more common, and was the form sent to the CSD until a few years ago.<\/li>\n<li>Very often an image of the resulting model for the molecular structure is also included. 
Whilst it is based on the data in the CIF file, it does not contain reusable data as such and is considered as being made available only for human use and perception.<\/li>\n<\/ol>\n<p>It is form 1 that is missing from the CSD datasets. Because it can be quite large (~0.5-9 Gbyte), the current recommendation is that it is not stored on the CSD but on local data repositories.<sup>\u2020<\/sup> So now we see a need to establish if possible bidirectional links between type 1 and types 2-4 and to identify what characteristics of FAIR each has. Primarily, the F (findable) of FAIR will be explored here. This is done by illustrating some searches for this data, based on the metadata registered for it with DataCite.<\/p>\n<ol start=\"5\">\n<li><a href=\"https:\/\/commons.datacite.org\/?query=relatedIdentifiers.relatedIdentifier:10.5517*\" target=\"_blank\" rel=\"noopener\">https:\/\/commons.datacite.org\/?query=relatedIdentifiers.relatedIdentifier:10.5517*<\/a>\u00a0 (157 works)<br \/>\nThis simple search identifies any entry in any repository which cites in its metadata record the DOI for an entry in CSD, taking the form <strong>10.5517*<\/strong> which is common to all entries.<\/li>\n<li><a href=\"https:\/\/commons.datacite.org\/?query=relatedIdentifiers.relatedIdentifier:*10.5517*+AND+(media.media_type:chemical\/x-cif+OR+media.media_type:application\/x-7z-compressed+OR+media.media_type:application\/gzip+OR+media.media_type:application\/zip)\" target=\"_blank\" rel=\"noopener\">?query=relatedIdentifiers.relatedIdentifier:*10.5517*+AND+(media.media_type:chemical\/x-cif+OR+media.media_type:application\/x-7z-compressed+OR+media.media_type:application\/gzip+OR+media.media_type:application\/zip)<\/a> (9 works).<br \/>\nThis also specifies that search 5 is further constrained by requiring one of four media types to ALSO be present in the repository metadata record. 
These types are standard compressed archives in which the raw crystal data is likely to be stored, along with a CIF entry that is clearly associated with crystal structure data. The Boolean OR indicates that any one of them can be present! One can now be a little more certain that these entries contain crystal structure data. That we cannot be absolutely certain is clearly a current deficiency of the metadata present for the entries!\u00a0<\/li>\n<li><a href=\"https:\/\/commons.datacite.org\/?query=identifier:*10.5517*+AND+(relatedIdentifiers.relatedIdentifier:*10.14469\/hpc\/*)\" target=\"_blank\" rel=\"noopener\">?query=identifier:*10.5517*+AND+(relatedIdentifiers.relatedIdentifier:*10.14469*)<\/a> (7 works)<br \/>\nEight works from search 6 originate from a repository with the prefix <strong>10.14469*<\/strong> and so now one can reverse the direction and ask how many of them are referenced in the metadata for published items in the CSD. Around 945,473 entries in the CSD currently have a persistent DOI associated with them, all starting with <strong>10.5517*<\/strong>,\u00a0and so now one can search for how many of these also reference a related identifier at <strong>10.14469*<\/strong>.\u00a0Seven of them show up there.<\/li>\n<li>Also in the CSD metadata records is an item with the attribute <em>relationType=&#8221;IsDerivedFrom&#8221;<\/em>, carrying the meaning that the CSD data is itself derived from (raw) data held elsewhere.\u00a0This information is captured during the deposition process with CCDC, as shown below. 
<br \/>\n<a href=\"https:\/\/www.ccdc.cam.ac.uk\/deposit\/upload\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-24730\" src=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2022\/03\/CCSD-enhance1-1024x811.jpg\" alt=\"\" width=\"540\" srcset=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2022\/03\/CCSD-enhance1-1024x811.jpg 1024w, https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2022\/03\/CCSD-enhance1-300x238.jpg 300w, https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2022\/03\/CCSD-enhance1-768x608.jpg 768w, https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2022\/03\/CCSD-enhance1-1536x1216.jpg 1536w, https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2022\/03\/CCSD-enhance1.jpg 1729w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><br \/>\n<a href=\"https:\/\/commons.datacite.org\/?query=identifier:*10.5517*+AND+(relatedIdentifiers.relationType:IsSourceOf+OR+relatedIdentifiers.relationType:IsDerivedFrom)\" target=\"_blank\" rel=\"noopener\">https:\/\/commons.datacite.org\/?query=identifier:*10.5517*+AND+(relatedIdentifiers.relationType:IsSourceOf+OR+relatedIdentifiers.relationType:IsDerivedFrom) <\/a>\u00a0(7 works)<br \/>\nThis constrains to datasets at\u00a0CSD that are associated with additional raw data by <strong>IsDerivedFrom<\/strong> or <strong>IsSourceOf<\/strong> relationships.<sup>\u2665<\/sup> CCDC tell me the true number is around 65 so the origins of\u00a0this mismatch need to be identified.<\/li>\n<\/ol>\n<p>So projects aiming to capture data from chemical instrumentation are just starting to reveal the potential of this modern system for storing data in two or more locations and reconciling various forms of this data, from raw form to derived or processed data. The interested user can then use whichever form is most relevant to their needs, and having found one form can then trace back to the other form(s). 
We might anticipate many developments in this area in the near future.\u00a0<\/p>\n<hr \/>\n<p><sup>\u2021<\/sup>One has to expand the archive to find out how many actual raw datasets are inside, rather than knowing beforehand how many datasets are contained there, or anything else about their properties.\u00a0<sup>\u2020<\/sup>The publication process is described here for one repository at DOI: <a href=\"https:\/\/doi.org\/10.14469\/hpc\/10178\" target=\"_blank\" rel=\"noopener\">10.14469\/hpc\/10178<\/a> <sup>\u2665<\/sup>From the DataCite schema; <small><code>&lt;relatedIdentifier  relationType=\"IsDerivedFrom\"&gt;... &lt;\/relatedIdentifier&gt;<\/code><\/small> <em><strong>IsDerivedFrom<\/strong> should be used for a resource that is a derivative of an original resource. In this example, the dataset is derived from a larger dataset and data values have been manipulated from their original state.<\/em> <small><code>&lt;relatedIdentifier  relationType=\"IsSourceOf\"&gt;... &lt;\/relatedIdentifier&gt;<\/code><\/small> <em><strong>IsSourceOf<\/strong> is the original resource from which a derivative resource was created.\u00a0In this example, this is the original dataset without value manipulation.<\/em><\/p>\n<hr \/>\n<p>This post has DOI: 10.14469\/hpc\/10177<\/p>\n<hr \/>\n<h2>References<\/h2>\n    <ol class=\"kcite-bibliography csl-bib-body\"><li id=\"ITEM-24723-0\">A.M. Hunter, and A.B. Smith, \"Review of Supporting Information at &lt;i&gt;Organic Letters&lt;\/i&gt;\", <i>Organic Letters<\/i>, vol. 17, pp. 2867-2869, 2015. <a href=\"https:\/\/doi.org\/10.1021\/acs.orglett.5b01700\">https:\/\/doi.org\/10.1021\/acs.orglett.5b01700<\/a>\n\n<\/li>\n<li id=\"ITEM-24723-1\">J. Downing, P. Murray-Rust, A.P. Tonge, P. Morgan, H.S. Rzepa, F. Cotterill, N. Day, and M.J. Harvey, \"SPECTRa: The Deposition and Validation of Primary Chemistry Research Data in Digital Repositories\", <i>Journal of Chemical Information and Modeling<\/i>, vol. 48, pp. 1571-1581, 2008. 
<a href=\"https:\/\/doi.org\/10.1021\/ci7004737\">https:\/\/doi.org\/10.1021\/ci7004737<\/a>\n\n<\/li>\n<li id=\"ITEM-24723-2\">M.J. Harvey, A. McLean, and H.S. Rzepa, \"A metadata-driven approach to data repository design\", <i>Journal of Cheminformatics<\/i>, vol. 9, 2017. <a href=\"https:\/\/doi.org\/10.1186\/s13321-017-0190-6\">https:\/\/doi.org\/10.1186\/s13321-017-0190-6<\/a>\n\n<\/li>\n<li id=\"ITEM-24723-3\">A.L. Spek, \"Structure validation in chemical crystallography\", <i>Acta Crystallographica Section D Biological Crystallography<\/i>, vol. 65, pp. 148-155, 2009. <a href=\"https:\/\/doi.org\/10.1107\/s090744490804362x\">https:\/\/doi.org\/10.1107\/s090744490804362x<\/a>\n\n<\/li>\n<li id=\"ITEM-24723-4\">A.M. Hunter, E.M. Carreira, and S.J. Miller, \"Encouraging Submission of FAIR Data at &lt;i&gt;The Journal of Organic Chemistry&lt;\/i&gt; and &lt;i&gt;Organic Letters&lt;\/i&gt;\", <i>The Journal of Organic Chemistry<\/i>, vol. 85, pp. 1773-1774, 2020. <a href=\"https:\/\/doi.org\/10.1021\/acs.joc.0c00248\">https:\/\/doi.org\/10.1021\/acs.joc.0c00248<\/a>\n\n<\/li>\n<li id=\"ITEM-24723-5\">A.M. Hunter, E.M. Carreira, and S.J. Miller, \"Encouraging Submission of FAIR Data at &lt;i&gt;The Journal of Organic Chemistry&lt;\/i&gt; and &lt;i&gt;Organic Letters&lt;\/i&gt;\", <i>Organic Letters<\/i>, vol. 22, pp. 1231-1232, 2020. <a href=\"https:\/\/doi.org\/10.1021\/acs.orglett.0c00383\">https:\/\/doi.org\/10.1021\/acs.orglett.0c00383<\/a>\n\n<\/li>\n<\/ol>\n\n<\/div> <!-- kcite-section 24723 -->","protected":false},"excerpt":{"rendered":"<p>Scientific data in chemistry has come a long way in the last few decades. Originally entangled into scientific articles in the form of tables of numbers or diagrams, it was (partially) disentangled into supporting information when journals became electronic in the late 1990s. 
The next phase was the introduction of data repositories in the early [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":5,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[2],"tags":[],"ppma_author":[2661],"class_list":["post-24723","post","type-post","status-publish","format-standard","hentry","category-chemical-it"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Raw data: the evolution of FAIR data and crystallography. - Henry Rzepa&#039;s Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24723\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Raw data: the evolution of FAIR data and crystallography. - Henry Rzepa&#039;s Blog\" \/>\n<meta property=\"og:description\" content=\"Scientific data in chemistry has come a long way in the last few decades. 
Originally entangled into scientific articles in the form of tables of numbers or diagrams, it was (partially) disentangled into supporting information when journals became electronic in the late 1990s. The next phase was the introduction of data repositories in the early [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24723\" \/>\n<meta property=\"og:site_name\" content=\"Henry Rzepa&#039;s Blog\" \/>\n<meta property=\"article:published_time\" content=\"2022-03-01T14:16:13+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-03-03T10:39:05+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2022\/03\/CCSD-enhance1-1024x811.jpg\" \/>\n<meta name=\"author\" content=\"Henry Rzepa\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Henry Rzepa\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Raw data: the evolution of FAIR data and crystallography. - Henry Rzepa&#039;s Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24723","og_locale":"en_GB","og_type":"article","og_title":"Raw data: the evolution of FAIR data and crystallography. - Henry Rzepa&#039;s Blog","og_description":"Scientific data in chemistry has come a long way in the last few decades. Originally entangled into scientific articles in the form of tables of numbers or diagrams, it was (partially) disentangled into supporting information when journals became electronic in the late 1990s. 
The next phase was the introduction of data repositories in the early [&hellip;]","og_url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24723","og_site_name":"Henry Rzepa&#039;s Blog","article_published_time":"2022-03-01T14:16:13+00:00","article_modified_time":"2022-03-03T10:39:05+00:00","og_image":[{"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2022\/03\/CCSD-enhance1-1024x811.jpg","type":"","width":"","height":""}],"author":"Henry Rzepa","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Henry Rzepa","Estimated reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24723#article","isPartOf":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24723"},"author":{"name":"Henry Rzepa","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281"},"headline":"Raw data: the evolution of FAIR data and crystallography.","datePublished":"2022-03-01T14:16:13+00:00","dateModified":"2022-03-03T10:39:05+00:00","mainEntityOfPage":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24723"},"wordCount":1389,"commentCount":1,"image":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24723#primaryimage"},"thumbnailUrl":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2022\/03\/CCSD-enhance1-1024x811.jpg","articleSection":["Chemical IT"],"inLanguage":"en-GB","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24723#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24723","url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24723","name":"Raw data: the evolution of FAIR data and crystallography. 
- Henry Rzepa&#039;s Blog","isPartOf":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24723#primaryimage"},"image":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24723#primaryimage"},"thumbnailUrl":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2022\/03\/CCSD-enhance1-1024x811.jpg","datePublished":"2022-03-01T14:16:13+00:00","dateModified":"2022-03-03T10:39:05+00:00","author":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281"},"breadcrumb":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24723#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24723"]}]},{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24723#primaryimage","url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2022\/03\/CCSD-enhance1.jpg","contentUrl":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2022\/03\/CCSD-enhance1.jpg","width":1729,"height":1369},{"@type":"BreadcrumbList","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24723#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog"},{"@type":"ListItem","position":2,"name":"Raw data: the evolution of FAIR data and crystallography."}]},{"@type":"WebSite","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#website","url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/","name":"Henry Rzepa&#039;s Blog","description":"Chemistry with a 
twist","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":"Person","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281","name":"Henry Rzepa","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g370be3a7397865e4fd161aefeb0a5a85","url":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","caption":"Henry Rzepa"},"description":"Henry Rzepa is Emeritus Professor of Computational Chemistry at Imperial College London.","sameAs":["https:\/\/orcid.org\/0000-0002-8635-8390"],"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?author=1"}]}},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pDef7-6qL","jetpack-related-posts":[{"id":16251,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=16251","url_meta":{"origin":24723,"position":0},"title":"Metametadata: data about data about (chemical) data.","author":"Henry Rzepa","date":"April 16, 2016","format":false,"excerpt":"Scientists are familiar with the term data, at least in a scientific or chemical context, but appreciating metadata (meaning \"after\", or \"beyond\") is slightly more subtle, in the sense of using it to mean data about data. 
The challenge lies in clarifying where the boundary between data and its metadata\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":12932,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12932","url_meta":{"origin":24723,"position":1},"title":"One molecule, one identifier: Viewing molecular files from a digital repository using metadata standards.","author":"Henry Rzepa","date":"September 8, 2014","format":false,"excerpt":"In the beginning (taken here as\u00a0prior to ~1980) libraries held\u00a0five-year printed consolidated indices of molecules, organised by formula or name (Chemical abstracts). This could occupy about 2m of shelf space for each five years. And an equivalent set of printed volumes from the Beilstein collection. Those of us who needed\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":18344,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18344","url_meta":{"origin":24723,"position":2},"title":"How to search data repositories for FAIR chemical content and data: SubjectScheme","author":"Henry Rzepa","date":"June 8, 2017","format":false,"excerpt":"As data repositories start to flourish, it is reasonable to ask questions such as what sort of chemistry can be found there and how can I find it? 
Here I give an updated worked example of a digital repository search for chemical content and also pose an important issue for\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2017\/06\/171-1024x196.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":24561,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24561","url_meta":{"origin":24723,"position":3},"title":"Data base or Data repository? &#8211; A brief and very selective history of data management in chemistry.","author":"Henry Rzepa","date":"January 26, 2022","format":false,"excerpt":"Way back in the late 1980s or so, research groups in chemistry started to replace the filing of their paper-based research data by storing it in an easily retrievable digital form. This required a computer database and initially these were accessible only on specific dedicated computers in the laboratory. These\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2022\/01\/Screenshot-1015-1024x521.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":15313,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=15313","url_meta":{"origin":24723,"position":4},"title":"Some examples of open access publications citing managed research data (RDM).","author":"Henry Rzepa","date":"January 5, 2016","format":false,"excerpt":"In May 2015, the EPSRC funding council in the UK required researchers to publish the outcomes of the funded work to include an OA (open access) version of the narrative and to cite the managed research data used to support the research with\u00a0a DOI (digital object identifier). 
I was discussing\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":14454,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=14454","url_meta":{"origin":24723,"position":5},"title":"A (light) introductory tutorial on Research Data Management (in chemistry).","author":"Henry Rzepa","date":"August 20, 2015","format":false,"excerpt":"Management of research (data) outputs is a hot topic in the UK at the moment, although the topic has been rumbling for five years or more. Most research-active higher educational establishments have or are about to publish general guidelines, which predominantly take the form of aspirational targets rather than actionable\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_likes_enabled":false,"authors":[{"term_id":2661,"user_id":1,"is_guest":0,"slug":"admin","display_name":"Henry Rzepa","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","author_category":"1","first_name":"Henry","last_name":"Rzepa","user_url":"https:\/\/orcid.org\/0000-0002-8635-8390","job_title":"","description":"Henry Rzepa is Emeritus Professor of Computational Chemistry at Imperial College 
London."}],"_links":{"self":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/24723","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=24723"}],"version-history":[{"count":30,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/24723\/revisions"}],"predecessor-version":[{"id":24754,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/24723\/revisions\/24754"}],"wp:attachment":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=24723"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=24723"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=24723"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fppma_author&post=24723"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}