{"id":12932,"date":"2014-09-08T16:26:10","date_gmt":"2014-09-08T15:26:10","guid":{"rendered":"http:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=12932"},"modified":"2014-09-17T06:51:45","modified_gmt":"2014-09-17T05:51:45","slug":"one-molecule-one-identifier-viewing-molecular-files-from-a-digital-repository-using-metadata-standards","status":"publish","type":"post","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12932","title":{"rendered":"One molecule, one identifier: Viewing molecular files from a digital repository using metadata standards."},"content":{"rendered":"<div class=\"kcite-section\" kcite-section-id=\"12932\">\n<p>In the beginning (taken here as\u00a0prior to ~1980) libraries held\u00a0five-year printed consolidated indices of molecules, organised by formula or name (Chemical Abstracts). These could occupy about 2 m of shelf space for each five years, alongside an equivalent set of printed volumes from the Beilstein collection. Those of us who needed to track down information about molecules prior to ~1980 spent many an afternoon (or indeed a whole day) in the libraries\u00a0thumbing through these weighty volumes. Fast forward to the present, when (closed) commercial databases such as SciFinder, Reaxys and CCDC offer information online for around 100 million molecules (CAS, for example, indicates it has 89,506,154 today). These have been\u00a0joined by many open databases (<em>e.g.<\/em>\u00a0<a href=\"http:\/\/www.ncbi.nlm.nih.gov\/pccompound\/advanced\" target=\"_blank\">PubChem<\/a>).\u00a0All these\u00a0sources of molecular information have their own way of accessing individual entries, and the wonderful program Jmol (nowadays <a href=\"http:\/\/wiki.jmol.org\/index.php\/Main_Page\" target=\"_blank\">JSmol<\/a>) has several of these custom interfaces programmed in. 
Here I describe some work we have recently done<span id=\"cite_ITEM-12932-0\" name=\"citation\"><a href=\"#ITEM-12932-0\">[1]<\/a><\/span> on how one might generalise access to an individual molecule held in what is now called a <strong><em>digital data repository<\/em><\/strong>.<\/p>\n<p>Such repositories are gradually becoming more common. Unlike most (all?) of the bespoke molecular databases noted above, data repositories are supported by metadata (<a href=\"http:\/\/www.xml-sitemaps.com\" target=\"_blank\">XML<\/a>) resource map standards, developed<span id=\"cite_ITEM-12932-1\" name=\"citation\"><a href=\"#ITEM-12932-1\">[2]<\/a><\/span> to enable<a href=\"http:\/\/search.datacite.org\/ui\" target=\"_blank\">\u00a0rich and open searches<\/a>\u00a0and to aid the discoverability of individual entries (<i>e.g.\u00a0<\/i><a href=\"http:\/\/www.openarchives.org\/ore\/1.0\/datamodel\" target=\"_blank\">OAI-ORE<\/a>). Each dataset\u00a0is characterised by\u00a0a DOI (digital object identifier), just like individual\u00a0articles found in a conventional\u00a0journal. However, there is an issue in\u00a0quoting just a conventional DOI to describe a dataset. The DOI points to what is called the article<em>\u00a0landing page<\/em>\u00a0in the journal, a page which\u00a0by and large is meant to be navigated by a human. To get a flavour for how this works (or more accurately does not work) for data, visit this DOI<span id=\"cite_ITEM-12932-2\" name=\"citation\"><a href=\"#ITEM-12932-2\">[3]<\/a><\/span> for an entry in the CCDC crystal database noted above (and about which I have<a title=\"The Amsterdam Manifesto and crystal structures.\" href=\"http:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=12182\" target=\"_blank\"> previously blogged<\/a>). In essence, a human must complete the requested information before the data can be retrieved. Data, I contend here,\u00a0should not\u00a0need a landing page. 
It can benefit from being passed straight on to <em>e.g.<\/em> a visualising program such as JSmol. A mechanism is therefore needed to encapsulate any bespoke (and potentially changeable) access path to the data by expressing it instead in standard metadata form.<\/p>\n<p>In our first solution to this issue, and the one illustrated here, we used a standard known as 10320\/loc<span id=\"cite_ITEM-12932-1\" name=\"citation\"><a href=\"#ITEM-12932-1\">[2]<\/a><\/span>. A datafile need only be specified by its DOI (or more generically, its handle) to be recovered\u00a0from the data repository; no landing page need be involved (and no human need ponder what to do next with the data).<\/p>\n<ol>\n<li>First, let me reference a molecule (as it happens the one described in the<a title=\"Computationally directed synthesis:  2,3-dimethyl-2-butene + NO(+).\" href=\"http:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=12895\" target=\"_blank\"> preceding post<\/a>), using the normal invocation<span id=\"cite_ITEM-12932-3\" name=\"citation\"><a href=\"#ITEM-12932-3\">[4]<\/a><\/span>. This will take you to a conventional landing page.<\/li>\n<li>The next example is the same dataset, but this time with the landing page replaced by a JavaScript\/JSmol wrapper. This is achieved using a utility which is\u00a0itself packaged up and placed in a repository (shortdoi: <a href=\"http:\/\/doi.org\/vjj\" target=\"_blank\">vjj<\/a>)<span id=\"cite_ITEM-12932-4\" name=\"citation\"><a href=\"#ITEM-12932-4\">[5]<\/a><\/span>, and which is embedded here for you to try out. If you want the technical detail, read about it in our article.<span id=\"cite_ITEM-12932-0\" name=\"citation\"><a href=\"#ITEM-12932-0\">[1]<\/a><\/span><\/li>\n<\/ol>\n<p><iframe loading=\"lazy\" src=\"http:\/\/wl.figshare.com\/articles\/1164282\/embed?show_title=0\" width=\"470\" height=\"770\" frameborder=\"0\"><\/iframe><\/p>\n<p>There is more to come. 
But you will have to wait for part 2!<\/p>\n<h2>References<\/h2>\n    <ol class=\"kcite-bibliography csl-bib-body\"><li id=\"ITEM-12932-0\">M.J. Harvey, N.J. Mason, and H.S. Rzepa, \"Digital Data Repositories in Chemistry and Their Integration with Journals and Electronic Notebooks\", <i>Journal of Chemical Information and Modeling<\/i>, vol. 54, pp. 2627-2635, 2014. <a href=\"https:\/\/doi.org\/10.1021\/ci500302p\">https:\/\/doi.org\/10.1021\/ci500302p<\/a>\n\n<\/li>\n<li id=\"ITEM-12932-2\">A. Jana, I. Omlor, V. Huch, H.S. Rzepa, and D. Scheschkewitz, \"CCDC 967887: Experimental Crystal Structure Determination\", 2014. <a href=\"https:\/\/doi.org\/10.5517\/cc11h55w\">https:\/\/doi.org\/10.5517\/cc11h55w<\/a>\n\n<\/li>\n<li id=\"ITEM-12932-4\">H.S. Rzepa, N. Mason, and M.J. Harvey, \"Retrieval and display of Gaussian log files from a digital repository\", 2014. <a href=\"https:\/\/doi.org\/10.6084\/m9.figshare.1164282\">https:\/\/doi.org\/10.6084\/m9.figshare.1164282<\/a>\n\n<\/li>\n<\/ol>\n\n<\/div> <!-- kcite-section 12932 -->","protected":false},"excerpt":{"rendered":"<p>In the beginning (taken here as\u00a0prior to ~1980) libraries held\u00a0five-year printed consolidated indices of molecules, organised by formula or name (Chemical abstracts). This could occupy about 2m of shelf space for each five years. And an equivalent set of printed volumes from the Beilstein collection. 
Those of us who needed to track down information about [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":5,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[2],"tags":[806,124],"ppma_author":[2661],"class_list":["post-12932","post","type-post","status-publish","format-standard","hentry","category-chemical-it","tag-digital-object-identifier","tag-xml"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>One molecule, one identifier: Viewing molecular files from a digital repository using metadata standards. - Henry Rzepa&#039;s Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12932\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"One molecule, one identifier: Viewing molecular files from a digital repository using metadata standards. 
- Henry Rzepa&#039;s Blog\" \/>\n<meta property=\"og:description\" content=\"In the beginning (taken here as\u00a0prior to ~1980) libraries held\u00a0five-year printed consolidated indices of molecules, organised by formula or name (Chemical abstracts). This could occupy about 2m of shelf space for each five years. And an equivalent set of printed volumes from the Beilstein collection. Those of us who needed to track down information about [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12932\" \/>\n<meta property=\"og:site_name\" content=\"Henry Rzepa&#039;s Blog\" \/>\n<meta property=\"article:published_time\" content=\"2014-09-08T15:26:10+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2014-09-17T05:51:45+00:00\" \/>\n<meta name=\"author\" content=\"Henry Rzepa\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Henry Rzepa\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"One molecule, one identifier: Viewing molecular files from a digital repository using metadata standards. - Henry Rzepa&#039;s Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12932","og_locale":"en_GB","og_type":"article","og_title":"One molecule, one identifier: Viewing molecular files from a digital repository using metadata standards. - Henry Rzepa&#039;s Blog","og_description":"In the beginning (taken here as\u00a0prior to ~1980) libraries held\u00a0five-year printed consolidated indices of molecules, organised by formula or name (Chemical abstracts). 
This could occupy about 2m of shelf space for each five years. And an equivalent set of printed volumes from the Beilstein collection. Those of us who needed to track down information about [&hellip;]","og_url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12932","og_site_name":"Henry Rzepa&#039;s Blog","article_published_time":"2014-09-08T15:26:10+00:00","article_modified_time":"2014-09-17T05:51:45+00:00","author":"Henry Rzepa","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Henry Rzepa","Estimated reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12932#article","isPartOf":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12932"},"author":{"name":"Henry Rzepa","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281"},"headline":"One molecule, one identifier: Viewing molecular files from a digital repository using metadata standards.","datePublished":"2014-09-08T15:26:10+00:00","dateModified":"2014-09-17T05:51:45+00:00","mainEntityOfPage":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12932"},"wordCount":592,"commentCount":0,"keywords":["Digital Object Identifier","XML"],"articleSection":["Chemical IT"],"inLanguage":"en-GB","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12932#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12932","url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12932","name":"One molecule, one identifier: Viewing molecular files from a digital repository using metadata standards. 
- Henry Rzepa&#039;s Blog","isPartOf":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#website"},"datePublished":"2014-09-08T15:26:10+00:00","dateModified":"2014-09-17T05:51:45+00:00","author":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281"},"breadcrumb":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12932#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12932"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12932#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog"},{"@type":"ListItem","position":2,"name":"One molecule, one identifier: Viewing molecular files from a digital repository using metadata standards."}]},{"@type":"WebSite","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#website","url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/","name":"Henry Rzepa&#039;s Blog","description":"Chemistry with a twist","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":"Person","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281","name":"Henry Rzepa","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g370be3a7397865e4fd161aefeb0a5a85","url":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","caption":"Henry 
Rzepa"},"description":"Henry Rzepa is Emeritus Professor of Computational Chemistry at Imperial College London.","sameAs":["https:\/\/orcid.org\/0000-0002-8635-8390"],"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?author=1"}]}},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pDef7-3mA","jetpack-related-posts":[],"jetpack_likes_enabled":false,"authors":[{"term_id":2661,"user_id":1,"is_guest":0,"slug":"admin","display_name":"Henry Rzepa","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/12932","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=12932"}],"version-history":[{"count":25,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/12932\/revisions"}],"predecessor-version":[{"id":12966,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/12932\/revisions\/12966"}],"wp:attachment":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=12932"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=12932"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.
php?rest_route=%2Fwp%2Fv2%2Ftags&post=12932"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fppma_author&post=12932"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}