{"id":12526,"date":"2014-05-17T16:06:36","date_gmt":"2014-05-17T15:06:36","guid":{"rendered":"http:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=12526"},"modified":"2014-05-17T19:26:02","modified_gmt":"2014-05-17T18:26:02","slug":"a-newcomer-in-the-game-of-how-we-find-and-use-data","status":"publish","type":"post","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526","title":{"rendered":"A newcomer in the game of how we find and use data."},"content":{"rendered":"<div class=\"kcite-section\" kcite-section-id=\"12526\">\n<p>I remember a time when tracking down a particular property of a specified molecule was an all-day effort, spent in the central library (or further afield). Then came the likes of STN Online (~1980) and later Beilstein. But only if your institution had a subscription. Let me then cut to the chase: consider this URL:\u00a0<a href=\"http:\/\/search.datacite.org\/ui?q=InChIKey%3DLQPOSWKBQVCBKS-PGMHMLKASA-N\" target=\"_blank\">http:\/\/search.datacite.org\/ui?q=InChIKey%3DLQPOSWKBQVCBKS-PGMHMLKASA-N<\/a>. The site is DataCite, which collects metadata about cited data! Most of that data is open in the sense that it can be retrieved without a subscription (but see here that it is not always <a title=\"The Amsterdam Manifesto and crystal structures.\" href=\"http:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=12182\" target=\"_blank\">easy to do so<\/a>). So, the above is a search for cited data which contains the InChIKey\u00a0<strong>LQPOSWKBQVCBKS-PGMHMLKASA-N<\/strong>. 
This produces the result:<br \/>\n<a href=\"http:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2014\/05\/datacite1.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-12530\" src=\"http:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2014\/05\/datacite1.jpg\" alt=\"datacite1\" width=\"440\" srcset=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2014\/05\/datacite1.jpg 778w, https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2014\/05\/datacite1-300x264.jpg 300w\" sizes=\"(max-width: 778px) 100vw, 778px\" \/><\/a><br \/>\nThis tells you who published the data (though oddly, the date is given only to the nearest year; it is beta software, after all). The advanced equivalent of this search looks like this:<\/p>\n<p><a href=\"http:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2014\/05\/datacite2.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-12529\" src=\"http:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2014\/05\/datacite2.jpg\" alt=\"datacite2\" width=\"440\" srcset=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2014\/05\/datacite2.jpg 546w, https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2014\/05\/datacite2-300x296.jpg 300w\" sizes=\"(max-width: 546px) 100vw, 546px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>where the subject of the search is now the InChIKey. If you are familiar with the various molecular search engines, you will appreciate that this generic data search is still fairly primitive. 
But SEO (search engine optimisation), achieved by improving the quality of the metadata, would help improve that experience.<\/p>\n<p>The important thing about DataCite is that it searches only the metacontent of digital repositories, where one may expect to find properly curated data, and in particular not merely highly processed data but also the original (instrumental or computational) datafile from which the metadata was abstracted. Rather than a visual graph, one might expect to also find the original data (to however many decimal places). Rather than just molecular coordinates, one might also find a full wavefunction describing the electron density distribution, or a full spectral\u00a0analysis, in the original form as deposited by researchers and not in a processed form as supplied by an &#8220;added value&#8221;\u00a0resource. Don&#8217;t get me wrong; validated data is wonderful, but validation has to be done according to a schema, and such schemas change, improve and evolve over time.<\/p>\n<p>The other important point which I think the above introduces is the concept that DataCite (and similar organisations) might act as a portal, through which software agents might validate\/aggregate data. In a utopian world,\u00a0every organisation that produces data would capture it in a form that DataCite and others can find. Unless of course the data is in itself also their business model, and they wish to exert a monopoly over it. One might appreciate monopolies if the alternative is not having access to the data at all, but\u00a0perhaps at the expense of innovation? I cannot help but feel that once data citation as shown above becomes a generally accepted best practice amongst scientists, then entirely new ways of adding value to it will emerge in abundance. 
It would be interesting to see whether the current more monopolistic models survive this transition by upping their own game.<\/p>\n<!-- kcite active, but no citations found -->\n<\/div> <!-- kcite-section 12526 -->","protected":false},"excerpt":{"rendered":"<p>I remember a time when tracking down a particular property of a specified molecule was an all day effort, spent in the central library (or further afield). Then came the likes of STN Online (~1980) and later Beilstein. But only if your institution had a subscription. Let me then cut to the chase: consider this [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":5,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[2],"tags":[1219,1221,1218,1222,1217,1220,1216],"ppma_author":[2661],"class_list":["post-12526","post","type-post","status-publish","format-standard","hentry","category-chemical-it","tag-beta-software","tag-generic-data-search","tag-molecular-search-engines","tag-search-engine","tag-search-engine-optimisation","tag-search-looks","tag-software-agents"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>A newcomer in the game of how we find and use data. 
- Henry Rzepa&#039;s Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"A newcomer in the game of how we find and use data. - Henry Rzepa&#039;s Blog\" \/>\n<meta property=\"og:description\" content=\"I remember a time when tracking down a particular property of a specified molecule was an all day effort, spent in the central library (or further afield). Then came the likes of STN Online (~1980) and later Beilstein. But only if your institution had a subscription. Let me then cut to the chase: consider this [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526\" \/>\n<meta property=\"og:site_name\" content=\"Henry Rzepa&#039;s Blog\" \/>\n<meta property=\"article:published_time\" content=\"2014-05-17T15:06:36+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2014-05-17T18:26:02+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2014\/05\/datacite1.jpg\" \/>\n<meta name=\"author\" content=\"Henry Rzepa\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Henry Rzepa\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"A newcomer in the game of how we find and use data. 
- Henry Rzepa&#039;s Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526","og_locale":"en_GB","og_type":"article","og_title":"A newcomer in the game of how we find and use data. - Henry Rzepa&#039;s Blog","og_description":"I remember a time when tracking down a particular property of a specified molecule was an all day effort, spent in the central library (or further afield). Then came the likes of STN Online (~1980) and later Beilstein. But only if your institution had a subscription. Let me then cut to the chase: consider this [&hellip;]","og_url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526","og_site_name":"Henry Rzepa&#039;s Blog","article_published_time":"2014-05-17T15:06:36+00:00","article_modified_time":"2014-05-17T18:26:02+00:00","og_image":[{"url":"http:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2014\/05\/datacite1.jpg","type":"","width":"","height":""}],"author":"Henry Rzepa","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Henry Rzepa","Estimated reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526#article","isPartOf":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526"},"author":{"name":"Henry Rzepa","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281"},"headline":"A newcomer in the game of how we find and use 
data.","datePublished":"2014-05-17T15:06:36+00:00","dateModified":"2014-05-17T18:26:02+00:00","mainEntityOfPage":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526"},"wordCount":508,"commentCount":0,"image":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526#primaryimage"},"thumbnailUrl":"http:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2014\/05\/datacite1.jpg","keywords":["beta software","generic data search","molecular search engines","search engine","search engine optimisation","search looks","software agents"],"articleSection":["Chemical IT"],"inLanguage":"en-GB","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526","url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526","name":"A newcomer in the game of how we find and use data. - Henry Rzepa&#039;s Blog","isPartOf":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526#primaryimage"},"image":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526#primaryimage"},"thumbnailUrl":"http:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2014\/05\/datacite1.jpg","datePublished":"2014-05-17T15:06:36+00:00","dateModified":"2014-05-17T18:26:02+00:00","author":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281"},"breadcrumb":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526"]}]},{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526#primaryimage","url":"http:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2014\/05\/datacite1.jpg","contentUrl":"http:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2014\/05\/datacite
1.jpg"},{"@type":"BreadcrumbList","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog"},{"@type":"ListItem","position":2,"name":"A newcomer in the game of how we find and use data."}]},{"@type":"WebSite","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#website","url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/","name":"Henry Rzepa&#039;s Blog","description":"Chemistry with a twist","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":"Person","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281","name":"Henry Rzepa","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g370be3a7397865e4fd161aefeb0a5a85","url":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","caption":"Henry Rzepa"},"description":"Henry Rzepa is Emeritus Professor of Computational Chemistry at Imperial College London.","sameAs":["https:\/\/orcid.org\/0000-0002-8635-8390"],"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?author=1"}]}},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pDef7-3g2","jetpack-related-posts":[{"id":22059,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=22059","url_meta":{"origin":12526,"position":0},"title":"A cascading tutorial in finding rich NMR data 
using the Datacite datasearch engine.","author":"Henry Rzepa","date":"April 11, 2020","format":false,"excerpt":"In the previous post, I introduced three of a new generation of search engines specialising in the discovery of data. Data has some special features which make its properties slightly different from the conceptual (or natural language) searches we are used to performing for general information and so a search\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":24314,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314","url_meta":{"origin":12526,"position":1},"title":"A comparison of searches based on metadata records from three (update: five) research repositories.","author":"Henry Rzepa","date":"September 28, 2021","format":false,"excerpt":"In the previous blog post, I looked at the metadata records registered with DataCite for some chemical computational modelling files as published in three different repositories. Here I take it one stage further, by looking at how searches of the DataCite metadata store for three particular values of the metadata\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":19892,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=19892","url_meta":{"origin":12526,"position":2},"title":"Harnessing FAIR data:  A suggested useful persistent identifier  (PID) for quantum chemical calculations.","author":"Henry Rzepa","date":"August 7, 2018","format":false,"excerpt":"Harnessing FAIR data is an event being held in London on September 3rd; no doubt all the speakers will espouse its virtues and speculate about how to realize its potential.\u2665 Admirable aspirations indeed. 
Capturing hearts and minds also needs lots of real life applications! Whilst assembling a forthcoming post on\u2026","rel":"","context":"In &quot;Interesting chemistry&quot;","block_context":{"text":"Interesting chemistry","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=4"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":18344,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18344","url_meta":{"origin":12526,"position":3},"title":"How to search data repositories for FAIR chemical content and data: SubjectScheme","author":"Henry Rzepa","date":"June 8, 2017","format":false,"excerpt":"As data repositories start to flourish, it is reasonable to ask questions such as what sort of chemistry can be found there and how can I find it? Here I give an updated worked example of a digital repository search for chemical content and also pose an important issue for\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2017\/06\/171-1024x196.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":10679,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=10679","url_meta":{"origin":12526,"position":4},"title":"What can chemistry learn from photos?","author":"Henry Rzepa","date":"June 2, 2013","format":false,"excerpt":"A few years ago, we published an article which drew a formal analogy between chemistry and iTunes\u00a0(sic). 
iTunes was the first really large commercial digital music library, and a feature under-the-skin was the use of meta-data to aid discoverability of any of the 10 million (26M in 2013) or so\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/downlode.org\/Etext\/MCF\/hotsauce.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":16251,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=16251","url_meta":{"origin":12526,"position":5},"title":"Metametadata: data about data about (chemical) data.","author":"Henry Rzepa","date":"April 16, 2016","format":false,"excerpt":"Scientists are familiar with the term data, at least in a scientific or chemical context, but appreciating metadata (meaning \"after\", or \"beyond\") is slightly more subtle, in the sense of using it to mean data about data. The challenge lies in clarifying where the boundary between data and its metadata\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_likes_enabled":false,"authors":[{"term_id":2661,"user_id":1,"is_guest":0,"slug":"admin","display_name":"Henry 
Rzepa","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/12526","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=12526"}],"version-history":[{"count":5,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/12526\/revisions"}],"predecessor-version":[{"id":12534,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/12526\/revisions\/12534"}],"wp:attachment":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=12526"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=12526"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=12526"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fppma_author&post=12526"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}