{"id":24314,"date":"2021-09-28T17:34:47","date_gmt":"2021-09-28T16:34:47","guid":{"rendered":"https:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=24314"},"modified":"2021-10-05T08:34:04","modified_gmt":"2021-10-05T07:34:04","slug":"a-comparison-of-searches-based-on-metadata-records-from-three-research-repositories","status":"publish","type":"post","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314","title":{"rendered":"A comparison of searches based on metadata records from three (update: five) research repositories."},"content":{"rendered":"<div class=\"kcite-section\" kcite-section-id=\"24314\">\n<p>In <a href=\"https:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=24286\" target=\"_blank\" rel=\"noopener\">the previous blog post<\/a>, I looked at the metadata records registered with DataCite for some chemical computational modelling files as published in three different repositories. Here I take it one stage further, by looking at how searches of the DataCite metadata store for three particular values of the metadata associated with this dataset compare.<\/p>\n<p><strong>Search 1:<\/strong> The metadata value of <strong>-1705.490787<\/strong> is actually the\u00a0Gibbs Free energy computed for the molecule associated with the data set, a molecule which featured in this <a href=\"https:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=24019\">blog post <\/a>.\u00a0<a href=\"https:\/\/commons.datacite.org\/?query=*\\-170*\">https:\/\/commons.datacite.org\/?query=*\\-170*<\/a> is an un-fielded search for the truncated string <strong>-170*<\/strong> (where <strong>*<\/strong> is a wild card character and <strong>\\<\/strong> is said to &#8220;escape&#8221; the minus sign, since on its own a minus can also indicate a Boolean NOT operator), resulting in\u00a0<strong>70,918<\/strong> works matching the query. From what we know about the dataset in question, this is a vast number of false positives. 
How can we reduce them?<\/p>\n<p><strong>Search 1a:<\/strong> <a href=\"https:\/\/commons.datacite.org\/?query=subjects.subject:\\-170*\">https:\/\/commons.datacite.org\/?query=subjects.subject:\\-170*<\/a>\u00a0is a fielded search, specifying that the string <strong>must<\/strong> occur in the subject field (62 works), but this still has 57 false positives.<\/p>\n<p><strong>Search 1b<\/strong>: <a href=\"https:\/\/commons.datacite.org\/?query=subjects.subject:\\-1705.490787*\">https:\/\/commons.datacite.org\/?query=subjects.subject:\\-1705.490787*<\/a> (in fact, a precision of <a href=\"https:\/\/commons.datacite.org\/?query=subjects.subject:\\-1705.4*\">-1705.4*<\/a>\u00a0is also sufficient) removes all the false positives (5 works). But are there any false negatives? In fact, for other reasons, we know that there are two works in the Figshare repository where the value of <strong>-1705.490787\u00a0<\/strong>appears in the keyword items on the landing page of e.g. <a href=\"https:\/\/doi.org\/10.6084\/m9.figshare.16685497\">10.6084\/m9.figshare.16685497<\/a> and is indexed and searchable locally, but does not appear in the registered metadata and hence is not included in the results of the above searches.<\/p>\n<p><a href=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2021\/09\/Screenshot-913.jpg\"><img decoding=\"async\" class=\"aligncenter size-medium wp-image-24329\" src=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2021\/09\/Screenshot-913.jpg\" alt=\"\" width=\"540\" srcset=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2021\/09\/Screenshot-913.jpg 2200w, https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2021\/09\/Screenshot-913-300x238.jpg 300w, https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2021\/09\/Screenshot-913-1024x814.jpg 1024w, https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2021\/09\/Screenshot-913-768x610.jpg 768w, 
https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2021\/09\/Screenshot-913-1536x1220.jpg 1536w, https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2021\/09\/Screenshot-913-2048x1627.jpg 2048w\" sizes=\"(max-width: 2200px) 100vw, 2200px\" \/><\/a><\/p>\n<p><strong>Search 2<\/strong>: A further, formally much stronger constraint on the search is\u00a0<a href=\"https:\/\/commons.datacite.org\/?query=subjects.subjectScheme:Gibbs_Energy+AND+subjects.subject:\\-1705.490787*\">https:\/\/commons.datacite.org\/?query=subjects.subjectScheme:Gibbs_Energy+AND+subjects.subject:\\-1705.490787*<\/a> whereby a <em>subjectScheme<\/em> is added to search <strong>1b<\/strong>, constrained to the value <strong>Gibbs_Energy<\/strong>. This now returns\u00a03 works, two fewer than search <strong>1b<\/strong>. There are two further false negatives because, as <a href=\"https:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=24286\">noted previously<\/a>, the <em>subjectScheme<\/em> term is not defined in the Zenodo repository metadata record, where the missing two items are located.\u00a0<\/p>\n<p><strong>Search 2a<\/strong>: <a href=\"https:\/\/commons.datacite.org\/?query=subjects.subjectScheme:Gibbs_Energy+AND+subjects.subject:*1705.490787*+AND+subjects.schemeUri:*goldbook*\">https:\/\/commons.datacite.org\/?query=subjects.subjectScheme:Gibbs_Energy+AND+subjects.subject:*1705.490787*+AND+subjects.schemeUri:*goldbook*<\/a> is even further constrained to specify a <strong>Gibbs_Energy<\/strong> according to the IUPAC Gold book definition.<\/p>\n<p><strong>Search 2b<\/strong>: <a href=\"https:\/\/commons.datacite.org\/?query=subjects.subjectScheme:Gibbs_Energy+AND+subjects.subject:*1705.490787*+AND+subjects.schemeUri:*goldbook*+AND+subjects.valueUri:*gaussian*\">https:\/\/commons.datacite.org\/?query=subjects.subjectScheme:Gibbs_Energy+AND+subjects.subject:*1705.490787*+AND+subjects.schemeUri:*goldbook*+AND+subjects.valueUri:*gaussian*<\/a> is the highest level 
of constraint, implying not only that the term <strong>Gibbs_Energy<\/strong> is specified by the IUPAC Gold book definition, but also that its value is that determined by (in this example) the Gaussian implementation.\u00a0<\/p>\n<p>So to summarise what we have thus far established, we can successfully eliminate false positives by specifying a fielded search with a requirement that the field specifically relates to <strong>Gibbs_Energy<\/strong>. But because of omissions in the metadata records, we also have four false negatives resulting from doing this.<\/p>\n<p><strong>Search 3<\/strong>:\u00a0<a href=\"https:\/\/commons.datacite.org\/?query=subjects.subject:VELNVPXNOKVVTC-VJKZSTDTSA-N\">https:\/\/commons.datacite.org\/?query=subjects.subject:VELNVPXNOKVVTC-VJKZSTDTSA-N<\/a> searches for another subject term, the <strong>InChI key<\/strong> for the molecule relating to the data (5 works). Here again, however, context for the string <strong>VELNVPXNOKVVTC-VJKZSTDTSA-N<\/strong> is missing, although the string is again long enough to ensure it is unique. But we could go one step further.<\/p>\n<p><strong>Search 4<\/strong>: <a href=\"https:\/\/commons.datacite.org\/?query=subjects.subjectScheme:inchikey+AND+subjects.subject:VELNVPXNOKVVTC-VJKZSTDTSA-N\">https:\/\/commons.datacite.org\/?query=subjects.subjectScheme:inchikey+AND+subjects.subject:VELNVPXNOKVVTC-VJKZSTDTSA-N<\/a> constrains the subject term to only those strings describing an InChIkey (3 works). 
This shortfall is again due to Zenodo not specifying the subjectScheme and Figshare not containing the InChIkey at all in its metadata record.<\/p>\n<p><strong>Search 4a<\/strong>: <a href=\"https:\/\/commons.datacite.org\/?query=subjects.subjectScheme:inchikey+AND+subjects.schemeUri:*inchi-trust*+AND+subjects.subject:VELNVPXNOKVVTC-VJKZSTDTSA-N\">https:\/\/commons.datacite.org\/?query=subjects.subjectScheme:inchikey+AND+subjects.schemeUri:*inchi-trust*+AND+subjects.subject:VELNVPXNOKVVTC-VJKZSTDTSA-N<\/a> constrains the inchikey further by specifying the authority for the scheme definition as the InChI Trust.<sup>\u2021<\/sup>\u00a0<\/p>\n<p><strong>Search 5<\/strong>:\u00a0<a href=\"https:\/\/commons.datacite.org\/?query=subjects.subject:InChI=*1S\/C25H39NO9*\" target=\"_blank\" rel=\"noopener\">https:\/\/commons.datacite.org\/?query=subjects.subject:InChI=1S\/C25H39NO9*<\/a> repeats search 3, but on the InChI string rather than the InChI key, and with the same results as before (5 works). Here, the string is deliberately truncated so that it matches on only the molecular formula of the molecule.<\/p>\n<p><strong>Search 5a<\/strong>: <a href=\"https:\/\/commons.datacite.org\/?query=subjects.subjectScheme:inchi+AND+subjects.subject:InChI=*1S\/C25H39NO9*\">https:\/\/commons.datacite.org\/?query=subjects.subjectScheme:inchi+AND+subjects.subject:InChI=1S\/C25H39NO9*<\/a> repeats search 4, with the subjectScheme changed to inchi and the subject truncated to only the molecular formula component of an InChI (3 works).\u00a0<\/p>\n<p><strong>Search 5b<\/strong>: <a href=\"https:\/\/commons.datacite.org\/?query=subjects.subject:InChI=*1S\/C25H39NO9\/c1-6-26-20-24-13-9-12-14\\(31-2\\)10-23\\(29,16\\(13\\)17\\(12\\)33-4\\)25\\(26,30*\">https:\/\/commons.datacite.org\/?query=subjects.subject:InChI=1S\/C25H39NO9\/c1-6-26-20-24-13-9-12-14\\(31-2\\)10-23\\(29,16\\(13\\)17\\(12\\)33-4\\)25\\(26,30*<\/a> truncates much less of the InChI string, extending it to the molecular connection table. 
Notice how characters such as <strong>(<\/strong> or <strong>)<\/strong> have been escaped with a <strong>\\<\/strong> prefix. Such characters are used for grouping in the search query and so must be escaped to be treated as literal characters in the query.<\/p>\n<p><strong>Search 5c<\/strong>: <a href=\"https:\/\/commons.datacite.org\/?query=subjects.subject:InChI=*1S\/C25H39NO9\/c1-6-26-20-24-13-9-12-14\\(31-2\\)10-23\\(29,16\\(13\\)17\\(12\\)33-4\\)25\\(26,30\\)19\\(34-5\\)18\\(24\\)22\\(11-27,21\\(28\\)35-20\\)8-7-15\\(24\\)32-3\\\/h12-20,27,29-30H,6-11H2,1-5H3*\">https:\/\/commons.datacite.org\/?query=subjects.subject:InChI=1S\/C25H39NO9\/c1-6-26-20-24-13-9-12-14\\(31-2\\)10-23\\(29,16\\(13\\)17\\(12\\)33-4\\)25\\(26,30\\)19\\(34-5\\)18\\(24\\)22\\(11-27,21\\(28\\)35-20\\)8-7-15\\(24\\)32-3\\\/h12-20,27,29-30H,6-11H2,1-5H3*<\/a> For a string of this length (and InChI strings can get very long!), an unidentified error can occur, suggesting that the full InChI string is best not used for such searches.<\/p>\n<p>Search 6:\u00a0<\/p>\n<p>From these experiments, we learn that the quality and completeness\/richness of the metadata record are vital to ensure no false positives or negatives are returned by the search. Ensuring such metadata richness is something that a repository should do, and it is interesting that two of the best-known repositories both currently have failings in this regard. 
I might try one or two other popular repositories to see how they behave and will report back if I find anything interesting.<sup>\u2020<\/sup><\/p>\n<hr \/>\n<p><sup>\u2021<\/sup>Thus <a href=\"https:\/\/commons.datacite.org\/doi.org?query=subjects.subjectScheme:*inchikey*\" target=\"_blank\" rel=\"noopener\">https:\/\/commons.datacite.org\/doi.org?query=subjects.subjectScheme:*inchikey*<\/a> reveals all entries that specify an InChIkey in the subject metadata (185,414 works) but <a href=\"https:\/\/commons.datacite.org\/doi.org?query=subjects.subjectScheme:*inchikey*+AND+subjects.schemeUri:*inchi-trust*\" target=\"_blank\" rel=\"noopener\">https:\/\/commons.datacite.org\/doi.org?query=subjects.subjectScheme:*inchikey*+AND+subjects.schemeUri:*inchi-trust*<\/a> reveals that only 1,748 of these further specify the InChI Trust as the authority.<\/p>\n<p><sup>\u2020<\/sup>Two more repositories, <b>Mendeley Data<\/b> and <b>Harvard Dataverse<\/b>, have been populated with the same data. See <a href=\"https:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=24286\">here<\/a>.<\/p>\n<hr \/>\n<p>This post has DOI: <a href=\"https:\/\/doi.org\/10.14469\/hpc\/9162\">10.14469\/hpc\/9162<\/a><\/p>\n<!-- kcite active, but no citations found -->\n<\/div> <!-- kcite-section 24314 -->","protected":false},"excerpt":{"rendered":"<p>In the previous blog post, I looked at the metadata records registered with DataCite for some chemical computational modelling files as published in three different repositories. Here I take it one stage further, by looking at how searches of the DataCite metadata store for three particular values of the metadata associated with this dataset compare. 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":5,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[2],"tags":[],"ppma_author":[2661],"class_list":["post-24314","post","type-post","status-publish","format-standard","hentry","category-chemical-it"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>A comparison of searches based on metadata records from three (update: five) research repositories. - Henry Rzepa&#039;s Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"A comparison of searches based on metadata records from three (update: five) research repositories. 
- Henry Rzepa&#039;s Blog\" \/>\n<meta property=\"og:description\" content=\"In the previous blog post, I looked at the metadata records registered with DataCite for some chemical computational modelling files as published in three different repositories. Here I take it one stage further, by looking at how searches of the DataCite metadata store for three particular values of the metadata associated with this dataset compare. [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314\" \/>\n<meta property=\"og:site_name\" content=\"Henry Rzepa&#039;s Blog\" \/>\n<meta property=\"article:published_time\" content=\"2021-09-28T16:34:47+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-10-05T07:34:04+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2021\/09\/Screenshot-913.jpg\" \/>\n<meta name=\"author\" content=\"Henry Rzepa\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Henry Rzepa\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"A comparison of searches based on metadata records from three (update: five) research repositories. - Henry Rzepa&#039;s Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314","og_locale":"en_GB","og_type":"article","og_title":"A comparison of searches based on metadata records from three (update: five) research repositories. 
- Henry Rzepa&#039;s Blog","og_description":"In the previous blog post, I looked at the metadata records registered with DataCite for some chemical computational modelling files as published in three different repositories. Here I take it one stage further, by looking at how searches of the DataCite metadata store for three particular values of the metadata associated with this dataset compare. [&hellip;]","og_url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314","og_site_name":"Henry Rzepa&#039;s Blog","article_published_time":"2021-09-28T16:34:47+00:00","article_modified_time":"2021-10-05T07:34:04+00:00","og_image":[{"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2021\/09\/Screenshot-913.jpg","type":"","width":"","height":""}],"author":"Henry Rzepa","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Henry Rzepa","Estimated reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314#article","isPartOf":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314"},"author":{"name":"Henry Rzepa","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281"},"headline":"A comparison of searches based on metadata records from three (update: five) research repositories.","datePublished":"2021-09-28T16:34:47+00:00","dateModified":"2021-10-05T07:34:04+00:00","mainEntityOfPage":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314"},"wordCount":1020,"commentCount":0,"image":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314#primaryimage"},"thumbnailUrl":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2021\/09\/Screenshot-913.jpg","articleSection":["Chemical 
IT"],"inLanguage":"en-GB","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314","url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314","name":"A comparison of searches based on metadata records from three (update: five) research repositories. - Henry Rzepa&#039;s Blog","isPartOf":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314#primaryimage"},"image":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314#primaryimage"},"thumbnailUrl":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2021\/09\/Screenshot-913.jpg","datePublished":"2021-09-28T16:34:47+00:00","dateModified":"2021-10-05T07:34:04+00:00","author":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281"},"breadcrumb":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314"]}]},{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314#primaryimage","url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2021\/09\/Screenshot-913.jpg","contentUrl":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2021\/09\/Screenshot-913.jpg"},{"@type":"BreadcrumbList","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24314#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog"},{"@type":"ListItem","position":2,"name":"A comparison of searches based on metadata records from three (update: five) research repositories."}]},{"@type":"WebSite","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#website","url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/","name":"Henry 
Rzepa&#039;s Blog","description":"Chemistry with a twist","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":"Person","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281","name":"Henry Rzepa","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g370be3a7397865e4fd161aefeb0a5a85","url":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","caption":"Henry Rzepa"},"description":"Henry Rzepa is Emeritus Professor of Computational Chemistry at Imperial College London.","sameAs":["https:\/\/orcid.org\/0000-0002-8635-8390"],"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?author=1"}]}},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pDef7-6ka","jetpack-related-posts":[{"id":18344,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18344","url_meta":{"origin":24314,"position":0},"title":"How to search data repositories for FAIR chemical content and data: SubjectScheme","author":"Henry Rzepa","date":"June 8, 2017","format":false,"excerpt":"As data repositories start to flourish, it is reasonable to ask questions such as what sort of chemistry can be found there and how can I find it? 
Here I give an updated worked example of a digital repository search for chemical content and also pose an important issue for\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2017\/06\/171-1024x196.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":24286,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24286","url_meta":{"origin":24314,"position":1},"title":"A comparison of descriptive metadata across different data repositories.","author":"Henry Rzepa","date":"September 28, 2021","format":false,"excerpt":"The number of repositories which accept research data across a wide spectrum of disciplines is on the up. Here I report the results of conducting an experiment in which chemical modelling data was deposited in six such repositories and comparing the richness of the metadata describing the essential properties of\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2021\/09\/Screenshot-909-300x243.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":12526,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=12526","url_meta":{"origin":24314,"position":2},"title":"A newcomer in the game of how we find and use data.","author":"Henry Rzepa","date":"May 17, 2014","format":false,"excerpt":"I remember a time when tracking down a particular property of a specified molecule was an all day effort, spent in the central library (or further afield). Then came the likes of STN Online (~1980) and later Beilstein. But only if your institution had a subscription. 
Let me then cut\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":21080,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=21080","url_meta":{"origin":24314,"position":3},"title":"Metadata. Why?","author":"Henry Rzepa","date":"July 2, 2019","format":false,"excerpt":"I have had some interesting discussions recently regarding metadata. What emerges is that it can be quite a broadly defined concept and it is clear that a variety of answers might be obtained when asking the simple question \"what is it useful for?\" Here I set out some of my\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":20669,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=20669","url_meta":{"origin":24314,"position":4},"title":"A search of some major chemistry publishers for FAIR data records.","author":"Henry Rzepa","date":"April 12, 2019","format":false,"excerpt":"In recent years, findable data has become ever more important (the F in FAIR). Here I test that F using the DataCite search service. Firstly an introduction to this service. This is a metadata database about datasets and other research objects. 
One of the properties is\u00a0relatedIdentifier which records other identifiers\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":19892,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=19892","url_meta":{"origin":24314,"position":5},"title":"Harnessing FAIR data:  A suggested useful persistent identifier  (PID) for quantum chemical calculations.","author":"Henry Rzepa","date":"August 7, 2018","format":false,"excerpt":"Harnessing FAIR data is an event being held in London on September 3rd; no doubt all the speakers will espouse its virtues and speculate about how to realize its potential.\u2665 Admirable aspirations indeed. Capturing hearts and minds also needs lots of real life applications! Whilst assembling a forthcoming post on\u2026","rel":"","context":"In &quot;Interesting chemistry&quot;","block_context":{"text":"Interesting chemistry","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=4"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_likes_enabled":false,"authors":[{"term_id":2661,"user_id":1,"is_guest":0,"slug":"admin","display_name":"Henry 
Rzepa","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/24314","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=24314"}],"version-history":[{"count":24,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/24314\/revisions"}],"predecessor-version":[{"id":24346,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/24314\/revisions\/24346"}],"wp:attachment":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=24314"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=24314"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=24314"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fppma_author&post=24314"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}