{"id":28045,"date":"2024-11-25T09:45:43","date_gmt":"2024-11-25T09:45:43","guid":{"rendered":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=28045"},"modified":"2024-12-05T12:08:45","modified_gmt":"2024-12-05T12:08:45","slug":"data-discovery-a-pick-n-mix-library-of-useful-fair-data-searches-and-a-call-for-new-search-suggestions","status":"publish","type":"post","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=28045","title":{"rendered":"Data Discovery: A pick-n-mix library of useful FAIR Data searches &#8211; and a call for new search suggestions."},"content":{"rendered":"<div class=\"kcite-section\" kcite-section-id=\"28045\">\n<p>With AI and machine learning needing data in abundance, interest in data discovery is intense. However, this type of discovery differs from more traditional database searches in that it is suited to discovery by machines as well as by humans. The discovery searches are conducted using an aggregated and federated metadata store, such as that curated by DataCite. Constructing a suitable search is, however, still not entirely human-friendly. The starting point for understanding how to search\u00a0is this resource: <a href=\"https:\/\/support.datacite.org\/docs\/datacite-xml-to-json-mapping\" target=\"_blank\" rel=\"noopener\">XML to JSON<\/a> mappings and the XML referred to can be found here. <span id=\"cite_ITEM-28045-0\" name=\"citation\"><a href=\"#ITEM-28045-0\">[1]<\/a><\/span> Since the learning curve for constructing such data searches can be quite steep, I thought I would share, as a library, some recent searches\u00a0I constructed for a talk I am giving. This post is essentially an extension and update of an earlier challenge along these lines that I was set, which appeared here.<span id=\"cite_ITEM-28045-1\" name=\"citation\"><a href=\"#ITEM-28045-1\">[2]<\/a><\/span><\/p>\n<p>You can see that the searches are built from components joined by Boolean operators, which are written in the query strings as +AND+,\u00a0+OR+ or +NOT+. 
As with a Lego construction set, you can create your own searches by combining these components to suit your needs.\u00a0No doubt some AI-based procedure will come along that will convert natural language expressions of the intended search into the JSON-friendly strings you see below &#8211;\u00a0at least that is the hope.<\/p>\n<h2>Part 1: Data discovery based on general properties such as the reporting Institution, the publisher or the Researcher<\/h2>\n<ol>\n<li>Find all Data-related Works associated with <b>Cambridge University<\/b> and the <b>American Chemical Society<\/b> Publisher\n<ul>\n<li><a href=\"https:\/\/commons.datacite.org\/doi.org?query=((contributors.affiliation.affiliationIdentifier:*013meh722)+AND+(contributors.affiliation.affiliationIdentifierScheme:ROR))+OR+((creators.affiliation.affiliationIdentifier:*013meh722)+AND+(creators.affiliation.affiliationIdentifierScheme:ROR))+AND+relatedIdentifiers.relatedIdentifier:10.1021*\" target=\"DOIs\" rel=\"noopener\">https:\/\/commons.datacite.org\/doi.org?query=((contributors.affiliation.affiliationIdentifier:*013meh722)+AND+(contributors.affiliation.affiliationIdentifierScheme:ROR))+OR+((creators.affiliation.affiliationIdentifier:*013meh722)+AND+(creators.affiliation.affiliationIdentifierScheme:ROR))+AND+relatedIdentifiers.relatedIdentifier:10.1021*<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">232 Works<\/b><\/li>\n<\/ul>\n<\/li>\n<li>Find all Data-related Works associated with <b>Imperial College<\/b> and the <b>American Chemical Society<\/b> Publisher\n<ul>\n<li><a href=\"https:\/\/commons.datacite.org\/doi.org?query=((contributors.affiliation.affiliationIdentifier:*041kmwe10)+AND+(contributors.affiliation.affiliationIdentifierScheme:ROR))+OR+((creators.affiliation.affiliationIdentifier:*041kmwe10)+AND+(creators.affiliation.affiliationIdentifierScheme:ROR))+AND+relatedIdentifiers.relatedIdentifier:10.1021*\" target=\"DOIs\" 
rel=\"noopener\">?query=((contributors.affiliation.affiliationIdentifier:*041kmwe10)+AND+(contributors.affiliation.affiliationIdentifierScheme:ROR))+OR+((creators.affiliation.affiliationIdentifier:*041kmwe10)+AND+(creators.affiliation.affiliationIdentifierScheme:ROR))+AND+relatedIdentifiers.relatedIdentifier:10.1021*<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">304 Works<\/b><\/li>\n<\/ul>\n<\/li>\n<li>Find all Datasets OR Collections associated with <b>Imperial College<\/b> and the <b>American Chemical Society<\/b> Publisher and the term<br \/>\n<b>Pyrazol<\/b> in the Title or Description<\/p>\n<ul>\n<li><a href=\"https:\/\/commons.datacite.org\/doi.org?query=((contributors.affiliation.affiliationIdentifier:*041kmwe10)+AND+(contributors.affiliation.affiliationIdentifierScheme:ROR))+OR+((creators.affiliation.affiliationIdentifier:*041kmwe10)+AND+(creators.affiliation.affiliationIdentifierScheme:ROR))+AND+relatedIdentifiers.relatedIdentifier:10.1021*+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)+AND+((types.resourceTypeGeneral:Dataset)+OR+(types.resourceTypeGeneral:Collection))\" target=\"DOIs\" rel=\"noopener\">?query=((contributors.affiliation.affiliationIdentifier:*041kmwe10)+AND+(contributors.affiliation.affiliationIdentifierScheme:ROR))+OR+((creators.affiliation.affiliationIdentifier:*041kmwe10)+AND+(creators.affiliation.affiliationIdentifierScheme:ROR))+AND+relatedIdentifiers.relatedIdentifier:10.1021*+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)+AND+((types.resourceTypeGeneral:Dataset)+OR+(types.resourceTypeGeneral:Collection))<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">3 Works<\/b><\/li>\n<\/ul>\n<\/li>\n<li>Find all Datasets OR Collections associated with <b>Imperial College<\/b> and the <b>American Chemical Society<\/b> Publisher and the term<br \/>\n<b>Pyrazol<\/b> in the Title or Description and a specified <b>Researcher<\/b><\/p>\n<ul>\n<li><a 
href=\"https:\/\/commons.datacite.org\/doi.org?query=((contributors.affiliation.affiliationIdentifier:*041kmwe10)+AND+(contributors.affiliation.affiliationIdentifierScheme:ROR))+OR+((creators.affiliation.affiliationIdentifier:*041kmwe10)+AND+(creators.affiliation.affiliationIdentifierScheme:ROR))+AND+relatedIdentifiers.relatedIdentifier:10.1021*+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)+AND+((types.resourceTypeGeneral:Dataset)+OR+(types.resourceTypeGeneral:Collection))+AND+((contributors.nameIdentifiers.nameIdentifier:*000-0002-3296-6817)+OR+(creators.nameIdentifiers.nameIdentifier:*000-0002-3296-6817))\" target=\"DOIs\" rel=\"noopener\">?query=((contributors.affiliation.affiliationIdentifier:*041kmwe10)+AND+(contributors.affiliation.affiliationIdentifierScheme:ROR))+OR+((creators.affiliation.affiliationIdentifier:*041kmwe10)+AND+(creators.affiliation.affiliationIdentifierScheme:ROR))+AND+relatedIdentifiers.relatedIdentifier:10.1021*+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)+AND+((types.resourceTypeGeneral:Dataset)+OR+(types.resourceTypeGeneral:Collection))+AND+((contributors.nameIdentifiers.nameIdentifier:*000-0002-3296-6817)+OR+(creators.nameIdentifiers.nameIdentifier:*000-0002-3296-6817))<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">1 Work<\/b><\/li>\n<\/ul>\n<\/li>\n<li>Find Datasets only associated with <b>Imperial College<\/b> and the term <b>Pyrazol<\/b> in the Title or Description\n<ul>\n<li><a href=\"https:\/\/commons.datacite.org\/doi.org?query=((contributors.affiliation.affiliationIdentifier:*041kmwe10)+AND+(contributors.affiliation.affiliationIdentifierScheme:ROR))+OR+((creators.affiliation.affiliationIdentifier:*041kmwe10)+AND+(creators.affiliation.affiliationIdentifierScheme:ROR))+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)+AND+(types.resourceTypeGeneral:Dataset)\" target=\"DOIs\" 
rel=\"noopener\">?query=((contributors.affiliation.affiliationIdentifier:*041kmwe10)+AND+(contributors.affiliation.affiliationIdentifierScheme:ROR))+OR+((creators.affiliation.affiliationIdentifier:*041kmwe10)+AND+(creators.affiliation.affiliationIdentifierScheme:ROR))+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)+AND+types.resourceTypeGeneral:Dataset<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">270 Works<\/b><\/li>\n<\/ul>\n<\/li>\n<li>Find just Datasets associated with a <b>specific researcher<\/b>\n<ul>\n<li><a href=\"https:\/\/commons.datacite.org\/doi.org?query=types.resourceTypeGeneral:Dataset+AND+(contributors.nameIdentifiers.nameIdentifier:*0000-0002-7816-0042+OR+creators.nameIdentifiers.nameIdentifier:*0000-0002-7816-0042)\" target=\"DOIs\" rel=\"noopener\">?query=types.resourceTypeGeneral:Dataset+AND+(contributors.nameIdentifiers.nameIdentifier:*0000-0002-7816-0042+OR+creators.nameIdentifiers.nameIdentifier:*0000-0002-7816-0042)<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">8 Works<\/b><\/li>\n<\/ul>\n<\/li>\n<li>Find Data-related \u00a0Works associated with <b>Cambridge University<\/b>, the <b>SubjectScheme<\/b> FOS (Field of Science) and the Subject term <b>*Chemical*<\/b>\n<ul>\n<li><a href=\"https:\/\/commons.datacite.org\/doi.org?query=(subjects.subjectScheme:*FOS*)+AND+(subjects.subject:*Chemical*)+AND+((creators.affiliation.affiliationIdentifier:*013meh722)+OR+(contributors.affiliation.affiliationIdentifier:*013meh722))\" target=\"DOIs\" rel=\"noopener\">?query=(subjects.subjectScheme:*FOS*)+AND+(subjects.subject:*Chemical*)+AND+((creators.affiliation.affiliationIdentifier:*013meh722)+OR+(contributors.affiliation.affiliationIdentifier:*013meh722))<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">440 Works<\/b><\/li>\n<\/ul>\n<\/li>\n<li>Establish if a <b>specified publication<\/b> with a <b>specified author<\/b> has an associated FAIR <b>Dataset<\/b> or FAIR 
<b>Collection<\/b>:\n<ul>\n<li><a href=\"https:\/\/commons.datacite.org\/doi.org?query=(types.resourceTypeGeneral:Dataset+OR+types.resourceTypeGeneral:Collection)+AND+(contributors.nameIdentifiers.nameIdentifier:*0000-0002-8635-8390+OR+creators.nameIdentifiers.nameIdentifier:*0000-0002-8635-8390)+AND+(relatedIdentifiers.relatedIdentifierType:DOI+AND+relatedIdentifiers.resourceTypeGeneral:JournalArticle+AND+relatedIdentifiers.relatedIdentifier:10.1021\/acs.inorgchem.3c01506) \" target=\"DOIs\" rel=\"noopener\">?query=(types.resourceTypeGeneral:Dataset+OR+types.resourceTypeGeneral:Collection)+AND+(contributors.nameIdentifiers.nameIdentifier:*0000-0002-8635-8390+OR+creators.nameIdentifiers.nameIdentifier:*0000-0002-8635-8390)+AND+(relatedIdentifiers.relatedIdentifierType:DOI+AND+relatedIdentifiers.resourceTypeGeneral:JournalArticle+AND+relatedIdentifiers.relatedIdentifier:10.1021\/acs.inorgchem.3c01506)<br \/>\n<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">1 Work<\/b><\/li>\n<\/ul>\n<\/li>\n<li>Establish how many <b>journal publications<\/b> by a <b>specified author<\/b> have an associated FAIR <b>Dataset<\/b> or FAIR <b>Collection<\/b>:\n<ul>\n<li><a href=\"https:\/\/commons.datacite.org\/doi.org?query=(types.resourceTypeGeneral:Dataset+OR+types.resourceTypeGeneral:Collection)+AND+(contributors.nameIdentifiers.nameIdentifier:*0000-0002-8635-8390+OR+creators.nameIdentifiers.nameIdentifier:*0000-0002-8635-8390)+AND+(relatedIdentifiers.relatedIdentifierType:DOI+AND+relatedIdentifiers.resourceTypeGeneral:JournalArticle+AND+relatedIdentifiers.relatedIdentifier:*) \" target=\"DOIs\" 
rel=\"noopener\">?query=(types.resourceTypeGeneral:Dataset+OR+types.resourceTypeGeneral:Collection)+AND+(contributors.nameIdentifiers.nameIdentifier:*0000-0002-8635-8390+OR+creators.nameIdentifiers.nameIdentifier:*0000-0002-8635-8390)+AND+(relatedIdentifiers.relatedIdentifierType:DOI+AND+relatedIdentifiers.resourceTypeGeneral:JournalArticle+AND+relatedIdentifiers.relatedIdentifier:*)<br \/>\n<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">1 Work<\/b><\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<h2>Part 2: Data discovery based on chemical properties such as NMR, IR or X-ray spectroscopy<\/h2>\n<ol start=\"10\">\n<li>Find all Datasets associated with Chemical structure representation and NMR Media types,<br \/>\n<b>NMR<\/b> as a Subject and the title or description term<br \/>\n&#8220;Pyrazol&#8221;<\/p>\n<ul>\n<li><a href=\"https:\/\/commons.datacite.org\/?query=(media.media_type:chemical\/x-cdxml+OR+media.media_type:chemical\/x-mdl-molfile)+AND+(media.media_type:application\/zip+OR+media.media_type:chemical\/x-mnova)+AND+(subjects.subjectScheme:*NMR*)+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)\" target=\"DOIs\" rel=\"noopener\">?query=(media.media_type:chemical\/x-cdxml+OR+media.media_type:chemical\/x-mdl-molfile)+AND+(media.media_type:application\/zip+OR+media.media_type:chemical\/x-mnova)+AND+(subjects.subjectScheme:*NMR*)+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">150 datasets<\/b><\/li>\n<\/ul>\n<\/li>\n<li>Find all Datasets associated with Chemical structure representation and NMR Media types,<br \/>\n<b>NMR Nuclei<\/b> as a Subject, for <b>13C<\/b> and the title or description term<br \/>\n&#8220;Pyrazol&#8221;<\/p>\n<ul>\n<li><a 
href=\"https:\/\/commons.datacite.org\/?query=(media.media_type:application\/zip+OR+media.media_type:chemical\/x-mnova)+AND+(subjects.subjectScheme:*NMR_Nucleus)+AND+(subjects.subject:13C)+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)\" target=\"DOIs\" rel=\"noopener\">?query=(media.media_type:application\/zip+OR+media.media_type:chemical\/x-mnova)+AND+(subjects.subjectScheme:*NMR_Nucleus)+AND+(subjects.subject:13C)+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">41 datasets<\/b><\/li>\n<\/ul>\n<\/li>\n<li>Find all Datasets associated with Chemical structure representation and NMR Media types,<br \/>\n<b>NMR<\/b> as a Subject, for <b>HMBC<\/b> Experiments and the title or description term<br \/>\n&#8220;Pyrazol&#8221;<\/p>\n<ul>\n<li><a href=\"https:\/\/commons.datacite.org\/?query=(media.media_type:application\/zip+OR+media.media_type:chemical\/x-mnova)+AND+(subjects.subjectScheme:*NMR_Expt)+AND+(subjects.subject:HMBC)+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)\" target=\"DOIs\" rel=\"noopener\">?query=(media.media_type:application\/zip+OR+media.media_type:chemical\/x-mnova)+AND+(subjects.subjectScheme:*NMR_Expt)+AND+(subjects.subject:HMBC)+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">26 datasets<\/b><\/li>\n<\/ul>\n<\/li>\n<li>Find all Datasets associated with Chemical structure representation and NMR Media types,<br \/>\n<b>NMR<\/b> as a Subject, using <b>solvent<\/b> &#8220;CD<sub>3<\/sub>OD&#8221; and the title or description term<br \/>\n&#8220;Pyrazol&#8221;<\/p>\n<ul>\n<li><a 
href=\"https:\/\/commons.datacite.org\/?query=(media.media_type:application\/zip+OR+media.media_type:chemical\/x-mnova)+AND+(subjects.subjectScheme:*NMR_Solvent)+AND+(subjects.subject:*CD3OD)+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)\" target=\"DOIs\">?query=(media.media_type:application\/zip+OR+media.media_type:chemical\/x-mnova)+AND+(subjects.subjectScheme:*NMR_Solvent)+AND+(subjects.subject:*CD3OD)+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">22 datasets<\/b><\/li>\n<\/ul>\n<\/li>\n<li>Find all Datasets associated with NMR Media types,<br \/>\n<b>NMR<\/b> as a Subject and <b>InChIKey<\/b> : OZEYXLXJQKVGCZ-UHFFFAOYSA-L<\/p>\n<ul>\n<li><a href=\"https:\/\/commons.datacite.org\/?query=(media.media_type:application\/zip+OR+media.media_type:chemical\/x-mnova)+AND+(subjects.subjectScheme:*NMR*)+AND+((subjects.subjectScheme:inchikey)+AND+(subjects.subject:OZEYXLXJQKVGCZ-UHFFFAOYSA-L))\" target=\"DOIs\" rel=\"noopener\">?query=(media.media_type:application\/zip+OR+media.media_type:chemical\/x-mnova)+AND+(subjects.subjectScheme:*NMR*)+AND+((subjects.subjectScheme:inchikey)+AND+(subjects.subject:OZEYXLXJQKVGCZ-UHFFFAOYSA-L))<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">5 datasets<\/b><\/li>\n<\/ul>\n<\/li>\n<li>Find all Datasets associated with NMR Media types,<br \/>\n<b>NMR<\/b> as a Subject and the molecular formula component of the full <b>InChI<\/b> : InChI=1S\/2C18H16N2O3.2C2H6O.Ca\/c2*1-23-15-9-7-13 etc<\/p>\n<ul>\n<li><a href=\"https:\/\/commons.datacite.org\/?query=(media.media_type:application\/zip+OR+media.media_type:chemical\/x-mnova)+AND+(subjects.subjectScheme:*NMR*)+AND+((subjects.subjectScheme:inchi)+AND+(subjects.subject:InChI=1S\/2C18H16N2O3.2C2H6O.Ca*))\" target=\"DOIs\" 
rel=\"noopener\">?query=(media.media_type:application\/zip+OR+media.media_type:chemical\/x-mnova)+AND+(subjects.subjectScheme:*NMR*)+AND+((subjects.subjectScheme:inchi)+AND+(subjects.subject:InChI=1S\/2C18H16N2O3.2C2H6O.Ca*))<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">5 datasets<\/b><\/li>\n<\/ul>\n<\/li>\n<li>Find all Datasets associated with Chemical structure representation Media types,<br \/>\n<b>IR<\/b> as a Subject and the title or description term<br \/>\n&#8220;Pyrazol&#8221;<\/p>\n<ul>\n<li><a href=\"https:\/\/commons.datacite.org\/?query=media.media_type:chemical\/x-cdxml+AND+(subjects.subjectScheme:*IFD.IR*)+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)\" target=\"DOIs\" rel=\"noopener\">?query=media.media_type:chemical\/x-cdxml+AND+(subjects.subjectScheme:*IFD.IR*)+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">36 datasets<\/b><\/li>\n<\/ul>\n<\/li>\n<li>Find all Datasets associated with a Chemical structure representation and Crystal structure<br \/>\nMedia types, <b>XRAY<\/b> as a Subject and the<br \/>\ntitle or description term &#8220;Pyrazol&#8221;<\/p>\n<ul>\n<li><a href=\"https:\/\/commons.datacite.org\/?query=media.media_type:chemical\/x-cif+AND+(subjects.subjectScheme:*IFD.XRAY*)+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)\" target=\"DOIs\" rel=\"noopener\">?query=media.media_type:chemical\/x-cif+AND+(subjects.subjectScheme:*IFD.XRAY*)+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">38 datasets<\/b><\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<h2>Part 3: Data discovery based on chemical properties such as Computational modelling<\/h2>\n<ol start=\"18\">\n<li>Find all Datasets associated with Chemical structure representation and Computation Media<br \/>\ntypes, <b>COMP<\/b> as a Subject and the title<br \/>\nor description term 
&#8220;<b>Pyrazol<\/b>&#8221;\n<ul>\n<li><a href=\"https:\/\/commons.datacite.org\/?query=(media.media_type:chemical\/x-gaussian-log+OR+media.media_type:chemical\/x-gaussian-checkpoint)+AND+(subjects.subjectScheme:*IFD.Comp*)+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)\" target=\"DOIs\" rel=\"noopener\">?query=(media.media_type:chemical\/x-gaussian-log+OR+media.media_type:chemical\/x-gaussian-checkpoint)+AND+(subjects.subjectScheme:*IFD.Comp*)+AND+(titles.title:*pyrazol*+OR+descriptions.description:*pyrazol*)<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">4 datasets<\/b><\/li>\n<\/ul>\n<\/li>\n<li>Find all Datasets associated with Computation Media types and the subject <b>KIE<\/b> for Hydrogen isotopes.\n<ul>\n<li><b>Visual search<\/b>:<br \/>\n<a href=\"https:\/\/commons.datacite.org\/?query=(media.media_type:chemical\/x-gaussian-log+OR+media.media_type:chemical\/x-gaussian-checkpoint)+AND+media.media_type:text\/plain+AND+(titles.title:*Endo*+OR+descriptions.description:*Endo*+OR+titles.title:*Exo*+OR+descriptions.description:*Exo*)+AND+(subjects.subjectScheme:*KIE*)+AND+subjects.subject:1H\/2H\" target=\"DOIs\" rel=\"noopener\">?query=(media.media_type:chemical\/x-gaussian-log+OR+media.media_type:chemical\/x-gaussian-checkpoint)+AND+media.media_type:text\/plain+AND+(titles.title:*Endo*+OR+descriptions.description:*Endo*+OR+titles.title:*Exo*+OR+descriptions.description:*Exo*)+AND+(subjects.subjectScheme:*KIE*)+AND+subjects.subject:1H\/2H<\/a><br \/>\n<b style=\"background-color:darkgreen;color:white;\">17 datasets<\/b><\/li>\n<li><b>API Search<\/b>:<br \/>\n<a 
href=\"https:\/\/api.datacite.org\/dois\/?query=(media.media_type:chemical\/x-gaussian-log+OR+media.media_type:chemical\/x-gaussian-checkpoint)+AND+media.media_type:text\/plain+AND+(titles.title:*Endo*+OR+descriptions.description:*Endo*+OR+titles.title:*Exo*+OR+descriptions.description:*Exo*)+AND+(subjects.subjectScheme:*KIE*)+AND+subjects.subject:1H\/2H\" target=\"DOIs\" rel=\"noopener\">https:\/\/api.datacite.org\/dois\/?query=(media.media_type:chemical\/x-gaussian-log+OR+media.media_type:chemical\/x-gaussian-checkpoint)+AND+media.media_type:text\/plain+AND+(titles.title:*Endo*+OR+descriptions.description:*Endo*+OR+titles.title:*Exo*+OR+descriptions.description:*Exo*)+AND+(subjects.subjectScheme:*KIE*)+AND+subjects.subject:1H\/2H<\/a><\/li>\n<li><b>Command line search<\/b> (the URL is quoted so that the shell does not interpret the parentheses and asterisks):<br \/>\n<b>curl<\/b> 'https:\/\/api.datacite.org\/dois\/?query=(media.media_type:chemical\/x-gaussian-log+OR+media.media_type:chemical\/x-gaussian-checkpoint)+AND+media.media_type:text\/plain+AND+(titles.title:*Endo*+OR+descriptions.description:*Endo*+OR+titles.title:*Exo*+OR+descriptions.description:*Exo*)+AND+(subjects.subjectScheme:*KIE*)+AND+subjects.subject:1H\/2H'<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<hr \/>\n<p>One feature of this approach is that the results of these searches, which run across a globally aggregated metadata store, can change with time. So repeating some of the searches at defined time intervals can also give a dynamic indication of how a particular area of data is growing. Other searches are of course designed to give a single hit which probably will not change with time.<\/p>\n<p>The above is based on an interpretation and implementation of the DataCite Schema, one which will eventually need to be agreed by the communities and sub-communities that might wish to use them. 
<strong>So beware<\/strong>, there may be other implementations covering similar data that would not be found by the above searches, particularly because of the way the subject terms above are used. 
The searches are therefore included here purely to raise awareness of the potential that such an approach has &#8211; along with my observation that I have never attended a presentation where they were discussed or shown. In the future, it seems likely that these JSON-based searches will themselves be generated automatically by software rather than by a human, as here. When that happens, searching will never be the same again!<\/p>\n<hr \/>\n<p><strong>I also welcome suggestions for new search queries.\u00a0These might either be accommodated using the existing metadata or require new additions to the metadata record. Please send them here as comments.<\/strong><\/p>\n<hr \/>\n<p>&nbsp;<\/p>\n<h2>References<\/h2>\n    <ol class=\"kcite-bibliography csl-bib-body\"><li id=\"ITEM-28045-0\">DataCite Metadata Working Group, \"DataCite Metadata Schema Documentation for the Publication and Citation of Research Data and Other Research Outputs v4.5\", <i>DataCite<\/i>, 2024. <a href=\"https:\/\/doi.org\/10.14454\/g8e5-6293\">https:\/\/doi.org\/10.14454\/g8e5-6293<\/a>\n\n<\/li>\n<li id=\"ITEM-28045-1\">H. Rzepa and T. Davies, \"Open publishing FAIR spectra for and by students\", <i>Spectroscopy Europe<\/i>, p. 22, 2022. <a href=\"https:\/\/doi.org\/10.1255\/sew.2022.a10\">https:\/\/doi.org\/10.1255\/sew.2022.a10<\/a>\n\n<\/li>\n<\/ol>\n\n<\/div> <!-- kcite-section 28045 -->","protected":false},"excerpt":{"rendered":"<p>With AI and Machine learning needing data in abundance, interest in data discovery is intense. However, this type of discovery is somewhat different from more traditional data base searches, in that it is particularly suited for machine discovery as well as by humans. 
The discovery searches are conducted using an aggregated and federated metadata store, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":5,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[4],"tags":[],"ppma_author":[2661],"class_list":["post-28045","post","type-post","status-publish","format-standard","hentry","category-interesting-chemistry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Data Discovery: A pick-n-mix library of useful FAIR Data searches - and a call for new search suggestions. - Henry Rzepa&#039;s Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=28045\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Discovery: A pick-n-mix library of useful FAIR Data searches - and a call for new search suggestions. 
- Henry Rzepa&#039;s Blog\" \/>\n<meta property=\"og:description\" content=\"With AI and Machine learning needing data in abundance, interest in data discovery is intense. However, this type of discovery is somewhat different from more traditional data base searches, in that it is particularly suited for machine discovery as well as by humans. The discovery searches are conducted using an aggregated and federated metadata store, [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=28045\" \/>\n<meta property=\"og:site_name\" content=\"Henry Rzepa&#039;s Blog\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-25T09:45:43+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-12-05T12:08:45+00:00\" \/>\n<meta name=\"author\" content=\"Henry Rzepa\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Henry Rzepa\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Data Discovery: A pick-n-mix library of useful FAIR Data searches - and a call for new search suggestions. - Henry Rzepa&#039;s Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=28045","og_locale":"en_GB","og_type":"article","og_title":"Data Discovery: A pick-n-mix library of useful FAIR Data searches - and a call for new search suggestions. - Henry Rzepa&#039;s Blog","og_description":"With AI and Machine learning needing data in abundance, interest in data discovery is intense. 
However, this type of discovery is somewhat different from more traditional data base searches, in that it is particularly suited for machine discovery as well as by humans. The discovery searches are conducted using an aggregated and federated metadata store, [&hellip;]","og_url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=28045","og_site_name":"Henry Rzepa&#039;s Blog","article_published_time":"2024-11-25T09:45:43+00:00","article_modified_time":"2024-12-05T12:08:45+00:00","author":"Henry Rzepa","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Henry Rzepa","Estimated reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=28045#article","isPartOf":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=28045"},"author":{"name":"Henry Rzepa","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281"},"headline":"Data Discovery: A pick-n-mix library of useful FAIR Data searches &#8211; and a call for new search suggestions.","datePublished":"2024-11-25T09:45:43+00:00","dateModified":"2024-12-05T12:08:45+00:00","mainEntityOfPage":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=28045"},"wordCount":1612,"commentCount":0,"articleSection":["Interesting chemistry"],"inLanguage":"en-GB","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=28045#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=28045","url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=28045","name":"Data Discovery: A pick-n-mix library of useful FAIR Data searches - and a call for new search suggestions. 
- Henry Rzepa&#039;s Blog","isPartOf":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#website"},"datePublished":"2024-11-25T09:45:43+00:00","dateModified":"2024-12-05T12:08:45+00:00","author":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281"},"breadcrumb":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=28045#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=28045"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=28045#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog"},{"@type":"ListItem","position":2,"name":"Data Discovery: A pick-n-mix library of useful FAIR Data searches &#8211; and a call for new search suggestions."}]},{"@type":"WebSite","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#website","url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/","name":"Henry Rzepa&#039;s Blog","description":"Chemistry with a twist","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":"Person","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281","name":"Henry Rzepa","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g370be3a7397865e4fd161aefeb0a5a85","url":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","caption":"Henry 
Rzepa"},"description":"Henry Rzepa is Emeritus Professor of Computational Chemistry at Imperial College London.","sameAs":["https:\/\/orcid.org\/0000-0002-8635-8390"],"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?author=1"}]}},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pDef7-7il","jetpack-related-posts":[{"id":15907,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=15907","url_meta":{"origin":28045,"position":0},"title":"Global initiatives in research data management and discovery: searching metadata.","author":"Henry Rzepa","date":"March 7, 2016","format":false,"excerpt":"The upcoming ACS national meeting in San Diego has a CINF\u00a0(chemical information division) session entitled \"Global initiatives in research data management and discovery\". I have highlighted here just one slide from my contribution to this session, which addresses the discovery aspect of the session. Data, if you think about it,\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":28305,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=28305","url_meta":{"origin":28045,"position":1},"title":"Finding and Discovery Aids as part of data availability statements for research articles.","author":"Henry Rzepa","date":"February 19, 2025","format":false,"excerpt":"Starting around 2016, journal publishers started including mandatory \"Data Availability\" statements as part of research articles; a typical (dated) example is linked here, including guidelines for how to cite the data itself. 
I wrote about these aspects last year in a blog post for the RSC journal Digital Discovery and\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":27090,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=27090","url_meta":{"origin":28045,"position":2},"title":"Data Discoverability as a feature of Journal Articles.","author":"Henry Rzepa","date":"June 11, 2024","format":false,"excerpt":"I can remember a time when journal articles carried selected data within their body as e.g. Tables, Figures or Experimental procedures, with the rest consigned to a box of paper deposited (for UK journals) at the British Library. Then came ESI or electronic supporting information. Most recently, many journals are\u2026","rel":"","context":"In &quot;Interesting chemistry&quot;","block_context":{"text":"Interesting chemistry","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=4"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":22059,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=22059","url_meta":{"origin":28045,"position":3},"title":"A cascading tutorial in finding rich NMR data using the Datacite datasearch engine.","author":"Henry Rzepa","date":"April 11, 2020","format":false,"excerpt":"In the previous post, I introduced three of a new generation of search engines specialising in the discovery of data. 
Data has some special features which make its properties slightly different from the conceptual (or natural language) searches we are used to performing for general information and so a search\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":18344,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18344","url_meta":{"origin":28045,"position":4},"title":"How to search data repositories for FAIR chemical content and data: SubjectScheme","author":"Henry Rzepa","date":"June 8, 2017","format":false,"excerpt":"As data repositories start to flourish, it is reasonable to ask questions such as what sort of chemistry can be found there and how can I find it? Here I give an updated worked example of a digital repository search for chemical content and also pose an important issue for\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2017\/06\/171-1024x196.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":21928,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=21928","url_meta":{"origin":28045,"position":5},"title":"Encouraging Submission of FAIR Data at the Journal of Organic Chemistry and Organic Letters","author":"Henry Rzepa","date":"February 14, 2020","format":false,"excerpt":"In a welcome move, one of the American Chemical Society journals has published an encouragement to submit what is called FAIR data to the journal. A reminder that FAIR data is data that can be Found (F), Accessed (A), Interoperated (I) and Re-used (R). 
I thought I might try to explore\u2026","rel":"","context":"In &quot;Interesting chemistry&quot;","block_context":{"text":"Interesting chemistry","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=4"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_likes_enabled":false,"authors":[{"term_id":2661,"user_id":1,"is_guest":0,"slug":"admin","display_name":"Henry Rzepa","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/28045","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=28045"}],"version-history":[{"count":40,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/28045\/revisions"}],"predecessor-version":[{"id":28152,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/28045\/revisions\/28152"}],"wp:attachment":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=28045"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=28045"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=28045"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/
index.php?rest_route=%2Fwp%2Fv2%2Fppma_author&post=28045"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}