{"id":18257,"date":"2017-04-28T15:42:09","date_gmt":"2017-04-28T14:42:09","guid":{"rendered":"http:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=18257"},"modified":"2017-05-30T07:37:27","modified_gmt":"2017-05-30T06:37:27","slug":"the-challenges-in-curating-research-data-one-case-study","status":"publish","type":"post","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18257","title":{"rendered":"The challenges in curating research data: one case study."},"content":{"rendered":"<div class=\"kcite-section\" kcite-section-id=\"18257\">\n<p>Research data (and its management) is rapidly emerging as a focal point for the development of research dissemination practices. An important aspect of ensuring that such data remains fit for purpose is identifying what curation activities need to be associated with it. Here I revisit one particular case study associated with the molecular structure of a product identified from a photolysis reaction<span id=\"cite_ITEM-18257-0\" name=\"citation\"><a href=\"#ITEM-18257-0\">[1]<\/a><\/span> and the curation of the crystallographic data associated with this study.<\/p>\n<p>This particular dataset (CSD, dataDOI:\u00a0<a href=\"https:\/\/dx.doi.org\/10.5517\/cctnx5j\" target=\"_blank\" rel=\"noopener noreferrer\">10.5517\/cctnx5j<\/a>)\u00a0is associated with an article entitled &#8220;<em>Single-Crystal X-ray Structure of 1,3-Dimethylcyclobutadiene by Confinement in a Crystalline Matrix<\/em>&#8221;.<span id=\"cite_ITEM-18257-0\" name=\"citation\"><a href=\"#ITEM-18257-0\">[1]<\/a><\/span> Data for\u00a0crystal structures supporting a research article are required (at least in part) to be deposited in the Cambridge Structural Database (CSD), where a significant level of curation is performed; this dataset has the internal reference MUWMEX. Although the definition of the term curation has evolved over the last few years, here I take it to include the following:<\/p>\n<ol>\n<li>Identification of appropriate metadata describing the data. 
For molecules, this would include any identifiers such as the name of the molecule and the connectivities of the atoms constituting that molecule.<\/li>\n<li>The submission of this metadata to a suitable aggregator, such as DataCite, and its inclusion in any other databases associated with the data. These two steps are part of the FAIR data guidelines<span id=\"cite_ITEM-18257-1\" name=\"citation\"><a href=\"#ITEM-18257-1\">[2]<\/a><\/span>, covering the F (findable) and A (accessible).<\/li>\n<li>Performing any validation tests for the data that can be identified. For crystal structure data in CIF format, such validation is performed by the utility <a href=\"http:\/\/checkcif.iucr.org\">checkCIF<\/a>,\u00a0which helps to ensure the I (interoperable) of FAIR. The R (reusable) refers in part to the licenses\u00a0under which the data can be re-used.<\/li>\n<\/ol>\n<p>On occasions (rare, it has to be said), these procedures can lead to a disparity between the authors&#8217; conclusions, arrived at on the basis of their acquired data, and the metadata identified\u00a0by the independent curators. 
This difference is most obviously illustrated in this case study by the chemical names inferred by the curation process for\u00a0the structure represented by the data in the CSD:<\/p>\n<ul>\n<li>chemical name: &#8220;<em>tetrakis(Guanidinium) 25,26,27,28-tetrahydroxycalix(4)arene-5,11,17,23-tetrasulfonate 1,5-dimethyl-2-oxabicyclo[2.2.0]hex-5-en-3-one clathrate trihydrate<\/em>&#8221;<\/li>\n<li>chemical name\u00a0synonym:\u00a0&#8220;<em>tetrakis(Guanidinium) tetra-p-sulfocalix(4)arene 1,3-dimethylcyclobutadiene carbon dioxide clathrate trihydrate<\/em>&#8221;.<\/li>\n<\/ul>\n<p>Only the synonym agrees with the title given by the original authors in their publication.<span id=\"cite_ITEM-18257-0\" name=\"citation\"><a href=\"#ITEM-18257-0\">[1]<\/a><\/span> One might indeed strongly argue that these two names are <strong>not<\/strong> in fact synonyms, since they refer to quite different chemical structures with different atom connectivities. A search of the database for the sub-structure corresponding to\u00a0<em>1,3-dimethylcyclobutadiene<\/em> does not reveal any hits, and so the information implied by this synonym is not recorded in the index created for the CSD.<\/p>\n<p>I asked the scientific editors of the CSD for some guidance on the curation procedures applied to crystal structure datasets and they have kindly allowed me to quote some of this.<\/p>\n<ol>\n<li><em>&#8220;In cases such as this, we as editors are sometimes faced with conflicting information and have to try our best to strike a balance between the data presented in the CIF, a published interpretation and our knowledge based on the information already in the CSD&#8221;<\/em>.<\/li>\n<li><em>&#8220;In areas where there is a particular conflict between these, we often would include a comment (usually in the Remarks or Disorder field as appropriate)&#8221;. 
<\/em>For this particular dataset, one finds the following under the Disorder field:\n<ul>\n<li><em>&#8220;Under UV radiation the clathrated pyrone molecule converts to a disordered mixture of square-planar 1, 3-dimethylcyclobutadiene and rectangular-bent 1, 3-dimethylcyclobutadiene in van der Waals contact with a carbon dioxide molecule. The ratio of the square-planar to rectangular-bent 1, 3-dimethylcyclobutadiene clathrate is modelled with occupancies 0.6292:0.3708&#8221;. <\/em><\/li>\n<li>It is not entirely obvious however whether this last comment originates from the original authors or from the data curators. It does <strong>not<\/strong> resolve the difference between the assigned chemical name and the indicated chemical name synonym.<\/li>\n<\/ul>\n<\/li>\n<li><em>&#8220;In the case of MUWMEX, I think that the editor produced a diagram <\/em>(below)<em> which seems chemically reasonable based on the crystallographic data with which we were provided and tried to cover the situation regarding disorder, van der Waals contacts etc in the \u2018Disorder\u2019 field. At this point, it is left to the CSD user to decide for themselves.&#8221;<\/em><br \/>\n <img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-18266\" src=\"http:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2017\/04\/077-1.jpg\" alt=\"\" width=\"432\" height=\"280\" srcset=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2017\/04\/077-1.jpg 432w, https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2017\/04\/077-1-300x194.jpg 300w\" sizes=\"auto, (max-width: 432px) 100vw, 432px\" \/><\/li>\n<\/ol>\n<p>We have arrived at a point where the CSD user must indeed decide what the species described by this dataset actually is. Ideally, the best recourse would be to acquire the original data in full and repeat the crystallographic analysis. 
Such re-analysis is an aspect of the curation of crystallographic data that is <strong>not conducted<\/strong> as part of the current\u00a0processes; it would require, as a minimum, a superset of the data known as the <strong>hkl information<\/strong> to be present. Again, to quote the CSD scientific editors:<\/p>\n<ol start=\"4\">\n<li><em>&#8220;With regard to your question: Is there any mechanism in the Conquest search to identify structures where the hkl information is present? I understand that it is not currently possible to do this in ConQuest. It is, however, possible &#8230;\u00a0to access structure factor data (where available) using <a href=\"https:\/\/www.ccdc.cam.ac.uk\/support-and-resources\/support\/case\/?caseid=36aa08d2-52a5-401f-804a-d278338df633\">Access Structures<\/a>.&#8221;<\/em><\/li>\n<\/ol>\n<p>For MUWMEX, the hkl information is not present in the CSD dataset and, in 2010 when the structure was published, would have had to be\u00a0obtained directly from the authors.\u00a0By 2016, however, its presence in deposited datasets was becoming far more common. It is worth pointing out that even the\u00a0hkl information is not the complete data\u00a0recorded for the experiment. That is represented by the original image files recording the X-ray diffraction. These are hardly ever available as FAIR data, even nowadays.<\/p>\n<p>I hope I have here illustrated at least some of the challenging aspects of curating scientific data and the issues that can arise when derived metadata (in this case the name and the atom connectivities of a molecule) reveal conflicts with the original interpretations. This is for an area of chemistry where both data deposition and its curation are very mature, having operated for ~52 years now. It is still a process that requires the intervention of skilled curators of the data, but perhaps even more importantly it reveals the need to identify even more strictly what the provenance of the interpretations is. 
Should the CSD curation rest merely at the stage of teasing out and flagging inconsistencies and allowing the user to then take over to resolve the conflicts? Should it be more active, in re-analyzing data for each entry where conflicts have been detected? Perhaps the latter is not practical now, but it might be in the near future. What is certain is that with increasing availability of FAIR data these sorts of issues will increasingly come to the fore. And not just for the very well understood case of crystallographic data but for many other types of data.<\/p>\n<h2>References<\/h2>\n    <ol class=\"kcite-bibliography csl-bib-body\"><li id=\"ITEM-18257-0\">Y. Legrand, A. van der Lee, and M. Barboiu, \"Single-Crystal X-ray Structure of 1,3-Dimethylcyclobutadiene by Confinement in a Crystalline Matrix\", <i>Science<\/i>, vol. 329, pp. 299-302, 2010. <a href=\"https:\/\/doi.org\/10.1126\/science.1188002\">https:\/\/doi.org\/10.1126\/science.1188002<\/a>\n\n<\/li>\n<li id=\"ITEM-18257-1\">M.D. Wilkinson, M. Dumontier, I.J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J. Boiten, L.B. da Silva Santos, P.E. Bourne, J. Bouwman, A.J. Brookes, T. Clark, M. Crosas, I. Dillo, O. Dumon, S. Edmunds, C.T. Evelo, R. Finkers, A. Gonzalez-Beltran, A.J. Gray, P. Groth, C. Goble, J.S. Grethe, J. Heringa, P.A. \u2019t Hoen, R. Hooft, T. Kuhn, R. Kok, J. Kok, S.J. Lusher, M.E. Martone, A. Mons, A.L. Packer, B. Persson, P. Rocca-Serra, M. Roos, R. van Schaik, S. Sansone, E. Schultes, T. Sengstag, T. Slater, G. Strawn, M.A. Swertz, M. Thompson, J. van der Lei, E. van Mulligen, J. Velterop, A. Waagmeester, P. Wittenburg, K. Wolstencroft, J. Zhao, and B. Mons, \"The FAIR Guiding Principles for scientific data management and stewardship\", <i>Scientific Data<\/i>, vol. 3, 2016. 
<a href=\"https:\/\/doi.org\/10.1038\/sdata.2016.18\">https:\/\/doi.org\/10.1038\/sdata.2016.18<\/a>\n\n<\/li>\n<\/ol>\n\n<\/div> <!-- kcite-section 18257 -->","protected":false},"excerpt":{"rendered":"<p>Research data (and its management) is rapidly emerging as a focal point for the development of research dissemination practices. An important aspect of ensuring that such data remains fit for purpose is identifying what curation activities need to be associated with it. Here I revisit one particular case study associated with the molecular structure of [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":5,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[2,1745],"tags":[2192,281,301,2193,2191,325,988,2190,1425,1042,1648,1645,1405,42],"ppma_author":[2661],"class_list":["post-18257","post","type-post","status-publish","format-standard","hentry","category-chemical-it","category-crystal_structure_mining","tag-assigned-chemical-name","tag-author","tag-chemical-name","tag-chemical-name-synonym","tag-chemical-names","tag-chemical-structures","tag-editor","tag-indicated-chemical-name-synonym","tag-knowledge","tag-radiation","tag-research","tag-scientific-method","tag-technologyinternet","tag-x-ray"],"yoast_head
":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>The challenges in curating research data: one case study. - Henry Rzepa&#039;s Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18257\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"The challenges in curating research data: one case study. - Henry Rzepa&#039;s Blog\" \/>\n<meta property=\"og:description\" content=\"Research data (and its management) is rapidly emerging as a focal point for the development of research dissemination practices. An important aspect of ensuring that such data remains fit for purpose is identifying what curation activities need to be associated with it. Here I revisit one particular case study associated with the molecular structure of [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18257\" \/>\n<meta property=\"og:site_name\" content=\"Henry Rzepa&#039;s Blog\" \/>\n<meta property=\"article:published_time\" content=\"2017-04-28T14:42:09+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2017-05-30T06:37:27+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2017\/04\/077-1.jpg\" \/>\n<meta name=\"author\" content=\"Henry Rzepa\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Henry Rzepa\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"The challenges in curating research data: one case study. - Henry Rzepa&#039;s Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18257","og_locale":"en_GB","og_type":"article","og_title":"The challenges in curating research data: one case study. - Henry Rzepa&#039;s Blog","og_description":"Research data (and its management) is rapidly emerging as a focal point for the development of research dissemination practices. An important aspect of ensuring that such data remains fit for purpose is identifying what curation activities need to be associated with it. Here I revisit one particular case study associated with the molecular structure of [&hellip;]","og_url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18257","og_site_name":"Henry Rzepa&#039;s Blog","article_published_time":"2017-04-28T14:42:09+00:00","article_modified_time":"2017-05-30T06:37:27+00:00","og_image":[{"url":"http:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2017\/04\/077-1.jpg","type":"","width":"","height":""}],"author":"Henry Rzepa","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Henry Rzepa","Estimated reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18257#article","isPartOf":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18257"},"author":{"name":"Henry Rzepa","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281"},"headline":"The challenges in curating research data: one case 
study.","datePublished":"2017-04-28T14:42:09+00:00","dateModified":"2017-05-30T06:37:27+00:00","mainEntityOfPage":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18257"},"wordCount":1132,"commentCount":5,"image":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18257#primaryimage"},"thumbnailUrl":"http:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2017\/04\/077-1.jpg","keywords":["assigned chemical name","author","chemical name","chemical name synonym","chemical names","chemical structures","editor","indicated chemical name synonym","Knowledge","radiation","Research","Scientific method","Technology\/Internet","X-ray"],"articleSection":["Chemical IT","crystal_structure_mining"],"inLanguage":"en-GB","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18257#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18257","url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18257","name":"The challenges in curating research data: one case study. 
- Henry Rzepa&#039;s Blog","isPartOf":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18257#primaryimage"},"image":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18257#primaryimage"},"thumbnailUrl":"http:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2017\/04\/077-1.jpg","datePublished":"2017-04-28T14:42:09+00:00","dateModified":"2017-05-30T06:37:27+00:00","author":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281"},"breadcrumb":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18257#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18257"]}]},{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18257#primaryimage","url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2017\/04\/077-1.jpg","contentUrl":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2017\/04\/077-1.jpg","width":432,"height":280},{"@type":"BreadcrumbList","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18257#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog"},{"@type":"ListItem","position":2,"name":"The challenges in curating research data: one case study."}]},{"@type":"WebSite","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#website","url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/","name":"Henry Rzepa&#039;s Blog","description":"Chemistry with a 
twist","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":"Person","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281","name":"Henry Rzepa","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g370be3a7397865e4fd161aefeb0a5a85","url":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","caption":"Henry Rzepa"},"description":"Henry Rzepa is Emeritus Professor of Computational Chemistry at Imperial College London.","sameAs":["https:\/\/orcid.org\/0000-0002-8635-8390"],"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?author=1"}]}},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pDef7-4Kt","jetpack-related-posts":[{"id":16251,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=16251","url_meta":{"origin":18257,"position":0},"title":"Metametadata: data about data about (chemical) data.","author":"Henry Rzepa","date":"April 16, 2016","format":false,"excerpt":"Scientists are familiar with the term data, at least in a scientific or chemical context, but appreciating metadata (meaning \"after\", or \"beyond\") is slightly more subtle, in the sense of using it to mean data about data. 
The challenge lies in clarifying where the boundary between data and its metadata\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":18465,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=18465","url_meta":{"origin":18257,"position":1},"title":"FAIR Research data: Gravitational waves as an example from the astrophysics community.","author":"Henry Rzepa","date":"June 2, 2017","format":false,"excerpt":"In 2016, the world heard that gravitational waves had been detected and\u00a0now a third instance is reported.\u2021 Given that the data associated with these detections are perhaps amongst the most important instances in recent times, I thought I might take a peek at how it was managed. The original report\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2017\/06\/117-1024x584.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":17951,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=17951","url_meta":{"origin":18257,"position":2},"title":"Supporting information: chemical graveyard or invaluable resource for chemical structures.","author":"Henry Rzepa","date":"March 31, 2017","format":false,"excerpt":"Nowadays, data supporting\u00a0most publications relating to the synthesis of organic compounds is more likely than not to be found in associated \"supporting information\" rather than the (often page limited) article itself. 
For example, this article has an SI which is paginated at 907; almost a mini-database in its own right!\u2020\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":28045,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=28045","url_meta":{"origin":18257,"position":3},"title":"Data Discovery: A pick-n-mix library of useful FAIR Data searches &#8211; and a call for new search suggestions.","author":"Henry Rzepa","date":"November 25, 2024","format":false,"excerpt":"With AI and Machine learning needing data in abundance, interest in data discovery is intense. However, this type of discovery is somewhat different from more traditional data base searches, in that it is particularly suited for machine discovery as well as by humans. The discovery searches are conducted using an\u2026","rel":"","context":"In &quot;Interesting chemistry&quot;","block_context":{"text":"Interesting chemistry","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=4"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":15907,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=15907","url_meta":{"origin":18257,"position":4},"title":"Global initiatives in research data management and discovery: searching metadata.","author":"Henry Rzepa","date":"March 7, 2016","format":false,"excerpt":"The upcoming ACS national meeting in San Diego has a CINF\u00a0(chemical information division) session entitled \"Global initiatives in research data management and discovery\". I have highlighted here just one slide from my contribution to this session, which addresses the discovery aspect of the session. 
Data, if you think about it,\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":22043,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=22043","url_meta":{"origin":18257,"position":5},"title":"New generations of globally aggregating search engines &#8211; for (chemical) data.","author":"Henry Rzepa","date":"April 7, 2020","format":false,"excerpt":"Chemists have long been familiar with search engines that aspire to index a large proportion of the chemical literature. Think for example the old-generation (and commercial)\u00a0SciFinder (Scholar)\u00a0and Reaxys\u00a0or those that arrived in the 1990s in the online era\u2021 such as the non-commercial Pubchem or ChemSpider (there are more). But you\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2020\/04\/google-1024x1004.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]}],"jetpack_likes_enabled":false,"authors":[{"term_id":2661,"user_id":1,"is_guest":0,"slug":"admin","display_name":"Henry 
Rzepa","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/18257","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=18257"}],"version-history":[{"count":17,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/18257\/revisions"}],"predecessor-version":[{"id":18316,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/18257\/revisions\/18316"}],"wp:attachment":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=18257"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=18257"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=18257"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fppma_author&post=18257"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}