{"id":14454,"date":"2015-08-20T17:34:45","date_gmt":"2015-08-20T16:34:45","guid":{"rendered":"http:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=14454"},"modified":"2015-08-23T11:37:14","modified_gmt":"2015-08-23T10:37:14","slug":"a-light-introductory-tutorial-on-research-data-management-in-chemistry","status":"publish","type":"post","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=14454","title":{"rendered":"A (light) introductory tutorial on Research Data Management (in chemistry)."},"content":{"rendered":"<div class=\"kcite-section\" kcite-section-id=\"14454\">\n<p>Management of research (data) outputs is a hot topic in the UK at the moment, although the topic has been rumbling for five years or more. Most research-active higher educational establishments have or are about to publish general guidelines, which predominantly take the form of aspirational targets rather than actionable examples\u00a0or\u00a0use-cases.<sup>\u2021<\/sup> Because the concepts remain somewhat abstract, one can encounter questions from researchers such as &#8220;how should I go about achieving such RDM (research data management)?&#8221; I thought it might be useful to summarise here some key features in the form of an FAQ that can help answer that question. I will concentrate purely on the sub-set of chemistry about which I know most.<\/p>\n<hr \/>\n<p>I will start by exploring the acronym <a href=\"https:\/\/www.force11.org\/group\/fairgroup\/fairprinciples\" target=\"_blank\">FAIR<\/a>\u00a0data.<\/p>\n<ul>\n<li><b>F<\/b> is findable. This means that metadata is a key part of the process, since it is this information that allows the research data to be more easily found, not only by other humans but also by software engines which specialise in such activity.<\/li>\n<li><b>A<\/b> is accessible. And easily so. Which means a standard identifier to get to the research data, with no paywalls, account registrations or other obstructions. 
It should ideally\u00a0be possible to access data anonymously, without necessarily revealing personal information.<\/li>\n<li><b>I<\/b> is inter-operable. This is harder to define exactly, but the essence is that it should be possible to re-use the data in a context different from the original, and perhaps even outside the subject domain where it was created. For example, if data was collected using one specific instrument, it should\u00a0be possible to use it without necessarily having access to either an identical instrument or to the software associated with that instrument.<\/li>\n<li><b>R<\/b> is reusable. There should be sufficient information about the data and its parameters to repeat its collection, if necessary, independently of the original, or to re-use it to start a new data collection. Reusable also means by software, and not just by a human.<\/li>\n<\/ul>\n<p>The first two properties are easily achieved, since standard procedures can be used. The last two properties are potentially more difficult, since they require more intervention or thought by both the depositor and the re-user. So I will concentrate really on the first two, since by and large they will satisfy most of the general guidelines issued by funders and universities, but note that we must not in the medium to longer term forget the last two.<\/p>\n<hr \/>\n<p>I will now list some typical types of data that I have personal experience of. As the community increasingly participates in such RDM, this list will expand by &#8220;crowd-sourcing&#8221;; if your type of data is not listed, do not give up!\u00a0<\/p>\n<ol>\n<li>Data generated by software without instrumental inputs, a good example of which is the output of computational chemistry. 
I have the most personal experience in this area, having been at it for ten years or more<span id=\"cite_ITEM-14454-0\" name=\"citation\"><a href=\"#ITEM-14454-0\">[1]<\/a><\/span>,<span id=\"cite_ITEM-14454-1\" name=\"citation\"><a href=\"#ITEM-14454-1\">[2]<\/a><\/span> and examples are scattered throughout this blog (and in many of our recent research publications).<\/li>\n<li>Software developed as part of the data collection process\u00a0and which might be required by others to re-use the data. An example of such was described in a previous post, and has been RDMed here<span id=\"cite_ITEM-14454-2\" name=\"citation\"><a href=\"#ITEM-14454-2\">[3]<\/a><\/span>.<\/li>\n<li>Data generated by software associated with instrumental outputs. In chemistry this means\u00a0spectrometers and other instruments, most of which now have computers which handle\u00a0the data outputs. Specific examples might be crystal structures, NMR, IR, MS and optical (including chiroptical) spectra.\n<ul>\n<li>Crystal structures are the gold standard in RDM, since they fulfil all the requirements of FAIR and so merit a special mention here. In the last year, the Cambridge Structural Database (CSD) has\u00a0implemented a standard access mechanism based on a digital object identifier (DOI).<span id=\"cite_ITEM-14454-3\" name=\"citation\"><a href=\"#ITEM-14454-3\">[4]<\/a><\/span><\/li>\n<li>The end points of many other instrumental outputs are\u00a0PDF files. These do not easily achieve the IR of FAIR (see my comment above), but we will admit the PDF format as a temporary expedient until the use of semantically richer formats increases (the gold example here being the CIF format for crystal structures). You can see an example of PDF files here as a fileset<span id=\"cite_ITEM-14454-4\" name=\"citation\"><a href=\"#ITEM-14454-4\">[5]<\/a><\/span> describing <sup>1<\/sup>H, <sup>13<\/sup>C NMR, mass spectrometry, ECD (electronic circular dichroism) and VCD (vibrational circular dichroism). 
Perhaps a better format for expressing many types of spectra is the Excel spreadsheet, which achieves a reasonable proportion of the IR aspirations of FAIR. Both expressions can be included in the collection.\u00a0<\/li>\n<li>As a postscript to this list, I should mention that instrumental data is often found as:\n<ul>\n<li>Raw (unreduced or unprocessed) data, which can be very large (e.g. free induction decay time-domain data in NMR).<\/li>\n<li>A version which has already been subjected to processing (Fourier transformed frequency-domain data in NMR, i.e. a spectrum). This is probably more suitable for archiving, but it&#8217;s a fine judgement.<\/li>\n<li>As a rough rule of thumb, chemistry data intended for archival should be ~ &lt; 1 Gb.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Synthetic methodologies that describe the preparation and characterisation of molecules. You can see an example of such data here.<span id=\"cite_ITEM-14454-5\" name=\"citation\"><a href=\"#ITEM-14454-5\">[6]<\/a><\/span>\u00a0<\/li>\n<\/ol>\n<hr \/>\n<p>Now I come to\u00a0how the (molecular) data is packaged, and this is best described in terms of its granularity. There are perhaps four classes:<\/p>\n<ol>\n<li>All the data is packaged into a single compressed (ZIP) archive. An example can be found here<span id=\"cite_ITEM-14454-6\" name=\"citation\"><a href=\"#ITEM-14454-6\">[7]<\/a><\/span> containing coordinates for 134,000 molecules. If your interest is in just one of these molecules, then you could argue that this data does not fully conform to the <b>F<\/b> of FAIR, since it contains no information (metadata) about individual molecules.<\/li>\n<li>The next packaging is (in chemistry) for a specific molecule (or perhaps reaction). An example is again<span id=\"cite_ITEM-14454-4\" name=\"citation\"><a href=\"#ITEM-14454-4\">[5]<\/a><\/span>, which contains data about a specific molecule, and that molecule is itself defined by the inclusion of e.g. a Chemdraw file. 
Another example<span id=\"cite_ITEM-14454-5\" name=\"citation\"><a href=\"#ITEM-14454-5\">[6]<\/a><\/span> relates to reaction information, and also includes spectroscopic data in the form of a JCAMP-DX file, which is semantically preferable to e.g. an Excel spreadsheet or just a PDF file. Most of the examples on this blog are in this category, relating to quantum chemical computations of a specific molecule.<span id=\"cite_ITEM-14454-7\" name=\"citation\"><a href=\"#ITEM-14454-7\">[8]<\/a><\/span>\u00a0I will concentrate here just on this second type of packaging.<\/li>\n<li><span style=\"color: #000000;\">The most finely-grained\u00a0packaging is at the molecular property level. To illustrate this, visit, for example, the <a style=\"color: #000000;\" href=\"https:\/\/en.wikipedia.org\/wiki\/Aspirin\" target=\"_blank\">Wikipedia page for aspirin<\/a>, where you will find a ChemBox containing property data. In the future, these ChemBox properties will be interactively\u00a0populated from a data repository known as WikiData. This type of RDM is still developing, and I include it here as a placeholder and to counterbalance the first category above!<\/span><\/li>\n<li><span style=\"color: #000000;\">This category is a little different from the previous three; it relates to a <a style=\"color: #000000;\" href=\"http:\/\/www.ch.imperial.ac.uk\/rzepa\/blog\/?p=4930\">collection of packages<\/a>, where the granularity of class 2 above is retained, but boxed up into a project collection.<span id=\"cite_ITEM-14454-8\" name=\"citation\"><a href=\"#ITEM-14454-8\">[9]<\/a><\/span><\/span><\/li>\n<\/ol>\n<hr \/>\n<p>And now to look at the life cycle of some data.<\/p>\n<ol>\n<li>The data starts off as <b>live<\/b>. This is some sort of holding store which members of the group can access\/contribute to. 
It can be a local sharepoint or a cloud-based resource such as DropBox, but it could still be a simple DVD or USB storage device.\n<ul>\n<li>We have for some ten years now used a locally built live data store (which is itself archived at Zenodo as software<span id=\"cite_ITEM-14454-9\" name=\"citation\"><a href=\"#ITEM-14454-9\">[10]<\/a><\/span>) and which serves to track a user&#8217;s experiments, including initiation and completion dates and times, to serve as a simple interface for archival, to record published experiments and to flag requested data embargoes (see below) and to provide a search interface for all of this. Pretty much the description of an electronic (laboratory) notebook. We created our own<span id=\"cite_ITEM-14454-1\" name=\"citation\"><a href=\"#ITEM-14454-1\">[2]<\/a><\/span> because few commercial products (either ten years ago, or even now) offer the ability to seamlessly incorporate a <strong>Publish<\/strong>\u00a0workflow which automates\u00a0all the required actions of RDM as described here, and because it is something we might want to do 5-20 times a day. If your requirement is much less, such automation may not be needed.<\/li>\n<\/ul>\n<\/li>\n<li>When the data is stable and edited down to that which needs to be associated with an article (the narrative), it now needs archiving in a manner that will ensure its persistence for at least a decade or even longer.<\/li>\n<li>Associated metadata describing the data also now needs to be assembled\u00a0and this combined package is now sent to a\u00a0data archive. These archives have special characteristics, one of which is that they can issue a persistent identifier we know as the DOI. This itself is issued by a registry, which for data is usefully done by an organisation known as DataCite. 
If desired, two or more of these packages can be associated with a <strong>collection<\/strong>, and the <strong>collection<\/strong> itself can also be given a DOI.<span id=\"cite_ITEM-14454-8\" name=\"citation\"><a href=\"#ITEM-14454-8\">[9]<\/a><\/span><\/li>\n<li>A copy of the metadata is sent to DataCite when the DOI is issued. The search engine that indexes this information is also at\u00a0DataCite.<\/li>\n<li>Now all that needs doing is that the Data DOIs are\u00a0all\u00a0cited in the article to be published, or you can (also or instead) cite the DOI for a <strong>collection<\/strong>. An\u00a0accepted article\u00a0is itself issued in due course with a DOI (this time by an agency known as CrossRef on behalf of the publisher).\u00a0<\/li>\n<li>To complete the virtuous cycle, the article DOIs can be retrospectively added to the metadata for each data package (or the collection of packages), ensuring that the data references the narrative, and that the narrative references the data.\u00a0<\/li>\n<li>You will note from the virtuous cycle in item 5, that timing becomes important. You have to archive the data and mint a DOI in order to cite it in an article. This sounds like publishing the data before the article has been accepted, which would have the advantage that referees could access it as part of their QA process for the article. However, it may be more suitable to simply reserve a DOI for the data for inclusion in an article, but not make it public until that article has itself been accepted and published. This process is called <strong>embargoing<\/strong>; I will defer discussion of this, because this tends to vary according to repository and its implementation is still evolving.<\/li>\n<li>The final action might be to register this activity on any institutional software that monitors and aggregates research outputs. 
We use Symplectic to achieve this, it having the ability to record both a research publication and increasingly properties of the data itself.<\/li>\n<\/ol>\n<hr \/>\n<p>By now you might be asking where you could\u00a0explore further, and perchance even try things out.<\/p>\n<ol>\n<li><a href=\"https:\/\/zenodo.org\/features\" target=\"_blank\">zenodo.org\/features<\/a>\u00a0 is one good place to start; it will cost nothing; there is\u00a0(within reason) no limitation to how much data can be archived. Zenodo also allows data to be retrieved from DropBox and Github (for code) for archival.<\/li>\n<li><a href=\"http:\/\/figshare.com\" target=\"_blank\">figshare.com<\/a>\u00a0allows you to sign up for free, but with limitations to the total data storage unless you upgrade to an institutional or paid account.<\/li>\n<li><a href=\"http:\/\/www.datadryad.org\/pages\/faq\" target=\"_blank\">www.datadryad.org\/pages\/faq<\/a>\u00a0 which charges $80-90 per deposition.<\/li>\n<li>Institutional data repositories. The notes above were written based on the experiences we have had for almost nine years now with a local data repository we call SPECTRa,<span id=\"cite_ITEM-14454-0\" name=\"citation\"><a href=\"#ITEM-14454-0\">[1]<\/a><\/span> where some 230,000 individual data packages are now archived. This one<span id=\"cite_ITEM-14454-10\" name=\"citation\"><a href=\"#ITEM-14454-10\">[11]<\/a><\/span> dates from 2007 to illustrate its longevity. 
Unfortunately, only members of\u00a0Imperial College can <a href=\"https:\/\/portal.hpc.imperial.ac.uk\/\" target=\"_blank\">make use of it<\/a>.<\/li>\n<\/ol>\n<hr \/>\n<p>Now that I have written this all down, I realise that it is somewhat longer than I was expecting, and that this very length may well put some researchers off.\u00a0Apart from RDM now being mandatory in the UK, it is also reasonable for researchers to ask &#8220;what is in it for me?&#8221; as a reward for persisting.\u00a0I can only answer that one from my personal experiences:<\/p>\n<ul>\n<li>The live data store (or uportal as we call it) has proved invaluable for recording our (computational) experiments. I often use it to track down calculations from years ago. As a laboratory notebook, it is minimalist, as is the learning curve, and hence does not overwhelm. If more information is needed, one simply goes to the DOI recorded there for each experiment if archived, or the original inputs and outputs if not.<\/li>\n<li>Assigning a DOI to a data package makes it really easy to share this with both collaborators and other researchers who express interest (the data is often too large to send by email).<\/li>\n<li>Sometimes I use e.g.\u00a0<a href=\"http:\/\/search.labs.datacite.org\/help\/examples\" target=\"_blank\">search.labs.datacite.org\/help\/examples<\/a>\u00a0to search the metadata created during the process in order to find (<strong>F<\/strong>) and access (<strong>A<\/strong>) old data, which is then very quickly amenable to re-use (<strong>R<\/strong>). OK, SciFinder or Reaxys it is not (yet!), but it is getting there.<\/li>\n<li>One can get <a href=\"http:\/\/stats.datacite.org\/?fq=datacentre_facet%3A%22BL.IMPERIAL+-+Imperial+College+London%22&amp;fq=allocator_facet%3A%22BL+-+The+British+Library%22&amp;q=#tab-resolution-report\">access statistics<\/a> for the data. If you click on the link, you can see some datasets have been accessed more than 200 times. 
Someone must be finding them valuable! If you want to find out how much (UK) data is searchable in this manner, <a href=\"http:\/\/stats.datacite.org\/?fq=allocator_facet%3A%22BL+-+The+British+Library%22&amp;#tab-datacentres\" target=\"_blank\">click here<\/a>. Perhaps such statistics may even help get you promoted one day!<\/li>\n<li>Having data available in this way enables one to construct more interesting tables or figures. This &#8220;<em>figable<\/em>&#8221; (yes, it&#8217;s both a table and a figure)\u00a0comes from a recent publication of ours.<span id=\"cite_ITEM-14454-11\" name=\"citation\"><a href=\"#ITEM-14454-11\">[12]<\/a><\/span> It retrieves the data purely by its DOI and inserts it into display software (JSmol) to construct an instant molecular model.\u00a0One can also use this approach for lecture notes and labs,<span id=\"cite_ITEM-14454-12\" name=\"citation\"><a href=\"#ITEM-14454-12\">[13]<\/a><\/span> for blogs as here,\u00a0and\u00a0(if you are very brave) for research presentations.<\/li>\n<li>Google Scholar detects data and citations to it equally with journal articles.\u00a0<a href=\"https:\/\/scholar.google.co.uk\/citations?user=ljZtPwkAAAAJ&amp;hl=en&amp;cstart=295&amp;pagesize=20\" target=\"_blank\">This<\/a> is part of\u00a0my profile there, and there you can see both articles AND data. If you are keen-eyed, you will however note that the data does not contribute to my\u00a0h-index (but arguably, it is more valuable to have some data sets\u00a0accessed 200+ times rather than to be cited!).<\/li>\n<\/ul>\n<hr \/>\n<p><sup>\u2021<\/sup>Some\u00a0selected use-case examples can be viewed,<span id=\"cite_ITEM-14454-13\" name=\"citation\"><a href=\"#ITEM-14454-13\">[14]<\/a><\/span> along with one specific to computational chemistry<span id=\"cite_ITEM-14454-14\" name=\"citation\"><a href=\"#ITEM-14454-14\">[15]<\/a><\/span>.<\/p>\n<h2>References<\/h2>\n    <ol class=\"kcite-bibliography csl-bib-body\"><li id=\"ITEM-14454-0\">J. Downing, P. 
Murray-Rust, A.P. Tonge, P. Morgan, H.S. Rzepa, F. Cotterill, N. Day, and M.J. Harvey, \"SPECTRa: The Deposition and Validation of Primary Chemistry Research Data in Digital Repositories\", <i>Journal of Chemical Information and Modeling<\/i>, vol. 48, pp. 1571-1581, 2008. <a href=\"https:\/\/doi.org\/10.1021\/ci7004737\">https:\/\/doi.org\/10.1021\/ci7004737<\/a>\n\n<\/li>\n<li id=\"ITEM-14454-1\">M.J. Harvey, N.J. Mason, and H.S. Rzepa, \"Digital Data Repositories in Chemistry and Their Integration with Journals and Electronic Notebooks\", <i>Journal of Chemical Information and Modeling<\/i>, vol. 54, pp. 2627-2635, 2014. <a href=\"https:\/\/doi.org\/10.1021\/ci500302p\">https:\/\/doi.org\/10.1021\/ci500302p<\/a>\n\n<\/li>\n<li id=\"ITEM-14454-2\">H.S. Rzepa, \"Reproducibility In Science: Calculated Kinetic Isotope Effects For Cyclopropyl Carbonyl Radical.\", 2015. <a href=\"https:\/\/doi.org\/10.5281\/zenodo.19949\">https:\/\/doi.org\/10.5281\/zenodo.19949<\/a>\n\n<\/li>\n<li id=\"ITEM-14454-3\">Jana, Anukul., Huch, Volker., Rzepa, Henry S.., and Scheschkewitz, David., \"CCDC 977840: Experimental Crystal Structure Determination\", 2014. <a href=\"https:\/\/doi.org\/10.5517\/cc11tj7m\">https:\/\/doi.org\/10.5517\/cc11tj7m<\/a>\n\n<\/li>\n<li id=\"ITEM-14454-4\">H.S. Rzepa, F.L. Cherblanc, W.A. Herrebout, P. Bultinck, M.J. Fuchter, and Ya-Pei Lo., \"Mechanistic and chiroptical studies on the desulfurization of epidithiodioxopiperazines reveal universal retention of configuration at the bridgehead carbon atoms.\", 2013. <a href=\"https:\/\/doi.org\/10.6084\/m9.figshare.777773\">https:\/\/doi.org\/10.6084\/m9.figshare.777773<\/a>\n\n<\/li>\n<li id=\"ITEM-14454-5\">S. G\u00fclten, \"Bis dihydropyrimidine\", <i>ChemSpider Synthetic Pages<\/i>, 2011. <a href=\"https:\/\/doi.org\/10.1039\/sp501\">https:\/\/doi.org\/10.1039\/sp501<\/a>\n\n<\/li>\n<li id=\"ITEM-14454-6\">Raghunathan Ramakrishnan., P. Dral, P.O. Dral, M. Rupp, and O. 
Anatole Von Lilienfeld., \"Quantum chemistry structures and properties of 134 kilo molecules\", 2014. <a href=\"https:\/\/doi.org\/10.6084\/m9.figshare.978904\">https:\/\/doi.org\/10.6084\/m9.figshare.978904<\/a>\n\n<\/li>\n<li id=\"ITEM-14454-7\">H.S. Rzepa, \"C 8 H 8 B 2\", 2015. <a href=\"https:\/\/doi.org\/10.14469\/ch\/191378\">https:\/\/doi.org\/10.14469\/ch\/191378<\/a>\n\n<\/li>\n<li id=\"ITEM-14454-8\">Y. Zhang, H.S. Rzepa, J.J.P. Stewart, P. Murray-Rust, M.J. Harvey, N. Mason, A. McLean, and Imperial College High Performance Computing Service., \"Revised Cambridge NCI database\", 2014. <a href=\"https:\/\/doi.org\/10.14469\/ch\/2\">https:\/\/doi.org\/10.14469\/ch\/2<\/a>\n\n<\/li>\n<li id=\"ITEM-14454-9\">SimonClifford., and M J Harvey., \"hpc-portal: Public release\", 2015. <a href=\"https:\/\/doi.org\/10.5281\/zenodo.19174\">https:\/\/doi.org\/10.5281\/zenodo.19174<\/a>\n\n<\/li>\n<li id=\"ITEM-14454-10\">H.S. Rzepa, \"C 7 H 10 Br 1 1\", 2007. <a href=\"https:\/\/doi.org\/10.14469\/ch\/46\">https:\/\/doi.org\/10.14469\/ch\/46<\/a>\n\n<\/li>\n<li id=\"ITEM-14454-11\">H.S. Rzepa, A.V. Shernyukov, G.E. Salnikov, V.G. Shubin, and A.M. Genaev, \"Noncatalytic Bromination of Benzene: A Combined Computational and Experimental Study\", 2015. <a href=\"https:\/\/doi.org\/10.6084\/m9.figshare.1299202\">https:\/\/doi.org\/10.6084\/m9.figshare.1299202<\/a>\n\n<\/li>\n<li id=\"ITEM-14454-12\">K.K.(. Hii, H.S. Rzepa, and E.H. Smith, \"Asymmetric Epoxidation: A Twinned Laboratory and Molecular Modeling Experiment for Upper-Level Organic Chemistry Students\", <i>Journal of Chemical Education<\/i>, vol. 92, pp. 1385-1389, 2015. <a href=\"https:\/\/doi.org\/10.1021\/ed500398e\">https:\/\/doi.org\/10.1021\/ed500398e<\/a>\n\n<\/li>\n<li id=\"ITEM-14454-13\">M. Addis, \"RDM workflows and integrations for HEIs using hosted services\", <i>figshare<\/i>, 2015. 
<a href=\"https:\/\/doi.org\/10.6084\/m9.figshare.1476832\">https:\/\/doi.org\/10.6084\/m9.figshare.1476832<\/a>\n\n<\/li>\n<li id=\"ITEM-14454-14\">M. Addis, and H.S. Rzepa, \"Use of DOIs in data publishing in Computational Chemistry at Imperial College London\", 2015. <a href=\"https:\/\/doi.org\/10.6084\/m9.figshare.1477994\">https:\/\/doi.org\/10.6084\/m9.figshare.1477994<\/a>\n\n<\/li>\n<\/ol>\n\n<\/div> <!-- kcite-section 14454 -->","protected":false},"excerpt":{"rendered":"<p>Management of research (data) outputs is a hot topic in the UK at the moment, although the topic has been rumbling for five years or more. Most research-active higher educational establishments have or are about to publish general guidelines, which predominantly take the form of aspirational targets rather than actionable examples\u00a0or\u00a0use-cases.\u2021 Because the concepts remain [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":5,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[2],"tags":[],"ppma_author":[2661],"class_list":["post-14454","post","type-post","status-publish","format-standard","hentry","category-chemical-it"],"yoast_head":"<!-- This site is optimized with the Yoast SEO 
plugin v27.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>A (light) introductory tutorial on Research Data Management (in chemistry). - Henry Rzepa&#039;s Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=14454\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"A (light) introductory tutorial on Research Data Management (in chemistry). - Henry Rzepa&#039;s Blog\" \/>\n<meta property=\"og:description\" content=\"Management of research (data) outputs is a hot topic in the UK at the moment, although the topic has been rumbling for five years or more. Most research-active higher educational establishments have or are about to publish general guidelines, which predominantly take the form of aspirational targets rather than actionable examples\u00a0or\u00a0use-cases.\u2021 Because the concepts remain [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=14454\" \/>\n<meta property=\"og:site_name\" content=\"Henry Rzepa&#039;s Blog\" \/>\n<meta property=\"article:published_time\" content=\"2015-08-20T16:34:45+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2015-08-23T10:37:14+00:00\" \/>\n<meta name=\"author\" content=\"Henry Rzepa\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Henry Rzepa\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"12 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"A (light) introductory tutorial on Research Data Management (in chemistry). 
- Henry Rzepa&#039;s Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=14454","og_locale":"en_GB","og_type":"article","og_title":"A (light) introductory tutorial on Research Data Management (in chemistry). - Henry Rzepa&#039;s Blog","og_description":"Management of research (data) outputs is a hot topic in the UK at the moment, although the topic has been rumbling for five years or more. Most research-active higher educational establishments have or are about to publish general guidelines, which predominantly take the form of aspirational targets rather than actionable examples\u00a0or\u00a0use-cases.\u2021 Because the concepts remain [&hellip;]","og_url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=14454","og_site_name":"Henry Rzepa&#039;s Blog","article_published_time":"2015-08-20T16:34:45+00:00","article_modified_time":"2015-08-23T10:37:14+00:00","author":"Henry Rzepa","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Henry Rzepa","Estimated reading time":"12 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=14454#article","isPartOf":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=14454"},"author":{"name":"Henry Rzepa","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281"},"headline":"A (light) introductory tutorial on Research Data Management (in chemistry).","datePublished":"2015-08-20T16:34:45+00:00","dateModified":"2015-08-23T10:37:14+00:00","mainEntityOfPage":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=14454"},"wordCount":2387,"commentCount":4,"articleSection":["Chemical 
IT"],"inLanguage":"en-GB","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=14454#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=14454","url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=14454","name":"A (light) introductory tutorial on Research Data Management (in chemistry). - Henry Rzepa&#039;s Blog","isPartOf":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#website"},"datePublished":"2015-08-20T16:34:45+00:00","dateModified":"2015-08-23T10:37:14+00:00","author":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281"},"breadcrumb":{"@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=14454#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=14454"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=14454#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog"},{"@type":"ListItem","position":2,"name":"A (light) introductory tutorial on Research Data Management (in chemistry)."}]},{"@type":"WebSite","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#website","url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/","name":"Henry Rzepa&#039;s Blog","description":"Chemistry with a twist","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":"Person","@id":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/#\/schema\/person\/2b40f7b9c872a4dc1547e040a11b6281","name":"Henry 
Rzepa","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g370be3a7397865e4fd161aefeb0a5a85","url":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","caption":"Henry Rzepa"},"description":"Henry Rzepa is Emeritus Professor of Computational Chemistry at Imperial College London.","sameAs":["https:\/\/orcid.org\/0000-0002-8635-8390"],"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?author=1"}]}},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pDef7-3L8","jetpack-related-posts":[{"id":15907,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=15907","url_meta":{"origin":14454,"position":0},"title":"Global initiatives in research data management and discovery: searching metadata.","author":"Henry Rzepa","date":"March 7, 2016","format":false,"excerpt":"The upcoming ACS national meeting in San Diego has a CINF\u00a0(chemical information division) session entitled \"Global initiatives in research data management and discovery\". I have highlighted here just one slide from my contribution to this session, which addresses the discovery aspect of the session. Data, if you think about it,\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":16391,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=16391","url_meta":{"origin":14454,"position":1},"title":"Data-free research data management? 
Not an oxymoron.","author":"Henry Rzepa","date":"May 24, 2016","format":false,"excerpt":"I occasionally post about \"RDM\" (research data management), an activity that has recently become a formalised\u00a0essential part of the research processes. I say recently formalised, since researchers have of course kept\u00a0research notebooks recording their activities and their data since the dawn of science, but not\u00a0always in an open and transparent\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":24561,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=24561","url_meta":{"origin":14454,"position":2},"title":"Data base or Data repository? &#8211; A brief and very selective history of data management in chemistry.","author":"Henry Rzepa","date":"January 26, 2022","format":false,"excerpt":"Way back in the late 1980s or so, research groups in chemistry started to replace the filing of their paper-based research data by storing it in an easily retrievable digital form. This required a computer database and initially these were accessible only on specific dedicated computers in the laboratory. These\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.ch.ic.ac.uk\/rzepa\/blog\/wp-content\/uploads\/2022\/01\/Screenshot-1015-1024x521.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":20342,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=20342","url_meta":{"origin":14454,"position":3},"title":"Open Access journal publishing debates &#8211; the elephant in the room?","author":"Henry Rzepa","date":"November 4, 2018","format":false,"excerpt":"For perhaps ten years now, the future of scientific publishing has been hotly debated. 
The traditional models are often thought to be badly broken, although convergence to a consensus of what a better model should be is not apparently close. But to my mind, much of this debate seems to\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":16164,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=16164","url_meta":{"origin":14454,"position":4},"title":"Publishing embargoes.","author":"Henry Rzepa","date":"April 13, 2016","format":false,"excerpt":"Publishing embargoes seem a relatively new phenomenon, probably starting in areas of science when the data produced for a scientific article was considered more valuable than the narrative of that article. However, the concept of the embargo seems to be spreading to cover other aspects of publishing, and I came\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":20394,"url":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?p=20394","url_meta":{"origin":14454,"position":5},"title":"Re-inventing the anatomy of a research article.","author":"Henry Rzepa","date":"December 29, 2018","format":false,"excerpt":"The traditional structure of the research article has been honed and perfected for over 350 years by its custodians, the publishers of scientific journals. Nowadays, for some journals at least, it might be viewed as much as a profit centre as the perfected mechanism for scientific communication. 
Here I take\u2026","rel":"","context":"In &quot;Chemical IT&quot;","block_context":{"text":"Chemical IT","link":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/?cat=2"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_likes_enabled":false,"authors":[{"term_id":2661,"user_id":1,"is_guest":0,"slug":"admin","display_name":"Henry Rzepa","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/897b6740f7f599bca7942cdf7d7914af5988937ae0e3869ab09aebb87f26a731?s=96&d=blank&r=g","author_category":"1","first_name":"Henry","last_name":"Rzepa","user_url":"https:\/\/orcid.org\/0000-0002-8635-8390","job_title":"","description":"Henry Rzepa is Emeritus Professor of Computational Chemistry at Imperial College London."}],"_links":{"self":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/14454","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=14454"}],"version-history":[{"count":30,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/14454\/revisions"}],"predecessor-version":[{"id":14484,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=\/wp\/v2\/posts\/14454\/revisions\/14484"}],"wp:attachment":[{"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=14454"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=14454"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.ph
p?rest_route=%2Fwp%2Fv2%2Ftags&post=14454"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.ch.ic.ac.uk\/rzepa\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fppma_author&post=14454"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}