Research data management in Computational and Experimental Molecular Sciences.

Demonstrations and deliverables resulting from the Project:

# Summary Link or DOI
1 Method 1. A procedure for directly (silently) retrieving data from a digital repository and displaying it using Javascript components. This Method 1 involves adding a custom DSpace repository record known as the 10320/loc. The procedure is deployed in a table, which is itself part of a peer-reviewed published scientific narrative, DOI: 10.1039/C3SC53416B. The procedure itself is described at DOI: 10.1021/ci500302p and involves modifications to the DSpace HANDLE-manager on SPECTRa-DSpace (not the repository itself). DOI: 10.6084/m9.figshare.840483 (original working published example, shortDOI: qcc)
DOI: 10042/a3v24 (new stand-alone demo)
DOI: 10042/a3v25 (updated working example)
2 Method 2. Method 1 is not currently supported by e.g. the DataCite organisation (see DOI: 10.1007/s10822-014-9776-5), and part of the current project was to re-engineer the protocols in favour of the more general and robust Method 2. This version uses the DataCite metadata associated with a given DOI to locate METS or ORE metadata held on the original repository. The retrieved metadata is then processed to return the desired URL for the dataset, as specified by a selector. Successful resolution requires that the DOI be issued by the DataCite DOI registration agency, with DataCite metadata that references METS or an ORE resource map using the 'relatedIdentifier', 'relationType = HasMetadata' and 'relatedMetadataScheme = METS' or 'relatedMetadataScheme = ORE' properties.
  1. DOI: 10042/a3v26 (demo)
  2. DOI: 10042/a3v22 (working example)
  3. DOI: 10042/a3v2a (an example produced for chemistry lecture notes)
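The metadata look-up step of Method 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's own implementation: the sample record, its DOI and URLs, and the function name `find_metadata_links` are hypothetical, and the namespace assumes the DataCite kernel-3 schema. Only the property names (`relatedIdentifier`, `relationType`, `relatedMetadataScheme`) come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical DataCite record: the DOI and URLs are placeholders, not real entries.
SAMPLE_RECORD = """<resource xmlns="http://datacite.org/schema/kernel-3">
  <identifier identifierType="DOI">10.5555/example</identifier>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="URL" relationType="HasMetadata"
        relatedMetadataScheme="ORE">https://repository.example.org/ore/123</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="HasMetadata"
        relatedMetadataScheme="METS">https://repository.example.org/mets/123</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI"
        relationType="IsSupplementTo">10.5555/article</relatedIdentifier>
  </relatedIdentifiers>
</resource>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_metadata_links(datacite_xml, schemes=("METS", "ORE")):
    """Return (scheme, URL) pairs for related identifiers that point at
    METS or ORE metadata via relationType = HasMetadata."""
    root = ET.fromstring(datacite_xml)
    links = []
    for element in root.iter():
        if local_name(element.tag) != "relatedIdentifier":
            continue
        if element.get("relationType") != "HasMetadata":
            continue
        scheme = element.get("relatedMetadataScheme")
        if scheme in schemes and element.text:
            links.append((scheme, element.text.strip()))
    return links


for scheme, url in find_metadata_links(SAMPLE_RECORD):
    print(scheme, url)
```

The returned URL would then be fetched and the METS or ORE document processed with the selector to obtain the dataset URL; that second step depends on the repository's metadata layout and is not shown here.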
3 Method 3. A third method has been developed based on the DataCite Media API. This involves registering new MIME types with DataCite, which we have done, and then exploiting these MIME types to negotiate content. Instead of redirecting to the usual landing page, data DOIs can then resolve to these alternative URLs through HTTP content negotiation. Typically this is invoked as e.g. http://data.datacite.org/chemical/x-cml/10.14469/ch/25099. This method can be seen in operation at DOI-2-Data.
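The URL pattern used by Method 3 can be assembled programmatically. The sketch below only builds the address in the form quoted above (base URL, then MIME type, then DOI); the function name `media_url` is hypothetical, and whether the service answers at that address for a given DOI and MIME type is an assumption that is not checked here.

```python
from urllib.parse import quote

# Base address taken from the example in the text.
DATACITE_DATA_BASE = "http://data.datacite.org"


def media_url(doi, mime_type, base=DATACITE_DATA_BASE):
    """Build a content-negotiation style URL of the form base/type/subtype/DOI."""
    if mime_type.count("/") != 1:
        raise ValueError(f"expected a MIME type like 'chemical/x-cml', got {mime_type!r}")
    # Keep the slashes inside the DOI; percent-encode anything unsafe.
    return f"{base}/{mime_type}/{quote(doi, safe='/')}"


print(media_url("10.14469/ch/25099", "chemical/x-cml"))
```

A viewer that accepts a URL (for example a JSmol page) could be handed the resulting address directly, which is the behaviour the text describes.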
4 Discovery 1. The use of DOIs issued by agencies affiliated with DataCite enables rich metadata searches. In this example, we have configured our DSpace repository to forward metadata related to the ORCID identifier.
  1. http://search.datacite.org/ui?q=ORCID:0000-0002-8635-8390+publicationYear:[2014+TO+2014]
    DOI: 10042/a3v1x
  2. http://search.datacite.org/ui?q=has_media:true&fq=prefix:10.14469
    (the media type in this instance is chemical and the prefix is Imperial College)
  3. http://search.datacite.org/ui?q=ORCID:*+prefix:10.14469
    (all entries at Imperial College which have an associated ORCID).
5 Discovery 2. Other metadata searches enable chemical discoverability using a unique molecular identifier, the InChI.
  1. http://search.datacite.org/ui?q=InChIKey=LQPOSWKBQVCBKS-PGMHMLKASA-N
    DOI: 10042/a3v1y
  2. http://search.datacite.org/ui?q=alternateIdentifier:InChIKey\:*
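The InChIKey search in Discovery 2 can be built from a key as sketched below. The search address is the one quoted above; the helper names `is_inchikey` and `inchikey_search_url` are hypothetical. The pattern checks only the standard 14-10-1 block layout of an InChIKey, not whether the key corresponds to a real molecule. Note that `urlencode` percent-encodes the `=` inside the query value, so the result differs textually from the hand-written link above while decoding to the same query.

```python
import re
from urllib.parse import urlencode

# Search address taken from the examples in the text.
SEARCH_UI = "http://search.datacite.org/ui"

# Three hyphen-separated blocks of 14, 10 and 1 upper-case letters.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def is_inchikey(value):
    """Return True if value has the block layout of an InChIKey."""
    return INCHIKEY_PATTERN.fullmatch(value) is not None


def inchikey_search_url(inchikey):
    """Build a search URL of the form used in the InChIKey example above."""
    if not is_inchikey(inchikey):
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return f"{SEARCH_UI}?{urlencode({'q': 'InChIKey=' + inchikey})}"


print(inchikey_search_url("LQPOSWKBQVCBKS-PGMHMLKASA-N"))
```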
6 Use-Metrics. Propagating persistent identifiers (DOIs) and metadata to e.g. DataCite allows other interesting applications. This one shows the total resolutions, per month, for datasets held at Imperial. This in turn provides a measure of impact of the datasets. http://stats.datacite.org/?fq=datacentre_facet%3A%22BL.IMPERIAL+-+Imperial+College+London%22&fq=allocator_facet%3A%22BL+-+The+British+Library%22&q=#tab-resolution-report
DOI: 10042/a3v1z
7 Content Creation. A simple 40-line HTML template for invoking data from a repository using Method 2 and displaying it. This is customised to view molecular information using the JSmol Javascript procedures. We recommend Javascript, which allows a wide variety of devices to display (including tablets). The template can be modified to allow other visualisers to be used (the program must be capable of resolving a URL provided to it by our methods). DOI: 10042/a3v26 (demo)
8 Data Curation. A 9-year-old data set deposited in the Cambridge University DSpace was created in 2005 as part of a project published as DOI: 10.1007/s00894-005-0278-1. The current project investigated a model for its curation by retrieving this set (comprising some 174,000 individual entries), updating the data container to a more modern and standard XML schema, annotating the data with new results computed with a newer algorithm (PM7) on the College HPC resource, and (re)depositing the aggregated results into the Imperial SPECTRa-DSpace repository. Unlike the original dataset, the new version will be a fully-compliant SWORD-endpoint with discovery-metadata and DataCite persistent identifiers. DOI: 10042/31117 (invoke e.g. this link to view the individual depositions by date). An article describing this process is about to be submitted.
9 Dissemination 1. Invited presentations to the ODIN (ORCID and DataCite Interoperability Network) session at the RDA (Research Data Alliance) meeting in Amsterdam, September 21-24. Our work was identified as providing an effective and currently unique use-case example for ODIN, and further opportunities to discuss our work were identified.
  1. ODIN Talk DOI: 10042/a3v21
  2. FORCE2015 Talk: 10042/a3v2f
  3. Winterschool 2015 Talk: 10042/a3v2d
10 Dissemination 2. A stated objective was to convert the current RDM system, comprising a submission portal interfacing to a DSpace repository, into a self-contained software installation package. This would enable easy installation of the complete system at other sites (a collaborator willing to test this procedure has been identified). The package would also include a standalone command line DSpace deposition tool for individual items. GitHub repositories:
  1. Uportal package: https://github.com/ICHPC/hpc-portal (doi: 10.5281/zenodo.19174) and installation instructions with test installation.
  2. Command line deposition tool for Figshare: https://github.com/ICHPC/figshare-importer
  3. Command line deposition tool for DSpace: https://github.com/ICHPC/dspace-importer
11 Dissemination 3. We have developed an essential metadata protocol for molecular species as part of a digital repository. We have contacted two other molecular-based repositories with details of this protocol, and both have now implemented these enhancements on their own systems.
  1. Southampton eCrystals (Department of Chemistry, University of Southampton): DOI: 10.5258/ECRYSTALS/145 (shortDOI: v24) and associated metadata: data.datacite.org/10.5258/ECRYSTALS/145
  2. Chemotion (Karlsruhe Institute of Technology, Institute of Organic Chemistry and Institute of Toxicology and Genetics): 10.14272/XFNLWZCTEDTRGB-KJWHEZOQSA-N.1 (shortDOI: v25) and associated metadata: data.datacite.org/10.14272/XFNLWZCTEDTRGB-KJWHEZOQSA-N.1
12 Dissemination 4. The FORCE11 meeting series is one outcome of the original event at which the Amsterdam Manifesto on research data was formulated. The 2015 event is in Oxford, and brings together much of the recent activity in this area. An online demonstration which combines all the features of Methods 1-3 as outlined above has been accepted, together with an oral presentation. https://www.force11.org/meetings/force2015/forms/call-for-demos-posters
DOI-2-Data, metadata-based retrieval of scientific data for re-use and visual presentation.
13 Dissemination 5. The following query was posted to a highly active discussion list:
On 6 Dec 2014, at 00:32, XXXXXX wrote:

Is there a site where I can post a link to a structure - I'm thinking
of a GAMESS (US) log file - which would return the jmol object of the
structure in the file?

That is, say I want to share a file with a broader audience, rather
than building and hosting a web page, is there a web page out there
where I can ask others to post a figshare url which would show the
structure in jmol?
This was our response to the list:

We have a demo (and a journal article) coming up at the conference FORCE2015 in January where we present a solution to this problem based on three different standards-based solutions using metadata and a persistent time-stamped identifier (DOI) assigned to the file(set). See http://doi.org/10042/a3v2b for demos. The dataDOI can hence be cited in the manner conventional for referencing journal articles, and has the further advantage that the system allows searching in various ways, ie http://search.datacite.org/ui?q=InChIKey=LQPOSWKBQVCBKS-PGMHMLKASA-N or http://search.datacite.org/ui?q=alternateIdentifier:InChIKey\:* or http://search.datacite.org/ui?q=has_media:true&fq=prefix:10.14469 (the prefix identifies the registered site)

Unfortunately the Figshare repository specifically does not currently support the requisite standards. In our case they were all implemented in a Dspace repository supporting the required standards.

In general, we think that the "hard coded" non-persistent URL based method of retrieving files from remote locations (such as used by Jmol with eg PubChem, PDB etc) should be replaced in the future by DOI based methods which also associate and export rich metadata with the object.

14 Dissemination 6. The Royal Society of Chemistry has a current awareness publication, Chemistry World, which has recently published an opinion on research metrics. Included in that analysis is the role of data citation metrics. metrics-have-their-place-peer-review-remains-king-hefce-review
15 Open solutions. The Wikimedia Foundation has launched a project called Wikidata, with the objective of using this as the basis for an infrastructure to provide data for Wikipedia and other MediaWiki pages. A test bed for use of Wikidata for research can be found at https://www.wikidata.org/wiki/Wikidata:WikiProject_Wikidata_for_research
16 Commercial solutions. This recent collaboration between Arkivum and Figshare incorporates a generic solution to RDM. The technologies we illustrate in 1-10 above are not (fully) supported in this commercial product. It would be of interest, however, to track this product in this regard, and possibly to request these as features in future releases. http://arkivum.com/loughborough-university-partners-figshare-symplectic-arkivum/. Arkivum have produced three open documents relating to RDM, two of which reference the work done at Imperial during this project:
  1. RDM workflows and integrations for HEIs using hosted services, doi:547
  2. Use of DOIs in data publishing in Computational Chemistry at Imperial College London, doi:548
  3. Flowchart for Researchers to meet Imperial College London RDM policy, doi:549
17 Policy statements. A number of funding organisations in the USA and elsewhere have issued policy statements regarding open data and its management.
  1. Canadian Government: The Government of Canada will maximize access to federally funded scientific research to encourage greater collaboration and engagement with the scientific community, the private sector, and the public.
  2. USAID announces open data policy
  3. Department of Energy (DOE): The DOE Public Access Gateway for Energy and Science will make scholarly scientific publications resulting from DOE research funding publicly accessible and searchable at no charge to readers; and to instituting data management principles and requirements that ultimately will apply to proposals for research funding submitted to all DOE program offices.
  4. OSTP: expanding public access to results of federally funded research
