DOI-2-data:

Interoperability for Data Repositories. Metadata-based procedures for Retrieving Data for Display or Mining Utilising Persistent (data-DOI) Identifiers.

Matthew J. Harvey, Nicholas J. Mason, Andrew McLean and Henry S. Rzepa
Imperial College London, South Kensington Campus, London, SW7 2AZ.

Abstract: In the spirit of the Joint Declaration of Data Citation Principles, we present three flexible solutions for enabling machine access to repository-based scientific data, given knowledge of the persistent identifier (DOI) of the dataset. In the demonstrations below, data from published chemical datasets are retrieved by each of these methods for visualisation.

Graphical abstract showing the path from a data-DOI citation in a scholarly article to data visualisation.

This demonstration uses JSmol as an archetypal web application for visualising and processing chemical data.

Using features of the Handle System

The Handle System, maintained by CNRI, is the technology underlying the DOI system, a persistent identifier scheme widely used in the scholarly publishing industry. Typically, the handle record of a DOI contains a URL value, to which the client is redirected when the DOI is resolved. By convention, this URL points to a human-readable landing page.

1 10320/loc

10320/loc is a handle value type that was introduced to improve the selection of specific resource URLs and to add features to the handle-to-URL resolution process. A value of this type consists of an XML-encoded list of locations, which can be filtered by attribute by appending a locatt parameter to the DOI string upon resolution.
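
As a rough illustration of this paragraph, the Python sketch below builds a resolver URL carrying a locatt parameter and lists the locations held in a small 10320/loc value. The DOI, hrefs and file names are invented placeholders. The key:value form of the locatt parameter, and the mimetype and filename attribute names, are assumptions for illustration: apart from id and href, the attributes are whatever the repository chooses to record. Python is used only for brevity; the demonstration itself relies on JavaScript.

```python
from urllib.parse import quote
import xml.etree.ElementTree as ET

# Hypothetical 10320/loc value. Attribute names other than id/href are
# repository-defined and are assumed here purely for illustration.
SAMPLE_LOC = """<locations>
  <location id="0" href="https://repo.example.org/files/1/mol.cml"
            mimetype="chemical/x-cml" filename="mol.cml" />
  <location id="1" href="https://repo.example.org/files/1/mol.log"
            mimetype="text/plain" filename="mol.log" />
</locations>"""


def locatt_url(doi, key, value, resolver="https://doi.org"):
    """Build a resolver URL requesting the location whose attribute `key` equals `value`."""
    # Assumed syntax: ?locatt=key:value
    return f"{resolver}/{doi}?locatt={quote(key)}:{quote(value, safe='/')}"


def list_locations(loc_xml):
    """Return the attributes of every <location> element as a list of dicts."""
    return [dict(loc.attrib) for loc in ET.fromstring(loc_xml).iter("location")]


print(locatt_url("10.5555/example", "mimetype", "chemical/x-cml"))
for attrs in list_locations(SAMPLE_LOC):
    print(attrs["id"], attrs["mimetype"], attrs["href"])
```

With this approach the filtering is done by the resolver, so the client needs only one HTTP request and no XML handling of its own.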



Alternatively, since the 10320/loc locations are machine-readable, the handle record can be retrieved using the Handle REST API and processed using JavaScript to return the URL of a file of interest.
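
The client-side variant might look something like the following Python sketch (the demonstration itself does this in JavaScript). It assumes the record has already been fetched as JSON from the Handle REST API, for instance from a URL of the form https://doi.org/api/handles/<doi>, and works on an inline sample so that it runs without network access. The sample record, its index numbers, the DOI and the attribute names are all hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Assumed shape of a Handle REST API response; every value is a placeholder.
LOC_XML = (
    '<locations>'
    '<location id="0" href="https://repo.example.org/files/1/mol.cml"'
    ' mimetype="chemical/x-cml" filename="mol.cml"/>'
    '<location id="1" href="https://repo.example.org/files/1/mol.log"'
    ' mimetype="text/plain" filename="mol.log"/>'
    '</locations>'
)
SAMPLE_RECORD = json.dumps({
    "responseCode": 1,
    "handle": "10.5555/example",
    "values": [
        {"index": 1, "type": "URL",
         "data": {"format": "string", "value": "https://repo.example.org/items/1"}},
        {"index": 1000, "type": "10320/loc",
         "data": {"format": "string", "value": LOC_XML}},
    ],
})


def select_href(record_json, key, value):
    """Return the href of the first 10320/loc location whose attribute matches, else None."""
    record = json.loads(record_json)
    for entry in record.get("values", []):
        if entry.get("type") != "10320/loc":
            continue  # skip the landing-page URL and any other value types
        for loc in ET.fromstring(entry["data"]["value"]).iter("location"):
            if loc.get(key) == value:
                return loc.get("href")
    return None


print(select_href(SAMPLE_RECORD, "mimetype", "chemical/x-cml"))
print(select_href(SAMPLE_RECORD, "filename", "mol.log"))
```

Because the whole list of locations is available to the client, the selection rule can be as elaborate as the application requires, rather than a single attribute match.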

(default selection set to chemical/x-cml)

This method has been used to construct interactive tables as part of a peer-reviewed article in a high-impact journal:

... and is further described here:

Using the DataCite infrastructure

2 The DataCite Media API

The DataCite Metadata Store (MDS) API includes a media resource, where MIME types can be associated with additional URLs as key:value pairs. Instead of redirecting to the usual landing page, DOIs can then resolve to these alternative URLs through HTTP content negotiation. Files can also be retrieved from their DataCite URL, as demonstrated below.
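
Content negotiation of this kind amounts to sending the desired MIME type in the HTTP Accept header when resolving the DOI. The Python sketch below only prepares such a request and does not send it; the DOI is a placeholder, and whether a given DOI honours a given MIME type depends on what the repository has registered as media.

```python
from urllib.request import Request


def negotiate(doi, mime_type, resolver="https://doi.org"):
    """Prepare (but do not send) a GET request asking the resolver for `mime_type`."""
    return Request(f"{resolver}/{doi}", headers={"Accept": mime_type}, method="GET")


# Hypothetical DOI; chemical/x-cml is the MIME type for Chemical Markup Language.
req = negotiate("10.5555/example", "chemical/x-cml")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would send it and follow any redirects, so the response body would be the file itself rather than the landing page whenever a matching media entry exists.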

The Media API has its limitations when a fileset has more than one file of the same MIME type. In this respect, it is less flexible than the 10320/loc-based solution previously described, where files may be selected based on MIME type, filename or any other specified attribute, and where content negotiation is also possible.

3 OAI-ORE Resource Maps exposed through DataCite metadata

The limitations of DataCite Media can be overcome by exposing structural metadata of the published fileset in a machine discoverable and readable way.

ORE (Object Reuse and Exchange) is a standard, maintained by the Open Archives Initiative (OAI), for describing aggregations of web resources through documents referred to as Resource Maps, serialised in Atom, RDF/XML or RDFa. We have made the OAI-ORE Resource Map and METS metadata (as generated by DSpace) discoverable by including their locations as relatedIdentifiers within the DataCite metadata for the dataset. For this we use the HasMetadata relation type, introduced in version 3.0 of the DataCite Schema.
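
To make the discovery step concrete, the Python sketch below pulls the HasMetadata entries out of a DataCite XML record. The sample record is invented and heavily abridged, and the use of the relatedMetadataScheme attribute to distinguish ORE from METS is an assumption about how a repository might label the two documents.

```python
import xml.etree.ElementTree as ET

# Abridged, hypothetical DataCite record; only the parts used below are shown.
SAMPLE_DATACITE = """<resource xmlns="http://datacite.org/schema/kernel-3">
  <identifier identifierType="DOI">10.5555/example</identifier>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="URL" relationType="HasMetadata"
        relatedMetadataScheme="ORE">https://repo.example.org/metadata/1/ore.xml</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="HasMetadata"
        relatedMetadataScheme="METS">https://repo.example.org/metadata/1/mets.xml</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI"
        relationType="IsSupplementTo">10.5555/article</relatedIdentifier>
  </relatedIdentifiers>
</resource>"""


def metadata_locations(datacite_xml):
    """Map metadata scheme -> location for every HasMetadata related identifier."""
    found = {}
    for elem in ET.fromstring(datacite_xml).iter():
        # Compare local names so the sketch is not tied to one schema namespace.
        if elem.tag.rsplit("}", 1)[-1] != "relatedIdentifier":
            continue
        if elem.get("relationType") == "HasMetadata":
            scheme = elem.get("relatedMetadataScheme", "unknown")
            found[scheme] = (elem.text or "").strip()
    return found


print(metadata_locations(SAMPLE_DATACITE))
```

Entries with other relation types, such as the IsSupplementTo link to an article in the sample, are ignored, so the result contains only the locations of the structural metadata documents.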

The DataCite metadata for a given assigned DOI can be retrieved through content negotiation. From this, ORE or METS metadata, if found, can then be retrieved and processed to return the URL of a specified file of interest. The flexibility of this approach makes it complementary to DataCite Media. However, it is less efficient than the previous methods, requiring several HTTP requests and XML processing, which is done here with JavaScript.
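
The final processing step, choosing one file from a Resource Map, could be sketched in Python as follows (the demonstration does the equivalent in JavaScript). The sketch assumes an Atom serialisation in which each aggregated file appears as an atom:link whose rel is the ORE aggregates term, with the MIME type in the type attribute and the filename in the title attribute; that layout, the sample document and all URLs are assumptions for illustration and may differ between repositories.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"

# Abridged, hypothetical Resource Map in Atom form.
SAMPLE_ORE = """<atom:entry xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:id>https://repo.example.org/metadata/1/ore.xml</atom:id>
  <atom:link rel="alternate" href="https://repo.example.org/items/1"/>
  <atom:link rel="http://www.openarchives.org/ore/terms/aggregates"
      href="https://repo.example.org/files/1/mol.cml"
      title="mol.cml" type="chemical/x-cml"/>
  <atom:link rel="http://www.openarchives.org/ore/terms/aggregates"
      href="https://repo.example.org/files/1/mol.log"
      title="mol.log" type="text/plain"/>
</atom:entry>"""


def find_file(ore_atom_xml, mime_type=None, filename=None):
    """Return the URL of the first aggregated resource matching every given criterion."""
    for link in ET.fromstring(ore_atom_xml).iter(ATOM + "link"):
        if link.get("rel") != AGGREGATES:
            continue  # not part of the aggregation (e.g. the landing page)
        if mime_type is not None and link.get("type") != mime_type:
            continue
        if filename is not None and link.get("title") != filename:
            continue
        return link.get("href")
    return None


print(find_file(SAMPLE_ORE, mime_type="chemical/x-cml"))
print(find_file(SAMPLE_ORE, filename="mol.log"))
```

Selecting by filename is what distinguishes this route from one keyed on MIME type alone: two files of the same type in one fileset remain individually addressable.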


(Selection by MIME-type)
(Selection by Filename)

This method has been used to construct an interactive table as part of a peer-reviewed article in a high-impact journal: