12. Linking instrument PIDs to datasets

One major purpose of PIDINST is to make it easier to track the scientific output of an instrument. To benefit from this, the relation between a dataset and the instrument used to collect the data must be recorded in a machine-readable way.

12.1. DataCite metadata

Datasets are usually published with a DataCite DOI. The DataCite Metadata Schema allows the metadata registered with the DOI of a data publication to link to the instrument using the RelatedIdentifier and RelatedItem properties. The recommended relationType in this case is IsCollectedBy. Figure 12.1 shows an example of a dataset published by HZB (https://doi.org/10.5442/ND000001). The data were collected using neutron diffraction with the E2 - Flat-Cone Diffractometer beamline at BER II. The figure shows a screenshot of the data publication landing page, which links to the PID of the instrument. Snippet 12.1 and Snippet 12.2 show sections of the DOI metadata of the same data publication that contain this link.
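
As a minimal sketch of what such a link can look like, the following Python snippet builds a DataCite-style relatedIdentifiers fragment with the standard library. The instrument DOI 10.5442/NI000001 is the one given for the E2 instrument in Section 12.4. The helper name instrument_link is hypothetical, and the fragment deliberately omits the DataCite namespace and all mandatory properties, so it illustrates the link only and is not a complete metadata record. It also assumes a schema version that defines the IsCollectedBy relation type.

```python
import xml.etree.ElementTree as ET


def instrument_link(instrument_doi: str) -> ET.Element:
    """Build a <relatedIdentifiers> fragment pointing at an instrument PID.

    Hypothetical helper for illustration only: a real DataCite record also
    needs the schema namespace and the mandatory properties.
    """
    related = ET.Element("relatedIdentifiers")
    item = ET.SubElement(
        related,
        "relatedIdentifier",
        {"relatedIdentifierType": "DOI", "relationType": "IsCollectedBy"},
    )
    item.text = instrument_doi
    return related


# DOI of the E2 instrument, as used in the example in Section 12.4.
fragment = instrument_link("10.5442/NI000001")
xml_text = ET.tostring(fragment, encoding="unicode")
print(xml_text)
```

The same relation could alternatively be expressed with the RelatedItem property when more descriptive information about the instrument should travel with the dataset metadata.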

12.2. schema.org

Figure 12.2 shows an example of a marine dataset (https://doi.org/10.1594/PANGAEA.887579) published through PANGAEA. The metadata of the dataset includes descriptive information about the dataset and its related entities (e.g., scholarly article, project). The dataset was gathered through sensors attached to an autonomous underwater vehicle (AWI AUV Polar Autonomous Underwater Laboratory), which was deployed as part of a cruise campaign (MSM29). The vehicle is identified through a persistent identifier assigned by https://sensor.awi.de/. The landing page of the instrument contains metadata such as description, manufacturer, model, contact and calibration information. Figure 12.3 depicts schema.org types and properties that may be used to model the dataset’s observation event (e.g., cruise campaign) and the instrument deployed (AUV). Figure 12.4 shows a snippet of the actual schema.org representation. External vocabularies (NERC SeaVoX Platform Categories and GeoLink Schema) are used to indicate the additional type for Event and Vehicle. In schema.org, ‘Event’ refers to an occurrence at a specific time and location, for example a social event. New types and properties are therefore required before schema.org can fully describe observation events and the related scientific instruments.

12.3. NetCDF4

State-of-the-art research ships are multimillion-pound floating laboratories that operate diverse arrays of high-powered, high-resolution sensors around the clock (e.g. sea-floor depth, weather, ocean current velocity and hydrography). The National Oceanography Centre (NOC)[1] and British Antarctic Survey (BAS)[2] are currently working together, as part of the I/Ocean initiative, to improve the integrity of the data management workflow from these sensor systems to end users across the large research vessel fleet of the UK Natural Environment Research Council (NERC). In doing so, we can make cost-effective use of vessel time while improving the FAIRness[3] of, and in turn access to, data from these sensor arrays. The initial phase of the solution implements common NetCDF formats that give researchers harmonised access to data across ships. The formats are based on NetCDF4 and comply with the Climate and Forecast (CF) conventions.

It has been proposed that NetCDF4 groups could be used to identify instruments and their associated metadata, in a similar way to the SONAR-netCDF4 convention for sonar data[5]. In this approach, the instrument PID is stored as the data of a geophysical variable within a group that has an applicable date range, for example starting when the sensor was installed (Snippet 12.3). Data streams are then linked to the instruments that produced them using the variable attribute instrument from the Attribute Convention for Data Discovery (ACDD) 1-3. Within groups, other variables or attributes could hold more detailed information about an instrument. Groups may also offer a way to store other information with valid date ranges, such as calibrations, instrument reference frames and instrument orientations (e.g. the reference point of an anemometer).

The National Centers for Environmental Information (NCEI) at the National Oceanic and Atmospheric Administration (NOAA) in the US report instruments using a CF-NetCDF specification[4]. Instruments are either specified as a global attribute, using the instrument attribute from the Attribute Convention for Data Discovery (ACDD) 1-3, or defined as empty geophysical variables within the root group of the NetCDF file. In the latter case, the instrument PID may be expressed as an attribute instrument_pid among the recommended variable attributes, as shown in Snippet 12.4. Alternatively, an instrument_pid attribute could be added to the set of global attributes.

12.4. OpenAIRE CERIF metadata

The OpenAIRE Guidelines for CRIS Managers [6] provide orientation for managers of Current Research Information Systems (CRIS) on exposing their metadata in a way that is compatible with the OpenAIRE infrastructure as well as the European Open Science Cloud (EOSC). These Guidelines also serve as an example of a CERIF-based (Common European Research Information Format) standard for information interchange between individual CRISs and other research e-Infrastructures.

The metadata format described by the Guidelines includes an Equipment entity, which can also be used to represent instruments. A product (dataset) record links to the equipment via the GeneratedBy property.

Snippet 12.5 Use of the Equipment entity for an instrument exposed in a product (dataset) metadata record. A detailed product (dataset) example is available in the OpenAIRE Guidelines for CRIS Managers repository on GitHub.
  <GeneratedBy>
    <Equipment id="82394876">
        <Name xml:lang="en">E2 - Flat-Cone Diffractometer</Name>
        <Identifier type="DOI">https://doi.org/10.5442/NI000001</Identifier>
        <Description xml:lang="en">A 3-dimensional part of the reciprocal space can be scanned in less then five steps by combining the “off-plane Bragg-scattering” and the flat-cone layer concept while using a new computer-controlled tilting axis of the detector bank. Parasitic scattering from cryostat or furnace walls is reduced by an oscillating "radial" collimator. The datasets and all connected information is stored in one independent NeXus file format for each measurement and can be easily archived. The software package TVneXus deals with the raw data sets, the transformed physical spaces and the usual data analysis tools (e.g. MatLab). TVneXus can convert to various data sets e.g. into powder diffractograms, linear detector projections, rotation crystal pictures or the 2D/3D reciprocal space.</Description>
    </Equipment>
  </GeneratedBy>

The product (dataset) record refers internally to the Equipment record via the id attribute, e.g. 82394876. The metadata for the equipment itself is exposed in a separate equipment metadata record, as described by the Equipment entity.

Snippet 12.6 Use of the Equipment entity for an instrument exposed in an equipment metadata record. A detailed equipment example is available in the OpenAIRE Guidelines for CRIS Managers repository on GitHub.
  <Equipment xmlns="https://www.openaire.eu/cerif-profile/1.2/" id="82394876">
    <Name xml:lang="en">E2 - Flat-Cone Diffractometer</Name>
    <Identifier type="DOI">https://doi.org/10.5442/NI000001</Identifier>
    <Description xml:lang="en">A 3-dimensional part of the reciprocal space can be scanned in less then five steps by combining the “off-plane Bragg-scattering” and the flat-cone layer concept while using a new computer-controlled tilting axis of the detector bank. Parasitic scattering from cryostat or furnace walls is reduced by an oscillating "radial" collimator. The datasets and all connected information is stored in one independent NeXus file format for each measurement and can be easily archived. The software package TVneXus deals with the raw data sets, the transformed physical spaces and the usual data analysis tools (e.g. MatLab). TVneXus can convert to various data sets e.g. into powder diffractograms, linear detector projections, rotation crystal pictures or the 2D/3D reciprocal space.</Description>
    <Owner>
      <OrgUnit id="OrgUnits/350002">
        <Acronym>HZB</Acronym>
        <Name xml:lang="de">Helmholtz-Zentrum Berlin Für Materialien Und Energie</Name>
        <Name xml:lang="en">Helmholtz-Zentrum Berlin</Name>
        <RORID>https://ror.org/02aj13c28</RORID>
      </OrgUnit>
    </Owner>
  </Equipment>
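
To illustrate how a harvester might read the instrument PID from such an equipment record, the following Python sketch parses a trimmed copy of Snippet 12.6 with the standard library. The function name summarize_equipment and the keys of the returned dictionary are hypothetical and chosen for illustration only. The namespace URI and element names are copied from the snippet, and the Description and the German name are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from Snippet 12.6; the "cerif" prefix is our own choice.
NS = {"cerif": "https://www.openaire.eu/cerif-profile/1.2/"}

# Trimmed copy of Snippet 12.6 (Description and German name omitted).
EQUIPMENT_XML = """
<Equipment xmlns="https://www.openaire.eu/cerif-profile/1.2/" id="82394876">
  <Name xml:lang="en">E2 - Flat-Cone Diffractometer</Name>
  <Identifier type="DOI">https://doi.org/10.5442/NI000001</Identifier>
  <Owner>
    <OrgUnit id="OrgUnits/350002">
      <Acronym>HZB</Acronym>
      <Name xml:lang="en">Helmholtz-Zentrum Berlin</Name>
      <RORID>https://ror.org/02aj13c28</RORID>
    </OrgUnit>
  </Owner>
</Equipment>
"""


def summarize_equipment(xml_text: str) -> dict:
    """Extract the record id, name, instrument DOI and owner ROR ID.

    Hypothetical helper; missing elements are returned as None.
    """
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "name": root.findtext("cerif:Name", namespaces=NS),
        "doi": root.findtext("cerif:Identifier[@type='DOI']", namespaces=NS),
        "owner_ror": root.findtext(
            "cerif:Owner/cerif:OrgUnit/cerif:RORID", namespaces=NS
        ),
    }


record = summarize_equipment(EQUIPMENT_XML)
print(record["doi"])
```

The id value returned here is the internal key that a product (dataset) record would use in its GeneratedBy element, while the DOI is the persistent identifier of the instrument itself.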