12. Linking instrument PIDs to datasets

One major purpose of PIDINST is to ease tracking of the scientific output of an instrument. To benefit from this, the relation between a dataset and the instrument used to collect the data must be established in a machine-readable way.

12.1. DataCite metadata

Datasets are usually published with a DataCite DOI. The DataCite Metadata Schema makes it possible to link to the instrument from the metadata registered with the DOI of a data publication, using the relatedIdentifier property. The recommended relationType in this case is IsCompiledBy. Figure 12.1 shows an example of a dataset published by HZB (https://doi.org/10.5442/ND000001). The data were collected using neutron diffraction with the E2 - Flat-Cone Diffractometer beamline at BER II. The figure shows a screenshot of the data publication landing page, which links to the PID of the instrument. Snippet 12.1 shows the section of the DOI metadata of the same data publication that contains this link.
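
As a minimal sketch of what such a link can look like, the following Python code builds a relatedIdentifiers fragment in the layout of the DataCite XML format. The function name related_instrument and the instrument DOI used here are hypothetical placeholders, not the values from the HZB example; the DataCite namespace and all other metadata properties are omitted for brevity.

```python
import xml.etree.ElementTree as ET


def related_instrument(pid, pid_type="DOI", relation_type="IsCompiledBy"):
    """Build a DataCite-style relatedIdentifier element pointing to an instrument PID."""
    elem = ET.Element(
        "relatedIdentifier",
        {"relatedIdentifierType": pid_type, "relationType": relation_type},
    )
    elem.text = pid
    return elem


# Hypothetical placeholder DOI; a real record would carry the registered instrument PID.
INSTRUMENT_PID = "10.12345/hypothetical-instrument"

related_identifiers = ET.Element("relatedIdentifiers")
related_identifiers.append(related_instrument(INSTRUMENT_PID))

# Serialise the fragment so it can be inspected or merged into a full record.
fragment = ET.tostring(related_identifiers, encoding="unicode")
print(fragment)
```

A harvester can then find the instrument of a dataset by filtering the relatedIdentifier entries on the relationType attribute.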

12.2. schema.org

Figure 12.2 shows an example of a marine dataset (https://doi.org/10.1594/PANGAEA.887579) published through PANGAEA. The metadata of the dataset includes descriptive information about the dataset and its related entities (e.g., scholarly article, project). The dataset was gathered through sensors attached to an autonomous underwater vehicle (AWI AUV Polar Autonomous Underwater Laboratory), which was deployed as part of a cruise campaign (MSM29). The vehicle is identified through a persistent identifier assigned by https://sensor.awi.de/. The landing page of the instrument contains metadata about the instrument, such as its description, manufacturer, model, contact, and calibration information. Figure 12.3 depicts the schema.org types and properties that may be used to model the dataset’s observation event (e.g., cruise campaign) and the instrument deployed (AUV). Figure 12.4 shows a snippet of the actual schema.org representation. External vocabularies (NERC SeaVoX Platform Categories and GeoLink Schema) are used to indicate the additional type for Event and Vehicle. In schema.org, ‘Event’ refers to an occurrence at a specific time and location, for example a social event. As such, new types and properties are required to support the description of observation events and related scientific instruments and to ensure full compliance with schema.org functionality.

12.3. NetCDF4

State-of-the-art research ships are multimillion-pound floating laboratories that operate diverse arrays of high-powered, high-resolution sensors around the clock (e.g. for sea-floor depth, weather, ocean current velocity, and hydrography). The National Oceanography Centre (NOC) [1] and the British Antarctic Survey (BAS) [2] are currently working together, as part of the UK initiative I/Ocean, to improve the integrity of the data management workflow from these sensor systems to end users across the large research vessel fleet of the UK Natural Environment Research Council (NERC). In doing so, we can make cost-effective use of vessel time while improving the FAIRness [3] of, and in turn access to, the data from these sensor arrays.

The initial phase of the solution implements common NetCDF formats across ships, enabling harmonised access to data for researchers on board while reducing ambiguity through common metadata standards. The formats are based on NetCDF4 and comply with the Climate and Forecast (CF) conventions. NetCDF4 groups are used to include rich information about the instruments used to derive parameter streams. Data streams are linked to the instruments that produced them using the variable attribute instrument from the Attribute Convention for Data Discovery (ACDD) 1.3 (Snippet 12.2). Each instrument is represented as a group whose properties, including the instrument’s PID, are expressed as variables. Each property is defined using common terminologies published on the NERC Vocabulary Server. In this way, users can express properties of their choice. Groups could also be used to express other information relating to parameter streams or instruments, such as calibrations and instrument reference frames and orientations.

The National Centers for Environmental Information (NCEI) at the National Oceanic and Atmospheric Administration (NOAA) in the US also report instruments in CF-NetCDF files, but as empty data variables within the root group of the NetCDF file instead of as subgroups. The instrument’s PID may be expressed as an instrument attribute (see Snippet 12.3). Ideally, blank-separated lists should be used when linking more than one instrument.
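
How a reader of such a file might resolve a blank-separated list of instruments can be sketched in plain Python. The variables of the root group are modelled here as a dictionary of attribute dictionaries; the variable names, the attribute name instrument_pid, and the PIDs are hypothetical and only illustrate one possible reading of the convention.

```python
# Hypothetical in-memory model of the variables in the root group and their attributes.
VARIABLES = {
    "temperature": {"instrument": "instrument1 instrument2"},
    # Instruments are modelled as empty variables that only carry attributes.
    "instrument1": {"instrument_pid": "https://hdl.handle.net/00.00000/hypothetical-1"},
    "instrument2": {"instrument_pid": "https://hdl.handle.net/00.00000/hypothetical-2"},
}


def instrument_pids(variables, data_variable):
    """Return the PIDs of all instruments linked from a data variable."""
    # The `instrument` attribute holds a blank-separated list of variable names.
    names = variables[data_variable].get("instrument", "").split()
    return [variables[name]["instrument_pid"] for name in names]


print(instrument_pids(VARIABLES, "temperature"))
```

Splitting on whitespace keeps the attribute a single string, which fits the text-valued attributes of NetCDF while still allowing several instruments per data variable.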

[1] British Oceanographic Data Centre (BODC) and National Marine Facilities (NMF) divisions

[2] UK Polar Data Centre division

[3] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18

12.4. OpenAIRE CERIF metadata

The OpenAIRE Guidelines for CRIS Managers [4] provide orientation for Current Research Information System (CRIS) managers on how to expose their metadata in a way that is compatible with the OpenAIRE infrastructure as well as the European Open Science Cloud (EOSC). These Guidelines also serve as an example of a CERIF-based (Common European Research Information Format) standard for information interchange between individual CRISs and other research e-Infrastructures.

The metadata format described by the Guidelines includes the Equipment entity, which can also be used to describe instruments. A product (dataset) record refers to the equipment via the GeneratedBy property.

Snippet 12.4 Use of the Equipment entity for an instrument exposed in a product (dataset) metadata record. Detailed product (dataset) example at OpenAIRE Guidelines for CRIS Managers repository on GitHub.
  <GeneratedBy>
    <Equipment id="82394874">
      <Name xml:lang="en">SkyArrow 650 TCNS operated by IBIMET CNR</Name>
      <Identifier type="Institution assigned unique equipment identifier">982340-29481/1999</Identifier>
      <Description xml:lang="en">The SkyArrow 650 TCNS operated by IBIMET (CNR - Institute of Biometeorology) for the EUFAR project</Description>
    </Equipment>
  </GeneratedBy>

The product (dataset) record refers internally to the Equipment record via the id attribute, e.g. 82394874. The metadata for the equipment itself is exposed in a separate equipment metadata record and described in the Equipment entity.

Snippet 12.5 Use of the Equipment entity for an instrument exposed in an equipment metadata record. Detailed equipment example at OpenAIRE Guidelines for CRIS Managers repository on GitHub.
  <Equipment xmlns="https://www.openaire.eu/cerif-profile/1.2/" id="82394874">
    <Name xml:lang="en">SkyArrow 650 TCNS operated by IBIMET CNR</Name>
    <Identifier type="Institution assigned unique equipment identifier">982340-29481/1999</Identifier>
    <Description xml:lang="en">The SkyArrow 650 TCNS operated by IBIMET (CNR - Institute of Biometeorology) for the EUFAR project</Description>
    <Owner>
      <OrgUnit id="OrgUnits/312346">
        <Acronym>CNR</Acronym>
        <Name xml:lang="it">CONSIGLIO NAZIONALE DELLE RICERCHE</Name>
        <Name xml:lang="en">NATIONAL RESEARCH COUNCIL</Name>
      </OrgUnit>
    </Owner>
  </Equipment>
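
The way a product record and an equipment record are joined through the id attribute can be sketched in Python with the standard library. The records below are abbreviated from the two snippets; the wrapping Product element, its id, and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"cerif": "https://www.openaire.eu/cerif-profile/1.2/"}

# Abbreviated product record; the Product wrapper and its id are assumed here.
PRODUCT_XML = """
<Product xmlns="https://www.openaire.eu/cerif-profile/1.2/" id="hypothetical-product">
  <GeneratedBy>
    <Equipment id="82394874"/>
  </GeneratedBy>
</Product>
""".strip()

# Abbreviated equipment record with the same id as referenced by the product.
EQUIPMENT_XML = """
<Equipment xmlns="https://www.openaire.eu/cerif-profile/1.2/" id="82394874">
  <Name xml:lang="en">SkyArrow 650 TCNS operated by IBIMET CNR</Name>
  <Identifier type="Institution assigned unique equipment identifier">982340-29481/1999</Identifier>
</Equipment>
""".strip()


def equipment_ids(product_xml):
    """Return the ids of all Equipment elements referenced via GeneratedBy."""
    root = ET.fromstring(product_xml)
    return [e.get("id") for e in root.findall("cerif:GeneratedBy/cerif:Equipment", NS)]


def index_by_id(equipment_records):
    """Index parsed equipment records by their id attribute."""
    roots = (ET.fromstring(text) for text in equipment_records)
    return {root.get("id"): root for root in roots}


def equipment_names(product_xml, index):
    """Resolve the equipment referenced by a product to the names in the equipment records."""
    found = (index.get(eid) for eid in equipment_ids(product_xml))
    return [e.findtext("cerif:Name", namespaces=NS) for e in found if e is not None]


print(equipment_names(PRODUCT_XML, index_by_id([EQUIPMENT_XML])))
```

Because the join relies only on the id attribute, the product record can stay short while the full description of the instrument lives in the equipment record.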

[4] Dvořák, Jan, Czerniak, Andreas, & Ivanović, Dragan. (2023). OpenAIRE Guidelines for CRIS Managers 1.2 (1.2.0). Zenodo. https://doi.org/10.5281/zenodo.8050936