4. PIDINST metadata schema

The PIDINST metadata schema consists of common metadata properties used to identify instruments consistently and accurately across networks and infrastructures. To support unambiguous identification, we recommend that an instrument’s associated metadata be published in a common language, specifically US English. Two variants of the metadata schema currently exist. The original PIDINST schema, based on an evaluation of use cases collected by the working group, is used for the prototype implementation of metadata properties in the ePIC infrastructure. The second variant provides a mapping between PIDINST metadata properties and DataCite Metadata Schema 4.3.

4.1. Using common terminologies

Common terminologies, such as controlled vocabularies, taxonomies or ontologies, are sets of standardised terms that resolve the ambiguities associated with metadata markup and enable records to be shared and interpreted semantically by computers. Many terminologies exist, covering a broad spectrum of disciplines and their best practices. The PIDINST schema is designed to complement multidisciplinary best practices for property values. Many properties allow for soft-typing (e.g. ownerName), giving users the ability to use values of their choice, such as free text or domain-specific terminologies. Property attributes enable users and machines to understand the context of the value (e.g., ownerIdentifier, ownerIdentifierType), again using free text or standardised terminologies. While free text is allowed, institutions should consider using common terminologies where practical to enhance the (semantic) interoperability of PID records, particularly where they form part of domain-specific best practice. For example, comprehensive sets of terminologies that describe instrumentType or the recently added Model (via modelIdentifier) are widely used in the Earth science marine domain (http://vocab.nerc.ac.uk/collection/L22/current/, http://vocab.nerc.ac.uk/collection/L05/current/). An example of the use of common terminologies in ePIC records is shown in Table 4.1.
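As a minimal sketch of how a consumer of PID records might tell a free-text value from a reference to a published term, the following Python snippet treats any value that parses as an absolute HTTP(S) URI as a candidate terminology reference. The function name `is_term_uri`, the heuristic itself, and the free-text value "CTD" are illustrative assumptions and are not part of the PIDINST schema.

```python
from urllib.parse import urlparse


def is_term_uri(value: str) -> bool:
    """Heuristic (an assumption): an absolute http(s) URI points at a published term."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Both values are permitted by soft-typing, but only the first can be dereferenced.
instrument_type_uri = "http://vocab.nerc.ac.uk/collection/L22/current/TOOL0022/"
instrument_type_text = "CTD"  # hypothetical free-text value

print(is_term_uri(instrument_type_uri))   # True
print(is_term_uri(instrument_type_text))  # False
```

A check of this kind only shows that a value looks like a URI. It does not confirm that the URI resolves or that it belongs to a recognised vocabulary.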

Table 4.1: Handle record of instrument identifier http://hdl.handle.net/21.T11998/0000-001A-3905-F displaying the use of common terminologies to identify instrument metadata compliant with the PIDINST schema as implemented by ePIC. The terminologies used are published on the NERC Vocabulary Server (NVS). The data for each metadata property is provided in JSON. The Handle record can be viewed at http://hdl.handle.net/21.T11998/0000-001A-3905-F?noredirect

Type Data
URL
https://linkedsystems.uk/system/instance/TOOL0022_2490/current/
21.T11148/8eb858ee0b12e8e463a5 (Identifier)
{
  "identifierValue":"http://hdl.handle.net/21.T11998/0000-001A-3905-F",
  "identifierType":"MeasuringInstrument"
}
21.T11148/9a15a4735d4bda329d80 (LandingPage)
https://linkedsystems.uk/system/instance/TOOL0022_2490/current/
21.T11148/709a23220f2c3d64d1e1 (Name)
Sea-Bird SBE 37-IM MicroCAT C-T Sensor
21.T11148/4eaec4bc0f1df68ab2a7 (Owners)
[{
  "Owner": {
    "ownerName":"National Oceanography Centre",
    "ownerContact":"louise.darroch@bodc.ac.uk",
    "ownerIdentifier":{
      "ownerIdentifierValue":
        "http://vocab.nerc.ac.uk/collection/B75/current/ORG00009/",
      "ownerIdentifierType":"URL"
     }
   }
}]
21.T11148/1f3e82ddf0697a497432 (Manufacturers)
[{
  "Manufacturer":{
    "manufacturerName":"Sea-Bird Scientific",
    "modelName":"SBE 37-IM",
    "manufacturerIdentifier":{
      "manufacturerIdentifierValue":
        "http://vocab.nerc.ac.uk/collection/L35/current/MAN0013/",
      "manufacturerIdentifierType":"URL"
    }
  }
}]
21.T11148/55f8ebc805e65b5b71dd (Description)
A high accuracy conductivity and temperature recorder with an optional pressure sensor
designed for deployment on moorings. The IM model has an inductive modem for real-time
data transmission plus internal flash memory data storage.
21.T11148/f76ad9d0324302fc47dd (InstrumentType)
http://vocab.nerc.ac.uk/collection/L22/current/TOOL0022/
21.T11148/72928b84e060d491ee41 (MeasuredVariables)
[{
  "MeasuredVariable":{
    "VariableMeasured":
      "http://vocab.nerc.ac.uk/collection/P01/current/CNDCPR01/"
  }
},{
  "MeasuredVariable":{
    "VariableMeasured":
      "http://vocab.nerc.ac.uk/collection/P01/current/PSALPR01/"
  }
},{
  "MeasuredVariable":{
    "VariableMeasured":
      "http://vocab.nerc.ac.uk/collection/P01/current/TEMPPR01/"
  }
},{
  "MeasuredVariable":{
    "VariableMeasured":
      "http://vocab.nerc.ac.uk/collection/P01/current/PREXMCAT/"
  }
}]
21.T11148/22c62082a4d2d9ae2602 (Dates)
[{
  "date":{
    "date":"1999-11-01",
    "dateType":"Commissioned"
  }
}]
21.T11148/eb3c713572f681e6c4c3 (AlternateIdentifiers)
[{
  "AlternateIdentifier":{
    "AlternateIdentifierValue":"2490",
    "alternateIdentifierType":"serialNumber"
  }
}]
21.T11148/178fb558abc755ca7046 (RelatedIdentifiers)
[{
  "RelatedIdentifier":{
    "RelatedIdentifierValue":
      "https://www.bodc.ac.uk/data/documents/nodb/pdf/37imbrochurejul08.pdf",
    "RelatedIdentifierType": "URL",
    "relationType":"IsDescribedBy"
  }
}]
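Because several values in Table 4.1 are JSON documents whose leaves are vocabulary URIs, a client can decode them with standard tooling. The sketch below parses a shortened copy of the MeasuredVariables value and splits each NVS URI into its collection and concept code. The helper name `nvs_concept` is hypothetical, and the path layout `/collection/<collection>/current/<code>/` is assumed from the URIs shown in the table rather than taken from an NVS specification.

```python
import json
from typing import Tuple
from urllib.parse import urlparse

# Data value of the MeasuredVariables entry in Table 4.1 (shortened to two items).
measured_variables_json = """
[{"MeasuredVariable": {"VariableMeasured":
    "http://vocab.nerc.ac.uk/collection/P01/current/CNDCPR01/"}},
 {"MeasuredVariable": {"VariableMeasured":
    "http://vocab.nerc.ac.uk/collection/P01/current/TEMPPR01/"}}]
"""


def nvs_concept(uri: str) -> Tuple[str, str]:
    """Split an NVS concept URI into (collection, concept code).

    Assumes the path layout /collection/<collection>/current/<code>/.
    """
    parts = [p for p in urlparse(uri).path.split("/") if p]
    if len(parts) != 4 or parts[0] != "collection" or parts[2] != "current":
        raise ValueError(f"not an NVS concept URI: {uri}")
    return parts[1], parts[3]


variables = [
    nvs_concept(item["MeasuredVariable"]["VariableMeasured"])
    for item in json.loads(measured_variables_json)
]
print(variables)  # [('P01', 'CNDCPR01'), ('P01', 'TEMPPR01')]
```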

4.2. Using other PIDs

4.2.1. RRIDs

In a similar way to common terminologies, persistent identifiers have been created to help users classify and accurately describe physical objects. A related PID is the RRID (Research Resource Identifier), which identifies classes of instruments (models) rather than individual instances.[1] The UsedIT group is extending the RRID to instrument classes, which could be used to describe the Model property (via modelIdentifier; Table 4.2). RRIDs are not described in detail here, but it is envisioned that the RRID metadata schema, described in detail previously[2] and extended by UsedIT, will be interoperable with instrument instance (PIDINST) PIDs. This interoperability should enable any project to quickly download data about the model and fill mapped fields consistently.

Why RRIDs? RRIDs are currently used in about 1000 journals to tag classes of research resources. These include reagents such as antibodies or plasmids, organisms, cell lines, and a relatively broad category of “tools”, which covers software tools and services such as university core facilities and has recently been extended to physical tools such as models of sequencers or microscopes. Because RRIDs were created as an agreement between a group of biological journals and the National Institutes of Health, they are most commonly found and linked in the biological sciences literature (e.g., Cell, eLife). They are part of the JATS NISO standard, STAR Methods, and the MDAR pan-publisher reproducibility checklist; they are resolved by identifiers.org and the n2t resolver; and they are echoed by some of the major reagent providers (e.g., Thermo Fisher, Addgene, and the MMRRC mouse repository).

Table 4.2: Example showing the use of RRIDs in the PIDINST metadata schema.

ID | Property | Obligation | Occ. | Definition | Allowed values, constraints, remarks
6 | Model | R | 0-1 | Name of the model or type of device as attributed by the manufacturer | Element
6.1 | modelName | R | 1 | Full name of the model | Name field from RRID, e.g. ‘Illumina HiSeq 3000/HiSeq 4000 System’
6.2 | modelIdentifier | O | 0-1 | Persistent identifier of the model | RRID identifier, e.g. ‘RRID:SCR_016386’
6.2.1 | modelIdentifierType | O | 1 | Type of the identifier | Free text; must be identifier type, e.g. ‘RRID’
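The rows of Table 4.2 can be read as a small set of construction rules: modelName is required, modelIdentifier is optional, and modelIdentifierType must accompany any identifier. The Python sketch below assembles a Model element under those rules. The function `build_model` is hypothetical, and the JSON nesting with a `modelIdentifierValue` key is an assumption modelled on the ownerIdentifier structure in Table 4.1, not a confirmed serialisation. The check that an RRID starts with "RRID:" is likewise a loose assumption based on the example value.

```python
import json
from typing import Optional


def build_model(model_name: str,
                identifier: Optional[str] = None,
                identifier_type: Optional[str] = None) -> dict:
    """Assemble a Model element following the rows of Table 4.2."""
    if not model_name:
        raise ValueError("modelName is required")
    model = {"modelName": model_name}
    if identifier is not None:
        # modelIdentifierType has occurrence 1 whenever modelIdentifier is present.
        if not identifier_type:
            raise ValueError("modelIdentifierType is required with modelIdentifier")
        # Loose sanity check only (assumption): RRIDs carry an 'RRID:' prefix.
        if identifier_type == "RRID" and not identifier.startswith("RRID:"):
            raise ValueError("an RRID is expected to start with 'RRID:'")
        model["modelIdentifier"] = {
            "modelIdentifierValue": identifier,  # key name is an assumption
            "modelIdentifierType": identifier_type,
        }
    return {"Model": model}


model = build_model(
    "Illumina HiSeq 3000/HiSeq 4000 System",
    identifier="RRID:SCR_016386",
    identifier_type="RRID",
)
print(json.dumps(model, indent=2))
```

Keeping the identifier type alongside the value, rather than inferring it from the prefix, follows the soft-typing approach described in Section 4.1 and leaves room for model identifiers other than RRIDs.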

[1] Bandrowski A, Brush M, Grethe JS, Haendel MA, Kennedy DN, Hill S, Hof PR, Martone ME, Pols M, Tan SC, Washington N, Zudilova-Seinstra E, Vasilevsky N. The Resource Identification Initiative: A Cultural Shift in Publishing. J Comp Neurol. 2016 Jan 1;524(1):8-22. https://doi.org/10.1002/cne.23913
[2] Bandrowski AE, Cachat J, Li Y, Müller HM, Sternberg PW, Ciccarese P, Clark T, Marenco L, Wang R, Astakhov V, Grethe JS, Martone ME. A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework. Database (Oxford). 2012 Mar 20;2012:bas005. https://doi.org/10.1093/database/bas005