PANGAEA Data Publisher Use Case

Anusuriya Devaraju, Michael Diepenbroek, Uwe Schindler, Robert Huber
{adevaraju, mdiepenbroek, uschindler, rhuber}@marum.de
20th September 2018

Introduction to PANGAEA

PANGAEA is a data infrastructure for archiving and publishing Earth and environmental datasets. It is hosted by the Alfred Wegener Institute, Helmholtz Center for Polar and Marine Research (AWI) and the Center for Marine Environmental Sciences (MARUM) of the University of Bremen. The infrastructure holds more than 370,000 datasets from individual researchers, projects, data centers and research infrastructures. The datasets include quantitative and textual data as well as binary files such as audio and video. Datasets are archived with their metadata in a relational database. They are published and registered with Digital Object Identifiers (DOIs), and are accessible via the web portal [1]. For advanced interaction, PANGAEA also offers web services and APIs (e.g., an OAI-PMH metadata provider and an Elasticsearch API) and a data warehouse.

Use Case

The PANGAEA metadata primarily describes datasets. As far as instrumentation is concerned, it currently records only device types (e.g., pollen sampling device and barometer) and method types (e.g., X-ray fluorescence and continuous flow analysis (CFA)). These types serve as facets in the search on the PANGAEA web portal. The discovery of PANGAEA datasets could be improved by capturing the persistent identifiers of devices, and their relations to datasets, as part of the metadata. For example, the web portal could display each dataset together with the persistent identifiers of the relevant devices. These identifiers should be rendered as actionable links on the portal. The association between a dataset and its source (device) is vital for the reproducibility of science, because the device information gives a better understanding of the lineage and quality of the dataset.

In PANGAEA, data owners may publish datasets at different stages, e.g., raw data, derived data, and data products, depending on their applications. As a result, users have in some cases misused or misinterpreted the two fields ‘device’ and ‘method types’. Device persistent identifiers may help to avoid this ambiguity, as their landing pages provide more comprehensive descriptions of the devices.

Device Metadata

In the PANGAEA database, the ‘term’ table contains the standard terminologies used to describe datasets, including the definitions of parameters (quantities and features), methods, and device types. Currently, 1364 device types are defined in the table.

The metadata service includes only the device type as part of the metadata of a dataset, e.g., see the field ‘agg-device’ in the response returned by the request http://ws.pangaea.de/es/pangaea/panmd/886115. More detailed information about the device type is available through the PANGAEA ElasticSearch Term Index [2]. For example, the metadata of the device type ‘Box corer’ can be retrieved from this index.
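
As a minimal sketch of how a client might look up a device type in the term index [2], the following Python code builds a search URL using Elasticsearch's URI search (the `q` parameter) and flattens the hits of a response. The helper names (`build_term_query_url`, `extract_terms`, `fetch_terms`) are hypothetical, and it is an assumption that the public endpoint accepts this exact query and returns the term fields under `_source`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Term index endpoint taken from reference [2].
TERM_SEARCH_URL = "http://ws.pangaea.de/es/pangaea-terms/term/_search"


def build_term_query_url(name, size=5):
    """Build an Elasticsearch URI-search URL that matches a term by name."""
    params = {"q": f'name:"{name}"', "size": size}
    return f"{TERM_SEARCH_URL}?{urlencode(params)}"


def extract_terms(response):
    """Flatten Elasticsearch hits into plain dicts, one per term.

    Assumption: the term fields are stored under each hit's `_source`.
    """
    hits = response.get("hits", {}).get("hits", [])
    return [{"_id": hit.get("_id"), **hit.get("_source", {})} for hit in hits]


def fetch_terms(name):
    """Query the live service (requires network access)."""
    with urlopen(build_term_query_url(name), timeout=10) as resp:
        return extract_terms(json.load(resp))
```

Calling `fetch_terms("Box corer")` would then return the matching term records, provided the service is reachable and behaves as assumed.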

The following table summarizes the metadata related to the device type. The table excludes the common fields (e.g., _index, _score) that originate from Elasticsearch.

| Property | Occurrence | Definition | DataType |
| --- | --- | --- | --- |
| _id | 1 | Id of the term | Integer |
| name | 1 | Name of the term | String |
| abbreviation | 0..1 | Abbreviation of the term | String |
| terminology_id | 1 | The id of the terminology/ontology that specifies the term | Integer |
| description_uri | 0..1 | The URI of the term | String |
| status | 1 | The id of the status of the term (for internal editing purposes) | Integer |
| terminology | 1 | The name of the terminology/ontology that specifies the term; identified by the terminology id | String |
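
The properties above can be modelled on the client side as a small record type. The following is a sketch only: the class name `DeviceTypeTerm` and the helper `term_from_hit` are hypothetical, and it assumes that the term fields arrive under `_source` in an Elasticsearch hit, with `_id` convertible to an integer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeviceTypeTerm:
    """One term record; Optional fields have occurrence 0..1 in the table."""

    id: int
    name: str
    terminology_id: int
    status: int
    terminology: str
    abbreviation: Optional[str] = None
    description_uri: Optional[str] = None


def term_from_hit(hit):
    """Build a DeviceTypeTerm from a single Elasticsearch hit.

    Raises KeyError if a mandatory (occurrence 1) property is missing.
    """
    src = hit["_source"]
    return DeviceTypeTerm(
        id=int(hit["_id"]),
        name=src["name"],
        terminology_id=int(src["terminology_id"]),
        status=int(src["status"]),
        terminology=src["terminology"],
        abbreviation=src.get("abbreviation"),
        description_uri=src.get("description_uri"),
    )
```

Treating the occurrence-1 properties as required constructor arguments makes an incomplete record fail early instead of propagating missing values.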

The following are the required properties to support the PID Instrument use case in PANGAEA:

| Property | Occurrence | Definition | DataType |
| --- | --- | --- | --- |
| Identifier | 1 | Unique and persistent identifier of a device | URI |
| Name | 1 | The name of the instrument, which will be used to support the full-text search. | String |
| Device Type | 0..1 | Controlled vocabularies of device types, which will be used to support the full-text search. | String |
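
To illustrate how these three properties might be checked before they are attached to a dataset, here is a minimal sketch. The class `InstrumentRecord`, its method `search_text`, and the example identifier are hypothetical; the URI check is deliberately loose (it only requires a scheme followed by some content) and is not a full validation.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class InstrumentRecord:
    """Required PID Instrument properties as listed in the table above."""

    identifier: str                    # occurrence 1, URI
    name: str                          # occurrence 1
    device_type: Optional[str] = None  # occurrence 0..1

    def __post_init__(self):
        parsed = urlparse(self.identifier)
        # Loose check: a URI must have a scheme followed by some content.
        if not (parsed.scheme and (parsed.netloc or parsed.path)):
            raise ValueError(f"identifier is not a URI: {self.identifier!r}")
        if not self.name.strip():
            raise ValueError("name must not be empty")

    def search_text(self):
        """Concatenate the fields intended to support full-text search."""
        return " ".join(part for part in (self.name, self.device_type) if part)


# Hypothetical example values, for illustration only.
record = InstrumentRecord(
    identifier="https://example.org/instrument/123",
    name="Example corer",
    device_type="Box corer",
)
```

Because the identifier is a URI, a portal could use it directly as the target of an actionable link, while `search_text()` yields the text that would be indexed for full-text search.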

[1] https://www.pangaea.de/

[2] http://ws.pangaea.de/es/pangaea-terms/term/_search