Helmholtz Kernel Information Profile integration #138

christian-rli · 2024-02-21T06:19:42Z

Description of the issue

As pointed out by @carstenhoyerklick there should be a way to handle the Helmholtz Kernel Information Profile in oemetadata or to at least map it.

Ideas of solution

New field? Reference in an existing one? Please discuss.

Workflow checklist

I am aware of the workflow in CONTRIBUTING.md

carstenhoyerklick · 2024-02-21T13:18:33Z

In chapter 3 there are a number of data fields which should be part of the metadata. We should look at the keys there and check which ones we are still missing.

e.g.
• underEmbargoUntil

For most of them we have an equivalent in OEMetadata; we should list a mapping there.

carstenhoyerklick · 2024-04-15T15:59:11Z

I started a mapping between the Helmholtz KIP and OE Metadata.

https://docs.google.com/spreadsheets/d/1Q0tWNRujw3taKw4f-jUVjlFl2anZe38WtVpcD9IlO2s/edit?usp=sharing

The legend is roughly:

White: Existing in OEMetadata
Yellow: not existing in OEMetadata
Orange: Maybe we can use information from the databus Metadata

jh-RLI · 2024-04-16T09:38:17Z

Great, that's a very helpful first step. I will provide a full example of the new oemetadata version proposal in advance of our meeting next week.

christian-rli · 2024-05-30T14:17:58Z

Thank you for a helpful start @carstenhoyerklick. Do I understand correctly that the proposed course of action is to implement all the fields highlighted in yellow and ignore the orange ones, because they are covered by the databus? Or should we implement the orange ones as well, so that the information can be shown in the regular metadata? Either way it's quite a big extension of the current standard. Do we agree that all of the fields should show up there? If yes, I'm happy to implement them in the example files and schemas.

carstenhoyerklick · 2024-05-31T07:09:40Z

My personal preference would be to implement them all and to use the metadata string as a master source to populate the databus. On the other hand, if you have information in two places there is the danger of contradicting information. That again might be a reason to have it all in the metadata string, as an authoritative source.

christian-rli · 2024-06-04T08:51:17Z

@jh-RLI and I agree. We will implement them in the next version. The resulting list of fields will be quite long and intimidating. Therefore we also decided that the tooling will take that into account. The conversion and export software will return the metadata with only the populated fields by default - empty fields are provided optionally.
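
The "populated fields only" default described above could look roughly like the following. This is a minimal sketch, not the actual conversion or export tooling: the function name `prune_empty` and the `keep_empty` flag are hypothetical, and it assumes that empty strings, `None`, empty lists and empty dicts count as "not populated".

```python
from typing import Any

# Values treated as "not populated" (an assumption made for this sketch).
_EMPTY = ("", None, [], {})


def prune_empty(value: Any, keep_empty: bool = False) -> Any:
    """Return a copy of nested metadata without unpopulated fields.

    Hypothetical helper: with keep_empty=True the input is returned
    unchanged, mirroring the optional export that includes empty fields.
    """
    if keep_empty:
        return value
    if isinstance(value, dict):
        # Prune children first, so a dict that only holds empty fields
        # becomes empty itself and is dropped by its parent.
        pruned = {key: prune_empty(item) for key, item in value.items()}
        return {key: item for key, item in pruned.items() if item not in _EMPTY}
    if isinstance(value, list):
        pruned_items = [prune_empty(item) for item in value]
        return [item for item in pruned_items if item not in _EMPTY]
    return value


metadata = {
    "name": "example_table",
    "keywords": [],
    "trace": {"checksum": "", "version": "1.0", "alternateOf": ""},
}
print(prune_empty(metadata))
```

Values such as `0` or `False` are kept, since only the four values listed in `_EMPTY` are treated as empty here.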


christian-rli · 2024-06-18T14:42:26Z

@jh-RLI and I sorted through the new tags and came up with a structure. We thought it made sense to group almost all new keys together on resource level.

"trace": {
  "alternateOf": "",
  "checksum": "",
  "dateModified": "",
  "digitalObjectLocationAccessProtocol": "",
  "digitalObjectType": "",
  "hadPrimarySource": "",
  "hasMetadata": "",
  "isMetadataFor": "",
  "policy": "",
  "provenanceGraph": "",
  "specializationOf": "",
  "version": "",
  "wasDerivedFrom": "",
  "wasGeneratedBy": "",
  "wasRevisionOf": "",
  "wasQuotedFrom": "",
  "contributors": [
    {
      "title": "John Doe",
      "email": "[email protected]",
      "date": "2016-06-16",
      "object": "data and metadata",
      "comment": "Fix typo in the title."
    }
  ]
},

The key name is open for debate. We were looking for something that encompasses things you would need for provenance and reproducibility. Other candidates were 'track', 'trail', 'linked data' or 'provenance'. Currently we like 'trace', but feel free to convince us otherwise.

Other notes:

I understand "isMetadataFor" such that by default it would describe the resource on the OEP. In other words the key would be a duplicate "id" most of the time. Therefore on the OEP it should basically be hidden virtually all the time.

There is no explanation for "locationPreview". Can you help out @carstenhoyerklick ?

"underEmbargoUntil" can go next to date. It's a bit awkward to implement, because one turns into the other, ideally, but if it's not actually published on the planned date there has to be a logic on the OEP to deal with that.

@carsten can you maybe elaborate on the "wasQuotedFrom" field? What's the difference between sources? Does this concern the entire dataset (i.e. this whole table is actually a quote from another resource) or is it meant to reflect sources for parts of the data. Maybe another key within sources or a redefinition to a URI would help here. I assume it's not a "quotedBy" that lists where the resource has been quoted.

carstenhoyerklick · 2024-06-20T14:58:46Z

> I understand "isMetadataFor" such that by default it would describe the resource on the OEP. In other words the key would be a duplicate "id" most of the time. Therefore on the OEP it should basically be hidden virtually all the time.

I think we should think beyond the OEP here. On the OEP it duplicates the id, but on other repositories it may not. I think it is fair to hide it on the OEP.
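
Hiding the key only where it is redundant could be sketched as below. This is an illustration under assumptions, not OEP code: the function name `visible_trace_fields` is hypothetical, and it assumes that `id` and the `trace` object sit side by side on resource level, as in the structure proposed earlier in this thread.

```python
def visible_trace_fields(resource: dict) -> dict:
    """Return the trace fields to display, hiding a redundant isMetadataFor.

    Hypothetical helper: the key is dropped only when it repeats the
    resource id, so other repositories still see a differing value.
    """
    trace = dict(resource.get("trace", {}))  # copy, leave the input untouched
    if trace.get("isMetadataFor") == resource.get("id"):
        trace.pop("isMetadataFor", None)
    return trace
```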

> There is no explanation for "locationPreview". Can you help out @carstenhoyerklick ?

According to the HMC document (HMC Kernel Information Profile, page 22), it is a web-resolvable pointer to a preview, e.g. a low-resolution image of the object referenced. It comes from an RDA recommendation.

This may be relevant for non-tabular data, e.g. GIS data sets, which can be connected to a preview.

> "underEmbargoUntil" can go next to date. It's a bit awkward to implement, because one turns into the other, ideally, but if it's not actually published on the planned date there has to be a logic on the OEP to deal with that.

I think it is safe to ignore it on the OEP, as it takes only published data. But it may be relevant for other platforms.
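
For a platform where the embargo date does matter, one way to deal with a planned date that passes without an actual publication is to derive a status from both values instead of letting one silently turn into the other. A minimal sketch, assuming ISO 8601 date strings; the function name `embargo_status`, the `publication_date` parameter and the status labels are hypothetical and not part of OEMetadata or the OEP.

```python
from datetime import date
from typing import Optional


def embargo_status(under_embargo_until: str, publication_date: str = "",
                   today: Optional[date] = None) -> str:
    """Classify a resource from its embargo date and publication date.

    Hypothetical helper: both dates are ISO 8601 strings (YYYY-MM-DD),
    an empty string means "not set".
    """
    today = today or date.today()
    if publication_date:
        # An actual publication date always wins over the planned one.
        return "published"
    if not under_embargo_until:
        return "unpublished"
    planned = date.fromisoformat(under_embargo_until)
    if today < planned:
        return "under embargo"
    # The planned date has passed, but nothing was published.
    return "embargo expired, not published"
```

The last case is the one that needs explicit handling: the resource is neither published nor still covered by a future embargo date.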

> @carsten can you maybe elaborate on the "wasQuotedFrom" field? What's the difference between sources? Does this concern the entire dataset (i.e. this whole table is actually a quote from another resource) or is it meant to reflect sources for parts of the data. Maybe another key within sources or a redefinition to a URI would help here. I assume it's not a "quotedBy" that lists where the resource has been quoted.

It means that the documented data set is quoted in another data set. It is also an RDA recommendation. For example, the documented data set could be a subset of a larger data set which has been divided. wasQuotedFrom could then point to an umbrella data set which references this data set as a subset. It is a kind of backpointer.

carstenhoyerklick · 2024-06-20T15:44:13Z

> @jh-RLI and I sorted through the new tags and came up with a structure. We thought it made sense to group almost all new keys together on resource level.

I thought about it for a while and I think we have to consider this carefully.

Some of the keys, such as alternateOf, checksum, digitalObjectLocationAccessProtocol or digitalObjectType, may belong more in the general part.

We have to think about what are sources and what are revisions. In general, if a data set is revised, the original data set is a source.
But you could also think of sources as the data sets that were used to produce the data set: the new data set has been created by a fusion/modeling process and these are its data sources. These sources may have very different characteristics than the target data set.

Revisions are a bit different. The characteristics of the data stay basically the same. A revision may also change some of the structures of the data.

The Helmholtz Kernel Information Profile differentiates between different types of sources. wasDerivedFrom is probably closest to the sources we have. specializationOf could be a subset of a larger data set, or something similar which makes this data set more specific than the original, or a data set that has been specifically enriched. wasRevisionOf is probably more of an update of the data set. The characteristics come from RDA or PROV-DM (the PROV Data Model). Therefore I think we cannot ignore these, but we have to find a way to handle the different source-target relations which come from PROV-DM.
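
To make the distinction concrete, the source-target relations could be kept as typed links rather than folded into a single list of sources. The sketch below is only an illustration under assumptions: the `Relation` class, the `group_by_kind` helper and the example URIs are hypothetical, and only the three relation names discussed in this comment are allowed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Relation names taken from this discussion; the set itself is an assumption.
RELATION_KINDS = {"wasDerivedFrom", "specializationOf", "wasRevisionOf"}


@dataclass(frozen=True)
class Relation:
    """A typed link from the documented data set to another resource."""
    kind: str    # one of RELATION_KINDS
    target: str  # identifier (e.g. a URI) of the related data set

    def __post_init__(self) -> None:
        if self.kind not in RELATION_KINDS:
            raise ValueError(f"unknown relation kind: {self.kind}")


def group_by_kind(relations: Iterable[Relation]) -> Dict[str, List[str]]:
    """Collect relation targets per kind, e.g. for writing them to metadata."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for relation in relations:
        grouped[relation.kind].append(relation.target)
    return dict(grouped)


relations = [
    Relation("wasDerivedFrom", "https://example.org/input-a"),
    Relation("wasDerivedFrom", "https://example.org/input-b"),
    Relation("wasRevisionOf", "https://example.org/dataset-v1"),
]
print(group_by_kind(relations))
```

Keeping the kind next to each target means that inputs of a fusion/modeling process and earlier revisions of the same data set do not end up in the same undifferentiated list.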

jh-RLI · 2024-10-11T07:55:18Z

@Ludee We should take another look at the last two comments.

carstenhoyerklick · 2024-10-11T08:02:46Z

I have implemented the Helmholtz KIP information a bit differently in the Open Transport Metadata. Maybe we could try to align this.

Ludee · 2024-10-11T21:28:12Z

From my point of view this is a huge overload of the metadata standard. One major principle of OEMetadata was to keep it as simple as possible. Each topic and key should be discussed separately in order to be added.
For now I will remove all keys because none of them are relevant to the OEP at the moment.


jh-RLI · 2024-10-23T14:52:32Z

I think most of the keys (the more technical ones) are already included in the metadata layer the MOSS tool provides on top of the oemetadata. They will be available as soon as the data is registered there. Keeping the oemetadata lean and then linking to other resources is a good idea, I think. For now, I think we can close this issue.

carstenhoyerklick · 2024-10-23T15:52:44Z

Fine with me.

christian-rli added the enhancement New feature or request label Feb 21, 2024

christian-rli assigned jh-RLI, christian-rli and carstenhoyerklick Feb 21, 2024

carstenhoyerklick assigned koubaa-hmc Feb 21, 2024

jh-RLI added a commit that referenced this issue Jun 18, 2024

update example for v2.0.0 #138

6fda5f7

- add new trace field for traceability

christian-rli added a commit that referenced this issue Jun 18, 2024

Add keys under 'trace' #138

867ef9d

christian-rli added a commit that referenced this issue Aug 15, 2024

Add provenance.json with Helmholtz KIP #138

25e48b3

christian-rli added a commit that referenced this issue Aug 15, 2024

Remove contribution.json as it's included in provenance #138

d039db4

christian-rli added a commit that referenced this issue Aug 15, 2024

Replace contributors with provenance in schema #138

589572f

christian-rli mentioned this issue Aug 15, 2024

Feature/helmholtz kernel profile 138 #150

Merged

jh-RLI closed this as completed in #150 Oct 4, 2024

jh-RLI reopened this Oct 11, 2024

Ludee added a commit that referenced this issue Oct 11, 2024

Remove hkip keys #138

762e48b

Ludee added a commit that referenced this issue Oct 11, 2024

Merge pull request #156 from OpenEnergyPlatform/feature-138-removeove…

22f1dc4

…rload Remove hkip keys #138

Ludee closed this as completed Oct 23, 2024
