Helmholtz Kernel Information Profile integration #138
Comments
In chapter 3 there are a number of data fields which should be part of the metadata. We should look at the keys there and check which ones we are still missing. For most of them we have an equivalent in the OEMetadata; we should list a mapping there.
I started a mapping between the Helmholtz KIP and OE Metadata. https://docs.google.com/spreadsheets/d/1Q0tWNRujw3taKw4f-jUVjlFl2anZe38WtVpcD9IlO2s/edit?usp=sharing The legend is roughly:
Great, that's a very helpful first step. I will provide a full example of the new oemetadata version proposal in advance of our meeting next week.
Thank you for a helpful start @carstenhoyerklick . Do I understand correctly that the proposed course of action is to implement all the fields highlighted in yellow and ignore the orange ones, because they are covered by the databus? Or should we implement the orange ones as well, so that the information can be shown in the regular metadata? Either way it's quite a big extension of the current standard. Do we agree that all of the fields should show up there? If yes, I'm happy to implement them in the example files and schemas.
My personal preference would be to implement them all and to use the metadata string as a master source to populate the databus. On the other hand, if you keep information in two places there is a danger of contradicting information. That again might be a reason to have it all in the metadata string, as an authoritative source.
@jh-RLI and I agree. We will implement them in the next version. The resulting list of fields will be quite long and intimidating. Therefore we also decided that the tooling will take that into account. The conversion and export software will return the metadata with only the populated fields by default - empty fields are provided optionally.
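To illustrate the "populated fields only by default" behaviour described above, here is a minimal sketch. It is not the actual conversion or export tooling: the function names `prune_empty` and `export_metadata`, the `include_empty` flag, and the rule for what counts as "empty" (`None`, empty string, empty list, empty object) are all assumptions made for this example.

```python
from typing import Any


def _is_empty(value: Any) -> bool:
    # Assumption: "empty" means None, "", [] or {}. Values such as 0 or False are kept.
    return value is None or value == "" or value == [] or value == {}


def prune_empty(value: Any) -> Any:
    """Recursively drop empty entries from nested dicts and lists."""
    if isinstance(value, dict):
        pruned = {key: prune_empty(item) for key, item in value.items()}
        return {key: item for key, item in pruned.items() if not _is_empty(item)}
    if isinstance(value, list):
        pruned_items = [prune_empty(item) for item in value]
        return [item for item in pruned_items if not _is_empty(item)]
    return value


def export_metadata(metadata: dict, include_empty: bool = False) -> dict:
    """Return only populated fields by default; the full structure on request."""
    return metadata if include_empty else prune_empty(metadata)


if __name__ == "__main__":
    # Hypothetical metadata fragment; key names are for illustration only.
    example = {
        "name": "example_table",
        "title": "",
        "trace": {"isMetadataFor": None, "wasQuotedFrom": []},
        "resources": [{"name": "t1", "description": ""}],
    }
    print(export_metadata(example))
```

With a rule like this, a nested object that contains only empty keys disappears entirely from the default export, which keeps a long key list from showing up for users who have not filled it in.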
- add new trace field for traceability
@jh-RLI and I sorted through the new tags and came up with a structure. We thought it made sense to group almost all new keys together on resource level.
The key name is open for debate. We were looking for something that encompasses things you would need for provenance and reproducibility. Other candidates were 'track', 'trail', 'linked data' or 'provenance'. Currently we like 'trace', but feel free to convince us otherwise.
Other notes:
- I understand "isMetadataFor" such that by default it would describe the resource on the OEP. In other words, the key would duplicate "id" most of the time. Therefore on the OEP it should be hidden virtually all the time.
- There is no explanation for "locationPreview". Can you help out @carstenhoyerklick ?
- "underEmbargoUntil" can go next to date. It's a bit awkward to implement, because ideally one turns into the other, but if the data is not actually published on the planned date there has to be some logic on the OEP to deal with that.
- @carsten can you maybe elaborate on the "wasQuotedFrom" field? How does it differ from sources? Does it concern the entire dataset (i.e. this whole table is actually a quote from another resource) or is it meant to reflect sources for parts of the data? Maybe another key within sources or a redefinition to a URI would help here. I assume it's not a "quotedBy" that lists where the resource has been quoted.
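As a rough sketch of the kind of logic the "underEmbargoUntil" note refers to, the function below derives a status from a planned embargo end date and an actual publication date. Nothing here is agreed or implemented: the function name `embargo_status`, the status labels, and the assumption that both dates are ISO 8601 strings (`YYYY-MM-DD`) are hypothetical.

```python
from datetime import date
from typing import Optional


def embargo_status(
    under_embargo_until: Optional[str],
    publication_date: Optional[str],
    today: date,
) -> str:
    """Classify a resource by its embargo and publication dates (hypothetical labels)."""
    if publication_date is not None:
        # An actual publication date takes precedence over the planned embargo end.
        return "published"
    if under_embargo_until is None:
        return "unpublished"
    if date.fromisoformat(under_embargo_until) > today:
        return "embargoed"
    # The planned date has passed but no publication date was recorded.
    return "embargo_lapsed"


if __name__ == "__main__":
    print(embargo_status("2030-01-01", None, date(2024, 6, 1)))
    print(embargo_status("2020-01-01", None, date(2024, 6, 1)))
```

The last branch is the awkward case from the note: the embargo date has passed without the data being published, so a platform would have to flag the entry rather than silently treat it as published.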
I think we should think beyond the OEP here. On the OEP it duplicates the id, but in other repositories it may not. I think it is fair to hide it on the OEP.
This may be relevant for non-tabular data, e.g. GIS data sets, which can be connected to a preview.
I think it is safe to ignore it on the OEP, as it only takes published data. But it may be relevant for other platforms.
What it means is that the documented data set is quoted in another data set. It is also an RDA recommendation. It could be that the documented data set is a subset of a larger data set which has been divided. IsQuotedFrom could be an umbrella data set which references this data set as a subset. It is a kind of backpointer.
I thought about it for a while and I think we need to consider this carefully. We have to think about what counts as a source and what counts as a revision. In general, if a data set is revised, the original data set is a source. Revisions are a bit different: the characteristics of the data stay basically the same, although a revision may also change some of the structures of the data. The Helmholtz Kernel Information Profile differentiates between different types of sources.
@Ludee We should take another look at the last two comments.
From my point of view this is a huge overload of the metadata standard. One major principle of OEMetadata was to keep it as simple as possible. Each topic and key should be discussed separately before being added.
I think most of the keys (the more technical ones) are already included in the metadata layer that the MOSS tool provides on top of the oemetadata. They will be available as soon as the data is registered there. Keeping the oemetadata leaner and linking to other resources is a good idea, I think. For now, I think we can close this issue.
Fine with me.
Description of the issue
As pointed out by @carstenhoyerklick there should be a way to handle the Helmholtz Kernel Information Profile in oemetadata or to at least map it.
Ideas of solution
New field? Reference in an existing one? Please discuss.
Workflow checklist