Add property for documentation of data transformation to sources #134

areleu · 2023-06-21T08:14:32Z

Description of the issue

We currently have no reasonable way of documenting the transformation steps applied to the referenced sources. So far I have been adding my transformation scripts and software as a further source. I do not think this is very useful, as there is no way of associating the added scripts with their respective source.

Some data sources are in formats that are well documented and standardized, for example tabular data and RDF graphs. These can be transformed using query languages such as SQL and SPARQL, and the transformations can be documented using the languages themselves! For unstructured data like Excel files, the documentation can be done by adding URLs to the repositories that hold the code transforming them, for example a Python script.
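To illustrate what I mean by the query documenting itself, here is a minimal sketch using Python's built-in sqlite3 module. The table `plants`, its columns and the sample rows are invented for the example and are not part of any real dataset; the point is only that the `QUERY` string is both the executable step and its documentation.

```python
import sqlite3

# Hypothetical transformation: the SQL text is both the documentation
# of the step and the step itself. Table and column names are invented.
QUERY = """
SELECT region, SUM(capacity_mw) AS total_capacity_mw
FROM plants
GROUP BY region
ORDER BY region
"""


def run_transformation(rows):
    """Load rows into an in-memory table and apply QUERY to them."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE plants (region TEXT, capacity_mw REAL)")
    con.executemany("INSERT INTO plants VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


# Invented sample data standing in for the original source.
result = run_transformation([("north", 10.0), ("south", 5.0), ("north", 2.5)])
```

Anyone reading the metadata could rerun the same string against the original source and check that they get the published resource.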

Ideas of solution

I propose adding a new property to the sources items, namely transformations or something similar, that refers to the operations done to convert the original resource. The transformations should be a list of items with the properties path, name, title, description, query/code and resource, where each item should have either a path or a query/code property. When more than one item is provided, it is understood that the output of the first item is passed to the second and that the last item produces a resource in the current metadata; the latter should be referred to by its name.
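A rough sketch of how such an entry could look, written as a Python dict with a small check of the rules above. Nothing here is existing OEMetadata: the key names follow the proposal, all values and URLs are invented, `validate_transformations` is a hypothetical helper, and "either a path or a query/code" is read here as exactly one of the two, which is an assumption open for discussion.

```python
# Hypothetical sources item; key names follow the proposal, values are invented.
source = {
    "title": "Example raw dataset",
    "path": "https://example.org/raw.csv",
    "transformations": [
        {
            "name": "filter_year",
            "title": "Keep 2020 rows",
            "description": "Drops all rows outside 2020.",
            "query": "SELECT * FROM raw WHERE year = 2020",
        },
        {
            "name": "aggregate",
            "title": "Aggregate by region",
            "description": "Sums values per region with a script.",
            "path": "https://example.org/repo/aggregate.py",
            "resource": "example_table",
        },
    ],
}


def validate_transformations(source, resource_names):
    """Return a list of error messages; an empty list means the item passes."""
    errors = []
    steps = source.get("transformations", [])
    for index, step in enumerate(steps):
        has_path = "path" in step
        has_code = "query" in step or "code" in step
        # Assumption: exactly one of path or query/code per item.
        if has_path == has_code:
            errors.append(f"item {index}: needs exactly one of path or query/code")
    # The last item must name a resource that exists in the current metadata.
    if steps and steps[-1].get("resource") not in resource_names:
        errors.append("last item: resource does not match a known resource name")
    return errors


errors = validate_transformations(source, {"example_table"})
```

The items are ordered, so the list itself encodes the chain: the first item reads the original source and the last one yields the named resource.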

I do not know how risky it is to add SQL to an instance of the OEMetadata. That can be discussed; if it really is a problem, we can take the necessary precautions.

Workflow checklist

I am aware of the workflow in CONTRIBUTING.md


areleu · 2023-06-21T08:41:39Z

The reason I closed #84 was that, although referencing software and its versions is nice, not having concrete steps on what was done with the software leaves anyone trying to interpret the dataset without information on how the original data was modified.

areleu added the enhancement (New feature or request) label Jun 21, 2023

areleu mentioned this issue Jun 21, 2023

Software provenance #84

Closed

areleu commented Jun 21, 2023 • edited by Ludee