Add property for documentation of data transformation to sources #134

Open
1 task done
areleu opened this issue Jun 21, 2023 · 1 comment
Labels
enhancement New feature or request


areleu commented Jun 21, 2023

Description of the issue

We currently have no reasonable way of documenting the transformation steps applied to the referenced sources. So far I have been adding my transformation scripts and software as additional sources. I do not think this is very useful, because there is no way to associate an added script with the source it transforms.

Some data sources come in well-documented, standardized formats, for example tabular data and RDF graphs. These can be transformed using query languages such as SQL and SPARQL, and the transformations can be documented using the queries themselves! For unstructured data such as Excel files, the documentation can be done by adding URLs to the repositories that contain the code transforming them, for example a Python script.
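To illustrate the idea that a query can document a transformation by itself, here is a minimal sketch using Python's built-in sqlite3 module. The table name raw_plants, its columns and the sample rows are hypothetical and only serve the example; nothing here is part of OEMetadata.

```python
import sqlite3

# Hypothetical SQL text that would be stored in the metadata as documentation.
# The table and column names are assumptions made up for this sketch.
TRANSFORMATION_QUERY = """
SELECT region, SUM(capacity_mw) AS total_capacity_mw
FROM raw_plants
GROUP BY region
ORDER BY region
"""


def apply_transformation(rows):
    """Run the documented query against an in-memory copy of the source rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE raw_plants (region TEXT, capacity_mw REAL)")
        con.executemany("INSERT INTO raw_plants VALUES (?, ?)", rows)
        return con.execute(TRANSFORMATION_QUERY).fetchall()
    finally:
        con.close()


# Hypothetical source data standing in for the original resource.
source_rows = [("north", 10.0), ("south", 5.0), ("north", 2.5)]
result = apply_transformation(source_rows)
```

The same string that a reader sees in the metadata is the one that reproduces the derived table, so the documentation and the transformation cannot drift apart.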

Ideas of solution

I propose adding a new property to the sources items, named transformations or something similar, that describes the operations performed to convert the original resource. transformations should be a list of items with the properties path, name, title, description, query/code and resource, where each item should have either a path or a query/code property. When more than one item is provided, it is understood that the output of each item is passed to the next one, and that the last item produces a resource in the current metadata; that resource should be referred to by its name.
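A rough sketch of what such a sources item could look like, written as a Python dict together with a small check of the two rules described above. The key query stands in for the proposed query/code property, and all URLs, names and values are hypothetical. The check only requires that a path or a query is present; whether both may appear together is left open here, as it is in the proposal.

```python
def validate_transformations(source, resource_names):
    """Return a list of problems found in a source's 'transformations' list."""
    errors = []
    steps = source.get("transformations", [])
    for index, step in enumerate(steps):
        # Each step needs a path or a query (assumed key for "query/code").
        if not (step.get("path") or step.get("query")):
            errors.append(f"step {index}: needs a 'path' or a 'query'")
    if steps:
        # The last step must name a resource that exists in this metadata.
        target = steps[-1].get("resource")
        if target not in resource_names:
            errors.append(f"last step: resource {target!r} is not in this metadata")
    return errors


# Hypothetical sources item; every value below is made up for illustration.
example_source = {
    "title": "Example raw dataset",
    "transformations": [
        {
            "name": "clean_excel",
            "title": "Clean the Excel export",
            "description": "Script that converts the Excel file to a table.",
            "path": "https://example.org/repo/clean_excel.py",
        },
        {
            "name": "aggregate",
            "title": "Aggregate by region",
            "description": "Sums capacity per region.",
            "query": "SELECT region, SUM(capacity_mw) FROM raw_plants GROUP BY region",
            "resource": "plants_by_region",
        },
    ],
}

problems = validate_transformations(example_source, {"plants_by_region"})
```

The order of the list encodes the chain: the output of clean_excel feeds aggregate, and only the last step carries the resource name.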

I do not know how risky it is to add SQL to an instance of the OEMetadata. That can be discussed; if it really is a problem, we can take the necessary precautions.

Workflow checklist

areleu added the enhancement (New feature or request) label Jun 21, 2023

areleu commented Jun 21, 2023

The reason I closed #84 was that, although referencing software and its versions is nice, not having the concrete steps of what was done with the software leaves anyone trying to interpret the dataset without information on how the original data was modified.
