We currently have no reasonable way of documenting the transformation steps applied to the referenced sources. So far, I have been adding my transformation scripts and software as additional sources. I don't think this is very useful, because there is no way of associating an added script with the source it transforms.
Some data sources are in well-documented, standardized formats, for example tabular data and RDF graphs. These can be transformed with query languages such as SQL and SPARQL, so the transformations can be documented in those languages themselves! For data without such a standard, like Excel files, the transformation can be documented by adding URLs to the repositories that transform them; such a repository could contain a Python script, for example.
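To illustrate how a query can double as documentation, here is a minimal sketch using Python's standard-library `sqlite3` module. The table `raw_plants`, its columns and the kW-to-MW conversion are hypothetical; the point is only that the SQL string kept in the metadata is both the executable step and its description.

```python
import sqlite3

# Hypothetical transformation step: the SQL text itself documents what was done.
TRANSFORMATION_SQL = """
SELECT name, capacity_kw / 1000.0 AS capacity_mw
FROM raw_plants
WHERE capacity_kw IS NOT NULL
ORDER BY name
"""


def apply_sql_transformation(rows, sql=TRANSFORMATION_SQL):
    """Load hypothetical raw rows into an in-memory table and run the stored query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE raw_plants (name TEXT, capacity_kw REAL)")
        conn.executemany("INSERT INTO raw_plants VALUES (?, ?)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    raw = [("plant_b", 2500.0), ("plant_a", 1200.0), ("plant_c", None)]
    print(apply_sql_transformation(raw))  # [('plant_a', 1.2), ('plant_b', 2.5)]
```

Storing `TRANSFORMATION_SQL` verbatim would let a reader both read the step and re-run it against the original source.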
Ideas for a solution
I propose adding a new property to the source items, named `transformations` or something similar, that describes the operations applied to convert the original resource. `transformations` should be a list of items with the properties `path`, `name`, `title`, `description`, `query`/`code` and `resource`, where each item has either a `path` or a `query`/`code` property. When more than one item is provided, the output of each item is understood to be the input of the next, and the last item produces a resource in the current metadata; that resource should be referred to by its name.
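A sketch of what the proposed property could look like, written as a Python dict, together with a small consistency check. The key names follow the proposal, but the example values and URLs (`example.org`), the acceptance of both `query` and `code`, and the reading of "either ... or" as "exactly one" are assumptions; `transformations` is not part of any released OEMetadata version.

```python
from typing import Any

# Hypothetical OEMetadata fragment following the proposal above.
EXAMPLE_METADATA: dict[str, Any] = {
    "resources": [{"name": "power_plants_mw"}],
    "sources": [
        {
            "title": "Raw power plant list",
            "path": "https://example.org/raw_plants.xlsx",
            "transformations": [
                {
                    "name": "clean_excel",
                    "title": "Clean the raw spreadsheet",
                    "description": "Python script that normalises column names",
                    "path": "https://example.org/repo/clean.py",
                },
                {
                    "name": "to_mw",
                    "title": "Convert kW to MW",
                    "description": "Divides capacity by 1000",
                    "query": "SELECT name, capacity_kw / 1000.0 AS capacity_mw FROM raw_plants",
                    "resource": "power_plants_mw",
                },
            ],
        }
    ],
}


def validate_transformations(metadata: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the chains look consistent."""
    errors: list[str] = []
    resource_names = {r.get("name") for r in metadata.get("resources", [])}
    for s_idx, source in enumerate(metadata.get("sources", [])):
        steps = source.get("transformations", [])
        for t_idx, step in enumerate(steps):
            where = f"sources[{s_idx}].transformations[{t_idx}]"
            has_path = "path" in step
            has_code = "query" in step or "code" in step
            # Assumed rule: a step points to a script OR carries inline query/code.
            if has_path == has_code:
                errors.append(f"{where}: needs exactly one of 'path' or 'query'/'code'")
        # Steps are applied in list order; the last one must name a resource here.
        if steps:
            target = steps[-1].get("resource")
            if target not in resource_names:
                errors.append(f"sources[{s_idx}]: last step must reference a resource by name, got {target!r}")
    return errors
```

Keeping the chain as an ordered list means the order itself documents the data flow, so no explicit input/output links between steps are needed.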
I do not know how risky it is to put SQL into an OEMetadata instance. That can be discussed; if it really is a problem, we can take the necessary precautions.
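One possible precaution, sketched with Python's `sqlite3` module as an assumed execution environment: if a consumer ever executes a query copied from the metadata, it can install an authorizer that permits only reads. This is not a complete sandbox (it does not limit runtime, for instance), and the proposal itself does not say queries would be executed this way; `restrict_to_reads` is a hypothetical helper name.

```python
import sqlite3

# Authorizer action codes that are allowed: SELECT statements, column reads and
# function calls. Anything else (INSERT, DROP, ATTACH, ...) is denied when the
# statement is compiled, so it never runs.
_ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}


def _read_only_authorizer(action, arg1, arg2, db_name, trigger_name):
    return sqlite3.SQLITE_OK if action in _ALLOWED_ACTIONS else sqlite3.SQLITE_DENY


def restrict_to_reads(conn: sqlite3.Connection) -> None:
    """Make this connection refuse any statement that would modify the database."""
    conn.set_authorizer(_read_only_authorizer)
```

Use a dedicated connection for untrusted queries: once restricted, a denied statement raises `sqlite3.DatabaseError` ("not authorized") instead of executing.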
The reason I closed #84 was that, although referencing software and its versions is nice, without concrete steps describing what was done with the software, anyone trying to interpret the dataset has no information on how the original data was modified.