Rework Data Nodes #93

HLWeil · 2024-01-24T09:54:18Z

Data Selectors

This PR adds the specification for annotating not only full data resources but also parts of them. To do so, a selector can be appended to the resource location, separated by a #.

This design is heavily inspired by the fragment selectors found in URLs and has a two-fold advantage over the solution proposed in #80 (comment), where the selector is moved into another column:

In standard cases, a single column suffices, making the information more compact (and e.g. more easily copyable)
This more closely resembles URIs, potentially being more intuitive for data annotation experts

To support non-standard cases and increase verbosity, two qualifying columns were added, closely following the proposal made by @stain in ISA-tools/isa-specs#15 (comment). This is in line with Schema.org/CreativeWork, and I hope it increases compatibility with RO-Crate.
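
As a rough illustration of the "resource location plus # plus selector" idea, a value could be split like this in Python. This is only a sketch under assumptions, not part of the specification: the name `split_data_ref`, the example path, and the selector value `col=3` (written in the style of CSV fragment identifiers) are placeholders, and splitting at the first # mirrors how URL fragments behave rather than anything stated in this PR.

```python
from typing import NamedTuple, Optional


class DataRef(NamedTuple):
    """Hypothetical container for a data cell value."""
    resource: str            # location of the data resource
    selector: Optional[str]  # part of the resource, or None for the whole resource


def split_data_ref(value: str) -> DataRef:
    """Split 'resource#selector' at the first '#' (assumed, as in URLs)."""
    resource, sep, selector = value.partition("#")
    # Without a '#', the value refers to the full data resource.
    return DataRef(resource, selector if sep else None)


# Placeholder values for illustration only.
print(split_data_ref("assays/measurement1/dataset/table.csv#col=3"))
print(split_data_ref("assays/measurement1/dataset/table.csv"))
```

In the standard case a single cell carries both pieces of information, which is what keeps the one-column form compact and easy to copy.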

Data Category annotation

Additionally, for specifying the Input and Output of an annotation table, I removed all distinctions based on the content of the data resource (Raw Data File, Derived Data File and Image File). This is in line with many discussions on this topic, which concluded that the distinction is rather artificial. I also decided against Data File and Data Directory: this distinction again tries to add information but, by design, excludes cases that do not fall under either category.

Any input would be welcome
@kappe-c @chgarth @muehlhaus @Brilator

Brilator · 2024-01-24T14:55:52Z

I understand this as a nice additional feature, not a must.

should pair well with the isa.dataset.xlsx / data dictionary discussed, without making it obsolete
makes the flow to data node more explicit

A little off topic, but I'm wondering: wouldn't it then be consistent to remove the "artificial" complexity from Source / Sample / Material nodes as well (plus adding a similar layer to allow annotating the type of sample, just like the format of data)?

HLWeil · 2024-01-25T08:28:02Z

I understand this as a nice additional feature, not a must.

I've heard this kind of comment a few times now. IMO in order to actually produce a machine actionable representation of a research cycle, this is definitely a MUST. If this is not given, associating data to the samples it was measured from will remain implicit.

With all the other points I agree.

HLWeil · 2024-01-25T08:30:34Z

We will need some great tooling though to allow both programmers and wet lab researchers to create these selectors without much hassle.

Brilator · 2024-01-25T08:35:53Z

produce a machine actionable representation of a research cycle, this is definitely a MUST

Totally agree. I just thought that's what the ISA extension with isa.dataset.xlsx is good for

HLWeil · 2024-01-25T12:25:46Z

Totally agree. I just thought that's what the ISA extension with isa.dataset.xlsx is good for

The selector will be part of both the annotation table (assay and study files) and the dataset table (dataset file). In the annotation table, its main purpose is to connect the data fragments to the samples, basically anchoring them in the process graph.
The dataset table, on the other hand, is then used to add further annotations about the fragments in the data files. It's not about the "from where" and "to where" but more about the "what".

So the two additions will work together but do not fulfil the same task.

kappe-c · 2024-01-26T12:46:53Z

Having talked about this with @HLWeil in person, I agree with this approach.
Commenting on the remarks about the separate dataset file, it was my understanding that several samples may "end up" in (better: contribute to), e.g., the same column in a tabular data file. That, I think, is another reason for the "orthogonal" dataset file: then every file fragment needs to be described only once (as one row in the dataset file, instead of a column or more in the assay file, that would potentially have the same value for several rows (=samples) – tedious and error-prone).
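
To make the "described only once" point concrete, here is a small Python sketch of how I picture it. It is purely illustrative: the column names, sample names, file name and selector values are made up, and the actual layout of the dataset file is not defined in this thread.

```python
from collections import defaultdict

# Annotation table (assay file): one row per sample, each pointing at a fragment.
# Several samples may contribute to the same fragment.
annotation_rows = [
    {"sample": "sample_1", "data": "result.csv#col=2"},
    {"sample": "sample_2", "data": "result.csv#col=2"},
    {"sample": "sample_3", "data": "result.csv#col=3"},
]

# Dataset table (dataset file): exactly one row per fragment, holding the "what".
dataset_rows = {
    "result.csv#col=2": {"description": "hypothetical description A"},
    "result.csv#col=3": {"description": "hypothetical description B"},
}


def samples_per_fragment(rows):
    """Group sample names by the data fragment they contribute to."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["data"]].append(row["sample"])
    return dict(groups)


groups = samples_per_fragment(annotation_rows)
for fragment, samples in groups.items():
    # The description is looked up once per fragment, not repeated per sample.
    print(fragment, samples, dataset_rows[fragment]["description"])
```

Had the description lived in the assay file instead, the same value would have to be repeated in the rows for sample_1 and sample_2.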

HLWeil · 2024-02-08T12:13:21Z

Thanks for your input @Brilator & @kappe-c!
Will merge now.

HLWeil added 4 commits January 23, 2024 17:48

start inclusion of data selectors with some links

b336710

finish up first draft of new data columns

91644ed

adjust examples according to data specification changes

6a4eefa

change Data Selector to Data Selector Format

1f6e246

HLWeil merged commit 604a083 into v2.0.0 Feb 8, 2024

Freymaurer mentioned this pull request Feb 21, 2024

ISA light #97

Closed

HLWeil mentioned this pull request May 6, 2024

Rework data nodes nfdi4plants/ARCtrl#356

Merged

HLWeil mentioned this pull request Jun 6, 2024

Include Changes for specification V2.0.0 #105

Merged

HLWeil deleted the selector branch October 29, 2024 12:23
