feat(catalogue): change Samplesets to Sample collections and extend it #4375
base: master
- Some ontologies can be deduplicated with the existing ontologies
- Should specify codes in ontologies
For compatibility with Directory ontology MaterialTypes
I am still not sure why we are not using the Resources table for Sample collections. There are many overlapping attributes between Sample collections and Resources, and the additions could also be used by other resource types. It would also avoid adding two identical quality tables: one for biobanks (Resource qualities) and one for Sample collections (Sample collection qualities). In the cohort domain we use Resources for Networks/Consortia (resource type=Network) and link them via the Resources.resources ref_array. (I still need to look at the details of the attributes; will do that later on.)
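To make the suggestion above concrete, here is a rough sketch in Python of what "one Resources table with a type and a self-referencing `resources` ref_array" could look like. It is only an illustration: the `Resource` class, the `children` helper, the ids, and the idea of linking a biobank to its sample collections through the same `resources` field are assumptions for the sake of the example, not the actual schema in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical row of a single Resources table."""
    id: str
    type: str  # e.g. "Network", "Biobank", "Sample collection"
    # Stand-in for a ref_array column pointing back at the same table.
    resources: list[str] = field(default_factory=list)


def children(table: dict[str, Resource], parent_id: str) -> list[Resource]:
    """Resolve the ids in a parent's `resources` list to rows of the same table."""
    return [table[rid] for rid in table[parent_id].resources]


# Illustrative data only: one biobank with two sample collections.
table = {
    "bb1": Resource("bb1", "Biobank", resources=["sc1", "sc2"]),
    "sc1": Resource("sc1", "Sample collection"),
    "sc2": Resource("sc2", "Sample collection"),
}

linked = children(table, "bb1")
```

With this shape, a single quality table could reference `Resource.id` regardless of type, which is the deduplication argument made above. The trade-off is that every sample collection becomes a row at the same level as a biobank or cohort.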
The main reason we decided on a new table is that the biobank often has a coherent target population, whereas the collections are partitions of it, sometimes based on different types of samples, sometimes on different studies, populations, etc. So, compared to the current structure, Biobanks is more like Resources, and Collections is more like Subpopulations/Collection Events/Datasets. However, there are a lot of arguments for both options. The problem is that people use the structure of the data model in very different ways. Sometimes the Biobank is more like an Organisation/Network and Collections is a clearly defined population, which makes it more like a Resource. Other times, and I think more often, the Biobank itself forms a defined group/population and Collections are subsets of that group based on different criteria. The benefit of having a Sample Collections table is that it can cover all these usages, although this comes at the cost of some attribute duplication. Or more concretely: there are >3000 collections in the Directory, many of which don't contain much information at all (a lot are just called 'Main Collection', for example), and I'm not sure those should be represented on the same level as a Cohort study or Biobank. (The Collections table has 3735 entries, 699 withdrawn, leaving 3036 entries. Of these, 1901 are top-level collections, the remaining 1100 being (sub)subcollections. The potential for merging subcollections or converting them to facts is around 500, based on an initial count of duplicates over a small set of attributes.)
That being said, if we decide that mapping Collections to Resources is the best option after all, that's fine with me; most of the work should be transferable. As long as we don't keep going back and forth :)
Collections -> Collection Events? Make a proposal. So now we have three options: Resources/Sample Collections/Collection Events
What are the main changes you did:
How to test: