feat(catalogue): change Samplesets to Sample collections and extend it #4375

Open
wants to merge 36 commits into
base: master

Conversation

hslh
Contributor

@hslh hslh commented Oct 17, 2024

What are the main changes you made:

  • Renamed and extended data table: Samplesets -> Sample collections
  • New ontology table: Sample collection designs following MIABIS v3
  • New ontology table: Sample collection settings following MIABIS v3
  • Extended ontology table: Dataset types following MIABIS v3
  • New ontology table: Sex types following MIABIS v3
  • New ontology table: Storage temperatures following MIABIS v3
  • New ontology table: Body parts following DirectoryOntologies/BodyParts
  • New ontology table: Imaging modalities following DirectoryOntologies/ImagingModalities
  • New ontology table: Image types following DirectoryOntologies/ImageDatasetTypes
  • New data table: Sample collection counts following BBMRI-ERIC's CollectionFacts
  • New data table: Resource qualities following BBMRI-ERIC's QualityInfoBiobanks tables
  • New data table: Sample collection qualities following BBMRI-ERIC's QualityInfoCollections tables
  • New ontology table: Quality standards following BBMRI-ERIC's QualityStandards
  • New ontology table: Assessment levels following BBMRI-ERIC's AssessmentLevels
  • New ontology table: Research domains following BBMRI-ERIC's Categories
  • Extended ontology table: BiospecimenType

How to test:

  • create a new schema using the data model from this pull request, including the ontology tables
  • add a Sample collection, with all fields filled in, including counts and qualities
  • add quality information to a Resource

@hslh hslh changed the title from "feat(catalogue): change Samplesets to Sample collections and update" to "feat(catalogue): change Samplesets to Sample collections and extend it" Oct 17, 2024
@hslh hslh marked this pull request as ready for review November 21, 2024 14:14
@hslh hslh marked this pull request as draft November 22, 2024 14:54
@hslh hslh marked this pull request as ready for review November 22, 2024 15:08
Member

@esthervanenckevort esthervanenckevort left a comment


  • Some ontologies can be deduplicated with the existing ontologies
  • Should specify codes in ontologies

data/_ontologies/Sex types.csv (outdated, resolved)
For compatibility with Directory ontology MaterialTypes
@BrendaHijmans
Contributor

BrendaHijmans commented Dec 6, 2024

I am still not sure why we are not using the Resources table for Sample collections. There are many overlapping attributes between Sample collections and Resources, and the additions can also be used by other resource types. This would also solve adding two identical quality tables: one for biobanks (Resource qualities) and one for Sample collections (Sample collection qualities).

In the cohort domain we use Resources for Networks/Consortia (resource type=Network) and link them via Resources.resources ref_array.
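
A minimal sketch of that linking pattern, for illustration only. The `Resource` class, the type labels, the example ids, and the direction of the link are assumptions, not the actual catalogue schema. It models `Resources.resources` as a list of references to other rows in the same table and shows how a Sample collection could be stored as just another resource type.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """One row of a hypothetical, simplified Resources table."""
    id: str
    type: str  # e.g. "Network", "Biobank", "Sample collection" (labels assumed)
    # Stand-in for the Resources.resources ref_array: ids of linked rows
    # in the same table. The link direction (parent -> members) is assumed.
    resources: list[str] = field(default_factory=list)


def linked(table: dict[str, Resource], parent_id: str) -> list[Resource]:
    """Resolve the ref_array of one resource into the referenced rows."""
    return [table[ref] for ref in table[parent_id].resources]


# Made-up example rows: a biobank that links two sample collections.
rows = [
    Resource("bb1", "Biobank", resources=["col1", "col2"]),
    Resource("col1", "Sample collection"),
    Resource("col2", "Sample collection"),
]
table = {row.id: row for row in rows}
children = linked(table, "bb1")
```

With a single table like this, one quality table keyed on the resource id would serve both biobanks and sample collections.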

(Still need to look at details of attributes, will do that later on)

@hslh
Contributor Author

hslh commented Dec 16, 2024

> I am still not sure why we are not using the Resources table for Sample collections. There are many overlapping attributes between Sample collections and Resources, and the additions can also be used by other resource types. This would also solve adding two identical quality tables: one for biobanks (Resource qualities) and one for Sample collections (Sample collection qualities).
>
> In the cohort domain we use Resources for Networks/Consortia (resource type=Network) and link them via Resources.resources ref_array.

The main reason we decided on a new table is that often the biobank has a coherent target population, whereas the collections are partitions of that, sometimes based on different types of samples, sometimes on different studies, populations, etc. So, comparing this to the current structure, Biobanks is more like Resources, and Collections is more like Subpopulations/Collection Events/Datasets.

However, there are a lot of arguments for both options. The problem is that people use the structure of the data model in very different ways. Sometimes the Biobank is more like an Organisation/Network and Collections is a clearly defined population, which makes it more like a Resource. Other times, and I think more often, the Biobank itself forms a defined group/population and Collections are subsets of that group based on different criteria. The benefit of having a separate Sample collections table is that it can cover all of these usages, although this comes at the cost of some attribute duplication.
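
To make the duplication cost concrete, here is a rough sketch of the separate-table option. All field names in both classes are hypothetical and much shorter than the real tables. It only shows how the overlap between the two tables could be listed.

```python
from dataclasses import dataclass, fields


@dataclass
class Resource:
    """Hypothetical, heavily simplified Resources row (e.g. a biobank)."""
    id: str
    name: str
    description: str
    type: str


@dataclass
class SampleCollection:
    """Hypothetical Sample collections row that partitions one Resource."""
    id: str
    name: str
    description: str
    resource: str      # id of the owning Resource (assumed column name)
    sample_types: str  # assumed collection-specific attribute


def shared_attributes(a, b) -> set[str]:
    """Return the attribute names that appear in both tables."""
    return {f.name for f in fields(a)} & {f.name for f in fields(b)}


duplicated = shared_attributes(Resource, SampleCollection)
```

In this toy example `duplicated` contains `id`, `name` and `description`; the same comparison on the real tables would show how much is actually repeated.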

Or more concretely: there are >3000 collections in the Directory, many of which don't contain much information at all (a lot are just called 'Main Collection', for example), and I'm not sure those should be represented on the same level as a Cohort study or Biobank.

(The Collections table has 3735 entries, of which 699 are withdrawn, leaving 3036 entries. Of these, 1901 are top-level collections and the remaining roughly 1100 are (sub)subcollections. Around 500 subcollections could potentially be merged or converted to facts, according to an initial count of duplicates over a small set of attributes.)
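
A duplicate count of that kind could be approximated along these lines. This is only a sketch: the attribute names in `KEY_ATTRIBUTES` and the sample rows are hypothetical, and the actual count may have used a different attribute set.

```python
from collections import Counter

# Hypothetical "small set of attributes" used to decide that two
# subcollections look alike; the real attribute set is not given.
KEY_ATTRIBUTES = ("parent", "material_type", "diagnosis")


def count_merge_candidates(subcollections: list[dict]) -> int:
    """Count rows that share their key attributes with another row.

    A group of n look-alike rows contributes n - 1 candidates, because one
    row would remain after merging.
    """
    groups = Counter(
        tuple(row.get(attr) for attr in KEY_ATTRIBUTES) for row in subcollections
    )
    return sum(size - 1 for size in groups.values())


# Made-up rows for illustration only.
example = [
    {"id": "a", "parent": "p1", "material_type": "DNA", "diagnosis": "C50"},
    {"id": "b", "parent": "p1", "material_type": "DNA", "diagnosis": "C50"},
    {"id": "c", "parent": "p1", "material_type": "Serum", "diagnosis": "C50"},
    {"id": "d", "parent": "p2", "material_type": "DNA", "diagnosis": "C50"},
]
candidates = count_merge_candidates(example)
```

Here only rows `a` and `b` match on all three attributes, so one row is a merge candidate.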

@hslh
Contributor Author

hslh commented Dec 16, 2024

That being said, if we decide that mapping Collections to Resources is the best option after all, that's fine with me; most of the work should be transferable. As long as we don't keep going back and forth :)

@hslh
Contributor Author

hslh commented Jan 15, 2025

Collections -> Collection Events? Make a proposal. So now we have three options: Resources/Sample Collections/Collection Events


3 participants