Library Source - CDE (Various) #46
-
Slack Discussion:
From Clay McLeod on October 25th, 2023 - I again find the permissible values are not at the same level of granularity and are therefore orthogonal. For example, some permissible values address the source molecule (e.g., "Genomic DNA", "RNA", "Transcriptomic"), others specify the cellular context ("Bulk cells", "Single-cells"), and some are combinations of the two ("Genomic Single Cell"). It is difficult to know how to annotate these with a single value. If I have a single-cell sample that sequences the genomic DNA, is that marked as "Single-cells", "Genomic DNA", "Genomic Single Cell", or some combination of the three? Another example would be whole genome sequencing done on a population of bulk cells: is that "Genomic DNA", "Bulk cells", or both? Perhaps it is even "Bulk Tissue", as the implication within that description is that there is a heterogeneous population of cells (though the "bulk cells" value makes no mention of a homogeneous population of cells to distinguish itself).
From Patrick Dunn on November 27th, 2023 - If the two different harmonized fields and their subsetted values are available, FNL can pass this along to caDSR for their review and implementation.
Geoff Lyle's data harmonization subgroup meeting notes on November 17th, 2023 -
-
We can separate this CDE into new CDEs to accommodate the two intents - Source Material and Source Molecule. We would associate the response values as listed in this GitHub post. Would that work for your group? Would you use both of the new CDEs or only one?
-
Per @calkinsh discussion on Slack:
-
@GeoffLyle We have created these 2 new CDEs for your review. We did not add ‘Synthetic’ to library_source_molecule as we need to request a new term from our terminology group. Can you please advise whether it is synthetic DNA or synthetic RNA?
library_source_material
| Permissible Value | VM Long Name | Concept Codes | VM Definition |
| --- | --- | --- | --- |
| Bulk nuclei | Bulk Nucleus Specimen | C178224 | A biospecimen consisting of multiple nuclei as a pool. |
library_source_molecule
| Permissible Value | VM Long Name | Concept Codes | VM Definition |
| --- | --- | --- | --- |
| Genomic DNA | Genomic DNA | C95940 | The DNA that is part of the normal chromosomal complement of an organism. |
-
@jknable Geoff asked me to take a look at these.
library_source_molecule
I suggest removing "Transcriptomic Single Cell" from library_source_molecule, since it is defined by a combination of the "Transcriptomic" library_source_molecule and the "Single-cells" library_source_material (as I read single-cells, see below).
I take the VM definition of Transcriptomic ("A study of the complete set of RNA transcripts that are produced by the genome, under specific circumstances or in a specific cell") with a grain of salt. No transcript method captures a complete set of RNA transcripts.
I do not see a need to add Synthetic to library source molecule.
library_source_material
I'm confused by the definition of Single-cells in library_source_material. Does the VM definition of "Single-cells" as "A biospecimen that contains the contents of a single cell" encompass the idea that the input of the dataset is many biospecimens? Typically, thousands of individual cells undergo library prep in such a way that the resulting dataset contains sequence data from multiple cells, distinguished by barcodes that indicate which data came from the same cell. The same issue applies to "Single-nuclei".
We also need to distinguish this from bulk cells and bulk nuclei. The VM definition "A biospecimen consisting of multiple cells as a pool" could equally apply to my expectation of the input for a single-cell experiment.
ideas for solving it
-
A general question about CDE definitions is whether we want to try to make them parallel within a given CDE. For example, we have the following definitions for single and bulk RNA-Seq:
Does that mean that the dilute suspension of cells isn't a biospecimen? Could the second definition equally be "multiple cells intended to be analyzed as a pool"? Looking forward to hearing from someone who is deeper in CDE land than I am!
-
For both library_source_material (14808227) and library_source_molecule (14808232), Federation members will review permissible values and their definitions and provide feedback by July 1, 2024. CDEs will be reviewed for adoption at the July 12 Data Harmonization meeting.
-
From the July 26, 2024 meeting:
-
From the meeting on August 9th, 2024: Federation members should review Molecular Analysis Analyte Type 6142394 and determine how it fits their data, and how it fits into the currently accepted CDEs.
GL Edit: Aug 12, 2024 - Fix typo - Library Source Material -> Library Source Molecule
-
From the September 20th meeting: The Data Harmonization team reviewed the Specimen Molecular Analyte Type CDE and its Permissible Values. All current Federation members preferred using this CDE's Permissible Values over Library Source Molecule 14808232. There was concern that the Permissible Values 'mRNA' and 'Total RNA' are ambiguous because they resemble other RNA-Seq library construction terms, 'Poly-A Enriched Genomic Library' and 'rRNA Depletion'. Potential work-arounds were discussed:
TODO:
-
From October 04, 2024 meeting:
Note: Values 'Not Reported' and 'Unknown' were not included as this information should be known when sharing this data. If a member does not fill out this field, the API should return a null response. We believe not including these values will make it easier for users to parse this field.
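A minimal sketch of how a consumer might handle the null convention described in this note. It assumes a JSON response; the helper `read_field` and the field name `library_source_material` used as its default are hypothetical, since the actual API response shape is not specified in this thread.

```python
import json


def read_field(record_json, field="library_source_material"):
    """Return the field's value, or None when it is null or absent.

    Both the helper and the default field name are illustrative assumptions.
    """
    record = json.loads(record_json)
    return record.get(field)


reported = read_field('{"library_source_material": "Bulk cells"}')
missing = read_field('{"library_source_material": null}')

# A single None check replaces special-casing 'Not Reported' and 'Unknown'.
for value in (reported, missing):
    print("no value provided" if value is None else value)
```

With this convention, a consumer only has to distinguish a permissible value from `None`, rather than filtering several placeholder strings.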
-
caDSR Link: 6285979
Definition: "The source of a sample collection of double stranded DNA fragments analyzed by high-throughput sequencing."
Issues with CDE:
The permissible values in this CDE encompass two orthogonal concepts, source material and source molecule.
Proposed short term solution as of December 8, 2023:
Subset the CDE:
Library Source Material (Bulk cells, Bulk nuclei, Bulk tissue, Single-cells, Single-Nuclei)
Library Source Molecule (Genomic DNA, Genomic Single Cell, Metagenomic, Metatranscriptomic, Synthetic, Transcriptomic, Transcriptomic Single Cell, Viral Small RNA)
Note: Clay mentioned that 'Genomic Single Cell' and 'Transcriptomic Single Cell' confound the two concepts and should not be included in either subset.
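As a rough sketch of how the proposed split could be applied, the snippet below checks an annotation against each field independently. It assumes the two subsets from the December 8, 2023 proposal, with the two confounded values left out per Clay's note. The function name `validate_library_source` is hypothetical and is not part of caDSR or any Federation API.

```python
# Permissible values copied from the proposed subsets above.
# 'Genomic Single Cell' and 'Transcriptomic Single Cell' are omitted
# (assumption: following the note that they confound both concepts).
LIBRARY_SOURCE_MATERIAL = frozenset({
    "Bulk cells", "Bulk nuclei", "Bulk tissue", "Single-cells", "Single-Nuclei",
})
LIBRARY_SOURCE_MOLECULE = frozenset({
    "Genomic DNA", "Metagenomic", "Metatranscriptomic",
    "Synthetic", "Transcriptomic", "Viral Small RNA",
})


def validate_library_source(material, molecule):
    """Return a list of error messages; an empty list means both values are permissible."""
    errors = []
    if material not in LIBRARY_SOURCE_MATERIAL:
        errors.append(f"unknown library_source_material: {material!r}")
    if molecule not in LIBRARY_SOURCE_MOLECULE:
        errors.append(f"unknown library_source_molecule: {molecule!r}")
    return errors


# A single-cell sample sequenced for genomic DNA gets one value per field.
print(validate_library_source("Single-cells", "Genomic DNA"))
# A combined value is rejected because it belongs to neither subset.
print(validate_library_source("Single-cells", "Genomic Single Cell"))
```

Keeping the two fields separate means the single-cell genomic DNA case from the original Slack discussion is annotated with one value per field, with no need for a combined term.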