Ontology curation error fixes #322

Open
25 tasks
mshadbolt opened this issue May 12, 2021 · 1 comment
Labels
operations This issue is an operational task

Comments

@mshadbolt
Contributor

mshadbolt commented May 12, 2021

List of ontology curation errors that I discovered when preparing the zooma data source, as well as those identified by other components.

Corrections based on HumanCellAtlas/dcp2#13 https://docs.google.com/spreadsheets/d/1Wk7SGxEz00AkNokYv3YlFHJVF9U6THKcrvgHARsnau8/edit#gid=0

  • All projects with library_preparation.method.ontology set to EFO:0009310 and
    • library_preparation.end_bias set to 3 prime tag or 3 prime bias should have the ontology updated to EFO:0009899 and ontology_label updated to the label for that term
    • library_preparation.end_bias set to 5 prime tag or 5 prime bias should have the ontology updated to EFO:0009900
  • All projects with library_preparation.method set to a 10x method should have 3 prime tag (not 3 prime end bias) set in the library_preparation.end_bias field
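
As a rough illustration of the first rule, here is a minimal Python sketch. It assumes each library_preparation record is available as a plain dict shaped like the field paths above; the function name and the dict layout are hypothetical, and the end_bias spellings are only the ones written in this issue, so other spellings would need adding.

```python
# Generic 10x term that needs replacing, as named in the rule above.
GENERIC_10X = "EFO:0009310"

# end_bias value -> corrected method ontology, copied from the rule above.
END_BIAS_TO_ONTOLOGY = {
    "3 prime tag": "EFO:0009899",
    "3 prime bias": "EFO:0009899",
    "5 prime tag": "EFO:0009900",
    "5 prime bias": "EFO:0009900",
}


def corrected_method_ontology(library_preparation):
    """Return the corrected method ontology ID, or None if no change applies.

    `library_preparation` is assumed to be a dict with a nested `method`
    object and an `end_bias` string (hypothetical layout).
    """
    method = library_preparation.get("method") or {}
    if method.get("ontology") != GENERIC_10X:
        return None
    end_bias = (library_preparation.get("end_bias") or "").strip().lower()
    return END_BIAS_TO_ONTOLOGY.get(end_bias)


record = {"method": {"ontology": "EFO:0009310"}, "end_bias": "3 prime tag"}
print(corrected_method_ontology(record))  # EFO:0009899
```

The sketch only returns the new ID; ontology_label still has to be looked up for that term and updated alongside it.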

Project specific corrections:

Requires bulk update

Single cell RNAseq characterisation of cell types produced over time in an in vitro model of human inhibitory interneuron differentiation - 2043c65a-1cf8-4828-a656-9e247d4e64f1

  • All cell_suspension.timecourse.unit.ontology fields should be changed to UO:0000033, the correct ontology for 'day' where cell_suspension.timecourse.unit.text is 'day'
  • The model organ for the cell line 557606a0-96dd-40cc-bb72-94d5711bb8ec should be changed to UBERON:0000922 (I couldn't find this term in the UI, not sure why)
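
The timecourse unit fix could be scripted roughly as below. This is a sketch that assumes each cell_suspension is a plain dict following the field path above; the function name is hypothetical.

```python
DAY_ONTOLOGY = "UO:0000033"  # ontology term for 'day' given in this issue


def fix_day_unit(cell_suspension):
    """Set timecourse.unit.ontology to UO:0000033 where unit.text is 'day'.

    Returns True if the record was changed, False otherwise.
    ontology_label is left untouched and would need updating separately.
    """
    unit = (cell_suspension.get("timecourse") or {}).get("unit")
    if not unit or (unit.get("text") or "").strip().lower() != "day":
        return False
    if unit.get("ontology") == DAY_ONTOLOGY:
        return False  # already correct, nothing to do
    unit["ontology"] = DAY_ONTOLOGY
    return True
```

Returning a changed/unchanged flag makes it easy to count how many entities a bulk update would actually touch before submitting it.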

BM_PC - a29952d9-925e-40f4-8a1c-274f118f1f51:

  • All donor_organism.development_stage.ontology should be changed to the correct term for human adult - HsapDv:0000087
  • All specimen_from_organism.organ ontologies should be changed to the correct term for bone, UBERON:0002481, with ontology label 'bone tissue'

Single cell transcriptome analysis of human pancreas - cddab57b-6868-4be4-806f-395ed9dd635a

  • All donor_organism.development_stage.ontology should be changed to the correct term for human adult - HsapDv:0000087

GSE67833_neural_stem_cells - e8808cc8-4ca0-4096-80f2-bba73600cba6

  • current curations for organ in specimen_from_organism entities should be in 'organ_part'
  • organ should be set to 'brain' - UBERON:0000955
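
A possible shape for this two-step change, as a sketch: it assumes a specimen_from_organism record is a dict where `organ` is a single ontology object and `organ_parts` is a list of such objects. Both the layout and the function name are assumptions.

```python
BRAIN = {"text": "brain", "ontology": "UBERON:0000955"}


def move_organ_to_organ_parts(specimen):
    """Move the current organ curation into organ_parts, then set organ to brain."""
    current = specimen.get("organ")
    if current and current.get("ontology") != BRAIN["ontology"]:
        parts = specimen.setdefault("organ_parts", [])
        if current not in parts:  # avoid duplicating an existing entry
            parts.append(current)
    specimen["organ"] = dict(BRAIN)
    return specimen
```

Running it twice on the same record leaves it unchanged after the first pass, which is convenient if a bulk update has to be retried.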

Non-specific 'cortex' term (UBERON:0001851) in specimen_from_organism.organ_parts should be changed to organ specific cortex term, either kidney or brain, for the following projects:

  • scRNAseqSystemicComparison - cerebral cortex UBERON:0000956
  • Diabetic Nephropathy snRNA-seq - cortex of kidney UBERON:0001225
  • AllelicExpressionPatterns - cerebral cortex UBERON:0000956
  • 1M Neurons - cerebral cortex UBERON:0000956
  • HumanBrainSubstantiaNigra - cerebral cortex UBERON:0000956
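
The per-project replacement can be written as a lookup table. The sketch below copies the project names, labels and IDs from the list above; the function name and the assumption that organ_parts is a list of dicts with `ontology` and `ontology_label` keys are mine.

```python
GENERIC_CORTEX = "UBERON:0001851"

# Project short name -> (label, ontology ID), copied from the list above.
SPECIFIC_CORTEX = {
    "scRNAseqSystemicComparison": ("cerebral cortex", "UBERON:0000956"),
    "Diabetic Nephropathy snRNA-seq": ("cortex of kidney", "UBERON:0001225"),
    "AllelicExpressionPatterns": ("cerebral cortex", "UBERON:0000956"),
    "1M Neurons": ("cerebral cortex", "UBERON:0000956"),
    "HumanBrainSubstantiaNigra": ("cerebral cortex", "UBERON:0000956"),
}


def replace_generic_cortex(project_name, organ_parts):
    """Return a new organ_parts list with the generic cortex term replaced.

    Projects not in the table, and parts with other ontology IDs, are
    returned unchanged.
    """
    if project_name not in SPECIFIC_CORTEX:
        return list(organ_parts)
    label, ontology = SPECIFIC_CORTEX[project_name]
    fixed = []
    for part in organ_parts:
        if part.get("ontology") == GENERIC_CORTEX:
            part = {**part, "ontology": ontology, "ontology_label": label}
        fixed.append(part)
    return fixed
```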

Easy to do manual fixes:

Pitx2DevelopingHeart - 7027adc6-c9c9-46f3-84ee-9badc3a4f53b

  • 45b47d43-ab70-48d9-a205-4b0509bf7114 - donor_organism.organism_age_unit.ontology should be UO:0000033
  • There is one donor_organism.genus_species set as Mus musculus castaneus, which I think should just be Mus musculus
  • donor_organism.organism_age_unit text says 'day' but ontology is 'week'

HumanIsletType2Diabetes - 7adede6a-0ab7-45e6-9b67-ffe7466bec1f

  • donor_organism.ethnicity where text is Hispanic should be changed to HANCESTRO:0014

Things to investigate further

  • Was it intentional to use 'liver primordium' rather than 'liver' for 'AllelicExpressionPatterns' liver specimens? - curated by Rachel Schwartz

EpithelialDiversityHealthInflammation - c893cb57-5c9f-4f26-9312-21b85be84313

  • There is a mismatch between the 'text' field and the 'ontology' field for some specimens where a non-inflamed sample was taken from a patient with ulcerative colitis, i.e. the text says 'ulcerative colitis' but the ontology says 'normal'. I think this is confusing, but I'm not sure which part needs to be corrected

HumanFirstTrimesterPlacentaDecidua - 1cd1f41f-f81a-486b-a05b-66ec60f81dcf

  • specimens with text 'decidua' and 'chorion membrane' have been curated as placenta

HeartSingleCellsAndNucleiSeq - ad98d3cd-26fb-4ee3-99c9-8a2ab085e737

  • donor_organism.disease text given as diabetes but ontology curated as hypertension (MONDO:0005015), probably a mistake in ordering on an array field

DevelopingMouseKidneyHumanKidneyOrganoids - 7b947aa2-43a7-4082-afff-222a3e3a4635

  • donor_organism.gestational_age_unit.text given as '12', should be 'week' or similar
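
Several items in this issue are text/ontology mismatches ('day' vs 'week', 'ulcerative colitis' vs 'normal', 'diabetes' vs 'hypertension'). A simple check like the one below could surface candidates for manual review. It is only a sketch: it assumes each ontology object is a dict with `text` and `ontology_label` keys, and the function name is hypothetical. Free text can legitimately differ from the label (synonyms, for example), so a hit is a prompt to look, not proof of an error.

```python
def find_mismatches(entities):
    """Yield (index, text, ontology_label) where the two differ, ignoring case.

    Entries missing either field are skipped rather than flagged.
    """
    for i, entity in enumerate(entities):
        text = (entity.get("text") or "").strip().lower()
        label = (entity.get("ontology_label") or "").strip().lower()
        if text and label and text != label:
            yield i, entity["text"], entity["ontology_label"]


units = [
    {"text": "day", "ontology_label": "week"},
    {"text": "Week", "ontology_label": "week"},
]
print(list(find_mismatches(units)))  # [(0, 'day', 'week')]
```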

General thoughts

  • We have two terms that we currently use to indicate 'skin' as an organ, 'zone of skin' and 'skin of body'. We should probably pick one and try to be consistent
@mshadbolt mshadbolt added the operations This issue is an operational task label May 12, 2021
@ofanobilbao ofanobilbao added the blocked: bulk updates This dataset is blocked by needing bulk updates label Jul 7, 2021
@ofanobilbao ofanobilbao removed the blocked: bulk updates This dataset is blocked by needing bulk updates label Sep 8, 2021
@ofanobilbao
Contributor

Just a thought: maybe this ticket needs splitting into separate tickets.


2 participants