CPmicroEnvironment update #1293

Closed
11 tasks
arschat opened this issue Aug 19, 2024 · 7 comments
Labels: dataset (All dataset tickets should have this label, only one ticket per dataset), Publication (Curated from published data), Release 43 (DCP Data Release 43 @ 30/9)

Comments

Collaborator

arschat commented Aug 19, 2024

The project was published in DCP, but the donor metadata is inconsistent between DCP and the publication/GEO (34 vs. 12 donors).

Project short name:

CPmicroEnvironment

Primary Wrangler:

Arsenios

Secondary Wrangler:

Associated files

Published study links

Key Events

  • Convert published metadata to HCA spreadsheet
  • Manually curate dataset to meet HCA metadata standard
  • Collect any matrix and cell-type annotation files
  • Are the analysis files suitable for CellxGene? If something is missing get in touch with the authors to request it
  • Upload sheet to validate metadata
  • Transfer raw files to ingest to validate data files
  • Check linking using ingest graph validator
  • Ask the Secondary Wrangler for an end-to-end review of the project. Ask the Expertise Wrangler to review specific tabs if needed
  • Submit dataset to Production
  • Complete the Export SOP
  • Convert project data to SCEA format following the SCEA conversion SOP if appropriate
arschat added the dataset, Publication and Release 42 (DCP Data Release 42 @ 26/8) labels Aug 19, 2024
Collaborator Author

arschat commented Aug 20, 2024

The donor information in the original submission seems to be incorrect.

According to Table 1, the number of donors is 3+9=12, while the original DCP submission has 34. It seems that each library listed in GEO was given its own donor, whereas each donor actually had 3 different library protocols (scRNA-seq, CITE-seq, TCR-seq), with the exception of donors BL12, Idio4 and BL10, Idio2, who did not have CITE-seq (3 * 12 - 2 = 34 GSM accessions).

It is now modelled with 12 entities each in donor, specimen and cell_suspension.
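
As a quick sanity check of the count above, here is a minimal sketch in Python. The donor labels are placeholders rather than the real donor IDs, and `count_libraries` is a hypothetical helper, not part of any wrangling tool.

```python
# Hypothetical reconstruction of the library count; donor labels are
# placeholders, not the real donor IDs from the publication.
PROTOCOLS = ("scRNA-seq", "CITE-seq", "TCR-seq")


def count_libraries(donors, missing):
    """Count libraries, skipping (donor, protocol) pairs that were not generated."""
    return sum(
        1
        for donor in donors
        for protocol in PROTOCOLS
        if (donor, protocol) not in missing
    )


donors = [f"donor_{i}" for i in range(1, 13)]  # 3 + 9 = 12 donors (Table 1)
# Two donors without CITE-seq (which two is an assumption for illustration only).
missing = {("donor_1", "CITE-seq"), ("donor_2", "CITE-seq")}

n_libraries = count_libraries(donors, missing)
print(n_libraries)  # 3 * 12 - 2 = 34
```

If the count did not come out at 34, that would suggest the mapping between GSM accessions and donors needs another look.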

Collaborator Author

arschat commented Aug 20, 2024

The new submission is now graph valid. We need to decide whether to keep the existing file & protocol uuids or not:

  1. new file & protocol uuids
  2. old file & protocol uuids

Collaborator Author

arschat commented Aug 21, 2024

We will proceed with option 2 (old file & protocol uuids).
@arschat to create a spreadsheet with the updates and uuids, run the first part of the script and, once it looks good, share it with @ESapenaVentura

arschat self-assigned this Aug 27, 2024
Collaborator Author

arschat commented Aug 27, 2024

hca-util-upload-area uuid: 7292c116-0ada-457f-8fa7-833c817674f2

Collaborator Author

arschat commented Aug 27, 2024

I created a spreadsheet using the updated biomaterial metadata but keeping all the existing uuids (biomaterials, processes, protocols, files).
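
A rough sketch of how existing uuids could be carried over onto updated rows, assuming each row can be matched on a stable key. The `file_name` key, the row layout and `carry_over_uuids` are all hypothetical; this is not the script used for this project.

```python
def carry_over_uuids(new_rows, old_uuid_by_key, key="file_name"):
    """Copy existing uuids onto updated rows and report keys with no existing uuid."""
    updated, unmatched = [], []
    for row in new_rows:
        uuid = old_uuid_by_key.get(row[key])
        if uuid is None:
            unmatched.append(row[key])  # needs manual review
        updated.append({**row, "uuid": uuid})
    return updated, unmatched


# Illustrative values only; these are not real uuids from the project.
old_uuids = {"a.fastq.gz": "uuid-a", "b.fastq.gz": "uuid-b"}
rows = [{"file_name": "a.fastq.gz"}, {"file_name": "c.fastq.gz"}]

updated_rows, unmatched_keys = carry_over_uuids(rows, old_uuids)
print(unmatched_keys)  # ['c.fastq.gz']
```

Returning the unmatched keys separately makes it easier to spot rows that would otherwise get a new uuid by accident.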

arschat added the Release 43 (DCP Data Release 43 @ 30/9) label Sep 4, 2024
Collaborator Author

arschat commented Sep 10, 2024

Enrique fixed a typo I made in the cell suspension uuids. We rolled back the file schema versions to the last versions that use the data: prefix, since the EDAM: prefix does not yet work in prod:

sequence_file: from 10.0.0 to 9.6.0
analysis_file: from 8.0.0 to 7.0.0
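
For illustration, the rollback could be sketched as a rewrite of the schema URL in each file's metadata. The version numbers come from the list above; the `.../type/file/<version>/<schema>` URL layout and the helper `roll_back_described_by` are assumptions, not the actual procedure used.

```python
import re

# Versions taken from the list above: schema name -> (current, rolled back).
ROLLBACK = {
    "sequence_file": ("10.0.0", "9.6.0"),
    "analysis_file": ("8.0.0", "7.0.0"),
}


def roll_back_described_by(described_by: str) -> str:
    """Return the schema URL with the file schema version rolled back, if it matches."""
    # Assumed URL layout: <base>/type/file/<version>/<schema_name>
    match = re.fullmatch(r"(.*/type/file/)(\d+\.\d+\.\d+)/(\w+)", described_by)
    if not match:
        return described_by
    prefix, version, schema = match.groups()
    versions = ROLLBACK.get(schema)
    if versions and version == versions[0]:
        return f"{prefix}{versions[1]}/{schema}"
    return described_by


url = "https://schema.humancellatlas.org/type/file/10.0.0/sequence_file"
print(roll_back_described_by(url))
```

URLs that do not match the assumed layout, or that are already at another version, are returned unchanged.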

Fixed some graph invalid errors (specimen to file links instead of cell suspension to file), fixed the project title to match the publication, cleaned the previous submission out of the staging area, exported the project and sent the import form.
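
A minimal sketch of the kind of linking check described above, assuming links are available as `(source_type, target_type, target_name)` tuples. This layout and `find_bad_file_inputs` are hypothetical and this is not the ingest graph validator itself.

```python
def find_bad_file_inputs(links):
    """Return names of files whose direct input is not a cell suspension."""
    return sorted(
        target_name
        for source_type, target_type, target_name in links
        if target_type == "file" and source_type != "cell_suspension"
    )


# Illustrative links only; the file names are made up.
links = [
    ("donor", "specimen", "specimen_1"),
    ("specimen", "cell_suspension", "suspension_1"),
    ("cell_suspension", "file", "good_R1.fastq.gz"),
    ("specimen", "file", "bad_R1.fastq.gz"),  # the kind of link that was fixed
]

bad_files = find_bad_file_inputs(links)
print(bad_files)  # ['bad_R1.fastq.gz']
```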

idazucchi removed the Release 42 (DCP Data Release 42 @ 26/8) label Sep 17, 2024
Collaborator Author

arschat commented Oct 28, 2024

verified in browser
