EGAS00001004653 - PancreasTopographiesTosti10x #1296

Open
11 tasks
arschat opened this issue Sep 4, 2024 · 8 comments
Assignees
Labels
dataset All dataset tickets should have this label, only one ticket per dataset Publication Curated from published data Release 43 DCP Data Release 43 @ 30/9

Comments

@arschat
Collaborator

arschat commented Sep 4, 2024

Project short name:

SPAN PancreasTopographiesTosti10x

Primary Wrangler:

Arsenios

Secondary Wrangler:

Associated files

Published study links

Key Events

  • Convert published metadata to HCA spreadsheet
  • Manually curate dataset to meet HCA metadata standard
  • Collect any matrix and cell-type annotation files
  • Are the analysis files suitable for CellxGene? If something is missing get in touch with the authors to request it
  • Upload sheet to validate metadata
  • Transfer raw files to ingest to validate data files
  • Check linking using ingest graph validator
  • Ask the Secondary Wrangler for an end-to-end review of the project. Ask the Expertise Wrangler to review specific tabs if needed
  • Submit dataset to Production
  • Complete the Export SOP
  • Convert project data to SCEA format following the SCEA conversion SOP if appropriate
@arschat arschat added dataset All dataset tickets should have this label, only one ticket per dataset NeedsUpdate Publication Curated from published data Release 43 DCP Data Release 43 @ 30/9 labels Sep 4, 2024
@arschat
Collaborator Author

arschat commented Sep 4, 2024

The dataset has been partially wrangled, but it contains invalid metadata. It should be revised and resubmitted, as it is part of the Pancreas bionetwork list.

@arschat
Collaborator Author

arschat commented Sep 10, 2024

It seems the project was partially wrangled by Ami, but was abandoned because no sequence or analysis files were available at that time (#827).
The analysis files are now available at http://singlecell.charite.de/pancreas. I've downloaded them and started wrangling the dataset again.

@arschat arschat self-assigned this Sep 10, 2024
@arschat arschat changed the title from "EGAS00001004653 - SPAN" to "EGAS00001004653 - PancreasTopologiesTosti10x" Sep 10, 2024
@arschat arschat changed the title from "EGAS00001004653 - PancreasTopologiesTosti10x" to "EGAS00001004653 - PancreasTopographiesTosti10x" Sep 10, 2024
@arschat
Collaborator Author

arschat commented Sep 12, 2024

Removing the previous submission, since it had not been published before.
Spreadsheet saved here.

@arschat
Collaborator Author

arschat commented Sep 12, 2024

hca-util area 39d07165-5711-4c82-870a-7496214fde73

I added the schema tab to the spreadsheet to specify the schema versions that allow validation (until the EDAM update is completed).

@arschat
Collaborator Author

arschat commented Sep 12, 2024

Graph valid and ready for sec review!

@idazucchi
Collaborator

Hi!
Nice job on this dataset. I have only a couple of small comments:

Donor / Specimen

  • I think the disease for TUM_13_donor could be pancreatic neuroendocrine neoplasm since the tumor is reported as a pancreatic disease
  • For donor TUM_C1_donor: from this paper, it sounds like Mixed Muellerian Tumor is a malignant tumor affecting the uterus that can rarely metastasize to the pancreas. I would suggest carcinosarcoma in place of adenofibroma, because adenofibroma has benign mixed Muellerian tumor as a synonym.

Collection protocol

  • Shouldn't Standford_protocol be "collecting specimen from organ postmortem"? The donors are dead at collection time.

Cell suspension

  • Very minor, but should we change the description to "Nuclei suspension"?

Analysis file

  • input CS for chronic_* files should be TUM_CP1_nuc and TUM_CP2_nuc
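
A check like the one above could be sketched in Python as follows. This is only an illustration: the `links` mapping, the file names, and the `TUM_13_nuc` ID are hypothetical, and the sketch assumes that every chronic_* file should list exactly TUM_CP1_nuc and TUM_CP2_nuc as its input cell suspensions.

```python
from fnmatch import fnmatchcase


def mislinked_files(links, pattern, expected):
    """Return names of files matching `pattern` whose inputs differ from `expected`."""
    expected = set(expected)
    return sorted(
        name
        for name, inputs in links.items()
        if fnmatchcase(name, pattern) and set(inputs) != expected
    )


# Hypothetical file-to-cell-suspension links, e.g. read from the spreadsheet.
links = {
    "chronic_example_a.h5ad": ["TUM_13_nuc"],                   # wrong input
    "chronic_example_b.h5ad": ["TUM_CP1_nuc", "TUM_CP2_nuc"],   # correct input
    "other_example.h5ad": ["TUM_13_nuc"],                       # not a chronic_* file
}

mislinked = mislinked_files(links, "chronic_*", ["TUM_CP1_nuc", "TUM_CP2_nuc"])
print(mislinked)
```

Only files matching the pattern are checked, so unrelated analysis files are never flagged.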

@arschat
Collaborator Author

arschat commented Sep 16, 2024

Nice catch on the wrong CS for the chronic_* files! Thank you for the review. I applied all changes to the submission, and it is now exported!

  • import form sent

@arschat
Collaborator Author

arschat commented Sep 17, 2024

Noticed that the diseases for donors TUM_25_donor and TUM_C1_donor were swapped. Corrected this in the staging area's metadata JSON files.

  1. Copied the contents of the JSON files
gsutil cat gs://broad-dsp-monster-hca-prod-ebi-storage/prod/b3938158-4e8d-4fdb-9e13-9e94270dde16/metadata/donor_organism/2fbd1774-f11b-46f7-983a-54544fd04824_2024-09-12T13:36:24.874000Z.json
gsutil cat gs://broad-dsp-monster-hca-prod-ebi-storage/prod/b3938158-4e8d-4fdb-9e13-9e94270dde16/metadata/donor_organism/0887b590-a8ab-4dee-a2be-9b0bb589709d_2024-09-16T10:00:02.351000Z.json
  2. Created the files locally and swapped the diseases fields in a text editor
  3. Uploaded the edited files to the staging area
gsutil cp 2fbd1774-f11b-46f7-983a-54544fd04824_2024-09-12T13:36:24.874000Z.json gs://broad-dsp-monster-hca-prod-ebi-storage/prod/b3938158-4e8d-4fdb-9e13-9e94270dde16/metadata/donor_organism/2fbd1774-f11b-46f7-983a-54544fd04824_2024-09-12T13:36:24.874000Z.json
gsutil cp 0887b590-a8ab-4dee-a2be-9b0bb589709d_2024-09-16T10:00:02.351000Z.json gs://broad-dsp-monster-hca-prod-ebi-storage/prod/b3938158-4e8d-4fdb-9e13-9e94270dde16/metadata/donor_organism/0887b590-a8ab-4dee-a2be-9b0bb589709d_2024-09-16T10:00:02.351000Z.json
  4. Updated both biomaterials, TUM_25_donor and TUM_C1_donor, in ingest as well
  5. Changed the status to exported
  • it was indeed a mistake:
    • TUM_C1_donor has to have pancreatic ductal adenocarcinoma and it had carcinosarcoma -> Mixed Muellerian Tumor
    • TUM_25_donor has to have carcinosarcoma -> Mixed Muellerian Tumor and it had pancreatic ductal adenocarcinoma
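
For reference, the manual swap of the diseases field could also be scripted. The following is a minimal Python sketch, not what was actually run (the edit was done in a text editor). The two example documents are simplified and hypothetical: real donor_organism JSON files contain many more fields, and the flat `biomaterial_id` key and single-key `diseases` entries here are only for illustration.

```python
import copy
import json


def swap_field(doc_a, doc_b, field="diseases"):
    """Return copies of two JSON documents with the value of `field` exchanged.

    Raises KeyError if either document lacks the field, so a typo in the
    field name is not silently ignored.
    """
    new_a, new_b = copy.deepcopy(doc_a), copy.deepcopy(doc_b)
    new_a[field], new_b[field] = new_b[field], new_a[field]
    return new_a, new_b


# Simplified, hypothetical stand-ins for the two donor_organism files.
tum_c1 = {"biomaterial_id": "TUM_C1_donor",
          "diseases": [{"text": "carcinosarcoma"}]}
tum_25 = {"biomaterial_id": "TUM_25_donor",
          "diseases": [{"text": "pancreatic ductal adenocarcinoma"}]}

fixed_c1, fixed_25 = swap_field(tum_c1, tum_25)
print(json.dumps(fixed_c1, indent=2))
print(json.dumps(fixed_25, indent=2))
```

Working on deep copies leaves the downloaded originals untouched, so the before and after states can be compared prior to uploading with gsutil cp.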

image.png


2 participants