GSE237204 - RetinaEvolutionHumanHahn #1298

Open
11 tasks
arschat opened this issue Sep 5, 2024 · 4 comments
Labels

  • dataset: All dataset tickets should have this label, only one ticket per dataset
  • Publication: Curated from published data
  • Release 43: DCP Data Release 43 @ 30/9

Comments

@arschat
Collaborator

arschat commented Sep 5, 2024

part of Retina Atlas v1.0

Project short name:

RetinaEvolutionHumanHahn

Primary Wrangler:

Ida

Secondary Wrangler:

Arsenios

Associated files

Published study links

Key Events

  • Convert published metadata to HCA spreadsheet
  • Manually curate dataset to meet HCA metadata standard
  • Collect any matrix and cell-type annotation files
  • Are the analysis files suitable for CellxGene? If something is missing get in touch with the authors to request it
  • Upload sheet to validate metadata
  • Transfer raw files to ingest to validate data files
  • Check linking using ingest graph validator
  • Ask the Secondary Wrangler for an end-to-end review of the project. Ask the Expertise Wrangler to review specific tabs if needed
  • Submit dataset to Production
  • Complete the Export SOP
  • Convert project data to SCEA format following the SCEA conversion SOP if appropriate
arschat added the dataset and Publication labels Sep 5, 2024
idazucchi self-assigned this Sep 5, 2024
idazucchi changed the title from "GSE237204 - RetinaEvolutionHahn10x" to "GSE237204 - RetinaEvolutionHumanHahn" Sep 6, 2024
idazucchi added the Release 43 label Sep 6, 2024
@idazucchi
Collaborator

idazucchi commented Sep 10, 2024

Donor
I didn't find any donor metadata; the only metadata available is about samples here.
The paper says there are 18 donors, so I grouped the samples based on their names:

We also profiled around 185,000 nuclei from 18 human donors, thereby allowing us to identify over 30 more cell types than had been detected in the dataset analysed previously

For developmental stage no information is available, so I've used human life cycle. If you have better suggestions I'm happy to hear them.

Sample
About sample Hu218OSmAll: the sample key says it's mac_neun-_218, but this sample is absent from Extended Data Fig. 1; instead there is mac_all_218.
I think mac_neun-_218 is a mistake and the sample is not enriched.

@arschat
Collaborator Author

arschat commented Sep 13, 2024

Hi Ida! There was very limited metadata available, but you've done a nice job.
I only have two suggestions and one comment.

Project

  • abstract: you can remove the reference numbers

Analysis protocol

  • intron_inclusion: you could fill in yes here

    To include both exonic and intronic reads in the quantification of gene expression for each sample, regardless of cellular or nuclear origin, we applied velocyto to the corresponding .bam files. This generated two separate gene expression matrices (GEMs) (genes × cells) for each sample, corresponding to ‘spliced’ and ‘unspliced’ reads.

Analysis file

  • input biomaterials: each of the 4 files should have a different input. If you open the count matrices, the cell index includes the sample_ID. In the listing below I've stripped the trailing 18 characters (the UMI) from each cell index and removed the duplicates
sample_id extraction
for file in GSE237204*; do echo "$file"; zcat < "$file" | head -1 | tr ',' '\n' | sed 's/.\{18\}$//' | sort -u; done

GSE237204_Human_count_mat_1.csv.gz

Hu032616OD_macula_NeuNPos.possorted_genome_bam_PNH7N
Hu035516OS_macula_NeuNPosS1.possorted_genome_bam_TXN7J
Hu035516OS_macula_NeuNPosS2.possorted_genome_bam_D2QGV
Hu035516OS_macula_NeuNPosS2.possorted_genome_bam_PNH7N
Hu056316OD_macula_NeuNPos.possorted_genome_bam_M7UFG
Hu056416OS_macula_NeuNPos.possorted_genome_bam_RCC8P
Hu082219_macular_All.possorted_genome_bam_M28FQ
Hu086916OD_macula_NeuNPos.possorted_genome_bam_VL2TA

GSE237204_Human_count_mat_2.csv.gz

Hu086916OD_macula_NeuNPos.possorted_genome_bam_VL2TA
Hu088716OS_macula_NeuNPos.possorted_genome_bam_2L1A9
Hu105916OD_macula_NeuNPos.possorted_genome_bam_C9Z7N
Hu218OSPeriRetina.possorted_genome_bam_WYD8B
Hu218OSmAll.possorted_genome_bam_DGNOI
Hu218OSmRGC.possorted_genome_bam_Q49J3

GSE237204_Human_count_mat_3.csv.gz

Hu218OSPeriRetina.possorted_genome_bam_WYD8B
Hu220235OSmAll.possorted_genome_bam_V6O4A
Hu220OSPeriRetina.possorted_genome_bam_L340L
Hu220OSmRGC.possorted_genome_bam_Z9VX3
Hu235OSPeriRetina.possorted_genome_bam_3URIL
Hu235OSmRGC.possorted_genome_bam_W3LJK
HuCMixS1.possorted_genome_bam_B7X1A

GSE237204_Human_count_mat_4.csv.gz

HuCMixS1.possorted_genome_bam_B7X1A
HuCMixS2.possorted_genome_bam_HGXY1
HuPRet1059fc564mc.possorted_genome_bam_56BDZ
HuPRet326mc355fc.possorted_genome_bam_RJJ6M
HuPRet563mc.possorted_genome_bam_PYRF4
HuPRet887mc869fc.possorted_genome_bam_CHBYN
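
The same header extraction can be sketched in Python, which may be handy when the check needs to run inside a script. This is only a rough equivalent of the one-liner above, not a tested replacement for it. The function names are hypothetical, and the 18-character suffix length is an assumption carried over from the sed call.

```python
import csv
import gzip

# Assumption: each cell label ends with an 18-character barcode part,
# e.g. ":" + 16 bases + "x", matching the 18 characters the sed call strips.
BARCODE_SUFFIX_LEN = 18


def read_header(path):
    """Return the first row of a gzipped CSV as a list of column labels."""
    with gzip.open(path, "rt", newline="") as handle:
        return next(csv.reader(handle))


def sample_prefixes(labels, suffix_len=BARCODE_SUFFIX_LEN):
    """Strip the trailing barcode part from each label; return sorted unique prefixes."""
    prefixes = set()
    for label in labels:
        # Labels shorter than the suffix (such as an empty index column) are left as-is.
        prefix = label[:-suffix_len] if len(label) >= suffix_len else label
        if prefix:
            prefixes.add(prefix)
    return sorted(prefixes)
```

Calling `sample_prefixes(read_header("GSE237204_Human_count_mat_1.csv.gz"))` should give the same set of prefixes as the listing above, assuming the file sits in the working directory.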

CS assignment

GSE237204_Human_count_mat_1.csv.gz -> Hu032616OD_macula_NeuNPos||Hu035516OS_macula_NeuNPosS1||Hu035516OS_macula_NeuNPosS2||Hu056316OD_macula_NeuNPos||Hu056416OS_macula_NeuNPos||Hu082219_macular_All||Hu086916OD_macula_NeuNPos

GSE237204_Human_count_mat_2.csv.gz -> Hu086916OD_macula_NeuNPos||Hu088716OS_macula_NeuNPos||Hu105916OD_macula_NeuNPos||Hu218OSPeriRetina||Hu218OSmAll||Hu218OSmRGC

GSE237204_Human_count_mat_3.csv.gz -> Hu218OSPeriRetina||Hu220235OSmAll||Hu220OSPeriRetina||Hu220OSmRGC||Hu235OSPeriRetina||Hu235OSmRGC||HuCMixS1

GSE237204_Human_count_mat_4.csv.gz -> HuCMixS1||HuCMixS2||HuPRet1059fc564mc||HuPRet326mc355fc||HuPRet563mc||HuPRet887mc869fc
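
The `||`-separated values above can be derived mechanically from the per-file sample prefixes. The sketch below is a hypothetical helper, not part of the ingest tooling. It assumes every prefix ends in `.possorted_genome_bam_` followed by five uppercase letters or digits, which holds for the prefixes listed in this comment.

```python
import re

# Assumed suffix pattern, inferred from the prefixes listed in this comment.
BAM_SUFFIX = re.compile(r"\.possorted_genome_bam_[A-Z0-9]{5}$")


def input_biomaterials(prefixes):
    """Drop the BAM suffix, de-duplicate in order, and join the names with '||'."""
    names = []
    for prefix in prefixes:
        name = BAM_SUFFIX.sub("", prefix)
        if name not in names:  # e.g. one sample seen with two different BAM suffixes
            names.append(name)
    return "||".join(names)


# Prefixes found in GSE237204_Human_count_mat_2.csv.gz, copied from the listing.
mat_2 = [
    "Hu086916OD_macula_NeuNPos.possorted_genome_bam_VL2TA",
    "Hu088716OS_macula_NeuNPos.possorted_genome_bam_2L1A9",
    "Hu105916OD_macula_NeuNPos.possorted_genome_bam_C9Z7N",
    "Hu218OSPeriRetina.possorted_genome_bam_WYD8B",
    "Hu218OSmAll.possorted_genome_bam_DGNOI",
    "Hu218OSmRGC.possorted_genome_bam_Q49J3",
]
print(input_biomaterials(mat_2))
```

De-duplicating after the suffix is removed matters for samples such as Hu035516OS_macula_NeuNPosS2, which appears with two different BAM suffixes in the first file.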

@idazucchi
Collaborator

Fixed the issues and tried to export, but the dataset is stuck on export.

@idazucchi
Collaborator

Enrique restarted the export and it worked - I've filled in the import form


2 participants