

Add 2023_VillaIslas_Science #18

Open: wants to merge 7 commits into main


TCLamnidis (Member) commented Jul 31, 2024

Adds package 2023_VillaIslas_Science.

Linked to poseidon-framework/minotaur-recipes#3

The publication is missing from the community archive, so its metadata needs to be collected manually.

The Janno file for the publication was provided by @jbv2 here and added by @Kavlahkaff.

stschiff (Member) commented Aug 5, 2024

@93Boy do you think you could make a start with this?

TCLamnidis added the help wanted label (Extra attention is needed) on Aug 8, 2024
stschiff (Member) commented

New situation: we now have a Janno file PR in the PCA (Poseidon Community Archive). Thiseas can pull from there. Nothing to do here, @93Boy.

stschiff (Member) commented Oct 8, 2024

Update: @KenanaSa99 and @Kavlahkaff have started looking into it.

TCLamnidis removed the help wanted label (Extra attention is needed) on Oct 10, 2024
TCLamnidis added the help wanted label (Extra attention is needed) on Nov 22, 2024
TCLamnidis (Member, Author) commented

Facing the same issue as #27

TCLamnidis added the Final review needed label (This PR needs its final review before going live) on Dec 20, 2024
Ranas 37AI_R_b_MNT Ranas 0 0 0
Toluquilla 6428A_TOL_b_MNT Toluquilla 0 1 0
Ranas 7A_R_b_MNT Ranas 0 1 0
CañadadelaVirgen E10_CdV_b_MNT CañadadelaVirgen 0 0 0
Member commented

As already discussed elsewhere, I think we should avoid non-ASCII characters (such as the "ñ" in CañadadelaVirgen) in individual and primary group IDs.
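A minimal sketch of how such IDs could be flagged before release, assuming the .janno file is tab-separated and that the first ";"-separated entry of Group_Name is the primary group. `find_non_ascii_ids` and `to_ascii` are hypothetical helpers, not part of any Poseidon tool, and the accent-stripping fallback is only one possible renaming strategy.

```python
import csv
import io
import unicodedata


def find_non_ascii_ids(janno_text: str) -> list[tuple[str, str]]:
    """Return (Poseidon_ID, primary group) pairs that contain non-ASCII characters.

    Hypothetical helper: assumes a tab-separated .janno table where the first
    ";"-separated entry of Group_Name is the primary group.
    """
    reader = csv.DictReader(io.StringIO(janno_text), delimiter="\t")
    flagged = []
    for row in reader:
        poseidon_id = row["Poseidon_ID"]
        primary_group = row["Group_Name"].split(";")[0]
        if not (poseidon_id.isascii() and primary_group.isascii()):
            flagged.append((poseidon_id, primary_group))
    return flagged


def to_ascii(name: str) -> str:
    """One possible fallback: decompose accented letters and drop the accents.

    Characters without an ASCII decomposition are silently dropped,
    so suggested names should still be checked by hand.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Trimmed two-column example based on rows quoted in this PR.
example = (
    "Poseidon_ID\tGroup_Name\n"
    "37AI_R_b_MNT\tRanas;CentralMexico;SierraGorda\n"
    "E10_CdV_b_MNT\tCañadadelaVirgen\n"
)
for poseidon_id, group in find_non_ascii_ids(example):
    print(poseidon_id, group, "->", to_ascii(group))
```

With this input only E10_CdV_b_MNT is reported, with CanadadelaVirgen as a suggested ASCII group name.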

333C_TOL_a_MNT F Toluquilla;CentralMexico;SierraGorda Mexico MX Querétaro Toluquilla 20.88378 -99.53097 contextual 1301 1351 1401 Dates are taken from Date (CE) in S15 B2c n/a tooth 1 mtDNA-genome Shotgun minus ds haploid https://github.com/nf-core/eager/releases/tag/2.4.6 94.499718 0 0.21912157243387528 n/a n/a n/a Nr Snps (per library): 0. Estimate and error are weighted means of values per library. Libraries with fewer than 100 SNPs used in contamination estimation were excluded. PRJEB51440;ERS10903723;ERR9576915 2023VillaIslasScience 333C_TOL_a 333C_TOL_a 0.0 0.0 0.0 0.0
333O_TOL_b_MNT U Toluquilla;CentralMexico;SierraGorda Mexico MX Querétaro Toluquilla 20.88378 -99.53097 contextual 650 700 750 Dates are taken from Date (CE) in S5 B2l n/a tooth 1 mtDNA-genome Shotgun minus ds haploid https://github.com/nf-core/eager/releases/tag/2.4.6 97.512945 0 0.31377645748236166 n/a n/a n/a Nr Snps (per library): 0. Estimate and error are weighted means of values per library. Libraries with fewer than 100 SNPs used in contamination estimation were excluded. PRJEB51440;ERS10903713;ERR9576916 2023VillaIslasScience 333O_TOL_b 333O_TOL_b 0.0 0.0 0.0 0.0
333Q_TOL_b_MNT U Toluquilla;CentralMexico;SierraGorda Mexico MX Querétaro Toluquilla 20.88378 -99.53097 contextual 480 520 560 Dates are taken from Date (CE) in S10 A2d n/a tooth 1 mtDNA-genome Shotgun minus ds haploid https://github.com/nf-core/eager/releases/tag/2.4.6 98.044843 0 0.20524344569288389 n/a n/a n/a Nr Snps (per library): 0. Estimate and error are weighted means of values per library. Libraries with fewer than 100 SNPs used in contamination estimation were excluded. PRJEB51440;ERS10903718;ERR9576917 2023VillaIslasScience 333Q_TOL_b 333Q_TOL_b 0.0 0.0 0.0 0.0
37AI_R_b_MNT U Ranas;CentralMexico;SierraGorda Mexico MX Querétaro Ranas 20.92612 -99.56488 n/a n/a n/a n/a Dates are taken from Date (CE) in S9 A2d n/a tooth 1 mtDNA-genome Shotgun minus ds haploid https://github.com/nf-core/eager/releases/tag/2.4.6 96.207416 0 0.3249466950959488 n/a n/a n/a Nr Snps (per library): 0. Estimate and error are weighted means of values per library. Libraries with fewer than 100 SNPs used in contamination estimation were excluded. PRJEB51440;ERS10903717;ERR9576918 2023VillaIslasScience 37AI_R_b 37AI_R_b 0.0 0.0 0.0 0.0
Member commented

Date_Note says "Dates are taken from Date (CE) in S9", but there is no date information. This also applies to some other samples here. By the way, what does "S9" mean?
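A sketch of a consistency check that would catch such rows, assuming a tab-separated .janno file where missing values are written as "n/a" or left empty. `dated_note_without_dates` is a hypothetical helper and this is not an official Poseidon validation rule; matching on the literal prefix "Dates are taken from" is an assumption based on the notes in this file.

```python
import csv
import io

# Columns that are all missing when a sample has no date information.
DATE_COLUMNS = ("Date_Type", "Date_BC_AD_Start", "Date_BC_AD_Median", "Date_BC_AD_Stop")
MISSING = {"", "n/a"}


def dated_note_without_dates(janno_text: str) -> list[str]:
    """Return Poseidon_IDs whose Date_Note cites a date source while all date columns are missing."""
    reader = csv.DictReader(io.StringIO(janno_text), delimiter="\t")
    flagged = []
    for row in reader:
        has_no_dates = all(row[col] in MISSING for col in DATE_COLUMNS)
        note_cites_dates = row["Date_Note"].startswith("Dates are taken from")
        if has_no_dates and note_cites_dates:
            flagged.append(row["Poseidon_ID"])
    return flagged


# Trimmed example built from two rows quoted above.
header = ["Poseidon_ID", *DATE_COLUMNS, "Date_Note"]
rows = [
    ["333C_TOL_a_MNT", "contextual", "1301", "1351", "1401", "Dates are taken from Date (CE) in S15"],
    ["37AI_R_b_MNT", "n/a", "n/a", "n/a", "n/a", "Dates are taken from Date (CE) in S9"],
]
example = "\n".join("\t".join(fields) for fields in [header, *rows]) + "\n"
print(dated_note_without_dates(example))
```

Only 37AI_R_b_MNT is flagged here; the dated row 333C_TOL_a_MNT passes.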

@@ -0,0 +1,28 @@
Poseidon_ID Genetic_Sex Group_Name Country Country_ISO Location Site Latitude Longitude Date_Type Date_BC_AD_Start Date_BC_AD_Median Date_BC_AD_Stop Date_Note MT_Haplogroup Y_Haplogroup Source_Tissue Nr_Libraries Library_Names Capture_Type UDG Library_Built Genotype_Ploidy Data_Preparation_Pipeline_URL Endogenous Nr_SNPs Damage Contamination Contamination_Err Contamination_Meas Contamination_Note Genetic_Source_Accession_IDs Publication Eager_ID Main_ID RateErrX RateErrY RateX RateY
Member commented

I think the Library_Names column is filled incorrectly. It should include "identifiers of the libraries used to generate the genotype data for the sample", but here it has entries like "mtDNA-genome;Whole-genome". These are not library IDs. What does that mean here?
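A minimal sketch for finding every row affected by this, assuming a tab-separated .janno file with ";"-separated Library_Names. The label set is taken from the values seen in this PR and is an assumption, not an exhaustive list; `suspicious_library_names` is a hypothetical helper.

```python
import csv
import io

# Values seen in this PR's Library_Names column that describe the type of
# sequencing data rather than naming a library (assumed, not exhaustive).
DATA_TYPE_LABELS = {"mtDNA-genome", "Whole-genome"}


def suspicious_library_names(janno_text: str) -> dict[str, list[str]]:
    """Map each Poseidon_ID to its Library_Names entries that look like data-type labels."""
    reader = csv.DictReader(io.StringIO(janno_text), delimiter="\t")
    flagged = {}
    for row in reader:
        entries = row["Library_Names"].split(";")
        labels = [entry for entry in entries if entry in DATA_TYPE_LABELS]
        if labels:
            flagged[row["Poseidon_ID"]] = labels
    return flagged


# Trimmed example based on two rows from this file.
example = (
    "Poseidon_ID\tLibrary_Names\n"
    "11R_R_b_MNT\tmtDNA-genome;Whole-genome\n"
    "2417C_TOL_a_MNT\tmtDNA-genome\n"
)
print(suspicious_library_names(example))
```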

@@ -0,0 +1,28 @@
Poseidon_ID Genetic_Sex Group_Name Country Country_ISO Location Site Latitude Longitude Date_Type Date_BC_AD_Start Date_BC_AD_Median Date_BC_AD_Stop Date_Note MT_Haplogroup Y_Haplogroup Source_Tissue Nr_Libraries Library_Names Capture_Type UDG Library_Built Genotype_Ploidy Data_Preparation_Pipeline_URL Endogenous Nr_SNPs Damage Contamination Contamination_Err Contamination_Meas Contamination_Note Genetic_Source_Accession_IDs Publication Eager_ID Main_ID RateErrX RateErrY RateX RateY
11R_R_b_MNT M Ranas;CentralMexico;SierraGorda Mexico MX Querétaro Ranas 20.92612 -99.56488 contextual 729 769 809 Dates are taken from Date (CE) in S6 D1m Q1a2a1-L54 tooth 2 mtDNA-genome;Whole-genome Shotgun minus ds haploid https://github.com/nf-core/eager/releases/tag/2.4.6 99.296225 8454 0.25554637284974846 n/a n/a n/a Nr Snps (per library): 0;0. Estimate and error are weighted means of values per library. Libraries with fewer than 100 SNPs used in contamination estimation were excluded. PRJEB51440;ERS10903714;ERR9576909;ERR9169797 2023VillaIslasScience 11R_R_b 11R_R_b 0.033425961437661 0.056720746903437 0.389404695593133 0.73410877311746
2417C_TOL_a_MNT F Toluquilla;CentralMexico;SierraGorda Mexico MX Querétaro Toluquilla 20.88378 -99.53097 contextual 1100 1160 1220 Dates are taken from Date (CE) in S11 A2d n/a tooth 1 mtDNA-genome Shotgun minus ds haploid https://github.com/nf-core/eager/releases/tag/2.4.6 98.932551 0 0.3336998350742166 n/a n/a n/a Nr Snps (per library): 0. Estimate and error are weighted means of values per library. Libraries with fewer than 100 SNPs used in contamination estimation were excluded. PRJEB51440;ERS10903719;ERR9576910 2023VillaIslasScience 2417C_TOL_a 2417C_TOL_a 0.0 0.0 0.0 0.0
Member commented

Aha, now I understand the misplaced information in the Library_Names column: for this and other samples, only mtDNA was sequenced, so the number of covered SNPs is zero. I guess it was a conscious decision to include these samples in the Poseidon package anyway, right?
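Whichever way that decision goes, the affected samples can be listed directly from the Nr_SNPs column. A minimal sketch, assuming a tab-separated .janno file where a missing Nr_SNPs is "n/a" or empty; `zero_snp_samples` is a hypothetical helper, not part of any Poseidon tool.

```python
import csv
import io


def zero_snp_samples(janno_text: str) -> list[str]:
    """Return Poseidon_IDs with Nr_SNPs == 0 (here: samples with mtDNA data only).

    Rows with a missing Nr_SNPs value ("n/a" or empty) are skipped.
    """
    reader = csv.DictReader(io.StringIO(janno_text), delimiter="\t")
    return [
        row["Poseidon_ID"]
        for row in reader
        if row["Nr_SNPs"] not in {"", "n/a"} and int(row["Nr_SNPs"]) == 0
    ]


# Trimmed example based on the two rows quoted above.
example = (
    "Poseidon_ID\tLibrary_Names\tNr_SNPs\n"
    "11R_R_b_MNT\tmtDNA-genome;Whole-genome\t8454\n"
    "2417C_TOL_a_MNT\tmtDNA-genome\t0\n"
)
print(zero_snp_samples(example))
```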

Labels: Final review needed (This PR needs its final review before going live); help wanted (Extra attention is needed)
4 participants