Add 2023_VillaIslas_Science #18
base: main
@93Boy do you think you could make a start with this?
New situation: We now have a Janno file PR in the PCA. Thiseas can pull from there. Nothing to do here, @93Boy
Update: @KenanaSa99 and @Kavlahkaff have started looking into it.
merged janno from community archive for 2023_VillaIslas_Science
… file. Add @Kavlahkaff, @jbv2 as Contributors.
Facing the same issue as #27
Ranas 37AI_R_b_MNT Ranas 0 0 0 | ||
Toluquilla 6428A_TOL_b_MNT Toluquilla 0 1 0 | ||
Ranas 7A_R_b_MNT Ranas 0 1 0 | ||
CañadadelaVirgen E10_CdV_b_MNT CañadadelaVirgen 0 0 0 |
As already discussed elsewhere, I think we should avoid non-ASCII characters in individual and primary group IDs.
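A check like this could be scripted. The following is only a minimal sketch: the helper name `non_ascii_ids` is hypothetical, the two example pairs are copied from the rows above, and it assumes the primary group is the first `;`-separated entry of the group field.

```python
def non_ascii_ids(rows):
    """Return (individual ID, primary group) pairs containing non-ASCII characters.

    `rows` is an iterable of (individual_id, group_name) tuples, where
    group_name may hold several groups separated by ';'. The first entry
    is assumed to be the primary group.
    """
    flagged = []
    for individual_id, group_name in rows:
        primary_group = group_name.split(";")[0]
        # str.isascii() is available from Python 3.7 onwards.
        if not (individual_id.isascii() and primary_group.isascii()):
            flagged.append((individual_id, primary_group))
    return flagged


# Example pairs copied from the diff above.
rows = [
    ("37AI_R_b_MNT", "Ranas"),
    ("E10_CdV_b_MNT", "CañadadelaVirgen"),
]

for individual_id, primary_group in non_ascii_ids(rows):
    print(f"non-ASCII characters in: {individual_id} / {primary_group}")
```

Here only the CañadadelaVirgen entry would be reported, because of the "ñ" in the group name.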
333C_TOL_a_MNT F Toluquilla;CentralMexico;SierraGorda Mexico MX Querétaro Toluquilla 20.88378 -99.53097 contextual 1301 1351 1401 Dates are taken from Date (CE) in S15 B2c n/a tooth 1 mtDNA-genome Shotgun minus ds haploid https://github.com/nf-core/eager/releases/tag/2.4.6 94.499718 0 0.21912157243387528 n/a n/a n/a Nr Snps (per library): 0. Estimate and error are weighted means of values per library. Libraries with fewer than 100 SNPs used in contamination estimation were excluded. PRJEB51440;ERS10903723;ERR9576915 2023VillaIslasScience 333C_TOL_a 333C_TOL_a 0.0 0.0 0.0 0.0 | ||
333O_TOL_b_MNT U Toluquilla;CentralMexico;SierraGorda Mexico MX Querétaro Toluquilla 20.88378 -99.53097 contextual 650 700 750 Dates are taken from Date (CE) in S5 B2l n/a tooth 1 mtDNA-genome Shotgun minus ds haploid https://github.com/nf-core/eager/releases/tag/2.4.6 97.512945 0 0.31377645748236166 n/a n/a n/a Nr Snps (per library): 0. Estimate and error are weighted means of values per library. Libraries with fewer than 100 SNPs used in contamination estimation were excluded. PRJEB51440;ERS10903713;ERR9576916 2023VillaIslasScience 333O_TOL_b 333O_TOL_b 0.0 0.0 0.0 0.0 | ||
333Q_TOL_b_MNT U Toluquilla;CentralMexico;SierraGorda Mexico MX Querétaro Toluquilla 20.88378 -99.53097 contextual 480 520 560 Dates are taken from Date (CE) in S10 A2d n/a tooth 1 mtDNA-genome Shotgun minus ds haploid https://github.com/nf-core/eager/releases/tag/2.4.6 98.044843 0 0.20524344569288389 n/a n/a n/a Nr Snps (per library): 0. Estimate and error are weighted means of values per library. Libraries with fewer than 100 SNPs used in contamination estimation were excluded. PRJEB51440;ERS10903718;ERR9576917 2023VillaIslasScience 333Q_TOL_b 333Q_TOL_b 0.0 0.0 0.0 0.0 | ||
37AI_R_b_MNT U Ranas;CentralMexico;SierraGorda Mexico MX Querétaro Ranas 20.92612 -99.56488 n/a n/a n/a n/a Dates are taken from Date (CE) in S9 A2d n/a tooth 1 mtDNA-genome Shotgun minus ds haploid https://github.com/nf-core/eager/releases/tag/2.4.6 96.207416 0 0.3249466950959488 n/a n/a n/a Nr Snps (per library): 0. Estimate and error are weighted means of values per library. Libraries with fewer than 100 SNPs used in contamination estimation were excluded. PRJEB51440;ERS10903717;ERR9576918 2023VillaIslasScience 37AI_R_b 37AI_R_b 0.0 0.0 0.0 0.0 |
Date_Note says "Dates are taken from Date (CE) in S9", but there is no date information. This also applies to some other samples here. What does "S9" mean, by the way?
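To find all affected samples, something along these lines might help. It is a sketch under assumptions: the excerpt keeps only the date-related columns of two rows from this diff, the file is assumed to be tab-separated with "n/a" marking missing values, and the function name `rows_with_note_but_no_dates` is made up for illustration.

```python
import csv
import io

# Trimmed excerpt of the .janno file (tab-separated), date columns only.
JANNO_EXCERPT = (
    "Poseidon_ID\tDate_Type\tDate_BC_AD_Start\tDate_BC_AD_Median\tDate_BC_AD_Stop\tDate_Note\n"
    "333Q_TOL_b_MNT\tcontextual\t480\t520\t560\tDates are taken from Date (CE) in S10\n"
    "37AI_R_b_MNT\tn/a\tn/a\tn/a\tn/a\tDates are taken from Date (CE) in S9\n"
)

MISSING = ("", "n/a")
DATE_COLUMNS = ("Date_BC_AD_Start", "Date_BC_AD_Median", "Date_BC_AD_Stop")


def rows_with_note_but_no_dates(janno_text):
    """Return Poseidon_IDs that have a Date_Note but no start/median/stop date."""
    reader = csv.DictReader(io.StringIO(janno_text), delimiter="\t")
    flagged = []
    for row in reader:
        has_note = row["Date_Note"] not in MISSING
        has_dates = any(row[col] not in MISSING for col in DATE_COLUMNS)
        if has_note and not has_dates:
            flagged.append(row["Poseidon_ID"])
    return flagged


print(rows_with_note_but_no_dates(JANNO_EXCERPT))
```

Running the same function over the full file contents would list every sample where the note promises a date that the date columns do not contain.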
@@ -0,0 +1,28 @@ | |||
Poseidon_ID Genetic_Sex Group_Name Country Country_ISO Location Site Latitude Longitude Date_Type Date_BC_AD_Start Date_BC_AD_Median Date_BC_AD_Stop Date_Note MT_Haplogroup Y_Haplogroup Source_Tissue Nr_Libraries Library_Names Capture_Type UDG Library_Built Genotype_Ploidy Data_Preparation_Pipeline_URL Endogenous Nr_SNPs Damage Contamination Contamination_Err Contamination_Meas Contamination_Note Genetic_Source_Accession_IDs Publication Eager_ID Main_ID RateErrX RateErrY RateX RateY |
I think the Library_Names column is filled incorrectly. It should include "identifiers of the libraries used to generate the genotype data for the sample", but here it has entries like "mtDNA-genome;Whole-genome". These are not library IDs. What do they mean here?
@@ -0,0 +1,28 @@ | |||
Poseidon_ID Genetic_Sex Group_Name Country Country_ISO Location Site Latitude Longitude Date_Type Date_BC_AD_Start Date_BC_AD_Median Date_BC_AD_Stop Date_Note MT_Haplogroup Y_Haplogroup Source_Tissue Nr_Libraries Library_Names Capture_Type UDG Library_Built Genotype_Ploidy Data_Preparation_Pipeline_URL Endogenous Nr_SNPs Damage Contamination Contamination_Err Contamination_Meas Contamination_Note Genetic_Source_Accession_IDs Publication Eager_ID Main_ID RateErrX RateErrY RateX RateY | |||
11R_R_b_MNT M Ranas;CentralMexico;SierraGorda Mexico MX Querétaro Ranas 20.92612 -99.56488 contextual 729 769 809 Dates are taken from Date (CE) in S6 D1m Q1a2a1-L54 tooth 2 mtDNA-genome;Whole-genome Shotgun minus ds haploid https://github.com/nf-core/eager/releases/tag/2.4.6 99.296225 8454 0.25554637284974846 n/a n/a n/a Nr Snps (per library): 0;0. Estimate and error are weighted means of values per library. Libraries with fewer than 100 SNPs used in contamination estimation were excluded. PRJEB51440;ERS10903714;ERR9576909;ERR9169797 2023VillaIslasScience 11R_R_b 11R_R_b 0.033425961437661 0.056720746903437 0.389404695593133 0.73410877311746 | |||
2417C_TOL_a_MNT F Toluquilla;CentralMexico;SierraGorda Mexico MX Querétaro Toluquilla 20.88378 -99.53097 contextual 1100 1160 1220 Dates are taken from Date (CE) in S11 A2d n/a tooth 1 mtDNA-genome Shotgun minus ds haploid https://github.com/nf-core/eager/releases/tag/2.4.6 98.932551 0 0.3336998350742166 n/a n/a n/a Nr Snps (per library): 0. Estimate and error are weighted means of values per library. Libraries with fewer than 100 SNPs used in contamination estimation were excluded. PRJEB51440;ERS10903719;ERR9576910 2023VillaIslasScience 2417C_TOL_a 2417C_TOL_a 0.0 0.0 0.0 0.0 |
Aha - I understand the misplaced info in the Library_Names column now. So for this and other samples only mtDNA was sequenced. As a consequence, the number of covered SNPs is zero. I guess it was a conscious decision to include these samples in the Poseidon package anyway, right?
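For reference, the zero-SNP samples could be listed with a short script. This is a rough sketch, not an established tool: the excerpt keeps only four columns of three rows from this diff, the file is assumed to be tab-separated, and `zero_snp_samples` is a hypothetical helper name.

```python
import csv
import io

# Trimmed excerpt of the .janno file (tab-separated), four columns only.
JANNO_EXCERPT = (
    "Poseidon_ID\tNr_Libraries\tLibrary_Names\tNr_SNPs\n"
    "11R_R_b_MNT\t2\tmtDNA-genome;Whole-genome\t8454\n"
    "2417C_TOL_a_MNT\t1\tmtDNA-genome\t0\n"
    "333C_TOL_a_MNT\t1\tmtDNA-genome\t0\n"
)


def zero_snp_samples(janno_text):
    """Map each Poseidon_ID with Nr_SNPs == 0 to its Library_Names entries."""
    reader = csv.DictReader(io.StringIO(janno_text), delimiter="\t")
    result = {}
    for row in reader:
        if int(row["Nr_SNPs"]) == 0:
            # Multiple values in a field are separated by ';'.
            result[row["Poseidon_ID"]] = row["Library_Names"].split(";")
    return result


for poseidon_id, library_names in zero_snp_samples(JANNO_EXCERPT).items():
    print(poseidon_id, library_names)
```

In this excerpt, both samples with zero covered SNPs list only "mtDNA-genome", which matches the reading that only mtDNA was sequenced for them.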
Adds package 2023_VillaIslas_Science.
Linked to poseidon-framework/minotaur-recipes#3
Publication is missing from the community archive, so metadata needs to be collected manually.
Janno for publication provided by @jbv2 here, and added by @Kavlahkaff