Anglo_Saxon_Project #104

93Boy · 2022-11-09T22:37:59Z

This is a draft .Janno file based on available data.

stschiff · 2022-12-19T08:48:11Z

OK, perhaps Joscha can still provide the Plink data in time, otherwise it's also in AADR (embarrassingly).

stschiff · 2022-12-19T08:48:49Z

Please make this a draft PR for now.

stschiff · 2023-05-30T13:00:18Z

Hi @93Boy. I have uploaded the genetic data. It contains 8 more rows than your original Janno file. I have now adapted the order of rows in the Janno file to the order in the genotype data. I have added rows with "n/a" for those individuals that were listed in the genotype data but not in the Janno. Perhaps you can check again in the paper tables whether you find information for those samples and add it. If not, please ask Joscha Gretzinger.

Here are a number of todos:

For the 8 (?) individuals with missing data ("n/a"), check what to do, as described above.
Fill in the missing genetic source ID information for them. The project ENA ID is listed in the paper.
Check whether the group IDs are correct. I think they might now differ between the Janno file and the ind-file. Perhaps a solution could be to keep the names from the Janno file but prefix them with the group names from the ind-file. I don't know where they came from.

If anything is unclear, please ask me and I can inquire further.

93Boy · 2023-05-30T20:58:54Z

Sure, Stephan, I will start working on this now.

93Boy · 2023-05-31T21:56:22Z

Hello Stephan, I went through the data and found the following points.

I have gone through the n/a fields. They are not in the original dataset, but EAS003 is listed in Table S2.1 and the rest in Table S3.5, which includes F4 stats; no other genotype data was found. I will contact Joscha for further information.
I have added the accession ID and downloaded the ENA data to process into an ".ssf" file.
The group IDs do not match the .ind file. The group IDs in the .ind file are more informative. For example, the .janno group name for Poseidon_ID 'ADN001' is 'Germany_EMA'; in the .ind file it is 'NGermany_EMA_Anderten'. I think the .ind version is better. Do you want me to change it accordingly?
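A group-name comparison like the one described above could be automated roughly as follows. This is only a sketch: it assumes a tab-separated .janno with `Poseidon_ID` and `Group_Name` columns and an EIGENSTRAT-style .ind file with three whitespace-separated fields (ID, sex, group). The function names are hypothetical, and the sex value `U` in the example rows is a placeholder, not data from the package.

```python
import csv
import io

def read_ind_groups(ind_text):
    """Map individual ID -> group name from .ind content (ID, sex, group)."""
    groups = {}
    for line in ind_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            groups[fields[0]] = fields[2]
    return groups

def read_janno_groups(janno_text):
    """Map Poseidon_ID -> Group_Name from tab-separated .janno content."""
    reader = csv.DictReader(io.StringIO(janno_text), delimiter="\t")
    return {row["Poseidon_ID"]: row["Group_Name"] for row in reader}

def group_mismatches(janno_text, ind_text):
    """Return (ID, janno group, ind group) for every ID whose groups differ."""
    janno = read_janno_groups(janno_text)
    ind = read_ind_groups(ind_text)
    return [
        (ind_id, janno[ind_id], ind[ind_id])
        for ind_id in janno
        if ind_id in ind and janno[ind_id] != ind[ind_id]
    ]

# Example rows built from the ADN001 case mentioned in this thread.
janno_example = "Poseidon_ID\tGenetic_Sex\tGroup_Name\nADN001\tU\tGermany_EMA\n"
ind_example = "ADN001 U NGermany_EMA_Anderten\n"
print(group_mismatches(janno_example, ind_example))
```

Running such a check over the whole package before and after renaming would show whether any individual still carries a group name that differs from the ind-file.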

stschiff · 2023-06-01T14:26:03Z

Yes, I think it's perhaps easiest if you then just adapt the janno file to the ind-groups.

93Boy · 2023-06-01T20:15:29Z

I have contacted Joscha. He said EAS003 was once part of the data, but he then removed it in order to publish it separately. He will look into the other IDs as well. However, I didn't find these IDs in the ENA data.
I have adopted the group names of the .ind file in the .janno as you mentioned, and the ssf file has also been created.

stschiff · 2023-06-05T10:26:55Z

Hmm, but that means that EAS003 is part of the genotype files then? That's bad... then we need to remove those I suppose.

93Boy · 2023-06-08T21:03:54Z

May I remove these from the genotype data?

93Boy · 2023-06-12T23:20:53Z

Hello Stephan, I received an update from Joscha. In summary, we can include all of them in the Poseidon package. All the entries except EAS003 are re-sequenced individuals from the Schiffels et al. 2016 paper. May I extract that information from your paper?

stschiff · 2023-06-22T11:41:03Z

Yes of course. Please do so. Once you have filled whatever you can, please report back. I'm happy to fill anything you don't know.

93Boy · 2023-07-07T13:02:26Z

@stschiff I have filled in the missing data fields from your 2016_SchiffelsNatureCommunications. I would like to note the points below.

EAS003 was removed.
I haven't found a match for I0791_duplicate in your publication.
The group names in 2022_Gretzinger_AngloSaxons were "England_EMA", but in yours the group name is equal to the Poseidon_ID. I kept your format. Should I change it back?

stschiff · 2023-08-11T12:55:47Z

I checked, and it seems that EAS003 was in fact not removed from the genotype data, only from the Janno file. I will put it back.

stschiff · 2023-08-11T13:34:36Z

OK, I've gone through this. As written above, EAS003 was still part of the genotype data, so I put it back into the Janno and filled the necessary fields in consultation with Joscha.

@93Boy, Please work on the following:

You filled in "C14" for all the Date_Type entries, but only some have actual C14 dates. Please change all the ones without C14 dates to "contextual". I started, but there are a lot more.
I talked to Joscha about I0791_duplicate: please remove this individual from the Janno file and the genotype data. To remove it from the genotype data, you will have to use forge, using the syntax -<I0791_duplicate> in the forge string to get it out.
Please check Joscha's original Table S1 from Gretzinger et al. 2022. There is kinship information in there (column AC), particularly identical samples. Please add this information into new Janno columns (http://www.poseidon-adna.org/#/janno_details?id=relations-among-samplesindividuals).
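The Date_Type correction in the first point could be scripted instead of done by hand. The following is a minimal sketch, assuming janno rows are loaded as dictionaries and that a missing radiocarbon date shows up as an empty or "n/a" `Date_C14_Uncal_BP` field. The helper name and the example rows (`sample_A`, `sample_B`, and their values) are hypothetical.

```python
def fix_date_type(row):
    """Return a copy of a janno row whose Date_Type is 'contextual'
    when the row claims 'C14' but has no uncalibrated C14 date."""
    missing = ("", "n/a")
    has_c14 = row.get("Date_C14_Uncal_BP", "n/a").strip() not in missing
    fixed = dict(row)
    if fixed.get("Date_Type") == "C14" and not has_c14:
        fixed["Date_Type"] = "contextual"
    return fixed

# Hypothetical rows: one with a radiocarbon date, one without.
rows = [
    {"Poseidon_ID": "sample_A", "Date_Type": "C14", "Date_C14_Uncal_BP": "1500"},
    {"Poseidon_ID": "sample_B", "Date_Type": "C14", "Date_C14_Uncal_BP": "n/a"},
]
fixed_rows = [fix_date_type(r) for r in rows]
for r in fixed_rows:
    print(r["Poseidon_ID"], r["Date_Type"])
```

The function returns a copy rather than editing in place, so the original rows remain available for a before/after comparison.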

For individuals I0161, I0159, I0769, I0773, I0774, I0777, I0157: these are the ones that you took from Schiffels_2016. I now have some new information on these: they are in fact capture datasets from David's lab of the same individuals that were published in 2016, but they are new datasets. So please:

Update their Janno fields to the values used in the AADR (https://raw.githubusercontent.com/poseidon-framework/aadr-archive/main/AADR_v54_1_p1_1240K_EuropeAncient/AADR_v54_1_p1_1240K_EuropeAncient.janno).
Add information about them being duplicates of the ones published in Schiffels et al. 2016, using the new Janno columns such as Relation_To and so on, as above.
Adapt their group names to the ones proposed in Joscha's *.ind file! You have now put in new group names that are not in sync with the ind-file. Please switch back to the ind file. I know you had asked me about this, but I've now changed my mind. They should all follow the ind file exactly.
Please add a citation to Gretzinger et al. 2022, to Schiffels et al. 2016 and to the AADR. This then means that all of these must also be part of the bib file.

Finally, please convert the genotype data to Plink using trident genoconvert.

Let me know if you encounter problems.

stschiff · 2023-08-23T12:08:06Z

@93Boy do you have an update for us?

stschiff · 2023-09-04T09:18:24Z

Ooookay, so I have finally finalised this Pull Request. It still needs to be reviewed by Joscha Gretzinger, though, which I'll take care of.

For the record, here are a few things that I did:

The row order of the Janno and ind files was out of sync. I reordered the Janno, as the genotype data must remain fixed.
The janno-file was comma-separated, not tab-separated when I took over, so I changed that.
After the change from comma- to tab-separated, for some reason the columns in the Janno were misaligned... I suspect that somewhere along the line, LibreOffice or some other tool confused whitespace with tabs and messed up the alignment. I went through by hand and inserted/deleted columns so that all columns are aligned again.
I fixed various issues around dates. There were spurious "CE" which I had to remove to make them strictly numeric.
I added the "No collagen" strings to the Date_notes field.
I manually added dummy values for the contamination error estimates, to be clarified with Joscha.
I manually added some lower or upper bounds to the dates where they were missing (to be confirmed by Joscha).
I aligned some group names, from England_EMA_Capture to just England_EMA.
I added duplicates via Relation_* fields. No other relatives have been added yet.
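The delimiter change and the date clean-up listed above could be sketched as follows. This is not what was actually run for this PR (the realignment was done by hand). It only illustrates a conversion that refuses to proceed when a row has a different number of columns than the header, which is the kind of misalignment described. The function names and the sample row are hypothetical.

```python
import csv
import io
import re

def comma_to_tab(text):
    """Re-serialise comma-separated janno content as tab-separated text,
    raising an error if any row has a different width than the header."""
    rows = list(csv.reader(io.StringIO(text)))
    width = len(rows[0])
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            raise ValueError(
                f"line {number}: {len(row)} columns, expected {width}"
            )
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()

def strip_ce(value):
    """Turn a date such as '400 CE' into '400'; leave other values as-is."""
    match = re.fullmatch(r"\s*(-?\d+)\s*CE\s*", value)
    return match.group(1) if match else value

print(comma_to_tab("Poseidon_ID,Group_Name\nsample_A,England_EMA\n"))
print(strip_ce("400 CE"))
```

Checking column counts at conversion time would surface a misaligned row immediately, rather than leaving it to be found later during validation.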

stschiff · 2023-09-04T09:58:32Z

Specific points for Joscha to check:

Our supplement gives contamination estimates without error bars. Poseidon requires them, if the estimates themselves are set. I now set them all to a dummy value of 0.001, but it would of course be good if you could actually fill the correct ones if you still have them.
The dates in GRO004, GRO006, GRO015, GRO016, GRO020 were given only as an upper bound in our Supplement. I have now set their lower bound to 900 in all of these. Please check whether that is appropriate.
All HIDXXX samples were only given with a date fixed at 400 CE. I have now set this to 300-500. Please check whether that is appropriate.
I changed England_EMA_Capture to England_EMA. The fact that they are capture data is already given in the Janno Capture_Type column, and that they are the same individuals as in my 2016 paper is given via the relationship fields.
I changed the group names of the four 1240K duplicates of WGS samples at OAI.
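For reference, two of the placeholder rules above (the dummy contamination error of 0.001 and the 300-500 range for HID samples dated to 400 CE) can be written down as a small function, which makes them easy to revert or adjust once the real values arrive. This is a sketch under assumptions: rows are dictionaries keyed by janno column names, missing values appear as "n/a" or empty strings, and the example row (including its contamination value) is hypothetical.

```python
def apply_placeholders(row, dummy_err="0.001"):
    """Return a copy of a janno row with the provisional values from this
    thread filled in. All of them are still to be confirmed."""
    missing = ("", "n/a")
    fixed = dict(row)
    # A contamination estimate without an error gets the dummy error value.
    if (fixed.get("Contamination", "n/a") not in missing
            and fixed.get("Contamination_Err", "n/a") in missing):
        fixed["Contamination_Err"] = dummy_err
    # HID samples given only as a fixed 400 CE date get a 300-500 range.
    if (fixed["Poseidon_ID"].startswith("HID")
            and fixed.get("Date_BC_AD_Median") == "400"):
        fixed["Date_BC_AD_Start"] = "300"
        fixed["Date_BC_AD_Stop"] = "500"
    return fixed

# Hypothetical row; the ID is the placeholder pattern used in this thread.
example = {
    "Poseidon_ID": "HIDXXX",
    "Date_BC_AD_Start": "400",
    "Date_BC_AD_Median": "400",
    "Date_BC_AD_Stop": "400",
    "Contamination": "0.01",
    "Contamination_Err": "n/a",
}
filled = apply_placeholders(example)
print(filled["Date_BC_AD_Start"], filled["Date_BC_AD_Stop"], filled["Contamination_Err"])
```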

stschiff · 2023-09-04T14:45:19Z

@93Boy I think it would be great if you could start filling in the rest of the relationships. I have created the Relation_* columns and filled in only the duplicates so far, but there are a lot more, listed in Joscha's Supplement. I suggest you start a new branch, which (as an exception to the rule) branches off from this branch here, so that we can get this merged in even before your task is done, and then merge your bits later as a second branch.

93Boy · 2023-09-05T19:34:36Z

@stschiff Yes, I have begun working on the relationship data in a sub-branch, as you mentioned.

93Boy · 2023-09-11T19:04:14Z

@stschiff I have updated all the kinship information I have found in the supplementary documents.

stschiff · 2023-09-13T07:51:33Z

I think you haven't pushed your changes, @93Boy

93Boy · 2023-09-13T19:44:39Z

@stschiff I have created a sub-branch named AS_relationship and pushed the changes.

stschiff · 2023-09-18T10:58:41Z

Great, I'll take a look.

stschiff · 2023-10-04T10:53:53Z

OK, after having first merged #136 and then realised that the kinship needs more work, I decided to roll-back this PR to the state before the kinship information was merged. I will now await the validation pass (it passed locally) and then merge this in. Some points need addressing by Joscha Gretzinger, but I do not want to wait further and instead then have him look through the merged package and eventually provide an update. It's such a big package that I find it important that it gets out now.

stschiff · 2023-10-04T11:25:07Z

OK, so I think this is ready to be merged. @AyGhal do you want to have a look? There are some points, listed above, which need to be checked by Joscha, but given that some time has passed I suggest merging this now and then releasing an update once Joscha has checked some of the details in the Janno. This is a large package and it needs to be published yesterday.

AyGhal · 2023-10-04T14:24:28Z

I had a quick look and it seems okay.

A draft janno file

714b024

nevrome marked this pull request as draft May 5, 2023 13:58

stschiff added 4 commits May 16, 2023 14:25

added genotype data

1c59923

added genotype data

0a96e1e

removed quotes from janno file

25e81b0

fixed janno-order to genotype data order

b56f026

group_name update,adding ssf

1f0db0d

Missing data added from 2016_SchiffelsNatureCommunications

e5f23b5

stschiff marked this pull request as ready for review July 15, 2023 10:26

stschiff self-requested a review July 15, 2023 10:26

stschiff self-assigned this Jul 15, 2023

removed quotations from Gretzinger-Janno file

cfe9f77

started filling back EAS003

37ca571

stschiff added 2 commits August 11, 2023 15:38

continued with EAS003

eb950b8

added endog. DNA to EAS003

8384a9c

stschiff and others added 4 commits September 1, 2023 23:13

validate passes

306d808

Merge branch 'Anglo_sax_draft' of github.com:poseidon-framework/commu…

a737855

…nity-archive into Anglo_sax_draft

aligned pop-names

67e7a6a

removed dupliate individual, converted to PLINK

0d03d03

stschiff marked this pull request as draft September 4, 2023 09:18

stschiff added 3 commits September 4, 2023 11:54

added duplicates

0b61723

fixed column-misalignment

212d72c

adapted group field to some Oakington samples that are WGS duplicates

9959eb2

fixed group name inconsistency

cdab216

stschiff force-pushed the Anglo_sax_draft branch from 25760bf to cdab216 Compare October 4, 2023 09:31

stschiff mentioned this pull request Oct 4, 2023

Anglo saxon kinship #138

Merged

updated ssf and yml files

183d0c2

stschiff marked this pull request as ready for review October 4, 2023 10:59

stschiff force-pushed the Anglo_sax_draft branch from 77751b3 to 183d0c2 Compare October 4, 2023 11:04

stschiff added 4 commits October 4, 2023 13:05

Merge branch 'master' into Anglo_sax_draft

fcbc91e

removed redundant reference

352eb91

renamed ssf file and added to YAML

a3ad78b

updated SSF and checksums

4d3dd90

AyGhal merged commit 3d04838 into master Oct 4, 2023
1 check passed
