AFS folding (polarised=False) #2972
Hm, well, maybe this bit answers your question? (It's talking about sites with more than two alleles)
... if not, can you provide a concrete example (and the code to generate it)?
Thank you Peter! Well, this is the passage I have a hard time extrapolating to a joint 2D-AFS, for instance. But here is an example. I consider two populations of 10000 diploids each, which diverged 6000 generations ago. I sample 2 diploids (4 haploids) from each population. Below I build a polarised and an unpolarised (folded) joint 2D-AFS using only one site, to showcase the problem I am considering. Specifically, I am looking at a site where the MAF is 50% (the "ambiguous" minority I was referring to in my previous message). Here, the genotypes are [1,1,1,1] in Pop A and [0,0,0,0] in Pop B. Everything is fine for the polarised 2D-AFS. However, for the folded version, I would expect that both entries
So, this is the variant at the single retained site:
We can see there is a single mutation. The corresponding genotypes:
The polarised 2D-AFS:
The unpolarised (folded) 2D-AFS:
But this is the folded 2D-AFS I would expect:
Many thanks again for your insights!
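For readers following along without the original code, the single-site example can be reproduced by hand. This is a minimal sketch in plain Python, not the simulation used in the thread: the helper `joint_afs` is a hypothetical name, and it only counts derived alleles per population for the genotypes quoted above.

```python
def joint_afs(sites, n_a, n_b):
    """Polarised joint AFS (hypothetical helper, not a tskit function).

    afs[i][j] counts the sites with i derived alleles in Pop A
    and j derived alleles in Pop B.
    """
    afs = [[0.0] * (n_b + 1) for _ in range(n_a + 1)]
    for geno_a, geno_b in sites:
        afs[sum(geno_a)][sum(geno_b)] += 1.0
    return afs


# The single site from the example: derived in all of Pop A, ancestral in all of Pop B.
site = ([1, 1, 1, 1], [0, 0, 0, 0])
polarised = joint_afs([site], n_a=4, n_b=4)

for row in polarised:
    print(row)
```

The only non-zero cell is (4,0). Its mirror image under relabelling of the alleles is (0,4), and both cells have a combined count of 4 out of 8, which is why this site is ambiguous when folding.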
Oh... this is interesting. If I now select a site with genotypes
Polarised:
Folded:
So now, the folded is identical to the polarised... NB. In all examples, I consider only biallelic sites. Edit: Part of the code changed:
Ah, okay - thanks for the examples, this is much clearer (not that your initial message wasn't clear, but it's clearer to me now I've seen the example). I think that bit I quoted is irrelevant, since that's just about sites with more than two alleles. So, it sounds like you're expecting a matrix with half the counts in each of the two places they could be assigned to (i.e., ξ[i,j] == ξ[n-i,m-j])? Instead, we're returning an array in which some of the elements have been zeroed out. For instance:
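The symmetric convention described here, with ξ[i,j] == ξ[n-i,m-j], can be sketched as follows. This is an illustration only, not what the library returns: `fold_symmetric` is a hypothetical helper that averages each cell with its mirror cell.

```python
def fold_symmetric(afs):
    """Average every cell with its mirror cell (n - i, m - j).

    Hypothetical helper illustrating the half-count convention;
    the result satisfies folded[i][j] == folded[n - i][m - j].
    """
    n = len(afs) - 1
    m = len(afs[0]) - 1
    return [
        [(afs[i][j] + afs[n - i][m - j]) / 2 for j in range(m + 1)]
        for i in range(n + 1)
    ]


# Polarised 5x5 spectrum holding the single example site at (4, 0).
polarised = [[0.0] * 5 for _ in range(5)]
polarised[4][0] = 1.0

symmetric = fold_symmetric(polarised)
for row in symmetric:
    print(row)
```

Under this convention the example site contributes 0.5 to (4,0) and 0.5 to (0,4), and the total count is preserved.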
Looking at the joint AFS I'm surprised by what I see there:
Notice that the only slots that got added together were (4,0) and (0,4). However, this is all as expected - adjusting the parameters a bit so I get mutations in most of the slots:
The rule for which half of the entries are kept in the unpolarised version is something like "entries with
Is this answering the question?
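The other convention, where each pair of mirror cells is summed into one cell and the other is zeroed out, might be sketched like this. The rule for choosing the kept cell is an assumption made for illustration (smaller combined count, ties broken by the smaller count in the second population); it is not confirmed by this thread to be the rule the library applies, and `fold_keep_one` is a hypothetical name.

```python
def fold_keep_one(afs):
    """Sum each cell and its mirror into a single kept cell; the other stays zero.

    ASSUMPTION: the kept cell is the one with the smaller combined count
    i + j, with ties broken by the smaller j. This tie-break is chosen for
    illustration and may differ from the library's actual rule.
    """
    n = len(afs) - 1
    m = len(afs[0]) - 1
    folded = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            mirror = (n - i, m - j)
            # Pick the representative of the pair {(i, j), mirror}.
            ki, kj = min((i, j), mirror, key=lambda c: (c[0] + c[1], c[1]))
            folded[ki][kj] += afs[i][j]
    return folded


# Toy polarised spectrum with two ambiguous cells and one unambiguous pair.
polarised = [[0.0] * 5 for _ in range(5)]
polarised[4][0] = 1.0
polarised[0][4] = 2.0
polarised[1][0] = 3.0
polarised[3][4] = 5.0

kept = fold_keep_one(polarised)
for row in kept:
    print(row)
```

With this tie-break, (4,0) and (0,4) are added together in (4,0) while (0,4) is left at zero, so the folded array is not symmetric even though no counts are lost.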
Thank you so much Peter! I now get it, and it is now quite clear with the rule what
Thanks a lot for your help :)
I agree, better docs would be nice. Hm, it currently says:
I think we left it at this because the precise specification of which cells are empty is not at all easy to explain (because of the edge cases, basically). Do you have any suggestions? (And, how do those other programs handle it?)
Dear all,
I was checking various implementations of (joint) allele frequency spectrum folding and went through the polarisation subsection in msprime. I am not sure I clearly understood the procedure implemented in ts.allele_frequency_spectrum(..., polarised=False), so my apologies if I misinterpreted something...

Consider for instance a simple two-population AFS, $\xi$ (with haploid sample sizes $n_0 = n_1$). I thought I understood that any site with derived allele counts $i$ and $j$ where $i+j=(n_0+n_1)/2$ would add a half-count to both $\xi(i,j)$ and $\xi(n_0-i,n_1-j)$. However, looking at the spectrum, it seems that the cells satisfying this condition of an ambiguous minor allele do not equal their mirror cells (which I would expect if such sites were handled as described above).

Am I misunderstanding the msprime implementation of folding?

Many thanks for your insights!
All the best,
Rémi