-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Higher vg call AD values after using vg filter #4443
Comments
Yeah, that doesn't seem to make much sense. Are you able to share the gam and gbz, along with a site that's identical in the vcf but has higher AD in the filtered one? Thanks. |
Yes I can - do you have an email address I can send a google drive link to? Thanks a lot! |
Sure, use my name here, glennhickey, at gmail. |
great - shared, let me know if you have trouble accessing |
got it, thanks. |
Actually, I don't see the .xg in the shared folder, which is required for |
Ah yes sorry - that should be in there now |
Hi, sorry for the delay. I have indeed reproduced this and agree there is something fishy going on. If the genotype is different, then it's understandable for AD to be different due to how call uses rounding and flow assignment to come up with support. But there are cases with the same genotype with completely different weights that I can't explain. Anyway, I haven't forgotten, just haven't had time to get to the bottom of it -- will hopefully have news soon. |
Hi Glen, thanks for the update: glad it's not just me! Let me know if there's any other files you need here. |
1. What were you trying to do?
2. What did you want to happen?
Identify coverage of ref/alt alleles from the vcf call AD statistic - and looking at the impact of the vg filter/call step.
3. What actually happened?
Running vg filter on the gam reduced the size of the gam substantially. In most cases the AD for each site in $gam.pack.vcf vs. $filter.gam.pack.vcf made sense: the AD was higher for the calls from the unfiltered gam. However there a minority of sites (~2.5%) of cases where the AD for either the reference or alt allele is higher in the filtered gam than the same site in the unfiltered gam (which substantially alters the estimated ref/alt allele coverage - see example graph
, blue points are where the ref or alt allele coverage is higher from the filtered vcf, red are where it is lower) - I don't understand how this can be the case, surely the filtering should only be removing reads? The graph is made from a biallelic vcf so it shouldn't be mutli-allele mapping that is the problem.
5. What data and command can the vg dev team use to make the problem happen?
I can send the data and exact script I used if needed. Any thoughts on this much appreciated!
6. What does running
vg version
say?vg version v1.48.0 "Gallipoli"
Compiled with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 on Linux
Linked against libstd++ 20210601```
The text was updated successfully, but these errors were encountered: