Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merging gvcf with GLnexus introduces non-zero heterozygous PL in hemizygous PAR #811

Open
ASLeonard opened this issue Apr 23, 2024 · 3 comments
Assignees

Comments

@ASLeonard
Copy link

This is a follow up from an email thread with @pichuan, where DeepVariant correctly produces only hemizygous genotypes in regions indicated as PAR, but through GLnexus merging and later imputation can gain "impossible" heterozygous genotypes.

The probably mainly seems to originate in GLnexus merging, where uncalled alleles (e.g., for "GT:DP:AD:GQ:PL:RNC" "./.:5:5,0:0:14,0,104:II" where I = gVCF input site is non-called) has a non-zero likelihood of being heterozygous according the PL tag, which is later imputed to a "0/1" genotype. But this also occurs for samples called as reference (0/0) but with non-zero PL tags for 0/1.

For example, the third sample gains a heterozygous genotype at site Y:6899016 after imputation, and has the gvcf block

Y       6898989 .       C       <*>     0       .       END=6899117     GT:GQ:MIN_DP:MED_DP:PL  0/0:1:0:0:0,3,29

leads to the GLnexus merged GT

Y       6899016 Y_6899016_T_C   T       C       47      .      0/0:0:0,0:1:0,3,29:..

and then the imputed

Y       6899016 Y_6899016_T_C   T       C       47      .      0/1:1:0,1,0

variant calling with DeepVariant

Running with singularity image v1.6.1.
This was done with the "WGS" model, using --channels "insert_size" --include_med_dp --gvcf <gvcf> for make_examples and --haploid_contigs "X,Y" --par_regions_bed <PAR.bed> for postprocess.

merging with GLnexus

Running with singularity image v1.4.1-0-g68e25e5.
This uses a modified DeepVariant config, where there is no genotype revision (revise_genotypes: false).

/usr/local/bin/glnexus_cli \
        --dir $TMPDIR/GLnexus.DB \
        --config <deepvariant_preset_with_revise_genotypes_false> \
        --threads 4 \
        --mem-gbytes 20 \
        *.g.vcf |\
        bcftools view - |\
        bgzip -@ 8 -c > Yhap.Unrevised.vcf.gz

imputing with beagle4

This is where the "./." genotypes with non-zero het GL turn into "0/1".

java -jar -Xss25m -Xmx50G beagle.27Jan18.7e1.jar ne=100 gl=Yhap.Unrevised.vcf.gz out=Yhap.beagle4

The unrevised (glnexus output, with no hets) and imputed (beagle output, with hets) variants are here, along with the gvcfs. I can provide the alignments, references, etc. if you need as these samples are already public.
Y_haploid_vcf.tar.gz

Best,
Alex

@pichuan pichuan self-assigned this Apr 23, 2024
@pichuan
Copy link
Collaborator

pichuan commented May 6, 2024

Hi @ASLeonard ,
A quick update: We think this is because we need to adjust our PL in our gVCF as well.
I'm working on a code change, but it might take a bit longer than I thought.
I just want to let you know that this is still in my queue!

@pichuan
Copy link
Collaborator

pichuan commented Jul 18, 2024

Update: Internally our team has been making other improvements to postprocess_variants, so I've not been actively looking into this issue. I'll plan to resume in the next few weeks.

@pichuan
Copy link
Collaborator

pichuan commented Aug 19, 2024

Update: I've done some investigation, and I think we need to change the logic in make_examples, specifically somewhere around here:

https://github.com/google/deepvariant/blob/r1.6.1/deepvariant/variant_caller.py#L207-L230

I've done a prototype but the behavior isn't quite what I expected yet. I'll continue to work on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants