Inquiry on Handling Nested Variants #4483

Closed
GooLey1025 opened this issue Dec 20, 2024 · 1 comment

@GooLey1025

Dear vg team,

Thank you for developing such an incredible tool; it has been a valuable resource for my work.

I am currently mapping short reads to a Cactus-generated graph for genotyping, and I have been exploring two approaches to detect SVs:

  • Using vg giraffe followed by vg call.
  • Using PanGenie.

For SNPs and small indels, I understand that tools like DeepVariant might perform better than vg or PanGenie, as these tools are primarily optimized for genotyping SVs. Is that correct? I would appreciate your insights on this point.

My Cactus graph represents a pangenome of 110 rice genomes, and I observed that genotyping one sample using PanGenie takes nearly 9 hours, which is longer than I expected. Due to this runtime issue, I turned to vg giraffe as an alternative.

I would like to confirm whether the VCF output from vg call can handle nested variants inside of bubbles, similar to how PanGenie processes nested variants (as described in the PanGenie Wiki, particularly the steps related to PanGenie-ready-input processing and convert_biallelic for PanGenie output).

Additionally, I would greatly appreciate it if you could review the following commands and confirm that they are correct for this goal:

t=36
prefix=Nipponbare.pangenome.d11
sample=C002
export TMPDIR=/public/home/cszx_huangxh/qiujie/collabrators/gulei/rice_graph_pangenome/genotyping/giraffe/hapl/TMP
kmc -k29 -m128 -okff -t$t -hp ../fq/merged_$sample.fq.gz ${TMPDIR}/$sample $TMPDIR
# vg gbwt -p --num-threads $t -r $prefix.ri -Z $prefix.gbz
# vg haplotypes -v 2 -t $t -H $prefix.hapl $prefix.gbz
vg haplotypes -v 2 -t $t --include-reference --diploid-sampling -i $prefix.hapl -k ${TMPDIR}/$sample.kff -g $TMPDIR/$sample.gbz $prefix.gbz
vg giraffe -p -t $t -Z $TMPDIR/$sample.gbz -i -f ../fq/merged_$sample.fq.gz > $sample.gam
vg pack -t $t -x $prefix.gbz -g $sample.gam -o $sample.pack -Q 5
vg call -t $t $prefix.gbz -k $sample.pack -a -z > ${sample}_giraffe_call.vcf

Everything ran without errors.
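
A run that finishes without errors does not by itself show that the VCF contains what is expected, so a quick look at the output may help. The sketch below is only an illustration: the two helper functions are hypothetical names of my own, not part of vg or PanGenie, and they assume an uncompressed, tab-separated VCF such as the one written by the last command above. The first counts variant records; the second counts records whose ALT column lists more than one allele, which are the sites a biallelic conversion step would have to split.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of vg or PanGenie) for a quick look at a VCF.
# Assumption: the VCF is uncompressed and tab-separated.

# Count variant records, i.e. lines that are not header lines.
count_records() {
    awk '!/^#/ { n++ } END { print n + 0 }' "$1"
}

# Count records whose ALT column (field 5) lists more than one allele.
count_multiallelic() {
    awk -F'\t' '!/^#/ && $5 ~ /,/ { n++ } END { print n + 0 }' "$1"
}
```

With sample=C002 as above, these could be called as `count_records C002_giraffe_call.vcf` and `count_multiallelic C002_giraffe_call.vcf`. Neither count says anything about nesting; they only summarize how many sites were reported and how many of them are multi-allelic.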

@glennhickey
Contributor

About nested variants, I'm hoping to add support to vg call soon, but it's not ready yet. Same for pangenie.
