Use SVDB merge for merging samples to case #424

jemten · 2024-10-11T13:34:56Z

Hi! Merging per-sample SV calls into a case should ideally be handled by a tool that can cope with the imprecise breakpoint locations of SVs. bcftools merge only merges exact matches. One option is SVDB merge; others are Jasmine or SURVIVOR. It might be worth checking with @J35P312.

nallo/subworkflows/local/call_svs/main.nf

Line 86 in 1263d72

BCFTOOLS_MERGE ( ch_bcftools_merge_in, ch_fasta, ch_fai, ch_bed )

fellen31 · 2024-10-14T07:33:34Z

Hm, my question would be: what if you have a sample with a call that has good, exact breakpoints, and then you merge it with 50 other samples and the result becomes less exact?

My idea was that the annotation with SVDB query would be imprecise (and annotate SVs that are the same but not exact matches with the same annotations), but I understand that this would lead to the same SV being reported twice in a "family" / CG case.

J35P312 · 2024-10-14T08:20:25Z

In general, the "preciseness" of SVs varies across the genome, even within high-quality data. There are also biological reasons that complicate the positioning of SVs, such as microhomology.

BCFtools works well for small SVs: they behave like indels, so they can be merged based on the ALT sequence. For large SVs you need to take the start, end, and SV type into account. BCFtools does not look at the END tag, so it treats the SV as a single point. Then you are better off setting bnd_distance to 1 in SVDB.
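To make the difference concrete, here is a minimal Python sketch of the two matching strategies described above. It is an illustration only, not the code of bcftools or SVDB: the `SV` class, the function names, and the intersection-over-union definition of overlap are assumptions made for this example, and the default thresholds are simply the 0.6 and 10,000 values discussed in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int   # VCF POS
    end: int     # INFO/END
    svtype: str  # INFO/SVTYPE, e.g. "DEL"


def exact_match(a: SV, b: SV) -> bool:
    """Point-like comparison: only chromosome and start are used, END is ignored."""
    return (a.chrom, a.start) == (b.chrom, b.start)


def overlap_fraction(a: SV, b: SV) -> float:
    """Shared length divided by the combined span (assumed overlap definition)."""
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    combined = max(a.end, b.end) - min(a.start, b.start)
    return shared / combined


def tolerant_match(a: SV, b: SV, max_distance: int = 10_000,
                   min_overlap: float = 0.6) -> bool:
    """Match on chromosome, SV type, both breakpoints, and overlap."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.start - b.start) > max_distance or abs(a.end - b.end) > max_distance:
        return False
    return overlap_fraction(a, b) >= min_overlap


# Hypothetical calls: b is a with slightly shifted breakpoints,
# c shares the start of a but is a much larger event.
a = SV("chr1", 10_000, 15_000, "DEL")
b = SV("chr1", 10_012, 15_004, "DEL")
c = SV("chr1", 10_000, 90_000, "DEL")

print(exact_match(a, b), tolerant_match(a, b))  # False True
print(exact_match(a, c), tolerant_match(a, c))  # True False
```

The point-like comparison misses the shifted call `b` and wrongly pairs `a` with the much larger `c`, while the tolerant comparison does the opposite in both cases.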

But in truth, it's probably better to apply some custom approach for population genomics projects. I would recommend merging the Sniffles2 files directly using Sniffles2, for instance.

"but I understand that this would lead to the same SV being reported twice in a "family" / CG case."

Not only that! It's important to merge the SVs to get the correct inheritance patterns.
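A small Python sketch of why merging affects inheritance: if the same deletion is called with slightly different breakpoints in a child and a parent, an exact merge leaves two records, and the child's record looks like it has no parental support. The sample names, coordinates, and the greedy grouping below are hypothetical and only illustrate the idea; they are not how SVDB or bcftools work internally.

```python
def merge_calls(calls, max_distance):
    """Greedily group (sample, start, end) calls from one chromosome and SV type.

    A call joins a group when both of its breakpoints lie within
    max_distance of the first call in that group.
    """
    groups = []
    for sample, start, end in sorted(calls, key=lambda call: call[1]):
        for group in groups:
            _, ref_start, ref_end = group[0]
            if (abs(start - ref_start) <= max_distance
                    and abs(end - ref_end) <= max_distance):
                group.append((sample, start, end))
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([(sample, start, end)])
    return groups


def carriers(groups):
    """List the samples that carry each merged record."""
    return [sorted({sample for sample, _, _ in group}) for group in groups]


# Hypothetical case: the same deletion, called 12 bp apart in two samples.
case_calls = [
    ("child", 10_012, 15_004),
    ("mother", 10_000, 15_000),
]

print(carriers(merge_calls(case_calls, max_distance=0)))
# [['mother'], ['child']]  -> the child's record looks de novo
print(carriers(merge_calls(case_calls, max_distance=100)))
# [['child', 'mother']]    -> one record, inherited from the mother
```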

fellen31 · 2024-10-14T08:28:45Z

"In general, the "preciseness" of SVs varies across the genome, even within high-quality data. There are also biological reasons that complicate the positioning of SVs, such as microhomology."

"BCFtools works well for small SVs: they behave like indels, so they can be merged based on the ALT sequence. For large SVs you need to take the start, end, and SV type into account. BCFtools does not look at the END tag, so it treats the SV as a single point. Then you are better off setting bnd_distance to 1 in SVDB."

"But in truth, it's probably better to apply some custom approach for population genomics projects. I would recommend merging the Sniffles2 files directly using Sniffles2, for instance."

Thanks Jesper. If we do want to use SVDB rather than Sniffles2 for merging calls, do you think the defaults (0.6 and a BND distance of 10,000) are reasonable both for creating a small dataset of 100-1000 samples and for a CG case?

We should also merge calls within a sample from HiFiCNV with calls from Severus/Sniffles; same question there :)
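One way to reason about the two thresholds is to check which of them actually decides the outcome for SVs of different sizes. The Python sketch below compares an SV with a copy of itself shifted by a few base pairs. It assumes overlap is measured as intersection over union and that both breakpoints must be within the distance limit; SVDB's exact definitions may differ, so treat the numbers as illustrative. The function names are made up for this example.

```python
def overlap_fraction(start_a, end_a, start_b, end_b):
    """Shared length divided by the combined span (assumed overlap definition)."""
    shared = min(end_a, end_b) - max(start_a, start_b)
    combined = max(end_a, end_b) - min(start_a, start_b)
    return max(shared, 0) / combined


def limiting_criterion(length, shift, max_distance=10_000, min_overlap=0.6):
    """Compare an SV of `length` bp (length > 0) with a copy shifted by `shift` bp."""
    fraction = overlap_fraction(0, length, shift, shift + length)
    within_distance = shift <= max_distance
    enough_overlap = fraction >= min_overlap
    if within_distance and enough_overlap:
        return "merged"
    if within_distance:
        return "rejected by overlap"
    if enough_overlap:
        return "rejected by distance"
    return "rejected by both"


print(limiting_criterion(length=5_000, shift=500))         # merged
print(limiting_criterion(length=500, shift=200))           # rejected by overlap
print(limiting_criterion(length=1_000_000, shift=20_000))  # rejected by distance
```

Under these assumptions, the overlap threshold is what matters for small SVs (a 10,000 bp distance is never the limit for a 500 bp event), while the distance threshold only starts to matter for very large events.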

"Not only that! It's important to merge the SVs to get the correct inheritance patterns."

Yes, definitely!

adameur · 2024-10-14T10:40:58Z

In my opinion, what should be considered the same SV is a philosophical question, and most likely we'll never find a tool that works perfectly. One thing we could do is look at what is being done in big projects around the world, so that we use an approach that facilitates international collaboration. For example, if we're using ColorsDB for filtering, maybe it would make sense to use a similar approach to theirs. But I don't know, maybe there are good reasons to choose some other option. In any case, I think it's a really interesting and important question. Maybe graph genomes can improve this at some point, but that feels quite far in the future.

fellen31 · 2024-10-15T11:41:01Z

Seems like the most appropriate action is to separate building and exporting a VCF for larger population calling/in-house databases (#372) from exporting a merged case/project VCF (this issue).

fellen31 added this to the 0.4 milestone Oct 14, 2024

fellen31 self-assigned this Oct 15, 2024

fellen31 mentioned this issue Oct 15, 2024

Replace bcftools with SVDB for SV merging #428 (merged)

fellen31 closed this as completed in #428 Oct 16, 2024
