Overlapping deletions of the same size wrongly clustered #350

Open
AyushSaxena opened this issue Aug 17, 2022 · 2 comments


@AyushSaxena

We have two overlapping 22 bp deletions that we can identify with clear breakpoints in IGV. One is present in ~50% of the population and the other in ~10%. However, sniffles2 only detects the major variant. I am running sniffles2 in non-germline mode with reduced minimum support (--minsupport) and minimum SV length (--minsvlen). I'm using high-quality PacBio CCS reads.

I believe that the proximity of the two deletions is making sniffles2 cluster them together. I have seen a similar behaviour with pbsv as well.

Is there a way to prevent sniffles2 from clustering nearby mutations? If I understand correctly, the point of clustering proximate variants is to account for alignment errors, but with CCS reads the alignment is clean (using minimap2 'map-hifi' mode).
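To make the suspected effect concrete, here is a toy sketch of position-based clustering. It is not how sniffles2 or pbsv actually cluster signatures; the function name `cluster_by_start`, the `max_gap` parameter, and the coordinates are all hypothetical. It only shows how two same-length deletions whose starts lie within the merge distance collapse into one cluster dominated by the major allele.

```python
def cluster_by_start(signatures, max_gap):
    """Greedy single-linkage clustering of (start, length) deletion
    signatures: a signature joins the current cluster if its start is
    within max_gap bp of the previous signature's start."""
    clusters = []
    for sig in sorted(signatures):
        if clusters and sig[0] - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append(sig)
        else:
            clusters.append([sig])
    return clusters


# Hypothetical signatures: five reads with a 22 bp deletion at 150,
# one read with a 22 bp deletion at 160.
sigs = [(150, 22)] * 5 + [(160, 22)]

merged = cluster_by_start(sigs, max_gap=10)  # wide merge distance
split = cluster_by_start(sigs, max_gap=5)    # narrow merge distance

print(len(merged), [len(c) for c in merged])
print(len(split), [len(c) for c in split])
```

With the wide distance the minor deletion is absorbed into the major cluster; with the narrow one it survives as its own cluster, which is the behaviour being asked for here.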

I've tried changing the cluster bin size, the value of cluster-r, and other parameters, but I'm unable to rescue the minor 22 bp deletion. All suggestions are welcome!

Ayush

@fritzsedlazeck
Owner

Hi Ayush,
just to be sure: that is on the population level, right? Not single-sample calling?
Thanks
Fritz

@AyushSaxena
Author

AyushSaxena commented Aug 20, 2022

Hi Fritz,

I apologize for not being specific enough. This is single-sample calling in plasmid DNA that has two potential hairpin regions close to each other, producing overlapping deletion calls when viewed in IGV. (We don't really know what the structure of the hairpin is in the double-stranded plasmid.) We are sequencing a heterogeneous population of plasmids, all sequenced as one sample, and when viewed in IGV, ~50% of the plasmid DNA has one deletion but not the other, and ~10% is the other way around (carrying deletion 2 but not deletion 1). I don't think we ever see both deleted at the same time. Sniffles detects the dominant mutation but misses the rarer one.

I've been trying several parameters in sniffles to ensure that they don't get clustered together somehow. The signal in the CIGAR string is the same for both: '22D'. I'm struggling in the same way with pbsv, where I haven't been able to find a setting that keeps these mutations from being clustered. The SV signature file in pbsv contains both 'haplotypes', but the final VCF filters out the rarer allele.
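Since both deletions show up as the same '22D' operation, the only thing that distinguishes them is the reference position at which that operation falls. Below is a minimal sketch of tallying deletions by position straight from CIGAR strings, as a sanity check that is independent of any caller's clustering. The read data, coordinates, and function names are hypothetical; in practice the (start, CIGAR) pairs would come from the BAM file.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance the reference


def deletions_from_cigar(ref_start, cigar, min_len=1):
    """Yield (ref_pos, length) for each D operation of at least min_len bp.
    ref_start is the 0-based reference start of the alignment."""
    pos = ref_start
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "D" and length >= min_len:
            yield pos, length
        if op in REF_CONSUMING:
            pos += length


def tally_deletions(alignments, min_len=20):
    """Count reads supporting each distinct (position, length) deletion."""
    counts = Counter()
    for ref_start, cigar in alignments:
        counts.update(deletions_from_cigar(ref_start, cigar, min_len))
    return counts


# Hypothetical reads: 5 carry deletion 1, 1 carries deletion 2, 4 carry neither.
reads = (
    [(100, "50M22D78M")] * 5
    + [(100, "60M22D68M")] * 1
    + [(100, "150M")] * 4
)

counts = tally_deletions(reads)
for (pos, length), n in sorted(counts.items()):
    print(f"pos={pos} len={length} reads={n} fraction={n / len(reads):.2f}")
```

One caveat: if the deleted sequence sits in a repeat, the aligner may place the same deletion at slightly different positions in different reads, so exact-position counts can understate the support for each allele.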

Ayush
