-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid MAPQ=60 for single-end multimappers #327
Conversation
approved. |
by ensuring that we compute at least two alignments even if the first that is found has edit distance 0. See #293
(Note that I had a typo/logic error in the PR and pushed a fix.) I tried to do some measurements. It’s a bit hard to inspect multimappers only, so I looked at incorrectly mapped reads instead, which should all be multimappers: Assuming a couple of things about how the read mapper works, these should have at least two roughly equivalent mapping locations (from the read mapper’s point of view): The correct one and the one the read mapper actually outputs. So they should get MAPQ=0. So issue #293 is essentially that we get many incorrectly mapped reads that are assigned MAPQ>0, which should actually have been assigned MAPQ=0. I generated a test dataset with know mapping locations by picking every tenth 100-mer from the Drosophila reference (about 14.4 million reads). Then I could count incorrectly mapped reads that have MAPQ>0 and MAPQ=0, respectively. For good measure, I also counted the correctly mapped reads with MAPQ>0 and MAPQ=0. The results are as follows:
I think the relevant numbers are the two bold ones: The "bad" alignments are much closer to zero and the number of "good" ones has gone up. Also, all four percentages are now closer to BWA-MEM. Note that the percentage of "correct and MAPQ>0" goes down (and "correct and MAPQ=0" up accordingly). I assume these are mostly multimappers that strobealign by chance placed into the "correct" position (the one from which the k-mer was sampled). They previously got MAPQ=60, but since they are multimappers, they should get MAPQ=0, which they now do. |
great improvement, approved! |
I forgot to add a changelog entry in this PR and took the liberty of pushing that directly to main without a PR. |
I realized I hadn’t check the performance impact of this change, so here are some measurements on 100 bp reads.
These measurements were quite noisy, especially for rye. |
Ok, worth it. |
Agreed! |
by ensuring that we compute at least two alignments even if the first that is found has edit distance 0.
Closes #293
Because I still need to test whether this actually fixes #293, I’m marking this as draft.