usearch_global search aligning to Ns with 100% identity #393
Comments
Thanks for reporting this. I have seen similar behaviour as well. This is related to issue #354. Matches between or to ambiguous residues are currently counted as matches, so the output is as expected. Matches to long stretches of Ns like this are usually unwanted, though.
Any updates on this? We are facing the same issue, and it is skewing our results. Is there a way to see the match score relative to the alignment length?
No, there is currently no way to see the match score. The score for matching a nucleotide against an N is zero. I am not sure how to handle this. Alignments can have a negative score and still be shown, both in vsearch and usearch. The alignment score is only used to align a pair of sequences in the best possible way. Note that terminal gaps (and gap penalties) are usually not counted.

These kinds of matches with a lot of Ns can also be produced by usearch, but perhaps not exactly this one with only Ns, due to some heuristics.

To eliminate these kinds of matches, I think we need to add an option where ambiguous matches (involving symbols other than ACGTU) are not counted as matches. Currently, matches between compatible symbols (e.g. A vs R, but not A vs Y) are counted as matches when computing the identity percentage. We could also add an option to set a (negative) score for ambiguous matches.
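To illustrate the difference such an option would make, here is a minimal Python sketch. It is not vsearch code. It assumes the standard IUPAC nucleotide codes, a pre-aligned pair of equal-length gapped strings, and a simple identity definition (matching columns divided by all alignment columns). The names `compatible`, `identity`, and `count_ambiguous` are hypothetical and chosen only for this example.

```python
# Illustrative sketch only; not how vsearch computes identity internally.
# IUPAC nucleotide codes mapped to the bases each symbol can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
UNAMBIGUOUS = set("ACGTU")


def compatible(a, b):
    """True if the two symbols share at least one possible base."""
    return bool(set(IUPAC[a.upper()]) & set(IUPAC[b.upper()]))


def identity(query, target, count_ambiguous=True):
    """Percent identity over the columns of a gapped pairwise alignment.

    count_ambiguous is a hypothetical switch: when False, a column only
    counts as a match if both symbols are one of A, C, G, T, U.
    """
    if len(query) != len(target):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for q, t in zip(query.upper(), target.upper()):
        if q == "-" and t == "-":
            continue  # ignore columns that are gaps in both rows
        columns += 1
        if q == "-" or t == "-":
            continue  # a gap column is never a match
        if not compatible(q, t):
            continue
        if count_ambiguous or (q in UNAMBIGUOUS and t in UNAMBIGUOUS):
            matches += 1
    return 100.0 * matches / columns if columns else 0.0
```

With ambiguous matches counted, `identity("ACGT", "NNNN")` evaluates to 100.0, which mirrors the behaviour reported in this issue. With `count_ambiguous=False` the same pair evaluates to 0.0.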
Thank you for replying.
My suggestion would be to differentiate mixed bases (like A vs R) from more generic bases (like A vs N). Being able to differentiate just the Ns would already be useful. Mixed bases can also indicate mixed populations in some cases, and their interpretation is very subjective.
I think the practical way to implement this would be to give users that option. If users could specify which combinations are considered a match, and what weight each combination carries in the matching score, it would be useful for all cases.
Regards,
Ragavi.
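One possible shape for the user-defined weighting suggested above is sketched below in Python. This is purely hypothetical and does not correspond to any existing vsearch or usearch option. It assumes a pre-aligned pair of equal-length gapped strings with no double-gap columns, and a per-symbol weight table (`symbol_weights`, an invented name) in which a compatible pair contributes the smaller of its two symbol weights, defaulting to 1.0.

```python
# Hypothetical sketch of user-configurable match weights; not a vsearch feature.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def weighted_identity(query, target, symbol_weights):
    """Percent identity where each compatible column contributes a weight.

    symbol_weights maps a symbol (e.g. "N") to the weight a match involving
    it should receive; symbols not listed get the full weight of 1.0.
    """
    if len(query) != len(target):
        raise ValueError("aligned sequences must have equal length")
    total = 0.0
    columns = 0
    for q, t in zip(query.upper(), target.upper()):
        columns += 1
        if q == "-" or t == "-":
            continue  # gap columns contribute nothing
        if set(IUPAC[q]) & set(IUPAC[t]):
            # Use the weight of the less trusted symbol in the pair.
            total += min(symbol_weights.get(q, 1.0), symbol_weights.get(t, 1.0))
    return 100.0 * total / columns if columns else 0.0
```

For example, `weighted_identity("AAGT", "ARNN", {"N": 0.0})` keeps the A vs R column as a full match but ignores both N columns, giving 50.0, whereas an empty weight table gives 100.0.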
Seeing false full-length alignments that show 100% identity to stretches of Ns.
vsearch v2.14.1_linux_x86_64
alignment.txt