removePrimers misses valid primer matches in certain circumstances #989
Comments
Thanks for opening this issue; somehow your pull request slipped through my to-do list. Request: Can you provide a minimal example of the specific problem being addressed here? E.g. fastq file(s) with like 3 sequences, one of which should be handled better with this new logic?
Sure @benjjneb ... I'll plan to get this to you, hopefully this weekend.
Did we follow up on this over email?
No, sorry, I haven't got round to this yet. I'll probably return to looking at it in a month or so and I'll try to follow up then...
Very sorry for the long delay, @benjjneb. I've finally got around to putting together some test sets that show the issue. I'm hoping this falls under the "better late than never" category! To demonstrate, I'm following the outline of this procedure, using a subset of the data.
The above set of commands filters with three different parameter sets. The first, Inspecting the output, I found 4370 reads missing from
It seems the issue is that as the matching parameters are loosened, spurious (coincidental) matches might be identified that lead to counterintuitive and undesirable results. For cases where the true match is on the reverse-complement sequence, I believe what can happen is that when inspecting the forward read sequence, one of the primers can give a spurious hit, but the other one doesn't (as expected, since the true match is on the reverse complement). It seems the current algorithm quits at this point and omits the read from the output, whereas my patch is intended to prompt it to continue looking by trying the reverse complement (as it would if neither primer gave a match on the forward read sequence).
I should note that there can still be issues, when loosening the matching parameters, if BOTH the primers give spurious hits to the forward read sequence. This leads to the still undesirable (though more intuitive) result of mis-trimmed sequences, generally having unexpected length. (It seems a small fraction of the reads have this issue with the Callahan_16S_R_3.1.ccs99.9.in_A_not_B2.fastq.gz
Currently, using removePrimers with orient=TRUE appears to miss valid primer matches in certain circumstances. Specifically, it seems that in the current removePrimers logic, if there is a fwd-only primer match on the original (non-reverse-complement) sequence, a full (fwd+rev) match on the reverse complement will not be considered. This can lead to unexpected behavior, e.g. "loosening" of matching thresholds can result in fewer passing read sequences. I believe #956 should fix this.
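To make the control flow concrete, here is a toy Python sketch of the behavior as I understand it. This is not the dada2 R code: the function names (`keep_read_described`, `keep_read_proposed`), the 6-base primers, the read, and the plain substitution-count matching are all made up for illustration. It only models the decision of whether a read is kept, under the assumption that the reverse complement is tried only when the forward primer is absent from the original orientation.

```python
# Toy model only: hypothetical names and sequences, not dada2's implementation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_match(seq, pattern, max_mismatch):
    """True if pattern occurs anywhere in seq with at most max_mismatch substitutions."""
    width = len(pattern)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if sum(a != b for a, b in zip(window, pattern)) <= max_mismatch:
            return True
    return False


def full_match(seq, fwd, rev, max_mismatch):
    """Both primers found: fwd as given, rev as its reverse complement."""
    return (has_match(seq, fwd, max_mismatch)
            and has_match(seq, revcomp(rev), max_mismatch))


def keep_read_described(seq, fwd, rev, max_mismatch):
    """Behavior described in this issue (assumed): a fwd hit on the original
    orientation commits to that orientation, so the reverse complement is
    never inspected even if the rev primer is then not found."""
    if has_match(seq, fwd, max_mismatch):
        return has_match(seq, revcomp(rev), max_mismatch)
    return full_match(revcomp(seq), fwd, rev, max_mismatch)


def keep_read_proposed(seq, fwd, rev, max_mismatch):
    """Intended behavior (assumed): accept the read if either orientation
    gives a full fwd+rev match."""
    return any(full_match(candidate, fwd, rev, max_mismatch)
               for candidate in (seq, revcomp(seq)))


FWD = "ACCTGA"  # hypothetical forward primer
REV = "TTGCAC"  # hypothetical reverse primer
# A read sequenced in the reverse-complement orientation. Its middle contains
# "ACCTGT", which is one substitution away from FWD (a spurious hit).
READ = "TTGCACGGACCTGTGGTCAGGT"

for mm in (0, 1):
    print(mm,
          keep_read_described(READ, FWD, REV, mm),
          keep_read_proposed(READ, FWD, REV, mm))
```

With exact matching (`max_mismatch=0`) both functions keep the read, because the forward primer is not found on the original orientation and the reverse complement gets checked. With `max_mismatch=1` the spurious forward hit makes the first function drop the read, while the second still keeps it. That is the "looser threshold, fewer reads" effect in miniature.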