-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: adds keeplength
option to ensure the correct region is extracted when using primers.
#208
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks @mikerobeson ! I have one small question inline.
Also: could you maybe add a tiny unit test to make sure this is working as intended? Maybe you have a test case where the primer introduced gaps in an alignment and led to incorrect results? Or is it all just very theoretical?
rescript/trim_alignment.py
Outdated
alignment_with_primers, = expand_alignment_action( | ||
alignment=aligned_sequences, | ||
sequences=primers, | ||
addfragments=True, | ||
keeplength=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @mikerobeson do we want to hardcode this param? Should we make it True by default but expose the option to the user? Is there ever a case when they might want to not keep the length? In this case this would only happen if there are indels in the primer region, right? which we probably don't want to preserve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These primer indel cases are explicitly tested in the mafft-add tests by toggling that option on/off. I was thinking I did not need to reproduce that test case here.
The issue is that the final alignment length changes after the primer sequences are added. Often, due to common column gaps in the newly combined alignment being removed. From what I can observe this is regardless if new indels are added within the location of the newly aligned primers. If any of the V3V4, V4, V4V5 primers are used no indels are added to the location where the primers align.
Due to the alignment length change after the primers are added to the alignment, the derived positional values of the primers will differ from the proper positions of the original alignment. Thus, when these derived positional values are used, the actual region extracted is not the region we are after, as they are based on the positions of the new alignment.
I suppose I could add another set of small files based on a few of the actual SILVA alignment sequences. Then I can check the expected vs observed returned alignment length differences. I'd just have to override keeplength=True
option and compare. I've done this manually, by using an alignment viewer, and the differences are quite obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue is the trimming aspect... we are deriving the positional values from the resulting alignment after the primers are added. Then using those values to trim from the original alignment. This won't work in all cases as the resulting alignment length can very likely change. Which is why we need to use keeplength=True
. I think that toggling this option makes obvious sense for the general mafft-add
function, which is implemented ... but not for our specific use case here: finding alignment position trimming locations. Also, I'd like to think that most primers have already been vetted against a global set of reference sequences. 🤞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay now I understand! So any length change messes up the positional trimming. I agree, hardcoding sounds necessary in that case.
And agreed, unit tests are not needed if this is explicitly tested in q2-alignment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that I can likely fix this by adding an if
statement to check if primers were used. If primers are used then this line could be something like:
result = _trim_all_sequences(aligned_sequences_with_primers, trim_positions)
That is, we should be trimming the correct data from that new alignment rather than the original one. Then the primer sequences can removed from that output prior to saving. This would enable us add the ability to toggle the keeplength
option. For instance if users would like to concatenate their edits to some master alignment and not worry about alignment length changes.
Thought that might be unnecessarily onerous... 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @nbokulich, I've added explicit unit test for the mafft --keeplength
option. I decided to forgo toggling the --p-keeplengh
parameter, as it is expected that this should used when trying to find the location of added fragments. That is, the mafft parameter --mapout
(currently not accessible within the plugin) enforces the mafft --keeplength
parameter if it is not supplied. Thus, I think we are in keeping with the spirit of the tool when it comes to determining alignment positions. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm still working on the test cases... I'll let you know when I am done. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay... I realized that I should actually run my test cases rather than mock them. This PR should be ready to review now. :-)
Appears to be failing on |
looks like q2-alignment is missing from the recipe. Try adding this and re-run? |
Well, that's embarrassing. That is what the issue was... 🤦 Everything passes. 🎉 |
Fixes #175. This is now possible as the
keeplength
option was added to the q2-alignment plugin.Adds
keeplength
parameter to make sure original alignment length does not change. Otherwise there is potential for the incorrect region to be extracted from the alignment. This is important to make sure alignment positions can easily be referenced to the original alignment. See the q2-alignment plugin, specifically the mafft-add test cases for examples.Also adds clearer clarification and a warning within the
trim-alignment
description when using primers to find and extract amplicon regions.All of the original
trim-alignment
tests still pass.