ENH: adds `keeplength` option to ensure the correct region is extracted when using primers. #208

mikerobeson · 2024-11-02T16:00:34Z

Fixes #175. This is now possible as the keeplength option was added to the q2-alignment plugin.

Adds the keeplength parameter so that the original alignment length does not change. Otherwise, the incorrect region may be extracted from the alignment. Preserving the length also ensures that alignment positions can easily be referenced back to the original alignment. See the q2-alignment plugin, specifically the mafft-add test cases, for examples.

Also adds clearer help text and a warning within the trim-alignment description about using primers to find and extract amplicon regions.

All of the original trim-alignment tests still pass.

nbokulich

Looks good, thanks @mikerobeson ! I have one small question inline.

Also: could you maybe add a tiny unit test to make sure this is working as intended? Maybe you have a test case where the primer introduced gaps in an alignment and led to incorrect results? Or is it all just very theoretical?

nbokulich · 2024-11-03T06:25:35Z

rescript/trim_alignment.py

        alignment_with_primers, = expand_alignment_action(
            alignment=aligned_sequences,
            sequences=primers,
            addfragments=True,
+            keeplength=True,


hey @mikerobeson do we want to hardcode this param? Should we make it True by default but expose the option to the user? Is there ever a case when they might want to not keep the length? In this case this would only happen if there are indels in the primer region, right? which we probably don't want to preserve.

These primer indel cases are explicitly tested in the mafft-add tests by toggling that option on/off. I was thinking I did not need to reproduce that test case here.

The issue is that the final alignment length changes after the primer sequences are added, often because common gap columns in the newly combined alignment are removed. From what I can observe, this happens regardless of whether new indels are added at the location of the newly aligned primers. If any of the V3V4, V4, or V4V5 primers are used, no indels are added where the primers align.

Due to the alignment length change after the primers are added to the alignment, the derived positional values of the primers will differ from the proper positions of the original alignment. Thus, when these derived positional values are used, the actual region extracted is not the region we are after, as they are based on the positions of the new alignment.
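To make the coordinate shift concrete, here is a toy sketch in plain Python. It does not call MAFFT or any QIIME 2 code: the sequences, the helper names (`drop_all_gap_columns`, `primer_span`, `trim`), and the use of all-gap column removal as the cause of the length change are all assumptions chosen for illustration.

```python
def drop_all_gap_columns(alignment):
    """Remove columns that are gaps in every row (toy stand-in for a length change)."""
    n_cols = len(next(iter(alignment.values())))
    keep = [i for i in range(n_cols)
            if any(seq[i] != "-" for seq in alignment.values())]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


def primer_span(aligned_primer):
    """Return (start, end) of the primer's non-gap span; end is exclusive."""
    cols = [i for i, ch in enumerate(aligned_primer) if ch != "-"]
    return cols[0], cols[-1] + 1


def trim(alignment, start, end):
    """Slice every aligned sequence to the columns [start, end)."""
    return {name: seq[start:end] for name, seq in alignment.items()}


# Hypothetical original alignment: column 2 is a gap in every sequence.
original = {
    "seq1": "AC-GTTGCA",
    "seq2": "AC-GTAGCA",
}

# Case A: primer added with the original length preserved.
kept = dict(original, primer="---GTT---")

# Case B: the same combined alignment after its all-gap column is removed.
shrunk = drop_all_gap_columns(kept)

span_kept = primer_span(kept["primer"])      # columns in original coordinates
span_shrunk = primer_span(shrunk["primer"])  # columns in the shorter alignment

# Applying each span to the ORIGINAL alignment gives different regions.
region_ok = trim(original, *span_kept)
region_bad = trim(original, *span_shrunk)
```

In this toy case the positions derived from the length-preserved alignment recover the primer region, while the positions derived from the shortened alignment are off by one column when applied to the original.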

I suppose I could add another set of small files based on a few of the actual SILVA alignment sequences. Then I can check the expected vs. observed differences in the returned alignment length. I'd just have to override the keeplength=True option and compare. I've done this manually, using an alignment viewer, and the differences are quite obvious.

The issue is the trimming aspect: we derive the positional values from the alignment that results after the primers are added, then use those values to trim the original alignment. This won't work in all cases, as the resulting alignment length can very likely change, which is why we need to use keeplength=True. I think that toggling this option makes sense for the general mafft-add function, where it is implemented, but not for our specific use case here: finding alignment positions for trimming. Also, I'd like to think that most primers have already been vetted against a global set of reference sequences. 🤞

okay now I understand! So any length change messes up the positional trimming. I agree, hardcoding sounds necessary in that case.

And agreed, unit tests are not needed if this is explicitly tested in q2-alignment.

I just realized that I can likely fix this by adding an if statement to check if primers were used. If primers are used then this line could be something like:

result = _trim_all_sequences(aligned_sequences_with_primers, trim_positions)

That is, we should be trimming the correct data from that new alignment rather than the original one. Then the primer sequences can be removed from that output prior to saving. This would enable us to add the ability to toggle the keeplength option, for instance if users would like to concatenate their edits to some master alignment and not worry about alignment length changes.

Though that might be unnecessarily onerous... 🤔
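The alternative described above can be sketched with plain dictionaries. This is only an illustration under stated assumptions: `trim_then_drop_primers` and `primer_span` are hypothetical names, not the plugin's `_trim_all_sequences`, and the sequences are made up.

```python
def primer_span(aligned_primer):
    """Return (start, end) of the primer's non-gap span; end is exclusive."""
    cols = [i for i, ch in enumerate(aligned_primer) if ch != "-"]
    return cols[0], cols[-1] + 1


def trim_then_drop_primers(combined, primer_ids, start, end):
    """Trim the combined alignment itself, then discard the primer rows."""
    return {name: seq[start:end]
            for name, seq in combined.items()
            if name not in primer_ids}


# Hypothetical combined alignment whose length differs from the original.
combined = {
    "seq1": "ACGTTGCA",
    "seq2": "ACGTAGCA",
    "fwd_primer": "--GTT---",
}

# Positions are derived from, and applied to, the same alignment,
# so a change in length relative to the original no longer matters.
start, end = primer_span(combined["fwd_primer"])
result = trim_then_drop_primers(combined, {"fwd_primer"}, start, end)
```

The trade-off is that the output coordinates then refer to the combined alignment rather than the original one.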

Hi @nbokulich, I've added an explicit unit test for the mafft --keeplength option. I decided to forgo toggling the --p-keeplength parameter, as it is expected that this should be used when trying to find the location of added fragments. That is, the mafft parameter --mapout (currently not accessible within the plugin) enforces the mafft --keeplength parameter if it is not supplied. Thus, I think we are in keeping with the spirit of the tool when it comes to determining alignment positions. :-)
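For reference, a minimal sketch of how the underlying command line might be assembled. It assumes the standard MAFFT form in which `--addfragments <file>` and `--keeplength` precede the existing alignment; `build_mafft_add_cmd` and the file names are hypothetical, and this is not how q2-alignment builds its command.

```python
def build_mafft_add_cmd(alignment_fp, fragments_fp, keeplength=True):
    """Build (but do not run) a MAFFT argument list for adding fragments.

    The aligned output would be written to stdout by MAFFT.
    """
    cmd = ["mafft", "--addfragments", fragments_fp]
    if keeplength:
        # Keep the existing alignment length unchanged.
        cmd.append("--keeplength")
    cmd.append(alignment_fp)
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_mafft_add_cmd("alignment.fasta", "primers.fasta")
```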

Actually, I'm still working on the test cases... I'll let you know when I am done. :-)

Okay... I realized that I should actually run my test cases rather than mock them. This PR should be ready to review now. :-)

mikerobeson · 2024-11-11T21:45:30Z

Appears to be failing on ModuleNotFoundError: No module named 'qiime2.plugins.alignment'. Not sure why.... When I run the tests locally within the q2dev-amplicon environment all the tests pass. This applies for both the 2024.10 and 2025.4 versions of dev.

nbokulich · 2024-11-18T15:25:37Z

looks like q2-alignment is missing from the recipe. Try adding this and re-run?
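A quick way to confirm whether a module such as `qiime2.plugins.alignment` is importable in a given environment is a standard-library check like the one below. The helper name `plugin_available` is hypothetical; it only reports availability and does not install anything.

```python
import importlib.util


def plugin_available(module_name):
    """Return True if module_name can be located in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. qiime2) is itself missing.
        return False


if not plugin_available("qiime2.plugins.alignment"):
    print("q2-alignment does not appear to be installed in this environment")
```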

mikerobeson · 2024-11-18T17:57:54Z

Well, that's embarrassing. That is what the issue was... 🤦

Everything passes. 🎉

adding keeplength option and help text clarification

dd500b9

nbokulich requested changes Nov 3, 2024

mikerobeson added 3 commits November 11, 2024 10:53

added unit test for keeplength option

d3c8eaa

fixed test keeplength test cases to reflect actual running of action

a565653

updated mafft_add import statement in tests

ee2e35c

mikerobeson added 2 commits November 18, 2024 11:33

added q2-alignment to recipe

ebabb5d

added q2-alignment to recipe, run

5f3a7d6

Merge branch 'master' into keeplength

5035109
