Add uchime2_denovo to close #92 #100

Open. colinbrislawn wants to merge 4 commits into base: dev


@colinbrislawn
Contributor

colinbrislawn commented Sep 29, 2024

WIP to close #92
Add internal functions and tests to use 'uchime', 'uchime2', or 'uchime3'.
Open questions:

  1. Externally, do we want to present all these as new methods/functions or new settings?
qiime vsearch uchime-denovo
qiime vsearch uchime2-denovo
qiime vsearch uchime3-denovo
# or
qiime vsearch uchime-denovo --p-method 'uchime2'
  2. Internally, do we want to combine some of these? They are very similar, especially uchime2 and uchime3:

> The only difference [in --uchime3_denovo] from --uchime2_denovo is that the default minimum abundance
> skew (--abskew) is set to 16.0 rather than 2.0.

  3. Do we want to expose --abskew for some or all of these?

@hagenjp
Contributor

hagenjp commented Oct 3, 2024

Hi @colinbrislawn,
1./2. We all agree that new parameters would be better than new methods. Thank you!
3. We do not have strong feelings about --abskew, but it would be best to make its default none, so that each method falls back to its own algorithm's default.
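
A minimal sketch of how a default of none could work, under stated assumptions: the helper name `abskew_args` is hypothetical and is not part of q2-vsearch. When no value is given, no `--abskew` flag is emitted, so vsearch applies whatever default belongs to the selected command.

```python
def abskew_args(abskew=None):
    """Return the extra vsearch arguments for an optional abundance skew.

    Hypothetical helper: with abskew=None nothing is added to the command,
    so vsearch falls back to the default of the chosen uchime variant.
    """
    if abskew is None:
        return []
    # Only pass the flag when the user explicitly asked for a value.
    return ['--abskew', str(abskew)]


if __name__ == '__main__':
    print(abskew_args())      # no flag, vsearch picks the default
    print(abskew_args(16.0))  # explicit override
```

With this shape, the plugin never has to hard-code 2.0 or 16.0 itself; it only forwards a value when one was supplied.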

@colinbrislawn
Copy link
Contributor Author

colinbrislawn commented Oct 3, 2024

> new parameters would be best

Cool!

How does this look for the CLI? (CLI docs for the existing function)

qiime vsearch uchime-denovo \
  --p-method 'uchime2' \
  --p-mindiffs 99 # ignored when running uchime2 and uchime3
  --p-mindiv 0.8 # ignored when running uchime2 and uchime3
  --p-minh 0.99 # ignored when running uchime2 and uchime3
  ...

What should we do if someone passes settings that are not used by uchime2 and uchime3?
vsearch silently ignores them, which I don't love for our API.
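
One possible answer, sketched here as an assumption rather than a decision: raise an error when a uchime-only setting is passed alongside uchime2 or uchime3. The function name `check_method_params` is hypothetical, and the sketch assumes these parameters default to `None` so that "not passed" can be told apart from "passed".

```python
# Parameters that, per the CLI example above, are ignored by uchime2 and uchime3.
UCHIME_ONLY = ('mindiffs', 'mindiv', 'minh')


def check_method_params(method, **params):
    """Raise instead of silently ignoring parameters the method does not use.

    Hypothetical helper: assumes each parameter is None unless the user set it.
    """
    if method == 'uchime':
        return  # the original algorithm uses all of these settings
    unused = sorted(name for name in UCHIME_ONLY
                    if params.get(name) is not None)
    if unused:
        raise ValueError(
            "Parameters not used by method %r: %s"
            % (method, ', '.join(unused)))


if __name__ == '__main__':
    check_method_params('uchime', mindiffs=3)   # accepted
    check_method_params('uchime2')              # accepted, nothing extra passed
    try:
        check_method_params('uchime2', mindiffs=3, minh=0.28)
    except ValueError as err:
        print(err)
```

A warning instead of an error would be the gentler variant; the check itself stays the same.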

@colinbrislawn
Contributor Author

Let's add support for --abskew in a different PR, to keep things tidy ✨ 🧹

     # this function only exists to simplify testing
     chimeras = DNAFASTAFormat()
     nonchimeras = DNAFASTAFormat()
     uchime_stats = UchimeStatsFmt()
     with tempfile.NamedTemporaryFile() as fasta_with_sizes:
         _fasta_with_sizes(str(sequences), fasta_with_sizes.name, table)
         cmd = ['vsearch',
-               '--uchime_denovo', fasta_with_sizes.name,
+               '--' + method + '_denovo', fasta_with_sizes.name,
Contributor Author

Is there a better way to do this?
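
One alternative to the string concatenation, offered as a sketch and not as the PR's implementation: an explicit lookup table from method name to vsearch flag. The names `DENOVO_FLAGS` and `denovo_flag` are hypothetical. The table makes the supported methods visible in one place and rejects anything else before vsearch is called.

```python
# Hypothetical mapping from the --p-method value to the vsearch command flag.
DENOVO_FLAGS = {
    'uchime': '--uchime_denovo',
    'uchime2': '--uchime2_denovo',
    'uchime3': '--uchime3_denovo',
}


def denovo_flag(method):
    """Return the vsearch flag for a method, or raise on an unknown name."""
    try:
        return DENOVO_FLAGS[method]
    except KeyError:
        raise ValueError(
            "Unknown method %r; expected one of %s"
            % (method, ', '.join(sorted(DENOVO_FLAGS)))) from None


if __name__ == '__main__':
    # Produces the same flag as '--' + method + '_denovo' for valid names.
    print(denovo_flag('uchime2'))
```

If the parameter is already restricted to these three choices when it is registered, the concatenation is equivalent and the table mainly adds readability.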

colinbrislawn marked this pull request as ready for review October 3, 2024 21:18
@colinbrislawn
Contributor Author

colinbrislawn commented Oct 3, 2024

Is --p-method a good name? I could go back to --p-algorithm or something new.

The methods are tested with 100% coverage, but the results are not, as I'm not sure how they should work with non-trivial examples.

If we want to build a test that shows the difference between these methods, here's commentary on vsearch's implementation:
torognes/vsearch#283
torognes/vsearch#503

@nbokulich, if you have the time and interest, I would appreciate your review!

Projects
Status: Awaiting Info
Development

Successfully merging this pull request may close these issues.

ENH: add support for the different versions of uchime_denovo within vsearch
2 participants