Sourmash ONT taxonomy questions #3480

Open
peterdoug opened this issue Jan 10, 2025 · 1 comment

Comments

@peterdoug

First, thanks for such a great tool! Sourmash (especially with the branchwater plugin) is incredibly impressive.

Firstly, I'm interested in using sourmash for ONT metagenome taxonomic assignment. From the paper "Evaluation of taxonomic classification and profiling methods for long-read shotgun metagenomic sequencing datasets", sourmash seems to struggle a bit with ONT.
If the root issue here is low read accuracy, does anyone have any experience with pre-processing using read error correction tools such as herro or dechat?

Secondly, does anyone have experience with using translated long reads with sourmash-sketched protein databases for taxonomic assignment? Do you expect this to improve taxonomic assignment?
In relation to this, I see that base sourmash currently supports translated sketching. Is this a feature you are considering adding to the branchwater plugin?

Thanks for any answers or comments!

@ctb
Contributor

ctb commented Jan 12, 2025

> First, thanks for such a great tool! Sourmash (especially with the branchwater plugin) is incredibly impressive.

Thank you! Flattery will get you many places :)

> Firstly, I'm interested in using sourmash for ONT metagenome taxonomic assignment. From the paper "Evaluation of taxonomic classification and profiling methods for long-read shotgun metagenomic sequencing datasets", sourmash seems to struggle a bit with ONT. If the root issue here is low read accuracy, does anyone have any experience with pre-processing using read error correction tools such as herro or dechat?

We do not, but would love to hear back!

Please also see contig-level gather, #3095, which is being worked on (albeit on the back burner).

> Secondly, does anyone have experience with using translated long reads with sourmash-sketched protein databases for taxonomic assignment? Do you expect this to improve taxonomic assignment?

Mmmh, I do not have good intuition here. I could see it going either way:

  • smaller k-sizes + increased sensitivity of protein matches across long evolutionary distances => improvement
  • more matches, including spurious ones, and increased computational costs associated with more matches => degradation
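A toy illustration of the first bullet, assuming nothing about sourmash internals: the helper names (`kmers`, `translate`, `jaccard`) and the sequences are made up for this sketch, and it uses plain Python sets rather than FracMinHash sketches. It compares k-mer overlap between a reference and a read carrying one synonymous substitution, at DNA k=21 versus protein k=7 (the same 21-nucleotide span).

```python
# Toy model only: exact k-mer sets, not sourmash's sketching code.
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def kmers(seq, k):
    """Return the set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def translate(dna, frame=0):
    """Translate one forward reading frame; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical 60-nt reference (20 codons) and a read that differs by one
# synonymous change: ATT -> ATC at index 29, both encoding isoleucine.
ref = "ATGGCTGAAAAACTGACCGGTCGTGACATTCTGGAAGCTTACCGTAACGGTAAATGGGTT"
read = ref[:29] + "C" + ref[30:]

dna_similarity = jaccard(kmers(ref, 21), kmers(read, 21))
protein_similarity = jaccard(kmers(translate(ref), 7), kmers(translate(read), 7))

print(f"DNA k=21 Jaccard:    {dna_similarity:.2f}")
print(f"protein k=7 Jaccard: {protein_similarity:.2f}")
```

The synonymous change removes every DNA 21-mer that spans it, while the in-frame protein 7-mers are unchanged. The sketch only covers substitutions in a known frame; insertions and deletions shift the reading frame, and it says nothing about the spurious-match side of the tradeoff in the second bullet.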

@bluegenes thoughts?

> In relation to this, I see that base sourmash currently supports translated sketching. Is this a feature you are considering adding to the branchwater plugin?

It is not so far away - sourmash-bio/sourmash_plugin_branchwater#262 and sourmash-bio/sourmash_plugin_branchwater#520 - so if we had a good reason it would be straightforward to add.

Such a reason might be that you find it works well in small tests and now want to scale up... :)
