dashing2 for metagenome #71

Open
jianshu93 opened this issue Apr 3, 2023 · 1 comment

Comments

@jianshu93

Hello Daniel,

I am comparing Dashing2 with BinDash and Mash on metagenomes. I am aware that Mash uses canonical k-mers. Metagenomic reads are almost always paired-end because of how they are sequenced, so each pair can be merged into a single read by overlap detection. With canonical k-mers the result should be the same, and we would only need to process about half as many reads. I have not seen a suggestion from Mash or Dashing to merge first (which is very fast); doing so could cut computation time roughly in half without changing the results at all. What do you think?
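
To make the canonical k-mer point concrete, here is a minimal sketch. It assumes the lexicographic-minimum convention for canonical k-mers (one common choice; the tools discussed here may pick the canonical form differently), and the function names are hypothetical, not part of Mash, BinDash, or Dashing2. It only shows that a read and its reverse complement yield the same canonical k-mer set.

```python
# Illustrative only: not the implementation used by Mash, BinDash, or Dashing2.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(seq, k):
    """Collect canonical k-mers, taking the lexicographically smaller
    of each k-mer and its reverse complement (an assumed convention)."""
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


read = "ACGTTGCAAGGCT"
forward = canonical_kmers(read, 5)
reverse = canonical_kmers(reverse_complement(read), 5)
print(forward == reverse)
```

Because the set is strand-independent, the orientation of the second read in a pair does not matter for the k-mer set itself.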

Thanks,

Jianshu

@dnbaker
Owner

dnbaker commented Jul 10, 2023

Hi Jianshu -

Interesting. Yes, you can collapse them together. A lot depends on whether the two ends overlap. You can safely concatenate the sequences with an N between them: Dashing and Dashing2 mask any k-mers containing unknown bases, so you'll end up with one k-mer set for the paired-end reads.
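
As a rough illustration of the N-joining idea, the sketch below uses hypothetical helper names and a simple lexicographic canonical form; it is not Dashing2's code. It skips every k-mer containing a non-ACGT base and checks that the joined read produces exactly the union of the two mates' k-mer sets, with no spurious k-mers spanning the junction.

```python
# Illustrative only: hypothetical helpers, not Dashing2's internals.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID_BASES = set("ACGT")


def canonical_kmers(seq, k):
    """Canonical k-mers of seq, skipping any window with a non-ACGT base."""
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if not set(kmer) <= VALID_BASES:
            continue  # masked: the window contains N or another ambiguous base
        rc = kmer.translate(COMPLEMENT)[::-1]
        kmers.add(min(kmer, rc))
    return kmers


def join_pair(read1, read2):
    """Concatenate two mates with a single N so that no valid k-mer spans the join."""
    return read1 + "N" + read2


r1 = "ACGTTGCAAGGCTA"
r2 = "TTAGGCATCGATCC"
k = 5
joined = canonical_kmers(join_pair(r1, r2), k)
separate = canonical_kmers(r1, k) | canonical_kmers(r2, k)
print(joined == separate)
```

A single N is enough here because every window that crosses the junction must include it, whatever the value of k.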

Normally you could just concatenate the files directly since they end up in the same bucket. But you are right, any preprocessing can make things smaller.

And to check in about DartMinHash: I've worked on incorporating its weighted minhashing scheme, but I haven't had time to test its accuracy and merge it in. If it works out, it would really help cut the cost of --bagminhash weighted sketching.

Sorry for the delay!

Thanks,

Daniel
