I am comparing dashing2 with bindash and Mash on metagenomes. Since Mash uses canonical k-mers, metagenomic reads (which are always paired-end because of how they are sequenced) could first be merged into single reads by overlap detection. We would then process only half as many reads, and because the k-mers are canonical the result should be the same. I did not see a suggestion in Mash or Dashing to merge the pairs first (which is very fast), which would cut computation time in half without changing the results at all. What do you think?
Thanks,
Jianshu
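A minimal sketch of the idea, assuming exact, error-free overlaps. `merge_pair`, `canonical_kmers` and the toy fragment are hypothetical illustrations, not code from Mash, bindash or Dashing. When the mates overlap by at least k-1 bases, every k-mer of the merged fragment lies entirely inside one of the two mates, so the canonical k-mer set of the merged read equals the union of the mates' sets.

```python
from __future__ import annotations

from typing import Optional

_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMP)[::-1]


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Canonical k-mers: the lexicographically smaller of each k-mer and its reverse complement."""
    out: set[str] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:  # skip k-mers containing ambiguous bases
            out.add(min(kmer, revcomp(kmer)))
    return out


def merge_pair(r1: str, r2: str, min_overlap: int) -> Optional[str]:
    """Naive merge: longest suffix of r1 that exactly matches a prefix of revcomp(r2).

    Ignores sequencing errors, base qualities and read-through (insert shorter than a read).
    """
    r2rc = revcomp(r2)
    for ov in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        if r1[len(r1) - ov:] == r2rc[:ov]:
            return r1 + r2rc[ov:]
    return None  # no overlap of at least min_overlap bases


fragment = "ATGCGTACCTTAGGCATCGATTGACCGTA"  # toy 29-bp insert
r1 = fragment[:20]                           # forward read
r2 = revcomp(fragment[-20:])                 # mate, read from the opposite strand
merged = merge_pair(r1, r2, min_overlap=5)   # true overlap is 11 bp
k = 5
print(merged == fragment)  # True
print(canonical_kmers(merged, k) == canonical_kmers(r1, k) | canonical_kmers(r2, k))  # True
```

Only the k-mer *set* is preserved: k-mers in the overlap are seen twice in the unmerged pair and once after merging, so k-mer counts (and therefore weighted sketches) can differ.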
Interesting. Yes, you can collapse them together, though a lot depends on whether the two ends overlap. You can also safely concatenate the two mates with an N between them: Dashing and Dashing2 mask any k-mer containing an unknown base (N), so you'll end up with one k-mer set for the paired-end reads.
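A small sketch of why the N separator works, assuming (as described above) that any k-mer containing an N is skipped. `join_pair` is a hypothetical helper, not part of Dashing. Every window that spans the junction must include the N, so the joined pseudo-read yields exactly the union of the two mates' k-mers.

```python
_COMP = str.maketrans("ACGTN", "TGCAN")


def canonical_kmers(seq: str, k: int) -> set:
    """Canonical k-mers, skipping any window that contains an N."""
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            out.add(min(kmer, kmer.translate(_COMP)[::-1]))
    return out


def join_pair(r1: str, r2: str) -> str:
    """Hypothetical helper: one pseudo-read per pair, mates separated by a single N."""
    return r1 + "N" + r2


r1, r2 = "ACGTTGCATGCA", "TTAGCCATGCAA"
k = 5
joined = canonical_kmers(join_pair(r1, r2), k)
union = canonical_kmers(r1, k) | canonical_kmers(r2, k)
print(joined == union)  # True: every window spanning the junction contains the N
```

Without the separator, the k-1 windows spanning the junction would add k-mers that need not occur in the sequenced fragment.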
Normally you could just concatenate the files directly, since both mates end up in the same bucket. But you are right: any preprocessing that shrinks the input makes things smaller.
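To illustrate the "same bucket" point, here is a toy bottom-k MinHash over canonical k-mers, assuming plain 4-line FASTQ records. The function names, the blake2b hash and the in-memory files are illustrative choices, not Dashing's implementation. Because the sketch is built from the union of k-mers across all inputs, sketching R1 and R2 as separate files, in either order, or as one concatenated file gives the same result.

```python
from __future__ import annotations

import hashlib
import heapq
import io
from typing import Iterable, Iterator, TextIO

_COMP = str.maketrans("ACGTN", "TGCAN")


def iter_fastq_seqs(handle: TextIO) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record (no multi-line records)."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip().upper()


def kmer_hash(kmer: str) -> int:
    """64-bit hash of the canonical form of a k-mer (blake2b is an arbitrary choice here)."""
    canon = min(kmer, kmer.translate(_COMP)[::-1])
    return int.from_bytes(hashlib.blake2b(canon.encode(), digest_size=8).digest(), "little")


def bottom_k_sketch(handles: Iterable[TextIO], k: int, size: int) -> list[int]:
    """Bottom-k MinHash over the union of canonical k-mers from all inputs.

    Keeps every distinct hash in memory for simplicity; a real tool would bound this.
    """
    hashes: set[int] = set()
    for handle in handles:
        for seq in iter_fastq_seqs(handle):
            for i in range(len(seq) - k + 1):
                kmer = seq[i:i + k]
                if "N" not in kmer:
                    hashes.add(kmer_hash(kmer))
    return heapq.nsmallest(size, hashes)


R1_TEXT = "@pair1/1\nACGTTGCATG\n+\nIIIIIIIIII\n"
R2_TEXT = "@pair1/2\nTTAGCCATGC\n+\nIIIIIIIIII\n"
separate = bottom_k_sketch([io.StringIO(R1_TEXT), io.StringIO(R2_TEXT)], k=5, size=8)
swapped = bottom_k_sketch([io.StringIO(R2_TEXT), io.StringIO(R1_TEXT)], k=5, size=8)
concatenated = bottom_k_sketch([io.StringIO(R1_TEXT + R2_TEXT)], k=5, size=8)
print(separate == swapped == concatenated)  # True
```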
And to check in about DartMinHash: I've worked on incorporating its weighted MinHash scheme, but I haven't had time to test its accuracy and merge it in. If it helps us with weighted sketching, it would really help cut the cost of `--bagminhash` weighted sketching.
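For context, weighted schemes such as BagMinHash and DartMinHash are designed to estimate the weighted Jaccard similarity between two weight vectors, here taken to be k-mer counts $x_i$ and $y_i$ for each k-mer $i$:

$$
J_W(x, y) = \frac{\sum_i \min(x_i, y_i)}{\sum_i \max(x_i, y_i)}
$$

With 0/1 weights this reduces to the ordinary Jaccard index on k-mer sets, which is why changes in k-mer multiplicity (for example from merging overlapping mates) can affect weighted sketches even when the k-mer set is unchanged.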