dashing2 for metagenome #71

jianshu93 · 2023-04-03T02:51:42Z

Hello Daniel,

I am comparing dashing2 with bindash and Mash on metagenomes. I am aware that Mash uses canonical k-mers. Metagenomic reads are always paired-end because of how they are sequenced, and each pair can be merged into a single read by overlap detection. With canonical k-mers the result should be the same, so we would only need to process half as many reads. I have not seen a suggestion from Mash or dashing to merge first (merging is very fast), which would cut computation time roughly in half without changing the results. What do you think?
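To make the canonical k-mer argument concrete, here is a minimal sketch in plain Python. It assumes the two reads overlap by at least k-1 bases, and the names `revcomp` and `canonical_kmers`, as well as the example fragment, are hypothetical and not part of Mash, bindash, or Dashing2. Under that assumption, the canonical k-mer set of the merged read equals the union of the sets from the two separate reads.

```python
# Illustrative sketch only: these helpers are hypothetical, not Mash or Dashing2 APIs.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string over A/C/G/T."""
    return seq.translate(COMPLEMENT)[::-1]

def canonical_kmers(seq, k):
    """Set of canonical k-mers: the smaller of each k-mer and its reverse complement."""
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        kmers.add(min(kmer, revcomp(kmer)))
    return kmers

k = 5
fragment = "ACGTTGCAAGGCTTACGATCGGATCCTAGCA"  # made-up 31 bp fragment
read1 = fragment[:20]             # forward read
read2 = revcomp(fragment[-20:])   # reverse read, reported on the opposite strand

# The reads overlap by 9 bases here, which is at least k - 1 = 4.
pair_set = canonical_kmers(read1, k) | canonical_kmers(read2, k)
merged_set = canonical_kmers(fragment, k)  # what an overlap merger would reconstruct
print(pair_set == merged_set)
```

This only compares k-mer sets. K-mers inside the overlap occur once in the merged read but twice across the pair, so anything that depends on k-mer counts may still differ.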

Thanks,

Jianshu

dnbaker · 2023-07-10T19:03:00Z

Hi Jianshu -

Interesting. Yes, you can collapse them together. A lot depends on whether the two ends overlap with each other. You can safely concatenate the sequences with an N between them: Dashing and Dashing2 will mask any k-mers containing unknown bases, so you'll end up with one k-mer set for the paired-end reads.
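A small sketch of why the N separator is safe, under the assumption that any k-mer containing a non-ACGT base is simply skipped. This models the behavior described above in plain Python and is not Dashing2's actual code; the helper names and the example reads are hypothetical.

```python
# Illustrative sketch only: models "skip k-mers containing N", not Dashing2 internals.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID_BASES = set("ACGT")

def revcomp(seq):
    """Reverse complement of a DNA string over A/C/G/T."""
    return seq.translate(COMPLEMENT)[::-1]

def canonical_kmers(seq, k):
    """Canonical k-mer set, skipping any window that contains a non-ACGT base."""
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if not set(kmer) <= VALID_BASES:
            continue  # window spans the N separator (or another ambiguous base)
        kmers.add(min(kmer, revcomp(kmer)))
    return kmers

k = 5
read1 = "ACGTTGCAAGGC"  # made-up forward read
read2 = "TTAGGCATCGGA"  # made-up reverse read
joined = read1 + "N" + read2

# Every window that avoids the N lies entirely inside one of the two reads,
# so no artificial k-mer bridging the two reads is created.
joined_set = canonical_kmers(joined, k)
separate_set = canonical_kmers(read1, k) | canonical_kmers(read2, k)
print(joined_set == separate_set)
```

Unlike overlap merging, this does not require the two ends to overlap, since the N only prevents k-mers from spanning the junction.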

Normally you could just concatenate the files directly since they end up in the same bucket. But you are right, any preprocessing can make things smaller.

And to check in about DartMinHash: I've worked on incorporating its weighted minhashing scheme, but I haven't had time to test accuracy results and merge it in. If it works well, it would really help cut down the cost of --bagminhash weighted sketching.

Sorry for the delay!

Thanks,

Daniel
