no dashing2 cmp help info #51

Open
jianshu93 opened this issue Mar 13, 2022 · 2 comments

@jianshu93

Hello Daniel,

I computed sketches using dashing2 sketch and stored them using --outfile. This produced a sketch and a sketch.name.txt. However, I am not sure how to feed those to dashing2 cmp, since no help is provided. I looked into the code, but it confuses me. I can use --cmpout to get the distances, but I want to check how long sketching and cmp each take.

Thanks,

Jianshu

@jianshu93
Author

Hello Daniel,

time dashing2 sketch -k 11 -S 12000 --threads 24 --pminhash --topk 250 --cmpout phage_GPD_topK_250.txt -Q name.txt -F reference.txt

With and without the --topk 250 option, I get exactly the same output. Am I making a mistake? I am using the newest release, v2.1.11, for 512bw. Forgive me, but the README section on the sketch command is a little long and confusing.

Thanks,

Jianshu

@dnbaker
Owner

dnbaker commented Mar 23, 2022

Hi Jianshu,

Sorry for the wait! It's been a busy couple of weeks.

I've added usage information for cmp (3b71c9c); thank you for pointing that out.

You can pass sketched files to cmp, but you have to sketch all the original input files together. For example, something like this:

dashing2 sketch <sketching options> -F filelist.txt -o stacked_sketch_file.
dashing2 cmp <sketching options> --presketched stacked_sketch_file.rc_canon.sketchsize1024.k32.SetSpace.DNA.opss

This way, it's broken into two stages (which you can time). This also makes it easier to work with larger collections, since the sketch matrix can be memory-mapped and therefore exceed system RAM.
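
For timing the two stages separately, a small shell wrapper along these lines may help. It is only a sketch under a few assumptions: time_stage and SKETCH_FILE are hypothetical names introduced here and are not part of dashing2, and the sketch file name is copied from the example above, so it will likely differ depending on the sketching options used. The dashing2 calls only run if dashing2 is on the PATH and filelist.txt exists.

```shell
#!/bin/sh
# time_stage LABEL CMD [ARGS...]
# Hypothetical helper (not part of dashing2): runs CMD, then reports the
# elapsed wall-clock time in whole seconds and the exit status on stderr.
time_stage() {
    _ts_label=$1
    shift
    _ts_start=$(date +%s)
    "$@"
    _ts_status=$?
    _ts_end=$(date +%s)
    echo "$_ts_label: $((_ts_end - _ts_start))s (exit status $_ts_status)" >&2
    return "$_ts_status"
}

# Assumption: the file written by the sketch stage has the name shown in the
# example above. The suffix depends on the sketching options, so set
# SKETCH_FILE to whatever file dashing2 actually wrote.
SKETCH_FILE=${SKETCH_FILE:-stacked_sketch_file.rc_canon.sketchsize1024.k32.SetSpace.DNA.opss}

# Only attempt the real runs when dashing2 and the input list are available.
if command -v dashing2 >/dev/null 2>&1 && [ -f filelist.txt ]; then
    # Stage 1: sketch all input files together (add your sketching options).
    time_stage sketch dashing2 sketch -F filelist.txt -o stacked_sketch_file.
    # Stage 2: compare using the presketched file (same sketching options).
    time_stage cmp dashing2 cmp --presketched "$SKETCH_FILE"
fi
```

Prefixing each command with the shell's time, as in the command earlier in this thread, should work equally well; the wrapper just labels each stage and reports its exit status.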

Does that help?

I'll look into the topk 250 option results as well. I'm not sure what's going on there, but I'll let you know.

Thanks,

Daniel
