The subtraction function of Dashing #80
Hi there! This is currently not supported, but you could use Dashing to perform parts of the computation. If you sketch with --full-khash-sets, Dashing will create 64-bit hash sets and write them to disk as gzip files. The first 8 bytes of each file hold a number giving the count of hashes, and the rest of the file consists of 64-bit hashes. You could then load the hash sets for A and B and filter out the hashes that also appear in C. I see that it could be very useful, and we'll consider supporting it directly as development continues. Let me know if you need any further help or have any questions. Thanks, Daniel
Hi Daniel, Got it. Thanks a lot for your prompt reply! I will try the "sketch" function. In that case, is it possible to load these hash files with Python? I am not very familiar with C++. Another not so important question: Regards,
Hi Daniel, I have just tested the "--full-khash-sets" parameter with one genome, but there is an error. The command I used is I have assigned 200 GB of memory to this job, which should be enough for one small bacterial genome (~5M).
Hi Liao, You're right, that probably isn't running out of memory. Unfortunately, that was a bug, which I've found and fixed in this branch; the fix has now been merged into main here. You can download the new binaries from https://github.com/dnbaker/dashing/tree/main/release/. Much of this can be done from within Python. To parse each of these k-mer files, here's some Python code:
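The file layout described earlier in the thread (gzip-compressed, an 8-byte count, then 64-bit hashes) can be read with a short function. This is a minimal sketch, not the code originally posted: the function name `read_khash_set` is hypothetical, and it assumes the count and the hashes are stored as little-endian unsigned 64-bit integers, which the thread does not state.

```python
import gzip

import numpy as np


def read_khash_set(path):
    """Read one gzip-compressed hash-set file into a NumPy uint64 array."""
    with gzip.open(path, "rb") as f:
        raw = f.read()
    # Assumption: the leading 8 bytes are a little-endian unsigned 64-bit count.
    n = int(np.frombuffer(raw, dtype="<u8", count=1)[0])
    # Assumption: the remaining bytes are n little-endian unsigned 64-bit hashes.
    return np.frombuffer(raw, dtype="<u8", count=n, offset=8)
```

If the byte order or integer type differs on your system, only the `dtype` string would need to change.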
This yields a hash set in vector form. So after using the fixed/rebuilt code to eliminate the segmentation fault, you might try something like this:
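One way the subtraction could look once the hash sets are in memory, as a sketch with NumPy rather than the snippet originally posted. The helper names `subtract` and `jaccard` are hypothetical, the arrays are toy stand-ins for the loaded hash sets of A, B, and C, and the result is an exact Jaccard index over the full hash sets, not necessarily the same value Dashing's own dist command would report.

```python
import numpy as np


def subtract(hashes, exclude):
    """Return the sorted unique hashes that do not occur in `exclude`."""
    return np.setdiff1d(hashes, exclude)


def jaccard(x, y):
    """Jaccard index of two hash sets given as 1-D arrays."""
    x, y = np.unique(x), np.unique(y)
    inter = np.intersect1d(x, y, assume_unique=True).size
    union = x.size + y.size - inter
    return inter / union if union else 0.0


# Toy stand-ins for the hash sets loaded for A, B, and C.
a = np.array([1, 2, 3, 4, 5], dtype=np.uint64)
b = np.array([3, 4, 5, 6, 7], dtype=np.uint64)
c = np.array([5, 7], dtype=np.uint64)

# dist(A-C, B-C) expressed as a similarity: remove C from both, then compare.
similarity = jaccard(subtract(a, c), subtract(b, c))
```

For many B genomes, `subtract(a, c)` could be computed once and reused while looping over the hash set of each B file.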
Thanks for letting me know. What's your operating system? You could check if it has permissions (
You can then install it either with
By the way, while there's no such functionality in this software, I added a feature to Dashing2 that supports something like this. The feature uses the flag And your request inspired me to provide it, so thank you!
Firstly, thanks a lot for your excellent tool! It's really cool!
This is my question:
Currently, I have three files: A.fq, B.fa, and C.fa. I want to calculate the distance between A.fq and B.fa without considering the k-mers in C.fa. In other words, I need dist(A-C, B-C). Can Dashing be used to do this?
If so, what about many "B" files? I mean dist(A-C, "genome_path"-C), where "genome_path" refers to many FASTA files (like the genome_path.txt input file of Dashing's dist function), not just one B.
Thanks a lot in advance for your answer!