Expected memory usage #80

Open
matnguyen opened this issue Jul 19, 2023 · 4 comments
@matnguyen

What's the expected memory usage per genome for Dashing2? I'm trying to run it on 500,000 viral isolates, and am running out of memory even with 500GB

@dnbaker
Owner

dnbaker commented Jul 20, 2023 via email

@matnguyen
Author

This is the command I'm running:

dashing2_savx2 sketch --cmpout dist_mat.txt -k 7 --parse-by-seq -p 32 sequences.multi.fa

@dnbaker
Owner

dnbaker commented Jul 21, 2023

Thank you! This is a big help.

There's one other place memory is used in --parse-by-seq mode: storing the sequence of each record. What's happening here is that the whole FASTA file ends up being stored in memory.

I have to say this isn't desirable behavior for most cases. If edit distance is chosen for an output distance or the program is running in greedy clustering mode, then the program needs to hold on to the sequences for later use, but otherwise it doesn't need to hold on to them.
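To illustrate the distinction, here is a minimal Python sketch of per-sequence processing that keeps a sequence only when a later step needs it. This is not Dashing2's code: `read_fasta`, `toy_sketch`, `sketch_by_seq`, and the `keep_sequences` flag are hypothetical names, and the "sketch" is just a set of distinct k-mers standing in for a real sketch.

```python
import io
from typing import Iterable, Iterator, List, Tuple


def read_fasta(handle: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) one record at a time from a FASTA stream."""
    name = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name = line[1:]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def toy_sketch(seq: str, k: int = 7) -> frozenset:
    """Stand-in for a real sketch: the set of distinct k-mers (assumption)."""
    return frozenset(seq[i:i + k] for i in range(len(seq) - k + 1))


def sketch_by_seq(handle: Iterable[str], k: int = 7, keep_sequences: bool = False):
    """Sketch each record; retain raw sequences only if a later step needs them.

    keep_sequences=True models the cases described above (edit distance
    output, greedy clustering); otherwise each sequence is dropped as soon
    as it has been sketched, so memory grows with the sketches only.
    """
    sketches = []
    kept: List[str] = []
    for name, seq in read_fasta(handle):
        sketches.append((name, toy_sketch(seq, k)))
        if keep_sequences:
            kept.append(seq)
    return sketches, kept


# Example use on an in-memory FASTA with two records.
fasta = io.StringIO(">a\nACGTACGTAC\n>b\nACGTACG\nTTT\n")
sketches, kept = sketch_by_seq(fasta, k=7)
```

With `keep_sequences=False`, the list of retained sequences stays empty, so the input file never has to be resident in memory all at once.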

I need to do a bit of work to reorder this to avoid this problem; I think I have a path to do it, but it will take a bit of reorganization.

I'll update you when there's a fix for this.

Thanks again,

Daniel

@dnbaker
Owner

dnbaker commented Jul 31, 2023

Checking back in - this is improved with #81. I'm rebuilding v2.1.18 binaries currently and will update you when they're ready.

Memory usage should be lower in --parse-by-seq mode. It won't hold onto sequences it doesn't need.

Would you give it another try?

Thanks!

Daniel
