
high memory consumption when using gsva #277

Open
dmalzl opened this issue Sep 8, 2024 · 2 comments

Comments

@dmalzl

dmalzl commented Sep 8, 2024

As a fellow Python user who does not want to switch to R too often, I love the gseapy package. Recently, I discovered that it also offers a GSVA implementation. To see whether this helps with what I am trying to achieve with my data, I gave it a try using 32 cores. The data at hand is 13k samples x 36k genes, which is fairly large but easily fits in 32 GB of RAM (depending on the representation).

However, I ran into issues with the memory consumption of the implementation: it stays at around 30 GB for most of the run but blows up to 220 GB at the end. I have not dug into the code yet, but this seems a bit out of hand. From experience with Python/R interfaces I can only guess that it comes from shuttling data between Rust and Python, or from the data being converted from sparse to dense at some point in the algorithm; another possibility is that all 32 threads receive a copy of the same data. In any case, although I am lucky that our system can handle such a large amount of RAM, I feel there is an opportunity to make the algorithm more memory efficient so that others can also use it on large datasets.

Any thoughts on why this is and how to improve on it?

@zqfang
Owner

zqfang commented Sep 9, 2024

How large is your gene_set?

You can iterate through the gene sets (one set per run); this may reduce the memory usage. I will see if I can reduce the memory usage in the Rust backend.
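A minimal sketch of the suggestion above: split the gene-set collection into chunks and run GSVA on each chunk, so only a fraction of the sets is in flight at once. The helper below is hypothetical (not part of gseapy); it assumes gene sets are held as a dict mapping set name to a list of genes, as parsed from a GMT file. Check your gseapy version's `gsva` signature and expected data orientation before adapting the commented usage.

```python
def chunk_gene_sets(gene_sets, chunk_size):
    """Yield successive sub-dicts of at most `chunk_size` gene sets.

    `gene_sets` is a dict of {set_name: [gene, ...]}, e.g. parsed from a
    GMT file. Each yielded dict can be passed to a separate GSVA run.
    """
    names = list(gene_sets)
    for i in range(0, len(names), chunk_size):
        yield {name: gene_sets[name] for name in names[i:i + chunk_size]}

# Hypothetical usage (assumes `expr` is your expression DataFrame and that
# your gseapy version's gsva accepts a dict of gene sets; `all_sets` and
# the chunk size of 500 are placeholders, not values from this thread):
#
# import gseapy as gp
# import pandas as pd
# results = []
# for subset in chunk_gene_sets(all_sets, 500):
#     res = gp.gsva(data=expr, gene_sets=subset, outdir=None, threads=32)
#     results.append(res.res2d)
# scores = pd.concat(results)
```

The trade-off is the one raised in the next comment: each chunk pays the per-run overhead again, so smaller chunks lower peak memory at the cost of longer total runtime.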

@dmalzl
Author

dmalzl commented Sep 10, 2024

There are 7300 gene sets in my GMT file. However, I fear running them separately would increase computation time drastically, as running them all in one go already takes 17 h.

Thanks for looking into it. As I said, it is not really a problem for me; I just wanted to flag it.
