
Tips for larger matrices? #11

Open
PedroMilanezAlmeida opened this issue Jan 12, 2021 · 3 comments

Comments

@PedroMilanezAlmeida

I am working with a matrix that has 53201 cells and 20245 genes.

Its size in memory is only 482 MB as a dgCMatrix but 8.62 GB after as.matrix().

When I try RunALRA from Seurat, I get:

Error: vector memory exhausted (limit reached?)

The same happens if I run alra(A_norm = as.matrix(normRNA), use.mkl = FALSE) or with use.mkl = TRUE (except that with TRUE it takes much longer to show the error).

Do you have any suggestions for how to run on large matrices on a laptop?

@linqiaozhi
Member

Hi Pedro, thanks for your interest in ALRA.

The ALRA function produces multiple copies of the matrix, which can be a problem when you have limited memory. The reason the matrices are duplicated in memory is that we originally thought people would want to access the imputed matrix before scaling and thresholding. That does not seem to be the case; people are pretty much only interested in the final matrix.
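To make the duplication concrete, here is a small illustrative comparison (not the actual alra() code, and with made-up dimensions) of how much a single dense copy of a sparse expression matrix costs in R:

```r
library(Matrix)

# Illustrative only: a small random sparse matrix stands in for the
# 53201 x 20245 expression matrix discussed in this thread.
set.seed(1)
A <- rsparsematrix(nrow = 5000, ncol = 2000, density = 0.05)  # dgCMatrix
A_dense <- as.matrix(A)

format(object.size(A), units = "MB")        # sparse: only nonzeros stored
format(object.size(A_dense), units = "MB")  # dense: 8 bytes per entry

# A dense copy costs roughly nrow * ncol * 8 bytes regardless of sparsity,
# so every additional full-matrix copy inside alra() adds the same ~8.6 GB
# for a matrix of the size reported above.
```

This is why reducing the number of full-matrix copies, as alra.low.memory() does, matters much more than the sparsity of the input.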

Please see this branch, where I added a function called alra.low.memory(). It should reduce the memory footprint. Can you try that function? See here.

If you are still having trouble, can you tell me at which step you actually get the error? Also, how much memory does your laptop have? Are any of these steps helpful?

@PedroMilanezAlmeida
Author

PedroMilanezAlmeida commented Jan 13, 2021

Hi George, thanks for the quick feedback!

If I understand correctly, the change in alra.low.memory is at line 271 (don't return all the matrices), right? However, when I tried to run alra step by step last night, memory was already exhausted at line 232 (A_norm_rank_k is another (approximate) copy of A_norm, occupying an additional 8.6 GB of memory).

While going through alra step by step, I tried to convert A_norm_rank_k to a dgCMatrix but, probably because A_norm_rank_k is not sparse, the conversion also exhausted memory. I also tried to force the matrix multiplications at line 227 to produce a dgCMatrix as a result by converting fastDecomp_noc$u, fastDecomp_noc$v and diag(fastDecomp_noc$d) to dgCMatrix, but the multiplications still blew up memory and never finished.
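The observation above can be reproduced at small scale: a rank-k SVD reconstruction is essentially fully dense even when the original matrix is sparse, so coercing it to dgCMatrix stores every entry plus indices and makes things worse, not better. A minimal sketch (using base svd() as a stand-in for the randomized SVD alra uses):

```r
library(Matrix)

set.seed(1)
A <- rsparsematrix(200, 100, density = 0.05)          # sparse input
s <- svd(as.matrix(A), nu = 10, nv = 10)              # truncated SVD
A_rank_k <- s$u %*% diag(s$d[1:10]) %*% t(s$v)        # rank-10 reconstruction

mean(A_rank_k == 0)   # essentially no exact zeros survive the reconstruction

# The "sparse" form of a dense matrix is larger than the dense form:
object.size(as(A_rank_k, "CsparseMatrix")) > object.size(A_rank_k)
```

So the dgCMatrix route cannot help here; the rank-k intermediate really does need dense storage.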

My workaround for the moment was to run alra on only the 2k most variable genes instead of the entire matrix, which now runs smoothly and fast, but I haven't yet looked into whether the results are any good.
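For reference, that workaround can be sketched roughly like this. This is an assumption-laden sketch, not code from the thread: it assumes a Seurat object named seu, that alra() from this repo is already sourced, and Seurat v4 function/slot names, which may differ in other versions:

```r
library(Seurat)
library(Matrix)

# Hypothetical object name `seu`; adapt to your own pipeline.
seu <- FindVariableFeatures(seu, nfeatures = 2000)
normRNA_sub <- GetAssayData(seu, slot = "data")[VariableFeatures(seu), ]

# alra() expects cells x genes, while Seurat stores genes x cells,
# hence the transpose before densifying the (much smaller) submatrix.
result <- alra(A_norm = as.matrix(t(normRNA_sub)))
```

Subsetting first means the dense copy is only cells x 2000 instead of cells x 20245, roughly a tenfold reduction in memory.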

Btw, my laptop has 16 GB of memory and I have not yet tried changing R_MAX_VSIZE in .Renviron.
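For completeness, the R_MAX_VSIZE route looks like the snippet below (this cap is what produces the "vector memory exhausted" error on macOS). The 32Gb value is only an example; setting it above physical RAM will push the machine into swap rather than remove the limit:

```shell
# Append to ~/.Renviron; takes effect in the next R session.
echo 'R_MAX_VSIZE=32Gb' >> ~/.Renviron
```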

@Biomiha

Biomiha commented Dec 14, 2023

Hi @linqiaozhi,

I recently came across this issue as well, having hit the memory limits due to the large number of cells we are analysing.
I have noticed a bug in the alra.low.memory function: there is a line that checks whether the class of the input A_norm is a matrix, but the if statement only tests class(A_norm) == "matrix". In my case, class(A_norm) returns a vector containing both "matrix" and "array", which raises an error and prevents the function from completing.
I have submitted a pull request in which the if statement checks whether "matrix" is present in the class vector and ignores any additional classes.
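The behaviour is easy to reproduce in plain R: since R 4.0, class() on a matrix returns c("matrix", "array"), so comparing it to a single string yields a length-2 logical, which if() rejects. The actual fix is in the linked pull request; this sketch just contrasts the fragile check with robust alternatives:

```r
m <- matrix(1:4, nrow = 2)

class(m)                  # c("matrix", "array") since R 4.0
class(m) == "matrix"      # c(TRUE, FALSE): length 2, so if() errors on it

# Robust alternatives:
"matrix" %in% class(m)    # TRUE: the check proposed in the PR
inherits(m, "matrix")     # TRUE: the idiomatic base-R test
is.matrix(m)              # TRUE: the simplest option
```

Any of the last three forms works inside if(); inherits() is generally preferred because it also handles S4/S3 class hierarchies.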

Thanks
