
Tips for larger matrices? #11

Open
PedroMilanezAlmeida opened this issue Jan 12, 2021 · 3 comments

Comments

@PedroMilanezAlmeida

I am working with a matrix that has 53201 cells and 20245 genes.

Its size in memory is only 482 MB as a dgCMatrix but 8.62 GB after as.matrix().

When I try RunALRA from Seurat, I get:

Error: vector memory exhausted (limit reached?)

The same happens if I run alra(A_norm = as.matrix(normRNA), use.mkl = FALSE) or with use.mkl = TRUE (except that with TRUE it takes much longer to show the error).

Do you have any suggestions for how to run on large matrices on a laptop?

@linqiaozhi
Member

Hi Pedro, thanks for your interest in ALRA.

The ALRA function produces multiple copies of the matrix, which can be a problem when you have limited memory. The reason the matrices are duplicated in memory is that we originally thought people would want to access the imputed matrix before scaling and thresholding. That does not seem to be the case; people are pretty much only interested in the final matrix.
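To make the duplication concrete, here is a small illustrative comparison (not the actual alra() code, and with made-up dimensions) of how much a single dense copy of a sparse expression matrix costs in R:

```r
library(Matrix)

# Illustrative only: a small random sparse matrix stands in for the
# 53201 x 20245 expression matrix discussed in this thread.
set.seed(1)
A <- rsparsematrix(nrow = 5000, ncol = 2000, density = 0.05)  # dgCMatrix
A_dense <- as.matrix(A)

format(object.size(A), units = "MB")        # sparse: only nonzeros stored
format(object.size(A_dense), units = "MB")  # dense: 8 bytes per entry

# A dense copy costs roughly nrow * ncol * 8 bytes regardless of sparsity,
# so every additional full-matrix copy inside alra() adds the same ~8.6 GB
# for a matrix of the size reported above.
```

This is why reducing the number of full-matrix copies, as alra.low.memory() does, matters much more than the sparsity of the input.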

Please see this branch, where I added a function called alra.low.memory(). It should reduce the memory footprint. Can you try that function? See here.

If you are still having trouble, can you tell me at which step you actually get the error? Also, how much memory does your laptop have? Are any of these steps helpful?

@PedroMilanezAlmeida
Author

PedroMilanezAlmeida commented Jan 13, 2021

Hi George, thanks for the quick feedback!

If I understand correctly, the change in alra.low.memory is at line 271 (don't return all the matrices), right? However, when I tried to run alra step by step last night, memory was already exhausted at line 232 (A_norm_rank_k is another (approximate) copy of A_norm, occupying an additional 8.6 GB of memory).

While going through alra step by step, I tried to convert A_norm_rank_k to a dgCMatrix but, probably because A_norm_rank_k is not sparse, the conversion also exhausted memory. I also tried to force the matrix multiplications at line 227 to produce a dgCMatrix as a result by converting fastDecomp_noc$u, fastDecomp_noc$v and diag(fastDecomp_noc$d) to dgCMatrix, but the multiplications still blew up memory and never finished.
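The observation above can be reproduced at small scale: a rank-k SVD reconstruction is essentially fully dense even when the original matrix is sparse, so coercing it to dgCMatrix stores every entry plus indices and makes things worse, not better. A minimal sketch (using base svd() as a stand-in for the randomized SVD alra uses):

```r
library(Matrix)

set.seed(1)
A <- rsparsematrix(200, 100, density = 0.05)          # sparse input
s <- svd(as.matrix(A), nu = 10, nv = 10)              # truncated SVD
A_rank_k <- s$u %*% diag(s$d[1:10]) %*% t(s$v)        # rank-10 reconstruction

mean(A_rank_k == 0)   # essentially no exact zeros survive the reconstruction

# The "sparse" form of a dense matrix is larger than the dense form:
object.size(as(A_rank_k, "CsparseMatrix")) > object.size(A_rank_k)
```

So the dgCMatrix route cannot help here; the rank-k intermediate really does need dense storage.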

My workaround for the moment was to run alra on only the 2k most variable genes instead of the entire matrix, which now runs smoothly and fast, but I haven't yet looked into whether the results are any good.
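For reference, that workaround can be sketched roughly like this. This is an assumption-laden sketch, not code from the thread: it assumes a Seurat object named seu, that alra() from this repo is already sourced, and Seurat v4 function/slot names, which may differ in other versions:

```r
library(Seurat)
library(Matrix)

# Hypothetical object name `seu`; adapt to your own pipeline.
seu <- FindVariableFeatures(seu, nfeatures = 2000)
normRNA_sub <- GetAssayData(seu, slot = "data")[VariableFeatures(seu), ]

# alra() expects cells x genes, while Seurat stores genes x cells,
# hence the transpose before densifying the (much smaller) submatrix.
result <- alra(A_norm = as.matrix(t(normRNA_sub)))
```

Subsetting first means the dense copy is only cells x 2000 instead of cells x 20245, roughly a tenfold reduction in memory.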

Btw, my laptop has 16 GB of memory and I have not yet tried changing R_MAX_VSIZE in .Renviron.
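For completeness, the R_MAX_VSIZE route looks like the snippet below (this cap is what produces the "vector memory exhausted" error on macOS). The 32Gb value is only an example; setting it above physical RAM will push the machine into swap rather than remove the limit:

```shell
# Append to ~/.Renviron; takes effect in the next R session.
echo 'R_MAX_VSIZE=32Gb' >> ~/.Renviron
```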

@Biomiha

Biomiha commented Dec 14, 2023

Hi @linqiaozhi,

I recently came across this issue as well, having hit the memory limits due to the large number of cells we are analysing.
I have noticed a bug in the alra.low.memory function: there is a line that checks whether the class of the input A_norm is a matrix, but the if statement only tests class(A_norm) == "matrix". In my case, class(A_norm) returns a vector containing both "matrix" and "array", which raises an error and prevents the function from completing.
I have submitted a pull request in which the if statement checks whether "matrix" is present in the class vector and ignores any additional classes.
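The behaviour is easy to reproduce in plain R: since R 4.0, class() on a matrix returns c("matrix", "array"), so comparing it to a single string yields a length-2 logical, which if() rejects. The actual fix is in the linked pull request; this sketch just contrasts the fragile check with robust alternatives:

```r
m <- matrix(1:4, nrow = 2)

class(m)                  # c("matrix", "array") since R 4.0
class(m) == "matrix"      # c(TRUE, FALSE): length 2, so if() errors on it

# Robust alternatives:
"matrix" %in% class(m)    # TRUE: the check proposed in the PR
inherits(m, "matrix")     # TRUE: the idiomatic base-R test
is.matrix(m)              # TRUE: the simplest option
```

Any of the last three forms works inside if(); inherits() is generally preferred because it also handles S4/S3 class hierarchies.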

Thanks
