Memory usage with huge datasets #87
Comments
A slightly off-topic question: @maximilian-heeg, may I ask how your lab is running Baysor on the HPC? I am trying to run it with Singularity to avoid installing things at the HPC level, but so far without luck (even though the Docker container works). Thanks, and sorry for the spam on your issue!
@maximilian-heeg, thank you for this test! It is indeed a problem. We're working on memory optimizations for v0.7.0, and if they work as expected, they should drastically reduce memory usage (roughly 10-fold). As for tiling, we also plan to add this graph cut idea, but it's not there yet. So the only thing you can do at the moment is manually split the data by FOVs.
@sebgoti, the short answer: I haven't tried Baysor with Singularity. We have our own lab servers, which are just large single machines, so no clusters. If you need some input on your situation, I'd be happy to continue the discussion in a separate issue.
@sebgoti I have tried to run the Docker container using Singularity, but that did not work for me on the HPC. I ended up installing juliaup in a conda environment and then building Baysor as described in the README. Good luck! @VPetukhov Thank you so much for the answer and your work on this. For us, getting a good segmentation is currently the bottleneck in processing spatial data. I will try to split the data into multiple FOVs.
@VPetukhov, do you mean splitting the data by FOV using the fov_name column of the transcripts.csv.gz file?
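For anyone wanting to try the manual split, here is a minimal sketch of what it could look like. It assumes the transcripts file is a gzipped CSV with a header row and a `fov_name` column, as mentioned above; the function name `split_by_fov` and the output naming scheme are made up for illustration and are not part of Baysor. It streams the file row by row, so the full table never has to fit in memory.

```python
import csv
import gzip
from pathlib import Path


def split_by_fov(transcripts_path, out_dir, fov_column="fov_name"):
    """Stream a gzipped transcripts CSV and write one gzipped CSV per FOV.

    Hypothetical helper, not part of Baysor. Returns a dict mapping each
    FOV value to the number of rows written for it.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}  # FOV value -> open output file handle
    writers = {}  # FOV value -> csv.DictWriter for that file
    counts = {}   # FOV value -> number of rows written
    try:
        with gzip.open(transcripts_path, "rt", newline="") as src:
            reader = csv.DictReader(src)
            for row in reader:
                fov = row[fov_column]
                if fov not in writers:
                    # First row for this FOV: open its file and write the header.
                    out_path = out_dir / f"transcripts_{fov}.csv.gz"
                    fh = gzip.open(out_path, "wt", newline="")
                    handles[fov] = fh
                    writers[fov] = csv.DictWriter(
                        fh, fieldnames=reader.fieldnames, lineterminator="\n"
                    )
                    writers[fov].writeheader()
                    counts[fov] = 0
                writers[fov].writerow(row)
                counts[fov] += 1
    finally:
        for fh in handles.values():
            fh.close()
    return counts
```

Each output file can then be segmented on its own. Two caveats: this keeps one open file handle per FOV, which may hit the OS limit on open files for runs with very many FOVs, and cells that straddle an FOV border will be cut, so results near the edges should be treated with care.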
@VPetukhov Hello, members of my lab and I are also very curious whether the new release is still in progress, and what the expected release date is, if you know. We are working with datasets of 10 to 25 million transcripts, and it is hard to run the current version with our resources.
Hi,
Thank you so much for providing Baysor. I recently updated my installation to version 0.6.2, and it is running great with Julia 1.9.
In our lab, we have recently generated new (huge) spatial datasets with up to 250 million transcripts (using a 500 gene panel), and we were planning to use Baysor for cell segmentation. I was expecting that this requires a lot of memory, so I did some benchmarking with smaller FOVs of the dataset (see below).
It seems that memory use scales linearly with the number of transcripts. Extrapolating from this, I would assume that our dataset with 250 million transcripts requires approximately 5-6 TB of memory (which I unfortunately don't even have on our HPC).
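To make the extrapolation concrete, here is a small sketch of the arithmetic under the linear-scaling assumption. The only inputs taken from this issue are the 5-6 TB estimate and the 250 million transcripts; the 1 TB memory budget is a made-up example value, TB is treated as 1000 GB, and the function names are only illustrative.

```python
def gb_per_million(total_gb, transcripts_millions):
    """Implied memory cost in GB per million transcripts, assuming linear scaling."""
    return total_gb / transcripts_millions


def max_transcripts_millions(budget_gb, cost_gb_per_million):
    """Largest chunk (in millions of transcripts) that fits in a memory budget."""
    return budget_gb / cost_gb_per_million


# Estimate from this issue: 5-6 TB for 250 million transcripts.
low = gb_per_million(5000, 250)   # 20.0 GB per million transcripts
high = gb_per_million(6000, 250)  # 24.0 GB per million transcripts

# Hypothetical 1 TB node: how many transcripts per chunk would fit?
budget_gb = 1000  # assumed example value, not from the issue
print(f"{low:.0f}-{high:.0f} GB per million transcripts")
print(f"~{max_transcripts_millions(budget_gb, high):.0f}-"
      f"{max_transcripts_millions(budget_gb, low):.0f} million transcripts per chunk")
```

Under these assumptions, the estimate corresponds to roughly 20-24 GB per million transcripts, so a chunk would need to stay well below about 40-50 million transcripts to fit on such a node.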
Are there any solutions to that? Is there an easy way of creating smaller tiles and stitching them back together? I think, with the increasing panel sizes and imaging areas of commercial solutions, this might become an important limitation for many users soon.
Any help/ideas/suggestions are greatly appreciated.
Max