cell2location runs over an hour in test mode #717

Closed
scottgigante-immunai opened this issue Nov 29, 2022 · 7 comments · Fixed by #724
Labels
bug Something isn't working

Comments

@scottgigante-immunai
Collaborator

scottgigante-immunai commented Nov 29, 2022

Per the execution timeline, cell2location takes a long time to run even when test=True. This is costing us money. @vitkl can you look into this?

@scottgigante-immunai scottgigante-immunai added the bug Something isn't working label Nov 29, 2022
@vitkl
Contributor

vitkl commented Nov 30, 2022

Is the number of test epochs determined here?

if test:
    max_epochs_sc = max_epochs_sc or 2
    max_epochs_st = max_epochs_st or 2
    num_samples = num_samples or 10
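The snippet relies on Python's `or` fallback: an argument left as `None` is replaced by the small test default. The sketch below wraps the same three lines in a hypothetical `resolve_test_params` helper (not part of openproblems or cell2location) so the behaviour can be checked in isolation. It also shows one edge case: an explicit `0` is falsy, so it is silently replaced as well.

```python
from typing import Optional, Tuple


def resolve_test_params(
    test: bool,
    max_epochs_sc: Optional[int] = None,
    max_epochs_st: Optional[int] = None,
    num_samples: Optional[int] = None,
) -> Tuple[Optional[int], Optional[int], Optional[int]]:
    """Hypothetical helper mirroring the test-mode defaults quoted above."""
    if test:
        # `x or default` returns `default` whenever `x` is falsy (None or 0)
        max_epochs_sc = max_epochs_sc or 2
        max_epochs_st = max_epochs_st or 2
        num_samples = num_samples or 10
    return max_epochs_sc, max_epochs_st, num_samples


# Unset arguments fall back to the small test values
print(resolve_test_params(test=True))  # (2, 2, 10)
# An explicit value is kept
print(resolve_test_params(test=True, num_samples=2))  # (2, 2, 2)
# Outside test mode nothing is filled in
print(resolve_test_params(test=False))  # (None, None, None)
```

Under this pattern, passing `num_samples=2` explicitly overrides only the sampling default and leaves the epoch limits at 2.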

@vitkl
Contributor

vitkl commented Nov 30, 2022

What does it mean when the same name appears multiple times in the report?
For example, search for "spatial_decomposition:cell2location_detection_alpha_20_nb-tabula_muris_senis_alpha_1:openproblems-python-extras":
one process took 14 min while another run took 1 h. How are these different?

@vitkl
Contributor

vitkl commented Nov 30, 2022

One reason for 15 minutes instead of 2 minutes could be that computing num_samples=10 samples on the full data on CPU takes time. I suggest reducing it to num_samples=2. I don't know why this test would take 1 h; I need to know more about why there are many processes with the same name.

I will look into which variables are sampled and whether any of them are large (relevant to the issue below).

At the moment the amortised version uses too much memory (scverse/scvi-tools#1801), and I am working on a solution. However, this doesn't explain the slow runtime, because the amortised version is not slower here.

@scottgigante-immunai
Collaborator Author

scottgigante-immunai commented Dec 1, 2022 via email

@vitkl
Contributor

vitkl commented Dec 2, 2022

Maybe the test could subset the data to fewer genes/cells/locations, or simulate fewer locations? That would speed up tests for all methods. The cell2location package runs a very similar test on random data, which takes 30 seconds (https://github.com/BayraktarLab/cell2location/actions/runs/3554709736/jobs/5970971306).
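A minimal sketch of the subsetting idea using only NumPy. It assumes the data are held as a dense observations × genes count matrix; the real openproblems datasets are AnnData objects, and the helper name `subsample_for_test`, its default sizes, and the random Poisson input are hypothetical and purely illustrative.

```python
import numpy as np


def subsample_for_test(
    counts: np.ndarray,
    n_obs: int = 100,
    n_vars: int = 50,
    seed: int = 0,
) -> np.ndarray:
    """Hypothetical helper: keep a random subset of rows (cells or locations)
    and columns (genes) so that test runs stay small."""
    rng = np.random.default_rng(seed)
    # Never ask for more rows/columns than exist
    n_obs = min(n_obs, counts.shape[0])
    n_vars = min(n_vars, counts.shape[1])
    # Sample indices without replacement, then sort to keep the original order
    rows = np.sort(rng.choice(counts.shape[0], size=n_obs, replace=False))
    cols = np.sort(rng.choice(counts.shape[1], size=n_vars, replace=False))
    # np.ix_ builds an open mesh so we select the row/column submatrix
    return counts[np.ix_(rows, cols)]


# Random Poisson counts standing in for a spatial dataset (illustrative only)
rng = np.random.default_rng(42)
full = rng.poisson(lam=1.0, size=(2000, 500))
small = subsample_for_test(full, n_obs=100, n_vars=50)
print(full.shape, "->", small.shape)  # (2000, 500) -> (100, 50)
```

With AnnData the same selection can be written as `adata[rows, cols].copy()`. Genes would presumably need to be subset consistently between the single-cell reference and the spatial data so that both still share the same gene set.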

@vitkl
Copy link
Contributor

vitkl commented Dec 2, 2022

Did num_samples=2 actually help?

@scottgigante-immunai
Collaborator Author

scottgigante-immunai commented Dec 2, 2022 via email
