Worse performance using datashader? #296

Closed
LucaMarconato opened this issue Jul 13, 2024 · 6 comments · Fixed by #309
Labels: bug, images 🖼️, labels 🏷️, points 🧮, priority: medium

Comments

@LucaMarconato
Member

LucaMarconato commented Jul 13, 2024

I wrote some benchmarks, available in #295 (they can simply be run as tests), and I noticed that the datashader-based rendering performs worse than the matplotlib-based one.

I think this may be due to the size of the canvas used by datashader, since in the MERFISH example in #243 the performance was (as expected) better.

Therefore, using a smaller default canvas size may fix the issue. @Sonja-Stockhaus could you please have a look into this?

@LucaMarconato
Member Author

LucaMarconato commented Jul 13, 2024

Here are the results of a (single) run of the tests (the timings are consistent across multiple manual runs).

image

@LucaMarconato
Member Author

With the fix that I proposed for the performance bug in #297, the performance gap is much bigger.

image

@timtreis
Member

@Sonja-Stockhaus my "didn't-look-at-the-code" theory is that datashader generates too large of an image which then bypasses the rasterisation-downsampling logic. Wdyt?

@timtreis added the bug, priority: medium, images 🖼️, labels 🏷️, and points 🧮 labels on Jul 14, 2024
@Sonja-Stockhaus
Collaborator

Yep, datashader generates an image that is exactly the size of the extent (large extent = large image = long runtime). I'll think of something so that we can use a smaller canvas size and then maybe rasterize to bring it back to the original scale.
Do we want a heuristic again to decide on the "smaller canvas size"?

I also noticed that with datashader the radius of the points, for example, is relative to the axes, which is not the case for matplotlib. So for a large extent you need extremely large point sizes for the points to be visible at all with datashader. This should be made consistent with matplotlib.
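
A minimal sketch of why the two backends can disagree on point size, assuming the datashader path interprets the radius in data (axes) coordinates as described above. matplotlib's `scatter` takes `s` as a marker area in points² (1 point = 1/72 inch), so the on-screen size does not depend on the extent. The helper name and its parameters are hypothetical and not part of spatialdata-plot.

```python
import math

POINTS_PER_INCH = 72  # matplotlib's typographic point


def scatter_radius_in_data_units(s, dpi, axes_width_px, x_extent):
    """Convert a matplotlib scatter size `s` (marker area in points^2)
    into the radius, in data units, that covers the same pixels.

    Hypothetical helper for illustration only.
    """
    diameter_pt = math.sqrt(s)                         # marker diameter in points
    diameter_px = diameter_pt * dpi / POINTS_PER_INCH  # points -> pixels
    data_units_per_px = x_extent / axes_width_px       # pixels -> data units
    return 0.5 * diameter_px * data_units_per_px


# Same on-screen marker, two different extents:
print(scatter_radius_in_data_units(100, dpi=72, axes_width_px=1000, x_extent=1_000))    # 5.0
print(scatter_radius_in_data_units(100, dpi=72, axes_width_px=1000, x_extent=100_000))  # 500.0
```

Under these assumptions, a marker that looks identical on screen needs a 100 times larger radius in data units once the extent grows by a factor of 100, which matches the observation that very large point sizes are needed for large extents.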

@LucaMarconato
Member Author

Thanks for the explanation. I would reuse the logic of _rasterize_if_necessary() or _multiscale_to_spatial_image() to take the dpi of the figure and the fig_size into consideration, since the extent could be extremely large, but in the end we are limited by the pixels available on screen/paper for plotting.
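
A rough sketch of that idea, assuming the canvas is capped by the figure's pixel budget (figure size in inches times dpi) while keeping the aspect ratio of the extent. The function name is hypothetical, and this is not the logic of `_rasterize_if_necessary()`. With matplotlib, the inputs could come from `fig.get_size_inches()` and `fig.dpi`.

```python
def canvas_size_for_figure(x_extent, y_extent, figsize, dpi):
    """Return (plot_width, plot_height) in pixels for a rasterization canvas.

    The canvas never exceeds the pixels available in the figure
    (figsize in inches * dpi) and keeps the aspect ratio of the extent.
    Hypothetical helper for illustration only.
    """
    if x_extent <= 0 or y_extent <= 0:
        raise ValueError("extent must be positive")

    max_width_px = figsize[0] * dpi
    max_height_px = figsize[1] * dpi

    # Only shrink (scale <= 1); a small extent keeps its native size.
    scale = min(1.0, max_width_px / x_extent, max_height_px / y_extent)

    width = max(1, round(x_extent * scale))
    height = max(1, round(y_extent * scale))
    return width, height


# A 20000 x 10000 extent on a 6.4 x 4.8 inch figure at 100 dpi:
print(canvas_size_for_figure(20_000, 10_000, figsize=(6.4, 4.8), dpi=100))  # (640, 320)
```

The resulting width and height could then be passed as `plot_width` and `plot_height` to `datashader.Canvas`, together with `x_range` and `y_range` set to the full extent, so the aggregation cost scales with the figure size rather than with the extent.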

@LucaMarconato
Member Author

Btw, off-topic comment: when plotting Visium HD data as points/circles, I noticed a Moiré pattern due to the presence of a small rotation in the raw data. With datashader rasterization the Moiré pattern disappears, which is great! So datashader could also have this nice use case beyond improved performance.
