Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datashader speedup and bugfixes #309

Merged
merged 34 commits into from
Oct 23, 2024
Merged

Conversation

Sonja-Stockhaus
Copy link
Collaborator

This speeds everything up for me locally, but for the benchmark (#296), I see an effect (aka datashader faster than matplotlib) for e.g. 10k points/shapes etc.

@Sonja-Stockhaus Sonja-Stockhaus linked an issue Jul 17, 2024 that may be closed by this pull request
@codecov-commenter
Copy link

codecov-commenter commented Jul 17, 2024

Codecov Report

Attention: Patch coverage is 83.43195% with 28 lines in your changes missing coverage. Please review.

Project coverage is 83.36%. Comparing base (80c6f77) to head (73568ec).

Files with missing lines Patch % Lines
src/spatialdata_plot/pl/render.py 76.59% 22 Missing ⚠️
src/spatialdata_plot/pl/utils.py 91.78% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #309      +/-   ##
==========================================
- Coverage   83.73%   83.36%   -0.37%     
==========================================
  Files           8        8              
  Lines        1543     1653     +110     
==========================================
+ Hits         1292     1378      +86     
- Misses        251      275      +24     
Files with missing lines Coverage Δ
src/spatialdata_plot/pl/basic.py 89.26% <ø> (ø)
src/spatialdata_plot/pl/render_params.py 100.00% <100.00%> (ø)
src/spatialdata_plot/pl/utils.py 77.48% <91.78%> (+1.11%) ⬆️
src/spatialdata_plot/pl/render.py 90.25% <76.59%> (-4.47%) ⬇️

trans = mtransforms.Affine2D(matrix=affine_trans)
trans_data = trans + ax.transData

rgba_image = np.transpose(rgba_image.data.compute(), (1, 2, 0)) # type: ignore[attr-defined]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here (and in render_shapes), we access the image as numpy array from the SpatialImage. mypy doesn't believe that compute() exists...

@Sonja-Stockhaus Sonja-Stockhaus marked this pull request as ready for review July 17, 2024 18:57
@Sonja-Stockhaus Sonja-Stockhaus requested a review from timtreis July 17, 2024 18:57
Comment on lines 452 to 460

# compute canvas size in pixels close to the actual image size to speed up computation
plot_width = x_ext[1] - x_ext[0]
plot_height = y_ext[1] - y_ext[0]
plot_width_px = int(round(fig_params.fig.get_size_inches()[0] * fig_params.fig.dpi))
plot_height_px = int(round(fig_params.fig.get_size_inches()[1] * fig_params.fig.dpi))
factor = np.min([plot_width / plot_width_px, plot_height / plot_height_px])
plot_width = int(np.round(plot_width / factor))
plot_height = int(np.round(plot_height / factor))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider bundling this code in a private function since it's duplicate from above.

@@ -456,8 +495,25 @@ def _render_points(
cmap=render_params.cmap_params.cmap,
)

rbga_image = np.transpose(ds_result.to_numpy().base, (0, 1, 2))
cax = ax.imshow(rbga_image, zorder=render_params.zorder, alpha=render_params.alpha)
# create SpatialImage to get it back to original size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this code is a duplicate, I'd consider it refactoring it into a private function.

@Sonja-Stockhaus Sonja-Stockhaus linked an issue Jul 19, 2024 that may be closed by this pull request
@Sonja-Stockhaus Sonja-Stockhaus marked this pull request as draft July 19, 2024 15:48
@Sonja-Stockhaus Sonja-Stockhaus linked an issue Jul 30, 2024 that may be closed by this pull request
3 tasks
@Sonja-Stockhaus Sonja-Stockhaus linked an issue Jul 31, 2024 that may be closed by this pull request
@LucaMarconato
Copy link
Member

LucaMarconato commented Aug 1, 2024

Thanks @Sonja-Stockhaus ! I have tried the PR on the Visium HD data and the speed up is significant! I switched back to using datashader instead of matplotlib in the notebook 😊

@LucaMarconato
Copy link
Member

LucaMarconato commented Aug 9, 2024

Sonja, I found a bug with this PR when the rasterized object needs to be aligned with the rest #291. The solution seems straightforward, I think one just needs to initialize the image element from datashader using the old coordinate transformations.

@timtreis
Copy link
Member

Will look into it!

@timtreis timtreis marked this pull request as ready for review August 14, 2024 23:59
@LucaMarconato
Copy link
Member

LucaMarconato commented Sep 27, 2024

Thanks for the update!

outline of polygons need to be implemented in datashader

This task here seems tricky. I would consider raising a warning saying that the outline is currently not supported and workin on this on a separate PR (with low priority).

@Sonja-Stockhaus
Copy link
Collaborator Author

outline of polygons need to be implemented in datashader

This task here seems tricky. I would consider raising a warning saying that the outline is currently not supported and workin on this on a separate PR (with low priority).

If you use cvs.line instead of cvs.polygons you get only the outline, so you should be able to overlay the two. Depending on how fast we want to merge this PR, we can of course move that to a new one

@LucaMarconato
Copy link
Member

Ah ok, then it should be a fast addition, thanks for the update.

@timtreis timtreis assigned Sonja-Stockhaus and unassigned timtreis Sep 30, 2024
@Sonja-Stockhaus Sonja-Stockhaus marked this pull request as ready for review October 2, 2024 07:20
@Sonja-Stockhaus
Copy link
Collaborator Author

General problem of datashader so far: when you render elements, coloring them by a value and a color map, you wouldn't see a single element if all of them have the same value X. This is due to the scaling of the data for the colormap that happens in datashader (scaled_data = (data - span[0])/(span[1] - span[0])) which leads to all elements being assigned alpha=0.

Fix in this PR: internally, not the whole colormap is passed to datashader, but just the color given by cmap(0.0). It is used to color all elements. Still, a colorbar of the cmap is drawn with the range [X, X+1].

@LucaMarconato
Copy link
Member

Thanks for the explanation, I think the approach that you implemented is a good one!

Copy link
Contributor

@melonora melonora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Sonja-Stockhaus, thanks for the amazing work. I think it is very far already, but would like some minor changes here and there, mainly to shorten the code a bit and some refactorings to keep it more maintainable.

src/spatialdata_plot/pl/basic.py Outdated Show resolved Hide resolved
src/spatialdata_plot/pl/render.py Outdated Show resolved Hide resolved
src/spatialdata_plot/pl/render.py Show resolved Hide resolved
src/spatialdata_plot/pl/render.py Outdated Show resolved Hide resolved
src/spatialdata_plot/pl/render.py Outdated Show resolved Hide resolved
src/spatialdata_plot/pl/utils.py Outdated Show resolved Hide resolved
src/spatialdata_plot/pl/utils.py Outdated Show resolved Hide resolved
src/spatialdata_plot/pl/utils.py Outdated Show resolved Hide resolved
Copy link
Member

@timtreis timtreis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed

src/spatialdata_plot/pl/render_params.py Outdated Show resolved Hide resolved
src/spatialdata_plot/pl/render.py Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
src/spatialdata_plot/pl/render.py Outdated Show resolved Hide resolved
tests/pl/test_render_shapes.py Show resolved Hide resolved
tests/pl/test_render_shapes.py Show resolved Hide resolved
@Sonja-Stockhaus
Copy link
Collaborator Author

As discussed with @timtreis, we now use sum as default datashader aggregation for points and mean for shapes

@timtreis timtreis merged commit bc2db2c into main Oct 23, 2024
5 checks passed
@timtreis timtreis deleted the feature/296-datashader-canvas-size branch October 23, 2024 22:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working points 🧮 Anything related to Points priority: high shapes 🫧 Anything related to Shapes
Projects
None yet
5 participants