Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize rasterization of categorical NdOverlays #6206

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

philippjfr
Copy link
Member

Currently when we rasterize an NdOverlay we always concatenate all the data (and insert NaNs between each layer if we are rasterizing lines). This process of concatenation is very expensive and largely pointless if we are generating a categorical aggregate that is coincident with the dimension of the NdOverlay, i.e. each layer in the overlay is a distinct category.

Therefore this PR adds an optimization for this specific codepath where instead of concatenating and rasterizing everything in a single datashader call, we keep each categories data as a distinct dataframe and make multiple calls to datashader, concatenating all the resulting arrays along the categorical dimension after the fact. This produces the same result without paying the cost.

Copy link

codecov bot commented Apr 23, 2024

Codecov Report

Attention: Patch coverage is 78.43137% with 11 lines in your changes are missing coverage. Please review.

Project coverage is 88.38%. Comparing base (4a526c0) to head (a877ff2).
Report is 2 commits behind head on main.

Files Patch % Lines
holoviews/operation/datashader.py 78.43% 11 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #6206       +/-   ##
===========================================
+ Coverage   17.43%   88.38%   +70.94%     
===========================================
  Files         323      323               
  Lines       67582    67627       +45     
===========================================
+ Hits        11784    59771    +47987     
+ Misses      55798     7856    -47942     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jbednar
Copy link
Member

jbednar commented Apr 23, 2024

Nice! Could consider applying the same optimization in the non-categorical case, but I guess that would need to be done by having Datashader accept a list or dict of lines, so that two-stage aggregators like ds.mean can be handled.

@philippjfr
Copy link
Member Author

Yes originally considered that but didn't want to handle parsing all the aggregators. Certainly count would be an easy one though. There is a point where the cost of concatenating amortizes though.

@droumis
Copy link
Member

droumis commented Apr 23, 2024

I think I might be confused, but assuming the 'after' below is the proposed API, so far it saves a bit of time by not having to do a groupby, but the rasterizing appears to be slower compared to main.

Before

curve_dict = {key: hv.Curve(value, kdims=['time'], vdims=['value', 'parameter'], label=key)
              for key, value in df.groupby('parameter')}

overlay = hv.NdOverlay(curve_dict, kdims='parameter')

rasterize(overlay, line_width=1, aggregator=ds.by('parameter', ds.count())).opts(
    responsive=True, min_height=400, show_legend=True, cmap='glasbey', cnorm='eq_hist', colorbar=False, tools=['hover'])

After

curve_dict = {key: hv.Curve(value, kdims=['time'], vdims=['value', 'parameter'], label=key)
              for key, value in df_dict.items()}

overlay = hv.NdOverlay(curve_dict, kdims='parameter')

rasterize(overlay, line_width=1, aggregator=ds.by('parameter', ds.count())).opts(
    responsive=True, min_height=400, show_legend=True, cmap='glasbey', cnorm='eq_hist', colorbar=False, tools=['hover'])

@philippjfr
Copy link
Member Author

I think I might be confused, but assuming the 'after' below is the proposed API

I don't think there's any change to the API needed or proposed here, but the performance profile of this change should be that on first render this should be quite a lot faster and on every subsequent render (e.g. zoom events) it'll be slightly slower.

@droumis
Copy link
Member

droumis commented Apr 24, 2024

Got it. It would be nice if there was a convenient way to report timing of the initial and subsequent paint. Hopefully soon!

While a dict of df's is so much cleaner.. I wonder if the cost of manually inserting nan-separators into a df will be fully amortized over just a couple interaction updates, when using rasterize(hv.Curve(df)).

@hoxbro hoxbro added the type: enhancement Minor feature or improvement to an existing feature label May 17, 2024
@droumis
Copy link
Member

droumis commented May 23, 2024

This approach may not be feasible for dozens of lines. At some point the resulting imagestack (1 image per line) grows beyond what is reasonably performant

@philippjfr philippjfr added this to the 1.19.x milestone Jun 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement Minor feature or improvement to an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants