Optimize rasterization of categorical NdOverlays #6206
base: main
Codecov Report — Attention: Patch coverage is

```diff
@@            Coverage Diff            @@
##             main    #6206     +/-   ##
=========================================
+ Coverage   17.43%   88.38%   +70.94%
=========================================
  Files         323      323
  Lines       67582    67627      +45
=========================================
+ Hits        11784    59771   +47987
+ Misses      55798     7856   -47942
```

☔ View full report in Codecov by Sentry.
Nice! We could consider applying the same optimization in the non-categorical case, but I guess that would need to be done by having Datashader accept a list or dict of lines, so that two-stage aggregators like
Yes, I originally considered that but didn't want to handle parsing all the aggregators. `count` would certainly be an easy one, though. There is also a point where the cost of concatenating amortizes.
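Why `count` would be an easy one: it is additive, so aggregating each line separately and summing the per-line grids gives exactly the same result as aggregating the concatenated data in one pass. A minimal dependency-free sketch of that additivity (the `count_grid` helper is hypothetical, standing in for a Datashader `count` aggregation):

```python
from functools import reduce


def count_grid(points, width=4, height=4):
    """Count points falling into a width x height grid over [0, 1) x [0, 1)."""
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        grid[int(y * height)][int(x * width)] += 1
    return grid


def add_grids(a, b):
    """Cell-wise sum of two grids, mimicking summing two count aggregates."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


lines = [
    [(0.1, 0.2), (0.3, 0.4)],
    [(0.1, 0.2), (0.8, 0.9)],
]

# Aggregate each line separately, then sum the per-line grids...
summed = reduce(add_grids, (count_grid(line) for line in lines))

# ...which matches aggregating the concatenated data in a single pass.
concatenated = [pt for line in lines for pt in line]
assert summed == count_grid(concatenated)
```

A two-stage aggregator (e.g. a mean, which needs both a sum and a count) cannot be combined this simply, which is why the non-categorical case would need deeper Datashader support.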
I think I might be confused, but assuming the 'after' below is the proposed API, so far it saves a bit of time by not having to do a groupby, but the rasterizing appears to be slower compared to main.

Before:

```python
curve_dict = {key: hv.Curve(value, kdims=['time'], vdims=['value', 'parameter'], label=key)
              for key, value in df.groupby('parameter')}
overlay = hv.NdOverlay(curve_dict, kdims='parameter')
rasterize(overlay, line_width=1, aggregator=ds.by('parameter', ds.count())).opts(
    responsive=True, min_height=400, show_legend=True, cmap='glasbey',
    cnorm='eq_hist', colorbar=False, tools=['hover'])
```

After:

```python
curve_dict = {key: hv.Curve(value, kdims=['time'], vdims=['value', 'parameter'], label=key)
              for key, value in df_dict.items()}
overlay = hv.NdOverlay(curve_dict, kdims='parameter')
rasterize(overlay, line_width=1, aggregator=ds.by('parameter', ds.count())).opts(
    responsive=True, min_height=400, show_legend=True, cmap='glasbey',
    cnorm='eq_hist', colorbar=False, tools=['hover'])
```
I don't think there's any change to the API needed or proposed here. The expected performance profile of this change is that the first render should be quite a lot faster, while every subsequent render (e.g. on zoom events) will be slightly slower.
Got it. It would be nice if there were a convenient way to report the timing of the initial and subsequent paints; hopefully soon! While a dict of DataFrames is much cleaner, I wonder if the cost of manually inserting NaN separators into a DataFrame would be fully amortized over just a couple of interaction updates, when using
This approach may not be feasible for dozens of lines. At some point the resulting ImageStack (one image per line) grows beyond what is reasonably performant.
Currently, when we rasterize an `NdOverlay`, we always concatenate all the data (inserting NaNs between each layer if we are rasterizing lines). This concatenation is very expensive and largely pointless when we are generating a categorical aggregate that is coincident with the dimension of the NdOverlay, i.e. when each layer in the overlay is a distinct category.

This PR therefore adds an optimization for that specific codepath: instead of concatenating everything and rasterizing it in a single Datashader call, we keep each category's data as a distinct dataframe, make multiple calls to Datashader, and concatenate the resulting arrays along the categorical dimension after the fact. This produces the same result without paying the concatenation cost.