Optimize rasterization of categorical NdOverlays #6206
base: main
Codecov Report — Attention: Patch coverage is

```diff
@@            Coverage Diff            @@
##             main    #6206     +/-   ##
=========================================
+ Coverage   17.43%   88.38%   +70.94%
=========================================
  Files         323      323
  Lines       67582    67627      +45
=========================================
+ Hits        11784    59771   +47987
+ Misses      55798     7856   -47942
```

☔ View full report in Codecov by Sentry.
Nice! We could consider applying the same optimization in the non-categorical case, but I guess that would need to be done by having Datashader accept a list or dict of lines, so that two-stage aggregators like
Yes, I originally considered that but didn't want to handle parsing all the aggregators. `count` would certainly be an easy one, though. There is also a point where the cost of concatenating amortizes.
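Why `count` would be an easy one: it is additive, so aggregating each line separately and summing the per-line grids gives exactly the same result as aggregating the concatenated data in one pass. A minimal dependency-free sketch of that additivity (the `count_grid` helper is hypothetical, standing in for a Datashader `count` aggregation):

```python
from functools import reduce


def count_grid(points, width=4, height=4):
    """Count points falling into a width x height grid over [0, 1) x [0, 1)."""
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        grid[int(y * height)][int(x * width)] += 1
    return grid


def add_grids(a, b):
    """Cell-wise sum of two grids, mimicking summing two count aggregates."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


lines = [
    [(0.1, 0.2), (0.3, 0.4)],
    [(0.1, 0.2), (0.8, 0.9)],
]

# Aggregate each line separately, then sum the per-line grids...
summed = reduce(add_grids, (count_grid(line) for line in lines))

# ...which matches aggregating the concatenated data in a single pass.
concatenated = [pt for line in lines for pt in line]
assert summed == count_grid(concatenated)
```

A two-stage aggregator (e.g. a mean, which needs both a sum and a count) cannot be combined this simply, which is why the non-categorical case would need deeper Datashader support.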
I think I might be confused, but assuming the 'after' below is the proposed API, so far it saves a bit of time by not having to do a groupby, but the rasterizing appears to be slower compared to main.

Before:

```python
curve_dict = {key: hv.Curve(value, kdims=['time'], vdims=['value', 'parameter'], label=key)
              for key, value in df.groupby('parameter')}
overlay = hv.NdOverlay(curve_dict, kdims='parameter')
rasterize(overlay, line_width=1, aggregator=ds.by('parameter', ds.count())).opts(
    responsive=True, min_height=400, show_legend=True, cmap='glasbey',
    cnorm='eq_hist', colorbar=False, tools=['hover'])
```

After:

```python
curve_dict = {key: hv.Curve(value, kdims=['time'], vdims=['value', 'parameter'], label=key)
              for key, value in df_dict.items()}
overlay = hv.NdOverlay(curve_dict, kdims='parameter')
rasterize(overlay, line_width=1, aggregator=ds.by('parameter', ds.count())).opts(
    responsive=True, min_height=400, show_legend=True, cmap='glasbey',
    cnorm='eq_hist', colorbar=False, tools=['hover'])
```
I don't think there's any change to the API needed or proposed here. The expected performance profile of this change is that the first render should be quite a lot faster, while every subsequent render (e.g. on zoom events) will be slightly slower.
Got it. It would be nice if there were a convenient way to report the timing of the initial and subsequent paints; hopefully soon! While a dict of DataFrames is much cleaner, I wonder if the cost of manually inserting NaN separators into a DataFrame would be fully amortized over just a couple of interaction updates, when using
This approach may not be feasible for dozens of lines. At some point the resulting ImageStack (one image per line) grows beyond what is reasonably performant.
Currently, when we rasterize an `NdOverlay`, we always concatenate all the data (inserting NaNs between each layer if we are rasterizing lines). This concatenation is very expensive and largely pointless when we are generating a categorical aggregate that is coincident with the dimension of the NdOverlay, i.e. when each layer in the overlay is a distinct category.

This PR therefore adds an optimization for that specific codepath: instead of concatenating everything and rasterizing it in a single Datashader call, we keep each category's data as a distinct dataframe, make multiple calls to Datashader, and concatenate the resulting arrays along the categorical dimension after the fact. This produces the same result without paying the concatenation cost.