Large Scale Geospatial Benchmarks #1545
Replies: 15 comments 36 replies
-
I am not sure what would be considered in scope for geospatial workloads, but in my experience large-scale spatial joins are some of the most unruly computations. I see all sorts of blogs/benchmarks/demos etc. using the NYC taxi dataset and the NYC neighborhoods as an example of scaling to large geospatial datasets. I find this example interesting but not helpful in many situations. This particular problem gets much harder when both sides of your join are large, because you can no longer broadcast your smaller dataset to all of the workers and run several smaller joins on each worker.

I would propose a spatial join with relatively large datasets on both sides of the join. Bonus points for doing polygons (and not just points) on each side of the join. I would certainly suggest one of the open source building footprint datasets, as these tend to be pretty large datasets of polygons. I am sure there are tons of point datasets you could join with this to make an interesting benchmark, but I would try to find another polygon-based dataset to join with. Possible polygon-based datasets to join to a building footprint dataset:

If you want to make it even more challenging, modifying this to be the closely related but much harder nearest-neighbor join on two large datasets would be a true stress test.

I am not sure how common these problems are in the "wild", but I have come across them multiple times in my career. I know this probably doesn't get to your multi-terabyte threshold, but I have found that the scalability of these types of joins gets painful even before you get to terabytes. Also, if I am missing some trick that makes these computations "just work", I'd love to hear about it. Feel free to reach out if you want to chat about the benchmark or about my experiences with these types of computations.
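A minimal sketch of what such a benchmark could look like with dask-geopandas (the bucket paths and dataset layout are placeholders, not real datasets):

```python
import dask_geopandas

# Two hypothetical large polygon datasets stored as partitioned GeoParquet
buildings = dask_geopandas.read_parquet("s3://bucket/building-footprints/")
parcels = dask_geopandas.read_parquet("s3://bucket/land-parcels/")

# Polygon-polygon spatial join. With two large inputs, neither side fits in
# memory, so the usual trick of broadcasting the small side is unavailable;
# without spatial partitioning information this compares every pair of
# partitions, which is exactly what makes the problem a good stress test.
joined = dask_geopandas.sjoin(buildings, parcels, how="inner", predicate="intersects")

# A simple reduction to force the full computation
n_matches = joined.map_partitions(len).sum().compute()
```

The nearest-neighbor variant mentioned above would replace the `intersects` predicate with a nearest-neighbor search, which is typically much harder to partition.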
-
In my experience, every dataset has its own specific patterns, and attempting to use real-time data for benchmarking is often limited to the moment it's collected. Data evolves constantly, and there is no one-size-fits-all solution for benchmarking across different real-time datasets. Customers tend to have unique data characteristics that vary over time, making it challenging to rely on a single dataset for accurate benchmarking in the long term. This dynamic nature of data calls for customized, adaptive approaches for each situation rather than static benchmarks based on a snapshot of real-time data.
-
Thanks @bahaugen! Large-scale spatial joins are a fun problem (I actually got this working in an early version of dask-geopandas. It was neat!). I think I should ask the following two questions before we invest much time here:
I would be excited if the answer to both questions is "yes!" because I think this would be a fun thing to work on.
-
Thanks @upbram! I agree 100% with you. The devil is always in the details. That being said, lots of problems do look pretty similar, and assuming that we continue to build general-purpose software that isn't tailor-made to very specific problems, I have moderately high confidence that work done to optimize a broad set of benchmarks will result in software with improved performance on novel problems. This has certainly been our experience with other similar efforts, like the TPC-H effort with Dask DataFrames.
-
A while back I did an investigation of the low-level libraries often used in geospatial Python workflows. We were just starting to move to S3, and there wasn't a lot of understanding of load-time performance in the cloud at the time. While I have not run it in a long while, it might still be useful, if only for historical context: https://github.com/opendatacube/benchmark-rio-s3/blob/master/report.md

@mrocklin if there is interest from Coiled, I would love to chat on the topic of geospatial Dask and our experience of using Dask in the cloud for large raster-based geospatial workloads.
-
Thanks to all who have engaged here so far. I'm going to convert this issue to a GitHub Discussion so we can use threads to better handle the multiple conversations going on.
-
A test could be upscaling or downscaling a dataset at scale.
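One possible interpretation of this as a benchmark: reduce the resolution of a gridded dataset by block-averaging with xarray's `coarsen`. The store paths and coarsening factors below are placeholders:

```python
import xarray as xr

# Hypothetical high-resolution gridded dataset
ds = xr.open_zarr("s3://bucket/high-res-grid.zarr", chunks={})

# Reduce resolution by averaging 4x4 blocks of grid cells
coarse = ds.coarsen(latitude=4, longitude=4, boundary="trim").mean()
coarse.to_zarr("s3://bucket/low-res-grid.zarr", mode="w")
```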
-
I'd like to propose a benchmark based on global, high-frequency remote sensing observations, typically Landsat 8 or Sentinel-2 datasets. The big advantage is that you can find them on most cloud platforms and in several private datacenters. A typical large-scale benchmark using those would be to compute and reduce some indices (NDVI, NDWI, NDSI, etc.) over a large temporal scale, a large spatial scale, or both. This can typically be done to plot some statistics (take the mean, or just count) of the evolution of water/vegetation/snow over a given area in a given period. You could start with a simple case, like plotting the evolution of water on a single Sentinel-2 tile by year/season over the 8 years of observations from the mission. This means about 400 observations, using two bands per index, or about 100 GB of data.
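A rough sketch of that simple case, assuming the Sentinel-2 tile has already been stacked into a Zarr store with "green" and "nir" bands (the path, band names, and dimension names are placeholders):

```python
import xarray as xr

# Hypothetical pre-stacked Sentinel-2 time series for a single tile
ds = xr.open_zarr("s3://bucket/sentinel2-tile-timeseries.zarr", chunks={})

# NDWI highlights open water: (green - nir) / (green + nir)
ndwi = (ds["green"] - ds["nir"]) / (ds["green"] + ds["nir"])
water = ndwi > 0  # crude water mask

# Fraction of time each pixel is classified as water, per season,
# then an area-averaged time series over the whole tile
seasonal_water = water.resample(time="QS-DEC").mean()
timeseries = seasonal_water.mean(["x", "y"]).compute()
```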
-
Climatology

A canonical workflow for weather/climate data is calculating a climatology, i.e., the average weather for a particular time of year/day, computed independently at each location. We have code for calculating climatology using Apache Beam in WeatherBench2, which we have run on datasets up to at least the ~100 TB scale using Google Cloud Dataflow: https://github.com/google-research/weatherbench2/blob/47d72575cf5e99383a09bed19ba989b718d5fe30/scripts/compute_climatology.py

The source dataset is described here. There are also larger variants, up to ~6 PB in size, in ARCO-ERA5 if you're looking for more of a challenge :). This dataset has dozens of variables with dimensions

At a high level, the climatology calculation in WeatherBench2 looks like this:
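Not the WeatherBench2/Beam implementation itself, but a rough xarray/Dask sketch of the same idea (the store paths are placeholders):

```python
import xarray as xr

# Hypothetical hourly ERA5-like Zarr store
ds = xr.open_zarr("gs://bucket/era5-hourly.zarr", chunks={})

# Average over all years for each day of year, independently at each grid point
clim = ds.groupby("time.dayofyear").mean("time")
clim.to_zarr("gs://bucket/climatology.zarr", mode="w")
```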
-
Regridding

Another canonical workflow for geospatial data is regridding/re-projection. We also have code for this in WeatherBench2, which we've run up to the PB scale: https://github.com/google-research/weatherbench2/blob/47d72575cf5e99383a09bed19ba989b718d5fe30/scripts/regrid.py#L140

The source data here is similar to the data used for the climatology calculation, but the calculation itself is a little simpler:

In a more advanced version, we simultaneously repeat steps 2-4 for a variety of desired output resolutions. This allows us to load the data only once, which results in significant cost savings.
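As a rough sketch of regridding with plain xarray (real pipelines often use a conservative regridder such as xESMF rather than simple interpolation; paths and the target resolution are placeholders):

```python
import numpy as np
import xarray as xr

# Hypothetical 0.25-degree source dataset
ds = xr.open_zarr("gs://bucket/era5-0p25deg.zarr", chunks={})

# Target 1-degree global grid
target_lat = np.arange(-90.0, 90.1, 1.0)
target_lon = np.arange(0.0, 360.0, 1.0)

# Bilinear-style interpolation onto the coarser grid
regridded = ds.interp(latitude=target_lat, longitude=target_lon, method="linear")
regridded.to_zarr("gs://bucket/era5-1deg.zarr", mode="w")
```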
-
Forecast evaluation

Yet another canonical workflow for weather/climate data is forecast evaluation, e.g., this code in WeatherBench2. Weather forecasts and ground-truth datasets typically have three "time" dimensions:

Forecast datasets have dimensions

The workflow then looks like:

Alternatively, omit averaging over either spatial or temporal dimensions in step (4), and write the result to Zarr instead.
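A rough sketch of the idea (not the WeatherBench2 code): assume the forecasts have init_time and lead_time dimensions, the ground truth has a plain time dimension, and valid time = init time + lead time. Paths and variable names are placeholders, and every valid time is assumed to exist in the ground-truth dataset:

```python
import numpy as np
import xarray as xr

forecast = xr.open_zarr("gs://bucket/forecasts.zarr", chunks={})
truth = xr.open_zarr("gs://bucket/era5.zarr", chunks={})

# Valid time = initialization time + lead time (broadcasts to 2-D)
valid_time = forecast["init_time"] + forecast["lead_time"]

# Vectorized selection of the matching ground-truth fields
truth_at_valid = truth["temperature"].sel(time=valid_time)

# RMSE as a function of lead time
error = forecast["temperature"] - truth_at_valid
rmse = np.sqrt((error**2).mean(["init_time", "latitude", "longitude"]))
rmse.compute()
```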
-
Transformed Eulerian Mean

A standard atmospheric circulation diagnostic:

```python
import xarray as xr

ds = xr.open_zarr(
    "gs://weatherbench2/datasets/era5/1959-2023_01_10-full_37-1h-0p25deg-chunk-1.zarr",
    chunks={},
)
ds = ds[
    ["u_component_of_wind", "v_component_of_wind", "temperature", "vertical_velocity"]
].rename(
    {
        "u_component_of_wind": "U",
        "v_component_of_wind": "V",
        "temperature": "T",
        "vertical_velocity": "W",
    }
)

zonal_means = ds.mean("longitude")
anomaly = ds - zonal_means
anomaly["uv"] = anomaly.U * anomaly.V
anomaly["vt"] = anomaly.V * anomaly.T
anomaly["uw"] = anomaly.U * anomaly.W

temdiags = zonal_means.merge(anomaly[["uv", "vt", "uw"]].mean("longitude"))

# This is incredibly slow, takes a while for flox to construct the graph
# daily = temdiags.resample(time="D").mean()

# Option 2: rechunk to make it a blockwise problem
# we should do this automatically
from xarray.groupers import TimeResampler

daily = temdiags.chunk(time=TimeResampler("D")).resample(time="D").mean()
daily.to_zarr(SOMEWHERE)
```

There is a regridding step in the middle that I skipped, will update when I am able to confirm some details of that step.
-
Vectorized functions (e.g.,
-
I would love to see more discussion of memmap-based solutions (including approaches built on top of them). Any geospatial dataset that can be written as a single memmap on SSD will get an order-of-magnitude speedup for slicing and similar access patterns, since memmaps avoid various file-related kernel calls. See also our AAAI paper on global weather forecasting: https://ojs.aaai.org/index.php/AAAI/article/view/17749/17556
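A toy illustration of the idea with numpy.memmap (the path, shape, and dtype are placeholders):

```python
import numpy as np

shape = (10_000, 1800, 3600)  # (time, lat, lon), illustrative only (~260 GB of float32)

# Write the array once as a flat binary file on SSD
data = np.memmap("/ssd/dataset.dat", dtype="float32", mode="w+", shape=shape)
# ... fill `data` incrementally here ...
data.flush()

# Later: open read-only; slicing only touches the pages backing the request,
# avoiding per-read open/seek/read calls on many small files
view = np.memmap("/ssd/dataset.dat", dtype="float32", mode="r", shape=shape)
tile = view[500, 200:300, 1000:1100]
```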
-
Greetings from particle physics! We face problems related to this workflow in our datasets (which are similarly large, except we don't do joins yet, but rather have complex task graphs atop large amounts of data). I would love to collaborate with y'all towards scaling well to ~billions of tasks in complex task graphs!
-
People love the Xarray/Dask/... software stack for geospatial workloads, but only up to about the terabyte scale. Beyond that point this stack can struggle, requiring expertise to work well and frustrating users and developers alike.
To address this, we want to build a large-scale geospatial benchmark suite of end-to-end examples to ensure that these tools operate smoothly up to the 100-TB scale.
We want your help to build a catalog of large-scale, end-to-end, representative benchmarks. What does this help look like? Even a plain-language description of a common workflow is useful, for example:
"People often need to take netCDF files that are generated hourly, rechunk them to be arranged spatially, and then update that dataset every day".
This is a big ask, we know, but we hope that if a few people can contribute something meaningful then we'll be able to contribute code changes that accelerate those workflows (and others) considerably.
We’d welcome contributions as comments on this issue.