Slicing multiple data layers or band channels with different spatial resolutions #93
Comments
I just wanted to throw out Satpy's Basically Satpy uses the pyresample library's This logic makes sense for most satellite-based sensors we work with (GOES-R ABI, VIIRS, etc.). Other sensors like MODIS don't follow this nice even alignment of the pixels, but that is typically ignored for most of the imagery work we do.
This is not an area I have all that much experience in but after carefully reading this issue, I have a straw man proposal to offer. Before I get to my proposal, I'll offer a few bits of perspective that may be useful to ground the discussion here.
My thought is to build a new batch generator class that works with DataTree objects but uses one part of the datatree as its reference for batching the others. I think this could be quite similar to the concept @djhoese introduced with Satpy's
Imagine if you had a class like this:
    class BboxBatchGenerator:
        def __init__(self, dt: DataTree, ref: str, input_dims, ...):
            ...
        def __iter__(self) -> Iterator[DataTree]:
            ...

    gen = BboxBatchGenerator(dt, ref='10m')
Under the hood, you may imagine using the existing Xbatcher generator for the reference dataset (e.g.
Underlying my proposal here is that I think we should be open to developing additional generators in Xbatcher. Today we have just
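To make the reference-driven idea a bit more concrete, here is a rough sketch rather than a proposed implementation. It uses a plain dict of xarray.DataArray layers instead of a DataTree, and the names bbox_batches and make_layer are hypothetical (neither exists in xbatcher). It assumes square pixels, ascending x/y coordinates labelled at pixel centres, and grids that share an origin.

```python
from typing import Dict, Iterator

import numpy as np
import xarray as xr


def make_layer(res: float, extent: float = 80.0) -> xr.DataArray:
    """Hypothetical helper: a square grid with coordinates at pixel centres."""
    n = int(extent / res)
    centres = res / 2 + res * np.arange(n)
    return xr.DataArray(
        np.zeros((n, n)), coords={"y": centres, "x": centres}, dims=("y", "x")
    )


def bbox_batches(
    layers: Dict[str, xr.DataArray], ref: str, size: int
) -> Iterator[Dict[str, xr.DataArray]]:
    """Tile the reference layer by index, then cut every layer by bounding box."""
    ref_da = layers[ref]
    x_centres = ref_da["x"].values
    half = (x_centres[1] - x_centres[0]) / 2  # half a reference pixel
    for iy in range(0, ref_da.sizes["y"] - size + 1, size):
        for ix in range(0, ref_da.sizes["x"] - size + 1, size):
            # Integer indexing (.isel) on the reference layer only
            chip = ref_da.isel(y=slice(iy, iy + size), x=slice(ix, ix + size))
            # Outer pixel edges of the reference chip, in coordinate units
            xs = slice(float(chip["x"].values[0]) - half,
                       float(chip["x"].values[-1]) + half)
            ys = slice(float(chip["y"].values[0]) - half,
                       float(chip["y"].values[-1]) + half)
            # Label-based selection (.sel) on every layer, whatever its resolution
            yield {name: da.sel(x=xs, y=ys) for name, da in layers.items()}


layers = {"10m": make_layer(10.0), "20m": make_layer(20.0)}
batches = list(bbox_batches(layers, ref="10m", size=4))
```

Each batch here holds a 4x4 chip from the 10m layer and the matching 2x2 chip from the 20m layer. Label-based slicing is inclusive at both ends, so a coarser pixel whose centre falls exactly on a chip edge would land in two neighbouring batches; a real generator would need to decide how to handle that.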
Just noting that there's a PR on upstreaming
The From a developer/maintainer perspective, should this new
Thanks @maxrjones for pointing me to that satpy <-> datatree integration thread at pytroll/satpy#2352 started by @djhoese. If that design proposal goes ahead, it sounds like we might be able to let I'll have some time this week to experiment on how xbatcher might work with datatree, will open a PR once I get something working.
@weiji14 have you arrived at some functional approach yet? Did datatree help you here? For my use case, I am willing to resample all datasets to a resolution with a common divisor so that we can implement simpler/faster index logic with .isel instead of .sel (which would be much slower afaik).
Since datatree will be upstreamed to
Awesome, I'll take a look! Supporting multiple resolutions in both space and time with this datatree approach is likely the route we will take. However, I have found the for loop/generator approach in xbatcher a bit slow, and previously I relied on NumPy's as_strided using a map_block approach. Chunks are unlikely to align spatially between datasets of different resolutions, though, so I'll have to think that through and implement a map_slices on coords approach.
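For anyone unfamiliar with the strided-window idea, here is a minimal sketch on a single in-memory array. It uses numpy.lib.stride_tricks.sliding_window_view, which is a safer wrapper over the same stride tricks as as_strided, so it illustrates the technique rather than the exact code described above. The function name extract_patches is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def extract_patches(arr: np.ndarray, size: int, stride: int) -> np.ndarray:
    """Return all (size, size) patches of a 2D array, stepping by `stride`."""
    # View of shape (H - size + 1, W - size + 1, size, size); no data is copied
    windows = sliding_window_view(arr, (size, size))
    # Keep every `stride`-th window along both axes, then flatten the patch grid
    # (the reshape copies the selected patches into a contiguous array)
    return windows[::stride, ::stride].reshape(-1, size, size)


arr = np.arange(36).reshape(6, 6)
patches = extract_patches(arr, size=2, stride=2)
```

With a 6x6 input, size=2 and stride=2, this gives nine non-overlapping 2x2 patches without a Python loop over patch positions. It does not address the chunk-alignment problem between resolutions mentioned above.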
Is your feature request related to a problem?
To enable chipping/batching datasets with different spatial resolutions, each dataset (either an xarray.DataArray or xarray.Dataset) currently needs to be sliced separately in xbatcher v0.1.0. The key limitation is that xbatcher assumes every xarray.DataArray 'layer' to have the same resolution, and xbatcher.BatchGenerator would use xarray's .isel method to index and slice along the specified dimensions.
xbatcher/xbatcher/generators.py
Lines 41 to 43 in 72ce00f
However, this is not always the case, for example:
Describe the solution you'd like
Ideally, there would be:
datatree might be able to handle, i.e. have each data layer with a different resolution be on a separate node of the datatree.
xbatcher would then need to have a way of slicing these multi-resolution datasets. Maybe DataTree.isel could work?
Describe alternatives you've considered
Keep xbatcher focused on xarray.DataArray and xarray.Dataset only (and not bring in xarray.DataTree). Users would then need to implement their own ad-hoc way of slicing multi-resolution datasets.
Additional context
There was some discussion before at microsoft/torchgeo#279 about sampling in pixel/image units or coordinate reference system (CRS) units. If working with multi-resolution datasets though, sampling in pixel/image units would require some math (e.g. 20 pixels for a 500m resolution grid would be 10 pixels for a 1000m resolution grid). The CRS-based indexing method, however, would require something like https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.raster_dataset.RasterDataset.clip_box.
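The pixel-unit math could look something like the sketch below. The function scale_pixel_window is hypothetical, and it assumes both grids share the same origin and that resolutions are given as whole metres, so a window only converts cleanly when its edges fall on pixel boundaries of the target grid.

```python
from typing import Tuple


def scale_pixel_window(
    start: int, stop: int, src_res: int, dst_res: int
) -> Tuple[int, int]:
    """Convert a half-open pixel range [start, stop) from one grid to another.

    Resolutions are in metres; both grids are assumed to start at the same origin.
    """
    # Window edges expressed as distance from the shared origin
    start_m = start * src_res
    stop_m = stop * src_res
    # Refuse windows whose edges would cut through a pixel of the target grid
    if start_m % dst_res or stop_m % dst_res:
        raise ValueError("window edges do not align with the target grid")
    return start_m // dst_res, stop_m // dst_res


# 20 pixels on the 500m grid correspond to 10 pixels on the 1000m grid
window_1000m = scale_pixel_window(0, 20, src_res=500, dst_res=1000)
```

A window of 3 pixels at 500m (1500m wide) raises a ValueError here, since it cannot be expressed as a whole number of 1000m pixels. That case is where CRS-based indexing, or resampling to a common grid, becomes necessary.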
Other references: