Reading multiple ICESat-2 ATL11 point cloud data nicely via Zarr #100
Comments
Putting down some notes on a potential
Just some things to play with once I get some free time 🙂 |
Awesome! Regarding the |
We're actually working on some benchmarks over in that repo (e.g. ICESAT-2HackWeek/h5cloud#9), and the |
Gathering some notes on how best to read multiple ICESat-2 ATL11 data files (basically a point cloud) in a user-friendly way, with metadata preserved!
TLDR: Be able to do `xr.open_mfdataset("ATL11_*.h5", engine="zarr", ...)`.

Inspired by the blog post "Cloud-Performant NetCDF4/HDF5 Reading with the Zarr Library". Zarr is an amazing project, and I really like the `.zmetadata` JSON file, which can be opened with a text editor and tells you stuff about the data. The dream would be to read HDF5 files in an out-of-core manner with Zarr-like speed/abilities (through the `.zmetadata` pointer).

Jupyter notebook demo can be found at https://github.com/rsignell-usgs/hurricane-ike-water-levels/blob/master/coawst_3ways.ipynb. See also discussion thread at zarr-developers/zarr-python#535 on "Using the Zarr library to read HDF5".
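To make the `.zmetadata` point concrete, here is a minimal sketch that reads such a file with nothing but the standard library. It assumes the Zarr v2 consolidated metadata layout (a top-level `"metadata"` mapping keyed by `<array>/.zarray` and `<array>/.zattrs`). The array name `h_corr`, its shape, chunks and attributes are invented for illustration, not taken from a real ATL11 store, and `summarize_zmetadata` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical consolidated metadata in the Zarr v2 ".zmetadata" layout.
# The array name, shape, chunks and attributes below are made up.
example = {
    "zarr_consolidated_format": 1,
    "metadata": {
        ".zgroup": {"zarr_format": 2},
        "h_corr/.zarray": {
            "shape": [1000, 8],
            "chunks": [500, 8],
            "dtype": "<f4",
            "compressor": None,
            "fill_value": "NaN",
            "filters": None,
            "order": "C",
            "zarr_format": 2,
        },
        "h_corr/.zattrs": {
            "units": "meters",
            "_ARRAY_DIMENSIONS": ["ref_pt", "cycle_number"],
        },
    },
}


def summarize_zmetadata(path):
    """Return {array_name: {"shape", "chunks", "attrs"}} from a .zmetadata file."""
    meta = json.loads(Path(path).read_text())["metadata"]
    summary = {}
    for key, value in meta.items():
        # Keys look like "h_corr/.zarray"; root-level keys have no array name.
        name, _, kind = key.rpartition("/")
        if kind == ".zarray":
            summary.setdefault(name, {}).update(
                shape=value["shape"], chunks=value["chunks"]
            )
        elif kind == ".zattrs" and name:
            summary.setdefault(name, {})["attrs"] = value
    return summary


# Write the example to a temporary ".zmetadata" file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    zmeta_path = Path(tmp) / ".zmetadata"
    zmeta_path.write_text(json.dumps(example, indent=2))
    summary = summarize_zmetadata(zmeta_path)

print(summary)
```

Because the file is plain JSON, the same information is visible in any text editor, which is the readability I'm after.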
Main hurdles to get through, all dependent on upstream. There are two 'separate' parts:

- `chunk_store` argument to use Zarr to read HDF5: wait for "Allow chunk_store argument when opening Zarr datasets" (pydata/xarray#3804)
- `xr.open_mfdataset`: wait for "Xarray open_mfdataset with engine Zarr" (pydata/xarray#4187) / "xarray.open_mzar: open multiple zarr files (in parallel)" (pydata/xarray#4003)
  - `intake.open_ndzarr` will break with the above ☝️: wait for "xarray.open_zarr to be deprecated" (intake/intake-xarray#70)

Current situation is that I do HDF5 -> Zarr conversion, and read from that. It would be nice to stick to the original HDF5 data source (though I might need to flatten the nested ICESat-2 ATL11 data structure). Note that I'm not necessarily after raw speed, I just prefer readability (i.e. having xarray's wonderful annotated metadata).
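As a rough illustration of what "flattening" the nested structure could mean, the sketch below turns a nested group tree into a single level of path-like variable names. It uses a plain nested dict as a stand-in for the HDF5 group hierarchy. The group and variable names (`pt1`, `ref_surf`, `h_corr`, `dem_h`) are only illustrative, and `flatten_groups` is a hypothetical helper rather than part of any library.

```python
def flatten_groups(tree, prefix=""):
    """Flatten a nested {group: {...}} mapping into {"group/sub/var": value}."""
    flat = {}
    for name, node in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(node, dict):
            # A dict stands in for an HDF5 group: recurse into it.
            flat.update(flatten_groups(node, path))
        else:
            # Anything else stands in for a dataset (variable).
            flat[path] = node
    return flat


# Stand-in for a nested ATL11-like layout; names and values are illustrative.
atl11_like = {
    "pt1": {"h_corr": [1.0, 2.0], "ref_surf": {"dem_h": [0.5, 0.6]}},
    "pt2": {"h_corr": [3.0, 4.0]},
}

flat = flatten_groups(atl11_like)
print(sorted(flat))
```

With real files the same walk would go over the HDF5 groups instead of a dict, and each flattened path would become one variable in the converted store.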
Other open Issues/Pull Requests:
Blog posts:
You can tell I had way too many tabs open on my browser 😆