Add a notebook on opening a remote zarr dataset #13
@SF-N Opening a remote Zarr dataset (here hosted on S3, but accessed via HTTP) now works! Do you have any suggestions for what else should go into this notebook, aside from the standard titles and short explanations?
Thanks a lot, this is great.
So far I'm a bit sceptical of earthkit's `from_source`, as it seems to rely mostly on downloading the datasets. Obviously, we cannot do this in the online lab: we only have a few MB of cache there, so any GB- or TB-sized dataset must be accessed remotely. There are good existing solutions, some format-specific (e.g. for NetCDF) and some more general (using fsspec with zarr or h5netcdf). I think we should support those general options first. Perhaps you could also suggest to the earthkit team that they add fsspec as another source (which would bring in additional sources such as remote HTTP or S3 for free), or that they accept an already-opened xarray dataset as a source.
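To illustrate why chunked remote access fits within a small cache where downloading does not, here is a minimal sketch (not code from this PR) that computes which chunks of a Zarr array a selection touches. It assumes the Zarr v2 default chunk-key layout (indices joined with `.`, e.g. `0.1.1`); the helper name `chunks_for_selection` and the example shape and chunk sizes are hypothetical. A reader such as zarr over fsspec only needs to fetch these chunks (plus small metadata files), rather than the whole dataset.

```python
import itertools


def chunks_for_selection(shape, chunks, selection):
    """Return the Zarr v2 chunk keys touched by a selection.

    Hypothetical helper for illustration. `selection` holds one half-open
    (start, stop) range per dimension; keys use the Zarr v2 default "."
    separator, e.g. "0.1.1".
    """
    per_dim = []
    for size, chunk, (start, stop) in zip(shape, chunks, selection):
        # Clip the requested range to the array bounds.
        start, stop = max(0, start), min(size, stop)
        if start >= stop:
            return []  # empty selection: no chunks need to be fetched
        # Chunk indices covering [start, stop) along this dimension.
        per_dim.append(range(start // chunk, (stop - 1) // chunk + 1))
    # Every combination of per-dimension chunk indices is one chunk key.
    return [".".join(map(str, idx)) for idx in itertools.product(*per_dim)]


if __name__ == "__main__":
    shape = (100, 720, 1440)  # hypothetical (time, lat, lon) array
    chunks = (10, 180, 360)   # 10 x 4 x 4 = 160 chunks in total
    # A single grid point touches exactly one chunk.
    print(chunks_for_selection(shape, chunks, [(5, 6), (200, 201), (400, 401)]))
    # One full time step touches 4 x 4 = 16 of the 160 chunks.
    print(len(chunks_for_selection(shape, chunks, [(0, 1), (0, 720), (0, 1440)])))
```

The amount of data transferred therefore scales with the selection, not with the size of the stored dataset.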
I've also made progress on loading the dataset with
@SF-N I've also experimented with supporting GRIB at https://gist.github.com/juntyr/14b3f80c58a39624641f9021450e5f28, but it seems like I've opened ecmwf/earthkit-data#467 for this.
I've given up a bit on native support for NetCDF through However, I've managed to get everything to work using https://gist.github.com/juntyr/23c2df3b3e20ac351591b99d70e19ca8 |
I've also tried to get it to work with GRIB files, but unfortunately merging different messages doesn't fully work yet in @SF-N perhaps you could reach out to the My not-fully working example (the
The combined and documented notebook is now at https://gist.github.com/juntyr/a89175eb60a80150dc17bf553cd2e2d7. I'll wait for
After many weeks of experiments in the background and a flurry of ideas and patches (climet-eu/lab@7846b57...086f285), `aiohttp` and `fsspec` now work sufficiently well in the lab to support opening remote zarr datasets :D

This PR adds a short notebook showcasing this functionality with a ~32 TB example dataset.
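Opening a remote Zarr store typically starts from small JSON metadata rather than from the data itself. As a rough sketch (not the notebook's code), assuming a Zarr v2 store with consolidated metadata (a `.zmetadata` file) and simple numeric dtypes such as `<f4`, the following parses that file's contents to report each array's uncompressed size and chunk count. The helper name `summarize_consolidated` and the sample metadata are hypothetical; in practice the text would be fetched over HTTP (e.g. via fsspec) rather than built from a literal.

```python
import json
import math


def summarize_consolidated(zmetadata_text):
    """Summarize arrays listed in a Zarr v2 consolidated `.zmetadata` file.

    Hypothetical helper for illustration. Only arrays stored inside a group
    ("<name>/.zarray") are considered, and the dtype is assumed to be a
    simple numeric type string such as "<f4" or "<i8".
    """
    metadata = json.loads(zmetadata_text)["metadata"]
    summary = {}
    for key, array_meta in metadata.items():
        if not key.endswith("/.zarray"):
            continue  # skip group metadata and attributes
        name = key[: -len("/.zarray")]
        shape, chunks = array_meta["shape"], array_meta["chunks"]
        itemsize = int(array_meta["dtype"][2:])  # "<f4" -> 4 bytes
        summary[name] = {
            "nbytes": math.prod(shape) * itemsize,
            "chunk_nbytes": math.prod(chunks) * itemsize,
            # Integer ceiling division: chunks needed along each dimension.
            "n_chunks": math.prod((s + c - 1) // c for s, c in zip(shape, chunks)),
        }
    return summary


if __name__ == "__main__":
    example = json.dumps({
        "zarr_consolidated_format": 1,
        "metadata": {
            ".zgroup": {"zarr_format": 2},
            "t2m/.zarray": {
                "shape": [100, 720, 1440],
                "chunks": [10, 180, 360],
                "dtype": "<f4",
                "compressor": None,
                "fill_value": None,
                "filters": None,
                "order": "C",
                "zarr_format": 2,
            },
        },
    })
    print(summarize_consolidated(example))
```

Because this one metadata file describes every array, even a multi-terabyte store can be inspected without transferring the bulk of its chunk data; chunks are then fetched on demand.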