Just a remark: to convert GRIB to xarray, you first need to scan the whole file/files (all the messages) for metadata. So this is very different from the NetCDF and Zarr use cases, where this information is available "instantly".
I didn't know that; is this related to GRIB's format? In that case, would the index files help? If so, would it be possible to check whether a pre-generated index file is available as well (e.g. a file-like object passed alongside, or a relative fsspec URI) and to use it to skip the initial full-file scan?
In any case, it would be important that once the metadata has been extracted, the actual data is not kept in memory until requested by the user (so slices would still be loaded lazily).
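To make the index-file idea concrete, here is a minimal sketch of how a sidecar index could be used to skip the full-file scan. It assumes a wgrib2/NOAA-style `.idx` format, where the second colon-separated field of each line is the starting byte offset of a message; the sample lines and offsets are made up for illustration:

```python
# Hypothetical sketch: parse a NOAA-style .idx sidecar, whose lines record
# each message's starting byte offset, so a reader can issue one ranged
# read per message instead of scanning the whole GRIB file.
idx_text = """\
1:0:d=2023010100:TMP:500 mb:anl:
2:1078:d=2023010100:UGRD:500 mb:anl:
3:2112:d=2023010100:VGRD:500 mb:anl:
"""

offsets = [int(line.split(":")[1]) for line in idx_text.splitlines()]
# Byte range of each message, bounded by the next message's offset
# (the final message would be bounded by the file size).
ranges = list(zip(offsets, offsets[1:]))
print(ranges)   # [(0, 1078), (1078, 2112)]
```

Each `(start, end)` pair could then be turned into a single `seek` + `read` (or an HTTP range request via fsspec) for just the messages the user actually slices.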
Is your feature request related to a problem? Please describe.
I haven't yet found a good way to open large (exceeding available RAM), remote (not on my local file system) GRIB files in xarray.
Describe the solution you'd like
A new source would be added, e.g. one that would be similar to the "file" source in making use of random access, but using Python's file-like interface (so perhaps "file-like" would be another name), thus adding support for fsspec's numerous backends to earthkit for free.
This new source should also support loading large GRIB datasets without reading the entire file. Ideally, loading a GRIB file into xarray would read as little data as possible and defer any data reads until the user specifically asks for the data (similar to how NetCDF and Zarr support lazy loading).
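The file-like contract such a source would rely on can be sketched with the standard library alone. `io.BytesIO` is used here so the example runs offline, but fsspec's remote backends expose the same `read`/`seek`/`tell` interface, translating `seek` + `read` into HTTP range requests under the hood (the byte layout mimics a GRIB message: `GRIB` magic at the start, `7777` end marker at the end):

```python
import io

# Any object implementing the file-like contract (read/seek/tell) could
# back the proposed source; fsspec file objects provide exactly this.
f = io.BytesIO(b"GRIB" + b"\x00" * 96 + b"7777")

magic = f.read(4)            # read only the 4-byte magic number
f.seek(-4, io.SEEK_END)      # jump to the end without reading the middle
end_marker = f.read(4)
print(magic, end_marker)     # b'GRIB' b'7777'
```

This is exactly the access pattern that makes lazy loading cheap: the reader touches only the bytes it needs, wherever they are in the file.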
Describe alternatives you've considered
The approach inspired by ecmwf/cfgrib#326 (comment) provides the closest current solution, but it treats the file pessimistically as only a stream and not as a random-access file, which results in excessive reads.
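A small stdlib sketch of why the stream-only view is wasteful: to reach a message at a known offset, a pure stream must consume every preceding byte, while a random-access file can seek straight to it (offsets and byte contents here are made up for illustration):

```python
import io

data = b"\x00" * 1000 + b"GRIB" + b"\x00" * 96 + b"7777"

# Stream-only access (the pessimistic view): everything before the
# message of interest must be read just to skip past it.
stream = io.BytesIO(data)
skipped = stream.read(1000)          # 1000 bytes read purely to skip ahead
message_via_stream = stream.read()

# Random access: seek directly to the known offset, read only the message.
f = io.BytesIO(data)
f.seek(1000)
message_via_seek = f.read()

assert message_via_stream == message_via_seek
print(len(skipped))   # 1000
```

Over HTTP or object storage, those skipped bytes become real network transfer, which is where the excessive reads come from.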
Additional context
I am working in an extremely memory-constrained environment and would like to support opening remote GRIB files (in addition to NetCDF and Zarr datasets which already work).
Organisation
University of Helsinki, ESiWACE3 project