to_dataset_dict() - raises exception when trying to load Zarr files from object store #426
Replies: 6 comments
-
  File "/apps/jasmin/jaspy/miniconda_envs/jaspy3.7/m3-4.6.14/envs/jaspy3.7-m3-4.6.14-r20200606/lib/python3.7/site-packages/zarr/storage.py", line 2501, in __init__
    meta = json_loads(store[metadata_key])
  File "/apps/jasmin/jaspy/miniconda_envs/jaspy3.7/m3-4.6.14/envs/jaspy3.7-m3-4.6.14-r20200606/lib/python3.7/site-packages/zarr/storage.py", line 791, in __getitem__
    raise KeyError(key)
KeyError: '.zmetadata'

Can you confirm that the following works for this particular zarr store:

import xarray as xr
import fsspec
zarr_path = 'http://cmip6-zarr-o.s3.jc.rl.ac.uk/CMIP6.DCPP.IPSL.IPSL-CM6A-LR/dcppC-ipv-NexTrop-pos.r1i1p1f1.Amon.tauu.gr.v20190110.zarr'
fsmap = fsspec.get_mapper(zarr_path)
ds = xr.open_zarr(fsmap, consolidated=True, use_cftime=True)

Also, does this work?

col.to_dataset_dict(zarr_kwargs={"consolidated": False, "use_cftime": True})
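Since the traceback ends in KeyError: '.zmetadata', one rough way to narrow this down is to ask the store mapping directly which Zarr v2 metadata keys it exposes. The sketch below is only an illustration: describe_zarr_metadata is a hypothetical helper (not part of intake-esm, xarray, or zarr), and the plain dict stands in for the mapping returned by fsspec.get_mapper(zarr_path), which supports the same `in` check.

```python
from collections.abc import Mapping


def describe_zarr_metadata(store: Mapping) -> dict:
    """Report which root-level Zarr v2 metadata keys are present in a store.

    Hypothetical helper for illustration; `store` can be any mapping,
    for example the object returned by fsspec.get_mapper(zarr_path).
    """
    keys = (".zmetadata", ".zgroup", ".zattrs")
    return {key: key in store for key in keys}


# Assumption: a plain dict stands in for the real object-store mapping.
# This fake store has group metadata but no consolidated metadata.
fake_store = {".zgroup": b'{"zarr_format": 2}', ".zattrs": b"{}"}

report = describe_zarr_metadata(fake_store)

# Only ask for consolidated metadata when the '.zmetadata' key is visible.
consolidated = report[".zmetadata"]
print(report, consolidated)
```

If '.zmetadata' is reported as missing for the path that intake-esm actually passes to zarr, that would suggest comparing that path with the zarr_path that opens fine by hand, or trying consolidated=False as asked above.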
-
@andersy005 Thanks for your response. Here are some responses:
yields:
It returns that it cannot find
One of my colleagues mentioned that he is using some settings for accessing our Zarr store. Might any of these be required in the
-
@andersy005: following on from my last point, I get the same result if I try:
-
@agstephens, are these settings being used when you run:

fsmap = fsspec.get_mapper(zarr_path)
ds = xr.open_zarr(fsmap, consolidated=True, use_cftime=True)

or are xarray/fsspec/zarr able to read your zarr store without any additional settings?
-
Hi @andersy005. No, those settings are not used when I run
No extra info is needed.
-
Thank you for the clarification. I tried creating a minimal example to see if there's any bug in intake-esm, but I'm having trouble reproducing the issue:

In [1]: import intake
In [2]: col = intake.open_esm_datastore("test.json")
In [3]: col.df
Out[3]:
path variable long_name
0 http://localhost:8000/test.zarr air 4xDaily Air temperature at sigma level 995
1 http://localhost:8000/test.zarr Tair test
In [4]: col.to_dataset_dict()
--> The keys in the returned dictionary of datasets are constructed as follows:
'path.variable.long_name'
Out[4]: ██████████████████████████████████████████████████████████████████████████████████████████████| 100.00% [2/2 00:00<00:00]
{'http://localhost:8000/test.zarr.air.4xDaily Air temperature at sigma level 995': <xarray.Dataset>
Dimensions: (lat: 25, lon: 53, time: 2920)
Coordinates:
* lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
* lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
* time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
air (time, lat, lon) float32 dask.array<chunksize=(730, 13, 27), meta=np.ndarray>
Attributes:
Conventions: COARDS
description: Data is from NMC initialized reanalysis\n(4x/day...
platform: Model
references: http://www.esrl.noaa.gov/psd/data/gridded/data.n...
title: 4x daily NMC reanalysis (1948)
intake_esm_varname: None
intake_esm_dataset_key: http://localhost:8000/test.zarr.air.4xDaily Air ...,
'http://localhost:8000/test.zarr.Tair.test': <xarray.Dataset>
Dimensions: (lat: 25, lon: 53, time: 2920)
Coordinates:
* lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
* lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
* time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
air (time, lat, lon) float32 dask.array<chunksize=(730, 13, 27), meta=np.ndarray>
Attributes:
Conventions: COARDS
description: Data is from NMC initialized reanalysis\n(4x/day...
platform: Model
references: http://www.esrl.noaa.gov/psd/data/gridded/data.n...
title: 4x daily NMC reanalysis (1948)
intake_esm_varname: None
intake_esm_dataset_key: http://localhost:8000/test.zarr.Tair.test}

As you can see, everything is working. What's the output of:

import intake_esm
intake_esm.show_versions()

I'm running out of ideas about how to diagnose the issue :)
-
I have just built an intake-esm catalog for our CMIP6 Zarr holdings in our own in-house object store on JASMIN (in the UK).
intake-esm is working well in terms of searching/filtering, but when I call
col.to_dataset_dict(...)
I get an error. The only difference I can see between my own catalog and other online examples is that:
I have tested that the Zarr files can be loaded properly with xarray and zarr (without using intake), and this works fine:
An example zarr_path is:
http://cmip6-zarr-o.s3.jc.rl.ac.uk/CMIP6.DCPP.IPSL.IPSL-CM6A-LR/dcppC-ipv-pos.r1i1p1f1.Amon.rsus.gr.v20190110.zarr
In my catalog, I specify the asset as:
However, when I use
col.to_dataset_dict(zarr_kwargs={"consolidated": True, "use_cftime": True})
I get the following error:
Do you have any tips on what I might need to do to get this working? Thanks