Replies: 10 comments
-
It should be relatively easy to return the nested dictionary. A couple other ideas include enabling an
-
👍
-
More thoughts: how would this work? What would the keys be? Would it just group by all columns?
-
It would return a dataset for each row in the database. We could form keys from the groupby applied to all columns, but maybe it would be more accessible if the key was just the index. What do you think?
-
What would intake-esm currently do if there were no
-
Answer: it raises an error. That is NOT the right behavior. Aggregation should be 100% optional in these catalogs.
-
Agreed, that's a bug, but easy to fix. Without it, we can use groups = self.df.groupby(self.df.columns.tolist()), and the returned keys will be of the same format. We can trigger the same behavior if
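To make the group-by-all-columns idea concrete, here is a minimal sketch on a small made-up catalog. The rows, the placeholder paths, and the choice to join each group's values with "." are assumptions for illustration, not intake-esm's actual implementation.

```python
import pandas as pd

# Hypothetical two-row catalog; paths are placeholders, not real stores.
df = pd.DataFrame(
    {
        "source_id": ["CanESM5", "IPSL-CM6A-LR"],
        "experiment_id": ["historical", "historical"],
        "variable_id": ["o2", "o2"],
        "zstore": ["gs://bucket/a/", "gs://bucket/b/"],
    }
)

# Group by every column: each unique row becomes its own group.
groups = df.groupby(df.columns.tolist())

# Each group name is a tuple of column values; join it into one string key
# (the "." separator is an assumption made for this sketch).
keys = [".".join(str(value) for value in name) for name, _ in groups]
print(keys)
```

One caveat: by default pandas drops groups whose key contains NaN, so a column with missing values would need extra handling (for example `dropna=False`, available in pandas 1.1 and later).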
-
With #164 the following works:

import intake
col_file = "https://raw.githubusercontent.com/NCAR/intake-esm-datastore/master/catalogs/pangeo-cmip6.json"
col = intake.open_esm_datastore(col_file)
query = dict(experiment_id='historical', table_id='Oyr',
             variable_id='o2', grid_label='gn', member_id='r1i1p1f1')
cat = col.search(**query)
# Disable aggregations
dsets_pp = cat.to_dataset_dict(aggregate=False)
print(dsets_pp.keys())

Output:

--> The keys in the returned dictionary of datasets are constructed as follows:
	'zstore'
--> There will be 2 group(s)
dict_keys(['gs://cmip6/CMIP/CCCma/CanESM5/historical/r1i1p1f1/Oyr/o2/gn/', 'gs://cmip6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Oyr/o2/gn/'])
-
@andersy005 - nice! However, I would prefer for the keys to be the groups, not the paths, as @matt-long suggested. Are the keys the datasets themselves?
-
Assuming that we have a row with the following attributes:

activity_id       AerChemMIP
institution_id    BCC
source_id         BCC-ESM1
experiment_id     ssp370
member_id         r1i1p1f1
table_id          Amon
variable_id       pr
grid_label        gn
zstore            gs://cmip6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1...
dcpp_init_year    NaN
Name: 0, dtype: object

Should we have something along these lines?

{ 'AerChemMIP.BCC.BCC-ESM1.ssp370.r1i1p1f1.Amon.pr.gn.NaN' :
<xarray.Dataset>
Dimensions:    (bnds: 2, lat: 64, lon: 128, time: 492)
Coordinates:
  * lat        (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 82.31 85.1 87.86
    lat_bnds   (lat, bnds) float64 dask.array<chunksize=(64, 2), meta=np.ndarray>
  * lon        (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
    lon_bnds   (lon, bnds) float64 dask.array<chunksize=(128, 2), meta=np.ndarray>
  * time       (time) object 2015-01-16 12:00:00 ... 2055-12-16 12:00:00
    time_bnds  (time, bnds) object dask.array<chunksize=(492, 2), meta=np.ndarray>
Dimensions without coordinates: bnds
Data variables:
    pr         (time, lat, lon) float32 dask.array<chunksize=(492, 64, 128), meta=np.ndarray>
Attributes:
    Conventions:       CF-1.7 CMIP-6.2
    activity_id:       AerChemMIP
    further_info_url:  https://furtherinfo.es-doc.org/CMIP6.BCC.BCC-ESM1...
    grid:              T42
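For reference, a key of that shape could be built from a single row roughly as follows. This is only a sketch: `make_key` is a hypothetical helper, not part of intake-esm, and skipping the `zstore` column and writing missing values as "NaN" are assumptions chosen to match the example key above.

```python
import math

# The row from above as a plain dict; the zstore value is left truncated
# exactly as it was displayed.
row = {
    "activity_id": "AerChemMIP",
    "institution_id": "BCC",
    "source_id": "BCC-ESM1",
    "experiment_id": "ssp370",
    "member_id": "r1i1p1f1",
    "table_id": "Amon",
    "variable_id": "pr",
    "grid_label": "gn",
    "zstore": "gs://cmip6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1...",
    "dcpp_init_year": float("nan"),
}


def make_key(row, path_column="zstore", sep="."):
    """Join all attributes except the path column into one string key.

    Hypothetical helper for illustration only.
    """
    parts = []
    for column, value in row.items():
        if column == path_column:
            continue  # the path is the payload, not part of the key
        if isinstance(value, float) and math.isnan(value):
            parts.append("NaN")  # spell missing values consistently
        else:
            parts.append(str(value))
    return sep.join(parts)


key = make_key(row)
print(key)
```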
-
I'm sitting with @naomi-henderson, and we are discussing how we might make intake-esm more transparent about what it's doing under the hood.
It would be nice if there were a mode where, rather than running all the merge operations, intake returns a nested dictionary similar to the one I showed in my recursive merge demo.
This would allow users to manually descend into the individual datasets and examine them one at a time, optionally applying their own merge logic.
This should be relatively easy, since intake-esm probably has an internal data structure like this already.
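As a rough sketch of what descending into such a nested dictionary could look like: the nesting levels below (model, experiment, member) and the string placeholders standing in for datasets are assumptions, since the actual structure intake-esm would return is not settled here.

```python
def iter_leaves(tree, path=()):
    """Yield (path, leaf) pairs from an arbitrarily nested dictionary."""
    for key, value in tree.items():
        if isinstance(value, dict):
            # Descend one level, extending the path with this key.
            yield from iter_leaves(value, path + (key,))
        else:
            yield path + (key,), value


# Hypothetical nesting; the strings stand in for individual datasets.
nested = {
    "CanESM5": {"historical": {"r1i1p1f1": "<dataset A>"}},
    "IPSL-CM6A-LR": {"historical": {"r1i1p1f1": "<dataset B>"}},
}

# Flatten to {path tuple: dataset} so each entry can be examined on its own.
leaves = dict(iter_leaves(nested))
for path, dataset in leaves.items():
    print(path, dataset)
```

A user could inspect each leaf individually and then apply their own merge logic to whichever subset they choose.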