intake-esm or intake #1

Open
koldunovn opened this issue Oct 12, 2023 · 4 comments

Comments

@koldunovn
Member

Hi @jonseddon (ping @wachsylon)

The catalog you created for JASMIN is an intake-ESM one. Are you attached to this format, or would you consider changing to the format we use in nextGEMS, which I find more intuitive: https://github.com/nextGEMS/catalog

@wachsylon
Contributor

I think both have their strengths and complement each other. An intake-esm catalog is better suited to finding and browsing data by metadata, while access through plain intake is faster. As far as I know, searching with plain intake (no esm) is like a Google search, but some users may not know how to search the catalog efficiently. For these users, intake-esm is better.

I think providing both is also a good idea.
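To make the metadata-search point concrete: an intake-esm catalog is essentially a table with one row per file and one column per metadata facet, and searching it (`esm_datastore.search(...)` in intake-esm) amounts to filtering rows on facet values. The stdlib sketch below illustrates only that idea; the rows, facet values and the `search` helper are hypothetical and are not intake-esm's implementation.

```python
from __future__ import annotations

from collections.abc import Iterable

# Hypothetical catalog rows: one row per file, one column per metadata facet,
# which is roughly how an intake-esm catalog table is organised.
ROWS = [
    {"source_id": "model-a", "experiment_id": "hist-1950", "table_id": "Amon",
     "variable_id": "tas", "path": "a_tas.nc"},
    {"source_id": "model-a", "experiment_id": "hist-1950", "table_id": "Amon",
     "variable_id": "pr", "path": "a_pr.nc"},
    {"source_id": "model-b", "experiment_id": "control-1950", "table_id": "day",
     "variable_id": "tas", "path": "b_tas.nc"},
]


def search(rows: list[dict[str, str]], **query: str | Iterable[str]) -> list[dict[str, str]]:
    """Return the rows whose facets match every key/value pair in `query`.

    A query value may be one string or a collection of accepted strings.
    """
    def matches(row: dict[str, str]) -> bool:
        for facet, wanted in query.items():
            accepted = {wanted} if isinstance(wanted, str) else set(wanted)
            if row.get(facet) not in accepted:
                return False
        return True

    return [row for row in rows if matches(row)]


if __name__ == "__main__":
    # "Which files exist for tas or pr in the hist-1950 experiment?"
    for row in search(ROWS, experiment_id="hist-1950", variable_id=["tas", "pr"]):
        print(row["path"])  # prints a_tas.nc, then a_pr.nc
```

A user only needs to know the facet names, not how the files are laid out on disk, which is why this style suits people who are browsing rather than searching for a known source name.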

@jonseddon
Contributor

@koldunovn I have no experience with either format, but in my initial tests intake-ESM seemed much easier to use for cataloguing the data, as most of our data had been CMORised. With the other format, it appeared that I would have to generate the appropriate YAML files myself, whereas intake-ESM scans my directory structure for me.
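For reference, an intake-esm catalog is at its core a CSV table with one row per file, plus a JSON descriptor that tells intake-esm which column holds the file path and how to aggregate. Scanning a CMORised tree works because the directory names already encode the metadata. The sketch below assumes a CMIP6-style directory layout below the root (an assumption; the thread does not say how the JASMIN data are laid out or which tool built the catalog), and `scan_drs` / `rows_to_csv` are hypothetical helpers that only produce the CSV part.

```python
from __future__ import annotations

import csv
import io
import tempfile
from pathlib import Path

# Assumed CMIP6-style directory layout below the catalog root (an assumption for
# this sketch, not taken from the JASMIN catalog):
#   <activity_id>/<institution_id>/<source_id>/<experiment_id>/<member_id>/
#   <table_id>/<variable_id>/<grid_label>/<version>/<file>.nc
FACETS = [
    "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
]


def scan_drs(root: Path) -> list[dict[str, str]]:
    """Walk `root` and return one metadata row per netCDF file."""
    rows = []
    for nc_file in sorted(root.rglob("*.nc")):
        # Directory components between the root and the file name.
        parts = nc_file.relative_to(root).parts[:-1]
        if len(parts) != len(FACETS):
            continue  # skip files that don't follow the assumed layout
        row = dict(zip(FACETS, parts))
        row["path"] = str(nc_file)
        rows.append(row)
    return rows


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialise the rows as CSV text, the tabular part of an intake-esm catalog."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FACETS + ["path"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Build a throwaway tree with a single CMORised-looking file and scan it.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        leaf = root.joinpath("HighResMIP", "INST", "model-a", "hist-1950",
                             "r1i1p1f1", "Amon", "tas", "gn", "v20230101")
        leaf.mkdir(parents=True)
        (leaf / "tas_Amon_model-a_hist-1950_r1i1p1f1_gn_195001-195012.nc").touch()
        print(rows_to_csv(scan_drs(root)), end="")
```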

However, it would be good to be consistent with others, so I'm not attached to this format.

What is the best way to create the YAML files for CMORised data?

@koldunovn
Member Author

Hi @jonseddon, we discussed this a bit at the meetings, and I think so far we are converging on having both at the same time and seeing how much of a mess it creates :) In any case, collections such as the CMIP6 data at DKRZ are catalogued with intake-esm, so people should know how to use both; we will try to help with that (maybe by providing converters). In my view, plain intake is simpler when there are only a few experiments (as in our case), while intake-esm makes sense when there are many different experiments and models.

As for an example YAML file, it can be as simple as:

```yaml
plugins:
  source:
  - module: intake_xarray
sources:
  2D_1h_0.25deg:
    args:
      urlpath:
      - /work/bm1344/AWI/Cycle3/FESOM/IFS_4.4-FESOM_5-cycle3/025/2D_1h_native/*/*.nc
    description: 2D_1h_0.25deg data
    driver: netcdf
```

In this case, all the netCDF files will be combined into one happy xarray dataset. Best practice is not to mix different time frequencies in one source.
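One way to produce such YAML files for CMORised data without writing them by hand is a small converter. The sketch below is an assumption-laden illustration, not an existing nextGEMS or intake tool: it takes per-file metadata rows (for example, the rows of an intake-esm CSV catalog), groups them into one source per `source_id`/`experiment_id`/`table_id`, and writes YAML in the same shape as the example above. In CMIP6 the `table_id` (e.g. `Amon`, `day`) fixes the output frequency, so grouping on it keeps frequencies apart. The function name, grouping key and source-naming scheme are hypothetical, and whether several variables combine cleanly into one source depends on the data (adding `variable_id` to the key is an easy change). PyYAML's `yaml.safe_dump` would normally replace the string formatting; it is avoided here only to keep the sketch dependency-free.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical grouping key: one plain-intake source per combination of these
# facets. In CMIP6 the table_id fixes the frequency, so frequencies never mix.
GROUP_BY = ("source_id", "experiment_id", "table_id")


def rows_to_intake_yaml(rows: list[dict[str, str]]) -> str:
    """Render per-file rows as a plain-intake YAML catalog, one source per group.

    Paths are written unquoted, so this assumes they contain no characters that
    YAML would need quoted (such as ': ' or ' #').
    """
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[key] for key in GROUP_BY)].append(row["path"])

    lines = [
        "plugins:",
        "  source:",
        "  - module: intake_xarray",
        "sources:",
    ]
    for key in sorted(groups):
        name = "_".join(key)  # hypothetical naming, e.g. model-a_hist-1950_Amon
        lines += [
            f"  {name}:",
            "    args:",
            "      urlpath:",
            *(f"      - {path}" for path in sorted(groups[key])),
            f"    description: {' '.join(key)} data",
            "    driver: netcdf",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo_rows = [
        {"source_id": "model-a", "experiment_id": "hist-1950", "table_id": "Amon",
         "path": "/data/a/Amon/tas.nc"},
        {"source_id": "model-a", "experiment_id": "hist-1950", "table_id": "Amon",
         "path": "/data/a/Amon/pr.nc"},
        {"source_id": "model-a", "experiment_id": "hist-1950", "table_id": "day",
         "path": "/data/a/day/tas.nc"},
    ]
    print(rows_to_intake_yaml(demo_rows), end="")
```

With the demo rows, the monthly (`Amon`) and daily (`day`) files end up in two separate sources, each listing its own files under `urlpath`.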

@jonseddon
Contributor

@koldunovn, great! This is all very new to me, so it will be really useful at the Hackathon to see how everyone uses intake and to work on improved solutions during the event.

3 participants