Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upstream modifications to cf_xarray #186

Open
dcherian opened this issue Apr 12, 2021 · 2 comments
Open

upstream modifications to cf_xarray #186

dcherian opened this issue Apr 12, 2021 · 2 comments
Labels
framework Issue pertains to the framework code

Comments

@dcherian
Copy link

dcherian commented Apr 12, 2021

Hi,

I am very excited to see MDTF use cf_xarray.

I think it would make sense to upstream many of the modifications you make in https://github.com/NOAA-GFDL/MDTF-diagnostics/blob/main/src/xr_parser.py.

  1. We could definitely add .cf.calendar
  2. We could definitely add .cf.formula_terms and use it in our .cf.decode_vertical_coords()
  3. We could add .axes_vars, and .cf.coordinates_vars as Mapping[str, xr.DataArray] etc. to replace
    def axes(self, var_name=None, filter_set=None):
    """Override cf_xarray accessor behavior by having values of the 'axes'
    dict be the Dataset variables themselves, instead of their names.
  4. Is this one basically using X,Y,Z,T as synonyms for lat,lon,vertical,time when they are not set? It seems cleaner to set the axis attribute in that case when parsing metadata rather than monkey patching
    def patch_cf_xarray_accessor(mod):

    EDIT: We could even add this as an opt-in heuristic to .cf.guess_coord_axis()
  5. Happy to upstream some of the following logic so that it gets less confused :) We can only fix these when issues get reported...
    # case where var_name given:
    # do validation on cf_xarray.accessor's work, since it turns out it
    # can get confused on real-world data
    empty_keys = []
    delete_keys = []
    dims_list = list(axes_obj.dims)
    for k,v in vardict.items():
  6. Isn't the following ds.cf.axes?
    vardict = {
    key: cf_xarray.accessor.apply_mapper(
    cf_xarray.accessor._get_axis_coord, axes_obj, key, error=False
    ) for key in cf_xarray.accessor._AXIS_NAMES
    }

If there are other utility functions that could be upstream, we are definitely happy to discuss that too.

Do you have time to send a few PRs?

Deepak

cc @malmans2 @aulemahal

@jkrasting
Copy link
Collaborator

Thanks @dcherian. I'll check with the rest of the team.

@tsjackson-noaa
Copy link
Contributor

tsjackson-noaa commented Apr 13, 2021

Hi @dcherian, thanks for your interest! We've found cf_xarray to be an invaluable tool for working with climate data in practice, especially in our project's use case, where the data is coming from a third party and can be organized in a way we don't fully know ahead of time.

The top priority of our project is supporting the in-house needs of our funding institutions, with general support for all aspects of the CF standard secondary to that. As you saw in the code, this means that the customizations I made to the accessors are specialized to our use case (and/or hacky), but we'd be happy to contribute any aspects that cf_xarray could benefit from!

Responses to your remarks:

  • re: # 2: Agreed; it'd be straightforward to work through all the cases covered in the CF standard.
  • re # 3 and # 6: The reason for redefining axes() [and adding dim_axes()] is to make the behavior of those attributes closer to the class representing the package's internal metadata record, which loosely follows the cf-python package from NCAS CMS in the UK in that regard. (I decided against simply using cf-python since it's outside of the xarray ecosystem and tightly coupled to the netCDF libraries: we want to be able to support data in other formats in the future, such as zarr, etc.).
    The upshot is that this stuff is specific to our use case, and wouldn't benefit from being upstreamed.
  • re: # 4: you're right, monkey-patching is very hacky, and the net result is only to change what gets assigned to the axis attribute -- I think I decided that patching would be simpler than modifying behavior in the multiple places _get_axis_coord is called from.
    The reason for the modification is because we want different behavior than cf_xarray's policy on axis identification (discussed in your issue #23). Currently our input data is always on a lat-lon grid, so it makes sense for us to implement the opposite choice of what was made in that issue.
    I agree it would be nice to allow more user control over that behavior, since different defaults would make sense for different subfields (weather vs. climate, atmos vs. ocean, etc.) I'm not sure what the best way to pass that configuration would be, given the constraints of the xarray accessor API.
  • re: # 5: Well, now I wish my past self had written a better comment :). If I can find which data set produced that problem in my notebooks, I'll submit a MWE.

@wrongkindofdoctor wrongkindofdoctor added the framework Issue pertains to the framework code label Oct 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
framework Issue pertains to the framework code
Projects
None yet
Development

No branches or pull requests

4 participants