Using cf_xarray to infer attributes #193

rwegener2 · 2021-08-05T18:17:14Z

What was done

This PR adds the .cf.guess_coord_axis() and .cf.add_canonical_attributes() functions to the preprocessing component of the library. This will update the dataset with additional attributes in line with CF conventions that can be inferred by the cf_xarray library.

See: Issue #185

How it was done

Two lines were added to the rename_cmip6 function:

    ds = ds.cf.guess_coord_axis()
    ds = ds.cf.add_canonical_attributes()

I put them in the rename_cmip6 function in preprocessing.py because it seemed to semantically fit, but I'm certainly open to moving them if there's a better spot.

Testing

I added an additional assertion to the test_rename_cmip6 function in test_preprocessing.py. The test is weak mostly because there isn't a lot that is guaranteed to be true after running the additional steps here. The one check I added (because I believe it should always be true) is that history is present in the global attributes. I could certainly be convinced that this in unnecessary and the test should be removed.

WIP

This is a Work In Progress because I actually can't get the test to pass (😱). When I use my local version of the library it works just fine so I'm not sure why the tests are working. I thought I'd still post this, though, since the necessity of the test is questionable, before spending oodles of time debugging it.

Seeing the test problem

Personally I ran my tests with python -m pytest . in my conda activated environment.
This PR has a slightly annoying number of print statements, which I left in to help demonstrate how it seems that the preprocessing file is correctly using the new lines of code but (at least for me) the tests aren't. I print the dataset keys in both the preprocessing file and the test file to show the error.

Code I use to manually test, since the regular test is failing

import numpy as np
import xarray as xr

from cmip6_preprocessing.preprocessing import rename_cmip6, cmip6_renaming_dict

def create_test_ds(xname, yname, zname, xlen, ylen, zlen):
    x = np.linspace(0, 359, xlen)
    y = np.linspace(-90, 89, ylen)
    z = np.linspace(0, 5000, zlen)

    data = np.random.rand(len(x), len(y), len(z))
    ds = xr.DataArray(data, coords=[(xname, x), (yname, y), (zname, z)]).to_dataset(name="test")
    ds.attrs["source_id"] = "test_id"
    # if x and y are not lon and lat, add lon and lat to make sure there are no conflicts
    lon = ds[xname] * xr.ones_like(ds[yname])
    lat = xr.ones_like(ds[xname]) * ds[yname]
    if xname != "lon" and yname != "lat":
        ds = ds.assign_coords(lon=lon, lat=lat)
    else:
        ds = ds.assign_coords(longitude=lon, latitude=lat)
        return ds


ds = create_test_ds('i', 'j', 'lev', 10, 5, 6)
ds_renamed = rename_cmip6(ds, cmip6_renaming_dict())
# See 'history' in global attributes of `ds_renamed` but not in `ds`

jbusecke · 2021-08-06T13:32:01Z

Thank you so much for adding this @rwegener2.

I think we need to clearly define what we want this implementation to do. Under which circumstances will a history attribute be generated or not? Maybe @dcherian can help there.

I think one major thing to check is if the axes are correctly detected. We 'know' the association between axes and dimensions roughly: x->X, y->Y, lev->Z. Can we write a test that checks something like this (pseudo-code):

def test_cf_axes(ds):
    ds_preprocessed = combined_preprocessind(ds)
    for dim, axis in [('x', 'X'), ('y', 'Y'), ('lev','Z'), ('time', 'T')]:
        if dim in ds_preprocessed.dims:
             #check that the axis is in the cf object
             axis in ds_preprocessed.cf.axes.keys()

You can create a bunch of test cases in the normal tests under https://github.com/jbusecke/cmip6_preprocessing/blob/master/tests/test_preprocessing.py, but I think this would be REALLY useful to check in the cloud tests , because that would enable us to actually see which models fail and why.

I put them in the rename_cmip6 function in preprocessing.py because it seemed to semantically fit, but I'm certainly open to moving them if there's a better spot.

I think I would prefer a seperate function like cf_parsing or similar, just to keep things neat.

dcherian · 2021-08-06T14:58:40Z

Nice work @rwegener2 .

The history attribute is only added when .cf.add_canonical_attributes adds something.

I think you'll want to write a test like @jbusecke suggested. These are the attributes that .cf.guess_coord_axis adds:
https://github.com/xarray-contrib/cf-xarray/blob/e628079a7ce25c40a3d9fff974615c84ca1bd82d/cf_xarray/accessor.py#L58-L67

rwegener2 · 2021-09-09T15:05:50Z

Thanks for the feedback on this. I apologize for taking so insanely long to respond - I wasn't working on any development in August - but I am settled back in so I can be much more responsive going forward.

I updated the test_preprocessing.py test to use the axis -> dimension mapping rather than checking the history. I also cleaned up the code in preprocessing.py to use a separate _parse_cf() function.

As for the cloud test, I added the same chunk of testing code into the test_check_dim_coord_values_wo_intake function. This seems to be the right place, but I also didn't see anywhere in that testing script where the rename_cmip6() function gets run (which is where the _parse_cf() function gets called). @jbusecke let me know if this wasn't the right way to do that.

I'm sorry again for the long delay and let me know if there is anything else that should be changed.

dcherian · 2021-09-09T17:45:21Z

.cf.add_canonical_attributes

I think you want to avoid this. It adds units which could be totally wrong. You can skip that with skip="units" but I don't think it adds anything useful if you skip the units

rwegener2 · 2021-09-10T17:41:36Z

Thanks for the feedback @dcherian. I just removed that so that only .cf.guess_coord_axis() is used.

jbusecke · 2022-03-17T00:51:13Z

Hey everyone,

I am very sorry for dropping the ball here. I should have a bit more capacity to maintain cmip6_pp going forward. Is this still of interest? Would be happy to help to move this along.

rwegener2 · 2022-03-21T14:37:53Z

Hey @jbusecke!

If you think this is a helpful addition to the project I'm happy to put some energy to get it merged. I just pushed a superficial change to trigger the checks again because the previous failure logs got deleted with inactivity. I don't remember seeing much in the logs that made sense to me, but if you know what I should do to get the test to pass just let know. Thanks, Julius!

jbusecke · 2022-03-21T15:35:39Z

I think I will dedicate some time to cmip6_pp in 1-2 weeks. How about I ping this issue then and we can iterate on it? I still believe this would be a great addition.

rwegener2 · 2022-03-22T12:47:50Z

Sounds good, I'll keep an eye out!

adding cf convention helper functions

66abcbd

rwegener2 added 2 commits September 9, 2021 10:52

refactor cf convention test

0f4dab9

add cf renaming cloud test

22a305e

remove cf.add_canonical_attributes()

f5bc368

rwegener2 marked this pull request as ready for review September 10, 2021 17:41

superficial change to trigger checks

329f56e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using cf_xarray to infer attributes #193

Using cf_xarray to infer attributes #193

rwegener2 commented Aug 5, 2021 •

edited

Loading

jbusecke commented Aug 6, 2021

dcherian commented Aug 6, 2021

rwegener2 commented Sep 9, 2021

dcherian commented Sep 9, 2021

rwegener2 commented Sep 10, 2021

jbusecke commented Mar 17, 2022

rwegener2 commented Mar 21, 2022

jbusecke commented Mar 21, 2022

rwegener2 commented Mar 22, 2022

Using cf_xarray to infer attributes #193

Are you sure you want to change the base?

Using cf_xarray to infer attributes #193

Conversation

rwegener2 commented Aug 5, 2021 • edited Loading

What was done

How it was done

Testing

WIP

Seeing the test problem

jbusecke commented Aug 6, 2021

dcherian commented Aug 6, 2021

rwegener2 commented Sep 9, 2021

dcherian commented Sep 9, 2021

rwegener2 commented Sep 10, 2021

jbusecke commented Mar 17, 2022

rwegener2 commented Mar 21, 2022

jbusecke commented Mar 21, 2022

rwegener2 commented Mar 22, 2022

rwegener2 commented Aug 5, 2021 •

edited

Loading