Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using cf_xarray to infer attributes #193

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

rwegener2
Copy link

@rwegener2 rwegener2 commented Aug 5, 2021

What was done

This PR adds the .cf.guess_coord_axis() and .cf.add_canonical_attributes() functions to the preprocessing component of the library. This will update the dataset with additional attributes in line with CF conventions that can be inferred by the cf_xarray library.

See: Issue #185

How it was done

Two lines were added to the rename_cmip6 function:

    ds = ds.cf.guess_coord_axis()
    ds = ds.cf.add_canonical_attributes()

I put them in the rename_cmip6 function in preprocessing.py because it seemed to semantically fit, but I'm certainly open to moving them if there's a better spot.

Testing

I added an additional assertion to the test_rename_cmip6 function in test_preprocessing.py. The test is weak mostly because there isn't a lot that is guaranteed to be true after running the additional steps here. The one check I added (because I believe it should always be true) is that history is present in the global attributes. I could certainly be convinced that this in unnecessary and the test should be removed.

WIP

This is a Work In Progress because I actually can't get the test to pass (😱). When I use my local version of the library it works just fine so I'm not sure why the tests are working. I thought I'd still post this, though, since the necessity of the test is questionable, before spending oodles of time debugging it.

Seeing the test problem

Personally I ran my tests with python -m pytest . in my conda activated environment.
This PR has a slightly annoying number of print statements, which I left in to help demonstrate how it seems that the preprocessing file is correctly using the new lines of code but (at least for me) the tests aren't. I print the dataset keys in both the preprocessing file and the test file to show the error.

Code I use to manually test, since the regular test is failing
import numpy as np
import xarray as xr

from cmip6_preprocessing.preprocessing import rename_cmip6, cmip6_renaming_dict

def create_test_ds(xname, yname, zname, xlen, ylen, zlen):
    x = np.linspace(0, 359, xlen)
    y = np.linspace(-90, 89, ylen)
    z = np.linspace(0, 5000, zlen)

    data = np.random.rand(len(x), len(y), len(z))
    ds = xr.DataArray(data, coords=[(xname, x), (yname, y), (zname, z)]).to_dataset(name="test")
    ds.attrs["source_id"] = "test_id"
    # if x and y are not lon and lat, add lon and lat to make sure there are no conflicts
    lon = ds[xname] * xr.ones_like(ds[yname])
    lat = xr.ones_like(ds[xname]) * ds[yname]
    if xname != "lon" and yname != "lat":
        ds = ds.assign_coords(lon=lon, lat=lat)
    else:
        ds = ds.assign_coords(longitude=lon, latitude=lat)
        return ds


ds = create_test_ds('i', 'j', 'lev', 10, 5, 6)
ds_renamed = rename_cmip6(ds, cmip6_renaming_dict())
# See 'history' in global attributes of `ds_renamed` but not in `ds`

@jbusecke
Copy link
Owner

jbusecke commented Aug 6, 2021

Thank you so much for adding this @rwegener2.

I think we need to clearly define what we want this implementation to do. Under which circumstances will a history attribute be generated or not? Maybe @dcherian can help there.

I think one major thing to check is if the axes are correctly detected. We 'know' the association between axes and dimensions roughly: x->X, y->Y, lev->Z. Can we write a test that checks something like this (pseudo-code):

def test_cf_axes(ds):
    ds_preprocessed = combined_preprocessind(ds)
    for dim, axis in [('x', 'X'), ('y', 'Y'), ('lev','Z'), ('time', 'T')]:
        if dim in ds_preprocessed.dims:
             #check that the axis is in the cf object
             axis in ds_preprocessed.cf.axes.keys()

You can create a bunch of test cases in the normal tests under https://github.com/jbusecke/cmip6_preprocessing/blob/master/tests/test_preprocessing.py, but I think this would be REALLY useful to check in the cloud tests , because that would enable us to actually see which models fail and why.

I put them in the rename_cmip6 function in preprocessing.py because it seemed to semantically fit, but I'm certainly open to moving them if there's a better spot.

I think I would prefer a seperate function like cf_parsing or similar, just to keep things neat.

@dcherian
Copy link

dcherian commented Aug 6, 2021

Nice work @rwegener2 .

The history attribute is only added when .cf.add_canonical_attributes adds something.

I think you'll want to write a test like @jbusecke suggested. These are the attributes that .cf.guess_coord_axis adds:
https://github.com/xarray-contrib/cf-xarray/blob/e628079a7ce25c40a3d9fff974615c84ca1bd82d/cf_xarray/accessor.py#L58-L67

@rwegener2
Copy link
Author

Thanks for the feedback on this. I apologize for taking so insanely long to respond - I wasn't working on any development in August - but I am settled back in so I can be much more responsive going forward.

I updated the test_preprocessing.py test to use the axis -> dimension mapping rather than checking the history. I also cleaned up the code in preprocessing.py to use a separate _parse_cf() function.

As for the cloud test, I added the same chunk of testing code into the test_check_dim_coord_values_wo_intake function. This seems to be the right place, but I also didn't see anywhere in that testing script where the rename_cmip6() function gets run (which is where the _parse_cf() function gets called). @jbusecke let me know if this wasn't the right way to do that.

I'm sorry again for the long delay and let me know if there is anything else that should be changed.

@dcherian
Copy link

dcherian commented Sep 9, 2021

.cf.add_canonical_attributes

I think you want to avoid this. It adds units which could be totally wrong. You can skip that with skip="units" but I don't think it adds anything useful if you skip the units

@rwegener2
Copy link
Author

Thanks for the feedback @dcherian. I just removed that so that only .cf.guess_coord_axis() is used.

@rwegener2 rwegener2 marked this pull request as ready for review September 10, 2021 17:41
@jbusecke
Copy link
Owner

Hey everyone,

I am very sorry for dropping the ball here. I should have a bit more capacity to maintain cmip6_pp going forward. Is this still of interest? Would be happy to help to move this along.

@rwegener2
Copy link
Author

Hey @jbusecke!

If you think this is a helpful addition to the project I'm happy to put some energy to get it merged. I just pushed a superficial change to trigger the checks again because the previous failure logs got deleted with inactivity. I don't remember seeing much in the logs that made sense to me, but if you know what I should do to get the test to pass just let know. Thanks, Julius!

@jbusecke
Copy link
Owner

I think I will dedicate some time to cmip6_pp in 1-2 weeks. How about I ping this issue then and we can iterate on it? I still believe this would be a great addition.

@rwegener2
Copy link
Author

Sounds good, I'll keep an eye out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants