Combining experiments/variables with different simulation lengths #5

Closed
Timh37 opened this issue Jan 24, 2023 · 6 comments
Labels
help wanted Extra attention is needed

Comments

@Timh37
Owner

Timh37 commented Jan 24, 2023

Functionality is needed to combine simulations of models with unusual lengths, e.g., SSP experiments that run past 2100, or historical experiments that are only provided from some year later than 1850 (this happens for, e.g., EC-Earth3).

ddict_merged = merge_variables(ddict)

in https://github.com/Timh37/CMIP6cf/blob/main/notebooks/get_CMIP6_gridded_around_tgs_xmip.ipynb currently drops these datasets with the following warning (example):

/srv/conda/envs/notebook/lib/python3.10/site-packages/xmip/postprocessing.py:157: UserWarning: ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp585.r117i1p1f1.day.gr.none.psl failed to combine with :cannot align objects with join='exact' where index/labels/sizes are not equal along these coordinates (dimensions): 'time' ('time',)
  warnings.warn(f"{cmip6_dataset_id(ds)} failed to combine with :{e}")

#etc
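A minimal sketch reproducing this kind of failure with synthetic data (not the actual EC-Earth3 files; make_member is a hypothetical helper): two variables whose time axes start in different years cannot be merged with join='exact', which is the join named in the warning above.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_member(var, first_year, last_year):
    """Hypothetical helper: a yearly time series of one variable."""
    time = pd.date_range(f"{first_year}-01-01", f"{last_year}-01-01", freq="YS")
    return xr.Dataset({var: ("time", np.ones(len(time)))}, coords={"time": time})


# 'psl' covers 1850-2014, 'tas' only starts in 1970 (a "late start" run)
psl = make_member("psl", 1850, 2014)
tas = make_member("tas", 1970, 2014)

try:
    xr.merge([psl, tas], join="exact")
    exact_failed = False
except ValueError as err:
    # xarray refuses because the 'time' indexes of the inputs differ
    exact_failed = True
    print(err)
```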
@Timh37 Timh37 added the help wanted Extra attention is needed label Jan 24, 2023
@jbusecke
Collaborator

I have faced this before. I usually preselect the time with .sel(time=slice(None, '2100')) to get rid of the long-running members.
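For illustration, a sketch of that preselection on a synthetic yearly run that extends to 2200 (pandas datetime64 stops at 2262; real CMIP6 runs beyond that use cftime-based time axes, which as far as I know support the same partial-string slicing). A partial date string such as '2100' as the upper bound keeps every time step in the year 2100.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic yearly SSP run that continues past 2100
time = pd.date_range("2015-01-01", "2200-01-01", freq="YS")
ds = xr.Dataset({"psl": ("time", np.zeros(len(time)))}, coords={"time": time})

# The partial date string '2100' as the upper bound keeps all of 2100
ds_short = ds.sel(time=slice(None, "2100"))
print(ds_short.time.dt.year.values[[0, -1]])  # first and last year kept
```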

To accommodate 'late start' runs, you could try:

ddict_merged = merge_variables(ddict, merge_kwargs={'join':'outer'})

I think this should pad missing values with nan. What year do they usually start?
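A small sketch of what join='outer' does, again on synthetic data with a hypothetical make_member helper (this calls xr.merge directly, not xmip's merge_variables): the merged time axis is the union of both inputs, and the late-starting variable is filled with nan before its first year.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_member(var, first_year, last_year):
    """Hypothetical helper: a yearly time series of one variable."""
    time = pd.date_range(f"{first_year}-01-01", f"{last_year}-01-01", freq="YS")
    return xr.Dataset({var: ("time", np.ones(len(time)))}, coords={"time": time})


psl = make_member("psl", 1850, 2014)
tas = make_member("tas", 1970, 2014)  # late start

# join='outer' takes the union of the time indexes and fills gaps with NaN
merged = xr.merge([psl, tas], join="outer")
n_missing = int(merged["tas"].isnull().sum())
print(merged.sizes["time"], n_missing)  # 165 120 (tas is NaN for 1850-1969)
```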

@Timh37
Owner Author

Timh37 commented Jan 30, 2023

I have implemented the preselection with:

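def shorten_ssp_runs(ddict, end_year):
    """Truncate SSP datasets at end_year; leave other experiments unchanged."""
    ddict_shortened = {}  # new dict, so the input ddict is not modified in place
    for k, v in ddict.items():
        if 'ssp' in k:
            # keep all time steps up to and including end_year
            ddict_shortened[k] = v.sel(time=slice(None, str(end_year)))
        else:
            ddict_shortened[k] = v
    return ddict_shortened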

Then I found a remaining issue: for some variants, time steps are missing that are present in others. While it is possible to pad these missing values with nan, as you suggest,

ddict_merged = merge_variables(ddict, merge_kwargs={'join':'outer'})

this results in large chunks for variants with many missing time steps. For example, variant r111i1p1f1 of EC-Earth3:

[image: chunks of r111i1p1f1 before merging]

becomes

[image: chunks of r111i1p1f1 after merging]

after merging different members. Note that for EC-Earth3, each year is stored in a separate file on ESGF. Probably, some of these files are missing on Google Cloud. I have added these instances to #2.
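One way to check which members lack which years, sketched with a hypothetical helper missing_years_per_member on synthetic daily data: compare each dataset's set of years against the union over all datasets in the dictionary. This only detects whole missing years, which fits the one-file-per-year layout described above.

```python
import numpy as np
import pandas as pd
import xarray as xr


def missing_years_per_member(ddict):
    """Hypothetical helper: for each dataset, list the years that appear in at
    least one other dataset of ddict but not in this one."""
    years = {k: set(np.unique(ds.time.dt.year.values).tolist()) for k, ds in ddict.items()}
    all_years = set().union(*years.values())
    return {k: sorted(all_years - y) for k, y in years.items()}


# Synthetic daily data: 'r2' lacks 2031 and 2032, as if two yearly files were missing
time = pd.date_range("2015-01-01", "2040-12-31", freq="D")
full = xr.Dataset({"psl": ("time", np.zeros(len(time)))}, coords={"time": time})
gap = full.time.dt.year.isin([2031, 2032]).values
gappy = full.isel(time=np.flatnonzero(~gap))

result = missing_years_per_member({"r1": full, "r2": gappy})
print(result)  # {'r1': [], 'r2': [2031, 2032]}
```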

@Timh37
Owner Author

Timh37 commented Feb 2, 2023

As an additional issue,

merge_kwargs={'join':'outer'}

also pads missing values with nan where the latitude/longitude coordinates of different member_ids do not exactly agree. For example, I tried concatenating members r1i1p1f1 and r2i1p1f1 of MPI-ESM1-2-HR for ssp585. These members are nominally on exactly the same grid, but the indexes of their latitude coordinates differ very slightly (on the order of 10^-14), so xarray.DataArray.equals returns False and {'join':'exact'} fails on the latitude coordinate. As a result, {'join':'outer'} produces zonal bands of nans, which is problematic for the subsetting.
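A minimal reproduction on a synthetic 4x4 grid (the 1e-14 shift only mimics the difference described above; it is not taken from the MPI-ESM1-2-HR files): equals is False although np.allclose is True, join='exact' fails, and join='outer' produces a longer latitude axis with nan rows.

```python
import numpy as np
import xarray as xr

lat = np.array([-60.0, -20.0, 20.0, 60.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
m1 = xr.Dataset(
    {"psl": (("lat", "lon"), np.ones((4, 4)))}, coords={"lat": lat, "lon": lon}
)
# Same nominal grid, but latitudes shifted by a round-off sized amount
m2 = m1.assign_coords(lat=lat + 1e-14)

same = m1["lat"].equals(m2["lat"])
close = np.allclose(m1["lat"].values, m2["lat"].values)
print(same, close)  # False True

try:
    xr.concat([m1, m2], dim="member_id", join="exact")
    exact_failed = False
except ValueError:
    exact_failed = True

# join='outer' keeps both sets of latitudes, padding each member with NaN rows
outer = xr.concat([m1, m2], dim="member_id", join="outer")
n_nan = int(outer["psl"].isnull().sum())
print(exact_failed, outer.sizes["lat"], n_nan)
```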

For now, my workaround is to copy the coordinates of the first matching dataset to the other matching datasets in my custom function that concatenates member_ids.
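A sketch of that workaround, assuming lat/lon are dimension coordinates named 'lat' and 'lon' (copy_lonlat_from_first and its atol tolerance are hypothetical, not part of xmip or CMIP6cf): check that the coordinates agree to within round-off, then replace them with those of the first dataset so that an exact join succeeds.

```python
import numpy as np
import xarray as xr


def copy_lonlat_from_first(ds_list, atol=1e-10):
    """Hypothetical helper: give all datasets the lat/lon of ds_list[0], after
    checking that the differences are only round-off (at most atol)."""
    ref = ds_list[0]
    out = []
    for ds in ds_list:
        for name in ("lat", "lon"):
            a, b = ds[name].values, ref[name].values
            if a.shape != b.shape or not np.allclose(a, b, rtol=0, atol=atol):
                raise ValueError(f"{name} differs by more than round-off")
        # .variable carries values and attrs but no index to align against
        out.append(ds.assign_coords(lat=ref["lat"].variable, lon=ref["lon"].variable))
    return out


lat = np.array([-60.0, -20.0, 20.0, 60.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
m1 = xr.Dataset(
    {"psl": (("lat", "lon"), np.ones((4, 4)))}, coords={"lat": lat, "lon": lon}
)
m2 = m1.assign_coords(lat=lat + 1e-14)

aligned = copy_lonlat_from_first([m1, m2])
# With identical coordinates, an exact join now works
combined = xr.concat(aligned, dim="member_id", join="exact")
print(dict(combined.sizes))
```

The tolerance check is there so that genuinely different grids still fail loudly instead of being silently relabelled.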

If I would preprocess with xmip at the start, this issue may not arise, but at the moment that preprocessing fails on renaming the variables I'm querying.

@jbusecke
Collaborator

> If I would preprocess with xmip at the start, this issue may not arise, but at the moment that preprocessing fails on renaming the variables I'm querying.

  1. Not sure that xmip would catch this. Most of that logic works on a 'per dataset' basis.
  2. Is this related to #7 (xmip's combined_preprocessing raises "Renaming failed" warnings)? Are you sure these warnings result in errors? See my answer there: in my experience this warning is often meaningless. If not, I would love to see an example.

Generally, to fix both the time alignment and override the lon/lat coordinates, you could use several calls to xr.align, using the exclude argument so that each call aligns only a subset of dimensions. This is a bit hacky, but it should work at least for the lon/lat alignment issues.
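A sketch of the two-step xr.align approach on synthetic data (align_grid_then_time is a hypothetical name): the first call uses join='override' with time excluded, so only lat/lon are overwritten with the first dataset's values; the second call uses join='outer' with lat/lon excluded, so only time is padded.

```python
import numpy as np
import pandas as pd
import xarray as xr


def align_grid_then_time(ds_a, ds_b):
    """Hypothetical helper: two-step alignment with xr.align."""
    # 1) lat/lon: take ds_a's indexes (sizes must match); 'time' is not touched
    ds_a, ds_b = xr.align(ds_a, ds_b, join="override", exclude=["time"])
    # 2) time: union of both time axes, NaN where a member has no data;
    #    lat/lon are excluded (they are identical after step 1 anyway)
    ds_a, ds_b = xr.align(ds_a, ds_b, join="outer", exclude=["lat", "lon"])
    return ds_a, ds_b


lat = np.array([-60.0, -20.0, 20.0, 60.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])


def member(first_year, lat_values):
    """Synthetic yearly member on a 4x4 grid, running from first_year to 2020."""
    time = pd.date_range(f"{first_year}-01-01", "2020-01-01", freq="YS")
    data = np.ones((len(time), len(lat_values), len(lon)))
    return xr.Dataset(
        {"psl": (("time", "lat", "lon"), data)},
        coords={"time": time, "lat": lat_values, "lon": lon},
    )


m1 = member(2015, lat)          # 2015-2020
m2 = member(2017, lat + 1e-14)  # late start, latitudes off by round-off
a, b = align_grid_then_time(m1, m2)
print(a.sizes["time"], b.sizes["time"], int(b["psl"].isnull().sum()))  # 6 6 32
```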

For the large time chunks, I think your intuition about missing files is plausible. Are you able to move ahead on this without the dataset in question?

Hopefully I will be able to make some progress on #2 soon and then this might go away.

@Timh37
Owner Author

Timh37 commented Feb 14, 2023

Regarding aligning lon/lat separately, this seems to work:

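import xarray as xr

def align_lonlat(ds_list):
    """Give every dataset the non-excluded indexes (e.g. lat/lon) of ds_list[0]."""
    aligned_ds_list = []
    for ds in ds_list:
        # join='override' copies the indexes of the first argument onto the
        # second (sizes must match); 'time' and 'member_id' are not aligned
        _, b = xr.align(ds_list[0], ds, join='override', exclude=['time', 'member_id'])
        aligned_ds_list.append(b)
    return aligned_ds_list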

but it doesn't feel optimal. I can't figure out how to pass a list of datasets to xr.align.
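xr.align accepts any number of objects as positional arguments, so a list can be unpacked with *. A sketch of the same function written that way, tested on synthetic members (the member helper is hypothetical); join='override' takes the indexes of the first object for every non-excluded dimension and requires equal sizes along those dimensions:

```python
import numpy as np
import pandas as pd
import xarray as xr


def align_lonlat(ds_list):
    # Unpack the list into positional arguments; xr.align returns a tuple
    # of aligned objects in the same order as the inputs
    return list(xr.align(*ds_list, join="override", exclude=["time", "member_id"]))


lat = np.array([-60.0, -20.0, 20.0, 60.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])


def member(first_year, lat_values):
    """Synthetic yearly member on a 4x4 grid, running from first_year to 2020."""
    time = pd.date_range(f"{first_year}-01-01", "2020-01-01", freq="YS")
    data = np.ones((len(time), len(lat_values), len(lon)))
    return xr.Dataset(
        {"psl": (("time", "lat", "lon"), data)},
        coords={"time": time, "lat": lat_values, "lon": lon},
    )


members = [member(2015, lat), member(2017, lat + 1e-14), member(2015, lat - 1e-14)]
aligned = align_lonlat(members)
print([ds.sizes["time"] for ds in aligned])  # [6, 4, 6]: time is left alone
```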

With regard to #2, I think this issue will partly persist even once the runs are complete on the cloud, because some ESGF runs don't start in the same year. I'll test whether dask.config.set(**{'array.slicing.split_large_chunks': True}) helps.

@Timh37
Owner Author

Timh37 commented Oct 20, 2023

When I preselect periods, filter out incomplete datasets and regrid before combining members, this is no longer an issue, so I'm closing it.
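A sketch of the "preselect periods and filter out incomplete datasets" steps, assuming completeness means every calendar year of the preselected period is present (drop_incomplete and monthly are hypothetical helpers, not the code used in CMIP6cf; missing time steps within a year are not checked):

```python
import numpy as np
import pandas as pd
import xarray as xr


def drop_incomplete(ddict, start_year, end_year):
    """Hypothetical helper: cut every dataset to [start_year, end_year] and keep
    only those that contain every calendar year of that period."""
    expected = set(range(start_year, end_year + 1))
    kept = {}
    for key, ds in ddict.items():
        sub = ds.sel(time=slice(str(start_year), str(end_year)))
        years = set(np.unique(sub.time.dt.year.values).tolist())
        if expected <= years:  # all expected years are present
            kept[key] = sub
    return kept


def monthly(first_year, last_year, skip_years=()):
    """Synthetic monthly series, optionally with whole years removed."""
    time = pd.date_range(f"{first_year}-01-01", f"{last_year}-12-01", freq="MS")
    time = time[~time.year.isin(list(skip_years))]
    return xr.Dataset({"psl": ("time", np.zeros(len(time)))}, coords={"time": time})


ddict = {
    "r1": monthly(2015, 2150),                    # runs past 2100: gets cut
    "r2": monthly(2015, 2100, skip_years=[2031]),  # one missing year: dropped
    "r3": monthly(2020, 2100),                     # late start: dropped
}
kept = drop_incomplete(ddict, 2015, 2100)
print(sorted(kept))  # ['r1']
```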

@Timh37 Timh37 closed this as completed Oct 20, 2023