Combining experiments/variables with different simulation lengths #5
Comments
I have faced this before. I usually preselect the time with

To accommodate 'late start' runs, you could try:

```python
ddict_merged = merge_variables(ddict, merge_kwargs={'join': 'outer'})
```

I think this should pad missing values with NaN. What year do they usually start?
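To illustrate the padding behaviour of an outer join, here is a minimal sketch that uses plain `xr.merge` on toy data rather than the actual CMIP6 dictionary. The variable names, the integer years and the values are made up for illustration; the assumption is that `merge_kwargs` is forwarded to `xr.merge` in the same way.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: 'tas' covers 1850-1853, 'pr' starts late in 1852.
tas = xr.Dataset(
    {"tas": ("time", np.arange(4.0))},
    coords={"time": [1850, 1851, 1852, 1853]},
)
pr = xr.Dataset(
    {"pr": ("time", np.arange(2.0))},
    coords={"time": [1852, 1853]},
)

# An outer join keeps the union of all time steps and fills the gaps with NaN.
merged = xr.merge([tas, pr], join="outer")

print(merged["time"].values)
print(merged["pr"].values)
```

In this toy case the late-starting `pr` is padded with NaN for the years it does not cover, while `tas` is left unchanged.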
I have implemented the preselection with:

```python
def shorten_ssp_runs(ddict, end_year):
    """Truncate SSP runs at end_year; leave other experiments untouched."""
    # Build a new dict so the input dictionary is not modified in place.
    ddict_shortened = {}
    for k, v in ddict.items():
        if 'ssp' in k:
            ddict_shortened[k] = v.sel(time=slice(None, str(end_year)))
        else:
            ddict_shortened[k] = v
    return ddict_shortened
```

I then found a remaining issue: for some variants, some timesteps are missing that are not missing for others. While it is possible to pad these missing values with NaN, as you suggest,

```python
ddict_merged = merge_variables(ddict, merge_kwargs={'join': 'outer'})
```

that results in large chunks for those variants for which many timesteps are missing. For example, variant becomes after merging different members. Note that for EC-Earth3, each year is stored in a separate file on ESGF. Probably, some of these files are missing on Google Cloud. I have added these instances to #2.
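One way to spot such gaps before merging is to count the timesteps per calendar year. The sketch below is only an illustration on toy data: the helper name `find_incomplete_years` and the assumption of monthly data (12 steps per year) are not from the notebook.

```python
import numpy as np
import pandas as pd
import xarray as xr


def find_incomplete_years(ds, steps_per_year=12):
    """Return the years between the first and last timestep of `ds`
    that contain fewer than `steps_per_year` timesteps (hypothetical helper)."""
    years = ds["time"].dt.year.values
    unique_years, counts = np.unique(years, return_counts=True)
    per_year = dict(zip(unique_years.tolist(), counts.tolist()))
    return [
        year
        for year in range(min(per_year), max(per_year) + 1)
        if per_year.get(year, 0) < steps_per_year
    ]


# Toy monthly dataset in which the whole of 2016 is missing,
# mimicking a missing yearly file.
time = pd.date_range("2015-01-01", "2018-12-01", freq="MS")
time = time[time.year != 2016]
ds = xr.Dataset({"tas": ("time", np.zeros(time.size))}, coords={"time": time})

incomplete = find_incomplete_years(ds)
print(incomplete)
```

Years missing before the first or after the last available timestep are not reported by this check, so late starts would need a separate comparison against the expected start year.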
As an additional issue, merge_kwargs={'join':'outer'} also pads missing values with NaN where the latitude/longitude coordinates of different

For now, my workaround is to copy the coordinates of the first matching dataset to the other matching datasets in the custom function that concatenates member_ids. If I would preprocess with
Generally, to fix both the time alignment and override the coordinates for lon/lat, you could use several calls to xr.align, using the

I think for the large time chunks your intuition about missing files seems plausible. Are you able to move ahead on this without the dataset in question? Hopefully I will be able to make some progress on #2 soon and then this might go away.
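A minimal sketch of what two consecutive `xr.align` calls could look like, assuming the `exclude` argument is used to restrict each call to the dimensions of interest. The toy datasets, the integer years and the tiny latitude offset are invented for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical toy members: 'b' starts one year later and has a latitude
# coordinate that differs from 'a' by a rounding-sized offset.
a = xr.Dataset(
    {"tas": (("time", "lat"), np.ones((3, 2)))},
    coords={"time": [2015, 2016, 2017], "lat": [10.0, 20.0]},
)
b = xr.Dataset(
    {"tas": (("time", "lat"), np.ones((2, 2)))},
    coords={"time": [2016, 2017], "lat": [10.0000001, 20.0]},
)

# Call 1: outer join on time only; missing time steps are padded with NaN.
a_time, b_time = xr.align(a, b, join="outer", exclude=["lat"])

# Call 2: overwrite the remaining (lat) index of 'b' with that of 'a'.
# join='override' requires the indexes to have the same size.
a_out, b_out = xr.align(a_time, b_time, join="override", exclude=["time"])

print(b_out["time"].values)
print(b_out["lat"].values)
```

Overriding is only sensible when the grids are nominally identical and differ by rounding; genuinely different grids would need regridding instead.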
Regarding aligning lon/lat separately, this seems to work:

```python
import xarray as xr

def align_lonlat(ds_list):
    """Overwrite the lon/lat coordinates of every dataset with those of the first."""
    aligned_ds_list = []
    for ds in ds_list:
        # join='override' copies the indexes of ds_list[0]; time and member_id are left alone.
        _, aligned = xr.align(ds_list[0], ds, join='override', exclude=['time', 'member_id'])
        aligned_ds_list.append(aligned)
    return aligned_ds_list
```

but doesn't feel very optimal. I can't figure out how to pass a list of datasets to

With regards to #2, I think this issue will partially persist because some ESGF runs don't start in the same year, even if we would have them complete on the cloud. I'll test if
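If the goal is to align a whole list in one call, one option is argument unpacking, since `xr.align` accepts any number of objects. This is a sketch on toy data, not a tested replacement for the loop above; the dataset contents are hypothetical, and only `time` is excluded because the toy data has no `member_id` dimension.

```python
import numpy as np
import xarray as xr


def make_member(n_time, lat_offset):
    """Build a hypothetical toy member with a slightly shifted latitude axis."""
    return xr.Dataset(
        {"tas": (("time", "lat"), np.zeros((n_time, 2)))},
        coords={
            "time": np.arange(2015, 2015 + n_time),
            "lat": np.array([10.0, 20.0]) + lat_offset,
        },
    )


ds_list = [make_member(3, 0.0), make_member(2, 1e-7), make_member(4, -1e-7)]

# Unpack the list; with join='override' the non-excluded indexes of all
# datasets are replaced by those of the first dataset.
aligned_ds_list = list(xr.align(*ds_list, join="override", exclude=["time"]))

for ds in aligned_ds_list:
    print(ds["lat"].values, ds.sizes["time"])
```

The time axes keep their original lengths here because `time` is excluded from the alignment.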
When preselecting periods, filtering out incomplete datasets, and regridding before combining members, this is no longer an issue, so I am closing it.
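The first two steps of that workflow could be sketched roughly as follows. This is an assumption about how the preselection and filtering might look, not the code in the notebook: the helper name `preselect_and_filter`, the dictionary keys and the monthly frequency are hypothetical, and regridding is left out.

```python
import numpy as np
import pandas as pd
import xarray as xr


def preselect_and_filter(ddict, start_year, end_year, steps_per_year=12):
    """Cut every dataset to [start_year, end_year] and keep only those
    that are complete over that period (hypothetical helper)."""
    expected = (end_year - start_year + 1) * steps_per_year
    kept = {}
    for key, ds in ddict.items():
        subset = ds.sel(time=slice(str(start_year), str(end_year)))
        if subset.sizes["time"] == expected:
            kept[key] = subset
    return kept


def toy_run(start, end):
    """Monthly toy dataset from January of `start` to December of `end`."""
    time = pd.date_range(f"{start}-01-01", f"{end}-12-01", freq="MS")
    return xr.Dataset({"tas": ("time", np.zeros(time.size))}, coords={"time": time})


# 'late_start' begins in 2017 and is therefore incomplete for 2016-2018.
ddict = {"complete": toy_run(2015, 2020), "late_start": toy_run(2017, 2020)}
kept = preselect_and_filter(ddict, 2016, 2018)
print(list(kept))
```

Dropping incomplete datasets up front avoids the NaN padding altogether, at the cost of losing runs that do not cover the whole period.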
Functionality needed to combine simulations of models with unusual lengths, e.g., SSP experiments running past 2100 or historical experiments provided only after a given year later than 1850 (happens for e.g., EC-Earth3).
in https://github.com/Timh37/CMIP6cf/blob/main/notebooks/get_CMIP6_gridded_around_tgs_xmip.ipynb currently drops these datasets under the following warning (example):