
add get_partition_input #289

Merged
merged 52 commits into from
Jan 10, 2024
52 commits
7f1da4c
add get_partition_input
juliettelavoie Nov 17, 2023
1e73589
fix cftime + rename_dict
juliettelavoie Dec 11, 2023
c4c17b9
add doc example
juliettelavoie Dec 12, 2023
9bda2c3
remove comment
juliettelavoie Dec 12, 2023
a8898d9
improve doc
juliettelavoie Dec 13, 2023
146dc10
add test
juliettelavoie Dec 14, 2023
5b57c8d
Merge remote-tracking branch 'origin/main' into add-partition-extract
juliettelavoie Dec 14, 2023
9171778
comments
juliettelavoie Dec 14, 2023
61287be
Merge remote-tracking branch 'origin/main' into add-partition-extract
juliettelavoie Dec 14, 2023
deb00f6
fix test
juliettelavoie Dec 14, 2023
58dc200
fix doc
juliettelavoie Dec 14, 2023
3aaeb01
test to get make doc pass
juliettelavoie Dec 14, 2023
90ee2a0
try to remove wmo link
juliettelavoie Dec 14, 2023
41412da
another wmo link
juliettelavoie Dec 14, 2023
19d32d3
put back the links
juliettelavoie Dec 14, 2023
e236b83
put back the links
juliettelavoie Dec 14, 2023
56b45b9
Merge remote-tracking branch 'origin/fix-docs' into add-partition-ext…
juliettelavoie Dec 15, 2023
c50f2de
Merge remote-tracking branch 'origin/fix-docs' into add-partition-ext…
juliettelavoie Dec 15, 2023
81b3a45
import path
juliettelavoie Dec 15, 2023
05476ab
try to hide cell
juliettelavoie Dec 15, 2023
a48e05d
merge
juliettelavoie Dec 15, 2023
b650b79
merge
juliettelavoie Dec 15, 2023
b70bf53
Merge branch 'main' into add-partition-extract
Zeitsperre Dec 15, 2023
6912c75
import xarray
juliettelavoie Dec 18, 2023
7c7792a
try again
juliettelavoie Dec 18, 2023
46f2e85
fix array again ?
juliettelavoie Dec 18, 2023
dacd617
new cat
juliettelavoie Dec 18, 2023
9634c01
fix toctree
juliettelavoie Dec 18, 2023
5d1a65d
build_partition_data
juliettelavoie Dec 19, 2023
463c05a
new doc + fix hidden
juliettelavoie Dec 19, 2023
3ddec06
new test
juliettelavoie Dec 19, 2023
ba029b8
write the right name
juliettelavoie Dec 19, 2023
ded0556
Merge branch 'main' into add-partition-extract
juliettelavoie Dec 19, 2023
4cf76ed
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 19, 2023
6043e53
add a check
juliettelavoie Dec 19, 2023
56cc1c2
remove old func
juliettelavoie Dec 20, 2023
87dff8f
try to diagnose the RTD build problem
juliettelavoie Dec 20, 2023
138ad97
try again
juliettelavoie Dec 20, 2023
455139b
try again str ?
juliettelavoie Dec 20, 2023
2ece1d7
load the path manually to see
juliettelavoie Dec 20, 2023
114d880
remove debugging
juliettelavoie Dec 20, 2023
20d649c
remove debugging from open
juliettelavoie Dec 20, 2023
229698c
give up on reusing cat
juliettelavoie Dec 21, 2023
9d6faab
path
juliettelavoie Dec 21, 2023
ae3367d
suggestion from review
juliettelavoie Jan 9, 2024
04f5b2e
Merge branch 'main' into add-partition-extract
juliettelavoie Jan 9, 2024
2fdd40d
sparse <= 0.14 ?
juliettelavoie Jan 9, 2024
7dc2949
Merge remote-tracking branch 'origin/add-partition-extract' into add-…
juliettelavoie Jan 9, 2024
b7c1324
dev also?
juliettelavoie Jan 9, 2024
39c0e67
recipe and meta
juliettelavoie Jan 9, 2024
926f96b
toml
juliettelavoie Jan 9, 2024
df30d75
add reason
juliettelavoie Jan 9, 2024
4 changes: 4 additions & 0 deletions CHANGES.rst
@@ -24,6 +24,9 @@ New features and enhancements
* ``xs.io.round_bits`` to round floating point variable up to a number of bits, allowing for a better compression. This can be combined with the saving step through argument ``"bitround"`` of ``save_to_netcdf`` and ``save_to_zarr``. (:pull:`266`).
* Added annual global tas timeseries for CMIP6's models CMCC-ESM2 (ssp245, ssp370, ssp585), EC-Earth3-CC (ssp245, ssp585), KACE-1-0-G (ssp245, ssp370, ssp585) and TaiESM1 (ssp245, ssp370). Moved global tas database to a netCDF file. (:issue:`268`, :pull:`270`).
* Implemented support for multiple levels and models in ``xs.subset_warming_level``. Better support for `DataArray` and `DataFrame` in ``xs.get_warming_level``. (:pull:`270`).
* Added the ability to directly provide an ensemble dataset to ``xs.ensemble_stats``. (:pull:`299`).
* Added support in ``xs.ensemble_stats`` for the new robustness-related functions available in `xclim`. (:pull:`299`).
* New function ``xs.ensembles.get_partition_input`` (:pull:`289`).

Breaking changes
^^^^^^^^^^^^^^^^
@@ -41,6 +44,7 @@ Bug fixes
* `search_data_catalogs` now eliminates anything that matches any entry in `exclusions`. (:issue:`275`, :pull:`280`).
* Fixed a bug in ``xs.scripting.save_and_update`` where ``build_path_kwargs`` was ignored when trying to guess the file format. (:pull:`282`).
* Add a warning to ``xs.extract._dispatch_historical_to_future``. (:issue:`286`, :pull:`287`).
* Modify use_cftime for the calendar conversion in ``to_dataset``. (:issue:`303`, :pull:`289`).

Internal changes
^^^^^^^^^^^^^^^^
@@ -4,7 +4,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ensemble reduction\n",
"# Ensembles"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ensemble reduction\n",
"\n",
"This tutorial will explore ensemble reduction (also known as ensemble selection) using `xscen`. This will use pre-computed annual mean temperatures from `xclim.testing`."
]
@@ -39,7 +46,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparing the data\n",
"### Preparing the data\n",
"\n",
"Ensemble reduction is built upon climate indicators that are relevant to represent the ensemble's variability for a given application. In this case, we'll use the mean temperature delta between 2021-2050 and 1981-2010.\n",
"\n",
@@ -82,7 +89,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Selecting a reduced ensemble\n",
"### Selecting a reduced ensemble\n",
"\n",
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
" \n",
@@ -145,13 +152,144 @@
"\n",
"plot_rsqprofile(fig_data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ensemble partition\n",
"This tutorial will show how to use the xscen to create the input for [xclim partition functions](https://xclim.readthedocs.io/en/stable/api.html#uncertainty-partitioning)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a catalog for the tutorial\n",
"from pathlib import Path\n",
"\n",
"output_folder = Path().absolute() / \"_data\"\n",
"\n",
"project = {\n",
" \"title\": \"partition-catalog\",\n",
" \"description\": \"Catalog for the tutorial NetCDFs.\",\n",
"}\n",
"\n",
"cat = xs.ProjectCatalog(\n",
" str(output_folder / \"partition-catalog.json\"),\n",
" create=True,\n",
" project=project,\n",
" overwrite=True,\n",
")\n",
"\n",
"for bap in [\"A\", \"B\"]:\n",
" for s in [\"model1\", \"model2\"]:\n",
" df = xs.parse_directory(\n",
" directories=[f\"{Path().absolute()}/samples/tutorial/\"],\n",
" patterns=[\n",
" \"{activity}/{domain}/{institution}/{source}/{experiment}/{member}/{frequency}/{?:_}.nc\"\n",
" ],\n",
" homogenous_info={\n",
" \"mip_era\": \"CMIP6\",\n",
" \"type\": \"simulation\",\n",
" \"processing_level\": \"raw\",\n",
" \"bias_adjust_project\": bap,\n",
" \"source\": s,\n",
" },\n",
" read_from_file=[\"variable\", \"date_start\", \"date_end\"],\n",
" )\n",
"\n",
" cat.update(df)\n",
"cat.df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The function searches the catalog with `search_kw` and creates a dataset with new dimensions in `partition_dim`(`[\"source\", \"experiment\", \"bias_adjust_project\"]`). \n",
"- By default, it translates the xscen vocabulary (eg. `experiment`) to the xclim partition vocabulary (eg. `scenario`). It is possible to pass `rename_dict` to rename the dimensions with other names.\n",
"- If the inputs are not on the same grid, they can be regridded through `regrid_kw` or subset to a point through `subset_kw`. The functions assumes that if there are different `bias_adjust_project`, they will be on different grids (with all `source` on the same grid). If there is one or less `bias_adjust_project`, the assumption is that`source` have different grids."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# create a dictionnary of datasets wanted for the partition\n",
"input_dict = cat.search(variable=\"tas\", member=\"r1i1p1f1\").to_dataset_dict()\n",
"\n",
"# build a single dataset\n",
"\n",
"ds = xs.ensembles.build_partition_data(\n",
" input_dict, subset_kw=dict(name=\"mtl\", method=\"gridpoint\", lat=[45.5], lon=[-73.6])\n",
")\n",
"ds"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pass the input to an xclim partition function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"nbsphinx": "hidden"
},
"outputs": [],
"source": [
"# This is a hidden cell.\n",
"# extend with fake data to have at least 3 years\n",
"import xarray as xr\n",
"\n",
"ds2 = ds.copy()\n",
"ds[\"time\"] = xr.cftime_range(start=\"2001-01-01\", periods=len(ds[\"time\"]), freq=\"D\")\n",
"ds2[\"time\"] = xr.cftime_range(start=\"2003-01-01\", periods=len(ds[\"time\"]), freq=\"D\")\n",
"ds = xr.concat([ds, ds2], dim=\"time\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import xclim as xc\n",
"\n",
"# get a yearly dataset\n",
"da = xc.atmos.tg_mean(ds=ds)\n",
"\n",
"# compute uncertainty partitionning\n",
"mean, uncertainties = xc.ensembles.hawkins_sutton(da)\n",
"uncertainties"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-info\"> <b>NOTE</b>\n",
" \n",
"Note that the [figanos library](https://figanos.readthedocs.io/en/latest/) provides a function `fg.partition` to plot the uncertainties.\n",
" \n",
"</div>"
]
}
],
"metadata": {
"@webio": {
"lastCommId": null,
"lastKernelId": null
},
"celltoolbar": "Aucun(e)",
"language_info": {
"codemirror_mode": {
"name": "ipython",
@@ -162,7 +300,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.11.7"
}
},
"nbformat": 4,
2 changes: 1 addition & 1 deletion docs/notebooks/index.rst
@@ -8,6 +8,6 @@ Examples
1_catalog
2_getting_started
3_diagnostics
4_ensemble_reduction
4_ensembles
5_warminglevels
6_config
1 change: 0 additions & 1 deletion pyproject.toml
@@ -152,7 +152,6 @@ values = [
"gamma"
]


[tool.coverage.run]
relative_files = true
include = ["xscen/*"]
35 changes: 35 additions & 0 deletions tests/test_ensembles.py
@@ -4,6 +4,7 @@
import numpy as np
import pytest
import xarray as xr
import xesmf
from xclim.testing.helpers import test_timeseries as timeseries

import xscen as xs
@@ -1059,3 +1060,37 @@ def test_attribute_weight_error(self):
self.ens_rcm,
attribute_weights={"experiment": {"rcp45": 2, "rcp85": 1}},
)


class TestEnsemblePartition:
def test_build_partition_data(self, samplecat, tmp_path):
# test subset
datasets = samplecat.search(variable="tas").to_dataset_dict()
ds = xs.ensembles.build_partition_data(
datasets=datasets,
partition_dim=["source", "experiment"],
subset_kw=dict(name="mtl", method="gridpoint", lat=[45.0], lon=[-74]),
rename_dict={"source": "new-name"},
)

assert ds.dims == {"time": 730, "scenario": 4, "new-name": 2}
assert ds.lat.values == 45.0
assert ds.lon.values == -74
assert [i for i in ds.data_vars] == ["tas"]

# test regrid
ds_grid = xesmf.util.cf_grid_2d(-75, -74, 0.25, 45, 48, 0.55)
datasets = samplecat.search(variable="tas", member="r1i1p1f1").to_dataset_dict()
ds = xs.ensembles.build_partition_data(
datasets=datasets,
regrid_kw=dict(ds_grid=ds_grid, weights_location=tmp_path),
)

assert ds.dims == {
"scenario": 4,
"model": 1,
"time": 730,
"lat": 5,
"lon": 4,
}
assert [i for i in ds.data_vars] == ["tas"]
2 changes: 1 addition & 1 deletion xscen/catalog.py
@@ -501,7 +501,7 @@ def preprocess(ds):
ds = ensure_correct_time(ds, xrfreq)
if calendar is not None:
ds = ds.convert_calendar(
calendar, use_cftime=(calendar == "default"), align_on="date"
calendar, use_cftime=(calendar != "default"), align_on="date"
)
return ds
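The one-character fix above flips the condition: cftime-backed time coordinates are needed precisely when the target calendar is *not* xarray's `"default"` (numpy `datetime64`-backed) calendar. A minimal sketch of the corrected predicate, with the helper name chosen here for illustration only:

```python
def use_cftime_for(calendar: str) -> bool:
    """Should convert_calendar produce cftime objects for this target calendar?

    The "default" calendar maps onto numpy datetime64, so cftime objects are
    only required for every other calendar (noleap, 360_day, ...).
    """
    return calendar != "default"


# The buggy version used `calendar == "default"`, i.e. it requested cftime
# exactly when converting *to* the default calendar, and plain datetime64
# for calendars that datetime64 cannot represent.
```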

86 changes: 85 additions & 1 deletion xscen/ensembles.py
@@ -13,11 +13,17 @@
from xclim import ensembles

from .config import parse_config
from .regrid import regrid_dataset
from .spatial import subset
from .utils import clean_up, get_cat_attrs

logger = logging.getLogger(__name__)

__all__ = ["ensemble_stats", "generate_weights"]
__all__ = [
"build_partition_data",
"ensemble_stats",
"generate_weights",
]


@parse_config
@@ -664,3 +670,81 @@ def generate_weights(  # noqa: C901
weights = weights / weights.sum(dim="realization")

return weights


def build_partition_data(
datasets: Union[dict, list[xr.Dataset]],
partition_dim: list[str] = ["source", "experiment", "bias_adjust_project"],
subset_kw: dict = None,
regrid_kw: dict = None,
rename_dict: dict = None,
):
"""Get the input for the xclim partition functions.

From a list or dictionary of datasets, create a single dataset with
`partition_dim` dimensions (and time) to pass to one of the xclim partition functions
(https://xclim.readthedocs.io/en/stable/api.html#uncertainty-partitioning).
If the inputs have different grids,
they have to be subsetted and regridded to a common grid/point.


Parameters
----------
datasets : dict or list of xr.Dataset
Dictionary or list of Dataset objects that will be included in the ensemble.
The datasets should include the necessary ("cat:") attributes to understand their metadata.
Tip: With a project catalog, you can do: `datasets = pcat.search(**search_dict).to_dataset_dict()`.
partition_dim : list of str
Components of the partition. They will become the dimension of the output.
The default is ['source', 'experiment', 'bias_adjust_project'].
For source, the dimension will actually be institution_source_member.
subset_kw : dict, optional
Arguments to pass to `xs.spatial.subset()`.
regrid_kw : dict, optional
Arguments to pass to `xs.regrid_dataset()`.
rename_dict : dict, optional
Dictionary to rename the dimensions from xscen names to xclim names.
The default is {'source': 'model', 'bias_adjust_project': 'downscaling', 'experiment': 'scenario'}.

Returns
-------
xr.Dataset
The input data for the partition functions.

See Also
--------
xclim.ensembles

"""
if isinstance(datasets, dict):
datasets = list(datasets.values())
# initialize dict
subset_kw = subset_kw or {}
regrid_kw = regrid_kw or {}

list_ds = []
for ds in datasets:
if subset_kw:
ds = subset(ds, **subset_kw)

if regrid_kw:
ds = regrid_dataset(ds, **regrid_kw)

for dim in partition_dim:
if f"cat:{dim}" in ds.attrs:
ds = ds.expand_dims(**{dim: [ds.attrs[f"cat:{dim}"]]})

if "source" in partition_dim:
new_source = f"{ds.attrs['cat:institution']}_{ds.attrs['cat:source']}_{ds.attrs['cat:member']}"
ds = ds.assign_coords(source=[new_source])
list_ds.append(ds)
ens = xr.merge(list_ds)

rename_dict = rename_dict or {}
rename_dict.setdefault("source", "model")
rename_dict.setdefault("experiment", "scenario")
rename_dict.setdefault("bias_adjust_project", "downscaling")
rename_dict = {k: v for k, v in rename_dict.items() if k in ens.dims}
ens = ens.rename(rename_dict)

return ens
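To make the expand/merge/rename mechanics of the function above concrete, here is a minimal sketch of its core logic on toy datasets (the `toy_ds` helper and the `m1`/`ssp245`-style labels are illustrative; the real function additionally handles subsetting, regridding, and the `institution_source_member` labels):

```python
import numpy as np
import xarray as xr


def toy_ds(source: str, experiment: str) -> xr.Dataset:
    """Stand-in for a catalog entry, carrying the 'cat:' attributes."""
    ds = xr.Dataset({"tas": ("time", np.arange(3.0))}, coords={"time": [0, 1, 2]})
    ds.attrs = {"cat:source": source, "cat:experiment": experiment}
    return ds


datasets = [toy_ds(s, e) for s in ("m1", "m2") for e in ("ssp245", "ssp585")]

list_ds = []
for ds in datasets:
    # each catalog attribute becomes a new length-1 dimension
    for dim in ("source", "experiment"):
        ds = ds.expand_dims({dim: [ds.attrs[f"cat:{dim}"]]})
    list_ds.append(ds)

# merging aligns the length-1 dimensions into full axes (2 sources x 2 experiments)
ens = xr.merge(list_ds)

# translate the xscen vocabulary to the xclim partition vocabulary
ens = ens.rename({"source": "model", "experiment": "scenario"})
```

The resulting `ens` has `model`, `scenario`, and `time` dimensions, which is the shape the xclim partition functions expect.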