Add category conversion #269

Merged
merged 42 commits into from
Oct 28, 2024
Changes from 32 commits
Commits
952d9ac
Add initial support for converting DataArrays.
mikapfl Sep 23, 2021
2e5bb70
Depend on version of climate_categories with conversion support.
mikapfl Nov 2, 2021
2afb274
Conversions: refactor some things into own sub-functions, fix some ov…
mikapfl Nov 2, 2021
000dfc1
Conversions: refactor, do something useful with rules restricted to s…
mikapfl Nov 5, 2021
8b9b86f
Avoid name clash between weight function and its own argument.
mikapfl Nov 5, 2021
f4cc526
Require climate_categories >= 0.6.3, which introduces some API we need.
mikapfl Nov 5, 2021
506d0a9
Merge branch 'main' into mika-conversions
mikapfl Nov 14, 2023
fdbec61
style: ruff
mikapfl Nov 14, 2023
1ecee18
fix: stub file generation
mikapfl Nov 15, 2023
a7a1d9c
types: better typing for sum_rule
mikapfl Nov 15, 2023
1a2f2ba
test: some tests for correct results of convert()
mikapfl Nov 15, 2023
7e66953
perf: convert in-place
mikapfl Nov 15, 2023
a76feea
fix: types for 3.9
mikapfl Nov 15, 2023
8bf3e02
Merge branch 'main' into conversions
mikapfl Oct 7, 2024
91f9b76
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 7, 2024
80e87f0
Merge branch 'main' into conversions
mikapfl Oct 7, 2024
0849805
fix: _alias_selection is called _selection now
mikapfl Oct 7, 2024
e88f760
fix: bump minimum required version of climate_categories to something…
mikapfl Oct 7, 2024
57c8a5f
BURDI test draft
Oct 10, 2024
b9973cd
docs: add some algorithm notes
mikapfl Oct 10, 2024
4e1bb85
Merge branch 'main' into conversions
mikapfl Oct 10, 2024
77f7b16
Merge remote-tracking branch 'origin/conversions' into conversions-db
Oct 10, 2024
920999c
test for BURDI conversion
Oct 14, 2024
69c637b
ruff
Oct 14, 2024
113abda
comments
Oct 14, 2024
7428d51
add test for custom categorisations and custom conversion
Oct 17, 2024
868afbc
refactor convert
Oct 21, 2024
4ea5398
ruff
Oct 21, 2024
44c2cae
clean up
Oct 21, 2024
7ffe506
more cleanup
Oct 21, 2024
0db7373
docstring and argument passing from outer to inner convert function
Oct 21, 2024
adb202c
ruff and docstring
Oct 21, 2024
b896f64
remove _convert_inner wrapper
Oct 22, 2024
e62291d
update climate categories
Oct 24, 2024
bd9b91d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 24, 2024
1b714fd
get test data with importlib
Oct 24, 2024
73ce959
Revert "get test data with importlib"
Oct 24, 2024
2f8a35c
Merge branch 'conversions-db' of github.com:pik-primap/primap2 into c…
Oct 24, 2024
c90bb01
importlib
Oct 24, 2024
fffb84a
clean up
Oct 24, 2024
4b9bf2b
test signed commit
Oct 28, 2024
86aec44
update email for verified commits
crdanielbusch Oct 28, 2024
28 changes: 13 additions & 15 deletions primap2/_aggregate.py
@@ -33,16 +33,15 @@ def select_no_scalar_dimension(
     """
     if sel is None:
         return obj
-    else:
-        sele: DatasetOrDataArray = obj.loc[sel]
-        if dim_names(obj) != dim_names(sele):
-            raise ValueError(
-                "The dimension of the selection doesn't match the dimension of the "
-                "orginal dataset. Likely you used a selection casting to a scalar "
-                "dimension, like sel={'axis': 'value'}. Please use "
-                "sel={'axis': ['value']} instead."
-            )
-        return sele
+    selection: DatasetOrDataArray = obj.loc[sel]
+    if dim_names(obj) != dim_names(selection):
+        raise ValueError(
+            "The dimension of the selection doesn't match the dimension of the "
+            "orginal dataset. Likely you used a selection casting to a scalar "
+            "dimension, like sel={'axis': 'value'}. Please use "
+            "sel={'axis': ['value']} instead."
+        )
+    return selection


 class DataArrayAggregationAccessor(BaseDataArrayAccessor):
@@ -52,11 +51,10 @@ def _reduce_dim(
         if dim is not None and reduce_to_dim is not None:
             raise ValueError("Only one of 'dim' and 'reduce_to_dim' may be supplied, not both.")

-        if dim is None:
-            if reduce_to_dim is not None:
-                if isinstance(reduce_to_dim, str):
-                    reduce_to_dim = [reduce_to_dim]
-                dim = set(self._da.dims) - set(reduce_to_dim)
+        if dim is None and reduce_to_dim is not None:
+            if isinstance(reduce_to_dim, str):
+                reduce_to_dim = [reduce_to_dim]
+            dim = set(self._da.dims) - set(reduce_to_dim)

         return dim

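The flattened condition in `_reduce_dim` can be tried out without xarray. The following is a minimal sketch, assuming a standalone helper named `resolve_dim` (a hypothetical name, not part of primap2) that receives the dimension names explicitly instead of reading them from `self._da.dims`:

```python
from __future__ import annotations

from collections.abc import Iterable


def resolve_dim(
    dims: Iterable[str],
    dim: str | Iterable[str] | None = None,
    reduce_to_dim: str | Iterable[str] | None = None,
):
    """Hypothetical stand-in for the logic shown in the `_reduce_dim` hunk."""
    if dim is not None and reduce_to_dim is not None:
        raise ValueError("Only one of 'dim' and 'reduce_to_dim' may be supplied, not both.")

    # Single combined condition: derive `dim` only if it was not given explicitly.
    if dim is None and reduce_to_dim is not None:
        if isinstance(reduce_to_dim, str):
            reduce_to_dim = [reduce_to_dim]
        # Reduce over every dimension except the ones that should be kept.
        dim = set(dims) - set(reduce_to_dim)

    return dim


result = resolve_dim(["time", "area", "category"], reduce_to_dim="time")
```

Here `result` contains every dimension except `time`. The combined condition behaves the same as the nested version it replaces; it only removes one level of indentation.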
683 changes: 683 additions & 0 deletions primap2/_convert.py

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions primap2/_downscale.py
@@ -41,21 +41,21 @@ def downscale_timeseries(
         ----------
         dim: str
             The name of the dimension which contains the basket and its contents, has to
-            be one of the dimensions in ``ds.dims``.
+            be one of the dimensions in ``da.dims``.
         basket: str
             The name of the super-category for which values are known at higher temporal
-            resolution and/or for a wider range. A value from ``ds[dimension]``.
+            resolution and/or for a wider range. A value from ``da[dimension]``.
         basket_contents: list of str
             The name of the sub-categories. The sum of all sub-categories equals the
-            basket. Values from ``ds[dimension]``.
+            basket. Values from ``da[dimension]``.
         check_consistency: bool, default True
             If for all points where the basket and all basket_contents are defined,
             it should be checked if the sum of the basket_contents actually equals
             the basket. A ``ValueError`` is raised if the consistency check fails.
         sel: Selection dict, optional
             If the downscaling should only be done on a subset of the Dataset while
             retaining all other values unchanged, give a selection dictionary. The
-            downscaling will be done on ``da.loc[sel]``.
+            downscaling will be done on ``da.loc[sel]``.
         skipna_evaluation_dims: list of str, optional
             Dimensions which should be evaluated to determine if NA values should be
             skipped entirely if missing fully. By default, no NA values are skipped.
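The `check_consistency` behaviour described in this docstring can be sketched in plain Python. This is only an illustration under stated assumptions: `check_basket_consistency` is a hypothetical helper, plain lists with `None` for undefined points stand in for the xarray data, and the relative tolerance of 1% is an arbitrary choice, not a value taken from primap2.

```python
import math


def check_basket_consistency(basket, contents, rel_tol=0.01):
    """Raise ValueError if the basket differs from the sum of its contents.

    `basket` is a list of values (None means undefined); `contents` maps each
    sub-category name to a list of the same length.
    """
    for i, total in enumerate(basket):
        parts = [values[i] for values in contents.values()]
        # Only check points where the basket and all contents are defined.
        if total is None or any(part is None for part in parts):
            continue
        if not math.isclose(total, sum(parts), rel_tol=rel_tol):
            raise ValueError(f"Sum of basket contents does not equal basket at index {i}.")


basket = [10.0, None, 30.0]
contents = {"1.A": [4.0, 5.0, 12.0], "1.B": [6.0, 5.0, None]}
# Only index 0 is fully defined (4.0 + 6.0 equals 10.0), so no error is raised.
check_basket_consistency(basket, contents)
```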
17 changes: 16 additions & 1 deletion primap2/_selection.py
@@ -53,7 +53,22 @@ def resolve_not(


 def translate(item: KeyT, translations: typing.Mapping[typing.Hashable, str]) -> KeyT:
-    """Translate primap2 short names into xarray names."""
+    """Translates a single str key or the keys of a dict using the given translations.
+
+    If a key is not found in the translations, return it untranslated.
+
+    Parameters
+    ----------
+    item : str or dict with str keys
+        The input to translate. Either a str or a dict with str keys.
+    translations : dict
+        The translations to apply.
+
+    Returns
+    -------
+    translated : str or dict with str keys
+        The same type as the input item, but translated.
+    """
     if isinstance(item, str):
         if item in translations:
             return translations[item]
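The contract stated in the new docstring (translate a str or the keys of a dict, pass unknown keys through unchanged) can be sketched as follows. Only the str branch is visible in the hunk, so the dict branch here is an assumption, and `translate_sketch` as well as the example translations are hypothetical names chosen for illustration.

```python
from collections.abc import Hashable, Mapping


def translate_sketch(item, translations: Mapping[Hashable, str]):
    """Translate a str, or the keys of a dict, leaving unknown keys untouched."""
    if isinstance(item, str):
        return translations.get(item, item)
    # Assumed dict branch: translate the keys and keep the values as they are.
    return {translations.get(key, key): value for key, value in item.items()}


# Hypothetical short-name to full-name translations.
translations = {"category": "category (IPCC2006)", "area": "area (ISO3)"}

single = translate_sketch("category", translations)
unknown = translate_sketch("time", translations)
selection = translate_sketch({"area": ["COL"], "time": "2020"}, translations)
```

The return type mirrors the input type, which is what the `KeyT` annotation in the signature expresses.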
2 changes: 2 additions & 0 deletions primap2/accessors.py
@@ -3,6 +3,7 @@
 import xarray as xr

 from ._aggregate import DataArrayAggregationAccessor, DatasetAggregationAccessor
+from ._convert import DataArrayConversionAccessor
 from ._data_format import DatasetDataFormatAccessor
 from ._downscale import DataArrayDownscalingAccessor, DatasetDownscalingAccessor
 from ._fill_combine import DataArrayFillAccessor, DatasetFillAccessor
@@ -37,6 +38,7 @@ class PRIMAP2DatasetAccessor(
 class PRIMAP2DataArrayAccessor(
     DataArrayAggregationAccessor,
     DataArrayAliasSelectionAccessor,
+    DataArrayConversionAccessor,
     DataArrayDownscalingAccessor,
     DataArrayMergeAccessor,
     DataArrayOverviewAccessor,
40 changes: 40 additions & 0 deletions primap2/tests/data/BURDI_conversion.csv
@@ -0,0 +1,40 @@
# references: non_annex1_data repo
# last_update: 2024-10-14
Collaborator (Author) commented:

@JGuetschow does the BURDI to IPCC2006_PRIMAP conversion make sense to you? I took the rules from the config in the UNFCCC_non-AnnexI_data repo. The rules here combine the mapping and aggregation steps. However, we still need some aggregation after the conversion, for example for "3": {"sources": ["M.AG", "M.LULUCF"]}. If I remember correctly, that is beyond the scope of what the conversion should do?

Contributor commented:

You could add 4 + 5, 3 as a rule to aggregate AFOLU as you already do the same for 2 and 3. I think we just wanted to exclude downscaling, not aggregation.

BURDI,IPCC2006_PRIMAP,comment
1,1
1.A,1.A
1.A.1,1.A.1
1.A.2,1.A.2
1.A.3,1.A.3
1.A.4,1.A.4
1.A.5,1.A.5
1.B,1.B
1.B.1,1.B.1
1.B.2,1.B.2
2 + 3,2
2.A,2.A
2.B + 2.E,2.B
2.C,2.C
2.F,2.F
2.G + 2.D, 2.H
3,2.D
4,M.AG
4.A,3.A.1
4.B,3.A.2
4.C,3.C.7
4.D + 4.C + 4.E + 4.F + 4.G,3.C
Contributor commented:

It would be good to also map 4.D to a single category; otherwise we have some of the 3.C subcategories but not all, which can lead to confusion and problems in category aggregation and consistency checks. I would add an M-category to IPCC2006_PRIMAP.

Collaborator (Author) commented:

OK, thanks. I will take this to another PR, and maybe move it to climate_categories, because that is where the conversion will live.

4.E,3.C.1.c
4.F,3.C.1.b
4.G,3.C.8
5,M.LULUCF
6,4
6.A,4.A
6.B,4.D
6.C,4.C
6.D,4.E
24540,0
15163,M.0.EL
14637,M.BK
14424,M.BK.A
14423,M.BK.M, leaving 14638 --> M.BIO out for now, as it's not in climate categories
7,5, 5.A-D ignored as not fitting 2006 cats
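To make the combined mapping and aggregation in rules such as `2 + 3,2` concrete, here is a minimal sketch that applies a few of the rules above to made-up numbers. It is not the algorithm in `primap2/_convert.py` (that diff is not rendered on this page); `parse_rule` and `apply_rules` are hypothetical helpers, the emission values are invented, and skipping rules with missing source categories is an assumption.

```python
def parse_rule(line):
    """Split 'SRC [+ SRC ...],TARGET[,comment]' into (sources, target)."""
    source_expr, target, *_comment = line.split(",", 2)
    sources = [code.strip() for code in source_expr.split("+")]
    return sources, target.strip()


def apply_rules(values, rule_lines):
    """Sum the source categories of each rule into its target category."""
    converted = {}
    for line in rule_lines:
        sources, target = parse_rule(line)
        # Assumption: a rule is skipped unless all of its sources have data.
        if all(code in values for code in sources):
            converted[target] = sum(values[code] for code in sources)
    return converted


# Invented BURDI values, keyed by category code.
burdi = {"1": 100.0, "2": 20.0, "3": 5.0, "2.B": 3.0, "2.E": 1.0}
rules = ["1,1", "2 + 3,2", "2.B + 2.E,2.B", "4,M.AG"]

ipcc2006 = apply_rules(burdi, rules)
```

With these inputs, BURDI categories 2 and 3 are summed into IPCC2006_PRIMAP category 2, and the rule for category 4 is skipped because no data exists for it. The sketch only handles plain sums of source categories into one target.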
35 changes: 35 additions & 0 deletions primap2/tests/data/simple_categorisation_a.yaml
@@ -0,0 +1,35 @@
name: A
title: Simple Categorization
comment: A simple example categorization without relationships between categories
references: doi:00000/00000
institution: PIK
last_update: 2021-02-23
hierarchical: no
version: 1
categories:
  1:
    title: Category 1
    comment: The first category
    alternative_codes:
      - A
      - CatA
    info:
      important_data:
        - A
        - B
        - C
      other_important_thing: ABC
  2:
    title: Category 2
    comment: The second category
    alternative_codes:
      - B
      - CatB
  3:
    title: Category 3
    comment: The third category
    alternative_codes:
      - C
      - CatC
  unnumbered:
    title: The unnumbered category
27 changes: 27 additions & 0 deletions primap2/tests/data/simple_categorisation_b.yaml
@@ -0,0 +1,27 @@
name: B
title: Simple Categorization
comment: A simple example categorization without relationships between categories
references: doi:00000/00000
institution: PIK
last_update: 2021-02-23
hierarchical: no
version: 1
categories:
  1:
    title: Category 1
    comment: The first category
    alternative_codes:
      - A
      - CatA
    info:
      important_data:
        - A
        - B
        - C
      other_important_thing: ABC
  2:
    title: Category 2
    comment: The second category
    alternative_codes:
      - B
      - CatB
5 changes: 5 additions & 0 deletions primap2/tests/data/simple_conversion.csv
@@ -0,0 +1,5 @@
# references: test
# last_update: 2024-10-14
A,B,comment
1,1, no comment
2+3,2
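The layout of this test file (leading `#` metadata lines, a header row naming the two categorisations, then one rule per row with an optional comment) can be read with the standard library alone. The sketch below is an assumption about how such a file could be parsed for inspection; `read_conversion` is a hypothetical helper and not the parser that climate_categories uses.

```python
import csv

SIMPLE_CONVERSION = """\
# references: test
# last_update: 2024-10-14
A,B,comment
1,1, no comment
2+3,2
"""


def read_conversion(text):
    """Return (metadata dict, list of rule dicts) for a conversion CSV."""
    meta, data_lines = {}, []
    for line in text.splitlines():
        if line.startswith("#"):
            # Metadata lines look like '# key: value'.
            key, _, value = line.lstrip("# ").partition(":")
            meta[key.strip()] = value.strip()
        elif line.strip():
            data_lines.append(line)

    reader = csv.reader(data_lines)
    header = next(reader)
    rules = []
    for row in reader:
        # Pad rows that omit the optional comment column.
        row = [cell.strip() for cell in row] + [""] * (len(header) - len(row))
        rules.append(dict(zip(header, row)))
    return meta, rules


meta, rules = read_conversion(SIMPLE_CONVERSION)
```

After parsing, `meta` holds the two metadata entries and `rules` holds one dict per rule, keyed by the header names `A`, `B`, and `comment`.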