Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add category conversion #269

Merged
merged 42 commits into from
Oct 28, 2024
Merged

Add category conversion #269

merged 42 commits into from
Oct 28, 2024

Conversation

crdanielbusch
Copy link
Collaborator

@crdanielbusch crdanielbusch commented Oct 10, 2024

Pull request

Please confirm that this pull request has done the following:

  • Tests added
  • Documentation added (where applicable)
  • Description in a {pr}.thing.md file in the directory changelog added - see changelog/README.md for details

Description

Continues Mika's work from #74. We want to be able to convert categories from one standard into another, for example from IPCC1996 to IPCC2006. More info here primap-community/climate_categories#160

@crdanielbusch crdanielbusch changed the base branch from main to conversions October 10, 2024 07:55
@@ -0,0 +1,40 @@
# references: non_annex1_data repo
# last_update: 2024-10-14
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JGuetschow does the BURDI to IPCCC2006_PRIMAP conversion make sense to you? I took took the rules from the config in the UNFCCC_non-AnnexI_data repo. The rules here combine the mapping and aggregation step. However we still need some aggregation after the conversion, for example for "3": {"sources": ["M.AG", "M.LULUCF"]. That's beyond the scope of what the conversion should do if I remember correctly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add 4 + 5, 3 as a rule to aggregate AFOLU as you already do the same for 2 and 3. I think we just wanted to exclude downscaling, not aggregation.

@crdanielbusch crdanielbusch requested a review from mikapfl October 21, 2024 13:05
Comment on lines 15 to 21
def _convert_inner(
self,
dim: Hashable | str,
categorization: climate_categories.Categorization | str,
*,
# TODO type will change to climate_categories.Conversion when
# https://github.com/primap-community/climate_categories/pull/164 is merged
conversion: climate_categories._conversions.Conversion,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I see the definition of _convert_inner, I think we can go all-in with this approach. As you said in chat, we can leave the whole "get the right conversion" business to climate_categories, where we have functions for this already. So, we just call this here convert and throw away the wrapper which fetches the conversion from a name. The resulting call for a "simple" conversion would look like this:

import climate_categories as cc

conversion = cc.BURDI.conversion_to(cc.IPCC2006_PRIMAP)
da.convert("category", conversion=conversion)

for me, this is actually more transparent than what the current convert does:

da.convert("category", "IPCC2006_PRIMAP")

Sure, the current convert function needs less typing, but it also hides where the conversions come from, so as soon as you define your own categorization which is not built into climate_categories, it is unclear how you would use it. In contrast, using the "inner" function as the public API, it is obvious that you can build your own conversion and pass that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I originally designed the API of convert to be similar to e.g. GWP and unit conversion. However, for units and also global warming potentials, we assume that user-defined units and GWPs will be super uncommon, so hiding the complexities from the user made sense. So, users don't have to pass a Unit object, they can simply pass a string with a name.

However, here, I think we already know from the non-AnnexI repo that we will frequently have to modify the conversion before we can use (or use a fully custom conversion). In this case, users have a Conversion object already anyway, so it makes sense to use the Conversion object instead of names.

But we should check that the passed Conversion actually converts from the actual source categorization of the data and refuse to convert FAOstat categories using a conversion from BURDI to IPCC2006.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes all sense to me

Copy link
Collaborator Author

@crdanielbusch crdanielbusch Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the input Mika and Johannes. Mika if you could have a quick look at _convert.py again, that would be great. I have now integrated the changes from the latest climate categories release. In the end, only a few lines of code changed.

Copy link
Contributor

@JGuetschow JGuetschow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a few comments. Have to submit a review so they are actually visible.

Comment on lines 15 to 21
def _convert_inner(
self,
dim: Hashable | str,
categorization: climate_categories.Categorization | str,
*,
# TODO type will change to climate_categories.Conversion when
# https://github.com/primap-community/climate_categories/pull/164 is merged
conversion: climate_categories._conversions.Conversion,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes all sense to me

4.A,3.A.1
4.B,3.A.2
4.C,3.C.7
4.D + 4.C + 4.E + 4.F + 4.G,3.C
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to also map 4.D to single category, else we have some of the 3.C subcategories but not all which can lead to confusion and problems in category aggregation and consistency checks. I would add an M-category to IPCC2006_PRIMAP

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks. I will take this to another PR. And maybe move it to climate categories, because this is where the conversion will live.

@@ -0,0 +1,40 @@
# references: non_annex1_data repo
# last_update: 2024-10-14
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add 4 + 5, 3 as a rule to aggregate AFOLU as you already do the same for 2 and 3. I think we just wanted to exclude downscaling, not aggregation.

@mikapfl
Copy link
Member

mikapfl commented Oct 25, 2024

Before you merge, you need to change the target of the PR to main (currently conversions).

Copy link
Member

@mikapfl mikapfl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, definitely a cleaner API now, and adding the tests is super useful for further development. 🙂

Comment on lines +39 to +40
categorization. Contains ``climate_categories.Categorization``
object for old and new categorization.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
categorization. Contains ``climate_categories.Categorization``
object for old and new categorization.
categorization. Contains the ``climate_categories.Categorization``
objects for the old and the new categorization.

@mikapfl
Copy link
Member

mikapfl commented Oct 25, 2024

Changes we also discussed which aren't in this PR yet, but will be much easier now with the added tests:

  • remove the sum_rule argument and code, always assume extensive. Add documentation that this is only meant to be used with extensive quantities.
  • remove the input_weights and output_weights arguments and code. Since we always assume extensive, the input_weights are never needed, and we decided we don't want to do downscaling in the category conversion function, so output_weights are also not needed. We have to deal with multiple categories in the target differently. Ideas we had so far:
    • e.g. if you have categories 1.A and 1.B on the rhs, automatically add an A.1.A+1.B category (name up to debate).
    • if there is a category which if the sum of the categories on the rhs already, and the rhs categorization is total_sum, simply use the existing super-category. E.g. if the rule specifies categories 1.A and 1.B, and in the rhs categorization 1 has the children 1.A and 1.B only and is total_sum, simply use 1 as the rhs category.
    • maybe don't add automatic A. categories in the conversion, raise an error instead and the user has to add appropriate M. categories to the categorization before the conversion.

Probably, these changes deserve their own PR each. Also, still some open questions around best usability re the downscaling question. My suggestion: In a first PR remove the sum_rule, in a second PR which builds on the first remove the input_weights and output_weights and raise errors instead if downscaling would be necessary. Then, start work on an actual conversion and add downscaling support as needed. Maybe we don't need anything automatic, maybe some automation is useful, but that will be much clearer once e.g. the FAO conversion is written and adding also these super-categories by hand is annoying or actually rather pleasant.

@crdanielbusch crdanielbusch changed the base branch from conversions to main October 28, 2024 08:07
@crdanielbusch crdanielbusch marked this pull request as ready for review October 28, 2024 08:14
Copy link

codecov bot commented Oct 28, 2024

Codecov Report

Attention: Patch coverage is 92.66055% with 16 lines in your changes missing coverage. Please review.

Project coverage is 96.96%. Comparing base (bfc0fb8) to head (86aec44).
Report is 94 commits behind head on main.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
primap2/_convert.py 88.73% 16 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #269      +/-   ##
==========================================
- Coverage   97.20%   96.96%   -0.24%     
==========================================
  Files          47       49       +2     
  Lines        4397     4612     +215     
==========================================
+ Hits         4274     4472     +198     
- Misses        123      140      +17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@crdanielbusch crdanielbusch merged commit ce5362a into main Oct 28, 2024
17 checks passed
@crdanielbusch crdanielbusch deleted the conversions-db branch October 28, 2024 09:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants