
one-to-many conversion #284

Merged: 11 commits merged on Nov 7, 2024

@crdanielbusch (Collaborator) commented Oct 29, 2024

Pull request

Please confirm that this pull request has done the following:

  • Tests added
  • Documentation added (where applicable)
  • Description added in a {pr}.thing.md file in the changelog directory (see changelog/README.md for details)

Description

Functionality added in this pull request:

  • one-to-many conversion: Input and output weights are no longer supported (see PR "remove input and output weights from convert function" #282). This means a category (or a group of categories) cannot be split into a group of categories: e.g. 4.B.10 + 4.B.11 + 4.B.12 + 4.B.13 -> 3.A.2.j works, but 2.C.5 -> 2.C.5 + 2.C.6 + 2.C.7 cannot distribute 2.C.5 over the three target categories. When there are several categories on the target side, we now create a new category instead, e.g. the rule 2.C.5 -> 2.C.5 + 2.C.6 + 2.C.7 yields the new category M_2.C.5_2.C.6_2.C.7.
  • one-to-one conversion where the target category is not part of the target categorisation: If a category on the target side does not exist, we need to create it in the respective categorisation in climate categories. Sometimes a category has no matching target category or group of categories, but its information should not be lost in the target categorisation, e.g. 4.D -> M.3.C.45.AG, where M.3.C.45.AG is the sum of the agriculture-related emissions of 3.C.4 and 3.C.5. In this case the user needs to add the category manually to the categorisation in climate categories. We think this will only be needed for IPCC2006_PRIMAP.
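The target-naming behaviour described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not primap2's actual code: the rule syntax "A + B -> C + D" is taken from the examples in this PR, parse_rule and target_category are hypothetical helpers, and the target categorisation is modelled as a plain set of category codes rather than a climate categories object.

```python
# Hypothetical sketch of the target-naming behaviour described in this PR.
# Not primap2's implementation: the rule syntax is taken from the examples
# above, and the target categorisation is modelled as a plain set of codes.


def parse_rule(rule: str) -> tuple[list[str], list[str]]:
    """Split "A + B -> C + D" into source and target category lists."""
    lhs, rhs = rule.split("->")
    sources = [c.strip() for c in lhs.split("+")]
    targets = [c.strip() for c in rhs.split("+")]
    return sources, targets


def target_category(targets: list[str], target_categorisation: set[str]) -> str:
    """Return the single category that receives the converted data."""
    unknown = [c for c in targets if c not in target_categorisation]
    if unknown:
        # e.g. M.3.C.45.AG must first be added to the categorisation by hand
        raise ValueError(f"not in target categorisation: {unknown}")
    if len(targets) == 1:
        return targets[0]
    # Several target categories: combine them into one new category.
    return "M_" + "_".join(targets)


ipcc = {"2.C.5", "2.C.6", "2.C.7", "3.A.2.j"}
_, targets = parse_rule("2.C.5 -> 2.C.5 + 2.C.6 + 2.C.7")
print(target_category(targets, ipcc))  # M_2.C.5_2.C.6_2.C.7
```

The asymmetry raised in the first question shows up directly here: a rule with several known target categories gets an automatic M_ name, while a single unknown target such as M.3.C.45.AG raises an error.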

Questions:

  • Does it make sense that a one-to-many conversion creates a new category automatically, while an unknown single category needs to be added to climate categories by hand? For example, you could write the rule 2.C.5 -> 2.C.5 + 2.C.6 + 2.C.7 in your conversion and the new category M_2.C.5_2.C.6_2.C.7 will be created (if all three categories are part of your target categorisation). At the same time, the rule 4.D -> M.3.C.45.AG will raise an error, because M.3.C.45.AG is not part of the target categorisation.
  • At what point should we move to a real-world example (FAO) instead of thinking about theoretical examples? Personally, I would rather have a simple conversion function and then develop it further as we read the FAO data.

Background (Mika):

  • We have to deal with multiple categories in the target differently. Ideas we had so far:

    • e.g. if you have categories 1.A and 1.B on the rhs, automatically add an A.1.A+1.B category (name up to debate).
    • if a category already exists that is the sum of the categories on the rhs, and the rhs categorization is total_sum, simply use the existing super-category. E.g. if the rule specifies categories 1.A and 1.B, and in the rhs categorization 1 has only the children 1.A and 1.B and is total_sum, simply use 1 as the rhs category.
    • maybe don't add automatic A. categories in the conversion at all; instead raise an error, and the user has to add appropriate M. categories to the categorization before the conversion.
  • In this PR: start work on an actual conversion and add downscaling support as needed. Maybe we don't need anything automatic, maybe some automation is useful, but that will be much clearer once e.g. the FAO conversion is written and we see whether adding these super-categories by hand is annoying or actually rather pleasant.
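The first two ideas could be combined roughly as follows. This is a hypothetical sketch: the rhs categorization is modelled as a plain dict from a parent code to the set of its child codes plus a total_sum flag, which is an assumption for illustration and not how climate categories stores hierarchies; find_super_category and resolve_targets are made-up names.

```python
# Hypothetical sketch combining the first two ideas from the list above.
# Assumption: the rhs categorisation is a plain dict mapping each parent code
# to the set of its child codes, plus a total_sum flag.


def find_super_category(targets, children, total_sum):
    """Return an existing category whose children are exactly `targets`, else None."""
    if not total_sum:
        return None  # parents are not guaranteed to equal the sum of their children
    wanted = set(targets)
    for parent, kids in children.items():
        if set(kids) == wanted:
            return parent
    return None


def resolve_targets(targets, children, total_sum):
    """Pick the rhs category for a rule with one or more target categories."""
    if len(targets) == 1:
        return targets[0]
    parent = find_super_category(targets, children, total_sum)
    if parent is not None:
        return parent  # reuse the existing super-category
    return "A." + "+".join(targets)  # automatic category (name up to debate)


children = {"1": {"1.A", "1.B"}, "1.A": {"1.A.1", "1.A.2"}}
print(resolve_targets(["1.A", "1.B"], children, total_sum=True))   # 1
print(resolve_targets(["1.A", "1.B"], children, total_sum=False))  # A.1.A+1.B
```

The third idea would replace the final return with a raised error, so that the user adds an appropriate M. category to the categorization before running the conversion.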

@crdanielbusch self-assigned this Oct 29, 2024

codecov bot commented Oct 29, 2024

Codecov Report

Attention: Patch coverage is 97.77778% with 1 line in your changes missing coverage. Please review.

Project coverage is 96.68%. Comparing base (ae21fe8) to head (898491b).
Report is 12 commits behind head on remove-weights.

✅ All tests successful. No failed tests found.

Files with missing lines | Patch % | Lines
primap2/tests/test_convert.py | 96.29% | 1 Missing ⚠️
Additional details and impacted files
@@                Coverage Diff                 @@
##           remove-weights     #284      +/-   ##
==================================================
- Coverage           96.80%   96.68%   -0.12%     
==================================================
  Files                  49       49              
  Lines                4598     4619      +21     
==================================================
+ Hits                 4451     4466      +15     
- Misses                147      153       +6     


@JGuetschow (Contributor) left a comment

Looks good, but some small comments and some questions.

primap2/_convert.py (outdated, resolved)
primap2/tests/data/BURDI_conversion.csv (outdated, resolved)
primap2/tests/test_convert.py (outdated, resolved)
# rule 7 -> 5
assert (result.pr.loc[{"category": "5"}] == 1.0 * primap2.ureg("Gg CO2 / year")).all().item()
# rule 2.F.6 -> 2.E + 2.F.6 + 2.G.1 + 2.G.2 + 2.G.4,
# rule 2.F.6 + 3.D -> 2.E + 2.F.6 + 2.G - ignored because 2.F.6 already converted
Contributor

Ignoring rules can be dangerous: if we convert data for more than one country, data coverage is likely to differ between countries, so one rule might apply to one country while another rule is needed for another country. I know this is not where the ignoring happens, but I wanted to raise the point anyway.

@crdanielbusch (Collaborator, Author), Nov 5, 2024

Agreed. Maybe that's something I can tackle in a follow-up pull request, since we want to add a feature to add/remove rules dynamically anyway.

About this particular case: am I missing something, or can we change the rules from

2.F.6 -> 2.E + 2.F.6 + 2.G.1 + 2.G.2 + 2.G.4
2.F.6 + 3.D -> 2.E + 2.F.6 + 2.G

to

2.F.6 -> 2.E + 2.F.6 + 2.G.1 + 2.G.2 + 2.G.4
3.D -> 2.G.3
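To see why such a rewrite matters for the concern about ignored rules, here is a toy model of rule application. It assumes, as the test comment "ignored because 2.F.6 already converted" suggests, that rules are applied in order and a rule is skipped once any of its source categories has already been converted; apply_rules is a hypothetical helper, not primap2's API.

```python
# Toy model of ordered rule application (hypothetical, not primap2's code).
# Assumption: a rule is skipped if any of its source categories was already
# converted by an earlier rule, or if a source is missing from the data.


def apply_rules(rules, available):
    """Return (applied rules, categories in the data that were never converted)."""
    converted = set()
    applied = []
    for sources, targets in rules:
        if any(s in converted for s in sources):
            continue  # source already used by an earlier rule
        if not all(s in available for s in sources):
            continue  # this country does not report all source categories
        converted.update(sources)
        applied.append((sources, targets))
    return applied, set(available) - converted


original = [
    (["2.F.6"], ["2.E", "2.F.6", "2.G.1", "2.G.2", "2.G.4"]),
    (["2.F.6", "3.D"], ["2.E", "2.F.6", "2.G"]),
]
rewritten = [
    (["2.F.6"], ["2.E", "2.F.6", "2.G.1", "2.G.2", "2.G.4"]),
    (["3.D"], ["2.G.3"]),
]

data = {"2.F.6", "3.D"}
print(apply_rules(original, data)[1])   # {'3.D'}  (3.D is silently dropped)
print(apply_rules(rewritten, data)[1])  # set()
```

Under this model, the original rules never convert 3.D (neither when 2.F.6 is reported nor when only 3.D is), while the rewritten rules convert both categories independently of what a given country reports.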

Contributor

If 2.G has only the subcategories 2.G.1 to 2.G.4, it looks like the rules can be rewritten. But I did not make the mapping, so there might be a motivation behind the rules that I don't know of.
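The condition on 2.G can be checked mechanically: adding up the two rewritten rules gives 2.F.6 + 3.D -> 2.E + 2.F.6 + 2.G.1 + 2.G.2 + 2.G.3 + 2.G.4, which matches the combined rule 2.F.6 + 3.D -> 2.E + 2.F.6 + 2.G only if 2.G is exactly the sum of those four children. The sketch compares both after expanding parents into their children; the dict-based hierarchy, expand, summed_rule, and the extra category 2.G.5 are hypothetical inputs for illustration.

```python
# Hypothetical check of whether the two rewritten rules add up to the combined
# rule. Assumption: a plain dict maps each parent code to its child codes.


def expand(categories, children):
    """Replace every category that has children by its leaf descendants."""
    leaves = set()
    for cat in categories:
        if cat in children:
            leaves |= expand(children[cat], children)
        else:
            leaves.add(cat)
    return leaves


def summed_rule(rules, children):
    """Add up rules: union of all sources and of all leaf-expanded targets."""
    sources = set().union(*(set(src) for src, _ in rules))
    targets = set().union(*(expand(tgt, children) for _, tgt in rules))
    return sources, targets


combined_rule = [(["2.F.6", "3.D"], ["2.E", "2.F.6", "2.G"])]
rewritten = [
    (["2.F.6"], ["2.E", "2.F.6", "2.G.1", "2.G.2", "2.G.4"]),
    (["3.D"], ["2.G.3"]),
]

only_1_to_4 = {"2.G": {"2.G.1", "2.G.2", "2.G.3", "2.G.4"}}
with_extra = {"2.G": {"2.G.1", "2.G.2", "2.G.3", "2.G.4", "2.G.5"}}  # hypothetical 2.G.5

print(summed_rule(rewritten, only_1_to_4) == summed_rule(combined_rule, only_1_to_4))  # True
print(summed_rule(rewritten, with_extra) == summed_rule(combined_rule, with_extra))  # False
```

Because the check works on sets, it only confirms which categories are covered on each side; it does not detect a category being counted twice.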

primap2/tests/test_convert.py (outdated, resolved)
@JGuetschow (Contributor) left a comment

Looks great.


primap2/_convert.py (outdated, resolved)
@crdanielbusch merged commit 2e1363c into remove-weights on Nov 7, 2024
17 checks passed
@crdanielbusch deleted the 1-to-n-mapping branch on November 7, 2024, 15:34
@crdanielbusch restored the 1-to-n-mapping branch on November 7, 2024, 15:55
2 participants