-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix discovered issues and improve MESSAGEix-Materials #201
Conversation
9bbc70e
to
ea92d38
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #201 +/- ##
=======================================
- Coverage 52.2% 51.9% -0.3%
=======================================
Files 141 142 +1
Lines 11346 11250 -96
=======================================
- Hits 5925 5843 -82
+ Misses 5421 5407 -14
|
description: >- | ||
Does not include Saint Helena, Ascension and Tristan da Cunha. | ||
In France-> Mayotte, Reunion data is present post 2010. | ||
IEA MAP: AFRICATOT - (DZA+EGY+LBY+MAR+SDN+SSD+TUN)->IIASA_AFRICA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear what this description line and the following line mean, or how they are actually/intended to be used in the code:
- 'AFRICATOT' (for instance) is not defined anywhere.
- 'IIASA_AFRICA' (for instance) is not defined anywhere.
- This added file is not mentioned in doc/pkg-data/node.rst.
- The header comment seems to be exactly the same as the one in R12.yaml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file was shared by @SiddharthJoshi-Git together with a copy of the preprocessed IEA EWEB 2023. It is used to group and aggregate IEA EWEB data to R12 regions for calibration purposes in MESSAGEix-Materials. Since the file is similar to R12.yaml with main differences in the child
values, does it make more sense to merge it into the existing R12.yaml? Otherwise we can also rename the file since R12_SSP_V1.yaml is maybe not very indicative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used to group and aggregate IEA EWEB data to R12 regions for calibration purposes in MESSAGEix-Materials.
Can you point to where specifically this happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this happens in material/data_util.py. The top-level functions are modify_industry_demand()
and add_new_ind_hist_act()
, which are called in model.material.build()
. They read a parquet file containing the EWEB data, map it to R12 and adjust existing demand
and historical_activity
parametrization. At the moment there are still some hardcoded parts in the whole procedure and inefficient parts (e.g. reading the parquet file more then once instead of passing it to the respective function).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I see. I recall this was alluded to in a MESSAGE meeting recently—that some code, somewhere, is being used to process the IEA EWEB into a Parquet file, including some spatial manipulations that are not in .tools.iea.web
. Not having seen that code, I can't guess at what it contains, but from the code in message_ix_models.model.material
/this branch, it seems some of these labels like "AFRICATOT" are introduced or used by it.
There are several ways in which this appears to duplicate existing message_ix_models
code that has tests and docs and is more concise. We need to be careful to avoid this. However, to avoid expanding scope too much, I won't insist that they be removed/reworked this PR, but will mention them as further TODOs in #194 or subsidiary to it.
As for what to do in this PR:
Per your link, I see the file is used in two calls to this function
message-ix-models/message_ix_models/model/material/data_util.py
Lines 921 to 950 in 36fc249
def map_iea_db_to_msg_regs(df_iea: pd.DataFrame, reg_map_fname: str) -> pd.DataFrame: | |
""" | |
Parameters | |
---------- | |
df_iea | |
df containing the IEA energy balances data set | |
reg_map_fname | |
name of file used for mapping countries to MESSAGEix regions | |
Returns | |
------- | |
object | |
""" | |
file_path = package_data_path("node", reg_map_fname) | |
yaml_data = read_yaml_file(file_path) | |
if "World" in yaml_data.keys(): | |
yaml_data.pop("World") | |
r12_map = {k: v["child"] for k, v in yaml_data.items()} | |
r12_map_inv = {k: v[0] for k, v in invert_dictionary(r12_map).items()} | |
df_iea = df_iea.merge( | |
pd.DataFrame.from_dict( | |
r12_map_inv, orient="index", columns=["REGION"] | |
).reset_index(), | |
left_on="COUNTRY", | |
right_on="index", | |
).drop("index", axis=1) | |
return df_iea |
I am guessing here because of the lack of tests, but it appears most of the contents of R12_SSP_V1.yaml
are not used. Rather, only the child:
field is used to populate a "REGION" column in a data frame that has a "COUNTRY" column. Is that correct?
If so, this can be simplified a lot using existing features. For example, see:
iea-weo-region: Africa |
I suspect we can add another ‘annotation’ named, e.g.
material-iea-eweb-region
and containing these values like "CHINAREG". Then the file can be read with get_codes()
, and MappingAdapter
can be used to do the relabeling.
I'd be happy to push some commits to make these changes; although again it will be difficult for me to ensure that I preserve behaviour without tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just pushed the modifications, including one added test. Please have a look and try them out with the private data file.
There was one ambiguous point I couldn't figure out from what's visible on the branch. The existing R12.yaml file contains, for example:
R12_NAM:
child: [CAN, GUM, PRI, SPM, USA, VIR]
…whereas the added file contained:
R12_NAM:
child: [CAN, USA]
It's not clear why this is different. Does the private data file contain data for e.g. "GUM" that must be omitted for a correct result? Or does the added YAML file only reflect a subset of codes that actually appear in that private data file?
In the latter case, note that it's harmless to keep "GUM → R12_NAM" in the constructed mapping; since there will be no observations with COUNTRY=GUM, the entry simply goes unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I manually tested with the private data file now. It seems that parametrization is identical except for R12_EEU. The private data file contains entries, where COUNTRY == "YUG", until 2021. This means there is a double counting of all countries that formerly belonged to Yugoslavia. According to the IEA EWEB documentation, IEA reports Yugoslavia under two codes: YUGOND and MYUGO. But I don't know how YUG data was derived in the private data file. So we need to either revisit the pre-processing of the EWEB data or modify the mapping in R12.yaml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing the check. The annotations I added to R12.yaml include:
material-region: ["*", SCG, YUG, KOSOVO]
The file added on the branch has:
child: [ALB, BGR, BIH, CZE, EST, HRV, HUN, LTU, LVA, MKD, MNE, POL, ROU, SRB, SVK, SVN, KOSOVO]
Curiously the latter does not include any of YUG, YUGOND, or MYUGO, which really makes me scratch my head.
But anyway, with the goal of identical behaviour, in the former, "*" refers to the list of children for the same node, which I see already includes "YUG". So you could try to edit to:
material-region: ["*", SCG, KOSOVO]
(don't duplicate, although with the way the dict is constructed, I don't think this will make any change)
or, most verbose:
material-region: [ALB, BGR, BIH, CZE, EST, HRV, HUN, LTU, LVA, MKD, MNE, POL, ROU, SRB, SVK, SVN, KOSOVO]
…which simply copies the value from the added file—without clarifying why the Yugoslavia-related codes are omitted.
Could you please try those two settings and see what happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applying the second suggested setting leads to the correct parametrization. So this would be a viable solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, good to confirm. If this works, then perhaps you can git rebase -i main
your branch and put d
/drop
beside the commit that added the R12_SSP_V1.yaml file (if it was a single commit. If it was added with other files, try e
/edit
and adjust the commit so the file is not added).
It may also help to:
- Put a comment in R12.yaml file directly above that line, linking back to this thread or with a summary, in case others are also confused.
- Expand the header comment in R12.yaml to inform people where they could find the R12_SSP_V1.yaml from which we got that information.
@@ -184,7 +184,7 @@ def broadcast_reduced_df(df, par_name): | |||
|
|||
for col in yr_col_inp: | |||
yr_cols_codes[col] = literal_eval(df_bc_node[col].values[0]) | |||
broadcast_years(df_bc_node, yr_col_out, yr_cols_codes, col) | |||
df_bc_node = broadcast_years(df_bc_node, yr_col_out, yr_cols_codes, col) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function called here has no tests or docstring. Does it do something that can't be done with message_ix_models.util.broadcast()
? If so, what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed that top level function now to a imo better name and added docstrings to all functions in this module. broadcast_years()
and broadcast_nodes()
use message_ix_models.util.broadcast()
to unpivot the rows that are stored in methanol_techno_economic.xlsx
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments inline.
Some other points:
- Files like message_ix_models/data/material/iea_mappings/*.csv:
- should be stored as text, not using Git LFS.
- contain the same phrase "iea_mappings" in their path and filename; this could be simplified.
- do not appear to be mentioned anywhere in the documentation.
message_data.tools.utilities.get_optimization_years()
does the same thing asScenarioInfo.Y
. The latter is computed only once, and has tests and documentation. The former is not tested, and hits the database every time it is called. I would prefer that we take the opportunity, when moving code tomessage_ix_models
, to use the better-maintained solution. Similar applies toget_historical_years()
.
9f64c64
to
4347136
Compare
Add missing files in new directories and replace private_data_path with package_data_path Materials build still had some overseen dependencies on message_data
Apply sorting to MESSAGEix sets based on number of dimensions ixmp 3.9.0 automatically sorts set lists generated by .set_list() alphabetically. When adding new items to multidimensional sets they need to be added to the basic sets first that do not depend on other sets. e.g extending 'cat_addon' requires adding both technologies in set 'technology' first.
Move documentation of base year demand literature to yaml file
- Use top-level CLI options in material-ix commands - Replace underscore with hyphens in command names - Replace print with log
- Add .data_util.get_region_map(). - Use message_ix_model built-ins to read the node codelist. - Add/expand docstrings, inline comments. - Drop file name argument from map_iea_db_to_msg_regs(); adjust usage.
- Define all children in material-region field for R12_EEU explicitly - Extend documentation in R12.yaml
- Move build function to build.py - Move CLI commands to separate submodule - Remove model.material.build.apply_spec() - Use model.build.apply_spec()
4347136
to
c831913
Compare
At the moment our biggest csv file is ~400 KB. Is it okay to store all of them as text?
Just to be clear: This means we should refactor the functions that use these helpers and pass a |
Generally yes. Git will automatically compress these files. Some further nuances:
|
Okay thanks for confirming! I pushed the suggested changes. I think all the requested changes and comments are addressed now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some offline discussion with @macflo8, this is good to merge.
In particular, the codecov/patch check failure is acceptable: per the PR description, that will be addressed via tests to be added in a subsequent PR.
message_data
dependencies (some.yaml
and.csv
input files).utils/compat/message_data
..model.material
#194 (comment).material-ix
grouputils/compat/message_data
to using tested equivalents ofmessage-ix-models
click
options frommaterial-ix
commands that are already implemented inmessage_ix_models/cli.py
Details:
Building MESSAGEix-Materials on an existing MESSAGEix-GLOBIOM scenario still requires a few external data file that have previously been stored in a private repository. The respective files are migrated to
message-ix-models/data
and data paths in the modules updated accordingly in this PR. The files located inutil/compat/message_data
migrated in the course of #188 contain some incorrect import statements, which are fixed in this PR.The following files are needed in the build and were therefore added:
The build however still requires proprietary data from IEA Extended Energy Balances to calibrate historical industry activity. The path to the required file can be specified with a newly added CLI option
--iea_data_path
in thebuild
command.All input data for MESSAGEix-Materials is now packaged in a
.tar.gz
file, that can be fetched withmix-models fetch MESSAGEix-Materials
.When executing
mix-models material-ix --help
only the basic comments related to building/solving/reporting are shownHow to review
Run build/solve/report with
material-ix
CLI commands and confirm correct execution of each task.PR checklist
[ ] Continuous integration checks all ✅Not possible due to missing tests[ ] Add or expand tests; coverage checks both ✅will be handled via Test and improve.model.material
#194