Add workflow/docs to validate changes in MESSAGE/MACRO core #180

Open

khaeru opened this issue Apr 22, 2024 · 1 comment
khaeru commented Apr 22, 2024

This expands on comments by @glatterf42 here and @gidden here. We will update the description if there are alternate proposals or to link to PRs.

Summary

  • The validity of outputs from MESSAGEix-GLOBIOM scenarios (produced with the current repo, message-ix-models, and various branches of message_data) depends on the particular GAMS implementation of MESSAGE (and/or MACRO) in the message_ix package.
  • Periodically we have PRs to message_ix that modify the GAMS code, for instance Correct MACRO GDP reporting & update docs message_ix#430, Adjust calculation of PRICE_EMISSION message_ix#726, and others. Often these are to correct known bugs.
  • Regarding testing:
    • message_ix contains:
      • Self-contained tests that are narrowly targeted at certain behaviours:
        • These tests use the suite of simplified models (Dantzig, Austria, Westeros) that ship with the MESSAGE code.
        • These tests are run automatically.
        • PRs that touch the GAMS code expand or adjust these tests.
      • There are (still) some "nightly" tests that download and run MESSAGEix-GLOBIOM scenarios, and make certain checks against their outputs. These scenarios are by now quite old.
    • message-ix-models has a test suite with high coverage.
    • message_data branch dev has low test coverage, and other branches even lower.
  • For message_ix GAMS PRs, there is often a concern that the following could happen:
    • The message_ix test suite, including expanded/added tests for the specific changes in a PR, passes, but
    • message-ix-models or message_data tests are broken, or
    • Un-tested message_data or message-ix-models behaviour (the "particular outputs" mentioned in the first bullet) changes in a way that's not obvious.

Things to do

There are a variety of things to do about this.

Manual checks

This is more or less what we have done thus far.

  • The message_ix PR author(s) or reviewer(s) say: I think this PR may have consequences "further up the stack" / "downstream", and describe what those impacts could be.
  • Someone (manually) re-runs some code or (manual) workflows or steps and makes some (manual) checks to ensure there are no unexpected impacts. To be clear, this is done in an ad hoc way every time; there is no HOWTO for these: which branches/code to use, which command(s) to run, which checks to make.
  • Comments and discussion on the PR determine either that (a) there is no impact and the PR is good to merge, or that (b) there are impacts and the PR must be adjusted.

Practices, e.g. pip freeze

  • iiasa/message_data#546 points to another option: using pip freeze. This is currently documented under one particular "Known issue" in the message_data Install instructions, here, but we could move it to a more prominent location, such as the “Reproducibility” page of the message-ix-models docs.
  • As a general rule, we could express it like this:
  • If:
    • A particular branch/workflow of message_data/message-ix-models is known to work with specific versions of other packages in the stack (message-ix-models, message_ix, ixmp, genno, pandas, any others) and
    • either:
      • There are no tests of that message_data/message-ix-models branch/workflow; or
      • The required versions of the dependencies exist only on branches other than main, which are not yet merged or not yet associated with a PR; or
      • More recent versions of the dependencies are known not to work;
  • Then:
    • One or more requirements.txt file(s) should be created for that workflow/branch, recording the version(s) of upstream packages that are known to work (see the sketch after this list); and
    • The meaning of “known to work” (i.e. the expected features of the model outputs) should be explicitly documented.
  • This allows upstream improvements to go forward. Developers working on code for a particular project or model variant then have a few options:
    • Continue to use the ‘frozen’ versions recorded in the requirements.txt files they have created.
    • Check if their code works with newer versions of dependencies; update or remove the requirements.txt.
    • Add tests so that compatibility of their code and specific outputs can be automatically validated as dependencies update.
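As a concrete illustration, such a file might look like the following. This is a hypothetical sketch: the package names follow the list above, while the pinned versions are placeholders to be replaced with the versions actually in use, e.g. from `pip freeze` output.

```
# requirements.txt for a particular message_data branch/workflow.
# All pins below are placeholder versions; record the versions actually
# known to work, and document what "known to work" means for the outputs.
message-ix-models == 2024.4.22
message-ix == 3.8.0
ixmp == 3.8.0
genno == 1.24.0
pandas == 2.2.1
```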

Semi-automated checks

  • For workflows with tests and/or a defined CLI entry-point, a workflow like transport.yaml can be established.
  • This workflow can be used in a semi-automated way within the “manual checks” process described above:
    • The message_ix PR author(s)/reviewer(s) must still identify: this change may impact one or more downstream workflow(s), in [specific ways].
    • They, or someone else, then follow some documented steps to trigger the CI workflow, including providing an input that forces it to use message_ix (GAMS) code from the PR branch under review, rather than main or the released version (see the sketch below).
    • The results of the workflow either directly include checks for validity, or there are further documented steps to inspect the outputs for signs of undesired impacts.
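A minimal sketch of what such an input could look like, assuming a GitHub Actions workflow. The workflow name, input name, and final step are all illustrative, not an existing file; a real run would additionally need GAMS, credentials, and the actual workflow commands:

```yaml
# Hypothetical validation workflow; every name here is illustrative.
name: validate-downstream

on:
  workflow_dispatch:
    inputs:
      message_ix_ref:
        description: Git ref of iiasa/message_ix to install (e.g. a PR branch)
        default: main
        required: false

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      # Install message_ix from the requested ref instead of the released version
      - run: pip install "message-ix @ git+https://github.com/iiasa/message_ix@${{ inputs.message_ix_ref }}"
      # Placeholder for the real steps: run the workflow under test and
      # inspect/compare its outputs.
      - run: mix-models --help
```

With such an input in place, the "documented steps" could be as simple as `gh workflow run validate-downstream.yaml -f message_ix_ref=<PR branch>`.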
khaeru commented Apr 22, 2024

On one particular point of discussion, @gidden suggested:

…to follow @khaeru's example here and combine this with @OFR-IIASA's ENGAGE process script here.

I would suggest to run

  1. a baseline scenario
  2. a full century budget scenario
  3. a peak-budget scenario

And compare a few key metrics in the output Excel files (e.g., CO2 emissions, Kyoto emissions, GDP, primary energy, final energy).
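A minimal sketch of that comparison, assuming the outputs are IAMC-style Excel tables readable with pandas. The variable names, index columns, file names, and tolerance are all assumptions for illustration, not taken from the actual reporting:

```python
"""Compare key metrics between two reported-output Excel files.

Hypothetical sketch: METRICS are IAMC-style guesses at the variables
named above, and IDX assumes the usual wide-format layout.
"""
import pandas as pd

IDX = ["Model", "Scenario", "Region", "Variable", "Unit"]
METRICS = [
    "Emissions|CO2",
    "Emissions|Kyoto Gases",
    "GDP|MER",
    "Primary Energy",
    "Final Energy",
]


def load(path: str) -> pd.DataFrame:
    """Read one output file and keep only the metrics of interest."""
    df = pd.read_excel(path).set_index(IDX)
    return df[df.index.get_level_values("Variable").isin(METRICS)]


def compare(path_a: str, path_b: str, rtol: float = 1e-3) -> pd.DataFrame:
    """Return the rows of `path_b` that differ from `path_a` beyond `rtol`."""
    a, b = load(path_a).align(load(path_b), join="inner")
    mismatch = (a - b).abs() > rtol * a.abs()
    return b[mismatch.any(axis=1)]


if __name__ == "__main__":
    # File names are placeholders for e.g. a baseline scenario run
    # before and after a message_ix change.
    print(compare("baseline_before.xlsx", "baseline_after.xlsx"))
```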

In response, I said:

to implement these ideas would involve, conservatively, at least 2 person-weeks of work

The reasons for this:

  • The transport.yaml workflow is on the dev branch, and relies on some changes/improvements made on the dev branch of message_data.
  • AFAIK the ssp_dev branch containing the suggested workflow does not have these changes.
  • I don't know if we have:
    • Per "semi-automated checks" above": Tests or a defined [=documented] CLI entry-point for the "ENGAGE process script".
    • Per "practices" above: a requirements.txt stating which versions of message-ix-models, message_ix, ixmp etc. it's compatible with.
  • And I know for certain that we do not have:
    • A CI workflow that accepts "an input to force it to use message_ix code from the branch under review, rather than main"; thus also no "documented steps" to trigger it. Someone would need to develop these.
    • "Direct checks for validity" from that workflow, or "documented steps to inspect the outputs".

Hence my estimate of the time needed to gather the necessary information and implement these ideas. I previously tried something similar (and spent a similar amount of time) with iiasa/message_data#411, but people did not have the bandwidth to validate and review it, so it was not merged.

Again, I agree it's important that we move in this direction, but I think we shouldn't underestimate the work required; until there's a concrete decision to prioritize that work, we should choose pragmatic alternatives.
