Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify and fix runahead computation. #5893

Merged
merged 13 commits into from
Jan 9, 2024
Merged

Conversation

hjoliver
Copy link
Member

@hjoliver hjoliver commented Dec 19, 2023

Fix bug revealed on the forum: the runahead limit point erroneously advances when the limit is specified as a time interval and future triggers are present.

Taking a closer look at the code, I realized it still contained nasty vestiges of the old "max active cycle points" computation, when we were (or at least thought we were) counting active points, rather than all possible points, beyond the base point. Probably my bad 😬

It's considerably simpler and more efficient on this branch.

[UPDATE] close #5825

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • CHANGES.md entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@hjoliver hjoliver added the bug label Dec 19, 2023
@hjoliver hjoliver added this to the cylc-8.2.4 milestone Dec 19, 2023
@hjoliver hjoliver self-assigned this Dec 19, 2023
@oliver-sanders
Copy link
Member

oliver-sanders commented Dec 20, 2023

LGTM.

I think this might actually close #5825.

I've transferred the test from that branch here: hjoliver#36

changes.d/5893.fix Outdated Show resolved Hide resolved

if compat_mode == 'compat-mode':
# Cylc 7 does not count failed tasks in runahead computation.
assert int(str(schd.pool.runahead_limit_point)) == 5
Copy link
Member Author

@hjoliver hjoliver Dec 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different from your original version of the test @oliver-sanders. Cylc 7 ignores failed tasks when computing the limit, that should include submit-failed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that should include submit-failed

It would appear that Cylc 7 didn't include submit-failed.

E.G. In this workflow 2/foo will submit-fail:

[scheduling]
    max active cycle points = 3
    cycling mode = integer
    initial cycle point = 1
    [[dependencies]]
        [[[P1]]]
            graph = foo
        [[[R1/2]]]
            graph = foo[-P1] => foo

[runtime]
    [[foo]]
        script = """
            if [[ $CYLC_TASK_CYCLE_POINT -eq 1 ]]; then
                cylc broadcast "${CYLC_SUITE_NAME}" -n foo -p 2 -s '[environment]foo=$(if'
                sleep 5
            fi
        """

In Cylc 7 the workflow stalls because the submit dependence prevents 3/foo from being spawned. If you insert 3/foo, the workflow will stall on the runahead limit:

Screenshot from 2024-01-04 11-38-57

Whereas on this branch the workflow will run on (so no stall):

Screenshot from 2024-01-04 11-41-30

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that should include submit-failed

It would appear that Cylc 7 didn't include submit-failed.

Notice that I said it should include submit-failed - I didn't actually check that it did. So thanks for checking!

OK I'll revert that to reproduce the (incorrect) Cylc 7 behavior.

assert int(str(schd.pool.runahead_limit_point)) == 4 # no change

# mark cycle 1 as complete
# (via task message so the task gets removed before runahead compute)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new extra-simplified method assumes completed tasks have been removed already, which doesn't happen if you artificially call state_reset in a test.

tests/integration/test_task_pool.py Outdated Show resolved Hide resolved
tests/integration/test_task_pool.py Outdated Show resolved Hide resolved
@wxtim wxtim self-requested a review January 4, 2024 10:40
Copy link
Member

@wxtim wxtim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review so far...

  • Wrote simple examples to check that both bugs were fixed.
  • Read the code.
  • Checked that the tests covered the two bug cases.

},
}
}
point = lambda point: IntegerPoint(str(int(point)))
Copy link
Member

@wxtim wxtim Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flake8:E371 is not happy with this. I'm not sure how important it is in this context, but consider...

PEP8 Reference

Suggested change
point = lambda point: IntegerPoint(str(int(point)))
def point(point): return IntegerPoint(str(int(point)))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flake8 won't be happy with that inline function either.

IMO it's ok, especially for a test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave it as-is then.

Copy link
Member

@wxtim wxtim Jan 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flake8 won't be happy with that inline function either.

You sure? PEP8 clearly shows an example:

# Correct:
def f(x): return 2*x

and flake 8 seems OK with it.

> echo 'def point(point): return 42'> foo.py
> flake8 foo.py
...

Don't care, tis a test.

cylc/flow/task_pool.py Outdated Show resolved Hide resolved

if compat_mode == 'compat-mode':
# Cylc 7 does not count failed tasks in runahead computation.
assert int(str(schd.pool.runahead_limit_point)) == 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that should include submit-failed

It would appear that Cylc 7 didn't include submit-failed.

E.G. In this workflow 2/foo will submit-fail:

[scheduling]
    max active cycle points = 3
    cycling mode = integer
    initial cycle point = 1
    [[dependencies]]
        [[[P1]]]
            graph = foo
        [[[R1/2]]]
            graph = foo[-P1] => foo

[runtime]
    [[foo]]
        script = """
            if [[ $CYLC_TASK_CYCLE_POINT -eq 1 ]]; then
                cylc broadcast "${CYLC_SUITE_NAME}" -n foo -p 2 -s '[environment]foo=$(if'
                sleep 5
            fi
        """

In Cylc 7 the workflow stalls because the submit dependence prevents 3/foo from being spawned. If you insert 3/foo, the workflow will stall on the runahead limit:

Screenshot from 2024-01-04 11-38-57

Whereas on this branch the workflow will run on (so no stall):

Screenshot from 2024-01-04 11-41-30

},
}
}
point = lambda point: IntegerPoint(str(int(point)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flake8 won't be happy with that inline function either.

IMO it's ok, especially for a test.

hjoliver and others added 11 commits January 8, 2024 16:23
In back-compat mode the cycle point time zone is assumed to be local, whereas in normal mode it is assumed to be UTC. There was contamination of the point parse caching where the time zone would carry over from tests of back-compat vs normal mode
* We were using the pytest-env plugin to run the tests in a non-UTC time zone.
* The pytest-env plugin doesn't work with pytest-xdist so this was being ignored.
* Also due to the way TZ support works in Python, changing the env var whilst Python is running may or may not result in changes.
Add compat mode and not compat mode versions of the future triggers bug test.
@hjoliver
Copy link
Member Author

hjoliver commented Jan 8, 2024

@wxtim - did you intend to push the commit "wip" 238a790 to this branch?

It isn't obviously related, and it's causing a couple of test failures.

[UPDATE] I've rebased the branch, and pushed a revert commit for your "wip" one to avoid losing your work. If not needed can rebase again to remove both.

@wxtim
Copy link
Member

wxtim commented Jan 8, 2024

@wxtim - did you intend to push the commit "wip" 238a790 to this branch?

No, that's a mistake. I think I've been stung by the behavour of gh rather than git. gh set up a local branch tracking your branch rather than a copy at github/wxtim which I'd have PR'd.

WIP is investigatory work on another ticket and you definately don't want it.

@wxtim wxtim merged commit b64fcd6 into cylc:8.2.x Jan 9, 2024
32 of 36 checks passed
@hjoliver hjoliver deleted the fix-runahead-compute branch January 9, 2024 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants