backport fix from #5712 #5927

wxtim · 2024-01-16T17:17:16Z

Backport simulation mode fixes from master #5712

I found bugs in #5712, as part of the simulation mode refactor. The fixes ought to be backported.

Bug 1

To replicate:

[scheduling]
    initial cycle point = 1100
    [[graph]]
        R1 = foo

[runtime]
    [[foo]]
        execution retry delays = 'PT3S'
        [[[simulation]]]
            fail try 1 only = true
            fail cycle points = all

cylc vip --mode simulation

will give a traceback, caused by there being no default value of submission retry delays in the config.

Bug 2

Add the .get fix to bug 1 and run the same workflow again. This one will go into an endless loop of resubmission. We don't want that!

Check List

I have read CONTRIBUTING.md and added my name as a Code Contributor.
Contains logically grouped changes (else tidy your branch by rebase).
Does not contain off-topic changes (use other PRs for other changes).
Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
Tests are included (or explain why tests are not needed).
CHANGES.md entry included if this is a change that can affect users
Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

MetRonnie · 2024-01-16T17:31:55Z

Would be useful to know what PR(s) these fixes are from?

cylc/flow/task_job_mgr.py

oliver-sanders · 2024-01-16T17:45:36Z

cylc/flow/task_events_mgr.py

@@ -778,6 +778,7 @@ def _process_message_check(

        if (
                itask.state(TASK_STATUS_WAITING)
+                and itask.tdef.run_mode == 'live'


Disables logging?

The code within this section is designed to avoid problems with polling - but there will be no polling in simulation mode, and it causes simulated tasks to remain in waiting mode forever. #5712 has this change.

But this is the process message check, which is run on incoming messages. It doesn't trigger polling?!

Disables logging?

And returns False, rather than True.

No it doesn't trigger polling.

Comment at line 799

# Ignore messages if task has a retry lined up
# (caused by polling overlapping with task failure)

In simulation mode this check weeds out the submitted message. Alternatively, I think that the following diff would be more reasonable for the same effect:

-
-            else:
+                return False
+            elif flag == self.FLAG_POLLED_IGNORED:
                LOG.warning(
                    f"[{itask}] "
                    f"{self.FLAG_POLLED_IGNORED}{message}{timestamp}"
                )
-            return False
+                return False
        severity = cast(int, LOG_LEVELS.get(severity, INFO))

IMO it would make sense to make a change which matches master, then change both later if you think this or some other approach is more sensible.

I've pushed up this change to see if it breaks owt on CI
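As a rough illustration of the guard debated in this thread, here is a minimal Python sketch. It is not the cylc-flow implementation: `ignore_message` and its parameters are hypothetical names, and `retry_pending` stands in for the part of the condition that the diff above does not show. It only shows why adding the `run_mode == 'live'` test stops simulated tasks from having their messages discarded while they are waiting.

```python
def ignore_message(status: str, run_mode: str, retry_pending: bool) -> bool:
    """Return True if an incoming message should be logged and then ignored.

    Hypothetical stand-in for the condition in _process_message_check:
    the task is waiting, the workflow runs in live mode, and a retry is
    lined up (polling may have overlapped with the task failure).
    """
    return status == "waiting" and run_mode == "live" and retry_pending


# Live mode: a waiting task with a retry lined up has its message ignored.
print(ignore_message("waiting", "live", True))
# Simulation mode: no polling takes place, so the message is processed.
print(ignore_message("waiting", "simulation", True))
```

Under this sketch the caller would return False (message not processed) whenever `ignore_message` is true, which is the behaviour described above for the real check.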

MetRonnie · 2024-01-17T11:56:59Z

tests/functional/modes/03-simulation/flow.cylc

+        execution retry delays = PT2S
+        [[[simulation]]]


Suggested change

-        execution retry delays = PT2S
-        [[[simulation]]]
+        execution retry delays = PT1S
+        [[[simulation]]]
+            default run length = PT0S

The point of this test was, at least partly, to test the interaction of this setting.

Sorry, what do you mean? I'm just suggesting this to speed up the test

I don't want the default run length.

changes.d/fix.5712.md

MetRonnie · 2024-01-17T12:02:50Z

Btw, neither of these changes are on #5712 or master?

wxtim · 2024-01-17T13:43:29Z

cylc/flow/task_job_mgr.py

@@ -979,8 +979,8 @@ def _set_retry_timers(
            rtconfig = itask.tdef.rtconfig

        submit_delays = (
-            rtconfig['submission retry delays']
-            or itask.platform['submission retry delays']
+            rtconfig.get('submission retry delays', [])


The absence of this may make the test workflow fail, but it can depend on what's in your global.cylc. Have a look at the artifacts in https://github.com/cylc/cylc-flow/actions/runs/7556705094/job/20574317445?pr=5927

This fix (which you are saying is for bug 2), does not appear to be on master / #5712

You are right that it is not on master - I thought it had gone in with #5712, but it hasn't.

However, I'm claiming this is the fix to bug 1: you have to fix bug 1 to see bug 2.
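To make the bug 1 failure concrete, here is a small Python sketch of the difference between indexing and `.get`. It uses plain dicts as hypothetical stand-ins for `rtconfig` and `itask.platform`; the real objects in cylc-flow are not plain dicts, and whether the platform lookup also uses `.get` is an assumption, since the diff above only shows the first changed line.

```python
from typing import Any, Mapping, Sequence

KEY = "submission retry delays"


def get_submit_delays(
    rtconfig: Mapping[str, Any], platform: Mapping[str, Any]
) -> Sequence[float]:
    """Return delays from the task config, else the platform, else []."""
    # .get() tolerates a missing key; indexing with [] raises KeyError.
    return rtconfig.get(KEY, []) or platform.get(KEY, [])


# Hypothetical simulation-mode config with no submission retry delays set.
sim_rtconfig = {"execution retry delays": [3.0]}
sim_platform: dict = {}

try:
    # The pre-fix pattern: plain indexing on both mappings.
    sim_rtconfig[KEY] or sim_platform[KEY]
except KeyError as exc:
    print(f"indexing fails with KeyError: {exc}")

# The .get pattern falls back to an empty list instead of raising.
print(get_submit_delays(sim_rtconfig, sim_platform))
```

With an empty list of submission delays no traceback occurs, which is what exposes the resubmission loop described as bug 2.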

wxtim · 2024-01-17T13:45:50Z

cylc/flow/task_events_mgr.py

@@ -803,13 +803,13 @@ def _process_message_check(
                    f"[{itask}] "
                    f"{self.FLAG_RECEIVED_IGNORED}{message}{timestamp}"
                )
-
-            else:
+                return False


@MetRonnie This isn't the same as the change on master - It's the response to this comment: #5927 (comment). I wanted to see what CI made of it. This is the commit where I changed it from what's on master.

wxtim · 2024-01-17T13:46:47Z

Btw, neither of these changes are on #5712 or master?

It doesn't appear to any more, but it did. I've responded to each fix separately.

oliver-sanders · 2024-01-22T10:18:50Z

@wxtim, sorry, I'm confused, I don't know what issues this is attempting to fix or how it relates to #5712/master

wxtim · 2024-01-22T16:15:14Z

@wxtim, sorry, I'm confused, I don't know what issues this is attempting to fix or how it relates to #5712/master

@oliver-sanders

I found bugs in #5712, as part of the simulation mode refactor. The fixes ought to be backported.

Bug 1

To replicate:

[scheduling]
    initial cycle point = 1100
    [[graph]]
        R1 = foo

[runtime]
    [[foo]]
        execution retry delays = 'PT3S'
        [[[simulation]]]
            fail try 1 only = true
            fail cycle points = all

cylc vip --mode simulation

will give a traceback, caused by there being no default value of submission retry delays in the config.

Bug 2

Add the .get fix to bug 1 and run the same workflow again. This one will go into an endless loop of resubmission. We don't want that!

changes.d/fix.5712.md

MetRonnie · 2024-01-22T18:24:11Z

cylc/flow/task_job_mgr.py

@@ -979,8 +979,8 @@ def _set_retry_timers(
            rtconfig = itask.tdef.rtconfig

        submit_delays = (
-            rtconfig['submission retry delays']
-            or itask.platform['submission retry delays']
+            rtconfig.get('submission retry delays', [])


This fix (which you are saying is for bug 2), does not appear to be on master / #5712

wxtim · 2024-01-23T10:18:45Z

Too confusing. Will make sure it's fixed on master first.

Can't remember why I wanted it on 8.2.x

MetRonnie · 2024-01-23T10:30:28Z

Fixing bugs on 8.2.x then merging into master is the normal way of going about things. The only confusing thing was the description of this PR

wxtim · 2024-03-19T08:21:34Z

Fixing bugs on 8.2.x then merging into master is the normal way of going about things. The only confusing thing was the description of this PR

Because of the changes to master in #5721 the automated copy PR may fail

backport two fixes from cylc#5712

67e417b

wxtim requested a review from oliver-sanders January 16, 2024 17:26

wxtim self-assigned this Jan 16, 2024

wxtim added the bug Something is wrong :( label Jan 16, 2024

wxtim added this to the cylc-8.2.5 milestone Jan 16, 2024

wxtim requested a review from MetRonnie January 16, 2024 17:26

tests

79712dc

wxtim force-pushed the fix.backport_two_fixes_from_sim_mode_testing branch from 7fe4b1e to 79712dc Compare January 16, 2024 17:29

oliver-sanders reviewed Jan 16, 2024

oliver-sanders reviewed Jan 16, 2024

wxtim requested a review from oliver-sanders January 17, 2024 10:24

MetRonnie reviewed Jan 17, 2024

wxtim commented Jan 17, 2024

wxtim changed the title from "backport two fixes from #5712" to "backport fix from #5712" Jan 17, 2024

wxtim marked this pull request as draft January 17, 2024 13:49

Response to review.

3bae035

Fix no-submission retry delays bug.
undo unnecessary thing
add get methods to allow for lack of default submission time limit
clarify the changelog entry

wxtim force-pushed the fix.backport_two_fixes_from_sim_mode_testing branch from 4682d4c to 3bae035 Compare January 17, 2024 15:25

wxtim requested a review from MetRonnie January 17, 2024 15:25

wxtim marked this pull request as ready for review January 18, 2024 08:35

MetRonnie reviewed Jan 22, 2024

Update changes.d/fix.5712.md

f363b7f

Co-authored-by: Ronnie Dutta <[email protected]>

wxtim marked this pull request as draft January 23, 2024 10:17

wxtim closed this Jan 23, 2024

wxtim removed this from the cylc-8.2.5 milestone Jan 23, 2024

MetRonnie mentioned this pull request Mar 18, 2024

sim mode: traceback if execution retries and submission retrys unset #5935

Closed

MetRonnie added the wontfix label Mar 19, 2024
