Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backport fix from #5712 #5927

Closed

Conversation

wxtim
Copy link
Member

@wxtim wxtim commented Jan 16, 2024

Backport simulation mode fixes from master #5712

I found bugs in #5712 , as part of the simulation mode refactor. They (the fixes) ought to be backported.

Bug 1

To replicate:

[scheduling]
    initial cycle point = 1100
    [[graph]]
        R1 = foo

[runtime]
    [[foo]]
        execution retry delays = 'PT3S'
        [[[simulation]]]
            fail try 1 only = true
            fail cycle points = all
cylc vip --mode simulation

will give traceback, caused by there being no default value of submission retry delays in the config.

Bug 2

Add the .get fix to bug 1 and run the same workflow again. This one will go into an endless loop of resubmission. We don't want that!

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • CHANGES.md entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@wxtim wxtim self-assigned this Jan 16, 2024
@wxtim wxtim added the bug Something is wrong :( label Jan 16, 2024
@wxtim wxtim added this to the cylc-8.2.5 milestone Jan 16, 2024
@wxtim wxtim requested a review from MetRonnie January 16, 2024 17:26
@wxtim wxtim force-pushed the fix.backport_two_fixes_from_sim_mode_testing branch from 7fe4b1e to 79712dc Compare January 16, 2024 17:29
@MetRonnie
Copy link
Member

Would be useful to know what PR(s) these fixes are from?

@@ -778,6 +778,7 @@ def _process_message_check(

if (
itask.state(TASK_STATUS_WAITING)
and itask.tdef.run_mode == 'live'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disables logging?

Copy link
Member Author

@wxtim wxtim Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code within this section is designed to avoid problems with polling - but there will be no polling in simulation mode, and it causes simulated tasks to remain in waiting mode forever. #5712 has this change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is the process message check run on incomming messages, it doesn't trigger polling?!

Copy link
Member Author

@wxtim wxtim Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disables logging?

And returns False, rather than True.

No it doesn't trigger polling.

Comment at line 799

        # Ignore messages if task has a retry lined up
        # (caused by polling overlapping with task failure)

In simulation mode this check weeds out the submitted message. Alternatively, I think that the following diff would be more reasonable for the same effect:

-
-            else:
+                return False
+            elif flag == self.FLAG_POLLED_IGNORED:
                 LOG.warning(
                     f"[{itask}] "
                     f"{self.FLAG_POLLED_IGNORED}{message}{timestamp}"
                 )
-            return False
+                return False
 
         severity = cast(int, LOG_LEVELS.get(severity, INFO))

IMO I think it would make sense to make a change which matches master, then change both later if you think this or some other approach is more sensible.

Copy link
Member Author

@wxtim wxtim Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed up this change to see if it breaks owt on CI

Comment on lines +12 to +13
execution retry delays = PT2S
[[[simulation]]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
execution retry delays = PT2S
[[[simulation]]]
execution retry delays = PT1S
[[[simulation]]]
default run length = PT0S

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of this test was, at least partly to test the interaction of this setting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, what do you mean? I'm just suggesting this to speed up the test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want the default run length.

changes.d/fix.5712.md Outdated Show resolved Hide resolved
@MetRonnie
Copy link
Member

Btw, neither of these changes are on #5712 or master?

@@ -979,8 +979,8 @@ def _set_retry_timers(
rtconfig = itask.tdef.rtconfig

submit_delays = (
rtconfig['submission retry delays']
or itask.platform['submission retry delays']
rtconfig.get('submission retry delays', [])
Copy link
Member Author

@wxtim wxtim Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The absence of this may make the test workflow fail, but it can depend on whats in your global cylc. Have a look at the artifacts in https://github.com/cylc/cylc-flow/actions/runs/7556705094/job/20574317445?pr=5927

Copy link
Member

@MetRonnie MetRonnie Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix (which you are saying is for bug 2), does not appear to be on master / #5712

Copy link
Member Author

@wxtim wxtim Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right that it is not on master - I thought it had gone in with #5712, but it hasn't.

However, I'm claiming this is the fix to bug1: You have to fix bug 1 to see bug 2.

@@ -803,13 +803,13 @@ def _process_message_check(
f"[{itask}] "
f"{self.FLAG_RECEIVED_IGNORED}{message}{timestamp}"
)

else:
return False
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MetRonnie This isn't the same as the change on master - It's the response to this comment: #5927 (comment). I wanted to see what CI made of it. This is the commit where I changed it from what's on master.

@wxtim
Copy link
Member Author

wxtim commented Jan 17, 2024

Btw, neither of these changes are on #5712 or master?

It doesn't appear to any-more, but it did. I've responded to each fix separately.

@wxtim wxtim changed the title backport two fixes from #5712 backport fix from #5712 Jan 17, 2024
@wxtim wxtim marked this pull request as draft January 17, 2024 13:49
Fix no-submission retry delays bug.

undo unecessary thing

add get methods to allow for lack of default submission time limit

clarify the changelog entry
@wxtim wxtim force-pushed the fix.backport_two_fixes_from_sim_mode_testing branch from 4682d4c to 3bae035 Compare January 17, 2024 15:25
@wxtim wxtim requested a review from MetRonnie January 17, 2024 15:25
@wxtim wxtim marked this pull request as ready for review January 18, 2024 08:35
@oliver-sanders
Copy link
Member

@wxtim, sorry, I'm confused, I don't know what issues this is attempting to fix or how it relates to #5712/master

@wxtim
Copy link
Member Author

wxtim commented Jan 22, 2024

@wxtim, sorry, I'm confused, I don't know what issues this is attempting to fix or how it relates to #5712/master

@oliver-sanders

I found bugs in #5712 , as part of the simulation mode refactor. They (the fixes) ought to be backported.

Bug 1

To replicate:

[scheduling]
    initial cycle point = 1100
    [[graph]]
        R1 = foo

[runtime]
    [[foo]]
        execution retry delays = 'PT3S'
        [[[simulation]]]
            fail try 1 only = true
            fail cycle points = all
cylc vip --mode simulation

will give traceback, caused by there being no default value of submission retry delays in the config.

Bug 2

Add the .get fix to bug 1 and run the same workflow again. This one will go into an endless loop of resubmission. We don't want that!

changes.d/fix.5712.md Outdated Show resolved Hide resolved
@@ -979,8 +979,8 @@ def _set_retry_timers(
rtconfig = itask.tdef.rtconfig

submit_delays = (
rtconfig['submission retry delays']
or itask.platform['submission retry delays']
rtconfig.get('submission retry delays', [])
Copy link
Member

@MetRonnie MetRonnie Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix (which you are saying is for bug 2), does not appear to be on master / #5712

Co-authored-by: Ronnie Dutta <[email protected]>
@wxtim wxtim marked this pull request as draft January 23, 2024 10:17
@wxtim
Copy link
Member Author

wxtim commented Jan 23, 2024

Too confusing. Will make sure it's fixed on master first.

Can't remember why I wanted it on 8.2.x

@wxtim wxtim closed this Jan 23, 2024
@wxtim wxtim removed this from the cylc-8.2.5 milestone Jan 23, 2024
@MetRonnie
Copy link
Member

Fixing bugs on 8.2.x then merging into master is the normal way of going about things. The only confusing thing was the description of this PR

@wxtim
Copy link
Member Author

wxtim commented Mar 19, 2024

Fixing bugs on 8.2.x then merging into master is the normal way of going about things. The only confusing thing was the description of this PR

Because of the changes to master in #5721 the automated copy PR may fail

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong :( wontfix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants