reload: wait for pending tasks to submit and pause the workflow #5592
self.process_workflow_db_queue()  # see #5593

# flush out preparing tasks before attempting reload
self.reload_pending = 'waiting for pending tasks to submit'
This gets used in the workflow status so progress will be indicated in the GUI.
script = """
    cylc reload "${CYLC_WORKFLOW_ID}"
    # wait for the command to complete
    cylc__job__poll_grep_workflow_log 'Reload completed'
Orthogonal change: This test was timing dependent because it didn't wait for reload to complete.
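As a rough illustration of the "wait for the log message" pattern used in that fix, here is a minimal Python sketch. The helper name `poll_grep_log` and its parameters are hypothetical and this is not the implementation of `cylc__job__poll_grep_workflow_log`; it only shows the idea of polling a log file until a pattern appears or a timeout expires.

```python
import re
import time
from pathlib import Path


def poll_grep_log(log_path, pattern, timeout=60.0, interval=1.0):
    """Return True once `pattern` is found in the log, False on timeout.

    Hypothetical helper for illustration only.
    """
    regex = re.compile(pattern)
    path = Path(log_path)
    deadline = time.monotonic() + timeout
    while True:
        # The log may not exist yet, so check before reading it.
        if path.exists() and regex.search(path.read_text()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test that proceeds only after `poll_grep_log(log, 'Reload completed')` returns True no longer depends on how long the reload happens to take.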
        )
    ]
)
async def test_illegal_config_load(
Covered by the new test.
To test, you can simulate a slow remote-init and a slow config load by jamming in sleeps like so:

diff --git a/cylc/flow/scheduler.py b/cylc/flow/scheduler.py
index d3cab5225..28bd79ac8 100644
--- a/cylc/flow/scheduler.py
+++ b/cylc/flow/scheduler.py
@@ -1096,6 +1096,7 @@ class Scheduler:
         LOG.info("Reloading the workflow definition.")
         try:
             config = self.load_flow_file(is_reload=True)
+            sleep(20)
         except (ParsecError, CylcConfigError) as exc:
             if cylc.flow.flags.verbosity > 1:
                 # log full traceback in debug mode
diff --git a/cylc/flow/scripts/remote_init.py b/cylc/flow/scripts/remote_init.py
index 7706cecfa..9c52ae8af 100755
--- a/cylc/flow/scripts/remote_init.py
+++ b/cylc/flow/scripts/remote_init.py
@@ -60,7 +60,8 @@ def get_option_parser() -> COP:
 @cli_function(get_option_parser)
 def main(parser, options, install_target, rund, *dirs_to_be_symlinked):
-
+    from time import sleep
+    sleep(20)
     remote_init(
         install_target,
         rund,

In the GUI you should see the following when you reload:
Got a couple of functional tests to fix, but otherwise the functionality should be good.

One question, in order to keep the scheduler responsive whilst we're waiting for pending tasks to submit, I added

This is probably fine, the point of this loop is just to wait for the workflow to enter a safe state for reload to be performed. It's probably even possible to trigger tasks at this stage if you want them to be submitted with the pre-reload config! But there's the potential for unplanned interactions so we may want to consider whitelisting this somehow.
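For readers following along, the kind of waiting loop under discussion can be sketched roughly as below. This is an illustration only, assuming an asyncio-style main loop; `wait_for_pending_tasks`, `has_pending` and `on_tick` are hypothetical names and not the code added in this PR.

```python
import asyncio


async def wait_for_pending_tasks(has_pending, on_tick, interval=0.1):
    """Wait until no tasks are pending, doing some work on each pass.

    `has_pending` and `on_tick` are hypothetical stand-ins for scheduler
    internals: the first reports whether any task is still preparing,
    the second is whatever the scheduler runs to stay responsive
    (e.g. processing queued commands and job submissions).
    """
    passes = 0
    while has_pending():
        on_tick()
        passes += 1
        # Yield control so other coroutines are not starved.
        await asyncio.sleep(interval)
    return passes
```

Anything executed from `on_tick` runs against the pre-reload state, which is where the unplanned interactions mentioned above could come from.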
- Code read

Functional Review
- Try this with a variety of workflows
- Check that this works the same way with `cylc vr` and `cylc reinstall`
- Check what reloading with these changes looks like in the GUI.
- Try reloading a change which breaks the workflow.
- Try to get tasks in horrible states to break the logic at line 1418 of `scheduler.py`
The two test failures have flagged a legitimate issue with reload causing duplicate stall events, which I'll sort out soon. It will require a line or two of change.
Some trivial, and possibly annoying, comments. Code LGTM. I'll do some more brutal functional testing on Monday, but I'm pretty confident it's all good 🎉
A user-facing string explaining why the workflow was paused if
helpful.
Suggested change:
- A user-facing string explaining why the workflow was paused if
- helpful.
+ A user-facing string explaining why the workflow was paused.
Yeah, so I'm not going to do that!
Why not? Isn't the "if helpful" redundant? (We wouldn't put it there if it were not helpful!)
Whether to log anything in the event the workflow is not
paused.
Suggested change:
- Whether to log anything in the event the workflow is not
- paused.
+ Whether to log anything if the workflow is not paused.
Using a super-basic workflow I tried:
Is that what you want? I think it's dangerous/misleading... And this behaviour seems to be matched in the GUI.
Yes:
Fixed the test failures, the problem was that
* Closes cylc#5107
* Reload now waits for pending tasks to submit before attempting to reload the config itself.
* Reload now also puts the workflow into the paused state during the reload process. This doesn't actually achieve anything as the reload command is blocking in the main loop, but it does help to communicate that the workflow will not de-queue or submit any new tasks during this process.
* The workflow status message is now updated to reflect the reload progress.

* The reload code was spread around four places:
  * Scheduler.command_reload_workflow
  * Scheduler.main_loop
  * Pool.set_do_reload
  * Pool.reload_taskdefs
* This commit co-locates the Scheduler/Pool parts and turns them into a single synchronous operation (no main-loop moving parts) to simplify the pathway.
* This removes the need for the `do_reload` pool flag.
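The shape of a single synchronous reload, as described in these commit messages, might look roughly like the toy sketch below. `ToyScheduler` and all of its members are hypothetical stand-ins and not the real `cylc.flow` classes. The second status string, the restoring of the previous pause state, and the choice to keep the old config when loading fails are assumptions made for illustration; only the 'waiting for pending tasks to submit' message comes from the diff in this PR.

```python
class ToyScheduler:
    """Hypothetical stand-in for illustration; not the real Scheduler."""

    def __init__(self, config):
        self.config = config
        self.is_paused = False
        self.reload_pending = False  # False, or a status string during reload
        self.preparing = []          # tasks part-way through submission

    def flush_preparing_tasks(self):
        # Stand-in for "wait for pending tasks to submit".
        self.preparing.clear()

    def reload(self, load_config):
        """Run the whole reload in one place, with no main-loop hand-off."""
        was_paused = self.is_paused
        self.is_paused = True  # signal that no new tasks will be submitted
        try:
            self.reload_pending = 'waiting for pending tasks to submit'
            self.flush_preparing_tasks()
            # Assumed status text, not taken from the PR.
            self.reload_pending = 'loading the workflow definition'
            try:
                new_config = load_config()
            except ValueError:
                # Assumption: a bad config leaves the old one in place.
                return False
            self.config = new_config
            return True
        finally:
            self.reload_pending = False
            self.is_paused = was_paused  # assumption: restore prior state
```

Because every step happens inside one call, there is no flag equivalent to `do_reload` for another part of the program to observe half-way through.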
Co-authored-by: Tim Pillinger <[email protected]>
Happy
This should also close #5579 by removing any potential for interaction between reload and preparing tasks. But note, I've not actually reproduced #5579.
Needs extensive functional testing.
The second commit straightens out the reload logic, which is currently split between four routines:
* `Scheduler.command_reload_workflow`
* `Scheduler.main_loop`
* `Pool.set_do_reload`
* `Pool.reload_taskdefs`

`command_reload_workflow` calls `set_do_reload` somewhere in the middle of the routine, then hands back over to the main loop, which does its part before calling `reload_taskdefs`. I think this is just a Cylc 7 hangover; removing the main-loop interaction from reload should reduce the chance of unexpected state changes occurring during the process.

Check List
- `CONTRIBUTING.md` and added my name as a Code Contributor.
- `setup.cfg` (and `conda-environment.yml` if present).
- `CHANGES.md` entry included if this is a change that can affect users
- `?.?.x` branch.