pool: fix infinite loop with --start-cycle-point #5604

oliver-sanders · 2023-06-28T12:43:48Z

Closes pool: infinite loop on startup #5603
Fixes an infinite loop which could occur when one or more task sequences terminate before the start point.
This triggered a bug whereby the runahead limit was disabled on startup, causing tasks to be spawned indefinitely.

Check List

I have read CONTRIBUTING.md and added my name as a Code Contributor.
Contains logically grouped changes (else tidy your branch by rebase).
Does not contain off-topic changes (use other PRs for other changes).
Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
Tests are included (or explain why tests are not needed).
CHANGES.md entry included if this is a change that can affect users.
Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

oliver-sanders · 2023-06-28T12:46:01Z

cylc/flow/scheduler.py

            self._configure_contact()
+            await self.configure()


Unrelated: Swap the order of these two so that the contact file gets written earlier.

This reduces the window of opportunity for two schedulers to start up for the same workflow. Looking at the contact file fields, I can't see anything set during Scheduler.configure which is used in _configure_contact, so the order should be arbitrary.
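To illustrate why the ordering matters, here is a toy sketch, not the Cylc implementation. `ToyScheduler`, its `contact` file and the `RuntimeError` are hypothetical stand-ins; only the method names `_configure_contact` and `configure` are borrowed from the diff above. Because the contact file is written before the slow `configure` step, a second scheduler starting during that step finds the file and bails out.

```python
import asyncio
import tempfile
from pathlib import Path


class ToyScheduler:
    """Hypothetical stand-in for a scheduler (not the real Cylc class)."""

    def __init__(self, run_dir: Path) -> None:
        self.contact_file = run_dir / "contact"
        self.configured = False

    def _configure_contact(self) -> None:
        # Refuse to start if another scheduler has already claimed the workflow.
        if self.contact_file.exists():
            raise RuntimeError("workflow already running")
        self.contact_file.write_text("toy contact data\n")

    async def configure(self) -> None:
        # Stand-in for slow configuration work.
        await asyncio.sleep(0.01)
        self.configured = True

    async def start(self) -> None:
        # Contact file first, so the unprotected window is as short as possible.
        self._configure_contact()
        await self.configure()


async def demo(run_dir: Path) -> list:
    # Start two schedulers for the same (toy) workflow at the same time.
    return await asyncio.gather(
        ToyScheduler(run_dir).start(),
        ToyScheduler(run_dir).start(),
        return_exceptions=True,
    )


with tempfile.TemporaryDirectory() as tmp:
    results = asyncio.run(demo(Path(tmp)))
```

In this sketch the first scheduler starts normally and the second is rejected. If the two calls in `start` were swapped, both schedulers would pass through `configure` before either wrote its contact file.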

oliver-sanders · 2023-06-28T12:46:45Z

cylc/flow/task_pool.py

+                    seq.get_first_point(self.config.start_point)
+                    for seq in self.config.sequences
+                }
+                if point is not None


The problem was caused by a None value in this collection (from a sequence that terminates before the start point). Due to the logic lower down, this caused the runahead limit point to be set to None.
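A minimal sketch of the filtering idea, using integer points. `ToySequence` and `first_points` are hypothetical names for illustration; only `get_first_point` mirrors the method name in the diff, and its behaviour here (returning None when the sequence ends before the start point) is an assumption based on the description of the bug.

```python
from typing import Iterable, Optional, Set


class ToySequence:
    """Hypothetical integer-cycling sequence (not the real Cylc class)."""

    def __init__(self, start: int, stop: int, step: int = 1) -> None:
        self.start, self.stop, self.step = start, stop, step

    def get_first_point(self, start_point: int) -> Optional[int]:
        """Return the first point on or after start_point, else None."""
        point = self.start
        while point <= self.stop:
            if point >= start_point:
                return point
            point += self.step
        return None


def first_points(sequences: Iterable[ToySequence], start_point: int) -> Set[int]:
    """Collect first points, skipping sequences that have already ended."""
    return {
        point
        for point in (seq.get_first_point(start_point) for seq in sequences)
        if point is not None
    }


# The first sequence terminates (at 3) before the start point (5).
sequences = [ToySequence(1, 3), ToySequence(1, 10), ToySequence(6, 20, 2)]
unfiltered = {seq.get_first_point(5) for seq in sequences}
filtered = first_points(sequences, 5)
```

Without the filter, the set contains None alongside the real points, so any later logic that derives a limit from it has to cope with None. With the filter, only real points remain.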

oliver-sanders · 2023-06-28T13:42:11Z

cylc/flow/scheduler.py

@@ -1807,7 +1807,11 @@ async def _shutdown(self, reason: BaseException) -> None:
        sys.stdout.flush()
        sys.stderr.flush()

-        if self.contact_data and self.task_job_mgr:


Related to the previous change: because contact_data is now set a little earlier, this test needed bodging a little. We could get into the situation where the workflow failed to start up but had already configured its contact file. That triggered remote-tidy to run on shutdown, which caused the database to be created when it shouldn't have been.

The test has been changed so that remote-tidy only runs when the database is present. This is a better test anyway, as remote-tidy needs database info to function.
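The guard described above could look roughly like the following. This is a sketch under stated assumptions, not the code from this PR: `should_run_remote_tidy`, `shutdown`, `db_path` and the `remote_tidy` callback are all hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def should_run_remote_tidy(db_path: Optional[Path]) -> bool:
    """Only tidy if the database file already exists on disk."""
    return db_path is not None and db_path.exists()


def shutdown(db_path: Optional[Path], remote_tidy: Callable[[], None]) -> bool:
    """Run the (hypothetical) tidy step if appropriate; report whether it ran."""
    if should_run_remote_tidy(db_path):
        remote_tidy()
        return True
    return False


calls = []
with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "db"
    # Startup failed before the database was created: tidy is skipped.
    skipped = shutdown(db, lambda: calls.append("tidy"))
    # Database present: tidy runs.
    db.touch()
    ran = shutdown(db, lambda: calls.append("tidy"))
```

Checking for the file rather than for in-memory state means the shutdown path never creates the database as a side effect.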

hjoliver

All good.

wxtim

Well that's a horrid bug!

pool: fix possible infinite loop bug

4d66301

* Closes cylc#5603
* Fixes an infinite loop which could occur where one or more task sequences terminate before the start point.
* This triggered a bug whereby the runahead limit was disabled on startup causing an infinite spawning bug.

oliver-sanders added the bug Something is wrong :( label Jun 28, 2023

oliver-sanders added this to the cylc-8.2.0 milestone Jun 28, 2023

oliver-sanders self-assigned this Jun 28, 2023

oliver-sanders changed the title ~~5603~~ pool: fix infinite loop with --start-cycle-point Jun 28, 2023

scheduler: write contact file before configure

0aadc90

* Write the contact file earlier to minimise the window of opportunity for two schedulers to start up for the same workflow.

oliver-sanders force-pushed the 5603 branch from 637b1b4 to 0aadc90 Compare June 28, 2023 13:37

oliver-sanders requested review from hjoliver and wxtim June 28, 2023 13:38

hjoliver approved these changes Jun 28, 2023

wxtim approved these changes Jun 29, 2023

wxtim merged commit 8acb8e7 into cylc:master Jun 29, 2023

oliver-sanders deleted the 5603 branch June 29, 2023 10:27
