Fix bug checksum failures
This commit fixes a bug where multiple realizations tried to read their
checksums at the same time, causing the run to grind to a halt.
The issue was fixed by moving the forward_model_ok_lock up a level, so
that it also covers the checksum verification and only one realization
can call `_verify_checksum` at a time. This also solved the issue where
jobs would be shown as stuck in pending even though all forward model
steps had completed. That was probably caused by all checksums being
checked and verified at the same time, which can be IO intensive.
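
The effect of widening the lock can be illustrated with a minimal, self-contained sketch. It is not the ert implementation: `run_realization`, `simulate`, and the `state` dictionary are hypothetical names, and `asyncio.sleep` stands in for the IO-heavy checksum verification. It only shows that holding one shared `asyncio.Lock` around the verification step keeps the number of concurrent verifications at one.

```python
import asyncio


async def run_realization(iens: int, lock: asyncio.Lock, state: dict) -> None:
    """Hypothetical stand-in for the tail of a job's run() after a zero return code."""
    async with lock:
        # Everything in this block is serialized across realizations.
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # placeholder for checksum verification (IO)
        state["active"] -= 1
        state["finished"].append(iens)  # placeholder for the "finished" handling


async def _main(n: int) -> dict:
    lock = asyncio.Lock()  # one lock shared by all realizations
    state = {"active": 0, "peak": 0, "finished": []}
    await asyncio.gather(*(run_realization(i, lock, state) for i in range(n)))
    return state


def simulate(n: int = 5) -> dict:
    """Run n fake realizations concurrently and return the recorded state."""
    return asyncio.run(_main(n))
```

In this sketch `state["peak"]` records the highest number of realizations that were inside the locked section at once. The trade-off is the usual one for a coarse lock: verification no longer overlaps between realizations, in exchange for avoiding a burst of simultaneous IO.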
jonathan-eq committed Sep 3, 2024
1 parent f0fb2aa commit 6c8da52
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions src/ert/scheduler/job.py
@@ -150,9 +150,9 @@ async def run(
                 break
 
             if self.returncode.result() == 0:
-                if self._scheduler._manifest_queue is not None:
-                    await self._verify_checksum()
                 async with forward_model_ok_lock:
+                    if self._scheduler._manifest_queue is not None:
+                        await self._verify_checksum()
                     await self._handle_finished_forward_model()
                 break
 