
Don't call extend_report_with_coverage_gains in apply_async callback. #709

Merged
@oliverchang merged 3 commits into main from fix-multiprocessing on Nov 8, 2024

Conversation

@oliverchang (Collaborator) commented on Nov 8, 2024

Per
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.apply_async, callbacks should return immediately; otherwise they block the entire Pool from making progress.

For large experiments, this is likely causing our throughput to slow to a crawl as the experiment runs, since every benchmark experiment that finishes triggers this expensive calculation.
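
To illustrate the failure mode, here is a minimal, self-contained sketch (the function names are placeholders, not code from this repo): a slow `apply_async` callback runs in the Pool's single result-handler thread, so while it runs no further results are drained, and the workers eventually block trying to put results onto the full result queue, which matches the py-bt traces below.

```python
import multiprocessing
import time

def run_one_benchmark(i):
    # Stand-in for a single benchmark experiment.
    return i

def slow_callback(result):
    # Stand-in for an expensive report update such as
    # extend_report_with_coverage_gains. It runs in the Pool's result-handler
    # thread, so while it sleeps no other results are drained and workers
    # pile up behind the result queue.
    time.sleep(5)

if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        for i in range(100):
            pool.apply_async(run_one_benchmark, (i,), callback=slow_callback)
        pool.close()
        pool.join()
```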

From debugging with GDB on
#692, it looks like a large number of worker processes are stuck waiting to report results:

```
(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/usr/lib/python3.11/multiprocessing/queues.py", line 376, in put
    with self._wlock:
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
```

This partially reverts #566. We instead just create a new sub-process to periodically call this in the background to avoid blocking anything.
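
A rough sketch of that approach (the placeholder body, interval, and helper name are assumptions for illustration, not the actual code in this PR): spawn a dedicated multiprocessing.Process that calls the expensive report update on a timer, so the Pool's callbacks stay trivially cheap.

```python
import multiprocessing
import time

def extend_report_with_coverage_gains():
    # Placeholder for the real, expensive report-generation call.
    pass

def periodic_report_updater(interval_seconds=300):
    # Runs in its own process, so a slow report update never blocks the
    # Pool's result-handler thread or the benchmark workers.
    while True:
        extend_report_with_coverage_gains()
        time.sleep(interval_seconds)

if __name__ == '__main__':
    reporter = multiprocessing.Process(target=periodic_report_updater,
                                       daemon=True)
    reporter.start()
    # ... run the experiment's Pool here; the daemon reporter exits with us.
```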

@oliverchang (Collaborator, Author) commented:
/gcbrun exp -n ochang-mp

@oliverchang (Collaborator, Author) commented:
Report generation seems to be working. I'm going to merge this to unblock the large experiment.

@oliverchang merged commit be6a32a into main on Nov 8, 2024 (4 of 5 checks passed)
@oliverchang deleted the fix-multiprocessing branch on Nov 8, 2024 at 06:10
@DavidKorczynski (Collaborator) commented:
> This partially reverts #566. We instead just create a new sub-process to periodically call this in the background to avoid blocking anything.

Sounds good, I'll take a look at this
