
Don't call extend_report_with_coverage_gains in apply_async callback. #709

Merged
@oliverchang merged 3 commits into main from fix-multiprocessing on Nov 8, 2024

Conversation

@oliverchang (Collaborator) commented on Nov 8, 2024

Per
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.apply_async, callbacks should return immediately; otherwise they block the entire Pool from making progress.

For large experiments, this is likely causing our throughput to slow to a crawl as the experiment runs, since every benchmark experiment that finishes triggers this expensive calculation.
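
To illustrate the failure mode, here is a minimal, self-contained sketch (the function names are placeholders, not code from this repo): a slow `apply_async` callback runs in the Pool's single result-handler thread, so while it runs no further results are drained, and the workers eventually block trying to put results onto the full result queue, which matches the py-bt traces below.

```python
import multiprocessing
import time

def run_one_benchmark(i):
    # Stand-in for a single benchmark experiment.
    return i

def slow_callback(result):
    # Stand-in for an expensive report update such as
    # extend_report_with_coverage_gains. It runs in the Pool's result-handler
    # thread, so while it sleeps no other results are drained and workers
    # pile up behind the result queue.
    time.sleep(5)

if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        for i in range(100):
            pool.apply_async(run_one_benchmark, (i,), callback=slow_callback)
        pool.close()
        pool.join()
```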

From debugging with GDB on
#692, it looks like a large number of worker processes are stuck waiting to report results:

```
(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/usr/lib/python3.11/multiprocessing/queues.py", line 376, in put
    with self._wlock:
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
```

This partially reverts #566. We instead just create a new sub-process to periodically call this in the background to avoid blocking anything.
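
A rough sketch of that approach (the placeholder body, interval, and helper name are assumptions for illustration, not the actual code in this PR): spawn a dedicated multiprocessing.Process that calls the expensive report update on a timer, so the Pool's callbacks stay trivially cheap.

```python
import multiprocessing
import time

def extend_report_with_coverage_gains():
    # Placeholder for the real, expensive report-generation call.
    pass

def periodic_report_updater(interval_seconds=300):
    # Runs in its own process, so a slow report update never blocks the
    # Pool's result-handler thread or the benchmark workers.
    while True:
        extend_report_with_coverage_gains()
        time.sleep(interval_seconds)

if __name__ == '__main__':
    reporter = multiprocessing.Process(target=periodic_report_updater,
                                       daemon=True)
    reporter.start()
    # ... run the experiment's Pool here; the daemon reporter exits with us.
```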

@oliverchang (Collaborator, Author) commented:
/gcbrun exp -n ochang-mp

@oliverchang (Collaborator, Author) commented:
Report generation seems to be working. I'm going to merge this to unblock the large experiment.

@oliverchang merged commit be6a32a into main on Nov 8, 2024 (4 of 5 checks passed)
@oliverchang deleted the fix-multiprocessing branch on Nov 8, 2024 at 06:10
@DavidKorczynski (Collaborator) commented:
> This partially reverts #566. We instead just create a new sub-process to periodically call this in the background to avoid blocking anything.

Sounds good, I'll take a look at this
