Large experiment. #692
With all oracles and 10 targets each per project.
FYI @DavidKorczynski
/gcbrun exp -n oc-20241106 -b large-generated-20241106 --large
/gcbrun exp -n oc-20241106 -b large-generated-20241106 --large
/gcbrun exp -n oc-20241106 -b large-generated-20241106 -ns 4 --large
/gcbrun exp -n oc-20241106 -b large-generated-20241106 -ns 4 --large
Going to have to re-run this. It looks like the experiment is stuck somehow.
/gcbrun exp -n oc-20241106 -b large-generated-20241106 --large
/gcbrun exp -n oc-20241106 -b large-generated-20241106 -ns 4 --large
It's still not clear to me why the last one got stuck. Hopefully #705 will help with debugging this.
/gcbrun exp -n oc-20241106 -b large-generated-20241106 -ns 4 --large
/gcbrun exp -n oc-20241106 -b large-generated-20241106 -ns 4 --large
/gcbrun exp -n oc-20241108 -i -b large-generated-20241106 -ns 4 --large
/gcbrun exp -n oc-20241108 -i -b large-generated-20241106 -ns 4 --large
Per https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.apply_async, callbacks should return immediately; otherwise they block the entire Pool from making progress. For large experiments, this is likely why our throughput decreases as the experiment runs.

From debugging with GDB on #692, it looks like a large number of worker processes are stuck waiting to report results:

```
(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/usr/lib/python3.11/multiprocessing/queues.py", line 376, in put
    with self._wlock:
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
```

This partially reverts #566. Instead, we create a new sub-process that periodically runs this work in the background so that it does not block anything.
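A minimal sketch of the pattern the linked documentation asks for, assuming a POSIX system where the `fork` start method is available: the `apply_async` callback only records each result. Callbacks run on the Pool's single result-handler thread, so anything slow there stops that thread from draining the result queue, and workers then pile up in `put()` exactly as in the traceback above. `run_benchmark`, `run_experiment`, and the result fields are hypothetical stand-ins, not the project's actual code.

```python
import multiprocessing
import time


def run_benchmark(name: str) -> dict:
    """Hypothetical stand-in for one benchmark experiment run in a worker."""
    time.sleep(0.01)  # Simulate the real work done in the worker process.
    return {"benchmark": name, "crashes": len(name) % 3}


def run_experiment(benchmarks: list[str], processes: int = 4) -> list[dict]:
    results: list[dict] = []

    def on_done(result: dict) -> None:
        # Runs on the Pool's result-handler thread in the parent process.
        # Keep it cheap: record the result and return. No aggregation,
        # report generation, or other expensive work belongs here.
        results.append(result)

    # The "fork" context keeps this sketch runnable without a __main__ guard.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes) as pool:
        for name in benchmarks:
            pool.apply_async(run_benchmark, (name,), callback=on_done)
        pool.close()
        pool.join()  # Returns only after every callback has run.
    return results
```

Any expensive summary of `results` then happens outside the callback path, for example after `join()` or in a separate process.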
…#709) Per https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.apply_async, callbacks should return immediately; otherwise they block the entire Pool from making progress. For large experiments, this is likely why our throughput slows to a crawl as the experiment runs, since every benchmark experiment that finishes triggers this expensive calculation.

From debugging with GDB on #692, it looks like a large number of worker processes are stuck waiting to report results:

```
(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/usr/lib/python3.11/multiprocessing/queues.py", line 376, in put
    with self._wlock:
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
```

This partially reverts #566. Instead, we create a new sub-process that periodically runs this calculation in the background so that it does not block anything.
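One way to run the expensive calculation off the callback path is a separate process that recomputes a summary on a timer. This sketch assumes results are written as per-benchmark JSON files in one directory and uses the `fork` start method; `update_summary`, `periodic_updater`, `start_background_updater`, the file layout, and the 60-second default interval are all hypothetical, not the project's actual implementation.

```python
import json
import multiprocessing
import os
import tempfile
import time


def update_summary(results_dir: str) -> None:
    """Hypothetical expensive aggregation over all per-benchmark result files."""
    total = 0
    for name in sorted(os.listdir(results_dir)):
        if name.endswith(".json") and name != "summary.json":
            with open(os.path.join(results_dir, name)) as f:
                total += json.load(f)["crashes"]
    # Write to a temp file, then rename, so readers never see a partial summary.
    tmp = os.path.join(results_dir, "summary.json.tmp")
    with open(tmp, "w") as f:
        json.dump({"total_crashes": total}, f)
    os.replace(tmp, os.path.join(results_dir, "summary.json"))


def periodic_updater(results_dir: str, stop, interval_s: float) -> None:
    # stop.wait() returns False on timeout, so this recomputes every interval
    # until stop is set, then does one final pass to pick up late results.
    while not stop.wait(interval_s):
        update_summary(results_dir)
    update_summary(results_dir)


def start_background_updater(results_dir: str, interval_s: float = 60.0):
    ctx = multiprocessing.get_context("fork")
    stop = ctx.Event()
    proc = ctx.Process(
        target=periodic_updater,
        args=(results_dir, stop, interval_s),
        daemon=True,
    )
    proc.start()
    return proc, stop
```

Start the updater before submitting jobs to the Pool, then call `stop.set()` and `proc.join()` once the Pool has drained; the final pass in `periodic_updater` then reflects every finished benchmark, while the Pool callbacks stay trivial.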
/gcbrun exp -n oc-20241108-fixed -b large-generated-20241106 -ns 4 --large
(hopefully final) report link: https://llm-exp.oss-fuzz.com/Result-reports/ofg-pr/2024-11-08-692-oc-20241108-fixed-large-generated-20241106/index.html
Experiment finished at https://llm-exp.oss-fuzz.com/Result-reports/ofg-pr/2024-11-08-692-oc-20241108-fixed-large-generated-20241106/index.html! (with very impressive results).
With all oracles and 6 targets each per project.