
Large experiment. #692

Closed
wants to merge 14 commits into from

Conversation

oliverchang
Collaborator

@oliverchang commented Nov 6, 2024

With all oracles and 6 targets each per project.

@oliverchang
Collaborator Author

FYI @DavidKorczynski

@oliverchang
Collaborator Author

/gcbrun exp -n oc-20241106 -b large-generated-20241106 --large

@oliverchang
Collaborator Author

/gcbrun exp -n oc-20241106 -b large-generated-20241106 --large

@oliverchang
Collaborator Author

Base automatically changed from cloud-cached to main November 6, 2024 10:15
@oliverchang
Collaborator Author

/gcbrun exp -n oc-20241106 -b large-generated-20241106 -ns 4 --large

1 similar comment
@oliverchang
Collaborator Author

/gcbrun exp -n oc-20241106 -b large-generated-20241106 -ns 4 --large

@oliverchang
Collaborator Author

Going to have to re-run this. Looks like the experiment is stuck somehow.

@oliverchang
Collaborator Author

/gcbrun exp -n oc-20241106 -b large-generated-20241106 --large

@oliverchang
Collaborator Author

/gcbrun exp -n oc-20241106 -b large-generated-20241106 -ns 4 --large

@oliverchang
Collaborator Author

New report: https://llm-exp.oss-fuzz.com/Result-reports/ofg-pr/2024-11-08-692-oc-20241106-large-generated-20241106/index.html

It's still not clear to me why the last one got stuck. Hopefully #705 will help with debugging this.

@oliverchang
Collaborator Author

/gcbrun exp -n oc-20241106 -b large-generated-20241106 -ns 4 --large

@oliverchang
Collaborator Author

/gcbrun exp -n oc-20241106 -b large-generated-20241106 -ns 4 --large

@oliverchang
Collaborator Author

/gcbrun exp -n oc-20241108 -i -b large-generated-20241106 -ns 4 --large

@oliverchang
Collaborator Author

/gcbrun exp -n oc-20241108 -i -b large-generated-20241106 -ns 4 --large

oliverchang added a commit that referenced this pull request Nov 8, 2024
Per
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.apply_async,
callbacks should return immediately; otherwise they block the entire
Pool from making progress.

For large experiments, this is likely causing our throughput to
decrease as the experiment runs.

From debugging with GDB on
#692, it looks like a large
number of worker processes are stuck waiting to report results:

```
(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/usr/lib/python3.11/multiprocessing/queues.py", line 376, in put
    with self._wlock:
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
```

This partially reverts #566.
We instead create a new sub-process that periodically runs this in the
background, so nothing is blocked.
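
To make the failure mode concrete, here is a minimal standalone sketch (not code from this repo) of how a slow apply_async callback stalls a multiprocessing.Pool: the callback runs in the Pool's single result-handler thread, so while it is busy, completed results are not drained and workers back up trying to put results on the queue, as in the py-bt trace above.

```
# Minimal sketch (not from this repo): a slow apply_async callback blocks the
# Pool's result-handler thread, so finished tasks pile up behind it.
import multiprocessing
import time


def work(i):
    return i * i


def slow_callback(result):
    # Stand-in for an expensive per-result step (e.g. regenerating a report).
    # While this runs, no other results are processed by the Pool.
    time.sleep(5)
    print('processed', result)


if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        for i in range(8):
            pool.apply_async(work, (i,), callback=slow_callback)
        pool.close()
        pool.join()
```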
oliverchang added a commit that referenced this pull request Nov 8, 2024
…#709)

Per
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.apply_async,
callbacks should return immediately; otherwise they block the entire
Pool from making progress.

For large experiments, this likely causes our throughput to slow to a
crawl as the experiment runs, since every benchmark experiment that
finishes triggers this expensive calculation.

From debugging with GDB on
#692, it looks like a large
number of worker processes are stuck waiting to report results:

```
(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/usr/lib/python3.11/multiprocessing/queues.py", line 376, in put
    with self._wlock:
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
```

This partially reverts #566.
We instead create a new sub-process that periodically runs this in the
background, so nothing is blocked.
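
And a hedged sketch of the workaround described in that commit message: move the expensive step out of the callback into a separate background process that runs it on a timer, so Pool callbacks return immediately. `generate_report` is a hypothetical stand-in for whatever the callback previously computed, not this repo's actual function.

```
# Hedged sketch of the workaround: run the expensive step in its own process
# on a timer instead of inside the apply_async callback.
# `generate_report` is a hypothetical placeholder, not the repo's actual code.
import multiprocessing
import time


def generate_report():
    # Placeholder for the expensive aggregation/reporting step.
    time.sleep(5)


def report_loop(interval_seconds=60):
    while True:
        generate_report()
        time.sleep(interval_seconds)


if __name__ == '__main__':
    reporter = multiprocessing.Process(target=report_loop, daemon=True)
    reporter.start()
    # ... run the experiment Pool as usual; callbacks can now return
    # immediately instead of doing the expensive work themselves.
```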
@oliverchang
Collaborator Author

/gcbrun exp -n oc-20241108-fixed -b large-generated-20241106 -ns 4 --large
