Spurious segfaults in sv-benchmarks runs #1520

Closed
sim642 opened this issue Jun 21, 2024 · 9 comments

Assignee: sim642
Labels: bug, sv-comp, type-safety
Milestone: SV-COMP 2025

sim642 commented Jun 21, 2024

For a while now, sv-benchmarks runs have been showing a number of spurious SEGMENTATION FAULT results, which land on different tasks on every run. In our latest run, it was 157/31498 tasks (0.5%). These make BenchExec diff tables particularly annoying to look at: one has to filter those statuses out of both columns to focus on actual differences.

In the SV-COMP 2024 results we had zero of them, so something must have happened since. AFAIK, segfaults in OCaml can only be due to misuse of Obj or bad C stubs (which we use for float domains). I opened #1371 at one point but convinced myself that our stubs were actually fine. Maybe they still aren't? But the same stubs were already in place at SV-COMP 2024, so the same issues should've popped up then as well.
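
To illustrate the stub failure mode (a minimal hypothetical sketch in C, not our actual float-domain stubs): the classic bug is holding a value across an allocating call without registering it as a GC root.

/* Hypothetical stub illustrating the classic GC-safety bug class. */
#include <caml/mlvalues.h>
#include <caml/alloc.h>
#include <caml/memory.h>

value broken_pair(value a)          /* BUG: a is never registered as a root */
{
  value b = caml_copy_double(1.0);  /* may run the GC and move a... */
  value p = caml_alloc_small(2, 0); /* ...and this may move (or free) b */
  Field(p, 0) = a;                  /* possibly stores dangling pointers */
  Field(p, 1) = b;                  /* into a live heap block */
  return p;
}

value fixed_pair(value a)           /* correct: register everything that */
{                                   /* must survive an allocation */
  CAMLparam1(a);
  CAMLlocal2(b, p);
  b = caml_copy_double(1.0);
  p = caml_alloc_small(2, 0);
  Field(p, 0) = a;
  Field(p, 1) = b;
  CAMLreturn(p);
}

The nasty part is that the broken version usually works: the corruption only bites when the GC happens to run at exactly the wrong allocation.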

Their spurious nature also makes them extremely difficult to debug. Since no specific task is affected, one cannot just bisect and run that task under gdb; rather, at every bisect step, a full sv-benchmarks run needs to be done. And that only yields the first problematic commit at best: there's still no way to get a full gdb backtrace while running Goblint under BenchExec.

sim642 added the bug, type-safety and sv-comp labels Jun 21, 2024
sim642 added this to the SV-COMP 2025 milestone Jun 21, 2024
michael-schwarz commented

IIRC, stack overflows that happen while executing native code (e.g., in Apron) also produce segfaults, right?
I always assumed that these explain most of those cases.

sim642 commented Jun 21, 2024

That's true, and it can even happen without things like Apron: if the C stack happens to overflow exactly while executing some OCaml runtime routine (implemented in C), that also produces a segfault.

However, I'm still doubtful that that's what's happening here. If a particular program causes a stack overflow (whether in OCaml or C code at the time), it should be consistent and repeatable. We don't invoke any randomness, so Goblint should be deterministic in what it does and how deep each recursion goes. One run shouldn't somehow recurse deeper than a second, identical run.
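
For reference, the scenario being ruled out here (a hypothetical C helper, not code we actually ship): when the stack overflows while executing C code, the runtime generally cannot turn it into a Stack_overflow exception, so the process dies with a plain SIGSEGV.

/* Hypothetical helper: unbounded recursion on the C stack. */
#include <caml/mlvalues.h>

static long deep(long n)
{
  volatile long frame[64];  /* force a real stack frame per call */
  frame[0] = n;
  return n == 0 ? 0 : frame[0] + deep(n - 1);
}

value overflow_stub(value n)
{
  /* with a huge n this hits the guard page inside C code: a hard
     segfault, not a Stack_overflow exception */
  return Val_long(deep(Long_val(n)));
}

But crucially, for a fixed input this either overflows every time or never, which is exactly why it doesn't explain the spuriousness.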

sim642 commented Jun 27, 2024

#1522 doesn't seem to have helped with this, sadly. avsm/ocaml-yaml#75 fixed a segfault in YAML with large values, but that doesn't seem to have been the cause here.

sim642 commented Jun 27, 2024

I just noticed something in my latest ghost witness generation runs (https://goblint.cs.ut.ee/results/188-concurrency-witness-simpl/table-generator.table.html#/table): there are no segfaults in the non-relational-only runs, but there are segfaults in the relational ones. So maybe this is related to Apron somehow.

EDIT: I coincidentally managed to reproduce this under gdb, where bt shows the following:

#0  caml_darken (v=0, p=0x7fffffffb728) at major_gc.c:285
#1  0x00005555577da51d in caml_do_local_roots_nat (f=f@entry=0x5555577dc930 <caml_darken>, bottom_of_stack=<optimized out>, last_retaddr=<optimized out>, gc_regs=<optimized out>, local_roots=<optimized out>)
    at roots_nat.c:514
#2  0x00005555577da6e5 in caml_do_roots (f=0x5555577dc930 <caml_darken>, do_globals=<optimized out>) at roots_nat.c:432
#3  0x00005555577da732 in caml_darken_all_roots_start () at roots_nat.c:357
#4  0x00005555577dd250 in start_cycle () at major_gc.c:407
#5  caml_major_collection_slice (howmuch=howmuch@entry=-1) at major_gc.c:1089
#6  0x00005555577de741 in caml_gc_dispatch () at minor_gc.c:500
#7  0x00005555577de851 in caml_check_urgent_gc (extra_root=<optimized out>, extra_root@entry=1) at minor_gc.c:575
#8  0x00005555577de8e4 in caml_alloc_small_dispatch (wosize=2, flags=flags@entry=1, nallocs=nallocs@entry=1, encoded_alloc_lens=encoded_alloc_lens@entry=0x0) at minor_gc.c:524
#9  0x00005555577e00f9 in caml_alloc_small (wosize=2, tag=tag@entry=255) at alloc.c:68
#10 0x00005555577f4645 in alloc_custom_gen (ops=0x5555587e7c00 <camlidl_apron_custom_var_ptr>, bsz=<optimized out>, mem=0, max_major=1, mem_minor=0, max_minor=1) at custom.c:50
#11 0x000055555777d5d4 in camlidl_environment_ap_environment_vars ()
#12 0x0000555556dd5a0b in camlGoblint_lib__ApronDomain__join_2689 () at src/cdomains/apron/apronDomain.apron.ml:568
[...]

Although this doesn't necessarily mean Apron did something wrong. It just means that when Apron wanted to allocate on the OCaml heap, the GC ran, and something in the OCaml heap had already been corrupted at some earlier point, such that the GC now crashes following an invalid pointer.
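
To spell that out (a hypothetical sketch; dummy_ops and innocent_stub are made-up names, not Apron's actual bindings), this matches the alloc_custom_gen -> caml_check_urgent_gc -> caml_major_collection_slice -> caml_darken chain in the backtrace above:

/* A perfectly correct stub that can still end up in the crash backtrace:
   its allocation merely triggers the GC slice that trips over corruption
   left behind earlier by some other, buggy stub. */
#include <caml/mlvalues.h>
#include <caml/alloc.h>
#include <caml/memory.h>
#include <caml/custom.h>

static struct custom_operations dummy_ops = {
  "example.dummy",
  custom_finalize_default, custom_compare_default, custom_hash_default,
  custom_serialize_default, custom_deserialize_default,
  custom_compare_ext_default, custom_fixed_length_default
};

value innocent_stub(value unit)
{
  CAMLparam1(unit);
  CAMLlocal1(v);
  /* this allocation may start a major GC slice, which darkens all roots
     (caml_darken); if an earlier stub planted a dangling value, the
     segfault happens here, with this innocent stub on the stack */
  v = caml_alloc_custom(&dummy_ops, sizeof(void *), 0, 1);
  CAMLreturn(v);
}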

sim642 commented Jun 27, 2024

After a day of debugging, I think I found camlidl of all things to be the cause: xavierleroy/camlidl#29. Apron uses camlidl to generate its OCaml bindings.

sim642 self-assigned this Jun 27, 2024
michael-schwarz commented

Wow, that is quite the find! I would have suspected many other things before turning to camlidl...

sim642 commented Jun 27, 2024

I found some hand-written instances of the same bug in Apron itself as well: antoinemine/apron#112.

sim642 commented Jun 28, 2024

I've now done an sv-benchmarks run with the camlidl and Apron fixes on top of #1522: all segfaults have disappeared.

sim642 commented Oct 9, 2024

Since this has been fixed in the upstream repositories, but not yet released on opam, I've added opam pins for them for now. There's nothing more for us to do about this issue, other than removing the pins eventually.
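
For anyone wanting to reproduce this setup before the fixed releases land, the pins look roughly like this (a sketch: the exact URLs and branches are assumptions, check Goblint's opam files for the authoritative pins):

opam pin add camlidl git+https://github.com/xavierleroy/camlidl.git
opam pin add apron git+https://github.com/antoinemine/apron.git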

sim642 closed this as completed Oct 9, 2024