Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialize local roots with Val_unit to fix segfault #29

Merged
merged 1 commit into from
Jul 6, 2024

Conversation

sim642
Copy link
Contributor

@sim642 sim642 commented Jun 27, 2024

Long story (below) short: local GC roots (in this case in an array) should be initialized with Val_unit instead of 0 like the CAMLlocal* macros do it.
In the case described below, while in the middle of passing multiple out arguments, GC may be triggered, which could see improperly initialized parts of the output tuple and segfault.

Currently this PR is a draft because I've made the fix in only one place in camlidl code generation that mattered for the specific case (where it fixes the segfault). If my understanding of the OCaml runtime is correct and such fix is indeed needed, I can try to extend it to other parts of the code generation.
This PR fixes all such places in camlidl I managed to find.

Long story

In Goblint I have been fighting with spurious segfaults (goblint/analyzer#1520 (comment)) for a while. I finally managed to get a backtrace from gdb:

#0  caml_darken (v=0, p=0x7fffffffb728) at major_gc.c:285
#1  0x00005555577da51d in caml_do_local_roots_nat (f=f@entry=0x5555577dc930 <caml_darken>, bottom_of_stack=<optimized out>, last_retaddr=<optimized out>, gc_regs=<optimized out>, local_roots=<optimized out>)
    at roots_nat.c:514
#2  0x00005555577da6e5 in caml_do_roots (f=0x5555577dc930 <caml_darken>, do_globals=<optimized out>) at roots_nat.c:432
#3  0x00005555577da732 in caml_darken_all_roots_start () at roots_nat.c:357
#4  0x00005555577dd250 in start_cycle () at major_gc.c:407
#5  caml_major_collection_slice (howmuch=howmuch@entry=-1) at major_gc.c:1089
#6  0x00005555577de741 in caml_gc_dispatch () at minor_gc.c:500
#7  0x00005555577de851 in caml_check_urgent_gc (extra_root=<optimized out>, extra_root@entry=1) at minor_gc.c:575
#8  0x00005555577de8e4 in caml_alloc_small_dispatch (wosize=2, flags=flags@entry=1, nallocs=nallocs@entry=1, encoded_alloc_lens=encoded_alloc_lens@entry=0x0) at minor_gc.c:524
#9  0x00005555577e00f9 in caml_alloc_small (wosize=2, tag=tag@entry=255) at alloc.c:68
#10 0x00005555577f4645 in alloc_custom_gen (ops=0x5555587e7c00 <camlidl_apron_custom_var_ptr>, bsz=<optimized out>, mem=0, max_major=1, mem_minor=0, max_minor=1) at custom.c:50
#11 0x000055555777d5d4 in camlidl_environment_ap_environment_vars ()
#12 0x0000555556dd5a0b in camlGoblint_lib__ApronDomain__join_2689 () at src/cdomains/apron/apronDomain.apron.ml:568
[...]

It looks like something to do with Apron, although that could be any heap corruption that just happens to crash from allocations from Apron bindings.

The top of this stack corresponds to this "Local C roots" code in the runtime: https://github.com/ocaml/ocaml/blob/8eb41f72ded84df884c3671734c947f612091f84/runtime/roots_nat.c#L509-L517.
After prodding around with gdb for a while, I realized that the problematic pointer (0x7fffffffb728) corresponded exactly to the second item in a two-element top-level local C root.

This comes from the following camlidl-generated code (with SEGFAULT location marked):

value camlidl_environment_ap_environment_vars(
	value _v_e)
{
  ap_environment_ptr e; /*in*/
  ap_var_t *name_of_intdim; /*out*/
  ap_var_t *name_of_realdim; /*out*/
  struct camlidl_ctx_struct _ctxs = { CAMLIDL_TRANSIENT, NULL };
  camlidl_ctx _ctx = &_ctxs;
  mlsize_t _c1;
  value _v2;
  mlsize_t _c3;
  value _v4;
  value _vresult;
  value _vres[2] = { 0, 0, };

  camlidl_ml2c_environment_ap_environment_ptr(_v_e, &e, _ctx);
  name_of_intdim = camlidl_malloc((*e).intdim * sizeof(ap_var_t ), _ctx);
  name_of_realdim = camlidl_malloc((*e).realdim * sizeof(ap_var_t ), _ctx);
  /* begin user-supplied calling sequence */

{
size_t i;
for(i=0;i<e->intdim;i++) name_of_intdim[i] = ap_var_operations->copy(e->var_of_dim[i]);
 for(i=0;i<e->realdim;i++) name_of_realdim[i] = ap_var_operations->copy(e->var_of_dim[e->intdim+i]);
 }
  /* end user-supplied calling sequence */
  Begin_roots_block(_vres, 2)
    _vres[0] = camlidl_alloc((*e).intdim, 0);
    Begin_root(_vres[0])
      for (_c1 = 0; _c1 < (*e).intdim; _c1++) {
        _v2 = camlidl_c2ml_var_ap_var_t(&name_of_intdim[_c1], _ctx); // SEGFAULT location
        caml_modify(&Field(_vres[0], _c1), _v2);
      }
    End_roots()
    _vres[1] = camlidl_alloc((*e).realdim, 0);
    Begin_root(_vres[1])
      for (_c3 = 0; _c3 < (*e).realdim; _c3++) {
        _v4 = camlidl_c2ml_var_ap_var_t(&name_of_realdim[_c3], _ctx);
        caml_modify(&Field(_vres[1], _c3), _v4);
      }
    End_roots()
    _vresult = camlidl_alloc_small(2, 0);
    Field(_vresult, 0) = _vres[0];
    Field(_vresult, 1) = _vres[1];
  End_roots()
  camlidl_free(_ctx);
  return _vresult;
}

At that point, some inlined functions allocate on the OCaml heap and trigger GC, which scans the top-level local roots in _vres. The first one is fine, because a proper array was already allocated for it. The second one is the problem: it still has value 0 from its initialization. caml_darken checks that this is a block and tries to dereference stuff around the NULL pointer.

If this were Val_unit, like CAMLlocal* would initialize it (before adding a local root), everything would be fine.

Copy link
Owner

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spotted, thanks for the report! The fix looks good to me, with a possible improvement below that I'll look into later. Merging!

compiler/variables.ml Show resolved Hide resolved
@xavierleroy xavierleroy merged commit 346de7b into xavierleroy:master Jul 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants