Issues due to OCaml 5 compiler bug with Atomic.get #97

Closed
polytypic opened this issue Nov 3, 2023 · 2 comments

Comments

@polytypic
Contributor

polytypic commented Nov 3, 2023

Today we identified a compiler bug in OCaml 5 where the compiler may incorrectly optimize away repeated Atomic.gets.
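To make the affected pattern concrete, here is a minimal sketch (with made-up names, not code from Saturn): two reads of the same atomic in straight-line code, where the faulty optimization may replace the second read with the value of the first, so an update made by another domain between the two reads can be missed.

(* Minimal sketch of the affected pattern; [read_twice] is a made-up name. *)
let read_twice (a : int Atomic.t) =
  let first = Atomic.get a in
  (* The bug may merge this second read with the first one. *)
  let second = Atomic.get a in
  (first, second)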

We have already identified several places in my PR #83 that were affected by this compiler bug.

Over the following days I'll go through the code in Saturn, try to identify other places affected by the bug, and add them as comments here. I have already identified one case, but I probably won't have time today to find more.

@polytypic
Contributor Author

The MPMC relaxed queue has this operation:

(* [ccas] A slightly nicer CAS. Tries without taking microarch lock first. Use on indices. *)
let ccas cell seen v =
  if Atomic.get cell != seen then false else Atomic.compare_and_set cell seen v

And it is used here:

let pop { array; head; mask; _ } =
  let head_val = Atomic.fetch_and_add head 1 in
  let index = head_val land mask in
  let cell = Array.get array index in
  let item = ref (Atomic.get cell) in
  while Option.is_none !item || not (ccas cell !item None) do
    Domain.cpu_relax ();
    item := Atomic.get cell
  done;
  Option.get !item

Unfortunately, the compiler optimizes the Atomic.get out. This means the compiler defeats the intention of ccas, which is to avoid writing to memory and thereby reduce contention. You can observe that the compiler really does optimize the Atomic.get out by examining the compiler output. Specifically, consider this snippet:

.L135:
        movq    (%rdi), %rbx
.L131:
        testb   $1, %bl
        jne     .L133
.L134:
        movq    (%rdi), %rax
        cmpq    %rbx, %rax
        jne     .L133

The register %rbx holds the value read from an atomic. That value is stored into a ref cell (on the first line). The last three lines then read the value from the ref cell back and compare it against the value in register %rbx.

So, the optimization done by the compiler defeats the optimization done by the programmer.
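As a hypothetical illustration (a reconstruction of the effect described above, not actual compiler output, and with a made-up name): once the Atomic.get inside ccas is replaced by the value already cached in !item, the guard compares that value against itself and never fires, so the loop behaves roughly as if it were written like this:

(* Hypothetical reconstruction of the effect; [pop_loop_as_compiled] is a
   made-up name and the surrounding fetch_and_add/indexing is omitted.
   The read-before-CAS guard of [ccas] is effectively gone: every failed
   iteration performs a full [compare_and_set], writing to memory and
   increasing contention. *)
let pop_loop_as_compiled cell =
  let item = ref (Atomic.get cell) in
  while Option.is_none !item || not (Atomic.compare_and_set cell !item None) do
    Domain.cpu_relax ();
    item := Atomic.get cell
  done;
  Option.get !item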

@polytypic
Contributor Author

polytypic commented Nov 4, 2023

I think that the compiler bug affects more things than I initially thought.

Basically, if you have a non-recursive function (i.e. an inlineable function) that uses Atomic.get, then such an operation is no longer necessarily linearizable.

You would expect two linearizable operations on a single thread to have a timeline like:

<---x--->  <---y--->
|   |   |
|   |   +- Operation ends
|   +- Operation takes effect
+- Operation starts

But with the optimization this is no longer the case: the timelines are merged, and the points at which the two operations (x and y) take effect become one and the same point in time.

This can even happen across more than two operations. You might have three operations x < y < z on a single thread, and the middle operation might actually take effect after the last one, because the first and last operations become merged: x = z < y.

So, basically, any operation in Saturn that is non-recursive and may return after any number of Atomic.gets (without some optimization-foiling operation in between) is no longer necessarily linearizable.
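As a hypothetical illustration (the names below are made up, not actual Saturn operations), consider two such inlineable operations called back to back on a single domain:

(* Hypothetical illustration; [size], [is_empty] and [observe] are made-up
   names.  Both operations are non-recursive and end in an [Atomic.get], so
   after inlining the two reads may be merged.  [is_empty] then effectively
   takes effect at the moment of [size]'s read, i.e. before [is_empty] was
   even invoked, breaking the expected timeline shown above. *)
let size (t : int Atomic.t) = Atomic.get t
let is_empty (t : int Atomic.t) = Atomic.get t = 0

let observe t =
  let n = size t in          (* operation x *)
  let empty = is_empty t in  (* operation y: may reuse x's read *)
  (n, empty)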

To be safe(r), as a workaround, any non-recursive uses of Atomic.get x should probably be changed to Atomic.get (Sys.opaque_identity x).
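For example, applied to the ccas shown above, the workaround might look like this (a sketch of the suggested change, not a committed fix):

(* Sketch of the workaround applied to [ccas] (not a committed fix).
   [Sys.opaque_identity] hides the atomic from the optimizer, so this read
   should not be merged with an earlier read of the same atomic. *)
let ccas cell seen v =
  if Atomic.get (Sys.opaque_identity cell) != seen then false
  else Atomic.compare_and_set cell seen v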
