Issues due to OCaml 5 compiler bug with Atomic.get #97

Closed
polytypic opened this issue Nov 3, 2023 · 2 comments

Comments

@polytypic
Contributor

polytypic commented Nov 3, 2023

Today we identified a compiler bug in OCaml 5 where the compiler may incorrectly optimize away repeated Atomic.gets.
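To make the affected pattern concrete, here is a minimal sketch (with made-up names, not code from Saturn): two reads of the same atomic in straight-line code, where the faulty optimization may replace the second read with the value of the first, so an update made by another domain between the two reads can be missed.

(* Minimal sketch of the affected pattern; [read_twice] is a made-up name. *)
let read_twice (a : int Atomic.t) =
  let first = Atomic.get a in
  (* The bug may merge this second read with the first one. *)
  let second = Atomic.get a in
  (first, second)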

We have already identified several places in my PR #83 that were affected by this compiler bug.

Over the following days I'll go through the code in Saturn, try to identify other places affected by the bug, and add them as comments here. I have already identified one case, but I probably won't have time today to find more.

@polytypic
Contributor Author

The MPMC relaxed queue has this operation:

(* [ccas] A slightly nicer CAS. Tries without taking microarch lock first. Use on indices. *)
let ccas cell seen v =
  if Atomic.get cell != seen then false else Atomic.compare_and_set cell seen v

And it is used here:

let pop { array; head; mask; _ } =
  let head_val = Atomic.fetch_and_add head 1 in
  let index = head_val land mask in
  let cell = Array.get array index in
  let item = ref (Atomic.get cell) in
  while Option.is_none !item || not (ccas cell !item None) do
    Domain.cpu_relax ();
    item := Atomic.get cell
  done;
  Option.get !item

Unfortunately, the compiler optimizes the Atomic.get out. This means the compiler defeats the intention of ccas, which is to avoid writing to memory and thereby reduce contention. You can observe that the compiler really does optimize the Atomic.get out by examining the compiler output. Specifically, consider this snippet:

.L135:
        movq    (%rdi), %rbx
.L131:
        testb   $1, %bl
        jne     .L133
.L134:
        movq    (%rdi), %rax
        cmpq    %rbx, %rax
        jne     .L133

The register %rbx holds the value read from an atomic. That value is stored into a ref cell (on the first line). The last three lines then read the value from the ref cell back and compare it against the value in register %rbx.

So, the optimization done by the compiler defeats the optimization done by the programmer.
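As a hypothetical illustration (a reconstruction of the effect described above, not actual compiler output, and with a made-up name): once the Atomic.get inside ccas is replaced by the value already cached in !item, the guard compares that value against itself and never fires, so the loop behaves roughly as if it were written like this:

(* Hypothetical reconstruction of the effect; [pop_loop_as_compiled] is a
   made-up name and the surrounding fetch_and_add/indexing is omitted.
   The read-before-CAS guard of [ccas] is effectively gone: every failed
   iteration performs a full [compare_and_set], writing to memory and
   increasing contention. *)
let pop_loop_as_compiled cell =
  let item = ref (Atomic.get cell) in
  while Option.is_none !item || not (Atomic.compare_and_set cell !item None) do
    Domain.cpu_relax ();
    item := Atomic.get cell
  done;
  Option.get !item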

@polytypic
Contributor Author

polytypic commented Nov 4, 2023

I think that the compiler bug affects more things than I initially thought.

Basically, if you have a non-recursive function (i.e. an inlineable function) that uses Atomic.get, then such an operation is no longer necessarily linearizable.

You would expect two linearizable operations on a single thread to have a timeline like:

<---x--->  <---y--->
|   |   |
|   |   +- Operation ends
|   +- Operation takes effect
+- Operation starts

But with the optimization this is no longer the case: the timelines are merged, and the points at which the two operations (x and y) take effect become one and the same point in time.

This can even happen across more than two operations. You might have three operations x < y < z on a single thread, and the middle operation might actually take effect after the last one, because the first and last operations become merged: x = z < y.

So, basically, any operation in Saturn that is non-recursive and may return after any number of Atomic.gets (without some optimization-foiling operation in between) is no longer necessarily linearizable.
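As a hypothetical illustration (the names below are made up, not actual Saturn operations), consider two such inlineable operations called back to back on a single domain:

(* Hypothetical illustration; [size], [is_empty] and [observe] are made-up
   names.  Both operations are non-recursive and end in an [Atomic.get], so
   after inlining the two reads may be merged.  [is_empty] then effectively
   takes effect at the moment of [size]'s read, i.e. before [is_empty] was
   even invoked, breaking the expected timeline shown above. *)
let size (t : int Atomic.t) = Atomic.get t
let is_empty (t : int Atomic.t) = Atomic.get t = 0

let observe t =
  let n = size t in          (* operation x *)
  let empty = is_empty t in  (* operation y: may reuse x's read *)
  (n, empty)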

To be safe(r), as a workaround, any non-recursive uses of Atomic.get x should probably be changed to Atomic.get (Sys.opaque_identity x).
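For example, applied to the ccas shown above, the workaround might look like this (a sketch of the suggested change, not a committed fix):

(* Sketch of the workaround applied to [ccas] (not a committed fix).
   [Sys.opaque_identity] hides the atomic from the optimizer, so this read
   should not be merged with an earlier read of the same atomic. *)
let ccas cell seen v =
  if Atomic.get (Sys.opaque_identity cell) != seen then false
  else Atomic.compare_and_set cell seen v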
