Ruling out pthread_cond_signal failure to wake up pthread_cond_wait #963
Comments
I've seen that bug locally, but not in CI as far as I can recall: ocaml-multicore/eio#700

https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1899800/comments/5 says:
IIUC, that applies only to glibc on Ubuntu (?), and the version of glibc on Debian, for example, does not (necessarily) have the fix. The lockups I've observed have mostly happened on Debian, and mostly on the ARM machines with OCaml 4.14, but I do recall seeing lockups on other Debian machines. Assuming that the cause is the
For background, I had some tests that used systhreads extensively in OCaml 4, and those locked up occasionally. I have since reduced the use of systhreads on OCaml 4, and the lockups seem to happen less frequently, but I still see them.
I think the only way to apply the fix is to rebuild libc (which sort of defeats the purpose of testing code on a given distro if we then fundamentally change it), and it appears that Ubuntu is the only distro that has patched it. That is a bit disappointing, given that the issue has been open since 2020 and even had a TLA+ proof for the fix in 2023 (although that is not the fix Ubuntu applied; I think Ubuntu only applied the one-line workaround, not the more complicated fix). That said, if your distro is old enough to have a libc older than 2.27, then you're not affected. Maybe we could convince Debian to take the same patch that Ubuntu has, at least until upstream glibc gets around to reviewing and applying the patches?
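To see which side of that 2.27 line a given CI image falls on, one option is to ask the libc inside the container. The sketch below is a hypothetical helper script, not part of ocaml-ci; it uses Python's standard platform.libc_ver() and takes the 2.27 threshold from the comment above as an assumption rather than an authoritative cut-off. A version number cannot tell you whether a distro has backported the workaround (as Ubuntu reportedly did), so a True result only means "runs the newer code and may need checking".

```python
import platform
from typing import Optional

# Assumption taken from this thread: glibc releases older than 2.27 are
# reported as unaffected. This is not an authoritative cut-off.
FIRST_POSSIBLY_AFFECTED = (2, 27)


def may_be_affected(libc: str, version: str) -> Optional[bool]:
    """Classify a (libc, version) pair as returned by platform.libc_ver().

    Returns True for glibc >= 2.27, False for older glibc, and None when
    the answer cannot be derived from the version (non-glibc or unparsable).
    True does not mean the bug is present: a distro may have backported a
    workaround without changing the version number.
    """
    if libc != "glibc" or not version:
        return None
    try:
        # Only major.minor matter here, e.g. "2.36" -> (2, 36).
        major, minor = (int(part) for part in version.split(".")[:2])
    except ValueError:
        return None
    return (major, minor) >= FIRST_POSSIBLY_AFFECTED


if __name__ == "__main__":
    # Run this inside the CI container: the tests use the container's
    # glibc, not the host's.
    libc, version = platform.libc_ver()
    verdict = may_be_affected(libc, version)
    if verdict is None:
        print(f"{libc or 'unknown libc'} {version}: cannot tell from the version")
    elif verdict:
        print(f"glibc {version}: >= 2.27, check whether the distro patched the bug")
    else:
        print(f"glibc {version}: older than 2.27, reportedly not affected")
```

On images that are not glibc-based, platform.libc_ver() may return empty strings, in which case the helper deliberately answers None instead of guessing.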
Could we perhaps switch the non-x86_64 builders to Ubuntu, though? That should give us more coverage on other architectures when looking for bugs in the OCaml runtime or multicore libraries, while avoiding the known libc bug.
The workers themselves are all running Ubuntu. When you see something like
I think what matters for the bug is the version of glibc, i.e. the version in the container, not the host OS. That said, there are quite a few Ubuntu Docker images for other architectures:
This seems right to me.
So, IIUC, the upshot is that there isn't a fix we can reasonably apply in ocaml-ci for this. But might some of the multicore-specific CIs want tweaks to provide a testing environment for your purposes that doesn't produce noise from flawed standard dependencies, @polytypic?
It is an interesting situation. I read through the thread here and, IIUC, there is a mention that the "mitigation" patch used in Ubuntu still has issues. So, to put it a bit provocatively, Linux and OCaml are currently incompatible. I don't think that using a version of glibc with this one bug fixed would completely defeat the purpose of testing on a given Linux distribution.
Would
There is a known bug which causes a pthread_cond_signal to fail to wake up a pthread_cond_wait. The OCaml runtime and the libraries that come with OCaml use those functions and are known to be affected by this bug (search for "OCaml" in the issue).

I occasionally observe some multicore OCaml code I'm developing locking up (a test hangs and is then killed after an hour) on some of the machines (e.g. Debian 12 ARM with OCaml 4.14). This kind of symptom could be explained by that pthread_cond_signal bug, but it could also point to an issue in my code. Knowing that it cannot be the pthread_cond_signal bug would help a lot.

It would be great if we could make sure that all the OCaml CI machines have this bug fixed or patched. That way, people working on multicore OCaml could perhaps sleep a little better. 😅