Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add pybind11/gil_safe_call_once.h (to fix deadlocks in pybind11/numpy.h) #4877

Merged
merged 31 commits into from
Oct 12, 2023
Merged
Changes from 3 commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
38317c3
LazyInitializeAtLeastOnceDestroyNever v1
Oct 9, 2023
e7b8c4f
Go back to using `union` as originally suggested by jbms@. The trick …
Oct 9, 2023
74ac0d9
Revert "Go back to using `union` as originally suggested by jbms@. Th…
Oct 9, 2023
109a165
Remove `#include <stdalign.h>`
Oct 9, 2023
88cec11
Suppress gcc 4.8.5 (CentOS 7) warning.
Oct 9, 2023
1ce2715
Replace comments:
Oct 9, 2023
e7be9c2
Adopt suggestion by @tkoeppe:
Oct 9, 2023
f07b28b
Add `PYBIND11_CONSTINIT`, but it does not work for the current use ca…
Oct 9, 2023
36be645
Revert "Add `PYBIND11_CONSTINIT`, but it does not work for the curren…
Oct 9, 2023
7bc16a6
Reapply "Add `PYBIND11_CONSTINIT`, but it does not work for the curre…
Oct 9, 2023
78f4e93
Add Default Member Initializer on `value_storage_` as suggested by @t…
Oct 9, 2023
6d9441d
Fix copy-paste-missed-a-change mishap in commit 88cec1152ab5576db19ba…
Oct 9, 2023
a864f21
Semi-paranoid placement new (based on https://github.com/pybind/pybin…
Oct 9, 2023
6689b06
Move PYBIND11_CONSTINIT to detail/common.h
Oct 9, 2023
d965f29
Move code to the right places, rename new class and some variables.
Oct 9, 2023
398a42c
Fix oversight: update tests/extra_python_package/test_files.py
Oct 9, 2023
ab2cf8e
Get the name right first.
Oct 9, 2023
6d5bdd8
Use `std::call_once`, `std::atomic`, following a pattern developed by…
Oct 9, 2023
4c5dd1b
Make the API more self-documenting (and possibly more easily reusable).
Oct 9, 2023
8633c5b
google-clang-tidy IWYU fixes
Oct 9, 2023
82f3efc
Rewrite comment as suggested by @tkoeppe
Oct 10, 2023
3ebd139
Update test_exceptions.cpp and exceptions.rst
Oct 10, 2023
c33712d
Fix oversight in previous commit: add `PYBIND11_CONSTINIT`
Oct 10, 2023
dcf2b92
Make `get_stored()` non-const for simplicity.
Oct 10, 2023
4557dce
Add comment regarding `KeyboardInterrupt` behavior, based heavily on …
Oct 10, 2023
704fe13
Add `assert(PyGILState_Check())` in `gil_scoped_release` ctor (simple…
Oct 10, 2023
b2f87a8
Fix oversight in previous commit (missing include cassert).
Oct 10, 2023
fad1017
Remove use of std::atomic, leaving comments with rationale, why it is…
Oct 10, 2023
8453302
Rewrite comment re `std:optional` based on deeper reflection (aka 2nd…
Oct 10, 2023
66bbc67
Additional comment with the conclusion of a discussion under PR #4877.
Oct 11, 2023
7cd9390
Small comment changes suggested by @tkoeppe.
Oct 11, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 33 additions & 6 deletions include/pybind11/numpy.h
Original file line number Diff line number Diff line change
Expand Up @@ -14,12 +14,14 @@

#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <numeric>
#include <sstream>
#include <stdalign.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this, this is a C interop header (and we're not writing C).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 109a165

#include <string>
#include <type_traits>
#include <typeindex>
Expand All @@ -42,6 +44,30 @@ class array; // Forward declaration

PYBIND11_NAMESPACE_BEGIN(detail)

// Main author of this class: jbms@
template <typename T>
class LazyInitializeAtLeastOnceDestroyNever {
public:
template <typename Initialize>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document that you expect the GIL to be held, or in any case that calls are serialized. (Otherwise the mutating accesses would cause races.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 1ce2715

T &Get(Initialize &&initialize) {
if (!initialized_) {
assert(PyGILState_Check());
// Multiple threads may run this concurrently, but that is fine.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is a bit misleading. Multiple threads may call Get, but multiple threads definitely mustn't get to this point! E.g. initialized_ is not an atomic variable and it'd be a race to access it concurrently.

What is true is that multiple threads may possibly reach the inside of initialize, but that's a separate matter.

Copy link
Collaborator Author

@rwgk rwgk Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm uncertain I understand the two points correctly.

My understanding:

From getting burned in the past, I've developed the strong belief that multiple threads do in fact get EDITconcurrently(see below) into the if (! initialized_) branch.

If the first thread calls back into the Python C API (e.g. for a Python import), Python can and does in the general case release the GIL. That gives other threads the opportunity to also pass the if (! initialized_) test, so the Python initialize() really can run multiple times.

But after the initialize() call we are sure to have the GIL again, therefore there is no race flipping initialized_ from false to true. One lucky thread gets there first. (All other threads that made it to the initialize() call will write the true again, but each under the GIL.) Only then will other threads no longer get into the if (! initialized_) branch.

Would it help to change the comment to like this?

// Multiple threads may reach `initialize()` EDIT~concurrently~(see below), but each holding the GIL at that time.

Fundamentally, the "at least once" aspect here isn't great. "Guaranteed once" would be better.

My thinking:

  • Let's get the "at least once" solution stable. It almost seems to be there, and it solves the original deadlock problem.

  • Then let's look at it again (this PR maybe) to see what it takes to achieve a "guaranteed once" implementation (right here in pybind11, within the limitations of C++11).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think when I said "multiple threads can call Get" that's not entirely right. By assuming that Get is only called with the GIL held, we effectively say that only one thread can call Get at a time. What I meant, but didn't say right, was that "mulitple threads can attept to lock the GIL and call Get".

From getting burned in the past, I've developed the strong belief that multiple threads do in fact get concurrently into the if (! initialized_) branch.

I'm not sure how this would happen, and if it did, it'd definitely be a bug, since that's a race.

If the first thread calls back into the Python C API (e.g. for a Python import), Python can and does in the general case release the GIL. That gives other threads the opportunity to also pass the if (! initialized_) test, so the Python initialize() really can run multiple times.

OK, but that is not "concurrently". Only the thread holding the GIL can be accessing initialized_. In order for one thread to release the GIL it must already have completed this access. (And here "completed" can be made precise in the sense of the memory model.)

I don't think "initialize() really can run multiple times" is a precise enough statement to allow for a meaningful interpretation. It is certainly true that multiple threads can exectue inside initialize, but it is (or definitely should) not be true that multiple threads reach the expression initialize() on L56 "concurrently", in the sense of "without sequencing". If one thread is inside initialize and another thread reaches L56, then the former thread must be blocked.

Copy link
Contributor

@tkoeppe tkoeppe Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can say something more concrete and specific to this situation:

// It is possible that multiple threads execute `Get` with `initialized_` still being
// false, and thus proceed to execute `initialize()`. For this to happen, `initialize`
// has to release and reacquire the GIL internally. We accept this, and expect the
// operation to be both idempotent and cheap.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I appear to be mixing up terms, which is critical here.

What is the correct term? (I want to edit-correct my comment.)

E.g.

I've developed the strong belief that multiple threads do in fact get concurrently CORRECT_TERM_HERE into the if (! initialized_) branch.

I want to say what you are saying.

Although all I originally wanted: leave a meaningful, terse, and correct comment in the code, just enough to give readers a good clue to follow up on in case they want to fully understand.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I'm seeing your suggested comment only now. Will adopt.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 1ce2715

auto value = initialize(); // May release and re-acquire the GIL.
if (!initialized_) { // This runs with the GIL held,
new // therefore this is reached only once.
(reinterpret_cast<T *>(value_storage_)) T(std::move(value));
initialized_ = true;
}
}
return *reinterpret_cast<T *>(value_storage_);
}

private:
alignas(T) char value_storage_[sizeof(T)];
bool initialized_ = false;
};

template <>
struct handle_type_name<array> {
static constexpr auto name = const_name("numpy.ndarray");
Expand Down Expand Up @@ -206,8 +232,8 @@ struct npy_api {
};

static npy_api &get() {
static npy_api api = lookup();
return api;
static LazyInitializeAtLeastOnceDestroyNever<npy_api> api_init;
return api_init.Get(lookup);
}

bool PyArray_Check_(PyObject *obj) const {
Expand Down Expand Up @@ -643,10 +669,11 @@ class dtype : public object {
char flags() const { return detail::array_descriptor_proxy(m_ptr)->flags; }

private:
static object _dtype_from_pep3118() {
module_ m = detail::import_numpy_core_submodule("_internal");
static PyObject *obj = m.attr("_dtype_from_pep3118").cast<object>().release().ptr();
return reinterpret_borrow<object>(obj);
static object &_dtype_from_pep3118() {
static detail::LazyInitializeAtLeastOnceDestroyNever<object> imported_obj;
return imported_obj.Get([]() {
return detail::import_numpy_core_submodule("_internal").attr("_dtype_from_pep3118");
});
}

dtype strip_padding(ssize_t itemsize) {
Expand Down