Make the specializing interpreter thread-safe in `--disable-gil` builds #115999

swtaarrs · 2024-02-27T19:08:29Z

Feature or enhancement

Proposal:

In free-threaded builds, the specializing adaptive interpreter needs to be made thread-safe. We should start with a small PR to simply disable it in free-threaded builds, which will be correct but will incur a performance penalty. Then we can work out how to properly support specialization in a free-threaded build.
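A minimal sketch of that first "disable it" step, assuming a compile-time switch keyed on `Py_GIL_DISABLED`, the macro defined in free-threaded builds. The `ENABLE_SPECIALIZATION` name mirrors a flag in CPython's internal headers. `AdaptiveInstr`, `maybe_specialize`, and the warm-up counter are hypothetical illustrations, not CPython's adaptive-counter logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Free-threaded builds define Py_GIL_DISABLED: turn the specializer off there. */
#ifdef Py_GIL_DISABLED
#define ENABLE_SPECIALIZATION 0
#else
#define ENABLE_SPECIALIZATION 1
#endif

/* Hypothetical per-instruction state: specialize once the counter hits zero. */
typedef struct {
    int counter;
    bool specialized;
} AdaptiveInstr;

void maybe_specialize(AdaptiveInstr *instr)
{
#if ENABLE_SPECIALIZATION
    if (instr->specialized) {
        return;
    }
    if (instr->counter > 0) {
        instr->counter--;          /* still warming up */
        return;
    }
    instr->specialized = true;     /* a real interpreter rewrites bytecode here */
#else
    (void)instr;                   /* correct but slower: stay generic forever */
#endif
}

bool demo_disable(void)
{
    AdaptiveInstr instr = {.counter = 2, .specialized = false};
    for (int i = 0; i < 5; i++) {
        maybe_specialize(&instr);
    }
    return instr.specialized;
}
```

Compiled with `-DPy_GIL_DISABLED`, `demo_disable()` returns false: every instruction keeps executing its generic form.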

These two commits from Sam's nogil-3.12 branch can serve as inspiration:

There are two primary concerns to balance while implementing this functionality on main:

Runtime overhead: There should be no performance impact on normal builds, and minimal performance impact on single-threaded code running in free-threaded builds.
Reducing code duplication/divergence: We should come up with a design that is minimally disruptive to ongoing work on the specializing interpreter. It should be easy for other devs to keep the free-threaded build working without having to know too much about it.

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

Specialization Families

Linked PRs


brandtbucher · 2024-02-27T19:36:18Z

(subscribing myself)


swtaarrs · 2024-03-01T17:12:51Z

This is now a performance (rather than correctness) issue for free-threaded builds, so I'm going to focus on more time-sensitive issues for a while.


corona10 · 2024-04-20T06:30:07Z

@swtaarrs Out of curiosity, is there any progress or plan for this issue?

Fidget-Spinner · 2024-04-24T17:22:28Z

@corona10 I'm planning to work on this after I get the deferred reference stack in. However, there are no concrete plans as of now. I'm really happy for you or anyone else to propose a design for the specializing interpreter with free-threaded safety!

corona10 · 2024-04-25T01:23:43Z

@Fidget-Spinner cc @swtaarrs
Nice. I was also thinking about how to make it thread-safe in a seamless way since I agree with @swtaarrs.
But I don't have a good idea for solving it yet, since I'm not working on this task full-time :)
So I'd be happy to see a good plan from you.
(I'm curious whether we could make this a per-thread mechanism...)

By the way, in the short term, could we enable the specializer only for the main thread if we cannot solve the issue before 3.13 is released?
That would make it easy to track the performance difference from the default build, because most pyperformance benchmarks are single-threaded :)

Fidget-Spinner · 2024-04-26T13:09:59Z

@corona10 for 3.13, I think we're generally focusing on multicore scalability rather than single-threaded performance. It's a bit too close to feature freeze for me to feel safe re-enabling specialization at this point. There are still a lot of unsolved problems, even with specialization only on the main thread. Consider the following:

Two threads, A and B, share the same code object. A is the main thread.
Thread B is in LOAD_ATTR_METHOD_WITH_VALUES's action (past the guards, in the middle of loading the cached method).
Thread A is in LOAD_ATTR_METHOD_WITH_VALUES's guard, but then deopts, meaning the cached method reference is now most likely dead/invalid.
Thread B loads LOAD_ATTR_METHOD_WITH_VALUES's cached method; it is now holding a dangling pointer.
Thread B pushes the dangling pointer onto the stack. Everything crashes.

I'm reading a few papers to get some inspiration and also looking at how CRuby and other runtimes deal with this. Will post back when I have an actual plan.


…ython#124997) Stop the world when invalidating function versions

The tier1 interpreter specializes `CALL` instructions based on the values of certain function attributes (e.g. `__code__`, `__defaults__`). The tier1 interpreter uses function versions to verify that the attributes of a function during execution of a specialization match those seen during specialization. A function's version is initialized in `MAKE_FUNCTION` and is invalidated when any of the critical function attributes are changed.

The tier1 interpreter stores the function version in the inline cache during specialization. A guard is used by the specialized instruction to verify that the version of the function on the operand stack matches the cached version (and therefore has all of the expected attributes). It is assumed that once the guard passes, all attributes will remain unchanged while executing the rest of the specialized instruction.

Stopping the world when invalidating function versions ensures that all critical function attributes will remain unchanged after the function version guard passes in free-threaded builds. It's important to note that this is only true if the remainder of the specialized instruction does not enter and exit a stop-the-world point.

We will stop the world the first time any of the following function attributes are mutated:

- defaults
- vectorcall
- kwdefaults
- closure
- code

This should happen rarely and only happens once per function, so the performance impact on the majority of code should be minimal.

Additionally, refactor the API for manipulating function versions to more clearly match the stated semantics.
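A simplified model of the function-version scheme described above, assuming a single critical attribute and stubbed `stop_the_world()`/`start_the_world()` functions (hypothetical names; a real runtime would actually pause every other thread). `FuncObject`, `func_set_code`, and `call_guard` are illustrative, not CPython APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: a real runtime would pause and resume all other threads. */
int stw_count = 0;
static void stop_the_world(void) { stw_count++; }
static void start_the_world(void) { }

typedef struct {
    uint32_t version;   /* 0 = invalidated; specialized CALLs will deopt */
    const void *code;   /* stands in for __code__, __defaults__, __closure__, ... */
} FuncObject;

/* Not thread-safe; a real allocator would use an atomic counter. */
static uint32_t next_version = 1;

void func_init(FuncObject *f, const void *code)
{
    f->code = code;
    f->version = next_version++;   /* assigned when the function is created */
}

void func_set_code(FuncObject *f, const void *code)
{
    if (f->version != 0) {
        /* First mutation of a critical attribute: no thread can be between a
           passed version guard and the rest of a specialized CALL right now. */
        stop_the_world();
        f->version = 0;
        f->code = code;
        start_the_world();
    } else {
        f->code = code;            /* already invalid: no guard can pass */
    }
}

/* Guard of a specialized CALL: compare with the version cached at specialization. */
bool call_guard(const FuncObject *f, uint32_t cached_version)
{
    return f->version != 0 && f->version == cached_version;
}

int demo_versions(void)
{
    static const int code_a = 1, code_b = 2;
    FuncObject f;
    func_init(&f, &code_a);
    uint32_t cached = f.version;          /* stored in the inline cache */
    int result = call_guard(&f, cached);  /* 1: guard passes */
    func_set_code(&f, &code_b);           /* stops the world once */
    func_set_code(&f, &code_a);           /* version already 0: no pause */
    return result * 10 + call_guard(&f, cached);  /* 10: guard now fails */
}
```

In this model only the first mutation pays for the pause; once the version is 0, no specialized `CALL` can pass the guard, so later writes do not need one.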

…ation for `BINARY_OP` (python#123926) Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Threads reserve a globally unique index identifying their copy of the bytecode in all co_tlbc arrays at thread creation and release the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads. Thread-local bytecode can be disabled at runtime by providing either `-X tlbc=0` or `PYTHON_TLBC=0`. Disabling thread-local bytecode also disables specialization. Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
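A single-threaded model of the `co_tlbc` layout described above. `CodeObject`, `reserve_index`, and `get_tlbc` are hypothetical names. The sketch omits the synchronization a real implementation needs when publishing a new copy or growing the array, and it copies the main copy verbatim, whereas a real implementation would likely start each copy from unspecialized instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_THREADS 8
#define MAX_CODE 16

typedef uint16_t CodeUnit;

typedef struct {
    size_t ncode;
    CodeUnit *tlbc[MAX_THREADS];   /* co_tlbc stand-in: one slot per thread index */
    CodeUnit main_code[MAX_CODE];  /* the "main" copy stored with the code object */
} CodeObject;

/* Hypothetical process-wide index allocator (would need a lock in real code). */
static int index_in_use[MAX_THREADS];

/* At thread creation: take the lowest free index, so the first thread gets 0. */
int reserve_index(void)
{
    for (int i = 0; i < MAX_THREADS; i++) {
        if (!index_in_use[i]) {
            index_in_use[i] = 1;
            return i;
        }
    }
    return -1;
}

/* At thread destruction: make the index available again. */
void release_index(int idx) { index_in_use[idx] = 0; }

int code_init(CodeObject *co, const CodeUnit *src, size_t n)
{
    if (n > MAX_CODE) {
        return -1;
    }
    memset(co, 0, sizeof *co);
    co->ncode = n;
    memcpy(co->main_code, src, n * sizeof *src);
    co->tlbc[0] = co->main_code;   /* entry 0 always points at the main copy */
    return 0;
}

/* On RESUME: return this thread's copy, creating it on first use. */
CodeUnit *get_tlbc(CodeObject *co, int idx)
{
    if (co->tlbc[idx] == NULL) {
        CodeUnit *copy = malloc(co->ncode * sizeof *copy);
        if (copy == NULL) {
            return NULL;
        }
        memcpy(copy, co->main_code, co->ncode * sizeof *copy);
        co->tlbc[idx] = copy;
    }
    return co->tlbc[idx];
}

int demo_tlbc(void)
{
    static const CodeUnit src[4] = {1, 2, 3, 4};
    CodeObject co;
    if (code_init(&co, src, 4) != 0) {
        return 0;
    }
    int main_idx = reserve_index();    /* 0: no copy is ever made for it */
    int worker_idx = reserve_index();  /* 1: gets a private copy */
    CodeUnit *m = get_tlbc(&co, main_idx);
    CodeUnit *w = get_tlbc(&co, worker_idx);
    if (w == NULL) {
        return 0;
    }
    w[0] = 99;                         /* the worker specializes only its own copy */
    int ok = m == co.main_code && m[0] == 1 && w != m;
    free(co.tlbc[worker_idx]);
    release_index(worker_idx);
    release_index(main_idx);
    return ok;
}
```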

…bytecode change (python#126440) Fix the gdb pretty printer in the face of --enable-shared by delaying the attempt to load the _PyInterpreterFrame definition until after .so files are loaded.

…thongh-126450)

- The specialization logic determines the appropriate specialization using only the operand's type, which is safe to read non-atomically (changing it requires stopping the world). We are guaranteed that the type will not change in between when it is checked and when we specialize the bytecode because the types involved are immutable (you cannot assign to `__class__` for exact instances of `dict`, `set`, or `frozenset`). The bytecode is mutated atomically using helpers.
- The specialized instructions rely on the operand type not changing in between the `DEOPT_IF` checks and the calls to the appropriate type-specific helpers (e.g. `_PySet_Contains`). This is a correctness requirement in the default builds and there are no changes to the opcodes in the free-threaded builds that would invalidate this.
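A sketch of the type-only decision described above, using hypothetical type tags and opcode names for a containment test (`x in container`). It shows that both the specializer and the specialized instruction's guard read nothing but the operand's exact type, which cannot change for these built-in types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical exact-type tags; instances of these types cannot change __class__. */
typedef enum { TYPE_DICT, TYPE_SET, TYPE_FROZENSET, TYPE_LIST } TypeTag;

/* Hypothetical opcode names for a containment test. */
typedef enum { OP_CONTAINS, OP_CONTAINS_DICT, OP_CONTAINS_SET } Opcode;

typedef struct {
    TypeTag type;
} Object;

/* Specializer: the decision depends only on the operand's exact type. */
Opcode choose_specialization(const Object *container)
{
    switch (container->type) {
    case TYPE_DICT:
        return OP_CONTAINS_DICT;
    case TYPE_SET:
    case TYPE_FROZENSET:
        return OP_CONTAINS_SET;
    default:
        return OP_CONTAINS;   /* stay generic */
    }
}

/* Specialized instruction: a DEOPT_IF-style type check. Once it passes, the type
   cannot change before the type-specific helper runs. */
bool contains_set_guard(const Object *container)
{
    return container->type == TYPE_SET || container->type == TYPE_FROZENSET;
}

int demo_contains(void)
{
    Object d = {TYPE_DICT}, fs = {TYPE_FROZENSET}, l = {TYPE_LIST};
    return choose_specialization(&d) == OP_CONTAINS_DICT
        && choose_specialization(&fs) == OP_CONTAINS_SET
        && choose_specialization(&l) == OP_CONTAINS
        && contains_set_guard(&fs)
        && !contains_set_guard(&d);
}
```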

…ython#126414) Introduce helpers for (un)specializing instructions Consolidate the code to specialize/unspecialize instructions into two helper functions and use them in `_Py_Specialize_BinaryOp`. The resulting code is more concise and keeps all of the logic at the point where we decide to specialize/unspecialize an instruction.
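A sketch of what a `specialize()`/`unspecialize()` pair might look like once bytecode writes are atomic, combined with the thread-local bytecode commit's requirement that specialization not overwrite an instruction instrumented concurrently. The opcode values, the `Instr` layout, and the counter policy are hypothetical; the compare-and-swap on the code unit is the idea being illustrated.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opcode numbers used only by this sketch. */
enum { OP_BINARY = 10, OP_BINARY_ADD_INT = 11, OP_INSTRUMENTED = 200 };

#define COOLDOWN 4      /* hypothetical: executions before re-specializing */
#define MAX_BACKOFF 64  /* hypothetical backoff cap */

/* One code unit: opcode in the low byte, oparg in the high byte. */
typedef _Atomic uint16_t CodeUnit;

typedef struct {
    CodeUnit unit;      /* opcode + oparg, mutated atomically */
    uint16_t counter;   /* warm-up/backoff counter; a relaxed atomic in FT builds */
} Instr;

static uint16_t make_unit(uint8_t opcode, uint8_t oparg)
{
    return (uint16_t)(opcode | (oparg << 8));
}

static uint8_t unit_opcode(uint16_t unit) { return (uint8_t)(unit & 0xff); }

/* Swap the opcode only if the instruction still holds `expected`. If another
   thread instrumented it meanwhile, the check or the CAS fails and we back off. */
static bool set_opcode(CodeUnit *instr, uint8_t expected, uint8_t new_opcode)
{
    uint16_t old = atomic_load_explicit(instr, memory_order_relaxed);
    if (unit_opcode(old) != expected) {
        return false;
    }
    uint16_t desired = (uint16_t)((old & 0xff00) | new_opcode);
    return atomic_compare_exchange_strong(instr, &old, desired);
}

bool specialize(Instr *instr, uint8_t generic, uint8_t specialized)
{
    if (!set_opcode(&instr->unit, generic, specialized)) {
        return false;               /* lost the race with instrumentation */
    }
    instr->counter = COOLDOWN;
    return true;
}

void unspecialize(Instr *instr, uint8_t specialized, uint8_t generic)
{
    /* Only rewrites a still-specialized instruction; leaves instrumented ones alone. */
    (void)set_opcode(&instr->unit, specialized, generic);
    instr->counter = (uint16_t)(instr->counter < MAX_BACKOFF / 2
                                    ? instr->counter * 2 + 1 : MAX_BACKOFF);
}

int demo_helpers(void)
{
    Instr instr;
    atomic_init(&instr.unit, make_unit(OP_BINARY, 3));
    instr.counter = 0;

    int ok = specialize(&instr, OP_BINARY, OP_BINARY_ADD_INT);
    ok = ok && instr.counter == COOLDOWN;
    unspecialize(&instr, OP_BINARY_ADD_INT, OP_BINARY);
    ok = ok && unit_opcode(atomic_load(&instr.unit)) == OP_BINARY;
    ok = ok && instr.counter == 2 * COOLDOWN + 1;

    /* Instrumentation wins a race and rewrites the instruction first. */
    atomic_store(&instr.unit, make_unit(OP_INSTRUMENTED, 3));
    ok = ok && !specialize(&instr, OP_BINARY, OP_BINARY_ADD_INT);
    unspecialize(&instr, OP_BINARY_ADD_INT, OP_BINARY);
    ok = ok && unit_opcode(atomic_load(&instr.unit)) == OP_INSTRUMENTED;
    ok = ok && (atomic_load(&instr.unit) >> 8) == 3;   /* oparg intact */
    return ok;
}
```

Keeping every rewrite inside the two helpers means each specialization site gets the race handling for free instead of repeating it.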


* Enable specialization of CALL_KW
* Fix bug pushing frame in _PY_FRAME_KW

`_PY_FRAME_KW` pushes a pointer to the new frame onto the stack for consumption by the next uop. When pushing the frame fails, we do not want to push the result, `NULL`, to the stack because it is not a valid stackref. This works in the default build because `PyStackRef_NULL` and `NULL` are the same value, so the `PyStackRef_XCLOSE()` in the error handler ignores it. In the free-threaded build the values are not the same; `PyStackRef_XCLOSE()` will attempt to decref a null pointer.

…d builds (#127711) We use the same approach that was used for specialization of LOAD_GLOBAL in free-threaded builds:

- _CHECK_ATTR_MODULE is renamed to _CHECK_ATTR_MODULE_PUSH_KEYS; it pushes the keys object for the following _LOAD_ATTR_MODULE_FROM_KEYS (née _LOAD_ATTR_MODULE). This arrangement avoids having to recheck the keys version.
- _LOAD_ATTR_MODULE is renamed to _LOAD_ATTR_MODULE_FROM_KEYS; it loads the value from the keys object pushed by the preceding _CHECK_ATTR_MODULE_PUSH_KEYS at the cached index.
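A sketch of why pushing the keys object avoids a second version check: the guard loads the dict's keys pointer once, and the following uop indexes into that same pointer even if another thread swaps the dict's keys in between. The types and function names are hypothetical stand-ins for the renamed uops. The sketch sidesteps lifetime by keeping both keys objects alive on the stack; a real runtime must defer freeing the old keys until no thread can still be reading them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t version;
    const char *names[4];
    int values[4];            /* combined table: values live in the keys entries */
} Keys;

typedef struct {
    _Atomic(Keys *) keys;     /* may be replaced by another thread */
} ModuleDict;

/* Guard uop: load the keys pointer once, check its version, and hand that
   same pointer to the next uop (modeled as the return value). */
Keys *check_attr_module_push_keys(ModuleDict *d, uint32_t cached_version)
{
    Keys *k = atomic_load_explicit(&d->keys, memory_order_acquire);
    return k->version == cached_version ? k : NULL;   /* NULL means deopt */
}

/* Action uop: index into the keys it was given; d->keys is never reloaded,
   so there is nothing to re-validate. */
int load_attr_module_from_keys(const Keys *k, size_t cached_index)
{
    return k->values[cached_index];
}

int demo_keys(void)
{
    Keys old_keys = {.version = 7, .names = {"x"}, .values = {42}};
    Keys new_keys = {.version = 8, .names = {"x"}, .values = {43}};
    ModuleDict d;
    atomic_init(&d.keys, &old_keys);

    Keys *k = check_attr_module_push_keys(&d, 7);      /* guard passes */
    atomic_store(&d.keys, &new_keys);                  /* concurrent replacement */
    int v = load_attr_module_from_keys(k, 0);          /* consistent with the guard */
    int deopt = check_attr_module_push_keys(&d, 7) == NULL;  /* next run deopts */
    return v == 42 && deopt;
}
```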


…thongh-127838)

* Add `_PyDictKeys_StringLookupSplit`, which does locking on dict keys, and use it in place of `_PyDictKeys_StringLookup`.
* Change `_PyObject_TryGetInstanceAttribute` to use that function in the case of split keys.
* Add `unicodekeys_lookup_split` helper which allows code sharing between `_Py_dict_lookup` and `_PyDictKeys_StringLookupSplit`.
* Fix locking for `STORE_ATTR_INSTANCE_VALUE`. Create `_GUARD_TYPE_VERSION_AND_LOCK` uop so that object stays locked and `tp_version_tag` cannot change.
* Pass `tp_version_tag` to `specialize_dict_access()`, ensuring the version we store in the cache is the correct one (in case it changes during the specialization analysis).
* Split `analyze_descriptor` into `analyze_descriptor_load` and `analyze_descriptor_store` since those don't share much logic. Add `descriptor_is_class` helper function.
* In `specialize_dict_access`, double check `_PyObject_GetManagedDict()` in case we race and dict was materialized before the lock.
* Avoid borrowed references in `_Py_Specialize_StoreAttr()`.
* Use `specialize()` and `unspecialize()` helpers.
* Add unit tests to ensure specializing happens as expected in FT builds.
* Add unit tests to attempt to trigger data races (useful for running under TSAN).
* Add `has_split_table` function to `_testinternalcapi`.
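A sketch of the guard-and-lock pattern behind `_GUARD_TYPE_VERSION_AND_LOCK`, assuming (as the commit message states) that keeping the object locked is what prevents `tp_version_tag` from changing before the store completes; the writers that honor the lock are not modeled. The spinlock, `Type`, `Instance`, and function names are hypothetical. On success the guard returns with the lock held, and the store uop releases it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    _Atomic uint32_t version_tag;   /* tp_version_tag stand-in */
} Type;

typedef struct {
    atomic_flag lock;               /* per-object lock (a spinlock in this sketch) */
    Type *type;
    int values[4];                  /* inline attribute values stand-in */
} Instance;

static void obj_lock(Instance *o)
{
    while (atomic_flag_test_and_set_explicit(&o->lock, memory_order_acquire)) {
        /* spin */
    }
}

static void obj_unlock(Instance *o)
{
    atomic_flag_clear_explicit(&o->lock, memory_order_release);
}

/* On success, return with the lock still held so the store that follows sees
   the layout the guard checked. On failure, release the lock and deopt. */
bool guard_type_version_and_lock(Instance *o, uint32_t cached_version)
{
    obj_lock(o);
    if (atomic_load(&o->type->version_tag) != cached_version) {
        obj_unlock(o);
        return false;
    }
    return true;
}

/* Store uop: runs under the lock taken by the guard, then releases it. */
void store_attr_instance_value(Instance *o, int index, int value)
{
    o->values[index] = value;
    obj_unlock(o);
}

int demo_store_attr(void)
{
    Type t;
    atomic_init(&t.version_tag, 5);
    Instance o = {.type = &t};
    atomic_flag_clear(&o.lock);           /* start unlocked */

    int ok = 0;
    if (guard_type_version_and_lock(&o, 5)) {
        store_attr_instance_value(&o, 1, 99);
        ok = o.values[1] == 99;
    }
    atomic_store(&t.version_tag, 6);      /* the type was modified */
    ok = ok && !guard_type_version_and_lock(&o, 5);
    /* The failed guard released the lock, so it can be taken again. */
    ok = ok && guard_type_version_and_lock(&o, 6);
    obj_unlock(&o);
    return ok;
}
```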


swtaarrs added type-feature A feature request or enhancement topic-free-threading labels Feb 27, 2024

swtaarrs self-assigned this Feb 27, 2024

swtaarrs mentioned this issue Feb 27, 2024

PEP 703 -- Making the Global Interpreter Lock Optional in CPython #108219

Open

swtaarrs mentioned this issue Feb 27, 2024

gh-115999: Disable the specializing adaptive interpreter in free-threaded builds #116013

Merged

colesbury pushed a commit that referenced this issue Mar 1, 2024

gh-115999: Disable the specializing adaptive interpreter in free-thre…

339c8e1

…aded builds (#116013) For now, disable all specialization when the GIL might be disabled.

swtaarrs removed their assignment Mar 1, 2024

woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024

pythongh-115999: Disable the specializing adaptive interpreter in fre…

fad3958

…e-threaded builds (python#116013) For now, disable all specialization when the GIL might be disabled.

adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024

pythongh-115999: Disable the specializing adaptive interpreter in fre…

96d4fbd

…e-threaded builds (python#116013) For now, disable all specialization when the GIL might be disabled.

diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024

pythongh-115999: Disable the specializing adaptive interpreter in fre…

d31dae8

…e-threaded builds (python#116013) For now, disable all specialization when the GIL might be disabled.

mpage self-assigned this Aug 8, 2024

bedevere-app bot mentioned this issue Sep 10, 2024

gh-115999: Implement thread-local bytecode and enable specialization for BINARY_OP #123926

Merged

mpage added a commit to mpage/cpython that referenced this issue Sep 13, 2024

Merge branch 'main' into pythongh-115999-thread-local-bytecode

d34adeb

mpage added a commit to mpage/cpython that referenced this issue Sep 17, 2024

Merge branch 'main' into pythongh-115999-thread-local-bytecode

c2d8693

mpage added a commit to mpage/cpython that referenced this issue Sep 25, 2024

Merge branch 'main' into pythongh-115999-thread-local-bytecode

aa330b1

mpage added a commit to mpage/cpython that referenced this issue Sep 26, 2024

Merge branch 'main' into pythongh-115999-thread-local-bytecode

7dfd1ca

mpage added a commit to mpage/cpython that referenced this issue Sep 28, 2024

Merge branch 'main' into pythongh-115999-thread-local-bytecode

dd144d0

mpage added a commit to mpage/cpython that referenced this issue Sep 30, 2024

Merge branch 'main' into pythongh-115999-thread-local-bytecode

b6380de

This was referenced Oct 3, 2024

gh-115999: Refactor LOAD_GLOBAL specializations to avoid reloading {globals, builtins} keys #124953

Merged

gh-115999: Stop the world when invalidating function versions #124997

Merged

mpage added a commit to mpage/cpython that referenced this issue Oct 5, 2024

Merge branch 'main' into pythongh-115999-thread-local-bytecode

adb59ef

mpage added a commit to mpage/cpython that referenced this issue Oct 7, 2024

Merge branch 'main' into pythongh-115999-refactor-load-global

e68f7e0

This was referenced Dec 6, 2024

gh-115999: Specialize loading attributes from modules in free-threaded builds #127711

Merged

gh-115999: Specialize CALL_KW in free-threaded builds #127713

Merged

corona10 added a commit to corona10/cpython that referenced this issue Dec 8, 2024

pythongh-115999: Enable BINARY_SUBSCR_GETITEM for free-threaded build

bb23e4b

bedevere-app bot mentioned this issue Dec 8, 2024

gh-115999: Enable BINARY_SUBSCR_GETITEM for free-threaded build #127737

Merged

picnixz pushed a commit to picnixz/cpython that referenced this issue Dec 8, 2024

pythongh-115999: Move specializer test from test_dis to test_opcache (p…

393f5b5

…ythongh-126498)

mpage added a commit to mpage/cpython that referenced this issue Dec 9, 2024

Merge branch 'main' into pythongh-115999-load-attr-module

550f955

bedevere-app bot mentioned this issue Dec 11, 2024

gh-115999: Specialize STORE_ATTR in free-threaded builds. #127838

Merged

mpage added a commit to mpage/cpython that referenced this issue Dec 11, 2024

Merge branch 'main' into pythongh-115999-tlbc-call-kw

aef38b1

mpage added a commit to mpage/cpython that referenced this issue Dec 12, 2024

Merge branch 'main' into pythongh-115999-load-attr-module

cae5561

mpage added a commit to mpage/cpython that referenced this issue Dec 13, 2024

Merge branch 'main' into pythongh-115999-load-attr-module

648c0c8

mpage added a commit to mpage/cpython that referenced this issue Dec 14, 2024

Merge remote-tracking branch 'nascheme/pythongh-115999-specialize-sto…

0e2a208

…re-attr' into pythongh-115999-integrate-attr

mpage added a commit to mpage/cpython that referenced this issue Dec 14, 2024

Merge remote-tracking branch 'Yhg1s/compare_op' into pythongh-115999-…

b7e9a16

…integrate-attr

corona10 added a commit that referenced this issue Dec 19, 2024

gh-115999: Enable BINARY_SUBSCR_GETITEM for free-threaded build (gh-1…

48c70b8

…27737)

nascheme added a commit to nascheme/cpython that referenced this issue Dec 19, 2024

Merge branch 'main' into pythongh-115999-specialize-store-attr

14ae6b4

nascheme added a commit to nascheme/cpython that referenced this issue Dec 19, 2024

Merge branch 'main' into pythongh-115999-specialize-store-attr

06a7baf

mpage added a commit to mpage/cpython that referenced this issue Dec 20, 2024

Merge branch 'main' into pythongh-115999-load-attr

fa02260

bedevere-app bot mentioned this issue Dec 21, 2024

gh-115999: Specialize LOAD_ATTR for instance and class receivers in free-threaded builds #128164

Open

corona10 added a commit to corona10/cpython that referenced this issue Dec 22, 2024

pythongh-115999: Update test_opcace to test with nested method

91e350a

bedevere-app bot mentioned this issue Dec 22, 2024

gh-115999: Update test_opcache to test with nested method #128166

Merged

srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this issue Dec 23, 2024

pythongh-115999: Enable BINARY_SUBSCR_GETITEM for free-threaded build (…

c83b6bf

…pythongh-127737)

corona10 added a commit that referenced this issue Dec 23, 2024

gh-115999: Update test_opcache to test with nested method (gh-128166)

c5b0c90

gh-115999: Update test_opcace to test with nested method

mpage added a commit to mpage/cpython that referenced this issue Dec 23, 2024

Merge branch 'main' into pythongh-115999-load-attr-instance-merged

9755562
