-
-
Notifications
You must be signed in to change notification settings - Fork 30.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-115999: Specialize LOAD_ATTR
for instance and class receivers in free-threaded builds
#128164
base: main
Are you sure you want to change the base?
Conversation
Look up a unicode key in an all unicode keys object along with the keys version, assigning one if not present. We need a keys version that is consistent with presence of the key for use in the guards.
Reading the shared keys version and looking up the key need to be performed atomically. Otherwise, a key inserted after the lookup could race with reading the version, causing us to incorrectly specialize that no key shadows the descriptor.
Everything starts out disabled
* Check that the type hasn't changed across lookups * Descr is now an owned ref
…_INST_ATTR_FROM_DICT
- Use atomic load for value - Use _Py_TryIncrefCompareStackRef for incref
- Use atomics and _Py_TryIncrefCompareStackRef in _LOAD_ATTR_SLOT - Pass type version during specialization
- Check that fget is deferred - Pass tp_version
Macros should be treated as terminators when searching for the assignment target of an expression involving PyStackRef_FromPyObjectNew
All instance loads are complete!
This matches the previous implementation and causes failures to specialize due to the presence of both __getattr__ and __getattribute__ to be classified correctly (rather than being classified as being out of type versions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks great!
I think I unintentionally changed the object_cfunction
when cleaning it up for inclusion in CPython. It previously had a * 1.0
to coerce the value to a float and avoid the _PyObject_LookupSpecial
.
(We should deal with the _PyObject_LookupSpecial
bottlenecks, but it's a lower priority)
Python/bytecodes.c
Outdated
@@ -2222,16 +2227,36 @@ dummy_func( | |||
PyObject *attr_o; | |||
|
|||
PyDictObject *dict = _PyObject_GetManagedDict(owner_o); | |||
DEOPT_IF(hint >= (size_t)dict->ma_keys->dk_nentries); | |||
DEOPT_IF(!LOCK_OBJECT(dict)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not thrilled with the lock here -- we've tried to avoid locking in the fast path dictionary accesses -- but given the overall perf gains, we can come back to this later.
The JIT test failure is unrelated to this PR: here is an example of it failing in the same way on a different PR. The failing test, The test: cpython/Lib/test/_test_multiprocessing.py Lines 1525 to 1530 in 30efede
The function, cpython/Lib/test/_test_multiprocessing.py Lines 1489 to 1497 in 30efede
This will fail if |
Lib/test/test_opcache.py
Outdated
a = object() | ||
# a must be set to an instance that uses deferred reference | ||
# counting in free-threaded builds | ||
a = type("Foo", (object,), {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a bit mysterious to me. Does this do something different than what class a: [...]
does? Calling it an instance (which techically true) is a bit confusing too. I would just say a "set to an object that uses ...". It's a class object. If you need it written like this, I think a little helper function with a docstring explaining why it must be done that way would help:
def make_deferred_ref_count_obj():
return type("Foo", (object,), {})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see later there is item.a = type("Foo", (object,), {})
so you want an expression not a statement.
Regarding |
Yep: the stats were collected on the default build. I've updated the PR description to make that clear.
That's a good point. Let me collect stats for the free-threaded build and see what things look like. |
|
This PR finishes specialization for
LOAD_ATTR
in the free-threaded build by adding support for class and instance receivers.The bulk of it is dedicated to making the specialized instructions and the specialization logic thread-safe. This consists of using atomics / locks in the appropriate places, avoiding races in specialization related to reloading versions, and ensuring that the objects stored in inline caches remain valid when accessed (by only storing deferred objects). See the section on "Thread Safety" below for more details.
Additionally, making this work required a few unrelated changes to fix existing bugs or work around differences between the two builds that results from only storing deferred values (which causes in specialization failures in the free-threaded build when a value that would be stored in the cache is not deferred):
PyStackRef_FromPyObjectNew
.test_descr.MiscTests.test_type_lookup_mro_reference
to work when specialization fails (and also be a behavorial test).test_capi.test_type.TypeTests.test_freeze_meta
when running refleaks tests on free-threaded builds. Specialization failure triggers an existing bug.Single-threaded Performance
We're leaving a bit of perf on the table by only storing deferred objects: we can't specialize attribute lookups that resolve to class attributes (e.g. counters, settings). I haven't measured how much perf we're giving up, but I'd like to address that in a separate PR.
Scalability
The
object_cfunction
andpymethod
benchmarks are improved (1.4x slower -> 14.3x faster, 1.8x slower -> 13.0x faster, respectively). Other benchmarks appear unchanged.I would expect that
cmodule_function
would also improve, but it looks like the benchmark is bottlenecked on increfing theint.__floor__
method that is returned from the call to_PyObject_LookupSpecial
inmath_floor
(the incref happens in_PyType_LookupRef
, which is called byPyObject_LookupSpecial
):cpython/Modules/mathmodule.c
Line 1178 in 3879ca0
Raw numbers:
Thread Safety
Thread safety of specialized instructions is addressed in a few different ways:
_LOAD_ATTR_WITH_HINT
.Thread safety of specialization is addressed using similar techniques:
Stats
Following the instructions in the comment preceding
specialize_attr_loadclassattr
, I collected stats for the default build for both this PR and its merge base using./python -m test_typing test_re test_dis test_zlib
and compared them usingTools/scripts/summarize_stats.py
. The results forLOAD_ATTR
are nearly identical and are consistent with results from comparing the merge base against itself:--disable-gil
builds #115999