GH-123516: Improve JIT memory consumption by invalidating cold executors #124443
This is a nice change, thanks!
@@ -1289,6 +1289,11 @@ _Py_HandlePending(PyThreadState *tstate)
        _Py_RunGC(tstate);
    }

    if((breaker & _PY_EVAL_JIT_INVALIDATE_COLD_BIT) != 0) {
        _Py_unset_eval_breaker_bit(tstate, _PY_EVAL_JIT_INVALIDATE_COLD_BIT);
        _Py_Executors_InvalidateCold(tstate->interp);
Just a thought I had while reading through... I don't think any of the stuff that manipulates the linked list of executors is thread-safe currently. Probably not a problem for this PR, but in the future we'll probably want to go through and add _PyEval_StopTheWorld and _PyEval_StartTheWorld calls in optimize.c.
This PR succeeds #123402 and reworks the approach: the invalidation call is now triggered through the eval breaker instead of during executor creation or GC (thanks @markshannon!). I experimented with three thresholds: 10k, 100k, and 1 million runs. The benchmarks for 100k and 1 million were the most promising. Here are some relevant stats for quick reference:
100k
- -2.4% memory
- Roughly the same performance-wise
1 million
After chatting with @brandtbucher, I've opted to open this PR with the 100k threshold. One thing to note is that we are potentially a little too liberal in invalidating executors with this threshold, but with the lack of movement in performance and a more substantial decrease in memory usage, it seemed justified. We can continue to iterate here and consider making this tunable in the future.
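For readers skimming the thread, here is a minimal, self-contained sketch of the general idea: a run counter trips a pending-work bit, and the pending handler sweeps the executor list, dropping anything that has not run since the previous sweep. This is a toy model, not the code in this PR. Every `toy_*` name, the `warm` flag, and the countdown field are assumptions made up for illustration, and the real data structures in optimize.c may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an eval breaker bit (not CPython's definition). */
#define TOY_INVALIDATE_COLD_BIT (1u << 0)

typedef struct toy_executor {
    struct toy_executor *next;
    bool warm;   /* assumed flag: set whenever the executor runs */
    bool valid;  /* false once the executor has been invalidated */
} toy_executor;

typedef struct {
    toy_executor *head;         /* singly linked list of live executors */
    unsigned breaker;           /* pending-work bits */
    unsigned runs_until_sweep;  /* countdown to the next cold sweep */
    unsigned threshold;         /* e.g. 100000 for the 100k setting */
} toy_interp;

/* Record one executor run; request a sweep once the countdown hits zero. */
void toy_run(toy_interp *interp, toy_executor *exec)
{
    assert(exec->valid);
    exec->warm = true;
    if (--interp->runs_until_sweep == 0) {
        interp->breaker |= TOY_INVALIDATE_COLD_BIT;
    }
}

/* Unlink and invalidate executors that have not run since the last sweep;
   clear the warm flag on the survivors so they must run again to stay. */
size_t toy_invalidate_cold(toy_interp *interp)
{
    size_t dropped = 0;
    toy_executor **link = &interp->head;
    while (*link != NULL) {
        toy_executor *exec = *link;
        if (!exec->warm) {
            *link = exec->next;
            exec->next = NULL;
            exec->valid = false;
            dropped++;
        }
        else {
            exec->warm = false;
            link = &exec->next;
        }
    }
    interp->runs_until_sweep = interp->threshold;
    return dropped;
}

/* Mirrors the shape of the hunk above: test the bit, clear it, then sweep. */
size_t toy_handle_pending(toy_interp *interp)
{
    size_t dropped = 0;
    if ((interp->breaker & TOY_INVALIDATE_COLD_BIT) != 0) {
        interp->breaker &= ~TOY_INVALIDATE_COLD_BIT;
        dropped = toy_invalidate_cold(interp);
    }
    return dropped;
}
```

In this model a lower threshold means more frequent sweeps, so an executor that runs only occasionally is more likely to be dropped between runs. That is the "too liberal" trade-off mentioned above. The sketch is single-threaded and does nothing about the thread-safety concern raised in review.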