-
Notifications
You must be signed in to change notification settings - Fork 408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(core): add level of indirection for provider.py contextvars #10525
base: main
Are you sure you want to change the base?
Conversation
|
interesting and promising |
BenchmarksBenchmark execution time: 2024-09-19 16:40:04 Comparing candidate commit 746d6ce in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 353 metrics, 47 unstable metrics. |
Could you run |
Datadog ReportBranch report: ✅ 0 Failed, 1113 Passed, 173 Skipped, 29m 51.39s Total duration (7m 10.65s time saved) |
…g/dd-trace-py into sanchda/make_contextvars_indirect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to help my understanding, would you be able to update the PR description with a small snippet example of a reassoc event that would trigger a segfault/corruption? Thanks!
Unfortunately, any |
I agree with the code, but I still can not understand why I get a different type of context within greenlets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR makes sense to me but I must admit I'm not nuanced in contextvars internals enough to give a more thoughtful review here. Is this not an issue with CPython's implementation of ContextVars, and if so, should it be fixed upstream?
I agree there could be an effort to understand what objects have lifetime issues. Though we still have to mitigate in the short term. |
I dropped this PR for a while because we shipped a patch to some affected customers and it didn't directly fix the issue we saw. However, it did reveal more information about where bad objects were being consumed. I count that as a net win. So, I'd like to summarize the benefits of this PR as the following
I don't know whether I'd call it an issue in the implementation or the design. There are a few problems here.
|
Whenever a contextvar is reassociated, it causes the underlying HAMT data structure to clone a node. This clone operation requires de-referencing stored Python objects, which can cause segmentation faults if other libraries mis-manage the reference counts for their objects, causing them to be GC'd.
This patch stores a single wrapper object into the contextvar, then manipulates a reference within that wrapper in order to propagate our desired information. In our normal testing fixture, we cause as many as 69 realloc (and clone) events in a single process (I deduce this by patching cpython itself to produce a log). With this patch, that number is down to 1, and it doesn't originate from this provider.py
I have a standalone reproduction for the noted behavior here. The repro isn't very clever about how it manages the lifetimes of GC'd objects--the issues we see in the wild are a little bit more subtle, since they don't segfault during normal scope cleanup (unlike mine).
Checklist
Reviewer Checklist