Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gc.get_objects can corrupt in-progress GC in free threading build #125859

Open
colesbury opened this issue Oct 22, 2024 · 2 comments
Open

gc.get_objects can corrupt in-progress GC in free threading build #125859

colesbury opened this issue Oct 22, 2024 · 2 comments
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@colesbury
Copy link
Contributor

colesbury commented Oct 22, 2024

Bug report

Background

The free threading GC uses two queue-like data structures to keep track of objects:

  • struct worklist, which is a singly linked list that repurposes ob_tid for the linked list pointer

  • _PyObjectStack, which is effectively a dynamically sized array of PyObject*. (Implemented using a linked list of fixed size array buffers).

The struct worklist data structure is convenient because enqueueing objects doesn't require a memory allocation and so can't fail. However, an object can only be part of one "worklist" at a time, because each object has only one ob_tid field.

Bug

Other threads can run while the GC is finalizing cyclic garbage and while it's calling tp_clear() and other clean-up.

During that time, some thread may call gc.get_objects(), which can return otherwise "unreachable" objects. The implementation of _PyGC_GetObjects temporarily pushes objects to a struct worklist, including objects that might already be part of some other worklist, overwriting the linked list pointer. This essentially corrupts the state of the in-progress GC and causes assertion failures.

Proposed fix

  • We should probably exclude objects in the "unreachable" (i.e. _PyGC_BITS_UNREACHABLE) from being returned by gc.get_objects()
  • We should limit the use of struct worklist to the actual GC and use _PyObjectStack (or some other data structure) in _PyGC_GetObjects(). This reduces the risk of bugs causing an object to be added to more than one worklist.

Linked PRs

@colesbury colesbury added type-bug An unexpected behavior, bug, or error 3.13 bugs and security fixes topic-free-threading 3.14 new features, bugs and security fixes labels Oct 22, 2024
@rruuaanng
Copy link
Contributor

Hi @colesbury, if there is no objection, I would like to take over this task. I will submit a PR in the next few days. I will take plan We should probably exclude objects in the "unreachable" (i.e. _PyGC_BITS_UNREACHABLE) from being returned by gc.get_objects().
I think using plan
We should limit the use of struct worklist to the actual GC and use _PyObjectStack (or some other data structure) in _PyGC_GetObjects(). This reduces the risk of bugs causing an object to be added to more than one worklist.
may cause the memory usage to become higher. So I choose a simple conditional return

@colesbury
Copy link
Contributor Author

@rruuaanng - thank you for the offer, but I've already started working on this

colesbury added a commit to colesbury/cpython that referenced this issue Oct 23, 2024
This fixes a crash when `gc.get_objects()` or `gc.get_referrers()` is
called during a GC in the free threading build. Switch to
`_PyObjectStack` to avoid corrupting the `struct worklist` linked list
maintained by the GC. Also, don't return objects that are frozen
(gc.freeze) or in the process of being collected to more closely match
the behavior of the default build.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes topic-free-threading type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

2 participants