Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release GIL as much as possible in PySeries and PyDataFrame methods #19382

Open
itamarst opened this issue Oct 22, 2024 · 4 comments
Open

Release GIL as much as possible in PySeries and PyDataFrame methods #19382

itamarst opened this issue Oct 22, 2024 · 4 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@itamarst
Copy link
Contributor

Description

Motivation:

  1. Holding the GIL can cause deadlocks when parallelization happens: the method awaits Rayon, threads in Rayon try to acquire GIL, deadlock. See Regression in 1.10.0 : Selection of rows hangs indefinitely for a frame contaning a pl.Object series #19358 for an example, but I have seen other cases where this happens.
  2. Releasing the GIL enables more parallelism with other Python extensions.
@itamarst itamarst added the enhancement New feature or an improvement of an existing feature label Oct 22, 2024
@itamarst
Copy link
Contributor Author

itamarst commented Oct 22, 2024

The ideal would be to have the code calling into Rayon for the Polars release the GIL, so that it would only need to happen once in the code.

@itamarst
Copy link
Contributor Author

This is plausibly doable by having conditional creation of POOL in polars-core to be either rayon::ThreadPool by default, or a wrapper that releases the GIL if the python feature is enabled. So going to try taht.

@ritchie46
Copy link
Member

No, we should just release the GIL on the edge, not deep in Polars on every threadpool access. We should just go to the chore on time and then we're done.

@itamarst
Copy link
Contributor Author

itamarst commented Oct 23, 2024

I am happy to do it the manual widespread small tweaks way, it's just that it feels like a systemic invariant, and putting the burden of a systemic invariant on every single caller (and there are many!) is a recipe for mistakes. Like, someone adds 3 methods in 4 months, and now there's a new place with the problem. Whereas doing it in one place where it's always done by default gives you a systemic solution to a systemic invariant.

The performance cost would be a if unsafe { PyGILState_Check() } == 1 on every POOL.install(), or something similar.

But again, will do it manual update-all-the-functions way if you prefer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature
Projects
None yet
Development

No branches or pull requests

2 participants