Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pytests crash on windows free-threaded interpreter #4685

Closed
ngoldbaum opened this issue Nov 5, 2024 · 13 comments · Fixed by #4690
Closed

pytests crash on windows free-threaded interpreter #4685

ngoldbaum opened this issue Nov 5, 2024 · 13 comments · Fixed by #4690

Comments

@ngoldbaum
Copy link
Contributor

Bug Description

See this CI run on my fork of PyO3: https://github.com/ngoldbaum/pyo3/actions/runs/11694025826/job/32566839238

Relevant part of the CI output:

nox > pytest 
  Fatal Python error: Aborted
  Current thread 0x00001388 (most recent call first):
    File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
    File "<frozen importlib._bootstrap_external>", line 1316 in create_module
    File "<frozen importlib._bootstrap>", line 813 in module_from_spec
    File "<frozen importlib._bootstrap>", line 921 in _load_unlocked
    File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\pyo3_pytests\__init__.py", line 1 in <module>
    File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
    File "<frozen importlib._bootstrap_external>", line 1022 in exec_module
    File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
    File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
    File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
    File "<frozen importlib._bootstrap>", line 1310 in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
    File "D:\a\pyo3\pyo3\pytests\tests\test_awaitable.py", line 4 in <module>
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\assertion\rewrite.py", line 178 in exec_module
    File "<frozen importlib._bootstrap>", line 935 in _load_unlocked
    File "<frozen importlib._bootstrap>", line 1331 in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 1360 in _find_and_load
    File "<frozen importlib._bootstrap>", line 1387 in _gcd_import
    File "C:\hostedtoolcache\windows\Python\3.13.0\x64-freethreaded\Lib\importlib\__init__.py", line 88 in import_module
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\pathlib.py", line 566 in import_path
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\python.py", line 538 in importtestmodule
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\python.py", line 586 in _getobj
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\python.py", line 317 in obj
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\pytest_asyncio\plugin.py", line 644 in _patched_collect
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\runner.py", line 387 in collect
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\runner.py", line 342 in from_call
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\runner.py", line 389 in pytest_make_collect_report
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\pluggy\_callers.py", line 103 in _multicall
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\pluggy\_manager.py", line 120 in _hookexec
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\pluggy\_hooks.py", line 513 in __call__
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\runner.py", line 564 in collect_one_node
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\main.py", line 826 in _collect_one_node
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\main.py", line 942 in genitems
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\main.py", line 947 in genitems
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\main.py", line 947 in genitems
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\main.py", line 800 in perform_collect
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\main.py", line 337 in pytest_collection
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\pluggy\_callers.py", line 103 in _multicall
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\pluggy\_manager.py", line 120 in _hookexec
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\pluggy\_hooks.py", line 513 in __call__
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\main.py", line 326 in _main
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\main.py", line 273 in wrap_session
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\main.py", line 320 in pytest_cmdline_main
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\pluggy\_callers.py", line 103 in _multicall
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\pluggy\_manager.py", line 120 in _hookexec
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\pluggy\_hooks.py", line 513 in __call__
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\config\__init__.py", line 175 in main
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Lib\site-packages\_pytest\config\__init__.py", line 198 in console_main
    File "D:\a\pyo3\pyo3\pytests\.nox\test\Scripts\pytest.EXE\__main__.py", line 7 in <module>
    File "<frozen runpy>", line 88 in _run_code
    File "<frozen runpy>", line 198 in _run_module_as_main
  nox > Command pytest  failed with exit code 3
  nox > Session test failed.
  nox > Command nox -f pytests/noxfile.py failed with exit code 1
Error: `nox -f pytests/noxfile.py` failed

Steps to Reproduce

Install free-threaded python on Windows, build PyO3 against that Python and run nox -f test-py.

Backtrace

See details above.

Your operating system and version

Windows (see github actions details)

Your Python version (python --version)

3.13.0t

Your Rust version (rustc --version)

1.82.0

Your PyO3 version

main branch

How did you install python? Did you use a virtualenv?

python.org installer

Additional Info

Not sure what's happening but I wanted to note this in an issue so I can come back and debug it on my Windows development environment.

@ngoldbaum ngoldbaum added this to the 0.23 milestone Nov 5, 2024
ngoldbaum added a commit to ngoldbaum/pyo3 that referenced this issue Nov 5, 2024
github-merge-queue bot pushed a commit that referenced this issue Nov 6, 2024
* run free-threaded CI on MacOS and Windows

* build against 3.13 instead of 3.13-dev on build-full

* skip windows build, see #4685
@ngoldbaum
Copy link
Contributor Author

ngoldbaum commented Nov 6, 2024

I managed to trigger this locally and get it to happen with the visual studio debugger attached. Visual studio doesn't see debug symbols in the rust code, however, I can see that the crash is happening inside of PyInterpreterState_Get because somehow the thread calling that function isn't attached to the Python runtime, so the thread state is NULL.

Fatal Python error: PyInterpreterState_Get: the function must be called with the GIL held, after Python initialization and before Python finalization, but the GIL is released (the current Python thread state is NULL)
Python runtime state: unknown

It looks like there's only one place we call that function, here:

pyo3/src/impl_/pymodule.rs

Lines 109 to 110 in faa644a

let current_interpreter =
unsafe { ffi::PyInterpreterState_GetID(ffi::PyInterpreterState_Get()) };

So somehow we're inside of a function that accepts a py token, but we're not attached to the runtime.

@ngoldbaum
Copy link
Contributor Author

Here's the stack trace I see in the debugger

image

@ngoldbaum
Copy link
Contributor Author

I found PyO3/maturin#2213 (comment) which has a workaround for the debugging symbols. I've attached a screenshot of the stack trace (I don't see a way to copy/paste it, sorry!)

image

@ngoldbaum
Copy link
Contributor Author

Ultimately this is all happening inside of module import, so I think that means we have to have a valid thread state, and on 3.13 the GIL is enabled too. That said, the py token comes from calling GILGuard::assume so we never explicitly ensure the GIL is acquired.

I'm using the python.org CPython build and it has compiler optimizations turned on so it's not straightforward for me to inspect the values of variables in the interpreter in the debugger.

If there's some data corruption happening that messes with the Python thread state, I think I'd need to set a watchpoint somewhere in CPython internals, so I think my next step might be to compile a debug build of Python on my Windows machine and then try debugging with that.

ping @colesbury, you may be interseted in this. I haven't actually proven it but this feels a little like a Windows-specific CPython bug.

@colesbury
Copy link

That's very strange. It might be worth checking if this is an issue specific to the Windows installer and try the test with @astral-sh/setup-uv.

@ngoldbaum
Copy link
Contributor Author

So it turns out you can get the python.org installer to install debug binaries as well. Interestingly there is a valid reference to a thread state at the top of the call stack, but then at the bottom of the call stack it's NULL somehow. I tried setting a watchpoint on the value of the thread_local variable inside pystate.c storing the thread state, but I couldn't figure out how to create a watchpoint in the same way as I'm used to with gdb and lldb.

Unfortunately the uv Python doesn't work either, it dies trying to create a virtualenv:

(goldbaum) PS C:\Users\goldbaum\pyo3> python -m nox -s test-py
nox > Running session test-py
nox > nox -f pytests/noxfile.py
nox > Running session test
nox > Creating virtual environment (virtualenv) using python.exe in .nox\test
nox > Command C:\Users\goldbaum\.venv\Scripts\python.exe -m virtualenv 'C:\Users\goldbaum\pyo3\pytests\.nox\test' failed with exit code 1:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\goldbaum\\AppData\\Roaming\\uv\\python\\cpython-3.13.0+freethreaded-windows-x86_64-none\\pythonw.exe'
nox > Session test failed.
nox > Command nox -f pytests/noxfile.py failed with exit code 1
nox > Session test-py failed.

@colesbury
Copy link

If it's easier to set breakpoints on functions, you could try setting one on detach_thread (or _PyThreadState_Detach if detach_thread is not exposed).

@ngoldbaum
Copy link
Contributor Author

ngoldbaum commented Nov 7, 2024

If I set a breakpoint on the pyclass definition, _PyThreadState_Detach never gets called after we start initializing the pyclass.

I also made the following modification to the macro, so that we immediately call PyThreadState_Get() after the interpreter calls the handle for the module init function:

diff --git a/pyo3-macros-backend/src/module.rs b/pyo3-macros-backend/src/module.rs
index d9fac3cb..0c093455 100644
--- a/pyo3-macros-backend/src/module.rs
+++ b/pyo3-macros-backend/src/module.rs
@@ -469,6 +469,7 @@ fn module_initialization(
             #[doc(hidden)]
             #[export_name = #pyinit_symbol]
             pub unsafe extern "C" fn __pyo3_init() -> *mut #pyo3_path::ffi::PyObject {
+                let tstate = unsafe { #pyo3_path::ffi::PyInterpreterState_Get() };
                 unsafe { #pyo3_path::impl_::trampoline::module_init(|py| _PYO3_DEF.make_module(py, #gil_used)) }
             }
         });

But then somehow the thread state inside the interpreter is NULL:

image

@ngoldbaum
Copy link
Contributor Author

ngoldbaum commented Nov 7, 2024

Ah, I know why this is happening!

At first, only python3t_d.dll is loaded, but then later when we call the rust extension, python313_d.dll is loaded. So maybe maturin or PyO3 isn't finding the correct libpython dll to link against?

@ngoldbaum
Copy link
Contributor Author

ngoldbaum commented Nov 7, 2024

@davidhewitt helped me to find that we weren't accounting for the different library name on the free-threaded build inside of the pyo3 build configuration. In #4690 I attempted a fix, but I still see confusing behavior on Windows.

With that PR I get past the error about the invalid thread state and then it dies later when it tries to initialize a module because the module name is a NULL pointer. Similar to the behavior when we weren't getting the right libpython, now I'm seeing that the module object is valid and filled in on the PyO3 side of the FFI boundary but then inside of PyModule_Create2 in libpython the module object is filled in with garbage.

One thing I'm confused about - is it normal on windows for both python3t_d.dll and python313t_d.dll to be loaded? I'm seeing that Python starts up with the former loaded but then only later when we try to import the rust module the version-specific ABI gets loaded.

The call stack is here: https://gist.github.com/ngoldbaum/92e1dce5e5ddfef50d15a39d26a3f6ca

And here's a screenshot of the VS debugger showing the invalid module pointer and the libpython dlls that are loaded:
image

@ngoldbaum
Copy link
Contributor Author

Now I understand why it's crashing - somehow the PyObject layout on the Rust side of the FFI boundary isn't correct. When I look on the rust side, m_base doesn't have e.g. ob_mutex fields.

And I see why it's not working, for whatever reason when maturin builds the wheel, the build-config it's generating isn't correct:

PS C:\Users\goldbaum\pyo3\pytests> cat ..\debug-target\debug\build\pyo3-pytests-3f1d14e39bc363e6\output
cargo:rustc-check-cfg=cfg(Py_LIMITED_API)
cargo:rustc-check-cfg=cfg(Py_GIL_DISABLED)
cargo:rustc-check-cfg=cfg(PyPy)
cargo:rustc-check-cfg=cfg(GraalPy)
cargo:rustc-check-cfg=cfg(py_sys_config, values("Py_DEBUG", "Py_REF_DEBUG", "Py_TRACE_REFS", "COUNT_ALLOCS"))
cargo:rustc-check-cfg=cfg(invalid_from_utf8_lint)
cargo:rustc-check-cfg=cfg(pyo3_disable_reference_pool)
cargo:rustc-check-cfg=cfg(pyo3_leak_on_drop_without_reference_pool)
cargo:rustc-check-cfg=cfg(diagnostic_namespace)
cargo:rustc-check-cfg=cfg(c_str_lit)
cargo:rustc-check-cfg=cfg(rustc_has_once_lock)
cargo:rustc-check-cfg=cfg(Py_3_7)
cargo:rustc-check-cfg=cfg(Py_3_8)
cargo:rustc-check-cfg=cfg(Py_3_9)
cargo:rustc-check-cfg=cfg(Py_3_10)
cargo:rustc-check-cfg=cfg(Py_3_11)
cargo:rustc-check-cfg=cfg(Py_3_12)
cargo:rustc-check-cfg=cfg(Py_3_13)
cargo:rustc-cfg=Py_3_7
cargo:rustc-cfg=Py_3_8
cargo:rustc-cfg=Py_3_9
cargo:rustc-cfg=Py_3_10
cargo:rustc-cfg=Py_3_11
cargo:rustc-cfg=Py_3_12
cargo:rustc-cfg=Py_3_13

Now how there isn't a cargo:rustc-cfg=Py_GIL_DISABLED and there should be. Maybe maturin isn't finding the right python to call build-wheel with?

@ngoldbaum
Copy link
Contributor Author

Here's the issue:

fn from_interpreter(interpreter: impl AsRef<Path>) -> Result<Self> {
// sysconfig is missing all the flags on windows, so we can't actually
// query the interpreter directly for its build flags.
if cfg!(windows) {
return Ok(Self::new());
}

So I guess we need a different way to communicate this information to pyo3-ffi on Windows.

@ngoldbaum
Copy link
Contributor Author

Hmm, it looks like the comment there is out of date, sysconfig does have the flags on Windows, at least as of 3.13:

>>> sysconfig.get_config_vars().keys()
dict_keys(['prefix', 'exec_prefix', 'py_version', 'py_version_short', 'py_version_nodot', 'installed_base', 'base', 'installed_platbase', 'platbase', 'projectbase', 'platlibdir', 'implementation', 'implementation_lower', 'abiflags', 'py_version_nodot_plat', 'LIBDEST', 'BINLIBDEST', 'INCLUDEPY', 'EXT_SUFFIX', 'SOABI', 'Py_GIL_DISABLED', 'LIBDIR', 'LIBRARY', 'LDLIBRARY', 'EXE', 'VERSION', 'BINDIR', 'TZPATH', 'VPATH', 'userbase', 'abi_thread', 'srcdir'])

So maybe that check needs to come with a Python version check?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants