
Race condition in tracemalloc causes segfaults #128679

Closed
itamarst opened this issue Jan 9, 2025 · 11 comments
Labels: 3.13 (bugs and security fixes) · 3.14 (new features, bugs and security fixes) · extension-modules (C modules in the Modules dir) · type-crash (A hard crash of the interpreter, possibly with a core dump)

Comments

@itamarst commented Jan 9, 2025

Crash report

What happened?

This is a bit of a tricky situation, but it is real and impacting my ability to use tracemalloc. As background, I've added code to Polars to make it record all of its allocations in tracemalloc, and this is enabled in debug builds. This allows writing unit tests that check memory usage, which is very useful for verifying that high memory usage has been fixed and for making sure it doesn't regress.

Unfortunately, I'm hitting a situation where tracemalloc causes segfaults in multi-threaded situations. I believe that this is a race condition between PyTraceMalloc_Track() in a new non-Python thread that does not hold the GIL, and tracemalloc.stop() being called in another thread. My hypothesis in detail:

  • Thread 1 does tracemalloc.start().
  • Thread 2 does not hold the GIL.
  • Thread 2 calls PyTraceMalloc_Track().
  • Thread 2 checks if tracemalloc is enabled, the answer is yes.
  • Thread 2 starts acquiring the GIL (see the PyTraceMalloc_Track implementation). As part of acquiring the GIL, memory needs to be allocated.
  • Thread 2 therefore once again checks if tracemalloc is enabled (I haven't actually read the source code enough to verify this part, but the coredump suggests it is probably true). The answer is yes.
  • At this point, Thread 1 calls tracemalloc.stop().
  • Thread 2 continues as if tracemalloc were enabled, but it is not!
  • BOOM

If this hypothesis is correct, the solution would be for GIL acquisition to bypass tracemalloc altogether when it allocates; it doesn't allocate much memory, so not tracking it is fine. This may be difficult in practice, so another approach would be to add an additional lock so there's no race condition around checking whether tracemalloc is enabled.
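
To make the suspected check-then-act window concrete, here is a minimal standalone C program that models the same pattern: an unsynchronized flag stands in for tracemalloc_config.tracing, and a sleep stands in for the allocation done inside PyGILState_Ensure(). This is only an analogy of the hypothesis above, not CPython code.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Deliberately unsynchronized flag, playing the role of tracemalloc_config.tracing. */
static int tracing = 1;

static void *track(void *arg)
{
    (void)arg;
    if (!tracing) {               /* the only check, done before the slow step */
        return NULL;
    }
    usleep(1000);                 /* stands in for PyGILState_Ensure(), which may allocate */
    /* Proceeds as if tracing were still enabled; in real tracemalloc this is
       where the trace tables, already torn down by stop(), would be touched. */
    printf("adding trace; tracing flag is now %d\n", tracing);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, track, NULL);
    usleep(500);
    tracing = 0;                  /* stands in for tracemalloc.stop() in thread 1 */
    pthread_join(t, NULL);
    return 0;
}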

Here is a stack trace from a coredump from the reproducer (see below) that led me to the above hypothesis:

#0  0x00000000005f0482 in traceback_new () at ../Python/tracemalloc.c:371
#1  0x00000000005f0083 in tracemalloc_add_trace (domain=0, ptr=132539872185888, size=336) at ../Python/tracemalloc.c:471
#2  0x00000000006930db in tracemalloc_alloc (elsize=336, nelem=<optimized out>, ctx=0x9ec4f8 <_PyRuntime+10136>, use_calloc=<optimized out>) at ../Python/tracemalloc.c:541
#3  tracemalloc_raw_alloc (use_calloc=<optimized out>, ctx=0x9ec4f8 <_PyRuntime+10136>, nelem=<optimized out>, elsize=336) at ../Python/tracemalloc.c:715
#4  0x0000000000639a52 in alloc_threadstate (interp=0x9ff6b0 <_PyRuntime+88400>) at ../Python/pystate.c:1452
#5  new_threadstate (interp=0x9ff6b0 <_PyRuntime+88400>, whence=4) at ../Python/pystate.c:1563
#6  0x0000000000480ef0 in PyGILState_Ensure () at ../Python/pystate.c:2766
#7  0x00000000005203fc in PyTraceMalloc_Track (domain=123, ptr=10, size=1) at ../Python/tracemalloc.c:1318
#8  0x0000788b7088774c in tracemalloc_repro::in_thread () at src/lib.rs:23

To run the reproducer you will need to pip install rustimport and have Rust installed. (I tried with Cython, had a hard time, gave up.)

Here's the Python file:

import gc
import time
import tracemalloc

import rustimport.import_hook

# rustimport will automatically compile tracemalloc_repro.rs into an extension:
import tracemalloc_repro

tracemalloc.start()
# This launches ~50 threads that run in parallel to this one:
tracemalloc_repro.run()
# This happens in parallel to the new threads running:
tracemalloc.stop()
gc.collect()

time.sleep(10)

And here is the Rust file, you should call it tracemalloc_repro.rs:

// rustimport
//: [package]
//: name = "tracemalloc_repro"
//: version = "0.1.0"
//: edition = "2021"
//:
//: [lib]
//: name = "tracemalloc_repro"
//: crate-type = ["cdylib"]
//:
//: [dependencies]
//: pyo3 = {version = "0.23", features = ["extension-module"]}
//: libc = "0.2"

use pyo3::prelude::*;
use libc::{c_int, c_uint, size_t, uintptr_t};

extern "C" {
    fn PyTraceMalloc_Track(domain: c_uint, ptr: uintptr_t, size: size_t) -> c_int;
}

fn in_thread() {
    let result = unsafe { PyTraceMalloc_Track(123, 10, 1) };
    println!("Result of tracking: {result}");
}

#[pyfunction]
fn run(py: Python) {
    py.allow_threads(|| {
        // With GIL released, run function in a thread.
        for _ in 0..50 {
            let _ = std::thread::spawn(in_thread);
        }
    });
}

#[pymodule]
fn tracemalloc_repro(_py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(run, m)?)?;
    Ok(())
}

You can reproduce by running repro.py. Because this is a race condition, you may need to run it a few times; I had more consistent crashes with Python 3.12, but it does crash on Python 3.13 as well. You may need to tweak the number 50 above to make it happen.

CPython versions tested on:

3.12, 3.13

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.13.1 (main, Dec 4 2024, 08:54:15) [GCC 13.2.0]

Linked PRs

itamarst added the type-crash (A hard crash of the interpreter, possibly with a core dump) label on Jan 9, 2025
@tom-pytel (Contributor)

Thread 2 therefore once again checks if tracemalloc is enabled (I haven't actually read source code enough to verify if this part is true, but the coredump suggests it probably is). The answer is yes.

PyTraceMalloc_Track() in thread 2 does not, in fact, currently check whether tracemalloc is enabled again after acquiring the GIL, but that is not the problem here. The problem is that tracing is turned off at some point WHILE PyGILState_Ensure() is being processed, after it has already decided to call the tracemalloc allocators. A simple quick solution is to add one final check before attempting ADD_TRACE() and just skip adding the trace as an emergency fallback if tracing has been turned off in this critical interval. This is just the starting hack fix I will put here; a better fix would be to properly delineate and lock this critical interval. I'm also including a C extension module which seems to reproduce the problem 100% of the time:

mymod.c:

#include <pthread.h>

#include "Python.h"

void *
threadfunc(void *p) {
    PyTraceMalloc_Track(123, 10, 1);

    return NULL;
}

static PyObject *
test(PyObject *self, PyObject *args) {
    for (int i = 0; i < 50; i++) {
        pthread_t p;

        if (pthread_create(&p, NULL, threadfunc, NULL) != 0)
            break;

        pthread_detach(p);
    }

    Py_RETURN_NONE;
}

static PyMethodDef module_methods[] = {
    {"test", test, METH_NOARGS},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef module_definition = {
    PyModuleDef_HEAD_INIT,
    "mymod",
    "test module",
    -1,
    module_methods
};

PyMODINIT_FUNC
PyInit_mymod(void) {
    return PyModule_Create(&module_definition);
}

test_mymod.py:

import gc
import time
import tracemalloc

import mymod

tracemalloc.start()
mymod.test()
tracemalloc.stop()
gc.collect()

print('Waiting...')
time.sleep(10)
print('Done.')

Here are the changes as a starting point for the eventual fix:

$ git diff main
diff --git a/Python/tracemalloc.c b/Python/tracemalloc.c
index f661d69c03..d2e3dfc53b 100644
--- a/Python/tracemalloc.c
+++ b/Python/tracemalloc.c
@@ -538,11 +538,13 @@ tracemalloc_alloc(int use_calloc, void *ctx, size_t nelem, size_t elsize)
         return NULL;
 
     TABLES_LOCK();
-    if (ADD_TRACE(ptr, nelem * elsize) < 0) {
-        /* Failed to allocate a trace for the new memory block */
-        TABLES_UNLOCK();
-        alloc->free(alloc->ctx, ptr);
-        return NULL;
+    if (tracemalloc_config.tracing) {
+        if (ADD_TRACE(ptr, nelem * elsize) < 0) {
+            /* Failed to allocate a trace for the new memory block */
+            TABLES_UNLOCK();
+            alloc->free(alloc->ctx, ptr);
+            return NULL;
+        }
     }
     TABLES_UNLOCK();
     return ptr;
@@ -963,8 +965,11 @@ _PyTraceMalloc_Stop(void)
     if (!tracemalloc_config.tracing)
         return;
 
-    /* stop tracing Python memory allocations */
+    /* stop tracing Python memory allocations,
+       but not while something might be in the middle of an operation */
+    TABLES_LOCK();
     tracemalloc_config.tracing = 0;
+    TABLES_UNLOCK();
 
     /* unregister the hook on memory allocators */
 #ifdef TRACE_RAW_MALLOC
@@ -1317,6 +1322,12 @@ PyTraceMalloc_Track(unsigned int domain, uintptr_t ptr,
 
     gil_state = PyGILState_Ensure();
 
+    if (!tracemalloc_config.tracing) {
+        /* tracing may have been turned off as we were acquiring the GIL */
+        PyGILState_Release(gil_state);
+        return -2;
+    }
+
     TABLES_LOCK();
     res = tracemalloc_add_trace(domain, ptr, size);
     TABLES_UNLOCK();

As stated, the C extension module reproduces the problem 100% of the time, and this fix appears to resolve it 100% of the time, but the person in charge of tracemalloc should really have a look at this.

@itamarst (Author)

itamarst commented Jan 9, 2025

Thank you for the quick response and consistent reproducer!

@tomasr8 (Member) commented Jan 9, 2025

@tom-pytel I can confirm that your patch appears to fix the problem. Using the tables_lock to protect tracemalloc_config.tracing seems reasonable to me given that we cannot rely on the GIL yet, though I'm not an expert in this area.
Would you like to open a PR with the proposed changes?

@tom-pytel (Contributor)

Would you like to open a PR with the proposed changes?

Done.

ZeroIntensity added the extension-modules (C modules in the Modules dir), 3.13 (bugs and security fixes), and 3.14 (new features, bugs and security fixes) labels on Jan 10, 2025
vstinner added a commit to vstinner/cpython that referenced this issue Jan 10, 2025
Check again 'tracemalloc_config.tracing' once the GIL is held in
tracemalloc_raw_alloc() and PyTraceMalloc_Track(), since another
thread can call tracemalloc.stop() during PyGILState_Ensure() call.
vstinner added a commit to vstinner/cpython that referenced this issue Jan 15, 2025
* tracemalloc_realloc_gil() and tracemalloc_raw_realloc() no longer
  remove the trace on reentrant call.
* _PyTraceMalloc_Stop() unregisters _PyTraceMalloc_TraceRef().
* _PyTraceMalloc_GetTraces() sets the reentrant flag.
* tracemalloc_clear_traces_unlocked() sets the reentrant flag.
vstinner added a commit to vstinner/cpython that referenced this issue Jan 15, 2025
* tracemalloc_realloc_gil() and tracemalloc_raw_realloc() no longer
  remove the trace on reentrant call.
* _PyTraceMalloc_Stop() unregisters _PyTraceMalloc_TraceRef().
* _PyTraceMalloc_GetTraces() sets the reentrant flag.
* tracemalloc_clear_traces_unlocked() sets the reentrant flag.
vstinner added a commit to vstinner/cpython that referenced this issue Jan 15, 2025
* Use TABLES_LOCK() to protect 'tracemalloc_config.tracing'.
* Hold TABLES_LOCK() longer while accessing tables.
* tracemalloc_realloc_gil() and tracemalloc_raw_realloc() no longer
  remove the trace on reentrant call.
* _PyTraceMalloc_Stop() unregisters _PyTraceMalloc_TraceRef().
* _PyTraceMalloc_GetTraces() sets the reentrant flag.
* tracemalloc_clear_traces_unlocked() sets the reentrant flag.
vstinner added a commit that referenced this issue Jan 15, 2025
* Use TABLES_LOCK() to protect 'tracemalloc_config.tracing'.
* Hold TABLES_LOCK() longer while accessing tables.
* tracemalloc_realloc() and tracemalloc_free() no longer
  remove the trace on reentrant call.
* _PyTraceMalloc_Stop() unregisters _PyTraceMalloc_TraceRef().
* _PyTraceMalloc_GetTraces() sets the reentrant flag.
* tracemalloc_clear_traces_unlocked() sets the reentrant flag.
vstinner added a commit to vstinner/cpython that referenced this issue Jan 15, 2025
tracemalloc_alloc(), tracemalloc_realloc(), tracemalloc_free(),
_PyTraceMalloc_TraceRef() and _PyTraceMalloc_GetMemory() now check
tracemalloc_config.tracing after calling TABLES_LOCK().
vstinner added a commit to vstinner/cpython that referenced this issue Jan 15, 2025
tracemalloc_alloc(), tracemalloc_realloc(), tracemalloc_free(),
_PyTraceMalloc_TraceRef() and _PyTraceMalloc_GetMemory() now check
tracemalloc_config.tracing after calling TABLES_LOCK().

_PyTraceMalloc_TraceRef() now always returns 0.
vstinner added a commit to vstinner/cpython that referenced this issue Jan 15, 2025
tracemalloc_alloc(), tracemalloc_realloc(), tracemalloc_free(),
_PyTraceMalloc_TraceRef() and _PyTraceMalloc_GetMemory() now check
tracemalloc_config.tracing after calling TABLES_LOCK().

_PyTraceMalloc_TraceRef() now always returns 0.
vstinner added a commit to vstinner/cpython that referenced this issue Jan 15, 2025
tracemalloc_alloc(), tracemalloc_realloc(), PyTraceMalloc_Track(),
PyTraceMalloc_Untrack() and _PyTraceMalloc_TraceRef() now check
tracemalloc_config.tracing after calling TABLES_LOCK().

_PyTraceMalloc_Stop() now protects more code with TABLES_LOCK(),
especially setting tracemalloc_config.tracing to 1.

Add a test using PyTraceMalloc_Track() to test tracemalloc.stop()
race condition.
vstinner added a commit to vstinner/cpython that referenced this issue Jan 16, 2025
tracemalloc_alloc(), tracemalloc_realloc(), PyTraceMalloc_Track(),
PyTraceMalloc_Untrack() and _PyTraceMalloc_TraceRef() now check
tracemalloc_config.tracing after calling TABLES_LOCK().

_PyTraceMalloc_Stop() now protects more code with TABLES_LOCK(),
especially setting tracemalloc_config.tracing to 1.

Add a test using PyTraceMalloc_Track() to test tracemalloc.stop()
race condition.
vstinner added a commit that referenced this issue Jan 16, 2025
tracemalloc_alloc(), tracemalloc_realloc(), tracemalloc_free(),
_PyTraceMalloc_TraceRef() and _PyTraceMalloc_GetMemory() now check
'tracemalloc_config.tracing' after calling TABLES_LOCK().

_PyTraceMalloc_TraceRef() now always returns 0.
@vstinner (Member) commented Jan 17, 2025

I fixed the race conditions in tracemalloc and added test_tracemalloc.test_tracemalloc_track_race() in the main branch. The problem is that the test crashes on FreeBSD :-( It seems like the root issue is a race condition between PyMem_SetAllocator() and PyMem_RawFree():

  • Thread 1 calls tracemalloc.stop(), which calls PyMem_SetAllocator(): the write to _PyMem_Raw is protected by a lock.
  • Thread 2 calls PyMem_RawFree(): the read of _PyMem_Raw is not protected by a lock.
void PyMem_RawFree(void *ptr)
{
    _PyMem_Raw.free(_PyMem_Raw.ctx, ptr);
}

A Python memory allocator is made of two main parts:

  • a function (free in this example)
  • a context.

The lock makes sure that PyMem_SetAllocator() is atomic. The problem is that the read is not protected by a lock and so it's possible to get a function of allocator A with a context of allocator B.
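
For reference, this is the PEP 445 allocator structure as documented in the C API; the "two main parts" above are any one of the function pointers plus the shared ctx that each of them receives:

typedef struct {
    void *ctx;                                                /* the context */
    void* (*malloc)(void *ctx, size_t size);                  /* the functions */
    void* (*calloc)(void *ctx, size_t nelem, size_t elsize);
    void* (*realloc)(void *ctx, void *ptr, size_t new_size);
    void  (*free)(void *ctx, void *ptr);
} PyMemAllocatorEx;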

It's possible to fix PyMem_RawFree() by adding a lock:

void PyMem_RawFree(void *ptr)
{
    PyMutex_Lock(&ALLOCATORS_MUTEX);
    PyMemAllocatorEx raw = _PyMem_Raw;
    PyMutex_Unlock(&ALLOCATORS_MUTEX);
    raw.free(raw.ctx, ptr);
}

The new problem is that Python allocates a lot of small memory blocks (it calls the memory allocator often), so any performance slowdown on allocation would have a significant impact on Python's overall performance.

@vstinner (Member)

The PyMem_SetAllocator() vs PyMem_RawFree() race condition doesn't affect Python built in release mode by default, since the default release memory allocator doesn't use a context. Only a debug build, or a release build explicitly using PYTHONMALLOC=debug, is affected: the debug memory allocator does use a context.
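
A hypothetical illustration (not CPython's actual hooks) of why the mismatch only matters when a context is used: a release-style free ignores ctx, while a debug-style free dereferences it, so pairing the new function pointer with a stale context can only crash in the second case.

#include <stdlib.h>

/* Hypothetical hooks, for illustration only. */
struct debug_state {
    size_t nfree;                              /* bookkeeping reached through ctx */
    void (*inner_free)(void *ctx, void *ptr);
    void *inner_ctx;
};

static void release_free(void *ctx, void *ptr)
{
    (void)ctx;                                 /* context is never used: a torn pair is harmless */
    free(ptr);
}

static void debug_free(void *ctx, void *ptr)
{
    struct debug_state *st = ctx;              /* the new function with a stale ctx dereferences
                                                  the wrong pointer here and can crash */
    st->nfree++;
    st->inner_free(st->inner_ctx, ptr);
}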

The PyMem_Malloc() and PyObject_Malloc() allocators are not affected since they are protected by another lock: the GIL. Well, at least in the default build; in the free-threaded build, they should also be affected.

The race condition is as old as Python 3.4 and PEP 445 – Add new APIs to customize Python memory allocators.

@ZeroIntensity (Member)

Well, at least in the default build. In Free Threaded build, they should also be affected.

I'm not sure customizing the allocators is even supported under free-threading. Mimalloc must be used for the object allocator in order for the GC to work.

@vstinner (Member)

I'm not sure customizing the allocators is even supported under free-threading. Mimalloc must be used for the object allocator in order for the GC to work.

tracemalloc doesn't replace the memory allocators, but wraps existing allocators.
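
A minimal sketch of that wrapping pattern using the public PEP 445 API (my illustration of the approach, not tracemalloc's actual code): the previous raw allocator is saved, passed as the new hook's ctx, and every call is forwarded to it.

#include <Python.h>

static PyMemAllocatorEx old_raw;    /* the allocator being wrapped */

static void *hook_malloc(void *ctx, size_t size)
{
    PyMemAllocatorEx *wrapped = (PyMemAllocatorEx *)ctx;
    /* ... record the allocation here ... */
    return wrapped->malloc(wrapped->ctx, size);
}

static void *hook_calloc(void *ctx, size_t nelem, size_t elsize)
{
    PyMemAllocatorEx *wrapped = (PyMemAllocatorEx *)ctx;
    return wrapped->calloc(wrapped->ctx, nelem, elsize);
}

static void *hook_realloc(void *ctx, void *ptr, size_t new_size)
{
    PyMemAllocatorEx *wrapped = (PyMemAllocatorEx *)ctx;
    return wrapped->realloc(wrapped->ctx, ptr, new_size);
}

static void hook_free(void *ctx, void *ptr)
{
    PyMemAllocatorEx *wrapped = (PyMemAllocatorEx *)ctx;
    wrapped->free(wrapped->ctx, ptr);
}

static void install_hook(void)
{
    /* Swapping allocators is itself racy with concurrent raw allocations,
       which is exactly the problem discussed in this issue. */
    PyMem_GetAllocator(PYMEM_DOMAIN_RAW, &old_raw);
    PyMemAllocatorEx hook = {&old_raw, hook_malloc, hook_calloc,
                             hook_realloc, hook_free};
    PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &hook);
}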

But I don't think that tracemalloc is compatible with Free Threading yet (I expect bugs/crashes).

vstinner added a commit to vstinner/cpython that referenced this issue Jan 18, 2025
There is a race condition between PyMem_SetAllocator() and
PyMem_RawMalloc()/PyMem_RawFree(). While PyMem_SetAllocator() write
is protected by a lock, PyMem_RawMalloc()/PyMem_RawFree() reads are
not protected by a lock. PyMem_RawMalloc()/PyMem_RawFree() can be
called with an old context and the new function pointer.

On a release build, it's not an issue since the context is not used.
On a debug build, the debug hooks use the context and so can crash.
vstinner added a commit that referenced this issue Jan 18, 2025
There is a race condition between PyMem_SetAllocator() and
PyMem_RawMalloc()/PyMem_RawFree(). While PyMem_SetAllocator() write
is protected by a lock, PyMem_RawMalloc()/PyMem_RawFree() reads are
not protected by a lock. PyMem_RawMalloc()/PyMem_RawFree() can be
called with an old context and the new function pointer.

On a release build, it's not an issue since the context is not used.
On a debug build, the debug hooks use the context and so can crash.
vstinner added a commit to vstinner/cpython that referenced this issue Jan 18, 2025
…ython#128988)

There is a race condition between PyMem_SetAllocator() and
PyMem_RawMalloc()/PyMem_RawFree(). While PyMem_SetAllocator() write
is protected by a lock, PyMem_RawMalloc()/PyMem_RawFree() reads are
not protected by a lock. PyMem_RawMalloc()/PyMem_RawFree() can be
called with an old context and the new function pointer.

On a release build, it's not an issue since the context is not used.
On a debug build, the debug hooks use the context and so can crash.

(cherry picked from commit 9bc1964)
vstinner added a commit that referenced this issue Jan 18, 2025
tracemalloc_alloc(), tracemalloc_realloc(), PyTraceMalloc_Track(),
PyTraceMalloc_Untrack() and _PyTraceMalloc_TraceRef() now check
tracemalloc_config.tracing after calling TABLES_LOCK().

_PyTraceMalloc_Stop() now protects more code with TABLES_LOCK(),
especially setting tracemalloc_config.tracing to 1.

Add a test using PyTraceMalloc_Track() to test tracemalloc.stop()
race condition.

Call _PyTraceMalloc_Init() at Python startup.
vstinner added a commit that referenced this issue Jan 19, 2025
…129022)

[3.13] gh-128679: Fix tracemalloc.stop() race conditions (#128897)

tracemalloc_alloc(), tracemalloc_realloc(), PyTraceMalloc_Track(),
PyTraceMalloc_Untrack() and _PyTraceMalloc_TraceRef() now check
tracemalloc_config.tracing after calling TABLES_LOCK().

_PyTraceMalloc_Stop() now protects more code with TABLES_LOCK(),
especially setting tracemalloc_config.tracing to 1.

Add a test using PyTraceMalloc_Track() to test tracemalloc.stop()
race condition.

Call _PyTraceMalloc_Init() at Python startup.

(cherry picked from commit 6b47499)
@vstinner (Member)

Ok, the tracemalloc.stop() race conditions should be fixed in main, 3.13 and 3.12 branches. Thanks for your bug report @itamarst!

Sadly, the added test crashes on a debug build (especially on FreeBSD). The code is only reliable on a release build. I'm closing the issue.

Fixing the debug hooks on memory allocators to make the code atomic would badly impact Python performance. It's a no-go.

@itamarst (Author)

Thank you to everyone who helped fix this!

@vstinner (Member) commented Jan 19, 2025

Thanks @tom-pytel for the initial fix! The final fix is more complex since we decided to fix more functions and more race conditions.

srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this issue Jan 21, 2025
…ython#128988)

There is a race condition between PyMem_SetAllocator() and
PyMem_RawMalloc()/PyMem_RawFree(). While PyMem_SetAllocator() write
is protected by a lock, PyMem_RawMalloc()/PyMem_RawFree() reads are
not protected by a lock. PyMem_RawMalloc()/PyMem_RawFree() can be
called with an old context and the new function pointer.

On a release build, it's not an issue since the context is not used.
On a debug build, the debug hooks use the context and so can crash.
vstinner added a commit to vstinner/cpython that referenced this issue Jan 21, 2025
Replace uncommon PyGILState_GetThisThreadState() with common
_PyThreadState_GET().
vstinner added a commit to vstinner/cpython that referenced this issue Jan 21, 2025
Replace uncommon PyGILState_GetThisThreadState() with common
_PyThreadState_GET().
vstinner added a commit that referenced this issue Jan 21, 2025
Replace uncommon PyGILState_GetThisThreadState() with common
_PyThreadState_GET().
vstinner added a commit to vstinner/cpython that referenced this issue Jan 24, 2025
_PyTraceMalloc_Stop() now calls PyRefTracer_SetTracer(NULL, NULL).
vstinner added a commit that referenced this issue Jan 24, 2025
)

_PyTraceMalloc_Stop() now calls PyRefTracer_SetTracer(NULL, NULL).