alloc-tls: Handle thread-local storage on platforms without #[thread_local] #54
Replying to @davidtgoldblatt's comment:
Awesome, thanks so much! I'll take a look at those.
So to clarify our problem: we already know what we want to store in TLS; we just aren't sure how to get that structure into TLS in all cases. Rust supports a … However, on platforms on which it's not supported, the only mechanism that the standard library exposes uses allocation under the hood (…). If I'm reading your comment correctly, what you're describing is a solution to the first problem, which we solve with Rust's …
Oh, I see what you're saying. Yeah, the initial-exec TLS model addresses the issue that, if you're loading your TLS-containing object from a shared library, the memory for it will be allocated lazily, via malloc.
Some early multithreaded mallocs had a global hashmap (backed by a slow fallback allocator that doesn't use thread-local data, but that a thread needs to use only once, on its first allocator call) that mapped thread ids to the thread-local data for that thread. I suspect this approach would have unpleasantly large constant factors, though.
In the end, I don't think there's a good way of getting around all the annoying platform-specific stuff if you're trying to optimize for speed; I think jemalloc ends up with something like 4 substantially different TLS implementations. You might find some useful inspiration in https://github.com/jemalloc/jemalloc/blob/dev/include/jemalloc/internal/tsd.h and friends (e.g. tsd_generic.h, tsd_win.h, etc.) and https://github.com/jemalloc/jemalloc/blob/dev/src/tsd.c (you can ignore the TSD struct and the X-macro nonsense; the tsd_fetch implementation is the interesting part).
Note that TLS isn't the only reentrancy scenario you might need to worry about: on many platforms, "safe"-seeming standard library calls end up allocating (e.g. mutex locking, file reading, and getting the CPU count, off the top of my head).
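The thread-id lookup idea can be illustrated without any heap allocation by swapping the hashmap for a fixed-capacity, open-addressed array of atomics. This is a hedged sketch, not jemalloc's or any early malloc's actual code: `ThreadTable`, `CAPACITY`, and `slot_for` are hypothetical names, and it assumes the caller can obtain a nonzero numeric thread id without allocating (how to do that is platform-specific and not shown). Slots are never released, so a real version would also need to reclaim entries when threads exit.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical upper bound on the number of threads tracked.
pub const CAPACITY: usize = 64;

/// Sentinel for an unclaimed slot, so thread id 0 is reserved.
const EMPTY: u64 = 0;

/// Allocation-free map from a numeric thread id to a slot index. The slot
/// index would select that thread's data in a parallel static array.
pub struct ThreadTable {
    keys: [AtomicU64; CAPACITY],
}

impl ThreadTable {
    pub const fn new() -> Self {
        // A const item lets us repeat a non-Copy value in an array literal.
        const UNCLAIMED: AtomicU64 = AtomicU64::new(EMPTY);
        ThreadTable { keys: [UNCLAIMED; CAPACITY] }
    }

    /// Returns the slot owned by `tid`, claiming a free one on first use.
    /// Returns `None` when every slot belongs to another thread, in which
    /// case the caller would fall back to a slow, non-thread-local path.
    ///
    /// Slots are never released and only thread `tid` ever inserts `tid`,
    /// so probing always reaches an existing entry before any empty slot.
    pub fn slot_for(&self, tid: u64) -> Option<usize> {
        assert_ne!(tid, EMPTY, "thread id 0 is reserved as the empty marker");
        let start = (tid % CAPACITY as u64) as usize;
        for probe in 0..CAPACITY {
            let idx = (start + probe) % CAPACITY;
            // Try to claim an empty slot; on failure, learn who owns it.
            match self.keys[idx].compare_exchange(EMPTY, tid, Ordering::AcqRel, Ordering::Acquire) {
                Ok(_) => return Some(idx),                      // newly claimed
                Err(owner) if owner == tid => return Some(idx), // claimed earlier
                Err(_) => {}                                    // another thread's; keep probing
            }
        }
        None
    }
}
```

Because `new` is a `const fn`, the table can live in a `static` (`static TABLE: ThreadTable = ThreadTable::new();`), so even the first lookup on a thread never calls the allocator; only the fallback for a full table remains.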
That's hugely helpful, thanks! I'll mull over that for a while :)
Update: On OS X, we're getting bitten by the same issue that rpmalloc is: mjansson/rpmalloc#29. It seems to come from the fact that when using …
@davidtgoldblatt Do you mean that if you get there late enough, …
My memory on this is a little hazy; you'd have to check the glibc (or whatever libc you're interested in) to be sure. Here's the way I remember glibc working:
So pthread_key_create won't allocate (it just acquires a mutex and bumps a counter). The allocation happens the first time a thread touches its per-thread data. We get our pthread key in a library constructor at load time[1]. As far as I know, this is early enough that it hasn't caused problems for anyone.
[1] We actually do this even if TLS is available, to get us a place to put our per-thread shutdown hooks.
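A minimal Rust sketch of that pattern: create the key once up front, lazily create per-thread data on a thread's first touch, and use the key's destructor as the per-thread shutdown hook. It assumes Linux/glibc, where `pthread_key_t` is `unsigned int` (it is `unsigned long` on macOS, so the declarations below are not portable), and a Rust toolchain recent enough to accept `unsafe extern` blocks. `ThreadData`, `init_key`, and `with_thread_data` are hypothetical names, and `init_key` is an ordinary function here rather than a real library constructor.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

// Assumption: Linux/glibc, where pthread_key_t is `unsigned int`.
type PthreadKey = u32;

unsafe extern "C" {
    fn pthread_key_create(
        key: *mut PthreadKey,
        destructor: Option<unsafe extern "C" fn(*mut c_void)>,
    ) -> i32;
    fn pthread_getspecific(key: PthreadKey) -> *mut c_void;
    fn pthread_setspecific(key: PthreadKey, value: *const c_void) -> i32;
}

/// Hypothetical per-thread allocator state.
pub struct ThreadData {
    pub allocs: usize,
}

static KEY: AtomicU32 = AtomicU32::new(0);
static KEY_READY: AtomicBool = AtomicBool::new(false);

/// Run by pthread when a thread with a non-null value for our key exits:
/// this is where per-thread shutdown work would go.
unsafe extern "C" fn thread_exit_hook(value: *mut c_void) {
    // Matches the Box::into_raw in `with_thread_data`.
    unsafe { drop(Box::from_raw(value as *mut ThreadData)) };
}

/// Create the key once, early (from a library constructor in the design
/// described above). Creating the key does not touch per-thread data.
pub fn init_key() {
    let mut key: PthreadKey = 0;
    let hook = thread_exit_hook as unsafe extern "C" fn(*mut c_void);
    let rc = unsafe { pthread_key_create(&mut key, Some(hook)) };
    assert_eq!(rc, 0, "pthread_key_create failed");
    KEY.store(key, Ordering::Relaxed);
    KEY_READY.store(true, Ordering::Release);
}

/// Run `f` on this thread's data, creating it on the thread's first call.
/// That first touch is where the per-thread setup cost lands.
pub fn with_thread_data<R>(f: impl FnOnce(&mut ThreadData) -> R) -> R {
    assert!(KEY_READY.load(Ordering::Acquire), "init_key() has not run");
    let key = KEY.load(Ordering::Relaxed);
    unsafe {
        let mut data = pthread_getspecific(key) as *mut ThreadData;
        if data.is_null() {
            // A real allocator would carve this out of its own bootstrap
            // memory instead of going through Box (the global allocator).
            data = Box::into_raw(Box::new(ThreadData { allocs: 0 }));
            let rc = pthread_setspecific(key, data as *const c_void);
            assert_eq!(rc, 0, "pthread_setspecific failed");
        }
        f(&mut *data)
    }
}
```

Each thread sees its own `ThreadData`, and a thread that exits has its data freed by `thread_exit_hook`; the main thread's data is simply leaked at process exit, since key destructors only run on thread termination.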
OK, that makes perfect sense, thanks! We'll probably end up doing something similar, because even if we fix this dylib issue (more on that in a second...), we'll still need to support platforms that don't support the …
I just discovered something interesting about the issue I linked to above: mjansson/rpmalloc#29. It looks like the issue subsides if you only access TLS after dynamic loading is complete. My hunch, given the stack trace pasted below, is that our …
Have you guys run into anything similar with jemalloc?
Doesn't ring a bell, but most of the tricky TLS stuff was figured out before my time. I don't know the details of TLS on OS X (actually, my memory was that their compiler people had disabled it there because they wanted to come up with a new implementation without having to worry about backwards compatibility). I think we exclusively use pthreads for our thread-local data on OS X.
Gotcha. I actually managed to solve this with a surprisingly simple approach: I have a global bool that's initialized to false and then set to true in the library constructor. The TLS code checks that boolean and uses a global slow path if it's false (avoiding accessing TLS).
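The global-flag workaround might look roughly like the sketch below. `TLS_READY`, `Cache`, `mark_tls_ready`, and `with_cache` are hypothetical names; std's `thread_local!` stands in for whichever TLS mechanism the crate actually uses, a `std::sync::Mutex` stands in for the global slow path, and `mark_tls_ready` is called explicitly instead of from a real library constructor.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Hypothetical per-thread allocator cache.
pub struct Cache {
    pub allocs: usize,
}

/// False until the library constructor runs; until then TLS is not touched.
static TLS_READY: AtomicBool = AtomicBool::new(false);

/// Global slow path shared by all threads while TLS is off-limits.
static GLOBAL_CACHE: Mutex<Cache> = Mutex::new(Cache { allocs: 0 });

thread_local! {
    // Stand-in for the real TLS mechanism.
    static LOCAL_CACHE: RefCell<Cache> = RefCell::new(Cache { allocs: 0 });
}

/// In the approach described above, this runs from the library constructor,
/// i.e. after dynamic loading has finished.
pub fn mark_tls_ready() {
    TLS_READY.store(true, Ordering::Release);
}

/// Run `f` on the thread-local cache once TLS is safe to use, and on the
/// shared global cache before that.
pub fn with_cache<R>(f: impl FnOnce(&mut Cache) -> R) -> R {
    if TLS_READY.load(Ordering::Acquire) {
        LOCAL_CACHE.with(|c| f(&mut *c.borrow_mut()))
    } else {
        let mut guard = GLOBAL_CACHE.lock().unwrap();
        f(&mut *guard)
    }
}
```

The check costs one atomic load per call on the fast path, and anything that allocates before the constructor runs (such as code executed during dynamic loading) never touches TLS at all.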
On platforms without the #[thread_local] attribute, thread-local storage (TLS) for an allocator is particularly tricky. Most obvious implementations (including the one in the standard library) require allocation to work, and they do so in such a way that even detecting what state a TLS value is in requires allocation. As a result, recursion (an allocation call accesses TLS, which triggers a further allocation) cannot be detected. While this may be fixed at some point in the future, it won't be fixed any time soon.
Inspired by this comment on a jemalloc issue, our best bet may be to register a handler that is called by pthread whenever a thread spawns, and use this opportunity to preemptively initialize TLS for that thread.
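If the TLS lifecycle state could be kept somewhere that can be read and written without allocating, detecting that recursion reduces to a small state machine. This sketch assumes such storage exists (the `Cell` parameter stands in for it); `TlsState`, `Path`, and `route` are hypothetical names, not part of the crate or the standard library.

```rust
use std::cell::Cell;

/// Per-thread TLS lifecycle. Assumption: this state lives somewhere that can
/// be accessed without allocating, which the standard library's mechanism
/// does not guarantee on these platforms.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TlsState {
    Uninit,
    Initializing,
    Ready,
}

/// Which path an allocation request should take.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Path {
    Fast, // use this thread's TLS-backed cache
    Slow, // use a global fallback that never touches TLS
}

/// Decide the path for one allocation call. `init` sets up TLS for this
/// thread and may itself allocate, re-entering `route`; that re-entry sees
/// `Initializing` and is diverted to the slow path instead of recursing.
pub fn route(state: &Cell<TlsState>, init: impl FnOnce()) -> Path {
    match state.get() {
        TlsState::Ready => Path::Fast,
        TlsState::Initializing => Path::Slow,
        TlsState::Uninit => {
            state.set(TlsState::Initializing);
            init();
            state.set(TlsState::Ready);
            Path::Fast
        }
    }
}
```

Preemptively initializing TLS when a thread starts, as proposed above, would amount to calling `route` once before the thread's first real allocation.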