-
Notifications
You must be signed in to change notification settings - Fork 489
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for use of GLIBC's rseq #144
Comments
Please note that some distributions have backported rseq support into earlier downstream versions. For example our CentOS/RHEL version based on glibc 2.34 has it, with the glibc 2.35 ABI. The rseq kernel facility is the only way to get a useful |
You may want to have a look at how librseq integrates with glibc for rseq registration, this could help solving this at the tcmalloc level more robustly. Reference: https://git.kernel.org/pub/scm/libs/librseq/librseq.git/ The important bits are here: https://git.kernel.org/pub/scm/libs/librseq/librseq.git/tree/src/rseq.c |
any idea how this can be addressed? I have read the comment in internal/percpu.h, it does not seems like the current tcmalloc rseq usage allows sharing rseq with others by repurposing the cpu_id_start field... |
This misuse of cpu_id_start is outside of the ABI contract with the kernel. See include/uapi/linux/rseq.h:
So based on this ABI contract, userspace should never write to it. Writing to it basically makes tcmalloc incompatible with all other rseq users. We should discuss how we can extend rseq with new fields to provide the kind of signal required by tcmalloc. We should also find ways to prevent similar abuse of the rseq ABI in the future. |
my humble opinion is that the cpu_id_start bit hack is an optimization but could easily be corrected. my understanding is that tcmalloc_sampler, tcmalloc_slabs and __rseq_abi are positioned to share the same cacheline... if __rseq_abi is 20-24 bytes long, there is still room to store an extra field after __rseq_abi. It could be named last_cpu_id_start. The test would become that being said, something that I suspect tcmalloc will never give up is the ability to perform the rseq registration as it gives the freedom to decide what is around __rseq_abi. In that regard, it is an interesting use-case... for a peaceful cohabitation with glibc, it would be nice if glibc would allow this... It should be possible to have the option to register rseq and init the glibc rseq variables: __rseq_offset, __rseq_size, __rseq_flags https://www.gnu.org/software/libc/manual/html_node/Restartable-Sequences.html at least, glibc RSEQ_SIG and TCMALLOC_PERCPU_RSEQ_SIGNATURE are equal... |
I have just spotted:
Google must have internally a patched custom kernel where rseq is updating the vcpu field so your kernel_rseq is also 32 bytes long. no problem... In tcmalloc::tcmalloc_internal::Sampler, there is a 7 bytes hole:
so last_cpu_id_start could be stored in that hole... or... do the MSB hack with sample_interval_, rnd_ or bytes_until_sample_ to store initialized_ to get an extra full 8 bytes... Beside, I am not 100% too sure why initialization could not have been performed during the construction instead of having to test initialized_ every time Sampler::RecordAllocationSlow() is called... |
another thought that did come to my mind... it is super cool to have the tcmalloc_slabs pointer lowest half value be updated by the kernel... I am impressed by the creativity behind the setup... it is amazing! but, without being an assembly optimization expert, it seems like there is a penalty from reading a 64bits pointer that is not aligned on a 8 bytes boundary, no? if it was possible to squeeze 4 bytes from Sampler class (I think it is), tcmalloc_slabs could be aligned and not overlapping with rseq_abi... with a union, you could access the pointer lower half value to compare it efficiently with cpu_id_start |
There does not seem to be any.
If it's not overlapping, then the whole idea breaks down. We rely on the high part of tcmalloc_slabs being overwritten on preemption. |
I guess that it is correct for most processors if the access is within a cacheline but this might NOT be true on every processor. check the comments below the first answer here: maybe I do not understand the whole idea correctly but my understanding is that you check if you have been preempted and moved by doing:
or else you know you have not been preempted, you clear the MSB and use the slabs ptr as-is. if you had enough room to store the 4 bytes (you do) current cpu_id_start in last_cpu_id_start, in my mind doing
would be equivalent... Update: |
This is all a big hack to begin with, if somebody wants to work on proper kernel support, I am all for it:
This is also additional instructions, and every instruction here is a huge pile of $$$ for us. Btw, with the special kernel support, we can make the instruction sequence even better than it is now. |
I know this... this is why I am putting so much effort to integrate tcmalloc into my own project. I suppose that you made the analysis about the instructions count by testing 1 bit vs comparing 2 integers. Intuitively, I would think that this is equivalent but it probably isn't.... anyway, I really appreciate your explanations and the context where your design decisions have been made. thx for sharing them BTW, I am reading your proposal to address the issue discussed here. I find your proposal interesting and reasonable... I am surprised to see that Mathieu did not comment it when you made it there... one final comment... I am fascinated to see how the boundaries between userspace and kernel are getting broken down and the possibilities this is opening... I am seeing this phenomenom with rseq... and I am also an occasional io_uring contributor... What Linux is becoming is amazing, imho... |
I did not test actually, but I feel it will be more expensive.
With comparing 2 integers we will have:
So that's +1 instruction, +2 loads. And we also will need to fit that additional data into the same cache line. And, yes, being able to detect preemption rather than just "same cpu" gives some additional interesting benefits.
You are welcome. |
I did some more research about glibc 2.40 usage of rseq:
so the only place where rseq is used is in sched_getcpu(). My code is not calling sched_getcpu() and I did perf record my app for 30 minutes to be absolutely sure that no 3rd party libs was calling this function without my knowledge... there is nothing in my program that is either calling sched_getcpu() or is accessing glibc rseq area through its exported symbols so, while not the ideal situation, there is nothing in my context that is a dealbreaker that makes this issue stopping me to grant exclusive access to tcmalloc of the rseq area... I have no idea which SW is calling sched_getcpu() or accessing rseq through glibc but fortunately for tcmalloc, it appears to be not that much for the moment being... |
|
I still have a note to provide feedback in my spare-time to-do list since last year, but unfortunately I've been busy with customer projects since then. I've used a bit of idle time in my vacation in the last 2 weeks to provide feedback, but it would certainly help me prioritize dedicating more time to these matters if we can put some kind of contractual agreement in place. |
I would like to discuss the possible alternative approaches with you. This starts by having a clear view of the requirement and define metrics that appropriately measure the fitness of the proposed approaches.
All instructions are not equal. Some take more cycles to execute, others can cause stalls, and so on. We need more relevant metrics than "number of instructions" and "number of instruction bytes", which measure actual overhead. What I am unsure about is whether the code we are talking about is inlined or not ?
For something that would support multi-users, I basically see two approaches to this:
I understand based on this discussion that solution (2) would require a few more instructions: the increment, a load, and a conditional branch. How much measurable performance impact would this really have ? |
The current use of rseq by glibc is indeed limited to sched_getcpu(), but as soon as we are done upstreaming the extensible rseq integration and have all the basic pieces in place, you can expect the number of rseq usages to start increasing both in glibc and in other libraries/applications. So disabling rseq glibc integration with the tunable may be OK for specific use-cases in the short-term, but this is not future-proof at all. |
Yes, it's inlined into new/delete fast paths. |
Yes, it's hard choice. Overall anything that adds more instruction/makes the code slower will likely keep us on the current version. cc @ckennelly for this. Also cc @melver for potential synergy with our scheduling extensions. I wonder if we can have a single kernel interface that would support both cached slab pointer and user-space CID assignment. |
I don't see the issue here. The first time user-space is interested in this notification after a preemption is where the first increment from 0 happens. Then all following increments from any userspace library/program increment further away from 0. Then the kernel resets to 0 on return-to-userspace after preemption. This is all composable. Or am I missing something ?
Actually I'm not entirely sure what is fast-path or slow-path in option (2). I suspect that the increment of (thread_pointer() + __rseq_offset) + offsetof(struct rseq_abi, preempt_notif) from user-space only needs to happen when updating the cached value. Which would mean that the actual fast-path would need to load the cached value, then do a cmp on the new preempt_notif field, conditionally branch if it is zero. This may have to be done within a rseq critical section to make it atomic wrt preemption, so we add a store instruction to setup the critical section there. |
But how do they understand they were rescheduled? When they see what value? |
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, terminate the offending process with SIGSEGV. [ This change is expected to break the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. We should work with tcmalloc developers and extent rseq with adequate mechanisms to fulfill their requirements before merging this change. ] Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected]
I'm worried about the proliferation of complex ABIs that end up not working or have other shortcomings, but add lots of other complexity to the kernel. For example, while CIDs in general are nice to have (based on the vCPU idea), it turns out that a single CID assignment scheme is suboptimal. We prototyped a user-space CID assignment scheme with BPF, using a very narrow ABI/protocol that is optimal for that usecase. But coming up with a generic ABI that is generally useful beyond that, and doesn't add unwanted complexity to the kernel is rather difficult. Ultimately different users might want to see more or less detailed scheduling state (was a thread of same process running on CPU, different process, when was the last time a thread from this process was running on this CPU, etc. etc.). It quickly becomes quite complicated. FWIW - https://lore.kernel.org/all/CANpmjNO3m-f-Yg5EqTL3ktL2CqTw7v0EjHGVth7ssWJRnNz5xQ@mail.gmail.com/T/#u In general, having the ability to prototype extensions with BPF may help eventually settle on a fixed ABI if that's still desireable. If we do want a fixed ABI, it might help do the simplest thing that can mirror |
Here is the algorithm I have in mind (please correct me if there are holes in my understanding): Populating a tcmalloc cache, AFAIU this is not a fast-path:
Fast-path accessing the cached value, confirming whether it is usable with the preempt_notif check:
Overall, user-space only increments the preempt_notif field (with single-instruction or rseq critical section), or load it for comparison against 0. The kernel only has to set the preempt_notif field back to 0 when returning to userspace after preemption. |
Consider we have 2 libraries:
|
You're right, I was missing that case. My jet-lagged brain is not 100%. :) This requires a bit more thinking then. Can we instead use two new fields ? One is incremented by user-space and is a free-running counter as a sort-of generation counter (and remember it), and the second field is updated by the kernel with a snapshot of the generation counter when preemption happens. Assuming no overflow, user-space could check if the 2nd field >= its own remembered generation counter value. But this would require more than a simple comparison with zero on the fast path. |
Just trying to outline option (1) a little more. Let's say we have two new rseq flags in enum rseq_flags: RSEQ_FLAG_REGISTER_CLEAR_ON_PREEMPT and RSEQ_FLAG_UNREGISTER_CLEAR_ON_PREEMPT. Those would take a userspace address to a structure. Now the linked list used to chain the items could be either done by allocating kernel memory (which makes the memory footprint hard to track), or chained within userspace (which makes memory allocation easy to track, but we need to be mindful about linked list corruption by userspace, e.g. cycles). I think for your use-case you'd need this registration to be specific to each given thread. The registration/unregistration would take a pointer to e.g.:
The kernel would then clear either 4 or 8 bytes on return-to-userspace after a preemption. Should we limit this to an upper bound ? Else what happens if an application registers 10k of those for a given thread ? If this is all chained from within userspace, rather than extending rseq with new register/unregister flags, we could also add a new field to struct rseq_abi which would be the top level pointer to that linked list. |
I'm curious to hear more about the variety of concurrency IDs requirements you have observed. Note that I've recently submitted a patch series to make rseq mm_cid numa-aware: https://lore.kernel.org/lkml/[email protected]/T/#t . I would like to heard your thoughts about this and what is missing there. |
[...]
I'm wondering what would be missing from the picture in terms of flexibility with the following:
I'm curious to hear about use-cases for which this approach would not work. |
Not sure what this detection looks like. One option is to have a bunch of presets for known topologies, but there should also be an option for the user to override that. Also, within a VM guest it's going to be tricky. So I would keep it as simple as possible, and just give the admin the ability to "correct" whatever the kernel thinks it's dealing with.
This sounds good. I can't say what might be missing, but past experience says that the user should be able to influence this as much as possible so that any shortcomings that are "constant" in the kernel can be corrected without too much hassle. Thanks, |
This should work. But I would just let kernel increment a single counter (preemption count). But as you noted that's more instructions and memory loads on the fast path. Thus my original proposal of instructing kernel to store given value at the given address: For tcmalloc (or a similar memory allocator) it would allow to even reduce number of instructions on the fast path. But I can't say I 100% like the interface I proposed b/c it looks a bit like an over-engineered solution for few obscure cases. Perhaps this is a good aspect to discuss in person on LPC. |
[...]
I'm worried about exposing more boot-time parameters to the end-users than is required. Let's say we initially focus on the NUMA-awareness of mm_cid and leave the EPYC CCX indexing as future work (note: the EPYC bios can be configured to expose one NUMA node per CCX). In the case of a user caring more about compactness (minimize memory use) of the mm_cid than performance within a given machine or vm, I would say that the user should perhaps just CONFIG_NUMA=n or boot the kernel with numa=off. If the user cares about mm_cid compactness for a given process only, then perhaps they should just use cgroup cpusets or sched_setaffinity to affine the process threads to CPUs belonging to a single NUMA node, and then they are as compact as if the NUMA-awareness feature was disabled. Considering these knobs are already in place, what is the story for requiring an additional user-exposed override ? |
This is unrealistic at scale. The "user" in this case has to manage millions of machines.
Not every workload is the same. But different workloads share the same kernel. Perhaps it's not required to design the final solution right away, but design something that allows for incremental improvements (with relative ease) as we learn more. What's worse is adding say 90% of the feature, but to get the last 10% we have to start from scratch. As long as these requirements are considered, the design provisions for them to exist in future, and later iterate, is probably good enough. |
FYI, I've posted an improved version of the NUMA-awareness series. I figured after doing a lot of benchmarks with various topologies that what we're really after is cache locality, so I went for a simpler and more general approach: https://lore.kernel.org/lkml/[email protected]/T/#t Feedback is welcome! |
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected]
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected]
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected]
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected]
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected]
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected]
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected]
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected]
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected]
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected]
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected]
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected]
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected]
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected] --- Changes since v0: - Structure ending with a flexible array cannot be placed within another structure (if not last member). Fix this by declaring the kernel copy place holder as a char array instead.
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected]
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected] --- Changes since v0: - Structure ending with a flexible array cannot be placed within another structure (if not last member). Fix this by declaring the kernel copy place holder as a char array instead.
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected]
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected] --- Changes since v0: - Structure ending with a flexible array cannot be placed within another structure (if not last member). Fix this by declaring the kernel copy place holder as a char array instead.
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected] --- Changes since v0: - Structure ending with a flexible array cannot be placed within another structure (if not last member). Fix this by declaring the kernel copy place holder as a char array instead. Changes since v1: - Include pr_warn ratelimit updates from Peter. - Introduce rseq_set_ro_fields().
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected] --- Changes since v0: - Structure ending with a flexible array cannot be placed within another structure (if not last member). Fix this by declaring the kernel copy place holder as a char array instead. Changes since v1: - Include pr_warn ratelimit updates from Peter. - Introduce rseq_set_ro_fields().
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected] --- Changes since v0: - Structure ending with a flexible array cannot be placed within another structure (if not last member). Fix this by declaring the kernel copy place holder as a char array instead. Changes since v1: - Include pr_warn ratelimit updates from Peter. - Introduce rseq_set_ro_fields().
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/ Link: google/tcmalloc#144 Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Florian Weimer <[email protected]> Cc: Carlos O'Donell <[email protected]> Cc: DJ Delorie <[email protected]> Cc: [email protected]
The rseq uapi requires cooperation between users of the rseq fields to ensure that all libraries and applications using rseq within a process do not interfere with each other. This is especially important for fields which are meant to be read-only from user-space, as documented in uapi/linux/rseq.h: - cpu_id_start, - cpu_id, - node_id, - mm_cid. Storing to those fields from a user-space library prevents any sharing of the rseq ABI with other libraries and applications, as other users are not aware that the content of those fields has been altered by a third-party library. This is unfortunately the current behavior of tcmalloc: it purposefully overlaps part of a cached value with the cpu_id_start upper bits to get notified about preemption, because the kernel clears those upper bits before returning to user-space. This behavior does not conform to the rseq uapi header ABI. This prevents tcmalloc from using rseq when rseq is registered by the GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq registration with a glibc tunable, which is a sad state of affairs. Considering that tcmalloc and the GNU C library are the two first upstream projects using rseq, and that they are already incompatible due to use of this hack, adding kernel-level validation of all read-only fields content is necessary to ensure future users of rseq abide by the rseq ABI requirements. Validate that user-space does not corrupt the read-only fields and conform to the rseq uapi header ABI when the kernel is built with CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only fields in the task_struct, and validating the prior values present in user-space before updating them. If the values do not match, print a warning on the console (printk_ratelimited()). This is a first step to identify misuses of the rseq ABI by printing a warning on the console. After a giving some time to userspace to correct its use of rseq, the plan is to eventually terminate offending processes with SIGSEGV. This change is expected to produce warnings for the upstream tcmalloc implementation, but tcmalloc developers mentioned they were open to adapt their implementation to kernel-level change. Signed-off-by: Mathieu Desnoyers <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: google/tcmalloc#144
From version 2.35 GLIBC incorporated rseq and it does a systemcall to register the thread's abi struct at the start of each thread. It would be nice to add support to TCMalloc to use GLIBC's rseq abi. See https://www.gnu.org/software/libc/manual/html_mono/libc.html#Restartable-Sequences for the documentation on how they implemented it.
For the time being we can enable TCMalloc's rseq when linked with GLIBC 2.35+ by disabling GLIBC's rseq call setting the environment variable: 'GLIBC_TUNABLES=glibc.pthread.rseq=0'.
The text was updated successfully, but these errors were encountered: