
Don't track the size of each allocated block any more #19767

Draft
copybara-service[bot] wants to merge 1 commit into main
Conversation

copybara-service[bot]

This saves us 8 bytes per block on 64-bit builds: we no longer need to traverse the linked list of blocks to check allocated space, which means we also no longer need atomics in the linked list or even in its head. This is especially beneficial because the previous implementation contained a race where we could dereference uninitialized memory; since the stores to the `next` pointers did not use release semantics and `SpaceAllocated` reads them with relaxed order, there is no guarantee that `size` has actually been initialized - but worse, *there is also no guarantee that `next` has been!* Simplified:
```
AddBlock:
1 ptr = malloc();
2 ptr->size = 123;
3 ptr->next = ai->blocks;
4 ai->blocks = ptr (release order);
```
```
SpaceAllocated:
5 block = ai->blocks (relaxed order)
6 block->size (acquire, but probably by accident)
7 block = block->next (relaxed order)
```
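For concreteness, a minimal C++ sketch of the same pattern (illustrative types only, not the actual arena internals; the numbered comments match the steps above):

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>

// Illustrative block header: one node per allocated block, carrying the
// per-block size that this change removes.
struct Block {
  std::size_t size;
  Block* next;
};

struct ArenaImpl {
  std::atomic<Block*> blocks{nullptr};
};

// Writer: the release store (4) orders the plain writes (2) and (3) before
// it, but only for readers that load `blocks` with acquire.
void AddBlock(ArenaImpl* ai, std::size_t size) {
  Block* ptr = static_cast<Block*>(std::malloc(sizeof(Block)));  // (1)
  ptr->size = size;                                              // (2)
  ptr->next = ai->blocks.load(std::memory_order_relaxed);        // (3)
  ai->blocks.store(ptr, std::memory_order_release);              // (4)
}

// Reader: the relaxed load (5) does not synchronize with the release store
// in AddBlock, so the reads of size (6) and next (7) are not guaranteed to
// see initialized values.
std::size_t SpaceAllocated(const ArenaImpl* ai) {
  std::size_t total = 0;
  Block* block = ai->blocks.load(std::memory_order_relaxed);     // (5)
  while (block != nullptr) {
    total += block->size;                                        // (6)
    block = block->next;                                         // (7)
  }
  return total;
}
```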

So I think a second thread calling `SpaceAllocated` could observe the order 1, 4, 5, 6, 7, 2, 3 and read uninitialized memory - there is no data-dependency relationship or happens-before edge that this order violates, so it would be valid for a compiler and hardware to produce.

In reality, operation 4 will produce an `stlr` on ARM (forcing an order of 1, 2, 3 before 4), and `block->next` has a data dependency on `ai->blocks`, which would force an ordering in the hardware between 5->6 and 5->7 even for regular `ldr` instructions.

The fix would be for `SpaceAllocated` to read `ai->blocks` with acquire order, but with this CL that's moot. Please check my work, as I'm less familiar with the C/C++ memory model.
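A minimal sketch of that alternative fix, using the illustrative types from the sketch above: the acquire load pairs with the release store in `AddBlock`, so the writes to `size` and `next` made before a block was published are visible once the head is observed.

```cpp
// Alternative fix (made moot by this CL): acquire-load the head so it
// synchronizes with the release store in AddBlock.
std::size_t SpaceAllocated(const ArenaImpl* ai) {
  std::size_t total = 0;
  Block* block = ai->blocks.load(std::memory_order_acquire);  // was relaxed
  while (block != nullptr) {
    total += block->size;  // initialized: happens-before via acquire/release
    block = block->next;
  }
  return total;
}
```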

Also delete arena `Contains`; it's private, and its only user is its own test.

PiperOrigin-RevId: 708180547