
Don't track the size of each allocated block any more #19767

Draft
copybara-service[bot] wants to merge 1 commit into main
Conversation

copybara-service[bot]

This saves us 8 bytes per block on 64-bit builds: we no longer need to traverse the linked list of blocks to check allocated space, which means we also no longer need atomics in the linked list or even in its head. This is especially beneficial because the previous implementation contained a race where we could dereference uninitialized memory; since the stores to the `next` pointers did not use release semantics and `SpaceAllocated` reads them with relaxed order, there is no guarantee that `size` has actually been initialized - but worse, *there is also no guarantee that `next` has been!* Simplified:
```
AddBlock:
1 ptr = malloc();
2 ptr->size = 123;
3 ptr->next = ai->blocks;
4 ai->blocks = ptr (release order);
```
```
SpaceAllocated:
5 block = ai->blocks (relaxed order)
6 block->size (acquire, but probably by accident)
7 block = block->next (relaxed order)
```
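For concreteness, a minimal C++ sketch of the same pattern (illustrative types only, not the actual arena internals; the numbered comments match the steps above):

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>

// Illustrative block header: one node per allocated block, carrying the
// per-block size that this change removes.
struct Block {
  std::size_t size;
  Block* next;
};

struct ArenaImpl {
  std::atomic<Block*> blocks{nullptr};
};

// Writer: the release store (4) orders the plain writes (2) and (3) before
// it, but only for readers that load `blocks` with acquire.
void AddBlock(ArenaImpl* ai, std::size_t size) {
  Block* ptr = static_cast<Block*>(std::malloc(sizeof(Block)));  // (1)
  ptr->size = size;                                              // (2)
  ptr->next = ai->blocks.load(std::memory_order_relaxed);        // (3)
  ai->blocks.store(ptr, std::memory_order_release);              // (4)
}

// Reader: the relaxed load (5) does not synchronize with the release store
// in AddBlock, so the reads of size (6) and next (7) are not guaranteed to
// see initialized values.
std::size_t SpaceAllocated(const ArenaImpl* ai) {
  std::size_t total = 0;
  Block* block = ai->blocks.load(std::memory_order_relaxed);     // (5)
  while (block != nullptr) {
    total += block->size;                                        // (6)
    block = block->next;                                         // (7)
  }
  return total;
}
```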

So I think a second thread calling `SpaceAllocated` could observe the order 1, 4, 5, 6, 7, 2, 3 and read uninitialized memory - there is no data-dependency relationship or happens-before edge that this order violates, so it would be valid for a compiler and hardware to produce.

In reality, operation 4 will produce an `stlr` on ARM (forcing an order of 1, 2, 3 before 4), and `block->next` has a data dependency on `ai->blocks`, which would force an ordering in the hardware between 5->6 and 5->7 even for regular `ldr` instructions.

The fix would be for `SpaceAllocated` to read `ai->blocks` with acquire order, but with this CL that's moot. Please check my work, as I'm less familiar with the C/C++ memory model.
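A minimal sketch of that alternative fix, using the illustrative types from the sketch above: the acquire load pairs with the release store in `AddBlock`, so the writes to `size` and `next` made before a block was published are visible once the head is observed.

```cpp
// Alternative fix (made moot by this CL): acquire-load the head so it
// synchronizes with the release store in AddBlock.
std::size_t SpaceAllocated(const ArenaImpl* ai) {
  std::size_t total = 0;
  Block* block = ai->blocks.load(std::memory_order_acquire);  // was relaxed
  while (block != nullptr) {
    total += block->size;  // initialized: happens-before via acquire/release
    block = block->next;
  }
  return total;
}
```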

Also delete arena `Contains`; it's private, and its only user is its own test.

PiperOrigin-RevId: 708180547