Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Random Crashes when Using Pool with Freeable Descriptor Sets #2388

Closed

Conversation

WDog367
Copy link
Contributor

@WDog367 WDog367 commented Nov 4, 2024

Some small changes to fix crashes and random descriptor set issues I was running into when using a pool with FREE_DESCRIPTOR_SET enabled

First issue was caused by _nextMetalArgumentBufferOffset in the pool becoming out-of-date if the allocated set in the highest index is freed and re-used. Argument buffers of any size would be allowed to use that slot, but the offset wasn't adjusted, resulting in some argument buffer ranges overlapping

Eventually crashing with the error: [mvk-error] VK_ERROR_OUT_OF_DEVICE_MEMORY: Lost VkDevice after MTLCommandBuffer "vkQueueSubmit MTLCommandBuffer on Queue 0-0 (0x406e75600)" execution failed (code 3): Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)

Second issue is caused by a bug in enumerateEnabledBits() where it is enumerating bits that were disabled.
Specifically, in the case where startIndex is higher than the first enabled bit, but the next enabled bit is in a different section than startIndex, it looks like it just return the first index in the next section, instead of the next enabled bit.

I just added a quick fix so that the enumerate func checks whether the returned bit is enabled before calling the functor. More changes to the MVKBitArray functions might be called for, but I hope this is helpful 😄

Changes made so that `_nextMetalArgumentBufferOffset` is updated after
`mvkDS->allocate()` is called and called each time the highest index
set is written

Fixes 2 cases where _nextMetalArgumentBufferOffset would become out of date:
  - if the offset was set, and then `mvkDS->allocate()` fails, the reserved
    space wouldn't actually be in use
  - if the descriptor set in the highest index is freed, then re-used (in
    a `CREATE_FREE_DESCRIPTOR_SET_BIT` pool), the new set may have a larger arg
    buffer size than the original. It would still be allowed to use that slot
    in this case (since there is no argbuffer offset in `nextDSIdx` to limit it),
    but would not update `_nextMetalArgumentBufferOffset` (since that only
    happens when adding a new set at the end of the list). This results in the
    argument buffer region overlapping with the next set to use
    `_nextMetalArgumentBufferOffset` as its base offset
Quick fix for the issue, just checking if the returned bit is
enabled, and skipping the iteration of the loop if it is not

Avoids issue where sometimes `getIndexOfFirstEnabledBit()` will
return bits that are not set -- a proper fix in that function
should still be implemented
@billhollings
Copy link
Contributor

Replaced by PR #2392.

@billhollings
Copy link
Contributor

@WDog367 Thanks for submitting this, and for pointing out the issues.

Since this PR is mostly a workaround, I've replaced it with #2392, which should fix the same issues without workarounds.

Please retest with PR #2392, and confirm that it solves the same issues this PR does. Thanks.

@WDog367
Copy link
Contributor Author

WDog367 commented Nov 7, 2024

Yes PR #2392 fixes the issue

Thank you!

@WDog367 WDog367 closed this Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants