Are `wgpu_hal::CommandEncoder::set_bind_group`'s guarantees too weak / too strong? #6233

jimblandy · 2024-09-06T23:41:42Z

With #6209, the documentation for wgpu_hal::CommandEncoder::set_bind_group says:

If this is not the first call to set_bind_group within the current
render or compute pass:

If layout contains n bind group layouts, then any previously set
bind groups at indices n or higher are cleared.

If the first m bind group layouts of layout are equal to those of
the previously passed layout, but no more, then any previously set
bind groups at indices m or higher are cleared.

It follows from the above that passing the same layout as before doesn't
clear any bind groups.

@teoxoy commented:

Do all native APIs we target clear all n/m or higher bind groups?
I thought only Vulkan required users to rebind the n/m or higher bind groups.

... If not all backends "clear", then wgpu-hal shouldn't promise that.
I think we should instead require users of the API to rebind the bind groups but not promise that they will be cleared.

(Request for patience: This whole question is, for some reason, hard for me to get my head around (maybe because it's Friday? dunno), so I'm going to spell a lot of stuff out and go back to first principles - but this just is for my benefit in keeping my head straight; it's not my intention to lecture anyone or suggest that I'm telling them something they don't know.)

These docs are simultaneously placing obligations on both wgpu_hal users and wgpu_hal backends. And each obligation placed on one party is a promise that the other party can depend on.

First, the caller's obligations:

wgpu_hal users have long had an obligation to re-set bind groups after certain layout switches, before they can use a pipeline with the new layout. Specifically, the new text is supposed to capture Vukan's rules about Pipeline Layout Compatibility and "disturbed" descriptor sets. Vulkan says that a disturbed descriptor set is not usable by subsequent pipelines:

When binding a pipeline, the pipeline can correctly access any previously bound descriptor set N if it was bound with compatible pipeline layout for set N, and it was not disturbed.

As it turns out, our Metal backend has a very similar requirement, but for different reasons. It uses direct binding in which bind group indices and binding numbers within a bind group are flattened out into sequentially assigned MSL attribute index numbers, meaning that changing to a layout in which a lower-indexed bind group contains a different number of resources will skew the index numbers of higher-indexed bind groups, so that any previously-set bind groups' resources are now at the wrong indices.

My doc text talking about "clearing" bind groups was meant to spell out existing obligations on the wgpu_hal user to re-set bind groups, not to describe the backend's obligation to free a resource.

Second, the backend's obligations:

I believe my text does not effectively place any obligation on backends. For it to do so, a well-behaved user would need to be able to observe that a bind group had not actually been cleared. But the only thing that can observe the presence or absence of a bind group is a draw call. A draw call can certainly tell if the bind groups it does use are present, but it is blind to bind groups at indices it does not use.

The conversation in gpuweb/gpuweb#3787, "It should be possible to unbind a bind group / vertex buffer" would seem to suggest that a well-behaved user could observe some sort of resource exhaustion if slots are not cleared. But for wgpu_hal, at least, this is not the case: what my set_bind_group docs do, in effect, is give backends permission to overwrite the locations that are in limited supply. Specifically, at the Metal backend level, a correct wgpu_hal user will never lead wgpu_hal::metal to try to use direct resource attribute indices above 31, which is the actual limit we're contending with.

The key difference is that wgpu_hal requires the set_bind_group user to supply a pipeline layout, which gives the metal backend enough information to know exactly which direct resource ids the bind groups' contents will land on.

If wgpu_hal::metal switches to using argument buffers, I think the same logic still applies: wgpu_hal::metal will always know which Metal ``[[buffer]]` index to place argument buffers at, without ever needing to clear anything.

I think this addresses Teo's concern that we "not promise that [bind groups] will be cleared."

I filed this issue to cover the concerns Teo, raised above, as well as a concern he raised in Mozilla chat:

The reason I was saying that we don't have to change the HAL contract is because the limit only applies to implementations that use argument buffers, which we don't use yet.

Indeed, we don't need to change the obligations wgpu_hal places on backends. But we do need to state the obligations it places on its users.

The text was updated successfully, but these errors were encountered:

jimblandy · 2024-09-06T23:50:15Z

"Uh, great, so, what's the issue?"

Do the existing set_bind_group docs need further clarification?
Is the documented behavior sufficient to implement WebGPU's promised behavior on Metal, both with and without argument buffers?

teoxoy · 2024-09-09T10:58:39Z

As it turns out, our Metal backend has a very similar requirement, but for different reasons. It uses direct binding in which bind group indices and binding numbers within a bind group are flattened out into sequentially assigned MSL attribute index numbers, meaning that changing to a layout in which a lower-indexed bind group contains a different number of resources will skew the index numbers of higher-indexed bind groups, so that any previously-set bind groups' resources are now at the wrong indices.

I would say that having the wrong things bound to the wrong indices is surprising given that the docs say the bind groups have been cleared (even if this is a case where the user of the API did the wrong thing by forgetting to re-set the bind groups).

I find that hal's docs would ideally document requirements of the native APIs first and optionally inferred behavior but in this case we documented the inferred behavior and not the underlying requirements.

My doc text talking about "clearing" bind groups was meant to spell out existing obligations on the wgpu_hal user to re-set bind groups, not to describe the backend's obligation to free a resource.

I think rephrasing the docs to document the "re-set bind groups" requirement explicitly would avoid confusion.

ErichDonGubler assigned jimblandy Sep 16, 2024

ErichDonGubler added the area: documentation Documentation for crate items, public or private label Sep 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Are `wgpu_hal::CommandEncoder::set_bind_group`'s guarantees too weak / too strong? #6233

Are `wgpu_hal::CommandEncoder::set_bind_group`'s guarantees too weak / too strong? #6233

jimblandy commented Sep 6, 2024 •

edited

Loading

jimblandy commented Sep 6, 2024

teoxoy commented Sep 9, 2024

Are wgpu_hal::CommandEncoder::set_bind_group's guarantees too weak / too strong? #6233

Are wgpu_hal::CommandEncoder::set_bind_group's guarantees too weak / too strong? #6233

Comments

jimblandy commented Sep 6, 2024 • edited Loading

jimblandy commented Sep 6, 2024

teoxoy commented Sep 9, 2024

Are `wgpu_hal::CommandEncoder::set_bind_group`'s guarantees too weak / too strong? #6233

Are `wgpu_hal::CommandEncoder::set_bind_group`'s guarantees too weak / too strong? #6233

jimblandy commented Sep 6, 2024 •

edited

Loading