The blur shader has a data race. This race is specifically problematic on WARP, but can reproduce on any GPU with small wave sizes. The access pattern for the groupshared memory is:
1. Store 2xf16 in every slot of the arrays.
2. Sync
3. Read 2xf16
4. Compute horizontal blur
5. Store 1xf32 in every slot of the arrays
6. Sync
7. Read 1xf32
This pattern correctly inserts barriers to prevent hazards from write -> read (readers must wait until writes complete), but is missing barriers to prevent hazards from read -> write (writers must wait until all readers complete before overwriting data).
Since WARP executes 4-channel waves sequentially, it deterministically hits a problematic case where some readers expect to load 2xf16 data but instead read 1xf32 data. Unpacking that f32 as two f16s produces NaNs and other garbage. In theory, any GPU with a wave size smaller than 64 can hit this, since the blur uses 8x8 thread groups (64 threads) and a group therefore spans more than one wave.
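To make the hazard concrete, here is a small CPU-side simulation in Python. It is only a sketch under stated assumptions, not the actual shader: the 3-tap horizontal kernel, the clamped addressing, the stored values, and the names `run_group` and `extra_barrier` are all hypothetical. It models an 8x8 group split into 4-lane waves that run one after another between barriers, and counts reads that expected packed 2xf16 data but found bytes already overwritten with an f32.

```python
import struct

GROUP_W, GROUP_H = 8, 8            # 8x8 thread group, as described in the issue
WAVE_SIZE = 4                      # WARP-like 4-lane waves
NUM_THREADS = GROUP_W * GROUP_H


def run_group(extra_barrier):
    """Simulate one thread group; return the number of corrupted 2xf16 reads."""
    shared = [b"\x00" * 4] * NUM_THREADS   # one 32-bit groupshared slot per thread
    bad_reads = 0

    def wave(first):
        nonlocal bad_reads
        lanes = range(first, first + WAVE_SIZE)

        # Store 2xf16 in this wave's slots (illustrative values).
        for t in lanes:
            shared[t] = struct.pack("<2e", float(t), t + 0.5)
        yield  # Sync (write -> read barrier, present in the original pattern)

        # Read 2xf16 from the left/self/right slots and blur horizontally.
        sums = {}
        for t in lanes:
            row, x = divmod(t, GROUP_W)
            total = 0.0
            for dx in (-1, 0, 1):
                nx = min(max(x + dx, 0), GROUP_W - 1)   # clamp within the row
                slot = row * GROUP_W + nx
                lo, hi = struct.unpack("<2e", shared[slot])
                if (lo, hi) != (float(slot), slot + 0.5):
                    bad_reads += 1   # slot already holds another wave's f32
                total += lo
            sums[t] = total

        if extra_barrier:
            yield  # the missing read -> write barrier

        # Store 1xf32 over the same slots.
        for t in lanes:
            shared[t] = struct.pack("<f", sums[t])
        yield  # Sync

        # Read 1xf32 (the result is unused in this sketch).
        for t in lanes:
            struct.unpack("<f", shared[t])

    # Sequential scheduling: each wave runs to its next barrier before the
    # following wave starts, then all waves are released together.
    waves = [wave(first) for first in range(0, NUM_THREADS, WAVE_SIZE)]
    while waves:
        waves = [w for w in waves if next(w, "done") != "done"]
    return bad_reads


if __name__ == "__main__":
    print("corrupted reads without the extra barrier:", run_group(False))
    print("corrupted reads with the extra barrier:   ", run_group(True))
```

In this model the fix is a second barrier between the 2xf16 reads and the f32 stores. In the real shader that would presumably be an additional group sync at the same point (for example `GroupMemoryBarrierWithGroupSync()`, assuming the shader is written in HLSL).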