support bypassing data layout conversion for atomic operator #556

xiaohuguo2023 · 2024-04-09T14:18:29Z

This update aims to reduce LDS usage and thereby remove the large-tile-size limitation for stream-k when using atomic_add.
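A back-of-the-envelope sketch of why the conversion limits tile size. A `triton_gpu.convert_layout` between distributed layouts such as `#mfma` and `#blocked` goes through shared memory (LDS). The sketch makes three assumptions that do not come from this PR: the whole tile is staged at once, there is no padding, and a workgroup can use 64 KiB of LDS. Under those assumptions, the f16 256x256 tile in the IR below needs 128 KiB. The real lowering may split the conversion into smaller pieces and pad rows, so treat this as an illustration rather than the exact allocation.

```python
# Rough LDS footprint of a layout conversion that stages a whole tile through
# LDS at once. Assumptions (not from the PR): no padding, no splitting into
# smaller chunks, and a 64 KiB LDS budget per workgroup.

LDS_BUDGET_BYTES = 64 * 1024  # assumed LDS available to one workgroup


def tile_bytes(rows: int, cols: int, elem_bytes: int) -> int:
    """Bytes needed to hold a rows x cols tile of elem_bytes-sized elements."""
    return rows * cols * elem_bytes


for rows, cols in [(128, 128), (256, 256)]:
    nbytes = tile_bytes(rows, cols, 2)  # f16 = 2 bytes, as after arith.truncf
    fits = nbytes <= LDS_BUDGET_BYTES
    print(f"{rows}x{cols} f16 tile: {nbytes // 1024} KiB, fits in LDS: {fits}")
# prints:
# 128x128 f16 tile: 32 KiB, fits in LDS: True
# 256x256 f16 tile: 128 KiB, fits in LDS: False
```

Skipping the conversion entirely, as this PR does for the atomic, avoids this scratch allocation rather than merely shrinking it.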

Add support for both StoreOp and AtomicRMWOp.

This removes the layout conversion %84 (from #mfma to #blocked) that fed the atomic, changing the IR from:

```mlir
%83 = arith.truncf %71 : tensor<256x256xf32, #mfma> to tensor<256x256xf16, #mfma>
%84 = triton_gpu.convert_layout %83 : (tensor<256x256xf16, #mfma>) -> tensor<256x256xf16, #blocked>
%85 = "tt.atomic_rmw"(%82, %84, %cst_0) <{atomic_rmw_op = 5 : i32, scope = 1 : i32, sem = 4 : i32}> : (tensor<256x256x!tt.ptr<f16, 1>, #blocked>, tensor<256x256xf16, #blocked>, tensor<256x256xi1, #blocked>) -> tensor<256x256xf16, #blocked>
```

to:

```mlir
%87 = arith.truncf %75 : tensor<256x256xf32, #mfma> to tensor<256x256xf16, #mfma>
%88 = "tt.atomic_rmw"(%86, %87, %cst) <{atomic_rmw_op = 5 : i32, scope = 1 : i32, sem = 4 : i32}> : (tensor<256x256x!tt.ptr<f16, 1>, #mfma>, tensor<256x256xf16, #mfma>, tensor<256x256xi1, #mfma>) -> tensor<256x256xf16, #mfma>
```
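The bypass can be pictured with a toy Python model. Everything in it (`Value`, `Op`, `bypass_convert_layout`) is hypothetical and is neither Triton's actual pass nor its MLIR API. When the value operand of a `store` or `atomic_rmw` is produced by a `convert_layout`, the sketch feeds the pre-conversion value in directly, moves the pointer and mask into the source layout, and deletes the conversion once nothing else uses it. This mirrors the before/after IR above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Value:
    name: str    # SSA name, e.g. "%83"
    layout: str  # encoding, e.g. "#mfma" or "#blocked"


@dataclass
class Op:
    kind: str                       # "truncf", "convert_layout", "atomic_rmw", ...
    operands: List[Value]
    result: Optional[Value] = None


MEMORY_OPS = {"store", "atomic_rmw"}  # ops whose operands are (ptr, value[, mask])


def bypass_convert_layout(ops: List[Op]) -> List[Op]:
    producer = {op.result.name: op for op in ops if op.result is not None}
    for op in ops:
        if op.kind not in MEMORY_OPS:
            continue
        ptr, val, *rest = op.operands
        cvt = producer.get(val.name)
        if cvt is None or cvt.kind != "convert_layout":
            continue
        src = cvt.operands[0]  # value before the layout conversion
        # Toy simplification: just relabel ptr/mask with the source layout.
        # A real pass must actually produce those tensors in that layout.
        op.operands = [Value(ptr.name, src.layout), src] + [
            Value(m.name, src.layout) for m in rest
        ]
        if op.result is not None:
            op.result = Value(op.result.name, src.layout)
    # Drop conversions that no longer have any users.
    used = {v.name for op in ops for v in op.operands}
    return [
        op for op in ops
        if not (op.kind == "convert_layout" and op.result.name not in used)
    ]


# Usage on a model of the "before" IR above.
ops = [
    Op("truncf", [Value("%71", "#mfma")], Value("%83", "#mfma")),
    Op("convert_layout", [Value("%83", "#mfma")], Value("%84", "#blocked")),
    Op("atomic_rmw",
       [Value("%82", "#blocked"), Value("%84", "#blocked"), Value("%cst_0", "#blocked")],
       Value("%85", "#blocked")),
]
out = bypass_convert_layout(ops)
for op in out:
    print(op.kind, [(v.name, v.layout) for v in op.operands])
# prints:
# truncf [('%71', '#mfma')]
# atomic_rmw [('%82', '#mfma'), ('%83', '#mfma'), ('%cst_0', '#mfma')]
```

The toy model only relabels the pointer and mask. In the rewritten IR, the pointer %86 and mask %cst really are #mfma tensors, which is what a real pass must arrange.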

Commit 037e6e5: support bypassing data layout conversion for atomic operator

xiaohuguo2023 requested review from zhanglx13, jayfurmanek and oplavsic April 9, 2024 14:18
