Layernorm changes #681

vgokhale · 2024-12-12T21:27:27Z

Add support for specifying either a sweep step or a single value in the benchmark
Add / remove autotune configs
Added a non-blocked implementation for small shapes

brunomazzottiamd · 2024-12-20T11:43:59Z

python/perf-kernels/layernorm.py

-    parser.add_argument('-N', "--N_start", default="1024", type=int)
-    parser.add_argument('-Ns', "--N_step", default="2048", type=int)
-    parser.add_argument('-Ne', "--N_end", default="65536", type=int)
+    parser.add_argument('-N', "--N_start", default="65536", type=int)


[Non-blocking] I think default argparse values do not need to be strings: you can use 65536 instead of "65536", 0 instead of "0", and so on.
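A minimal sketch of the suggestion above, assuming the same flag names as in the diff. The `N_step` default of 0 is only illustrative. argparse applies `type` to a default only when that default is a string, so both spellings yield an `int`; the integer literal is simply the more direct form.

```python
import argparse

# Suggested form: integer defaults are used as-is.
parser = argparse.ArgumentParser()
parser.add_argument("-N", "--N_start", default=65536, type=int)
parser.add_argument("-Ns", "--N_step", default=0, type=int)  # illustrative default
args = parser.parse_args([])

# Current form: a string default is converted through `type`, giving the same result.
legacy = argparse.ArgumentParser()
legacy.add_argument("-N", "--N_start", default="65536", type=int)
legacy_args = legacy.parse_args([])
```

Both parsers produce `N_start` as an integer, so the change is purely a readability cleanup.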

brunomazzottiamd · 2024-12-20T11:47:57Z

python/perf-kernels/layernorm.py

+        x_vals_list.append(args.N_start)
+        x_names = ['N']
+        mn_args = {'M': args.M_start}
+        plot_name = str("layernorm-performance" + "_M" + str(args.M_start) + "_N" + str(args.N_start))


[Non-blocking] I think f-string interpolation makes this piece of code more readable:

plot_name = f"layernorm-performance_M{args.M_start}_N{args.N_start}"

The same suggestion can probably be applied to the other assignments to plot_name as well (lines 193, 187).
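As a quick check that the rewrite is behavior-preserving, the sketch below builds the name both ways. The `Namespace` values are hypothetical stand-ins for the parsed arguments.

```python
from argparse import Namespace

# Hypothetical parsed arguments, standing in for the script's real `args`.
args = Namespace(M_start=4096, N_start=65536)

# Concatenation form from the diff.
concat_name = str("layernorm-performance" + "_M" + str(args.M_start) + "_N" + str(args.N_start))

# f-string form from the suggestion.
fstring_name = f"layernorm-performance_M{args.M_start}_N{args.N_start}"
```

The two strings are identical, so swapping in the f-string should not change any output file names.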

brunomazzottiamd · 2024-12-20T11:50:52Z

python/perf-kernels/layernorm.py

+    sweep_m = args.M_step != 0
+    sweep_n = args.N_step != 0
+    x_vals_list = []
+    if (sweep_m):


[Non-blocking] Do we need the parentheses (...) in Python if statements? Is it a coding pattern for Booleans? Naively, I would just write if sweep_m:.

The same suggestion can be applied to line 190.

brunomazzottiamd · 2024-12-20T11:55:34Z

python/perf-kernels/layernorm.py

    #program id
    row = tl.program_id(0)
+    tl.assume(row > 0)


[Question] What does tl.assume do? Is it a compile-time or run-time assertion?

[Question] The kernel launch grid is grid = lambda meta: (n_rows, ). What's the program ID range? Is it [0, n_rows) (open end) or [1, n_rows] (closed end)? If it is the former, I believe that we should check for row >= 0.
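To make the range question concrete, here is a plain-Python sketch that does not use Triton. It assumes the usual convention that a 1D grid of size n_rows yields program IDs 0 through n_rows - 1; the helper `program_ids` is hypothetical and only models that convention.

```python
def program_ids(n_rows):
    """Model of the IDs seen along axis 0 for a 1D grid of size n_rows (assumed 0-based)."""
    return range(n_rows)

n_rows = 4
ids = list(program_ids(n_rows))

# IDs for which each candidate assumption would be false.
violates_strict = [pid for pid in ids if not pid > 0]
violates_nonneg = [pid for pid in ids if not pid >= 0]
```

Under that assumption, `row > 0` is false for the first program while `row >= 0` holds for every program, which is the concern raised above.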

xiaohuguo2023 · 2024-12-20T14:59:20Z

python/perf-kernels/layernorm.py

+        triton.Config({'waves_per_eu': 2}, num_warps=2),
+        triton.Config({'waves_per_eu': 1}, num_warps=4),
+        triton.Config({'waves_per_eu': 2}, num_warps=4),
+        triton.Config({'waves_per_eu': 2}, num_warps=8),


For rmsnorm, waves_per_eu could be 4. Are we definitely sure whether 4 should be in the list?

xiaohuguo2023 · 2024-12-20T15:11:44Z

python/perf-kernels/layernorm.py

+    b_block = tl.load(b_ptr + col_offs, mask=mask, other=0.0)
+    y_block = (x_block - mean) * rstd
+    y_block = y_block * w_block + b_block
+    tl.store(y_ptr_start + col_offs, y_block, mask=mask)


Do we need to explicitly convert y_block back to y_ptr.dtype.type.element_ty?

xiaohuguo2023 · 2024-12-20T15:12:33Z

python/perf-kernels/layernorm.py

@@ -112,17 +89,45 @@ def layernorm_kernel(x_ptr, y_ptr, w_ptr, b_ptr, x_row_stride, y_row_stride, n_r
    tl.store(y_ptr_start + col_offsets, y_block, mask=mask)


Do we need to explicitly convert y_block back to y_ptr.dtype.type.element_ty?

xiaohuguo2023 · 2024-12-20T15:16:13Z

python/perf-kernels/layernorm.py

+    var = tl.sum(_x_block * _x_block, axis=0) / n_cols
+    rstd = tl.rsqrt(var + eps)
+
+    w_block = tl.load(w_ptr + col_offs, mask=mask, other=0.0)


Do we need to explicitly convert w_block and b_block to tl.float32?
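For reference, the per-row arithmetic discussed in these comments can be sketched in plain Python. This is not the kernel: the function name `layernorm_row` and the default `eps` are assumptions, and Python floats stand in for the float32 accumulation the question is about.

```python
import math

def layernorm_row(x, w, b, eps=1e-5):
    """Reference layer norm over one row, mirroring the mean / var / rstd steps in the diff."""
    n_cols = len(x)
    mean = sum(x) / n_cols
    centered = [v - mean for v in x]                  # plays the role of _x_block
    var = sum(c * c for c in centered) / n_cols
    rstd = 1.0 / math.sqrt(var + eps)
    # Normalize, then apply the elementwise weight and bias.
    return [c * rstd * wi + bi for c, wi, bi in zip(centered, w, b)]

y = layernorm_row([1.0, 2.0, 3.0, 4.0], w=[1.0] * 4, b=[0.0] * 4)
```

A reference like this can be used to compare kernel output across input dtypes when deciding whether explicit casts are needed.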

vgokhale added 3 commits December 12, 2024 20:52

Initial commit

4d84505

First working commit

618493e

Uncomment autotune configs

501b5d8

vgokhale requested review from rahulbatra85 and xiaohuguo2023 December 12, 2024 21:27

vgokhale self-assigned this Dec 12, 2024

vgokhale added 3 commits December 12, 2024 21:28

Add tl.assume to non blocked impl

ed152a2

Bugfix

faa9efe

Code formatting

5e2a6c4

vgokhale requested review from scxiao and brunomazzottiamd and removed request for scxiao December 16, 2024 18:10

brunomazzottiamd approved these changes Dec 20, 2024

xiaohuguo2023 reviewed Dec 20, 2024
