Add Layernorm kernel #641

Open · rahulbatra85 wants to merge 1 commit into base: main_perf

Conversation

rahulbatra85

No description provided.

@brunomazzottiamd (Collaborator) left a comment


@rahulbatra85, I can't see anything wrong with your PR. I just have a few questions and minor code cleanup suggestions. Feel free to ignore them if you judge it appropriate.

Please add a short line about this new kernel to the python/perf-kernels/README.md file.

@@ -128,8 +128,10 @@ jobs:
pytest -vvv ./python/perf-kernels/flash-attention.py
pytest -vvvv ./python/perf-kernels/softmax.py
pytest -vvv ./python/perf-kernels/rmsnorm.py
pytest -vvv ./python/perf-kernels/layernorm.py

Collaborator

What do you think about running all tests with just one pytest invocation? According to https://docs.pytest.org/en/stable/how-to/usage.html, it's possible to do something like pytest -vvvv ./python/perf-kernels. That way we'll be editing .github/workflows/amd_perf_kernel_Integration_tests.yml less often and new tests will run by default. Do you see any drawback?
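
Concretely, the change could look roughly like this (hunk header and surrounding workflow context omitted, just for illustration):

-pytest -vvv ./python/perf-kernels/flash-attention.py
-pytest -vvvv ./python/perf-kernels/softmax.py
-pytest -vvv ./python/perf-kernels/rmsnorm.py
-pytest -vvv ./python/perf-kernels/layernorm.py
+pytest -vvvv ./python/perf-kernels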

Maybe it's worth asking @micmelesse's opinion on this.

Author

Yeah, that's a question for Michael.

Collaborator

Let's wait for Michael's opinion!

@micmelesse (Collaborator) · Sep 18, 2024

That is fine. I think some of the tests are broken, but it may be worth it to see the state of things.

y = x_hat * w + b
# Write output
tl.store(Y + cols, y, mask=mask)

Collaborator

Just an idea: we have three for loops that do masked loads. Do you foresee any benefit in peeling the last iteration of each loop so that all iterations except the last do unmasked loads? I think Shucai and Xiaohu got some performance improvements doing this with GEMMs. I'm not sure whether the idea would be beneficial for layer norm; a rough sketch of what I mean is below.
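
To make the suggestion concrete, here is a minimal, hypothetical sketch of the peeling pattern for a single row reduction (the kernel name, the Mean output, and the exact block handling are illustrative, not taken from this PR):

import triton
import triton.language as tl

@triton.jit
def _row_mean_peeled_kernel(X, Mean, stride, N, BLOCK_SIZE: tl.constexpr):
    # Hypothetical reduction: mean of one row, with the ragged tail peeled off
    # so that every full block is loaded without a mask.
    row = tl.program_id(0)
    X += row * stride
    acc = tl.zeros([BLOCK_SIZE], dtype=tl.float32)
    num_full = (N // BLOCK_SIZE) * BLOCK_SIZE  # columns covered by full blocks
    for off in range(0, num_full, BLOCK_SIZE):
        cols = off + tl.arange(0, BLOCK_SIZE)
        acc += tl.load(X + cols).to(tl.float32)  # unmasked load
    # Peeled last iteration: only this load needs a mask.
    cols = num_full + tl.arange(0, BLOCK_SIZE)
    acc += tl.load(X + cols, mask=cols < N, other=0.).to(tl.float32)
    tl.store(Mean + row, tl.sum(acc, axis=0) / N)

The same pattern would apply to the other two loops; whether the extra code is worth it probably depends on how large the loop trip count is compared to the single masked tail iteration.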

Author

OK, yeah, I didn't think of that. I'll try this out.

Collaborator

Please let me know if this helped at all.

@brunomazzottiamd: (comment marked as resolved)
