
Add more unit tests to FA fwd kernels. #609

Open · xinyazhang wants to merge 5 commits into base: main_perf

Conversation

xinyazhang

Note: this does not test the backward kernel itself; it uses the kernel in ref_bwd*.py as the reference.

Run the unit tests with `pytest test_backward.py`.

To run a known set of parameters, edit the main2 function and run `python test_backward.py` (`pytest -k` also works, but it is much slower and requires `-s` to enable standard output).
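For illustration, a minimal sketch of that pattern, assuming main2 simply forwards one hand-picked parameter set to the pytest-parametrized test function; the test body and parameter names below are placeholders, not the PR's actual code:

```python
import pytest
import torch

@pytest.mark.parametrize('d_head', [16, 64])
@pytest.mark.parametrize('seqlen', [128, 1024, 32768])
def test_op_bwd(seqlen, d_head):
    # stand-ins for "kernel under test" vs. "reference implementation"
    q = torch.randn(seqlen, d_head)
    out = q * 2.0
    ref = q * 2.0
    torch.testing.assert_close(out, ref)

def main2():
    # run one known parameter set directly, bypassing pytest collection
    test_op_bwd(seqlen=32768, d_head=16)

if __name__ == '__main__':
    main2()
```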

The whole test suite takes around 12 hours to complete (after disabling auto-tuning). The main bottleneck is Triton kernel compilation; with more `tl.constexpr` parameters it may take even longer.
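To make the compile-time point concrete, a toy example (not the PR's kernel): each distinct value bound to a `tl.constexpr` argument specializes the kernel and triggers a separate compilation, so sweeping many constexpr combinations in tests multiplies compile time.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def _scale(x_ptr, y_ptr, n, SCALE: tl.constexpr, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    tl.store(y_ptr + offs, tl.load(x_ptr + offs, mask=mask) * SCALE, mask=mask)

x = torch.randn(1000, device='cuda')  # 'cuda' maps to HIP on ROCm builds
y = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 256),)
# Two distinct SCALE values -> two separate kernel compilations,
# even though a plain runtime argument would need only one.
_scale[grid](x, y, x.numel(), SCALE=2.0, BLOCK=256)
_scale[grid](x, y, x.numel(), SCALE=3.0, BLOCK=256)
```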

micmelesse and others added 5 commits on June 19, 2024 08:21:

Add Perf Kernels

    This is a combination of 2 commits, each titled "Add Perf Kernels"; the first was itself a combination of 6 commits:

    - add perf-kernels
    - fix formatting issues
    - fix unused variables and other bugs
    - fix other issues
    - remove scripts
    - save

    The remaining squashed work-in-progress messages were: check changes; format; save; save; try; pre-commit check; save.
Change all block pointers to tensor pointers

    Block pointers target NVIDIA TMAs. They are useful for regular loads as well, but not well supported. Also cleaned up some code I came across along the way and updated the comment at the top. (A sketch contrasting the two pointer styles follows this commit list.)
Add support for layouts commonly used by users.

    Adds an option for the varlen / thd layout to specify equal context lengths for all batches, which is also commonly used. (See the layout sketch after this commit list.)
Note: it does not test the backward kernel itself; it uses the kernel in _ref_bwd_*.py as the reference.
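As a rough sketch of the block-pointer-to-tensor-pointer change mentioned in the commits above (the kernel, shapes, and strides here are illustrative assumptions, not the PR's code), the same 2-D tile load can be written either way:

```python
import triton
import triton.language as tl

@triton.jit
def _load_tile(q_ptr, out_ptr, M, K, stride_m, stride_k,
               BLOCK_M: tl.constexpr, BLOCK_K: tl.constexpr):
    start_m = tl.program_id(0) * BLOCK_M
    # Block-pointer style (geared toward NVIDIA TMA hardware):
    #   bp = tl.make_block_ptr(base=q_ptr, shape=(M, K),
    #                          strides=(stride_m, stride_k),
    #                          offsets=(start_m, 0),
    #                          block_shape=(BLOCK_M, BLOCK_K), order=(1, 0))
    #   q = tl.load(bp, boundary_check=(0, 1))
    # Tensor-pointer style (plain offset arithmetic), as adopted here:
    offs_m = start_m + tl.arange(0, BLOCK_M)
    offs_k = tl.arange(0, BLOCK_K)
    mask = (offs_m[:, None] < M) & (offs_k[None, :] < K)
    ptrs = q_ptr + offs_m[:, None] * stride_m + offs_k[None, :] * stride_k
    q = tl.load(ptrs, mask=mask, other=0.0)
    tl.store(out_ptr + offs_m[:, None] * K + offs_k[None, :], q, mask=mask)
```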
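And a hedged illustration of the varlen / thd layout option (dimensions made up): sequences are packed along a single token axis and delimited by a cumulative-sequence-length tensor, which with the new equal-length option reduces to an arithmetic progression.

```python
import torch

batch, seqlen, heads, d_head = 4, 128, 8, 64
# thd layout: all sequences packed into one (total_tokens, heads, d_head) tensor
q = torch.randn(batch * seqlen, heads, d_head)
# Equal context lengths for all batches: cu_seqlens is just 0, L, 2L, ...
cu_seqlens = torch.arange(0, (batch + 1) * seqlen, seqlen, dtype=torch.int32)
print(cu_seqlens)  # tensor([  0, 128, 256, 384, 512], dtype=torch.int32)
# Sequence i occupies rows cu_seqlens[i] : cu_seqlens[i + 1]
```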
@xinyazhang (Author):

Known problem with Triton commit 00e09cf3008b86978f25f838659698e4a0bf6f45: running `pytest test_backward.py -v -x` shows the following runtime error.

```
self = <.HIPLauncher object at 0x79492bd4c220>, args = (1, 1, 1, 180074496, 185729744, (8, 1, 32768, 1, 1, 1), ...), kwargs = {}

    def __call__(self, *args, **kwargs):
        print(f'{args=}')
        print(f'{kwargs=}')
>       self.launch(*args, **kwargs)
E       RuntimeError: Triton Error [HIP]:  Code: 1, Messsage: invalid argument

../../../aotriton/third_party/triton/python/triton/backends/amd/driver.py:420: RuntimeError
```

Removing all autotune configs except `'BLOCK_M': 16, 'BLOCK_N': 16` mitigates the problem, but that is probably not what we want; a sketch of this mitigation follows.
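A sketch of that mitigation, assuming a standard `@triton.autotune` decorator; the kernel below is a placeholder, not the FA backward kernel:

```python
import triton
import triton.language as tl

# Keep only the one config known to launch successfully; autotuning over
# larger tiles is what exposes the invalid-argument launch failure.
@triton.autotune(
    configs=[triton.Config({'BLOCK_M': 16, 'BLOCK_N': 16})],
    key=['n'],
)
@triton.jit
def _placeholder(x_ptr, n, BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    # placeholder body; BLOCK_N is unused here but kept to mirror the config
    pid = tl.program_id(0)
    offs = pid * BLOCK_M + tl.arange(0, BLOCK_M)
    tl.store(x_ptr + offs, tl.zeros((BLOCK_M,), dtype=tl.float32), mask=offs < n)
```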

@xinyazhang (Author):

> Known problem with Triton commit 00e09cf3008b86978f25f838659698e4a0bf6f45: running `pytest test_backward.py -v -x` shows the following runtime error.

Confirmed this is caused by double loading of libamdhip64.so; it can be fixed by a189c11.

As a temporary workaround, setting TRITON_LIBHIP_PATH to PyTorch's bundled libamdhip64.so fixes this as well; a sketch follows.
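A sketch of that workaround, assuming PyTorch's ROCm build ships libamdhip64.so under torch/lib (the lookup path is an assumption; the variable must be set before Triton loads its HIP driver):

```python
import glob
import os

import torch  # imported first, so its libamdhip64.so is the one already loaded

libs = glob.glob(os.path.join(os.path.dirname(torch.__file__),
                              'lib', 'libamdhip64.so*'))
if libs:
    # Point Triton at the same library PyTorch loaded, avoiding the
    # double load that causes the invalid-argument launch error.
    os.environ['TRITON_LIBHIP_PATH'] = libs[0]

import triton  # noqa: E402  (must come after the env var is set)
```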
