'main' branch seems to have an error #4
The kernel in the master branch is an older version. I have fixed the bug in the main branch; can you try again?
I think you misunderstood me. The master branch works, but its performance is poor. The main branch, however, does not run: I got compile errors when I tried to run it. The compile errors are the ones in my original report.
The performance of the master branch isn't good because it uses an older version of the bmm kernel, which is not as optimized as the kernel in the main branch. I have fixed the bug that was causing the compile errors in the main branch, so please try running the kernel in the main branch again.
Thanks, I tried again and it works now. By the way, this kernel is not hardware-agnostic, so I would need to tune some parameters or rewrite the CUDA kernel to get better performance on an NVIDIA RTX 3090, right?
Yes. As I explained in this blog post, this kernel is optimized for Turing-series GPUs (such as the Tesla T4, RTX 2080, and Titan RTX). For better performance on Ampere GPUs, certain parts of the kernel will need to be redesigned.
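To check which architecture family a GPU belongs to before deciding whether the Turing-tuned kernel fits it, one can read its CUDA compute capability. Turing GPUs such as the Tesla T4, RTX 2080 and Titan RTX report compute capability 7.5, while Ampere GPUs report 8.x (8.6 for the RTX 3090). The following is a minimal Python sketch, assuming PyTorch is available for the device query; the helper name `arch_family` is hypothetical.

```python
from typing import Tuple


def arch_family(capability: Tuple[int, int]) -> str:
    """Map a CUDA compute capability (major, minor) to an architecture family.

    Only a few common families are mapped; anything else returns "unknown".
    """
    major, minor = capability
    if major == 7:
        # 7.0 and 7.2 are Volta; 7.5 is Turing (T4, RTX 2080, Titan RTX).
        return "Turing" if minor >= 5 else "Volta"
    if major == 8:
        # 8.9 is Ada Lovelace; 8.0, 8.6 and 8.7 are Ampere (8.6 = RTX 3090).
        return "Ada" if minor == 9 else "Ampere"
    return {6: "Pascal", 9: "Hopper"}.get(major, "unknown")


if __name__ == "__main__":
    try:
        import torch  # assumes a CUDA-enabled PyTorch build
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        cap = (props.major, props.minor)
        family = arch_family(cap)
        print(f"{props.name}: sm_{props.major}{props.minor} -> {family}")
        print(f"SMs: {props.multi_processor_count}, "
              f"memory: {props.total_memory / 2**30:.1f} GiB")
        if family != "Turing":
            print("This kernel was tuned for Turing; expect to retune or redesign it.")
```

The SM count and memory size printed here are two of the hardware characteristics that tile sizes and launch configurations are usually tuned against.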
Could you give some advice on which characteristics of a hardware platform we should consider when designing high-performance kernels? Thank you very much.
I'd recommend looking into CUTLASS, which is open source and has reliable performance across various GPU architectures.
Thanks so much!
I switched to another machine and ran the main branch, but got some compile errors...
When I checked out the master branch, it just worked. However, the performance of the customized kernel is worse than expected compared to the built-in torch interface. I am confused.
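When comparing a custom kernel against a built-in PyTorch op, the timing should exclude one-time warmup costs and must synchronize the device, because CUDA kernel launches are asynchronous. Below is a minimal timing sketch in Python. The helper `benchmark` is hypothetical, the usage assumes the built-in op being compared is `torch.bmm`, and the tensor shapes are placeholders.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(fn: Callable[[], object],
              sync: Optional[Callable[[], None]] = None,
              warmup: int = 10,
              iters: int = 50) -> float:
    """Return the median wall-clock time of fn() in milliseconds.

    `sync` (e.g. torch.cuda.synchronize) is called after the warmup and after
    every timed call, so queued GPU work is included in each measurement.
    """
    sync = sync or (lambda: None)
    for _ in range(warmup):  # exclude one-time setup costs from the timing
        fn()
    sync()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()
        times.append((time.perf_counter() - start) * 1e3)
    return statistics.median(times)


if __name__ == "__main__":
    try:
        import torch  # assumes a CUDA-enabled PyTorch build
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        # Placeholder shapes: 64 batched 512x512 fp16 matrix products.
        a = torch.randn(64, 512, 512, device="cuda", dtype=torch.float16)
        b = torch.randn(64, 512, 512, device="cuda", dtype=torch.float16)
        ms = benchmark(lambda: torch.bmm(a, b), sync=torch.cuda.synchronize)
        print(f"torch.bmm: {ms:.3f} ms")
        # Time the repository's kernel the same way, e.g. via a hypothetical
        # wrapper: benchmark(lambda: custom_bmm(a, b), sync=torch.cuda.synchronize)
```

Using the median rather than the mean keeps occasional outliers from skewing the result. PyTorch also ships `torch.utils.benchmark.Timer`, which handles CUDA synchronization on its own and can be used instead.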