
‘main’ branch seems to be error #4

Open
wxthu opened this issue Mar 5, 2022 · 8 comments

Comments

@wxthu

wxthu commented Mar 5, 2022

I switched to another machine and ran the main branch, but got some compile errors:
image

When I check out the master branch, it just works. However, the performance of the customized kernel falls short of expectations compared to torch's built-in interface. I am confused.

@DeMoriarty
Owner

The kernel in the master branch is an older version. I have fixed the bug in the main branch. Can you try again?

@wxthu
Author

wxthu commented Mar 5, 2022

> The kernel in the master branch is an older version. I have fixed the bug in the main branch. Can you try again?

I think you misunderstood me. The master branch works, but its performance is bad. The main branch, however, does not run: I get compile errors when I try to run it. The compile errors are shown above.

@DeMoriarty
Owner

The performance of the master branch isn't good because it's an older version of the bmm kernel, which is not as optimized as the kernel in the main branch. I have fixed the bug that was causing the compile error in the main branch, so please try running the kernel in the main branch again.

@wxthu
Author

wxthu commented Mar 5, 2022

> The performance of the master branch isn't good because it's an older version of the bmm kernel, which is not as optimized as the kernel in the main branch. I have fixed the bug that was causing the compile error in the main branch, so please try running the kernel in the main branch again.

Thanks, I tried again and it works now. BTW, this kernel is not hardware-agnostic, so I need to tune some parameters or rewrite the CUDA kernel to get better performance on an NVIDIA RTX 3090, right?

@DeMoriarty
Owner

Yes, as I explained in this blog post, this kernel is optimized for Turing-series GPUs (such as Tesla T4, RTX 2080, Titan RTX...). For better performance on Ampere GPUs, it will be necessary to redesign certain parts of the kernel.
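
As a rough illustration of the architecture distinction above, here is a minimal sketch that maps a CUDA compute capability to its architecture family and checks it against the family the kernel was tuned for. The helper names (`arch_name`, `is_tuned_for`) and the lookup table are hypothetical and are not part of this repository. The table lists only a few well-known capabilities and is not exhaustive.

```python
# Hypothetical helper, not part of the repository: map a CUDA compute
# capability (major, minor) to an architecture family and compare it with
# the family the kernel was tuned for.

ARCH_BY_CAPABILITY = {
    (7, 0): "Volta",
    (7, 5): "Turing",  # e.g. Tesla T4, RTX 2080, Titan RTX
    (8, 0): "Ampere",  # e.g. A100
    (8, 6): "Ampere",  # e.g. RTX 3090
}

# Assumption taken from the comment above: the kernel targets Turing.
TUNED_FOR = "Turing"


def arch_name(capability):
    """Return the architecture family, or "unknown" if not in the table."""
    return ARCH_BY_CAPABILITY.get(tuple(capability), "unknown")


def is_tuned_for(capability):
    """True when the device belongs to the family the kernel targets."""
    return arch_name(capability) == TUNED_FOR


if __name__ == "__main__":
    for cap in [(7, 5), (8, 6)]:
        print(cap, arch_name(cap), is_tuned_for(cap))
```

With PyTorch installed, the capability of the current device can be obtained from `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple that can be passed straight to `is_tuned_for`.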

@wxthu
Author

wxthu commented Mar 6, 2022

> Yes, as I explained in this blog post, this kernel is optimized for Turing-series GPUs (such as Tesla T4, RTX 2080, Titan RTX...). For better performance on Ampere GPUs, it will be necessary to redesign certain parts of the kernel.

Could you give some advice on which characteristics of the hardware platform we should consider when designing kernels for performance? Thank you very much.

@DeMoriarty
Owner

I'd recommend looking into CUTLASS, which is open source and has reliable performance on various GPU architectures.

@wxthu
Author

wxthu commented Mar 6, 2022

Thanks so much!
