Cross-GPU portability #42

Open
janEbert opened this issue Jun 25, 2024 · 0 comments
janEbert commented Jun 25, 2024

I was wondering whether and how ThunderKittens can help with writing cross-GPU-portable code. I noticed that the kernels in examples/based/linear_attn_forward for the 4090 (also used for the A100) and for the H100 look very different according to diff, yet the code seems semantically quite similar; it appears to have simply been written at different times or to have diverged over time. Is it reasonable to assume that, with some care and a few #ifdefs, this code could be made portable? Is portable code a use case for ThunderKittens at all, or does your intended usage favor writing maximally efficient code for a single GPU architecture?

Assuming portability is feasible at all: if I wanted to write abstracted, portable, yet highly efficient code with ThunderKittens, what specific things should I look out for?
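
For concreteness, this is roughly the kind of #ifdef guarding I have in mind. It only uses the standard `__CUDA_ARCH__` macro (900 = Hopper/H100, 890 = Ada/4090, 800 = Ampere/A100); the tile sizes and stage counts below are placeholder guesses of mine, not actual ThunderKittens constants or API calls:

```cuda
// Sketch: select per-architecture tuning parameters at compile time.
// The numeric values are purely illustrative placeholders.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void tuned_kernel(float* out)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
    constexpr int TILE   = 64;  // e.g. larger tiles where H100 shared memory allows it
    constexpr int STAGES = 4;   // e.g. deeper async pipeline on Hopper
#else
    constexpr int TILE   = 32;  // more conservative settings for A100 / 4090
    constexpr int STAGES = 2;
#endif
    if (threadIdx.x == 0 && blockIdx.x == 0)
        out[0] = static_cast<float>(TILE * STAGES);
}

int main()
{
    float* d_out = nullptr;
    cudaMalloc(&d_out, sizeof(float));
    tuned_kernel<<<1, 32>>>(d_out);

    float h_out = 0.f;
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("selected TILE * STAGES = %g\n", h_out);

    cudaFree(d_out);
    return 0;
}
```

The hope would be that the kernel body itself stays shared and only such tuning constants (and perhaps a few feature-gated code paths) differ per architecture.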

Thanks in advance and for the project!
