I was wondering whether and how ThunderKittens can help with writing cross-GPU-portable code. I noticed that the kernels in `examples/based/linear_attn_forward` for the 4090 (also used for the A100) and for the H100 look very different when compared with `diff`, but the code seems semantically quite similar and appears to have simply been written at different times, or to have diverged over time. Is it reasonable to assume that, with some care and a few `#ifdef`s, this code could be made portable? Is portable code a use case for ThunderKittens at all, or does your intended use favor writing highly efficient code for just one GPU architecture?
Assuming portability is feasible at all: if I wanted to write abstracted, portable, yet highly efficient code with ThunderKittens, what specific things should I look out for?
Thanks in advance and for the project!