2D and 3D tile divisions so that permutation coordinates can be read from threadIdx and blockIdx #406

Open
ChrisDryden opened this issue May 13, 2024 · 3 comments

Comments

@ChrisDryden
Contributor

ChrisDryden commented May 13, 2024

Supposedly the permutation kernels, even though they are mostly memory bound, could reduce the amount of integer division through thread coarsening, and with a 2D or 3D grid they would not need any division in the kernel itself.

Looking into this from the advice of @ngc92:

integer divisions are really expensive, but I don't think they will matter much in a kernel as memory-bound as this. I guess the first thing to do would be some thread coarsening, so that the divisions are amortized, and possibly a 2D or 3D grid, so that you don't even have to do the divisions at all, and can just read off individual coordinates from threadIdx and blockIdx. 

Creating this issue to track progress on this
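
As a rough illustration of the 2D/3D grid idea quoted above, here is a minimal sketch. It assumes the input is laid out as (B, N, 3, NH, d) and that q, k, v are each laid out as (B, NH, N, d). The names `permute_kernel_3d` and `check_permute_3d` are hypothetical and this is not the code from train_gpt2.cu or #459. Each block handles one (b, nh, n) triple read from `blockIdx`, and each thread strides over the head dimension, so the kernel contains no division or modulo. It also assumes NH and B fit within the 65535 limit on the y and z grid dimensions.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Sketch only: coordinates come straight from blockIdx / threadIdx.
// Assumed layouts: inp is (B, N, 3, NH, d); q, k, v are (B, NH, N, d).
__global__ void permute_kernel_3d(float* q, float* k, float* v, const float* inp,
                                  int N, int NH, int d) {
    int n  = blockIdx.x;   // sequence position
    int nh = blockIdx.y;   // head index
    int b  = blockIdx.z;   // batch index
    int out_base = ((b * NH + nh) * N + n) * d;
    int inp_base = ((b * N + n) * 3 * NH + nh) * d;
    // Each thread strides over the head dimension, so any block size works.
    for (int d_ = threadIdx.x; d_ < d; d_ += blockDim.x) {
        q[out_base + d_] = inp[inp_base + d_];
        k[out_base + d_] = inp[inp_base + NH * d + d_];
        v[out_base + d_] = inp[inp_base + 2 * NH * d + d_];
    }
}

// Hypothetical self-check: runs the kernel and counts mismatches against a
// division-based CPU reference. Returns 0 when both agree.
int check_permute_3d(int B, int N, int NH, int d) {
    int total = B * NH * N * d;
    size_t bytes = (size_t)total * sizeof(float);
    std::vector<float> h_inp(3 * (size_t)total);
    for (size_t i = 0; i < h_inp.size(); i++) h_inp[i] = (float)i;

    float *d_inp, *d_q, *d_k, *d_v;
    cudaMalloc(&d_inp, 3 * bytes);
    cudaMalloc(&d_q, bytes);
    cudaMalloc(&d_k, bytes);
    cudaMalloc(&d_v, bytes);
    cudaMemcpy(d_inp, h_inp.data(), 3 * bytes, cudaMemcpyHostToDevice);

    dim3 grid(N, NH, B);  // one block per (n, nh, b)
    permute_kernel_3d<<<grid, 32>>>(d_q, d_k, d_v, d_inp, N, NH, d);

    std::vector<float> h_q(total), h_k(total), h_v(total);
    cudaMemcpy(h_q.data(), d_q, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_k.data(), d_k, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_v.data(), d_v, bytes, cudaMemcpyDeviceToHost);

    int mismatches = 0;
    for (int idx = 0; idx < total; idx++) {
        // Reference decoding of the flat index using division and modulo.
        int b = idx / (NH * N * d);
        int rest = idx % (NH * N * d);
        int nh = rest / (N * d);
        rest = rest % (N * d);
        int n = rest / d;
        int d_ = rest % d;
        int inp_idx = ((b * N + n) * 3 * NH + nh) * d + d_;
        if (h_q[idx] != h_inp[inp_idx]) mismatches++;
        if (h_k[idx] != h_inp[inp_idx + NH * d]) mismatches++;
        if (h_v[idx] != h_inp[inp_idx + 2 * NH * d]) mismatches++;
    }
    cudaFree(d_inp); cudaFree(d_q); cudaFree(d_k); cudaFree(d_v);
    return mismatches;
}
```

Calling `check_permute_3d(2, 4, 3, 64)` on a machine with a CUDA device should return 0 if the grid-based coordinates match the division-based ones. Whether this is measurably faster for a memory-bound kernel would need profiling.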

@ChrisDryden
Contributor Author

This came up in a discussion about precomputing all of the constants that can be passed into the kernel directly, such as the following values: https://github.com/karpathy/llm.c/blob/master/train_gpt2.cu#L689

It wouldn't necessarily add more lines of code, just reorganize where the calculations are done. In theory this should speed things up, since it reduces the number of calculations by a factor of however many threads run the kernel.
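
A minimal sketch of what moving these calculations to the host might look like, assuming the constants in question are products of the tensor dimensions used to decode a flat index. The struct `PermuteDims` and the functions `make_permute_dims`, `q_input_index`, and `permute_q_precomputed` are hypothetical names, and the (B, N, 3, NH, d) input layout is an assumption. Note that this only hoists the multiplications; the divisions by those products still run in every thread.

```cuda
#include <cuda_runtime.h>

// Hypothetical bundle of dimension products, computed once on the host.
struct PermuteDims {
    int N, NH, d;
    int Nd;     // N * d
    int NHNd;   // NH * N * d
    int total;  // B * NH * N * d
};

inline PermuteDims make_permute_dims(int B, int N, int NH, int d) {
    PermuteDims p;
    p.N = N; p.NH = NH; p.d = d;
    p.Nd = N * d;
    p.NHNd = NH * p.Nd;
    p.total = B * p.NHNd;
    return p;
}

// Maps a flat output index in q (B, NH, N, d) to the matching input index
// in inp (B, N, 3, NH, d), using the precomputed products.
__host__ __device__ inline int q_input_index(int idx, const PermuteDims& p) {
    int b = idx / p.NHNd;
    int rest = idx % p.NHNd;
    int nh = rest / p.Nd;
    rest = rest % p.Nd;
    int n = rest / p.d;
    int d_ = rest % p.d;
    return ((b * p.N + n) * 3 * p.NH + nh) * p.d + d_;
}

// The struct is passed by value, so no thread recomputes NH * N * d or N * d.
__global__ void permute_q_precomputed(float* q, const float* inp, PermuteDims p) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < p.total) {
        q[idx] = inp[q_input_index(idx, p)];
    }
}
```

The host would build the struct once with `make_permute_dims(B, N, NH, d)` and pass it to every launch of the kernel.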

@Karliz24

👍🏻

@ChrisDryden
Contributor Author

Created an example implementation in #459, but it doesn't seem to be working properly.
