A faster flash attention bwd implementation #177

Open
wants to merge 4 commits into base: main

Commits on Jun 22, 2023

  1. A faster flash attention bwd implementation

    - Decompose the bwd kernel into two kernels: one for dq and one for dk and dv.
    - Add extra parallelism over the sequence-length axis.
    - On a benchmark, it is 4x faster than the previous implementation and 2x faster than the XLA bwd pass. (A sketch of this decomposition is included after the commit list.)
    tonywu95 authored Jun 22, 2023 · c65144b
  2. 7121d59

Commits on Jun 23, 2023

  1. fix comments by sharad

    tonywu95 committed Jun 23, 2023 · 8156ad0
  2. delete a comment

    tonywu95 committed Jun 23, 2023 · bdbddc9
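
For illustration, below is a minimal, non-fused JAX sketch of the decomposition described in the first commit: the backward pass is split into a dq computation and a dk/dv computation, each of which handles one independent block of the sequence axis. This is only a reference-style sketch under assumed shapes (single head, no batch, no causal mask, block size 32); the function names `dq_block` and `dkdv_block` are hypothetical and are not the Triton kernels from this PR.

```python
import jax
import jax.numpy as jnp


def dq_block(q_blk, do_blk, lse_blk, delta_blk, k, v, scale):
  """One block of the (hypothetical) dq kernel: a block of query rows.

  lse_blk holds the per-row log-sum-exp values from the forward pass and
  delta_blk = rowsum(do * o), so the attention probabilities for this block
  can be recomputed without touching other query rows.
  """
  p = jnp.exp(q_blk @ k.T * scale - lse_blk[:, None])  # recomputed softmax rows
  dp = do_blk @ v.T
  ds = p * (dp - delta_blk[:, None])
  return ds @ k * scale


def dkdv_block(k_blk, v_blk, q, do, lse, delta, scale):
  """One block of the (hypothetical) dk/dv kernel: a block of key/value rows."""
  p = jnp.exp(q @ k_blk.T * scale - lse[:, None])  # columns of the softmax
  dv_blk = p.T @ do
  dp = do @ v_blk.T
  ds = p * (dp - delta[:, None])
  dk_blk = ds.T @ q * scale
  return dk_blk, dv_blk


# Usage sketch with assumed sizes.
seq, d, blk, scale = 128, 64, 32, 0.125
q, k, v, do = jax.random.normal(jax.random.PRNGKey(0), (4, seq, d))

s = q @ k.T * scale
lse = jax.nn.logsumexp(s, axis=-1)        # saved by the forward pass
o = jnp.exp(s - lse[:, None]) @ v
delta = jnp.sum(do * o, axis=-1)          # precomputed once before the bwd kernels

# The dq blocks and the dk/dv blocks are mutually independent, which is the
# extra parallelism over the sequence-length axis mentioned in the commit.
dq = jnp.concatenate([
    dq_block(q[i:i + blk], do[i:i + blk], lse[i:i + blk],
             delta[i:i + blk], k, v, scale)
    for i in range(0, seq, blk)])
dkdv = [dkdv_block(k[i:i + blk], v[i:i + blk], q, do, lse, delta, scale)
        for i in range(0, seq, blk)]
dk = jnp.concatenate([x for x, _ in dkdv])
dv = jnp.concatenate([x for _, x in dkdv])

# Cross-check against jax.vjp of plain softmax attention.
ref = lambda q, k, v: jax.nn.softmax(q @ k.T * scale, axis=-1) @ v
dq_ref, dk_ref, dv_ref = jax.vjp(ref, q, k, v)[1](do)
assert jnp.allclose(dq, dq_ref, atol=1e-3) and jnp.allclose(dk, dk_ref, atol=1e-3)
```

In this sketch, splitting dq from dk/dv means no block ever accumulates into memory owned by another block: each dq block reads all of k and v but writes only its own query rows, and each dk/dv block reads all of q and do but writes only its own key/value rows, so both grids can run fully in parallel over the sequence axis.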