A faster flash attention bwd implementation #177

Open
wants to merge 4 commits into base: main

Commits on Jun 22, 2023

  1. A faster flash attention bwd implementation

    - Decompose the bwd kernel into two kernels: one for dq and one for dk and dv.
    - Add extra parallelism over the sequence-length axis.
    - On a benchmark, it is 4x faster than the previous implementation and 2x faster than the XLA bwd pass. (A sketch of this decomposition is included after the commit list.)
    tonywu95 authored Jun 22, 2023 · c65144b
  2. 7121d59

Commits on Jun 23, 2023

  1. fix comments by sharad

    tonywu95 committed Jun 23, 2023 · 8156ad0
  2. delete a comment

    tonywu95 committed Jun 23, 2023 · bdbddc9
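
For illustration, below is a minimal, non-fused JAX sketch of the decomposition described in the first commit: the backward pass is split into a dq computation and a dk/dv computation, each of which handles one independent block of the sequence axis. This is only a reference-style sketch under assumed shapes (single head, no batch, no causal mask, block size 32); the function names `dq_block` and `dkdv_block` are hypothetical and are not the Triton kernels from this PR.

```python
import jax
import jax.numpy as jnp


def dq_block(q_blk, do_blk, lse_blk, delta_blk, k, v, scale):
  """One block of the (hypothetical) dq kernel: a block of query rows.

  lse_blk holds the per-row log-sum-exp values from the forward pass and
  delta_blk = rowsum(do * o), so the attention probabilities for this block
  can be recomputed without touching other query rows.
  """
  p = jnp.exp(q_blk @ k.T * scale - lse_blk[:, None])  # recomputed softmax rows
  dp = do_blk @ v.T
  ds = p * (dp - delta_blk[:, None])
  return ds @ k * scale


def dkdv_block(k_blk, v_blk, q, do, lse, delta, scale):
  """One block of the (hypothetical) dk/dv kernel: a block of key/value rows."""
  p = jnp.exp(q @ k_blk.T * scale - lse[:, None])  # columns of the softmax
  dv_blk = p.T @ do
  dp = do @ v_blk.T
  ds = p * (dp - delta[:, None])
  dk_blk = ds.T @ q * scale
  return dk_blk, dv_blk


# Usage sketch with assumed sizes.
seq, d, blk, scale = 128, 64, 32, 0.125
q, k, v, do = jax.random.normal(jax.random.PRNGKey(0), (4, seq, d))

s = q @ k.T * scale
lse = jax.nn.logsumexp(s, axis=-1)        # saved by the forward pass
o = jnp.exp(s - lse[:, None]) @ v
delta = jnp.sum(do * o, axis=-1)          # precomputed once before the bwd kernels

# The dq blocks and the dk/dv blocks are mutually independent, which is the
# extra parallelism over the sequence-length axis mentioned in the commit.
dq = jnp.concatenate([
    dq_block(q[i:i + blk], do[i:i + blk], lse[i:i + blk],
             delta[i:i + blk], k, v, scale)
    for i in range(0, seq, blk)])
dkdv = [dkdv_block(k[i:i + blk], v[i:i + blk], q, do, lse, delta, scale)
        for i in range(0, seq, blk)]
dk = jnp.concatenate([x for x, _ in dkdv])
dv = jnp.concatenate([x for _, x in dkdv])

# Cross-check against jax.vjp of plain softmax attention.
ref = lambda q, k, v: jax.nn.softmax(q @ k.T * scale, axis=-1) @ v
dq_ref, dk_ref, dv_ref = jax.vjp(ref, q, k, v)[1](do)
assert jnp.allclose(dq, dq_ref, atol=1e-3) and jnp.allclose(dk, dk_ref, atol=1e-3)
```

In this sketch, splitting dq from dk/dv means no block ever accumulates into memory owned by another block: each dq block reads all of k and v but writes only its own query rows, and each dk/dv block reads all of q and do but writes only its own key/value rows, so both grids can run fully in parallel over the sequence axis.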