
Persistent version of Flash Attention #2407

Draft
manman-ren wants to merge 2 commits into main

Conversation

@manman-ren (Contributor) commented Aug 2, 2024

Added two more variants: triton_tutorial_flash_v2_persistent and triton_tutorial_flash_v2_persistent_tma.

The variants handle the non-causal case only. The causal case makes two invocations of attn_fwd_inner, so we would end up with one outer (persistent) loop containing two inner loops:

```
for ...      # persistent loop over flattened tiles
    for ...  # first attn_fwd_inner call
    for ...  # second attn_fwd_inner call
```

It is not clear how to flatten this nest into a 1D loop.
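For context, here is a minimal, hypothetical sketch of the persistent pattern the non-causal variant relies on: launch a fixed number of programs (rather than one per tile) and have each program stride over the flattened tile index space, decomposing the 1D tile_id back into its two coordinates. This is not the PR's kernel; the kernel name, NUM_PROGRAMS, and the row-scaling body are illustrative assumptions.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def _scale_rows_persistent(x_ptr, out_ptr, n_rows, n_cols, scale,
                           BLOCK: tl.constexpr, NUM_PROGRAMS: tl.constexpr):
    # Persistent idiom: the grid size is fixed at NUM_PROGRAMS; each program
    # strides over the flattened (row, col-tile) index space.
    pid = tl.program_id(0)
    tiles_per_row = tl.cdiv(n_cols, BLOCK)
    total_tiles = n_rows * tiles_per_row
    for tile_id in range(pid, total_tiles, NUM_PROGRAMS):
        row = tile_id // tiles_per_row       # analogous to off_hz (batch * head)
        col_tile = tile_id % tiles_per_row   # analogous to start_m (query tile)
        offs = col_tile * BLOCK + tl.arange(0, BLOCK)
        mask = offs < n_cols
        x = tl.load(x_ptr + row * n_cols + offs, mask=mask)
        tl.store(out_ptr + row * n_cols + offs, x * scale, mask=mask)

x = torch.randn(64, 1024, device="cuda")
out = torch.empty_like(x)
NUM_PROGRAMS = torch.cuda.get_device_properties(0).multi_processor_count
_scale_rows_persistent[(NUM_PROGRAMS,)](x, out, x.shape[0], x.shape[1], 2.0,
                                        BLOCK=128, NUM_PROGRAMS=NUM_PROGRAMS)
```

In the causal case, each tile_id iteration would itself contain the two sequential attn_fwd_inner loops, so the body no longer maps cleanly onto a single flattened 1D loop, which is the difficulty described above.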

@manman-ren marked this pull request as draft August 2, 2024 18:41
@manman-ren temporarily deployed to docker-s3-upload August 2, 2024 18:41 — with GitHub Actions
@manman-ren requested review from embg and xuzhao9 August 2, 2024 19:07

Review comment on:

```python
@triton.autotune(list(filter(keep, configs)), key=["N_CTX"])
@triton.jit
def _attn_fwd_persistent_tma(Q, Out, desc_q, desc_k, desc_v, sm_scale, M, desc_o,  #
```
Is this a copy of _attn_fwd_persistent but with TMA changes?
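For readers of the diff, a hedged sketch of what the TMA change typically amounts to, assuming Triton's experimental descriptor API (which is what desc_q/desc_k/desc_v/desc_o in the signature suggest): tile loads go through host-created TMA descriptors instead of manually computed block pointers. The helper below is an illustration under that assumption, not code from this PR.

```python
import triton
import triton.language as tl

@triton.jit
def _load_qk_tiles(desc_q, desc_k, start_m, start_n,
                   BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                   HEAD_DIM: tl.constexpr):
    # Pointer-based kernels build Q/K block pointers from strides and tl.load them.
    # The TMA variant instead indexes a descriptor; the hardware copy engine
    # performs the address arithmetic and bounds handling.
    q = tl._experimental_descriptor_load(
        desc_q, [start_m, 0], [BLOCK_M, HEAD_DIM], tl.float16)
    k = tl._experimental_descriptor_load(
        desc_k, [start_n, 0], [BLOCK_N, HEAD_DIM], tl.float16)
    return q, k
```

Host-side, the descriptors would come from something like triton.tools.experimental_descriptor.create_2d_tma_descriptor(q.data_ptr(), M, HEAD_DIM, BLOCK_M, HEAD_DIM, q.element_size()); that API was experimental and version-dependent at the time of this PR.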
