Skip FlashDecode users with cur_pos=-1 #12909

Closed
cglagovichTT opened this issue Sep 19, 2024 · 0 comments
@cglagovichTT
Contributor

To support vLLM, FlashDecode needs to skip computation for any user whose cur_pos is -1. vLLM pads the decode batch to max_batch_size and sets the token_indices of the padded entries to -1.
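As a rough illustration of the requested behavior, here is a pure-Python reference sketch, not the FlashDecode kernel itself. The names `pad_cur_pos` and `decode_attention` are hypothetical, and returning `None` for a skipped user is an assumption made for clarity; the issue does not say what the real op writes to the output for padded users.

```python
import math

SKIP = -1  # sentinel from the issue: padded users carry cur_pos == -1


def pad_cur_pos(cur_pos, max_batch_size):
    """Pad per-user positions to max_batch_size with the skip sentinel.

    Hypothetical helper mimicking the padding the issue attributes to vLLM.
    """
    return list(cur_pos) + [SKIP] * (max_batch_size - len(cur_pos))


def decode_attention(q, k_cache, v_cache, cur_pos):
    """Single-head decode attention over a batch, skipping padded users.

    q:        [batch][dim]       query for the current token
    k_cache:  [batch][seq][dim]  cached keys
    v_cache:  [batch][seq][dim]  cached values
    cur_pos:  [batch]            last valid cache index, or SKIP

    Returns [batch][dim]; a skipped user gets None (an assumption here).
    """
    out = []
    for b, pos in enumerate(cur_pos):
        if pos == SKIP:
            # No scores, no softmax, no cache reads for this user.
            out.append(None)
            continue
        dim = len(q[b])
        scale = 1.0 / math.sqrt(dim)
        # Attend only to cache positions 0..pos inclusive.
        scores = [
            scale * sum(qi * ki for qi, ki in zip(q[b], k_cache[b][t]))
            for t in range(pos + 1)
        ]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(weights[t] * v_cache[b][t][d] for t in range(pos + 1)) / total
            for d in range(dim)
        ])
    return out
```

With two real users in a batch padded to four, `pad_cur_pos([3, 0], 4)` yields two trailing -1 entries, and `decode_attention` does no work for those slots.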

@cglagovichTT cglagovichTT self-assigned this Sep 19, 2024
cglagovichTT added a commit that referenced this issue Sep 20, 2024
Index of -1 causes FlashDecode to skip computation. Based on cur_pos, skip tile reads for K and V chunks outside of valid range.

---------

Signed-off-by: Salar Hosseini <[email protected]>
Co-authored-by: Colman Glagovich <[email protected]>
Co-authored-by: Salar Hosseini <[email protected]>
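
The chunk-skipping part of the commit message can be sketched as follows. This is only a model of the index arithmetic, assuming the KV cache is split into fixed-size chunks along the sequence dimension; `chunks_to_read`, `chunk_size`, and `num_chunks` are hypothetical names, not identifiers from the kernel.

```python
def chunks_to_read(cur_pos, chunk_size, num_chunks):
    """Return the K/V chunk indices that hold positions 0..cur_pos.

    A negative cur_pos (the -1 skip sentinel) yields an empty range,
    so no chunk is read for that user.
    """
    if cur_pos < 0:
        return range(0)
    # The chunk containing cur_pos is the last one worth reading;
    # clamp in case cur_pos points past the end of the cache.
    last = min(cur_pos // chunk_size, num_chunks - 1)
    return range(last + 1)


# Example: 8 chunks of 32 positions each.
for pos in (-1, 0, 31, 32):
    print(pos, list(chunks_to_read(pos, chunk_size=32, num_chunks=8)))
```

Chunks past the one containing cur_pos hold no valid positions for that user, so skipping their reads does not change the attention result.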
skhorasganiTT pushed a commit that referenced this issue Sep 24, 2024
Index of -1 causes FlashDecode to skip computation. Based on cur_pos, skip tile reads for K and V chunks outside of valid range.

---------

Signed-off-by: Salar Hosseini <[email protected]>
Co-authored-by: Colman Glagovich <[email protected]>
Co-authored-by: Salar Hosseini <[email protected]>
(cherry picked from commit a3afddb)