To support vLLM, we need to be able to tell FlashDecode to skip computation for a user whose cur_pos is -1. vLLM pads the decode batch to max_batch_size and sets the token_indices of the padded users to -1, so those entries must not be computed.
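The behavior requested above can be illustrated with a minimal plain-Python sketch. It assumes that cur_pos is the index of each user's current token, so positions 0..cur_pos are attended, and it uses a simple single-head softmax attention instead of the real FlashDecode kernel. The names PAD_POS, pad_decode_batch and decode_attention are hypothetical. Returning None for a skipped user is also an assumption of this sketch and does not describe what FlashDecode actually writes for padded rows.

```python
import math
from typing import List

PAD_POS = -1  # sentinel cur_pos marking a padded user (per the issue text)


def pad_decode_batch(cur_pos: List[int], max_batch_size: int) -> List[int]:
    """Pad per-user positions up to max_batch_size, filling padded slots with -1."""
    if len(cur_pos) > max_batch_size:
        raise ValueError("batch is larger than max_batch_size")
    return cur_pos + [PAD_POS] * (max_batch_size - len(cur_pos))


def decode_attention(q, k_cache, v_cache, cur_pos):
    """Hypothetical reference decode attention that skips padded users.

    q[b] is user b's query vector; k_cache[b][t] and v_cache[b][t] are the
    key/value vectors at position t. Users with cur_pos == -1 get None and
    their caches are never touched.
    """
    out = []
    for b, pos in enumerate(cur_pos):
        if pos == PAD_POS:
            out.append(None)  # padded user: no K/V reads, no compute
            continue
        keys = k_cache[b][: pos + 1]  # assumed valid range: positions 0..cur_pos
        vals = v_cache[b][: pos + 1]
        scale = 1.0 / math.sqrt(len(q[b]))
        scores = [scale * sum(qi * ki for qi, ki in zip(q[b], k)) for k in keys]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        dim = len(vals[0])
        out.append([sum(w * v[d] for w, v in zip(weights, vals)) / total for d in range(dim)])
    return out
```

For example, padding `[3, 0]` to a batch of 4 gives `[3, 0, -1, -1]`, and the last two users are neither read nor computed.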
A cur_pos of -1 causes FlashDecode to skip computation for that user. Based on cur_pos, skip tile reads for K and V chunks outside the valid range.
---------
Signed-off-by: Salar Hosseini <[email protected]>
Co-authored-by: Colman Glagovich <[email protected]>
Co-authored-by: Salar Hosseini <[email protected]>
(cherry picked from commit a3afddb)
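The chunk-skipping rule in the commit message can be sketched as follows. The sketch assumes the K/V cache is split into fixed-size chunks of k_chunk_size tokens and that each reader core is assigned a set of chunk indices. Only chunks that hold positions 0..cur_pos are read, and a padded user (cur_pos == -1) reads none. The names valid_kv_chunks, chunks_to_read and k_chunk_size are hypothetical and are not FlashDecode's actual API.

```python
from typing import Iterable, List


def valid_kv_chunks(cur_pos: int, k_chunk_size: int) -> range:
    """Indices of the K/V chunks that contain positions 0..cur_pos.

    Returns an empty range for a padded user (cur_pos == -1).
    """
    if k_chunk_size <= 0:
        raise ValueError("k_chunk_size must be positive")
    if cur_pos < 0:
        return range(0)  # padded user: nothing to read
    # The chunk holding cur_pos is cur_pos // k_chunk_size; read it and every earlier chunk.
    return range(cur_pos // k_chunk_size + 1)


def chunks_to_read(cur_pos: int, k_chunk_size: int, assigned: Iterable[int]) -> List[int]:
    """Filter a core's assigned chunk indices down to those inside the valid range."""
    valid = valid_kv_chunks(cur_pos, k_chunk_size)
    return [c for c in assigned if c in valid]
```

With k_chunk_size = 32 and cur_pos = 100, chunks 0 through 3 are valid, so a core assigned chunks 2 through 5 reads only chunks 2 and 3 and skips the tile reads for chunks 4 and 5.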