
Integrate PagedAttention Optimization custom kernel into vLLM #22

Merged
merged 13 commits into main from csrikris_pa_opt_shomy_1_16 on May 30, 2024

Conversation

@lcskrishna

This PR introduces the custom optimized PagedAttention kernel on ROCm. To use the custom PagedAttention, set the environment variable VLLM_USE_ROCM_CUSTOM_PAGED_ATTN=1.

Currently this PagedAttention kernel supports the fp16 datatype, GQA ratios from 1 to 16, head size 128, and block size 16.
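For reference, a minimal usage sketch (the model name and sampling settings below are placeholders, not part of this PR):

```python
import os

# Enable the ROCm custom paged-attention kernel before building the engine.
os.environ["VLLM_USE_ROCM_CUSTOM_PAGED_ATTN"] = "1"

from vllm import LLM, SamplingParams

# Placeholder model: an fp16 model with head size 128 falls inside the
# currently supported configuration (GQA ratio 1-16, block size 16).
llm = LLM(model="meta-llama/Llama-2-7b-hf", dtype="float16", block_size=16)
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(temperature=0.0, max_tokens=32))
print(outputs[0].outputs[0].text)
```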

Kernel was authored by @sanyalington.

cc: @shajrawi @sunway513

@shajrawi (Collaborator) left a comment

  1. Can we have VLLM_USE_ROCM_CUSTOM_PAGED_ATTN on by default, i.e. it would need to be set to false to use the old path? That's what we do with Triton vs. CK. (A sketch of such a default-on check follows this list.)
  2. Can you add the environment variable setting to our performance guide? https://github.com/ROCm/vllm/blob/main/ROCm_performance.md
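A minimal sketch of what a default-on flag check could look like (the helper name is hypothetical, not the code in this PR):

```python
import os

# Hypothetical helper: the custom ROCm paged-attention path is enabled unless
# the flag is explicitly set to a falsy value such as "0" or "false".
def use_rocm_custom_paged_attn() -> bool:
    value = os.environ.get("VLLM_USE_ROCM_CUSTOM_PAGED_ATTN", "1")
    return value.strip().lower() not in ("0", "false", "off")
```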


@sanyalington left a comment

The paged attention custom reduction kernel supports a maximum context length of 16K, beyond which it can produce incorrect results. A fix for this, along with further reduction-kernel performance improvements, will be available in a later drop.
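For illustration only (the names below are hypothetical, not the PR's actual dispatch code), a guard that keeps the custom kernel inside its currently supported envelope might look like this:

```python
# Current reduction-kernel limit noted above; longer contexts must use the
# default paged-attention path until the fix lands.
MAX_CUSTOM_CONTEXT_LEN = 16 * 1024

def can_use_custom_paged_attn(dtype: str, head_size: int, block_size: int,
                              gqa_ratio: int, max_context_len: int) -> bool:
    # Hypothetical check combining the supported configuration from the PR
    # description with the 16K context-length limit.
    return (dtype == "float16"
            and head_size == 128
            and block_size == 16
            and 1 <= gqa_ratio <= 16
            and max_context_len <= MAX_CUSTOM_CONTEXT_LEN)
```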

@lcskrishna (Author)

@shajrawi Added it to the documentation and enabled it by default. Support for context lengths <= 32k will be added in the next PR after this one.

@shajrawi (Collaborator) left a comment

lgtm

@shajrawi merged commit 87ec0c7 into main on May 30, 2024
2 checks passed
@gshtras deleted the csrikris_pa_opt_shomy_1_16 branch on August 20, 2024