Update ROCM libs and improvements #2358
Conversation
Hi @mht-sharma 👋 Just checking in on this: are you still working on it, or is this something we should consider closed? To be clear, I'm by no means saying we're in a hurry 👍
Hi @ErikKaum, yes, I am currently working on this, with a few improvements and fixes still pending. I am working with AMD to ensure these updates are finalized soon.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```diff
-            torch.empty(
+            torch.zeros(
                 (num_blocks, num_heads, head_size // x, BLOCK_SIZE, x),
                 dtype=dtype,
                 device=device,
             ),
-            torch.empty(
+            torch.zeros(
                 (num_blocks, num_heads, head_size, BLOCK_SIZE),
                 dtype=dtype,
                 device=device,
```
This change is required for the custom Paged Attention (PA) kernel on ROCm.
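For context, here is a minimal, self-contained sketch of a KV-cache allocation in the style of the diff above. The function name and the `x` packing factor are assumptions based on the vLLM-style cache layout, not code from this PR:

```python
import torch

def allocate_kv_cache(num_blocks: int, num_heads: int, head_size: int,
                      dtype: torch.dtype, device: str, BLOCK_SIZE: int = 16):
    # Packing factor used by the vLLM-style key-cache layout: elements are
    # grouped so each group is 16 bytes wide (assumption for this sketch).
    x = 16 // torch.tensor([], dtype=dtype).element_size()
    # torch.zeros instead of torch.empty: the custom ROCm paged-attention
    # kernel may touch cache blocks before they have been written, so the
    # buffers should not contain uninitialised garbage.
    key_cache = torch.zeros(
        (num_blocks, num_heads, head_size // x, BLOCK_SIZE, x),
        dtype=dtype,
        device=device,
    )
    value_cache = torch.zeros(
        (num_blocks, num_heads, head_size, BLOCK_SIZE),
        dtype=dtype,
        device=device,
    )
    return key_cache, value_cache

# Example: 128 blocks of 16 tokens for a model with 8 KV heads of size 64.
k, v = allocate_kv_cache(128, 8, 64, torch.float16, "cpu")
```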
@OlivierDehaene @Narsil, could you please review the PR and merge?
Really awesome to see ROCm get up to speed again.
Added a bunch of comments, most of them smaller nitpicks.
server/text_generation_server/models/custom_modeling/flash_cohere_modeling.py (Outdated)
Closing this in favour of #2579
What does this PR do?
This PR introduces various library updates to address breaking changes, including optimisations for ROCm and custom kernels for low-batch-size GEMM and Paged Attention (a small illustrative sketch follows the list). Key improvements are as follows:
- Updated the rocm/vllm dependency to a newer commit.
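As an illustration of how a ROCm-specific low-batch-size GEMM path might be wired in, here is a dispatch sketch. The environment-variable name, batch threshold, and `skinny_gemm` helper are hypothetical and not taken from this PR:

```python
import os
import torch

# Hypothetical flag name, for illustration only.
USE_CUSTOM_KERNELS = os.getenv("TGI_ROCM_CUSTOM_KERNELS", "1") == "1"

def skinny_gemm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Placeholder for a ROCm kernel tuned for very small-batch ("skinny") GEMM.
    # It simply falls back to torch.matmul here so the sketch stays runnable.
    return torch.matmul(a, b)

def linear_forward(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # Route tiny decode batches to the tuned path on ROCm builds
    # (torch.version.hip is None on CUDA/CPU builds); use matmul otherwise.
    if USE_CUSTOM_KERNELS and torch.version.hip is not None and x.shape[0] <= 4:
        return skinny_gemm(x, weight.t())
    return torch.matmul(x, weight.t())

if __name__ == "__main__":
    x = torch.randn(2, 128)
    w = torch.randn(256, 128)
    print(linear_forward(x, w).shape)  # torch.Size([2, 256])
```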