Update ROCM libs and improvements #2358
Conversation
Hi @mht-sharma 👋 Just checking in on this: are you still working on it, or is this something we should consider closed? To be clear, I'm by no means saying we're in a hurry 👍
Hi @ErikKaum, yes, I am currently working on this, with a few improvements and fixes still pending. I am working with AMD to ensure these updates are finalized soon.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```diff
-            torch.empty(
+            torch.zeros(
                 (num_blocks, num_heads, head_size // x, BLOCK_SIZE, x),
                 dtype=dtype,
                 device=device,
             ),
-            torch.empty(
+            torch.zeros(
                 (num_blocks, num_heads, head_size, BLOCK_SIZE),
                 dtype=dtype,
                 device=device,
```
This change is required for the custom Paged Attention (PA) kernel on ROCm.
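For context, here is a minimal, self-contained sketch of a KV-cache allocation in the style of the diff above. The function name and the `x` packing factor are assumptions based on the vLLM-style cache layout, not code from this PR:

```python
import torch

def allocate_kv_cache(num_blocks: int, num_heads: int, head_size: int,
                      dtype: torch.dtype, device: str, BLOCK_SIZE: int = 16):
    # Packing factor used by the vLLM-style key-cache layout: elements are
    # grouped so each group is 16 bytes wide (assumption for this sketch).
    x = 16 // torch.tensor([], dtype=dtype).element_size()
    # torch.zeros instead of torch.empty: the custom ROCm paged-attention
    # kernel may touch cache blocks before they have been written, so the
    # buffers should not contain uninitialised garbage.
    key_cache = torch.zeros(
        (num_blocks, num_heads, head_size // x, BLOCK_SIZE, x),
        dtype=dtype,
        device=device,
    )
    value_cache = torch.zeros(
        (num_blocks, num_heads, head_size, BLOCK_SIZE),
        dtype=dtype,
        device=device,
    )
    return key_cache, value_cache

# Example: 128 blocks of 16 tokens for a model with 8 KV heads of size 64.
k, v = allocate_kv_cache(128, 8, 64, torch.float16, "cpu")
```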
@OlivierDehaene @Narsil, could you please review the PR and merge?
Really awesome to see ROCm get up to speed again.
Added a bunch of comments, most of them smaller nitpicks.
server/text_generation_server/models/custom_modeling/flash_cohere_modeling.py (Outdated)
Closing this in favour of #2579
What does this PR do?
This PR introduces various library updates to address breaking changes, including optimisations for ROCm and custom kernels for low-batch-size GEMM and Paged Attention (a small illustrative sketch follows the list). Key improvements are as follows:
- Updated the rocm/vllm dependency to a newer commit.
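As an illustration of how a ROCm-specific low-batch-size GEMM path might be wired in, here is a dispatch sketch. The environment-variable name, batch threshold, and `skinny_gemm` helper are hypothetical and not taken from this PR:

```python
import os
import torch

# Hypothetical flag name, for illustration only.
USE_CUSTOM_KERNELS = os.getenv("TGI_ROCM_CUSTOM_KERNELS", "1") == "1"

def skinny_gemm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Placeholder for a ROCm kernel tuned for very small-batch ("skinny") GEMM.
    # It simply falls back to torch.matmul here so the sketch stays runnable.
    return torch.matmul(a, b)

def linear_forward(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # Route tiny decode batches to the tuned path on ROCm builds
    # (torch.version.hip is None on CUDA/CPU builds); use matmul otherwise.
    if USE_CUSTOM_KERNELS and torch.version.hip is not None and x.shape[0] <= 4:
        return skinny_gemm(x, weight.t())
    return torch.matmul(x, weight.t())

if __name__ == "__main__":
    x = torch.randn(2, 128)
    w = torch.randn(256, 128)
    print(linear_forward(x, w).shape)  # torch.Size([2, 256])
```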