Commit
meta-llama: update llama-stack instruction to 6bcd1bd
Signed-off-by: Dmitry Rogozhkin <[email protected]>
dvrogozh committed Dec 2, 2024
1 parent 20ad388 commit c6a4b10
Showing 2 changed files with 13 additions and 13 deletions.
@@ -1,4 +1,4 @@
-From 3295a112e6e40c1f9cf80374833a20ebad648848 Mon Sep 17 00:00:00 2001
+From cc788054276114390871e5172b1b1e360f14b365 Mon Sep 17 00:00:00 2001
From: Dmitry Rogozhkin <[email protected]>
Date: Mon, 18 Nov 2024 16:00:55 -0800
Subject: [PATCH] feat: enable xpu support for meta-reference stack
@@ -10,10 +10,10 @@ Signed-off-by: Dmitry Rogozhkin <[email protected]>
1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/llama_stack/providers/inline/inference/meta_reference/generation.py b/llama_stack/providers/inline/inference/meta_reference/generation.py
-index 38c9824..aec503c 100644
+index 080e33b..fbced7c 100644
--- a/llama_stack/providers/inline/inference/meta_reference/generation.py
+++ b/llama_stack/providers/inline/inference/meta_reference/generation.py
-@@ -89,7 +89,10 @@ class Llama:
+@@ -91,7 +91,10 @@ class Llama:
llama_model = model.core_model_id.value

if not torch.distributed.is_initialized():
@@ -25,7 +25,7 @@ index 38c9824..aec503c 100644

model_parallel_size = config.model_parallel_size

-@@ -97,7 +100,14 @@ class Llama:
+@@ -99,7 +102,14 @@ class Llama:
initialize_model_parallel(model_parallel_size)

local_rank = int(os.environ.get("LOCAL_RANK", 0))
@@ -41,7 +41,7 @@ index 38c9824..aec503c 100644

# seed must be the same in all processes
if config.torch_seed is not None:
-@@ -175,19 +185,21 @@ class Llama:
+@@ -176,19 +186,21 @@ class Llama:
"Currently int4 and fp8 are the only supported quantization methods."
)
else:
@@ -62,13 +62,13 @@ index 38c9824..aec503c 100644

 + model.to(device)
 +
- print(f"Loaded in {time.time() - start_time:.2f} seconds")
+ log.info(f"Loaded in {time.time() - start_time:.2f} seconds")
 - return Llama(model, tokenizer, model_args, llama_model)
 + return Llama(model, tokenizer, model_args, llama_model, device)

def __init__(
self,
-@@ -195,12 +207,14 @@ class Llama:
+@@ -196,12 +208,14 @@ class Llama:
tokenizer: Tokenizer,
args: ModelArgs,
llama_model: str,
@@ -83,7 +83,7 @@ index 38c9824..aec503c 100644

@torch.inference_mode()
def generate(
-@@ -254,14 +268,14 @@ class Llama:
+@@ -253,14 +267,14 @@ class Llama:
)

pad_id = self.tokenizer.pad_id
@@ -101,7 +101,7 @@ index 38c9824..aec503c 100644
input_text_mask = tokens != pad_id
if min_prompt_len == total_len:
# TODO(ashwin): unify this branch with the one below and figure out multimodal crap
-@@ -273,11 +287,11 @@ class Llama:
+@@ -272,11 +286,11 @@ class Llama:
ignore_index=pad_id,
)

8 changes: 4 additions & 4 deletions meta-llama/run-llama-stack-on-intel-gpu.md
@@ -7,9 +7,9 @@
[llama-stack] provides building blocks to build llama applications. It contains API specifications, API providers and distributions. Distributions can be used to build llama stack servers to serve applications.

-As of [91e7efb] llama-stack requires patches to support Intel GPUs via PyTorch XPU backend:
+As of [6bcd1bd] llama-stack requires patches to support Intel GPUs via PyTorch XPU backend:

-* Patches for [llama-stack] at [91e7efb]:
+* Patches for [llama-stack] at [6bcd1bd]:

* [0001-feat-enable-xpu-support-for-meta-reference-stack.patch]

@@ -207,7 +207,7 @@ The output will be similar to the following (will be on a single line vs. what i
```

[llama-stack]: https://github.com/meta-llama/llama-stack
-[91e7efb]: https://github.com/meta-llama/llama-stack/commit/91e7efbc91c729d74c5cf9b3947d3e8acc1fbb71
+[6bcd1bd]: https://github.com/meta-llama/llama-stack/commit/6bcd1bd9f10a7bdda040e9549828770d5793145b
[0001-feat-enable-xpu-support-for-meta-reference-stack.patch]: patches/llama-stack/0001-feat-enable-xpu-support-for-meta-reference-stack.patch

[llama-models]: https://github.com/meta-llama/llama-models
@@ -218,4 +218,4 @@ The output will be similar to the following (will be on a single line vs. what i
[0001-Add-optional-arg-to-specify-device-for-Transformer-m.patch]: patches/llama-models/0001-Add-optional-arg-to-specify-device-for-Transformer-m.patch
[0002-Add-option-to-initialize-multimodal-model-on-devices.patch]: patches/llama-models/0002-Add-option-to-initialize-multimodal-model-on-devices.patch

-[Meta Reference Distribution]: https://github.com/meta-llama/llama-stack/blob/91e7efbc91c729d74c5cf9b3947d3e8acc1fbb71/docs/source/getting_started/distributions/self_hosted_distro/meta-reference-gpu.md
+[Meta Reference Distribution]: https://github.com/meta-llama/llama-stack/blob/6bcd1bd9f10a7bdda040e9549828770d5793145b/docs/source/distributions/self_hosted_distro/meta-reference-gpu.md
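
Most hunks of the patch are collapsed in this view, so the device handling it adds is only partly visible. The following is a minimal sketch of what such per-rank device selection could look like, not the patch itself. The helper names `pick_device` and `detect_device`, the XPU-then-CUDA-then-CPU preference order, and the CPU fallback are assumptions made for illustration. Only `LOCAL_RANK`, `model.to(device)`, and the extra `device` argument to `Llama(...)` come from the diff.

```python
import os


def pick_device(xpu_available: bool, cuda_available: bool, local_rank: int = 0) -> str:
    """Return a torch-style device string (hypothetical helper).

    Assumed preference order: XPU first, then CUDA, then CPU.
    """
    if xpu_available:
        return f"xpu:{local_rank}"
    if cuda_available:
        return f"cuda:{local_rank}"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch backends, if PyTorch is installed, and pick a device."""
    # LOCAL_RANK is read the same way as in the patched generation.py.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to probe; fall back to CPU.
        return pick_device(False, False, local_rank)
    # Older PyTorch builds may not ship the torch.xpu module at all.
    has_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
    return pick_device(has_xpu, torch.cuda.is_available(), local_rank)


if __name__ == "__main__":
    print(detect_device())
```

The resulting string is what a call such as `model.to(device)` would receive. Keeping the choice in one place means no backend name has to be hard-coded at each call site.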
