Consolidate the stateless llama logic #729
base: main
Conversation
Why are we pulling this into @monorimet's branch? This should be fine standalone; @monorimet can rebase after merging it. That way we get good test coverage.
```
device_inputs = [
    ireert.asdevicearray(self.device, input_tensor)
]
if self.first_input:  # or not self.streaming_llm:
```
Commented-out streaming LLM code? Is this from the original?
I redid some of the logic to remove non-streaming since I thought our plan was to only support streaming, but I think that's not actually the case. I'll add the support back in.
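For reference, a minimal sketch of what the restored branch could look like, assuming exported entry points named `run_initialize` (full prompt) and `run_cached_initialize` (cached follow-up); the method name and entry-point names are guesses, not taken from this diff:

```python
def generate(self, input_tensor):
    device_inputs = [ireert.asdevicearray(self.device, input_tensor)]
    if self.first_input or not self.streaming_llm:
        # Non-streaming (or first) call: build the KV cache from the full prompt.
        token = self.model["run_initialize"](*device_inputs)
        self.first_input = False
    else:
        # Streaming follow-up: append to the cached KV state instead.
        token = self.model["run_cached_initialize"](*device_inputs)
    return token
```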
```
@@ -3,6 +3,7 @@
import re
import json
from turbine_models.turbine_tank import turbine_tank
from pathlib import Path
```
Unused import?
```
@@ -489,26 +491,362 @@ def evict_kvcache_space(self):
        return blob_name, tokenizer


llm_model_map = {
    "meta-llama/Llama-2-7b-chat-hf": {
```
This doesn't support the larger models we care about, like 13B and 70B.
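If the map stays keyed on Hugging Face model IDs, the larger chat variants could slot in next to the 7B entry. A sketch (the entry bodies are placeholders, since the diff doesn't show the real fields):

```python
llm_model_map = {
    "meta-llama/Llama-2-7b-chat-hf": {
        # ... existing 7B settings ...
    },
    # The larger chat variants use the same Hugging Face naming scheme:
    "meta-llama/Llama-2-13b-chat-hf": {
        # same knobs as the 7B entry, with sizes adjusted
    },
    "meta-llama/Llama-2-70b-chat-hf": {
        # 70B uses grouped-query attention, so KV-cache shapes differ
    },
}
```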
```
@@ -489,26 +491,362 @@ def evict_kvcache_space(self):
        return blob_name, tokenizer


llm_model_map = {
```
This might belong in a separate config file to reduce clutter. It's also limiting to hard-code this without some default setup.
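As a sketch of that suggestion, the map could be loaded from a bundled JSON file with an optional override; the file name and schema here are assumptions:

```python
import json
from pathlib import Path

# Hypothetical bundled defaults shipped alongside the module.
_DEFAULT_CONFIG = Path(__file__).parent / "llm_model_configs.json"

def load_llm_model_map(config_path: str | None = None) -> dict:
    """Load per-model settings from JSON, falling back to bundled defaults."""
    path = Path(config_path) if config_path else _DEFAULT_CONFIG
    with open(path) as f:
        return json.load(f)

llm_model_map = load_llm_model_map()
```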
```
pipeline_dir: str | Path = "./shark_vmfbs",
external_weights_dir: str | Path = "./shark_weights",
external_weights: str = "safetensors",
custom_vae: str = None,
```
Remove custom_vae and the other unnecessary flags that look to have come over from the SD code (scheduler, etc.).
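A trimmed constructor might look something like this (which parameters survive is a guess):

```python
from pathlib import Path

class StatelessLlamaPipeline:
    def __init__(
        self,
        hf_model_name: str,
        pipeline_dir: str | Path = "./shark_vmfbs",
        external_weights: str = "safetensors",
    ):
        # No custom_vae, scheduler, or other SD-only options.
        self.hf_model_name = hf_model_name
        self.pipeline_dir = Path(pipeline_dir)
        self.external_weights = external_weights
```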
```
}


class StatelessLlamaPipeline:
```
Llama doesn't really have a pipeline; we might want to remove the pipeline references.
```
# FILE MANAGEMENT AND PIPELINE SETUP

def check_prepared(
```
The file management looks to be copied from the SD code. Can we combine the two to reduce repeated code?
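For example, both pipelines could call a small shared helper along these lines (module path and function name are hypothetical):

```python
# e.g. turbine_models/utils/artifacts.py -- hypothetical shared module
from pathlib import Path

def check_prepared_artifacts(pipeline_dir, names):
    """Return (found, missing) artifacts under pipeline_dir.

    Usable by both the SD and llama pipelines instead of two copies.
    """
    pipeline_dir = Path(pipeline_dir)
    found, missing = [], []
    for name in names:
        path = pipeline_dir / name
        if path.exists():
            found.append(path)
        else:
            missing.append(name)
    return found, missing
```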
```
# RUN

def chat(self, prompt):
```
Not used? It looks like this should be part of llm_runner.py; stateless_llama.py should just be for tracing, generating IR, and/or compiling vmfbs.
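In other words, the split might look like this; `export_transformer_model` appears in this PR, while `run_llm` is a hypothetical runner entry point:

```python
# stateless_llama.py -- export/compile only
def export_transformer_model(hf_model_name, **kwargs):
    """Trace the model, emit MLIR, and optionally compile a vmfb."""

# llm_runner.py -- runtime only
def run_llm(vmfb_path, prompt, device="local-task"):
    """Load the compiled vmfb and drive token generation; chat lives here."""
```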
Force-pushed from 5a9aaa0 to 16ee249.
.github/workflows/test_models.yml (outdated)
```
pytest -v models/turbine_models/tests/sdxl_test.py --device vulkan --rt_device vulkan --iree_target_triple rdna3-unknown-linux
pytest -v models/turbine_models/tests/sdxl_test.py --device rocm --rt_device hip --iree_target_triple gfx90a --precision fp16
```
Any SDXL-related changes should be moved to a different PR.
```
@@ -0,0 +1,169 @@
// Copyright 2024 The IREE Authors
```
This is only for argmax, right? We should also pull in all of the transform spec changes from the SDXL spec file that apply here.
```
##############################################################################

p.add_argument(
    "--seed", type=float, default=0, help="Seed for random number/latents generation."
```
Llama doesn't need a seed, does it?
```
    help="Path to location of vmfb files.",
)

p.add_argument(
```
These weight flags are unnecessary for llama. We're only using one external weight file, so we could remove external_weights_dir, and I don't think we need external_weight_file below either.
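A consolidated version could keep a single flag; the flag name here is an assumption:

```python
import argparse

p = argparse.ArgumentParser()
# One flag instead of external_weights_dir + external_weight_file:
p.add_argument(
    "--external_weight_path",
    type=str,
    default="",
    help="Path to the single external weight file (e.g. a .safetensors file).",
)
```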
```
@@ -8,7 +8,7 @@
from iree.compiler.ir import Context
```
Let's keep the separate model updates in separate patches. That makes it easier to track and revert patches if ever needed.
```
@@ -90,20 +90,42 @@ def test_vmfb_comparison(self):

    upload_ir_var = os.environ.get("TURBINE_TANK_ACTION", "not_upload")

    blob_name = llama.export_transformer_model(
    # blob_name = llama.export_transformer_model(
```
Commented-out code?
(And accidentally undo some cleanup, oops)
Force-pushed from 0bfa2d9 to 6db0f19.
It's producing vmfbs now; it just needs some more cleanup, plus vmfb runner logic if we want to do that here.
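For the runner piece, a minimal sketch of loading and invoking a compiled vmfb with the IREE runtime; the vmfb path, module name, and entry-point name are assumptions:

```python
import numpy as np
import iree.runtime as ireert

# Assumed output path for illustration only.
VMFB_PATH = "shark_vmfbs/stateless_llama.vmfb"

config = ireert.Config("local-task")  # CPU driver; swap for "vulkan", "rocm", etc.
vm_module = ireert.VmModule.mmap(config.vm_instance, VMFB_PATH)
ctx = ireert.SystemContext(vm_modules=[vm_module], config=config)

# The exported entry-point name depends on how the model was traced;
# "run_initialize" is a guess based on the export code in this PR.
main = ctx.modules.module["run_initialize"]
input_ids = np.zeros((1, 16), dtype=np.int64)  # dummy token ids
result = main(ireert.asdevicearray(config.device, input_ids))
print(result.to_host())
```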