Add test for open-llama-3b-v2-f16 model through sharktank. (#272)

Progress on nod-ai/SHARK-Platform#22

This adds one test for a llama model running through https://github.com/nod-ai/sharktank. That project is still getting set up, so new docs for this particular workflow are coming in at nod-ai/SHARK-Platform#69 and tests in that repo are in nod-ai/SHARK-Platform#70.

Specifically, this exercises:

* [`sharktank/models/llama/llama.py`](https://github.com/nod-ai/sharktank/blob/main/sharktank/sharktank/models/llama/llama.py)
* [`sharktank/examples/export_paged_llm_v1.py`](https://github.com/nod-ai/sharktank/blob/main/sharktank/sharktank/examples/export_paged_llm_v1.py) with batch sizes == [4]
* The `open-llama-3b-v2-f16.gguf` file from https://huggingface.co/SlyEcho/open_llama_3b_v2_gguf
* Compilation and crashless execution, _not_ numerical correctness (yet)

Ideas for future work:

* Test cases for the same model/parameters
  * Other batch sizes
  * `decode()` as well as `prefill()`
  * Real inputs with expected outputs (`decode()` crashes on some faked inputs still 🤔)
  * Other flag combinations and target configurations (starting simple though)
* Test cases for other models/parameters
  * 8b / 70b parameter models
  * Mistral, Mixtral, Gemma, etc.
nod-ai · Jun 28, 2024 · 3603a45
1 parent 4486c44

Showing 6 changed files with 50 additions and 3 deletions.
diff --git a/.github/workflows/test_iree.yml b/.github/workflows/test_iree.yml
@@ -146,12 +146,15 @@ jobs:
         run: |
           source ${VENV_DIR}/bin/activate
           python3 iree_tests/download_remote_files.py --root-dir pytorch/models
+          python3 iree_tests/download_remote_files.py --root-dir sharktank
 
-      - name: "Running real weights model tests"
-        if: ${{ !cancelled() }}
+      - name: "Running real weight model tests"
+        if: "matrix.models-config-file != '' && !cancelled()"
         run: |
           source ${VENV_DIR}/bin/activate
-          pytest iree_tests/pytorch/models \
+          pytest \
+            iree_tests/pytorch/models \
+            iree_tests/sharktank \
             -n 4 \
             -rpfE \
             -k real_weights \
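
The workflow now also downloads remote files for the `sharktank` root, and the new flagfile refers to the GGUF weights only by bare file name (`--parameters=model=open-llama-3b-v2-f16.gguf`). A minimal sketch of that naming contract, assuming `download_remote_files.py` (not shown in this commit) saves each file next to `test_cases.json` under the last component of its URL path. Both helpers are hypothetical, and the `[scope=]path` handling follows IREE's `--parameters` flag form.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_file_name(url: str) -> str:
    """Hypothetical: the name a downloader would save a remote file under.

    Assumes the file is stored as the last component of the URL path.
    """
    return PurePosixPath(urlparse(url).path).name


def parameter_file_from_flag(flag: str) -> str:
    """Hypothetical: extract the file from --parameters=[scope=]path."""
    prefix = "--parameters="
    if not flag.startswith(prefix):
        raise ValueError(f"not a --parameters flag: {flag!r}")
    value = flag[len(prefix):]
    # With a scope ("model=file.gguf") keep the part after the first "=";
    # without one, the whole value is the path.
    _scope, sep, path = value.partition("=")
    return path if sep else value


url = (
    "https://huggingface.co/SlyEcho/open_llama_3b_v2_gguf"
    "/resolve/main/open-llama-3b-v2-f16.gguf"
)
flag = "--parameters=model=open-llama-3b-v2-f16.gguf"
print(local_file_name(url))
print(local_file_name(url) == parameter_file_from_flag(flag))
```

If the two names ever diverge, the runtime would fail to find the parameter file even though the download succeeded, which is why a check like this is a cheap guard when adding new test cases.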

diff --git a/iree_tests/README.md b/iree_tests/README.md
@@ -413,6 +413,25 @@ Then, run the runner with the appropriate command line args (vmfb path, device f
 You should have all the artifacts needed to add to this TestSuite at that point.
 Make sure to follow appendix instructions to convert between different file types for weights and mlir.
 
+### SHARK Tank models
+
+These test cases are exported from https://github.com/nod-ai/sharktank.
+
+#### Steps to add test cases
+
+* Follow instructions in https://github.com/nod-ai/sharktank/blob/main/docs/model_cookbook.md
+* Convert the exported `.mlir` to `.mlirbc`:
+
+    ```bash
+    iree-ir-tool cp file.mlir --emit-bytecode -o file.mlirbc
+    ```
+
+* Create a `test_cases.json` file with parameters, inputs, and outputs
+  * Parameters can come from Hugging Face by using the URL from the
+    "download file" link
+  * TODO: inputs and outputs should be exportable from sharktank/shortfin
+    (or from a script here; this needs to run the tokenizer and optionally
+    populate the KV cache for some models)
+
 ## Appendix
 
 ### Working with .mlirbc files
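
The "Create a `test_cases.json` file" step above can be scripted. A minimal sketch, assuming only the three per-case fields that this commit's `test_cases.json` uses (`name`, `runtime_flagfile`, `remote_files`) plus the `test_cases_v0` file format tag; the helper names are hypothetical and not part of `iree_tests`.

```python
import json


def make_test_case(name: str, flagfile: str, remote_files: list[str]) -> dict:
    """Hypothetical helper: one entry in the test_cases_v0 layout."""
    return {
        "name": name,
        "runtime_flagfile": flagfile,
        "remote_files": list(remote_files),
    }


def make_test_cases_file(cases: list[dict]) -> str:
    """Serialize as plain JSON (no comments, no trailing commas)."""
    document = {"file_format": "test_cases_v0", "test_cases": cases}
    return json.dumps(document, indent=2) + "\n"


case = make_test_case(
    "real_weights_prefill",
    "real_weights_prefill_data_flags.txt",
    [
        "https://huggingface.co/SlyEcho/open_llama_3b_v2_gguf"
        "/resolve/main/open-llama-3b-v2-f16.gguf"
    ],
)
text = make_test_cases_file([case])
print(text)
```

Standard `json` output cannot carry `// TODO` comments like the ones in the checked-in file, so any such notes would be added by hand afterwards.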

diff --git a/iree_tests/configs/models_gpu_rocm_gfx90a.json b/iree_tests/configs/models_gpu_rocm_gfx90a.json
@@ -18,6 +18,9 @@
     "expected_compile_failures": [
       "pytorch/models/opt-125M", // TODO(#17344): need to regenerate .mlirbc
       "pytorch/models/resnet50",
+      // error: 'builtin.module' op failed to run transform dialect passes
+      // (might need to drop the iree-codegen-transform-dialect-library flag)
+      "sharktank/llama/open-llama-3b-v2-f16"
     ],
     "expected_run_failures": []
 }
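
The new entry is keyed by the test directory path relative to `iree_tests/`, next to the existing `pytorch/models/...` entries. A sketch of how a runner might consult these lists, assuming exact path matching on an already-parsed config; the real test harness may match differently, and `expected_outcome` is a hypothetical name.

```python
def expected_outcome(test_dir: str, config: dict) -> str:
    """Hypothetical: classify a test directory against a models config."""
    # Normalize separators and trailing slashes before exact matching.
    test_dir = test_dir.replace("\\", "/").rstrip("/")
    if test_dir in config.get("expected_compile_failures", []):
        return "xfail-compile"
    if test_dir in config.get("expected_run_failures", []):
        return "xfail-run"
    return "pass"


# The gfx90a config after this commit, with its // comments removed.
config = {
    "expected_compile_failures": [
        "pytorch/models/opt-125M",
        "pytorch/models/resnet50",
        "sharktank/llama/open-llama-3b-v2-f16",
    ],
    "expected_run_failures": [],
}
print(expected_outcome("sharktank/llama/open-llama-3b-v2-f16", config))
```

Listing the test as an expected compile failure keeps the gfx90a job green while the transform dialect error is investigated, instead of disabling the test everywhere.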
diff --git a/iree_tests/sharktank/llama/open-llama-3b-v2-f16/open-llama-3b-v2-f16.mlirbc b/iree_tests/sharktank/llama/open-llama-3b-v2-f16/open-llama-3b-v2-f16.mlirbc
diff --git a/iree_tests/sharktank/llama/open-llama-3b-v2-f16/real_weights_prefill_data_flags.txt b/iree_tests/sharktank/llama/open-llama-3b-v2-f16/real_weights_prefill_data_flags.txt
@@ -0,0 +1,6 @@
+--parameters=model=open-llama-3b-v2-f16.gguf
+--function=prefill_bs4
+--input=4x1xi64=0
+--input=4xi64=1
+--input=4x1xi64=0,1,2,3
+--input=1x2662400xf16
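
Each `--input` flag encodes a shape, an element type, and optional comma-separated values in IREE's `<dims>x<dtype>=<values>` form; the leading `4` in most shapes matches the batch size in `prefill_bs4`. A minimal parser for just the forms used in this flagfile makes the buffer sizes concrete: the last input is 1 x 2,662,400 f16 elements, or 5,324,800 bytes. This is a sketch only. It does not model how the runtime fills a buffer when fewer values than elements (or only a shape) are given, and it does not handle other IREE input forms such as file references.

```python
from dataclasses import dataclass
from math import prod

# Bytes per element for the dtypes that appear in this flagfile.
DTYPE_BYTES = {"i64": 8, "f16": 2}


@dataclass
class InputSpec:
    shape: tuple[int, ...]
    dtype: str
    values: list[str]  # empty when the flag gives only a shape

    @property
    def num_elements(self) -> int:
        return prod(self.shape)

    @property
    def num_bytes(self) -> int:
        return self.num_elements * DTYPE_BYTES[self.dtype]


def parse_input_flag(flag: str) -> InputSpec:
    """Parse --input=<dims>x<dtype>[=<comma-separated values>]."""
    prefix = "--input="
    if not flag.startswith(prefix):
        raise ValueError(f"not an --input flag: {flag!r}")
    type_part, _, value_part = flag[len(prefix):].partition("=")
    *dims, dtype = type_part.split("x")
    values = value_part.split(",") if value_part else []
    return InputSpec(tuple(int(d) for d in dims), dtype, values)


flags = [
    "--input=4x1xi64=0",
    "--input=4xi64=1",
    "--input=4x1xi64=0,1,2,3",
    "--input=1x2662400xf16",
]
for flag in flags:
    spec = parse_input_flag(flag)
    print(spec.shape, spec.dtype, len(spec.values), spec.num_bytes)
```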
diff --git a/iree_tests/sharktank/llama/open-llama-3b-v2-f16/test_cases.json b/iree_tests/sharktank/llama/open-llama-3b-v2-f16/test_cases.json
@@ -0,0 +1,13 @@
+{
+  "file_format": "test_cases_v0",
+  "test_cases": [
+    {
+      "name": "real_weights_prefill",
+      "runtime_flagfile": "real_weights_prefill_data_flags.txt",
+      "remote_files": [
+        "https://huggingface.co/SlyEcho/open_llama_3b_v2_gguf/resolve/main/open-llama-3b-v2-f16.gguf",
+        // TODO: files for real inputs and real expected outputs
+      ]
+    }
+  ]
+}
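
The new `test_cases.json` (like the GPU config above) contains `//` comments and a trailing comma after the last URL, which Python's standard `json` module rejects. The harness presumably parses these files with a more lenient loader; this commit does not show which. A minimal stdlib-only sketch that strips them before parsing, assuming only `//` line comments (no `/* */` blocks). It tracks string literals so that the `//` inside the Hugging Face URL survives. The function names are hypothetical.

```python
import json


def _strip_line_comments(text: str) -> str:
    """Remove // comments that appear outside string literals."""
    out, i, in_string = [], 0, False
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                out.append(text[i + 1])  # copy the escaped character as-is
                i += 2
                continue
            in_string = ch != '"'
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            while i < len(text) and text[i] != "\n":
                i += 1  # skip the comment but keep the newline
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def _strip_trailing_commas(text: str) -> str:
    """Drop commas followed only by whitespace and then ] or }."""
    out, i, in_string = [], 0, False
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                out.append(text[i + 1])
                i += 2
                continue
            in_string = ch != '"'
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == ",":
            j = i + 1
            while j < len(text) and text[j] in " \t\r\n":
                j += 1
            if j >= len(text) or text[j] not in "]}":
                out.append(ch)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def load_json_with_comments(text: str):
    # Comments go first so a comment between a comma and "]" is gone
    # by the time trailing commas are detected.
    return json.loads(_strip_trailing_commas(_strip_line_comments(text)))


SAMPLE = """{
  "file_format": "test_cases_v0",
  "test_cases": [
    {
      "name": "real_weights_prefill",
      "runtime_flagfile": "real_weights_prefill_data_flags.txt",
      "remote_files": [
        "https://huggingface.co/SlyEcho/open_llama_3b_v2_gguf/resolve/main/open-llama-3b-v2-f16.gguf",
        // TODO: files for real inputs and real expected outputs
      ]
    }
  ]
}"""
print(load_json_with_comments(SAMPLE)["test_cases"][0]["remote_files"])
```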