Add test for open-llama-3b-v2-f16 model through sharktank. (#272)
Progress on nod-ai/SHARK-Platform#22

This adds one test for a llama model running through
https://github.com/nod-ai/sharktank. That project is still getting set
up, so new docs for this particular workflow are coming in at
nod-ai/SHARK-Platform#69 and tests in that repo are
in nod-ai/SHARK-Platform#70.

Specifically, this exercises:
* [`sharktank/models/llama/llama.py`](https://github.com/nod-ai/sharktank/blob/main/sharktank/sharktank/models/llama/llama.py)
* [`sharktank/examples/export_paged_llm_v1.py`](https://github.com/nod-ai/sharktank/blob/main/sharktank/sharktank/examples/export_paged_llm_v1.py)
  with batch sizes == [4]
* The `open-llama-3b-v2-f16.gguf` file from
https://huggingface.co/SlyEcho/open_llama_3b_v2_gguf
* Compilation and crashless execution, _not_ numerical correctness (yet)

Ideas for future work:

* Test cases for the same model/parameters
  * Other batch sizes
  * `decode()` as well as `prefill()`
* Real inputs with expected outputs (`decode()` crashes on some faked
inputs still 🤔)
* Other flag combinations and target configurations (starting simple
though)
* Test cases for other models/parameters
  * 8b / 70b parameter models
  * Mistral, Mixtral, Gemma, etc.
ScottTodd authored Jun 28, 2024
1 parent 4486c44 commit 3603a45
Showing 6 changed files with 50 additions and 3 deletions.
9 changes: 6 additions & 3 deletions .github/workflows/test_iree.yml
@@ -146,12 +146,15 @@ jobs:
 run: |
 source ${VENV_DIR}/bin/activate
 python3 iree_tests/download_remote_files.py --root-dir pytorch/models
+python3 iree_tests/download_remote_files.py --root-dir sharktank
-- name: "Running real weights model tests"
-if: ${{ !cancelled() }}
+- name: "Running real weight model tests"
+if: "matrix.models-config-file != '' && !cancelled()"
 run: |
 source ${VENV_DIR}/bin/activate
-pytest iree_tests/pytorch/models \
+pytest \
+iree_tests/pytorch/models \
+iree_tests/sharktank \
 -n 4 \
 -rpfE \
 -k real_weights \
19 changes: 19 additions & 0 deletions iree_tests/README.md
@@ -413,6 +413,25 @@ Then, run the runner with the appropriate command line args (vmfb path, device f
 You should have all the artifacts needed to add to this TestSuite at that point.
 Make sure to follow to follow appendix instructions to convert between different file types for weights and mlir.
+### SHARK Tank models
+These test cases are exported from https://github.com/nod-ai/sharktank.
+## Steps to add test cases
+* Follow instructions in https://github.com/nod-ai/sharktank/blob/main/docs/model_cookbook.md
+* Convert the exported `.mlir` to `.mlirbc`:
+```bash
+iree-ir-tool cp file.mlir --emit-bytecode -o file.mlirbc
+```
+* Create a test_cases.json file with parameters, inputs, and outputs
+* Parameters can come from Hugging Face by using URL from "download file"
+* TODO: inputs and outputs should be exportable from sharktank/shortfin
+(or a script here - need to run the tokenizer and optionally populate the
+KV cache for some models)
 ## Appendix
 ### Working with .mlirbc files
3 changes: 3 additions & 0 deletions iree_tests/configs/models_gpu_rocm_gfx90a.json
@@ -18,6 +18,9 @@
 "expected_compile_failures": [
 "pytorch/models/opt-125M", // TODO(#17344): need to regenerate .mlirbc
 "pytorch/models/resnet50",
+// error: 'builtin.module' op failed to run transform dialect passes
+// (might need to drop the iree-codegen-transform-dialect-library flag)
+"sharktank/llama/open-llama-3b-v2-f16"
 ],
 "expected_run_failures": []
 }
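The role of `expected_compile_failures` can be illustrated with a small sketch. This is not the test runner's actual code, which this commit does not show; the function name `classify_compile_result` and the four outcome labels are hypothetical, chosen only to show how a list of expected failures typically changes the interpretation of a compile result.

```python
def classify_compile_result(
    test_id: str, compiled_ok: bool, expected_failures: set[str]
) -> str:
    """Interpret a compile result against a list of expected failures.

    Hypothetical helper: the labels below are illustrative and are not
    taken from the iree_tests runner.
    """
    if test_id in expected_failures:
        # A listed test that starts compiling should be flagged so the
        # entry can be removed from the config file.
        return "unexpected_pass" if compiled_ok else "expected_failure"
    return "pass" if compiled_ok else "failure"


# Entries copied from the config hunk above.
expected = {
    "pytorch/models/opt-125M",
    "pytorch/models/resnet50",
    "sharktank/llama/open-llama-3b-v2-f16",
}
print(classify_compile_result("sharktank/llama/open-llama-3b-v2-f16", False, expected))
```

Under this reading, the new llama test is listed for gfx90a so that its known transform dialect error does not turn the job red, while a later fix would surface as an unexpected pass.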
Git LFS file not shown
@@ -0,0 +1,6 @@
+--parameters=model=open-llama-3b-v2-f16.gguf
+--function=prefill_bs4
+--input=4x1xi64=0
+--input=4xi64=1
+--input=4x1xi64=0,1,2,3
+--input=1x2662400xf16
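The `--input` flags above pack a shape, an element type, and optional values into one string. A minimal Python sketch of how such a string decomposes follows; `InputSpec` and `parse_input_flag` are hypothetical names for illustration and are not part of IREE or this test suite.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class InputSpec:
    shape: tuple[int, ...]
    dtype: str
    values: list[str]  # raw value tokens; empty when the flag gives none


def parse_input_flag(flag: str) -> InputSpec:
    """Split a flag such as '--input=4x1xi64=0,1,2,3' into its parts."""
    prefix = "--input="
    if not flag.startswith(prefix):
        raise ValueError(f"not an --input flag: {flag!r}")
    # Everything before the second '=' is the type, the rest is the data.
    spec, _, raw_values = flag[len(prefix):].partition("=")
    # The last 'x'-separated token is the element type, the others are dims.
    *dims, dtype = spec.split("x")
    shape = tuple(int(d) for d in dims)
    values = raw_values.split(",") if raw_values else []
    return InputSpec(shape, dtype, values)


for flag in [
    "--input=4x1xi64=0",
    "--input=4xi64=1",
    "--input=4x1xi64=0,1,2,3",
    "--input=1x2662400xf16",
]:
    spec = parse_input_flag(flag)
    print(spec.shape, spec.dtype, prod(spec.shape), spec.values)
```

Two of the flags give a single value for a four-element tensor, so that value is presumably broadcast to every element, and the last flag gives no values at all, which suggests a default fill. Both readings are assumptions about the runtime tool rather than something this commit states. The size 2662400 is consistent with 26 layers × 2 (K and V) × 16 tokens per block × 32 heads × 100 head dimensions, but those factors are likewise an assumption about the model and cache layout, not taken from the diff.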
13 changes: 13 additions & 0 deletions iree_tests/sharktank/llama/open-llama-3b-v2-f16/test_cases.json
@@ -0,0 +1,13 @@
+{
+"file_format": "test_cases_v0",
+"test_cases": [
+{
+"name": "real_weights_prefill",
+"runtime_flagfile": "real_weights_prefill_data_flags.txt",
+"remote_files": [
+"https://huggingface.co/SlyEcho/open_llama_3b_v2_gguf/resolve/main/open-llama-3b-v2-f16.gguf",
+// TODO: files for real inputs and real expected outputs
+]
+}
+]
+}
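Note that this file contains a `//` comment and a trailing comma, so a strict JSON parser such as Python's `json.loads` would reject it as written. The sketch below shows one way a reader could load it anyway; it is not how `download_remote_files.py` works (that script is not shown in this commit), and `strip_json_comments` is a hypothetical helper.

```python
import json
import re


def strip_json_comments(text: str) -> str:
    """Remove // line comments and trailing commas, leaving strings intact."""
    out = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Copy the escaped character so an escaped quote
                # does not end the string early.
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    # Simplification: this regex does not check whether the comma
    # sits inside a string literal.
    return re.sub(r",(\s*[\]}])", r"\1", "".join(out))


sample = """
{
  "file_format": "test_cases_v0",
  "test_cases": [
    {
      "name": "real_weights_prefill",
      "runtime_flagfile": "real_weights_prefill_data_flags.txt",
      "remote_files": [
        "https://huggingface.co/SlyEcho/open_llama_3b_v2_gguf/resolve/main/open-llama-3b-v2-f16.gguf",
        // TODO: files for real inputs and real expected outputs
      ]
    }
  ]
}
"""

config = json.loads(strip_json_comments(sample))
for case in config["test_cases"]:
    print(case["name"], case["remote_files"])
```

The string-aware scan matters here because the URL itself contains `//`, which a naive comment-stripping regex would truncate.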
