#15737: Fix rope scaling factor for 1B and 3B models, improve acc test
yieldthought committed Dec 11, 2024
1 parent 79e0c68 commit 34dc167
Showing 30 changed files with 430 additions and 164 deletions.
52 changes: 26 additions & 26 deletions models/demos/llama3/PERF.md
@@ -1,43 +1,43 @@
# Llama 3 model performance and accuracy

-Performance collected from [demo/demo.py](demo/demo.py) and accuracy collected from [tests/test_llama_accuracy.py](tests/test_llama_accuracy.py). You can generate this table by running these tests with the `lt` tool (tell it to run `accuracy,demo`) and pressing `m` whilst in the results section to export to markdown.
+Performance collected from [demo/demo.py](demo/demo.py) and accuracy collected from [tests/test_llama_accuracy.py](tests/test_llama_accuracy.py). You can generate this table by running these tests with the `lt` tool (tell it to run `table`) and pressing `m` whilst in the results section to export to markdown.

-Note that `test_llama_accuracy.py` parses the below to determine expected values.
+Note that `test_llama_accuracy.py` parses the below to determine expected values +- 0.5.

## LlamaOptimizations.performance

This configuration uses bfp4 MLP FF1+FF3 for all models.

| Model | Device | Top-1 (%) | Top-5 (%) | Speed (t/s/u) |
|-------|--------|-----------|-----------|---------------|
-| 1b | N150 | 79 | 98 | 90.5 |
-| 1b | N300 | 81 | 98 | 101.7 |
-| 1b | T3K | 81 | 98 | 96.8 |
-| 3b | N150 | 85 | 96 | 49.0 |
-| 3b | N300 | 88 | 97 | 56.9 |
-| 3b | T3K | 88 | 97 | 54.5 |
-| 8b | N150 | 86 | 98 | 28.4 |
-| 8b | N300 | 84 | 98 | 38.6 |
-| 8b | T3K | 84 | 97 | 52.6 |
-| 11b | N300 | 86 | 97 | 38.6 |
-| 11b | T3K | 84 | 98 | 52.6 |
-| 70b | T3K | 94 | 100 | 14.3 |
+| 1b | N150 | 88 | 98 | 85.6 |
+| 1b | N300 | 88 | 98 | 93.6 |
+| 1b | T3K | 88 | 98 | 90.5 |
+| 3b | N150 | 89 | 98 | 46.3 |
+| 3b | N300 | 91 | 98 | 52.8 |
+| 3b | T3K | 89 | 98 | 52.0 |
+| 8b | N150 | 87 | 98 | 27.5 |
+| 8b | N300 | 86 | 98 | 36.5 |
+| 8b | T3K | 84 | 97 | 46.7 |
+| 11b | N300 | 88 | 98 | 36.4 |
+| 11b | T3K | 87 | 98 | 46.8 |
+| 70b | T3K | 94 | 100 | 13.9 |

## LlamaOptimizations.accuracy

This configuration uses bfp4 MLP FF1+FF3 only for the 3.1-70B model.

| Model | Device | Top-1 (%) | Top-5 (%) | Speed (t/s/u) |
|-------|--------|-----------|-----------|---------------|
-| 1b | N150 | 77 | 96 | 85.8 |
-| 1b | N300 | 80 | 98 | 98.6 |
-| 1b | T3K | 78 | 98 | 97.2 |
-| 3b | N150 | 88 | 98 | 44.1 |
-| 3b | N300 | 88 | 98 | 53.9 |
-| 3b | T3K | 88 | 98 | 54.8 |
-| 8b | N150 | 89 | 98 | 23.5 |
-| 8b | N300 | 90 | 98 | 34.1 |
-| 8b | T3K | 88 | 97 | 49.9 |
-| 11b | N300 | 90 | 97 | 33.8 |
-| 11b | T3K | 88 | 97 | 52.6 |
-| 70b | T3K | 94 | 100 | 14.5 |
+| 1b | N150 | 88 | 98 | 81.7 |
+| 1b | N300 | 88 | 98 | 91.5 |
+| 1b | T3K | 88 | 98 | 87.8 |
+| 3b | N150 | 89 | 98 | 41.9 |
+| 3b | N300 | 91 | 98 | 50.4 |
+| 3b | T3K | 89 | 98 | 51.4 |
+| 8b | N150 | 87 | 98 | 22.9 |
+| 8b | N300 | 86 | 98 | 32.8 |
+| 8b | T3K | 84 | 97 | 46.0 |
+| 11b | N300 | 88 | 98 | 32.4 |
+| 11b | T3K | 87 | 98 | 44.1 |
+| 70b | T3K | 94 | 100 | 13.9 |
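
PERF.md notes that `test_llama_accuracy.py` parses these tables to obtain expected values with a tolerance of 0.5. The test's actual parsing code is not part of this diff, so the following is only a minimal sketch of the idea. The names `parse_expected` and `within_tolerance` are hypothetical, the sample rows are copied from the updated performance table, and the symmetric reading of "+- 0.5" is an assumption.

```python
from typing import Dict, Tuple

# Two rows copied from the updated LlamaOptimizations.performance table.
SAMPLE = """\
| Model | Device | Top-1 (%) | Top-5 (%) | Speed (t/s/u) |
|-------|--------|-----------|-----------|---------------|
| 1b | N150 | 88 | 98 | 85.6 |
| 3b | N300 | 91 | 98 | 52.8 |
"""


def parse_expected(markdown: str) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Hypothetical helper: map (model, device) to (top-1, top-5) percentages."""
    expected = {}
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 5:
            continue  # not a row of the five-column results table
        model, device, top1, top5, _speed = cells
        try:
            expected[(model, device)] = (float(top1), float(top5))
        except ValueError:
            continue  # header and separator rows are not numeric
    return expected


def within_tolerance(measured: float, expected: float, tol: float = 0.5) -> bool:
    """Assumed reading of '+- 0.5': measured may differ from expected by at most tol."""
    return abs(measured - expected) <= tol


if __name__ == "__main__":
    table = parse_expected(SAMPLE)
    top1, top5 = table[("1b", "N150")]
    print(within_tolerance(88.3, top1), within_tolerance(97.0, top5))
```

This sketch ignores which section (`performance` or `accuracy`) a row belongs to, so a parser for the whole file would also need to track the `##` heading above each table.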
6 changes: 5 additions & 1 deletion models/demos/llama3/demo/demo.py
@@ -355,7 +355,11 @@ def run_llama3_demo(
 for batch_id in range(batch_size):
     prefill_seq_len = prefill_lens[batch_id]
     rot_mats_prefill = get_prefill_rot_mat(
-        model_args.head_dim, model_args.max_seq_len, mesh_device, seq_len=prefill_seq_len
+        model_args.head_dim,
+        model_args.max_seq_len,
+        mesh_device,
+        seq_len=prefill_seq_len,
+        scale_factor=model_args.rope_scaling_factor,
     )
     if decoding_pos[batch_id] < prefill_seq_len:
         pt_prefill_input[batch_id][
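
The hunk above passes `model_args.rope_scaling_factor` into `get_prefill_rot_mat` instead of relying on a default. The body of `get_prefill_rot_mat` is not shown in this diff, so the sketch below is not its implementation. It only illustrates, in plain Python, the kind of wavelength-dependent frequency scaling commonly used for Llama 3.x rotary embeddings, to show why the factor affects the rotation matrices. The function names and the defaults (`theta`, `low_freq_factor`, `high_freq_factor`, `original_context_len`) are assumptions for illustration.

```python
import math
from typing import List


def base_frequencies(head_dim: int, theta: float = 500000.0) -> List[float]:
    """Unscaled RoPE frequencies, one per pair of head dimensions (theta is assumed)."""
    return [1.0 / (theta ** (i / head_dim)) for i in range(0, head_dim, 2)]


def scale_frequencies(
    freqs: List[float],
    scale_factor: float,
    low_freq_factor: float = 1.0,      # assumed default
    high_freq_factor: float = 4.0,     # assumed default
    original_context_len: int = 8192,  # assumed default
) -> List[float]:
    """Leave short wavelengths alone, divide long ones by scale_factor, blend in between."""
    low_freq_wavelen = original_context_len / low_freq_factor
    high_freq_wavelen = original_context_len / high_freq_factor
    scaled = []
    for freq in freqs:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            scaled.append(freq)  # high-frequency band: unchanged
        elif wavelen > low_freq_wavelen:
            scaled.append(freq / scale_factor)  # low-frequency band: fully scaled
        else:
            # Linear blend between the scaled and unscaled frequency.
            smooth = (original_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1 - smooth) * freq / scale_factor + smooth * freq)
    return scaled


if __name__ == "__main__":
    freqs = base_frequencies(head_dim=64)
    for factor in (8.0, 32.0):  # illustrative factors only
        scaled = scale_frequencies(freqs, factor)
        print(factor, scaled[-1] / freqs[-1])
```

Under this scheme a different factor changes every low-frequency component of the rotary embedding, so prefill rotation matrices built with the wrong factor would not match the ones the model was trained with.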