Add Llama3-8B perf numbers
yanboliang committed Apr 29, 2024
1 parent d64aa05 commit f9b5da4
Showing 1 changed file with 6 additions and 0 deletions.
README.md: 6 additions, 0 deletions
@@ -86,6 +86,8 @@ Benchmarks run on an 8xA100-80GB, power limited to 330W with a hybrid cube mesh
| Llama-2-70B | Base | OOM ||
| | 8-bit | 19.13 | 1322.58 |
| | 4-bit (G=32) | 25.25 | 1097.66 |
| Llama-3-8B | Base | 93.95 | 1508.18 |
| | 8-bit | 114.35 | 978.02 |

### Speculative Sampling
[Verifier: Llama-70B (int4), Draft: Llama-7B (int4)](./scripts/speculate_70B_int4.sh): 48.4 tok/s
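The two Llama-3-8B rows added in this hunk (Base and 8-bit) can be cross-checked with a little arithmetic. As a working assumption, not something the README states, suppose the Memory Bandwidth column equals the bytes of weights read per generated token multiplied by tokens/second. Dividing the two columns then recovers the implied weight footprint per token. The names `rows` and `implied_gb_per_token` below are hypothetical and exist only for this illustration.

```python
# Llama-3-8B rows added to the single-GPU table in this commit.
# Values are (tokens/s, memory bandwidth GB/s), copied from the diff.
rows = {
    "Base": (93.95, 1508.18),
    "8-bit": (114.35, 978.02),
}


def implied_gb_per_token(tok_s: float, bw_gb_s: float) -> float:
    """GB read per generated token, ASSUMING bandwidth = bytes_per_token * tokens/s."""
    return bw_gb_s / tok_s


for technique, (tok_s, bw_gb_s) in rows.items():
    print(f"{technique:>6}: {implied_gb_per_token(tok_s, bw_gb_s):.2f} GB/token")

# Relative decoding speed of the 8-bit variant versus the Base model.
speedup = rows["8-bit"][0] / rows["Base"][0]
print(f"8-bit speedup over Base: {speedup:.2f}x")
```

This yields about 16.05 GB per token for Base, close to 8B parameters at 2 bytes each, and about 8.55 GB for 8-bit, slightly more than 1 byte per parameter. The 8-bit speedup is about 1.22x, well below the roughly 1.9x reduction in bytes read, and achieved bandwidth drops from about 1508 to 978 GB/s; under the stated assumption, one possible reading is that the 8-bit path is not purely bandwidth-bound.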
@@ -101,6 +103,10 @@ Benchmarks run on an 8xA100-80GB, power limited to 330W with a hybrid cube mesh
| | 2 | 21.32 | 1481.87 |
| | 4 | 38.01 | 1340.76 |
| | 8 | 62.50 | 1135.29 |
| Llama-3-8B | 1 | 93.97 | 1508.46 |
| | 2 | 149.44 | 1358.63 |
| | 4 | 217.80 | 1218.76 |
| | 8 | 271.03 | 1041.99 |

### Tensor Parallelism + Quantization
| Model | Technique | Tokens/Second | Memory Bandwidth (GB/s) |
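The four Llama-3-8B rows added to the second table show how decoding speed scales across GPUs. The sketch below reads the second column as the number of GPUs (an assumption, since this diff does not show that table's header) and computes the speedup over one GPU and the scaling efficiency (speedup divided by GPU count). It also divides bandwidth by tokens/second, which gives GB read per GPU per token only under the assumption that the Memory Bandwidth column equals bytes read per token times tokens/second. The names `tp_rows` and `baseline_tok_s` are hypothetical.

```python
# Llama-3-8B rows added to the tensor-parallel table in this commit.
# Keys are ASSUMED to be the number of GPUs (the header is not in this diff).
# Values are (tokens/s, memory bandwidth GB/s), copied from the diff.
tp_rows = {
    1: (93.97, 1508.46),
    2: (149.44, 1358.63),
    4: (217.80, 1218.76),
    8: (271.03, 1041.99),
}

baseline_tok_s = tp_rows[1][0]

for gpus, (tok_s, bw_gb_s) in tp_rows.items():
    speedup = tok_s / baseline_tok_s
    efficiency = speedup / gpus      # 1.0 would mean perfect linear scaling
    gb_per_token = bw_gb_s / tok_s   # implied GB per GPU per token (assumed relation)
    print(f"{gpus} GPU(s): {speedup:.2f}x, {efficiency:.0%} efficiency, "
          f"{gb_per_token:.2f} GB/token")
```

The speedups come out to about 1.59x, 2.32x and 2.88x on 2, 4 and 8 GPUs, with efficiency falling to about 36% at 8 GPUs. The implied per-GPU footprint drops from about 16.05 GB to 9.09, 5.60 and 3.84 GB, which shrinks more slowly than 16.05/N would. One possible explanation, inferred from the numbers rather than stated in the README, is that part of the weights is replicated on every GPU instead of being sharded.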