Add Llama3-8B perf numbers

pytorch-labs · Apr 29, 2024 · f9b5da4
1 parent d64aa05
Showing 1 changed file with 6 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -86,6 +86,8 @@ Benchmarks run on an 8xA100-80GB, power limited to 330W with a hybrid cube mesh
 | Llama-2-70B | Base    | OOM     ||
 |           | 8-bit   | 19.13    | 1322.58 |
 |           | 4-bit (G=32)   | 25.25    | 1097.66 |
+| Llama-3-8B  | Base    |  93.95  | 1508.18 |
+|           | 8-bit   | 114.35   | 978.02 |
 
 ### Speculative Sampling
 [Verifier: Llama-70B (int4), Draft: Llama-7B (int4)](./scripts/speculate_70B_int4.sh): 48.4 tok/s
@@ -101,6 +103,10 @@ Benchmarks run on an 8xA100-80GB, power limited to 330W with a hybrid cube mesh
 |           | 2   | 21.32   | 1481.87 |
 |           | 4   | 38.01   | 1340.76 |
 |           | 8   | 62.50   | 1135.29 |
+| Llama-3-8B  | 1    |  93.97  | 1508.46 |
+|           | 2   | 149.44   | 1358.63 |
+|           | 4   | 217.80   | 1218.76 |
+|           | 8   | 271.03   | 1041.99 |
 
 ### Tensor Parallelism + Quantization
 | Model    | Technique | Tokens/Second | Memory Bandwidth (GB/s) |