From 6e00f492758fdb350df14cb19870e32c7b1ac788 Mon Sep 17 00:00:00 2001
From: Yanbo Liang
Date: Sun, 28 Apr 2024 23:10:58 -0700
Subject: [PATCH 1/3] Add Llama3-8B perf numbers

---
 README.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/README.md b/README.md
index b1210f2..bd8fbfa 100644
--- a/README.md
+++ b/README.md
@@ -89,6 +89,8 @@ Benchmarks run on an 8xA100-80GB, power limited to 330W with a hybrid cube mesh
 | Llama-2-70B | Base | OOM ||
 | | 8-bit | 19.13 | 1322.58 |
 | | 4-bit (G=32) | 25.25 | 1097.66 |
+| Llama-3-8B | Base | 93.95 | 1508.18 |
+| | 8-bit | 114.35 | 978.02 |
 
 ### Speculative Sampling
 [Verifier: Llama-70B (int4), Draft: Llama-7B (int4)](./scripts/speculate_70B_int4.sh): 48.4 tok/s
@@ -104,6 +106,10 @@ Benchmarks run on an 8xA100-80GB, power limited to 330W with a hybrid cube mesh
 | | 2 | 21.32 | 1481.87 |
 | | 4 | 38.01 | 1340.76 |
 | | 8 | 62.50 | 1135.29 |
+| Llama-3-8B | 1 | 93.97 | 1508.46 |
+| | 2 | 149.44 | 1358.63 |
+| | 4 | 217.80 | 1218.76 |
+| | 8 | 271.03 | 1041.99 |
 
 ### Tensor Parallelism + Quantization
 | Model | Technique | Tokens/Second | Memory Bandwidth (GB/s) |

From 0fb6914ade98123e0cfa56a59a7eb30c6af6c1ad Mon Sep 17 00:00:00 2001
From: Yanbo Liang
Date: Sat, 15 Jun 2024 22:30:58 -0700
Subject: [PATCH 2/3] Update

---
 README.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index bd8fbfa..9ad8bc8 100644
--- a/README.md
+++ b/README.md
@@ -89,8 +89,8 @@ Benchmarks run on an 8xA100-80GB, power limited to 330W with a hybrid cube mesh
 | Llama-2-70B | Base | OOM ||
 | | 8-bit | 19.13 | 1322.58 |
 | | 4-bit (G=32) | 25.25 | 1097.66 |
-| Llama-3-8B | Base | 93.95 | 1508.18 |
-| | 8-bit | 114.35 | 978.02 |
+| Llama-3-8B | Base | 94.25 | 1411.95 |
+| | 8-bit | 139.55 | 1047.23 |
 
 ### Speculative Sampling
 [Verifier: Llama-70B (int4), Draft: Llama-7B (int4)](./scripts/speculate_70B_int4.sh): 48.4 tok/s
@@ -106,10 +106,10 @@ Benchmarks run on an 8xA100-80GB, power limited to 330W with a hybrid cube mesh
 | | 2 | 21.32 | 1481.87 |
 | | 4 | 38.01 | 1340.76 |
 | | 8 | 62.50 | 1135.29 |
-| Llama-3-8B | 1 | 93.97 | 1508.46 |
-| | 2 | 149.44 | 1358.63 |
-| | 4 | 217.80 | 1218.76 |
-| | 8 | 271.03 | 1041.99 |
+| Llama-3-8B | 1 | 94.19 | 1411.76 |
+| | 2 | 150.48 | 1208.80 |
+| | 4 | 219.77 | 991.63 |
+| | 8 | 274.65 | 768.55 |
 
 ### Tensor Parallelism + Quantization
 | Model | Technique | Tokens/Second | Memory Bandwidth (GB/s) |

From 744c9279a856428901613e39feef4270f2d2156c Mon Sep 17 00:00:00 2001
From: Yanbo Liang
Date: Sun, 16 Jun 2024 19:48:29 -0700
Subject: [PATCH 3/3] Update

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 9ad8bc8..900651a 100644
--- a/README.md
+++ b/README.md
@@ -70,6 +70,7 @@ codellama/CodeLlama-34b-Python-hf
 mistralai/Mistral-7B-v0.1
 mistralai/Mistral-7B-Instruct-v0.1
 mistralai/Mistral-7B-Instruct-v0.2
+meta-llama/Meta-Llama-3-8B
 ```
 
 For example, to convert Llama-2-7b-chat-hf