diff --git a/index.html b/index.html
index 17a448e3..d8efbf5c 100644
--- a/index.html
+++ b/index.html
@@ -84,14 +84,11 @@

Code LLaMA Demo on an NVIDIA GeForce RTX 4070 laptop:

-VLM Demo on an Apple MacBook Pro (M1, 2021):

-

-

LLaMA Chat Demo on an Apple MacBook Pro (M1, 2021):

-

+

Overview

-

+

LLM Compression: SmoothQuant and AWQ

SmoothQuant: Smooth the activation outliers by migrating the quantization difficulty from activations to weights, with a mathematically equivalent transformation (100*1 = 10*10).

@@ -99,7 +96,7 @@

[Figure: SmoothQuant intuition]

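To make the equivalence concrete, here is a minimal NumPy sketch of the SmoothQuant scaling trick. This is illustrative only, not the library's API; the per-channel scale formula with `alpha` follows the paper's migration-strength idea, and all variable names are placeholders.

```python
import numpy as np

# Sketch of SmoothQuant's equivalent transformation:
# Y = X @ W = (X / s) @ (diag(s) @ W), with a per-input-channel scale s
# chosen to migrate outlier magnitude from activations into weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))      # activations: tokens x in_channels
X[:, 3] *= 100.0                 # inject an activation outlier channel
W = rng.normal(size=(8, 8))      # weights: in_channels x out_channels

alpha = 0.5                      # migration strength, as in the paper
s = np.abs(X).max(axis=0) ** alpha / np.abs(W).max(axis=1) ** (1 - alpha)

X_smooth = X / s                 # activation outliers are flattened
W_smooth = s[:, None] * W        # weights absorb the quantization difficulty

# Mathematically equivalent: the layer output is unchanged.
assert np.allclose(X @ W, X_smooth @ W_smooth)
print(np.abs(X).max(), "->", np.abs(X_smooth).max())  # outlier is tamed
```

Each scaled pair preserves its product, exactly as in the 100*1 = 10*10 intuition above; only the split of magnitude between activations and weights changes.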
AWQ (Activation-aware Weight Quantization): Protect salient weight channels by analyzing activation magnitudes rather than the weights themselves.

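A minimal sketch of that intuition, assuming a toy per-tensor 4-bit round-to-nearest quantizer (`quantize_int4` is a hypothetical helper, not the repo's API; the real AWQ uses grouped quantization and searches the scales): a channel with large activations but small weights is wrecked by plain RTN, and scaling it up before quantization protects it.

```python
import numpy as np

def quantize_int4(w):
    # Hypothetical per-tensor symmetric 4-bit round-to-nearest quantizer.
    step = np.abs(w).max() / 7.0
    return np.clip(np.round(w / step), -8, 7) * step

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 8))
X[:, 2] *= 50.0                  # channel 2 carries large activations...
W = rng.normal(size=(8, 8))
W[2, :] *= 0.1                   # ...but its weights are small

# Salient channels are found from activation magnitude, not weight magnitude.
saliency = np.abs(X).mean(axis=0)
s = np.ones(8)
s[np.argmax(saliency)] = 4.0     # scale up the most salient input channel

W_rtn = quantize_int4(W)                             # plain round-to-nearest
W_awq = quantize_int4(s[:, None] * W) / s[:, None]   # AWQ-style protection

err_rtn = np.abs(X @ W - X @ W_rtn).mean()
err_awq = np.abs(X @ W - X @ W_awq).mean()
print(f"RTN output error: {err_rtn:.4f}, AWQ-style: {err_awq:.4f}")
```

Scaling by `s` and dividing back leaves the full-precision product unchanged, so the only effect is a smaller effective quantization step on the salient channel; the output error should drop noticeably.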
-

+

LLM Inference Engine: TinyChatEngine