From a273fb8b74f93e37f5ff8614ad19f49a09eec213 Mon Sep 17 00:00:00 2001
From: RaymondWang0
Date: Fri, 2 Feb 2024 18:33:52 +0000
Subject: [PATCH] deploy: de340b91be14263cc2ff1e092bf428695e64da3e

---
 index.html           | 69 +++++++++++++++++++++++++++++++++----------
 search/all_0.js      |  2 +-
 search/all_10.js     |  6 ++--
 search/all_11.js     |  6 ++--
 search/all_12.js     |  2 +-
 search/all_13.js     |  4 +--
 search/all_14.js     | 14 ++++-----
 search/all_15.js     |  8 ++---
 search/all_16.js     | 16 ++--------
 search/all_17.js     | 14 ++++++++-
 search/all_18.js     |  2 +-
 search/all_19.js     |  4 +++
 search/all_2.js      |  2 +-
 search/all_3.js      |  2 +-
 search/all_4.js      | 16 +++++-----
 search/all_5.js      |  8 ++---
 search/all_6.js      | 27 +++++++++--------
 search/all_7.js      | 17 ++++++-----
 search/all_8.js      |  4 +--
 search/all_9.js      |  2 +-
 search/all_a.js      |  2 +-
 search/all_b.js      |  4 +--
 search/all_c.js      |  4 +--
 search/all_d.js      | 43 ++++++++++++++-------------
 search/all_e.js      | 15 +++++-----
 search/all_f.js      |  4 +--
 search/searchdata.js |  2 +-
 vlm_demo_m1.gif      | Bin 0 -> 6316196 bytes
 28 files changed, 173 insertions(+), 126 deletions(-)
 create mode 100644 search/all_19.js
 create mode 100644 vlm_demo_m1.gif

diff --git a/index.html b/index.html
index bf0054e3..17a448e3 100644
--- a/index.html
+++ b/index.html
@@ -84,11 +84,14 @@

Code LLaMA Demo on an NVIDIA GeForce RTX 4070 laptop:

+VLM Demo on an Apple MacBook Pro (M1, 2021):

+

+

LLaMA Chat Demo on an Apple MacBook Pro (M1, 2021):

-

+

Overview

-

+

LLM Compression: SmoothQuant and AWQ

SmoothQuant: Smooth the activation outliers by migrating the quantization difficulty from activations to weights, with a mathematically equal transformation (100*1 = 10*10).
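
The "100*1 = 10*10" intuition above can be illustrated with a small sketch. It is not TinyChatEngine code: the function names `matmul` and `smooth`, the toy matrices, and the choice `alpha = 0.5` are assumptions made for illustration. The per-channel scale `s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)` follows the formulation in the SmoothQuant paper; dividing activations by `s` and multiplying the matching weight rows by `s` leaves the product unchanged while shrinking the activation outlier.

```python
# Toy sketch of SmoothQuant-style smoothing (illustrative only, not the
# TinyChatEngine implementation). X is tokens x channels, W is channels x outputs.

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def smooth(X, W, alpha=0.5):
    """Return (X / s, s * W, s) using one scale per input channel.

    Assumes every channel has a nonzero activation and weight maximum.
    """
    n_channels = len(W)
    act_max = [max(abs(row[j]) for row in X) for j in range(n_channels)]
    w_max = [max(abs(w) for w in W[j]) for j in range(n_channels)]
    s = [act_max[j] ** alpha / w_max[j] ** (1 - alpha) for j in range(n_channels)]
    X_s = [[row[j] / s[j] for j in range(n_channels)] for row in X]  # smaller outliers
    W_s = [[w * s[j] for w in W[j]] for j in range(n_channels)]      # absorbs the scale
    return X_s, W_s, s

# Hypothetical data: channel 0 of the activations carries an outlier (100).
X = [[100.0, 1.0], [-50.0, 2.0]]
W = [[1.0, -0.5], [0.5, 1.0]]

X_s, W_s, s = smooth(X, W)
Y, Y_s = matmul(X, W), matmul(X_s, W_s)

print("scales:", s)
print("largest |activation| before:", max(abs(v) for row in X for v in row))
print("largest |activation| after: ", max(abs(v) for row in X_s for v in row))
print("outputs match:", all(abs(a - b) < 1e-9
                            for ra, rb in zip(Y, Y_s) for a, b in zip(ra, rb)))
```

With `alpha = 0.5` the difficulty is split evenly between activations and weights; other values shift more of it to one side.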

@@ -96,7 +99,7 @@

smoothquant_intuition

AWQ (Activation-aware Weight Quantization): Protect salient weight channels by analyzing activation magnitude as opposed to the weights.
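
As a rough sketch of the selection idea only, the snippet below ranks input channels by mean absolute activation and contrasts that with ranking by weight magnitude. The function names and toy numbers are hypothetical, and the sketch stops at choosing the channels; how the chosen channels are then protected (AWQ uses per-channel scaling) is not shown here.

```python
# Toy sketch: pick "salient" input channels from activation statistics
# rather than from the weights themselves. Illustrative only.

def top_k(scores, k):
    """Indices of the k largest scores, largest first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def salient_by_activation(activations, k=1):
    """Rank channels by mean |activation| over calibration tokens."""
    n_channels = len(activations[0])
    mean_abs = [sum(abs(row[j]) for row in activations) / len(activations)
                for j in range(n_channels)]
    return top_k(mean_abs, k)

def salient_by_weight(weights, k=1):
    """Rank channels by mean |weight| of each input-channel row (for contrast)."""
    mean_abs = [sum(abs(w) for w in row) / len(row) for row in weights]
    return top_k(mean_abs, k)

# Hypothetical calibration data: 2 tokens x 3 input channels.
activations = [[0.1, 8.0, 0.3],
               [0.2, -9.0, 0.1]]
# Hypothetical weights: 3 input channels x 2 outputs.
weights = [[2.0, -3.0],
           [0.1, 0.2],
           [1.0, 1.0]]

print("salient by activation:", salient_by_activation(activations))
print("salient by weight:    ", salient_by_weight(weights))
```

In this toy data the two criteria disagree: the channel with the smallest weights sees the largest activations, so it is the one an activation-aware method would protect.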

-

+

LLM Inference Engine: TinyChatEngine