diff --git a/index.html b/index.html
index 845527ce..ba4efe1a 100644
--- a/index.html
+++ b/index.html
@@ -79,10 +79,20 @@

This is enabled by LLM model compression techniques: SmoothQuant and AWQ (Activation-aware Weight Quantization), co-designed with TinyChatEngine, which runs the compressed low-precision models.
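
For intuition, here is a minimal NumPy sketch of the idea behind these methods: per-channel scales migrate activation outliers into the weights (SmoothQuant-style smoothing), and the scaled weights are then quantized to 4-bit groups (AWQ-style weight-only quantization). The function names and parameters below are illustrative assumptions, not part of TinyChatEngine or the AWQ codebase.

```python
# Illustrative sketch only -- not the TinyChatEngine implementation.
import numpy as np

def smooth_scales(act_absmax, w_absmax, alpha=0.5, eps=1e-8):
    """Per-input-channel scales s_j = max|X_j|^alpha / max|W_j|^(1-alpha)."""
    return np.maximum(act_absmax, eps) ** alpha / np.maximum(w_absmax, eps) ** (1 - alpha)

def quantize_weights_int4(w, group_size=128):
    """Symmetric 4-bit group quantization of a [out, in] weight matrix."""
    out_dim, in_dim = w.shape
    wg = w.reshape(out_dim, in_dim // group_size, group_size)
    scale = np.abs(wg).max(axis=-1, keepdims=True) / 7.0  # int4 range is [-8, 7]
    q = np.clip(np.round(wg / scale), -8, 7).astype(np.int8)
    return q, scale

# Toy example: a linear layer y = x @ W.T with a few outlier activation channels.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 256))
x[:, :4] *= 30.0                      # activation outliers that hurt naive quantization
w = rng.normal(size=(512, 256)) * 0.02

s = smooth_scales(np.abs(x).max(axis=0), np.abs(w).max(axis=0))
x_s, w_s = x / s, w * s               # mathematically equivalent: (x/s) @ (w*s).T == x @ w.T

q, scale = quantize_weights_int4(w_s)
w_deq = (q * scale).reshape(512, 256)  # dequantize for reference-error check
err = np.abs(x_s @ w_deq.T - x @ w.T).mean()
print(f"mean absolute error after 4-bit weight quantization: {err:.4f}")
```
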

Demo on an NVIDIA GeForce RTX 4070 laptop:

[Demo GIFs: chat_demo_gpu captioned "LLaMA Chat" and coding_demo_gpu captioned "Code LLaMA", shown side by side in a table]

Demo on an Apple MacBook Pro (M1, 2021):

[Demo GIFs: chat_demo_m1 captioned "LLaMA Chat" and coding_demo_m1 captioned "Code LLaMA", shown side by side in a table]

Feel free to check out our slides for more details!

Overview