# Meta-Llama-3-8B-Instruct asym recipe
This recipe is outdated; we recommend using symmetric quantization instead. To do so, simply remove `--asym` from the commands below.

A sample command to generate an INT4 model:

```bash
auto-round \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  --device 0 \
  --group_size 128 \
  --bits 4 \
  --iters 1000 \
  --nsamples 512 \
  --asym \
  --format 'auto_gptq,auto_round' \
  --output_dir "./tmp_autoround"
```

To also quantize the lm-head, add `--quant_lm_head`:

```bash
auto-round \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  --device 0 \
  --group_size 128 \
  --bits 4 \
  --iters 1000 \
  --nsamples 512 \
  --asym \
  --quant_lm_head \
  --format 'auto_gptq,auto_round' \
  --output_dir "./tmp_autoround"
```
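The same settings can also be driven from AutoRound's Python API rather than the CLI. A minimal sketch, assuming the `auto-round` package is installed; the keyword-argument names below mirror the CLI flags and may differ slightly across `auto-round` versions:

```python
# Sketch: programmatic equivalent of the CLI command above (an assumption,
# not the recipe's official method). Kwarg names mirror the CLI flags.

QUANT_KWARGS = {
    "bits": 4,          # --bits 4
    "group_size": 128,  # --group_size 128
    "sym": False,       # --asym (asymmetric quantization)
    "iters": 1000,      # --iters 1000
    "nsamples": 512,    # --nsamples 512
}


def quantize_llama(model_name: str = "meta-llama/Meta-Llama-3-8B-Instruct",
                   output_dir: str = "./tmp_autoround") -> None:
    """Quantize the model to INT4 and export it in auto_round format."""
    # Heavy imports kept local so the sketch can be read without the deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from auto_round import AutoRound

    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    rounder = AutoRound(model, tokenizer, **QUANT_KWARGS)
    rounder.quantize()
    rounder.save_quantized(output_dir, format="auto_round")
```

Calling `quantize_llama()` downloads the model, runs the 1000-iteration tuning pass, and writes the quantized checkpoint to `./tmp_autoround`.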

lm-eval 0.4.2 is used for evaluation.

| Metric          | BF16   | w4g128 w/o lm-head | w4g128 with lm-head |
|-----------------|--------|--------------------|---------------------|
| Avg.            | 0.6352 | 0.6312             | 0.6303              |
| mmlu            | 0.6386 | 0.6306             | 0.6243              |
| winogrande      | 0.7143 | 0.7238             | 0.7261              |
| truthfulqa_mc1  | 0.3623 | 0.3537             | 0.3574              |
| rte             | 0.6751 | 0.6859             | 0.6715              |
| piqa            | 0.7867 | 0.7797             | 0.7775              |
| openbookqa      | 0.3400 | 0.3300             | 0.3340              |
| lambada_openai  | 0.7182 | 0.7200             | 0.7118              |
| hellaswag       | 0.5769 | 0.5699             | 0.5686              |
| boolq           | 0.8297 | 0.8309             | 0.8266              |
| arc_easy        | 0.8152 | 0.8089             | 0.8123              |
| arc_challenge   | 0.5299 | 0.5102             | 0.5111              |
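As a quick sanity check, the Avg. row can be recomputed as the plain mean of the eleven per-task scores in each column (values transcribed from the table above):

```python
# Recompute the per-column mean of the eleven task scores from the table.
# Task order: mmlu, winogrande, truthfulqa_mc1, rte, piqa, openbookqa,
# lambada_openai, hellaswag, boolq, arc_easy, arc_challenge.
scores = {
    "BF16":                [0.6386, 0.7143, 0.3623, 0.6751, 0.7867, 0.3400,
                            0.7182, 0.5769, 0.8297, 0.8152, 0.5299],
    "w4g128 w/o lm-head":  [0.6306, 0.7238, 0.3537, 0.6859, 0.7797, 0.3300,
                            0.7200, 0.5699, 0.8309, 0.8089, 0.5102],
    "w4g128 with lm-head": [0.6243, 0.7261, 0.3574, 0.6715, 0.7775, 0.3340,
                            0.7118, 0.5686, 0.8266, 0.8123, 0.5111],
}

averages = {name: round(sum(vals) / len(vals), 4)
            for name, vals in scores.items()}
print(averages)  # BF16 -> 0.6352, w4g128 w/o lm-head -> 0.6312
```

The first two columns reproduce the reported averages to four decimals.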