This recipe is outdated; we now recommend symmetric quantization instead. To switch, simply remove `--asym` from the commands below.
A sample command to generate an INT4 model:

```bash
auto-round \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --device 0 \
    --group_size 128 \
    --bits 4 \
    --iters 1000 \
    --nsamples 512 \
    --asym \
    --format 'auto_gptq,auto_round' \
    --output_dir "./tmp_autoround"
```
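The same quantization can also be driven from Python. The snippet below is a minimal sketch, assuming the `AutoRound` class accepts the `bits`, `group_size`, `sym`, `iters`, and `nsamples` keyword arguments shown and that `save_quantized` takes a `format` argument; check the auto-round documentation for the exact API of your installed version.

```python
# Illustrative sketch of a Python-API equivalent of the CLI command above.
# Keyword names are assumptions; verify against your auto-round version.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,          # --bits 4
    group_size=128,  # --group_size 128
    sym=False,       # --asym (set True for the recommended symmetric scheme)
    iters=1000,      # --iters 1000
    nsamples=512,    # --nsamples 512
)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_round")
```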
To quantize the lm-head as well, add `--quant_lm_head`:

```bash
auto-round \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --device 0 \
    --group_size 128 \
    --bits 4 \
    --iters 1000 \
    --nsamples 512 \
    --asym \
    --quant_lm_head \
    --format 'auto_gptq,auto_round' \
    --output_dir "./tmp_autoround"
```
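After export, the checkpoint can be loaded for inference through transformers. A minimal sketch, assuming the quantized weights land under `./tmp_autoround` (when multiple formats are exported the actual directory name may carry a format suffix) and that importing `AutoRoundConfig` registers the auto-round backend in your installed version:

```python
# Sketch: load the quantized checkpoint and run a quick generation test.
# The output path and the AutoRoundConfig import are assumptions; adjust to
# match how your auto-round version names and registers exported models.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig  # registers the auto-round quantization backend

quantized_dir = "./tmp_autoround"
model = AutoModelForCausalLM.from_pretrained(quantized_dir, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(quantized_dir)

prompt = "There is a girl who likes adventure,"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```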
Accuracy was evaluated with lm-eval 0.4.2.
| Metric          | BF16   | W4G128 w/o lm-head quant | W4G128 with lm-head quant |
|-----------------|--------|--------------------------|---------------------------|
| Avg.            | 0.6352 | 0.6312                   | 0.6303                    |
| mmlu            | 0.6386 | 0.6306                   | 0.6243                    |
| winogrande      | 0.7143 | 0.7238                   | 0.7261                    |
| truthfulqa_mc1  | 0.3623 | 0.3537                   | 0.3574                    |
| rte             | 0.6751 | 0.6859                   | 0.6715                    |
| piqa            | 0.7867 | 0.7797                   | 0.7775                    |
| openbookqa      | 0.3400 | 0.3300                   | 0.3340                    |
| lambada_openai  | 0.7182 | 0.7200                   | 0.7118                    |
| hellaswag       | 0.5769 | 0.5699                   | 0.5686                    |
| boolq           | 0.8297 | 0.8309                   | 0.8266                    |
| arc_easy        | 0.8152 | 0.8089                   | 0.8123                    |
| arc_challenge   | 0.5299 | 0.5102                   | 0.5111                    |
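To reproduce the table with the lm-eval 0.4.2 harness, something along the following lines should work; the checkpoint path and batch size are illustrative assumptions rather than the exact settings used to produce the numbers above.

```python
# Sketch: evaluate the quantized checkpoint on the tasks reported above
# using lm-eval 0.4.2's Python entry point. Path and batch size are assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./tmp_autoround",
    tasks=[
        "mmlu", "winogrande", "truthfulqa_mc1", "rte", "piqa", "openbookqa",
        "lambada_openai", "hellaswag", "boolq", "arc_easy", "arc_challenge",
    ],
    batch_size=16,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```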