This recipe is outdated; we now recommend symmetric quantization. To use it, drop `--asym` from the command below (a symmetric variant is shown after it).

```bash
auto-round \
    --model 01-ai/Yi-6B-Chat \
    --device 0 \
    --group_size 128 \
    --bits 4 \
    --iters 1000 \
    --nsamples 512 \
    --asym \
    --minmax_lr 2e-3 \
    --format 'auto_gptq,auto_round' \
    --output_dir "./tmp_autoround"
```
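
For reference, a minimal sketch of the recommended symmetric variant, identical except that `--asym` is dropped (the remaining hyperparameters were tuned for the asymmetric run and may warrant re-tuning):

```bash
auto-round \
    --model 01-ai/Yi-6B-Chat \
    --device 0 \
    --group_size 128 \
    --bits 4 \
    --iters 1000 \
    --nsamples 512 \
    --minmax_lr 2e-3 \
    --format 'auto_gptq,auto_round' \
    --output_dir "./tmp_autoround"
```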

Due to licensing restrictions, we are unable to release the quantized model. Install lm-eval-harness from source at git commit 96d185fa6232a5ab685ba7c43e45d1dbb3bb906d.
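
One way to pin that commit, assuming the upstream EleutherAI/lm-evaluation-harness repository:

```bash
# Clone the harness and check out the pinned commit before installing.
git clone https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
git checkout 96d185fa6232a5ab685ba7c43e45d1dbb3bb906d
pip install -e .
```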

We used the following command for evaluation. For reference, the results of the official AWQ-INT4 release (01-ai/Yi-6B-Chat-4bits) are listed as well.

```bash
lm_eval --model hf \
    --model_args pretrained="./",autogptq=True,gptq_use_triton=True,trust_remote_code=True \
    --device cuda:0 \
    --tasks ceval-valid,cmmlu,mmlu,gsm8k \
    --batch_size 16 \
    --num_fewshot 0
```
| Metric | BF16   | 01-ai/Yi-6B-Chat-4bits | INT4   |
|--------|--------|------------------------|--------|
| Avg.   | 0.6043 | 0.5867                 | 0.5939 |
| mmlu   | 0.6163 | 0.6133                 | 0.6119 |
| cmmlu  | 0.7431 | 0.7312                 | 0.7314 |
| ceval  | 0.7355 | 0.7155                 | 0.7281 |
| gsm8k  | 0.3222 | 0.2866                 | 0.3040 |
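
The Avg. row is the unweighted mean of the four task scores, which can be checked directly:

```bash
# Sanity-check the Avg. row: mean of the four INT4 task scores.
python -c "print((0.6119 + 0.7314 + 0.7281 + 0.3040) / 4)"  # 0.59385 -> 0.5939
```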