This recipe is outdated; we recommend using symmetric quantization instead. To switch, simply remove `--asym` from the command below.

A sample command to generate an INT4 model:

```bash
auto-round \
--model google/gemma-7b \
--device 0 \
--group_size 128 \
--bits 4 \
--minmax_lr 2e-3 \
--model_dtype "float16" \
--iters 1000 \
--nsamples 512 \
--asym \
--format 'auto_gptq,auto_round' \
--output_dir "./tmp_autoround"
```
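For reference, the same recipe can also be expressed through AutoRound's Python API. The following is a minimal sketch, assuming the `AutoRound` constructor accepts the same knobs as the CLI flags (`bits`, `group_size`, `sym`, `iters`, `nsamples`, `minmax_lr`); check your installed version for the exact signature.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "google/gemma-7b"
# Load in FP16 to mirror --model_dtype "float16"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,
    group_size=128,
    sym=False,       # --asym; drop this (keep the default) for symmetric quantization
    iters=1000,
    nsamples=512,
    minmax_lr=2e-3,
)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_gptq")
```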

Install the evaluation dependencies:

```bash
pip install lm-eval==0.4.2
pip install auto-gptq
```

Please note that there is a discrepancy between our baseline results and the officially reported numbers; this is a known issue raised in the official model card's community discussion.

Given that the Gemma model family exhibits inconsistent results between FP16 and BF16 on lm-eval, we recommend converting to FP16 for both tuning and evaluation.
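For example, the conversion can be done by loading the checkpoint directly in FP16 rather than its native BF16 (a minimal sketch using the standard transformers API):

```python
import torch
from transformers import AutoModelForCausalLM

# Cast weights to FP16 at load time, matching the --model_dtype "float16"
# flag used during tuning.
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b", torch_dtype=torch.float16
)
```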

Evaluate the quantized model:

```bash
lm_eval --model hf \
  --model_args pretrained="Intel/gemma-7b-int4-inc",autogptq=True,gptq_use_triton=True,dtype=float16 \
  --device cuda:0 \
  --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,rte,arc_easy,arc_challenge,mmlu \
  --batch_size 32
```
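Alternatively, the same evaluation can be driven from Python via lm-eval's `simple_evaluate`. This is a sketch assuming lm-eval 0.4.x's API; argument names may shift between releases.

```python
from lm_eval import simple_evaluate

# Mirrors the CLI invocation above; autogptq=True routes model loading
# through auto-gptq so the INT4 checkpoint can be dequantized on the fly.
results = simple_evaluate(
    model="hf",
    model_args="pretrained=Intel/gemma-7b-int4-inc,autogptq=True,gptq_use_triton=True,dtype=float16",
    tasks=[
        "lambada_openai", "hellaswag", "piqa", "winogrande",
        "truthfulqa_mc1", "openbookqa", "boolq", "rte",
        "arc_easy", "arc_challenge", "mmlu",
    ],
    batch_size=32,
)
print(results["results"])
```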
Accuracy results:

| Metric         | BF16   | FP16   | AutoRound v0.1 | AutoRound v0.2 |
| -------------- | ------ | ------ | -------------- | -------------- |
| Avg.           | 0.6208 | 0.6302 | 0.6242         | 0.6254         |
| mmlu           | 0.6126 | 0.6189 | 0.6085         | 0.6147         |
| lambada_openai | 0.6707 | 0.7308 | 0.7165         | 0.7270         |
| hellaswag      | 0.6039 | 0.6063 | 0.6017         | 0.6017         |
| winogrande     | 0.7356 | 0.7506 | 0.7482         | 0.7490         |
| piqa           | 0.8014 | 0.8025 | 0.7976         | 0.7982         |
| truthfulqa_mc1 | 0.3121 | 0.3121 | 0.3060         | 0.2840         |
| openbookqa     | 0.3300 | 0.3220 | 0.3340         | 0.3240         |
| boolq          | 0.8254 | 0.8324 | 0.8300         | 0.8407         |
| rte            | 0.6643 | 0.6859 | 0.6787         | 0.6968         |
| arc_easy       | 0.8068 | 0.8262 | 0.8089         | 0.8194         |
| arc_challenge  | 0.5043 | 0.5000 | 0.4915         | 0.4949         |