Install lm-eval-harness from source; lm-eval 0.4.2 was used for the results below.
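One way to do the source install, assuming the EleutherAI lm-evaluation-harness repository and its v0.4.2 release tag:

```bash
# Clone the harness and pin it to the 0.4.2 release used for these results
git clone https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
git checkout v0.4.2
pip install -e .
```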
Install AutoGPTQ, which is needed to load the INT4 checkpoint (`autogptq=True` in the command below):

```bash
pip install auto-gptq
```
Please note that there is a discrepancy between our baseline result and the officially reported numbers; this is a known issue discussed in the official model card's community threads. Because the Gemma model family produces inconsistent results between FP16 and BF16 on lm-eval, we recommend converting the model to FP16 for both tuning and evaluation.
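A minimal sketch of the FP16 conversion, assuming the base `google/gemma-2b` checkpoint and an illustrative output path:

```python
# Sketch: load the model with FP16 weights and save a converted copy
# to use for tuning/evaluation. The output directory is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.save_pretrained("./gemma-2b-fp16")
tokenizer.save_pretrained("./gemma-2b-fp16")
```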
Run the evaluation:

```bash
lm_eval --model hf --model_args pretrained="Intel/gemma-2b-int4-inc",autogptq=True,gptq_use_triton=True,dtype=float16 --device cuda:0 --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,rte,arc_easy,arc_challenge,mmlu --batch_size 16
```
Metric | BF16 | FP16 | AutoRound v0.1 | AutoRound v0.2 |
---|---|---|---|---|
Avg. | 0.5263 | 0.5277 | 0.5235 | 0.5248 |
mmlu | 0.3287 | 0.3287 | 0.3297 | 0.3309 |
lambada_openai | 0.6344 | 0.6375 | 0.6307 | 0.6379 |
hellaswag | 0.5273 | 0.5281 | 0.5159 | 0.5184 |
winogrande | 0.6504 | 0.6488 | 0.6543 | 0.6575 |
piqa | 0.7671 | 0.7720 | 0.7612 | 0.7606 |
truthfulqa_mc1 | 0.2203 | 0.2203 | 0.2203 | 0.2191 |
openbookqa | 0.2980 | 0.3020 | 0.3000 | 0.3060 |
boolq | 0.6927 | 0.6936 | 0.6939 | 0.6966 |
arc_easy | 0.7420 | 0.7403 | 0.7353 | 0.7357 |
arc_challenge | 0.4019 | 0.4061 | 0.3933 | 0.3857 |