perplexity output of LLAMA2 7B quantized by IQ3_S seems to be weird #6971
penghongbo started this conversation in General
I am learning llama.cpp. When I ran the perplexity test on different quantized 7B models, most of them looked reasonable (final estimated PPL < 10), but the IQ3_S quantized model returned a value larger than 300. Is this correct? I used the llama-2-7b-chat model and wiki.test.raw (4358 lines).
I am not sure whether I need to use an imatrix for IQ3_S, as mentioned in #5866. Please advise. Thanks.
Replies: 1 comment

This is possibly due to a tokenization bug which was introduced about a week ago: #7049
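For anyone trying to reproduce this, below is a minimal sketch of the workflow being discussed: building an importance matrix, quantizing to IQ3_S with it, and measuring perplexity on wiki.test.raw. The model paths and the calibration file are placeholders, and the tool names and flags are those of llama.cpp builds from around this time (newer builds may name the tools llama-imatrix, llama-quantize and llama-perplexity), so check the usage output of your own build.

```sh
# 1) Build an importance matrix from some calibration text (file name is a placeholder).
#    Low-bit i-quants such as IQ3_S generally benefit from an imatrix.
./imatrix -m models/llama-2-7b-chat.f16.gguf -f calibration.txt -o imatrix.dat

# 2) Quantize the fp16 GGUF to IQ3_S, passing the importance matrix.
./quantize --imatrix imatrix.dat \
    models/llama-2-7b-chat.f16.gguf models/llama-2-7b-chat.IQ3_S.gguf IQ3_S

# 3) Run the perplexity test on wiki.test.raw. A sane 7B IQ3_S result should be
#    in the same ballpark as the other quantized models (single-digit PPL here),
#    not above 300.
./perplexity -m models/llama-2-7b-chat.IQ3_S.gguf -f wiki.test.raw
```

Re-running on a build that includes the fix referenced in #7049 would also rule out the tokenization issue mentioned in the reply above.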