perplexity output of LLAMA2 7B quantized by IQ3_S seems to be weird #6971
penghongbo started this conversation in General
I am learning llama.cpp. When I ran the perplexity test on different quantized 7B models, most of them looked reasonable (final estimated PPL < 10), but the IQ3_S quantized model returned a value larger than 300. Is this correct? I used the llama-2-7b-chat model and wiki.test.raw (4358 lines).
I am not sure whether I need to use an imatrix for IQ3_S, as mentioned in #5866. Please advise. Thanks.
Replies: 1 comment

This is possibly due to a tokenization bug which was introduced about a week ago: #7049
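For anyone trying to reproduce this, below is a minimal sketch of the workflow being discussed: building an importance matrix, quantizing to IQ3_S with it, and measuring perplexity on wiki.test.raw. The model paths and the calibration file are placeholders, and the tool names and flags are those of llama.cpp builds from around this time (newer builds may name the tools llama-imatrix, llama-quantize and llama-perplexity), so check the usage output of your own build.

```sh
# 1) Build an importance matrix from some calibration text (file name is a placeholder).
#    Low-bit i-quants such as IQ3_S generally benefit from an imatrix.
./imatrix -m models/llama-2-7b-chat.f16.gguf -f calibration.txt -o imatrix.dat

# 2) Quantize the fp16 GGUF to IQ3_S, passing the importance matrix.
./quantize --imatrix imatrix.dat \
    models/llama-2-7b-chat.f16.gguf models/llama-2-7b-chat.IQ3_S.gguf IQ3_S

# 3) Run the perplexity test on wiki.test.raw. A sane 7B IQ3_S result should be
#    in the same ballpark as the other quantized models (single-digit PPL here),
#    not above 300.
./perplexity -m models/llama-2-7b-chat.IQ3_S.gguf -f wiki.test.raw
```

Re-running on a build that includes the fix referenced in #7049 would also rule out the tokenization issue mentioned in the reply above.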