Run (New) 5-bit Quantized Models #239
PeytonCleveland started this conversation in Ideas (closed, 2 comments, 2 replies)

First off, I wanted to say thanks to everyone that's worked on this project! I'm relatively new to Rust (I come from a Node background), but I've been able to build a Warp server around llm without much hassle (a rough sketch of that setup follows below).

One thing I wanted to ask: is there any plan to support 5-bit GGML models with llm? With the recent breaking changes to GGML (ggerganov/ggml#154), both the 4-bit and 5-bit formats have changed. I'd like to be able to run these newer formats, specifically Q5_1, since its perplexity appears nearly equal to F16 without much of a hit to file size or inference speed: https://huggingface.co/eachadea/ggml-vicuna-13b-1.1
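For reference, the Warp side of such a server can stay quite small. The sketch below assumes warp 0.3, tokio 1.x, and the bytes crate; `generate` is a hypothetical stand-in for a call into an llm inference session (a real handler would keep the loaded model behind an `Arc` and feed each request's prompt into a session):

```rust
use warp::Filter;

// Hypothetical stand-in for running a prompt through an `llm` session;
// swap this for a call into the loaded model in a real server.
fn generate(prompt: &str) -> String {
    format!("model output for: {prompt}")
}

#[tokio::main]
async fn main() {
    // POST /infer with the prompt as the raw request body.
    let infer = warp::post()
        .and(warp::path("infer"))
        .and(warp::body::bytes())
        .map(|body: bytes::Bytes| {
            let prompt = String::from_utf8_lossy(&body).into_owned();
            generate(&prompt)
        });

    warp::serve(infer).run(([127, 0, 0, 1], 3030)).await;
}
```

With that running, `curl -d 'Rust is' localhost:3030/infer` exercises the route.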
- Hey, thanks for writing in! Glad to hear you're enjoying it :) Q5_1 QNT1 (the new format) should already work with the latest version.
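For context on what the new format stores (modelled on ggml's block_q5_1; the field names and already-widened f32 scale/min here are assumptions for illustration): Q5_1 packs each block of 32 weights as an f16 scale `d`, an f16 min `m`, one high bit per weight in a 32-bit mask, and 16 bytes of low nibbles, so each weight reconstructs as `d * q + m` with `q` in `0..=31`. A sketch of the per-block dequantization in Rust:

```rust
/// Dequantize one Q5_1 block of 32 weights.
///
/// Assumed layout (modelled on ggml's block_q5_1): `d` and `m` are the
/// block's f16 scale and min (already widened to f32 here), `qh` holds one
/// high bit per weight, and `qs` holds the low 4 bits, two weights per byte.
fn dequantize_q5_1(d: f32, m: f32, qh: u32, qs: [u8; 16]) -> [f32; 32] {
    let mut out = [0.0f32; 32];
    for i in 0..16 {
        // Low nibble + its high bit gives the first half of the block;
        // high nibble + its high bit gives the second half.
        let q0 = u32::from(qs[i] & 0x0F) | (((qh >> i) & 1) << 4);
        let q1 = u32::from(qs[i] >> 4) | (((qh >> (i + 16)) & 1) << 4);
        out[i] = d * (q0 as f32) + m;
        out[i + 16] = d * (q1 as f32) + m;
    }
    out
}
```

That works out to 2 + 2 + 4 + 16 = 24 bytes per 32 weights, i.e. 6 bits per weight, which is where Q5_1's size/perplexity middle ground between the 4-bit formats and F16 comes from.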
- The 5-bit models should all be working at present - I've been using Q5_1 in "production" deployments with no issues.
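For anyone landing here later: loading a Q5_1 file looks the same as loading any other GGML file. This sketch is based on the 0.1-era llm README example (signatures moved around between releases, so treat the exact calls as approximate), with a hypothetical model path:

```rust
use std::io::Write;

use llm::Model;

fn main() {
    // The quantization format (Q4_0, Q5_1, F16, ...) is read from the GGML
    // file itself, so a 5-bit model needs no special handling here.
    let llama = llm::load::<llm::models::Llama>(
        // Hypothetical path to a Q5_1 GGML file.
        std::path::Path::new("/path/to/ggml-vicuna-13b-1.1-q5_1.bin"),
        Default::default(),
        llm::load_progress_callback_stdout,
    )
    .unwrap_or_else(|err| panic!("failed to load model: {err}"));

    let mut session = llama.start_session(Default::default());
    session
        .infer::<std::convert::Infallible>(
            &llama,
            &mut rand::thread_rng(),
            &llm::InferenceRequest {
                prompt: "Rust is a cool programming language because",
                ..Default::default()
            },
            // OutputRequest for capturing logits/embeddings; unused here.
            &mut Default::default(),
            // Stream each generated token to stdout.
            |t| {
                print!("{t}");
                std::io::stdout().flush().unwrap();
                Ok(())
            },
        )
        .unwrap();
}
```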