Run (New) 5-bit Quantized Models #239
PeytonCleveland started this conversation in Ideas (closed, 2 comments, 2 replies)

First off, I wanted to say thanks to everyone that's worked on this project! I'm relatively new to Rust (I come from a Node background), but I've been able to build a Warp server around llm without much hassle (a rough sketch of that setup follows below).

One thing I wanted to ask: is there any plan to support 5-bit GGML models with llm? With the recent breaking changes to GGML (ggerganov/ggml#154), both the 4-bit and 5-bit formats have changed. I'd like to be able to run these newer formats, specifically Q5_1, since its perplexity appears nearly equal to F16 without much of a hit to file size or inference speed: https://huggingface.co/eachadea/ggml-vicuna-13b-1.1
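For reference, the Warp side of such a server can stay quite small. The sketch below assumes warp 0.3, tokio 1.x, and the bytes crate; `generate` is a hypothetical stand-in for a call into an llm inference session (a real handler would keep the loaded model behind an `Arc` and feed each request's prompt into a session):

```rust
use warp::Filter;

// Hypothetical stand-in for running a prompt through an `llm` session;
// swap this for a call into the loaded model in a real server.
fn generate(prompt: &str) -> String {
    format!("model output for: {prompt}")
}

#[tokio::main]
async fn main() {
    // POST /infer with the prompt as the raw request body.
    let infer = warp::post()
        .and(warp::path("infer"))
        .and(warp::body::bytes())
        .map(|body: bytes::Bytes| {
            let prompt = String::from_utf8_lossy(&body).into_owned();
            generate(&prompt)
        });

    warp::serve(infer).run(([127, 0, 0, 1], 3030)).await;
}
```

With that running, `curl -d 'Rust is' localhost:3030/infer` exercises the route.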
- Hey, thanks for writing in! Glad to hear you're enjoying it :) Q5_1 QNT1 (the new format) should already work with the latest version.
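For context on what the new format stores (modelled on ggml's block_q5_1; the field names and already-widened f32 scale/min here are assumptions for illustration): Q5_1 packs each block of 32 weights as an f16 scale `d`, an f16 min `m`, one high bit per weight in a 32-bit mask, and 16 bytes of low nibbles, so each weight reconstructs as `d * q + m` with `q` in `0..=31`. A sketch of the per-block dequantization in Rust:

```rust
/// Dequantize one Q5_1 block of 32 weights.
///
/// Assumed layout (modelled on ggml's block_q5_1): `d` and `m` are the
/// block's f16 scale and min (already widened to f32 here), `qh` holds one
/// high bit per weight, and `qs` holds the low 4 bits, two weights per byte.
fn dequantize_q5_1(d: f32, m: f32, qh: u32, qs: [u8; 16]) -> [f32; 32] {
    let mut out = [0.0f32; 32];
    for i in 0..16 {
        // Low nibble + its high bit gives the first half of the block;
        // high nibble + its high bit gives the second half.
        let q0 = u32::from(qs[i] & 0x0F) | (((qh >> i) & 1) << 4);
        let q1 = u32::from(qs[i] >> 4) | (((qh >> (i + 16)) & 1) << 4);
        out[i] = d * (q0 as f32) + m;
        out[i + 16] = d * (q1 as f32) + m;
    }
    out
}
```

That works out to 2 + 2 + 4 + 16 = 24 bytes per 32 weights, i.e. 6 bits per weight, which is where Q5_1's size/perplexity middle ground between the 4-bit formats and F16 comes from.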
- The 5-bit models should all be working at present - I've been using Q5_1 in "production" deployments with no issues.
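For anyone landing here later: loading a Q5_1 file looks the same as loading any other GGML file. This sketch is based on the 0.1-era llm README example (signatures moved around between releases, so treat the exact calls as approximate), with a hypothetical model path:

```rust
use std::io::Write;

use llm::Model;

fn main() {
    // The quantization format (Q4_0, Q5_1, F16, ...) is read from the GGML
    // file itself, so a 5-bit model needs no special handling here.
    let llama = llm::load::<llm::models::Llama>(
        // Hypothetical path to a Q5_1 GGML file.
        std::path::Path::new("/path/to/ggml-vicuna-13b-1.1-q5_1.bin"),
        Default::default(),
        llm::load_progress_callback_stdout,
    )
    .unwrap_or_else(|err| panic!("failed to load model: {err}"));

    let mut session = llama.start_session(Default::default());
    session
        .infer::<std::convert::Infallible>(
            &llama,
            &mut rand::thread_rng(),
            &llm::InferenceRequest {
                prompt: "Rust is a cool programming language because",
                ..Default::default()
            },
            // OutputRequest for capturing logits/embeddings; unused here.
            &mut Default::default(),
            // Stream each generated token to stdout.
            |t| {
                print!("{t}");
                std::io::stdout().flush().unwrap();
                Ok(())
            },
        )
        .unwrap();
}
```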