Replies: 1 comment

ggerganov/llama.cpp#5761 has more on this topic :)

A team from Microsoft recently came up with 1-bit quantization (BitNet b1.58), drastically reducing memory footprint and increasing token throughput. Each weight is effectively encoded as a ternary value in {-1, 0, 1}. The reported results are super encouraging.
@ggerganov is this of interest for GGML? I can implement it and run a set of benchmarks.
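
For reference, here is a minimal sketch of what the quantization step could look like, assuming the absmean scheme described in the paper (scale the tensor by the mean absolute value of its weights, round, clip to {-1, 0, 1}). The function name and per-tensor layout are hypothetical, not part of the GGML API:

```c
#include <math.h>
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative sketch of absmean ternary quantization:
 * scale each weight by the mean absolute value of the tensor,
 * round to the nearest integer, and clip to {-1, 0, 1}.
 * Names here are hypothetical, not GGML's API. */
static void quantize_ternary(const float * w, int8_t * q, float * scale, size_t n) {
    /* absmean scale: gamma = mean(|w_i|), with epsilon to avoid div-by-zero */
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += fabsf(w[i]);
    }
    const float gamma = (float)(sum / (double) n) + 1e-8f;

    for (size_t i = 0; i < n; i++) {
        float v = roundf(w[i] / gamma);
        if (v >  1.0f) v =  1.0f;  /* clip to the ternary set {-1, 0, 1} */
        if (v < -1.0f) v = -1.0f;
        q[i] = (int8_t) v;
    }
    *scale = gamma;  /* kept per tensor (or per block) so that w ~ gamma * q */
}

int main(void) {
    const float w[8] = { 0.9f, -0.05f, 0.4f, -1.3f, 0.0f, 0.7f, -0.6f, 0.2f };
    int8_t      q[8];
    float       gamma;

    quantize_ternary(w, q, &gamma, 8);
    for (int i = 0; i < 8; i++) {
        printf("w = % .2f  ->  q = %+d  (dequant % .2f)\n", w[i], q[i], gamma * q[i]);
    }
    return 0;
}
```

Since log2(3) ≈ 1.58 bits per weight, five ternary values can be packed into a single byte (3^5 = 243 ≤ 256), which is where the "1.58-bit" figure comes from; an actual GGML quant type would pack the trits and store a per-block scale rather than keep one `int8_t` per weight.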