With this PR in `mlx_lm`, users can quantize the KV cache: ml-explore/mlx-examples#1075. Quantizing the KV cache improves performance at large context sizes.
Unfortunately, KV cache quantization is only available for `KVCache` and not for `RotatingKVCache`, so we would need to refactor our implementation to use `KVCache` to allow for quantization.
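For context on what quantizing the cache involves, here is a minimal pure-Python sketch of group-wise affine quantization, the general technique of storing low-bit integer codes plus one scale and one bias per group. It is an illustration only and not the code from the PR; the function names and the `group_size` and `bits` defaults are assumptions made for this example.

```python
import math


def quantize_group(values, bits):
    """Map one group of floats to unsigned integer codes of `bits` bits."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    # Guard against a constant group, where hi == lo would give scale 0.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo  # lo acts as the per-group bias


def quantize(values, group_size=64, bits=4):
    """Quantize a flat list of floats group by group (hypothetical helper)."""
    return [
        quantize_group(values[i:i + group_size], bits)
        for i in range(0, len(values), group_size)
    ]


def dequantize(groups):
    """Reconstruct approximate floats from (codes, scale, bias) groups."""
    out = []
    for codes, scale, bias in groups:
        out.extend(c * scale + bias for c in codes)
    return out


# Stand-in for one row of cached keys or values.
values = [math.sin(0.1 * i) for i in range(256)]
groups = quantize(values, group_size=64, bits=4)
restored = dequantize(groups)
max_error = max(abs(v - r) for v, r in zip(values, restored))
print(f"groups: {len(groups)}, max reconstruction error: {max_error:.4f}")
```

Each entry is stored in `bits` bits instead of a 16- or 32-bit float, at the cost of a reconstruction error of at most half a quantization step per group. That trade is why the saving matters most at large context sizes, where the cache dominates memory use.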