Add KV cache quantization feature #31

neilmehta24 · 2024-11-08T15:59:48Z

With ml-explore/mlx-examples#1075 merged into mlx_lm, users can quantize the KV cache. Quantizing the KV cache reduces its memory footprint, which improves performance at large context sizes.

Unfortunately, KV cache quantization is only available for KVCache and not for RotatingKVCache, so we would need to refactor our implementation to use KVCache in order to support quantization.
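
For a rough sense of why this matters at large context sizes, here is a back-of-the-envelope estimate of KV cache memory. It is only a sketch: the model shape is hypothetical, and the overhead model (one fp16 scale and one fp16 bias per quantization group) is an assumption about group-wise affine quantization, not something read off the mlx_lm implementation.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bits=16, group_size=None):
    """Approximate KV cache size in bytes for a single sequence."""
    # Keys and values: 2 tensors per layer, each n_kv_heads * seq_len * head_dim elements.
    n_elements = 2 * n_layers * n_kv_heads * seq_len * head_dim
    total = n_elements * bits // 8
    if group_size is not None:
        # Assumed overhead: one fp16 scale + one fp16 bias (2 bytes each) per group.
        n_groups = n_elements // group_size
        total += n_groups * 2 * 2
    return total


# Hypothetical model shape, for illustration only.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for seq_len in (4096, 32768, 131072):
    fp16 = kv_cache_bytes(seq_len=seq_len, **shape)
    q4 = kv_cache_bytes(seq_len=seq_len, bits=4, group_size=64, **shape)
    print(f"{seq_len:>7} tokens: fp16 {fp16 / 2**30:.2f} GiB, 4-bit {q4 / 2**30:.2f} GiB")
```

Under these assumptions, a 4-bit cache with group size 64 costs about 4.5 bits per element instead of 16, and the absolute savings grow linearly with context length.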

neilmehta24 added the enhancement label Nov 8, 2024

neilmehta24 self-assigned this Nov 8, 2024
