v2.2.0

@Narsil released this 23 Jul 16:30

Notable changes

  • Llama 3.1 support, including 405B and FP8 in many mixed configurations (FP8, AWQ, GPTQ, FP8+FP16).
  • Gemma2 softcap support
  • Deepseek v2 support.
  • Lots of internal reworks/cleanup (allowing for cool features)
  • Lots of AWQ/GPTQ work with Marlin kernels (everything should be faster by default)
  • Flash decoding support (enabled with the FLASH_DECODING=1 environment variable, which will probably enable some nice improvements in the future)
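
As a rough illustration of the FLASH_DECODING=1 setting mentioned above, here is a minimal Python sketch that prepares an environment with the variable set before starting a server process. The helper name `with_flash_decoding` is hypothetical and not part of this project; only the variable name and the value `1` come from the notes above.

```python
import os


def with_flash_decoding(base_env=None):
    """Return a copy of an environment mapping with FLASH_DECODING=1 set.

    Hypothetical helper for illustration only; the original mapping is
    left unmodified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["FLASH_DECODING"] = "1"  # value taken from the release notes
    return env


# Build the environment from a small example mapping.
env = with_flash_decoding({"PATH": "/usr/bin"})
print(env["FLASH_DECODING"])
```

The resulting mapping could then be passed as the `env` argument of `subprocess.Popen` when launching the server; the launch command itself is left out here because it depends on the deployment.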

What's Changed

New Contributors

Full Changelog: v2.1.1...v2.2.0