v2.3.1
Important changes
- Added support for Mllama (3.2, vision models). Flashed, unpadded.
- FP8 performance improvements
- Moe performance improvements
- BREAKING CHANGE - When using tools, models could answer with a tool call
notify_error
with the content error, it will instead output regular generation.
What's Changed
- nix: remove unused
_server.nix
file by @danieldk in #2538 - chore: Add old V2 backend by @OlivierDehaene in #2551
- Remove duplicated
RUN
inDockerfile
by @alvarobartt in #2547 - Micro cleanup. by @Narsil in #2555
- Hotfixing main by @Narsil in #2556
- Add support for scalar FP8 weight scales by @danieldk in #2550
- Add
DenseMoELayer
and wire it up in Mixtral/Deepseek V2 by @danieldk in #2537 - Update the link to the Ratatui organization by @orhun in #2546
- Simplify crossterm imports by @orhun in #2545
- Adding note for private models in quick-tour document by @ariG23498 in #2548
- Hotfixing main. by @Narsil in #2562
- Cleanup Vertex + Chat by @Narsil in #2553
- More tensor cores. by @Narsil in #2558
- remove LORA_ADAPTERS_PATH by @nbroad1881 in #2563
- Add LoRA adapters support for Gemma2 by @alvarobartt in #2567
- Fix build with
--features google
by @alvarobartt in #2566 - Improve support for GPUs with capability < 8 by @danieldk in #2575
- flashinfer: pass window size and dtype by @danieldk in #2574
- Remove compute capability lazy cell by @danieldk in #2580
- Update architecture.md by @ulhaqi12 in #2577
- Update ROCM libs and improvements by @mht-sharma in #2579
- Add support for GPTQ-quantized MoE models using MoE Marlin by @danieldk in #2557
- feat: support phi3.5 moe by @drbh in #2479
- Move flake back to tgi-nix
main
by @danieldk in #2586 - MoE Marlin: support
desc_act
forgroupsize != -1
by @danieldk in #2590 - nix: experimental support for building a Docker container by @danieldk in #2470
- Mllama flash version by @Narsil in #2585
- Max token capacity metric by @Narsil in #2595
- CI (2592): Allow LoRA adapter revision in server launcher by @drbh in #2602
- Unroll notify error into generate response by @drbh in #2597
- New release 2.3.1 by @Narsil in #2604
New Contributors
- @alvarobartt made their first contribution in #2547
- @orhun made their first contribution in #2546
- @ariG23498 made their first contribution in #2548
- @ulhaqi12 made their first contribution in #2577
- @mht-sharma made their first contribution in #2579
Full Changelog: v2.3.0...v2.3.1