
v2.0.0

Released by @OlivierDehaene on 12 Apr 16:44 · commit c38a7d7

TGI is back to Apache 2.0!

Highlights

  • The license was reverted to Apache 2.0.
  • CUDA graphs are now used by default; they substantially improve latency on high-end nodes.
  • Llava-next was added. It is the second multimodal model available in TGI, after Idefics (see the request sketch after this list).
  • Cohere Command R+ support. TGI is the fastest open-source backend for Command R+.
  • FP8 support.
  • The vocabulary is now shared across all Medusa heads, greatly improving latency and memory use.
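As a sketch of what a multimodal request looks like once a Llava-next model is being served (the image URL and prompt below are placeholders), TGI accepts images embedded in the input through Markdown image syntax:

curl 127.0.0.1:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs":"![](https://example.com/photo.png)Describe this image.","parameters":{"max_new_tokens":64}}'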

Try out Command R+ with Medusa heads on 4xA100s with:

model=text-generation-inference/commandrplus-medusa
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:2.0 \
    --model-id $model --speculate 3 --num-shard 4
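Once the container reports it is ready, you can query the server over the standard /generate endpoint; the prompt and max_new_tokens value below are illustrative:

curl 127.0.0.1:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs":"What is speculative decoding?","parameters":{"max_new_tokens":64}}'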

What's Changed

New Contributors

Full Changelog: v1.4.5...v2.0.0