Release v1.4.4 · huggingface/text-generation-inference

Highlights

Handle concurrent grammar requests by @drbh in #1610
Fix idefics default. by @Narsil in #1614
Fix async client timeout by @hugoabonizio in #1617
accept legacy request format and response by @drbh in #1527
add missing stop parameter for chat request by @drbh in #1619
correctly index into mask when applying grammar by @drbh in #1618
Use a better model for the quick tour by @lewtun in #1639
Upgrade nix version from 0.27.1 to 0.28.0 by @yuanwu2017 in #1638
Update peft + transformers + accelerate + bnb + safetensors by @abhishekkrthakur in #1646
Fix index in ChatCompletionChunk by @Wauplin in #1648
Fixing minor typo in documentation: supported hardware section by @SachinVarghese in #1632
bump minijinja and add test for core templates by @drbh in #1626
support force downcast after FastRMSNorm multiply for Gemma by @drbh in #1658
prefer spaces url over temp url by @drbh in #1662
improve tool type, bump pydantic and outlines by @drbh in #1650
Remove unnecessary cuda graph. by @Narsil in #1664
Repair idefics integration tests. by @Narsil in #1663
fix: LlamaTokenizerFast to AutoTokenizer at flash_mistral.py by @SeongBeomLEE in #1637
Inline images for multimodal models. by @Narsil in #1666

Full Changelog: v1.4.3...v1.4.4