v1.4.4
Highlights
- CohereForAI/c4ai-command-r-v01 model support
What's Changed
- Handle concurrent grammar requests by @drbh in #1610
- Fix idefics default. by @Narsil in #1614
- Fix async client timeout by @hugoabonizio in #1617
- accept legacy request format and response by @drbh in #1527
- add missing stop parameter for chat request by @drbh in #1619
- correctly index into mask when applying grammar by @drbh in #1618
- Use a better model for the quick tour by @lewtun in #1639
- Upgrade nix version from 0.27.1 to 0.28.0 by @yuanwu2017 in #1638
- Update peft + transformers + accelerate + bnb + safetensors by @abhishekkrthakur in #1646
- Fix index in ChatCompletionChunk by @Wauplin in #1648
- Fixing minor typo in documentation: supported hardware section by @SachinVarghese in #1632
- bump minijinja and add test for core templates by @drbh in #1626
- support force downcast after FastRMSNorm multiply for Gemma by @drbh in #1658
- prefer spaces url over temp url by @drbh in #1662
- improve tool type, bump pydantic and outlines by @drbh in #1650
- Remove unnecessary cuda graph. by @Narsil in #1664
- Repair idefics integration tests. by @Narsil in #1663
- fix: LlamaTokenizerFast to AutoTokenizer at flash_mistral.py by @SeongBeomLEE in #1637
- Inline images for multimodal models. by @Narsil in #1666
New Contributors
- @hugoabonizio made their first contribution in #1617
- @yuanwu2017 made their first contribution in #1638
- @abhishekkrthakur made their first contribution in #1646
- @Wauplin made their first contribution in #1648
- @SachinVarghese made their first contribution in #1632
- @SeongBeomLEE made their first contribution in #1637
Full Changelog: v1.4.3...v1.4.4