Release v0.12.1 · predibase/lorax

🎉 Enhancements

Add support for adapter loading in mllama by @ajtejankar in #669
Record number of skipped tokens in the response by @tgaddair in #681
Record TTFT and TPOT in response headers by @tgaddair in #684
Add cli arg --speculation-max-batch-size by @tgaddair in #686
Use --predibase-api-token parameter when downloading by @joseph-predibase in #687
Launcher args for compile max batch size and rank by @tgaddair in #690

🐛 Bugfixes

Fix stella embeddings + Integration tests for lorax by @magdyksaleh in #668
Fix lora loading and indexing bug in mllama by @ajtejankar in #682
Set maximum grpc message receive size to 2GiB by @tgaddair in #667
Fix frequency_penalty and presence_penalty by @tgaddair in #672
Fix scores (remove debug code) by @tgaddair in #673
Fix top_p to allow setting it to 1.0 by @magdyksaleh in #676
Format fixes tool calling by @magdyksaleh in #680
Use predibase API token when downloading pbase files by @joseph-predibase in #688
Pbase adapter source resolution by @magdyksaleh in #689
fix: Make logprob field optional for response Pydantic validation by @jeffreyftang in #692

Full Changelog: v0.12.0...v0.12.1