v0.4.5
What's Changed
Important: Version 0.4.4 was skipped.
Quite a few changes this time around, most notably:
- Implement DeciLM by @AlpinDale in #158
- Support prompt logprobs by @AlpinDale in #162
- Support safetensors for Mixtral, along with expert parallelism for better multi-GPU performance by @AlpinDale in #167
- Implement CUDA graphs to improve multi-GPU performance and optimize smaller models by @AlpinDale in #172
- Fix peak memory profiling to allow higher gmu (GPU memory utilization) values by @AlpinDale in #166
- Restore compatibility with Python 3.8 and 3.9 by @g4rg in #170
- Lazily import model classes to avoid import overhead by @AlpinDale in #165
- Add RoPE scaling support for Mixtral models by @g4rg in #174
- Make OpenAI API keys optional by @AlpinDale in #176
Full Changelog: v0.4.4...v0.4.5