What's Changed
🚀 Features
- support yarn in turbomind backend by @irexyc in #2519
- add linear op on dlinfer platform by @yao-fengchen in #2627
- support turbomind head_dim 64 by @irexyc in #2715
- [Feature]: support LlavaForConditionalGeneration with turbomind inference by @deepindeed2022 in #2710
- Support Mono-InternVL with PyTorch backend by @wzk1015 in #2727
- Support Qwen2-MoE models by @lzhangzz in #2723
- Support mixtral moe AWQ quantization. by @AllentDan in #2725
- Support chemvlm by @RunningLeon in #2738
- Support molmo in turbomind by @lvhan028 in #2716
💥 Improvements
- Call cuda empty_cache to prevent OOM when quantizing model by @AllentDan in #2671
- feat: support dynamic/llama3 rotary embedding in ascend graph mode by @tangzhiyi11 in #2670
- Add ensure_ascii = False for json.dumps by @AllentDan in #2707
- Flatten cache and add flashattention by @grimoire in #2676
- Support ep, column major moe kernel. by @grimoire in #2690
- Remove one of the duplicate bos tokens by @AllentDan in #2708
- Check server input by @irexyc in #2719
- optimize dlinfer moe by @tangzhiyi11 in #2741
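
The `ensure_ascii = False` change listed above refers to a flag of Python's standard `json.dumps`. As a minimal sketch of what that flag does (independent of LMDeploy's own code; the `payload` dictionary is a hypothetical example), non-ASCII text is emitted as-is instead of as `\uXXXX` escapes:

```python
import json

# Hypothetical payload for illustration; not taken from LMDeploy's code.
payload = {"text": "你好"}

# Default behaviour (ensure_ascii=True): non-ASCII characters are escaped.
escaped = json.dumps(payload)

# With ensure_ascii=False the characters are kept verbatim in the output.
readable = json.dumps(payload, ensure_ascii=False)

print(escaped)   # {"text": "\u4f60\u597d"}
print(readable)  # {"text": "你好"}
```

Both strings decode to the same object with `json.loads`; the flag only changes how the serialized text looks.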
🐞 Bug fixes
- Support min_tokens, min_p parameters for api_server by @AllentDan in #2681
- fix index error when computing ppl on long-text prompt by @lvhan028 in #2697
- Better tp exit log. by @grimoire in #2677
- Fix failure to read moe_ffn weights from converted tm model by @lvhan028 in #2698
- Fix turbomind TP by @lzhangzz in #2706
- fix decoding kernel for deepseekv2 by @grimoire in #2688
- fix tp exit code for pytorch engine by @RunningLeon in #2718
- fix assert pad >= 0 failed when inter_size is not a multiple of group… by @Vinkle-hzt in #2740
- fix issue that mono-internvl failed to fall back to pytorch engine by @lvhan028 in #2744
- Remove use_fast=True when loading tokenizer for lite auto_awq by @AllentDan in #2758
- Fix wrong head_dim set for mistral-nemo by @lvhan028 in #2761
📚 Documentation
- Update ascend readme by @jinminxi104 in #2756
- fix ascend get_started.md link by @CyCle1024 in #2696
- Fix llama3.2 VL vision in "Supported Models" document by @blankanswer in #2703
🌐 Other
- [ci] support v100 dailytest by @zhulinJulia24 in #2665
- [ci] add more testcase into evaluation and daily test by @zhulinJulia24 in #2721
- feat: support multi cards in ascend graph mode by @tangzhiyi11 in #2755
- bump version to v0.6.3 by @lvhan028 in #2754
New Contributors
- @blankanswer made their first contribution in #2703
- @tangzhiyi11 made their first contribution in #2670
- @wzk1015 made their first contribution in #2727
- @Vinkle-hzt made their first contribution in #2740
Full Changelog: v0.6.2...v0.6.3