Release LMDeploy Release V0.6.3 · InternLM/lmdeploy

What's Changed

support yarn in turbomind backend by @irexyc in #2519
add linear op on dlinfer platform by @yao-fengchen in #2627
support turbomind head_dim 64 by @irexyc in #2715
[Feature]: support LlavaForConditionalGeneration with turbomind inference by @deepindeed2022 in #2710
Support Mono-InternVL with PyTorch backend by @wzk1015 in #2727
Support Qwen2-MoE models by @lzhangzz in #2723
Support mixtral moe AWQ quantization. by @AllentDan in #2725
Support chemvlm by @RunningLeon in #2738
Support molmo in turbomind by @lvhan028 in #2716

Call cuda empty_cache to prevent OOM when quantizing model by @AllentDan in #2671
feat: support dynamic/llama3 rotary embedding in ascend graph mode by @tangzhiyi11 in #2670
Add ensure_ascii = False for json.dumps by @AllentDan in #2707
Flatten cache and add flashattention by @grimoire in #2676
Support ep, column major moe kernel. by @grimoire in #2690
Remove one of the duplicate bos tokens by @AllentDan in #2708
Check server input by @irexyc in #2719
optimize dlinfer moe by @tangzhiyi11 in #2741

Support min_tokens, min_p parameters for api_server by @AllentDan in #2681
fix index error when computing ppl on long-text prompt by @lvhan028 in #2697
Better tp exit log. by @grimoire in #2677
miss to read moe_ffn weights from converted tm model by @lvhan028 in #2698
Fix turbomind TP by @lzhangzz in #2706
fix decoding kernel for deepseekv2 by @grimoire in #2688
fix tp exit code for pytorch engine by @RunningLeon in #2718
fix assert pad >= 0 failed when inter_size is not a multiple of group… by @Vinkle-hzt in #2740
fix issue that mono-internvl failed to fallback pytorch engine by @lvhan028 in #2744
Remove use_fast=True when loading tokenizer for lite auto_awq by @AllentDan in #2758
set wrong head_dim for mistral-nemo by @lvhan028 in #2761

[ci] support v100 dailytest by @zhulinJulia24 in #2665
[ci] add more testcase into evaluation and daily test by @zhulinJulia24 in #2721
feat: support multi cards in ascend graph mode by @tangzhiyi11 in #2755
bump version to v0.6.3 by @lvhan028 in #2754

Full Changelog: v0.6.2...v0.6.3