[MFM-20250115] Merge from ROCm/main to llama_fp8 #360

tjtanaa · 2025-01-15T04:41:12Z

This PR merges ROCm/main into llama_fp8.

Notes:
The test cases in tests/kernels/test_moe.py have been updated to match the main branch. Unlike the versions on the llama-fp8 branch, they do not take the MOE_SHUFFLE environment variable into account. If shuffling is required, the test cases will need to be reverted to the llama-fp8 versions.
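
As an illustration of what "taking MOE_SHUFFLE into account" could look like in a test helper, here is a minimal sketch. It is not taken from either branch: the helper names `moe_shuffle_enabled` and `select_moe_test_variant`, the set of accepted truthy values, and the "shuffled"/"unshuffled" labels are all assumptions made for the example; only the variable name MOE_SHUFFLE comes from the note above.

```python
import os


def moe_shuffle_enabled(env=None):
    """Return True if MOE_SHUFFLE is set to a truthy value.

    The accepted values are an assumed convention, not the
    llama-fp8 branch's actual parsing logic.
    """
    env = os.environ if env is None else env
    value = env.get("MOE_SHUFFLE", "0").strip().lower()
    return value in {"1", "true", "yes", "on"}


def select_moe_test_variant(env=None):
    """Choose which (hypothetical) test variant to run for the MoE kernels."""
    return "shuffled" if moe_shuffle_enabled(env) else "unshuffled"
```

Passing the environment as a plain mapping keeps the helper easy to exercise in a test without mutating the real process environment; under pytest, `monkeypatch.setenv("MOE_SHUFFLE", "1")` would be one way to cover both paths.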

…zation (vllm-project#11523) Signed-off-by: mgoin <[email protected]> Signed-off-by: simon-mo <[email protected]> Signed-off-by: simon-mo <[email protected]> Co-authored-by: simon-mo <[email protected]> Co-authored-by: simon-mo <[email protected]> Co-authored-by: HandH1998 <[email protected]>

* Commiting the *multilingual* P3L test. * Created a *multi-lingual* P3L test. * Making ruff happy. * . * Added a reference to the language-scripture Confluence table. * Typo fixing. * Harmonizing naming. * Fixing comments in the header. --------- Co-authored-by: Alexei V. Ivanov <[email protected]> Co-authored-by: Gregory Shtrasberg <[email protected]>

* [Bugfix][V1] Fix molmo text-only inputs (vllm-project#11676) Signed-off-by: Jee Jee Li <[email protected]> * [Kernel] Move attn_type to Attention.__init__() (vllm-project#11690) Signed-off-by: Chen Zhang <[email protected]> * [V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision (vllm-project#11685) Signed-off-by: Roger Wang <[email protected]> Signed-off-by: DarkLight1337 <[email protected]> Co-authored-by: DarkLight1337 <[email protected]> * [Bugfix] Fix LLaVA-NeXT feature size precision error (for real) (vllm-project#11772) Signed-off-by: DarkLight1337 <[email protected]> * [Model] Future-proof Qwen2-Audio multi-modal processor (vllm-project#11776) Signed-off-by: DarkLight1337 <[email protected]> * [XPU] Make pp group initilized for pipeline-parallelism (vllm-project#11648) Signed-off-by: yisheng <[email protected]> * [Doc][3/N] Reorganize Serving section (vllm-project#11766) Signed-off-by: DarkLight1337 <[email protected]> * [Kernel][LoRA]Punica prefill kernels fusion (vllm-project#11234) Signed-off-by: Jee Jee Li <[email protected]> Signed-off-by: Abatom <[email protected]> Co-authored-by: Zhonghua Deng <[email protected]> * [Bugfix] Update attention interface in `Whisper` (vllm-project#11784) Signed-off-by: Roger Wang <[email protected]> * [CI] Fix neuron CI and run offline tests (vllm-project#11779) Signed-off-by: Liangfu Chen <[email protected]> * fix init error for MessageQueue when n_local_reader is zero (vllm-project#11768) * [Doc] Create a vulnerability management team (vllm-project#9925) Signed-off-by: Russell Bryant <[email protected]> * [CI][CPU] adding build number to docker image name (vllm-project#11788) Signed-off-by: Yuan Zhou <[email protected]> * [V1][Doc] Update V1 support for `LLaVa-NeXT-Video` (vllm-project#11798) Signed-off-by: Roger Wang <[email protected]> * [Bugfix] Comprehensively test and fix LLaVA-NeXT feature size calculation (vllm-project#11800) Signed-off-by: DarkLight1337 <[email 
protected]> * [doc] add doc to explain how to use uv (vllm-project#11773) Signed-off-by: youkaichao <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> * [V1] Support audio language models on V1 (vllm-project#11733) Signed-off-by: Roger Wang <[email protected]> * [doc] update how pip can install nightly wheels (vllm-project#11806) Signed-off-by: youkaichao <[email protected]> * [Doc] Add note to `gte-Qwen2` models (vllm-project#11808) Signed-off-by: DarkLight1337 <[email protected]> * [optimization] remove python function call for custom op (vllm-project#11750) Signed-off-by: youkaichao <[email protected]> * [Bugfix] update the prefix for qwen2 (vllm-project#11795) Co-authored-by: jiadi.jjd <[email protected]> * [Doc]Add documentation for using EAGLE in vLLM (vllm-project#11417) Signed-off-by: Sourashis Roy <[email protected]> * [Bugfix] Significant performance drop on CPUs with --num-scheduler-steps > 1 (vllm-project#11794) * [Doc] Group examples into categories (vllm-project#11782) Signed-off-by: Harry Mellor <[email protected]> * [Bugfix] Fix image input for Pixtral-HF (vllm-project#11741) Signed-off-by: DarkLight1337 <[email protected]> * [Misc] sort torch profiler table by kernel timing (vllm-project#11813) * Remove the duplicate imports of MultiModalKwargs and PlaceholderRange… (vllm-project#11824) * Fixed docker build for ppc64le (vllm-project#11518) Signed-off-by: Nishidha Panpaliya <[email protected]> * [OpenVINO] Fixed Docker.openvino build (vllm-project#11732) Signed-off-by: Ilya Lavrenov <[email protected]> * [Bugfix] Add checks for LoRA and CPU offload (vllm-project#11810) Signed-off-by: Jee Jee Li <[email protected]> * [Docs] reorganize sponsorship page (vllm-project#11639) Signed-off-by: simon-mo <[email protected]> * [Bug] Fix pickling of `ModelConfig` when RunAI Model Streamer is used (vllm-project#11825) Signed-off-by: DarkLight1337 <[email protected]> * [misc] improve memory profiling (vllm-project#11809) Signed-off-by: youkaichao 
<[email protected]> Co-authored-by: Cyrus Leung <[email protected]> * [doc] update wheels url (vllm-project#11830) Signed-off-by: youkaichao <[email protected]> * [Docs] Update sponsor name: 'Novita' to 'Novita AI' (vllm-project#11833) * [Hardware][Apple] Native support for macOS Apple Silicon (vllm-project#11696) Signed-off-by: Wallas Santos <[email protected]> Co-authored-by: Michael Goin <[email protected]> * [torch.compile] consider relevant code in compilation cache (vllm-project#11614) Signed-off-by: youkaichao <[email protected]> * [VLM] Reorganize profiling/processing-related code (vllm-project#11812) Signed-off-by: DarkLight1337 <[email protected]> * [Doc] Move examples into categories (vllm-project#11840) Signed-off-by: Harry Mellor <[email protected]> * [Doc][4/N] Reorganize API Reference (vllm-project#11843) Signed-off-by: DarkLight1337 <[email protected]> * [CI/Build][Bugfix] Fix CPU CI image clean up (vllm-project#11836) Signed-off-by: jiang1.li <[email protected]> * [Bugfix][XPU] fix silu_and_mul (vllm-project#11823) Signed-off-by: yan ma <[email protected]> * [Misc] Move some model utils into vision file (vllm-project#11848) Signed-off-by: DarkLight1337 <[email protected]> * [Doc] Expand Multimodal API Reference (vllm-project#11852) Signed-off-by: DarkLight1337 <[email protected]> * [Misc]add some explanations for BlockHashType (vllm-project#11847) * [TPU][Quantization] TPU `W8A8` (vllm-project#11785) Co-authored-by: Woosuk Kwon <[email protected]> * [Kernel][Triton][AMD] Use block size heuristic for avg 2.8x speedup for int8 models (vllm-project#11698) Signed-off-by: Randall Smith <[email protected]> * [Docs] Add Google Cloud Meetup (vllm-project#11864) * [CI] Turn on basic correctness tests for V1 (vllm-project#10864) * treat do_lower_case in the same way as the sentence-transformers library (vllm-project#11815) Signed-off-by: Max de Bayser <[email protected]> * [Doc] Recommend uv and python 3.12 for quickstart guide (vllm-project#11849) 
Signed-off-by: mgoin <[email protected]> * [Misc] Move `print_*_once` from utils to logger (vllm-project#11298) Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: Maxime Fournioux <[email protected]> Co-authored-by: Maxime Fournioux <[email protected]> * [Doc] Intended links Python multiprocessing library (vllm-project#11878) * [perf]fix current stream (vllm-project#11870) Signed-off-by: youkaichao <[email protected]> * [Bugfix] Override dunder methods of placeholder modules (vllm-project#11882) Signed-off-by: DarkLight1337 <[email protected]> * [Bugfix] fix beam search input errors and latency benchmark script (vllm-project#11875) Signed-off-by: Ye Qi <[email protected]> Co-authored-by: yeq <[email protected]> * [Doc] Add model development API Reference (vllm-project#11884) Signed-off-by: DarkLight1337 <[email protected]> * [platform] Allow platform specify attention backend (vllm-project#11609) Signed-off-by: wangxiyuan <[email protected]> Signed-off-by: Mengqing Cao <[email protected]> Co-authored-by: Mengqing Cao <[email protected]> * [ci]try to fix flaky multi-step tests (vllm-project#11894) Signed-off-by: youkaichao <[email protected]> * [Misc] Provide correct Pixtral-HF chat template (vllm-project#11891) Signed-off-by: DarkLight1337 <[email protected]> * [Docs] Add Modal to deployment frameworks (vllm-project#11907) * [Doc][5/N] Move Community and API Reference to the bottom (vllm-project#11896) Signed-off-by: DarkLight1337 <[email protected]> Co-authored-by: Simon Mo <[email protected]> * [VLM] Enable tokenized inputs for merged multi-modal processor (vllm-project#11900) Signed-off-by: DarkLight1337 <[email protected]> * [Doc] Show default pooling method in a table (vllm-project#11904) Signed-off-by: DarkLight1337 <[email protected]> * [torch.compile] Hide KV cache behind torch.compile boundary (vllm-project#11677) Signed-off-by: Chen Zhang <[email protected]> * [Bugfix] Validate lora adapters to avoid crashing server (vllm-project#11727) 
Signed-off-by: Joe Runde <[email protected]> Co-authored-by: Jee Jee Li <[email protected]> * [BUGFIX] Fix `UnspecifiedPlatform` package name (vllm-project#11916) Signed-off-by: Kunshang Ji <[email protected]> * [ci] fix gh200 tests (vllm-project#11919) Signed-off-by: youkaichao <[email protected]> * [misc] remove python function call for custom activation op (vllm-project#11885) Co-authored-by: youkaichao <[email protected]> * [platform] support pytorch custom op pluggable (vllm-project#11328) Signed-off-by: wangxiyuan <[email protected]> * Replace "online inference" with "online serving" (vllm-project#11923) Signed-off-by: Harry Mellor <[email protected]> * [ci] Fix sampler tests (vllm-project#11922) Signed-off-by: youkaichao <[email protected]> * [Doc] [1/N] Initial guide for merged multi-modal processor (vllm-project#11925) Signed-off-by: DarkLight1337 <[email protected]> * [platform] support custom torch.compile backend key (vllm-project#11318) Signed-off-by: wangxiyuan <[email protected]> Signed-off-by: youkaichao <[email protected]> Co-authored-by: youkaichao <[email protected]> * [Doc] Rename offline inference examples (vllm-project#11927) Signed-off-by: Harry Mellor <[email protected]> * [Docs] Fix docstring in `get_ip` function (vllm-project#11932) Signed-off-by: Kuntai Du <[email protected]> * Doc fix in `benchmark_long_document_qa_throughput.py` (vllm-project#11933) Signed-off-by: Kuntai Du <[email protected]> * [Hardware][CPU] Support MOE models on x86 CPU (vllm-project#11831) Signed-off-by: jiang1.li <[email protected]> * [Misc] Clean up debug code in Deepseek-V3 (vllm-project#11930) Signed-off-by: Isotr0py <[email protected]> * [Misc] Update benchmark_prefix_caching.py fixed example usage (vllm-project#11920) Signed-off-by: Ren MinMin <[email protected]> Co-authored-by: Ren MinMin <[email protected]> * [Bugfix] Check that number of images matches number of <|image|> tokens with mllama (vllm-project#11939) Signed-off-by: Travis Johnson <[email 
protected]> * [mypy] Fix mypy warnings in api_server.py (vllm-project#11941) Signed-off-by: Fred Reiss <[email protected]> * [ci] fix broken distributed-tests-4-gpus (vllm-project#11937) Signed-off-by: youkaichao <[email protected]> * [Bugfix][SpecDecode] Adjust Eagle model architecture to align with intended design (vllm-project#11672) Signed-off-by: Sungjae Lee <[email protected]> * [Bugfix] fused_experts_impl wrong compute type for float32 (vllm-project#11921) Signed-off-by: shaochangxu.scx <[email protected]> Co-authored-by: shaochangxu.scx <[email protected]> * [CI/Build] Move model-specific multi-modal processing tests (vllm-project#11934) Signed-off-by: DarkLight1337 <[email protected]> * [Doc] Basic guide for writing unit tests for new models (vllm-project#11951) Signed-off-by: DarkLight1337 <[email protected]> * [Bugfix] Fix RobertaModel loading (vllm-project#11940) Signed-off-by: NickLucche <[email protected]> * [Model] Add cogagent model support vLLM (vllm-project#11742) Signed-off-by: Isotr0py <[email protected]> Co-authored-by: Isotr0py <[email protected]> * [V1] Avoid sending text prompt to core engine (vllm-project#11963) Signed-off-by: Roger Wang <[email protected]> * [CI/Build] Add markdown linter (vllm-project#11857) Signed-off-by: Rafael Vasquez <[email protected]> * [Model] Initialize support for Deepseek-VL2 models (vllm-project#11578) Signed-off-by: Isotr0py <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> * [Hardware][CPU] Multi-LoRA implementation for the CPU backend (vllm-project#11100) Signed-off-by: Akshat Tripathi <[email protected]> Signed-off-by: Oleg Mosalov <[email protected]> Signed-off-by: Jee Jee Li <[email protected]> Co-authored-by: Oleg Mosalov <[email protected]> Co-authored-by: Jee Jee Li <[email protected]> Co-authored-by: Isotr0py <[email protected]> * [Hardware][TPU] workaround fix for MoE on TPU (vllm-project#11764) * [V1][Core][1/n] Logging and Metrics (vllm-project#11962) Signed-off-by: [email 
protected] <[email protected]> * [Model] Support GGUF models newly added in `transformers` 4.46.0 (vllm-project#9685) Signed-off-by: Isotr0py <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> * [V1] [2/n] Logging and Metrics - `OutputProcessor` Abstraction (vllm-project#11973) Signed-off-by: [email protected] <[email protected]> * [MISC] fix typo in kv transfer send recv test (vllm-project#11983) * [Bug] Fix usage of `.transpose()` and `.view()` consecutively. (vllm-project#11979) * [CI][Spec Decode] fix: broken test for EAGLE model (vllm-project#11972) Signed-off-by: Sungjae Lee <[email protected]> * [Misc] Fix Deepseek V2 fp8 kv-scale remapping (vllm-project#11947) Signed-off-by: Yida Wu <[email protected]> * [Misc]Minor Changes about Worker (vllm-project#11555) Signed-off-by: Chenguang Li <[email protected]> * [platform] add ray_device_key (vllm-project#11948) Signed-off-by: youkaichao <[email protected]> * Fix Max Token ID for Qwen-VL-Chat (vllm-project#11980) Signed-off-by: Alex-Brooks <[email protected]> * [Kernel] unified_attention for Attention.forward (vllm-project#11967) Signed-off-by: Chen Zhang <[email protected]> * [Doc][V1] Update model implementation guide for V1 support (vllm-project#11998) Signed-off-by: Roger Wang <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> * [Doc] Organise installation documentation into categories and tabs (vllm-project#11935) Signed-off-by: Harry Mellor <[email protected]> * [platform] add device_control env var (vllm-project#12009) Signed-off-by: youkaichao <[email protected]> * [Platform] Move get_punica_wrapper() function to Platform (vllm-project#11516) Signed-off-by: Shanshan Shen <[email protected]> * bugfix: Fix signature mismatch in benchmark's `get_tokenizer` function (vllm-project#11982) Signed-off-by: elijah <[email protected]> * Using list * Revert "[misc] improve memory profiling (vllm-project#11809)" This reverts commit 889e662. 
* Trying to make scales work with compileable attention * Docs lint --------- Signed-off-by: Jee Jee Li <[email protected]> Signed-off-by: Chen Zhang <[email protected]> Signed-off-by: Roger Wang <[email protected]> Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: yisheng <[email protected]> Signed-off-by: Abatom <[email protected]> Signed-off-by: Liangfu Chen <[email protected]> Signed-off-by: Russell Bryant <[email protected]> Signed-off-by: Yuan Zhou <[email protected]> Signed-off-by: youkaichao <[email protected]> Signed-off-by: Sourashis Roy <[email protected]> Signed-off-by: Harry Mellor <[email protected]> Signed-off-by: Nishidha Panpaliya <[email protected]> Signed-off-by: Ilya Lavrenov <[email protected]> Signed-off-by: simon-mo <[email protected]> Signed-off-by: Wallas Santos <[email protected]> Signed-off-by: jiang1.li <[email protected]> Signed-off-by: yan ma <[email protected]> Signed-off-by: Randall Smith <[email protected]> Signed-off-by: Max de Bayser <[email protected]> Signed-off-by: mgoin <[email protected]> Signed-off-by: Maxime Fournioux <[email protected]> Signed-off-by: Ye Qi <[email protected]> Signed-off-by: wangxiyuan <[email protected]> Signed-off-by: Mengqing Cao <[email protected]> Signed-off-by: Joe Runde <[email protected]> Signed-off-by: Kunshang Ji <[email protected]> Signed-off-by: Kuntai Du <[email protected]> Signed-off-by: Isotr0py <[email protected]> Signed-off-by: Ren MinMin <[email protected]> Signed-off-by: Travis Johnson <[email protected]> Signed-off-by: Fred Reiss <[email protected]> Signed-off-by: Sungjae Lee <[email protected]> Signed-off-by: shaochangxu.scx <[email protected]> Signed-off-by: NickLucche <[email protected]> Signed-off-by: Rafael Vasquez <[email protected]> Signed-off-by: Akshat Tripathi <[email protected]> Signed-off-by: Oleg Mosalov <[email protected]> Signed-off-by: [email protected] <[email protected]> Signed-off-by: Yida Wu <[email protected]> Signed-off-by: Chenguang Li <[email 
protected]> Signed-off-by: Alex-Brooks <[email protected]> Signed-off-by: Shanshan Shen <[email protected]> Signed-off-by: elijah <[email protected]> Co-authored-by: Jee Jee Li <[email protected]> Co-authored-by: Chen Zhang <[email protected]> Co-authored-by: Roger Wang <[email protected]> Co-authored-by: DarkLight1337 <[email protected]> Co-authored-by: YiSheng5 <[email protected]> Co-authored-by: Zhonghua Deng <[email protected]> Co-authored-by: Liangfu Chen <[email protected]> Co-authored-by: XiaobingZhang <[email protected]> Co-authored-by: Russell Bryant <[email protected]> Co-authored-by: Yuan <[email protected]> Co-authored-by: youkaichao <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: jiangjiadi <[email protected]> Co-authored-by: jiadi.jjd <[email protected]> Co-authored-by: sroy745 <[email protected]> Co-authored-by: Jie Fu (傅杰) <[email protected]> Co-authored-by: Harry Mellor <[email protected]> Co-authored-by: Divakar Verma <[email protected]> Co-authored-by: WangErXiao <[email protected]> Co-authored-by: Nishidha <[email protected]> Co-authored-by: Ilya Lavrenov <[email protected]> Co-authored-by: Simon Mo <[email protected]> Co-authored-by: Wallas Henrique <[email protected]> Co-authored-by: Michael Goin <[email protected]> Co-authored-by: Li, Jiang <[email protected]> Co-authored-by: Yan Ma <[email protected]> Co-authored-by: Robert Shaw <[email protected]> Co-authored-by: Woosuk Kwon <[email protected]> Co-authored-by: rasmith <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: Maximilien de Bayser <[email protected]> Co-authored-by: Maxime Fournioux <[email protected]> Co-authored-by: Guspan Tanadi <[email protected]> Co-authored-by: Ye (Charlotte) Qi <[email protected]> Co-authored-by: yeq <[email protected]> Co-authored-by: wangxiyuan <[email protected]> Co-authored-by: Mengqing Cao <[email protected]> Co-authored-by: Charles Frye <[email protected]> Co-authored-by: Joe 
Runde <[email protected]> Co-authored-by: Kunshang Ji <[email protected]> Co-authored-by: cennn <[email protected]> Co-authored-by: Kuntai Du <[email protected]> Co-authored-by: Isotr0py <[email protected]> Co-authored-by: minmin <[email protected]> Co-authored-by: Ren MinMin <[email protected]> Co-authored-by: Travis Johnson <[email protected]> Co-authored-by: Fred Reiss <[email protected]> Co-authored-by: Sungjae Lee <[email protected]> Co-authored-by: shaochangxu <[email protected]> Co-authored-by: shaochangxu.scx <[email protected]> Co-authored-by: Nicolò Lucchesi <[email protected]> Co-authored-by: sixgod <[email protected]> Co-authored-by: Isotr0py <[email protected]> Co-authored-by: Rafael Vasquez <[email protected]> Co-authored-by: Akshat Tripathi <[email protected]> Co-authored-by: Oleg Mosalov <[email protected]> Co-authored-by: Avshalom Manevich <[email protected]> Co-authored-by: Yangcheng Li <[email protected]> Co-authored-by: Siyuan Li <[email protected]> Co-authored-by: Concurrensee <[email protected]> Co-authored-by: Chenguang Li <[email protected]> Co-authored-by: Alex Brooks <[email protected]> Co-authored-by: Shanshan Shen <[email protected]> Co-authored-by: elijah <[email protected]>

hongxiayang

thanks!

jeejeelee and others added 30 commits December 24, 2024 13:05

[Misc] Move weights mapper (vllm-project#11443)

196c34b

Signed-off-by: Jee Jee Li <[email protected]>

[Bugfix] Fix issues in CPU build Dockerfile. Fixes vllm-project#9182 (v…

409475a

…llm-project#11435) Signed-off-by: Yuan Tang <[email protected]>

[Model] Automatic conversion of classification and reward models (vll…

3f3e92e

…m-project#11469) Signed-off-by: DarkLight1337 <[email protected]>

[V1] Unify VLLM_ENABLE_V1_MULTIPROCESSING handling in RayExecutor (vl…

9832e55

…lm-project#11472)

[Misc] Update disaggregation benchmark scripts and test logs (vllm-pr…

fc60166

…oject#11456) Signed-off-by: Jiaxin Shan <[email protected]>

[Frontend] Enable decord to load video from base64 (vllm-project#11492)

b689ada

Signed-off-by: DarkLight1337 <[email protected]>

[Doc] Improve GitHub links (vllm-project#11491)

6ad909f

Signed-off-by: DarkLight1337 <[email protected]>

[Misc] Move some multimodal utils to modality-specific modules (vllm-…

51a624b

…project#11494) Signed-off-by: DarkLight1337 <[email protected]>

Mypy checking for vllm/compilation (vllm-project#11496)

dbeac95

Signed-off-by: lucast2021 <[email protected]> Co-authored-by: lucast2021 <[email protected]>

[Misc][LoRA] Fix LoRA weight mapper (vllm-project#11495)

aa25985

Signed-off-by: Jee Jee Li <[email protected]>

[Doc] Add QVQ and QwQ to the list of supported models (vllm-proje…

7492a36

…ct#11509) Signed-off-by: Roger Wang <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>

[V1] Adding min tokens/repetition/presence/frequence penalties to V1 …

dcb1a94

…sampler (vllm-project#10681) Signed-off-by: Sourashis Roy <[email protected]> Signed-off-by: Woosuk Kwon <[email protected]> Co-authored-by: Woosuk Kwon <[email protected]>

[Model] Modify MolmoForCausalLM MLP (vllm-project#11510)

f57ee56

Signed-off-by: Jee Jee Li <[email protected]>

[Misc] Add placeholder module (vllm-project#11501)

eec906d

Signed-off-by: DarkLight1337 <[email protected]>

[Doc] Add video example to openai client for multimodal (vllm-project…

b85a977

…#11521) Signed-off-by: Isotr0py <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>

[1/N] API Server (Remove Proxy) (vllm-project#11529)

720b10f

[2/N] API Server: Avoid ulimit footgun (vllm-project#11530)

55fb97f

Deepseek v3 (vllm-project#11502)

f49777b

Signed-off-by: mgoin <[email protected]> Co-authored-by: mgoin <[email protected]> Co-authored-by: robertgshaw2-neuralmagic <[email protected]>

[Docs] Document Deepseek V3 support (vllm-project#11535)

82d24f7

Signed-off-by: simon-mo <[email protected]>

Update openai_compatible_server.md (vllm-project#11536)

0c0c201

Co-authored-by: Simon Mo <[email protected]>

[V1] Use FlashInfer Sampling Kernel for Top-P & Top-K Sampling (vllm-…

371d04d

…project#11394) Signed-off-by: Woosuk Kwon <[email protected]>

[V1] Fix yapf (vllm-project#11538)

81b979f

Signed-off-by: Woosuk Kwon <[email protected]>

[CI] Fix broken CI (vllm-project#11543)

46d4359

[misc] fix typing (vllm-project#11540)

eb881ed

Signed-off-by: youkaichao <[email protected]>

[V1][3/N] API Server: Reduce Task Switching + Handle Abort Properly (v…

1b875a0

…llm-project#11534)

[BugFix] Fix quantization for all other methods (vllm-project#11547)

2339d59

[Platform] Move model arch check to platform (vllm-project#11503)

6c6f7fe

Signed-off-by: Mengqing Cao <[email protected]>

Update deploying_with_k8s.md with AMD ROCm GPU example (vllm-project#…

d003f3e

…11465) Signed-off-by: Alex He <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>

[Bugfix] Fix TeleChat2ForCausalLM weights mapper (vllm-project#11546)

2c9b8ea

Signed-off-by: Jee Jee Li <[email protected]>

noemotiovon and others added 21 commits January 13, 2025 15:47

[Misc]Minor Changes about Worker (vllm-project#11555)

c3f05b0

Signed-off-by: Chenguang Li <[email protected]>

[platform] add ray_device_key (vllm-project#11948)

89ce62a

Signed-off-by: youkaichao <[email protected]>

Fix Max Token ID for Qwen-VL-Chat (vllm-project#11980)

5340a30

Signed-off-by: Alex-Brooks <[email protected]>

[Kernel] unified_attention for Attention.forward (vllm-project#11967)

0f8cafe

Signed-off-by: Chen Zhang <[email protected]>

[Doc][V1] Update model implementation guide for V1 support (vllm-proj…

cd82499

…ect#11998) Signed-off-by: Roger Wang <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>

[Doc] Organise installation documentation into categories and tabs (v…

e8c23ff

…llm-project#11935) Signed-off-by: Harry Mellor <[email protected]>

[platform] add device_control env var (vllm-project#12009)

458e63a

Signed-off-by: youkaichao <[email protected]>

[Platform] Move get_punica_wrapper() function to Platform (vllm-proje…

a7d5968

…ct#11516) Signed-off-by: Shanshan Shen <[email protected]>

bugfix: Fix signature mismatch in benchmark's get_tokenizer function (

c6db213

vllm-project#11982) Signed-off-by: elijah <[email protected]>

Merge remote-tracking branch 'upstream/main'

ce53f46

Using list

5a51290

Revert "[misc] improve memory profiling (vllm-project#11809)"

079750e

This reverts commit 889e662.

Trying to make scales work with compileable attention

043c93d

Docs lint

16f8680

Merge remote-tracking branch 'origin/main' into upstream_merge_25_01_13

eb4abfd

Merge remote-tracking branch 'origin/main' into main-to-llama-fp8

7b8c3be

Merge remote-tracking branch 'origin/main' into main-to-llama-fp8

ed572dd

Merge remote-tracking branch 'origin/llama_fp8_12062024' into main-to…

36999a2

…-llama-fp8

linter formatting bug fixes

02962b6

tjtanaa marked this pull request as ready for review January 15, 2025 06:09

tjtanaa marked this pull request as draft January 15, 2025 06:09

tjtanaa marked this pull request as ready for review January 15, 2025 07:56

tjtanaa marked this pull request as draft January 15, 2025 07:59

tjtanaa marked this pull request as ready for review January 15, 2025 08:04

vllmellm added 2 commits January 15, 2025 09:24

inherit config file updates under fused_moe from main branch.

7c05f3e

match tests for the MOE layers with main.

af684f9

hongxiayang approved these changes Jan 15, 2025

hongxiayang merged commit d9385b4 into ROCm:llama_fp8_12062024 Jan 15, 2025
11 checks passed
