Pull requests: vllm-project/vllm
[Misc] Add offline test for disaggregated prefill
#12418 opened Jan 24, 2025 by Shaoting-Feng
[Bugfix] Disable w16a16 2of4 sparse CompressedTensors24
#12417 opened Jan 24, 2025 by tlrmchlsmth
[V1] Revert uncache_blocks and support recaching full blocks
Labels: ready
#12415 opened Jan 24, 2025 by comaniac
[Bugfix][Kernel] Fix moe align block issue for mixtral
Labels: ready
#12413 opened Jan 24, 2025 by ElizaWszola
[Bugfix] Fix BLIP-2 processing
Labels: ready
#12412 opened Jan 24, 2025 by DarkLight1337
[Frontend] Support override generation config in args
#12409 opened Jan 24, 2025 by liuyanyi
[ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8
Labels: ready
#12408 opened Jan 24, 2025 by divakar-amd
[Bugfix] Fix benchmark script bug: inaccurate stats for vllm backend when max_model_len < input_len + output_len
#12407 opened Jan 24, 2025 by WangErXiao
[Bugfix][Kernel] FA3 Fix - RuntimeError: This flash attention build only supports pack_gqa (for build size reasons).
Labels: ci/build, ready
#12405 opened Jan 24, 2025 by LucasWilkinson
[Bugfix] Fix output_tokens is 0 if using tgi backend
#12394 opened Jan 24, 2025 by sywangyi
[torch.compile] PyTorch 2.6 and nightly compatibility
#12393 opened Jan 24, 2025 by youkaichao
[Hardware][Intel GPU] add XPU bf16 support
Labels: documentation
#12392 opened Jan 24, 2025 by jikunshang
[Frontend] Rerank API (Jina- and Cohere-compatible API)
Labels: documentation, frontend
#12376 opened Jan 24, 2025 by K-Mistele
[Core] add and implement VLLM_LOGITS_PROCESSOR_THREADS
#12368 opened Jan 23, 2025 by akeshet
[Hardware][Intel-Gaudi] Enable FusedSDPA support for Intel Gaudi (HPU)
#12359 opened Jan 23, 2025 by SanjuCSudhakaran • Draft
[Misc] Add FA2 support to ViT MHA layer
Labels: ready
#12355 opened Jan 23, 2025 by Isotr0py
[Bugfix] handle alignment of arguments in convert_sparse_cross_attention_mask_to_dense
#12347 opened Jan 23, 2025 by tjohnson31415