
feat(vllm-tensorizer): Update vllm-tensorizer cloned repository, build with vllm-flash-attn, other optimizations #72

Draft · wants to merge 9 commits into main
Conversation

@sangstar (Contributor) commented Jun 17, 2024

vllm-tensorizer hasn't been updated since vLLM formally adopted tensorizer model loading. This PR updates the build to target the most recent vLLM commit, which includes sharded tensorizer support, along with fixes needed to build vLLM against recent changes to its source code. These include:

  • Building vLLM's wheel from vLLM's source code proper, rather than CoreWeave's vLLM fork, reflecting vLLM's official adoption of tensorizer (see the build sketch after this list)
  • Accommodating vLLM's adoption of CMake for its build system
  • Updating xformers to 0.0.26.post1
  • Building vLLM's own fork of flash-attn (vllm-flash-attn) from source, now that vLLM formally depends on it
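For reference, a minimal shell sketch of the kind of wheel build these bullets describe, assuming a CUDA-capable build environment; the clone URL, commit pin, and wheel output path are illustrative assumptions, not necessarily what the Dockerfile in this PR does:

```sh
# Hypothetical sketch, not the actual Dockerfile steps: build vLLM's wheel
# from upstream source (CMake-based build) with the pinned xformers version.
git clone https://github.com/vllm-project/vllm.git
cd vllm
# git checkout <commit-with-sharded-tensorizer-support>  # pin the target commit (placeholder)
pip install --upgrade pip setuptools wheel ninja cmake
pip install xformers==0.0.26.post1
pip wheel --no-build-isolation --no-deps -w /wheels .
```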

sangstar and others added 9 commits June 14, 2024 10:04

build(vllm-tensorizer): Compile `vllm-flash-attn` from source

vLLM replaced its usage of the regular `flash-attn` library with its own
`vllm-flash-attn` fork, which, as of right now, is straightforward to
compile. This change compiles it from source for compatibility with the
`ml-containers/torch` base images.

[skip ci]
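A minimal sketch of what building the fork from source might look like; the repository URL and wheel output path are assumptions rather than the exact steps used in the image build:

```sh
# Hypothetical sketch: build vLLM's flash-attn fork (vllm-flash-attn) from source
# so the resulting wheel is compiled against the torch in the ml-containers/torch base image.
git clone https://github.com/vllm-project/flash-attention.git vllm-flash-attn
cd vllm-flash-attn
pip wheel --no-build-isolation --no-deps -w /wheels .
```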