Commit

Permalink
fix gh200 tests on main (vllm-project#11246)
Signed-off-by: youkaichao <[email protected]>
youkaichao authored and Ubuntu committed Jan 19, 2025
1 parent df1ce6c commit 5fd760d
Showing 2 changed files with 3 additions and 6 deletions.
4 changes: 2 additions & 2 deletions .buildkite/run-gh200-test.sh
@@ -6,8 +6,8 @@ set -ex

 # Try building the docker image
 DOCKER_BUILDKIT=1 docker build . \
-  --target test \
-  -platform "linux/arm64" \
+  --target vllm-openai \
+  --platform "linux/arm64" \
   -t gh200-test \
   --build-arg max_jobs=66 \
   --build-arg nvcc_threads=2 \
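The hunk above replaces the single-dash `-platform` with `--platform`. As a minimal sketch of how this kind of typo could be caught before a build starts, the helper below prints every argument that looks like a long option written with one dash. The function name `find_single_dash_long_opts` is hypothetical and not part of the vLLM repository. The check is a heuristic and would also flag combined short flags such as `-it`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not from the vLLM repo): print each
# argument that has exactly one leading dash followed by two or more
# characters, e.g. `-platform`. Short flags such as `-t` are left alone.
find_single_dash_long_opts() {
  local arg
  for arg in "$@"; do
    # One dash, a letter, then at least one more letter or dash.
    if [[ "$arg" =~ ^-[a-zA-Z][a-zA-Z-]+$ ]]; then
      printf '%s\n' "$arg"
    fi
  done
}

# Usage: pass the same arguments that would go to `docker build`.
find_single_dash_long_opts --target vllm-openai -platform "linux/arm64" -t gh200-test
# prints: -platform
```

Running the same check against the corrected arguments prints nothing, so a CI script could treat any output as a failure.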
5 changes: 1 addition & 4 deletions docs/source/serving/deploying_with_docker.rst
@@ -54,16 +54,13 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--
 # Example of building on Nvidia GH200 server. (Memory usage: ~12GB, Build time: ~1475s / ~25 min, Image size: 7.26GB)
 $ DOCKER_BUILDKIT=1 sudo docker build . \
   --target vllm-openai \
-  -platform "linux/arm64" \
+  --platform "linux/arm64" \
   -t vllm/vllm-gh200-openai:latest \
   --build-arg max_jobs=66 \
   --build-arg nvcc_threads=2 \
   --build-arg torch_cuda_arch_list="9.0+PTX" \
   --build-arg vllm_fa_cmake_gpu_arches="90-real"
 To run vLLM:

 .. code-block:: console