
[CI/Build] Adds Modal runners for performance benchmark #11239

Open · wants to merge 3 commits into main
Conversation


@erik-dunteman erik-dunteman commented Dec 16, 2024

This PR improves the performance benchmark by:

  • decreasing overall GPU spend by scaling to zero when not in use
  • allowing concurrent jobs to run

We do this by moving from single always-on GPU agents to CPU-based runners that use the Modal client to spawn GPUs on demand. Currently this covers A100 and H100.

Structure

launch-modal-runner.py is our new job script. It builds and launches the following image onto the targeted GPU, then executes the bash command just as is done in the current docker plugin setup.

BASE_IMG = f"public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:{BUILDKITE_COMMIT}"
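The moving parts above can be sketched roughly as follows. This is a hypothetical outline, not the PR's actual launch-modal-runner.py: the app name, GPU type, timeout, and script path are all assumptions, while modal.App, modal.Image.from_registry, and @app.function are the Modal client APIs.

```python
# Rough sketch of what launch-modal-runner.py might look like (hypothetical;
# names, paths, and parameters are assumptions, not the PR's actual code).
import os
import subprocess

BUILDKITE_COMMIT = os.environ.get("BUILDKITE_COMMIT", "deadbeef")
BASE_IMG = f"public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:{BUILDKITE_COMMIT}"

try:
    import modal

    app = modal.App("vllm-perf-benchmark")  # hypothetical app name
    image = modal.Image.from_registry(BASE_IMG)

    # GPU time is billed only while the function runs, which is what gives
    # the scale-to-zero behavior; concurrent jobs get separate containers.
    @app.function(gpu="H100", image=image, timeout=3600)
    def run_benchmark() -> int:
        # Execute the same bash entrypoint the docker plugin runs today
        # (script path is an assumption).
        return subprocess.call(["bash", "run-performance-benchmarks.sh"])
except ImportError:
    modal = None  # Modal client not installed; the sketch only shows the shape.
```

A CPU Buildkite agent would then trigger run_benchmark remotely (e.g. via `modal run`), so the agent itself never holds a GPU.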

Admin changes required:

  • change the A100 and H100 Buildkite queues to CPU agents
  • add modal_token_id and modal_token_secret to the Buildkite agent secrets
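As a sketch of the second admin item, assuming the secrets surface as environment variables on the agent (the BUILDKITE_SECRET_* names here are hypothetical), a pre-command hook could map them onto the variables the Modal client reads for credentials:

```shell
# Hypothetical Buildkite pre-command hook; the BUILDKITE_SECRET_* names are
# assumptions, but MODAL_TOKEN_ID / MODAL_TOKEN_SECRET are the environment
# variables the Modal client checks for credentials.
export MODAL_TOKEN_ID="${BUILDKITE_SECRET_MODAL_TOKEN_ID:-ak-placeholder}"
export MODAL_TOKEN_SECRET="${BUILDKITE_SECRET_MODAL_TOKEN_SECRET:-as-placeholder}"
```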


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@mergify mergify bot added the ci/build label Dec 16, 2024
@erik-dunteman
Author

@simon-mo this PR is a WIP; I need your advice on a couple of things:

General integration questions

  • is it OK to move the A100 and H100 queues to CPU-based Buildkite agents? Is there anything I could do to help set that up, or do you already have CPU agents that could be used?

Env issues - VLLM dev install

When running run-performance-benchmarks.sh in the public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:{BUILDKITE_COMMIT} image, the python3 -m vllm.entrypoints.openai.api_server commands fail with "module not found: vllm".

Running run-performance-benchmarks.sh in the image is similar to what's currently done with the docker setup, minus the /dev/shm mount, so perhaps you're mounting the library in?

I've tried installing vLLM manually, both in the image build step (with a pip install from git source) and at runtime (as you see in the current state of this PR). Logs from running the current setup: https://gist.github.com/erik-dunteman/f75f0733ac6a78de73d25220a4a3f58a

Would love your guidance on what's missing to get vLLM installed, ideally in the build step to keep billable GPU time down, but at runtime if needed.
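One direction for the build-step install, sketched as a hypothetical extra image layer. The /vllm-workspace path is an assumption about where the CI image keeps the source tree, and VLLM_USE_PRECOMPILED=1 is vLLM's flag for an editable install that reuses already-built kernels rather than recompiling:

```dockerfile
# Hypothetical layer on top of the CI image; the source path and flag usage
# are assumptions to verify against the actual image layout.
ARG BUILDKITE_COMMIT
FROM public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:${BUILDKITE_COMMIT}

# Editable install at build time so billable GPU time is not spent on it;
# VLLM_USE_PRECOMPILED avoids recompiling kernels the image already carries.
RUN VLLM_USE_PRECOMPILED=1 pip install -e /vllm-workspace
```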

@erik-dunteman
Author

Will address ruff and the other checks once I get the vLLM install issue above sorted and confirm the scripts run as expected.
