feat: implement inference server by using vllm #624

Merged · 7 commits · Oct 24, 2024
Changes from 4 commits
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -48,7 +48,7 @@ jobs:
 
       - name: Run inference api unit tests
         run: |
-          make inference-api-e2e
+          DEVICE=cpu make inference-api-e2e
 
       - name: Upload Codecov report
         uses: codecov/codecov-action@e28ff129e5465c2c0dcc6f003fc735cb6ae0c673 # v4.5.0
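Setting DEVICE=cpu lets the GitHub-hosted runner, which has no GPU, exercise the suite. A minimal sketch of how a test fixture might honor such a variable (the function below is an illustrative assumption, not code from this repository):

    import os

    import torch

    def resolve_device() -> str:
        # Prefer an explicit DEVICE override (e.g. DEVICE=cpu in CI);
        # otherwise fall back to CUDA when a GPU is present.
        requested = os.environ.get("DEVICE", "").strip().lower()
        if requested:
            return requested
        return "cuda" if torch.cuda.is_available() else "cpu"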
1 change: 1 addition & 0 deletions .gitignore
@@ -7,6 +7,7 @@
 *.dylib
 bin/*
 Dockerfile.cross
+__pycache__/
 
 # Test binary, build with `go test -c`
 *.test
7 changes: 4 additions & 3 deletions Makefile
@@ -101,9 +101,10 @@ unit-test: ## Run unit tests.
 ## E2E tests
 ## --------------------------------------
 
-inference-api-e2e:
-	pip install -r presets/inference/text-generation/requirements.txt
-	pytest -o log_cli=true -o log_cli_level=INFO presets/inference/text-generation/tests
+inference-api-e2e:
+	pip install virtualenv
+	./hack/run-pytest-in-venv.sh presets/inference/vllm presets/inference/vllm/requirements.txt
+	./hack/run-pytest-in-venv.sh presets/inference/text-generation presets/inference/text-generation/requirements.txt
 
 # Ginkgo configurations
 GINKGO_FOCUS ?=
36 changes: 36 additions & 0 deletions hack/run-pytest-in-venv.sh
@@ -0,0 +1,36 @@
#!/usr/bin/env bash
# Run a pytest suite inside a throwaway virtualenv.
# Usage: run-pytest-in-venv.sh <test_dir> <requirements.txt> (paths relative to the repo root)

if [ "$#" -ne 2 ]; then
    echo "Usage: $0 <test_dir> <requirements.txt>"
    exit 1
fi

PROJECT_DIR=$(dirname "$(dirname "$(realpath "$0")")")

TEST_DIR="$PROJECT_DIR/$1"
REQUIREMENTS="$PROJECT_DIR/$2"
VENV_DIR=$(mktemp -d)

cleanup() {
    rm -rf "$VENV_DIR"
}
trap cleanup EXIT

cd "$VENV_DIR"
printf "Creating virtual environment in %s\n" "$VENV_DIR"
python3 -m virtualenv venv
source "$VENV_DIR/venv/bin/activate"
if [ "$?" -ne 0 ]; then
    printf "Failed to activate virtual environment\n"
    exit 1
fi

printf "Installing requirements from %s\n" "$REQUIREMENTS"
pip install -r "$REQUIREMENTS" > "$VENV_DIR/pip.log"
if [ "$?" -ne 0 ]; then
    cat "$VENV_DIR/pip.log"
    exit 1
fi

printf "Running tests in %s\n" "$TEST_DIR"
pytest -o log_cli=true -o log_cli_level=INFO "$TEST_DIR"
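The Makefile invokes this helper once per suite, e.g. ./hack/run-pytest-in-venv.sh presets/inference/vllm presets/inference/vllm/requirements.txt. Each run installs its requirements into a temporary virtualenv that is deleted on exit, so the vllm and text-generation dependency pins never touch the caller's environment or each other.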
2 changes: 1 addition & 1 deletion presets/inference/llama2-chat/inference_api.py
@@ -192,7 +192,7 @@ def get_metrics():
         return {"error": str(e)}
 
 def setup_worker_routes():
-    @app_worker.get("/healthz")
+    @app_worker.get("/health")
     def health_check():
         if not torch.cuda.is_available():
             raise HTTPException(status_code=500, detail="No GPU available")
2 changes: 1 addition & 1 deletion presets/inference/text-generation/api_spec.json
@@ -24,7 +24,7 @@
         }
       }
     },
-    "/healthz": {
+    "/health": {
       "get": {
         "summary": "Health Check Endpoint",
         "operationId": "health_check_healthz_get",
4 changes: 2 additions & 2 deletions presets/inference/text-generation/inference_api.py
@@ -181,7 +181,7 @@ def home():
 class HealthStatus(BaseModel):
     status: str = Field(..., example="Healthy")
 @app.get(
-    "/healthz",
+    "/health",
     response_model=HealthStatus,
     summary="Health Check Endpoint",
     responses={
@@ -461,7 +461,7 @@ def get_metrics():
     if torch.cuda.is_available():
         gpus = GPUtil.getGPUs()
         gpu_info = [GPUInfo(
-            id=gpu.id,
+            id=str(gpu.id),
             name=gpu.name,
             load=f"{gpu.load * 100:.2f}%",
             temperature=f"{gpu.temperature} C",
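The id=str(gpu.id) cast matters because GPUtil reports GPU ids as integers. A minimal sketch of the failure it avoids, assuming the GPUInfo model declares id as a string field (pydantic v2 will not silently coerce int to str):

    from pydantic import BaseModel

    class GPUInfo(BaseModel):
        id: str    # GPUtil's gpu.id is an int; pydantic v2 rejects it here without a cast
        name: str

    # GPUInfo(id=0, name="Tesla T4")            # ValidationError under pydantic v2
    print(GPUInfo(id=str(0), name="Tesla T4"))  # id="0" passes validation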
@@ -108,7 +108,7 @@ def test_read_main(configured_app):
 
 def test_health_check(configured_app):
     client = TestClient(configured_app)
-    response = client.get("/healthz")
+    response = client.get("/health")
     assert response.status_code == 200
     assert response.json() == {"status": "Healthy"}
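The worker route renamed in the llama2-chat diff above can be smoke-tested the same way; a sketch under the assumption that app_worker is importable from that module and that setup_worker_routes() has already been called:

    from fastapi.testclient import TestClient

    from inference_api import app_worker  # assumed import path

    def test_worker_health():
        client = TestClient(app_worker)
        response = client.get("/health")
        # The handler raises HTTPException(500) when no GPU is visible,
        # so accept either outcome on shared CI hardware.
        assert response.status_code in (200, 500)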