add local-telemetry stack for investigating server performance #16483

Merged
merged 7 commits into from
Dec 31, 2024
Changes from 6 commits
2 changes: 1 addition & 1 deletion docs/v3/develop/settings-ref.mdx
@@ -1175,7 +1175,7 @@ A connection timeout, in seconds, applied to database connections. Defaults to `

**Type**: `number | None`

- **Default**: `5`
+ **Default**: `5.0`

**TOML dotted key path**: `server.database.connection_timeout`

76 changes: 76 additions & 0 deletions load_testing/README.md
@@ -0,0 +1,76 @@
# investigating server performance

requirements:

- docker
- opentelemetry libraries

```bash
uv pip install opentelemetry-api \
opentelemetry-sdk \
opentelemetry-exporter-otlp \
opentelemetry-instrumentation-sqlalchemy \
opentelemetry-instrumentation-fastapi
```

#### note

make the following scripts executable with `chmod +x` (or similar)

```bash
./load_testing/local-telemetry/start
./load_testing/run-server.sh
./load_testing/populate-server.sh
```

### start the local telemetry stack

```bash
./load_testing/local-telemetry/start
```

### run the server with tracing

You can run the server with either SQLite (default) or PostgreSQL using the `run-server.sh` script:

```bash
# Run with SQLite (default)
./load_testing/run-server.sh

# Run with PostgreSQL 15
./load_testing/run-server.sh postgres:15
```

The script will:
- For SQLite: Use the default SQLite configuration
- For PostgreSQL:
  - Start a Docker container with the specified version
  - Configure the database connection
  - Handle container lifecycle (reuse if possible, recreate if version changes)
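
The argument handling behind these steps can be sketched with plain parameter expansion. This is a minimal illustration, not the script itself: `describe_db_arg` is a hypothetical helper name, and it uses a `case` statement where `run-server.sh` uses `[[ ... ]]` tests. The two expansions (`#postgres:` and `%%.*`) are the ones the script relies on.

```bash
# Hypothetical helper (not part of the PR): classify the first argument and
# print the PostgreSQL major version that would be compared.
describe_db_arg() {
    local db_type=${1:-sqlite}              # same default as run-server.sh
    case $db_type in
        sqlite)
            echo "sqlite"
            ;;
        postgres:*)
            local version=${db_type#postgres:}     # strip the "postgres:" prefix
            echo "postgres major=${version%%.*}"   # drop everything from the first dot
            ;;
        *)
            echo "invalid"
            return 1
            ;;
    esac
}

describe_db_arg                  # prints: sqlite
describe_db_arg postgres:15      # prints: postgres major=15
describe_db_arg postgres:15.4    # prints: postgres major=15
describe_db_arg mysql || true    # prints: invalid
```

Because only the major version is compared, `postgres:15` and `postgres:15.4` would both count as a match for an existing PostgreSQL 15 container.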

If you need to run the server manually, here are the environment variables used:

```bash
prefect config set PREFECT_API_URL=http://localhost:4200/api

unset $(env | grep OTEL_ | cut -d= -f1)
export OTEL_SERVICE_NAME=prefect-server
export OTEL_TRACES_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_LOG_LEVEL=debug
export PYTHONPATH=src  # relative to the repository root, as in run-server.sh
```
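
One detail worth knowing when clearing stale settings: an unanchored pattern such as `grep OTEL_` matches that text anywhere in a line of `env` output, so a variable like `NOT_OTEL_RELATED` would be unset too. A possible alternative, sketched below, anchors the match at the start of the variable name. `clear_otel_env` and both example variables are hypothetical names used only for illustration.

```bash
# Hypothetical helper: unset every exported variable whose NAME starts with OTEL_.
clear_otel_env() {
    local name
    # sed prints only the name part of lines that begin with OTEL_<name>=
    for name in $(env | sed -n 's/^\(OTEL_[A-Za-z0-9_]*\)=.*/\1/p'); do
        unset "$name"
    done
}

export OTEL_SERVICE_NAME=stale-value
export NOT_OTEL_RELATED=keep-me
clear_otel_env
echo "OTEL_SERVICE_NAME=${OTEL_SERVICE_NAME:-<unset>}"   # prints: OTEL_SERVICE_NAME=<unset>
echo "NOT_OTEL_RELATED=${NOT_OTEL_RELATED}"              # prints: NOT_OTEL_RELATED=keep-me
```

Like the original one-liner, this only sees exported variables, since it reads the output of `env`.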

### populate the server

create a work pool and some deployments
```bash
./load_testing/populate-server.sh
```

### start a worker

```bash
prefect worker start --pool local
```
20 changes: 20 additions & 0 deletions load_testing/local-telemetry/README.md
@@ -0,0 +1,20 @@
# local-telemetry

This directory includes an OpenTelemetry stack that you can run locally for testing and debugging.

## Quickstart

```bash
$ ./local-telemetry/start
```

This will start the local OpenTelemetry stack in the background. Several services
will be running:

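* Jaeger is a frontend for viewing traces, available at http://localhost:16686
* Prometheus captures metrics and exposes a frontend at http://localhost:9090
* The OpenTelemetry Collector receives OTLP data over gRPC on port 4317, exports traces to Jaeger, and exposes metrics for Prometheus to scrape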

Then, run your local server according to the instructions in the [load_testing/README.md](../README.md) file.

When making requests against your local server, you'll see traces appear in the
Jaeger frontend at `http://localhost:16686`.
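
If nothing shows up, a quick first check is whether the published ports accept connections at all. The sketch below is one way to do that, assuming bash (it relies on bash's `/dev/tcp` redirection, which plain `sh` does not provide). `port_open` is a hypothetical helper, and the port list is taken from the compose file in this directory.

```bash
# Hypothetical helper: succeed if something accepts TCP connections on host:port.
# The subshell keeps file descriptor 3 from leaking into the calling shell.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# 16686 = Jaeger frontend, 9090 = Prometheus, 4317 = collector (OTLP over gRPC)
for port in 16686 9090 4317; do
    if port_open localhost "$port"; then
        echo "port $port: listening"
    else
        echo "port $port: not reachable"
    fi
done
```

A listening port only shows that the container is up; it does not confirm that traces are flowing through the collector.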
32 changes: 32 additions & 0 deletions load_testing/local-telemetry/docker-compose.yml
@@ -0,0 +1,32 @@
services:
  prometheus:
    image: prom/prometheus:v2.48.0
    ports:
      - 9090:9090
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
  jaeger:
    image: jaegertracing/all-in-one:1.51.0
    depends_on:
      - prometheus
    ports:
      - 5775:5775/udp # zipkin.thrift (legacy)
      - 6831:6831/udp # jaeger.thrift (compact)
      - 6832:6832/udp # jaeger.thrift (binary)
      - 5778:5778 # configs
      - 16686:16686 # frontend
      - 14250:14250 # model.proto
      - 14268:14268 # jaeger.thrift (direct)
      - 14269:14269 # jaeger's health check
      - 9411:9411 # zipkin (http)
  collector:
    image: otel/opentelemetry-collector-contrib:0.113.0
    depends_on:
      - jaeger
      - prometheus
    ports:
      - 4317:4317 # OTLP (gRPC)
      - 8888:8888 # collector's metrics for Prometheus
      - 8889:8889 # Prometheus exporter
    volumes:
      - ./otelcol-config.yaml:/etc/otelcol-contrib/config.yaml:ro
38 changes: 38 additions & 0 deletions load_testing/local-telemetry/otelcol-config.yaml
@@ -0,0 +1,38 @@
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

  debug:

  prometheus:
    endpoint: "0.0.0.0:8889"
    send_timestamps: true
    metric_expiration: 180m
    resource_to_telemetry_conversion:
      enabled: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger, debug]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus, debug]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
9 changes: 9 additions & 0 deletions load_testing/local-telemetry/prometheus.yml
@@ -0,0 +1,9 @@
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: opentelemetry
    static_configs:
      - targets:
          - collector:8888
          - collector:8889
5 changes: 5 additions & 0 deletions load_testing/local-telemetry/start
@@ -0,0 +1,5 @@
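#!/bin/bash
# Create the "telemetry" Docker network if it does not exist yet
if ! docker network inspect telemetry > /dev/null 2>&1; then
    docker network create telemetry
fi
# Quote the path so the script also works from directories containing spaces
docker compose --project-directory "$(dirname "$0")" up -d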
5 changes: 5 additions & 0 deletions load_testing/populate-server.sh
@@ -0,0 +1,5 @@
#!/usr/bin/env bash

prefect --no-prompt work-pool create local --type process --overwrite

prefect --no-prompt deploy --all --prefect-file load_testing/prefect.yaml
Comment on lines +3 to +5 (Collaborator, Author): populate the server

87 changes: 87 additions & 0 deletions load_testing/run-server.sh
@@ -0,0 +1,87 @@
#!/usr/bin/env bash

DB_TYPE=${1:-sqlite}  # Default to sqlite if no argument provided

# Function to start postgres container
start_postgres() {
    local version=$1
    local container_name="prefect-postgres"
    local volume_name="prefectdb"

    # Check if container exists
    if docker ps -a --format '{{.Names}}' | grep -q "^${container_name}$"; then
        echo "Found existing PostgreSQL container..."

        # Get current version from running container
        local current_version
        current_version=$(docker exec ${container_name} postgres --version 2>/dev/null | grep -oE '[0-9]+' | head -1 || echo "0")

        if [ "$current_version" != "${version%%.*}" ]; then
            echo "Version mismatch: existing=${current_version}, requested=${version%%.*}"
            echo "Removing container and volume for clean start..."
            docker rm -f ${container_name} >/dev/null 2>&1
            docker volume rm ${volume_name} >/dev/null 2>&1
        else
            echo "Version matches, reusing existing container..."
            if ! docker start ${container_name} >/dev/null 2>&1; then
                echo "Failed to start existing container, recreating..."
                docker rm -f ${container_name} >/dev/null 2>&1
                docker volume rm ${volume_name} >/dev/null 2>&1
            fi
        fi
    fi

    # Start container if it doesn't exist or was removed
    if ! docker ps --format '{{.Names}}' | grep -q "^${container_name}$"; then
        echo "Starting PostgreSQL ${version} container..."
        docker run -d --name ${container_name} \
            -v ${volume_name}:/var/lib/postgresql/data \
            -p 5432:5432 \
            -e POSTGRES_USER=postgres \
            -e POSTGRES_PASSWORD=yourTopSecretPassword \
            -e POSTGRES_DB=prefect \
            postgres:${version}
    fi

    echo "Waiting for PostgreSQL to be ready..."
    local retries=0
    local max_retries=30
    while ! docker exec ${container_name} pg_isready -U postgres > /dev/null 2>&1; do
        ((retries++))
        if [ $retries -gt $max_retries ]; then
            echo " Failed to start PostgreSQL after ${max_retries} seconds"
            docker logs ${container_name}
            exit 1
        fi
        echo -n "."
        sleep 1
    done
    echo " PostgreSQL is ready!"
}

# Set database URL based on type
if [[ $DB_TYPE == sqlite ]]; then
    :  # Use default SQLite configuration
elif [[ $DB_TYPE == postgres:* ]]; then
    PG_VERSION=${DB_TYPE#postgres:}
    start_postgres $PG_VERSION
    export PREFECT_API_DATABASE_CONNECTION_URL="postgresql+asyncpg://postgres:yourTopSecretPassword@localhost:5432/prefect"
else
    echo "Invalid database type. Use 'sqlite' or 'postgres:<version>'"
    exit 1
fi

PREFECT_API_URL=http://localhost:4200/api \
OTEL_SERVICE_NAME=prefect-server \
OTEL_TRACES_EXPORTER=otlp \
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
OTEL_EXPORTER_OTLP_PROTOCOL=grpc \
OTEL_LOG_LEVEL=debug \
PYTHONPATH=src \
opentelemetry-instrument \
    uvicorn \
    --app-dir src \
    --factory prefect.server.api.server:create_app \
    --host 127.0.0.1 \
    --port 4200 \
    --timeout-keep-alive 5
2 changes: 1 addition & 1 deletion schemas/settings.schema.json
@@ -1028,7 +1028,7 @@
"type": "null"
}
],
- "default": 5,
+ "default": 5.0,
"description": "A connection timeout, in seconds, applied to database connections. Defaults to `5`.",
"supported_environment_variables": [
"PREFECT_SERVER_DATABASE_CONNECTION_TIMEOUT",
1 change: 0 additions & 1 deletion src/prefect/main.py
@@ -1,6 +1,5 @@
# Import user-facing API
from typing import Any

from prefect.deployments import deploy
from prefect.states import State
from prefect.logging import get_run_logger