Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding jaeger otel example + small typos in instance settings #5009

Merged
merged 2 commits into from
Jan 3, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
46 changes: 46 additions & 0 deletions examples/deploy/otel-tracing-jaeger/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
Tracing with Jaeger
===================

[Jaeger](https://www.jaegertracing.io/) is an open-source distributed tracing system for monitoring and debugging microservices. Originally developed by Uber, it helps track requests across services, analyze latency, identify bottlenecks, and diagnose failures.

Key use cases include debugging production issues, monitoring performance, visualizing service dependencies, and optimizing system reliability. As Jaeger supports the OpenTelemetry protocol, it can be used to collect traces from Windmill.

Follow the guide on [setting up Jaeger](https://windmill.dev/docs/misc/guides/otel#setting-up-jaeger) for more details.

## Setting up Jaeger along with Windmill

Start all services by running:

```bash
docker-compose up -d
```

## Configuring Windmill to use Jaeger

In the Windmill UI available at `http://localhost`, complete the initial setup and go to "Instances Settings" and "OTEL/Prom" tab and fill in the Jaeger endpoint and service name and toggle the Tracing option to send traces to Jaeger.

## Open the Jaeger UI

The Jaeger UI, if hosted with the `docker-compose.yml` file above, will be available at `http://localhost:16686`. When running a script or workflow with Windmill, you will be able to see the traces in the Jaeger UI and investigate them. This can be useful to understand the performance of a workflow and identify bottlenecks in the Windmill server or client.

## Searching for specific traces

To search/filter for a specific trace, for example a workflow, you can use the search function in the Jaeger UI by filtering by tags set by Windmill.

The following tags are useful to filter for specific traces:

- `job_id`: The ID of the job
- `root_job`: The ID of the root job (flow)
- `parent_job`: The ID of the parent job (flow)
- `flow_step_id`: The ID of the step within the workflow
- `script_path`: The path of the script
- `workspace_id`: The name of the workspace
- `worker_id`: The ID of the worker
- `language`: The language of the script
- `tag`: The queue tag of the workflow

## Monitoring metrics with Jaeger

Jaeger can be used to generate time series for metrics of the collected traces. These time series can be used to compare the performance of individual steps within a workflow and their overall performance and relative contribution over time, as well as identify and troubleshoot issues and anomalies.

In the Jaeger UI, you will now be able to see metrics time series for the traces in the "Monitor" tab.
201 changes: 201 additions & 0 deletions examples/deploy/otel-tracing-jaeger/docker-compose.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,201 @@
version: "3.7"

services:
db:
deploy:
# To use an external database, set replicas to 0 and set DATABASE_URL to the external database url in the .env file
replicas: 1
image: postgres:16
shm_size: 1g
restart: unless-stopped
volumes:
- db_data:/var/lib/postgresql/data
expose:
- 5432
ports:
- 5432:5432
environment:
POSTGRES_PASSWORD: changeme
POSTGRES_DB: windmill
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 10s
timeout: 5s
retries: 5

windmill_server:
image: ${WM_IMAGE}
pull_policy: always
deploy:
replicas: 1
restart: unless-stopped
expose:
- 8000
- 2525
environment:
- DATABASE_URL=${DATABASE_URL}
- MODE=server
depends_on:
db:
condition: service_healthy
volumes:
- worker_logs:/tmp/windmill/logs

windmill_worker:
image: ${WM_IMAGE}
pull_policy: always
deploy:
replicas: 3
resources:
limits:
cpus: "1"
memory: 2048M
# for GB, use syntax '2Gi'
restart: unless-stopped
environment:
- DATABASE_URL=${DATABASE_URL}
- MODE=worker
- WORKER_GROUP=default
depends_on:
db:
condition: service_healthy
# to mount the worker folder to debug, KEEP_JOB_DIR=true and mount /tmp/windmill
volumes:
# mount the docker socket to allow to run docker containers from within the workers
- /var/run/docker.sock:/var/run/docker.sock
- worker_dependency_cache:/tmp/windmill/cache
- worker_logs:/tmp/windmill/logs

## This worker is specialized for "native" jobs. Native jobs run in-process and thus are much more lightweight than other jobs
windmill_worker_native:
# Use ghcr.io/windmill-labs/windmill-ee:main for the ee
image: ${WM_IMAGE}
pull_policy: always
deploy:
replicas: 1
resources:
limits:
cpus: "1"
memory: 2048M
# for GB, use syntax '2Gi'
restart: unless-stopped
environment:
- DATABASE_URL=${DATABASE_URL}
- MODE=worker
- WORKER_GROUP=native
- NUM_WORKERS=8
- SLEEP_QUEUE=200
depends_on:
db:
condition: service_healthy
volumes:
- worker_logs:/tmp/windmill/logs
# This worker is specialized for reports or scraping jobs. It is assigned the "reports" worker group which has an init script that installs chromium and can be targeted by using the "chromium" worker tag.
# windmill_worker_reports:
# image: ${WM_IMAGE}
# pull_policy: always
# deploy:
# replicas: 1
# resources:
# limits:
# cpus: "1"
# memory: 2048M
# # for GB, use syntax '2Gi'
# restart: unless-stopped
# environment:
# - DATABASE_URL=${DATABASE_URL}
# - MODE=worker
# - WORKER_GROUP=reports
# depends_on:
# db:
# condition: service_healthy
# # to mount the worker folder to debug, KEEP_JOB_DIR=true and mount /tmp/windmill
# volumes:
# # mount the docker socket to allow to run docker containers from within the workers
# - /var/run/docker.sock:/var/run/docker.sock
# - worker_dependency_cache:/tmp/windmill/cache
# - worker_logs:/tmp/windmill/logs

# The indexer powers full-text job and log search, an EE feature.
windmill_indexer:
image: ${WM_IMAGE}
pull_policy: always
deploy:
replicas: 0 # set to 1 to enable full-text job and log search
restart: unless-stopped
expose:
- 8001
environment:
- PORT=8001
- DATABASE_URL=${DATABASE_URL}
- MODE=indexer
depends_on:
db:
condition: service_healthy
volumes:
- windmill_index:/tmp/windmill/search
- worker_logs:/tmp/windmill/logs

lsp:
image: ghcr.io/windmill-labs/windmill-lsp:latest
pull_policy: always
restart: unless-stopped
expose:
- 3001
volumes:
- lsp_cache:/root/.cache

multiplayer:
image: ghcr.io/windmill-labs/windmill-multiplayer:latest
deploy:
replicas: 0 # Set to 1 to enable multiplayer, only available on Enterprise Edition
restart: unless-stopped
expose:
- 3002

caddy:
image: ghcr.io/windmill-labs/caddy-l4:latest
restart: unless-stopped
# Configure the mounted Caddyfile and the exposed ports or use another reverse proxy if needed
volumes:
- ./Caddyfile:/etc/caddy/Caddyfile
# - ./certs:/certs # Provide custom certificate files like cert.pem and key.pem to enable HTTPS - See the corresponding section in the Caddyfile
ports:
# To change the exposed port, simply change 80:80 to <desired_port>:80. No other changes needed
- 80:80
- 25:25
# - 443:443 # Uncomment to enable HTTPS handling by Caddy
environment:
- BASE_URL=":80"
# - BASE_URL=":443" # uncomment and comment line above to enable HTTPS via custom certificate and key files
# - BASE_URL=mydomain.com # Uncomment and comment line above to enable HTTPS handling by Caddy

# Jaeger OpenTelemetry Example
# https://windmill.dev/docs/misc/guides/otel#setting-up-jaeger
jaeger:
image: jaegertracing/jaeger:latest
ports:
- "16686:16686"
expose:
- 4317
- 8889
volumes:
- ./jaeger-config.yaml:/etc/jaeger/config.yml
command: ["--config", "/etc/jaeger/config.yml"]

prometheus:
image: prom/prometheus:latest
restart: unless-stopped
expose:
- 9090
volumes:
- ./prometheus-config.yaml:/etc/prometheus/prometheus.yml
command:
- "--config.file=/etc/prometheus/prometheus.yml"

volumes:
db_data: null
worker_dependency_cache: null
worker_logs: null
windmill_index: null
lsp_cache: null
53 changes: 53 additions & 0 deletions examples/deploy/otel-tracing-jaeger/jaeger-config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
service:
extensions: [jaeger_storage, jaeger_query]
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [jaeger_storage_exporter, spanmetrics]
metrics/spanmetrics:
receivers: [spanmetrics]
exporters: [prometheus]
telemetry:
resource:
service.name: jaeger
metrics:
level: detailed
address: 0.0.0.0:8888
logs:
level: DEBUG

extensions:
jaeger_query:
storage:
traces: some_storage
metrics: some_metrics_storage
jaeger_storage:
backends:
some_storage:
memory:
max_traces: 100000
metric_backends:
some_metrics_storage:
prometheus:
endpoint: http://prometheus:9090
normalize_calls: true
normalize_duration: true

connectors:
spanmetrics:

receivers:
otlp:
protocols:
grpc:
endpoint: "0.0.0.0:4317"

processors:
batch:

exporters:
jaeger_storage_exporter:
trace_storage: some_storage
prometheus:
endpoint: "0.0.0.0:8889"
9 changes: 9 additions & 0 deletions examples/deploy/otel-tracing-jaeger/prometheus-config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
# scrape_timeout is set to the global default (10s).

scrape_configs:
- job_name: aggregated-trace-metrics
static_configs:
- targets: ['jaeger:8889']
4 changes: 2 additions & 2 deletions frontend/src/lib/components/InstanceSetting.svelte
Original file line number Diff line number Diff line change
Expand Up @@ -511,7 +511,7 @@
<div>
<label for="refresh_index_period" class="block text-sm font-medium">
Refresh index period (s) <Tooltip>
The index will query new jobs peridically and write them on the index. This
The index will query new jobs periodically and write them on the index. This
setting sets that period.
</Tooltip></label
>
Expand Down Expand Up @@ -548,7 +548,7 @@
<h3>Service Logs Index</h3>
<div>
<label for="commit_log_max_batch_size" class="block text-sm font-medium"
>Commit max batch size Commit max batch size <Tooltip>
>Commit max batch size <Tooltip>
The max amount of documents per commit. In this case 1 document is one log file
representing all logs during 1 minute for a specific host. To optimize indexing
throughput, it is best to keep this as high as possible. However, especially when
Expand Down
2 changes: 1 addition & 1 deletion frontend/src/lib/components/instanceSettings.ts
Original file line number Diff line number Diff line change
Expand Up @@ -313,7 +313,7 @@ export const settings: Record<string, Setting[]> = {
{
label: 'Prometheus',
description:
'Expose Prometheus metrics for workers and servers on port 8001 at /metrics. <a href="https://www.windmill.dev/docs/advanced/instance_settings#expose-metrics">Learn more</a>',
'Expose Prometheus metrics for workers and servers on port 8001 at /metrics. <a target="_blank" href="https://www.windmill.dev/docs/advanced/instance_settings#expose-metrics">Learn more</a>',
key: 'expose_metrics',
fieldType: 'boolean',
storage: 'setting',
Expand Down