docs: document reverse proxy config for job streams #4018

Merged
merged 10 commits into from
Aug 5, 2024
12 changes: 12 additions & 0 deletions docs/apis-tools/go-client/job-worker.md
@@ -106,6 +106,18 @@ To avoid your workers being overloaded with too many jobs, e.g. running out of m

**If streaming is enabled, back pressure applies to both pushing and polling**. You can then use `MaxJobsActive` and `Concurrency` as a way to soft-bound the memory usage of your worker. For example, given a maximum variable payload for a job of 1MB, `MaxJobsActive = 32`, and `Concurrency = 10`, then a single worker could use up to 42MB of memory. You can estimate a worst case scenario using the configured maximum message size, as no job payload will ever exceed this.
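
The arithmetic behind that 42MB figure can be sketched as follows. This is only an illustration of the estimate described above, assuming the bound is the maximum payload size multiplied by the sum of `MaxJobsActive` and `Concurrency`; the function name `worstCaseMemoryBytes` is hypothetical and not part of the Go client.

```go
package main

import "fmt"

// worstCaseMemoryBytes is a hypothetical helper (not part of the Zeebe Go client).
// It estimates a soft upper bound on job payload memory, assuming every active
// job slot and every concurrent handler holds a payload of the maximum size.
func worstCaseMemoryBytes(maxPayloadBytes, maxJobsActive, concurrency int64) int64 {
	return maxPayloadBytes * (maxJobsActive + concurrency)
}

func main() {
	const mb = 1 << 20

	// Values taken from the example in the text: 1MB payload,
	// MaxJobsActive = 32, Concurrency = 10.
	bound := worstCaseMemoryBytes(1*mb, 32, 10)
	fmt.Printf("worst-case job payload memory: %d MB\n", bound/mb)
}
```

To estimate a true worst case, pass the configured maximum message size as the payload size instead of 1MB.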

#### Proxying

If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly with an error. If you observe regular 504 timeouts, read our guide on [job streaming](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).

By default, Go job workers have a stream timeout of one hour. You can override this by calling `StreamRequestTimeout` on the job worker builder:

```go
// JobWorkerBuilderStep3 comes from the Go client's worker package
var builder worker.JobWorkerBuilderStep3
// builder is set in some way
builder.StreamRequestTimeout(30 * time.Minute)
```

## Additional resources

- [Job worker reference](/components/concepts/job-workers.md)
11 changes: 11 additions & 0 deletions docs/apis-tools/java-client/job-worker.md
@@ -185,6 +185,17 @@ To avoid your workers being overloaded with too many jobs, e.g. running out of m
If the worker blocks longer than the job's deadline, the job will **not** be passed to the worker, but will be dropped. As it will time out on the broker side, it will be pushed again.
:::

#### Proxying

If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly with an error. If you observe regular 504 timeouts, read our guide on [job streaming](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).

By default, Java job workers have a stream timeout of one hour. You can override this by calling `streamTimeout` on the job worker builder:

```java
final JobWorkerBuilderStep3 builder = ...;
builder.streamTimeout(Duration.ofMinutes(30));
```

## Multi-tenancy

You can configure a job worker to pick up jobs belonging to one or more tenants. When using the builder, you can configure
4 changes: 4 additions & 0 deletions docs/components/concepts/job-workers.md
@@ -169,6 +169,10 @@ If you're using Prometheus, you can use the following query to estimate the queu

On the server side (e.g. if you're running a self-managed cluster), you can measure the rate of jobs which are not pushed due to clients which are not ready via the metric `zeebe_broker_jobs_push_fail_try_count_total{code="BLOCKED"}`. If the rate of this metric is high for a sustained amount of time, it may be a good indicator that you need to scale your workers. Unfortunately, on the server side we don't differentiate between clients, so this metric doesn't tell you which worker deployment needs to be scaled. We thus recommend using client metrics whenever possible.

### Proxying

If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly with an error. If you observe regular 504 timeouts, read our guide on [job streaming](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).

### Troubleshooting

Since this feature requires a good amount of coordination between various components over the network, we've built in some tools to help monitor the health of the job streams.
25 changes: 25 additions & 0 deletions docs/self-managed/zeebe-deployment/zeebe-gateway/job-streaming.md
@@ -0,0 +1,25 @@
---
id: job-streaming
title: "Job streaming"
sidebar_label: "Job streaming"
description: "Job streams are expected to be long-lived to cut down on the latency overhead involved with re-creating a stream and propagating it throughout the cluster."
---

[Job streaming](../../../components/concepts/job-workers.md#job-streaming) relies on long-lived streams, which avoids the latency of repeatedly re-creating a stream and propagating it throughout the cluster.

When using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly. With an impacted proxy, the job streaming worker receives HTTP 504 (gateway timeout) errors at regular intervals.

:::note
This configuration is _only_ required for reverse proxies which do not support forwarding HTTP/2 keepalive (on either side). See [this nginx ticket](https://trac.nginx.org/nginx/ticket/1887), for example.

Proxies which support forwarding HTTP/2 keepalive do not require any change.
:::

The following configuration is recommended for impacted reverse proxies:

- On your client, set an explicit stream timeout of one hour. See additional examples in [Java](../../../../apis-tools/java-client/job-worker) and [Go](../../../../apis-tools/go-client/job-worker).
- On your reverse proxy, set the read response timeout slightly higher than your client's stream timeout (for example, one hour and ten minutes).

## Nginx

Nginx is a known proxy which does not support forwarding HTTP/2 pings from either side as a form of keepalive. To resolve the related gateway timeouts, configure a `grpc_send_timeout` that is _higher_ than your job worker stream timeout.
1 change: 1 addition & 0 deletions sidebars.js
@@ -1032,6 +1032,7 @@ module.exports = {
"Zeebe Gateway": [
"self-managed/zeebe-deployment/zeebe-gateway/overview",
"self-managed/zeebe-deployment/zeebe-gateway/interceptors",
"self-managed/zeebe-deployment/zeebe-gateway/job-streaming",
],
},
{
12 changes: 12 additions & 0 deletions versioned_docs/version-8.4/apis-tools/go-client/job-worker.md
@@ -106,6 +106,18 @@ To avoid your workers being overloaded with too many jobs, e.g. running out of m

**If streaming is enabled, back pressure applies to both pushing and polling**. You can then use `MaxJobsActive` and `Concurrency` as a way to soft-bound the memory usage of your worker. For example, given a maximum variable payload for a job of 1MB, `MaxJobsActive = 32`, and `Concurrency = 10`, then a single worker could use up to 42MB of memory. You can estimate a worst case scenario using the configured maximum message size, as no job payload will ever exceed this.

#### Proxying

If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly with an error. If you observe regular 504 timeouts, read our guide on [job streaming](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).

By default, Go job workers have a stream timeout of one hour. You can override this by calling `StreamRequestTimeout` on the job worker builder:

```go
// JobWorkerBuilderStep3 comes from the Go client's worker package
var builder worker.JobWorkerBuilderStep3
// builder is set in some way
builder.StreamRequestTimeout(30 * time.Minute)
```

## Additional resources

- [Job worker reference](/components/concepts/job-workers.md)
11 changes: 11 additions & 0 deletions versioned_docs/version-8.4/apis-tools/java-client/job-worker.md
@@ -185,6 +185,17 @@ To avoid your workers being overloaded with too many jobs, e.g. running out of m
If the worker blocks longer than the job's deadline, the job will **not** be passed to the worker, but will be dropped. As it will time out on the broker side, it will be pushed again.
:::

#### Proxying

If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly with an error. If you observe regular 504 timeouts, read our guide on [job streaming](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).

By default, Java job workers have a stream timeout of one hour. You can override this by calling `streamTimeout` on the job worker builder:

```java
final JobWorkerBuilderStep3 builder = ...;
builder.streamTimeout(Duration.ofMinutes(30));
```

## Multi-tenancy

You can configure a job worker to pick up jobs belonging to one or more tenants. When using the builder, you can configure
4 changes: 4 additions & 0 deletions versioned_docs/version-8.4/components/concepts/job-workers.md
@@ -169,6 +169,10 @@ If you're using Prometheus, you can use the following query to estimate the queu

On the server side (e.g. if you're running a self-managed cluster), you can measure the rate of jobs which are not pushed due to clients which are not ready via the metric `zeebe_broker_jobs_push_fail_try_count_total{code="BLOCKED"}`. If the rate of this metric is high for a sustained amount of time, it may be a good indicator that you need to scale your workers. Unfortunately, on the server side we don't differentiate between clients, so this metric doesn't tell you which worker deployment needs to be scaled. We thus recommend using client metrics whenever possible.

### Proxying

If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly with an error. If you observe regular 504 timeouts, read our guide on [job streaming](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).

### Troubleshooting

Since this feature requires a good amount of coordination between various components over the network, we've built in some tools to help monitor the health of the job streams.
@@ -0,0 +1,25 @@
---
id: job-streaming
title: "Job streaming"
sidebar_label: "Job streaming"
description: "Job streams are expected to be long-lived to cut down on the latency overhead involved with re-creating a stream and propagating it throughout the cluster."
---

[Job streaming](../../../components/concepts/job-workers.md#job-streaming) relies on long-lived streams, which avoids the latency of repeatedly re-creating a stream and propagating it throughout the cluster.

When using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly. With an impacted proxy, the job streaming worker receives HTTP 504 (gateway timeout) errors at regular intervals.

:::note
This configuration is _only_ required for reverse proxies which do not support forwarding HTTP/2 keepalive (on either side). See [this nginx ticket](https://trac.nginx.org/nginx/ticket/1887), for example.

Proxies which support forwarding HTTP/2 keepalive do not require any change.
:::

The following configuration is recommended for impacted reverse proxies:

- On your client, set an explicit stream timeout of one hour. See additional examples in [Java](../../../../apis-tools/java-client/job-worker) and [Go](../../../../apis-tools/go-client/job-worker).
- On your reverse proxy, set the read response timeout slightly higher than your client's stream timeout (for example, one hour and ten minutes).

## Nginx

Nginx is a known proxy which does not support forwarding HTTP/2 pings from either side as a form of keepalive. To resolve the related gateway timeouts, configure a `grpc_send_timeout` that is _higher_ than your job worker stream timeout.
12 changes: 12 additions & 0 deletions versioned_docs/version-8.5/apis-tools/go-client/job-worker.md
@@ -106,6 +106,18 @@ To avoid your workers being overloaded with too many jobs, e.g. running out of m

**If streaming is enabled, back pressure applies to both pushing and polling**. You can then use `MaxJobsActive` and `Concurrency` as a way to soft-bound the memory usage of your worker. For example, given a maximum variable payload for a job of 1MB, `MaxJobsActive = 32`, and `Concurrency = 10`, then a single worker could use up to 42MB of memory. You can estimate a worst case scenario using the configured maximum message size, as no job payload will ever exceed this.

#### Proxying

If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly with an error. If you observe regular 504 timeouts, read our guide on [job streaming](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).

By default, Go job workers have a stream timeout of one hour. You can override this by calling `StreamRequestTimeout` on the job worker builder:

```go
// JobWorkerBuilderStep3 comes from the Go client's worker package
var builder worker.JobWorkerBuilderStep3
// builder is set in some way
builder.StreamRequestTimeout(30 * time.Minute)
```

## Additional resources

- [Job worker reference](/components/concepts/job-workers.md)
11 changes: 11 additions & 0 deletions versioned_docs/version-8.5/apis-tools/java-client/job-worker.md
@@ -185,6 +185,17 @@ To avoid your workers being overloaded with too many jobs, e.g. running out of m
If the worker blocks longer than the job's deadline, the job will **not** be passed to the worker, but will be dropped. As it will time out on the broker side, it will be pushed again.
:::

#### Proxying

If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly with an error. If you observe regular 504 timeouts, read our guide on [job streaming](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).

By default, Java job workers have a stream timeout of one hour. You can override this by calling `streamTimeout` on the job worker builder:

```java
final JobWorkerBuilderStep3 builder = ...;
builder.streamTimeout(Duration.ofMinutes(30));
```

## Multi-tenancy

You can configure a job worker to pick up jobs belonging to one or more tenants. When using the builder, you can configure
4 changes: 4 additions & 0 deletions versioned_docs/version-8.5/components/concepts/job-workers.md
@@ -169,6 +169,10 @@ If you're using Prometheus, you can use the following query to estimate the queu

On the server side (e.g. if you're running a self-managed cluster), you can measure the rate of jobs which are not pushed due to clients which are not ready via the metric `zeebe_broker_jobs_push_fail_try_count_total{code="BLOCKED"}`. If the rate of this metric is high for a sustained amount of time, it may be a good indicator that you need to scale your workers. Unfortunately, on the server side we don't differentiate between clients, so this metric doesn't tell you which worker deployment needs to be scaled. We thus recommend using client metrics whenever possible.

### Proxying

If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly with an error. If you observe regular 504 timeouts, read our guide on [job streaming](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).

### Troubleshooting

Since this feature requires a good amount of coordination between various components over the network, we've built in some tools to help monitor the health of the job streams.
@@ -0,0 +1,25 @@
---
id: job-streaming
title: "Job streaming"
sidebar_label: "Job streaming"
description: "Job streams are expected to be long-lived to cut down on the latency overhead involved with re-creating a stream and propagating it throughout the cluster."
---

[Job streaming](../../../components/concepts/job-workers.md#job-streaming) relies on long-lived streams, which avoids the latency of repeatedly re-creating a stream and propagating it throughout the cluster.

When using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly. With an impacted proxy, the job streaming worker receives HTTP 504 (gateway timeout) errors at regular intervals.

:::note
This configuration is _only_ required for reverse proxies which do not support forwarding HTTP/2 keepalive (on either side). See [this nginx ticket](https://trac.nginx.org/nginx/ticket/1887), for example.

Proxies which support forwarding HTTP/2 keepalive do not require any change.
:::

The following configuration is recommended for impacted reverse proxies:

- On your client, set an explicit stream timeout of one hour. See additional examples in [Java](../../../../apis-tools/java-client/job-worker) and [Go](../../../../apis-tools/go-client/job-worker).
- On your reverse proxy, set the read response timeout slightly higher than your client's stream timeout (for example, one hour and ten minutes).

## Nginx

Nginx is a known proxy which does not support forwarding HTTP/2 pings from either side as a form of keepalive. To resolve the related gateway timeouts, configure a `grpc_send_timeout` that is _higher_ than your job worker stream timeout.
3 changes: 2 additions & 1 deletion versioned_sidebars/version-8.4-sidebars.json
@@ -1406,7 +1406,8 @@
{
"Zeebe Gateway": [
"self-managed/zeebe-deployment/zeebe-gateway/overview",
- "self-managed/zeebe-deployment/zeebe-gateway/interceptors"
+ "self-managed/zeebe-deployment/zeebe-gateway/interceptors",
+ "self-managed/zeebe-deployment/zeebe-gateway/job-streaming"
]
},
{
3 changes: 2 additions & 1 deletion versioned_sidebars/version-8.5-sidebars.json
@@ -1536,7 +1536,8 @@
{
"Zeebe Gateway": [
"self-managed/zeebe-deployment/zeebe-gateway/overview",
- "self-managed/zeebe-deployment/zeebe-gateway/interceptors"
+ "self-managed/zeebe-deployment/zeebe-gateway/interceptors",
+ "self-managed/zeebe-deployment/zeebe-gateway/job-streaming"
]
},
{