diff --git a/docs/apis-tools/go-client/job-worker.md b/docs/apis-tools/go-client/job-worker.md
index 56a3c7b61d..58ff305ee8 100644
--- a/docs/apis-tools/go-client/job-worker.md
+++ b/docs/apis-tools/go-client/job-worker.md
@@ -106,6 +106,18 @@ To avoid your workers being overloaded with too many jobs, e.g. running out of m

 **If streaming is enabled, back pressure applies to both pushing and polling**. You can then use `MaxJobsActive` and `Concurrency` as a way to soft-bound the memory usage of your worker. For example, given a maximum variable payload for a job of 1MB, `MaxJobsActive = 32`, and `Concurrency = 10`, then a single worker could use up to 42MB of memory. You can estimate a worst case scenario using the configured maximum message size, as no job payload will ever exceed this.

+#### Proxying
+
+If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly with an error. If you observe regular 504 timeouts, read our guide on [job streaming](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).
+
+By default, the Go job workers have a stream timeout of one hour. You can override this by calling the `StreamRequestTimeout` method of the job worker builder:
+
+```go
+var builder JobWorkerBuilderStep3
+// builder is set in some way
+builder.StreamRequestTimeout(30 * time.Minute)
+```
+
 ## Additional resources

 - [Job worker reference](/components/concepts/job-workers.md)
diff --git a/docs/apis-tools/java-client/job-worker.md b/docs/apis-tools/java-client/job-worker.md
index 5036a120ec..e6c8c9e315 100644
--- a/docs/apis-tools/java-client/job-worker.md
+++ b/docs/apis-tools/java-client/job-worker.md
@@ -185,6 +185,17 @@ To avoid your workers being overloaded with too many jobs, e.g. running out of m
 If the worker blocks longer than the job's deadline, the job will **not** be passed to the worker, but will be dropped. As it will time out on the broker side, it will be pushed again.
 :::

+#### Proxying
+
+If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly with an error. If you observe regular 504 timeouts, read our guide on [job streaming](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).
+
+By default, the Java job workers have a stream timeout of one hour. You can override this by calling the `streamTimeout` method of the job worker builder:
+
+```java
+final JobWorkerBuilderStep3 builder = ...;
+builder.streamTimeout(Duration.ofMinutes(30));
+```
+
 ## Multi-tenancy

 You can configure a job worker to pick up jobs belonging to one or more tenants. When using the builder, you can configure
diff --git a/docs/components/concepts/job-workers.md b/docs/components/concepts/job-workers.md
index 2fce1935e6..1e509b23a6 100644
--- a/docs/components/concepts/job-workers.md
+++ b/docs/components/concepts/job-workers.md
@@ -169,6 +169,10 @@ If you're using Prometheus, you can use the following query to estimate the queu
 On the server side (e.g. if you're running a self-managed cluster), you can measure the rate of jobs which are not pushed due to clients which are not ready via the metric `zeebe_broker_jobs_push_fail_try_count_total{code="BLOCKED"}`. If the rate of this metric is high for a sustained amount of time, it may be a good indicator that you need to scale your workers.
 Unfortunately, on the server side we don't differentiate between clients, so this metric doesn't tell you which worker deployment needs to be scaled. We thus recommend using client metrics whenever possible.

+### Proxying
+
+If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly with an error. If you observe regular 504 timeouts, read our guide on [job streaming](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).
+
 ### Troubleshooting

 Since this feature requires a good amount of coordination between various components over the network, we've built in some tools to help monitor the health of the job streams.
diff --git a/docs/self-managed/zeebe-deployment/zeebe-gateway/job-streaming.md b/docs/self-managed/zeebe-deployment/zeebe-gateway/job-streaming.md
new file mode 100644
index 0000000000..9a3df3ef3c
--- /dev/null
+++ b/docs/self-managed/zeebe-deployment/zeebe-gateway/job-streaming.md
@@ -0,0 +1,25 @@
+---
+id: job-streaming
+title: "Job streaming"
+sidebar_label: "Job streaming"
+description: "Job streams are expected to be long-lived to cut down on the latency overhead involved with re-creating a stream and propagating it throughout the cluster."
+---
+
+[Job streaming](../../../components/concepts/job-workers.md#job-streaming) is a long-lived process designed to reduce the latency involved with re-creating a stream and propagating it throughout the cluster.
+
+When using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly. With an impacted proxy, the job streaming worker will see HTTP 504 (gateway timeout) errors at regular intervals.
+
+:::note
+This configuration is _only_ required for reverse proxies which do not support forwarding HTTP/2 keepalive (on either side). See [this nginx ticket](https://trac.nginx.org/nginx/ticket/1887), for example.
+
+Proxies which support forwarding HTTP/2 keepalive do not require any change.
+:::
+
+The following configuration is recommended for impacted reverse proxies:
+
+- On your client, set an explicit stream timeout of one hour. See additional examples in [Java](../../../../apis-tools/java-client/job-worker) and [Go](../../../../apis-tools/go-client/job-worker).
+- On your reverse proxy, ensure the read response timeout is set slightly higher than your client's (for example, an hour and ten minutes).
+
+## Nginx
+
+Nginx is a known proxy which does not support forwarding HTTP/2 pings from either side as a form of keepalive. To resolve related gateway timeouts, configure an appropriate `grpc_send_timeout` that is _higher_ than your job worker stream timeout configuration.
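+
+A minimal sketch of such a configuration, assuming nginx proxies gRPC traffic to a gateway reachable at `zeebe-gateway:26500` (the upstream name, listen port, and timeout values are illustrative, not defaults):
+
+```nginx
+server {
+    listen 26500 http2;
+
+    location / {
+        grpc_pass grpc://zeebe-gateway:26500;
+        # Keep both timeouts above the client's stream timeout (one hour by default),
+        # so nginx does not close the stream with a 504 before the client renews it.
+        grpc_read_timeout 70m;
+        grpc_send_timeout 70m;
+    }
+}
+```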
diff --git a/sidebars.js b/sidebars.js
index a5d3ed2740..5f8d0e1260 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -1032,6 +1032,7 @@ module.exports = {
         "Zeebe Gateway": [
           "self-managed/zeebe-deployment/zeebe-gateway/overview",
           "self-managed/zeebe-deployment/zeebe-gateway/interceptors",
+          "self-managed/zeebe-deployment/zeebe-gateway/job-streaming",
         ],
       },
       {
diff --git a/versioned_docs/version-8.4/apis-tools/go-client/job-worker.md b/versioned_docs/version-8.4/apis-tools/go-client/job-worker.md
index 56a3c7b61d..58ff305ee8 100644
--- a/versioned_docs/version-8.4/apis-tools/go-client/job-worker.md
+++ b/versioned_docs/version-8.4/apis-tools/go-client/job-worker.md
@@ -106,6 +106,18 @@ To avoid your workers being overloaded with too many jobs, e.g. running out of m

 **If streaming is enabled, back pressure applies to both pushing and polling**. You can then use `MaxJobsActive` and `Concurrency` as a way to soft-bound the memory usage of your worker. For example, given a maximum variable payload for a job of 1MB, `MaxJobsActive = 32`, and `Concurrency = 10`, then a single worker could use up to 42MB of memory. You can estimate a worst case scenario using the configured maximum message size, as no job payload will ever exceed this.

+#### Proxying
+
+If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly with an error. If you observe regular 504 timeouts, read our guide on [job streaming](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).
+
+By default, the Go job workers have a stream timeout of one hour. You can override this by calling the `StreamRequestTimeout` method of the job worker builder:
+
+```go
+var builder JobWorkerBuilderStep3
+// builder is set in some way
+builder.StreamRequestTimeout(30 * time.Minute)
+```
+
 ## Additional resources

 - [Job worker reference](/components/concepts/job-workers.md)
diff --git a/versioned_docs/version-8.4/apis-tools/java-client/job-worker.md b/versioned_docs/version-8.4/apis-tools/java-client/job-worker.md
index 5036a120ec..e6c8c9e315 100644
--- a/versioned_docs/version-8.4/apis-tools/java-client/job-worker.md
+++ b/versioned_docs/version-8.4/apis-tools/java-client/job-worker.md
@@ -185,6 +185,17 @@ To avoid your workers being overloaded with too many jobs, e.g. running out of m
 If the worker blocks longer than the job's deadline, the job will **not** be passed to the worker, but will be dropped. As it will time out on the broker side, it will be pushed again.
 :::

+#### Proxying
+
+If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly with an error. If you observe regular 504 timeouts, read our guide on [job streaming](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).
+
+By default, the Java job workers have a stream timeout of one hour. You can override this by calling the `streamTimeout` method of the job worker builder:
+
+```java
+final JobWorkerBuilderStep3 builder = ...;
+builder.streamTimeout(Duration.ofMinutes(30));
+```
+
 ## Multi-tenancy

 You can configure a job worker to pick up jobs belonging to one or more tenants.
 When using the builder, you can configure
diff --git a/versioned_docs/version-8.4/components/concepts/job-workers.md b/versioned_docs/version-8.4/components/concepts/job-workers.md
index 2fce1935e6..1e509b23a6 100644
--- a/versioned_docs/version-8.4/components/concepts/job-workers.md
+++ b/versioned_docs/version-8.4/components/concepts/job-workers.md
@@ -169,6 +169,10 @@ If you're using Prometheus, you can use the following query to estimate the queu

 On the server side (e.g. if you're running a self-managed cluster), you can measure the rate of jobs which are not pushed due to clients which are not ready via the metric `zeebe_broker_jobs_push_fail_try_count_total{code="BLOCKED"}`. If the rate of this metric is high for a sustained amount of time, it may be a good indicator that you need to scale your workers. Unfortunately, on the server side we don't differentiate between clients, so this metric doesn't tell you which worker deployment needs to be scaled. We thus recommend using client metrics whenever possible.

+### Proxying
+
+If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly with an error. If you observe regular 504 timeouts, read our guide on [job streaming](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).
+
 ### Troubleshooting

 Since this feature requires a good amount of coordination between various components over the network, we've built in some tools to help monitor the health of the job streams.
diff --git a/versioned_docs/version-8.4/self-managed/zeebe-deployment/zeebe-gateway/job-streaming.md b/versioned_docs/version-8.4/self-managed/zeebe-deployment/zeebe-gateway/job-streaming.md
new file mode 100644
index 0000000000..9a3df3ef3c
--- /dev/null
+++ b/versioned_docs/version-8.4/self-managed/zeebe-deployment/zeebe-gateway/job-streaming.md
@@ -0,0 +1,25 @@
+---
+id: job-streaming
+title: "Job streaming"
+sidebar_label: "Job streaming"
+description: "Job streams are expected to be long-lived to cut down on the latency overhead involved with re-creating a stream and propagating it throughout the cluster."
+---
+
+[Job streaming](../../../components/concepts/job-workers.md#job-streaming) is a long-lived process designed to reduce the latency involved with re-creating a stream and propagating it throughout the cluster.
+
+When using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly. With an impacted proxy, the job streaming worker will see HTTP 504 (gateway timeout) errors at regular intervals.
+
+:::note
+This configuration is _only_ required for reverse proxies which do not support forwarding HTTP/2 keepalive (on either side). See [this nginx ticket](https://trac.nginx.org/nginx/ticket/1887), for example.
+
+Proxies which support forwarding HTTP/2 keepalive do not require any change.
+:::
+
+The following configuration is recommended for impacted reverse proxies:
+
+- On your client, set an explicit stream timeout of one hour. See additional examples in [Java](../../../../apis-tools/java-client/job-worker) and [Go](../../../../apis-tools/go-client/job-worker), as well as the short sketch after this list.
+- On your reverse proxy, ensure the read response timeout is set slightly higher than your client's (for example, an hour and ten minutes).
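+
+For example, a minimal sketch in Java, assuming `builder` is the `JobWorkerBuilderStep3` described in the linked Java guide:
+
+```java
+// Sketch only: keep the proxy's read timeout slightly above this value (e.g. 70 minutes).
+final JobWorkerBuilderStep3 builder = ...;
+builder.streamTimeout(Duration.ofHours(1));
+```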
+
+## Nginx
+
+Nginx is a known proxy which does not support forwarding HTTP/2 pings from either side as a form of keepalive. To resolve related gateway timeouts, configure an appropriate `grpc_send_timeout` that is _higher_ than your job worker stream timeout configuration.
diff --git a/versioned_docs/version-8.5/apis-tools/go-client/job-worker.md b/versioned_docs/version-8.5/apis-tools/go-client/job-worker.md
index 56a3c7b61d..58ff305ee8 100644
--- a/versioned_docs/version-8.5/apis-tools/go-client/job-worker.md
+++ b/versioned_docs/version-8.5/apis-tools/go-client/job-worker.md
@@ -106,6 +106,18 @@ To avoid your workers being overloaded with too many jobs, e.g. running out of m

 **If streaming is enabled, back pressure applies to both pushing and polling**. You can then use `MaxJobsActive` and `Concurrency` as a way to soft-bound the memory usage of your worker. For example, given a maximum variable payload for a job of 1MB, `MaxJobsActive = 32`, and `Concurrency = 10`, then a single worker could use up to 42MB of memory. You can estimate a worst case scenario using the configured maximum message size, as no job payload will ever exceed this.

+#### Proxying
+
+If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly with an error. If you observe regular 504 timeouts, read our guide on [job streaming](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).
+
+By default, the Go job workers have a stream timeout of one hour. You can override this by calling the `StreamRequestTimeout` method of the job worker builder:
+
+```go
+var builder JobWorkerBuilderStep3
+// builder is set in some way
+builder.StreamRequestTimeout(30 * time.Minute)
+```
+
 ## Additional resources

 - [Job worker reference](/components/concepts/job-workers.md)
diff --git a/versioned_docs/version-8.5/apis-tools/java-client/job-worker.md b/versioned_docs/version-8.5/apis-tools/java-client/job-worker.md
index 5036a120ec..e6c8c9e315 100644
--- a/versioned_docs/version-8.5/apis-tools/java-client/job-worker.md
+++ b/versioned_docs/version-8.5/apis-tools/java-client/job-worker.md
@@ -185,6 +185,17 @@ To avoid your workers being overloaded with too many jobs, e.g. running out of m
 If the worker blocks longer than the job's deadline, the job will **not** be passed to the worker, but will be dropped. As it will time out on the broker side, it will be pushed again.
 :::

+#### Proxying
+
+If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly with an error. If you observe regular 504 timeouts, read our guide on [job streaming](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).
+
+By default, the Java job workers have a stream timeout of one hour. You can override this by calling the `streamTimeout` method of the job worker builder:
+
+```java
+final JobWorkerBuilderStep3 builder = ...;
+builder.streamTimeout(Duration.ofMinutes(30));
+```
+
 ## Multi-tenancy

 You can configure a job worker to pick up jobs belonging to one or more tenants.
 When using the builder, you can configure
diff --git a/versioned_docs/version-8.5/components/concepts/job-workers.md b/versioned_docs/version-8.5/components/concepts/job-workers.md
index 2fce1935e6..1e509b23a6 100644
--- a/versioned_docs/version-8.5/components/concepts/job-workers.md
+++ b/versioned_docs/version-8.5/components/concepts/job-workers.md
@@ -169,6 +169,10 @@ If you're using Prometheus, you can use the following query to estimate the queu

 On the server side (e.g. if you're running a self-managed cluster), you can measure the rate of jobs which are not pushed due to clients which are not ready via the metric `zeebe_broker_jobs_push_fail_try_count_total{code="BLOCKED"}`. If the rate of this metric is high for a sustained amount of time, it may be a good indicator that you need to scale your workers. Unfortunately, on the server side we don't differentiate between clients, so this metric doesn't tell you which worker deployment needs to be scaled. We thus recommend using client metrics whenever possible.

+### Proxying
+
+If you're using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly with an error. If you observe regular 504 timeouts, read our guide on [job streaming](../../../self-managed/zeebe-deployment/zeebe-gateway/job-streaming).
+
 ### Troubleshooting

 Since this feature requires a good amount of coordination between various components over the network, we've built in some tools to help monitor the health of the job streams.
diff --git a/versioned_docs/version-8.5/self-managed/zeebe-deployment/zeebe-gateway/job-streaming.md b/versioned_docs/version-8.5/self-managed/zeebe-deployment/zeebe-gateway/job-streaming.md
new file mode 100644
index 0000000000..9a3df3ef3c
--- /dev/null
+++ b/versioned_docs/version-8.5/self-managed/zeebe-deployment/zeebe-gateway/job-streaming.md
@@ -0,0 +1,25 @@
+---
+id: job-streaming
+title: "Job streaming"
+sidebar_label: "Job streaming"
+description: "Job streams are expected to be long-lived to cut down on the latency overhead involved with re-creating a stream and propagating it throughout the cluster."
+---
+
+[Job streaming](../../../components/concepts/job-workers.md#job-streaming) is a long-lived process designed to reduce the latency involved with re-creating a stream and propagating it throughout the cluster.
+
+When using a reverse proxy or a load balancer between your worker and your gateway, you may need to configure additional parameters to ensure the job stream is not closed unexpectedly. With an impacted proxy, the job streaming worker will see HTTP 504 (gateway timeout) errors at regular intervals.
+
+:::note
+This configuration is _only_ required for reverse proxies which do not support forwarding HTTP/2 keepalive (on either side). See [this nginx ticket](https://trac.nginx.org/nginx/ticket/1887), for example.
+
+Proxies which support forwarding HTTP/2 keepalive do not require any change.
+:::
+
+The following configuration is recommended for impacted reverse proxies:
+
+- On your client, set an explicit stream timeout of one hour. See additional examples in [Java](../../../../apis-tools/java-client/job-worker) and [Go](../../../../apis-tools/go-client/job-worker), as well as the short sketch after this list.
+- On your reverse proxy, ensure the read response timeout is set slightly higher than your client's (for example, an hour and ten minutes).
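+
+For example, a minimal sketch in Java, assuming `builder` is the `JobWorkerBuilderStep3` described in the linked Java guide:
+
+```java
+// Sketch only: keep the proxy's read timeout slightly above this value (e.g. 70 minutes).
+final JobWorkerBuilderStep3 builder = ...;
+builder.streamTimeout(Duration.ofHours(1));
+```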
+
+## Nginx
+
+Nginx is a known proxy which does not support forwarding HTTP/2 pings from either side as a form of keepalive. To resolve related gateway timeouts, configure an appropriate `grpc_send_timeout` that is _higher_ than your job worker stream timeout configuration.
diff --git a/versioned_sidebars/version-8.4-sidebars.json b/versioned_sidebars/version-8.4-sidebars.json
index ecd0e49166..61cbc01e87 100644
--- a/versioned_sidebars/version-8.4-sidebars.json
+++ b/versioned_sidebars/version-8.4-sidebars.json
@@ -1406,7 +1406,8 @@
     {
       "Zeebe Gateway": [
         "self-managed/zeebe-deployment/zeebe-gateway/overview",
-        "self-managed/zeebe-deployment/zeebe-gateway/interceptors"
+        "self-managed/zeebe-deployment/zeebe-gateway/interceptors",
+        "self-managed/zeebe-deployment/zeebe-gateway/job-streaming"
       ]
     },
     {
diff --git a/versioned_sidebars/version-8.5-sidebars.json b/versioned_sidebars/version-8.5-sidebars.json
index 99dc072722..09fd676c16 100644
--- a/versioned_sidebars/version-8.5-sidebars.json
+++ b/versioned_sidebars/version-8.5-sidebars.json
@@ -1536,7 +1536,8 @@
     {
       "Zeebe Gateway": [
         "self-managed/zeebe-deployment/zeebe-gateway/overview",
-        "self-managed/zeebe-deployment/zeebe-gateway/interceptors"
+        "self-managed/zeebe-deployment/zeebe-gateway/interceptors",
+        "self-managed/zeebe-deployment/zeebe-gateway/job-streaming"
       ]
     },
     {