From ad69b6de556c22f749afcf81a8ace3a66ef39894 Mon Sep 17 00:00:00 2001
From: Matt Linville
Date: Wed, 25 Sep 2024 13:55:59 -0700
Subject: [PATCH] Edits to address feedback about non-production clusters

---
 src/current/v23.1/node-shutdown.md | 29 +++++++++++++++--------------
 src/current/v23.2/node-shutdown.md | 29 +++++++++++++++--------------
 src/current/v24.1/node-shutdown.md | 29 +++++++++++++++--------------
 src/current/v24.2/node-shutdown.md | 29 +++++++++++++++--------------
 4 files changed, 60 insertions(+), 56 deletions(-)

diff --git a/src/current/v23.1/node-shutdown.md b/src/current/v23.1/node-shutdown.md
index e50c47e0997..cb6c8340330 100644
--- a/src/current/v23.1/node-shutdown.md
+++ b/src/current/v23.1/node-shutdown.md
@@ -5,9 +5,9 @@ toc: true
 docs_area: manage
 ---
 
-A node **shutdown** terminates the `cockroach` process on the node.
+A node **shutdown** terminates the `cockroach` process on the node. This page describes how node shutdown works and shows how to safely shut down a node in production or [an entire production cluster](#shut-down-a-cluster).
 
-There are two ways to handle node shutdown:
+There are two ways to shut down a node:
 
 - **Drain a node** to temporarily stop it when you plan restart it later, such as during cluster maintenance. When you drain a node:
     - Clients are disconnected, and subsequent connection requests are sent to other nodes.
@@ -24,7 +24,6 @@ This page describes:
 - How to [prepare for graceful shutdown](#prepare-for-graceful-shutdown) on CockroachDB {{ site.data.products.core }} clusters by coordinating load balancer, client application server, process manager, and cluster settings.
 - How to [perform node shutdown](#perform-node-shutdown) on CockroachDB {{ site.data.products.core }} deployments by manually draining or decommissioning a node.
 - How to handle node shutdown when CockroachDB is deployed using [Kubernetes](#decommissioning-and-draining-on-kubernetes) or in a [CockroachDB {{ site.data.products.advanced }} cluster](#decommissioning-and-draining-on-cockroachdb-advanced).
-- How to [shut down the entire cluster](#shut-down-a-cluster) temporarily or permanently.
 
 {{site.data.alerts.callout_success}}
 This guidance applies to primarily to manual deployments. For more details about graceful termination when CockroachDB is deployed using Kubernetes, refer to [Decommissioning and draining on Kubernetes](#decommissioning-and-draining-on-kubernetes). For more details about graceful termination in a CockroachDB {{ site.data.products.advanced }} cluster, refer to [Decommissioning and draining on CockroachDB {{ site.data.products.advanced }}](#decommissioning-and-draining-on-cockroachdb-advanced).
@@ -65,14 +64,14 @@ After this stage, the node is automatically drained. However, to avoid possible
 
 An operator [initiates the draining process](#drain-the-node-and-terminate-the-node-process) on the node. Draining a node disconnects clients after active queries are completed, and transfers any [range leases]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases) and [Raft leaderships]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft) to other nodes, but does not move replicas or data off of the node.
 
-When draining is complete, the node must be shut down prior to any maintenance. After a 60-second wait at minimum, you can send a `SIGTERM` signal to the `cockroach` process to shut it down. {% include_cached new-in.html version="v24.2" %}The `--shutdown` flag of [`cockroach node drain`]({% link {{ page.version.version }}/cockroach-node.md %}#flags) automatically terminates the `cockroach` process after draining completes.
+When draining is complete, the node must be shut down prior to any maintenance. At minimum 60 seconds after draining is complete, you can send a `SIGTERM` signal to the `cockroach` process to shut it down. {% include_cached new-in.html version="v24.2" %}The `--shutdown` flag of [`cockroach node drain`]({% link {{ page.version.version }}/cockroach-node.md %}#flags) automatically terminates the `cockroach` process after draining completes.
 
 After you perform the required maintenance, you can restart the `cockroach` process on the node for it to rejoin the cluster.
 
-{% capture drain_early_termination_warning %}Do not terminate the `cockroach` process before all of the phases of draining are complete. Otherwise, you may experience latency spikes until the [leases]({% link {{ page.version.version }}/architecture/glossary.md %}#leaseholder) that were on that node have transitioned to other nodes. It is safe to terminate the `cockroach` process only after a node has completed the drain process. This is especially important in a containerized system, to allow all TCP connections to terminate gracefully.{% endcapture %}
+{% capture drain_early_termination_warning %}In a production cluster, do not terminate the `cockroach` process before all of the phases of draining are complete. Otherwise, you may experience latency spikes until the [leases]({% link {{ page.version.version }}/architecture/glossary.md %}#leaseholder) that were on that node have transitioned to other nodes. It is safe to terminate the `cockroach` process only after a node has completed the drain process. This is especially important in a containerized system, to allow all TCP connections to terminate gracefully.{% endcapture %}
 
 {{site.data.alerts.callout_danger}}
-{{ drain_early_termination_warning }} If necessary, adjust the [`server.shutdown.initial_wait`](#server-shutdown-initial_wait) and the [termination grace period]({% link {{ page.version.version}}/node-shutdown.md %}?filters=decommission#termination-grace-period) cluster settings and adjust your process manager or other deployment tooling to allow adequate time for the node to finish draining before it is terminated or restarted.
+{{ drain_early_termination_warning }} If necessary, before you begin draining a node, adjust the [`server.shutdown.initial_wait`](#server-shutdown-initial_wait) and the [termination grace period]({% link {{ page.version.version}}/node-shutdown.md %}?filters=decommission#termination-grace-period) settings for a production cluster and adjust your process manager or other deployment tooling to allow adequate time for the node to finish draining before it is terminated or restarted. Adjusting these settings does not require a node to restart.
 {{site.data.alerts.end}}
 
 
@@ -124,7 +123,7 @@ After draining is complete:
 - If the node was drained manually because an operator issued a `cockroach node drain` command:
     - {% include_cached new-in.html version="v24.2" %}If you pass the `--shutdown` flag to [`cockroach node drain`]({% link {{ page.version.version }}/cockroach-node.md %}#flags), the `cockroach` process terminates automatically after draining completes.
     - If the node's major version is being updated, the `cockroach` process terminates automatically after draining completes.
-    - Otherwise, the `cockroach` process must be terminated manually. A minimum of 60 seconds after draining is complete, send it a `SIGTERM` signal to terminate it. Refer to [Terminate the node process](#drain-the-node-and-terminate-the-node-process).
+    - Otherwise, the `cockroach` process must be terminated manually. For a production cluster, wait at minimum 60 seconds after draining is complete, then send it a `SIGTERM` signal to terminate it. Refer to [Terminate the node process](#drain-the-node-and-terminate-the-node-process).
 
 
 
@@ -148,7 +147,7 @@ CockroachDB's node shutdown behavior does not match any of the [PostgreSQL serve
 
 Each of the [node shutdown steps](#node-shutdown-sequence) is performed in order, with each step commencing once the previous step has completed. However, because some steps can be interrupted, it's best to ensure that all steps complete gracefully.
 
-Before you [perform node shutdown](#perform-node-shutdown), review the following prerequisites to graceful shutdown:
+Before you [perform node shutdown](#perform-node-shutdown) on a production cluster, review the following prerequisites to graceful shutdown: