Update cluster reboot nodes doc (#7060)
Co-authored-by: ahreehong <[email protected]>
eks-distro-pr-bot and ahreehong authored Nov 20, 2023
1 parent 527979c commit dd5cf29
Showing 1 changed file with 68 additions and 12 deletions.
80 changes: 68 additions & 12 deletions docs/content/en/docs/clustermgmt/cluster-rebootnode.md
@@ -17,25 +17,81 @@ Rebooting a cluster node as described here is good for all nodes, but is critica
If it does go down while running the `boots` service, the Bottlerocket node will not be able to boot again until the `boots` service is restored on another machine. This is because Bottlerocket must get its address from a DHCP service.
{{% /alert %}}

1. Cordon the node so no further workloads are scheduled to run on it:
1. On your admin machine, set the following environment variables for use in later steps:
```bash
export CLUSTER_NAME=mgmt
export MGMT_KUBECONFIG=${CLUSTER_NAME}/${CLUSTER_NAME}-eks-a-cluster.kubeconfig
```
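Before continuing, it may help to confirm that the kubeconfig path actually resolves to a readable file. The following is a minimal sketch; the `check_kubeconfig` helper is hypothetical and not part of EKS Anywhere.

```bash
# Hypothetical helper (not part of EKS Anywhere): fail early if the
# kubeconfig file at the given path is missing or unreadable.
check_kubeconfig() {
  local path="$1"
  if [ -z "$path" ]; then
    echo "kubeconfig path is empty; is MGMT_KUBECONFIG set?" >&2
    return 1
  fi
  if [ ! -r "$path" ]; then
    echo "kubeconfig not found or not readable: $path" >&2
    return 1
  fi
  echo "using kubeconfig: $path"
}

# Usage: check_kubeconfig "$MGMT_KUBECONFIG"
```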

1. [Back up the cluster]({{< relref "/docs/clustermgmt/cluster-backup-restore/backup-cluster" >}})

This ensures that there is an up-to-date cluster state available for restoration in the case that the cluster experiences issues or becomes unrecoverable during reboot.

1. Verify that the DHCP lease time is longer than the maintenance window, and that node IPs will be the same before and after maintenance.

This step is critical in ensuring the cluster will be healthy after reboot. If IPs are not preserved before and after reboot, the cluster may not be recoverable.

{{% alert title="Warning" color="warning" %}}
If this cannot be verified, do not proceed any further.
{{% /alert %}}
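One way to check IP preservation is to record node IPs before maintenance and compare them afterwards. The sketch below is an assumption-laden illustration: `snapshot_node_ips` and `compare_node_ips` are hypothetical helpers, they use the current kubectl context, and they only cover Kubernetes nodes (external etcd machines do not appear in `kubectl get nodes`). It does not check the DHCP lease time, which still has to be verified on the DHCP server itself.

```bash
# Hypothetical helpers (not part of EKS Anywhere).

# Print "<node-name> <internal-ip>" for every node, sorted by name.
snapshot_node_ips() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}' \
    | sort
}

# Compare two snapshot files; succeeds only if they are identical.
compare_node_ips() {
  if diff "$1" "$2"; then
    echo "node IPs unchanged"
  else
    echo "node IPs changed; investigate before resuming reconciliation" >&2
    return 1
  fi
}

# Usage:
#   snapshot_node_ips > ips-before.txt   # before maintenance
#   snapshot_node_ips > ips-after.txt    # after all nodes are back up
#   compare_node_ips ips-before.txt ips-after.txt
```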

1. Pause the reconciliation of the cluster being shut down.

This ensures that the EKS Anywhere cluster controller will not reconcile on the nodes that are down and try to remediate them.

- Add the paused annotation to the EKSA clusters and CAPI clusters:
```bash
kubectl cordon <node-name>
kubectl annotate clusters.anywhere.eks.amazonaws.com $CLUSTER_NAME anywhere.eks.amazonaws.com/paused=true --kubeconfig=$MGMT_KUBECONFIG
```

1. Drain the node of all current workloads:
**NOTE**: If you are using the vSphere provider, it is also necessary to set `cluster.spec.paused` to `true`. For example:
```bash
kubectl edit clusters.cluster.x-k8s.io -n eksa-system $CLUSTER_NAME --kubeconfig=$MGMT_KUBECONFIG
```
Add the `paused: true` line under the `spec` section:
```bash
...
spec:
  paused: true
```
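As a possible non-interactive alternative to `kubectl edit`, the same field can be set with `kubectl patch`. This is a sketch, not the documented EKS Anywhere procedure; the `set_capi_paused` helper is hypothetical, and it assumes `CLUSTER_NAME` and `MGMT_KUBECONFIG` are set as in the first step.

```bash
# Hypothetical helper (not part of EKS Anywhere): set spec.paused on the
# CAPI cluster object without opening an editor.
# $1 must be "true" or "false".
set_capi_paused() {
  local paused="$1"
  case "$paused" in
    true|false) ;;
    *) echo "usage: set_capi_paused true|false" >&2; return 2 ;;
  esac
  kubectl patch clusters.cluster.x-k8s.io "$CLUSTER_NAME" \
    -n eksa-system \
    --type merge \
    -p "{\"spec\":{\"paused\":${paused}}}" \
    --kubeconfig="$MGMT_KUBECONFIG"
}

# Usage: set_capi_paused true     (and later: set_capi_paused false)
```

A merge patch only touches `spec.paused`, which avoids accidental changes to other fields while editing by hand.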

1. For all of the nodes in the cluster, perform the following steps in this order: worker nodes, control plane nodes, and etcd nodes.

1. Cordon the node so no further workloads are scheduled to run on it:

```bash
kubectl drain <node-name>
```
```bash
kubectl cordon <node-name>
```

1. Shut down. Using the appropriate method for your provider, shut down the node.
1. Drain the node of all current workloads:

1. Perform system maintenance or other task you need to do on the node and boot up the node.
```bash
kubectl drain <node-name>
```
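The cordon and drain commands can be combined per node, roughly as sketched below. The `cordon_and_drain` helper is hypothetical, and the drain flags are assumptions rather than part of the documented procedure: a plain `kubectl drain` typically refuses to proceed when DaemonSet-managed pods or pods using `emptyDir` volumes are present, and `--delete-emptydir-data` discards that local data. Like the commands in this document, it uses the current kubectl context.

```bash
# Hypothetical helper (not part of EKS Anywhere): cordon a node, then drain it.
# The drain flags are assumptions; adjust them to your workloads.
cordon_and_drain() {
  local node="$1"
  # Stop scheduling new pods onto the node; abort if this fails.
  kubectl cordon "$node" || return 1
  # Evict existing pods, waiting at most 5 minutes.
  kubectl drain "$node" \
    --ignore-daemonsets \
    --delete-emptydir-data \
    --timeout=300s
}

# Usage: cordon_and_drain <node-name>
```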

1. Uncordon the node so that it can begin receiving workloads again.
1. Using the appropriate method for your provider, shut down the node.

```bash
kubectl uncordon <node-name>
```

1. Perform system maintenance or other tasks you need to do on each node. Then boot up the node in this order: etcd nodes, control plane nodes, and worker nodes.

1. Uncordon the nodes so that they can begin receiving workloads again.

```bash
kubectl uncordon <node-name>
```
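It can be safer to uncordon a node only once it reports `Ready` again. The following sketch uses `kubectl wait` for that; the `wait_and_uncordon` helper and the 600-second default timeout are assumptions, not part of the documented procedure.

```bash
# Hypothetical helper (not part of EKS Anywhere): wait until a rebooted node
# reports Ready, then uncordon it.
# $1 = node name, $2 = optional timeout (default 600s, an assumed value).
wait_and_uncordon() {
  local node="$1"
  local timeout="${2:-600s}"
  kubectl wait --for=condition=Ready "node/$node" --timeout="$timeout" || {
    echo "node $node did not become Ready within $timeout" >&2
    return 1
  }
  kubectl uncordon "$node"
}

# Usage: wait_and_uncordon <node-name>
```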

1. Remove the paused annotation from the EKS Anywhere cluster:
```bash
kubectl annotate clusters.anywhere.eks.amazonaws.com $CLUSTER_NAME anywhere.eks.amazonaws.com/paused- --kubeconfig=$MGMT_KUBECONFIG
```

**NOTE**: If you are using the vSphere provider, it is also necessary to set `cluster.spec.paused` to `false`:
```bash
kubectl edit clusters.cluster.x-k8s.io -n eksa-system $CLUSTER_NAME --kubeconfig=$MGMT_KUBECONFIG
```
Set `paused` in the `spec` section to `false`:
```bash
...
spec:
  paused: false
```
