Fix BR etcd backup and restore page
jiayiwang7 authored and eks-distro-pr-bot committed Oct 20, 2023
1 parent 76cf349 commit 2470eda
Showing 1 changed file with 36 additions and 36 deletions.
@@ -6,7 +6,7 @@ aliases:
/docs/tasks/etcd-backup-restore/bottlerocket-etcd-backup/
date: 2021-11-04
description: >
How to backup and restore External ETCD on Bottlerocket OS
How to backup and restore External etcd on Bottlerocket OS
---
{{% alert title="Note" color="warning" %}}
External etcd topology is supported for vSphere, CloudStack and Snow clusters, but not yet for Bare Metal or Nutanix clusters.
@@ -29,24 +29,24 @@ export MANAGEMENT_CLUSTER_NAME="eksa-management" # Set this to the managemen
export CLUSTER_NAME="eksa-workload" # Set this to name of the cluster you want to backup (management or workload)
export SSH_KEY="path-to-private-ssh-key" # Set this to the cluster's private SSH key path
export SSH_USERNAME="ec2-user" # Set this to the SSH username
export SNAPSHOT_PATH="/tmp/snapshot.db" # Set this to the path where you want the ETCD snapshot to be saved
export SNAPSHOT_PATH="/tmp/snapshot.db" # Set this to the path where you want the etcd snapshot to be saved

export MANAGEMENT_KUBECONFIG=${MANAGEMENT_CLUSTER_NAME}/${MANAGEMENT_CLUSTER_NAME}-eks-a-cluster.kubeconfig
export CLUSTER_KUBECONFIG=${CLUSTER_NAME}/${CLUSTER_NAME}-eks-a-cluster.kubeconfig
export ETCD_ENDPOINTS=$(kubectl --kubeconfig=${MANAGEMENT_KUBECONFIG} -n eksa-system get machines --selector cluster.x-k8s.io/cluster-name=${CLUSTER_NAME},cluster.x-k8s.io/etcd-cluster=${CLUSTER_NAME}-etcd -ojsonpath='{.items[*].status.addresses[0].address}')
export CONTROL_PLANE_ENDPOINTS=($(kubectl --kubeconfig=${MANAGEMENT_KUBECONFIG} -n eksa-system get machines --selector cluster.x-k8s.io/control-plane-name=${CLUSTER_NAME} -ojsonpath='{.items[*].status.addresses[0].address}'))
```
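
Every later step depends on the variables exported above, and an empty `ETCD_ENDPOINTS` (for example, when the label selector matches no machines) would make the subsequent `xargs` commands do nothing without reporting an error. As a minimal sketch, a check like the following could be run before continuing. `require_vars` is a hypothetical helper written for this illustration and is not part of EKS Anywhere.

```bash
# Hypothetical helper (an assumption, not an EKS Anywhere command):
# returns non-zero if any of the named variables is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Check the variables this page relies on before moving forward.
require_vars MANAGEMENT_CLUSTER_NAME CLUSTER_NAME SSH_KEY SSH_USERNAME \
  SNAPSHOT_PATH ETCD_ENDPOINTS CONTROL_PLANE_ENDPOINTS \
  && echo "all variables set" \
  || echo "set the missing variables before continuing"
```

For the array `CONTROL_PLANE_ENDPOINTS`, this only checks that the first element is non-empty.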

### Prepare ETCD nodes for backup and restore
Install SCP on the ETCD nodes:
### Prepare etcd nodes for backup and restore
Install SCP on the etcd nodes:
```bash
echo -n ${ETCD_ENDPOINTS} | xargs -I {} -d" " ssh -o StrictHostKeyChecking=no -i ${SSH_KEY} ${SSH_USERNAME}@{} sudo yum -y install openssh-clients
```

### Create ETCD Backup
### Create etcd Backup
Make sure to setup the [admin environment variables]({{< relref "#admin-machine-environment-variables-setup" >}}) and [prepare your ETCD nodes for backup]({{< relref "#prepare-etcd-nodes-for-backup-and-restore" >}}) before moving forward.

1. SSH into one of the ETCD nodes
1. SSH into one of the etcd nodes
```bash
export ETCD_NODE=$(echo -n ${ETCD_ENDPOINTS} | cut -d " " -f1)
ssh -i ${SSH_KEY} ${SSH_USERNAME}@${ETCD_NODE}
@@ -59,14 +59,14 @@ Make sure to setup the [admin environment variables]({{< relref "#admin-machine-
1. Set these environment variables
```bash
# get the container ID corresponding to ETCD pod
# get the container ID corresponding to etcd pod
export ETCD_CONTAINER_ID=$(ctr -n k8s.io c ls | grep -w "etcd-io" | cut -d " " -f1)
# get the ETCD endpoint
# get the etcd endpoint
export ETCD_ENDPOINT=$(cat /etc/kubernetes/manifests/etcd | grep -wA1 ETCD_ADVERTISE_CLIENT_URLS | tail -1 | grep -oE '[^ ]+$')
```
1. Create the ETCD snapshot
1. Create the etcd snapshot
```bash
ctr -n k8s.io t exec -t --exec-id etcd ${ETCD_CONTAINER_ID} etcdctl \
--endpoints=${ETCD_ENDPOINT} \
@@ -82,36 +82,36 @@ Make sure to setup the [admin environment variables]({{< relref "#admin-machine-
chown 1000 /run/host-containerd/io.containerd.runtime.v2.task/default/admin/rootfs/home/ec2-user/snapshot.db
```
1. Exit out of ETCD node. You will have to type `exit` twice to get back to the admin machine
1. Exit out of etcd node. You will have to type `exit` twice to get back to the admin machine
```bash
exit
exit
```
1. Copy over the snapshot from the ETCD node
1. Copy over the snapshot from the etcd node
```bash
scp -i ${SSH_KEY} ${SSH_USERNAME}@${ETCD_NODE}:/home/ec2-user/snapshot.db ${SNAPSHOT_PATH}
```
You should now have the ETCD snapshot in your current working directory.
You should now have the etcd snapshot in your current working directory.
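
Before relying on the copied file for a restore, it may be worth confirming that it actually arrived and recording a checksum to compare against later copies. The sketch below assumes GNU `sha256sum` is available on the admin machine (on macOS, `shasum -a 256` is the usual equivalent). `check_snapshot` is a hypothetical helper name, and the check only confirms that the file is non-empty, not that it is a valid etcd snapshot.

```bash
# Hypothetical helper (an assumption, not part of EKS Anywhere):
# succeeds only if the file exists and is non-empty, then prints its SHA-256.
check_snapshot() {
  if [ ! -s "$1" ]; then
    echo "snapshot missing or empty: $1" >&2
    return 1
  fi
  sha256sum "$1"
}

# Falls back to the example path used on this page if SNAPSHOT_PATH is unset.
check_snapshot "${SNAPSHOT_PATH:-/tmp/snapshot.db}" || echo "re-run the copy step"
```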
### Restore ETCD from Backup
Make sure to setup the [admin environment variables]({{< relref "#admin-machine-environment-variables-setup" >}}) and [prepare your ETCD nodes for restore]({{< relref "#prepare-etcd-nodes-for-backup-and-restore" >}}) before moving forward.
### Restore etcd from Backup
Make sure to setup the [admin environment variables]({{< relref "#admin-machine-environment-variables-setup" >}}) and [prepare your etcd nodes for restore]({{< relref "#prepare-etcd-nodes-for-backup-and-restore" >}}) before moving forward.
1. Pause cluster reconciliation
Before starting the process of restoring ETCD, you have to pause some cluster reconciliation objects so EKS Anywhere doesn't try to perform any operations on the cluster while you restore the ETCD snapshot.
Before starting the process of restoring etcd, you have to pause some cluster reconciliation objects so EKS Anywhere doesn't try to perform any operations on the cluster while you restore the etcd snapshot.
```bash
# Pause control plane reconcilation
kubectl --kubeconfig=${MANAGEMENT_KUBECONFIG} -n eksa-system annotate machinehealthchecks ${CLUSTER_NAME}-kcp-unhealthy cluster.x-k8s.io/paused=true
# Pause ETCD reconcilation
# Pause etcd reconcilation
kubectl --kubeconfig=${MANAGEMENT_KUBECONFIG} -n eksa-system annotate etcdadmclusters ${CLUSTER_NAME}-etcd cluster.x-k8s.io/paused=true
```

2. Stop control plane core components

You also need to stop the control plane core components so the Kubernetes API server doesn't try to communicate with ETCD while you perform ETCD operations.
You also need to stop the control plane core components so the Kubernetes API server doesn't try to communicate with etcd while you perform etcd operations.
- You can use this command to get the control plane node IPs which you can use to SSH
```bash
@@ -141,37 +141,37 @@ Make sure to setup the [admin environment variables]({{< relref "#admin-machine-
```
Repeat these steps for each control plane node.
1. Copy the backed-up ETCD snapshot to all the ETCD nodes
1. Copy the backed-up etcd snapshot to all the etcd nodes
```bash
echo -n ${ETCD_ENDPOINTS} | xargs -I {} -d" " scp -o StrictHostKeyChecking=no -i ${SSH_KEY} ${SNAPSHOT_PATH} ${SSH_USERNAME}@{}:/home/ec2-user
```
1. Perform the ETCD restore
1. Perform the etcd restore
For this step, you have to SSH into each ETCD node and run the restore command.
- Get ETCD nodes IPs for SSH'ing into the nodes
For this step, you have to SSH into each etcd node and run the restore command.
- Get etcd nodes IPs for SSH'ing into the nodes
```bash
# This should print out all the control plane IPs
# This should print out all the etcd IPs
echo -n ${ETCD_ENDPOINTS} | xargs -I {} -d " " echo "{}"
```
```bash
# SSH into the control plane node using the IPs printed in previous command
ssh -i ${SSH_KEY} ${SSH_USERNAME}@<ETCD IP from previous command>
# SSH into the etcd node using the IPs printed in previous command
ssh -i ${SSH_KEY} ${SSH_USERNAME}@<etcd IP from previous command>
# drop into bottlerocket's root shell
sudo sheltie
# copy over the ETCD snapshot to the appropriate location
# copy over the etcd snapshot to the appropriate location
cp /run/host-containerd/io.containerd.runtime.v2.task/default/admin/rootfs/home/ec2-user/snapshot.db /var/lib/etcd/data/etcd-snapshot.db
# setup the ETCD environment
# setup the etcd environment
export ETCD_NAME=$(cat /etc/kubernetes/manifests/etcd | grep -wA1 ETCD_NAME | tail -1 | grep -oE '[^ ]+$')
export ETCD_INITIAL_ADVERTISE_PEER_URLS=$(cat /etc/kubernetes/manifests/etcd | grep -wA1 ETCD_INITIAL_ADVERTISE_PEER_URLS | tail -1 | grep -oE '[^ ]+$')
export ETCD_INITIAL_CLUSTER=$(cat /etc/kubernetes/manifests/etcd | grep -wA1 ETCD_INITIAL_CLUSTER | tail -1 | grep -oE '[^ ]+$')
export INITIAL_CLUSTER_TOKEN="etcd-cluster-1"
# get the container ID corresponding to ETCD pod
# get the container ID corresponding to etcd pod
export ETCD_CONTAINER_ID=$(ctr -n k8s.io c ls | grep -w "etcd-io" | cut -d " " -f1)
# run the restore command
@@ -185,22 +185,22 @@ Make sure to setup the [admin environment variables]({{< relref "#admin-machine-
--cert=/var/lib/etcd/pki/server.crt \
--key=/var/lib/etcd/pki/server.key
# move the ETCD data files out of the container to a temporary location
# move the etcd data files out of the container to a temporary location
mkdir -p /tmp/etcd-files
$(ctr -n k8s.io snapshot mounts /tmp/etcd-files/ ${ETCD_CONTAINER_ID})
mv /tmp/etcd-files/${ETCD_NAME}.etcd /tmp/
# stop the ETCD pod
# stop the etcd pod
mkdir -p /tmp/temp-manifests
mv /etc/kubernetes/manifests/* /tmp/temp-manifests
# backup the previous ETCD data files
# backup the previous etcd data files
mv /var/lib/etcd/data/member /var/lib/etcd/data/member.backup
# copy over the new ETCD data files to the data directory
# copy over the new etcd data files to the data directory
mv /tmp/${ETCD_NAME}.etcd/member /var/lib/etcd/data/
# re-start the ETCD pod
# re-start the etcd pod
mv /tmp/temp-manifests/* /etc/kubernetes/manifests/
```

@@ -220,7 +220,7 @@ Make sure to setup the [admin environment variables]({{< relref "#admin-machine-
exit
```

Repeat this step for each ETCD node.
Repeat this step for each etcd node.

1. Restart control plane core components

@@ -253,16 +253,16 @@ Make sure to setup the [admin environment variables]({{< relref "#admin-machine-

1. Unpause the cluster reconcilers

Once the ETCD restore is complete, you can resume the cluster reconcilers.
Once the etcd restore is complete, you can resume the cluster reconcilers.
```bash
# unpause control plane reconcilation
kubectl --kubeconfig=${MANAGEMENT_KUBECONFIG} -n eksa-system annotate machinehealthchecks ${CLUSTER_NAME}-kcp-unhealthy cluster.x-k8s.io/paused-
# unpause ETCD reconcilation
# unpause etcd reconcilation
kubectl --kubeconfig=${MANAGEMENT_KUBECONFIG} -n eksa-system annotate etcdadmclusters ${CLUSTER_NAME}-etcd cluster.x-k8s.io/paused-
```

At this point you should have the ETCD cluster restored to snapshot.
At this point you should have the etcd cluster restored to snapshot.
To verify, you can run the following commands:
```bash
kubectl --kubeconfig=${CLUSTER_KUBECONFIG} get nodes