rotate-ca doesn't work if the initial server is deleted from cluster #11233
Comments
Can you provide specific steps to reproduce the problem? All servers have equal access to the datastore and other related files once they join the cluster. You can remove the cluster-init or server flags from servers once they have joined; they make no difference. There is nothing special about the init node.
The Steps To Reproduce section of the issue should reproduce the problem. You mentioned that removing the server flags makes no difference, which means that for embedded etcd it's fine to always bootstrap from etcd rather than over HTTP, right? If that's the case, maybe we can easily fix this by adding a condition to the Bootstrap function to skip HTTP bootstrap in this case. That would save us from manually modifying the k3s startup config.
What are you doing with this node after you delete it from the cluster? Are you leaving the host up, with the k3s service running (probably crashlooping), and the other two nodes pointed at it as their --server join URL? The server might be up for a few seconds at a time, but after deleting it from the cluster it should exit and then crashloop when systemd tries to restart it.
This is why we recommend using a fixed registration address that points at current cluster members. If you delete a server from the cluster, but leave nodes pointing directly at it, you're not likely to get what you want.
Also, I just tried this and it works fine. After deleting systemd-node-1 the service on that node starts crashlooping, and I left it alone. I then rotated and restarted on systemd-node-2 and confirmed that the certs are updated:
root@systemd-node-2:/# kubectl delete node systemd-node-1
node "systemd-node-1" deleted
root@systemd-node-2:/# curl -sL https://github.com/k3s-io/k3s/raw/master/contrib/util/rotate-default-ca-certs.sh | bash -
Using /usr/bin/openssl: OpenSSL 1.1.1l 24 Aug 2021 SUSE release 150400.7.34.1
Generating k3s-client-root root and cross-signed certificate authority key and certificates
...
Cross-signed CA certs and keys now available in /var/lib/rancher/k3s/server/rotate-ca
Updated server token: K1017335c44120d45e92beb31b8009d0385b9135602c743807b3ae0928cabc5cc10::server:token
Updated agent token: K1017335c44120d45e92beb31b8009d0385b9135602c743807b3ae0928cabc5cc10::server:token
To update certificates, you may now run:
k3s certificate rotate-ca --path=/var/lib/rancher/k3s/server/rotate-ca
root@systemd-node-2:/# k3s certificate rotate-ca --path=/var/lib/rancher/k3s/server/rotate-ca
certificates saved to datastore
root@systemd-node-2:/# ls -la /var/lib/rancher/k3s/server/tls/
total 152
drwx------ 4 root root 4096 Nov 5 19:56 .
drwx------ 9 root root 4096 Nov 5 20:00 ..
-rw-r--r-- 1 root root 1177 Nov 5 19:56 client-admin.crt
-rw------- 1 root root 227 Nov 5 19:56 client-admin.key
-rw-r--r-- 1 root root 1182 Nov 5 19:56 client-auth-proxy.crt
-rw------- 1 root root 227 Nov 5 19:56 client-auth-proxy.key
-rw------- 1 root root 570 Nov 5 19:45 client-ca.crt
-rw------- 1 root root 227 Nov 5 19:45 client-ca.key
-rw-r--r-- 1 root root 570 Nov 5 19:56 client-ca.nochain.crt
-rw-r--r-- 1 root root 1165 Nov 5 19:56 client-controller.crt
-rw------- 1 root root 227 Nov 5 19:56 client-controller.key
-rw-r--r-- 1 root root 1165 Nov 5 19:56 client-k3s-cloud-controller.crt
-rw------- 1 root root 227 Nov 5 19:56 client-k3s-cloud-controller.key
-rw-r--r-- 1 root root 1153 Nov 5 19:56 client-k3s-controller.crt
-rw------- 1 root root 227 Nov 5 19:56 client-k3s-controller.key
-rw-r--r-- 1 root root 1181 Nov 5 19:56 client-kube-apiserver.crt
-rw------- 1 root root 227 Nov 5 19:56 client-kube-apiserver.key
-rw-r--r-- 1 root root 1144 Nov 5 19:56 client-kube-proxy.crt
-rw------- 1 root root 227 Nov 5 19:56 client-kube-proxy.key
-rw------- 1 root root 227 Nov 5 19:56 client-kubelet.key
-rw-r--r-- 1 root root 1153 Nov 5 19:56 client-scheduler.crt
-rw------- 1 root root 227 Nov 5 19:56 client-scheduler.key
-rw-r--r-- 1 root root 1185 Nov 5 19:56 client-supervisor.crt
-rw------- 1 root root 227 Nov 5 19:56 client-supervisor.key
-rw-r--r-- 1 root root 5209 Nov 5 19:57 dynamic-cert.json
drwx------ 2 root root 4096 Nov 5 19:56 etcd
-rw------- 1 root root 591 Nov 5 19:45 request-header-ca.crt
-rw------- 1 root root 227 Nov 5 19:45 request-header-ca.key
-rw------- 1 root root 566 Nov 5 19:45 server-ca.crt
-rw------- 1 root root 227 Nov 5 19:45 server-ca.key
-rw-r--r-- 1 root root 566 Nov 5 19:56 server-ca.nochain.crt
-rw------- 1 root root 1675 Nov 5 19:56 service.current.key
-rw------- 1 root root 1675 Nov 5 19:45 service.key
-rw-r--r-- 1 root root 1400 Nov 5 19:56 serving-kube-apiserver.crt
-rw------- 1 root root 227 Nov 5 19:56 serving-kube-apiserver.key
-rw------- 1 root root 227 Nov 5 19:56 serving-kubelet.key
drwx------ 2 root root 4096 Nov 5 19:56 temporary-certs
root@systemd-node-2:/# systemctl restart k3s
root@systemd-node-2:/# ls -la /var/lib/rancher/k3s/server/tls/
total 152
drwx------ 4 root root 4096 Nov 5 19:56 .
drwx------ 9 root root 4096 Nov 5 20:00 ..
-rw-r--r-- 1 root root 3071 Nov 5 20:01 client-admin.crt
-rw------- 1 root root 227 Nov 5 20:01 client-admin.key
-rw-r--r-- 1 root root 3142 Nov 5 20:01 client-auth-proxy.crt
-rw------- 1 root root 227 Nov 5 20:01 client-auth-proxy.key
-rw------- 1 root root 2468 Nov 5 20:00 client-ca.crt
-rw------- 1 root root 454 Nov 5 20:00 client-ca.key
-rw-r--r-- 1 root root 627 Nov 5 20:01 client-ca.nochain.crt
-rw-r--r-- 1 root root 3063 Nov 5 20:01 client-controller.crt
-rw------- 1 root root 227 Nov 5 20:01 client-controller.key
-rw-r--r-- 1 root root 3059 Nov 5 20:01 client-k3s-cloud-controller.crt
-rw------- 1 root root 227 Nov 5 20:01 client-k3s-cloud-controller.key
-rw-r--r-- 1 root root 3051 Nov 5 20:01 client-k3s-controller.crt
-rw------- 1 root root 227 Nov 5 20:01 client-k3s-controller.key
-rw-r--r-- 1 root root 3075 Nov 5 20:01 client-kube-apiserver.crt
-rw------- 1 root root 227 Nov 5 20:01 client-kube-apiserver.key
-rw-r--r-- 1 root root 3047 Nov 5 20:01 client-kube-proxy.crt
-rw------- 1 root root 227 Nov 5 20:01 client-kube-proxy.key
-rw------- 1 root root 227 Nov 5 19:56 client-kubelet.key
-rw-r--r-- 1 root root 3051 Nov 5 20:01 client-scheduler.crt
-rw------- 1 root root 227 Nov 5 20:01 client-scheduler.key
-rw-r--r-- 1 root root 3083 Nov 5 20:01 client-supervisor.crt
-rw------- 1 root root 227 Nov 5 20:01 client-supervisor.key
-rw-r--r-- 1 root root 5209 Nov 5 19:57 dynamic-cert.json
drwx------ 2 root root 4096 Nov 5 19:56 etcd
-rw------- 1 root root 2555 Nov 5 20:00 request-header-ca.crt
-rw------- 1 root root 454 Nov 5 20:00 request-header-ca.key
-rw------- 1 root root 2468 Nov 5 20:00 server-ca.crt
-rw------- 1 root root 454 Nov 5 20:00 server-ca.key
-rw-r--r-- 1 root root 627 Nov 5 20:01 server-ca.nochain.crt
-rw------- 1 root root 1675 Nov 5 20:01 service.current.key
-rw------- 1 root root 3350 Nov 5 20:00 service.key
-rw-r--r-- 1 root root 3302 Nov 5 20:01 serving-kube-apiserver.crt
-rw------- 1 root root 227 Nov 5 20:01 serving-kube-apiserver.key
-rw------- 1 root root 227 Nov 5 19:56 serving-kubelet.key
drwx------ 2 root root 4096 Nov 5 19:56 temporary-certs
root@systemd-node-2:/# kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
systemd-node-2 Ready control-plane,etcd,master 5m29s v1.30.6+k3s1 172.17.0.5 <none> openSUSE Leap 15.4 6.8.0-1016-aws containerd://1.7.22-k3s1
systemd-node-3 Ready control-plane,etcd,master 4m56s v1.30.6+k3s1 172.17.0.6 <none> openSUSE Leap 15.4 6.8.0-1016-aws containerd://1.7.22-k3s1
root@systemd-node-2:/# cat /etc/systemd/system/k3s.service
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target
[Install]
WantedBy=multi-user.target
[Service]
Type=notify
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/etc/systemd/system/k3s.service.env
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service 2>/dev/null'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
server \
'--token=token' \
'--server' \
'https://172.17.0.4:6443' \
Can you confirm that you still see this issue on the most recent release of K3s? If so we can reopen.
Yeah, I made a mistake: this error occurs only when there is a fixed registration address, for example if we use kubevip to register servers. For simplicity, we can reproduce the problem by replacing the initial node, in which case all servers point to each other. Here are the steps to reproduce:
root@node-1 [ ~ ]# curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.31.2+k3s1 K3S_TOKEN=test INSTALL_K3S_SKIP_SELINUX_RPM=true sh -s - server \
--cluster-init
root@node-2 [ ~ ]# curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.31.2+k3s1 K3S_TOKEN=test INSTALL_K3S_SKIP_SELINUX_RPM=true sh -s - server \
--server https://<ip of node-1>:6443
root@node-3 [ ~ ]# curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.31.2+k3s1 K3S_TOKEN=test INSTALL_K3S_SKIP_SELINUX_RPM=true sh -s - server \
--server https://<ip of node-1>:6443
root@node-1 [ ~ ]# k3s kubectl delete node node-1
// wait for the node to be deleted
root@node-1 [ ~ ]# /usr/local/bin/k3s-uninstall.sh
// rejoin node-1 to cluster and point to node-2
root@node-1 [ ~ ]# curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.31.2+k3s1 K3S_TOKEN=test INSTALL_K3S_SKIP_SELINUX_RPM=true sh -s - server \
--server https://<ip of node-2>:6443
root@node-1 [ ~ ]# curl -sL https://github.com/k3s-io/k3s/raw/master/contrib/util/rotate-default-ca-certs.sh | bash -
root@node-1 [ ~ ]# k3s certificate rotate-ca --path=/var/lib/rancher/k3s/server/rotate-ca
root@node-1 [ ~ ]# ls -al /var/lib/rancher/k3s/server/tls
total 152
drwx------. 4 root root 4096 Nov 7 08:42 .
drwx------. 8 root root 4096 Nov 7 08:42 ..
-rw-r--r--. 1 root root 1173 Nov 7 08:42 client-admin.crt
-rw-------. 1 root root 227 Nov 7 08:42 client-admin.key
-rw-r--r--. 1 root root 1178 Nov 7 08:42 client-auth-proxy.crt
-rw-------. 1 root root 227 Nov 7 08:42 client-auth-proxy.key
-rw-------. 1 root root 566 Nov 7 08:41 client-ca.crt
-rw-------. 1 root root 227 Nov 7 08:41 client-ca.key
-rw-r--r--. 1 root root 566 Nov 7 08:49 client-ca.nochain.crt
-rw-r--r--. 1 root root 1161 Nov 7 08:42 client-controller.crt
-rw-------. 1 root root 227 Nov 7 08:42 client-controller.key
-rw-r--r--. 1 root root 1157 Nov 7 08:42 client-k3s-cloud-controller.crt
-rw-------. 1 root root 227 Nov 7 08:42 client-k3s-cloud-controller.key
-rw-r--r--. 1 root root 1149 Nov 7 08:42 client-k3s-controller.crt
-rw-------. 1 root root 227 Nov 7 08:42 client-k3s-controller.key
-rw-r--r--. 1 root root 1173 Nov 7 08:42 client-kube-apiserver.crt
-rw-------. 1 root root 227 Nov 7 08:42 client-kube-apiserver.key
-rw-r--r--. 1 root root 1140 Nov 7 08:42 client-kube-proxy.crt
-rw-------. 1 root root 227 Nov 7 08:42 client-kube-proxy.key
-rw-------. 1 root root 227 Nov 7 08:42 client-kubelet.key
-rw-r--r--. 1 root root 1149 Nov 7 08:42 client-scheduler.crt
-rw-------. 1 root root 227 Nov 7 08:42 client-scheduler.key
-rw-r--r--. 1 root root 1181 Nov 7 08:42 client-supervisor.crt
-rw-------. 1 root root 227 Nov 7 08:42 client-supervisor.key
-rw-r--r--. 1 root root 4519 Nov 7 08:42 dynamic-cert.json
drwx------. 2 root root 4096 Nov 7 08:42 etcd
-rw-------. 1 root root 591 Nov 7 08:41 request-header-ca.crt
-rw-------. 1 root root 227 Nov 7 08:41 request-header-ca.key
-rw-------. 1 root root 570 Nov 7 08:41 server-ca.crt
-rw-------. 1 root root 227 Nov 7 08:41 server-ca.key
-rw-r--r--. 1 root root 570 Nov 7 08:49 server-ca.nochain.crt
-rw-------. 1 root root 1675 Nov 7 08:49 service.current.key
-rw-------. 1 root root 1675 Nov 7 08:41 service.key
-rw-r--r--. 1 root root 1364 Nov 7 08:42 serving-kube-apiserver.crt
-rw-------. 1 root root 227 Nov 7 08:42 serving-kube-apiserver.key
-rw-------. 1 root root 227 Nov 7 08:42 serving-kubelet.key
drwx------. 2 root root 4096 Nov 7 08:42 temporary-certs
root@node-1 [ ~ ]# systemctl restart k3s
root@node-1 [ ~ ]# ls -al /var/lib/rancher/k3s/server/tls
total 152
drwx------. 4 root root 4096 Nov 7 08:42 .
drwx------. 8 root root 4096 Nov 7 08:42 ..
-rw-r--r--. 1 root root 1173 Nov 7 08:42 client-admin.crt
-rw-------. 1 root root 227 Nov 7 08:42 client-admin.key
-rw-r--r--. 1 root root 1178 Nov 7 08:42 client-auth-proxy.crt
-rw-------. 1 root root 227 Nov 7 08:42 client-auth-proxy.key
-rw-------. 1 root root 566 Nov 7 08:41 client-ca.crt
-rw-------. 1 root root 227 Nov 7 08:41 client-ca.key
-rw-r--r--. 1 root root 566 Nov 7 09:04 client-ca.nochain.crt
-rw-r--r--. 1 root root 1161 Nov 7 08:42 client-controller.crt
-rw-------. 1 root root 227 Nov 7 08:42 client-controller.key
-rw-r--r--. 1 root root 1157 Nov 7 08:42 client-k3s-cloud-controller.crt
-rw-------. 1 root root 227 Nov 7 08:42 client-k3s-cloud-controller.key
-rw-r--r--. 1 root root 1149 Nov 7 08:42 client-k3s-controller.crt
-rw-------. 1 root root 227 Nov 7 08:42 client-k3s-controller.key
-rw-r--r--. 1 root root 1173 Nov 7 08:42 client-kube-apiserver.crt
-rw-------. 1 root root 227 Nov 7 08:42 client-kube-apiserver.key
-rw-r--r--. 1 root root 1140 Nov 7 08:42 client-kube-proxy.crt
-rw-------. 1 root root 227 Nov 7 08:42 client-kube-proxy.key
-rw-------. 1 root root 227 Nov 7 08:42 client-kubelet.key
-rw-r--r--. 1 root root 1149 Nov 7 08:42 client-scheduler.crt
-rw-------. 1 root root 227 Nov 7 08:42 client-scheduler.key
-rw-r--r--. 1 root root 1181 Nov 7 08:42 client-supervisor.crt
-rw-------. 1 root root 227 Nov 7 08:42 client-supervisor.key
-rw-r--r--. 1 root root 4519 Nov 7 08:42 dynamic-cert.json
drwx------. 2 root root 4096 Nov 7 08:42 etcd
-rw-------. 1 root root 591 Nov 7 08:41 request-header-ca.crt
-rw-------. 1 root root 227 Nov 7 08:41 request-header-ca.key
-rw-------. 1 root root 570 Nov 7 08:41 server-ca.crt
-rw-------. 1 root root 227 Nov 7 08:41 server-ca.key
-rw-r--r--. 1 root root 570 Nov 7 09:04 server-ca.nochain.crt
-rw-------. 1 root root 1675 Nov 7 09:04 service.current.key
-rw-------. 1 root root 1675 Nov 7 08:41 service.key
-rw-r--r--. 1 root root 1364 Nov 7 08:42 serving-kube-apiserver.crt
-rw-------. 1 root root 227 Nov 7 08:42 serving-kube-apiserver.key
-rw-------. 1 root root 227 Nov 7 08:42 serving-kubelet.key
drwx------. 2 root root 4096 Nov 7 08:42 temporary-certs
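Rather than comparing file sizes and timestamps in the listings by eye, one way to check whether a restart picked up the rotated CAs is to checksum the CA files before and after. This is only a minimal sketch, assuming sha256sum and cmp are available on the node. The ca_snapshot and ca_changed helpers and the /tmp snapshot paths are hypothetical; the file names are taken from the listings in this thread.

```shell
# Hypothetical helpers, not part of k3s.

# ca_snapshot DIR: print SHA-256 checksums of the CA certificates in DIR.
ca_snapshot() {
  snap_dir="$1"
  for snap_f in client-ca.crt request-header-ca.crt server-ca.crt; do
    # Skip files that are missing so a partial directory still yields output.
    if [ -f "$snap_dir/$snap_f" ]; then
      (cd "$snap_dir" && sha256sum "$snap_f")
    fi
  done
}

# ca_changed BEFORE AFTER: succeed if the two snapshot files differ.
ca_changed() {
  if cmp -s "$1" "$2"; then
    return 1
  fi
  return 0
}

# Example usage on a server node (commented out; it restarts the service):
#   ca_snapshot /var/lib/rancher/k3s/server/tls > /tmp/ca.before
#   systemctl restart k3s
#   ca_snapshot /var/lib/rancher/k3s/server/tls > /tmp/ca.after
#   if ca_changed /tmp/ca.before /tmp/ca.after; then
#     echo "CA certs changed"
#   else
#     echo "CA certs unchanged"
#   fi
```

In the failing case reported here, the before and after snapshots would be expected to be identical.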
Ok, so in this case the server ends up bootstrapping from either itself, or the other un-rotated node?
Yes, in this case no server will bootstrap from the datastore, and I think it's the most common scenario for CAPI-managed K3s (once one round of rolling updates has occurred).
IIRC, in the case of HTTP bootstrap it just serves the content from disk instead of extracting it from the datastore. We could perhaps take a look at changing how that works on the node providing the content, to prefer pulling it out of the datastore if the datastore is available. Assuming that is less likely to cause problems than changing other things.
Reopening to come up with a plan to address this. The easy answer on current releases is to ensure that there is no server address set on the node you're rotating CA certs on. This should be doable in CAPI since you manage the server config. Maybe just always remove the server address from servers once they're joined? But we can probably improve on that.
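To find servers that would still bootstrap from a join URL, one could check whether the unit file passes a server address. This is a rough sketch, assuming the unit file lives at /etc/systemd/system/k3s.service as in the output earlier in this thread. The has_server_flag helper is hypothetical, and it only inspects the unit file; it does not look at other places where a server address might be set, such as the environment files the unit references.

```shell
# Hypothetical helper, not part of k3s.

# has_server_flag FILE: succeed if FILE passes the --server flag to k3s.
has_server_flag() {
  # Match the flag itself, not the "server" subcommand or a longer flag
  # name: --server must be followed by a non-alphanumeric, non-hyphen
  # character (quote, space, "=") or the end of the line.
  grep -Eq -e '--server([^-[:alnum:]]|$)' "$1"
}

# Example usage on a server node:
#   if has_server_flag /etc/systemd/system/k3s.service; then
#     echo "this server is configured with a join URL"
#   fi
```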
Yes, that's a workaround we can use. Looking forward to the improvement.
Environmental Info:
K3s Version: k3s version v1.30.3+k3s1 (f646604)
Node(s) CPU architecture, OS, and Version: Linux k3s1 6.6.47.1-1.azl3 #1 SMP PREEMPT_DYNAMIC Sat Aug 24 02:52:27 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration: 3 servers with embedded etcd
Describe the bug:
The rotate-ca command only replaces the CA certs in the datastore (embedded etcd in our case). Only a server started without a join URL or token checks the datastore and saves the CA certs to local disk at restart; that is the initial server, started with the "--cluster-init" parameter. If that server is deleted or replaced (which is very common when using CAPI to manage K3s nodes), every server in the cluster starts with a join URL and token, and at startup each one only fetches bootstrap data (including the CA certs) from the join URL over HTTP.
Because the API server serves the bootstrap data from disk rather than from the datastore, no server in the cluster picks up the new CA certs, no matter how many times we restart them.
ref: bootstrap logic, http bootstrap handler
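The behavior described in this report can be illustrated with a toy model. This is a simplified sketch of the report, not K3s code: the function names and the file layout under $STATE are made up for illustration, and "old" and "new" stand in for real CA certificates.

```shell
# Toy model of the reported behavior; not K3s code. All names are hypothetical.
# State lives in plain files: $STATE/datastore holds the CA in the datastore,
# $STATE/disk/NODE the CA on a node's disk, $STATE/join/NODE its join peer.
STATE="${STATE:-$(mktemp -d)}"
mkdir -p "$STATE/disk" "$STATE/join"
printf '%s' "old" > "$STATE/datastore"

# rotate_ca VALUE: rotate-ca only updates the copy in the datastore.
rotate_ca() { printf '%s' "$1" > "$STATE/datastore"; }

# restart_server NODE: a server without a join peer reads the CA from the
# datastore; a server with one copies whatever its peer has on disk.
restart_server() {
  rs_peer=$(cat "$STATE/join/$1")
  if [ -z "$rs_peer" ]; then
    cp "$STATE/datastore" "$STATE/disk/$1"
  elif [ "$rs_peer" != "$1" ]; then
    cp "$STATE/disk/$rs_peer" "$STATE/disk/$1"
  fi
}

# add_server NODE [PEER]: (re)configure a server and start it.
# An empty or missing PEER means no join URL is set.
add_server() {
  printf '%s' "${2:-}" > "$STATE/join/$1"
  restart_server "$1"
}

# disk_ca NODE: print the CA currently on the node's disk.
disk_ca() { cat "$STATE/disk/$1"; }
```

In this model, once every server has a join peer, calling rotate_ca followed by restart_server leaves each node's disk copy unchanged, which matches the report. Re-adding one server without a peer makes it read the rotated value from the datastore, which corresponds to the workaround of removing the server address.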
Steps To Reproduce:
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.30.3+k3s1 K3S_TOKEN=test INSTALL_K3S_SKIP_SELINUX_RPM=true sh -s - server --cluster-init
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.30.3+k3s1 K3S_TOKEN=test INSTALL_K3S_SKIP_SELINUX_RPM=true sh -s - server --server https://<ip of 1st node>:6443
k3s kubectl delete node <name of 1st node>
curl -sL https://github.com/k3s-io/k3s/raw/master/contrib/util/rotate-default-ca-certs.sh | bash -
k3s certificate rotate-ca --path=/var/lib/rancher/k3s/server/rotate-ca
systemctl restart k3s
Expected behavior:
The CA certs in /var/lib/rancher/k3s/server/tls are replaced with the new CA.
Actual behavior:
The CA certs in /var/lib/rancher/k3s/server/tls didn't change.
Additional context / logs:
I think a possible solution is serving bootstrap data from the datastore rather than from disk.