k0s run via docker-compose doesn't recover from a host reboot (single host) #5023
Comments
The logs of the k0s Docker container would be helpful, as would the logs of the failing containers. You could also add the …
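For reference, a rough sketch of how those logs could be collected from the host (the container name `k0s` comes from the attached compose file; the failing pod name is a placeholder):

```sh
# Logs of the k0s controller container itself
docker logs k0s

# k0s ships kubectl as a subcommand, so the embedded cluster can be
# inspected through the container as well
docker exec k0s k0s kubectl get pods -A
docker exec k0s k0s kubectl describe pod -n kube-system <failing-pod>
```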
Hi,
Could be that there are some stuck containers from previous runs. When shutting down k0s, it won't stop running pods/containers; you need to drain the node manually. Moreover, when running k0s in Docker, the cgroup hierarchy is possibly not properly respected, and container processes might keep running (or at least their cgroup hierarchy might linger). I can imagine that this causes some trouble. Can you maybe try to add volumes for …
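A minimal sketch of what adding such volumes could look like in the compose file; the paths `/var/lib/k0s` and `/run/k0s` are assumptions, since the original suggestion was cut off above:

```yaml
services:
  k0s:
    # ...existing settings...
    volumes:
      # Anonymous volumes so that k0s state survives container recreation.
      # These paths are assumptions; use whatever directories were suggested.
      - /var/lib/k0s
      - /run/k0s
```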
I can't drain the node manually, since we are talking about an unexpected machine restart/reboot.
I've added the two new (anonymous) volumes, and it didn't make any difference (k0s failed to come up after the first restart). Attaching the updated sample compose file:
Any advice on the subject would be very much appreciated...
The issue is marked as stale since no activity has been recorded in 30 days.
Anyone?
The problem seems to stem from unstable k0s controller IPs. Once the cluster is initialized, the controller's IP must not change, but Docker Compose may do exactly that after a restart. I could make it work after a reboot by using fixed IPs:

```diff
--- aio-compose-sample.yaml
+++ aio-compose-sample.yaml
@@ -23,7 +23,9 @@
       - "6443:6443"
       - "80:30080"
       - "443:30443"
-    network_mode: "bridge"
+    networks:
+      sample_net:
+        ipv4_address: 192.168.1.100
     environment:
       K0S_CONFIG: |-
         apiVersion: k0s.k0sproject.io/v1beta1
@@ -121,7 +123,9 @@
       - MSSQL_SA_PASSWORD=SomePass
     ports:
       - '1433:1433'
-    network_mode: "bridge"
+    networks:
+      sample_net:
+        ipv4_address: 192.168.1.200
   dpr:
     container_name: dpr
     image: registry:2
@@ -141,4 +145,13 @@
       REGISTRY_HTTP_TLS_KEY: /var/ssl/private/dpr.key
     ports:
       - '5443:5443'
-    network_mode: "bridge"
+    networks:
+      sample_net:
+        ipv4_address: 192.168.1.201
+
+networks:
+  sample_net:
+    driver: bridge
+    ipam:
+      config:
+        - subnet: 192.168.1.0/24
```

Seems that the paragraph about custom Docker networks in the docs has to be revisited. Apparently it works with custom networks too nowadays, at least as long as they are bridge networks?
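A quick, untested way to sanity-check the fixed-IP setup after a host reboot (container and network names are taken from the diff above):

```sh
docker compose -f aio-compose-sample.yaml up -d --wait
# reboot the host, then confirm the containers kept their addresses on sample_net
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' k0s dpr
```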
10x for the prompt reply, but in the designated environment we don't control the IP(s), and we can't guarantee the ability to keep the IP(s) constant. Any other direction that doesn't involve fixing the IP(s)?
You can use a load-balanced DNS name to access the controller(s) as well; see the docs on Control Plane High Availability for details. If all you have is a single-controller setup, you can make it simpler by using localhost, 127.0.0.1, or the Docker-managed host name for everything. Note that this will then show up in the kubeconfig files generated by k0s/k0sctl as well; you need to change the server URL in the kubeconfigs accordingly to connect to the cluster from the outside. Try to set the following in your k0s config (note that I haven't tested this 🙃):

```yaml
spec:
  api:
    externalAddress: 127.0.0.1
  storage:
    type: etcd
    etcd:
      peerAddress: 127.0.0.1
```

Also note that for a single node, it's usually easier not to use etcd, but an SQLite database via kine. I'd replace the k0s flags …
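For the kubeconfig part, a rough, untested sketch of what changing the server URL could look like from the host (the container name and the published port 6443 come from the sample compose file):

```sh
# Export the admin kubeconfig from the k0s container
docker exec k0s k0s kubeconfig admin > kubeconfig.yaml

# Point it at the API server port published on the host
sed -i 's|server: https://.*:6443|server: https://127.0.0.1:6443|' kubeconfig.yaml

kubectl --kubeconfig kubeconfig.yaml get nodes
```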
Thanks, I will give it a try and report back on how it works.
I've made the change but the cluster doesn't come up now. |
Two things: First, I didn't really think about the fact that endpoints in Kubernetes can't contain loopback addresses. Therefore, you also need to disable k0s's endpoint reconciler so that it doesn't try to set 127.0.0.1 as the address for the default `kubernetes` endpoint:

```diff
--- aio-compose-sample.yaml
+++ aio-compose-sample.yaml
@@ -3,7 +3,7 @@
   k0s:
     container_name: k0s
     image: docker.io/k0sproject/k0s:v1.30.4-k0s.0
-    command: sh -c "apk add --no-cache --no-check-certificate ca-certificates && update-ca-certificates && k0s controller --config=/etc/k0s/config.yaml --enable-worker --no-taints --enable-metrics-scraper --debug"
+    command: sh -c "apk add --no-cache --no-check-certificate ca-certificates && update-ca-certificates && k0s controller --config=/etc/k0s/config.yaml --single --enable-metrics-scraper --disable-components=endpoint-reconciler"
     hostname: k0s
     privileged: true
     cgroup: host
```
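A small, untested sketch for verifying the cluster after applying this change and rebooting the host (the container name comes from the compose file):

```sh
# The node should become Ready again after each reboot
docker exec k0s k0s kubectl get nodes

# Inspect what ended up in the default kubernetes endpoint now that the
# k0s endpoint reconciler is disabled
docker exec k0s k0s kubectl get endpoints kubernetes -o yaml
```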
10x, I will make another try with all the inputs (sorry for missing the required command-line changes) and report back.
Hi @twz123, |
@twz123 - sorry for the delay, but we have performed additional tests, and the solution seems rock solid! |
Platform
Version
v1.30.4+k0s.0
Sysinfo
`k0s sysinfo`
What happened?
k0s running on a single-node system (multiple services run by docker-compose) doesn't survive repeated reboots: it comes back up after a few reboots and then stops coming up, or it doesn't come back up at all.
A sample docker-compose file demonstrating the problem is attached below.
Tried on Ubuntu 24.04 and CentOS 9, with the same results.
Steps to reproduce
docker compose -f aio-compose-sample.yaml up -d --wait
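A sketch of the full reproduction loop implied by the report (the reboot step and the assumption that a restart policy brings the containers back up are mine):

```sh
docker compose -f aio-compose-sample.yaml up -d --wait
sudo reboot
# once the host is back up:
docker ps          # containers are expected to restart automatically
docker logs k0s    # after a few reboots, the cluster fails to come up
```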
Expected behavior
k0s should survive restarts, always.
Actual behavior
After a few restarts, k0s breaks down.
Screenshots and logs
Kindly advise what logs are needed, and I'll be happy to add them.
Additional context
Adding a sample docker compose to demonstrate the problem:
aio-compose-sample.zip
Docker version info: