Race issue after node reboot #1221
Comments
Just an update: using `-f` in the copy command looks like it fixes the issue.
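For context on why `-f` can help: a plain `cp` opens the destination for writing, which on Linux fails with `ETXTBSY` ("Text file busy") while the old shim binary is still executing; GNU `cp -f` instead removes the destination and retries when the open fails. A minimal stand-alone sketch (a copied `sleep` binary stands in for `multus-shim`; paths and names are illustrative, not Multus's actual script):

```shell
#!/bin/sh
# Sketch: show that cp -f can replace a binary that is currently
# executing. The copied `sleep` binary stands in for multus-shim.
tmpd=$(mktemp -d)
cp "$(command -v sleep)" "$tmpd/sleep"

"$tmpd/sleep" 3 &               # simulate crio keeping the shim running

# A plain cp would open and truncate the running binary, which can fail
# with ETXTBSY; with -f, cp removes the destination and retries, so the
# replacement succeeds.
cp -f "$(command -v sleep)" "$tmpd/sleep"

wait                            # let the stand-in shim exit
```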
Coincidentally, we also saw this error crop up yesterday with one of our edge clusters after rebooting.
As an FYI, I see that different deployment YAMLs use different ways to copy the CNI binary in the init container, although I'm not sure that copying the file atomically will solve the above issue. See: multus-cni/deployments/multus-daemonset.yml, line 207 in 8e5060b
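The atomic-copy pattern mentioned above amounts to copying the new binary to a temporary name in the target directory and then renaming it over the old one. This is a hypothetical sketch, not Multus's actual entrypoint script; the function name and paths are illustrative:

```shell
#!/bin/sh
# Hypothetical helper: install a CNI binary without ever opening the
# currently-executing one for writing. rename(2) within a single
# filesystem is atomic, so callers see either the old binary or the new
# one, never a partial file.
install_shim() {
    src="$1"        # e.g. a bundled multus-shim binary
    dest_dir="$2"   # e.g. /opt/cni/bin on the host
    cp "$src" "$dest_dir/multus-shim.tmp"
    mv -f "$dest_dir/multus-shim.tmp" "$dest_dir/multus-shim"
}
```

Because `mv` renames rather than writes into the existing file, it avoids `ETXTBSY` even while crio has the old shim running; whether it resolves every variant of the race reported in this issue is, as noted above, not certain.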
This should hopefully be addressed with #1213
Saw this in minikube today. No rebooting, just starting up a new minikube cluster.
I also got a reproduction after rebooting a node and having Multus restart. I mitigated it by deleting the shim binary.
It seems I can make this happen any time I ungracefully restart a node, worker or master: it produces this error and completely stops pod network sandbox recreation on that node. The fix mentioned above does work, but this means a power outage of a node will require manual intervention, whereas without Multus it would not. This error should be handled properly.
+1. This seems like a pretty serious issue. Can we get a fix merged for it soon, please?
Can additionally confirm this behavior. As @dougbtv mentioned, removing the shim binary works.
+1, happened to me as well; the cluster did not come up. Any chance of fixing this soon?
Same here, on a Kubespray 1.29 cluster.
This certainly needs to be fixed right away.
@dougbtv: Hit exactly the same issue. Deleting /opt/cni/bin/multus-shim helps. When could this be fixed?
Hit the same issue with kube-ovn. Already posted it there (kubeovn/kube-ovn#4470)
Hi, it looks like there is a race in Multus after a node reboot that can prevent pods from starting.
The problem is mainly that, after the reboot, crio calls multus-shim to start pods, but the Multus pod itself cannot start because its init container fails to cp the shim into place.
The copy fails because crio has already invoked the shim, which is stuck waiting to communicate with the Multus pod.
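The failure mode can be reproduced outside Kubernetes: on Linux, opening a binary for writing while it is being executed fails with `ETXTBSY` ("Text file busy"), which is what the init container's plain `cp` runs into. A minimal sketch (a copied `sleep` binary stands in for the stuck shim; the names are illustrative):

```shell
#!/bin/sh
# Sketch: while a binary is executing, a plain cp over it fails with
# ETXTBSY, mirroring the init container's failed copy of multus-shim.
tmpd=$(mktemp -d)
cp "$(command -v sleep)" "$tmpd/sleep"

"$tmpd/sleep" 3 &               # stand-in for the shim crio is holding
if cp "$(command -v sleep)" "$tmpd/sleep" 2>/dev/null; then
    result="copy succeeded"
else
    result="copy failed: text file busy"
fi
echo "$result"
wait
```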