Proactively bounce capi-controller-manager in case of netsplits #7445
Comments
I think this is a general problem for all kinds of controllers and is not specific to CAPI's controllers. It should be (or maybe already is) solved in Kubernetes itself. Did you check whether the pod gets evicted after five minutes, which would map to the default of the kube-controller-manager? In addition, there are also two other maybe interesting configuration parameters for kube-apiserver, which I got from this issue: FeatureGate.
A workaround would be to run the CAPI Controller Manager (and also the other controllers) with multiple replicas and anti-affinity. A failed leader election should then lead to failover to another replica.
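As a rough illustration of the workaround described above: with multiple replicas, controller-runtime's leader election means a replica that can no longer renew its lease against the management cluster's API server steps down and another replica takes over. The sketch below is illustrative only; the election ID, namespace, and durations are assumptions, not the actual capi-controller-manager configuration, and the replica count / anti-affinity would live in the Deployment spec.

```go
// Hedged sketch: enabling leader election in a controller-runtime manager so
// that a netsplit replica loses its lease and a healthy replica takes over.
// Values and names below are illustrative assumptions.
package main

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	leaseDuration := 15 * time.Second
	renewDeadline := 10 * time.Second
	retryPeriod := 2 * time.Second

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:          true,
		LeaderElectionID:        "example-capi-leader-election", // illustrative ID
		LeaderElectionNamespace: "capi-system",                  // assumption
		LeaseDuration:           &leaseDuration,
		RenewDeadline:           &renewDeadline,
		RetryPeriod:             &retryPeriod,
	})
	if err != nil {
		panic(err)
	}

	// Controllers would be registered here before starting the manager.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```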
Not sure if I understood the issue correctly, but changes to the CCM should be out of scope of the Cluster API project? I thought the issue is that the CCM in the workload cluster doesn't set the providerID, not the CAPI controller in the mgmt cluster?
To be clear:
Wasn't your problem that the cloud controller manager was not setting the providerIDs on nodes?
Is the CAPI Controller Manager running with multiple replicas? If the controller is out of the network (split from the leader API server on the mgmt cluster) it should lose the lease and another replica should take up the lease. If the management cluster is cut off from the workload cluster network entirely, it will just keep retrying to contact the workload cluster.
In this case, the capi-controller-manager's node.
Yup!
There are a few questions that come to mind:
Could this be approached by watching a metric and alerting that the CP is not able to contact workloads? This is discussed here. That would give someone with insight into the network setup a chance to remediate manually, or to set up some deployment-specific automation.
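One way such a signal could be surfaced, sketched below under the assumption of a controller-runtime based controller, is to register a custom gauge on the controller's metrics registry and alert when it drops to zero. The metric name, labels, and the RecordReachability helper are hypothetical, invented here for illustration; they are not existing Cluster API metrics.

```go
// Hedged sketch of exposing "can the mgmt cluster reach a workload cluster"
// as a Prometheus gauge for alerting, instead of failing health probes.
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	ctrlmetrics "sigs.k8s.io/controller-runtime/pkg/metrics"
)

// workloadClusterReachable is 1 when the last connectivity check to the
// named workload cluster succeeded, 0 otherwise. Name is illustrative.
var workloadClusterReachable = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "example_workload_cluster_reachable",
		Help: "Whether the last connectivity check to a workload cluster succeeded.",
	},
	[]string{"cluster"},
)

func init() {
	// controller-runtime serves this registry on the manager's metrics endpoint.
	ctrlmetrics.Registry.MustRegister(workloadClusterReachable)
}

// RecordReachability is a hypothetical helper a reconciler could call after
// each attempt to contact a workload cluster's API server.
func RecordReachability(clusterName string, ok bool) {
	v := 0.0
	if ok {
		v = 1.0
	}
	workloadClusterReachable.WithLabelValues(clusterName).Set(v)
}
```

An alerting rule could then fire whenever the gauge stays at zero for some deployment-specific duration.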
I think the answer is monitoring, to be honest. Monitor Cluster API and then produce corresponding alerts with either manual playbooks or some automatic mitigation (although automatic is super hard with things like net splits). Some thoughts:
If we include "can the CAPI controller contact workload clusters" in our readiness/liveness probes (or if the controller just shuts down), we get the following consequences:
I don't think we could maintain stable operations with a behavior like that. Let's assume we have a bunch of workload clusters and some of them are offline (either because they are simply broken / misconfigured, or in some edge "temporarily not reachable" scenario). The Cluster API controller would just be permanently restarted. Imagine what happens if you have a few hundred clusters. The same result if you have some workload clusters that are reachable from some mgmt cluster nodes and others only from other nodes.
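For context on the trade-off being argued against above, this is roughly what wiring workload-cluster connectivity into the controller's health endpoints could look like with controller-runtime's healthz/readyz hooks. The checkAllWorkloadClusters helper is hypothetical, and the comment's point stands: a single unreachable cluster would keep the whole controller unready or restarting.

```go
// Purely illustrative sketch -- not an endorsement. checkAllWorkloadClusters
// is a hypothetical helper; the manager would also need HealthProbeBindAddress
// set for these endpoints to be served.
package main

import (
	"net/http"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/healthz"
)

func setupProbes(mgr ctrl.Manager, checkAllWorkloadClusters func() error) error {
	// Standard liveness check: the process itself is responsive.
	if err := mgr.AddHealthzCheck("ping", healthz.Ping); err != nil {
		return err
	}
	// The contested part: fail readiness when any workload cluster is unreachable.
	return mgr.AddReadyzCheck("workload-connectivity", func(_ *http.Request) error {
		return checkAllWorkloadClusters()
	})
}
```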
(deleting my comment as I meant to put it in the other issue) but I still think we should proactively time bomb this container :)
/triage accepted
Trying to make up my mind on two sides of the problem:
Maybe it's just a naive time-bomb expiration? Like once every week or so the pod self-terminates gracefully.
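A minimal sketch of that naive time-bomb idea, assuming a Go controller process: exit gracefully after a fixed lifetime (with some jitter so replicas don't all restart at once) and let the kubelet restart the container. The one-week lifetime and the helper name are arbitrary choices for illustration.

```go
// Hedged sketch of a "time bomb": the process exits after a bounded lifetime
// so it gets rescheduled, hopefully onto a node with working connectivity.
package main

import (
	"math/rand"
	"os"
	"time"
)

func startTimeBomb(maxLifetime time.Duration) {
	// Add up to 10% jitter so multiple replicas don't terminate together.
	jitter := time.Duration(rand.Int63n(int64(maxLifetime / 10)))
	go func() {
		time.Sleep(maxLifetime + jitter)
		// In a real controller this would cancel the manager's context for a
		// clean shutdown instead of exiting directly.
		os.Exit(0)
	}()
}

func main() {
	startTimeBomb(7 * 24 * time.Hour) // roughly "once every week"
	// ... start the controller manager here ...
	select {}
}
```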
This issue has not been updated in over 1 year, and should be re-triaged. You can:
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
/close
Unfortunately we did not reach an agreement on a way forward and the issue has not been active in the last year.
@fabriziopandini: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
(edit: removed CCM, because it's an ambiguous abbreviation)
User Story
As a user on the edge, if one node in my MC somehow gets netsplit off from my control plane / worker nodes, I'd like the capi-controller-manager to fail, so that it bounces to a "potentially" healthy, connected node elsewhere in my MC.
Detailed Description
As an example of how I was stepping through this earlier, the steps I followed in kubernetes-sigs/cluster-api-provider-vsphere#1660 worked...
Anything else you would like to add:
I saw this in a very odd environment, admittedly, but I think it's still a good idea to broaden the definition of health for the capi-controller-manager, if possible. We wouldn't want a single capi-controller-manager that was having issues to slow down the remediation of a fleet of clusters running on different networks.
/kind feature
The diagram below shows the issue I ran into in this netsplit situation.