CAPV controller manager stuck during reconcile #2832
I would guess that a controller is stuck. This could be confirmed via metrics (active workers or similar) and via a goroutine dump of the controller (via
We believe we have the same problem: especially when deleting clusters, nothing really happens until we restart CAPV.
Could you please note which version of CAPV you were using when this issue occurred?
for this env the combo is:
I'll try updating all of them and see.
Please check if you have
Not set!
^^ Should help to figure out where the controller is stuck
Same bug after updating the CAPV controller from v1.8.4 to v1.10.0.
There's no way for anyone to debug this without a goroutine dump / stack traces. Until then we can only recommend that anyone using older versions ensure
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
/kind bug
We have 1 management cluster with 7 workload clusters. Each workload cluster has ~25 worker nodes.
Sometimes during the reconciliation of all workload clusters, CAPV stops reconciling without any significant information in the logs (nor in CAPI logs). No new VMs are visible in vCenter, nothing is deleted, and new Machines remain in the "Provisioning" state indefinitely.
The quickest fix is to restart the CAPV deployment, after which everything runs smoothly again.
CAPV controller manager:
CAPI controller manager:
Logs from the period when CAPV is in this strange state are omitted, as they contain no relevant information.
Our workaround is a scheduled Job that restarts CAPV twice per day.
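A scheduled restart like this is commonly done with a Kubernetes CronJob running `kubectl rollout restart`. The following is only a sketch of that workaround — the names (`capv-restarter`, the `capv-system` namespace, `capv-controller-manager`) and the image tag are assumptions to adapt to your install, and the ServiceAccount needs RBAC permission to patch Deployments:

```yaml
# Hypothetical CronJob restarting the CAPV deployment twice per day.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: capv-restarter
  namespace: capv-system
spec:
  schedule: "0 6,18 * * *"   # 06:00 and 18:00 every day
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: capv-restarter  # needs RBAC to patch deployments
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.24
              command:
                - kubectl
                - rollout
                - restart
                - deployment/capv-controller-manager
                - -n
                - capv-system
```

`kubectl rollout restart` triggers a rolling replacement of the pods, so the controller comes back without manual intervention.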
Environment:
- Kubernetes version (`kubectl version`): 1.24.17
- OS (`/etc/os-release`): Ubuntu 22.04