Helm uninstall stuck indefinitely if tigera-operator pre-delete job fails to schedule #9220

Open
TheNilesh opened this issue Sep 11, 2024 · 1 comment

@TheNilesh

Expected Behavior

During Helm uninstallation of the tigera-operator chart, the uninstall job (defined as a pre-delete hook) should execute and complete or fail, allowing Helm to remove the associated resources without hanging indefinitely.

Current Behavior

Helm uninstallation becomes stuck indefinitely if the tigera-operator-uninstall job (pre-delete hook) is not scheduled or fails to execute, particularly when the cluster has no active nodes. This prevents the uninstallation process from completing, requiring manual intervention.

Possible Solution

A potential solution could involve adding configuration options to manage the behavior of the pre-delete hook. For example:

  • Introduce a configurable timeout for the pre-delete job. If the job is not scheduled or completed within the specified time, Helm should forcefully delete the release to avoid being stuck.
  • Implement a flag in the Helm chart to allow force deletion of resources after the pre-delete job times out.
  • Allow disabling the pre-delete hook for clusters that may not have nodes, ensuring the Helm uninstall can still proceed in control-plane-only environments.
  • Use Kubernetes' activeDeadlineSeconds and backoffLimit to control retries and timeouts for the job, ensuring it doesn’t block indefinitely.
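
For the timeout idea, note that helm uninstall already accepts a --timeout flag, documented as the time to wait for any individual Kubernetes operation such as hook Jobs. The sketch below wraps it in a small function. The function name bounded_uninstall, the HELM override variable, and the 2m default are assumptions made for illustration and are not part of Helm or the chart. As far as I understand, when the timeout elapses Helm reports the hook as failed and stops; it does not force-delete the release, so this only bounds the wait.

```shell
#!/bin/sh
# HELM can be overridden (for example HELM=echo) to preview the command
# without touching a cluster. This variable is an assumption of this sketch.
HELM="${HELM:-helm}"

# Hypothetical wrapper: uninstall a release, waiting at most $3 for each
# Kubernetes operation, including the pre-delete hook Job.
bounded_uninstall() {
  release="$1"
  namespace="$2"
  timeout="${3:-2m}"   # illustrative default, not a Helm default
  "$HELM" uninstall "$release" -n "$namespace" --timeout "$timeout"
}

# Example call (requires helm and cluster access):
#   bounded_uninstall tigera-operator tigera-operator 90s
```

Setting HELM=echo before calling the function prints the assembled command instead of running it, which is a cheap way to check the flags in automation.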

Steps to Reproduce (for bugs)

  1. Install the tigera-operator Helm chart on a Kubernetes cluster managed through a hosted control plane using Cluster API (CAPI), Kamaji, and Sveltos.
  2. Remove all nodes from the virtual cluster (leaving only the control plane).
  3. Attempt to uninstall the Helm chart using helm uninstall tigera-operator -n tigera-operator.
  4. Observe that Helm becomes stuck, waiting for the uninstall job that never runs due to the lack of nodes.

Context

This issue affects environments where the Helm chart is installed on a hosted control plane. Specifically, it impacts clusters managed by Cluster API (CAPI), Kamaji, and Sveltos. When no worker nodes are present, the uninstall job (pre-delete hook) cannot be scheduled, leaving the Helm uninstallation stuck indefinitely. This issue interferes with automated cluster management workflows, where clean resource removal is required even when no nodes are attached to the cluster. This mainly happens when managing tigera-operator installation through the Sveltos cluster profile.

Your Environment

  • Calico version: v3.28.0
  • Orchestrator version: Kubernetes 1.29.6
  • Operating System and version: Ubuntu 20.04
  • Cluster API (CAPI), Kamaji, Sveltos

@caseydavenport
Member

I believe Helm has a --no-hooks option that you can pass to disable the hook when there are no schedulable nodes in the cluster: https://helm.sh/docs/helm/helm_uninstall/

  --no-hooks             prevent hooks from running during uninstallation

Would that do the trick in your case?
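
A minimal sketch of how that suggestion might be automated: pass --no-hooks only when the cluster reports no nodes. The helper names count_nodes and uninstall_flags are hypothetical. The check counts all Node objects and does not account for cordoned or tainted nodes, so treat it as an approximation of "schedulable". Skipping the hook also skips whatever cleanup the uninstall job would have performed.

```shell
#!/bin/sh
# Hypothetical helper: count node entries in the output of
# `kubectl get nodes -o name`, read from stdin (one "node/<name>" per line).
count_nodes() {
  # grep -c prints 0 and exits non-zero when nothing matches; ignore that status.
  grep -c '^node/' || true
}

# Hypothetical helper: print extra flags for `helm uninstall` given a node count.
uninstall_flags() {
  if [ "$1" -eq 0 ]; then
    # No nodes: the pre-delete hook Job could never be scheduled, so skip hooks.
    echo "--no-hooks"
  fi
}

# Example use (requires kubectl, helm, and cluster access):
#   nodes=$(kubectl get nodes -o name | count_nodes)
#   helm uninstall tigera-operator -n tigera-operator $(uninstall_flags "$nodes")
```

Keeping the decision in a small function separate from the kubectl and helm calls makes it easy to check the logic without a cluster.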
