Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[e2e tests] use --wait and --cascade=foreground in helm uninstall #1004

Merged
merged 1 commit into from
Jun 27, 2023

Conversation

mihaialexandrescu
Copy link
Contributor

Description

Currently, testUninstall() (tests/e2e/test_uninstall.go) breaks down during various components' uninstallation after helm uninstall when we list or get k8s resources from the cluster to check that the removal of resources and objects was successful.
These errors happen in the same range of places (but not one in particular) but not for the same components across runs. These are transient states and repeating the kubectl get commands the test steps use but after the test fails yield correct/clean results. This initially led me to believe we don't give the removal process (which is a helm.DeleteE function call) enough time to complete its tasks and clean up all related objects.

Examples of failures:

  • error on the koperator uninstall step because kubectl --namespace kafka api-resources --verbs list --output name --sort-by name returned this error among the other api-resources

    • error: unable to retrieve the complete list of server APIs: kafka.banzaicloud.io/v1beta1: the server could not find the requested resource 
      
    • code :
      k8sResourceKinds, err := listK8sResourceKinds(kubectlOptions, "")
      Expect(err).ShouldNot(HaveOccurred())
    • Not an issue when I repeat the make test-e2e command right after
  • error on the cert-manager uninstall step because kubectl --namespace cert-manager get pods,services,deployments.apps,daemonset.apps,replicasets.apps,statefulsets.apps,secrets,serviceaccounts,configmaps,mutatingwebhookconfigurations.admissionregistration.k8s.io,validatingwebhookconfigurations.admissionregistration.k8s.io,jobs.batch,cronjobs.batch,poddisruptionbudgets.policy,podsecuritypolicies.policy,persistentvolumeclaims,persistentvolumes --selector=app.kubernetes.io/managed-by=Helm,app.kubernetes.io/instance=cert-manager -o=go-template='{{range .items}}{{.kind}}{{"/"}}{{.metadata.name}}{{if .metadata.namespace}}{{"."}}{{.metadata.namespace}}{{end}}{{"\n"}}{{end}}' --all-namespaces returned a few pods that were probably in Terminating state

    • Pod/cert-manager-589f57598d-2mc8x.cert-manager
      Pod/cert-manager-cainjector-6b7bf5fc86-xnx4r.cert-manager
      Pod/cert-manager-webhook-fbc478968-pphxt.cert-manager 
      
    • code :
      Expect(remainedResources).Should(BeEmpty())
    • Those cert-manager pods, for instance, are not there later.

In practice, the issue is that we don't instruct helm to wait for the right things to be fully executed.
The simplest addition would be to add --wait to the helm extra args for the delete command but per Issue 10586 from the helm repo that is not enough and we indeed see issues with pods (see the second example failure above) so we have to also add --cascade=foreground to the extra args alongside --wait.

What --cascade=foreground means is not detailed in the helm docs but the info is available in the k8s docs : https://kubernetes.io/docs/concepts/architecture/garbage-collection/#foreground-deletion

I tested this a number of times on a PKE cluster both by running testInstall() and testUninstall() in one go and by running them in consecutive make test-e2e commands. I also tried removing the new extraArgs and the tests failed like initially described in the problem statement.

This is a small fix to #987.

The update to tests/e2e/koperator_suite_test.go is just about some whitespaces and indentation mismatches left after commenting in a different PR.

Type of Change

  • Bug Fix

Checklist

  • All code style checks pass

@mihaialexandrescu mihaialexandrescu marked this pull request as ready for review June 27, 2023 09:31
@mihaialexandrescu mihaialexandrescu requested a review from a team as a code owner June 27, 2023 09:31
@mihaialexandrescu mihaialexandrescu merged commit 2d4c6d4 into master Jun 27, 2023
@mihaialexandrescu mihaialexandrescu deleted the e2e-tests/fix-wait-for-uninstall branch June 27, 2023 11:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants