[e2e tests] use --wait and --cascade=foreground in helm uninstall #1004

mihaialexandrescu · 2023-06-27T09:17:57Z

Description

Currently, testUninstall() (tests/e2e/test_uninstall.go) breaks down during various components' uninstallation after helm uninstall when we list or get k8s resources from the cluster to check that the removal of resources and objects was successful.
These errors happen in the same range of places (but not one in particular) but not for the same components across runs. These are transient states and repeating the kubectl get commands the test steps use but after the test fails yield correct/clean results. This initially led me to believe we don't give the removal process (which is a helm.DeleteE function call) enough time to complete its tasks and clean up all related objects.

Examples of failures:

error on the koperator uninstall step because kubectl --namespace kafka api-resources --verbs list --output name --sort-by name returned this error among the other api-resources
- ```
error: unable to retrieve the complete list of server APIs: kafka.banzaicloud.io/v1beta1: the server could not find the requested resource 
```
- code :
  
  koperator/tests/e2e/uninstall.go
  
  Lines 52 to 53 in 25c4e97
  
  k8sResourceKinds, err := listK8sResourceKinds(kubectlOptions, "")
  
  Expect(err).ShouldNot(HaveOccurred())
- Not an issue when I repeat the make test-e2e command right after
error on the cert-manager uninstall step because kubectl --namespace cert-manager get pods,services,deployments.apps,daemonset.apps,replicasets.apps,statefulsets.apps,secrets,serviceaccounts,configmaps,mutatingwebhookconfigurations.admissionregistration.k8s.io,validatingwebhookconfigurations.admissionregistration.k8s.io,jobs.batch,cronjobs.batch,poddisruptionbudgets.policy,podsecuritypolicies.policy,persistentvolumeclaims,persistentvolumes --selector=app.kubernetes.io/managed-by=Helm,app.kubernetes.io/instance=cert-manager -o=go-template='{{range .items}}{{.kind}}{{"/"}}{{.metadata.name}}{{if .metadata.namespace}}{{"."}}{{.metadata.namespace}}{{end}}{{"\n"}}{{end}}' --all-namespaces returned a few pods that were probably in Terminating state
- ```
Pod/cert-manager-589f57598d-2mc8x.cert-manager
Pod/cert-manager-cainjector-6b7bf5fc86-xnx4r.cert-manager
Pod/cert-manager-webhook-fbc478968-pphxt.cert-manager 
```
- code :
  
  koperator/tests/e2e/uninstall.go
  
  Line 194 in 25c4e97
  
  Expect(remainedResources).Should(BeEmpty())
- Those cert-manager pods, for instance, are not there later.

In practice, the issue is that we don't instruct helm to wait for the right things to be fully executed.
The simplest addition would be to add --wait to the helm extra args for the delete command but per Issue 10586 from the helm repo that is not enough and we indeed see issues with pods (see the second example failure above) so we have to also add --cascade=foreground to the extra args alongside --wait.

What --cascade=foreground means is not detailed in the helm docs but the info is available in the k8s docs : https://kubernetes.io/docs/concepts/architecture/garbage-collection/#foreground-deletion

I tested this a number of times on a PKE cluster both by running testInstall() and testUninstall() in one go and by running them in consecutive make test-e2e commands. I also tried removing the new extraArgs and the tests failed like initially described in the problem statement.

This is a small fix to #987.

The update to tests/e2e/koperator_suite_test.go is just about some whitespaces and indentation mismatches left after commenting in a different PR.

Type of Change

Bug Fix

Checklist

All code style checks pass

use --wait and --cascade=foreground in helm uninstall

18e5d65

mihaialexandrescu marked this pull request as ready for review June 27, 2023 09:31

mihaialexandrescu requested a review from a team as a code owner June 27, 2023 09:31

pregnor approved these changes Jun 27, 2023

View reviewed changes

Kuvesz approved these changes Jun 27, 2023

View reviewed changes

mihaialexandrescu merged commit 2d4c6d4 into master Jun 27, 2023

mihaialexandrescu deleted the e2e-tests/fix-wait-for-uninstall branch June 27, 2023 11:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[e2e tests] use --wait and --cascade=foreground in helm uninstall #1004

[e2e tests] use --wait and --cascade=foreground in helm uninstall #1004

mihaialexandrescu commented Jun 27, 2023

	k8sResourceKinds, err := listK8sResourceKinds(kubectlOptions, "")
	Expect(err).ShouldNot(HaveOccurred())

[e2e tests] use --wait and --cascade=foreground in helm uninstall #1004

[e2e tests] use --wait and --cascade=foreground in helm uninstall #1004

Conversation

mihaialexandrescu commented Jun 27, 2023

Description

Type of Change

Checklist