[e2e tests] use --wait and --cascade=foreground in helm uninstall #1004
Description
Currently, `testUninstall()` (tests/e2e/test_uninstall.go) breaks down during the uninstallation of various components after `helm uninstall`, when we `list` or `get` k8s resources from the cluster to check that the removal of resources and objects was successful.
These errors happen in the same range of places (though not at one in particular) and not for the same components across runs. These are transient states: repeating the `kubectl get` commands the test steps use after the test has failed yields correct/clean results. This initially led me to believe we don't give the removal process (which is a `helm.DeleteE` function call) enough time to complete its tasks and clean up all related objects.
Examples of failures:
error on the koperator uninstall step because
kubectl --namespace kafka api-resources --verbs list --output name --sort-by name
returned this error among the other api-resources
koperator/tests/e2e/uninstall.go
Lines 52 to 53 in 25c4e97
error on the cert-manager uninstall step because
kubectl --namespace cert-manager get pods,services,deployments.apps,daemonset.apps,replicasets.apps,statefulsets.apps,secrets,serviceaccounts,configmaps,mutatingwebhookconfigurations.admissionregistration.k8s.io,validatingwebhookconfigurations.admissionregistration.k8s.io,jobs.batch,cronjobs.batch,poddisruptionbudgets.policy,podsecuritypolicies.policy,persistentvolumeclaims,persistentvolumes --selector=app.kubernetes.io/managed-by=Helm,app.kubernetes.io/instance=cert-manager -o=go-template='{{range .items}}{{.kind}}{{"/"}}{{.metadata.name}}{{if .metadata.namespace}}{{"."}}{{.metadata.namespace}}{{end}}{{"\n"}}{{end}}' --all-namespaces
returned a few pods that were probably in `Terminating` state
koperator/tests/e2e/uninstall.go
Line 194 in 25c4e97
In practice, the issue is that we don't instruct `helm` to wait for the right things to be fully executed.
The simplest change would be to add `--wait` to the helm extra args for the delete command, but per Issue 10586 from the helm repo that is not enough, and we do see issues with pods (see the second example failure above), so we also have to add `--cascade=foreground` to the extra args alongside `--wait`.
What `--cascade=foreground` means is not detailed in the helm docs, but the info is available in the k8s docs: https://kubernetes.io/docs/concepts/architecture/garbage-collection/#foreground-deletion
I tested this a number of times on a PKE cluster, both by running testInstall() and testUninstall() in one go and by running them in consecutive `make test-e2e` commands. I also tried removing the new extraArgs, and the tests failed as initially described in the problem statement.
This is a small fix to #987.
The update to tests/e2e/koperator_suite_test.go only fixes some whitespace and indentation mismatches left after commenting in a different PR.
Type of Change
Checklist