E2E Tweaks/Fixes #279

Merged

dhaiducek
Member

  • Remove unused Pause() util function
  • Add missing cleanup commands that prevented re-running from working properly
  • Add CheckComplianceStatus() function to also return the status message in the failure message
  • Add KubectlDelete() with common flags, including wait=false to skip waiting for finalizers and ignore-not-found
  • Add GetComplianceStatus() to do the compliance check directly, providing the status message on failure.
  • Replace WithOffset() with GinkgoHelper(), providing enhanced failure tracing

@dhaiducek
Member Author

dhaiducek commented Jul 19, 2024

Sample output from the new CheckComplianceStatus() function:

 [FAILED]
  Unexpected compliance state. Status message: pods [nginx-pod-e2e] not found in namespace default
  Expected
      <string>: Compliant
  to equal
      <string>: NonCompliant

And, similarly, the updated OperatorPolicy function:

 [FAILED]
  Unexpected compliance state. Status message: NonCompliant; the policy spec is valid
  - the OperatorGroup matches what is required by the policy
  - the Subscription matches what is required by the policy
  - no InstallPlans requiring approval were found
  - ClusterServiceVersion (quay-operator.v3.10.6) - installing: waiting for deployment quay-operator-tng to become ready: deployment "quay-operator-tng" not available: Deployment does not have minimum availability.
  - there are CRDs present for the operator
  - the deployments quay-operator-tng do not have their minimum availability
  - CatalogSource was found
  Expected
      <v1.ComplianceState>: NonCompliant
  to equal
      <v1.ComplianceState>: Compliant
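
The failure output above pairs the compliance mismatch with the policy's status message. As a rough illustration of that idea only, here is a standard-library sketch. The helper names and the simplified field layout (`status.compliant`, `status.message`) are assumptions made for this example and are not taken from the repository's test utilities.

```go
package main

import "fmt"

// complianceAndMessage pulls the compliance state and a status message out of
// a decoded policy object. The field layout ("status.compliant" and
// "status.message") is a simplification assumed for this sketch, not the
// controller's actual schema.
func complianceAndMessage(policy map[string]interface{}) (state, message string) {
	status, ok := policy["status"].(map[string]interface{})
	if !ok {
		return "", ""
	}
	state, _ = status["compliant"].(string)
	message, _ = status["message"].(string)
	return state, message
}

// describeMismatch returns an empty string when the state matches the
// expectation, and otherwise a description that carries the status message so
// the reason for the mismatch is visible in the failure output.
func describeMismatch(policy map[string]interface{}, expected string) string {
	state, message := complianceAndMessage(policy)
	if state == expected {
		return ""
	}
	return fmt.Sprintf(
		"Unexpected compliance state. Status message: %s\nExpected %q to equal %q",
		message, state, expected,
	)
}

func main() {
	// Hypothetical policy object used only to exercise the helpers.
	policy := map[string]interface{}{
		"status": map[string]interface{}{
			"compliant": "NonCompliant",
			"message":   "pods [nginx-pod-e2e] not found in namespace default",
		},
	}
	fmt.Println(describeMismatch(policy, "Compliant"))
}
```

In a Gomega-based suite the same effect is usually achieved by passing the message as the optional description argument to the assertion, which is what makes it appear next to the Expected/to equal block.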

@dhaiducek force-pushed the e2e-fixes branch 3 times, most recently from 600e0a7 to bad64e2, on July 19, 2024 21:41
@dhaiducek
Member Author

/hold to retry a bunch of times

@@ -163,6 +159,14 @@ func Kubectl(args ...string) {
}
}

func KubectlDelete(args ...string) {
deleteArgs := []string{
"delete", "--wait=false", "--ignore-not-found",
Member Author

I was hoping --wait=false would save some time, but it looks like the time to run the tests is similar.
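
For readers without the full diff, here is a minimal sketch of how a helper of this shape might assemble its arguments. The diff above is truncated, so the function name `deleteArgs` and the ordering of flags relative to caller arguments are assumptions for illustration, not the merged implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// deleteArgs prepends the common flags shown in the diff to the caller's
// arguments. Whether the real helper places the flags before or after the
// caller's arguments is not visible in the excerpt; this ordering is assumed.
func deleteArgs(args ...string) []string {
	common := []string{"delete", "--wait=false", "--ignore-not-found"}
	return append(common, args...)
}

func main() {
	// Hypothetical resource names; nothing is executed against a cluster.
	args := deleteArgs("configmap", "example-cm", "-n", "default")
	fmt.Println("kubectl " + strings.Join(args, " "))
}
```

With `--ignore-not-found`, repeating a cleanup step for an object that is already gone does not fail, which is the property that matters for re-running the suite. `--wait=false` only changes whether the client blocks until deletion completes.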

@dhaiducek
Member Author

dhaiducek commented Jul 25, 2024

Failure on run #9:

• [FAILED] [62.974 seconds]
Test results of namespace selection Checking results of different namespaceSelectors [It] No namespaceSelector specified
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case19_ns_selector_test.go:63

  Timeline >>
  STEP: Applying prerequisites @ 07/24/24 20:14:25.918
  STEP: patching policy with the test selector @ 07/24/24 20:14:28.495
  [FAILED] in [It] - /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case19_ns_selector_test.go:61 @ 07/24/24 20:15:28.775
  << Timeline

  [FAILED] Timed out after 60.000s.
  Expected
      <string>: configmaps [configmap-selector-e2e] not found in namespaces: case19a-2-e2e, case19a-3-e2e, case19a-4-e2e, case19a-5-e2e, case19b-1-e2e, case19b-2-e2e, case19b-3-e2e, case19b-4-e2e, default, kube-node-lease, kube-public, kube-system, local-path-storage, managed, olm, open-cluster-management-agent-addon, operator-policy-testns, operators, range1, range2
  to equal
      <string>: namespaced object configmap-selector-e2e of kind ConfigMap has no namespace specified from the policy namespaceSelector nor the object metadata
  In [It] at: /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case19_ns_selector_test.go:61 @ 07/24/24 20:15:28.775

ref: https://github.com/open-cluster-management-io/config-policy-controller/actions/runs/10014681844/job/27879424536?pr=279

@dhaiducek
Member Author

Failure in run #8:

• [FAILED] [65.816 seconds]
Testing OperatorPolicy Test health checks on OLM resources after OperatorPolicy operator installation [It] Should generate conditions and relatedobjects of CSV [supports-hosted]
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:887

  Timeline >>
  STEP: Creating the parent object @ 07/24/24 16:40:25.987
  STEP: Creating the child object with the owner reference @ 07/24/24 16:40:26.128
  STEP: Verifying the child object exists @ 07/24/24 16:40:26.131
  STEP: Creating the parent object @ 07/24/24 16:40:26.133
  STEP: Creating the child object with the owner reference @ 07/24/24 16:40:26.335
  STEP: Verifying the child object exists @ 07/24/24 16:40:26.337
  [FAILED] in [It] - /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:899 @ 07/24/24 16:41:26.341
  << Timeline

  [FAILED] Timed out after 60.001s.
  Expected
      <string>: 
  to equal
      <string>: InstallSucceeded
  In [It] at: /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:899 @ 07/24/24 16:41:26.341

ref: https://github.com/open-cluster-management-io/config-policy-controller/actions/runs/10014681844/job/27868969069?pr=279

@dhaiducek
Member Author

dhaiducek commented Jul 25, 2024

Failure in run #5 (hosted mode):

• [FAILED] [70.859 seconds]
Testing OperatorPolicy Test reporting of unapproved version after installation [It] Should report a violation after the versions list is patched to exclude the current version [supports-hosted]
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:3642

  Timeline >>
  STEP: Creating the parent object @ 07/20/24 12:11:55.673
  STEP: Creating the child object with the owner reference @ 07/20/24 12:11:55.788
  STEP: Verifying the child object exists @ 07/20/24 12:11:55.792
  STEP: Patching the versions field to exclude the installed version @ 07/20/24 12:11:55.794
  [FAILED] in [It] - /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:3647 @ 07/20/24 12:12:55.884
  << Timeline

  [FAILED] Timed out after 60.000s.
  The function passed to Eventually failed at /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:153 with:
  Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:3647 @ 07/20/24 12:12:55.884

ref: https://github.com/open-cluster-management-io/config-policy-controller/actions/runs/10014681844/job/27696819706?pr=279

@yiraeChristineKim
Contributor

Is the current test failure related to delete --wait=false?

@yiraeChristineKim
Contributor

I like these changes. Thanks, Dale!

@yiraeChristineKim
Contributor

/approve

@dhaiducek
Member Author

dhaiducek commented Jul 25, 2024

Failure in run #11 (hosted mode):

• [FAILED] [27.641 seconds]
Testing OperatorPolicy Test CRD deletion delayed because of a finalizer [It] Initially behaves correctly as musthave [supports-hosted]
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:2671

  Timeline >>
  STEP: Creating the parent object @ 07/25/24 13:30:45.993
  STEP: Creating the child object with the owner reference @ 07/25/24 13:30:46.209
  STEP: Verifying the child object exists @ 07/25/24 13:30:46.214
  STEP: Waiting for a CRD to appear, which should indicate the operator is installing @ 07/25/24 13:30:46.353
  [FAILED] in [It] - /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:2687 @ 07/25/24 13:30:57.435
  [FAILED] in [AfterAll] - /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/e2e_suite_test.go:198 @ 07/25/24 13:30:57.505
  << Timeline

  [FAILED] Failed after 1.035s.
  The function passed to Consistently failed at /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:153 with:
  Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:2687 @ 07/25/24 13:30:57.435

ref: https://github.com/open-cluster-management-io/config-policy-controller/actions/runs/10014681844/job/27912190218?pr=279

@dhaiducek
Member Author

dhaiducek commented Jul 26, 2024

Hosted mode failures:

• [FAILED] [71.033 seconds]
Testing OperatorPolicy Testing an all default operator policy [It] Should create the Subscription with default values [supports-hosted]
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:247

  Timeline >>
  STEP: Creating the parent object @ 07/26/24 15:20:30.197
  STEP: Creating the child object with the owner reference @ 07/26/24 15:20:30.389
  STEP: Verifying the child object exists @ 07/26/24 15:20:30.393
  STEP: Verifying the policy is compliant @ 07/26/24 15:20:30.395
  [FAILED] in [It] - /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:249 @ 07/26/24 15:21:30.396
  << Timeline

  [FAILED] Timed out after 60.000s.
  The function passed to Eventually failed at /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:139 with:
  Expected
      <map[int]v1.RelatedObject | len:0>: {}
  to have length 1
  In [It] at: /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:249 @ 07/26/24 15:21:30.396

• [FAILED] [120.129 seconds]
Test an alternative kubeconfig for policy evaluation [It] should create the namespace using the alternative kubeconfig [hosted-mode]
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case21_alternative_kubeconfig_test.go:43

  Timeline >>
  STEP: Creating the parent object @ 07/26/24 15:20:29.982
  STEP: Creating the child object with the owner reference @ 07/26/24 15:20:30.09
  STEP: Verifying the child object exists @ 07/26/24 15:20:30.093
  STEP: Verifying that the create-ns policy is compliant @ 07/26/24 15:20:30.095
  [FAILED] in [It] - /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case21_alternative_kubeconfig_test.go:53 @ 07/26/24 15:22:30.096
  [FAILED] in [AfterAll] - /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case21_alternative_kubeconfig_test.go:39 @ 07/26/24 15:22:30.11
  << Timeline

  [FAILED] Timed out after 120.001s.
  The function passed to Eventually failed at /home/runner/work/config-policy-controller/config-policy-controller/test/utils/utils.go:207 with:
  Unexpected compliance state. Status message: 
  Expected
      <string>: Compliant
  to equal
      <string>: 
  In [It] at: /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case21_alternative_kubeconfig_test.go:53 @ 07/26/24 15:22:30.096

ref: https://github.com/open-cluster-management-io/config-policy-controller/actions/runs/10096004705/job/27969196160?pr=279

@dhaiducek
Member Author

dhaiducek commented Jul 29, 2024

Minimum K8s version failure (hosted mode):

  [FAILED] Failed after 1.034s.
  The function passed to Consistently failed at /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:153 with:
  Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:2687 @ 07/27/24 12:08:36.212

Latest K8s version failure:

• [FAILED] [1.064 seconds]
Test an objectDefinition with an invalid field [It] Fails when an invalid field is provided
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case23_invalid_field_test.go:32

  Timeline >>
  STEP: Creating the case23-invalid-field policy @ 07/27/24 11:31:49.263
  STEP: Verifying that the case23-invalid-field policy is noncompliant @ 07/27/24 11:31:50.271
  STEP: Verifying events do not continue to be created after the first violation for created objects @ 07/27/24 11:31:50.277
  [FAILED] in [It] - /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case23_invalid_field_test.go:76 @ 07/27/24 11:31:50.284
  << Timeline

  [FAILED] Failed after 0.007s.
  Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case23_invalid_field_test.go:76 @ 07/27/24 11:31:50.284

^^^ At first I was worried my changes had caused it, but sure enough there was another event sent by the controller, so this failure is valid:

2024-07-27T11:31:50.249Z	info	configuration-policy-controller	controllers/configurationpolicy_controller.go:879	Sending an update policy status event for the object template	{"policy": "case23-invalid-field", "policy": "case23-invalid-field", "index": 0}
2024-07-27T11:31:50.249Z	info	configuration-policy-controller	controllers/configurationpolicy_controller.go:3118	Sending parent policy compliance event
2024-07-27T11:31:50.258Z	debug	configuration-policy-controller	controllers/configurationpolicy_controller.go:3169	Sending policy status update event
2024-07-27T11:31:50.258Z	info	configuration-policy-controller	controllers/configurationpolicy_controller.go:3188	Policy status message	{"policy": "case23-invalid-field", "status": "NonCompliant: configmaps [case23] not found in namespace default"}
2024-07-27T11:31:50.258Z	info	configuration-policy-controller	controllers/configurationpolicy_controller.go:3118	Sending parent policy compliance event
2024-07-27T11:31:50.259Z	debug	events	recorder/recorder.go:104	Policy status is NonCompliant: configmaps [case23] not found in namespace default	{"type": "Warning", "object": {"kind":"ConfigurationPolicy","namespace":"managed","name":"case23-invalid-field","uid":"fa44acbe-1d72-4152-85f2-2423ca8a5b9a","apiVersion":"policy.open-cluster-management.io/v1","resourceVersion":"893"}, "reason": "Policy updated"}
2024-07-27T11:31:50.275Z	debug	configuration-policy-controller	controllers/configurationpolicy_controller.go:3169	Sending policy status update event
2024-07-27T11:31:50.276Z	info	configuration-policy-controller	controllers/configurationpolicy_controller.go:3188	Policy status message	{"policy": "case23-invalid-field", "status": "NonCompliant: configmaps [case23] in namespace default is missing, and cannot be created, reason: `ConfigMap in version \"v1\" cannot be handled as a ConfigMap: strict decoding error: unknown field \"invalid\"`"}
2024-07-27T11:31:50.276Z	debug	events	recorder/recorder.go:104	Policy status is NonCompliant: configmaps [case23] in namespace default is missing, and cannot be created, reason: `ConfigMap in version "v1" cannot be handled as a ConfigMap: strict decoding error: unknown field "invalid"`	{"type": "Warning", "object": {"kind":"ConfigurationPolicy","namespace":"managed","name":"case23-invalid-field","uid":"fa44acbe-1d72-4152-85f2-2423ca8a5b9a","apiVersion":"policy.open-cluster-management.io/v1","resourceVersion":"894"}, "reason": "Policy updated"}

ref: https://github.com/open-cluster-management-io/config-policy-controller/actions/runs/10096004705/attempts/9
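
Confirming from logs that a second event was emitted can be done mechanically. The following is a small sketch that counts recorder lines for one policy. Matching on the substrings `recorder/recorder.go` and `Policy status is` is an assumption based solely on the excerpt above, and the sample input in `main` is abbreviated from it.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countStatusEvents counts log lines written by the event recorder that report
// a policy status for the named policy. The substring matching is an
// assumption derived from the log excerpt, not a documented log format.
func countStatusEvents(logs, policyName string) int {
	count := 0
	scanner := bufio.NewScanner(strings.NewReader(logs))
	for scanner.Scan() {
		line := scanner.Text()
		if strings.Contains(line, "recorder/recorder.go") &&
			strings.Contains(line, "Policy status is") &&
			strings.Contains(line, `"name":"`+policyName+`"`) {
			count++
		}
	}
	return count
}

func main() {
	// Abbreviated lines modeled on the excerpt above.
	logs := `2024-07-27T11:31:50.259Z debug events recorder/recorder.go:104 Policy status is NonCompliant: configmaps [case23] not found in namespace default {"object": {"name":"case23-invalid-field"}}
2024-07-27T11:31:50.275Z debug configuration-policy-controller controllers/configurationpolicy_controller.go:3169 Sending policy status update event
2024-07-27T11:31:50.276Z debug events recorder/recorder.go:104 Policy status is NonCompliant: configmaps [case23] in namespace default is missing {"object": {"name":"case23-invalid-field"}}`
	fmt.Println(countStatusEvents(logs, "case23-invalid-field"))
}
```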

Includes:
- `wait=false` to skip waiting for finalizers
- `ignore-not-found` to ignore not found errors

Signed-off-by: Dale Haiducek <[email protected]>

Does the compliance check directly, providing
the status message on failure.

Signed-off-by: Dale Haiducek <[email protected]>
@dhaiducek
Member Author

dhaiducek commented Jul 30, 2024

• [FAILED] [13.718 seconds]
Testing OperatorPolicy Testing operator policies that specify the same subscription [It] Should not cause an infinite reconcile loop when enforced [supports-hosted]
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:3235

  [FAILED] Failed after 3.230s.
  Expected
      <int>: 626
  to equal
      <int>: 625
  In [It] at: /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:3274 @ 07/30/24 17:16:11.341

ref: https://github.com/open-cluster-management-io/config-policy-controller/actions/runs/10151580028/job/28116848871?pr=279
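
A 626-versus-625 mismatch is the kind of result a "this value should stop changing" check produces. The test's actual logic is not shown in this thread, so the following is only a standard-library sketch of such a check. The function name, the sampling approach, and the counter in `main` are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// staysConstant reads a baseline value, then re-reads it `samples` times with
// `interval` between reads. It reports whether every read matched the
// baseline, along with the baseline and the first differing value (or the
// baseline again if nothing changed).
func staysConstant(read func() int, samples int, interval time.Duration) (ok bool, first, last int) {
	first = read()
	for i := 0; i < samples; i++ {
		time.Sleep(interval)
		if v := read(); v != first {
			return false, first, v
		}
	}
	return true, first, first
}

func main() {
	// Hypothetical counter that ticks once more after the baseline is taken,
	// standing in for a late extra reconcile.
	reads := 0
	counter := func() int {
		reads++
		if reads > 2 {
			return 626
		}
		return 625
	}
	ok, first, last := staysConstant(counter, 5, 10*time.Millisecond)
	fmt.Println(ok, first, last)
}
```

A check like this is sensitive to when the baseline is sampled: if one more increment lands just after the first read, the comparison fails by exactly one even though no loop is occurring.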

@dhaiducek
Member Author

/unhold

I'm not sure how useful it is to keep re-running these since the failures seem to be different each time.

@dhaiducek
Member Author

Hosted mode failure:

• [FAILED] [70.862 seconds]
Testing OperatorPolicy Test reporting of unapproved version after installation [It] Should report a violation after the versions list is patched to exclude the current version [supports-hosted]
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:3642

  Timeline >>
  STEP: Creating the parent object @ 07/30/24 21:29:33.904
  STEP: Creating the child object with the owner reference @ 07/30/24 21:29:34.02
  STEP: Verifying the child object exists @ 07/30/24 21:29:34.025
  STEP: Patching the versions field to exclude the installed version @ 07/30/24 21:29:34.027
  [FAILED] in [It] - /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:3647 @ 07/30/24 21:30:34.115
  << Timeline

  [FAILED] Timed out after 60.001s.
  The function passed to Eventually failed at /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:153 with:
  Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:3647 @ 07/30/24 21:30:34.115

ref: https://github.com/open-cluster-management-io/config-policy-controller/actions/runs/10151580028/job/28126239338?pr=279


openshift-ci bot commented Aug 2, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhaiducek, mprahl, yiraeChristineKim

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [dhaiducek,mprahl,yiraeChristineKim]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


3 participants