Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix #1128: Removed multiple DPA get calls for Reconcilers #1316

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

stillalearner
Copy link
Contributor

Fixed #1128

controllers/bsl.go Outdated Show resolved Hide resolved
kaovilai
kaovilai previously approved these changes Feb 7, 2024
Copy link
Member

@kaovilai kaovilai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@kaovilai
Copy link
Member

kaovilai commented Feb 7, 2024

/retest

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 7, 2024
@stillalearner
Copy link
Contributor Author

/retest

3 similar comments
@stillalearner
Copy link
Contributor Author

/retest

@stillalearner
Copy link
Contributor Author

/retest

@stillalearner
Copy link
Contributor Author

/retest

@weshayutin
Copy link
Contributor

@sseago @shubham-pampattiwar please review :)

@shubham-pampattiwar
Copy link
Member

shubham-pampattiwar commented Feb 12, 2024

I am unsure about the changes here, I think the rationale behind fetching the DPA explicitly in every reconcile function was to get the latest DPA available in the cluster.

@stillalearner
Copy link
Contributor Author

I am unsure about the changes here, I think the rationale behind fetching the DPA explicitly in every reconcile function was to get the latest DPA available in the cluster.

But we are not updating the DPA after validation right? so we can use the same DPA fetching it once?

@kaovilai
Copy link
Member

kaovilai commented Feb 12, 2024

I am unsure about the changes here, I think the rationale behind fetching the DPA explicitly in every reconcile function was to get the latest DPA available in the cluster.

Each reconcile you see is called right after the other in the same loop. Current implementation prior to this PR mean we're doing like 10 or so get calls within a span of 1-2 seconds, more likely to have issues with "outdated-copy" during update calls that rely on multiple gets so close to each other. Also the client backoff throttling.

I think it's best to get once, update once in entire batch.

Perhaps add to the end func ReconcileUpdateDPA() to do the update/patch call once.

This would help in cu environments with busy api server.

@kaovilai
Copy link
Member

	if err := r.Get(ctx, req.NamespacedName, &dpa); err != nil {
		log.Error(err, "unable to fetch DataProtectionApplication CR")
		return result, nil
	}
+	//--deep copy here

	// set client to pkg/client for use in non-reconcile functions
	oadpClient.SetClient(r.Client)

	_, err := ReconcileBatch(r.Log,
		r.ValidateDataProtectionCR,
		r.ReconcileFsRestoreHelperConfig,
...
	)

	if err != nil {
		apimeta.SetStatusCondition(&dpa.Status.Conditions,
			metav1.Condition{
				Type:    oadpv1alpha1.ConditionReconciled,
				Status:  metav1.ConditionFalse,
				Reason:  oadpv1alpha1.ReconciledReasonError,
				Message: err.Error(),
			},
		)

	} else {
		apimeta.SetStatusCondition(&dpa.Status.Conditions,
			metav1.Condition{
				Type:    oadpv1alpha1.ConditionReconciled,
				Status:  metav1.ConditionTrue,
				Reason:  oadpv1alpha1.ReconciledReasonComplete,
				Message: oadpv1alpha1.ReconcileCompleteMessage,
			},
		)
	}
+	// patch spec updates
	statusErr := r.Client.Status().Update(ctx, &dpa)
	if err == nil { // Don't mask previous error
		err = statusErr
	}

@stillalearner
Copy link
Contributor Author

@kaovilai why do we need a deep copy here? Do you mean to create a copy of DPA (lets say copiedDpa) and pass that copiedDpa in all the reconcile functions, and then based on that copiedDpa, update the dpa using Patch() ?

@kaovilai
Copy link
Member

kaovilai commented Feb 13, 2024

I meant deepcopy to have orignal in the function, then pass to copy to funcs, that way you have orignalFromClusterDPA, and updatedFromReconcilerDpa

Example: r.Client.Patch(ctx, newDeploy, client.MergeFrom(original))

@kaovilai
Copy link
Member

Reconcile shouldn't update dpa.. so ignore my comment about patching.

@stillalearner
Copy link
Contributor Author

@shubham-pampattiwar review please.

@openshift-ci openshift-ci bot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 14, 2024
@kaovilai
Copy link
Member

/retest

@@ -134,7 +131,7 @@ func (r *DPAReconciler) ReconcileBackupStorageLocations(log logr.Logger) (bool,

// TODO: check for BSL status condition errors and respond here
if bslSpec.Velero != nil {
err := r.updateBSLFromSpec(&bsl, &dpa, *bslSpec.Velero)
err := r.updateBSLFromSpec(&bsl, dpa, *bslSpec.Velero)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in all DPAReconciler methods, I would remove dpa as a parameter. the functions can access it through r.dpa, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code example of my suggestion

-func (r *DPAReconciler) updateBSLFromSpec(bsl *velerov1.BackupStorageLocation, dpa *oadpv1alpha1.DataProtectionApplication, bslSpec velerov1.BackupStorageLocationSpec) error {
+func (r *DPAReconciler) updateBSLFromSpec(bsl *velerov1.BackupStorageLocation, bslSpec velerov1.BackupStorageLocationSpec) error {
	// Set controller reference to Velero controller
-	err := controllerutil.SetControllerReference(dpa, bsl, r.Scheme)
+	err := controllerutil.SetControllerReference(r.dpa, bsl, r.Scheme)
	if err != nil {
		return err
	}
	...

	return nil
}

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 28, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 24, 2024
Copy link

openshift-ci bot commented Apr 24, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: stillalearner
Once this PR has been reviewed and has the lgtm label, please assign mateusoliveira43 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

@kaovilai kaovilai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit test failing

/go/src/github.com/openshift/oadp-operator/bin/golangci-lint run
controllers/dpa_controller.go:79:2: import-shadowing: The name 'log' shadows an import name (revive)
	log := r.Log.WithValues("dpa", req.NamespacedName)
	^

Copy link

openshift-ci bot commented Jun 19, 2024

@stillalearner: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/4.13-e2e-test-azure a595dc2 link true /test 4.13-e2e-test-azure
ci/prow/4.12-e2e-test-aws a595dc2 link true /test 4.12-e2e-test-aws
ci/prow/4.13-e2e-test-aws a595dc2 link true /test 4.13-e2e-test-aws
ci/prow/4.14-e2e-test-azure a595dc2 link true /test 4.14-e2e-test-azure
ci/prow/unit-test a595dc2 link true /test unit-test
ci/prow/4.12-e2e-test-azure a595dc2 link true /test 4.12-e2e-test-azure
ci/prow/4.15-images a595dc2 link true /test 4.15-images
ci/prow/4.16-ci-index a595dc2 link true /test 4.16-ci-index
ci/prow/4.16-images a595dc2 link true /test 4.16-images
ci/prow/4.15-ci-index a595dc2 link true /test 4.15-ci-index
ci/prow/4.15-e2e-test-aws a595dc2 link true /test 4.15-e2e-test-aws
ci/prow/4.16-e2e-test-kubevirt-aws a595dc2 link true /test 4.16-e2e-test-kubevirt-aws
ci/prow/4.15-e2e-test-kubevirt-aws a595dc2 link true /test 4.15-e2e-test-kubevirt-aws
ci/prow/4.16-e2e-test-aws a595dc2 link true /test 4.16-e2e-test-aws

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-bot
Copy link

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 18, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 18, 2024
@openshift-merge-robot
Copy link

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mateusoliveira43
Copy link
Contributor

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Right now each reconcile functions call get for DPA, we can do it once and pass it to each function
7 participants