Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Metal3DataTemplate: requeue if reconcileDelete did not clear finalizer #1997

Closed
wants to merge 1 commit into from

Conversation

tmmorin
Copy link
Contributor

@tmmorin tmmorin commented Sep 23, 2024

What this PR does / why we need it:

This MR is a proposal to address #1994.

The reconcileDelete implementation of the Metal3DataTemplate controller needs to requeue if it decided that it can't yet unset the finalizer on the Metal3DataTemplate. If no requeue is made, there's a scenario where Metal3Data still exists when reconcileDelete is first called, which results in not unsetting the finalize - in that situation, if no change of the Metal3DataTemplate triggers a new reconciliation, the finalizer never gets unset.

To finalize this PR I had to adjust the expectations of the unit test (expect requeue where this initially wasn't expected).

This made we realize that there is another situation where we want to requeue: the case where dataTemplateMgr.UpdateDatas returns an error. The code was relying on checkReconcileError, which (a) can only return requeue results on baremetal.ReconcileError errors (not on other errors), and (b) even for those would not return a requeue for "Terminal" errors, which I is I think only a valid choice for when this function is called for reconcileNormal ; for deletes, it seems that requeues should always be done to avoid the resource remaining stuck on a finalizer.

@metal3-io-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign sunnatillo for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot
Copy link
Contributor

Hi @tmmorin. Thanks for your PR.

I'm waiting for a metal3-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@metal3-io-bot metal3-io-bot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 23, 2024
@tuminoid
Copy link
Member

/ok-to-test

@metal3-io-bot metal3-io-bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 24, 2024
@tuminoid
Copy link
Member

/test metal3-centos-e2e-integration-test-main metal3-ubuntu-e2e-integration-test-main

@tuminoid
Copy link
Member

Fix the DCO by signing off the commit, and unit tests are failing.

@tmmorin tmmorin force-pushed the m3dt-delete-requeue branch 5 times, most recently from 5ada41d to f953fc7 Compare September 25, 2024 08:44
@tmmorin
Copy link
Contributor Author

tmmorin commented Sep 25, 2024

@tuminoid I updated the unit tests, and also the code to also do requeues on failure when calling dataTemplateMgr.UpdateDatas - I updated the PR description accordingly.

@tuminoid
Copy link
Member

/test metal3-centos-e2e-integration-test-main metal3-ubuntu-e2e-integration-test-main

@tmmorin
Copy link
Contributor Author

tmmorin commented Sep 26, 2024

I had a interesting discussion by @pierrecregut on #1994, leading to the conclusion that the cleanest solution would be to fix the miscomputation of indexes. Indeed, requeuing here is somehow sweeping dust under the rug, even if it reaches the goal of ultimately avoiding finalizer remaining stuck.

We'll work with Pierre to push an alternative solution fixing the miscomputation of indexes in UpdateDatas.

Feedback is welcome on whether it's worth keeping my requeue proposal, as a kind of safeguard despite the "sweeping dust under the rug" aspect.

@pierrecregut
Copy link
Contributor

The alternative implementation is here: #2000

@tmmorin
Copy link
Contributor Author

tmmorin commented Sep 26, 2024

I find the solution proposed by @pierrecregut in #2000 much nicer, and if reviewers' feedback is positive on #2000, I'll close this PR here.

@tmmorin
Copy link
Contributor Author

tmmorin commented Oct 3, 2024

closing this PR, #2000 is the right solution

@tmmorin tmmorin closed this Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants