Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

⚠️ Treating VSphereVM wasNotFoundByBIOSUUID as transient error #2136

Merged
merged 1 commit into from
Aug 3, 2023

Conversation

zhanggbj
Copy link
Contributor

@zhanggbj zhanggbj commented Aug 2, 2023

What this PR does / why we need it:
We used to treate WasNotFoundByUUID as terminal error, and set failureMessage and failureReason, so VSphereVM won't be reconciled anymore. However, in some situation, this error is recoverable, for instance, if the storage was temporarily disconnected but later recovered, this error will be transient.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2102

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 2, 2023
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Aug 2, 2023
@zhanggbj
Copy link
Contributor Author

zhanggbj commented Aug 2, 2023

Also CC @sbueringer for a code review, thx!

Copy link
Member

@chrischdi chrischdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

One question but as of that this looks great.

/hold

To wait for other opinions 👍

controllers/vspherevm_controller.go Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 2, 2023
apis/v1beta1/condition_consts.go Outdated Show resolved Hide resolved
controllers/vspherevm_controller.go Outdated Show resolved Hide resolved
controllers/vspherevm_controller.go Outdated Show resolved Hide resolved
@chrischdi
Copy link
Member

chrischdi commented Aug 2, 2023

To summarise what happens on current code state when "not found by biosUUID" gets hit:

  • All 3 objects (Machine, VSphereMachine, VSphereVM) will get set additional conditions:
    • Machine:
      • type: Ready
      • type: InfrastructureReady
    • VSphereMachine & VSphereVM:
      • type: Ready
      • type: VMProvisioned

For all of these six conditions, message, reason and severity is the same:

   message: vm with bios uuid 420a150b-bb27-8db3-5a61-6e3f472a4137aaa not found
   reason: NotFoundByBIOSUUID
   severity: Error

Additional for VSphereVM only: (to be dropped according above comments)

  • .status.ready gets removed (because it gets set to the default false and by that omitted)

Note: tested via kubectl edit vspherevm <name> + changing the .spec.biosUUID` field

Note2: Fixing the issue gets everything back to a clean healthy state.

@zhanggbj
Copy link
Contributor Author

zhanggbj commented Aug 3, 2023

@chrischdi @sbueringer Thanks again for your valuable feedback and quick action:) All comments are addressed and PR is ready for a second review.

@chrischdi
Copy link
Member

/lgtm

sbueringer for approval :-)

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 3, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: fb9d01b7b4457f36cff01e44ef77d4131acd6850

@chrischdi chrischdi mentioned this pull request Aug 3, 2023
@sbueringer
Copy link
Member

/retitle ⚠️ Treating VSphereVM wasNotFoundByBIOSUUID as transient error
(for the behavioral change)

@k8s-ci-robot k8s-ci-robot changed the title ✨ Treating VSphereVM wasNotFoundByBIOSUUID as transient error ⚠️ Treating VSphereVM wasNotFoundByBIOSUUID as transient error Aug 3, 2023
@sbueringer
Copy link
Member

Thank you very much!!

/lgtm
/approve

(let's not backport this one given the change in behavior)

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 3, 2023
@sbueringer
Copy link
Member

Ups forgot

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 3, 2023
@sbueringer
Copy link
Member

Unrelated flake

/retest

@k8s-ci-robot k8s-ci-robot merged commit e7716fb into kubernetes-sigs:main Aug 3, 2023
6 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.8 milestone Aug 3, 2023
adityabhatia pushed a commit to adityabhatia/cluster-api-provider-vsphere that referenced this pull request Aug 31, 2023
⚠️ Treating VSphereVM wasNotFoundByBIOSUUID as transient error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't set failureMessage and failureReason if VSphereVM was not found by BiosUUID
4 participants