⚠️ Perform guest shutdown if VMware tools installed when deleting VM #1982

Merged
merged 1 commit into from
Aug 1, 2023

Conversation

laozc
Member

@laozc laozc commented Jul 11, 2023

What this PR does / why we need it:
Currently a hard power off is initiated when the vSphere VM gets deleted.
This will not allow the VM to perform a graceful shutdown for the OS and all the services running in the OS.

This PR addresses the issue by:

  1. Determine whether we should do a guest shutdown when both return true.
  2. Initiate a guest shutdown when possible.
  3. Wait until the power state of the guest machine becomes Powered Off.

The guest shutdown may stall for a long while and we'll address the issue in a later PR.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1981

Special notes for your reviewer:
The Guest Shutdown API does not have an associated task in vCenter. Therefore, in this PR we use a local task to mimic the wait-and-update process used by other vCenter task-based operations.

Release note:

Perform guest initiated graceful shutdown when a VM gets deleted.

The field powerOffMode can be used to decide which mode is desired.
- hard: power off
- soft: guest initiated graceful shutdown
- trySoft: try guest initiated graceful shutdown first and then power off forcibly if shutdown times out

VMware Tools is required in the image for guest initiated graceful shutdown.
By default powerOffMode is set to hard.

When trySoft mode is used, the user may also set guestSoftPowerOffTimeout to
control how long to wait for the guest shutdown before powering off the VM forcibly.
This setting defaults to 5 minutes when omitted.

Refer to the CRD spec documentation for more details.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 11, 2023
@k8s-ci-robot
Contributor

Welcome @laozc!

It looks like this is your first PR to kubernetes-sigs/cluster-api-provider-vsphere 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api-provider-vsphere has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 11, 2023
@k8s-ci-robot
Contributor

Hi @laozc. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 11, 2023
@randomvariable
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 11, 2023
@randomvariable
Member

Should we also introduce a timeout in case the guest shutdown stalls?

@laozc
Member Author

laozc commented Jul 11, 2023

Should we also introduce a timeout in case the guest shutdown stalls?

Definitely.
I'm thinking of forcibly shutting the VM down with PowerOff task when the guest shutdown stalls for over 15 minutes.
Still working on it.

@laozc laozc changed the title ⚠️ [WIP] Perform guest shutdown if possible when destroy VM ⚠️ [WIP] Perform guest shutdown if possible when deleting VM Jul 11, 2023
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 13, 2023
@laozc laozc changed the title ⚠️ [WIP] Perform guest shutdown if possible when deleting VM ⚠️ Perform guest shutdown if possible when deleting VM Jul 13, 2023
@laozc laozc changed the title ⚠️ Perform guest shutdown if possible when deleting VM ⚠️ [WIP] Perform guest shutdown if possible when deleting VM Jul 13, 2023
@laozc laozc changed the title ⚠️ [WIP] Perform guest shutdown if possible when deleting VM ⚠️ Perform guest shutdown if VMware tools installed when deleting VM Jul 14, 2023
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 18, 2023
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 18, 2023
@randomvariable
Member

Perform guest initiated graceful shutdown when VM get deleted

Can we have a more detailed release note highlighting the option?

@sbueringer
Member

As far as I can tell the last remaining point is this one: #1982 (comment)

Otherwise lgtm from my side

@randomvariable
Member

This looks good now. I'll let someone else do another pass.

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 31, 2023
@sbueringer
Member

This looks good now. I'll let someone else do another pass.

/approve

I think using the api/ version was the last one. But we can follow-up for that.

@laozc Can you please squash the commits? Then we would merge

/lgtm
/hold
for squash

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jul 31, 2023
@randomvariable
Member

Can you please squash the commits?

Will the tide/merge-method-squash label not work?

@randomvariable
Member

/retest

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 1, 2023
@laozc
Member Author

laozc commented Aug 1, 2023

Sure. PR was squashed and rebased.

@k8s-ci-robot
Contributor

@laozc: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-provider-vsphere-apidiff-main | dc62f79 | link | false | /test pull-cluster-api-provider-vsphere-apidiff-main |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@randomvariable
Member

/unhold

Thanks very much for persisting amongst all the reviews.

/approve

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 1, 2023
@sbueringer
Member

/remove-label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot removed the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Aug 1, 2023
@sbueringer
Member

Absolutely agree. Thank you very much! nice work :)

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 1, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 962d23e7f339fa0744ee268bf874876fd5872e2c

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: randomvariable, sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [randomvariable,sbueringer]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit c6341d3 into kubernetes-sigs:main Aug 1, 2023
4 checks passed
@laozc laozc deleted the guest-shutdown branch August 1, 2023 06:06
@laozc
Member Author

laozc commented Aug 1, 2023

Thank you for your reviews :)

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Perform Guest Shutdown instead of Power Off to allow graceful shutdown
7 participants