Add support for InstanceTerminationAction for instance preemption. #1315

Open
wants to merge 1 commit into
base: main

jwmay2012
Contributor

What type of PR is this?
/kind feature

What this PR does / why we need it:
Allows instances to be deleted, rather than just stopped, when they are preempted.

Special notes for your reviewer:
https://cloud.google.com/compute/docs/instances/create-use-spot#rest_1
https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#instance_termination_action

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Add support for InstanceTerminationAction on instance preemption, for example `GCPMachine.Spec.InstanceTerminationAction = "Delete"`.
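
For illustration only, here is a minimal sketch of how a spec value such as `"Delete"` could be translated into the upper-case string that the Compute Engine API accepts in `scheduling.instanceTerminationAction` (`DELETE` or `STOP`). The type name, the constants, the `"Stop"` value, and the `toGCEValue` helper are assumptions made for this example; the PR's actual code is not shown in this conversation.

```go
package main

import (
	"fmt"
	"strings"
)

// InstanceTerminationAction is a hypothetical type used only for this sketch.
type InstanceTerminationAction string

const (
	// "Delete" is the value given in the release note.
	TerminationActionDelete InstanceTerminationAction = "Delete"
	// "Stop" is an assumed counterpart, not confirmed by the PR text.
	TerminationActionStop InstanceTerminationAction = "Stop"
)

// toGCEValue returns the upper-case form expected by the Compute Engine API.
// An empty action returns an empty string so the GCE default applies.
func toGCEValue(a InstanceTerminationAction) (string, error) {
	switch a {
	case "":
		return "", nil
	case TerminationActionDelete, TerminationActionStop:
		return strings.ToUpper(string(a)), nil
	default:
		return "", fmt.Errorf("unsupported instance termination action %q", a)
	}
}

func main() {
	v, err := toGCEValue(TerminationActionDelete)
	fmt.Println(v, err)
}
```

Rejecting unknown values up front keeps a typo in the spec from reaching the cloud API as an opaque request error.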

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 19, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jwmay2012
Once this PR has been reviewed and has the lgtm label, please assign fabriziopandini for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 19, 2024
@k8s-ci-robot
Contributor

Hi @jwmay2012. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 19, 2024

netlify bot commented Sep 19, 2024

Deploy Preview for kubernetes-sigs-cluster-api-gcp ready!

Name Link
🔨 Latest commit 8717e90
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-gcp/deploys/66f2e92407e12a0008f6b8a4
😎 Deploy Preview https://deploy-preview-1315--kubernetes-sigs-cluster-api-gcp.netlify.app

@salasberryfin
Contributor

/ok-to-test

Thanks for working on this @jwmay2012

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 24, 2024
@jwmay2012
Contributor Author

I tested this code and it works. It's only supposed to be used with Spot VMs, not STANDARD or the older preemptible VMs.
The default behavior is that the VM gets stopped, and the Node registers as UNKNOWN or shutdown/failed.
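
The Spot-only restriction above could be enforced with a check along these lines. This is a sketch under stated assumptions: `machineSpec`, its field names, the `"Spot"` value, and `validateTerminationAction` are hypothetical stand-ins, not the provider's actual types or webhook code.

```go
package main

import (
	"errors"
	"fmt"
)

// machineSpec is a hypothetical, trimmed-down stand-in for the machine spec.
type machineSpec struct {
	ProvisioningModel         string // assumed values: "Spot", "Standard", or empty
	Preemptible               bool   // legacy preemptible VM flag
	InstanceTerminationAction string // e.g. "Delete"; empty means unset
}

var errNotSpot = errors.New("instanceTerminationAction is only supported for Spot VMs")

// validateTerminationAction rejects a termination action on anything other
// than a Spot VM, which also covers Standard and legacy preemptible machines.
func validateTerminationAction(s machineSpec) error {
	if s.InstanceTerminationAction == "" {
		return nil
	}
	if s.ProvisioningModel != "Spot" {
		return errNotSpot
	}
	return nil
}

func main() {
	spot := machineSpec{ProvisioningModel: "Spot", InstanceTerminationAction: "Delete"}
	legacy := machineSpec{Preemptible: true, InstanceTerminationAction: "Delete"}
	fmt.Println(validateTerminationAction(spot))
	fmt.Println(validateTerminationAction(legacy))
}
```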

With InstanceTerminationAction="Delete", the Node gets deleted from the workload cluster along with the instance, but the Machine in the management cluster is still present and revives a zombie instance. If the Spot instance was deleted via the console, it can come back online and running. If it was deleted by a preemption, it comes back Stopped. In both cases, the previous Node never returns and you end up with a Machine without a Node.
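
To make the "Machine without a Node" state concrete, here is a small sketch that filters for it. The `machineView` type and its fields are hypothetical simplifications for this example, not Cluster API types; a real check would read the Machine objects from the management cluster.

```go
package main

import "fmt"

// machineView is a hypothetical, flattened view of a Machine.
type machineView struct {
	Name    string
	NodeRef string // name of the backing Node; empty when there is none
}

// machinesWithoutNode returns the names of Machines that have no Node.
func machinesWithoutNode(ms []machineView) []string {
	var out []string
	for _, m := range ms {
		if m.NodeRef == "" {
			out = append(out, m.Name)
		}
	}
	return out
}

func main() {
	ms := []machineView{
		{Name: "spot-a", NodeRef: "node-a"},
		{Name: "spot-b"}, // preempted and deleted: the Node never came back
	}
	fmt.Println(machinesWithoutNode(ms)) // prints [spot-b]
}
```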

🤷🏻‍♂️

This isn't as helpful as I was hoping it would be. I was hoping this would allow Spot machines to get automatically re-created when a VM is preempted, but it doesn't.

It might be worth exposing this option, but I don't know if I can consider it useful given how CAPI reacts (or doesn't react) to it.

Up to you :D

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

3 participants