Configurable machine replacement #10946

Open
Meecr0b opened this issue Jul 26, 2024 · 5 comments
Labels
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/api-change: Categorizes issue or PR as related to adding, removing, or otherwise changing an API.
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@Meecr0b

Meecr0b commented Jul 26, 2024

What would you like to be added (User Story)?

As an operator, I would like to configure a maximum age after which machines are replaced automatically, for testing and security reasons.

Detailed Description

Problem Statement:

Regularly replacing machines helps test application behavior during rolling updates and ensures machines are refreshed periodically, which is especially important after security incidents.

Proposed Solution:

Implement a rolloutBefore.machineExpiry{Minutes,Hours,Days} parameter in Cluster API (similar to rolloutBefore.certificatesExpiryDays, which is implemented for KCP) that lets users specify the maximum time a machine may exist before it is automatically replaced.
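
To make the intended semantics concrete, here is a minimal sketch of the age check such a parameter might imply. It is not Cluster API code: the function names and the machine-name-to-creation-time mapping are hypothetical, and a real controller would read the creation timestamp from each Machine object and handle the rollout itself.

```python
from datetime import datetime, timedelta, timezone


def machine_expired(created_at: datetime, max_age: timedelta, now: datetime) -> bool:
    """Return True once the machine has existed for at least max_age."""
    return now - created_at >= max_age


def machines_due_for_replacement(machines, max_age, now):
    """Return the names of expired machines, oldest first.

    `machines` maps a (hypothetical) machine name to its creation time.
    """
    due = [
        (created_at, name)
        for name, created_at in machines.items()
        if machine_expired(created_at, max_age, now)
    ]
    return [name for _, name in sorted(due)]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    example = {
        "md-0-old": now - timedelta(days=30),
        "md-0-new": now - timedelta(days=2),
    }
    # Assumed setting for illustration: machineExpiryDays = 14
    print(machines_due_for_replacement(example, timedelta(days=14), now))
```

Returning the oldest machines first is only one possible ordering; the actual replacement order would presumably follow the existing rollout strategy.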

Benefits:

  • Testing Rolling Updates: Simplifies the process of regularly testing how applications behave during rolling updates.
  • Security and Compliance: Ensures machines are periodically replaced, reducing the risk of lingering vulnerabilities and ensuring machines are clean post-security incidents.
  • Operational Efficiency: Automates a routine maintenance task, reducing manual workload and the potential for human error.

Impact:

  • This feature would be highly valuable for IT operations teams managing Kubernetes clusters, particularly those with strict compliance and security requirements.
  • It enhances cluster maintenance workflows, contributing to overall system reliability and security.

Anything else you would like to add?

Current workarounds:

  • Setting spec.rolloutAfter on the MachineDeployment periodically via a CronJob
  • Running clusterctl alpha rollout restart machinedeployment/my-md-0 periodically
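
For illustration, the first workaround could be scripted roughly as follows. This is a sketch under assumptions: the helper name is made up, and it only builds a JSON merge patch that sets spec.rolloutAfter to the current time; applying the patch (for example with kubectl patch and --type merge from a CronJob) is left to the caller.

```python
import json
from datetime import datetime, timezone


def rollout_after_patch(now: datetime) -> str:
    """Build a JSON merge patch setting spec.rolloutAfter to `now` (RFC 3339, UTC)."""
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return json.dumps({"spec": {"rolloutAfter": stamp}})


if __name__ == "__main__":
    # Print the patch so a wrapper script can pass it on, e.g. to
    #   kubectl patch machinedeployment my-md-0 --type merge -p "<patch>"
    print(rollout_after_patch(datetime.now(timezone.utc)))
```

The CronJob schedule then effectively becomes the maximum machine age, which is the behavior the proposed parameter would provide natively.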

Label(s) to be applied

/kind feature
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 26, 2024
@sbueringer
Member

/triage accepted

/cc @fabriziopandini @chrischdi

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 26, 2024
@fabriziopandini
Member

q: is this about replacing nodes (the node at Kubernetes level) or the entire machine where the node is hosted?

@Meecr0b
Author

Meecr0b commented Jul 30, 2024

Hi @fabriziopandini, it's about machines. I'll update the issue.

@fabriziopandini fabriziopandini added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Jul 31, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates an issue lacks a `priority/foo` label and requires one. label Jul 31, 2024
@fabriziopandini
Member

ACK, thanks for the clarification
We need to think a bit about API modeling, but this is a nice feature to have
/help

@k8s-ci-robot
Contributor

@fabriziopandini:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

ACK, thanks for the clarification
We need to think a bit about API modeling, but this is a nice feature to have
/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jul 31, 2024
@fabriziopandini fabriziopandini added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Jul 31, 2024

4 participants