Consider if to add more on KCP checks on machine status #10949

Open
fabriziopandini opened this issue Jul 26, 2024 · 3 comments
Labels
area/provider/control-plane-kubeadm Issues or PRs related to KCP help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@fabriziopandini
Member

What would you like to be added (User Story)?

As an operator, I need as much information as possible about the status of my control plane machines.

Detailed Description

KCP implements checks for the status of components on the machines it controls: API Server, Scheduler, Controller Manager, etcd.

We should consider whether to add more components to the list, e.g. kube-proxy, kubelet (node).
This could provide a useful signal for catching problems during upgrades, e.g. when the control plane comes up because it is implemented via static pods, but other "regular" pods scheduled on the control plane node do not come up.

Anything else you would like to add?

No response

Label(s) to be applied

/kind feature
/area provider/control-plane-kubeadm

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. area/provider/control-plane-kubeadm Issues or PRs related to KCP needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 26, 2024
@sbueringer
Member

/triage accepted

I think this is important. When we discovered issue #10947, KCP was rolling through the entire control plane, and during the upgrade only static pods came up (i.e. kube-proxy and CNI didn't), so not even the Nodes became ready.

I think this is pretty dangerous behavior: in the worst case it makes the entire control plane unavailable. So it would be great to have some more checks to safeguard against this.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 29, 2024
@sbueringer sbueringer added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jul 29, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates an issue lacks a `priority/foo` label and requires one. label Jul 29, 2024
@fabriziopandini fabriziopandini added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 31, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jul 31, 2024
@sbueringer
Member

/help

This requires a little bit of research to figure out what kind of checks can be done (and we have to be careful that we don't break existing working upgrade flows).

@k8s-ci-robot
Contributor

@sbueringer:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.


@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Aug 28, 2024