kubeadm: graduate WaitForAllControlPlaneComponents to Beta #129620

Open
wants to merge 1 commit into
base: master

neolit123
Member

@neolit123 neolit123 commented Jan 14, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

  • Set the feature gate to Beta and enabled by default.
  • Make sure that the source of truth for which address/port to use for a component health check comes from the respective component static Pod manifest. That is done to comply with any user --patches that are applied on top
    of the ClusterConfiguration.
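A minimal sketch of the idea in the second bullet, assuming the address and port are read from the command of the first container in a static Pod manifest. The helper name `flagValue` and the sample values are hypothetical and not taken from the PR; the command is modeled as a plain string slice so the example stays self-contained.

```go
package main

import (
	"fmt"
	"strings"
)

// flagValue looks for "--name=value" or "--name value" in a container
// command and reports the value and whether the flag was present.
// Hypothetical helper for illustration; not kubeadm's implementation.
func flagValue(command []string, name string) (string, bool) {
	flag := "--" + name
	for i, arg := range command {
		if arg == flag {
			// "--name value" form: the value is the next element, if any.
			if i+1 < len(command) {
				return command[i+1], true
			}
			return "", true
		}
		if strings.HasPrefix(arg, flag+"=") {
			return strings.TrimPrefix(arg, flag+"="), true
		}
	}
	return "", false
}

func main() {
	// Stand-in for the container command of a manifest that a user
	// changed with --patches (values are made up).
	command := []string{
		"kube-controller-manager",
		"--bind-address=10.0.0.5",
		"--secure-port=11257",
	}
	addr, _ := flagValue(command, "bind-address")
	port, _ := flagValue(command, "secure-port")
	fmt.Println(addr, port)
}
```

Reading the values from the manifest rather than from the ClusterConfiguration means a patched flag is what the health check ends up using.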

Which issue(s) this PR fixes:

xref

Special notes for your reviewer:

first commit fixes circular dependency problems. check the commit message.

Does this PR introduce a user-facing change?

kubeadm: graduated the WaitForAllControlPlaneComponents feature gate to Beta. When checking the health status of a control plane component, make sure that the address and port defined as arguments in the respective component's static Pod manifest are used.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 14, 2025
@neolit123
Member Author

/triage accepted
/priority backlog

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/backlog Higher priority than priority/awaiting-more-evidence. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 14, 2025
@neolit123
Member Author

/hold for review

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jan 14, 2025
@neolit123
Member Author

i believe we do have users that use the scheduler --config option and have no flags in the scheduler manifest.
like so:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/control-plane-flags/#scheduler-flags

for those users this new logic will use the default address/port.

as far as i can tell there is no way to set the --secure-port and --bind-address as fields in the KubeSchedulerConfiguration?
https://kubernetes.io/docs/reference/config-api/kube-scheduler-config.v1
if that is somehow possible then the logic in this PR will not work and the user must also pass these flags to the scheduler.
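To make the fallback case concrete, here is a small sketch of what "use the default address/port" could look like for a scheduler manifest that carries only `--config`. The helper names, the config file path, and the literal defaults (`127.0.0.1` and `10259`, assumed here to correspond to kubeadm's default bind address and `constants.KubeSchedulerPort`) are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// valueOrDefault returns the value of "--flag=value" in command,
// or def when the flag is absent or empty. Hypothetical helper.
func valueOrDefault(command []string, flag, def string) string {
	prefix := "--" + flag + "="
	for _, arg := range command {
		if strings.HasPrefix(arg, prefix) {
			if v := strings.TrimPrefix(arg, prefix); v != "" {
				return v
			}
		}
	}
	return def
}

// healthURL builds the URL a health check would probe for a component.
func healthURL(command []string, defAddr, defPort, endpoint string) string {
	addr := valueOrDefault(command, "bind-address", defAddr)
	port := valueOrDefault(command, "secure-port", defPort)
	return fmt.Sprintf("https://%s/%s", net.JoinHostPort(addr, port), endpoint)
}

func main() {
	// Scheduler command with no address or port flags (path is made up).
	command := []string{"kube-scheduler", "--config=/etc/kubernetes/scheduler-config.yaml"}
	fmt.Println(healthURL(command, "127.0.0.1", "10259", "livez"))
}
```

If the address or port could ever be set through the configuration file instead of flags, a lookup like this would miss it, which is the limitation described above.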

@neolit123
Member Author

neolit123 commented Jan 14, 2025

there are some really messy circular dependencies going on. will try to debug more tomorrow or later this week..
edit - or maybe it gets fixed by this new commit.

@neolit123 neolit123 force-pushed the 1.33-update-all-cp-components-check branch from 1a1451a to 3a74d70 Compare January 14, 2025 19:52
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 14, 2025
@neolit123 neolit123 changed the title kubeadm: graduate WaitForAllControlPlaneComponents to Beta WIP: kubeadm: graduate WaitForAllControlPlaneComponents to Beta Jan 14, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 14, 2025
@neolit123 neolit123 force-pushed the 1.33-update-all-cp-components-check branch from 3a74d70 to d70fa87 Compare January 15, 2025 12:56
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 15, 2025
@neolit123
Member Author

neolit123 commented Jan 15, 2025

there are some really messy circular dependencies going on. will try to debug more tomorrow or later this week.. edit - or maybe it gets fixed by this new commit.

reworked the logic to something much cleaner.

@neolit123 neolit123 changed the title WIP: kubeadm: graduate WaitForAllControlPlaneComponents to Beta kubeadm: graduate WaitForAllControlPlaneComponents to Beta Jan 15, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 15, 2025
@neolit123 neolit123 force-pushed the 1.33-update-all-cp-components-check branch from d70fa87 to de6fa77 Compare January 15, 2025 12:58
@neolit123
Member Author

neolit123 commented Jan 15, 2025

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hashim21223445, neolit123

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@neolit123 neolit123 force-pushed the 1.33-update-all-cp-components-check branch from de6fa77 to 41a4b59 Compare January 15, 2025 18:18
- Set the feature gate to Beta and enabled by default.
- Make sure that the source of truth for which address/port
to use for a component health check comes from the respective
component static Pod manifest. That is done to comply
with any user --patches that are applied on top
of the ClusterConfiguration.
@neolit123 neolit123 force-pushed the 1.33-update-all-cp-components-check branch from 41a4b59 to 8dfaec6 Compare January 15, 2025 18:21
@pacoxu
Member

pacoxu commented Jan 16, 2025

/cc @HirazawaUi


// getControlPlaneComponents reads the static Pods of control plane components
// and returns a slice of 'controlPlaneComponent'.
func getControlPlaneComponents(podMap map[string]*v1.Pod, addressAPIServer string) ([]controlPlaneComponent, error) {
Contributor

It seems that some duplicate code can be omitted.

func getControlPlaneComponents(podMap map[string]*v1.Pod, addressAPIServer string) ([]controlPlaneComponent, error) {
	var (
		// By default kubeadm deploys the kube-controller-manager and kube-scheduler
		// with --bind-address=127.0.0.1. This should match get{Scheduler|ControllerManager}Command().
		addressKCM       = "127.0.0.1"
		addressScheduler = "127.0.0.1"

		portAPIServer = fmt.Sprintf("%d", constants.KubeAPIServerPort)
		portKCM       = fmt.Sprintf("%d", constants.KubeControllerManagerPort)
		portScheduler = fmt.Sprintf("%d", constants.KubeSchedulerPort)

		errs   []error
		result []controlPlaneComponent
	)

	type componentConfig struct {
		name        string
		podKey      string
		args        []string
		defaultAddr string
		defaultPort string
		endpoint    string
	}

	components := []componentConfig{
		{
			name:        "kube-apiserver",
			podKey:      constants.KubeAPIServer,
			args:        []string{argAdvertiseAddress, argPort},
			defaultAddr: addressAPIServer,
			defaultPort: portAPIServer,
			endpoint:    endpointLivez,
		},
		{
			name:        "kube-controller-manager",
			podKey:      constants.KubeControllerManager,
			args:        []string{argBindAddress, argPort},
			defaultAddr: addressKCM,
			defaultPort: portKCM,
			endpoint:    endpointHealthz,
		},
		{
			name:        "kube-scheduler",
			podKey:      constants.KubeScheduler,
			args:        []string{argBindAddress, argPort},
			defaultAddr: addressScheduler,
			defaultPort: portScheduler,
			endpoint:    endpointLivez,
		},
	}

	for _, component := range components {
		address, port := component.defaultAddr, component.defaultPort

		values, err := getControlPlaneComponentAddressAndPort(
			podMap[component.podKey],
			component.podKey,
			component.args,
		)
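		if err != nil {
			errs = append(errs, err)
			// Skip this component: 'values' may not be safe to index after a
			// failed lookup, and the aggregated error is returned below anyway.
			continue
		}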

		if len(values[0]) != 0 {
			address = values[0]
		}
		if len(values[1]) != 0 {
			port = values[1]
		}

		result = append(result, controlPlaneComponent{
			name: component.name,
			url:  fmt.Sprintf("https://%s/%s", net.JoinHostPort(address, port), component.endpoint),
		})
	}

	if len(errs) > 0 {
		return nil, utilerrors.NewAggregate(errs)
	}
	return result, nil
}

Member Author

@neolit123 neolit123 Jan 16, 2025

i like this, i will update to use it like that.

if err := waiter.WaitForControlPlaneComponents(&initCfg.ClusterConfiguration,
pods, err := staticpodutil.ReadMultipleStaticPodsFromDisk(data.ManifestDir(),
constants.ControlPlaneComponents...)
if err != nil {
Contributor

Would it be better if we make kubeadm join provide the same error output as kubeadm init here? And make handleError a utility function.
ref:

handleError := func(err error) error {
	context := struct {
		Error  string
		Socket string
	}{
		Error:  fmt.Sprintf("%v", err),
		Socket: data.Cfg().NodeRegistration.CRISocket,
	}
	kubeletFailTempl.Execute(data.OutputWriter(), context)
	return errors.New("could not initialize a Kubernetes cluster")
}

Member Author

i think that's not a bad idea, but probably should be done in a separate PR after this one merges. you can log an issue in k/kubeadm so that we don't forget. thanks

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/feature Categorizes issue or PR as related to a new feature. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
5 participants