
K8s control plane high-availability mode #940

Open · wants to merge 7 commits into base: main

Conversation

@cvetkovic (Contributor) commented on Feb 19, 2024

NOTE: Only merge this PR in conjunction with vhive-serverless/invitro#379.

Modified the setup scripts so that the Kubernetes control plane can be deployed in high-availability mode, i.e., a mode where the API Server, etcd, kube-scheduler, and kube-controller-manager are replicated for fault tolerance. A load balancer sits in front of the API Server replicas and is reachable through the virtual IP address 10.0.1.254. The setup targets Cloudlab and assumes that all nodes of the cluster are part of 10.0.1.0/24.
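For illustration, a minimal haproxy.cfg in the spirit of the one this PR adds under configs/k8s_ha/ might look as follows; the backend layout and server IPs here are placeholders, not the PR's actual values:

frontend kube-apiserver
    bind 10.0.1.254:6443
    mode tcp
    option tcplog
    default_backend kube-apiserver

backend kube-apiserver
    mode tcp
    option tcp-check
    balance roundrobin
    # one entry per API Server replica (placeholder IPs)
    server master-0 10.0.1.1:6443 check
    server master-1 10.0.1.2:6443 check
    server master-2 10.0.1.3:6443 check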

I tested the setup on a 5-node xl170 Cloudlab cluster with 1 and 3 control plane replicas. The cluster deploys without any issues and seems to be working fine.

@ustiugov, @leokondrashov: Can you check once more that everything is fine here? I see that a lot of changes were made to the setup scripts that I might not be fully aware of.

The setup is depicted in the following diagram.

[image: HA control-plane topology with the load balancer in front of the API Server replicas]

@leokondrashov (Contributor) commented:

NOTE: Only merge this PR in conjunction with vhive-serverless/invitro#379.

I think we need to introduce versioning of vHive in the Invitro repo so that we won't need this note, because it concerns not only this PR but also the ongoing Knative and k8s upgrades in vHive.

@leokondrashov (Contributor) left a review:

Pretty good. I left some questions below about the changes; most are just to confirm that it works as expected (I didn't get deep into the HA setup documentation). The most important part is to document these settings in the quickstart or to place them in a separate .md file linked from the quickstart. Also, add a line about this to the changelog; it seems important enough.

configs/setup/kube.json (thread resolved)
@@ -71,43 +73,22 @@ func CreateMultinodeCluster(stockContainerd string) error {
return nil
}

// Create kubelet service on master node
func CreateMasterKubeletService() error {
@leokondrashov (Contributor) commented:

Don't we need it? Does it automatically attach to unix:///run/containerd/containerd.sock? That's the most important part here. But also, we are losing the timeout and verbosity parameters. Do we need them?

@cvetkovic (Contributor, Author) replied:

kubeadm starts the kubelet process automatically, hence there is no need to start the kubelet manually. Regarding the containerd socket, it is now provided as an argument to the kubeadm init command.

[image: screenshot omitted]

@leokondrashov: What is the socket path for Firecracker? Should we make this configurable?
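For reference, the flag in question looks like this on kubeadm init (the socket path is the containerd default; the full command in the screenshot above may differ):

sudo kubeadm init --cri-socket unix:///run/containerd/containerd.sock ...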

@leokondrashov (Contributor) replied:

This function doesn't create the kubelet, but rather sets it up with parameters. The socket on the master node doesn't need to be configurable, since we use Firecracker only for user containers. But the verbosity and runtime-request-timeout set inside might be used for something.

@cvetkovic (Contributor, Author) replied on Feb 27, 2024:

Hmm. You are right about not starting the kubelet process.

For the runtime-request-timeout parameter, it seems we can safely remove it (see DOCS). Removing it means each request from the kubelet to the container runtime must complete within the 2-minute default rather than 15 minutes.

The --container-runtime flag has been deprecated since v1.24, according to this.
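If the longer timeout turns out to matter after all, it could be kept via a kubelet config file rather than a command-line flag; a sketch, not part of this PR:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
runtimeRequestTimeout: "15m"

(runtimeRequestTimeout is the config-file counterpart of the --runtime-request-timeout flag.)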

scripts/cluster/create_multinode_cluster.go (thread resolved)
scripts/cluster/create_multinode_cluster.go (outdated thread, resolved)
scripts/setup.go (thread resolved)
configs/k8s_ha/haproxy.cfg (thread resolved)
@@ -1,87 +1,116 @@
# vHive Setup Scripts

- [vHive Setup Scripts](#vhive-setup-scripts)
@leokondrashov (Contributor) commented:

We can remove this table of contents completely, since GitHub generates one automatically.


For fault tolerance purposes, Kubernetes control plane components can be replicated. This can be done by combining
@leokondrashov (Contributor) commented:

It would still be better to have a description of what the options are and how they are used (specifically REGULAR, which appears throughout the docs).

@leokondrashov (Contributor) commented:

@cvetkovic Hi, what's the status of this PR? My point about the documentation still stands. We should have instructions on how to use the tools in this repo, not just references to general k8s help and a way to use them from Invitro. For example, what would be the steps to set up the HA control plane without all the other things we have in Invitro?
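For context, the upstream kubeadm flow that such standalone instructions would wrap is roughly the following; the endpoint is this PR's virtual IP, and the token, hash, and key come from the kubeadm init output:

# on the first control-plane node
sudo kubeadm init --control-plane-endpoint 10.0.1.254:6443 --upload-certs

# on each additional control-plane node
sudo kubeadm join 10.0.1.254:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <key>

# on each worker node
sudo kubeadm join 10.0.1.254:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>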

// Original Bash Scripts: scripts/cluster/create_multinode_cluster.sh

if err := CreateMasterKubeletService(); err != nil {
@leokondrashov (Contributor) commented on Jun 20, 2024:

This command fixes #967 for the first master node. Without it, we still use the control-network IP on the master node when we use CloudLab, which can lead to suspension of the account if the traffic is large enough.

All other master nodes use CreateWorkerKubeletService, which has the fix for this problem. But that one has a different problem: in the Firecracker setup, we use our CRI socket, which we don't need on controller nodes.

Please consider running CreateMasterKubeletService for all master nodes.

UPD: running it for each node is up to the Invitro setup scripts, not vHive's, but we need to make it possible from here. And there is not much difference between the master and worker commands. I think we can combine them into a single setup_kubelet command that takes the sandbox type as an argument (see the sketch below). When setting up a node, we would always choose containerd for master and backup nodes and change it only for regular nodes.
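A minimal sketch of that combined command, assuming a kubeadm-managed kubelet that reads /etc/default/kubelet; the function name and the Firecracker socket path are hypothetical placeholders:

package setup

import (
	"fmt"
	"os"
)

// Placeholder: the actual Firecracker CRI socket path depends on the vHive CRI setup.
const firecrackerCRISock = "unix:///path/to/firecracker-cri.sock"

// SetupKubelet configures the kubelet on one node. Master and backup nodes
// would always pass "containerd"; only regular worker nodes pick another sandbox.
func SetupKubelet(sandbox string, nodeIP string) error {
	criSock := "unix:///run/containerd/containerd.sock"
	if sandbox == "firecracker" {
		criSock = firecrackerCRISock
	}
	// Advertise the experiment-network IP instead of the CloudLab control-network
	// IP (the #967 fix), and point the kubelet at the chosen CRI socket.
	dropIn := fmt.Sprintf("KUBELET_EXTRA_ARGS=\"--node-ip=%s --container-runtime-endpoint=%s\"\n",
		nodeIP, criSock)
	return os.WriteFile("/etc/default/kubelet", []byte(dropIn), 0644)
}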

--apiserver-advertise-address=%s \
--cri-socket unix:///run/containerd/containerd.sock \

command := fmt.Sprintf(`sudo kubeadm init --v=%d \
@leokondrashov (Contributor) commented:

We need to add --apiserver-advertise-address here and to the kubeadm join command for control-plane nodes. Right now, all the IPs that the load balancer uses are control-network IPs, which is bad (see the comment about kubelet creation).
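Concretely, the control-plane join with the missing flag would look roughly like this (placeholder values):

sudo kubeadm join 10.0.1.254:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <key> \
    --apiserver-advertise-address <this node's 10.0.1.x address> \
    --cri-socket unix:///run/containerd/containerd.sock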

Comment on lines 150 to 160
shellOut, err := utils.ExecShellCmd("sed -n '/.*kubeadm join.*/p' < %s/masterNodeInfo | sed -n 's/.*join \\(.*\\):\\(\\S*\\) --token \\(\\S*\\).*/\\1 \\2 \\3/p'", configs.System.TmpDir)
if !utils.CheckErrorWithMsg(err, "Failed to extract master node information from logs!\n") {
if !utils.CheckErrorWithMsg(err, "Failed to extract API Server address, port, and token from logs!\n") {
return err
}
splittedOut := strings.Split(shellOut, " ")
configs.Kube.ApiserverAdvertiseAddress = splittedOut[0]
configs.Kube.ApiserverPort = splittedOut[1]
configs.Kube.ApiserverToken = splittedOut[2]

// API Server discovery token
shellOut, err = utils.ExecShellCmd("sed -n '/.*sha256:.*/p' < %s/masterNodeInfo | sed -n 's/.*\\(sha256:\\S*\\).*/\\1/p'", configs.System.TmpDir)
@leokondrashov (Contributor) commented:

These commands that parse tmp/masterNodeInfo return duplicate lines of the same information, because the output of kubeadm init contains two kubeadm join lines (one for control-plane nodes and one for workers). Because of that, masterKey.yaml is malformed (a possible fix is sketched below).
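One possible fix, assuming both join lines carry the same endpoint and token (kubeadm prints identical values in both), is to keep only the first match; the same applies to the sha256 extraction:

shellOut, err := utils.ExecShellCmd(
	"sed -n '/.*kubeadm join.*/p' < %s/masterNodeInfo | head -n 1 | "+
		"sed -n 's/.*join \\(.*\\):\\(\\S*\\) --token \\(\\S*\\).*/\\1 \\2 \\3/p'",
	configs.System.TmpDir)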

@lrq619 (Collaborator) commented on Jul 2, 2024:

Hello, what is the status of this PR? Can we merge it now? @leokondrashov
