forked from kubernetes-sigs/cluster-api
-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow for manual KCP and etcd orchestration #25
Closed
g-gaston
wants to merge
36
commits into
abhay-krishna:eks-a-release-1.4
from
g-gaston:improve-external-managed-etcd
Closed
Allow for manual KCP and etcd orchestration #25
g-gaston
wants to merge
36
commits into
abhay-krishna:eks-a-release-1.4
from
g-gaston:improve-external-managed-etcd
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Vignesh Goutham Ganesh <[email protected]> Templating bottlerocket init data formats Signed-off-by: Vignesh Goutham Ganesh <[email protected]> Generating proper vip destnation path for bottlerocket among other cluster config patches Signed-off-by: Vignesh Goutham Ganesh <[email protected]> Fix kube-vip config patching by creating new Files array Adding pause and bottlerocket bootstrap Cluster configuration to kubeadm v1beta1 api Add bottlerocket API changes to v1beta1 types Set bottlerocket bootstrap container and pause container source in user data Add proxy config fields to v1beta1 api Support proxy configuration for bottlerocket Add network settings only when there is a proxy defined Remove userdata logging for bottlerocket
Unstacked etcd: API and config changes Unstacked etcd: Changes in CAPI core controllers This commit adds the following changes in the cluster controller: * A change in reconcileControlPlane to check if the cluster is using managed external etcd, and if etcd is not ready then pause the control plane provisioning. * A new phase in cluster controller's phases for reconciling the etcd cluster. If the etcd cluster is ready then this phase will resume control plane provisioning This commit also adds the following change in the machine controller: * The machine controller upon creation of the first etcd machine will save its IP. Unstacked etcd: Changes in KCP controller The KubeadmControlPlane controller will get the external etcd endpoints from the object referenced by cluster.spec.managedExternalEtcdRef. The validating webhook will allow the external etcd endpoints to be updated. Unstacked etcd: Change in docker infra provider Docker is the only infrastructure provider that performs a kubectl patch on the k8s node corresponding to a Machine. This needs to be skipped for etcd machines since they are not registered as nodes and do not run kubelet. Unstacked etcd: Ignore nodeRef check for etcd machines during clusterctl move Clusterctl before beginning the move checks if all CAPI objects are ready and provisioned. One of these checks is for Machine.Status.NodeRef field. This check needs to be skipped for etcd machines since they are not registered as Kubernetes nodes so they don't have a corresponding Node. Delete managed external etcd cluster last Update KCP controller to use renamed etcd endpoints field The field 'endpoint' on EtcdadmCluster's status has been renamed to 'endpoints'. This commit updates the KCP controller to use the renamed field cr https://code.amazon.com/reviews/CR-54310674 Add external etcd api changes to v1beta1 capi CRDs Fix etcdMachine to cluster reconciler so it listens on Machine events cr: https://code.amazon.com/reviews/CR-56463335 Add external etcd api to v1alpha4 Fix watch on machine object for etcdMachine to cluster mapper While cherry-picking commits from 0.3.19 branch the watch got modified by mistake. This commit fixes it by changing it back to watching Machine objects. Retain update permission on etcdadmcluster for KCP controller KCP controller updates the etcdadmcluster object once KCP upgrade is completed. It needs update permission on etcdadmcluster object for this. We previously had added this permission, it got dropped while rebasing commits on the new 1.0.1 branch. This commit adds back the permission.
Rename controlplane upgrade annotation variable Add controlplane upgrade complete annotation to sync etcd to v1beta1 Check if controlplane is paused before patching it with paused annotation Patch etcd cluster with upgrade complete annotation only after upgrade After KCP upgrade is completed, the controller checks the condition "MachinesSpecUpToDate" exists and marks it to true. The controller also updates the etcd cluster with an annotation to indicate the controlplane upgrade is complete. But it should annotate etcd cluster only if the MachineSpecUpToDate condition is False, since that will happen only immediately after an upgrade. Without checking for false it will keep annotating the etcd cluster on further reconcile calls. cr: https://code.amazon.com/reviews/CR-55949234 Synchronize upgrade flow Etcd machines need to be rolled out first so that KCP can get the correct set of etcd endpoints. KCP rollout should be stalled till etcd upgrade is done. Cluster controller may not be able to add the paused annotation on time once updated spec is applied, since etcd and KCP controllers could have started reconciling at the same time. So KCP checks if etcd cluster has the upgrading annotation, and does not proceed further until the annotation is removed. Once KCP has rolled out updated machines, it will add an annotation on the etcd cluster to indicate that the KCP upgrade is complete. In absence of static etcd endpoints, currently etcd controller retains older etcd members for KCP upgrade, and once KCP upgrade complete annotation is set, the etcd controller will remove these older etcd members cr: https://code.amazon.com/reviews/CR-54933898
…with bottlerocket format cr: https://code.amazon.com/reviews/CR-55711271
Bottlerocket expects no-proxy setting to be a comma-separated list of strings. The proxy template was parsing the input no-proxy list and converting it to a string. This commit changes the template to get the desired format. Current generated value of no-proxy: "[localhost 127.0.0.1]" New generated value for no-proxy: ["localhost", "127.0.0.1"] cr: https://code.amazon.com/reviews/CR-58004615
seperate taints template into its own template add parse taints method for converting taints config to toml add taints to BottlerocketSettingsInput add template parsing to node userdata generation account for multiple value:effect mappings in each taint key
Signed-off-by: Michael Chu <[email protected]>
Once the first etcd member is initialized, the machine controller has to update the secret with the address of the machine, so it can be used by other members to join during cluster creation. The etcdadm-bootstrap-provider changes this address into an etcd client URL before passing it in to the join command. Recent changes in etcdadm, and etcdadm-controller will allow passing in client URLs of all etcd members. So this commit changes the format of stored address for the first machine from an IP address to the etcd client URL. NOTE: This only happens once initially during cluster creation. We need to keep this Secret because after clusterctl move, the etcdCluster's Initialized condition needs to be set based on the existence of this Secret.
Signed-off-by: Vignesh Goutham Ganesh <[email protected]>
Host containers are a feature within BR that allows us to pull images without the need of having to bootstrap kuberentes. Such containers can be superpowered and user-data can be attached to each one of them. As such, this commit creates the `BottlerocketHostContainer` struct to allow the user to customize the those fields. Users can specify an arbitrary number of host containers in the `AdditionalHostContainers` field. This commit also does some refactoring around the templating system BR has to generate the TOML files. It generifies the host-container template to be reused as much as a user wants to. SIM: https://i.amazon.com/P66557529 cr: https://code.amazon.com/reviews/CR-71408825
…pdatable in webhook
…cd machine ready label
Signed-off-by: Ahree Hong <[email protected]>
Signed-off-by: Abhinav <[email protected]>
Signed-off-by: Ahree Hong <[email protected]>
Signed-off-by: Abhinav Pandey <[email protected]>
Signed-off-by: Ahree Hong <[email protected]>
Signed-off-by: Ahree Hong <[email protected]>
EKS-A uses haproxy 2.5 which errors if the maxconn value requires more FDs than allowed by the ulimit setting of docker. 100k maxconn is too high for the default ulimit on an al2 node.
* add support for custom cert bundles br * take in non base-64 encoded data for cert Signed-off-by: Ahree Hong <[email protected]>
Signed-off-by: Vignesh Goutham Ganesh <[email protected]>
Signed-off-by: Vignesh Goutham Ganesh <[email protected]>
This was referenced Aug 17, 2023
abhinavmpandey08
approved these changes
Aug 21, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good!
jaxesn
force-pushed
the
eks-a-release-1.4
branch
from
August 21, 2023 18:49
06a4bf0
to
5c1c8b1
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
These changes allows us to implement this.
Tldr: adds a new annotation to the KCP that when present, stops the cluster controller from pausing/unpausing the KCP when etcdadm cluster is not ready. We make this behavior opt-out to maintain compatibility with previous eks-a controllers/CLI that won't have this new orchestration logic (to be implemented in eks-a).
Some extra improvements:
AddAnnotations
(already contributed upstream)This PR is only for review, it should not be merged. Once approved, I'll create the corresponding patch.