Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid NodeHealthy condition in etcd machines #24

Conversation

g-gaston
Copy link
Collaborator

There is no impact in the execution flow. It's mostly to avoid confusion when looking at the machine status.

Don't merge, this is just for review.

ader1990 and others added 30 commits June 6, 2023 18:04
In certain baremetal environments, where there are multiple connected
and/or disconnected network ports, the network target is reached more
slowly, and the kubeadm.service might fail because it does not have the
proper pre-kubeadm commands correctly done (like a ctr image pull) or it
cannot connect to other k8s nodes.
…ot/cherry-pick-8772-to-release-1.4

[release-1.4] 🐛 ignition: start kubeadm after network.target
…ot/cherry-pick-8805-to-release-1.4

[release-1.4] 📖 Remove use of deprecated clusterctl flag in docs
…ot/cherry-pick-8812-to-release-1.4

[release-1.4] 📖 add in-cluster ipam and helm addon providers
…ot/cherry-pick-8845-to-release-1.4

[release-1.4] 🌱 Add dependency label to github actions update
…ot/cherry-pick-8863-to-release-1.4

[release-1.4] 🌱 KCP: fix noisy error log triggered by missing patch helper
…ot/cherry-pick-8859-to-release-1.4

[release-1.4] 🐛 Always build Kind images for upgrade tests
…ot/cherry-pick-8873-to-release-1.4

[release-1.4] 🐛 Ensure ALWAYS_BUILD_KIND_IMAGES is defaulted correctly in ci script
…nd-image-sha

[release-1.4] 🐛 Use known kindest/node image versions by sha for e2e upgrade tests
Signed-off-by: killianmuldoon <[email protected]>
…ot/cherry-pick-8883-to-release-1.4

[release-1.4] 🌱 Update cert-manager to v1.12.2
(cherry picked from commit 7a4e06c)
…nd-mapper

[release-1.4] 🐛 Add kind mapper
Signed-off-by: killianmuldoon <[email protected]>
…nd-v0.20.0

[release-1.4] 🌱 Update KIND to v0.20.0
…ot/cherry-pick-8914-to-release-1.4

[release-1.4] 🐛 test/capd: fix kind mapper entry for v1.25.11
…ot/cherry-pick-8924-to-release-1.4

[release-1.4] 🐛 Fix deprecation marker for RolloutAfter
Signed-off-by: Vignesh Goutham Ganesh <[email protected]>

Templating bottlerocket init data formats

Signed-off-by: Vignesh Goutham Ganesh <[email protected]>

Generating proper vip destnation path for bottlerocket among other cluster config patches

Signed-off-by: Vignesh Goutham Ganesh <[email protected]>

Fix kube-vip config patching by creating new Files array

Adding pause and bottlerocket bootstrap Cluster configuration to kubeadm v1beta1 api

Add bottlerocket API changes to v1beta1 types

Set bottlerocket bootstrap container and pause container source in user data

Add proxy config fields to v1beta1 api

Support proxy configuration for bottlerocket

Add network settings only when there is a proxy defined

Remove userdata logging for bottlerocket
Unstacked etcd: API and config changes

Unstacked etcd: Changes in CAPI core controllers

This commit adds the following changes in the cluster controller:
* A change in reconcileControlPlane to check if the cluster is using managed
external etcd, and if etcd is not ready then pause the control plane provisioning.
* A new phase in cluster controller's phases for reconciling the etcd cluster. If the
etcd cluster is ready then this phase will resume control plane provisioning

This commit also adds the following change in the machine controller:
* The machine controller upon creation of the first etcd machine will save its IP.

Unstacked etcd: Changes in KCP controller

The KubeadmControlPlane controller will get the external etcd endpoints
from the object referenced by cluster.spec.managedExternalEtcdRef.
The validating webhook will allow the external etcd endpoints to be updated.

Unstacked etcd: Change in docker infra provider

Docker is the only infrastructure provider that performs a kubectl patch on the k8s
node corresponding to a Machine. This needs to be skipped for etcd machines since they
are not registered as nodes and do not run kubelet.

Unstacked etcd: Ignore nodeRef check for etcd machines during clusterctl move

Clusterctl before beginning the move checks if all CAPI objects are
ready and provisioned. One of these checks is for Machine.Status.NodeRef field.
This check needs to be skipped for etcd machines since they are not registered
as Kubernetes nodes so they don't have a corresponding Node.

Delete managed external etcd cluster last

Update KCP controller to use renamed etcd endpoints field

The field 'endpoint' on EtcdadmCluster's status has been renamed
to 'endpoints'. This commit updates the KCP controller to use the
renamed field

cr https://code.amazon.com/reviews/CR-54310674

Add external etcd api changes to v1beta1 capi CRDs

Fix etcdMachine to cluster reconciler so it listens on Machine events

cr: https://code.amazon.com/reviews/CR-56463335

Add external etcd api to v1alpha4

Fix watch on machine object for etcdMachine to cluster mapper

While cherry-picking commits from 0.3.19 branch the watch got modified
by mistake. This commit fixes it by changing it back to watching Machine
objects.

Retain update permission on etcdadmcluster for KCP controller

KCP controller updates the etcdadmcluster object once KCP upgrade is completed.
It needs update permission on etcdadmcluster object for this.
We previously had added this permission, it got dropped while rebasing
commits on the new 1.0.1 branch. This commit adds back the permission.
Rename controlplane upgrade annotation variable

Add controlplane upgrade complete annotation to sync etcd to v1beta1

Check if controlplane is paused before patching it with paused annotation

Patch etcd cluster with upgrade complete annotation only after upgrade

After KCP upgrade is completed, the controller checks the condition "MachinesSpecUpToDate"
exists and marks it to true. The controller also updates the etcd cluster with an
annotation to indicate the controlplane upgrade is complete. But it should annotate
etcd cluster only if the MachineSpecUpToDate condition is False, since that will happen
only immediately after an upgrade. Without checking for false it will keep annotating
the etcd cluster on further reconcile calls.

cr: https://code.amazon.com/reviews/CR-55949234

Synchronize upgrade flow

Etcd machines need to be rolled out first so that KCP can get the
correct set of etcd endpoints. KCP rollout should be stalled till etcd upgrade
is done. Cluster controller may not be able to add the paused annotation on time
once updated spec is applied, since etcd and KCP controllers could have
started reconciling at the same time. So KCP checks if etcd cluster has the
upgrading annotation, and does not proceed further until the annotation is removed.
Once KCP has rolled out updated machines, it will add an annotation on the etcd
cluster to indicate that the KCP upgrade is complete. In absence of static etcd
endpoints, currently etcd controller retains older etcd members for KCP upgrade,
and once KCP upgrade complete annotation is set, the etcd controller will
remove these older etcd members

cr: https://code.amazon.com/reviews/CR-54933898
danbudris and others added 26 commits June 30, 2023 10:45
seperate taints template into its own template

add parse taints method for converting taints config to toml

add taints to BottlerocketSettingsInput

add template parsing to node userdata generation

account for multiple value:effect mappings in each taint key
Once the first etcd member is initialized, the machine controller has
to update the secret with the address of the machine, so it can be used
by other members to join during cluster creation.
The etcdadm-bootstrap-provider changes this address into an etcd client URL
before passing it in to the join command.
Recent changes in etcdadm, and etcdadm-controller will allow passing in
client URLs of all etcd members. So this commit changes the format of stored
address for the first machine from an IP address to the etcd client URL.
NOTE: This only happens once initially during cluster creation. We need to
keep this Secret because after clusterctl move, the etcdCluster's Initialized
condition needs to be set based on the existence of this Secret.
Signed-off-by: Vignesh Goutham Ganesh <[email protected]>
Host containers are a feature within BR that allows us to pull images
without the need of having to bootstrap kuberentes. Such containers can
be superpowered and user-data can be attached to each one of them. As
such, this commit creates the `BottlerocketHostContainer` struct to allow
the user to customize the those fields. Users can specify an arbitrary
number of host containers in the `AdditionalHostContainers` field.

This commit also does some refactoring around the templating system BR
has to generate the TOML files. It generifies the host-container
template to be reused as much as a user wants to.

SIM: https://i.amazon.com/P66557529
cr: https://code.amazon.com/reviews/CR-71408825
EKS-A uses haproxy 2.5 which errors if the maxconn value
requires more FDs than allowed by the ulimit setting of docker.
100k maxconn is too high for the default ulimit on an al2 node.
* add support for custom cert bundles br

* take in non base-64 encoded data for cert

Signed-off-by: Ahree Hong <[email protected]>
Signed-off-by: Vignesh Goutham Ganesh <[email protected]>
…y-mirror-config-1.4

🌱 Add registryMirrorConfigurations to allowedList in KubeadmControlPlane webhook
@g-gaston g-gaston force-pushed the eks-a-release-1.4 branch 2 times, most recently from 5a333e5 to 06a4bf0 Compare July 27, 2023 17:17
@g-gaston g-gaston closed this Jul 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.