Let's chat about pain points about Kubernetes upgrades with Cluster API #8614
-
I am writing on behalf of the Kubernetes platform team at Mercedes-Benz.
We usually try to run the most recent Kubernetes version minus one. Currently we're upgrading to Kubernetes v1.26. This means that we upgrade our Kubernetes clusters three times a year (nearly 1000 clusters in 3 weeks 😅).
We are using CAPI to upgrade, yes. Of course, we have a custom pipeline to deploy the CAPI resources.
Currently not. We're planning to use ClusterClass, but only if it's possible to migrate existing clusters into ClusterClass.
We are already able to kind of skip a Kubernetes version.
Tbh, we don't have any pain points with upgrading our clusters, except for the ClusterClass (CC) migration problem. Btw, I just published a repository with a proof of concept of how we migrated our legacy Kubernetes clusters into Cluster API 🙂
Tobias Giese, Mercedes-Benz Tech Innovation GmbH
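For readers who want to encode the "most recent version minus one" policy described above, here is a minimal Go sketch. It is not taken from the Mercedes-Benz pipeline; the function name `targetVersion` and the `vMAJOR.MINOR` input format are assumptions made for illustration.

```go
package main

import "fmt"

// targetVersion is a hypothetical helper: given the latest upstream
// Kubernetes minor release (for example "v1.27"), it returns the minor
// release one step behind it, following a "latest minus one" policy.
func targetVersion(latest string) (string, error) {
	var major, minor int
	// Expect input of the form "vMAJOR.MINOR".
	if _, err := fmt.Sscanf(latest, "v%d.%d", &major, &minor); err != nil {
		return "", fmt.Errorf("cannot parse version %q: %w", latest, err)
	}
	if minor == 0 {
		return "", fmt.Errorf("no previous minor release for %q", latest)
	}
	return fmt.Sprintf("v%d.%d", major, minor-1), nil
}

func main() {
	target, err := targetVersion("v1.27")
	if err != nil {
		panic(err)
	}
	fmt.Println(target) // prints v1.26
}
```

With v1.27 as the latest release, this yields v1.26, which matches the upgrade target mentioned in the comment.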
-
At Porter, we manage more than a hundred K8s clusters - most of them are EKS, with about 10% comprising GKE and AKS clusters. Much of what I'm writing below pertains to how we manage cluster upgrades without CAPI at the moment, and how we see CAPI making our upgrade process better. This also assumes we're able to migrate all our managed clusters (EKS/GKE/AKS) into CAPA/CAPG/CAPZ over the next couple of months.
This has changed over time - until December 2022, we only upgraded clusters when AWS announced EOS for a particular version. In January 2023 we decided to keep all clusters as close to the latest supported version as possible, by leapfrogging over versions. We're now looking at settling into a six-month upgrade cycle, where each cycle involves upgrading all clusters by at least 2 versions, to the latest supported version.
We don't use CAPI for upgrades just yet, but we hope to change that over the next couple of months. At the moment, upgrades are carried out by custom Golang code which is responsible for upgrading EKS control planes over multiple versions, as well as updating launch templates for self-managed nodegroups and then running instance refresh ops. While all our clusters were created using Terraform, the drift between TF state and the actual infrastructure is too large for us to risk Terraform-based upgrades. I've personally also experimented with running upgrades via Crossplane - the experiment shows promise, although running reliable instance refresh ops is not possible with Crossplane.
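To illustrate what "upgrading EKS control planes over multiple versions" involves, here is a rough Go sketch of the planning step only. It is not Porter's actual code: the function name `minorSteps` is hypothetical, and it assumes the control plane can move up only one minor version per upgrade call, which is how EKS behaves. The cloud API calls, launch template updates, and instance refreshes are deliberately left out.

```go
package main

import "fmt"

// minorSteps is a hypothetical planner: it returns the ordered list of
// minor versions a control plane has to pass through to get from current
// to target, assuming only one minor version can be applied per upgrade.
// Versions use the "MAJOR.MINOR" form, for example "1.23".
func minorSteps(current, target string) ([]string, error) {
	var curMajor, curMinor, tgtMajor, tgtMinor int
	if _, err := fmt.Sscanf(current, "%d.%d", &curMajor, &curMinor); err != nil {
		return nil, fmt.Errorf("cannot parse current version %q: %w", current, err)
	}
	if _, err := fmt.Sscanf(target, "%d.%d", &tgtMajor, &tgtMinor); err != nil {
		return nil, fmt.Errorf("cannot parse target version %q: %w", target, err)
	}
	if curMajor != tgtMajor {
		return nil, fmt.Errorf("major version change %d -> %d is not handled", curMajor, tgtMajor)
	}
	if tgtMinor < curMinor {
		return nil, fmt.Errorf("downgrade from %s to %s is not supported", current, target)
	}
	steps := make([]string, 0, tgtMinor-curMinor)
	for m := curMinor + 1; m <= tgtMinor; m++ {
		steps = append(steps, fmt.Sprintf("%d.%d", curMajor, m))
	}
	return steps, nil
}

func main() {
	steps, err := minorSteps("1.23", "1.26")
	if err != nil {
		panic(err)
	}
	// Each entry would be applied in order, waiting for the control
	// plane to become healthy before starting the next step.
	fmt.Println(steps) // prints [1.24 1.25 1.26]
}
```

Keeping the plan separate from the code that executes each step makes it easy to log or review the intended path for a cluster before anything is changed.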
If we're able to bring in existing EKS clusters into CAPA, then based on the docs here, we imagine the process would look a lot like this:
The actual steps will probably be automated - script chart/template/API object version upgrades, then call the CAPI manager cluster to update the version in the
Not just yet, but I like the concept of a
Since we primarily operate on top of managed K8s offerings, the idea of an LTS release doesn't truly exist for us, since cloud providers tend to treat each version the same. What makes a lot more sense for us is staying ahead by upgrading to the latest versions available. That's currently possible with our in-house upgrade process, and it seems like CAPA allows for jumping over versions.
-
Thank you all, folks, for the super valuable insight! This will give CAPI users a valuable new tool, but ultimately let everyone choose whether to use it or not.
-
As per our discussion during the May 3rd office hours, it would be great to collect feedback about the user experience on Kubernetes upgrades with Cluster API.
The discussion has been triggered by the idea of streamlined upgrades presented in this KEP, but we reached the conclusion that we would like to get a better understanding of the community sentiment around this topic before moving forward.
A couple of questions we are interested in (but any kind of feedback is more than welcome!):
How often do you upgrade the Kubernetes version on your clusters?
Do you use CAPI to upgrade or something else? If something else, how do you upgrade clusters? If you use CAPI, what are the steps you follow?
Do you use ClusterClass?
Would you be interested in something like LTS support and/or skipping version upgrades for worker nodes?
What are currently your biggest pain points related to upgrading?