KubeadmConfig changes should be reconciled for machine pools, triggering instance recreation #8858
/triage accepted
Frankly speaking, what I found confusing in the UX is that we are changing an external reference in place, which is inconsistent with everything else in CAPI. Additionally, the reference to a bootstrap object is defined inside a machine template, while e.g. in MD/MS we have a similar API modeling but use a bootstrap template instead of a bootstrap object.

However, while designing a solution for problem 1, IMO we should aim for a solution where changes to external objects surface to the parent objects, no matter whether they are user-driven (e.g. rotating template references) or machine-driven (by changing the data secret name). This approach - changing the parent object - is consistent with everything else in CAPI and provides a clean pattern for all the providers, and IMO it could address problem 2 as well.

WRT problem 3, it seems to me that what we are saying goes in the direction of each machine having its own bootstrap token, so that machine provisioning is not affected by concurrent changes to the bootstrap object; this could somewhat overlap with @Jont828's work (see e.g. #8828 (comment)).

Last note, what we are deciding …
But then the providers need to watch the bootstrap config object. Here is one idea how to add such a contract with providers: …
What do you think of this?
This is a dealbreaker if we're talking about watching specific Kubeadm* types. Infra providers should be bootstrap provider agnostic and work with all bootstrap and control plane providers, as kubeadm is not the only one out there. If we're trying to watch the bootstrap reference, that would be fine, since we can do that without referencing KubeadmConfig specifically. In that case, …
@CecileRobertMichon I agree that infra providers shouldn't directly watch Kubeadm* types. The hierarchy is … CAPA's current controller has this: …
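(The referenced snippet didn't survive extraction. As a rough, hypothetical illustration of the pattern being discussed, namely an infra provider enqueueing its own machine pool object whenever the parent MachinePool changes, a controller-runtime mapping function might look like the following sketch. It is not the actual CAPA code; names and import paths are assumptions.)

```go
// Hypothetical sketch, not the actual CAPA controller code: map a MachinePool
// event to a reconcile request for the infrastructure machine pool it references.
package example

import (
	"context"

	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	expv1 "sigs.k8s.io/cluster-api/exp/api/v1beta1"
)

// machinePoolToInfraMachinePool would be wired up via something like
// builder.Watches(&expv1.MachinePool{}, handler.EnqueueRequestsFromMapFunc(machinePoolToInfraMachinePool)).
func machinePoolToInfraMachinePool(_ context.Context, o client.Object) []reconcile.Request {
	mp, ok := o.(*expv1.MachinePool)
	if !ok {
		return nil
	}
	// The MachinePool's spec.template.spec.infrastructureRef points at the
	// provider-specific pool object (e.g. AWSMachinePool).
	ref := mp.Spec.Template.Spec.InfrastructureRef
	if ref.Name == "" {
		return nil
	}
	return []reconcile.Request{{
		NamespacedName: types.NamespacedName{Namespace: mp.Namespace, Name: ref.Name},
	}}
}
```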
Does the …?
Yes, that's correct.
We are discussing this issue at today's community meeting. A point raised by @CecileRobertMichon about an alternative to using the status field: what if we could use annotations instead?
Adding on top of ^^, we should take into account that there are users relying on machine pools + bootstrap providers other than the kubeadm one.
Here is the current bootstrap provider contract: https://cluster-api.sigs.k8s.io/developer/providers/bootstrap

If we add a field to MachinePoolStatus based on the bootstrap config status field, we would need to either make the status field part of the bootstrap status contract or make it opt-in/optional so it doesn't break other bootstrap providers that don't implement the field.

cc @johananl @richardcase, who are working on #5294, which might help simplify this.
I think we can make it an optional part of the contract, mentioning that bootstrap providers SHOULD (not MUST) fill in the new fields. The infrastructure provider can then deal with it backwards-compatibly. For example, CAPA currently creates a new launch configuration version if the bootstrap data changed, but does not roll out new nodes (= trigger an autoscaling group instance refresh). If it sees the "checksum without random bootstrap token" field filled, it can decide to roll out nodes immediately, and otherwise fall back to the previous behavior. Likewise for other providers.

Regarding #5294, I'm not sure what exactly will change. Some items like "only cloud-init supported" are out of date by now, and my proposal would actually avoid having cloud-init/ignition-specific stuff in CAPA (as in the hack kubernetes-sigs/cluster-api-provider-aws#4245, which parses cloud-init data). @johananl @richardcase could you provide a renewed summary of the high-level changes planned in that issue? I'd like to see 1) whether it can resolve the problems we address here and 2) whether the timeline is so far out that it makes sense to implement solutions for this issue very soon.

Overall, I'd prefer implementing a proposal soon in order to make machine pools actually useful and less surprising right now. The discussed proposal is backward-compatible and optional, and can therefore easily be changed or removed later on. Getting it implemented for the …

Update of my proposal after recent discussions: …
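To make the optional-contract idea above concrete, here is a hedged sketch of what such opt-in fields could look like as Go API types. The field names (e.g. DataSecretChecksum) are purely illustrative assumptions, not part of any agreed contract.

```go
// Hypothetical API sketch only: illustrative names for an *optional* addition to
// the bootstrap provider contract, as discussed above. Not an agreed-upon API.
package example

// BootstrapStatusSketch shows fields a bootstrap provider COULD (SHOULD, not MUST)
// expose so infra providers can detect meaningful bootstrap data changes.
type BootstrapStatusSketch struct {
	// DataSecretName is the name of the secret that stores the rendered bootstrap data.
	// A field like this already exists in the real contract.
	DataSecretName *string `json:"dataSecretName,omitempty"`

	// DataSecretChecksum is a checksum of the bootstrap data with volatile content
	// (e.g. the kubeadm bootstrap token) stripped out, so that token rotation alone
	// does not look like a config change. Optional; providers may leave it empty.
	DataSecretChecksum string `json:"dataSecretChecksum,omitempty"`
}
```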
I'm sorry, I have very limited time ATM to dig into this rabbit hole, but I lack some knowledge of MachinePools, and catching up with this + this specific problem requires a dedicated slot that I'm struggling to get.

I still have some concerns about one of the assumptions of this thread, and more specifically about the fact that we are changing the BootstrapConfig (the KubeadmConfig in this example) in place. I'm not sure the BootstrapConfig was originally conceived to support in-place changes. If I look at the code, it seems to me that this behavior is supported incidentally, because we are rotating the bootstrap token for security reasons, which is something we added at a later stage, AFAIK without any intent to support this use case (in-place changes).

Now, the assumption above is making this a supported use case, but in a way that is very specific to machine pools, which might create some "unexpected side effects" when BootstrapConfig is used with Machines (what if users then ask us to support in-place changes on stand-alone machines, given that the BootstrapConfig can be changed in place?).

The assumption above is also driving us to implement two additional layers of complexity in our code. The first layer of complexity is for handling the change of spec without waiting for bootstrap token rotation & computing two checksums, which are based on a very specific kubeadm feature, the bootstrap token, but which barely make sense for other bootstrap provider implementations. The second layer of complexity is to bubble up two checksums to the machine pool object.

Given all that, I'm starting to wonder if this is the right path to take, because it seems we keep adding complexity and exceptions to solve a problem that we created ourselves with the initial assumption.

If I think about what we need, ultimately we need a way to know that the BootstrapConfig object has changed, which is something we usually do in a very explicit way by rotating the bootstrap object. If we assume we rotate the BootstrapConfig in the MachinePool, the InfrastructureMachinePool will be immediately notified of the change, because it is already watching the parent MachinePool object.

If this is correct, probably what is missing is that we don't know which machines have been created with the new and which with the old bootstrap token. This could probably be addressed in a couple of ways, but this is also where my lack of knowledge shows up. For an implementation with machine pool machines, I assume we can use the single machines to track this info. But those are just assumptions.
Additional consideration: this is interesting in my mind because it can also fit more naturally into the MachinePoolMachine story, since the MachinePool implementation in CAPI can create those machines upfront with the right data secret/infra machine etc., and the InfrastructureMachinePool implementation can simply attach the missing info to them when available.
Thanks for tagging me @CecileRobertMichon. I'll try to add my thoughts in general as well as in the context of the WIP proposal for addressing #5294.

Terminology disambiguation

First, I think we have to update our terminology to clearly distinguish between two related yet distinct concerns:
Note that the bootstrap process is based on top of provisioning: we currently use cloud-init or Ignition in order to convey kubeadm configuration to a machine as well as to execute the shell commands which trigger the kubeadm init/join process. However, provisioning isn't the same as bootstrap, because provisioning can also contain OS-level customizations which aren't related to bootstrap (e.g. tweaking a sysctl parameter). Furthermore, even if provisioning contained only bootstrap-related configuration, IMO we'd still need the separate terminology, because kubeadm configuration isn't the same as cloud-init or Ignition configuration, and it's important to distinguish between the two given the different stages performed in order to get from a declarative kubeadm config to a VM with a running kubelet.

My thoughts

With that said, here are my thoughts:

Machine recreation on bootstrap config change

Following what @fabriziopandini said, I think that before we figure out how to recreate machines following a change to the bootstrap config, we should decide whether we want to allow doing so in the first place. Personally, I don't see why not, but I don't have a strong opinion here. Regardless of what gets decided, I agree that this should either be supported and working, or explicitly unsupported and therefore documented and rejected by a webhook. In any case, I think we can agree that "supported by accident" isn't something we want.

Multiple checksums

I don't think we want to maintain two bootstrap config checksums for the following reasons:
I'd have to dig deeper into the codebase to get a clearer understanding of the problem, but I'm guessing the thing which shouldn't be hashed (the bootstrap token) should have resided elsewhere if it's not a part of …

CAPI <> CAPx contract

Great point about the CAPI <> CAPx contract @AndiDog. This should be a core piece of the #5294 work, and I've already started thinking about it. I'll try to copy you in relevant discussions around the proposal process. We already have https://cluster-api.sigs.k8s.io/developer/providers/machine-infrastructure, which might be relevant, but I need to dig deeper and see if we have an entire contract missing or we just need to update an existing one.

High-level description of #5294

The #5294 proposal is still WIP, but very briefly, here is what I envision at the moment:
The proposal needs a lot more fleshing out, so I don't know about a timeline. I'm still gathering requirements all over the place, since #5294 touches a very central area of the codebase which affects a lot of use cases. My hope is that we can cleanly separate the effort into manageable tasks and focus on a narrow change set at a given time without losing the bigger picture. Maybe we should break #5294 down into a couple of smaller proposals - we'll see.

Infra providers watching KubeadmConfig

Following from the above, I don't think infra providers should watch KubeadmConfig, because I think it goes against the separation of concerns we want to eventually achieve. Infra providers should watch a location with bootstrapper-agnostic provisioning config (e.g. a cloud-config document) and pass that config to machines.

Any comments on the above are welcome. Sorry about the 📜🙂
@johananl great input, thanks for being so detailed! The terminology makes sense.
This is a very technical issue: AWS EC2 instances allow 16 KB of user data (essentially a single blob of bytes), which must include everything needed to set up the instance on top of the base image (e.g. Ubuntu, Flatcar Linux or a custom one). Since the bootstrap token changes over time because it expires, the …
Thanks @AndiDog, the issue is clearer to me now. Actually, we already do have a partial solution which might go in the direction you've outlined. Ignition support on AWS utilizes S3 as storage for the Ignition config: …

The rationale for using S3 in that specific context was to bypass the AWS user data size limit, IIRC because things like TLS certificates took a lot of space and caused issues with some cluster configurations. But the user data size limit is a broader problem which affects more use cases, not just Ignition on AWS, and is one of the user stories covered in #5294 (see U13 - "large payloads").
#5294 is a discussion way bigger than this specific issue. That means that if we put it on the critical path, it will most probably delay the solution to this specific problem and take off the table the more tactical alternatives we were considering until now, like #8858 (comment) or the idea I was mulling over in #8858 (comment) about leveraging object rotation. I just want to make sure everyone agrees on the implications, starting with @AndiDog, who reported this issue.
I attempted the workaround of swapping the …
Implementation-wise, I use Helm template magic to append a hash of KubeadmConfig.spec to the object name:

  kind: MachinePool
  spec:
    template:
      spec:
        bootstrap:
          configRef:
            apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
            kind: KubeadmConfig
            name: somename-<hash of KubeadmConfig.spec>

Sadly, it doesn't work at the moment. Here's why (I'm continuing my problem numbering 😂):
One bug in CAPI, one in CAPA. We believed that changing the bootstrap object reference (here: …) …

Do you think these are easier to fix than the other idea (supporting reconciliation of in-place changes to KubeadmConfig)?
@AndiDog thanks for reporting back your findings. Just for clarification, the idea I was suggesting about template rotation was not a workaround or something I was expecting to work out of the box. It was an idea about an alternative possible UX that we should aim for to fix this issue, given that, as explained above, it seems to me that the checksum idea adds a lot of complexity instead of going at the root cause of the problem, which ultimately requires a discussion on the MachinePool API.
We discussed this in a meeting (@CecileRobertMichon, @fabriziopandini, @vincepri) and decided to try and support the reference rotation (changed bootstrap configRef) …
One additional thought: we might want to think about how to deal with "orphan" configs (for example, avoid rotating the KubeadmConfig bootstrap token if the config isn't actually in use by a MachinePool).
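As a hedged illustration of that thought (purely hypothetical logic, not existing CAPI code), a check like the following could let a controller skip token rotation for KubeadmConfigs that no MachinePool references anymore:

```go
// Hypothetical sketch, not existing CAPI code: before rotating the bootstrap token
// of a KubeadmConfig, check whether any MachinePool in the same namespace still
// references it via spec.template.spec.bootstrap.configRef.
package example

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"

	bootstrapv1 "sigs.k8s.io/cluster-api/bootstrap/kubeadm/api/v1beta1"
	expv1 "sigs.k8s.io/cluster-api/exp/api/v1beta1"
)

func isReferencedByAnyMachinePool(ctx context.Context, c client.Client, cfg *bootstrapv1.KubeadmConfig) (bool, error) {
	var pools expv1.MachinePoolList
	if err := c.List(ctx, &pools, client.InNamespace(cfg.Namespace)); err != nil {
		return false, err
	}
	for _, mp := range pools.Items {
		ref := mp.Spec.Template.Spec.Bootstrap.ConfigRef
		if ref != nil && ref.Kind == "KubeadmConfig" && ref.Name == cfg.Name {
			return true, nil
		}
	}
	return false, nil
}
```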
It seems problem 4 was already taken care of in issue #6563 (PR #8667) and I was referring to older code above. Nevertheless, I have to look into this again since the PR's test change only covers a changed name of …
The mentioned problems 4+5 (#8858 (comment)) are fixed via new PRs. CAPI was already fine, and I only added a test (#9616). CAPA required a fix (kubernetes-sigs/cluster-api-provider-aws#4589). With this, changing the …
One idea was confirmed as reasonable by several people in the office hours: treat changing the bootstrap object reference as a desire to roll out. I had the doubt that both the old and the new bootstrap config (with different names) could produce the same output, and the user could be surprised that all nodes get replaced. But then again, @CecileRobertMichon and others rightfully commented "Why would you create a new one with the same content?", since that seems unusual. Next, I will try this solution and see whether other opinions appear as to how this can be solved. And @vincepri suggested factoring out a new issue for splitting off the random, long-lived bootstrap token from …
@AndiDog In #8842, we're adding MachinePool Machines to DockerMachinePools. This would affect the ClusterClass rollout e2e test, as we wouldn't be able to roll out changes to the KubeadmConfig to the MachinePool Machines, so I've made a change to exclude them for now. But in the future, we'd want to revert this change. Tagging @willie-yao as well.
kubernetes-sigs/cluster-api-provider-aws#4619, which solves the issue for CAPA, will soon be merged. It's already working well in our fork. CAPI code also seems to be fine by now – I didn't run into more issues. For CAPZ, kubernetes-sigs/cluster-api-provider-azure#2972 might be a related problem, but I'm not currently involved in investigating that. Whether the …
For CAPA, the fix is now merged (kubernetes-sigs/cluster-api-provider-aws#4619). Given that CAPI code was already fixed earlier (via #8667), I think we can close this.

/close
@AndiDog: Closing this issue. In response to this: …
@AndiDog Sorry, it's a pretty long issue with various linked issues/PRs etc. Can you please summarize what a user would have to do to roll out a change to a KubeadmConfig with a regular cluster (not using ClusterClass)? Is it as simple as directly modifying the KubeadmConfig object that is linked in a MachinePool? (I'm trying to figure out if we got the ClusterClass implementation right.)
@sbueringer Indeed, many ideas and suggestions. This is the one that got implemented so far: change the MachinePool's bootstrap configRef to point to a new KubeadmConfig object. In practice, that means a user's KubeadmConfig changes need to be applied as a newly named KubeadmConfig object, with the MachinePool's bootstrap configRef updated to reference it.
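One way to automate that rotation, echoing the hash-suffix idea earlier in this thread, is to derive the new object name from a hash of the desired spec. A hedged sketch (hypothetical helper, not part of CAPI):

```go
// Hypothetical helper, not part of CAPI: derive a stable name suffix from the
// desired KubeadmConfig spec so that any spec change produces a new object name,
// which in turn rotates the MachinePool's bootstrap configRef.
package example

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"

	bootstrapv1 "sigs.k8s.io/cluster-api/bootstrap/kubeadm/api/v1beta1"
)

func kubeadmConfigNameForSpec(base string, spec bootstrapv1.KubeadmConfigSpec) (string, error) {
	raw, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	// A short suffix is enough to distinguish revisions, e.g. "workers-3f9a2c1d".
	return fmt.Sprintf("%s-%s", base, hex.EncodeToString(sum[:])[:8]), nil
}
```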
Thank you very much. This means that, as of today, ClusterClass will not be able to roll out changes to BootstrapConfigs for machine pools. I'll open a new issue for that.
What would you like to be added (User Story)?

As operator, I want KubeadmConfig.spec.{files,preKubeadmCommands,...} changes to have an effect on MachinePool-created nodes, resulting in server instance recreation.

Detailed Description

Discussed in office hours 2023-06-14 (notes, main points copied into this issue below).

Situation: a MachinePool manifest references AWSMachinePool and KubeadmConfig (a very regular machine pool config).

Expectation: Changing KubeadmConfig.spec.* should lead to recreating ("rolling") nodes. With infra provider CAPA, nothing happens at the moment. Here's why.

Problem 1: CAPI's KubeadmConfigReconciler does not immediately update the bootstrap secret once KubeadmConfig.spec changes, but only once it rotates the bootstrap token (purpose: new machine pool-created nodes can join the cluster later on). This means several minutes of waiting for reconciliation.

Problem 2: CAPA (and likely all other infra providers) does not watch the bootstrap secret, so it cannot immediately react to KubeadmConfig.spec changes either.
- A workaround is to change MachinePool.spec.template.spec.bootstrap.dataSecretName every time, because that triggers reconciliation for the MachinePool object (machinepool_controller_phases.go code).
- Once we add MachinePool support in ClusterClass, we have to decide what the "ideal" way to roll out BootstrapConfig is.

Problem 3: The bootstrap secret contains both the "how to set up this server" init data (e.g. cloud-init / ignition) and the random bootstrap token by which nodes join the cluster. If only the token gets refreshed (DefaultTokenTTL is 15 minutes), we don't want nodes to be recreated, since that would recreate all nodes every few minutes.

Anything else you would like to add?

Label(s) to be applied

/kind feature
/area bootstrap
/area machinepool