📖 ✨ 🧑‍🤝‍🧑 add proposal for Node Bootstrapping working group #11407

Draft
t-lo wants to merge 1 commit into main from t-lo/propose-wg-node-provisioning

Conversation

@t-lo (Contributor) commented Nov 12, 2024

What this PR does / why we need it:

Propose a working group for node bootstrapping and cluster provisioning.
The need for this working group originated from an ongoing discussion around separating cluster provisioning and node bootstrapping, as stated in the WG's User Story.

Which issue(s) this PR fixes

CC

Tags

/area provider/bootstrap-kubeadm
/area bootstrap

/kind documentation
/kind proposal

@k8s-ci-robot added the do-not-merge/work-in-progress (PR should not merge because it is a work in progress) and cncf-cla: yes (the PR's author has signed the CNCF CLA) labels on Nov 12, 2024
@k8s-ci-robot added the do-not-merge/needs-area (PR is missing an area label) label on Nov 12, 2024
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign sbueringer for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot (Contributor) commented:

This PR is currently missing an area label, which is used to identify the modified component when generating release notes.

Area labels can be added by org members by writing /area ${COMPONENT} in a comment

Please see the labels list for possible areas.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot (Contributor) commented:

Hi @t-lo. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the needs-ok-to-test (an org member must verify the PR is safe to test) and size/M (changes 30-99 lines, ignoring generated files) labels on Nov 12, 2024
@t-lo (Contributor, Author) commented Nov 12, 2024

"@elmiko" "@eljohnson92" I took the liberty to add you as co-stakeholders to the WG proposal - IIRC you expressed interest in participating. I hope that's OK?

@@ -0,0 +1,88 @@
---
title: ClusterAPI Node Provisioning working group
@johananl (Member) commented Nov 12, 2024

I'm a bit concerned about carrying over the conflation between bootstrap and provisioning into the working group...
The filename and the title on line 2 say "node provisioning", but the h1 heading below says "node bootstrapping".

Also, I'm not sure what we mean by "cluster provisioning".

To clarify, the two concerns which are currently intertwined are the following:

  1. Generating cloud-config or Ignition configuration and executing it on nodes to perform OS customizations, manipulate files on disk, etc.
  2. Generating kubeadm configuration and executing kubeadm to turn a host into a k8s node.

I don't care too much what we call each concern as long as the two are clearly demarcated and we know which one we're discussing at any given moment.

In my WIP design document I offered to call no. 1 "provisioning" and no. 2 "bootstrap".
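
For illustration (a minimal sketch with made-up values), the two concerns produce different kinds of artifacts. Concern no. 1 produces OS-level configuration such as a cloud-config fragment:

#cloud-config
write_files:
  - path: /etc/sysctl.d/90-inotify.conf
    permissions: "0644"
    content: |
      fs.inotify.max_user_instances = 8192

Concern no. 2 produces kubeadm configuration such as a JoinConfiguration (illustrative values):

apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: "10.0.0.10:6443"
    token: "abcdef.0123456789abcdef"
    unsafeSkipCAVerification: true
nodeRegistration:
  kubeletExtraArgs:
    cloud-provider: external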

@t-lo (Contributor, Author) replied:

Good catch, I was trying to use "cluster provisioning" and "node bootstrapping" consistently but apparently it escaped me in the title.

@t-lo (Contributor, Author) replied:

Clarified wording in the PR.

@t-lo force-pushed the t-lo/propose-wg-node-provisioning branch 2 times, most recently from 49bf126 to bf5ce21 on November 12, 2024 16:46
@t-lo force-pushed the t-lo/propose-wg-node-provisioning branch from 22ad278 to 6353aad on November 13, 2024 07:29
@t-lo changed the title from "docs: add proposal for Node Bootstrapping working group" to "📖 ✨ 🧑‍🤝‍🧑 add proposal for Node Bootstrapping working group" on Nov 13, 2024
docs/community/20241112-node-bootstrapping.md (review comment resolved)
Comment on lines 38 to 43
Initial focus of the working group is on separating node bootstrapping and node
OS configuration / customisation ("node provisioning" from here on).
At the time of writing, bootstrapping and provisioning are architecturally
intertwined; both are implemented in the bootstrap provider.
This increases maintenance effort and effectively prevents re-using provisioning
features across bootstrap providers.
@fabriziopandini (Member) commented Nov 14, 2024

I might be wrong, but I think this statement hints more at a solution (separation) than at a problem.

Based on my understanding, we are trying to solve two problems:

  • offer users more combinations of provisioners/bootstrappers;
  • ensure those combinations can evolve freely, benefiting from the greatest and latest of the underlying technologies.

Separation is a possible way to get there, but IMO, users don't care about separation at all.
In fact, users will always need both a provisioner and a bootstrapper to get a working cluster.

If we split the provisioner and the bootstrapper into two separate components, I fear we are doing a disservice to users, to ourselves (as project maintainers), and to the community that builds on top of Cluster API. Why?

  • we are adding another dimension of complexity to the system, which makes this change highly disruptive for users;
  • we are imagining a change that will impact not only users but the entire CAPI ecosystem; think e.g. of the CAPI operator, or all the products built on top of Cluster API;
  • we are adding an operational burden on users (they have to take care of yet another component/provider, its upgrades, its compatibility matrix, its CVEs, etc.);
  • we are imagining a change so complex and extensive that, being very pragmatic, the chances of it actually happening drop drastically (e.g. it will impact clusterctl, ClusterClass, runtime extensions, all the controllers, almost everything; in fact, this is why this problem has been lying around for so long).

Frankly speaking, this seems like too much for everyone to digest.

The good news is that I think it is possible to look at the problem above from a different angle and find a better solution.

How do we solve the two problems above (offer more combinations / ensure those combinations can evolve freely) without introducing separation?

That's up to the working group to figure out, but if I can give my two cents, what we are looking for is a solution that doesn't impact users but makes it easier to implement and evolve different provisioners (e.g. we want to bump Ignition!), while reusing existing bootstrappers (e.g. kubeadm).

I think the key point in the sentence above is software reuse.

So, a possible solution might be a combination of federated development / separate repositories (providers, as we call them in CAPI) plus something that makes it easy to reuse code, e.g. making the kubeadm provider a library (details TBD of course, but I hope you get the idea).

I'm pretty sure that if there had been something like a kubeadm library three years ago, today we would have a "kubeadm Ignition provider" in a separate repository and everyone would be happy. Why not make this happen now?

Please also see https://docs.google.com/document/d/1Fz5vWwhWA-d25_QDqep0LWF6ae0DnTqd5-8k8N0vDDM/edit?disco=AAABX57uOnU where I make some other considerations on the same point.

Happy to chat about it if you are interested in this angle.

@t-lo (Contributor, Author) replied:

These are good points; I will rework the proposal accordingly. Admittedly, I wrote the first draft with the existing proposal in mind.

Thank you for your review!

@johananl (Member) commented Nov 14, 2024

Thanks @fabriziopandini. I completely understand the concerns you've raised and am likewise worried by them. I'm also 100% for maximizing code reuse.

If I understand correctly, what you are proposing is to keep provisioners (e.g. Ignition) tightly coupled with bootstrappers (e.g. kubeadm) and offer multiple "flavors" of bootstrap providers for each bootstrapper, one for each provisioner. So, for example, we would have a kubeadm + cloud-init bootstrap provider and a kubeadm + Ignition bootstrap provider in separate repositories.

If that's what you meant, I'm seeing the following problems with this approach:

  1. The Cartesian product of bootstrappers and provisioners can grow very large. True, with 1 bootstrapper and 2 provisioners we would have only 2 repositories to maintain, but what if we have 3 bootstrappers and 3 provisioners? That's 9 repositories already. AFAICT this isn't premature optimization, since we seem to already have 8 supported bootstrappers and at least 3 provisioners (cloud-init and Ignition, with CloudBase Init either implemented or planned; not sure). That's 24 repos if we don't separate. In addition, we should think not only about the number of repos but also about project personas and separation of concerns: IMO, ideally the maintainer of a provider should only have to touch their codebase when something changes in the technology specific to that provider. For example, as the maintainer of the kubeadm provider, I don't want to worry about changes in Ignition or cloud-init; that's for the maintainers of the relevant provisioner providers to take care of.

  2. Even if we decide the number of repos is manageable, it's not clear to me how we would achieve good code reuse when we have cloud-config fields embedded in types such as KubeadmConfigSpec. Example:

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: capi-azure-control-plane
  namespace: default
spec:
  kubeadmConfigSpec:
    ...
    diskSetup:
      filesystems:
      - device: /dev/disk/azure/scsi1/lun0
        extraOpts:
        - -E
        - lazy_itable_init=1,lazy_journal_init=1
        filesystem: ext4
        label: etcd_disk
      ...
      partitions:
      - device: /dev/disk/azure/scsi1/lun0
        layout: true
        overwrite: false
        tableType: gpt

In the snippet above, diskSetup is a field under KubeadmConfigSpec. This field is specific to cloud-init, which means KubeadmConfigSpec assumes cloud-init is the only provisioner. True, we could implement "translators" under the hood for Ignition and any other provisioner, but there is no guarantee of feature parity among all provisioners, so IMO it's only a matter of time until we run into issues where some feature is supported by cloud-init but not by other provisioners.
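
As a rough illustration (a hand-written, Butane-style sketch; field names are approximate and this is not the output of any existing translator), the diskSetup above would have to become something like the following on an Ignition-provisioned node. Note that cloud-init's layout and tableType have no one-to-one counterpart:

storage:
  disks:
    - device: /dev/disk/azure/scsi1/lun0
      wipe_table: false
      # no direct equivalent of cloud-init's "layout: true" / "tableType: gpt"
      partitions:
        - label: etcd_disk
  filesystems:
    - device: /dev/disk/azure/scsi1/lun0
      format: ext4
      label: etcd_disk
      # rough equivalent of cloud-init's extraOpts (mkfs options)
      options:
        - "-E"
        - "lazy_itable_init=1,lazy_journal_init=1"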

I think we have a major design flaw in the current core API (mixing bootstrap and provisioning in the data model). True, it will be very disruptive to change this, but from my understanding you're justifiably concerned about building more complexity (e.g. Ignition 3) on top of a suboptimal foundation, and to me it seems we'd be doing exactly that if we keep the API as is and keep adding layers on top of it.

Also, I know that in the past some maintainers have expressed similar concerns: #4172 (comment)

We already have a bootstrapper contract. What I'm proposing is to formalize a provisioner contract, add the first two provisioner providers (cloud-init and Ignition), and make the necessary breaking changes in the relevant CRDs to support a loose coupling of bootstrappers and provisioners, while devising a migration path for users. Such a contract would make bootstrappers and provisioners fully independent.

Bootstrapper contract TL;DR: "The provider should generate bootstrap configuration and store it at a well-known location in the k8s datastore (secrets, CRs or whatever)."

Provisioner contract TL;DR: "The provider should read the bootstrap configuration as well as any user-specified custom provisioning code, generate unified configurations for master and worker nodes, and store them at a well-known location in the k8s datastore so that infra providers can read them and inject them into machines."

This way we can flexibly combine bootstrappers and provisioners by relying solely on k8s constructs, make everything more maintainable and likely become future proof in terms of evolving both bootstrappers and provisioners over time.
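
Purely as an illustration of how such a pair of contracts could surface as Kubernetes objects (every group, kind, and field name below is hypothetical and only meant to convey the idea; it is not an existing Cluster API contract):

# 1. The bootstrapper (e.g. kubeadm) stores its output in a well-known Secret
#    (names are illustrative).
apiVersion: v1
kind: Secret
metadata:
  name: worker-md-0-bootstrap
  namespace: default
stringData:
  value: |
    # kubeadm JoinConfiguration would go here
---
# 2. A provisioner object (hypothetical kind) reads that Secret, merges it with
#    user-specified OS customisation, and emits the final userdata Secret that
#    the infra provider injects into the machine.
apiVersion: provisioning.cluster.x-k8s.io/v1alpha1  # hypothetical API group/version
kind: IgnitionProvisionerConfig                      # hypothetical kind
metadata:
  name: worker-md-0-provisioning
  namespace: default
spec:
  bootstrapSecretRef:
    name: worker-md-0-bootstrap
  storage: {}                            # user OS customisations would go here
  outputSecretName: worker-md-0-userdata # consumed by the infra provider

An infra provider would then only need the name of the final userdata Secret, much like it consumes the bootstrap data Secret by name today.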

To summarize, I think that if we avoid tackling the design problem head-on and look for ways around it, we'll find ourselves in a similar situation later on, only then we'd be disrupting even more areas of the project.

That said, I'm open to changing my mind on this, and from Microsoft's perspective if you see a shorter/simpler path which would allow us to graduate Ignition support to GA and add v3 support - we'd be happy of course 🙂

@sbueringer (Member) commented Nov 15, 2024

Adding to Fabrizio's comment.

I think we also have to keep in mind that Cluster API is at a point where we can't freely re-shuffle fundamental pieces of the project anymore. We made some very disruptive changes in v1alpha2 and v1alpha3 (4-5 years ago). We declared Cluster API production ready 3 years ago ("Kubernetes Cluster API reaches production readiness with version 1.0").

We have to be very, very conscious of all the folks relying on Cluster API as a stable foundation (because we told them it is, 3 years ago). We really have to be careful not to blow up the ecosystem by making disruptive changes.

To summarize, I think we have to be very very careful with anything that basically comes down to a Cluster API v2.

(this should come down to evolving Cluster API responsibly and keeping everything backwards compatible, xref: https://cluster-api.sigs.k8s.io/user/manifesto#the-right-to-be-unfinished)

@fabriziopandini (Member) commented Nov 15, 2024

@johananl I really appreciate the fact that you clearly went miles down this path, but I, and probably others as well, have to catch up on your work and make up our own minds about this problem (and this was probably one of the key drivers behind the idea of proposing a working group, I guess).

What I'm advocating for is that this working group should be open to other options, because if you tie this charter to a single option, you will implicitly prevent or limit others' contributions.

Also, please note that I'm bringing this up mostly because, in my experience, limiting others' contributions in the initial phase might become an issue for the success of the working group down the line, when broader consensus will be required, e.g. to get proposals approved.

I would defer all other discussion to the working group, but I also don't want to be impolite and seem to be ignoring your other comments, so I'll drop a few points of discussion for the working group sessions.

WRT the initial assumption about the use cases in front of us, I get that theoretically we have to address a possibly high number of combinations, but on the other hand, as a matter of fact, only the Ignition team has raised this concern so far.

I'm not sure why, but I think we should consider that either not all combinations of provisioners/bootstrappers make sense, or there is simply no demand for many of those combinations (but getting a better understanding of the problem scope is, again, part of the working group's goal IMO).

WRT the possible solution, frankly speaking I think it is too early to get into the weeds of different options without a better understanding of the context.

But just to avoid misunderstandings: I'm well aware that the current KubeadmConfig API is not in an ideal state for what we are discussing. As a matter of fact, it wasn't designed with reuse in mind.

However, just to be clear, what I'm advocating for is being open to exploring options that make reuse possible while minimising the number of breaking changes and the impact on users.

The idea I dropped in my previous note is just meant to make the case that other options might exist; it should not be taken as a fully defined alternative (sorry if this was not clear in my previous comment).

But finding a solution to this problem is a discussion I will be happy to have if the charter of this working group opens up to such options. Otherwise, why even start a working group if it cannot shape the target solution?

@t-lo force-pushed the t-lo/propose-wg-node-provisioning branch from 6353aad to f61f4ee on November 15, 2024 15:19
@t-lo (Contributor, Author) commented Nov 15, 2024

Thank you Johanan, Fabrizio, and Stefan for tuning in! This is immensely helpful.
I also love the fact that we're already iterating over design thoughts; this is exactly the momentum we were hoping for with the working group.

Made a few changes to the proposal; reworked the whole user story part to focus on goals instead of implementations, and rephrased the "problem statement" section a bit to not hint at a solution when describing the issue.

Added a new section on stability and compatibility. This really was the proverbial elephant in the room for me, since in Flatcar we put a massive (occasionally painful) focus on never breaking user workloads, ever; kudos to Stefan for calling this out. I'll make sure to hold the working group's proposals to an equally high standard.

I don't think we're quite there yet but we're definitely making progress. Ready for another round of feedback!

@t-lo force-pushed the t-lo/propose-wg-node-provisioning branch from f61f4ee to dbc9ed4 on November 15, 2024 15:31
@k8s-ci-robot added the size/L label (changes 100-499 lines, ignoring generated files) and removed the size/M label on Nov 15, 2024
Co-Authored-by: Johanan Liebermann <[email protected]>
Signed-off-by: Thilo Fromm <[email protected]>
@t-lo force-pushed the t-lo/propose-wg-node-provisioning branch from dbc9ed4 to 857dd3d on November 15, 2024 16:29
Labels
cncf-cla: yes, do-not-merge/needs-area, do-not-merge/work-in-progress, needs-ok-to-test, size/L
5 participants