Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support taint propagation from MachinePool to the underlying nodes #11175

Open
Archisman-Mridha opened this issue Sep 12, 2024 · 5 comments
Open
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@Archisman-Mridha
Copy link

Archisman-Mridha commented Sep 12, 2024

What would you like to be added (User Story)?

It'll be very helpful if we have this feature support :

Declare what taints should be applied to the underlying node(s) in the Machine / MachineDeployment / MachinePool spec.

Detailed Description

Especially, when I'm creating a MachinePool. I want to specify taints in the MachinePool spec, and those taints will be applied to each node managed by that MachinePool.
Those nodes will then be reserved for specific types of workloads which have tolerations against those node taints.

This is will help manage node taints in a declarative manner using GitOps.

Anything else you would like to add?

I've come up with a Plan Of Action here : Obmondo@ab19c36

Label(s) to be applied

/kind feature
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-priority Indicates an issue lacks a `priority/foo` label and requires one. labels Sep 12, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Sep 12, 2024
@fabriziopandini fabriziopandini added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Sep 17, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates an issue lacks a `priority/foo` label and requires one. label Sep 17, 2024
@fabriziopandini
Copy link
Member

fabriziopandini commented Sep 17, 2024

If it is only for machine creation, the current solution is to rely on the capabilities of bootstrap providers, but I'm aware of the fact that not all the MP implementations use a bootstrap providers.

If we are looking for a more generic story, that do not apply only on creation, this requires some design, especially for figuring out the interactions between the new "taint propagation" feature and taints defined by bootstrap providers

In order to define next steps, might be worth taking a look at https://cluster-api.sigs.k8s.io/developer/architecture/controllers/metadata-propagation and underlying proposal:

cc @jackfrancis @mboersma @willie-yao

cc @elmiko this discussion might be useful to revive #7685

@Archisman-Mridha
Copy link
Author

@fabriziopandini Thanks for the resources you linked. I wasn't aware of the JoinConfiguration.NodeRegistration.Taints field in the KubeadmConfigTemplate custom resource. I can use that to apply taint to newly minted worker nodes (managed by a MachinePool in my case). Currently this is what my company needs.

However there will be a downside - we can't in-place change (add / remove) taints for those worker nodes. And I'm interested in contributing the solution for this. We want the user to be able to specify the taints in the Machine spec. ClusterAPI will then sync those taints between the Machine resource and the corresponding Kubernetes node.

That's what I'm trying to implement here : Obmondo@ab19c36. Can you please give any opinion on whether I'm going in the right direction or not ?

@JoelSpeed
Copy link
Contributor

JoelSpeed commented Sep 18, 2024

However there will be a downside - we can't in-place change (add / remove) taints for those worker nodes

Removing taints becomes problematic if you have the potential for multiple actors to add taints (which is always the case, think about cordoning a node using kubectl).

Typically in cases where we've had fields like this before, you'd omit the removal side of things, and the taints would be added if they are in the spec, and manually removed if removed from spec.

However, now that we have SSA, in theory we can make this work. If we were to use SSA to apply the taints, then whenever we apply the list of taints from our side, any that we applied (and therefore own according to field managers) that we no longer specify would be removed for us. If they are later re-added by something other, then they would be the field owner now and we wouldn't know that we previously owned it, so, this could be safe if we use SSA

Edit: Scratch that, taints are atomic 😢

@fabriziopandini
Copy link
Member

fabriziopandini commented Sep 18, 2024

@JoelSpeed good catch about taints being atomic/the fact that we cannot remove taint automatically.

@Archisman-Mridha

That's what I'm trying to implement here : Obmondo@ab19c36. Can you please give any opinion on whether I'm going in the right direction or not ?

I don't have bandwidth to research this ATM, but at first sight

  • You should take care of the full propagation story, down from CC to Machines and nodes (the change can't be limited to the machine/machine controller)
  • You should figure out the interactions between the new "taint propagation" feature and taints defined by bootstrap providers (and consider that not all the bootstrap providers are CABPK)
  • If you are interested in getting this to work for MachinePool, this could require some additional considerations. e.g. MachinePool machines are implemented partially and have some caveats, in-place propagation is not yet implemented for MachinePools and probably more (Disclaimer: I'm not using MachinePools, I just help there due to lack of maintainers)

Considering all of this, I strongly suggest to do some design work before implementing, identifying API types changes, controller impacted etc.

It could be a comment on this thread for starting, then we can figure out if it is also needed to amend some of the doc/proposals linked above or else

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Projects
None yet
Development

No branches or pull requests

4 participants