Consistently propagate down timeouts from MD => MS => Machines #10753
Comments
This issue is currently awaiting triage. If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@enxebre @fabriziopandini @chrischdi @vincepri
Isn't this a good default behavior? Ideally, if a MachineSet is stuck in deleting, I should be able to change only that specific one without impacting the new one?
Good question. Timeouts today don't trigger a rollout (i.e. a new MS) and also don't trigger a separate revision. Might be annoying to have to manually go to the MS and update the timeouts there when you otherwise primarily/only interact with the MD.
Maybe one important point to consider: going forward we won't have old MachineSets that stay around forever. All of them will be in "scaling down" until they are gone (once we've dropped revision management). So I think our only goal for them is to properly scale them down. There is probably no reason that they have to preserve their previous timeout configuration.
The proposed steps sgtm Stefan.
I think it would be suboptimal UX for the use cases where the MachineDeployment is the consumer-facing API to force user interaction with the MachineSets that are still managed by it.
I missed this part, what are we doing for rollbacks?
Found #10479 — which I wasn't aware of, and I guess now this issue makes a bit more sense to me on the reasoning 😄
Goal
The goal of this issue is to consistently propagate down timeouts (NodeDrainTimeout, NodeDeletionTimeout, ...) from MDs to MSs to Machines. This is desirable so that users can still change timeouts even if a Machine is, e.g., stuck in draining.
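For illustration, here is a minimal sketch of what "propagating down" means for these fields, assuming the v1beta1 API types where the MD template, the MS template, and the Machine spec all carry the same timeout fields. This is not CAPI's actual implementation; the helper names are made up, and NodeVolumeDetachTimeout is used as one example of the "..." above.

```go
// Hedged sketch, not CAPI's actual code: in-place copies of the timeout fields
// from an MD's machine template to an MS's machine template and on to a Machine.
package propagation

import (
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// copyTimeoutsToMachineSet copies the MD template timeouts onto the MS template.
func copyTimeoutsToMachineSet(md *clusterv1.MachineDeployment, ms *clusterv1.MachineSet) {
	ms.Spec.Template.Spec.NodeDrainTimeout = md.Spec.Template.Spec.NodeDrainTimeout
	ms.Spec.Template.Spec.NodeDeletionTimeout = md.Spec.Template.Spec.NodeDeletionTimeout
	ms.Spec.Template.Spec.NodeVolumeDetachTimeout = md.Spec.Template.Spec.NodeVolumeDetachTimeout
}

// copyTimeoutsToMachine copies the MS template timeouts onto an existing Machine.
func copyTimeoutsToMachine(ms *clusterv1.MachineSet, m *clusterv1.Machine) {
	m.Spec.NodeDrainTimeout = ms.Spec.Template.Spec.NodeDrainTimeout
	m.Spec.NodeDeletionTimeout = ms.Spec.Template.Spec.NodeDeletionTimeout
	m.Spec.NodeVolumeDetachTimeout = ms.Spec.Template.Spec.NodeVolumeDetachTimeout
}
```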
We had a first PR which ensures a MachineSet propagates down the timeouts to Machines which are in deleting: #10589
But there are a few other cases, as described here: #10589 (inlining below for convenience)
The following specifically focuses on cases where Machines are deleted by the MS controller.
Case 1. MD is deleted
The following happens:
=> The MS will already be gone when the deletionTimestamp is set on the Machines. In this case folks would have to modify the timeouts on each Machine individually; because the MS doesn't exist anymore, it's not possible to propagate timeouts down from the MS to the Machines.
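A hedged sketch of that manual workaround (patching the deleting Machines directly once the MS is gone), using controller-runtime's client. The namespace, label selector, MD name, and chosen timeout value are placeholders/assumptions, not something prescribed by the issue.

```go
// Hedged sketch of the manual workaround: with the MS already gone, patch the
// timeout on each (deleting) Machine directly.
package main

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
)

func main() {
	ctx := context.Background()

	scheme := runtime.NewScheme()
	_ = clusterv1.AddToScheme(scheme)
	c, err := client.New(config.GetConfigOrDie(), client.Options{Scheme: scheme})
	if err != nil {
		panic(err)
	}

	// Select the Machines that belonged to the deleted MD (label key/value assumed).
	machines := &clusterv1.MachineList{}
	if err := c.List(ctx, machines,
		client.InNamespace("default"),
		client.MatchingLabels{"cluster.x-k8s.io/deployment-name": "my-md"},
	); err != nil {
		panic(err)
	}

	// Lower the drain timeout on each Machine so stuck drains give up sooner.
	for i := range machines.Items {
		m := &machines.Items[i]
		patch := client.MergeFrom(m.DeepCopy())
		m.Spec.NodeDrainTimeout = &metav1.Duration{Duration: 1 * time.Minute}
		if err := c.Patch(ctx, m, patch); err != nil {
			panic(err)
		}
	}
}
```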
Case 2. MD is scaled down to 0
The following happens:
This use case was addressed by: #10589
Case 3. MD rollout
The following happens:
=> In this scenario, today the MD controller does not propagate the timeouts from the MD to all MSs (only to the new/current one, not to the old ones). So the Machines of the old MSs won't get new timeouts that are set on the MD.
Implementation
To address all scenarios I would propose to always propagate timeouts from MD => MS => Machine; making that happen requires changes in both the MD controller and the MS controller.
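Not the actual implementation, but a hedged sketch of the overall shape this could take in the reconcilers, reusing the copyTimeouts* helpers from the sketch under "Goal" (the sync* helper names are made up):

```go
// Hedged sketch (same package as the copyTimeouts* helpers above), not the
// actual CAPI implementation: sync timeouts in place on every reconcile so a
// timeout change never requires a rollout or a new revision.
package propagation

import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// syncTimeoutsToMachineSets propagates the MD timeouts to *all* owned
// MachineSets, old and new, which covers case 3 above.
func syncTimeoutsToMachineSets(ctx context.Context, c client.Client, md *clusterv1.MachineDeployment, machineSets []*clusterv1.MachineSet) error {
	for _, ms := range machineSets {
		patch := client.MergeFrom(ms.DeepCopy())
		copyTimeoutsToMachineSet(md, ms)
		if err := c.Patch(ctx, ms, patch); err != nil {
			return err
		}
	}
	return nil
}

// syncTimeoutsToMachines propagates the MS timeouts to its Machines; whether
// this targets all Machines or only the deleting ones is a design decision.
func syncTimeoutsToMachines(ctx context.Context, c client.Client, ms *clusterv1.MachineSet, machines []*clusterv1.Machine) error {
	for _, m := range machines {
		patch := client.MergeFrom(m.DeepCopy())
		copyTimeoutsToMachine(ms, m)
		if err := c.Patch(ctx, m, patch); err != nil {
			return err
		}
	}
	return nil
}
```

Doing this as an in-place patch on every reconcile, rather than only at rollout time, is what keeps timeout changes from creating a new MS or a new revision, in line with the discussion above.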
Follow-up: