v1: Bring back a "soft expiration" mechanism #1750
Comments
Can you add more details about why this doesn't work for you? Why do you view expiration as a graceful mechanism rather than a forceful one? What do you use it for? Is it better to use a different mechanism?
Added more context. Maybe giving the user a choice between soft and forceful could be good?
In our case it's simply a question of being able to control when expiration happens. So essentially having a maintenance window to make sure expirations don't happen during certain times of the day or only during weekends. But perhaps there are other mechanisms for achieving this?
We consider this feature important for providing a runtime environment that is both secure and zero-downtime.
This matches our use case as well: we would like to rotate some nodes every X hours and still have graceful shutdowns. We are getting interruptions for non-HA-compatible workloads because of the current forceful mechanism. I don't see any good alternatives for rotating the nodes gracefully (except some solutions that would involve adding code and complexity on our end). Maybe a good way forward would be to let the user specify graceful/forceful for expireAfter, with an optional graceful termination timeout? I guess the motivation behind the current behavior is that Karpenter wants to ensure that the node really is expired right away once the max lifetime is reached. I also guess that one could argue that
I was unaware that expireAfter no longer respects PDBs; I thought nodes would be immediately marked for disruption but terminations would still be performed gracefully. Is that not true? Also, @dnmgns, would adding a schedule option for do-not-disrupt help? I opened #1719, which might be related to your use case.
Yes, this is all about controlling when we introduce churn, even if that reason is to enforce policy.
We have a similar use case and would also like to get back the previous
We would also like to get back a way to disable the expiry on existing nodes that are set to expire (e.g. if we need to disable node rotation during incidents). We used to be able to set
Description
What problem are you trying to solve?
expireAfter should respect disruption budgets in v1, as it did in 0.37.
We do want to keep our nodes "fresh" for security reasons, but we only want these rotations to happen during working hours, in order to minimise the chance (even if tiny) of something going wrong and getting paged during nights/weekends.
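To make the requested behaviour concrete, here is a minimal sketch of a "soft" expiration check. It is not Karpenter code or its API: the function names, the Monday-to-Friday 09:00-17:00 window, and the idea of passing the node's creation time are all assumptions made up for illustration.

```python
from datetime import datetime, time, timedelta

# Hypothetical working-hours window (an assumption, not a Karpenter setting):
# Monday to Friday, 09:00 inclusive to 17:00 exclusive.
WORKDAYS = {0, 1, 2, 3, 4}  # datetime.weekday(): Monday is 0
WINDOW_START = time(9, 0)
WINDOW_END = time(17, 0)


def in_maintenance_window(now: datetime) -> bool:
    """Return True if `now` falls inside the hypothetical working-hours window."""
    return now.weekday() in WORKDAYS and WINDOW_START <= now.time() < WINDOW_END


def may_expire(created_at: datetime, expire_after: timedelta, now: datetime) -> bool:
    """Soft expiration: a node past its lifetime is rotated only inside the window."""
    expired = now - created_at >= expire_after
    return expired and in_maintenance_window(now)
```

With this shape, a node whose lifetime runs out on a Saturday is left alone until the window next opens on Monday morning, which is the difference from a forceful expiration that acts as soon as the lifetime is reached.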
How important is this feature to you?
6 out of 10.
See: aws/karpenter-provider-aws#7122