Enhancement: allow tuning tolerationSeconds default value #489

Open
lmiccini opened this issue Apr 8, 2024 · 0 comments

lmiccini commented Apr 8, 2024

While looking into node failure detection and pod failover behavior and performance, we came across:

https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/

The tolerationSeconds parameter allows you to specify how long a pod stays bound to a node that has a node condition. If the condition still exists after the tolerationSeconds period, the taint remains on the node and the pods with a matching toleration are evicted. If the condition clears before the tolerationSeconds period, pods with matching tolerations are not removed.

As far as I can tell, we don't explicitly set tolerationSeconds anywhere, so each pod uses the default of 300s, and workloads can take more than five minutes to be rescheduled after a node failure.

We plan to document this and provide recommendations on how to speed up failover where required. Still, we thought it was worth raising here to see whether we want to expose this parameter so that each operator can set a more appropriate default value.
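
To make the request concrete, here is a rough sketch of the tolerations a pod would need to carry for a shorter failover window. It assumes the relevant taints are `node.kubernetes.io/not-ready` and `node.kubernetes.io/unreachable` with the `NoExecute` effect, which is where Kubernetes applies the 300s default. The helper names (`failover_tolerations`, `apply_tolerations`) and the 30s value are hypothetical and only for illustration; this is not how any of our operators currently build pod specs.

```python
import json

# Taint keys Kubernetes places on a node that is not ready or unreachable.
NODE_FAILURE_TAINTS = (
    "node.kubernetes.io/not-ready",
    "node.kubernetes.io/unreachable",
)

# Value Kubernetes applies when the pod does not set its own toleration.
DEFAULT_TOLERATION_SECONDS = 300


def failover_tolerations(seconds=DEFAULT_TOLERATION_SECONDS):
    """Build NoExecute tolerations for the node-failure taints (hypothetical helper)."""
    return [
        {
            "key": key,
            "operator": "Exists",
            "effect": "NoExecute",
            "tolerationSeconds": seconds,
        }
        for key in NODE_FAILURE_TAINTS
    ]


def apply_tolerations(pod_spec, seconds):
    """Return a copy of pod_spec with the failover tolerations replaced.

    Tolerations for other taint keys are kept untouched.
    """
    kept = [
        t for t in pod_spec.get("tolerations", [])
        if t.get("key") not in NODE_FAILURE_TAINTS
    ]
    return {**pod_spec, "tolerations": kept + failover_tolerations(seconds)}


if __name__ == "__main__":
    # Placeholder pod spec; container name and image are made up.
    spec = {"containers": [{"name": "example", "image": "example:latest"}]}
    print(json.dumps(apply_tolerations(spec, 30), indent=2))
```

With something like this exposed as a tunable, an operator could pass its own value instead of inheriting 300s. Note that this only shortens the eviction delay once the taint is on the node; the time it takes for the node to be marked not-ready or unreachable is separate.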
