Early cordon #405
Signed-off-by: sbadiger <[email protected]>
Codecov Report
Additional details and impacted files:
```diff
@@            Coverage Diff             @@
##           master     #405      +/-   ##
==========================================
+ Coverage   39.09%   43.75%   +4.65%
==========================================
  Files           7        7
  Lines         931     1104     +173
==========================================
+ Hits          364      483     +119
- Misses        540      575      +35
- Partials       27       46      +19
```
Flags with carried forward coverage won't be shown.
Signed-off-by: Todd Ekenstam <[email protected]>
Signed-off-by: sbadiger <[email protected]>
* Process drain-failures at the end
Signed-off-by: ssheladiya <[email protected]>
Signed-off-by: sbadiger <[email protected]>
Co-authored-by: Venkata Gunapati <[email protected]>
LGTM
Currently, upgrade-manager supports two different strategies:
Eager mode - Eagerly wait for the replacement nodes, and only then drain and terminate the previous instances.
Lazy mode - Rotate (drain and terminate) the desired number of nodes without waiting for the replacement nodes.
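The difference between the two modes is mainly the order of operations for each batch. Below is a minimal Go sketch of that ordering. It assumes a hypothetical Strategy type and rotationSteps helper (neither is upgrade-manager's actual API) and simplifies each node to a drain followed by a terminate:

```go
package main

import "fmt"

// Strategy is a hypothetical enum for the two modes described above;
// it is not upgrade-manager's actual type.
type Strategy int

const (
	Eager Strategy = iota // wait for replacements before draining
	Lazy                  // drain and terminate without waiting
)

// rotationSteps lists, in order, the actions applied to one batch of old
// nodes. Readiness checks, cloud-provider calls and timeouts are omitted,
// and the actions are shown per node for simplicity.
func rotationSteps(s Strategy, batch []string) []string {
	var steps []string
	if s == Eager {
		// Eager: replacements must be available before any old node is touched.
		steps = append(steps, fmt.Sprintf("wait for %d replacement node(s)", len(batch)))
	}
	// Both modes then drain and terminate each node in the batch.
	// In lazy mode nothing waits for the replacement nodes.
	for _, node := range batch {
		steps = append(steps, "drain "+node, "terminate "+node)
	}
	return steps
}

func main() {
	batch := []string{"node-a"}
	fmt.Println("eager:", rotationSteps(Eager, batch))
	fmt.Println("lazy: ", rotationSteps(Lazy, batch))
}
```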
In both strategies, only the nodes in the current batch are cordoned. The batch size is determined by maxUnavailable in the RollingUpgrade CR (by default, maxUnavailable=1).
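Below is a minimal Go sketch of batch-only cordoning. It assumes a hypothetical in-memory Node type and treats maxUnavailable as a plain node count. Only the nodes picked for the current batch are marked unschedulable, so the rest of the old nodes keep accepting pods:

```go
package main

import "fmt"

// Node is a hypothetical, minimal stand-in for a Kubernetes node;
// only the fields this sketch needs are modelled.
type Node struct {
	Name          string
	Unschedulable bool // true once the node is cordoned
}

// nextBatch returns up to maxUnavailable nodes from the nodes still to be
// rotated. Values below 1 fall back to 1 (the default described above).
func nextBatch(pending []*Node, maxUnavailable int) []*Node {
	if maxUnavailable < 1 {
		maxUnavailable = 1
	}
	if maxUnavailable > len(pending) {
		maxUnavailable = len(pending)
	}
	return pending[:maxUnavailable]
}

// cordonBatch marks only the nodes in the current batch unschedulable,
// which is what both strategies do without early cordoning.
func cordonBatch(batch []*Node) {
	for _, n := range batch {
		n.Unschedulable = true
	}
}

func main() {
	pending := []*Node{{Name: "old-1"}, {Name: "old-2"}, {Name: "old-3"}}
	cordonBatch(nextBatch(pending, 1))
	for _, n := range pending {
		fmt.Printf("%s cordoned=%v\n", n.Name, n.Unschedulable)
	}
}
```

Running main cordons old-1 only. old-2 and old-3 remain schedulable until their own batch comes up.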
While the upgrade is in progress, new deployments or pods may still be scheduled onto the older nodes that have not yet been included in a rotation batch.
These newly scheduled pods then have to be restarted again when their underlying older nodes are rotated.
Draining these nodes also takes longer because of the additional new pods.
With the approach in this PR, all nodes in the respective instance group (IG) are cordoned when a RollingUpgrade CR is processed. As a result, newer pods will always be scheduled on the newer nodes while an upgrade is in progress.
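The effect of early cordoning can be illustrated with a toy first-fit scheduler. This sketch uses hypothetical Node, cordonAll and schedule helpers and is not the PR's implementation. It assumes replacement nodes are not flagged Old and therefore stay schedulable:

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a Kubernetes node.
type Node struct {
	Name          string
	Old           bool // true if the node still needs to be rotated
	Unschedulable bool // true once cordoned
}

// cordonAll sketches early cordoning: every old node in the instance group
// is cordoned up front, not just the nodes in the current batch.
func cordonAll(nodes []*Node) {
	for _, n := range nodes {
		if n.Old {
			n.Unschedulable = true
		}
	}
}

// schedule is a toy first-fit scheduler that skips cordoned nodes.
// The real kube-scheduler scores candidates; only the skip rule matters here.
func schedule(nodes []*Node) (*Node, error) {
	for _, n := range nodes {
		if !n.Unschedulable {
			return n, nil
		}
	}
	return nil, fmt.Errorf("no schedulable node")
}

// newGroup builds a small instance group: two old nodes and one replacement.
func newGroup() []*Node {
	return []*Node{{Name: "old-1", Old: true}, {Name: "old-2", Old: true}, {Name: "new-1"}}
}

func main() {
	// Batch-only cordon (maxUnavailable=1): old-2 is still eligible.
	batchOnly := newGroup()
	batchOnly[0].Unschedulable = true
	n, _ := schedule(batchOnly)
	fmt.Println("batch-only cordon ->", n.Name) // old-2

	// Early cordon: every old node is unschedulable up front.
	early := newGroup()
	cordonAll(early)
	n, _ = schedule(early)
	fmt.Println("early cordon ->", n.Name) // new-1
}
```

The first-fit rule exaggerates the problem, but the key point holds. With batch-only cordoning, old-2 is still a valid target for a new pod. With early cordoning, only the new nodes are.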