fix: Guardrail to avoid downtime #3878
base: master
Conversation
Published E2E Test Results: 4 files, 4 suites, 3h 13m 37s ⏱️. For more details on these failures, see this check. Results for commit c3f42e3.
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@ Coverage Diff @@
## master #3878 +/- ##
==========================================
+ Coverage 82.69% 82.73% +0.03%
==========================================
Files 163 163
Lines 22895 22911 +16
==========================================
+ Hits 18934 18956 +22
+ Misses 3087 3083 -4
+ Partials 874 872 -2
☔ View full report in Codecov by Sentry.
Published Unit Test Results: 2 281 tests, 2 281 ✅, 2m 59s ⏱️. Results for commit c3f42e3.
Force-pushed: b337b49 → 34878b6, 7d87a76 → fda1eba, 39898f7 → 3ef7a60 (Signed-off-by: Abhishek Bansal <[email protected]>).
Quality Gate passed.
Very similar logic is already defined in argo-rollouts rollout/service.go (Line 292 at commit 5f59344).
@zachaller but this check does not prevent downtime if a new rollout is triggered while a deployment is already in progress.
This PR introduces a guardrail in the rollouts controller to help prevent unexpected downtime during rollouts. The guardrail adds a check that the stable ReplicaSet has a sufficient number of available replicas before any traffic switch occurs.
For example, if the desired canary weight is 60% and there are 10 replicas in total, the controller verifies, before diverting 60% of traffic to the canary, that at least 40% of the replicas (4 in this case) are available in the stable ReplicaSet.
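A minimal sketch of that check in Go, the project's language. The helper names (`requiredStableReplicas`, `safeToShiftTraffic`) and the round-up behavior are illustrative assumptions, not the identifiers or exact logic used in the argo-rollouts codebase:

```go
package main

import "fmt"

// requiredStableReplicas returns how many stable-ReplicaSet replicas
// must be available before desiredWeight percent of traffic is shifted
// to the canary: the stable side must still be able to serve the
// remaining (100 - desiredWeight) percent of the load.
func requiredStableReplicas(totalReplicas, desiredWeight int32) int32 {
	remaining := 100 - desiredWeight
	// Round up: with 10 replicas and a 65% canary weight, the stable
	// side must cover 35%, i.e. ceil(3.5) = 4 replicas.
	return (totalReplicas*remaining + 99) / 100
}

// safeToShiftTraffic gates the traffic switch on stable availability.
func safeToShiftTraffic(totalReplicas, desiredWeight, stableAvailable int32) bool {
	return stableAvailable >= requiredStableReplicas(totalReplicas, desiredWeight)
}

func main() {
	// The scenario from the description: 10 replicas, 60% canary weight.
	fmt.Println(requiredStableReplicas(10, 60)) // 4
	fmt.Println(safeToShiftTraffic(10, 60, 4))  // true: switch may proceed
	fmt.Println(safeToShiftTraffic(10, 60, 3))  // false: would risk downtime
}
```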
This check is crucial to prevent downtime. A common failure scenario: a rollout is already in progress when a new deployment is triggered; if the stable ReplicaSet does not have enough available replicas at that moment, the traffic switch can cause a service disruption.
Resolves #3372
Checklist:
- The PR title follows the conventional-commit format, e.g. "fix(controller): Updates such and such. Fixes #1234"