
fix: Guardrail to avoid downtime #3878

Open: wants to merge 2 commits into master

@Abhish2702 commented Oct 8, 2024

This PR introduces a guardrail in the rollouts controller to help prevent unexpected downtime during rollouts. The guardrail adds a check that the stable ReplicaSet has enough available replicas before any traffic shift occurs.

For example, if the desired canary weight is 60% and there are 10 replicas in total, then before diverting 60% of the traffic to the canary, the code verifies that at least 40% of the replicas (4 in this case) are available in the stable ReplicaSet.
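The arithmetic in this example can be sketched in Go. This is a minimal illustration, not the PR's code: `minStableReplicas` and `canShiftTraffic` are hypothetical names, and rounding up with integer ceiling division is an assumption chosen to stay conservative when the stable share is not a whole number of pods.

```go
package main

import "fmt"

// minStableReplicas returns how many stable pods must be available before
// canaryWeight percent of traffic is routed to the canary. The remaining
// (100 - canaryWeight) percent stays on the stable ReplicaSet, so that share
// of the total replica count is required. Ceiling division rounds up.
// Hypothetical helper; assumes 0 <= canaryWeight <= 100.
func minStableReplicas(totalReplicas, canaryWeight int32) int32 {
	stableWeight := 100 - canaryWeight
	return (totalReplicas*stableWeight + 99) / 100
}

// canShiftTraffic reports whether the stable ReplicaSet has enough available
// pods to absorb the traffic it keeps after the weight change.
func canShiftTraffic(totalReplicas, canaryWeight, stableAvailable int32) bool {
	return stableAvailable >= minStableReplicas(totalReplicas, canaryWeight)
}

func main() {
	// The example from the PR description: 10 replicas, 60% canary weight.
	fmt.Println(minStableReplicas(10, 60))  // 4
	fmt.Println(canShiftTraffic(10, 60, 4)) // true
	fmt.Println(canShiftTraffic(10, 60, 3)) // false
}
```

With 7 replicas and a 60% canary weight, the stable share is 2.8 pods, which this sketch rounds up to 3.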

This check is crucial to prevent downtime. A common scenario is a new deployment being triggered while a rollout is already in progress: if the stable ReplicaSet does not have enough available replicas at that point, shifting traffic can disrupt the service.
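A rough sketch of how such a guard could sit in a reconcile step for this scenario. All names and numbers are hypothetical: `rolloutState` and `weightToApply` are not Argo Rollouts types, and the state (only 2 of 10 stable pods available when the next step asks for a 60% canary weight) is an assumed example of a rollout interrupted by a new deployment. The sketch keeps the current weight when the guard fails, assuming the controller retries on a later reconcile.

```go
package main

import "fmt"

// rolloutState is a hypothetical snapshot of a rollout mid-reconcile; the
// field names are illustrative and do not mirror Argo Rollouts' types.
type rolloutState struct {
	TotalReplicas   int32 // desired replica count of the rollout
	StableAvailable int32 // available pods in the stable ReplicaSet
	CurrentWeight   int32 // canary traffic weight currently applied (percent)
	DesiredWeight   int32 // canary traffic weight the current step asks for
}

// weightToApply returns the canary weight to hand to the traffic router and
// whether the requested shift was allowed. If the stable ReplicaSet cannot
// serve the traffic that stays on it, the current weight is kept.
func weightToApply(s rolloutState) (int32, bool) {
	required := (s.TotalReplicas*(100-s.DesiredWeight) + 99) / 100
	if s.StableAvailable < required {
		return s.CurrentWeight, false
	}
	return s.DesiredWeight, true
}

func main() {
	// Assumed state after a new deployment interrupts an in-progress rollout.
	s := rolloutState{TotalReplicas: 10, StableAvailable: 2, CurrentWeight: 20, DesiredWeight: 60}
	w, ok := weightToApply(s)
	fmt.Println(w, ok) // 20 false

	// Once enough stable pods are available again, the shift goes through.
	s.StableAvailable = 4
	w, ok = weightToApply(s)
	fmt.Println(w, ok) // 60 true
}
```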

Resolves #3372

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this is a chore.
  • The title of the PR is (a) conventional with a list of types and scopes found here, (b) states what changed, and (c) suffixes the related issues number. E.g. "fix(controller): Updates such and such. Fixes #1234".
  • I've signed my commits with DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My builds are green. Try syncing with master if they are not.
  • My organization is added to USERS.md.

@Abhish2702 changed the title from "Guardrail to avoid downtime" to "fix: Guardrail to avoid downtime" on Oct 8, 2024

github-actions bot commented Oct 8, 2024

Published E2E Test Results

4 files, 4 suites, 3h 13m 37s ⏱️
113 tests: 104 ✅ passed, 7 💤 skipped, 2 ❌ failed
454 runs: 424 ✅ passed, 28 💤 skipped, 2 ❌ failed

For more details on these failures, see this check.

Results for commit c3f42e3.

♻️ This comment has been updated with latest results.


codecov bot commented Oct 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.73%. Comparing base (5f59344) to head (c3f42e3).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3878      +/-   ##
==========================================
+ Coverage   82.69%   82.73%   +0.03%     
==========================================
  Files         163      163              
  Lines       22895    22911      +16     
==========================================
+ Hits        18934    18956      +22     
+ Misses       3087     3083       -4     
+ Partials      874      872       -2     



github-actions bot commented Oct 8, 2024

Published Unit Test Results

2,281 tests, 128 suites, 1 file, 2m 59s ⏱️
2,281 ✅ passed, 0 💤 skipped, 0 ❌ failed

Results for commit c3f42e3.

♻️ This comment has been updated with latest results.

Signed-off-by: Abhishek Bansal <[email protected]>
@zachaller (Collaborator) commented Nov 20, 2024

Very similar logic is already defined via:

    func (c *rolloutContext) ensureSVCTargets(svcName string, rs *appsv1.ReplicaSet, checkRsAvailability bool) error

I do not think it makes sense to add another check.

@Abhish2702 (Author) commented

> Very similar logic is already defined via
> func (c *rolloutContext) ensureSVCTargets(svcName string, rs *appsv1.ReplicaSet, checkRsAvailability bool) error
> I do not think it makes sense to add another check

@zachaller, but this check does not prevent downtime if a new rollout is triggered while a deployment is still in progress.
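The disagreement can be illustrated with two gates. This thread does not show what `ensureSVCTargets` checks internally; the first function below assumes, purely for illustration, a gate that only requires the ReplicaSet to have some available pods, and the second models the proportional guardrail described in this PR. Both are hypothetical helpers, not Argo Rollouts code, and the numbers are made up.

```go
package main

import "fmt"

// anyAvailable models a gate that only asks whether the ReplicaSet has at
// least one available pod. Illustrative assumption, not the actual
// ensureSVCTargets logic.
func anyAvailable(available int32) bool {
	return available > 0
}

// proportionalAvailable models the PR's guardrail: the stable ReplicaSet must
// have enough available pods for the share of traffic it keeps, rounded up.
func proportionalAvailable(available, totalReplicas, canaryWeight int32) bool {
	required := (totalReplicas*(100-canaryWeight) + 99) / 100
	return available >= required
}

func main() {
	// Hypothetical: a stable ReplicaSet with 1 of 10 pods available while the
	// next step asks for a 60% canary weight.
	const total, weight, stableAvailable = 10, 60, 1
	fmt.Println(anyAvailable(stableAvailable))                         // true
	fmt.Println(proportionalAvailable(stableAvailable, total, weight)) // false
}
```

Under these assumed numbers, an availability-only gate lets the 60% shift proceed while the proportional check holds it, which is one reading of the gap the author's reply points at.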
