
[BUG] maxSurge behaves differently during updates #1741

Open
ypnuaa037 opened this issue Sep 12, 2024 · 7 comments
Labels
kind/bug Something isn't working

@ypnuaa037

What happened:
I tested a CloneSet with 8 replicas and the updateStrategy below:

```yaml
updateStrategy:
  maxSurge: 3
  maxUnavailable: 0
  partition: 0
  type: ReCreate
```

I thought the update would be divided into three batches (3, 3, 2), proceeding as follows:

  1. creating 3 new pods and then terminating 3 old pods
  2. creating 3 new pods and then terminating 3 old pods
  3. creating 2 new pods and then terminating 2 old pods

But in fact the update was divided into four batches (3, 3, 1, 1); the last two old pods were updated one by one.

What you expected to happen:
Each batch of updates should be as large as possible without exceeding maxSurge.
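The expected batch schedule can be sketched in a few lines (an illustration of the expectation above, not Kruise's actual code):

```python
def expected_batches(replicas: int, max_surge: int) -> list[int]:
    """Split an update into batches, each as large as maxSurge allows."""
    batches = []
    remaining = replicas
    while remaining > 0:
        batch = min(max_surge, remaining)
        batches.append(batch)
        remaining -= batch
    return batches

print(expected_batches(8, 3))  # [3, 3, 2]
```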

How to reproduce it (as minimally and precisely as possible):
```yaml
updateStrategy:
  maxSurge: 3
  maxUnavailable: 0
  partition: 0
  type: ReCreate
```
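For reference, a minimal CloneSet manifest along these lines (a sketch: the name, labels, image, and probe settings are placeholders, not from the report; changing the image tag triggers the update):

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: nginx-clone-test          # placeholder name
spec:
  replicas: 8
  selector:
    matchLabels:
      app: nginx-clone-test
  template:
    metadata:
      labels:
        app: nginx-clone-test
    spec:
      containers:
      - name: nginx
        image: nginx:1.25         # change the tag to trigger an update
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10  # a longer delay makes the issue easier to hit
  updateStrategy:
    maxSurge: 3
    maxUnavailable: 0
    partition: 0
    type: ReCreate
```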

Anything else we need to know?:

Environment:

  • Kruise version: 1.3.0
  • Kubernetes version (use kubectl version): 1.23.5
  • Install details (e.g. helm install args):
  • Others:
@ypnuaa037 ypnuaa037 added the kind/bug Something isn't working label Sep 12, 2024
@furykerry furykerry assigned ABNER-1 and unassigned FillZpp Sep 12, 2024
@ABNER-1
Member

ABNER-1 commented Sep 13, 2024

It's a bit strange.
@ypnuaa037 Are the pods in the second batch not ready at the same time?

@furykerry
Member

One possible explanation is that after step 2, one new pod got ready faster than the other two, so Kruise started step 3 but created only one replica, since maxSurge is 3.
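A toy model of the surge accounting may make this concrete (an illustration of the explanation above, not the actual cloneset_sync_utils.go logic): pods above spec.replicas occupy surge slots, so when one early-ready new pod triggers the deletion of a single old pod, only one surge slot frees up and the next batch shrinks to one pod.

```python
def scale_up_room(replicas: int, max_surge: int, total_pods: int) -> int:
    """How many new pods can still be created: surge capacity minus surge in use.
    Pods above spec.replicas are treated as occupying surge slots."""
    surge_in_use = max(total_pods - replicas, 0)
    return max(max_surge - surge_in_use, 0)

# replicas=5, maxSurge=3: the first batch creates 3 pods -> 8 total, surge exhausted
assert scale_up_room(5, 3, 8) == 0
# one new pod turns ready and one old pod is deleted -> 7 total, only 1 slot frees up
assert scale_up_room(5, 3, 7) == 1
```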

@ypnuaa037
Author

ypnuaa037 commented Sep 13, 2024

To make the process easier to follow, I reduced replicas to 5:

NAME READY STATUS RESTARTS AGE
nginx-clone-lossy-test-123-45dnf 0/1 ContainerCreating 0 1s
nginx-clone-lossy-test-123-7wwrc 1/1 Running 0 4m40s
nginx-clone-lossy-test-123-cclzm 1/1 Running 0 4m36s
nginx-clone-lossy-test-123-ddnm6 1/1 Running 0 4m32s
nginx-clone-lossy-test-123-jxlfd 0/1 ContainerCreating 0 1s
nginx-clone-lossy-test-123-t4xnd 0/1 ContainerCreating 0 1s
nginx-clone-lossy-test-123-vzf4n 1/1 Running 0 4m36s
nginx-clone-lossy-test-123-w8m4s 1/1 Running 0 4m40s

NAME READY STATUS RESTARTS AGE
nginx-clone-lossy-test-123-45dnf 0/1 Running 0 4s
nginx-clone-lossy-test-123-7wwrc 1/1 Running 0 4m43s
nginx-clone-lossy-test-123-cclzm 1/1 Running 0 4m39s
nginx-clone-lossy-test-123-ddnm6 1/1 Running 0 4m35s
nginx-clone-lossy-test-123-jxlfd 0/1 Running 0 4s
nginx-clone-lossy-test-123-t4xnd 0/1 Running 0 4s
nginx-clone-lossy-test-123-vzf4n 1/1 Running 0 4m39s
nginx-clone-lossy-test-123-w8m4s 1/1 Running 0 4m43s

NAME READY STATUS RESTARTS AGE
nginx-clone-lossy-test-123-45dnf 0/1 Running 0 6s
nginx-clone-lossy-test-123-7wwrc 1/1 Running 0 4m45s
nginx-clone-lossy-test-123-cclzm 1/1 Terminating 0 4m41s
nginx-clone-lossy-test-123-ddnm6 1/1 Running 0 4m37s
nginx-clone-lossy-test-123-jxlfd 1/1 Running 0 6s
nginx-clone-lossy-test-123-t4xnd 1/1 Running 0 6s
nginx-clone-lossy-test-123-vzf4n 1/1 Running 0 4m41s
nginx-clone-lossy-test-123-w8m4s 1/1 Running 0 4m45s

NAME READY STATUS RESTARTS AGE
nginx-clone-lossy-test-123-45dnf 1/1 Running 0 7s
nginx-clone-lossy-test-123-5n2c9 0/1 ContainerCreating 0 1s
nginx-clone-lossy-test-123-7wwrc 1/1 Running 0 4m46s
nginx-clone-lossy-test-123-ddnm6 1/1 Terminating 0 4m38s
nginx-clone-lossy-test-123-jxlfd 1/1 Running 0 7s
nginx-clone-lossy-test-123-t4xnd 1/1 Running 0 7s
nginx-clone-lossy-test-123-vzf4n 1/1 Running 0 4m42s
nginx-clone-lossy-test-123-w8m4s 1/1 Terminating 0 4m46s

NAME READY STATUS RESTARTS AGE
nginx-clone-lossy-test-123-45dnf 1/1 Running 0 8s
nginx-clone-lossy-test-123-5n2c9 0/1 Running 0 2s
nginx-clone-lossy-test-123-7wwrc 1/1 Running 0 4m47s
nginx-clone-lossy-test-123-jxlfd 1/1 Running 0 8s
nginx-clone-lossy-test-123-t4xnd 1/1 Running 0 8s
nginx-clone-lossy-test-123-vzf4n 1/1 Running 0 4m43s

NAME READY STATUS RESTARTS AGE
nginx-clone-lossy-test-123-45dnf 1/1 Running 0 12s
nginx-clone-lossy-test-123-5n2c9 1/1 Running 0 6s
nginx-clone-lossy-test-123-7wwrc 1/1 Running 0 4m51s
nginx-clone-lossy-test-123-jcn4z 0/1 ContainerCreating 0 0s
nginx-clone-lossy-test-123-jxlfd 1/1 Running 0 12s
nginx-clone-lossy-test-123-t4xnd 1/1 Running 0 12s
nginx-clone-lossy-test-123-vzf4n 1/1 Terminating 0 4m47s

NAME READY STATUS RESTARTS AGE
nginx-clone-lossy-test-123-45dnf 1/1 Running 0 14s
nginx-clone-lossy-test-123-5n2c9 1/1 Running 0 8s
nginx-clone-lossy-test-123-7wwrc 1/1 Running 0 4m53s
nginx-clone-lossy-test-123-jcn4z 0/1 Running 0 2s
nginx-clone-lossy-test-123-jxlfd 1/1 Running 0 14s
nginx-clone-lossy-test-123-t4xnd 1/1 Running 0 14s

NAME READY STATUS RESTARTS AGE
nginx-clone-lossy-test-123-45dnf 1/1 Running 0 18s
nginx-clone-lossy-test-123-5n2c9 1/1 Running 0 12s
nginx-clone-lossy-test-123-7wwrc 0/1 Terminating 0 4m57s
nginx-clone-lossy-test-123-jcn4z 1/1 Running 0 6s
nginx-clone-lossy-test-123-jxlfd 1/1 Running 0 18s
nginx-clone-lossy-test-123-t4xnd 1/1 Running 0 18s

@ypnuaa037
Author

The last 2 pods were created and became ready one by one, and this is easy to reproduce when initialDelaySeconds is a little longer (for example, 10s).

@ypnuaa037
Author

I0913 09:25:21.115721 1 cloneset_sync_utils.go:118] Calculate diffs for CloneSet project-123/nginx-clone-lossy-test-123, replicas=5, partition=0, maxSurge=3, maxUnavailable=0, allPods=6, newRevisionPods=3, newRevisionActivePods=3, oldRevisionPods=3, oldRevisionActivePods=3, unavailableNewRevisionCount=1, unavailableOldRevisionCount=0, preDeletingNewRevisionCount=0, preDeletingOldRevisionCount=0, toDeleteNewRevisionCount=0, toDeleteOldRevisionCount=0. Result: {scaleUpNum:1 scaleUpNumOldRevision:0 scaleDownNum:0 scaleDownNumOldRevision:0 scaleUpLimit:1 deleteReadyLimit:0 useSurge:2 useSurgeOldRevision:0 updateNum:2 updateMaxUnavailable:1}
I0913 09:25:21.115756 1 cloneset_scale.go:84] CloneSet project-123/nginx-clone-lossy-test-123 begin to scale out 1 pods including 0 (current rev)
I0913 09:25:21.118524 1 event.go:282] Event(v1.ObjectReference{Kind:"CloneSet", Namespace:"project-123", Name:"nginx-clone-lossy-test-123", UID:"6c530a5f-3f29-4716-ab95-a7a9d5806e09", APIVersion:"apps.kruise.io/v1alpha1", ResourceVersion:"30661475", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' succeed to delete pod nginx-clone-lossy-test-123-ddnm6
I0913 09:25:21.134346 1 cloneset_status.go:50] To update CloneSet status for project-123/nginx-clone-lossy-test-123, replicas=6 ready=5 available=5 updated=3 updatedReady=2, revisions current=nginx-clone-lossy-test-123-784895c554 update=nginx-clone-lossy-test-123-86b9d66856

I0913 09:25:21.135176 1 event.go:282] Event(v1.ObjectReference{Kind:"CloneSet", Namespace:"project-123", Name:"nginx-clone-lossy-test-123", UID:"6c530a5f-3f29-4716-ab95-a7a9d5806e09", APIVersion:"apps.kruise.io/v1alpha1", ResourceVersion:"30661475", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' succeed to create pod nginx-clone-lossy-test-123-5n2c9

I0913 09:25:21.157225 1 cloneset_sync_utils.go:118] Calculate diffs for CloneSet project-123/nginx-clone-lossy-test-123, replicas=5, partition=0, maxSurge=3, maxUnavailable=0, allPods=7, newRevisionPods=4, newRevisionActivePods=4, oldRevisionPods=3, oldRevisionActivePods=3, unavailableNewRevisionCount=2, unavailableOldRevisionCount=0, preDeletingNewRevisionCount=0, preDeletingOldRevisionCount=0, toDeleteNewRevisionCount=0, toDeleteOldRevisionCount=0. Result: {scaleUpNum:0 scaleUpNumOldRevision:0 scaleDownNum:1 scaleDownNumOldRevision:3 scaleUpLimit:0 deleteReadyLimit:0 useSurge:1 useSurgeOldRevision:0 updateNum:1 updateMaxUnavailable:2}

@ypnuaa037
Author

After nginx-clone-lossy-test-123-5n2c9 was created, useSurge was set to 1, so the remaining pods were created and became ready one by one.

@ypnuaa037
Author

@ABNER-1 We use maxSurge to control update speed, and our users have questions about this behavior. Can you try to reproduce and analyze this problem?
