Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: keep rs informer updated #3091

Merged
merged 4 commits into from
Oct 12, 2023
Merged

Conversation

zachaller
Copy link
Collaborator

@zachaller zachaller commented Oct 10, 2023

This PR fixes an issue where we race on updating revisions on replicasets, this causes a bunch of extra reconciles and logs that are noisy.

https://github.com/argoproj/argo-rollouts/blob/master/rollout/context.go#L72

	isScalingEvent, err := c.isScalingEvent() // Calls c.getAllReplicaSetsAndSyncRevision
	if err != nil {
		return err
	}

	if isScalingEvent {
		return c.syncReplicasOnly() // calls c.getAllReplicaSetsAndSyncRevision
	}

	if c.rollout.Spec.Strategy.BlueGreen != nil {
		return c.rolloutBlueGreen()
	}

	// Due to the rollout validation before this, when we get here strategy is canary
	return c.rolloutCanary() // Calls getAllReplicaSetsAndSyncRevision

Signed-off-by: zachaller <[email protected]>
@zachaller zachaller changed the title keep rs informer updated fix: keep rs informer updated Oct 10, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Oct 10, 2023

Go Published Test Results

2 045 tests   2 045 ✔️  2m 42s ⏱️
   118 suites         0 💤
       1 files           0

Results for commit 7c5c5e1.

♻️ This comment has been updated with latest results.

@codecov
Copy link

codecov bot commented Oct 10, 2023

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (8985fd8) 81.71% compared to head (7c5c5e1) 81.70%.
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3091      +/-   ##
==========================================
- Coverage   81.71%   81.70%   -0.02%     
==========================================
  Files         134      134              
  Lines       20416    20426      +10     
==========================================
+ Hits        16683    16689       +6     
- Misses       2873     2876       +3     
- Partials      860      861       +1     
Files Coverage Δ
rollout/controller.go 81.68% <100.00%> (+0.02%) ⬆️
rollout/replicaset.go 69.72% <100.00%> (+0.28%) ⬆️
rollout/sync.go 79.58% <55.55%> (-0.34%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 10, 2023

E2E Tests Published Test Results

    4 files      4 suites   3h 41m 53s ⏱️
102 tests   85 ✔️   6 💤 11
444 runs  389 ✔️ 24 💤 31

For more details on these failures, see this check.

Results for commit 7c5c5e1.

♻️ This comment has been updated with latest results.

Copy link
Member

@jessesuen jessesuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds informer writeback for ReplicaSets, but the screenshot in #3080 shows an Istio VirtualService in the logs.

I always thought the ReplicaSets error was an annoying error that we've lived with for years, and is only cosmetic in nature (and caused noise in the logs / unnecessary reconciliations), but I did not think it could fail a Rollout.

Although I think this change is welcomed, I question if it would fix any functional user-perceived problems.

@zachaller
Copy link
Collaborator Author

zachaller commented Oct 10, 2023

@jessesuen I see that, the images is all VirtualService the log in the text was

roCtx.reconcile err Operation cannot be fulfilled on replicasets.apps "pg-query-65bc4849f5": the object has been modified; please apply your changes to the latest version and try again

I agree this is probably unlikely to fix any issues with VirtualServices failing to reconcile due to other updates.

Signed-off-by: zachaller <[email protected]>
@zachaller
Copy link
Collaborator Author

Also pre this PR the behavior would have been to always update the rollouts informer even after any kind of error returned from the reconciler, after that PR we now do not write rollout objects back to the informer if there is errors such as the replicaset failing to update it's revision.

@jessesuen
Copy link
Member

after that PR we now do not write rollout objects back to the informer if there is errors such as the replicaset failing to update it's revision.

Ah. I see the other PR now and the reason behind it. It makes sense.

		// https://github.com/argoproj/argo-rollouts/issues/2522#issuecomment-1492181154 I also believe there are other cases
		// that newRollout can get updated while we get an error during reconciliation 

Regarding the above code comment, maybe we should remember a roCtx.updatedRollout which is updated to the rollout that we get back from the API server after the patch. This way we only write back to the informer when updates were successful.

@zachaller
Copy link
Collaborator Author

after that PR we now do not write rollout objects back to the informer if there is errors such as the replicaset failing to update it's revision.

Ah. I see the other PR now and the reason behind it. It makes sense.

		// https://github.com/argoproj/argo-rollouts/issues/2522#issuecomment-1492181154 I also believe there are other cases
		// that newRollout can get updated while we get an error during reconciliation 

Regarding the above code comment, maybe we should remember a roCtx.updatedRollout which is updated to the rollout that we get back from the API server after the patch. This way we only write back to the informer when updates were successful.

Yea, I like the updatedRollout idea, it allows separation of concerns on in memory state vs informer/k8s state.

rollout/sync.go Outdated Show resolved Hide resolved
rollout/sync.go Outdated Show resolved Hide resolved
Signed-off-by: zachaller <[email protected]>
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
18.6% 18.6% Duplication

Copy link
Contributor

@leoluz leoluz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zachaller zachaller merged commit 28d9502 into argoproj:master Oct 12, 2023
21 of 22 checks passed
phclark pushed a commit to phclark/argo-rollouts that referenced this pull request Oct 15, 2023
* keep rs informer updated

Signed-off-by: zachaller <[email protected]>

* correct bad log

Signed-off-by: zachaller <[email protected]>

* add error context

Signed-off-by: zachaller <[email protected]>

---------

Signed-off-by: zachaller <[email protected]>
Signed-off-by: Philip Clark <[email protected]>
phclark pushed a commit to phclark/argo-rollouts that referenced this pull request Oct 15, 2023
* keep rs informer updated

Signed-off-by: zachaller <[email protected]>

* correct bad log

Signed-off-by: zachaller <[email protected]>

* add error context

Signed-off-by: zachaller <[email protected]>

---------

Signed-off-by: zachaller <[email protected]>
Signed-off-by: Philip Clark <[email protected]>
zachaller added a commit that referenced this pull request Oct 25, 2023
* keep rs informer updated

Signed-off-by: zachaller <[email protected]>

* correct bad log

Signed-off-by: zachaller <[email protected]>

* add error context

Signed-off-by: zachaller <[email protected]>

---------

Signed-off-by: zachaller <[email protected]>
@zachaller zachaller added the cherry-pick-completed Used once we have cherry picked the PR to all requested releases label Oct 25, 2023
@bpoland
Copy link
Contributor

bpoland commented Nov 8, 2023

#3080 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick/release-1.6 cherry-pick-completed Used once we have cherry picked the PR to all requested releases
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants