🐛 fix race condition between work status update and deletion. #74
c08a864 to dbacf24
/assign @qiujian16 @skeeey
return nil, errors.NewInternalError(err)
}
// ensure the resource version of the work is not outdated | ||
if newResourceVersion < lastResourceVersion { |
Will it be a problem if we return an error here, since we publish the update but do not update the local store?
If we only consider the deletion case, there might not be an issue because the resource will be deleted later.
This may not be a problem. If the resource version does not match, that means either:
- the resource spec has been updated on the source, or
- the resource spec has been updated in the agent's local cache.
Because we publish the status update with the resource version, the source should not accept the request even if it is sent.
For the agent, the update does not target the newest resource, so we should reject it as well.
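To make the check above concrete, here is a minimal sketch of comparing resource versions before accepting a status update. It assumes resource versions are decimal strings that can be compared numerically; `checkResourceVersion` is a hypothetical helper written for illustration, not the function in this PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// checkResourceVersion is a hypothetical helper. It returns an error when the
// incoming status update carries a resource version older than the cached one.
// Assumption: both versions are decimal integers encoded as strings.
func checkResourceVersion(cached, incoming string) error {
	last, err := strconv.ParseInt(cached, 10, 64)
	if err != nil {
		return fmt.Errorf("invalid cached resource version %q: %w", cached, err)
	}
	next, err := strconv.ParseInt(incoming, 10, 64)
	if err != nil {
		return fmt.Errorf("invalid incoming resource version %q: %w", incoming, err)
	}
	// Reject the update instead of writing a stale object into the store.
	if next < last {
		return fmt.Errorf("stale status update: resource version %d is older than cached version %d", next, last)
	}
	return nil
}

func main() {
	// An update based on version 3 arrives after the cache has moved to 5.
	fmt.Println(checkResourceVersion("5", "3"))
	// An update based on the current version is accepted.
	fmt.Println(checkResourceVersion("5", "5"))
}
```

Returning an error here leaves the cached object untouched, which matches the reasoning that a stale update should be rejected on the agent side too.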
We need a comment here.
added some comments in 3ef6f5d for this.
33dfdba to 6a28efa
Signed-off-by: morvencao <[email protected]>
6a28efa to 05750c6
Signed-off-by: morvencao <[email protected]>
@qiujian16 Another look?
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: morvencao, qiujian16
Merged commit 7bd852f into open-cluster-management-io:main
Summary
This PR mainly addresses a race condition where a work status update can overwrite the DeletionTimestamp, preventing the resource from being deleted. The issue occurs when an update event is followed immediately by a delete event from the source, causing the agent to skip deletion.
To resolve this, the agent now verifies the resource version before updating its store on a work status update.
Additionally, the agent ignores any update event from the source that follows a delete event for the same resource.
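The second rule can be sketched as follows. This is an illustrative model only: the `work` type, `agentStore`, `markDeleting`, and `applyUpdate` are hypothetical names, and the real agent store in this repository may be structured differently.

```go
package main

import (
	"fmt"
	"time"
)

// work is a simplified, hypothetical stand-in for a work resource.
type work struct {
	Name              string
	ResourceVersion   int64
	DeletionTimestamp *time.Time
}

// agentStore is a hypothetical in-memory cache keyed by resource name.
type agentStore struct {
	items map[string]work
}

func newAgentStore() *agentStore {
	return &agentStore{items: map[string]work{}}
}

// markDeleting records a delete event by setting the DeletionTimestamp
// on the cached resource, if it exists.
func (s *agentStore) markDeleting(name string) {
	w, ok := s.items[name]
	if !ok {
		return
	}
	now := time.Now()
	w.DeletionTimestamp = &now
	s.items[name] = w
}

// applyUpdate stores the update unless the cached resource is already
// marked for deletion. It reports whether the update was applied.
func (s *agentStore) applyUpdate(w work) bool {
	if cached, ok := s.items[w.Name]; ok && cached.DeletionTimestamp != nil {
		// A late update must not overwrite the DeletionTimestamp.
		return false
	}
	s.items[w.Name] = w
	return true
}

func main() {
	s := newAgentStore()
	s.applyUpdate(work{Name: "w1", ResourceVersion: 1})
	s.markDeleting("w1")
	applied := s.applyUpdate(work{Name: "w1", ResourceVersion: 2})
	fmt.Println("late update applied:", applied) // prints "late update applied: false"
}
```

Dropping the late update keeps the DeletionTimestamp in the cache intact, so the deletion can still proceed.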
This PR also removes a duplicate debug log of the received event; the event is already logged in baseclient.go.
Related issue(s)
Fixes #