Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 fix race condition between work status update and deletion. #74

Merged

Conversation

morvencao
Copy link
Member

@morvencao morvencao commented Aug 28, 2024

Summary

  1. This PR mainly addresses a race condition where a work status update can overwrite the DeletionTimestamp, preventing the resource from being deleted. The issue occurs when an update event is followed immediately by a delete event from the source, causing the agent to skip deletion.

    To resolve this, we now verify the resource version before updating the agent store for work status update.

  2. Additionally, any update event that follows a delete event for the same resource from the source will be ignored from agent side.

  3. In this PR we've also removed a duplicate debug log that printed the received event. (that is printed in baseclient.go)

Related issue(s)

Fixes #

@morvencao
Copy link
Member Author

/assign @qiujian16 @skeeey

return nil, errors.NewInternalError(err)
}
// ensure the resource version of the work is not outdated
if newResourceVersion < lastResourceVersion {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this be a problem we return error here? since we publish but do not update local store?

Copy link
Member Author

@morvencao morvencao Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we only consider the deletion case, there might not be an issue because the resource will be deleted later.

Copy link
Member

@skeeey skeeey Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may not be a problem, if the resource version does not match, that means

  1. the resource spec has beed updated on the source
  2. the resource spec has beed updated on the agent local cache

because we publish the status update with resource version, so even if the update request is sent, the source should not accept this request.

for agent, this update is not for the current newest resource, so we should also reject it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need some comment here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some comments in 3ef6f5d for this.

@morvencao morvencao force-pushed the br_fix_deletion branch 2 times, most recently from 33dfdba to 6a28efa Compare August 28, 2024 09:40
Signed-off-by: morvencao <[email protected]>
@morvencao
Copy link
Member Author

@qiujian16 Another look?

Copy link
Member

@qiujian16 qiujian16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm label Aug 29, 2024
Copy link

openshift-ci bot commented Aug 29, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: morvencao, qiujian16

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 7bd852f into open-cluster-management-io:main Aug 29, 2024
10 checks passed
@morvencao morvencao deleted the br_fix_deletion branch August 29, 2024 07:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants