Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update VolumeSnapshot and VolumeSnapshotContent using JSON patch #876

Merged

Conversation

shubham-pampattiwar
Copy link
Contributor

@shubham-pampattiwar shubham-pampattiwar commented Jul 12, 2023

What type of PR is this?
/kind bug

What this PR does / why we need it:
This PR replaces the Update calls for VolumeSnapshot and VolumeSnapshotContent with JSON patch calls. Addresses the "object has been modified" issue we see a lot in the snapshot-controller/snapshotter.

Which issue(s) this PR fixes:

Partial fix (only for Update() calls) #748

Special notes for your reviewer:
N/A

Does this PR introduce a user-facing change?:

Update VolumeSnapshot and VolumeSnapshotContent using JSON patch

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jul 12, 2023
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jul 12, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: shubham-pampattiwar / name: Shubham Pampattiwar (ff71329)

@k8s-ci-robot
Copy link
Contributor

Welcome @shubham-pampattiwar!

It looks like this is your first PR to kubernetes-csi/external-snapshotter 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-csi/external-snapshotter has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 12, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @shubham-pampattiwar. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 12, 2023
@ggriffiths
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jul 12, 2023
@ggriffiths
Copy link
Member

@shubham-pampattiwar thank you for picking this up! Can you also add fixes for VolumeSnapshotContents.UpdateStatus and VolumeSnapshots.UpdateStatus?

Alternatively, we can break #748 into two fixes - one for Update() and one for UpdateStatus().

Both calls can cause the Object has been modified errors.

@shubham-pampattiwar
Copy link
Contributor Author

shubham-pampattiwar commented Jul 12, 2023

@ggriffiths yeah I wanted target Update() calls in the current PR's scope. The alternate approach of splitting the fix in 2 PRs sounds good to me.

@ggriffiths
Copy link
Member

ggriffiths commented Jul 12, 2023

@ggriffiths yeah I wanted target Update() calls in the current PR's scope. The alternate approach of splitting the fix in 2 PRs sounds good to me.

Sounds good, can you remove Fixes: ... from the PR description then?

@shubham-pampattiwar
Copy link
Contributor Author

@ggriffiths yeah I wanted target Update() calls in the current PR's scope. The alternate approach of splitting the fix in 2 PRs sounds good to me.

Sounds good, can you remove Fixes: ... from the PR description then?

Updated the PR description.

@ggriffiths
Copy link
Member

Hi @shubham-pampattiwar, looks like the unit tests need to be fixed for this PR. Here is the original PR where patching support was added:
https://github.com/kubernetes-csi/external-snapshotter/pull/526/files

In there will be some examples of how to adjust the unit tests.

@sunnylovestiramisu
Copy link
Contributor

Is this a breaking change? If yes how do we track this in the side car release process?

@xing-yang
Copy link
Collaborator

Is this a breaking change? If yes how do we track this in the side car release process?

This is not a breaking change. This should reduce the number of conflicts we get while updating the API objects. @ggriffiths already made similar changes earlier.

@weshayutin
Copy link

@shubham-pampattiwar bump :)

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 25, 2023
@shubham-pampattiwar
Copy link
Contributor Author

@ggriffiths updated the PR.

@xing-yang
Copy link
Collaborator

@shubham-pampattiwar Can you please take a look at the CI failure?

@ggriffiths
Copy link
Member

/assign @xing-yang

for approver review

@xing-yang
Copy link
Collaborator

Can you add "in the snapshot-controller" in the release notes?
It looks good other than that.

Thanks!

@xing-yang
Copy link
Collaborator

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shubham-pampattiwar, xing-yang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 19, 2023
@k8s-ci-robot k8s-ci-robot merged commit 215f12f into kubernetes-csi:master Sep 19, 2023
@gdiwate
Copy link

gdiwate commented Sep 29, 2023

In my case, the volumesnapshot deletes instantly after creation with the above error.
I have checkout v6.1 and cherry-picked this PR into. Created csi-snapshotter and snapshot-controller images, but this doesn't resolve my issue.

@shubham-pampattiwar
Is this fixed or tested on kubernetes 1.25?

@ggriffiths
Copy link
Member

@xing-yang should this be added to a future release? I don't see it in the latest patches, unless I missed it.

@phoenix-bjoern
Copy link

This important fix is only in the main branch, not in the release-6.x branches. Hence, the fix is not included in an official release.
@xing-yang Please cherry-pick it to the release branches. This an incredible important fix when using Velero.

@phoenix-bjoern
Copy link

@xing-yang @sunnylovestiramisu Any change to get this important improvement in a 6.3.x release (see previous comment)?

@xing-yang
Copy link
Collaborator

@phoenix-bjoern This change will be in the next minor release which will come after the K8s 1.29 release.
Feel free to cherry-pick this to release-6.3 branch if you want it to be in a future 6.3.x release.

@phoenix-bjoern
Copy link

@xing-yang @ggriffiths PR for release-6.3 branch created: #974

mpryc added a commit to mpryc/oadp-operator that referenced this pull request Jan 23, 2024
This adds attempts to the flaky tests, which are caused by two known
issues. First one isn't yet available in the CI cluster, second
is known and needs to be fixed in the Velero code:

 - kubernetes-csi/external-snapshotter#876
 - vmware-tanzu/velero#5856

Signed-off-by: Michal Pryc <[email protected]>
mpryc added a commit to mpryc/oadp-operator that referenced this pull request Jan 23, 2024
This adds attempts to the flaky tests, which are caused by two known
issues. First one isn't yet available in the CI cluster, second
is known and needs to be fixed in the Velero code:

 - kubernetes-csi/external-snapshotter#876
 - vmware-tanzu/velero#5856

Signed-off-by: Michal Pryc <[email protected]>
mpryc added a commit to mpryc/oadp-operator that referenced this pull request Jan 23, 2024
This adds attempts to the flaky tests, which are caused by two known
issues. First one isn't yet available in the CI cluster, second
is known and needs to be fixed in the Velero code:

 - kubernetes-csi/external-snapshotter#876
 - vmware-tanzu/velero#5856

Signed-off-by: Michal Pryc <[email protected]>
mpryc added a commit to mpryc/oadp-operator that referenced this pull request Jan 23, 2024
This adds attempts to the flaky tests, which are caused by two known
issues. First one isn't yet available in the CI cluster, second
is known and needs to be fixed in the Velero code:

 - kubernetes-csi/external-snapshotter#876
 - vmware-tanzu/velero#5856

Signed-off-by: Michal Pryc <[email protected]>
mpryc added a commit to mpryc/oadp-operator that referenced this pull request Jan 23, 2024
This adds attempts to the flaky tests, which are caused by two known
issues. First one isn't yet available in the CI cluster, second
is known and needs to be fixed in the Velero code:

 - kubernetes-csi/external-snapshotter#876
 - vmware-tanzu/velero#5856

Signed-off-by: Michal Pryc <[email protected]>
mpryc added a commit to mpryc/oadp-operator that referenced this pull request Jan 23, 2024
This adds attempts to the flaky tests, which are caused by two known
issues. First one isn't yet available in the CI cluster, second
is known and needs to be fixed in the Velero code:

 - kubernetes-csi/external-snapshotter#876
 - vmware-tanzu/velero#5856

Signed-off-by: Michal Pryc <[email protected]>
mpryc added a commit to mpryc/oadp-operator that referenced this pull request Jan 23, 2024
This adds attempts to the flaky tests, which are caused by two known
issues. First one isn't yet available in the CI cluster, second
is known and needs to be fixed in the Velero code:

 - kubernetes-csi/external-snapshotter#876
 - vmware-tanzu/velero#5856

Signed-off-by: Michal Pryc <[email protected]>
mpryc added a commit to mpryc/oadp-operator that referenced this pull request Jan 24, 2024
…n detection

This adds attempts to the flaky tests, which are caused by two known
issues. First one isn't yet available in the CI cluster, second
is known and needs to be fixed in the Velero code:

 - kubernetes-csi/external-snapshotter#876
 - vmware-tanzu/velero#5856

The known flake pattern detection allows us to specify which flakes
are the ones on which we will retry.

Signed-off-by: Michal Pryc <[email protected]>
mpryc added a commit to mpryc/oadp-operator that referenced this pull request Jan 24, 2024
…n detection

This adds attempts to the flaky tests, which are caused by two known
issues. First one isn't yet available in the CI cluster, second
is known and needs to be fixed in the Velero code:

 - kubernetes-csi/external-snapshotter#876
 - vmware-tanzu/velero#5856

The known flake pattern detection allows us to specify which flakes
are the ones on which we will retry.

Signed-off-by: Michal Pryc <[email protected]>
@Samiksha29
Copy link

@xing-yang I am facing the below issue. internal snapshot not created.
{
"apiVersion": "snapshot.storage.k8s.io/v1",
"kind": "VolumeSnapshot",
"metadata": {
"annotations": {
"diamanti.com/created-by": "backup-controller"
},
"creationTimestamp": "2024-06-25T08:52:43Z",
"finalizers": [
"snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection",
"snapshot.storage.kubernetes.io/volumesnapshot-bound-protection"
],
"generation": 1,
"name": "ns1-test-pvc2-snap-933382",
"namespace": "ns1",
"resourceVersion": "36048",
"uid": "ca94340e-c25a-4d30-b8b7-bdf5edfd7f88"
},
"spec": {
"source": {
"persistentVolumeClaimName": "test-pvc2"
},
"volumeSnapshotClassName": "default-snapclass"
},
"status": {
"boundVolumeSnapshotContentName": "snapcontent-ca94340e-c25a-4d30-b8b7-bdf5edfd7f88",
"creationTime": "2024-06-25T08:52:46Z",
"error": {
"message": "Failed to create snapshot: error updating status for volume snapshot content snapcontent-ca94340e-c25a-4d30-b8b7-bdf5edfd7f88: snapshot controller failed to update snapcontent-ca94340e-c25a-4d30-b8b7-bdf5edfd7f88 on API server: Operation cannot be fulfilled on volumesnapshotcontents.snapshot.storage.k8s.io "snapcontent-ca94340e-c25a-4d30-b8b7-bdf5edfd7f88": the object has been modified; please apply your changes to the latest version and try again",
"time": "2024-06-25T08:52:45Z"
},
"readyToUse": false,
"restoreSize": "1Gi"
}
}
],

I am using k8s 1.28

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants