snapshot-controller logs report failure frequently #748
Comments
@ggriffiths Can you take a look? These come from the "update" calls that have not been replaced with "patch" in the snapshot-controller. |
Yes, there are many spots where we still use "update" instead of "patch":
This error will still be hit in these scenarios. We reduced the major scenarios in #526, but there is more work to be done. I'm happy to review a PR for this work. |
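(Aside for readers: a minimal sketch of the difference under discussion, using the generated external-snapshotter clientset. The annotation key mirrors the error messages quoted later in this thread; treat the code as illustrative, not the controller's actual implementation. Update writes the whole object back and is rejected when the local copy is stale, while a merge patch only sends the delta.)

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"

	crdv1 "github.com/kubernetes-csi/external-snapshotter/client/v6/apis/volumesnapshot/v1"
	clientset "github.com/kubernetes-csi/external-snapshotter/client/v6/clientset/versioned"
)

// Annotation key taken from the error messages quoted in this thread.
const annBeingCreated = "snapshot.storage.kubernetes.io/volumesnapshot-being-created"

// updateStyle writes the whole object back. If another writer bumped the
// resourceVersion in the meantime, the API server rejects the write with
// "the object has been modified; please apply your changes to the latest
// version and try again", which is the error this issue is about.
func updateStyle(ctx context.Context, c clientset.Interface, content *crdv1.VolumeSnapshotContent) error {
	delete(content.Annotations, annBeingCreated)
	_, err := c.SnapshotV1().VolumeSnapshotContents().Update(ctx, content, metav1.UpdateOptions{})
	return err
}

// patchStyle sends only the delta. A JSON merge patch with a null value
// deletes the key and does not compare resourceVersion, so it cannot hit
// the conflict above.
func patchStyle(ctx context.Context, c clientset.Interface, name string) error {
	patch := []byte(`{"metadata":{"annotations":{"` + annBeingCreated + `":null}}}`)
	_, err := c.SnapshotV1().VolumeSnapshotContents().Patch(ctx, name, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}
```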
/help |
@ggriffiths: Guidelines: Please ensure that the issue body includes answers to the following questions:
For more details on the requirements of such an issue, please see here and ensure that they are met. If this request no longer meets these requirements, the label can be removed. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
I've sent PR #757. |
/assign @camartinez04 |
Also hit this issue. |
/unassign @camartinez04 |
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
@ggriffiths @xing-yang If no one is actively working on this issue, I would like to work on it. |
Hi @shubham-pampattiwar, that's great. Can you please assign this issue to yourself? Thanks. |
/remove-lifecycle stale |
/assign |
/unassign @shubham-pampattiwar |
Spotted this issue when using Velero 1.12.1 and CSI plugin 0.6.1 together with snapshot-controller 6.3.1:
As the logs indicate, the patch rewrites are not in the 6.3.1 release yet, so please release them ASAP to close this issue. |
I've cherry-picked commit ff71329 from main to the release-6.3 branch and created new images:
I can confirm the patch rewrites solve this issue. IMHO the patch could be safely merged to the release-6.3 branch. |
See #876 (comment) |
We have also seen similar errors about snapshot status update, most likely coming from |
Any update on this? This happens pretty consistently for me. Running |
@julienvincent can you provide logs to understand the context? |
@phoenix-bjoern Sure! What kind of logs would you like? Happy to provide. Like other users in this thread, I am hitting this error consistently. For example, the VolumeSnapshot status reports:

    Error:
      Message: Failed to check and update snapshot content: failed to remove VolumeSnapshotBeingCreated annotation on the content snapcontent-48e5a79d-6d41-4b28-9d17-24cfaa920cad: "snapshot controller failed to update snapcontent-48e5a79d-6d41-4b28-9d17-24cfaa920cad on API server: Operation cannot be fulfilled on volumesnapshotcontents.snapshot.storage.k8s.io \"snapcontent-48e5a79d-6d41-4b28-9d17-24cfaa920cad\": the object has been modified; please apply your changes to the latest version and try again"
      Time: 2023-12-28T00:01:34Z
    Ready To Use: false

Happy to provide any other information you need. |
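(Aside: the "Operation cannot be fulfilled ... the object has been modified" message above is the API server rejecting a write whose resourceVersion is stale. Below is a minimal sketch of the standard client-go remedy, re-reading the object and retrying on conflict; the helper is illustrative and is not the snapshot-controller's actual code.)

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/util/retry"

	clientset "github.com/kubernetes-csi/external-snapshotter/client/v6/clientset/versioned"
)

// removeAnnotationWithRetry is a hypothetical helper: re-read the object so
// the write carries the current resourceVersion, and let RetryOnConflict
// absorb writes that race with other controllers.
func removeAnnotationWithRetry(ctx context.Context, c clientset.Interface, name string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Fetch the latest copy before mutating it.
		content, err := c.SnapshotV1().VolumeSnapshotContents().Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		delete(content.Annotations, "snapshot.storage.kubernetes.io/volumesnapshot-being-created")
		_, err = c.SnapshotV1().VolumeSnapshotContents().Update(ctx, content, metav1.UpdateOptions{})
		return err
	})
}
```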
@julienvincent the snapshot controller log should have a trace for the error. Can you share it to identify the exact code lines? |
@phoenix-bjoern sure, here is an example: |
Thanks @julienvincent for the additional context. The trace shows a storage error: |
@phoenix-bjoern I might be misunderstanding something, but the linked issue doesn't seem directly related to this one. That issue is more about Longhorn's behaviour of not deleting its internal snapshots when CSI volume snapshots are deleted. AFAIU Velero is not actually involved in the CSI snapshot process other than initially creating the VolumeSnapshot resource. In this case the snapshots themselves are not being reported as successful (even though the underlying driver is successfully performing a snapshot). But if I understand you correctly, this error message is set on the VolumeSnapshot resource by the driver (Longhorn), and external-snapshotter is just reporting/relaying it? Would you recommend opening an issue with Longhorn? |
@julienvincent The snapshot controller only triggers a process which the storage driver then executes. Since the error seems to occur in the storage driver, there is nothing you can do in the snapshot controller. |
Hi. We are running into the same issues. Have the UpdateStatus calls already been refactored to use patch instead of update? If not, I would like to help. |
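(Aside for anyone picking this up: a sketch of what such a refactor could look like for a status write, assuming the same generated clientset. Instead of Get + mutate + UpdateStatus, which races other writers, send a merge patch scoped to the "status" subresource. The field name readyToUse mirrors the VolumeSnapshotContent API; the function itself is hypothetical.)

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"

	clientset "github.com/kubernetes-csi/external-snapshotter/client/v6/clientset/versioned"
)

// markContentReady is a hypothetical example: the merge patch touches only
// status.readyToUse, so the API server applies just this field and no
// resourceVersion comparison takes place.
func markContentReady(ctx context.Context, c clientset.Interface, name string) error {
	patch := []byte(`{"status":{"readyToUse":true}}`)
	_, err := c.SnapshotV1().VolumeSnapshotContents().Patch(
		ctx, name, types.MergePatchType, patch, metav1.PatchOptions{}, "status")
	return err
}
```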
I started refactoring this last month. |
@hoyho did you have time to finish? Do you know the ETA? Thanks. |
No problem. I'll probably finish it next week. |
What happened:
In our CSI driver deployment, the snapshot-controller logs show the error below. There is no functional impact, but these errors appear very frequently.
What you expected to happen:
How to reproduce it:
Anything else we need to know?:
Environment: I tested on IKS 1.22, 1.23, 1.24
- Kubernetes version (use `kubectl version`):
- Kernel (e.g. `uname -a`):
- Sidecar images:
  - gcr.io/k8s-staging-sig-storage/csi-snapshotter:v6.0.1
  - gcr.io/k8s-staging-sig-storage/snapshot-controller:v6.0.1