JSON patch of Restore resource modifiers failing when referencing fields from status #8094

Open
BaudouinH opened this issue Aug 7, 2024 · 7 comments · May be fixed by #8121
@BaudouinH

What steps did you take and what happened:

I am trying to restore volumesnapshots and volumesnapshotcontents from a backup from another GKE Kubernetes cluster using Velero.

Following GKE's documentation for importing a pre-existing snapshot, some fields of the resources need to be modified for the import to succeed.

For example, the initial VolumeSnapshotContent looks like this:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: restored-snapshot-content
spec:
  source:
    volumeHandle: <pvc reference>
status:
  snapshotHandle: <gcp reference for the snapshot>
(...)

The correct VolumeSnapshotContent definition, to import it on another cluster, would be:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: <name>
spec:
  source:
    snapshotHandle: <gcp reference for the snapshot>
status:
  snapshotHandle: <gcp reference for the snapshot>
(...)

So I need to add the field spec.source.snapshotHandle, set it to the value of status.snapshotHandle and remove the field spec.source.volumeHandle.

To do so, I tried using Restore Resource Modifiers, with the following JSON patches:

version: v1
resourceModifierRules:
  - conditions:
      groupResource: volumesnapshotcontents.snapshot.storage.k8s.io
      resourceNameRegex: "^snapcontent-.*$"
    patches:
      - operation: replace
        path: "/spec/deletionPolicy"
        value: "Retain"
      - operation: copy
        from: "/status/snapshotHandle"
        path: "/spec/source/snapshotHandle"
      - operation: remove
        path: "/spec/source/volumeHandle"

I put these rules in a ConfigMap named restore-modifier, as shown in the documentation.
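
For reference, I created the ConfigMap with something along these lines (assuming the rules above are saved locally as resource_modifiers.yaml; the file name itself is arbitrary):

kubectl create configmap restore-modifier \
  --from-file=resource_modifiers.yaml \
  --namespace velero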

I then triggered a restore by creating the following Restore resource through the Kubernetes API:

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: <name>
  namespace: velero
spec:
  backupName: <backup name>
  includedNamespaces:
  - <namespace>
  includedResources:
  - volumesnapshotcontents.snapshot.storage.k8s.io
  includeClusterResources: true
  restoreStatus:
    includedResources:
    - volumesnapshotcontents.snapshot.storage.k8s.io
  existingResourcePolicy: update
  resourceModifier:
    kind: configmap
    name: restore-modifier

Note that I specifically set spec.restoreStatus so that the restored resources keep their status, and spec.includeClusterResources so that the cluster-scoped VolumeSnapshotContent resources can be restored successfully.

When doing so, I get a partial failure of the restore:
The resource is created but not modified according to the patch.
The following error can be seen in Velero's logs:

time="2024-08-07T13:43:49Z" level=error msg="Cluster resource restore error: error in applying patch error in applying JSON Patch copy operation does not apply: doc is missing from path: \"/status/snapshotHandle\": missing value" logSource="pkg/controller/restore_controller.go:587" restore=velero/<name of the restore>

And with the debug command, for each resource that I try to patch:

Cluster:  error in applying patch error in applying JSON Patch copy operation does not apply: doc is missing from path: "/status/snapshotHandle": missing value

Even though the resource created by Velero has the value I'm trying to reference in the patch:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: <name of the restored volumeSnapshotContent>
spec:
  source:
    volumeHandle: <pvc reference> # Patch has not been applied
(...)
status:
  snapshotHandle: <gcp volume snapshot reference>

So the patch should work, if I'm not mistaken.
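
As a sanity check outside Velero, here is a minimal sketch (assuming the evanphx/json-patch library, which the error message format suggests, but that is an assumption on my part) showing that the copy operation only fails when the status field is missing from the document being patched:

package main

import (
	"fmt"

	jsonpatch "github.com/evanphx/json-patch/v5"
)

func main() {
	// Same copy/remove operations as in the resource modifier rules above.
	patch, err := jsonpatch.DecodePatch([]byte(`[
		{"op": "copy", "from": "/status/snapshotHandle", "path": "/spec/source/snapshotHandle"},
		{"op": "remove", "path": "/spec/source/volumeHandle"}
	]`))
	if err != nil {
		panic(err)
	}

	// Document as the modifiers apparently see it: status already stripped.
	withoutStatus := []byte(`{"spec":{"source":{"volumeHandle":"pvc-123"}}}`)
	if _, err := patch.Apply(withoutStatus); err != nil {
		fmt.Println("without status:", err) // copy does not apply: source path is missing
	}

	// Same document with its status still present: the patch applies cleanly.
	withStatus := []byte(`{"spec":{"source":{"volumeHandle":"pvc-123"}},"status":{"snapshotHandle":"projects/p/global/snapshots/s"}}`)
	patched, err := patch.Apply(withStatus)
	if err != nil {
		panic(err)
	}
	fmt.Println("with status:", string(patched))
}

With that, the first Apply should fail with the same "copy operation does not apply" error as in the restore log, while the second one applies cleanly.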

I took a quick look at the code in https://github.com/vmware-tanzu/velero/blob/main/pkg/restore/restore.go and, if I understand correctly, the resource modifiers are applied before the logic that adds the status back to the object being restored.

Since I'm lacking some context on the implementation of the restore feature, I don't know if this behavior is expected or not.

I'm running into the same issue when restoring VolumeSnapshots, with the same type of JSON patches.

What did you expect to happen:

Being able to reference values from the status field in restore resource modifier JSON patches, so that a restored resource can be modified based on its status.

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue. For more options, please refer to velero debug --help

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

Anything else you would like to add:

Environment:

  • Velero version (use velero version): v1.14.0
  • Velero features (use velero client config get features): EnableCSI
  • Kubernetes version (use kubectl version): v1.30.2-gke.1587003
  • Kubernetes installer & version: GKE v1.30.2-gke.1587003
  • Cloud provider or hardware configuration: GCP
  • OS (e.g. from /etc/os-release): Container-Optimized OS

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@anshulahuja98 self-assigned this Aug 8, 2024
@reasonerjt
Contributor

This happens because the JSON patches are applied after the status field has been removed.

if obj, err = resetMetadataAndStatus(obj); err != nil {

Maybe we can resetMetadata here and resetStatus after the jsonpatches are applied? I assume no RIA will try to modify the status field??

Above all, we need to avoid causing regression...
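
To illustrate the idea, here is a rough, self-contained sketch of that ordering using plain maps (the real code in pkg/restore/restore.go operates on unstructured objects, and the resetMetadata / resetStatus helpers below are hypothetical, named after the proposal above):

package main

import "fmt"

// Hypothetical helpers standing in for a split of resetMetadataAndStatus:
// metadata is cleaned before the modifiers run, status only afterwards.

func resetMetadata(obj map[string]any) map[string]any {
	// The real implementation keeps name, namespace, labels, annotations, etc.;
	// this toy version just drops the whole metadata block.
	delete(obj, "metadata")
	return obj
}

func resetStatus(obj map[string]any) map[string]any {
	delete(obj, "status")
	return obj
}

// Stand-in for applying the resource modifier JSON patches: copy
// status.snapshotHandle into spec.source and drop spec.source.volumeHandle.
func applyResourceModifiers(obj map[string]any) map[string]any {
	status := obj["status"].(map[string]any)
	source := obj["spec"].(map[string]any)["source"].(map[string]any)
	source["snapshotHandle"] = status["snapshotHandle"]
	delete(source, "volumeHandle")
	return obj
}

func main() {
	obj := map[string]any{
		"metadata": map[string]any{"resourceVersion": "123"},
		"spec":     map[string]any{"source": map[string]any{"volumeHandle": "pvc-123"}},
		"status":   map[string]any{"snapshotHandle": "projects/p/global/snapshots/s"},
	}

	obj = resetMetadata(obj)          // 1. metadata cleanup before the modifiers
	obj = applyResourceModifiers(obj) // 2. patches still see the status field
	obj = resetStatus(obj)            // 3. status is dropped afterwards, as today

	fmt.Println(obj)
}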

@jacksgt

jacksgt commented Aug 9, 2024

Hello,
thank you for opening this issue - I was just debugging the same behavior for a few hours!
I would definitely be in favor of allowing the use of status fields in Restore Resource Modifiers.

@BaudouinH
Author

BaudouinH commented Aug 14, 2024

Maybe we can resetMetadata here and resetStatus after the jsonpatches are applied? I assume no RIA will try to modify the status field?

I think this would be the most straightforward implementation, and as long as this limitation is highlighted in the documentation it is fine from my end user point of view.
I can open a PR if needed.

Making it possible to modify the status as well would be more complex but may be useful for some end users?

Above all, we need to avoid causing regression...

Which type of regression do you have in mind? Since we would still remove this status field by default, just later in the process

@reasonerjt
Contributor

@BaudouinH

Above all, we need to avoid causing regression...

Which type of regression do you have in mind? Since we would still remove this status field by default, just later in the process

This is not likely to happen, but an extreme example is that there could be an RIA (RestoreItemAction) which panics if the status field is not nil.

github-actions bot

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.


github-actions bot commented Nov 3, 2024

This issue was closed because it has been stalled for 14 days with no activity.

github-actions bot closed this as not planned Nov 3, 2024
@anshulahuja98 reopened this Nov 6, 2024
@anshulahuja98
Collaborator

Reopened this since we have an open PR. @reasonerjt can you please have a look at the PR again and share any final thoughts before we are ready to merge?

github-actions bot removed the staled label Nov 7, 2024