feat: support volume cloning #914

Merged Oct 18, 2023 (1 commit)

umagnus
Contributor

@umagnus umagnus commented May 8, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

Feat: support volume cloning in Azure Blob CSI driver (for backup)

Which issue(s) this PR fixes:

Fixes #

Requirements:

Special notes for your reviewer:
Adds a mock for the azcopy command unit tests: creates an interface EXEC and util_mock.go for it.

Tested manually in a cluster:
Protocol: blobfuse2

xinyuyuan@devbox:~/go/src/blob-csi-driver$ k apply -f deploy/example/pvc-blob-csi.yaml 
WARNING: version difference between client (1.27) and server (1.25) exceeds the supported minor version skew of +/-1
persistentvolumeclaim/pvc-blob created
xinyuyuan@devbox:~/go/src/blob-csi-driver$ k apply -f deploy/example/nginx-pod-blob.yaml 
WARNING: version difference between client (1.27) and server (1.25) exceeds the supported minor version skew of +/-1
pod/nginx-blob created
xinyuyuan@devbox:~/go/src/blob-csi-driver$ k get pv
WARNING: version difference between client (1.27) and server (1.25) exceeds the supported minor version skew of +/-1
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
pvc-461bd64c-2c63-4643-9850-9e21dd697d92   10Gi       RWX            Delete           Bound    default/pvc-blob   blob-fuse               8s
xinyuyuan@devbox:~/go/src/blob-csi-driver$ k apply -f deploy/example/pvc-blob-csi-clone.yaml 
WARNING: version difference between client (1.27) and server (1.25) exceeds the supported minor version skew of +/-1
persistentvolumeclaim/pvc-blob-clone created
xinyuyuan@devbox:~/go/src/blob-csi-driver$ k get pv
WARNING: version difference between client (1.27) and server (1.25) exceeds the supported minor version skew of +/-1
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                    STORAGECLASS   REASON   AGE
pvc-23e76d2d-d0dd-4a5e-9349-45b38336d99c   10Gi       RWX            Delete           Released   default/pvc-blob-clone   blob-fuse               33m
pvc-71083d7d-81c1-43d0-bb8f-0434c0a4f70a   10Gi       RWX            Delete           Released   default/pvc-blob         blob-fuse               35m

It seems that generating the SAS token for the account takes some time; in our test it took about 2 minutes.

I0512 02:28:38.803503       1 controllerserver.go:691] begin to copy blob container pvc-71083d7d-81c1-43d0-bb8f-0434c0a4f70a to pvc-23e76d2d-d0dd-4a5e-9349-45b38336d99c
I0512 02:28:38.803535       1 controllerserver.go:729] generate sas token for account(fuse27fb43139faaa42bd86)
I0512 02:28:38.864629       1 util.go:124] Send.sendRequest got response with ContentLength -1, StatusCode 200 and responseBody length 767
I0512 02:28:38.864776       1 controllerserver.go:740] copy blob container pvc-71083d7d-81c1-43d0-bb8f-0434c0a4f70a to pvc-23e76d2d-d0dd-4a5e-9349-45b38336d99c
I0512 02:28:41.103730       1 controllerserver.go:745] copied blob container pvc-71083d7d-81c1-43d0-bb8f-0434c0a4f70a to pvc-23e76d2d-d0dd-4a5e-9349-45b38336d99c successfully

Protocol: NFS

xinyuyuan@devbox:~/go/src/blob-csi-driver$ k apply -f deploy/example/statefulset-nfs.yaml 
WARNING: version difference between client (1.27) and server (1.25) exceeds the supported minor version skew of +/-1
statefulset.apps/statefulset-blob-nfs created
xinyuyuan@devbox:~/go/src/blob-csi-driver$ k apply -f deploy/example/pvc-blob-csi-clone.yaml 
WARNING: version difference between client (1.27) and server (1.25) exceeds the supported minor version skew of +/-1
persistentvolumeclaim/pvc-blob-clone-nfs created
xinyuyuan@devbox:~/go/src/blob-csi-driver$ k get pvc
WARNING: version difference between client (1.27) and server (1.25) exceeds the supported minor version skew of +/-1
NAME                                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistent-storage-statefulset-blob-nfs-0   Bound    pvc-50bc3465-47d5-445e-bfb4-3310b6bb15ad   100Gi      RWX            blob-nfs       81s
pvc-blob-clone-nfs                          Bound    pvc-a290631b-3059-4817-974b-6e9773aba638   100Gi      RWX            blob-nfs       20s
I0512 03:29:47.963365       1 controllerserver.go:691] begin to copy blob container pvc-50bc3465-47d5-445e-bfb4-3310b6bb15ad to pvc-a290631b-3059-4817-974b-6e9773aba638
I0512 03:29:47.963392       1 controllerserver.go:729] generate sas token for account(nfsa29cf7d0b4f84fb9a33e)
I0512 03:29:48.085460       1 util.go:124] Send.sendRequest got response with ContentLength -1, StatusCode 200 and responseBody length 767
I0512 03:29:48.085666       1 controllerserver.go:740] copy blob container pvc-50bc3465-47d5-445e-bfb4-3310b6bb15ad to pvc-a290631b-3059-4817-974b-6e9773aba638
I0512 03:29:50.362518       1 controllerserver.go:745] copied blob container pvc-50bc3465-47d5-445e-bfb4-3310b6bb15ad to pvc-a290631b-3059-4817-974b-6e9773aba638 successfully

azcopy copy speed

| file size | copy time |
| --- | --- |
| 1G | 5s |
| 10G | 84s |
| 100G | 12min10s |
| 1000G | 74min23s |

Release note:
none

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 8, 2023
@k8s-ci-robot
Contributor

Hi @umagnus. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 8, 2023
@andyzhangx
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 8, 2023
@andyzhangx
Member

Please also increase the timeout; I think 120s may no longer be enough.

Member

@andyzhangx andyzhangx left a comment

Please also add a test to the external e2e tests, similar to the SMB driver's volume cloning feature.

Member

@andyzhangx andyzhangx left a comment

Please also add your test results, e.g. which protocols work and which do not (NFS does not work).

@andyzhangx
Member

To fix the sanity test failure, download the azcopy binary and extract it to the current folder; that should work:

                Message: "failed to copy blob container exec: \"azcopy\": executable file not found in $PATH: ",
                Details: nil,
            },
        }
        rpc error: code = Internal desc = failed to copy blob container exec: "azcopy": executable file not found in $PATH: 

_output/amd64/blobplugin --endpoint "$controllerendpoint" -v=5 &
_output/amd64/blobplugin --endpoint "$nodeendpoint" --nodeid "$nodeid" --enable-blob-mock-mount -v=5 &
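The "executable file not found in $PATH" error above comes from Go's os/exec package. As an illustration only (this is not code from the PR), a precheck like the following would surface a missing azcopy binary early with a clearer message. `requireBinary` is a hypothetical helper, and `exec.LookPath` searches only the directories listed in $PATH.

```go
package main

import (
	"fmt"
	"os/exec"
)

// requireBinary resolves name via $PATH and returns its location,
// or a wrapped error if the binary cannot be found.
func requireBinary(name string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%s is required for volume cloning but was not found: %w", name, err)
	}
	return path, nil
}

func main() {
	path, err := requireBinary("azcopy")
	if err != nil {
		fmt.Println("precheck failed:", err)
		return
	}
	fmt.Println("using azcopy at", path)
}
```

A binary extracted into the working directory is found this way only if that directory is on $PATH.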

@umagnus
Contributor Author

umagnus commented May 18, 2023

/retest

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 19, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 19, 2023
@umagnus umagnus force-pushed the add_volume_cloning branch 2 times, most recently from ea89cf8 to ca8d303 Compare May 19, 2023 06:46
@umagnus
Contributor Author

umagnus commented May 19, 2023

/retest

if copyErr != nil {
	if strings.Contains(string(out), azcopyAuthenticationFailedCode) {
		klog.Warningf("CopyContainer(%s, %s, %s) failed with error(%v): %v, sleep 2 min and retry", resourceGroupName, accountName, dstContainerName, err, string(out))
		time.Sleep(2 * time.Minute)
Member

Try tuning this value a little. It would also be better to wait for the copy to succeed inside the CreateVolume func, and we should increase the timeout to 10min:

        - name: csi-provisioner
          image: mcr.microsoft.com/oss/kubernetes-csi/csi-provisioner:v3.5.0
          args:
            - "-v=2"
            ...
            - "--timeout=120s"

@umagnus
Contributor Author

umagnus commented May 22, 2023

/retest

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Oct 11, 2023
@umagnus
Contributor Author

umagnus commented Oct 12, 2023

/retest

@umagnus umagnus force-pushed the add_volume_cloning branch 4 times, most recently from cbe699f to 211df3a Compare October 13, 2023 07:15
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 13, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 13, 2023
@andyzhangx andyzhangx changed the title [WIP] Feat: support volume cloning in Azure Blob CSI driver (for backup) [WIP] feat: support volume cloning Oct 13, 2023
@umagnus
Contributor Author

umagnus commented Oct 16, 2023

/retest

@andyzhangx andyzhangx changed the title [WIP] feat: support volume cloning feat: support volume cloning Oct 17, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 17, 2023
Member

@andyzhangx andyzhangx left a comment

From the logs, the copy percentage is not displayed. Is that expected?

[pod/csi-blob-controller-567f79864d-zx5dl/blob] I1018 03:51:19.100220       1 controllerserver.go:728] generate sas token for account(fuse1974678f2544489c95c)
[pod/csi-blob-controller-567f79864d-zx5dl/blob] I1018 03:51:19.118210       1 controllerserver.go:740] azcopy job status: NotFound, copy percent: %, error: <nil>
[pod/csi-blob-controller-567f79864d-zx5dl/blob] I1018 03:51:19.118243       1 controllerserver.go:744] begin to copy blob container pvc-42316440-4e30-4c13-b4ad-928e02412159 to pvc-caac5a8c-50e0-4be1-ab40-31da7e4f724c
[pod/csi-blob-controller-567f79864d-zx5dl/blob] I1018 03:51:24.118132       1 controllerserver.go:749] azcopy job status: NotFound, copy percent: %, error: <nil>
[pod/csi-blob-controller-567f79864d-zx5dl/blob] I1018 03:51:24.118220       1 controllerserver.go:754] copy blob container pvc-42316440-4e30-4c13-b4ad-928e02412159 to pvc-caac5a8c-50e0-4be1-ab40-31da7e4f724c
[pod/csi-blob-controller-567f79864d-zx5dl/blob] I1018 03:51:32.313751       1 controllerserver.go:759] copied blob container pvc-42316440-4e30-4c13-b4ad-928e02412159 to pvc-caac5a8c-50e0-4be1-ab40-31da7e4f724c successfully
[pod/csi-blob-controller-567f79864d-zx5dl/blob] I1018 03:51:32.328586       1 controllerserver.go:441] store account key to k8s secret(azure-storage-account-fuse1974678f2544489c95c-secret) in blob-5670 namespace
[pod/csi-blob-controller-567f79864d-zx5dl/blob] I1018 03:51:32.328658       1 controllerserver.go:452] create container pvc-caac5a8c-50e0-4be1-ab40-31da7e4f724c on storage account fuse1974678f2544489c95c successfully
[pod/csi-blob-controller-567f79864d-zx5dl/blob] I1018 03:51:32.32

@@ -841,4 +841,132 @@ var _ = ginkgo.Describe("[blob-csi-e2e] Dynamic Provisioning", func() {
}
test.Run(ctx, cs, ns)
})

ginkgo.It("should clone a volume from an existing NFSv3 volume [nfs]", func(ctx ginkgo.SpecContext) {
Member

Do NFS volumes also support volume cloning?

Contributor Author

Yes, NFS is also supported.

@umagnus
Contributor Author

umagnus commented Oct 18, 2023

from the logs, the percentage is not displayed in the logs, is that expected?

Yes, this is expected: it means azcopy is not in the Running state.
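As an illustration of the behavior discussed here, the sketch below formats a status line that includes the percentage only while a job is running, which would avoid the bare "copy percent: %" seen in the logs. This is a suggestion, not what the PR implements; `jobState` and `progressMessage` are hypothetical names, and only the two states mentioned in this thread are modeled.

```go
package main

import "fmt"

// jobState models the azcopy job states mentioned in this thread.
type jobState string

const (
	jobNotFound jobState = "NotFound"
	jobRunning  jobState = "Running"
)

// progressMessage includes the copy percentage only when the job is running
// and a percentage value is actually available.
func progressMessage(state jobState, percent string) string {
	if state == jobRunning && percent != "" {
		return fmt.Sprintf("azcopy job status: %s, copy percent: %s%%", state, percent)
	}
	return fmt.Sprintf("azcopy job status: %s", state)
}

func main() {
	fmt.Println(progressMessage(jobNotFound, ""))
	fmt.Println(progressMessage(jobRunning, "50.0"))
}
```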

Member

@andyzhangx andyzhangx left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 18, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx, umagnus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 18, 2023
@k8s-ci-robot k8s-ci-robot merged commit fde2ca6 into kubernetes-sigs:master Oct 18, 2023
22 checks passed
@andyzhangx
Member

/cherrypick release-1.29

@k8s-infra-cherrypick-robot

@andyzhangx: cannot checkout release-1.29: error checking out "release-1.29": exit status 1 error: pathspec 'release-1.29' did not match any file(s) known to git

@andyzhangx
Member

/cherrypick release-1.23

@k8s-infra-cherrypick-robot

@andyzhangx: #914 failed to apply on top of branch "release-1.23":

Applying: use azcopy for volume cloning
Using index info to reconstruct a base tree...
M	deploy/csi-blob-controller.yaml
M	test/external-e2e/testdriver-blobfuse.yaml
Falling back to patching base and 3-way merge...
Auto-merging test/external-e2e/testdriver-blobfuse.yaml
CONFLICT (content): Merge conflict in test/external-e2e/testdriver-blobfuse.yaml
Auto-merging deploy/csi-blob-controller.yaml
CONFLICT (content): Merge conflict in deploy/csi-blob-controller.yaml
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 use azcopy for volume cloning
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
4 participants