Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement ListVolumes #618

Merged
merged 1 commit into from
Dec 23, 2020
Merged

feat: Implement ListVolumes #618

merged 1 commit into from
Dec 23, 2020

Conversation

nearora-msft
Copy link
Contributor

@nearora-msft nearora-msft commented Dec 16, 2020

What type of PR is this?
/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #378 #617

Requirements:

Special notes for your reviewer:

Release note:

feat: Implement ListVolumes

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 16, 2020
@k8s-ci-robot
Copy link
Contributor

Welcome @nearora-msft!

It looks like this is your first PR to kubernetes-sigs/azuredisk-csi-driver 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/azuredisk-csi-driver has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @nearora-msft. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 16, 2020
@nearora-msft nearora-msft changed the title Implement list volumes feat: Implement ListVolumes Dec 16, 2020
@andyzhangx
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 16, 2020
Copy link
Member

@andyzhangx andyzhangx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you also add list-volumes in integration test:
https://github.com/kubernetes-sigs/azuredisk-csi-driver/blob/master/test/integration/run-test.sh

also is there any way to add e2e tests for this feature? thanks.

# csc controller list-volumes -h
NAME
    list-volumes -- invokes the rpc "ListVolumes"

SYNOPSIS
    csc controller list-volumes [flags]

ALIASES
    list-volumes, ls, list, volumes

OPTIONS
        --format
        The Go template format used to emit the results

    -h, --help
        help for list-volumes

        --max-entries
        The maximum number of entries to return

        --paging
        Enables auto-paging

        --starting-token
        The starting token used to retrieve paged data

pkg/azuredisk/controllerserver.go Outdated Show resolved Hide resolved
pkg/azuredisk/controllerserver.go Outdated Show resolved Hide resolved
@nearora-msft
Copy link
Contributor Author

/test pull-azuredisk-csi-driver-e2e-windows

Copy link
Member

@andyzhangx andyzhangx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add integration tests and fix the sanity test failure?

pkg/azuredisk/controllerserver_test.go Show resolved Hide resolved
Copy link
Member

@andyzhangx andyzhangx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the error logs, pls provide as much info as possible, that would help in troubleshooting

pkg/azuredisk/controllerserver.go Outdated Show resolved Hide resolved
pkg/azuredisk/controllerserver.go Outdated Show resolved Hide resolved
}

if start != 0 && start >= len(disks) {
return nil, status.Errorf(codes.Aborted, "ListVolumes starting token(%d) is greater than total number of volumes", start)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ListVolumes on rg(%s) starting token(%d) is greater than total number of volumes(%d)

pkg/azuredisk/controllerserver.go Outdated Show resolved Hide resolved
pkg/azuredisk/controllerserver.go Outdated Show resolved Hide resolved
}
}

disks, derr := d.cloud.DisksClient.ListByResourceGroup(ctx, d.cloud.ResourceGroup)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually there are other scenarios that disks are in created other resource group, so ListVolumes result is only accurate for default condition. On the other hand, agent nodes are all in d.cloud.ResourceGroup, so ListVolumes of all agent nodes would be accurate. I think this could be right solution: list disks of all agent nodes under d.cloud.ResourceGroup

Copy link
Member

@andyzhangx andyzhangx Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking use d.cloud.KubeClient.CoreV1().PersistentVolumes() to get all PV info and then get all diskURI from pv.csi.volumeid and then get disk.ManagedBy, that would provide complete disk list, could refer to:

Copy link
Member

@andyzhangx andyzhangx Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would require much more work, while it could list all disks this cluster is using, otherwise it could not cover all disks and may provide incomplete disk list, not sure what would happen if disk list is incomplete.

@andyzhangx
Copy link
Member

about sanity test failure, need to fix the 5 failures before upgrade to csi sanity test v4.0.2: #622

@nearora-msft
Copy link
Contributor Author

about sanity test failure, need to fix the 5 failures before upgrade to csi sanity test v4.0.2: #622

@andyzhangx I think then we can enable the LIST_VOLUMES_PUBLISHED_NODES only after issue #622 is resolved?

@andyzhangx
Copy link
Member

about sanity test failure, need to fix the 5 failures before upgrade to csi sanity test v4.0.2: #622

@andyzhangx I think then we can enable the LIST_VOLUMES_PUBLISHED_NODES only after issue #622 is resolved?

another way is disable that failed test in sanity tests temporarily

@nearora-msft
Copy link
Contributor Author

about sanity test failure, need to fix the 5 failures before upgrade to csi sanity test v4.0.2: #622

@andyzhangx I think then we can enable the LIST_VOLUMES_PUBLISHED_NODES only after issue #622 is resolved?

another way is disable that failed test in sanity tests temporarily

Disabled the failing sanity test temporarily

Copy link
Member

@andyzhangx andyzhangx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add csc integration test and squash all commits by git rebase -i HEAD~7

pkg/azuredisk/controllerserver_test.go Outdated Show resolved Hide resolved
test/sanity/run-test.sh Outdated Show resolved Hide resolved
@nearora-msft
Copy link
Contributor Author

/test pull-azuredisk-csi-driver-sanity

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 18, 2020
@nearora-msft
Copy link
Contributor Author

csc

Added the integration test. Will squash commits once all the checks complete successfully.

@andyzhangx
Copy link
Member

• Failure [17.089 seconds]
Controller Service [Controller Server]
/home/prow/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/tests.go:44
  ListVolumes
  /home/prow/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/controller.go:166
    pagination should detect volumes added between pages and accept tokens when the last volume from a page is deleted [It]
skipped 1156 lines unfold_more
[Fail] Controller Service [Controller Server] ListVolumes [It] pagination should detect volumes added between pages and accept tokens when the last volume from a page is deleted 
/home/prow/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/controller.go:336
Ran 63 of 72 Specs in 503.670 seconds
FAIL! -- 62 Passed | 1 Failed | 0 Pending | 9 Skipped

@nearora-msft
Copy link
Contributor Author

• Failure [17.089 seconds]
Controller Service [Controller Server]
/home/prow/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/tests.go:44
  ListVolumes
  /home/prow/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/controller.go:166
    pagination should detect volumes added between pages and accept tokens when the last volume from a page is deleted [It]
skipped 1156 lines unfold_more
[Fail] Controller Service [Controller Server] ListVolumes [It] pagination should detect volumes added between pages and accept tokens when the last volume from a page is deleted 
/home/prow/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/controller.go:336
Ran 63 of 72 Specs in 503.670 seconds
FAIL! -- 62 Passed | 1 Failed | 0 Pending | 9 Skipped

Disabled this test because of existing issue kubernetes-csi/csi-test#223

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 21, 2020
@andyzhangx
Copy link
Member

/retest

@andyzhangx
Copy link
Member

would you squash all commits first? thanks.

fix: Add ListVolumes capabilities to the driver

fix: Handle case where maxEntries is greater than the number of disks

Handle case when both starting_token and number of disks are zero

fix: Rectify the attached node name and add more details in logs

test: Disable the 'should return appropriate capabilities' sanity test temporarily

test: Add unit tests and integration test for ListVolumes

test: Fix the ListVolumes unit test

test: Disable flaky pagination test

test: Add the missing apostrophe at the end of skipped tests string

test: Add UTs, integration tests for ListVolumes and disable some sanity tests

feat: Implement ListVolumes
@nearora-msft
Copy link
Contributor Author

/retest

1 similar comment
@nearora-msft
Copy link
Contributor Author

/retest

Copy link
Member

@andyzhangx andyzhangx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 22, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx, nearora-msft

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 22, 2020
@andyzhangx
Copy link
Member

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 22, 2020
@nearora-msft
Copy link
Contributor Author

/retest

@nearora-msft
Copy link
Contributor Author

looks like it's a flacky test, could you fix it? thanks:
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_azuredisk-csi-driver/618/pull-azuredisk-csi-driver-sanity/1341304585670627328

/hold

The test that is failing performs these steps:

  1. Call List Volumes
  2. Create a volume
  3. List Volumes again

It expects that the List Volumes count in step 3 should be List Volumes count in step 1 + 1. But the actual case is that List Volumes count in step 3 = List Volumes count in step1 +2. This is why the test fails.

I have observed that when a vm is created a disk is attached to it by default. While running the tests, we create an rg and a sanity-test-node in that rg . The creation of this node should create a disk and attach is to the newly created node. So when the ListVolumes call is made the 1st time, it usually lists the disk that was created with the creation of sanity-test-node. So, we get 1 entry in the 1st call and 2 entries in the 2nd call.

But in this failed case, the 1st ListVolumes call doesn't give any results(Could be because the default disk attach has not happened yet ) And when the ListVolumes call is made the second time, it returns 2 disks(One that was created by the vm creation and the other that was created by the test)

Not sure how we can fix it at our end(if that's possible)

@andyzhangx
Copy link
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 22, 2020
@nearora-msft
Copy link
Contributor Author

/retest

@andyzhangx
Copy link
Member

/test pull-azuredisk-csi-driver-e2e-windows

@k8s-ci-robot k8s-ci-robot merged commit 651a03a into kubernetes-sigs:master Dec 23, 2020
@nearora-msft nearora-msft deleted the implement-ListVolumes branch December 24, 2020 07:26
sozercan pushed a commit to sozercan/azuredisk-csi-driver that referenced this pull request Aug 16, 2021
…ttacher

feat: disable attacher by default
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement ListVolumes
4 participants