implement distributed resizing #195

travisghansen · 2022-02-26T17:18:01Z

Signed-off-by: Travis Glenn Hansen [email protected]

What type of PR is this?

/kind feature

What this PR does / why we need it:

Add resizing support to distributed setups to complement the provisioner and snapshotter features of the same nature.

Added a "--node-deployment" command-line option to the resizer sidecar; set it to true when the sidecar is deployed on a per-node basis.
For these changes to work, the NODE_NAME environment variable must also be set when deploying the sidecar.
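
As a rough illustration of the flag and environment variable described above, here is a minimal Go sketch. It is not the code from this PR: the `nodeDeploymentConfig` type and the `parseNodeDeployment` helper are hypothetical names invented for the example, and only the `--node-deployment` flag and the `NODE_NAME` variable come from the description.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// nodeDeploymentConfig is a hypothetical container for the two settings
// mentioned in the PR description.
type nodeDeploymentConfig struct {
	Enabled  bool
	NodeName string
}

// parseNodeDeployment parses the command-line arguments and, when
// --node-deployment is set, requires NODE_NAME to be non-empty.
// getenv is injected so the function can be exercised without touching
// the real process environment.
func parseNodeDeployment(args []string, getenv func(string) string) (nodeDeploymentConfig, error) {
	fs := flag.NewFlagSet("csi-resizer", flag.ContinueOnError)
	enabled := fs.Bool("node-deployment", false, "run the sidecar on a per-node basis")
	if err := fs.Parse(args); err != nil {
		return nodeDeploymentConfig{}, err
	}

	cfg := nodeDeploymentConfig{Enabled: *enabled}
	if cfg.Enabled {
		cfg.NodeName = getenv("NODE_NAME")
		if cfg.NodeName == "" {
			return cfg, errors.New("--node-deployment requires the NODE_NAME environment variable")
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := parseNodeDeployment(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("node deployment: %v, node: %q\n", cfg.Enabled, cfg.NodeName)
}
```

In a per-node deployment, NODE_NAME is commonly populated from the pod's `spec.nodeName` via the downward API, although the description does not say how the variable should be set.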

Which issue(s) this PR fixes:

Fixes #142

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Adds support for distributed resizing.

Signed-off-by: Travis Glenn Hansen <[email protected]>

k8s-ci-robot · 2022-02-26T17:18:07Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: travisghansen
To complete the pull request process, please assign msau42 after the PR has been reviewed.
You can assign the PR to them by writing /assign @msau42 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2022-02-26T17:18:08Z

Welcome @travisghansen!

It looks like this is your first PR to kubernetes-csi/external-resizer 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-csi/external-resizer has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2022-02-26T17:18:09Z

Hi @travisghansen. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

xing-yang · 2022-02-28T13:18:34Z

/ok-to-test

pohly · 2022-03-01T16:25:23Z

cmd/csi-resizer/main.go

@@ -156,7 +171,8 @@ func main() {
 		*timeout,
 		kubeClient,
 		informerFactory,
-		driverName)
+		driverName,
+		*enableNodeDeployment)


There is one potential scalability problem here: if there are now as many resizers running in the cluster as there are nodes, then the apiserver has to keep all of them informed about PV and PVC updates.

external-provisioner has the same problem, but it cannot really be avoided there because support for immediate provisioning depends on all provisioners seeing a new PVC.

I'm just wondering whether something better can be done for the other sidecars. The code below already checks for the util.VolumeSelectedNodeKey label. It would be possible to set up the PVC informer so that it does server-side filtering. It's doable, I just don't have a code example at hand.

That leaves the PV, though. external-provisioner would have to be modified to set a similar (the same?) label there. Such a label may even be useful for the provisioner. This code here does client-side filtering of PVs:
https://github.com/kubernetes-csi/external-provisioner/blob/3b752c36ca71fbf5d7310bb6a568b93844062189/pkg/controller/controller.go#L1132-L1145

That could be replaced with server-side filtering.

Here is code which sets up an informer with server-side filtering:
https://github.com/kubernetes-csi/external-provisioner/blob/3b752c36ca71fbf5d7310bb6a568b93844062189/cmd/csi-provisioner/csi-provisioner.go#L452-L464
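
To make the difference between the two filtering approaches concrete, here is a small standard-library Go sketch. It is an illustration under assumptions, not code from either repository: the `claim` type is a simplified stand-in for a PVC, and `selectedNodeKey` is a hypothetical label key (a label selector can only match labels, so this assumes the node name is stored as one). `filterForNode` models client-side filtering, where every object is received and then discarded locally; `nodeSelector` builds the equality selector string that would instead be handed to the API server, for example through an informer's list options, so that only matching objects are sent at all.

```go
package main

import "fmt"

// selectedNodeKey is a hypothetical label key used only in this sketch.
const selectedNodeKey = "example.com/selected-node"

// claim is a simplified stand-in for a PersistentVolumeClaim.
type claim struct {
	Name   string
	Labels map[string]string
}

// nodeSelector builds an equality-based label selector ("key=value").
// With server-side filtering, a string like this is sent with the
// list/watch request and the API server returns only matching objects.
func nodeSelector(nodeName string) string {
	return fmt.Sprintf("%s=%s", selectedNodeKey, nodeName)
}

// filterForNode is the client-side variant: the caller has already
// received every claim and drops the ones that belong to other nodes.
func filterForNode(claims []claim, nodeName string) []claim {
	var local []claim
	for _, c := range claims {
		if c.Labels[selectedNodeKey] == nodeName {
			local = append(local, c)
		}
	}
	return local
}

func main() {
	claims := []claim{
		{Name: "data-a", Labels: map[string]string{selectedNodeKey: "node-a"}},
		{Name: "data-b", Labels: map[string]string{selectedNodeKey: "node-b"}},
		{Name: "unscheduled"},
	}

	// Client-side: all three claims were transferred, one is kept.
	for _, c := range filterForNode(claims, "node-a") {
		fmt.Println("local claim:", c.Name)
	}

	// Server-side: only this selector would be sent to the API server.
	fmt.Println("selector:", nodeSelector("node-a"))
}
```

Both variants produce the same set of objects on the node; the difference is that with the selector the API server no longer has to send every update to every node.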

travisghansen · 2022-03-01

I thought about that as well but didn't want to 'kick against the pricks', as they say, given how the others were implemented. I also noted that we are not currently doing server-side filtering on the driver either (which I found odd).

Given my limited Go skill set, I would definitely need hand-holding on proper code to manage the watches/filters should we go that route.

k8s-triage-robot · 2022-05-30T17:20:17Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

travisghansen · 2022-05-30T17:33:09Z

/remove-lifecycle stale

mrpre · 2022-06-14T03:04:21Z

Any update on this PR? I applied the patch and the resizer seems to work well.

travisghansen · 2022-06-14T03:07:49Z

I haven’t had a chance to address the scalability concerns yet unfortunately. It should work fine for small clusters for sure.

k8s-triage-robot · 2022-09-12T04:02:12Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

travisghansen · 2022-09-12T04:04:47Z

/remove-lifecycle stale

b8kings0ga · 2022-10-20T10:30:13Z

Waiting to use this feature.

k8s-triage-robot · 2023-01-18T10:47:18Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

travisghansen · 2023-01-18T15:01:14Z

/remove-lifecycle stale

k8s-triage-robot · 2023-04-18T15:28:57Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

travisghansen · 2023-04-18T21:16:38Z

/remove-lifecycle stale

k8s-triage-robot · 2023-07-17T21:51:35Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

travisghansen · 2023-07-18T03:08:56Z

/remove-lifecycle stale

k8s-ci-robot · 2023-11-02T02:34:06Z

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-triage-robot · 2024-01-31T03:28:10Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

travisghansen · 2024-02-03T05:02:36Z

/remove-lifecycle stale

k8s-triage-robot · 2024-05-03T05:05:46Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

travisghansen · 2024-05-07T13:24:53Z

/remove-lifecycle stale

k8s-triage-robot · 2024-08-05T13:34:58Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

travisghansen · 2024-08-05T14:36:47Z

/remove-lifecycle stale

silenceper · 2024-08-27T03:25:06Z

@travisghansen Will this PR be merged? I also need this capability.

travisghansen · 2024-08-27T04:52:17Z

I would like to see it merged, yes. When the PR was originally created, the CSI team had concerns about scalability, with no real options for addressing them. I have not discussed this in a meeting with the CSI team in a very long time, but I would be happy to discuss it again and possibly get it over the hill.

Fundamentally the issue with scalability is about being able to limit the watches on each node to only volumes associated with that node. It's been a while since I looked but at the time this was originally put together there were no mechanisms to do so.

k8s-ci-robot · 2024-09-05T20:43:05Z

@travisghansen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-kubernetes-csi-external-resizer-1-27-on-kubernetes-1-27	`b3e94c3`	link	true	`/test pull-kubernetes-csi-external-resizer-1-27-on-kubernetes-1-27`
pull-kubernetes-csi-external-resizer-1-30-on-kubernetes-1-30	`b3e94c3`	link	true	`/test pull-kubernetes-csi-external-resizer-1-30-on-kubernetes-1-30`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

gnufied · 2024-11-14T20:32:28Z

Why couldn't we accomplish something similar using NodeExpand RPC calls on nodes? If there are any limitations, it would be nice to know. It is perfectly fine to have volume drivers that implement node-only NodeExpand RPC calls.

implement distributed resizing

91e6ac7

Signed-off-by: Travis Glenn Hansen <[email protected]>

k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 26, 2022

k8s-ci-robot requested review from gnufied and humblec February 26, 2022 17:18

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 26, 2022

k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 26, 2022

travisghansen mentioned this pull request Feb 26, 2022

zfs-local (non ephemereal) a possibility (possibly a feature request) democratic-csi/democratic-csi#148

Open

readme typos

b3e94c3

Signed-off-by: Travis Glenn Hansen <[email protected]>

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 28, 2022

pohly reviewed Mar 1, 2022

travisghansen mentioned this pull request Mar 28, 2022

Consolidate and improve Nomad-related documentation democratic-csi/democratic-csi#168

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 30, 2022

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 30, 2022

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 12, 2022

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 12, 2022

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 18, 2023

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 18, 2023

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 18, 2023

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 18, 2023

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 17, 2023

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 18, 2023

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 2, 2023

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 31, 2024

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 3, 2024

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 3, 2024

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 7, 2024

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 5, 2024

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 5, 2024
