Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Fix Port Leaks #1648

Merged
merged 1 commit into from
Aug 30, 2023
Merged

Conversation

spjmurray
Copy link
Contributor

What this PR does / why we need it:

When an instance fails to come up, there's a bug where the port doesn't get cleaned up, and eventually you will exhaust your quota. The code does attempt to garbage collect, but I suspect at some point in the past ports were suffixed with an index, but the garbage collection code wasn't made aware of this, so lists nothing. Replace the APi "filtering" that doesn't work with a manual filter that does prefix matching.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1404

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • if necessary:
    • includes documentation
    • adds unit tests

/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 21, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @spjmurray. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Aug 21, 2023
@netlify
Copy link

netlify bot commented Aug 21, 2023

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Name Link
🔨 Latest commit b77e660
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/64e86dd52fa12d0008ac1e58
😎 Deploy Preview https://deploy-preview-1648--kubernetes-sigs-cluster-api-openstack.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 21, 2023
@@ -380,14 +380,6 @@ func (s *Service) createInstanceImpl(eventObject runtime.Object, openStackCluste
return createdInstance, nil
}

// getPortName appends a suffix to an instance name in order to try and get a unique name per port.
Copy link
Contributor Author

@spjmurray spjmurray Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did I move this you will inevitably ask? Well, problem is I want to associate the test with the function that causes the issue in the first place. Problem is I cannot just export this and include it because of circular dependencies. That led me to going, well this should be a networking_test package anyway, which then led to to not being able to create the networking service due to the client field not being exported, which then became ugly because of a hacky NewWithClientForTestOnly() function in the production code. Moving it was by far the simplest solution.

instanceName := "foo"
portID1 := "dc6e0ae3-dad6-4240-a9cb-e541916f20d3"
portID2 := "a38ab1cb-c2cc-4c1b-9d1d-696ec73356d2"
portName1 := GetPortName(instanceName, nil, 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE: above comment, I wanted to be absolutely sure that things generated with GetPortName will get offed. By adding a tight coupling between the test and the call, you'll be less likely to change one in a broken way without being drawn to the test cases, so a win in my opinion,

pkg/cloud/services/networking/port.go Outdated Show resolved Hide resolved
@jichenjc
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 22, 2023
@spjmurray
Copy link
Contributor Author

@dulek I've probably broken everything, but this may be a "better", definitely far more complex solution, but I can always revert back to the KISS version.

So I've copied instance creation by generating an InstanceSpec, propagated that down so we can generate []PortOpts then passed that down to the garbage collector so we only check for specific ports that should exist.

Copy link
Contributor

@dulek dulek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I like this version more, thank you! One comment inline.

Comment on lines 306 to 314
for _, p := range portList {
if err := s.DeletePort(eventObject, p.ID); err != nil {
return err
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably error out if there are multiple ports of that name? Unless we can get to such a case in a single cluster scenario too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, update incoming. And now this has been philosophically reviewed, I guess I had better test it! Luckily I have a 0.8.0 branch... back in a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good news! It does actually work.

When an instance fails to come up, there's a bug where the port doesn't
get cleaned up, and eventually you will exhaust your quota. The code
does attempt to garbage collect, but I suspect at some point in the past
ports were suffixed with an index, but the garbage collection code
wasn't made aware of this, so lists nothing. Replace the APi "filtering"
that doesn't work with a better algorithm that uses the instance
creation code to generate the expected ports, then generate the names
and delete if the port exists. Also protected with some nice unit tests.
@dulek
Copy link
Contributor

dulek commented Aug 25, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 25, 2023
Comment on lines +298 to +299
// TODO: whould be nice if gophercloud could be persuaded to accept multiple
// names as is allowed by the API in order to reduce API traffic.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @pierreprinetti

If you'd like to resurrect that effort? The use cases are mounting.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mdbooth, spjmurray

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 30, 2023
@spjmurray
Copy link
Contributor Author

@mdbooth thanks! Should I unhold this or is that your job? Speaking of which... do we have a rough estimate for 0.8.0 GA?

@mdbooth
Copy link
Contributor

mdbooth commented Aug 30, 2023

@spjmurray You get the pleasure of removing your hold :) Unless I get bored waiting for you, that is, but that takes a couple of weeks.

@spjmurray
Copy link
Contributor Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 30, 2023
@spjmurray
Copy link
Contributor Author

Not mine it's part of the PR template.

@spjmurray
Copy link
Contributor Author

/pony

@k8s-ci-robot
Copy link
Contributor

@spjmurray: pony image

In response to this:

/pony

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot merged commit 1384d61 into kubernetes-sigs:main Aug 30, 2023
4 checks passed
@spjmurray spjmurray deleted the fix_port_leak branch August 30, 2023 17:25
@pierreprinetti
Copy link
Contributor

Here's a proposal for "or" junctions in Gophercloud List queries: gophercloud/utils#190
Would this match your use-case?

I would also really appreciate candid feedback on how ergonomic my proposal would be. There are many different ways to add this capability to Gophercloud and this is just one of them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CAPO leaks ports if machines can not be created (HTTP 500 NoValidHosts)
6 participants