Disable http client connection reuse to prevent memory leak #842

Merged
merged 1 commit into from
Jun 13, 2024

Conversation

swoehrl-mw
Collaborator

Description

The operator pod suffers from a memory leak. After some analysis, I think I have narrowed it down to the HTTP client: connections are kept alive for reuse but are never reused, because a new client is created in every reconcile run.
This PR disables connection keep-alive/reuse and (at least in my experiments) prevents the memory leak.
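To illustrate the mechanism described above, here is a minimal, self-contained sketch using only the Go standard library. It is not the operator's code: `newTransport` and `countConnections` are hypothetical helpers, and the local test server stands in for an OpenSearch cluster. It counts how many TCP connections the server accepts with keep-alives enabled versus disabled. With `DisableKeepAlives` set, each request closes its connection when it finishes, so nothing is left idle in a transport that will never be used again.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// newTransport is a hypothetical helper. With reuse=false, keep-alives are
// disabled, so every request opens its own connection and closes it afterwards.
func newTransport(reuse bool) *http.Transport {
	return &http.Transport{DisableKeepAlives: !reuse}
}

// countConnections sends n sequential requests through one client built on the
// given transport and reports how many connections the test server accepted.
func countConnections(tr *http.Transport, n int) (int64, error) {
	var opened int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	// ConnState is called with StateNew once for every accepted connection.
	srv.Config.ConnState = func(_ net.Conn, state http.ConnState) {
		if state == http.StateNew {
			atomic.AddInt64(&opened, 1)
		}
	}
	srv.Start()
	defer srv.Close()

	client := &http.Client{Transport: tr}
	for i := 0; i < n; i++ {
		resp, err := client.Get(srv.URL)
		if err != nil {
			return 0, err
		}
		// Drain and close the body so the connection can be reused if allowed.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	tr.CloseIdleConnections()
	return atomic.LoadInt64(&opened), nil
}

func main() {
	for _, reuse := range []bool{true, false} {
		n, err := countConnections(newTransport(reuse), 3)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("reuse=%v connections=%d\n", reuse, n)
	}
}
```

Disabling keep-alives costs a new connection per request, which seems an acceptable trade-off when a fresh client is created for every reconcile run anyway.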

Issues Resolved

Fixes #700

Check List

  • [x] Commits are signed per the DCO using --signoff
  • [-] Unit test added for the new/changed functionality and all unit tests are successful
  • [-] Customer-visible features documented
  • [x] No linter warnings (make lint)

If CRDs are changed:

  • [-] CRD YAMLs updated (make manifests) and also copied into the helm chart
  • [-] Changes to CRDs documented

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
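For more information on following Developer Certificate of Origin and signing off your commits, please check here: https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin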

@swoehrl-mw
Collaborator Author

@prudhvigodithi @salyh Can I get a review+approval here please?

@prudhvigodithi
Collaborator

Thanks @swoehrl-mw, I will take a look at this today.

@@ -33,14 +35,17 @@ func NewScalerReconciler(
 	recorder record.EventRecorder,
 	reconcilerContext *ReconcilerContext,
 	instance *opsterv1.OpenSearchCluster,
-	opts ...reconciler.ResourceReconcilerOption,
+	opts ...ReconcilerOption,
 ) *ScalerReconciler {
Collaborator

May I know what this change does by adding ReconcilerOption?

Collaborator Author

That change is needed for two reasons. First, it makes this reconciler behave more like the others. Second, it provides the osClientTransport field from the ReconcilerOptions, which is a required argument of the CreateClientForCluster function. It also helps with testing, since the RoundTripper can be mocked.
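As a rough illustration of why passing the transport through an option helps with testing, the sketch below injects a fake `http.RoundTripper` so that no network access is needed. Everything here is a hypothetical stand-in rather than the operator's API: `options`, `option`, `withTransport`, `scaler`, and `clusterStatus` are invented for the example, and the host name is a placeholder.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// roundTripFunc lets a plain function satisfy http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// options and option are hypothetical stand-ins for the operator's
// ReconcilerOptions and ReconcilerOption types.
type options struct {
	transport http.RoundTripper
}

type option func(*options)

// withTransport is a hypothetical option that injects a custom transport.
func withTransport(rt http.RoundTripper) option {
	return func(o *options) { o.transport = rt }
}

// scaler is a hypothetical reconciler that remembers the injected transport.
type scaler struct {
	transport http.RoundTripper
}

func newScaler(opts ...option) *scaler {
	var o options
	for _, apply := range opts {
		apply(&o)
	}
	return &scaler{transport: o.transport}
}

// clusterStatus issues a GET through the injected transport and returns the body.
func (s *scaler) clusterStatus(baseURL string) (string, error) {
	// A nil Transport makes http.Client fall back to http.DefaultTransport.
	client := &http.Client{Transport: s.transport}
	resp, err := client.Get(baseURL + "/_cluster/health")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// fakeTransport answers every request with a canned body and status 200.
func fakeTransport(body string) http.RoundTripper {
	return roundTripFunc(func(r *http.Request) (*http.Response, error) {
		return &http.Response{
			StatusCode: http.StatusOK,
			Header:     make(http.Header),
			Body:       io.NopCloser(strings.NewReader(body)),
			Request:    r,
		}, nil
	})
}

func main() {
	s := newScaler(withTransport(fakeTransport(`{"status":"green"}`)))
	status, err := s.clusterStatus("http://opensearch.invalid:9200")
	fmt.Println(status, err)
}
```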

 		return true, err
 	}
-	clusterClient, err := services.NewOsClusterClient(builders.URLForCluster(r.instance), username, password)
+	clusterClient, err := util.CreateClientForCluster(r.client, r.ctx, r.instance, r.osClientTransport)
Collaborator

Also what is the advantage of switching this to CreateClientForCluster method?

Collaborator

I see the following code also changes to CreateClientForCluster. What is this change doing?

Collaborator Author

This basically unifies how an OpenSearch client is created, so there is no duplicated code and only one way to create a client.

Collaborator

So with this, only one OpenSearch client is created (until the operator is restarted) and it is used for all operator operations, such as create/update on a cluster?

Collaborator Author

No, a new client is still created for each reconciler and each reconcile loop, but there is now only one code path through which a client can be created. This gives us better control when we want or need to change how clients are handled.
Strictly speaking, this change is not needed for the bugfix, but since I was looking at that part of the code for the leak anyway, I took the chance to unify it.
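The "one code path" idea can be sketched as a single factory that every reconciler calls. This is only an assumption-laden illustration, not the actual `CreateClientForCluster` implementation: `newClusterHTTPClient` and `keepAlivesDisabled` are hypothetical names, and the 10-second timeout is an arbitrary value chosen for the example. The point is that a policy such as disabling keep-alives lives in exactly one place, while callers can still pass their own transport (for example a mock in tests).

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newClusterHTTPClient is a hypothetical single entry point for building HTTP
// clients. A nil transport selects the default policy: no connection reuse.
func newClusterHTTPClient(transport http.RoundTripper) *http.Client {
	if transport == nil {
		transport = &http.Transport{DisableKeepAlives: true}
	}
	// The timeout is an arbitrary example value, not taken from the operator.
	return &http.Client{Transport: transport, Timeout: 10 * time.Second}
}

// keepAlivesDisabled reports whether the client uses a standard transport
// with connection reuse turned off.
func keepAlivesDisabled(c *http.Client) bool {
	tr, ok := c.Transport.(*http.Transport)
	return ok && tr.DisableKeepAlives
}

func main() {
	// Two reconcilers each get their own client, built by the same factory.
	scalerClient := newClusterHTTPClient(nil)
	upgradeClient := newClusterHTTPClient(nil)
	fmt.Println(keepAlivesDisabled(scalerClient), keepAlivesDisabled(upgradeClient))
}
```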

@prudhvigodithi
Collaborator

Thanks @swoehrl-mw, I have added my comments. Can you please check? This is an important change that should be shipped.
Adding @salyh
@getsaurabh02

@swoehrl-mw swoehrl-mw merged commit 56c9c8f into opensearch-project:main Jun 13, 2024
9 checks passed
@swoehrl-mw swoehrl-mw deleted the fix/memory-leak branch June 13, 2024 06:54
swoehrl-mw added a commit that referenced this pull request Jun 18, 2024
Signed-off-by: Sebastian Woehrl <[email protected]>
(cherry picked from commit 56c9c8f)
swoehrl-mw added a commit to MaibornWolff/opensearch-operator that referenced this pull request Jul 2, 2024
…ch-project#842)

(cherry picked from commit 56c9c8f)
Successfully merging this pull request may close these issues.

[BUG] Memory Leak in Operator version 2.4.0
2 participants