-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable http client connection reuse to prevent memory leak #842
Disable http client connection reuse to prevent memory leak #842
Conversation
Signed-off-by: Sebastian Woehrl <[email protected]>
@prudhvigodithi @salyh Can I get a review+approval here please? |
Thanks @swoehrl-mw I will take a look at this today. |
@@ -33,14 +35,17 @@ func NewScalerReconciler( | |||
recorder record.EventRecorder, | |||
reconcilerContext *ReconcilerContext, | |||
instance *opsterv1.OpenSearchCluster, | |||
opts ...reconciler.ResourceReconcilerOption, | |||
opts ...ReconcilerOption, | |||
) *ScalerReconciler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I know what is change doing by adding ReconcilerOption
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That change is needed to, for one, make this reconciler behave more like the others, and also to get the osClientTransport field from the ReconcilerOptions which is a needed argument for the CreateClientForCluster
function. In the end this also helps with testing as the RoundTripper can be mocked.
return true, err | ||
} | ||
clusterClient, err := services.NewOsClusterClient(builders.URLForCluster(r.instance), username, password) | ||
clusterClient, err := util.CreateClientForCluster(r.client, r.ctx, r.instance, r.osClientTransport) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also what is the advantage of switching this to CreateClientForCluster
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the following code also changes to CreateClientForCluster
, what is this change doing ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This basically unifies how an opensearch client can be created so there is no duplicated code and only one way to create a client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So with this only one OpenSearch client is created (until the operator is restarted) and used for all the operator options invoked like create/update on a cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new client is created for each reconciler and reconcile loop, but there is only one codepath through which a client can be created. This gives us better control when we want/need to change handling of clients.
Strictly speaking this change is not needed for the bugfix, but as I was checking that code part anyway for the leak I used the chance to unify it.
Thanks @swoehrl-mw I have added my comments can you please check, this is an important change that should be shipped. |
The operator pod is suffering from memory leaks. After some analysis I think I have narrowed it down to connections for the http client being kept for reuse but never being used due to a new client being created in every reconcile run. This PR disables the connection keepalive/reuse and (at least in my experiments) prevents the memory leak. Fixes #700 - [x] Commits are signed per the DCO using --signoff - [-] Unittest added for the new/changed functionality and all unit tests are successful - [-] Customer-visible features documented - [x] No linter warnings (`make lint`) If CRDs are changed: - [-] CRD YAMLs updated (`make manifests`) and also copied into the helm chart - [-] Changes to CRDs documented By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check [here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin). Signed-off-by: Sebastian Woehrl <[email protected]> (cherry picked from commit 56c9c8f)
…ch-project#842) The operator pod is suffering from memory leaks. After some analysis I think I have narrowed it down to connections for the http client being kept for reuse but never being used due to a new client being created in every reconcile run. This PR disables the connection keepalive/reuse and (at least in my experiments) prevents the memory leak. Fixes opensearch-project#700 - [x] Commits are signed per the DCO using --signoff - [-] Unittest added for the new/changed functionality and all unit tests are successful - [-] Customer-visible features documented - [x] No linter warnings (`make lint`) If CRDs are changed: - [-] CRD YAMLs updated (`make manifests`) and also copied into the helm chart - [-] Changes to CRDs documented By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check [here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin). Signed-off-by: Sebastian Woehrl <[email protected]> (cherry picked from commit 56c9c8f)
Description
The operator pod is suffering from memory leaks. After some analysis I think I have narrowed it down to connections for the http client being kept for reuse but never being used due to a new client being created in every reconcile run.
This PR disables the connection keepalive/reuse and (at least in my experiments) prevents the memory leak.
Issues Resolved
Fixes #700
Check List
make lint
)If CRDs are changed:
make manifests
) and also copied into the helm chartBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.