-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support for OkHttp thread metrics #1022
base: main
Are you sure you want to change the base?
Conversation
"smithy.client.http.threads.count", | ||
{ measurement -> recordThreadState(measurement, callback) }, | ||
"{thread}", | ||
"Current state of HTTP engine threads (active, running)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are "running" threads different from "active" ones? I see only "total" and "active" are recorded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, "active" and "running" are not the correct terms here. I can change those to "total" and "active".
|
To be clear, you mean verify that a metric value exists but not what it is? Because this is multi-threaded behavior and we don't control the implementation of the underlying thread pool, it'll be difficult to achieve a stable expectation for number of total/active threads in a test case. |
Yes, just testing that a metric value exists if we don't have predictable behavior to test on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can we best test this in an automated fashion?
I'm not sure we can in every scenario. I had to do a lot of testing locally with an observability stack setup to actually view the metrics working. Let me know if you discover something better.
This change only adds support for OkHttp. CRT doesn't seem to have support for interrogating metrics about HTTP client thread counts. Is that good enough for now or should we push for enhanced support in CRT?
I'd say do what we can for now (although see my comment about getting this in SRA first before finalizing it).
The additions to HttpClientMetrics eschew the existing pattern of pushing metrics to intermediate refs and the polling them from the async counter in favor of pulling the source data directly via a passed callback. This seems cleaner to me since it removes unnecessary state but I'm curious whether others think it's a good idea (and whether we'd potentially want to extend it to other async counters as appropriate).
The reason it's structured this way is so that engines can update the values however they see fit (whether that be inc/dec or absolute value changes). See my other comment here but I don't think I see a reason to special case this new metric and would likely prefer to keep them consistent.
@@ -39,6 +44,7 @@ public object HttpClientMetricAttributes { | |||
public class HttpClientMetrics( | |||
scope: String, | |||
public val provider: TelemetryProvider, | |||
threadStateCallback: (() -> ThreadState?)? = null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems different for no apparent reason. It's not wrong I guess but feels odd that we would special case it since each engine can adapt however they want to the current way metrics are exposed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I concede it's very different from what we have elsewhere and feels very specialized right now.
I think one problem is that we don't directly control the thread pool. We don't decide when new threads are created or when they swap from idle to active. All we can do is inspect the state at any given moment to know the current value. That sounds like what an AsyncUpDownCounter
is intended to do via a callback.
We could start a background thread in the OkHttp engine to periodically inspect the thread pool and stuff the idle/active counts into HttpClientMetrics
, which would then be read when the callback is run. But this means the value will be out of sync by whatever interval the background thread uses. If, for instance, the background thread runs every 10 seconds but the meter polls every 5 seconds, then up to two datapoints will be reported inaccurately.
Is your greater concern that this change exposes the concept of callbacks outside of HttpClientMetrics
and inverts the collection model from a push to a pull? Is it the inconsistency of some metrics being exposed outside of HttpClientMetrics
as callbacks while others are mutable properties inside? Or is it that this may not be flexible enough to adapt to other engines where thread updates are more deterministic?
@@ -25,8 +25,13 @@ public object HttpClientMetricAttributes { | |||
public val AcquiredConnection: Attributes = attributesOf { "state" to "acquired" } | |||
public val QueuedRequest: Attributes = attributesOf { "state" to "queued" } | |||
public val InFlightRequest: Attributes = attributesOf { "state" to "in-flight" } | |||
public val TotalThread: Attributes = attributesOf { "state" to "total" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correctness: The state attribute for this should probably follow the usage convention where the sub states total to some limit.
e.g. thread state usage metric smithy.client.http.threads.usage
with states of idle | consumed
and a separate metric for the limit smithy.client.http.threads.limit
.
Also I suppose we should probably get this defined in the SRA before codifying it in an implementation.
We may not know the total thread limit in every case or thread pools may be dynamic and spin down threads which would break the "should" part of the convention but I think that's fine. In the ideal case the convention applies and even when it doesn't it's still how you'd want it graphed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll change this to idle/active threads.
Quality Gate passedKudos, no new issues were introduced! 0 New issues |
Issue #
#894
Description of changes
This change adds metrics for the OkHttp engine's thread pool. It should be considered WIP until the following open questions are addressed:
HttpClientMetrics
eschew the existing pattern of pushing metrics to intermediate refs and the polling them from the async counter in favor of pulling the source data directly via a passed callback. This seems cleaner to me since it removes unnecessary state but I'm curious whether others think it's a good idea (and whether we'd potentially want to extend it to other async counters as appropriate).By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.