Optimization for slow AsyncGauge execution #31

gaojieliu · 2024-03-11T23:33:44Z

This PR introduces a dynamic way to track slow `AsyncGauge` metric execution and tries to avoid blocking the caller thread as much as possible. At a high level, it introduces an `AsyncGaugeExecutor`, which implements the following strategy:

1. There are two executors: one for regular metrics and one for slow metrics.
2. All metric evaluations are triggered by the caller.
3. If the actual metric execution time exceeds the configured slow metric threshold, the metric is moved to the slow metric tracking map, which results in the following behavior:
   a. The next metric measurement call returns the cached value immediately.
   b. The submitted measurable is executed asynchronously.
   c. If the actual measurement latency drops below the slow metric threshold, the metric is moved out of the slow metric tracking map.
4. If the actual metric execution time is below the configured slow metric threshold, the following behavior is observed:
   a. After submitting the measurable to the regular executor, the caller waits up to the configured `AsyncGaugeExecutor#initialMetricsMeasurementTimeoutInMs` to collect the latest result.
   b. If the latest value cannot be collected in step a, the next call examines the previous execution to decide whether the metric should be put into the slow metric tracking map.
5. An async thread cleans up inactive metrics from the slow metric tracking map, so that deleted metrics do not accumulate there as garbage.

`AsyncGaugeExecutor` has several config params that the user can tune according to the actual load pattern. The caller can construct a global `AsyncGaugeExecutor` and pass it to `MetricsRepository` via `MetricConfig`.
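To make the strategy above easier to follow, here is a minimal, self-contained sketch of the two-executor idea. It is not the code from this PR: the class `SlowGaugeSketch`, its methods `measure` and `isSlow`, and the constructor parameters are hypothetical names chosen for illustration, and it uses only `java.util.concurrent`. It also simplifies two points: a run that is still in flight on the next call is treated as slow right away, and the cleanup thread from step 5 is left out.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.DoubleSupplier;

// Hypothetical illustration only; not the AsyncGaugeExecutor from this PR.
final class SlowGaugeSketch {
  // Per-metric bookkeeping: last known value and the most recently submitted run.
  private static final class Entry {
    volatile double cachedValue = Double.NaN;
    volatile Future<?> lastRun;
  }

  private final ExecutorService regularExecutor = newDaemonPool();
  private final ExecutorService slowExecutor = newDaemonPool();
  private final ConcurrentMap<String, Entry> entries = new ConcurrentHashMap<>();
  // Stand-in for the "slow metric tracking map" described above.
  private final Set<String> slowMetrics = ConcurrentHashMap.newKeySet();
  private final long slowThresholdMs;
  private final long initialTimeoutMs;

  SlowGaugeSketch(long slowThresholdMs, long initialTimeoutMs) {
    this.slowThresholdMs = slowThresholdMs;
    this.initialTimeoutMs = initialTimeoutMs;
  }

  private static ExecutorService newDaemonPool() {
    return Executors.newCachedThreadPool(r -> {
      Thread t = new Thread(r);
      t.setDaemon(true); // do not keep the JVM alive for metric threads
      return t;
    });
  }

  // Called by the metric reader; tries not to block longer than initialTimeoutMs.
  double measure(String name, DoubleSupplier gauge) {
    Entry entry = entries.computeIfAbsent(name, k -> new Entry());
    Future<?> previous = entry.lastRun;
    if (previous != null && !previous.isDone()) {
      // Simplification: a run that is still in flight is treated as slow.
      slowMetrics.add(name);
      return entry.cachedValue;
    }
    if (slowMetrics.contains(name)) {
      // Slow path: refresh asynchronously and return the cached value immediately.
      entry.lastRun = slowExecutor.submit(() -> run(name, entry, gauge));
      return entry.cachedValue;
    }
    // Regular path: submit, then wait a bounded time for the fresh value.
    Future<?> current = regularExecutor.submit(() -> run(name, entry, gauge));
    entry.lastRun = current;
    try {
      current.get(initialTimeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException | ExecutionException e) {
      // Fall back to whatever value is cached.
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return entry.cachedValue;
  }

  // Evaluates the gauge, caches the result, and reclassifies the metric by latency.
  private void run(String name, Entry entry, DoubleSupplier gauge) {
    long start = System.nanoTime();
    double value = gauge.getAsDouble();
    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    entry.cachedValue = value;
    if (elapsedMs > slowThresholdMs) {
      slowMetrics.add(name);
    } else {
      slowMetrics.remove(name);
    }
  }

  boolean isSlow(String name) {
    return slowMetrics.contains(name);
  }
}
```

With a threshold of 50 ms and an initial timeout of 500 ms, a gauge that sleeps for 200 ms blocks the first `measure` call once, is marked slow, and every later call returns the cached value without waiting.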

huangminchn

Thanks Gaojie! Looks good overall; left some comments.

src/main/java/io/tehuti/metrics/stats/AsyncGauge.java

huangminchn

Thanks a lot Gaojie!

gaojieliu requested review from huangminchn and FelixGV March 11, 2024 23:34

huangminchn reviewed Mar 13, 2024

src/main/java/io/tehuti/metrics/stats/AsyncGauge.java (resolved)

src/main/java/io/tehuti/metrics/stats/AsyncGauge.java (resolved)

src/main/java/io/tehuti/metrics/stats/AsyncGauge.java (outdated, resolved)

Addressed comments

e59e41f

sushantmane reviewed Mar 13, 2024

src/main/java/io/tehuti/metrics/stats/AsyncGauge.java (outdated, resolved)

sushantmane reviewed Mar 13, 2024

src/main/java/io/tehuti/metrics/stats/AsyncGauge.java (resolved)

Addressed new comments

f63fbac

huangminchn reviewed Mar 13, 2024

src/main/java/io/tehuti/metrics/stats/AsyncGauge.java (resolved)

huangminchn approved these changes Mar 13, 2024

gaojieliu merged commit a50499f into tehuti-io:master Mar 14, 2024
1 check passed

gaojieliu mentioned this pull request Apr 4, 2024

[controller] Fix multiple AdminExecutionTasks working on the same store at the same time. linkedin/venice#918

Merged

2 tasks
