From f3ec63c901b69996b2a615b1f26ad148bd6a3f94 Mon Sep 17 00:00:00 2001
From: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Date: Tue, 26 Nov 2024 16:01:56 +0200
Subject: [PATCH] Docs: new MySQL server throttler metrics (#1878)

* Docs: new MySQL server throttler metrics

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

* also documenting history_list_length metric

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

---------

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
---
 .../reference/features/tablet-throttler.md | 28 +++++++++++++------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/content/en/docs/22.0/reference/features/tablet-throttler.md b/content/en/docs/22.0/reference/features/tablet-throttler.md
index f852ea0de..02de5256e 100644
--- a/content/en/docs/22.0/reference/features/tablet-throttler.md
+++ b/content/en/docs/22.0/reference/features/tablet-throttler.md
@@ -40,10 +40,13 @@ However, we limit the collaboration to specific tablet types, based on `--thrott
 
 The objective of the throttler is to push back work based on database load. Previously, this was done based on a single metric, which could be either the replication lag, or the result of a custom query. Now, the throttler collects multiple metrics. The current supported metrics are:
 
-- Replication lag (`lag`), measured in seconds.
-- Load average (`loadavg`), per core, on the tablet server/container.
-- MySQL `Threads_running` value (`threads_running`).
-- Custom query (`custom`) as defined by the user.
+- `lag`: replication lag, measured in seconds.
+- `loadavg`: load average, per core, on the tablet server/container.
+- `threads_running`: MySQL's `Threads_running` value.
+- `custom`: a custom query as defined by the user.
+- `mysqld-loadavg`: load average, per core, on the MySQL server/container.
+- `mysqld-datadir-used-ratio`: disk space usage on MySQL's `datadir` mount, range `0.0` (empty) to `1.0` (full).
+- `history_list_length`: InnoDB's history list length value.
 
 This list is expected to expand in the future.
 
@@ -62,6 +65,10 @@ Each metric has a "factory default" threshold, e.g.:
 - `5` (5 seconds) for `lag`.
 - `1.0` (per core) for `loadavg`.
 - `100` for `threads_running`.
+- `1.0` (per core) for `mysqld-loadavg`.
+- `0.98` (98%) for `mysqld-datadir-used-ratio`.
+- `1000000000` for `history_list_length`.
+
 
 Thresholds are positive values. A threshold of `0` is considered _undefined_.
 
@@ -797,20 +804,25 @@ These are the metrics by which the throttler compares with the threshold and dec
 Gauge, the current metric value on the tablet. This is the result of a self-check, done continuously when the throttler is enabled. Available per metric:
 
 - `ThrottlerAggregatedSelfCustom`
+- `ThrottlerAggregatedSelfHistoryListLength`
 - `ThrottlerAggregatedSelfLag`
 - `ThrottlerAggregatedSelfLoadavg`
-- `ThrottlerAggregatedSelfThreads_running`
-
+- `ThrottlerAggregatedSelfThreadsRunning`
+- `ThrottlerAggregatedSelfMysqldLoadavg`
+- `ThrottlerAggregatedSelfMysqldDatadirUsedRatio`
 
 ##### `ThrottlerAggregatedShard`
 
 Gauge, on the `PRIMARY` tablet only, this is the aggregated collected metric value from all serving shard tables, including the `PRIMARY`. The value is the highest (aka _worst_) of all collected tablets. Available per metric:
 
 - `ThrottlerAggregatedShardCustom`
+- `ThrottlerAggregatedShardHistoryListLength`
 - `ThrottlerAggregatedShardLag`
 - `ThrottlerAggregatedShardLoadavg`
-- `ThrottlerAggregatedShardThreads_running`
-
+- `ThrottlerAggregatedShardThreadsRunning`
+- `ThrottlerAggregatedShardMysqldLoadavg`
+- `ThrottlerAggregatedShardMysqldDatadirUsedRatio`
+
 #### Check metrics
 
 The throttler is checked by apps (`vreplication`, `online-ddl`, etc), and responds with status codes, "OK" for "good to proceed" or any other code for "hold off".
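
The following Go sketch shows how the metric names, factory-default thresholds, and gauge names in this patch relate to each other. It is a hypothetical illustration, not Vitess code. The names `factoryDefaults`, `gaugeName`, `shardAggregate`, and `exceedsThreshold` are invented for this sketch. The gauge-naming rule (split the metric name on `-` and `_`, then capitalize each word) is inferred from renames such as `ThrottlerAggregatedSelfThreads_running` to `ThrottlerAggregatedSelfThreadsRunning`. Treating "exceeded" as a strict `value > threshold` is also an assumption, because the patch does not state the exact comparison.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// factoryDefaults mirrors the "factory default" thresholds listed in the
// documentation. Hypothetical table for illustration, not Vitess source.
// Metrics without a listed default (e.g. custom) are simply absent here.
var factoryDefaults = map[string]float64{
	"lag":                       5,    // seconds
	"loadavg":                   1.0,  // per core, tablet server/container
	"threads_running":           100,  // MySQL Threads_running
	"mysqld-loadavg":            1.0,  // per core, MySQL server/container
	"mysqld-datadir-used-ratio": 0.98, // 0.0 (empty) to 1.0 (full)
	"history_list_length":       1000000000,
}

// gaugeName builds a gauge name such as ThrottlerAggregatedSelfThreadsRunning.
// Assumed rule, inferred from the gauge lists: split the metric name on '-'
// and '_' and upper-case the first letter of each word.
func gaugeName(prefix, metric string) string {
	words := strings.FieldsFunc(metric, func(r rune) bool {
		return r == '-' || r == '_'
	})
	var b strings.Builder
	b.WriteString(prefix)
	for _, w := range words {
		runes := []rune(w) // FieldsFunc never yields empty words
		runes[0] = unicode.ToUpper(runes[0])
		b.WriteString(string(runes))
	}
	return b.String()
}

// shardAggregate returns the highest (worst) of the collected tablet values,
// as described for ThrottlerAggregatedShard. ok is false for empty input.
func shardAggregate(values []float64) (worst float64, ok bool) {
	if len(values) == 0 {
		return 0, false
	}
	worst = values[0]
	for _, v := range values[1:] {
		if v > worst {
			worst = v
		}
	}
	return worst, true
}

// exceedsThreshold reports whether value is above the metric's default
// threshold. Assumptions: the comparison is strictly "greater than", and an
// undefined threshold (missing, or 0) never triggers a "hold off".
func exceedsThreshold(metric string, value float64) bool {
	threshold, found := factoryDefaults[metric]
	if !found || threshold <= 0 {
		return false // a threshold of 0 is undefined
	}
	return value > threshold
}

func main() {
	for _, m := range []string{"threads_running", "mysqld-datadir-used-ratio", "history_list_length"} {
		fmt.Println(gaugeName("ThrottlerAggregatedSelf", m), gaugeName("ThrottlerAggregatedShard", m))
	}
	// Hypothetical mysqld-datadir-used-ratio readings from three tablets.
	worst, _ := shardAggregate([]float64{0.41, 0.99, 0.63})
	fmt.Println(worst, exceedsThreshold("mysqld-datadir-used-ratio", worst))
}
```

Taking the maximum in `shardAggregate` follows the documented "highest (aka _worst_)" rule for `ThrottlerAggregatedShard`. It works because, for every metric listed here, a larger value means more load.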