From ba80ee6f3cc84c6e5a55b67d206e677e778d2546 Mon Sep 17 00:00:00 2001
From: ealui-statsig
Date: Mon, 25 Nov 2024 14:26:45 -0800
Subject: [PATCH] update monitoring doc

---
 docs/monitoring.md | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/docs/monitoring.md b/docs/monitoring.md
index 09afb92..ffd2643 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -137,7 +137,7 @@ We emit a number of events that allow you to monitor and ensure that the forward
 
 - **Description**: If using redis caching, indicates a successful write of the latest configuration to redis.
 - **Event Unit Type**: Count
-- **Useful Dimensions**: path, sdk_key
+- **Useful Dimensions**: path, sdk_key, lcut
 - **How to Interpret**:
   - Sum total data points to get volume
 - **Is it working?**: This should be one-to-one with HttpDataProviderGotData.
@@ -146,7 +146,7 @@ We emit a number of events that allow you to monitor and ensure that the forward
 
 - **Description**: If using redis caching, indicates a failed write of the latest configuration to redis.
 - **Event Unit Type**: Count
-- **Useful Dimensions**: path, sdk_key
+- **Useful Dimensions**: path, sdk_key, lcut
 - **How to Interpret**:
   - Sum total data points to get volume
 - **Is it working?**: If this is happening consistantly, please validate that your redis instance/cluster is operating normally.
@@ -155,7 +155,7 @@ We emit a number of events that allow you to monitor and ensure that the forward
 
 - **Description**: If using redis caching, if a request to our CDN/Origin fails, it will check redis to see if there is a newer payload.
 - **Event Unit Type**: Count
-- **Useful Dimensions**: path, sdk_key
+- **Useful Dimensions**: path, sdk_key, lcut
 - **How to Interpret**:
   - Sum total data points to get volume
 - **Is it working?**: If this is happening consistantly, please validate that your redis instance/cluster is operating normally.
@@ -182,7 +182,7 @@ We emit a number of events that allow you to monitor and ensure that the forward
 
 - **Description**: If using redis caching, we ensure only one proxy instance is writing at any given time. This allows to you check which proxy instances are not doing RedisCacheWriteSucceed.
 - **Event Unit Type**: Count
-- **Useful Dimensions**: path, sdk_key
+- **Useful Dimensions**: path, sdk_key, lcut
 - **How to Interpret**:
   - Sum total data points to get volume
 - **Is it working?**: This should be the inverse of RedisCacheWriteSucceed, such that the sum of the two metrics is the total number of pods when grouping by path + sdk_key.
@@ -244,6 +244,15 @@ We emit a number of events that allow you to monitor and ensure that the forward
   - Sum total data points to get volume
 - **Is it working?**: This should roughly match with pod shutdown, however, transient issues can occur on the client application level/networking stack that can cause slightly higher volume.
 
+### GrpcEstimatedActiveStreams
+
+- **Description**: Estimated number of gRPC streams that are active.
+- **Event Unit Type**: Gauge
+- **Useful Dimensions**: sdk_key
+- **How to Interpret**:
+  - Sum of latest data point across pods is the number of active connections
+- **Is it working?**: If this is hitting the number of maximum grpc connections, or not incrementing, you will either be dropping connections at SFP or not able to connect at all.
+
 ### StreamingChannelGotNewData
 
 - **Description**: Indicates that new data was received on a streaming channel which is about to be distributed to all GRPC connections.