DOC-8878 Enhance Essential Metrics with Alert guidance #18537

florence-crl · 2024-05-10T19:32:13Z

(1) Added essential-alerts.md include file, a compilation of alerts from (a) a-entin's repo, (b) the alertmanager alerts in the public docs, and (c) the private Skills Taxonomy doc.
(2) In essential-metrics.md, (a) added anchors to metrics used in essential-alerts.md, (b) added link to essential alerts.
(3) Added essential-alerts-self-hosted.md to display all the essential alerts.
(4) Added essential-alerts-advanced.md to display a subset of the essential alerts applicable to advanced clusters.
(5) In self-hosted-deployments.json and cloud-deployments.json, added links to the corresponding essential alerts pages.

Rendered preview:

github-actions · 2024-05-10T19:32:36Z

Files changed:

src/current/_includes/v24.1/essential-alerts.md
src/current/_includes/v24.1/essential-metrics.md
src/current/_includes/v24.1/sidebar-data/cloud-deployments.json
src/current/_includes/v24.1/sidebar-data/self-hosted-deployments.json
src/current/_includes/v24.2/essential-alerts.md
src/current/_includes/v24.2/essential-metrics.md
src/current/_includes/v24.2/sidebar-data/cloud-deployments.json
src/current/_includes/v24.2/sidebar-data/self-hosted-deployments.json
src/current/v24.1/essential-alerts-advanced.md
src/current/v24.1/essential-alerts-self-hosted.md
src/current/v24.2/essential-alerts-advanced.md
src/current/v24.2/essential-alerts-self-hosted.md

netlify · 2024-05-10T19:32:45Z

✅ Deploy Preview for cockroachdb-api-docs canceled.

Name	Link
🔨 Latest commit	`a759d83`
🔍 Latest deploy log	https://app.netlify.com/sites/cockroachdb-api-docs/deploys/667ef97bf7c5030008bd544e

netlify · 2024-05-10T19:32:45Z

✅ Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

Name	Link
🔨 Latest commit	`a759d83`
🔍 Latest deploy log	https://app.netlify.com/sites/cockroachdb-interactivetutorials-docs/deploys/667ef97bb37a340008dda37e

netlify · 2024-05-10T19:36:16Z

✅ Netlify Preview

Name	Link
🔨 Latest commit	`d92a8e4`
🔍 Latest deploy log	https://app.netlify.com/sites/cockroachdb-docs/deploys/66e880d2c72ba100082da6bb
😎 Deploy Preview	https://deploy-preview-18537--cockroachdb-docs.netlify.app

florence-crl · 2024-06-26T19:48:54Z

@andyyang890 or @wenyihu6: please review the alert for Changefeed experiencing high latency.

@dikshant: please review the alerts for:

src/current/_includes/v24.1/essential-alerts.md

florence-crl

TFTR!

src/current/_includes/v24.1/essential-alerts.md

andyyang890

Changefeed experiencing high latency section LGTM! Will defer to others for final approval.

kevin-v-ngo

Looks great Florence! Excited to get this published.

dikshant · 2024-08-29T15:13:07Z

src/current/_includes/v24.1/essential-alerts.md

+## SQL 
+
+### Node not executing SQL
+
+Send an alert when a node is not executing SQL despite having connections. `sql.conns` shows the number of connections as well as the distribution, or balancing, of connections across cluster nodes. An imbalance can lead to nodes becoming overloaded.
+
+**Metric**
+<br>[`sql.conns`]({% link {{ page.version.version }}/essential-metrics-self-hosted.md %}#sql-conns)
+<br>`sql.query.count`
+
+**Rule**
+<br>Set alerts for each node:
+<br>WARNING:  `sql.conns` greater than `0` while `sql.query.count` equals `0`
+
+**Action**
+
+- Refer to [Connection Pooling]({% link {{ page.version.version }}/connection-pooling.md %}).
+
+### SQL query failure
+
+Send an alert when the query failure count exceeds a user-determined threshold based on their application's SLA.
+
+**Metric**
+<br>[`sql.failure.count`]({% link {{ page.version.version }}/essential-metrics-self-hosted.md %}#sql-failure-count)
+
+**Rule**
+<br>WARNING:  `sql.failure.count` is greater than a threshold (based on the user’s application SLA)
+
+**Action**
+
+-  Use the [**Insights** page]({% link {{ page.version.version }}/ui-insights-page.md %}) to find failed executions with their error code to troubleshoot or use application-level logs, if instrumented, to determine the cause of error.
+
+### SQL queries experiencing high latency
+
+Send an alert when the query latency exceeds a user-determined threshold based on their application’s SLA.
+
+**Metric**
+<br>[`sql.service.latency`]({% link {{ page.version.version }}/essential-metrics-self-hosted.md %}#sql-service-latency)
+<br>[`sql.conn.latency`]({% link {{ page.version.version }}/essential-metrics-self-hosted.md %}#sql-conn-latency)
+
+**Rule**
+<br>WARNING:  (p99 or p90 of `sql.service.latency` plus average of `sql.conn.latency`) is greater than a threshold (based on the user’s application SLA)
+
+**Action**
+
+- Apply the time range of the alert to the [**SQL Activity** pages]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#sql-activity-pages) to investigate. Use the [**Statements** page]({% link {{ page.version.version }}/ui-statements-page.md %}) P90 Latency and P99 latency columns to correlate [statement fingerprints]({% link {{ page.version.version }}/ui-statements-page.md %}#sql-statement-fingerprints) with this alert.
+
+{% if include.deployment == 'self-hosted' %}


LGTM! @mgartner do you have any thoughts/concerns on highlighting these as essential metrics to monitor for SQL?

Hi @mgartner, when you have a chance, would you be able to review this section: SQL queries experiencing high latency?
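
For reviewers, the three SQL alert rules in the diff above can be read as simple predicates over per-node metric samples. The following is only a minimal sketch, not how CockroachDB or any alerting tool evaluates them. The `NodeSample` type, its field names, the `evaluate` helper, and the thresholds are all hypothetical. It also assumes `sql.query.count` and `sql.failure.count` have already been reduced to a per-window count, and that the p99 of `sql.service.latency` and the average of `sql.conn.latency` are supplied in milliseconds.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NodeSample:
    """Hypothetical per-node snapshot of the metrics named in the rules."""
    node_id: int
    sql_conns: float                # sql.conns
    sql_query_count: float          # sql.query.count over the window (assumption)
    sql_failure_count: float        # sql.failure.count over the window (assumption)
    service_latency_p99_ms: float   # p99 of sql.service.latency (assumed ms)
    conn_latency_avg_ms: float      # average of sql.conn.latency (assumed ms)


def node_not_executing_sql(s: NodeSample) -> bool:
    # WARNING: sql.conns greater than 0 while sql.query.count equals 0
    return s.sql_conns > 0 and s.sql_query_count == 0


def sql_query_failure(s: NodeSample, failure_threshold: float) -> bool:
    # WARNING: sql.failure.count is greater than an SLA-based threshold
    return s.sql_failure_count > failure_threshold


def sql_high_latency(s: NodeSample, latency_threshold_ms: float) -> bool:
    # WARNING: p99 service latency plus average connection latency
    # is greater than an SLA-based threshold
    return s.service_latency_p99_ms + s.conn_latency_avg_ms > latency_threshold_ms


def evaluate(samples: List[NodeSample],
             failure_threshold: float,
             latency_threshold_ms: float) -> List[Tuple[int, str]]:
    """Return (node_id, alert name) pairs for every rule that fires."""
    fired = []
    for s in samples:
        if node_not_executing_sql(s):
            fired.append((s.node_id, "Node not executing SQL"))
        if sql_query_failure(s, failure_threshold):
            fired.append((s.node_id, "SQL query failure"))
        if sql_high_latency(s, latency_threshold_ms):
            fired.append((s.node_id, "SQL queries experiencing high latency"))
    return fired
```

The thresholds are passed in rather than hard-coded because the section leaves them to the user's application SLA.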

In essential-metrics.md, took cloud-2.0 version and manually added links to metrics used by essential-alerts.md. Renamed essential-alerts-dedicated.md to essential-alerts-advanced.md. In essential-alerts.md, (a) replaced dedicated with advanced, (b) replaced links to essential-metrics-self-hosted.md with essential-metrics-{{ include.deployment }}.md. In cloud-deployments.json, added link to essential-alerts-advanced.md.

florence-crl · 2024-09-13T19:05:25Z

Hi @kathancox, I have made the cloud-2.0 and v24.2 changes I had planned. Please review this PR at your earliest convenience. The cloud-2.0 docs release is Sept. 25, so please keep this date in mind as this PR needs to be merged before then.

kathancox

Really great Florence! Pending your review of the comments, it looks good to me.

src/current/_includes/v24.1/essential-alerts.md

src/current/_includes/v24.2/essential-alerts.md

src/current/_includes/v24.1/sidebar-data/self-hosted-deployments.json

florence-crl

TFTRs!

src/current/_includes/v24.1/sidebar-data/self-hosted-deployments.json

src/current/_includes/v24.1/essential-alerts.md

src/current/_includes/v24.2/essential-alerts.md

draft 1

9e7ba58

florence-crl added 18 commits May 14, 2024 11:51

draft 2

9034968

Merge remote-tracking branch 'origin/main' into DOC-8878

1b60a79

draft 3

ba4878d

Merge remote-tracking branch 'origin/main' into DOC-8878

0c8906e

draft 4

faed197

Merge remote-tracking branch 'origin/main' into DOC-8878

50540bc

Merge remote-tracking branch 'origin/main' into DOC-8878

d98968d

draft 5, fixed links

b90965f

Merge remote-tracking branch 'origin/main' into DOC-8878

e58780f

draft 6, fixed summary

4f9464f

Merge remote-tracking branch 'origin/main' into DOC-8878

ae5befa

draft 7

5948b77

Merge remote-tracking branch 'origin/main' into DOC-8878

fcb6248

draft 8

ae131b7

Merge remote-tracking branch 'origin/main' into DOC-8878

7dfd58c

(1) Added essential-alerts-self-hosted.md, a compilation of a-entin alerts from private repo and alertmanager alerts from public docs. (2) In essential-metrics.md, added anchors to metrics used in essential-alerts-self-hosted.md.

b6bcafc

In essential-alerts-self-hosted.md, re-ordered alerts to have similar sections as essential-metrics.md.

81da0c6

Merge remote-tracking branch 'origin/main' into DOC-8878

dfd1b0e

exalate-issue-sync bot mentioned this pull request Jun 7, 2024

Feedback: Troubleshoot Self-Hosted Setup - Replication issues #18632

Open

florence-crl added 2 commits June 7, 2024 18:06

In essential-alerts-self-hosted.md, added actions for alertmanager alerts from public docs.

e4a5844

Merge remote-tracking branch 'origin/main' into DOC-8878

e676acc

florence-crl changed the title ~~DOC-8878~~ DOC-8878 Enhance Essential Metrics with Alert guidance Jun 10, 2024

florence-crl added 2 commits June 10, 2024 15:02

In essential-alerts-self-hosted.md, various edits.

8406015

Merge remote-tracking branch 'origin/main' into DOC-8878

8a64531

florence-crl requested a review from kevin-v-ngo June 10, 2024 19:12

florence-crl requested review from dikshant, andyyang890 and wenyihu6 June 26, 2024 19:49

florence-crl added 2 commits June 27, 2024 14:55

In essential-alerts.md include file, (a) from Skills Taxonomy, added alerts on storage.write-stalls and schedules.BACKUP.failed. (b) fixed case of headings to sentence case.

027a134

Merge remote-tracking branch 'origin/main' into DOC-8878

e9d07f8

andyyang890 reviewed Jun 27, 2024

src/current/_includes/v24.1/essential-alerts.md (outdated)

florence-crl added 2 commits June 28, 2024 13:55

Incorporated Andy’s feedback.

a73fb03

Merge remote-tracking branch 'origin/main' into DOC-8878

a759d83

florence-crl commented Jun 28, 2024

src/current/_includes/v24.1/essential-alerts.md (outdated)

florence-crl requested review from andyyang890 and removed request for wenyihu6 June 28, 2024 18:07

andyyang890 reviewed Jun 28, 2024

kevin-v-ngo approved these changes Aug 28, 2024

dikshant reviewed Aug 29, 2024

florence-crl requested a review from kathancox September 9, 2024 19:12

florence-crl changed the base branch from main to cloud-2.0 September 10, 2024 14:19

florence-crl added 5 commits September 10, 2024 16:55

In essential-metrics.md, add anchor for sys.rss.

6c3d818

Merge remote-tracking branch 'origin/cloud-2.0' into DOC-8878

db09916

Resolve merge conflict in v24.1/essential-metrics.md.

b24f3d0

In v24.1/essential-metrics.md, replaced link to Events to alert on with link to Essential Alerts. copied v24.1 changed files to v24.2

cd433df

dikshant approved these changes Sep 13, 2024

kathancox approved these changes Sep 16, 2024

florence-crl added 2 commits September 16, 2024 14:53

Incorporated Kathryn’s feedback

d8f4c87

Resolve merge conflicts

d92a8e4

florence-crl commented Sep 16, 2024

florence-crl merged commit 1c06a8b into cloud-2.0 Sep 16, 2024
4 checks passed

florence-crl deleted the DOC-8878 branch September 16, 2024 19:15
