Update alarms runbooks (#3672)
* Update API alarms runbooks

* Update Nuxt alarms runbooks

* Update runbook for unhealthy ECS host alarms
krysal authored Jan 19, 2024
1 parent 487142a commit a2d8dac
Showing 13 changed files with 38 additions and 50 deletions.
@@ -1,12 +1,12 @@
# Run Book: API Production Average Response Time above threshold

```{admonition} Metadata
Status: **Unstable**
Maintainer: @krysaldb
Status: **Stable**
Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+Average+Response+Time+above+threshold?>
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+Average+Response+Time+above+threshold)
- [ECS-Production-Dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/ECS-Production-Dashboard)
- [Production Database + Elasticsearch dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/Service-Overview)
```

## Severity Guide
@@ -1,12 +1,12 @@
# Run Book: API Production Average Response Time anomaly

```{admonition} Metadata
Status: **Unstable**
Maintainer: @krysaldb
Status: **Stable**
Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+Average+Response+Time+anomalously+high?>
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+Average+Response+Time+anomalously+high)
- [ECS-Production-Dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/ECS-Production-Dashboard)
- [Production Database + Elasticsearch dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/Service-Overview)
```

## Severity Guide
@@ -1,12 +1,12 @@
# Run Book: API Production P99 Response Time above threshold

```{admonition} Metadata
Status: **Unstable**
Maintainer: @krysaldb
Status: **Stable**
Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+P99+Response+Time+above+threshold?>
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+P99+Response+Time+above+threshold?)
- [ECS-Production-Dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/ECS-Production-Dashboard)
- [Production Database + Elasticsearch dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/Service-Overview)
```

## Severity Guide
@@ -1,12 +1,12 @@
# Run Book: API Production P99 Response Time anomaly

```{admonition} Metadata
Status: **Unstable**
Maintainer: @krysaldb
Status: **Stable**
Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+P99+Response+Time+anomalously+high?>
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+P99+Response+Time+anomalously+high)
- [ECS-Production-Dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/ECS-Production-Dashboard)
- [Production Database + Elasticsearch dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/Service-Overview)
```

## Severity Guide
@@ -1,12 +1,12 @@
# Run Book: API Production Request Count anomalously high

```{admonition} Metadata
Status: **Unstable**
Maintainer: @krysaldb
Status: **Stable**
Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+Request+Count+anomalously+high?>
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+Request+Count+anomalously+high)
- [ECS-Production-Dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/ECS-Production-Dashboard)
- [Production Database + Elasticsearch dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/Service-Overview)
```

## Severity Guide
@@ -1,12 +1,11 @@
# Run Book: Nuxt Production Average Response Time above threshold

```{admonition} Metadata
Status: **Unstable**
Maintainer: @obulat
Status: **Stable**
Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+Average+Response+Time+above+threshold?>
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+Average+Response+Time+above+threshold)
- [ECS-Production-Dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/ECS-Production-Dashboard)
```

## Severity Guide
@@ -1,12 +1,11 @@
# Run Book: Nuxt Production Average Response Time anomalously high

```{admonition} Metadata
Status: **Unstable**
Maintainer: @obulat
Status: **Stable**
Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+Average+Response+Time+anomalously+high?>
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+Average+Response+Time+anomalously+high)
- [ECS-Production-Dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/ECS-Production-Dashboard)
```

## Severity Guide
@@ -1,12 +1,11 @@
# Run Book: Nuxt 2XX responses count under threshold

```{admonition} Metadata
Status: **Unstable**
Maintainer: @dhruvkb
Status: **Stable**
Alarm link:
- [production-nuxt](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+HTTP+2XX+responses+count+under+threshold)
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+HTTP+2XX+responses+count+under+threshold)
- [ECS-Production-Dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/ECS-Production-Dashboard)
```

## Severity guide
@@ -1,12 +1,11 @@
# Run Book: Nuxt 5XX responses count above threshold

```{admonition} Metadata
Status: **Unstable**
Maintainer: @dhruvkb
Status: **Stable**
Alarm link:
- [production-nuxt](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+HTTP+5XX+responses+count+over+threshold)
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+HTTP+5XX+responses+count+over+threshold)
- [ECS-Production-Dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/ECS-Production-Dashboard)
```

## Severity guide
@@ -25,9 +24,7 @@ errors (this can be checked by observing paths in the Cloudflare logs).
- If the API requests are returning 5XX responses, the severity is high. Further
investigation into the API side is warranted to determine the cause for the
5XX responses. Also refer to the
[API 5XX runbook](/meta/monitoring/runbooks/index.md).

<!-- TODO: Update link to /meta/monitoring/runbooks/api_5xx_above_threshold.md -->
[API 5XX runbook](/meta/monitoring/runbooks/api_http_5xx_above_threshold.md).

## Historical false positives

@@ -3,10 +3,8 @@
```{admonition} Metadata
Status: **Disabled** until Nuxt request logging is added.
Maintainer: @obulat
Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+P99+Response+Time+above+threshold?>
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+P99+Response+Time+above+threshold)
```

## Severity Guide
@@ -3,10 +3,8 @@
```{admonition} Metadata
Status: **Disabled** until Nuxt request logging is added.
Maintainer: @obulat
Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+P99+Response+Time+anomalously+high>
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+P99+Response+Time+anomalously+high>)
```

## Severity Guide
@@ -1,12 +1,10 @@
# Run Book: Nuxt Request Count anomalously high

```{admonition} Metadata
Status: **Unstable**
Maintainer: @dhruvkb
Status: **Stable**
Alarm link:
- [production-nuxt](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+Request+Count+anomalously+high)
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+Request+Count+anomalously+high)
```

## Severity guide
@@ -1,7 +1,7 @@
# Run Book: Unhealthy hosts for ECS service

```{admonition} Metadata
Status: **Unstable**
Status: **Stable**
Alarm links: