Release operator version: 2.14.1
* Reuse the last copied image since release failed due to clean-cluster issue

Co-authored-by: Mark Michael <[email protected]>
Co-authored-by: Jerry Belmonte <[email protected]>
Co-authored-by: Anil Kodali <[email protected]>
Co-authored-by: Yuqi Jin <[email protected]>
Co-authored-by: Priya Selvaganesan <[email protected]>

* Update operator version from the file locally for the test

Co-authored-by: Jerry Belmonte <[email protected]>
Co-authored-by: Anil Kodali <[email protected]>
Co-authored-by: Yuqi Jin <[email protected]>
Co-authored-by: Priya Selvaganesan <[email protected]>
Co-authored-by: John Cornish <[email protected]>

* Fix the operator version file path

Co-authored-by: Anil Kodali <[email protected]>
Co-authored-by: Yuqi Jin <[email protected]>
Co-authored-by: Priya Selvaganesan <[email protected]>
Co-authored-by: John Cornish <[email protected]>
Co-authored-by: Mark Michael <[email protected]>

* Release operator version: 2.14.1

---------

Co-authored-by: John Cornish <[email protected]>
Co-authored-by: Mark Michael <[email protected]>
Co-authored-by: Jerry Belmonte <[email protected]>
Co-authored-by: Anil Kodali <[email protected]>
Co-authored-by: Yuqi Jin <[email protected]>
Co-authored-by: Priya Selvaganesan <[email protected]>
7 people authored Oct 4, 2023
1 parent c6d6cfd commit e28bf9b
Showing 40 changed files with 1,634 additions and 232 deletions.
33 changes: 31 additions & 2 deletions README.md
@@ -147,11 +147,17 @@ We have templates for common scenarios. See the comments in each file for usage
You can see all configuration options in the [wavefront-full-config.yaml](deploy/scenarios/wavefront-full-config.yaml).
# Creating Alerts
## Creating Alerts
We have alerts on common Kubernetes issues. For details on creating alerts, see [alerts.md](docs/alerts/alerts.md).
### Pod Failure
### Observability Failures
| Alert name | Description |
|---|---|
| [Observability Status is Unhealthy](docs/alerts/templates/observability-status-unhealthy.json.tmpl) | The status of the Observability for Kubernetes is unhealthy. |
### Pod Failures
| Alert name | Description |
|---|---|
@@ -162,6 +168,29 @@ We have alerts on common Kubernetes issues. For details on creating alerts, see
| [Pod Out-of-memory Kills](docs/alerts/templates/pod-out-of-memory-kills.json.tmpl) | Workload has pod with container status `OOMKilled`. |
| [Container CPU Throttling](docs/alerts/templates/container-cpu-throttling.json.tmpl) | Workload has a container with high CPU throttling. |
| [Container CPU Overutilization](docs/alerts/templates/container-cpu-overutilization.json.tmpl) | Workload has a container with high CPU utilization. |
| [Container Memory Overutilization](docs/alerts/templates/container-memory-overutilization.json.tmpl) | Workload has a container with high memory utilization. |
| [Missing etcd leader](docs/alerts/templates/etcd-no-leader.json.tmpl) | etcd cannot elect a leader. |
### Persistent Volume Failures
| Alert name | Description |
|---|---|
| [Persistent Volumes No Claim](docs/alerts/templates/persistent-volumes-no-claim.json.tmpl) | Persistent Volume has no claim. |
| [Persistent Volumes Error](docs/alerts/templates/persistent-volumes-error.json.tmpl) | Persistent Volume has issues with provisioning. |
| [Persistent Volume Claim Overutilization](docs/alerts/templates/persistent-volume-claim-overutilization.json.tmpl) | Workload has low available disk space for a claimed Persistent Volume. |
### Node Failures
| Alert name | Description |
|----------------------------------------------------------------------------------------------------|-------------|
| [Node Memory Overutilization](docs/alerts/templates/node-memory-overutilization.json.tmpl) | Node has high memory utilization. |
| [Node CPU Overutilization](docs/alerts/templates/node-cpu-overutilization.json.tmpl) | Node has high CPU utilization. |
| [Node Filesystem Overutilization](docs/alerts/templates/node-filesystem-overutilization.json.tmpl) | Node storage is almost full. |
| [Node CPU-request Saturation](docs/alerts/templates/node-cpu-request-saturation.json.tmpl) | Node has overcommitted cpu resource requests. |
| [Node Memory-request Saturation](docs/alerts/templates/node-memory-request-saturation.json.tmpl) | Node has overcommitted memory resource requests. |
| [Node Disk Pressure](docs/alerts/templates/node-disk-pressure.json.tmpl) | Node has problematic `DiskPressure` condition. |
| [Node Memory Pressure](docs/alerts/templates/node-memory-pressure.json.tmpl) | Node has problematic `MemoryPressure` condition. |
| [Node Condition Not Ready](docs/alerts/templates/node-condition-not-ready.json.tmpl) | Node Condition not in Ready state. |
## Bring Your Own Logs Shipper
2 changes: 1 addition & 1 deletion collector/release/NEXT_RELEASE_VERSION
@@ -1 +1 @@
1.26.1
1.27.0
2 changes: 1 addition & 1 deletion collector/release/VERSION
@@ -1 +1 @@
1.25.0
1.26.1
18 changes: 10 additions & 8 deletions deploy/crd/wavefront.com_wavefronts.yaml
@@ -158,7 +158,7 @@ spec:
default:
resources:
limits:
cpu: 400m
cpu: 2000m
ephemeral-storage: 1Gi
memory: 512Mi
requests:
@@ -297,7 +297,7 @@ spec:
default:
resources:
limits:
cpu: 200m
cpu: 1000m
ephemeral-storage: 512Mi
memory: 256Mi
requests:
@@ -712,19 +712,21 @@ spec:
type: object
type: object
type: object
kubernetesEvents:
description: KubernetesEvents is deprecated, please use aria-insights-secret
instead
insights:
description: Insights
properties:
enable:
default: false
description: Enable is whether to enable events. Defaults
description: Enable is whether to enable Insights. Defaults
to false.
type: boolean
externalEndpointURL:
ingestionUrl:
description: Ingestion Url is the endpoint to send kubernetes
events.
pattern: ^http(s)?:\/\/.+
type: string
required:
- externalEndpointURL
- ingestionUrl
type: object
type: object
imagePullSecret:
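Review note: the `pattern` on `ingestionUrl` above accepts any value that starts with `http://` or `https://` followed by at least one more character. The following is a minimal bash sketch of that check, useful for testing a value before applying a custom resource. The helper name `is_valid_ingestion_url` and the sample URLs are hypothetical and not part of the operator.

```bash
#!/usr/bin/env bash
# Sketch only: mirrors the CRD pattern ^http(s)?:\/\/.+ for ingestionUrl.
# is_valid_ingestion_url is a hypothetical helper, not an operator command.
is_valid_ingestion_url() {
  local url="$1"
  # Same pattern as the CRD, written as a bash extended regular expression.
  local pattern='^https?://.+'
  [[ "$url" =~ $pattern ]]
}

# Made-up sample values: one valid URL, one wrong scheme, one with no host.
for candidate in "https://insights.example.com/ingest" "ftp://example.com" "https://"; do
  if is_valid_ingestion_url "$candidate"; then
    echo "accepted: $candidate"
  else
    echo "rejected: $candidate"
  fi
done
```

The pattern has no end anchor, so it only constrains the scheme prefix; it does not verify that the rest of the value is a reachable or well-formed endpoint.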
11 changes: 6 additions & 5 deletions deploy/scenarios/wavefront-full-config.yaml
@@ -1,5 +1,6 @@
# Need to change YOUR_CLUSTER_NAME and YOUR_WAVEFRONT_URL accordingly
# This is not a valid configuration since some options are not compatible. See notes for more information.
# Unless otherwise specified, the values here are set to their default values.
apiVersion: wavefront.com/v1alpha1
kind: Wavefront
metadata:
@@ -56,16 +57,16 @@ spec:
- kubernetes.collector.runtime.*
tagGuaranteeList:
- label.env
defaultCollectionInterval: 90s #defaults to 60s
defaultCollectionInterval: 60s
# Rules based and Prometheus endpoints auto-discovery.
enableDiscovery: true #defaults to true
enableDiscovery: true
# controlPlane can enable/disable control plane metrics
controlPlane:
enable: true #defaults to true
enable: true
clusterCollector:
resources:
limits:
cpu: 400m
cpu: 2000m
ephemeral-storage: 1Gi
memory: 512Mi
requests:
@@ -75,7 +76,7 @@ spec:
nodeCollector:
resources:
limits:
cpu: 200m
cpu: 1000m
ephemeral-storage: 512Mi
memory: 256Mi
requests:
4 changes: 2 additions & 2 deletions deploy/scenarios/wavefront-pod-resources.yaml
@@ -16,15 +16,15 @@ spec:
cpu: 200m
memory: 10Mi
limits:
cpu: 400m
cpu: 2000m
memory: 512Mi
nodeCollector:
resources:
requests:
cpu: 200m
memory: 10Mi
limits:
cpu: 200m
cpu: 1000m
memory: 256Mi
dataExport:
wavefrontProxy:
24 changes: 13 additions & 11 deletions deploy/wavefront-operator.yaml
@@ -165,7 +165,7 @@ spec:
default:
resources:
limits:
cpu: 400m
cpu: 2000m
ephemeral-storage: 1Gi
memory: 512Mi
requests:
@@ -304,7 +304,7 @@ spec:
default:
resources:
limits:
cpu: 200m
cpu: 1000m
ephemeral-storage: 512Mi
memory: 256Mi
requests:
@@ -719,19 +719,21 @@ spec:
type: object
type: object
type: object
kubernetesEvents:
description: KubernetesEvents is deprecated, please use aria-insights-secret
instead
insights:
description: Insights
properties:
enable:
default: false
description: Enable is whether to enable events. Defaults
description: Enable is whether to enable Insights. Defaults
to false.
type: boolean
externalEndpointURL:
ingestionUrl:
description: Ingestion Url is the endpoint to send kubernetes
events.
pattern: ^http(s)?:\/\/.+
type: string
required:
- externalEndpointURL
- ingestionUrl
type: object
type: object
imagePullSecret:
@@ -1441,9 +1443,9 @@ subjects:
---
apiVersion: v1
data:
collector: 1.25.0
collector: 1.26.1
logging: 2.1.9
proxy: "13.1"
proxy: "13.2"
kind: ConfigMap
metadata:
labels:
@@ -1513,7 +1515,7 @@ spec:
configMapKeyRef:
key: logging
name: wavefront-component-versions
image: projects.registry.vmware.com/tanzu_observability/kubernetes-operator:2.13.0
image: projects.registry.vmware.com/tanzu_observability/kubernetes-operator:2.14.1
imagePullPolicy: Always
livenessProbe:
httpGet:
93 changes: 55 additions & 38 deletions docs/alerts/alerts.md
@@ -1,64 +1,81 @@
# Alerts
This page contains the steps to create an alert template.

We have alert templates on common Kubernetes issues.
This page contains the steps to create alerts for the Observability for Kubernetes Operator.

* [Detect pod stuck in pending](templates/pod-stuck-in-pending.json.tmpl)
* [Detect pod stuck in terminating](templates/pod-stuck-in-terminating.json.tmpl)
* [Detect pod backoff event](templates/pod-backoff-event.json.tmpl)
* [Detect workload with non-ready pods](templates/workload-not-ready.json.tmpl)
* [Detect pod out-of-memory kills](templates/pod-out-of-memory-kills.json.tmpl)
* [Detect container cpu throttling](templates/container-cpu-throttling.json.tmpl)
* [Detect container cpu overutilization](templates/container-cpu-overutilization.json.tmpl)
## Table of Contents

## Flags
- [Alert Templates](#alert-templates)
- [Creating Alerts](#creating-alerts)
- [Example: Creating All the Alerts](#example-creating-all-the-alerts)
- [Example: Creating a Single Alert](#example-creating-a-single-alert)
- [Customizing Alerts](#customizing-alerts)

```
Usage of ./create-alert.sh:
-t (Required) Wavefront API token
-c (Required) Wavefront instance name
-f (Required) path to alert file template
-n (Required) kubernetes cluster name
-h print usage info and exit
```
## Alert Templates

We have alert templates on common Kubernetes issues.

## Create an alert
| Alert | Template |
|---|---|
| [Detect if observability status is unhealthy](templates/observability-status-unhealthy.json.tmpl) | `observability-status-unhealthy.json.tmpl` |
| [Detect pod stuck in pending](templates/pod-stuck-in-pending.json.tmpl) | `pod-stuck-in-pending.json.tmpl` |
| [Detect pod stuck in terminating](templates/pod-stuck-in-terminating.json.tmpl) | `pod-stuck-in-terminating.json.tmpl` |
| [Detect pod backoff event](templates/pod-backoff-event.json.tmpl) | `pod-backoff-event.json.tmpl` |
| [Detect workload with non-ready pods](templates/workload-not-ready.json.tmpl) | `workload-not-ready.json.tmpl` |
| [Detect pod out-of-memory kills](templates/pod-out-of-memory-kills.json.tmpl) | `pod-out-of-memory-kills.json.tmpl` |
| [Detect container cpu throttling](templates/container-cpu-throttling.json.tmpl) | `container-cpu-throttling.json.tmpl` |
| [Detect container cpu overutilization](templates/container-cpu-overutilization.json.tmpl) | `container-cpu-overutilization.json.tmpl` |
| [Detect persistent volumes with no claims](templates/persistent-volumes-no-claim.json.tmpl) | `persistent-volumes-no-claim.json.tmpl` |
| [Detect persistent volumes with error](templates/persistent-volumes-error.json.tmpl) | `persistent-volumes-error.json.tmpl` |
| [Detect persistent volumes filling up](templates/persistent-volume-claim-overutilization.json.tmpl) | `persistent-volume-claim-overutilization.json.tmpl` |
| [Detect node memory overutilization](templates/node-memory-overutilization.json.tmpl) | `node-memory-overutilization.json.tmpl` |
| [Detect node cpu overutilization](templates/node-cpu-overutilization.json.tmpl) | `node-cpu-overutilization.json.tmpl` |
| [Detect node filesystem overutilization](templates/node-filesystem-overutilization.json.tmpl) | `node-filesystem-overutilization.json.tmpl` |
| [Detect node cpu-request saturation](templates/node-cpu-request-saturation.json.tmpl) | `node-cpu-request-saturation.json.tmpl` |
| [Detect node memory-request saturation](templates/node-memory-request-saturation.json.tmpl) | `node-memory-request-saturation.json.tmpl` |
| [Detect node disk pressure condition](templates/node-disk-pressure.json.tmpl) | `node-disk-pressure.json.tmpl` |
| [Detect node memory pressure condition](templates/node-memory-pressure.json.tmpl) | `node-memory-pressure.json.tmpl` |
| [Detect node condition not ready](templates/node-condition-not-ready.json.tmpl) | `node-condition-not-ready.json.tmpl` |
| [Detect etcd has no leader](templates/etcd-no-leader.json.tmpl) | `etcd-no-leader.json.tmpl` |

### Step 1: Download the alert template file.
## Creating Alerts

1. Replace `<alert_file_output_path>`, (ex: `/tmp/pod-stuck-in-pending.json`).
2. Replace `<alert_template_file.json.tmpl>`, (ex: `pod-stuck-in-pending.json.tmpl`).
1. Ensure that you have the information for the required fields:
- **Wavefront API token**. See [Managing API Tokens](https://docs.wavefront.com/wavefront_api.html#managing-api-tokens) page.
- **Wavefront instance**. For example, the value of `<YOUR_WAVEFRONT_INSTANCE>` from your Wavefront URL (`https://<YOUR_WAVEFRONT_INSTANCE>.wavefront.com`).
- **Cluster name**. For example, the value of `clusterName` from your Wavefront Custom Resource configuration (ex: `mycluster-us-west-1`).
- **(Optional) Alert template**. For example, the value of `<alert_template_file.json.tmpl>` from the list of alert templates (ex: `pod-backoff-event.json.tmpl`).
- **(Optional) Alert target**. For example, an email address, PagerDuty key, or [alert target](https://docs.wavefront.com/webhooks_alert_notification.html). To notify several targets, pass them as a comma-separated list.

### Example: Creating All the Alerts

```bash
export ALERT_FILE_OUTPUT_PATH=<alert_file_output_path>
export ALERT_TEMPLATE_FILE=<alert_template_file.json.tmpl>
curl -sSL -o "$ALERT_FILE_OUTPUT_PATH" "https://raw.githubusercontent.com/wavefrontHQ/observability-for-kubernetes/main/docs/alerts/templates/$ALERT_TEMPLATE_FILE"
curl -sSL https://raw.githubusercontent.com/wavefrontHQ/observability-for-kubernetes/main/docs/alerts/create-all-alerts.sh | bash -s -- \
-t <YOUR_API_TOKEN> \
-c <YOUR_WAVEFRONT_INSTANCE> \
-e <YOUR_ALERT_TARGET> \
-n <YOUR_CLUSTER_NAME>
```

### Step 2: Create the alert template.
>**Note:** Replace `<YOUR_API_TOKEN>`, `<YOUR_WAVEFRONT_INSTANCE>`, `<YOUR_ALERT_TARGET>`, and `<YOUR_CLUSTER_NAME>` in the example above with your own values.
1. Ensure that you have the information for the required fields:
- **Wavefront API token**. See [Managing API Tokens](https://docs.wavefront.com/wavefront_api.html#managing-api-tokens) page.
- **Wavefront instance**. For example, the value of `<your_instance>` from your wavefront url (`https://<your_instance>.wavefront.com`).
- **Cluster name**. For example, the value of `clusterName` from your Wavefront Custom Resource configuration (ex: `mycluster-us-west-1`).
- **Alert template file**. For example, the download output path of the alert template file from **Step 1**.
### Example: Creating a Single Alert

```bash
curl -sSL https://raw.githubusercontent.com/wavefrontHQ/observability-for-kubernetes/main/docs/alerts/create-alert.sh | bash -s -- \
-t <YOUR_API_TOKEN> \
-c <YOUR_WAVEFRONT_INSTANCE> \
-n <YOUR_CLUSTER_NAME> \
-f <PATH_TO_ALERT_FILE>
-e <YOUR_ALERT_TARGET> \
-f <ALERT_TEMPLATE>
```

**Note:** You will need to change YOUR_API_TOKEN, YOUR_WAVEFRONT_INSTANCE, YOUR_CLUSTER_NAME, and PATH_TO_ALERT_FILE in the above example.
>**Note:** Replace `<YOUR_API_TOKEN>`, `<YOUR_WAVEFRONT_INSTANCE>`, `<YOUR_CLUSTER_NAME>`, `<YOUR_ALERT_TARGET>`, and `<ALERT_TEMPLATE>` in the example above with your own values.
### Step 3: Customize the alert.
## Customizing Alerts

1. Log in to your service instance `https://<your_instance>.wavefront.com` as a user with the Alerts permission. Click **Alerting** > **All Alerts** from the toolbar to display the Alerts Browser.
1. Log in to your service instance `https://<YOUR_WAVEFRONT_INSTANCE>.wavefront.com` as a user with the Alerts permission. Click **Alerting** > **All Alerts** from the toolbar to display the Alerts Browser.
2. Click the alert name, or click the ellipsis icon next to the alert and select **Edit**. You can search for the alert by typing the alert name in the search field.
3. Change the alert properties when you edit the alert.
4. Specify alert recipients to receive notifications when the alert changes state.
5. Click **Save** in the top right to save your changes.
4. Click **Save** in the top right to save your changes.

See [Create and Manage Alerts](https://docs.wavefront.com/alerts_manage.html) for an overview on how to create and manage alerts.
>**Note:** See [Create and Manage Alerts](https://docs.wavefront.com/alerts_manage.html) for an overview on how to create and manage alerts.
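
Review note: the updated `alerts.md` covers creating all alerts or a single alert, but not a chosen subset. The following is a minimal bash sketch of one way to do that. It prints one `create-alert.sh` invocation per template as a dry run instead of executing anything. The flags `-t`, `-c`, `-n`, `-e`, and `-f` are taken from the examples in the diff; the function name `print_create_alert_cmd` and the dry-run approach are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: print (do not run) one create-alert.sh command per template.
# print_create_alert_cmd is a hypothetical helper, not part of the repository.
SCRIPT_URL="https://raw.githubusercontent.com/wavefrontHQ/observability-for-kubernetes/main/docs/alerts/create-alert.sh"

print_create_alert_cmd() {
  local token="$1" instance="$2" cluster="$3" target="$4" template="$5"
  printf 'curl -sSL %s | bash -s -- -t %s -c %s -n %s -e %s -f %s\n' \
    "$SCRIPT_URL" "$token" "$instance" "$cluster" "$target" "$template"
}

# Template names come from the alert templates table in the diff.
for template in pod-backoff-event.json.tmpl node-disk-pressure.json.tmpl; do
  print_create_alert_cmd "<YOUR_API_TOKEN>" "<YOUR_WAVEFRONT_INSTANCE>" \
    "<YOUR_CLUSTER_NAME>" "<YOUR_ALERT_TARGET>" "$template"
done
```

Printing the commands first makes it easy to review the substituted values before piping anything to `bash`. The alert target is listed as optional in the diff, so the `-e` argument could be dropped from the format string if no target is needed.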