Release operator version: 2.14.1

* Reuse the last copied image since release failed due to clean-cluster issue

  Co-authored-by: Mark Michael <[email protected]>
  Co-authored-by: Jerry Belmonte <[email protected]>
  Co-authored-by: Anil Kodali <[email protected]>
  Co-authored-by: Yuqi Jin <[email protected]>
  Co-authored-by: Priya Selvaganesan <[email protected]>

* Update operator version from the file locally for the test

  Co-authored-by: Jerry Belmonte <[email protected]>
  Co-authored-by: Anil Kodali <[email protected]>
  Co-authored-by: Yuqi Jin <[email protected]>
  Co-authored-by: Priya Selvaganesan <[email protected]>
  Co-authored-by: John Cornish <[email protected]>

* Fix the operator version file path

  Co-authored-by: Anil Kodali <[email protected]>
  Co-authored-by: Yuqi Jin <[email protected]>
  Co-authored-by: Priya Selvaganesan <[email protected]>
  Co-authored-by: John Cornish <[email protected]>
  Co-authored-by: Mark Michael <[email protected]>

* Release operator version: 2.14.1

---------

Co-authored-by: John Cornish <[email protected]>
Co-authored-by: Mark Michael <[email protected]>
Co-authored-by: Jerry Belmonte <[email protected]>
Co-authored-by: Anil Kodali <[email protected]>
Co-authored-by: Yuqi Jin <[email protected]>
Co-authored-by: Priya Selvaganesan <[email protected]>
wavefrontHQ · Oct 4, 2023 · e28bf9b
1 parent c6d6cfd
commit e28bf9b
Showing 40 changed files with 1,634 additions and 232 deletions.
diff --git a/README.md b/README.md
@@ -147,11 +147,17 @@ We have templates for common scenarios. See the comments in each file for usage
 
 You can see all configuration options in the [wavefront-full-config.yaml](deploy/scenarios/wavefront-full-config.yaml).
 
-# Creating Alerts
+## Creating Alerts
 
 We have alerts on common Kubernetes issues. For details on creating alerts, see [alerts.md](docs/alerts/alerts.md).
 
-### Pod Failure
+### Observability Failures
+
+| Alert name | Description |
+|---|---|
+| [Observability Status is Unhealthy](docs/alerts/templates/observability-status-unhealthy.json.tmpl) | The status of the Observability for Kubernetes is unhealthy. |
+
+### Pod Failures
 
 | Alert name | Description |
 |---|---|
@@ -162,6 +168,29 @@ We have alerts on common Kubernetes issues. For details on creating alerts, see
 | [Pod Out-of-memory Kills](docs/alerts/templates/pod-out-of-memory-kills.json.tmpl) | Workload has pod with container status `OOMKilled`. |
 | [Container CPU Throttling](docs/alerts/templates/container-cpu-throttling.json.tmpl) | Workload has a container with high CPU throttling. |
 | [Container CPU Overutilization](docs/alerts/templates/container-cpu-overutilization.json.tmpl) | Workload has a container with high CPU utilization. |
+| [Container Memory Overutilization](docs/alerts/templates/container-memory-overutilization.json.tmpl) | Workload has a container with high memory utilization. |
+| [Missing etcd leader](docs/alerts/templates/etcd-no-leader.json.tmpl) | etcd cannot elect a leader. |
+
+### Persistent Volume Failures
+
+| Alert name | Description |
+|---|---|
+| [Persistent Volumes No Claim](docs/alerts/templates/persistent-volumes-no-claim.json.tmpl) | Persistent Volume has no claim. |
+| [Persistent Volumes Error](docs/alerts/templates/persistent-volumes-error.json.tmpl) | Persistent Volume has issues with provisioning. |
+| [Persistent Volume Claim Overutilization](docs/alerts/templates/persistent-volume-claim-overutilization.json.tmpl) | Workload has low available disk space for a claimed Persistent Volume. |
+
+### Node Failures
+
+| Alert name                                                                                         | Description |
+|----------------------------------------------------------------------------------------------------|-------------|
+| [Node Memory Overutilization](docs/alerts/templates/node-memory-overutilization.json.tmpl)         | Node has high memory utilization. |
+| [Node CPU Overutilization](docs/alerts/templates/node-cpu-overutilization.json.tmpl)               | Node has high CPU utilization. |
+| [Node Filesystem Overutilization](docs/alerts/templates/node-filesystem-overutilization.json.tmpl) | Node storage is almost full. |
+| [Node CPU-request Saturation](docs/alerts/templates/node-cpu-request-saturation.json.tmpl) | Node has overcommitted cpu resource requests. |
+| [Node Memory-request Saturation](docs/alerts/templates/node-memory-request-saturation.json.tmpl) | Node has overcommitted memory resource requests. |
+| [Node Disk Pressure](docs/alerts/templates/node-disk-pressure.json.tmpl) | Node has problematic `DiskPressure` condition. |
+| [Node Memory Pressure](docs/alerts/templates/node-memory-pressure.json.tmpl) | Node has problematic `MemoryPressure` condition. |
+| [Node Condition Not Ready](docs/alerts/templates/node-condition-not-ready.json.tmpl)               | Node Condition not in Ready state. |
 
 ## Bring Your Own Logs Shipper
 

diff --git a/collector/release/NEXT_RELEASE_VERSION b/collector/release/NEXT_RELEASE_VERSION
@@ -1 +1 @@
-1.26.1
+1.27.0
diff --git a/collector/release/VERSION b/collector/release/VERSION
@@ -1 +1 @@
-1.25.0
+1.26.1
diff --git a/deploy/crd/wavefront.com_wavefronts.yaml b/deploy/crd/wavefront.com_wavefronts.yaml
@@ -158,7 +158,7 @@ spec:
                         default:
                           resources:
                             limits:
-                              cpu: 400m
+                              cpu: 2000m
                               ephemeral-storage: 1Gi
                               memory: 512Mi
                             requests:
@@ -297,7 +297,7 @@ spec:
                         default:
                           resources:
                             limits:
-                              cpu: 200m
+                              cpu: 1000m
                               ephemeral-storage: 512Mi
                               memory: 256Mi
                             requests:
@@ -712,19 +712,21 @@ spec:
                             type: object
                         type: object
                     type: object
-                  kubernetesEvents:
-                    description: KubernetesEvents is deprecated, please use aria-insights-secret
-                      instead
+                  insights:
+                    description: Insights
                     properties:
                       enable:
                         default: false
-                        description: Enable is whether to enable events. Defaults
+                        description: Enable is whether to enable Insights. Defaults
                           to false.
                         type: boolean
-                      externalEndpointURL:
+                      ingestionUrl:
+                        description: Ingestion Url is the endpoint to send kubernetes
+                          events.
+                        pattern: ^http(s)?:\/\/.+
                         type: string
                     required:
-                    - externalEndpointURL
+                    - ingestionUrl
                     type: object
                 type: object
               imagePullSecret:
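
Review note: the new `ingestionUrl` field is constrained by the pattern `^http(s)?:\/\/.+`, so any value that does not start with `http://` or `https://` followed by at least one character should be rejected by the API server. The following is a minimal sketch for checking a candidate value locally before applying a resource. The function name `is_valid_ingestion_url` and the sample URLs are hypothetical and not part of this commit; the slashes are left unescaped because they need no escaping in a POSIX extended regular expression.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the operator): checks a candidate
# ingestionUrl against the same pattern the CRD declares.
is_valid_ingestion_url() {
  # CRD pattern: ^http(s)?:\/\/.+
  printf '%s\n' "$1" | grep -Eq '^http(s)?://.+'
}

# Sample values are illustrative only.
for url in "https://example.com/ingest" "http://10.0.0.1:8080" "ftp://example.com" "https://"; do
  if is_valid_ingestion_url "$url"; then
    echo "accepted: $url"
  else
    echo "rejected: $url"
  fi
done
```

This only mirrors the schema check; it says nothing about whether the endpoint is reachable.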

diff --git a/deploy/scenarios/wavefront-full-config.yaml b/deploy/scenarios/wavefront-full-config.yaml
@@ -1,5 +1,6 @@
 # Need to change YOUR_CLUSTER_NAME and YOUR_WAVEFRONT_URL accordingly
 # This is not a valid configuration since some options are not compatible. See notes for more information.
+# Unless otherwise specified, the values here are set to their default values.
 apiVersion: wavefront.com/v1alpha1
 kind: Wavefront
 metadata:
@@ -56,16 +57,16 @@ spec:
           - kubernetes.collector.runtime.*
         tagGuaranteeList:
           - label.env
-      defaultCollectionInterval: 90s #defaults to 60s
+      defaultCollectionInterval: 60s
       # Rules based and Prometheus endpoints auto-discovery.
-      enableDiscovery: true #defaults to true
+      enableDiscovery: true
       # controlPlane can enable/disable control plane metrics
       controlPlane:
-        enable: true #defaults to true
+        enable: true
       clusterCollector:
         resources:
           limits:
-            cpu: 400m
+            cpu: 2000m
             ephemeral-storage: 1Gi
             memory: 512Mi
           requests:
@@ -75,7 +76,7 @@ spec:
       nodeCollector:
         resources:
           limits:
-            cpu: 200m
+            cpu: 1000m
             ephemeral-storage: 512Mi
             memory: 256Mi
           requests:

diff --git a/deploy/scenarios/wavefront-pod-resources.yaml b/deploy/scenarios/wavefront-pod-resources.yaml
@@ -16,15 +16,15 @@ spec:
             cpu: 200m
             memory: 10Mi
           limits:
-            cpu: 400m
+            cpu: 2000m
             memory: 512Mi
       nodeCollector:
         resources:
           requests:
             cpu: 200m
             memory: 10Mi
           limits:
-            cpu: 200m
+            cpu: 1000m
             memory: 256Mi
   dataExport:
     wavefrontProxy:

diff --git a/deploy/wavefront-operator.yaml b/deploy/wavefront-operator.yaml
@@ -165,7 +165,7 @@ spec:
                         default:
                           resources:
                             limits:
-                              cpu: 400m
+                              cpu: 2000m
                               ephemeral-storage: 1Gi
                               memory: 512Mi
                             requests:
@@ -304,7 +304,7 @@ spec:
                         default:
                           resources:
                             limits:
-                              cpu: 200m
+                              cpu: 1000m
                               ephemeral-storage: 512Mi
                               memory: 256Mi
                             requests:
@@ -719,19 +719,21 @@ spec:
                             type: object
                         type: object
                     type: object
-                  kubernetesEvents:
-                    description: KubernetesEvents is deprecated, please use aria-insights-secret
-                      instead
+                  insights:
+                    description: Insights
                     properties:
                       enable:
                         default: false
-                        description: Enable is whether to enable events. Defaults
+                        description: Enable is whether to enable Insights. Defaults
                           to false.
                         type: boolean
-                      externalEndpointURL:
+                      ingestionUrl:
+                        description: Ingestion Url is the endpoint to send kubernetes
+                          events.
+                        pattern: ^http(s)?:\/\/.+
                         type: string
                     required:
-                    - externalEndpointURL
+                    - ingestionUrl
                     type: object
                 type: object
               imagePullSecret:
@@ -1441,9 +1443,9 @@ subjects:
 ---
 apiVersion: v1
 data:
-  collector: 1.25.0
+  collector: 1.26.1
   logging: 2.1.9
-  proxy: "13.1"
+  proxy: "13.2"
 kind: ConfigMap
 metadata:
   labels:
@@ -1513,7 +1515,7 @@ spec:
             configMapKeyRef:
               key: logging
               name: wavefront-component-versions
-        image: projects.registry.vmware.com/tanzu_observability/kubernetes-operator:2.13.0
+        image: projects.registry.vmware.com/tanzu_observability/kubernetes-operator:2.14.1
         imagePullPolicy: Always
         livenessProbe:
           httpGet:

diff --git a/docs/alerts/alerts.md b/docs/alerts/alerts.md
@@ -1,64 +1,81 @@
 # Alerts
-This page contains the steps to create an alert template.
 
-We have alert templates on common Kubernetes issues.
+This page contains the steps to create alerts for the Observability for Kubernetes Operator.
 
-* [Detect pod stuck in pending](templates/pod-stuck-in-pending.json.tmpl)
-* [Detect pod stuck in terminating](templates/pod-stuck-in-terminating.json.tmpl)
-* [Detect pod backoff event](templates/pod-backoff-event.json.tmpl)
-* [Detect workload with non-ready pods](templates/workload-not-ready.json.tmpl)
-* [Detect pod out-of-memory kills](templates/pod-out-of-memory-kills.json.tmpl)
-* [Detect container cpu throttling](templates/container-cpu-throttling.json.tmpl)
-* [Detect container cpu overutilization](templates/container-cpu-overutilization.json.tmpl)
+## Table of Contents
 
-## Flags
+- [Alert Templates](#alert-templates)
+- [Creating Alerts](#creating-alerts)
+- [Example: Creating All the Alerts](#example-creating-all-the-alerts)
+- [Example: Creating a Single Alert](#example-creating-a-single-alert)
+- [Customizing Alerts](#customizing-alerts)
 
-```
-Usage of ./create-alert.sh:
-    -t  (Required) Wavefront API token
-    -c  (Required) Wavefront instance name
-    -f  (Required) path to alert file template
-    -n  (Required) kubernetes cluster name
-    -h  print usage info and exit
-```
+## Alert Templates
+
+We have alert templates on common Kubernetes issues.
 
-## Create an alert
+| Alert | Template |
+|---|---|
+| [Detect if observability status is unhealthy](templates/observability-status-unhealthy.json.tmpl) | `observability-status-unhealthy.json.tmpl` |
+| [Detect pod stuck in pending](templates/pod-stuck-in-pending.json.tmpl) | `pod-stuck-in-pending.json.tmpl` |
+| [Detect pod stuck in terminating](templates/pod-stuck-in-terminating.json.tmpl) | `pod-stuck-in-terminating.json.tmpl` |
+| [Detect pod backoff event](templates/pod-backoff-event.json.tmpl) | `pod-backoff-event.json.tmpl` |
+| [Detect workload with non-ready pods](templates/workload-not-ready.json.tmpl) | `workload-not-ready.json.tmpl` |
+| [Detect pod out-of-memory kills](templates/pod-out-of-memory-kills.json.tmpl) | `pod-out-of-memory-kills.json.tmpl` |
+| [Detect container cpu throttling](templates/container-cpu-throttling.json.tmpl) | `container-cpu-throttling.json.tmpl` |
+| [Detect container cpu overutilization](templates/container-cpu-overutilization.json.tmpl) | `container-cpu-overutilization.json.tmpl` |
+| [Detect persistent volumes with no claims](templates/persistent-volumes-no-claim.json.tmpl) | `persistent-volumes-no-claim.json.tmpl` |
+| [Detect persistent volumes with error](templates/persistent-volumes-error.json.tmpl) | `persistent-volumes-error.json.tmpl` |
+| [Detect persistent volumes filling up](templates/persistent-volume-claim-overutilization.json.tmpl) | `persistent-volume-claim-overutilization.json.tmpl` |
+| [Detect node memory overutilization](templates/node-memory-overutilization.json.tmpl) | `node-memory-overutilization.json.tmpl` |
+| [Detect node cpu overutilization](templates/node-cpu-overutilization.json.tmpl) | `node-cpu-overutilization.json.tmpl` |
+| [Detect node filesystem overutilization](templates/node-filesystem-overutilization.json.tmpl) | `node-filesystem-overutilization.json.tmpl` |
+| [Detect node cpu-request saturation](templates/node-cpu-request-saturation.json.tmpl) | `node-cpu-request-saturation.json.tmpl` |
+| [Detect node memory-request saturation](templates/node-memory-request-saturation.json.tmpl) | `node-memory-request-saturation.json.tmpl` |
+| [Detect node disk pressure condition](templates/node-disk-pressure.json.tmpl) | `node-disk-pressure.json.tmpl` |
+| [Detect node memory pressure condition](templates/node-memory-pressure.json.tmpl) | `node-memory-pressure.json.tmpl` |
+| [Detect node condition not ready](templates/node-condition-not-ready.json.tmpl)                     | `node-condition-not-ready.json.tmpl`                |
+| [Detect etcd has no leader](templates/etcd-no-leader.json.tmpl)                                     | `etcd-no-leader.json.tmpl`                          |
 
-### Step 1: Download the alert template file.
+## Creating Alerts
 
-1. Replace `<alert_file_output_path>`, (ex: `/tmp/pod-stuck-in-pending.json`).
-2. Replace `<alert_template_file.json.tmpl>`, (ex: `pod-stuck-in-pending.json.tmpl`).
+1. Ensure that you have the information for the required fields:
+    - **Wavefront API token**. See [Managing API Tokens](https://docs.wavefront.com/wavefront_api.html#managing-api-tokens) page.
+    - **Wavefront instance**. For example, the value of `<YOUR_WAVEFRONT_INSTANCE>` from your wavefront url (`https://<YOUR_WAVEFRONT_INSTANCE>.wavefront.com`).
+    - **Cluster name**. For example, the value of `clusterName` from your Wavefront Custom Resource configuration (ex: `mycluster-us-west-1`).
+    - **(Optional) Alert template**. For example, the value of `<alert_template_file.json.tmpl>` from the list of alert templates (ex: `pod-backoff-event.json.tmpl`).
+    - **(Optional) Alert target**. For example, an email address, PagerDuty key, or [alert target](https://docs.wavefront.com/webhooks_alert_notification.html). Alert targets can be a comma separated list.
+
+### Example: Creating All the Alerts
 
 ```bash
-export ALERT_FILE_OUTPUT_PATH=<alert_file_output_path>
-export ALERT_TEMPLATE_FILE=<alert_template_file.json.tmpl>
-curl -sSL -o "$ALERT_FILE_OUTPUT_PATH" "https://raw.githubusercontent.com/wavefrontHQ/observability-for-kubernetes/main/docs/alerts/templates/$ALERT_TEMPLATE_FILE"
+curl -sSL https://raw.githubusercontent.com/wavefrontHQ/observability-for-kubernetes/main/docs/alerts/create-all-alerts.sh | bash -s -- \
+  -t <YOUR_API_TOKEN> \
+  -c <YOUR_WAVEFRONT_INSTANCE> \
+  -e <YOUR_ALERT_TARGET> \
+  -n <YOUR_CLUSTER_NAME>
 ```
 
-### Step 2: Create the alert template.
+>**Note:** You will need to change `<YOUR_API_TOKEN>`, `<YOUR_WAVEFRONT_INSTANCE>`, `<YOUR_ALERT_TARGET>`, and `<YOUR_CLUSTER_NAME>` in the above example.
 
-1. Ensure that you have the information for the required fields:
-   - **Wavefront API token**. See [Managing API Tokens](https://docs.wavefront.com/wavefront_api.html#managing-api-tokens) page.
-   - **Wavefront instance**. For example, the value of `<your_instance>` from your wavefront url (`https://<your_instance>.wavefront.com`).
-   - **Cluster name**. For example, the value of `clusterName` from your Wavefront Custom Resource configuration (ex: `mycluster-us-west-1`).
-   - **Alert template file**. For example, the download output path of the alert template file from **Step 1**.
+### Example: Creating a Single Alert
 
 ```bash
 curl -sSL https://raw.githubusercontent.com/wavefrontHQ/observability-for-kubernetes/main/docs/alerts/create-alert.sh | bash -s -- \
   -t <YOUR_API_TOKEN> \
   -c <YOUR_WAVEFRONT_INSTANCE> \
   -n <YOUR_CLUSTER_NAME> \
-  -f <PATH_TO_ALERT_FILE>
+  -e <YOUR_ALERT_TARGET> \
+  -f <ALERT_TEMPLATE>
 ```
 
-**Note:** You will need to change YOUR_API_TOKEN, YOUR_WAVEFRONT_INSTANCE, YOUR_CLUSTER_NAME, and PATH_TO_ALERT_FILE in the above example.
+>**Note:** You will need to change `<YOUR_API_TOKEN>`, `<YOUR_WAVEFRONT_INSTANCE>`, `<YOUR_CLUSTER_NAME>`, `<YOUR_ALERT_TARGET>`, and `<ALERT_TEMPLATE>` in the above example.
 
-### Step 3: Customize the alert.
+## Customizing Alerts
 
-1. Log in to your service instance `https://<your_instance>.wavefront.com` as a user with the Alerts permission. Click **Alerting** > **All Alerts** from the toolbar to display the Alerts Browser.
+1. Log in to your service instance `https://<YOUR_WAVEFRONT_INSTANCE>.wavefront.com` as a user with the Alerts permission. Click **Alerting** > **All Alerts** from the toolbar to display the Alerts Browser.
 2. Click the alert name, or click the ellipsis icon next to the alert and select **Edit**.  You can search for the alert by typing the alert name in the search field.
 3. Change the alert properties when you edit the alert.
-4. Specify alert recipients to receive notifications when the alert changes state.
-5. Click **Save** in the top right to save your changes.
+4. Click **Save** in the top right to save your changes.
 
-See [Create and Manage Alerts](https://docs.wavefront.com/alerts_manage.html) for an overview on how to create and manage alerts.
+>**Note:** See [Create and Manage Alerts](https://docs.wavefront.com/alerts_manage.html) for an overview on how to create and manage alerts.