monitoring: Configure KSM & cluster dashboard
Update kube-prometheus-stack helm release values to configure
kube-state-metrics and use kube-state-metrics to collect gotk resource
state metrics.

- Configure kube-state-metrics to run in custom resource state only
  mode. In this mode, it only watches custom resources. Also, set an
  empty collectors list so that the core resources to watch are not
  passed to it as an argument.
- Running kube-state-metrics in custom resource state only mode leaves
  the default Grafana dashboards without data. Disable the default
  dashboards.
- Add RBAC rules to the kube-state-metrics configuration so that it can
  list and watch the Flux custom resources.
- Also, configure custom resource state for each of the Flux custom
  resources using Info type metrics called `gotk_resource_info`. KSM
  issues a warning if an Info type metric doesn't have the `_info`
  suffix. These metrics always have the value 1. This works well for
  CRD state metrics, as a zero value could only mean that the resource
  doesn't exist, that is, that it has been deleted.
- Update the cluster dashboard panels to use `gotk_resource_info` in the
  queries.
  - The panels have been updated so that they also work with static
  resources that don't have any status. By default, such static
  resources are assumed to be in a Ready state. Resources are counted
  as failed only when the ready value is false.
  - The queries now use the Instant type so that they show the current
  data instead of the result over the past 15 minutes. This gives more
  accurate resource data as the resource metrics change.
  - The Stat panels now default to zero when there's no data, so that
  they don't show "No data" when there are no objects. The previous
  configuration depended on stale metrics from the controllers and on
  deleted conditions to show a zero value when objects got deleted.
  With the fixes in the controller metrics that remove stale metrics,
  that no longer works, so a default is set in order to show a zero
  value for these stats.

Signed-off-by: Sunny <[email protected]>
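
The counting rule described in the commit message can be sketched in a few lines of Python. This is only an illustration of the semantics, not the dashboard's actual queries: the `classify` helper and the sample label sets are hypothetical, and the sketch assumes that a static resource shows up as a `gotk_resource_info` series without a `ready` label.

```python
from collections import Counter


def classify(samples):
    """Count resources as ready or failed from gotk_resource_info label sets.

    A sample without a `ready` label (a static resource with no status) is
    treated as ready; only ready == "False" counts as failed.
    """
    counts = Counter(ready=0, failed=0)
    for labels in samples:
        if labels.get("ready") == "False":
            counts["failed"] += 1
        else:
            counts["ready"] += 1
    return counts


# Hypothetical label sets, one per gotk_resource_info series.
samples = [
    {"customresource_kind": "GitRepository", "name": "flux-system", "ready": "True"},
    {"customresource_kind": "Kustomization", "name": "apps", "ready": "False"},
    {"customresource_kind": "Provider", "name": "slack"},  # static: no status
]

counts = classify(samples)
print(counts["ready"], counts["failed"])
```

With no samples at all, both counts stay at zero, which mirrors the zero default that the Stat panels fall back to when there are no objects.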
darkowlzz committed Jul 31, 2023
1 parent 44d69d6 commit d5c0eb6
Showing 2 changed files with 474 additions and 133 deletions.
243 changes: 243 additions & 0 deletions manifests/monitoring/kube-prometheus-stack/release.yaml
@@ -31,6 +31,249 @@ spec:
        podMonitorSelector:
          matchLabels:
            app.kubernetes.io/component: monitoring
    grafana:
      defaultDashboardsEnabled: false
    kube-state-metrics:
      collectors: []
      extraArgs:
        - --custom-resource-state-only=true
      rbac:
        extraRules:
          - apiGroups:
              - source.toolkit.fluxcd.io
              - kustomize.toolkit.fluxcd.io
              - helm.toolkit.fluxcd.io
              - image.toolkit.fluxcd.io
              - notification.toolkit.fluxcd.io
            resources:
              - gitrepositories
              - buckets
              - helmrepositories
              - helmcharts
              - ocirepositories
              - kustomizations
              - helmreleases
              - imagerepositories
              - imagepolicies
              - imageupdateautomations
              - alerts
              - providers
              - receivers
            verbs: ["list", "watch"]
      customResourceState:
        enabled: true
        config:
          spec:
            resources:
              - groupVersionKind:
                  group: source.toolkit.fluxcd.io
                  version: "v1"
                  kind: GitRepository
                metricNamePrefix: gotk
                metrics:
                  - name: "resource_info"
                    help: "The current state of a GitOps Toolkit resource."
                    each:
                      type: Info
                      info:
                        labelsFromPath:
                          name: [metadata, name]
                    labelsFromPath:
                      exported_namespace: [metadata, namespace]
                      ready: [status, conditions, "[type=Ready]", status]
              - groupVersionKind:
                  group: source.toolkit.fluxcd.io
                  version: "v1beta2"
                  kind: Bucket
                metricNamePrefix: gotk
                metrics:
                  - name: "resource_info"
                    help: "The current state of a GitOps Toolkit resource."
                    each:
                      type: Info
                      info:
                        labelsFromPath:
                          name: [metadata, name]
                    labelsFromPath:
                      exported_namespace: [metadata, namespace]
                      ready: [status, conditions, "[type=Ready]", status]
              - groupVersionKind:
                  group: source.toolkit.fluxcd.io
                  version: "v1beta2"
                  kind: HelmRepository
                metricNamePrefix: gotk
                metrics:
                  - name: "resource_info"
                    help: "The current state of a GitOps Toolkit resource."
                    each:
                      type: Info
                      info:
                        labelsFromPath:
                          name: [metadata, name]
                    labelsFromPath:
                      exported_namespace: [metadata, namespace]
                      type: [spec, type]
                      ready: [status, conditions, "[type=Ready]", status]
              - groupVersionKind:
                  group: source.toolkit.fluxcd.io
                  version: "v1beta2"
                  kind: HelmChart
                metricNamePrefix: gotk
                metrics:
                  - name: "resource_info"
                    help: "The current state of a GitOps Toolkit resource."
                    each:
                      type: Info
                      info:
                        labelsFromPath:
                          name: [metadata, name]
                    labelsFromPath:
                      exported_namespace: [metadata, namespace]
                      ready: [status, conditions, "[type=Ready]", status]
              - groupVersionKind:
                  group: source.toolkit.fluxcd.io
                  version: "v1beta2"
                  kind: OCIRepository
                metricNamePrefix: gotk
                metrics:
                  - name: "resource_info"
                    help: "The current state of a GitOps Toolkit resource."
                    each:
                      type: Info
                      info:
                        labelsFromPath:
                          name: [metadata, name]
                    labelsFromPath:
                      exported_namespace: [metadata, namespace]
                      ready: [status, conditions, "[type=Ready]", status]
              - groupVersionKind:
                  group: kustomize.toolkit.fluxcd.io
                  version: "v1"
                  kind: Kustomization
                metricNamePrefix: gotk
                metrics:
                  - name: "resource_info"
                    help: "The current state of a GitOps Toolkit resource."
                    each:
                      type: Info
                      info:
                        labelsFromPath:
                          name: [metadata, name]
                    labelsFromPath:
                      exported_namespace: [metadata, namespace]
                      ready: [status, conditions, "[type=Ready]", status]
              - groupVersionKind:
                  group: helm.toolkit.fluxcd.io
                  version: "v2beta1"
                  kind: HelmRelease
                metricNamePrefix: gotk
                metrics:
                  - name: "resource_info"
                    help: "The current state of a GitOps Toolkit resource."
                    each:
                      type: Info
                      info:
                        labelsFromPath:
                          name: [metadata, name]
                    labelsFromPath:
                      exported_namespace: [metadata, namespace]
                      ready: [status, conditions, "[type=Ready]", status]
              - groupVersionKind:
                  group: image.toolkit.fluxcd.io
                  version: "v1beta2"
                  kind: ImageRepository
                metricNamePrefix: gotk
                metrics:
                  - name: "resource_info"
                    help: "The current state of a GitOps Toolkit resource."
                    each:
                      type: Info
                      info:
                        labelsFromPath:
                          name: [metadata, name]
                    labelsFromPath:
                      exported_namespace: [metadata, namespace]
                      ready: [status, conditions, "[type=Ready]", status]
              - groupVersionKind:
                  group: image.toolkit.fluxcd.io
                  version: "v1beta2"
                  kind: ImagePolicy
                metricNamePrefix: gotk
                metrics:
                  - name: "resource_info"
                    help: "The current state of a GitOps Toolkit resource."
                    each:
                      type: Info
                      info:
                        labelsFromPath:
                          name: [metadata, name]
                    labelsFromPath:
                      exported_namespace: [metadata, namespace]
                      ready: [status, conditions, "[type=Ready]", status]
              - groupVersionKind:
                  group: image.toolkit.fluxcd.io
                  version: "v1beta1"
                  kind: ImageUpdateAutomation
                metricNamePrefix: gotk
                metrics:
                  - name: "resource_info"
                    help: "The current state of a GitOps Toolkit resource."
                    each:
                      type: Info
                      info:
                        labelsFromPath:
                          name: [metadata, name]
                    labelsFromPath:
                      exported_namespace: [metadata, namespace]
                      ready: [status, conditions, "[type=Ready]", status]
              - groupVersionKind:
                  group: notification.toolkit.fluxcd.io
                  version: "v1beta2"
                  kind: Alert
                metricNamePrefix: gotk
                metrics:
                  - name: "resource_info"
                    help: "The current state of a GitOps Toolkit resource."
                    each:
                      type: Info
                      info:
                        labelsFromPath:
                          name: [metadata, name]
                    labelsFromPath:
                      exported_namespace: [metadata, namespace]
                      ready: [status, conditions, "[type=Ready]", status]
              - groupVersionKind:
                  group: notification.toolkit.fluxcd.io
                  version: "v1beta2"
                  kind: Provider
                metricNamePrefix: gotk
                metrics:
                  - name: "resource_info"
                    help: "The current state of a GitOps Toolkit resource."
                    each:
                      type: Info
                      info:
                        labelsFromPath:
                          name: [metadata, name]
                    labelsFromPath:
                      exported_namespace: [metadata, namespace]
                      ready: [status, conditions, "[type=Ready]", status]
              - groupVersionKind:
                  group: notification.toolkit.fluxcd.io
                  version: "v1"
                  kind: Receiver
                metricNamePrefix: gotk
                metrics:
                  - name: "resource_info"
                    help: "The current state of a GitOps Toolkit resource."
                    each:
                      type: Info
                      info:
                        labelsFromPath:
                          name: [metadata, name]
                    labelsFromPath:
                      exported_namespace: [metadata, namespace]
                      ready: [status, conditions, "[type=Ready]", status]
  postRenderers:
    - kustomize:
        patches:
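
To make the `labelsFromPath` entries in the custom resource state config above more concrete, here is a rough Python sketch of how a path such as `[status, conditions, "[type=Ready]", status]` could be resolved against a resource object. It is a simplified illustration, not kube-state-metrics' implementation: the `resolve` and `info_labels` helpers and the sample objects are hypothetical, and dropping a label whose path is missing is an assumption.

```python
def resolve(obj, path):
    """Walk `path` through nested dicts and lists; return None if a step is missing.

    A step written as "[key=value]" selects the first list item whose `key`
    equals `value`, as in "[type=Ready]" for status conditions.
    """
    current = obj
    for step in path:
        if step.startswith("[") and step.endswith("]"):
            key, _, want = step[1:-1].partition("=")
            if not isinstance(current, list):
                return None
            current = next(
                (item for item in current
                 if isinstance(item, dict) and str(item.get(key)) == want),
                None,
            )
        elif isinstance(current, dict):
            current = current.get(step)
        else:
            return None
        if current is None:
            return None
    return current


# Label paths copied from the config above.
LABEL_PATHS = {
    "name": ["metadata", "name"],
    "exported_namespace": ["metadata", "namespace"],
    "ready": ["status", "conditions", "[type=Ready]", "status"],
}


def info_labels(obj):
    """Build the label set for one resource, skipping unresolved paths (assumption)."""
    resolved = {label: resolve(obj, path) for label, path in LABEL_PATHS.items()}
    return {label: value for label, value in resolved.items() if value is not None}


# Hypothetical objects: one with a Ready condition, one static without status.
git_repo = {
    "metadata": {"name": "flux-system", "namespace": "flux-system"},
    "status": {"conditions": [{"type": "Ready", "status": "True"}]},
}
provider = {"metadata": {"name": "slack", "namespace": "flux-system"}}

git_repo_labels = info_labels(git_repo)
provider_labels = info_labels(provider)
print(git_repo_labels)
print(provider_labels)
```

Under these assumptions, the static object ends up without a `ready` label, which is the case the updated dashboard panels treat as Ready.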