Since #307 we now have generic Go metrics, like memory, GC, threads etc.
Let's add application-level metrics for the operator itself that could be useful for Grafana boards and alerts. Suggestions:
Gauge of the number of currently managed CRD instances for SolrClouds, SolrBackups, and SolrPrometheusExporters
Gauge for CRDs currently in a failure state
Reconcile stats
Successful vs failed reconcile events, broken down to what kind of event
Number of pending operations in the reconcile queue (if such a thing exists)
Operation stats
For each operation type (install, upgrade, delete, backup etc.), counts and status
The goal would be a simple Grafana board where you can filter on namespace etc. to see raw operator health and, at a glance, whether any operations are in a failure state. You could further filter by labels like SolrCloud name to see the number of failed operations against each cluster and when they happened.