Since jobs can enter and exit the transient states (pending, waiting, running), they fit the definition of a Prometheus gauge:
A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.
Counters
However, gauges do not make sense for the states that jobs can never exit, which might be called terminal states:
failed
error
canceled
successful
These metrics would be better captured in a counter:
A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.
Benefits
Counter metrics are more useful in Prometheus for building better visualizations and alerts using query functions like rate() and increase(), which in turn could be used to build SLOs based on AWX performance.
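To illustrate why a counter suits these functions, here is a minimal Python sketch of the idea behind increase() and rate(): sum the deltas between scrapes and treat any drop in value as a reset to zero. It is a simplification, not Prometheus's implementation (the real functions also extrapolate to the edges of the query window), and the sample values and the names `counter_increase` and `counter_rate` are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def counter_increase(samples: List[Sample]) -> float:
    """Total increase across samples, treating any drop as a counter reset."""
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # After a reset the counter restarts from zero, so the whole
            # current value counts as new increase.
            total += curr
    return total


def counter_rate(samples: List[Sample]) -> float:
    """Average per-second increase over the sampled window."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed if elapsed > 0 else 0.0


# Hypothetical scrapes of a failed-jobs counter; the process restarts
# between the third and fourth scrape, so the value drops from 25 to 2.
failed_jobs = [(0.0, 21.0), (60.0, 24.0), (120.0, 25.0), (180.0, 2.0), (240.0, 4.0)]
print(counter_increase(failed_jobs))
print(counter_rate(failed_jobs))
```

A gauge gives no such guarantee: a drop in a gauge is a legitimate value change, so the same arithmetic cannot tell a reset apart from a real decrease.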
Select the relevant components
UI
API
Docs
Collection
CLI
Other
Steps to reproduce
Scrape /api/v2/metrics
Current results
awx_status_total{} gauge shows current values for jobs in all states.
# HELP awx_status_total Status of Job launched
# TYPE awx_status_total gauge
awx_status_total{status="running"} 0.0
awx_status_total{status="failed"} 21.0
awx_status_total{status="canceled"} 0.0
awx_status_total{status="successful"} 19.0
awx_status_total{status="waiting"} 0.0
awx_status_total{status="error"} 33.0
awx_status_total{status="pending"} 3.0
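The output above can be inspected with a small parser. The following is a rough sketch that reads only the simple `name{status="..."} value` lines shown here, not the full Prometheus exposition format; `parse_status_samples` is a hypothetical helper, and the terminal state names are taken from the list earlier in this issue.

```python
import re
from typing import Dict

# Matches lines such as: awx_status_total{status="failed"} 21.0
SAMPLE_LINE = re.compile(r'^(\w+)\{status="([^"]+)"\}\s+(\S+)$')

TERMINAL_STATES = {"failed", "error", "canceled", "successful"}


def parse_status_samples(text: str, metric: str) -> Dict[str, float]:
    """Return {status: value} for one metric, skipping comments and blank lines."""
    values: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match and match.group(1) == metric:
            values[match.group(2)] = float(match.group(3))
    return values


scraped = """\
# HELP awx_status_total Status of Job launched
# TYPE awx_status_total gauge
awx_status_total{status="running"} 0.0
awx_status_total{status="failed"} 21.0
awx_status_total{status="canceled"} 0.0
awx_status_total{status="successful"} 19.0
awx_status_total{status="waiting"} 0.0
awx_status_total{status="error"} 33.0
awx_status_total{status="pending"} 3.0
"""

statuses = parse_status_samples(scraped, "awx_status_total")
terminal = {s: v for s, v in statuses.items() if s in TERMINAL_STATES}
transient = {s: v for s, v in statuses.items() if s not in TERMINAL_STATES}
print("terminal:", terminal)
print("transient:", transient)
```

Splitting the samples this way makes the mismatch visible: the terminal values only ever accumulate, yet they are typed as a gauge alongside the transient ones.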
Suggested feature result
A gauge is returned for transient job states and a counter is kept for terminal job states.
# HELP awx_status_launched Status of Jobs launched but not completed
# TYPE awx_status_launched gauge
awx_status_launched{status="running"} 0.0
awx_status_launched{status="waiting"} 0.0
awx_status_launched{status="pending"} 3.0
# HELP awx_status_completed Status of Jobs completed
# TYPE awx_status_completed counter
awx_status_completed{status="failed"} 21
awx_status_completed{status="canceled"} 0
awx_status_completed{status="successful"} 19
awx_status_completed{status="error"} 33
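As a rough illustration of the proposed split, the sketch below renders per-status job counts as two metric families, a gauge for transient states and a counter for terminal states. It is not AWX's metrics code: `render_metric` and `render_job_status` are hypothetical helpers, and the input dictionary stands in for whatever source AWX would actually use. A real counter would also need a source that never decreases between scrapes, which plain per-status counts may not guarantee.

```python
from typing import Dict, List

TRANSIENT_STATES = ("running", "waiting", "pending")
TERMINAL_STATES = ("failed", "error", "canceled", "successful")


def render_metric(name: str, help_text: str, metric_type: str,
                  values: Dict[str, float]) -> List[str]:
    """Render one metric family as HELP, TYPE, and one sample line per status."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    lines += [f'{name}{{status="{status}"}} {value}'
              for status, value in values.items()]
    return lines


def render_job_status(counts: Dict[str, float]) -> str:
    """Split counts into a gauge (transient states) and a counter (terminal states)."""
    launched = {s: float(counts.get(s, 0)) for s in TRANSIENT_STATES}
    completed = {s: float(counts.get(s, 0)) for s in TERMINAL_STATES}
    lines = render_metric("awx_status_launched",
                          "Status of Jobs launched but not completed",
                          "gauge", launched)
    lines += render_metric("awx_status_completed",
                           "Status of Jobs completed",
                           "counter", completed)
    return "\n".join(lines) + "\n"


# Hypothetical counts matching the values in this issue.
print(render_job_status({"pending": 3, "failed": 21, "successful": 19, "error": 33}))
```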
Additional information
No response
Please confirm the following
Feature type
Enhancement to Existing Feature
Feature Summary
Currently the job metrics exposed by AWX for Prometheus use gauge type metrics.
Gauges
Gauge metrics make sense for transient states like pending, waiting, and running.