Add metrics endpoint #625
Comments
No objections; it would be a very welcome contribution. A quick note about the orphan situation: we recently noticed a mistake in the Redis volume path in our Kubernetes manifests. As a result, our data was not persisted, and orphans were created every time Redis restarted. If that's the deployment method you are using, it's worth double-checking.
I tried an implementation in #628 but I don't think that's the right approach. To get Prometheus metrics, I'm now looking at something like dumping the Redis data as JSON using Then writing a custom Prometheus exporter to consume that JSON and parse it into Prometheus metrics.
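A rough sketch of the exporter idea, in case it helps the discussion. It uses only the standard library and writes the Prometheus text exposition format by hand. The shape of the JSON dump (a list of runner objects with a `runner_group` field), the metric name `runners_count`, and the function names are all assumptions for illustration, not the actual runner-manager data model.

```python
import json
from collections import Counter


def escape_label(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def json_to_prometheus(dump: str) -> str:
    """Turn a JSON dump of runners into a per-group gauge in text format.

    Assumes (hypothetically) that the dump is a JSON list of objects,
    each carrying a "runner_group" key.
    """
    runners = json.loads(dump)
    counts = Counter(runner["runner_group"] for runner in runners)
    lines = [
        "# HELP runners_count Number of runners tracked per runner group",
        "# TYPE runners_count gauge",
    ]
    # Sort for stable output between scrapes.
    for group, count in sorted(counts.items()):
        lines.append(f'runners_count{{runner_group="{escape_label(group)}"}} {count}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = json.dumps(
        [
            {"name": "runner-1", "runner_group": "default"},
            {"name": "runner-2", "runner_group": "default"},
            {"name": "runner-3", "runner_group": "gpu"},
        ]
    )
    print(json_to_prometheus(sample), end="")
```

One limitation of this approach: a group with zero runners would not appear in the dump at all, so its series would vanish rather than read 0.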
Looked at both changes and I'm happy to get both in. I think they're both useful and look pretty good to me 👌. Nice work.
Adds an endpoint at `/groups/{name}/list` to list the runners in each runner group. Relates to #625
Adds a metrics endpoint that computes updated metric values on each request. On each request we:
- fetch runner groups from Redis
- get all runners for each group
- update a Gauge for the `runners_count` on each group
- render out the Prometheus metrics in standard format

Resolves #625
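For readers following along, the steps above could look roughly like this with `prometheus_client`. This is a minimal sketch, not the code from the PR: the in-memory `FAKE_STORE` stands in for Redis, and `fetch_groups`, `fetch_runners`, `render_metrics`, and the `runner_group` label name are hypothetical.

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest

# Hypothetical stand-in for the Redis-backed state: group name -> runner names.
FAKE_STORE = {"default": ["runner-1", "runner-2"], "gpu": ["runner-3"]}


def fetch_groups(store):
    """Return the names of all runner groups (assumed helper)."""
    return list(store)


def fetch_runners(store, group):
    """Return the runners tracked for one group (assumed helper)."""
    return store[group]


def render_metrics(store) -> bytes:
    """Recompute the gauge from current state and render the text format."""
    # A fresh registry per request means deleted groups leave no stale series.
    registry = CollectorRegistry()
    runners_count = Gauge(
        "runners_count",
        "Number of runners tracked per runner group",
        ["runner_group"],
        registry=registry,
    )
    for group in fetch_groups(store):
        runners_count.labels(runner_group=group).set(len(fetch_runners(store, group)))
    return generate_latest(registry)


if __name__ == "__main__":
    print(render_metrics(FAKE_STORE).decode(), end="")
```

In a web framework, the handler for `/metrics` would return these bytes with the content type given by `prometheus_client.CONTENT_TYPE_LATEST`.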
I'd like to scrape metrics about the internal runner-manager state. Specifically have it expose how many runners it's tracking in each of its runner groups.
This is useful because we've sometimes observed that AWS EC2 instances may get 'orphaned' meaning the runner-manager service has lost track of it and forgets to delete it. Monitoring these metrics would allow us to detect these 'orphaned' instances by alerting on when the number of EC2 machines != the tracked number of runners.
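To make the alert condition concrete, here is a small sketch of the comparison. It assumes both sides can be reduced to a set of instance names; the function names and the sample data are hypothetical, and in practice the check would more likely live in an alerting rule than in application code.

```python
from __future__ import annotations


def counts_differ(cloud_instances: set[str], tracked_runners: set[str]) -> bool:
    """The alert condition from the issue: number of EC2 machines != tracked runners."""
    return len(cloud_instances) != len(tracked_runners)


def find_orphans(cloud_instances: set[str], tracked_runners: set[str]) -> set[str]:
    """Instances that exist in the cloud but that the manager no longer tracks."""
    return cloud_instances - tracked_runners


if __name__ == "__main__":
    ec2 = {"runner-1", "runner-2", "runner-3"}  # hypothetical EC2 listing
    tracked = {"runner-1", "runner-2"}          # hypothetical manager state
    if counts_differ(ec2, tracked):
        print("possible orphans:", sorted(find_orphans(ec2, tracked)))
```

Comparing counts alone can miss the case where one instance is orphaned while another tracked runner has no instance, which is why the set difference is shown alongside it.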
I propose using Prometheus instrumentation via the prometheus-client library. This will introduce a `/metrics` endpoint that is used for machine-readable metrics.
Are there any objections to this approach?