Add metrics endpoint #625

lewismiddleton · 2024-06-26T13:33:02Z

I'd like to scrape metrics about the internal runner-manager state. Specifically, I'd like it to expose how many runners it is tracking in each of its runner groups.

This is useful because we've sometimes observed that AWS EC2 instances can become 'orphaned', meaning the runner-manager service has lost track of them and never deletes them. Monitoring these metrics would let us detect orphaned instances by alerting when the number of EC2 machines differs from the number of tracked runners.
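A rough sketch of that check, assuming the EC2 instance IDs and the instance IDs tracked by runner-manager can each be collected as a list of strings. The function name and the IDs are hypothetical, and it compares IDs rather than counts so that it also reports which instance is orphaned:

```python
def find_orphans(ec2_instance_ids, tracked_instance_ids):
    """Return the EC2 instance IDs that no tracked runner refers to."""
    return sorted(set(ec2_instance_ids) - set(tracked_instance_ids))


# Hypothetical data: three instances exist, but only two are tracked.
ec2 = ["i-0aaa", "i-0bbb", "i-0ccc"]
tracked = ["i-0aaa", "i-0ccc"]

orphans = find_orphans(ec2, tracked)
print(orphans)
```

A count-only alert is cheaper to set up, but it stays silent if one instance is orphaned while another tracked runner has no instance.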

I propose using Prometheus instrumentation via the prometheus-client library. This will introduce a /metrics endpoint that serves machine-readable metrics.

Are there any objections to this approach?


tcarmet · 2024-06-27T15:52:45Z

No objections, it would be a very welcome contribution.

A quick note about the orphan situation: we recently noticed a mistake in the Redis volume path in our Kubernetes manifests. As a result, our data was not persisted, and orphans were created every time Redis restarted. (If that's the deployment method you are using, it's worth double-checking.)

lewismiddleton · 2024-07-08T10:56:36Z

I tried an implementation in #628 but I don't think that's the right approach. To get Prometheus metrics, I'm now looking at dumping the Redis data as JSON using get_runners() on requests to /groups/{name}/list.

I would then write a custom Prometheus exporter that consumes that JSON and turns it into Prometheus metrics.
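To illustrate the exporter idea, here is a minimal sketch that turns per-group JSON into the Prometheus text exposition format using only the standard library. The JSON shape (a list of runner objects per group), the `runner_group` label name, and the function names are assumptions, not what runner-manager actually returns:

```python
import json


def escape_label(value):
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def runners_json_to_metrics(group_payloads):
    """Render gauge lines from a mapping of group name -> JSON text.

    Each JSON payload is assumed to be a list of runner objects.
    """
    lines = [
        "# HELP runners_count Number of runners tracked per runner group.",
        "# TYPE runners_count gauge",
    ]
    for group, payload in sorted(group_payloads.items()):
        runners = json.loads(payload)
        label = escape_label(group)
        lines.append(f'runners_count{{runner_group="{label}"}} {len(runners)}')
    return "\n".join(lines) + "\n"


# Hypothetical payloads, as if fetched from /groups/{name}/list.
payloads = {
    "default": '[{"name": "runner-1"}, {"name": "runner-2"}]',
    "gpu": "[]",
}
print(runners_json_to_metrics(payloads))
```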

lewismiddleton · 2024-07-09T16:54:30Z

I got something I'm happy with in #630.

#629 is a standalone change that you may want to keep, but I've got what I needed from #630.

tcarmet · 2024-07-10T21:42:05Z

I looked at both changes and I'm happy to get both in. I think they're both useful and look pretty good to me 👌. Nice work.

Adds an endpoint at `/groups/{name}/list` to list the runners in each runner group. Relates to #625

Adds a metrics endpoint that computes updated metric values on each request. On each request we:

- fetch runner groups from Redis
- get all runners for each group
- update a Gauge for the runners_count on each group
- render out the prometheus metrics in standard format

Resolves #625
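The per-request flow above can be sketched roughly as follows with the prometheus-client library. This is not the code from #630: the in-memory `FAKE_GROUPS` dict stands in for the Redis lookup, and the `runner_group` label name and `render_metrics` function are hypothetical:

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest

# Hypothetical stand-in for the Redis-backed state: group name -> runner names.
FAKE_GROUPS = {"default": ["runner-1", "runner-2"], "gpu": []}

# A dedicated registry keeps this example isolated from the global default one.
registry = CollectorRegistry()
runners_count = Gauge(
    "runners_count",
    "Number of runners tracked per runner group",
    ["runner_group"],
    registry=registry,
)


def render_metrics(groups):
    """Recompute the gauge from the current state and render the metrics text."""
    for name, runners in groups.items():
        runners_count.labels(runner_group=name).set(len(runners))
    return generate_latest(registry).decode("utf-8")


print(render_metrics(FAKE_GROUPS))
```

One caveat with this sketch: a group that is deleted keeps its last value in the output until its label is removed from the gauge.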

lewismiddleton mentioned this issue Jul 8, 2024

superseded: add prometheus metrics #628

Closed

This was referenced Jul 8, 2024

api: add endpoint to list runners #629

Merged

api: add metrics endpoint #630

Merged

tcarmet pushed a commit that referenced this issue Jul 10, 2024

api: add endpoint to list runners (#629)

94e45c1

Adds an endpoint at `/groups/{name}/list` to list the runners in each runner group. Relates to #625

tcarmet closed this as completed in #630 Jul 11, 2024
