Add metrics endpoint #625

Closed
lewismiddleton opened this issue Jun 26, 2024 · 4 comments · Fixed by #630

Comments

@lewismiddleton
Contributor

I'd like to scrape metrics about the internal runner-manager state. Specifically, I'd like it to expose how many runners it is tracking in each of its runner groups.

This is useful because we've sometimes observed AWS EC2 instances getting 'orphaned', meaning the runner-manager service has lost track of an instance and never deletes it. Monitoring these metrics would let us detect 'orphaned' instances by alerting when the number of EC2 machines != the tracked number of runners.

I propose using Prometheus instrumentation via the prometheus-client library. This would introduce a /metrics endpoint that serves machine-readable metrics.

Are there any objections to this approach?

@tcarmet
Contributor

tcarmet commented Jun 27, 2024

No objections, it would be a very welcome contribution.

Quick note about the orphan situation: we recently noticed a mistake in the Redis volume path in our Kubernetes manifests. As a result, our data was not persisted and orphans were created every time Redis rebooted. (If that's the deployment method you are using, it's worth double-checking.)

@lewismiddleton
Contributor Author

I tried an implementation in #628, but I don't think that's the right approach. To get Prometheus metrics, I'm now looking at dumping the Redis data as JSON using get_runners() on each request to /groups/{name}/list.

I would then write a custom Prometheus exporter to consume that JSON and parse it into Prometheus metrics.
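
A minimal sketch of what such an exporter's parsing step could look like. It assumes each /groups/{name}/list response body is a JSON array with one entry per runner; that response shape, the metric name, and the `runner_group` label are assumptions made for illustration, not taken from the linked PRs. It renders the Prometheus text exposition format by hand to stay dependency-free; a real exporter would more likely let prometheus-client do the rendering.

```python
import json

# Hypothetical metric name, chosen for illustration only.
METRIC = "runner_manager_runners_count"


def escape_label(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace("\n", "\\n").replace('"', '\\"')


def render_metrics(responses: dict[str, str]) -> str:
    """Turn {group name: JSON body of /groups/{name}/list} into metrics text.

    Assumes every body is a JSON array with one element per runner.
    """
    lines = [
        f"# HELP {METRIC} Number of runners tracked per runner group.",
        f"# TYPE {METRIC} gauge",
    ]
    for group in sorted(responses):
        runners = json.loads(responses[group])
        lines.append(
            f'{METRIC}{{runner_group="{escape_label(group)}"}} {len(runners)}'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = {"default": '[{"name": "a"}, {"name": "b"}]', "gpu": "[]"}
    print(render_metrics(sample), end="")
```

An alert for orphaned instances could then compare these per-group values against an EC2 instance count obtained separately.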

@lewismiddleton
Contributor Author

I got something I'm happy with in #630.

#629 is a standalone change that you may want to keep, but I've got what I needed from #630.

@tcarmet
Contributor

tcarmet commented Jul 10, 2024

I looked at both changes and I'm happy to get both in. I think they're both useful and they look pretty good to me 👌. Nice work.

tcarmet pushed a commit that referenced this issue Jul 10, 2024
Adds an endpoint at `/groups/{name}/list` to list the runners in each
runner group.

Relates to #625
tcarmet pushed a commit that referenced this issue Jul 11, 2024
Adds a metrics endpoint that computes updated metric values on each
request.

On each request we:
- fetch runner groups from Redis
- get all runners for each group
- update a Gauge for the runners_count on each group
- render out the prometheus metrics in standard format

Resolves #625
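
The per-request steps described in this commit message might be sketched roughly as follows with prometheus-client. This is not the code from #630: `fetch_groups` is a hypothetical in-memory stand-in for the Redis lookup, and the `runner_group` label name is an assumption. A fresh registry is built on each call so that groups which no longer exist do not linger in the output.

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest


def fetch_groups() -> dict[str, list[str]]:
    """Hypothetical stand-in for reading groups and their runners from Redis."""
    return {"default": ["runner-1", "runner-2"], "gpu": ["runner-3"]}


def render_metrics(groups: dict[str, list[str]]) -> bytes:
    """Recompute the gauge from current state and render it for a scrape."""
    # A new registry per request avoids reporting stale label sets.
    registry = CollectorRegistry()
    runners_count = Gauge(
        "runners_count",
        "Number of runners tracked per runner group",
        ["runner_group"],  # assumed label name
        registry=registry,
    )
    for name, runners in groups.items():
        runners_count.labels(runner_group=name).set(len(runners))
    # generate_latest returns the text exposition format as bytes.
    return generate_latest(registry)


if __name__ == "__main__":
    print(render_metrics(fetch_groups()).decode(), end="")
```

A web framework handler for /metrics would return these bytes with the content type given by `prometheus_client.CONTENT_TYPE_LATEST`.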