windsock profilers/metrics refactor #8

Open
rukai opened this issue Nov 30, 2023 · 0 comments

rukai commented Nov 30, 2023

Profiling is a bit of a mess right now.
Every profiler collects and stores its results in a slightly different way.

A possible refactor I'm thinking of is this:
Build a profiler binary that is downloaded to each instance.
This binary is then run over SSH. It collects metrics and/or profiles a running service until it receives SIGTERM, at which point it reports all collected metrics and profile data encoded with bincode. bincode is used because we want to transfer large amounts of binary data efficiently.
This profiler binary would be generic and suitable for use with any project.
We then invoke the binary as needed for the bench and process the results in a shotover-specific way on the windsock side.
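
A minimal sketch of the "collect until told to stop, then report" loop described above, using only the standard library. The `Sample` type, the `collect_until_stopped` function, and the `AtomicBool` stop flag are all hypothetical names for illustration. A real binary would presumably set the flag from a SIGTERM handler (via some signal-handling crate) and serialize the returned samples with bincode, neither of which is shown here.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

/// One collected reading. The fields are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct Sample {
    pub sequence: u64,
    pub value: f64,
}

/// Calls `read` once per `interval` until `stop` becomes true, then returns
/// everything collected so far.
///
/// Assumption: in the proposed binary `stop` would be set when SIGTERM
/// arrives, and the returned samples would then be encoded and written out
/// for windsock to read over the SSH connection.
pub fn collect_until_stopped(
    stop: &AtomicBool,
    interval: Duration,
    mut read: impl FnMut() -> f64,
) -> Vec<Sample> {
    let mut samples = Vec::new();
    while !stop.load(Ordering::Relaxed) {
        let sequence = samples.len() as u64;
        samples.push(Sample {
            sequence,
            value: read(),
        });
        thread::sleep(interval);
    }
    samples
}
```

Keeping everything in memory until the stop signal matches the "report everything at the end" behaviour proposed above. A long-running bench might need to bound the buffer, which this sketch does not attempt.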

Depending on which flags the binary is invoked with, it will:

  • measure all system metrics that we currently get from sar
  • collect metrics from a prometheus endpoint (as a bonus this will ensure accurate timing of collection interval as network latency is avoided)
  • perform profiling of a PID via samply (keep in mind this would very rarely be useful for cloud benches as we would need bare metal to run a sampling profiler)
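
To make the flag-driven behaviour concrete, here is a rough sketch of how the binary might map its arguments onto the three collectors listed above. The flag names (`--system-metrics`, `--prometheus`, `--profile-pid`), the `Config` struct, and `parse_args` are all made up for illustration. Nothing in this issue fixes the actual CLI.

```rust
/// Which collectors to run. One field per bullet point above.
/// All names here are hypothetical.
#[derive(Debug, Default, PartialEq)]
pub struct Config {
    /// Collect the system metrics currently obtained from sar.
    pub system_metrics: bool,
    /// Prometheus endpoint to poll, if any.
    pub prometheus_url: Option<String>,
    /// PID to profile via samply, if any.
    pub profile_pid: Option<u32>,
}

/// Parses the arguments that follow the program name.
pub fn parse_args<I>(args: I) -> Result<Config, String>
where
    I: IntoIterator<Item = String>,
{
    let mut config = Config::default();
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        match arg.as_str() {
            "--system-metrics" => config.system_metrics = true,
            "--prometheus" => {
                let url = args.next().ok_or("--prometheus requires a URL")?;
                config.prometheus_url = Some(url);
            }
            "--profile-pid" => {
                let pid = args.next().ok_or("--profile-pid requires a PID")?;
                let pid = pid
                    .parse::<u32>()
                    .map_err(|e| format!("invalid PID {pid:?}: {e}"))?;
                config.profile_pid = Some(pid);
            }
            other => return Err(format!("unknown flag: {other}")),
        }
    }
    Ok(config)
}
```

On the windsock side, the bench would then choose which of these flags to pass when starting the binary on each instance.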

The binary should go in its own repo and be deployed via cargo-dist. In the shotover repo we then just pin a specific download URL, which we bump whenever we need a newer version.

This approach should allow for more accurate readings of prometheus metrics as we can poll exactly as the second ticks over without the variable latency of a network hop.
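
As a sketch of what "poll exactly as the second ticks over" could look like, the helper below computes how long to sleep so that the next poll lands on a whole-second boundary. The function names are hypothetical, and it assumes the wall clock (`SystemTime`) is the reference we want to align to.

```rust
use std::thread;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Given the time elapsed since the Unix epoch, returns how long to wait
/// until the next whole second. If we are exactly on a boundary this
/// returns a full second, so consecutive polls are never zero time apart.
pub fn until_next_second(since_epoch: Duration) -> Duration {
    let into_current_second = Duration::from_nanos(u64::from(since_epoch.subsec_nanos()));
    Duration::from_secs(1) - into_current_second
}

/// Blocks the current thread until the wall clock reaches the next second.
pub fn sleep_until_next_second() {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before the Unix epoch");
    thread::sleep(until_next_second(now));
}
```

Recomputing the delay before every poll, instead of sleeping a fixed one second each time, avoids drift accumulating from the time spent scraping the endpoint.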

I don't intend to do this anytime soon as we have other priorities, but if we find ourselves needing to implement yet another way to collect metrics, we should probably perform this refactor first.

conorbros transferred this issue from shotover/shotover-proxy on Apr 10, 2024