
Khepri metrics are showing up without any tags (v4 beta 5) #12142

Open
luos opened this issue Aug 28, 2024 · 2 comments · May be fixed by #12190

@luos
Contributor

luos commented Aug 28, 2024

Hi,

We're testing out Khepri and reviewing how we could monitor its behaviour and performance.

Today, we monitor Mnesia transaction counters to see if there is a high amount of churn in the system, mostly because Mnesia can cause issues when the number of transactions or transaction restarts is very high.

We've noticed that the Khepri metrics in the ra_metrics metric family show up without any tags. I think they should either be served in a different family, e.g. khepri or metadata, or carry proper tags, e.g. ones that distinguish rabbit_metadata from quorum_queues.

Test setup:

  1. Deploy v4 beta 5
  2. Enable the khepri_db feature flag
  3. Create a quorum queue
  4. Call the metrics API: curl localhost:15692/metrics/detailed?family=ra_metrics
  5. Run: rabbitmqctl eval 'rabbit_khepri:status().'

Excerpt from the output:

$ curl localhost:15692/metrics/detailed?family=ra_metrics
rabbitmq_detailed_raft_log_last_written_index{vhost="/",queue="qq1"} 2
rabbitmq_detailed_raft_log_last_written_index 49
$ rabbitmqctl eval 'rabbit_khepri:status().'
...  {<<"Last Written">>,49}, ...
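A minimal sketch of how the untagged series could be picked out of a scrape, assuming the response is plain Prometheus text exposition format as in the excerpt above. The helper names (parse_samples, untagged) are hypothetical and the parser is deliberately simplified; it is not part of RabbitMQ.

```python
import re

# One sample line: name{label="value",...} number [timestamp]
# Simplified on purpose: comment lines and unparsable lines are skipped.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (name, labels, value) tuples, one per sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def untagged(samples):
    """Return (name, value) for every sample that carries no labels at all."""
    return [(name, value) for name, labels, value in samples if not labels]


# The two metric lines from the excerpt in this issue.
EXCERPT = '''\
rabbitmq_detailed_raft_log_last_written_index{vhost="/",queue="qq1"} 2
rabbitmq_detailed_raft_log_last_written_index 49
'''

samples = parse_samples(EXCERPT)
print(untagged(samples))
```

Run against the excerpt, this reports only the series with value 49, which matches the "Last Written" value returned by rabbit_khepri:status().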

Describe the solution you'd like

  • Metric is tagged with something like ra_process=rabbit_metadata
  • Metric is served under a different family and different name.
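To illustrate the first option, here is a rough sketch of what the tagged line might look like. The ra_process label is only the suggestion made in this issue, not an existing RabbitMQ label, and format_sample is a hypothetical helper written for this example.

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline for a Prometheus label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name, value, labels=None):
    """Render one sample line in the Prometheus text exposition format."""
    if not labels:
        return f"{name} {value}"
    rendered = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in labels.items()
    )
    return f"{name}{{{rendered}}} {value}"


# Today: the Khepri series is emitted with no labels.
print(format_sample("rabbitmq_detailed_raft_log_last_written_index", 49))

# Proposed (assumption): the same series tagged with the Ra process name.
print(format_sample(
    "rabbitmq_detailed_raft_log_last_written_index",
    49,
    {"ra_process": "rabbit_metadata"},
))
```

With a tag like this, the Khepri series can be told apart from the per-queue series that already carry vhost and queue labels.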

Describe alternatives you've considered

I tried to look at the metric collection code, but from my cursory review I could not figure out how to add the tag, and I am not even sure that would be the preferred way to go about it. 😄

Additional context

Due to Khepri's consistency behaviour with projections, it would be good to know if a node is falling behind.
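As a rough sketch of the kind of check this would enable, the snippet below compares the last written index reported by each node and flags nodes that trail the most advanced one. It assumes the Khepri index can be scraped separately per node. The node names, index values and threshold are made up for illustration, and whether this index is the right signal for projection lag is itself an assumption.

```python
def index_lag(last_written_by_node):
    """Return how many log entries each node trails the most advanced node by."""
    if not last_written_by_node:
        return {}
    highest = max(last_written_by_node.values())
    return {node: highest - index for node, index in last_written_by_node.items()}


def lagging_nodes(last_written_by_node, threshold):
    """Return the sorted names of nodes whose lag exceeds the threshold."""
    lags = index_lag(last_written_by_node)
    return sorted(node for node, lag in lags.items() if lag > threshold)


# Hypothetical per-node values of the Khepri last written index.
scraped = {"rabbit@node1": 49, "rabbit@node2": 49, "rabbit@node3": 31}

print(index_lag(scraped))
print(lagging_nodes(scraped, threshold=10))
```

A check like this only works once the Khepri series can be identified reliably, which is what the tag or separate family requested above would provide.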

@the-mikedavis the-mikedavis self-assigned this Aug 28, 2024
@michaelklishin
Member

We can both use a tag (if we do so for other Ra machine process types) and provide a new endpoint or a metric family.

@luos
Contributor Author

luos commented Aug 28, 2024

Thanks, yeah, I was thinking of something similar to what the-mikedavis proposed in the PR.

I think it is worth considering a different family, i.e. we may have many thousands of queues but only one or a few metadata processes, and I expect we will care more about Khepri than QQ indexes, though I can't say for sure at this point in time. :-)

At the same time, I do not expect the khepri process to take a lot of traffic - but I am sure it will happen.
