Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(prometheus): expose controlplane connectivity state as a gauge #14020

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

aryan9600
Copy link
Member

@aryan9600 aryan9600 commented Dec 13, 2024

Summary

Add a new Prometheus gauge metric controlplane_reachable. Similar to datastore_reachable gauge, 0 means the connection is not healthy; 1 means that the connection is healthy. We mark the connection as unhealthy under the following circumstances:

  • Failure while establihing a websocket connection
  • Failure while sending basic information to controlplane
  • Failure while sending ping to controlplane
  • Failure while receiving a packet from the websocket connection

This is helpful for users running a signficant number of gateways to be alerted about potential issues any gateway(s) may be facing while talking to the controlplane.

Checklist

  • The Pull Request has tests
  • A changelog file has been created under changelog/unreleased/kong or skip-changelog label added on PR if changelog is unnecessary. README.md
  • There is a user-facing docs PR against https://github.com/Kong/docs.konghq.com - PUT DOCS PR HERE

Issue reference

Fix #[issue number]

@github-actions github-actions bot added core/clustering plugins/prometheus cherry-pick kong-ee schedule this PR for cherry-picking to kong/kong-ee labels Dec 13, 2024
@aryan9600 aryan9600 force-pushed the cp-conn-prom-metric branch 2 times, most recently from 4fbf17f to c2d6278 Compare December 23, 2024 08:26
Add a new Prometheus gauge metric `control_plane_reachable`. Similar to
`datastore_reachable` gauge, 0 means the connection is not healthy; 1
means that the connection is healthy. We mark the connection as
unhealthy under the following circumstances:
* Failure while establihing a websocket connection
* Failure while sending basic information to controlplane
* Failure while sending ping to controlplane
* Failure while receiving a packet from the websocket connection

This is helpful for users running a signficant number of gateways to be
alerted about potential issues any gateway(s) may be facing while
talking to the controlplane.

Signed-off-by: Sanskar Jaiswal <[email protected]>
@pull-request-size pull-request-size bot added size/L and removed size/M labels Dec 23, 2024
@aryan9600 aryan9600 marked this pull request as ready for review December 23, 2024 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick kong-ee schedule this PR for cherry-picking to kong/kong-ee core/clustering plugins/prometheus size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant