Prometheus failing to reload Probe TLS cert and key from disk #598
Comments
What seems to be happening is that Prometheus loads updated certs into new connections but not into existing ones.

Connections are set to remain open unless they are idle for 5 minutes. As long as the scrape interval is significantly shorter than 5 minutes, they remain open indefinitely.

One possible enhancement could be for Prometheus to flush any connection that hits a 403 error.
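
To illustrate the keep-alive behaviour and the proposed 403 flush, here is a minimal sketch using Go's standard `net/http` client. It is not Prometheus code: the helper names `shouldFlush`, `newClient` and `scrape` are hypothetical, and the 5-minute idle timeout is taken from the description in this comment. The point is that `GetClientCertificate` only runs during a TLS handshake, so a connection that is kept alive never picks up a rotated cert unless it is closed first.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"time"
)

// shouldFlush reports whether a status code suggests the certificate
// presented on the current connection is no longer accepted.
// Treating 403 this way is the proposal from this thread (an assumption),
// not existing Prometheus behaviour.
func shouldFlush(status int) bool {
	return status == http.StatusForbidden
}

// newClient builds a client whose cert and key are re-read from disk on
// every new TLS handshake. Connections that are already open are unaffected.
func newClient(certFile, keyFile string) (*http.Client, *http.Transport) {
	tr := &http.Transport{
		// Idle timeout as described in the thread.
		IdleConnTimeout: 5 * time.Minute,
		TLSClientConfig: &tls.Config{
			GetClientCertificate: func(*tls.CertificateRequestInfo) (*tls.Certificate, error) {
				cert, err := tls.LoadX509KeyPair(certFile, keyFile)
				if err != nil {
					return nil, err
				}
				return &cert, nil
			},
		},
	}
	return &http.Client{Transport: tr}, tr
}

// scrape performs one GET. On a 403 it closes idle connections so the next
// scrape has to perform a fresh handshake and therefore reloads the cert.
func scrape(c *http.Client, tr *http.Transport, url string) (int, error) {
	resp, err := c.Get(url)
	if err != nil {
		return 0, err
	}
	// Drain and close the body first so the connection is released
	// before the flush is requested.
	_, _ = io.Copy(io.Discard, resp.Body)
	resp.Body.Close()

	if shouldFlush(resp.StatusCode) {
		tr.CloseIdleConnections()
	}
	return resp.StatusCode, nil
}

func main() {
	// No network access here; just show the flush decision.
	fmt.Println(shouldFlush(http.StatusForbidden), shouldFlush(http.StatusOK))
}
```

With a client like this, the first scrape after a rotation would still fail with a 403, but the following one would succeed because it is forced onto a new connection. Flushing on cert reload instead would avoid even that one failed scrape.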
We are seeing the same too, namely k8s
+1 to flush connections on 403 and/or on cert reload.
Did any of you find a solution or workaround for this, besides increasing scrape durations for tbot targets? I'll also +1 implementing the 403 flush/reload; it would help tremendously.
Edit: Instead of connecting to the tunnel endpoint directly, I just put
The big negative: my scrape durations almost tripled from
I'm running Prometheus Operator 0.71.2 with Prometheus 2.49.1 on EKS.
I have metrics endpoints protected by a TLS cert and key. Teleport Tbot rotates the cert and key every n hours and writes them to a secret. There's a Probe resource that refers to that secret. Prometheus Operator loads the Probe into a Prometheus instance and rewrites the secret for that instance. Prometheus uses the rewritten secret to access the endpoint.
What I'm seeing is that:
The secrets look up to date on the Prometheus pod filesystem during the issue
Probe definition:
Generated config:
This sounds similar to #345, but it is still happening today.