Pods with attached PV staying in Terminating state when a worker node fails #419

Open
sati-max opened this issue Mar 18, 2024 · 0 comments


Hi

We have a few SigNoz setups running on a self-hosted Kubernetes cluster. Overall it works nicely and we don't have any problems with SigNoz. We decided to verify whether our setup tolerates a worker node failure. Sadly, the SigNoz pods that have a PV attached (clickhouse, zookeeper, alert manager, query service) get stuck in the Terminating state. They are not recreated or started on a working node in the cluster and basically wait until the "failed" node comes back. That might take 5 minutes (when it's planned downtime) but also 5 hours (if there is an actual problem and it takes time to solve), during which SigNoz might not collect any data sent to it from any endpoints.

View:
[Screenshot 2024-03-18 152258]

When checking (all mentioned above pods have the same info) from Rancher, this is what we see in Events and Conditions:
[image]
[image]

We don't have any other logs. `kubectl describe pod <pod name> -n <namespace>` doesn't show anything else.
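
To make the affected set easier to reproduce, here is a minimal sketch (an illustration, not something taken from our setup) that lists pods which are marked for deletion and mount a PersistentVolumeClaim. It assumes the input is the parsed JSON from `kubectl get pods -n <namespace> -o json`; the function name `stuck_pods_with_pvc` and the pod, node, and claim names in the sample are hypothetical.

```python
import json


def stuck_pods_with_pvc(pod_list):
    """Return (pod name, node name, claim names) for every pod that is
    marked for deletion and mounts at least one PersistentVolumeClaim.

    `pod_list` is the parsed output of `kubectl get pods -o json`.
    """
    result = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        spec = pod.get("spec", {})
        # A pod displayed as Terminating has metadata.deletionTimestamp set.
        if "deletionTimestamp" not in meta:
            continue
        # Collect the names of the PVCs referenced in spec.volumes.
        claims = [
            volume["persistentVolumeClaim"]["claimName"]
            for volume in spec.get("volumes", [])
            if "persistentVolumeClaim" in volume
        ]
        if claims:
            result.append((meta.get("name"), spec.get("nodeName"), claims))
    return result


# Hypothetical sample data; all names below are made up for illustration.
SAMPLE = json.loads("""
{
  "items": [
    {
      "metadata": {"name": "example-clickhouse-0",
                   "deletionTimestamp": "2024-03-18T14:00:00Z"},
      "spec": {"nodeName": "worker-2",
               "volumes": [{"name": "data",
                            "persistentVolumeClaim": {"claimName": "data-example-clickhouse-0"}}]}
    },
    {
      "metadata": {"name": "example-frontend-abc",
                   "deletionTimestamp": "2024-03-18T14:00:00Z"},
      "spec": {"nodeName": "worker-2",
               "volumes": [{"name": "config", "configMap": {"name": "example"}}]}
    },
    {
      "metadata": {"name": "example-zookeeper-0"},
      "spec": {"nodeName": "worker-1",
               "volumes": [{"name": "data",
                            "persistentVolumeClaim": {"claimName": "data-example-zookeeper-0"}}]}
    }
  ]
}
""")

for name, node, claims in stuck_pods_with_pvc(SAMPLE):
    print(name, node, claims)
```

With real data, the same function can be fed the result of `json.load` on the saved `kubectl` output. It only narrows down which pods match the pattern described here; it does not explain why they are not rescheduled.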

How can we solve this issue? It seems that the pods that have a PV attached can't be recreated for some reason. Other pods are in a dual state (Running on a different worker node and Terminating on the "failed" node), keep working fine, and get sorted out when the "failed" node comes back.

We didn't really change anything in the YAML configuration (we only changed ports on otel-collector, disabled the k8s-infra pod, and increased the size of the ClickHouse PV), so I won't paste it.

Thank you for any help.

Cheers
