How to reach 100% uptime with Hetzner load balancers? #383
-
Oh, I wasn't aware of upcoming maintenance for the load balancers, thanks for the heads up. To be honest, so far I have been using Hetzner only for personal, non-critical stuff. At work we have used GKE since forever, and although I have proposed Hetzner to cut costs (using my tool), others prefer Google because it's supposed to be more reliable. So I haven't yet had any cases where a brief downtime would cause me serious problems.
-
Yes, that's where Google is doing an insane job, using project Maglev. I just hope they (Hetzner) won't update every region concurrently; then our approach might work. I'm a bit surprised you use it only for personal stuff, to be honest :-) Our plan is to run production workloads (and we already do). Do you offer paid (!!) consulting? The money we save using Hetzner instead of GAE / GKE.... There are no open questions atm, but these will come up for sure.
-
As a workaround during downtime, you can bypass Hetzner's load balancers, point your DNS directly at one of your nodes, and configure the ingress to listen on public port 443. Of course, that node then has to front all the HTTP (and SSL) traffic, so you need to make sure it can handle the load. Real custom high-availability solutions are hard to build because they often need layer 2/3 access (routing tables) to fail over the load balancers. In general, 100% uptime is hard to achieve on Hetzner. I have run into multiple issues with their cloud infrastructure, often leading to half a day or more of downtime (but I don't have any critical systems; maybe if the pressure is higher and you are a bigger customer, it's easier to escalate).
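Before switching DNS to a single node, it may be worth checking that the node actually accepts connections on port 443. The following is only a minimal sketch using the Python standard library; the function name `is_port_open` and the example address are made up for illustration, and a successful TCP connect says nothing about whether the ingress behind it serves valid responses.

```python
import socket


def is_port_open(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    This only checks reachability at the TCP level; it does not perform
    a TLS handshake or an HTTP request.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, DNS failures and timeouts.
        return False


# Hypothetical usage (203.0.113.10 is a documentation-only address):
# if is_port_open("203.0.113.10"):
#     print("node answers on 443, safe to point DNS at it")
```
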
-
With a "premium" cloud provider such as GKE, AWS, or Azure, this is not an issue, but unfortunately on Hetzner it is:
The next maintenance is scheduled for load balancers: https://status.hetzner.com/incident/cd0ebfd2-8985-4aae-8be5-6548558c0f8c
The last maintenance was, I think, in December, and it was not just a few dropped connections but a downtime of 20-30 minutes, which defeats the purpose of a load balancer. But we cannot change that.
What we did was deploy another load balancer, together with an nginx controller, in a different region (e.g. Falkenstein and Helsinki).
That leads to another A record, but we made our clients aware of the additional HTTP entrypoint, as we are developing a service that needs to reach 100% uptime. We had that for more than a decade on Google App Engine, but we're leaving it now for various reasons.
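To illustrate what "making clients aware of another entrypoint" could look like on the client side, here is a rough sketch: the client tries each entrypoint in order and moves on when one is unreachable. The hostnames, the `/health` path, and the function names are hypothetical and not taken from our actual setup; a real client would probably also want retries with backoff and some memory of which entrypoint last worked.

```python
import urllib.request
from typing import Callable, Iterable


def fetch_url(url: str, timeout: float = 3.0) -> bytes:
    """Fetch a URL with a short timeout so a dead entrypoint fails fast."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_failover(
    entrypoints: Iterable[str],
    path: str,
    fetch_fn: Callable[[str], bytes] = fetch_url,
) -> bytes:
    """Try each entrypoint in order and return the first successful response."""
    last_error = None
    for base in entrypoints:
        try:
            return fetch_fn(base.rstrip("/") + path)
        except OSError as exc:
            # URLError, HTTPError, timeouts and refused connections
            # are all subclasses of OSError.
            last_error = exc
    raise ConnectionError("all entrypoints failed") from last_error


# Hypothetical entrypoints, one load balancer per region:
# ENTRYPOINTS = ["https://fsn.example.com", "https://hel.example.com"]
# body = fetch_with_failover(ENTRYPOINTS, "/health")
```

The fetch function is passed in as a parameter mainly so the failover logic can be exercised without real network access.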
Do you have a better idea to reach the best possible uptime?
Best regards