How High Availability Zonal Load Balancing (HAZL) Works

For every endpoint, HAZL maintains a set of data that includes:

  • The zone of the endpoint
  • The cost associated with that zone
  • The recent latency of responses to that endpoint
  • The recent failure rate of responses to that endpoint
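
To make that concrete, here is a minimal sketch in Go of the kind of per-endpoint state described above. The type and field names are illustrative and are not taken from the actual HAZL implementation; the moving-average summary is just one plausible way to track "recent" latency and failure rate.

```go
package hazlsketch

import "time"

// endpointState is an illustrative version of the per-endpoint data HAZL
// tracks: the endpoint's zone, the cost of that zone, and recent latency
// and failure-rate observations.
type endpointState struct {
	Zone        string        // topology zone the endpoint lives in
	ZoneCost    float64       // relative cost of sending traffic to that zone
	RecentRTT   time.Duration // recent response latency for this endpoint
	FailureRate float64       // recent fraction of failed responses (0.0 to 1.0)
}

// observe folds a single response into the recent latency and failure-rate
// summaries using an exponentially weighted moving average (illustrative only).
func (e *endpointState) observe(rtt time.Duration, failed bool, alpha float64) {
	e.RecentRTT = time.Duration(alpha*float64(rtt) + (1-alpha)*float64(e.RecentRTT))
	f := 0.0
	if failed {
		f = 1.0
	}
	e.FailureRate = alpha*f + (1-alpha)*e.FailureRate
}
```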

For every service, HAZL continually computes a load metric measuring the utilization of the service. When the load on a service falls outside the acceptable range, whether because of failures, latency, traffic spikes, or any other cause, HAZL dynamically adds endpoints from other zones to the load-balancing pool. When load returns to normal, HAZL automatically shrinks the pool back to in-zone endpoints only.
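
Continuing the sketch above, the expand-and-shrink behavior might look roughly like this. The single load value, the acceptable-range check, and the cheapest-zone-first ordering are stand-ins: HAZL's real load metric and thresholds are internal and not specified here. (Add "sort" to the imports of the previous sketch.)

```go
// selectPool sketches the expand/shrink behavior: under normal load it returns
// only in-zone endpoints; when the service's load metric leaves the acceptable
// range it also includes endpoints from other zones, cheapest zones first.
func selectPool(endpoints []endpointState, localZone string, load, maxAcceptable float64) []endpointState {
	var inZone, outOfZone []endpointState
	for _, ep := range endpoints {
		if ep.Zone == localZone {
			inZone = append(inZone, ep)
		} else {
			outOfZone = append(outOfZone, ep)
		}
	}
	if load <= maxAcceptable {
		return inZone // normal conditions: keep all traffic within the zone
	}
	// Under stress: temporarily allow cross-zone traffic, preferring
	// lower-cost zones.
	sort.Slice(outOfZone, func(i, j int) bool {
		return outOfZone[i].ZoneCost < outOfZone[j].ZoneCost
	})
	return append(inZone, outOfZone...)
}
```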

In short: under normal conditions, HAZL keeps all traffic within the zone, but when the system is under stress, HAZL will temporarily allow cross-zone traffic until the system returns to normal. We'll see this in the HAZL demonstration.

HAZL will also apply these same principles to cross-cluster / multi-cluster calls: it will preserve zone locality by default, but allow cross-zone traffic if necessary to preserve reliability.

High Availability Zonal Load Balancing (HAZL) vs Topology Hints

HAZL was designed in response to limitations seen by customers using Kubernetes's native Topology Hints (also known as Topology-aware Routing) mechanism. These limitations are shared by native Kubernetes balancing (kube-proxy) as well as by systems such as open source Linkerd and Istio that use Topology Hints to make routing decisions.

Within these systems, the endpoints for each service are allocated ahead of time to specific zones by the Topology Hints mechanism. This allocation is done at the Kubernetes API level and attempts to assign each zone endpoints from within that zone (though this isn't guaranteed: the Topology Hints mechanism may allocate endpoints from other zones). Once the allocation is done, it is static until endpoints are added or removed. It does not take into account traffic volumes, latency, or service health (except indirectly, if failing endpoints are removed via health checks).

Systems that make use of Topology Hints, including Linkerd and Istio, use this allocation to decide where to send traffic. This accomplishes the goal of keeping traffic within a zone, but at the expense of reliability: Topology Hints itself provides no mechanism for sending traffic across zones if reliability demands it. The closest approximation in (some of) these systems is a set of manual failover controls that allow the operator to fail over traffic to a new zone.
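
For comparison, the following Go sketch shows roughly how a Topology Hints consumer selects endpoints. It uses the real discovery.k8s.io/v1 EndpointSlice types, but the filtering function itself is illustrative and not taken from kube-proxy, Linkerd, or Istio: the point is that the zone allocation already sits in the API objects, and the consumer simply follows it with no knowledge of current traffic, latency, or load.

```go
package hintsketch

import (
	discoveryv1 "k8s.io/api/discovery/v1"
)

// addressesForZone returns the endpoint addresses that the Topology Hints
// mechanism has allocated to the given zone. The allocation was already made
// at the Kubernetes API level (EndpointSlice hints), so this selection only
// changes when the EndpointSlices themselves are rewritten.
func addressesForZone(slices []discoveryv1.EndpointSlice, zone string) []string {
	var addrs []string
	for _, slice := range slices {
		for _, ep := range slice.Endpoints {
			if ep.Hints == nil {
				continue // no hint recorded for this endpoint
			}
			for _, fz := range ep.Hints.ForZones {
				if fz.Name == zone {
					addrs = append(addrs, ep.Addresses...)
					break
				}
			}
		}
	}
	return addrs
}
```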

Finally, Topology Hints has a set of well-known constraints, including:

  • It does not work well for services where a large proportion of traffic originates from a subset of zones.
  • It does not take into account tolerations, unready nodes, or nodes that are marked as control plane or master nodes.
  • It does not work well with autoscaling: the autoscaler may not respond to increases in traffic at all, or may respond by adding endpoints in other zones.
  • It makes no affordance for cross-cluster traffic.

These constraints have real-world implications. As one customer put it when trying Istio + Topology Hints: "What we are seeing in some applications is that they won’t scale fast enough or at all (because maybe two or three pods out of 10 are getting the majority of the traffic and is not triggering the HPA) and can cause a cyclic loop of pods crashing and the service going down."

Go back to the main doc.