GRPC traffic is not load balanced correctly #142
Attaching controller logs.
@deenav can you share the client configuration? I would like to know how the Pravega endpoint is used in the client application. Thanks.
@adrianmo I was running pravega-longevity.
@deenav please try to reproduce the error using the Controller service name instead of the IP address.
I was able to reproduce a similar error yesterday, but it suddenly worked today and I have not been able to reproduce it again in the past few hours. It will be harder to debug if this error happens intermittently.
@adrianmo It seems one Controller pod stays in an inactive state after deleting one of the controllers (out of 2). If you delete the controller pod that is running properly, then the other controller (which was stuck) starts to take the new I/O.
After investigating, I'm quite sure that the following is happening: 1. When the longevity process starts, it establishes multiple gRPC (HTTP/2) connections to the Controller service, which are forwarded to the backend pods in a round-robin fashion. Therefore, some connections are established with Controller 1 and some others with Controller 2. Load balancing only happens at the connection level because HTTP/2 multiplexes requests into a single TCP connection, as opposed to HTTP/1, which establishes a connection for each request. Related documents:
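To make the connection-level behavior described above concrete, here is a minimal grpc-java sketch of what a client effectively ends up with. This is only an illustration, not code from the longevity test or the Pravega client: the service name and port are placeholders, and the standard gRPC health-check call is used purely as a stand-in for real Controller RPCs.

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.health.v1.HealthCheckRequest;
import io.grpc.health.v1.HealthGrpc;

public class SingleConnectionDemo {
    public static void main(String[] args) {
        // One channel keeps a single persistent HTTP/2 connection by default.
        // A Service of type ClusterIP/LoadBalancer (an L4 balancer) picks a
        // backend Controller pod only once, when this connection is opened.
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress("pravega-pravega-controller", 9090) // placeholder name/port
                .usePlaintext()
                .build();

        // Placeholder RPC: all of these calls are multiplexed over the same
        // TCP connection, so they all land on the same Controller pod.
        HealthGrpc.HealthBlockingStub stub = HealthGrpc.newBlockingStub(channel);
        for (int i = 0; i < 100; i++) {
            stub.check(HealthCheckRequest.getDefaultInstance());
        }

        channel.shutdown();
    }
}
```

Because the channel pins a single TCP connection, the L4 balancer never sees the individual requests, which is why per-request balancing cannot happen at that layer.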
Based on the current operator/controller code, the behavior exhibited by the controller is the expected behavior. When, out of 2 controllers, say 'a' and 'b', 'a' goes down, all new client connections are redirected by the load balancer to controller 'b', and later, when 'a' comes back up, since the connections to 'b' are persistent, there is no way to automatically 'shift' them from 'b' to 'a'. Currently no attempt to do this is made by the operator/controller, by design. If such load balancing is a requirement, please file a new feature request for it.
@pbelgundi the behavior you are describing represents "failover" for High Availability. The problem here is with respect to load balancing gRPC traffic across the Controller nodes. The traffic to Controller pods in Pravega Operator managed Kubernetes deployments is not load-balanced due to the choice of load balancing mechanism used in the Operator: a Service of type LoadBalancer, which represents a layer 4 load balancer (L4 LB). An L4 LB cannot load balance gRPC traffic, as pointed out by the references listed by @adrianmo above. Load balancing of gRPC traffic requires request-level load balancing, which is supported by some of the Ingress controllers like Ingress-nginx (which in turn spin up application-level/layer 7 load balancers). Whether to use sticky sessions or to load balance across all nodes is usually a policy decision expressed as rules/configuration in a load balancer. In this case, the choice of the load balancer itself has restricted the options. It is not by design that gRPC load balancing was left out; the problem was simply realized later. @adrianmo and I discussed this at length some time back, and he did some investigation and posted his findings above. I just want to share the context. In my view too, this is an enhancement.
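Besides fronting the Controllers with an L7 ingress, another common way to get per-request balancing is client-side load balancing in the gRPC client itself. The sketch below only illustrates that technique and is not how the Pravega client is wired today; it assumes a headless Service (clusterIP: None) so DNS resolution returns every Controller pod address, and the target name and port are placeholders.

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class ClientSideRoundRobinDemo {
    public static void main(String[] args) {
        // The dns:/// resolver returns all addresses behind the (assumed
        // headless) Service, and round_robin keeps a sub-channel per address,
        // spreading RPCs across the Controller pods per call rather than
        // per connection.
        ManagedChannel channel = ManagedChannelBuilder
                .forTarget("dns:///pravega-pravega-controller:9090") // placeholder name/port
                .defaultLoadBalancingPolicy("round_robin")
                .usePlaintext()
                .build();

        // Stubs created from this channel get request-level balancing
        // across all resolved Controller addresses.
        // ... build and use generated stubs here ...

        channel.shutdown();
    }
}
```

This would give the request-level balancing discussed here without an external L7 load balancer, at the cost of requiring a headless Service and client-side configuration.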
No. What I have mentioned is not about "failover". That is working as expected. There sure are better ways to do gRPC load balancing so that it works post "failover" as well. It's just that the current operator does not attempt to do it, and this needs to be taken up as a feature/enhancement.
@pbelgundi I'm not sure that just closing this issue is the right thing to do. You are right, this is an enhancement to be done and probably it is not for
I'm fine with re-labelling too. This would definitely require a fix in the operator.
Right now, by virtue of using gRPC, we load balance requests on a per-client basis. This is by the design choice of using gRPC, which uses HTTP/2 as transport and multiplexes multiple requests over a persistent connection. As an improvement, we should definitely load balance the "requests" and not just the connections across available controller instances. But that will require a broader discussion. We can keep this issue and have that discussion in this context, but we should move it to 0.6 as a new feature.
While running the longevity test in a PKS cluster with a moderate I/O (medium-scale) workload, we observe time lapses in the controller logs whenever there is more than one controller in the PKS cluster.
Environment details: PKS / K8s with a medium cluster:
Steps to reproduce
1. Use the `kubectl logs -f po/<controller-pod-name>` command to monitor the controller logs.
2. Reboot one of the controllers with the `kubectl exec -it <controller1-pod-name> reboot` command.
3. The rebooted controller logs `ControllerServiceStarter STARTING`, then it does not log anything and does not serve read/write requests from the client.
4. The controller remains idle (until `reboot` is run on the active controller), which makes the idle controller resume its services.

Snip of time lapses in controller log:
Problem location
My observation here is that load balancing does not happen properly when a restarted/failed controller resumes its operation, if there is more than one controller in the cluster.