GRPC traffic is not load balanced correctly #142
Attaching controller logs.
@deenav can you share the client configuration? I would like to know how the Pravega endpoint is used in the client application. Thanks.
@adrianmo I was running pravega-longevity.
@deenav please try to reproduce the error using the Controller service name instead of the IP address.
I was able to reproduce a similar error yesterday, but it suddenly worked today and I have not been able to reproduce it again in the past few hours. It will be harder to debug if this error happens intermittently.
@adrianmo It seems one Controller pod stays in an inactive state after deleting one of the controllers (out of 2). If you delete the controller pod that is running properly, then the other controller (which was stuck) starts to take the new I/O.
After investigating, I'm quite sure that the following is happening: 1. When the longevity process starts, it establishes multiple gRPC (HTTP/2) connections to the Controller service, which are forwarded to the backend pods in a round-robin fashion. Therefore, some connections are established with Controller 1 and some others with Controller 2. Load balancing only happens at the connection level because HTTP/2 multiplexes requests into a single TCP connection, as opposed to HTTP/1, which establishes a connection for each request. Related documents:
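To make the connection-level behavior described above concrete, here is a minimal grpc-java sketch of what a client effectively ends up with. This is only an illustration, not code from the longevity test or the Pravega client: the service name and port are placeholders, and the standard gRPC health-check call is used purely as a stand-in for real Controller RPCs.

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.health.v1.HealthCheckRequest;
import io.grpc.health.v1.HealthGrpc;

public class SingleConnectionDemo {
    public static void main(String[] args) {
        // One channel keeps a single persistent HTTP/2 connection by default.
        // A Service of type ClusterIP/LoadBalancer (an L4 balancer) picks a
        // backend Controller pod only once, when this connection is opened.
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress("pravega-pravega-controller", 9090) // placeholder name/port
                .usePlaintext()
                .build();

        // Placeholder RPC: all of these calls are multiplexed over the same
        // TCP connection, so they all land on the same Controller pod.
        HealthGrpc.HealthBlockingStub stub = HealthGrpc.newBlockingStub(channel);
        for (int i = 0; i < 100; i++) {
            stub.check(HealthCheckRequest.getDefaultInstance());
        }

        channel.shutdown();
    }
}
```

Because the channel pins a single TCP connection, the L4 balancer never sees the individual requests, which is why per-request balancing cannot happen at that layer.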
Based on the current operator/controller code, the behavior exhibited by the controller is the expected behavior. When, out of 2 controllers, say 'a' and 'b', 'a' goes down, all new client connections are redirected by the load balancer to controller 'b', and later, when 'a' comes back up, since the connections to 'b' are persistent, there is no way to automatically 'shift' them from 'b' to 'a'. Currently no attempt to do this is made by the operator/controller, by design. If such load balancing is a requirement, please file a new feature request for it.
@pbelgundi the behavior you are describing represents "failover" for High Availability. The problem here is with respect to load balancing gRPC traffic across the Controller nodes. The traffic to Controller pods in Pravega Operator managed Kubernetes deployments is not load-balanced due to the choice of load balancing mechanism used in the Operator: a Service of type LoadBalancer, which represents a layer 4 load balancer (L4 LB). An L4 LB cannot load balance gRPC traffic, as pointed out by the references listed by @adrianmo above. Load balancing of gRPC traffic requires request-level load balancing, which is supported by some of the Ingress controllers like Ingress-nginx (which in turn spin up application-level/layer 7 load balancers). Whether to use sticky sessions or to load balance across all nodes is usually a policy decision expressed as rules/configuration in a load balancer. In this case, the choice of the load balancer itself has restricted the options. It is not by design that gRPC load balancing was left out; the problem was simply realized later. @adrianmo and I discussed this at length some time back, and he did some investigation and posted his findings above. I just want to share the context. In my view too, this is an enhancement.
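Besides fronting the Controllers with an L7 ingress, another common way to get per-request balancing is client-side load balancing in the gRPC client itself. The sketch below only illustrates that technique and is not how the Pravega client is wired today; it assumes a headless Service (clusterIP: None) so DNS resolution returns every Controller pod address, and the target name and port are placeholders.

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class ClientSideRoundRobinDemo {
    public static void main(String[] args) {
        // The dns:/// resolver returns all addresses behind the (assumed
        // headless) Service, and round_robin keeps a sub-channel per address,
        // spreading RPCs across the Controller pods per call rather than
        // per connection.
        ManagedChannel channel = ManagedChannelBuilder
                .forTarget("dns:///pravega-pravega-controller:9090") // placeholder name/port
                .defaultLoadBalancingPolicy("round_robin")
                .usePlaintext()
                .build();

        // Stubs created from this channel get request-level balancing
        // across all resolved Controller addresses.
        // ... build and use generated stubs here ...

        channel.shutdown();
    }
}
```

This would give the request-level balancing discussed here without an external L7 load balancer, at the cost of requiring a headless Service and client-side configuration.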
No. What I have mentioned is not about "failover". That is working as expected. There sure are better ways to do gRPC load balancing so that it works post "failover" as well. It's just that the current operator does not attempt to do it, and this needs to be taken up as a feature/enhancement.
@pbelgundi I'm not sure that just closing this issue is the right thing to do. You are right, this is an enhancement to be done and probably it is not for
I'm fine with re-labelling too. This would definitely require a fix in the operator.
Right now, by virtue of using gRPC, we load balance requests on a per-client basis. This is by the design choice of using gRPC, which uses HTTP/2 as transport and multiplexes multiple requests over a persistent connection. As an improvement, we should definitely load balance the "requests" and not just the connections across available controller instances. But that will require a broader discussion. We can keep this issue and have that discussion in this context, but we should move it to 0.6 as a new feature.
While running the longevity test in a PKS cluster with a moderate I/O (medium-scale) workload, we observe time lapses in the controller logs whenever there is more than one controller in the PKS cluster.
Environment details: PKS / K8s with a medium cluster:
Steps to reproduce
1. Use the `kubectl logs -f po/<controller-pod-name>` command to monitor the controller logs.
2. Reboot one of the controllers with the `kubectl exec -it <controller1-pod-name> reboot` command.
3. The rebooted controller logs `ControllerServiceStarter STARTING`, then it does not log anything and does not serve read/write requests from the client.
4. The controller remains idle (until `reboot` is run on the active controller), which makes the idle controller resume its services.

Snip of time lapses in controller log:
Problem location
My observation here is that load balancing does not happen properly when a restarted/failed controller resumes its operation, if there is more than one controller in the cluster.