Gateway service throttling issue #1659
-
Hi @vvasilevbosch
You wrote:
To me this sounds like expected behavior: you have found some kind of "max throughput" that a single actor is able to handle. There is an Akka setting of So to be honest I don't actually see a real "problem" ;)
-
I ran into an issue with the gateway service while doing some testing. My setup is 1 instance of each service, each with 5 GB of memory.
I run a warmup phase, reading a total of 10,000 things via the gateway, which get cached.
I send 1310 search commands (?filter=eq(thingId,'${id}')) per second via the gateway for 5 minutes. The total time for a single request is ~10s according to the 'HTTP Command Processing Time (2xx)' Grafana panel. What I notice in the logs is that the things-search service receives the command from the gateway service only after ~10s. The 'signal processing times' Grafana panel also shows that the signals themselves are processed in less than 20ms.
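For reference, one such search request could be built as in the sketch below. It assumes Ditto's HTTP search endpoint `/api/2/search/things`; the base URL and the thing ID are hypothetical placeholders, and the actual test was driven by k6, not Python.

```python
from urllib.parse import urlencode

# Hypothetical gateway address; replace with the real one.
BASE_URL = "http://localhost:8080"


def search_url(thing_id: str, base_url: str = BASE_URL) -> str:
    """Build a search URL that filters on a single thingId."""
    # RQL filter in the same shape as in the test: eq(thingId,'<id>')
    rql_filter = f"eq(thingId,'{thing_id}')"
    # urlencode percent-encodes the parentheses, comma, quotes and colon.
    query = urlencode({"filter": rql_filter})
    return f"{base_url}/api/2/search/things?{query}"


if __name__ == "__main__":
    # "org.example:thing-1" is an illustrative ID, not one from the test.
    print(search_url("org.example:thing-1"))
```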
When sending 1300 requests/s the total time is <20ms, so it seems I hit some limit, because an extra 10 requests/s makes a huge difference in total processing time. The GC dashboard shows no spikes in memory or GC time for the gateway. I tried playing with ActiveProcessorCount; a value of 8 gave the best performance, and the node that the service runs on has 8 CPUs. After the problem occurs, the '# of messages in mailbox' panel shows the mailbox of the 'org.eclipse.ditto.gateway.service.proxy.actors.GatewayProxyActor' actor growing to ~4k and staying at that number for the remainder of the test.
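The cliff between 1300 and 1310 requests/s is what a single queue looks like once arrivals exceed what it can drain. The sketch below is a deliberately simplified fluid model, not a measurement: the service rate of 1305 messages/s is an assumed capacity somewhere between the two tested rates, and the Little's law estimate assumes the mailbox drains at roughly the arrival rate.

```python
def backlog_after(arrival_rate: float, service_rate: float, seconds: float) -> float:
    """Messages queued after `seconds` in a deterministic fluid model."""
    # Below capacity nothing accumulates; above it the surplus grows linearly.
    return max(0.0, (arrival_rate - service_rate) * seconds)


def queueing_delay(queue_length: float, throughput: float) -> float:
    """Little's law: average wait W = L / lambda, in seconds."""
    return queue_length / throughput


if __name__ == "__main__":
    ASSUMED_SERVICE_RATE = 1305.0  # hypothetical capacity of the single actor
    for rate in (1300.0, 1310.0):
        backlog = backlog_after(rate, ASSUMED_SERVICE_RATE, seconds=300)
        print(f"{rate:.0f} req/s: backlog after 5 min = {backlog:.0f} messages")
    # Observed mailbox size of ~4k at a throughput of ~1310 messages/s
    print(f"estimated wait: {queueing_delay(4000, 1310):.2f} s")
```

Under these assumptions a steady mailbox of ~4k messages corresponds to a wait of about 3 s, so the mailbox of this one actor may not account for the full ~10s on its own.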
Another thing is that a lot of the requests I send time out, with this error:
"error":"read: connection reset by peer","error_code":1220, from k6 docs: https://k6.io/docs/testing-guides/running-large-tests/#read-connection-reset-by-peer
Also I get this error, which is related to the previous one:
"error":"dial: i/o timeout","error_code":1211, from k6 docs: https://k6.io/docs/testing-guides/running-large-tests/#dial-tcp-52-18-24-222-80-i-o-timeout
In total, http_req_failed is 13.80% (53016 failed, 330985 not failed), because of these two errors.
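As a quick consistency check on these numbers, the sketch below assumes the two counts are k6's tallies of failed and non-failed requests (an assumption about how the summary line should be read), in which case the rate is failed / (failed + not failed).

```python
def failure_rate(failed: int, not_failed: int) -> float:
    """Fraction of all requests that failed."""
    return failed / (failed + not_failed)


if __name__ == "__main__":
    failed, not_failed = 53016, 330985
    total = failed + not_failed
    print(f"total requests: {total}")
    print(f"failure rate:   {failure_rate(failed, not_failed):.4f}")
    # Average achieved request rate over the 5-minute (300 s) test
    print(f"achieved rate:  {total / 300:.0f} req/s")
```

Read this way, the total is 384001 requests, which agrees with the reported 13.80% and works out to an average somewhat below the targeted 1310 requests/s.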
What I tried is different values for these Akka HTTP server properties, from https://doc.akka.io/docs/akka-http/current/configuration.html:
akka.http.server.pipelining-limit - default 1, tried 128, 512, 1024, made no difference in the results
akka.http.server.max-connections - default 1024, tried with 4096, no difference
akka.http.server.backlog - default 100, tried 200, 1000, no difference
Checking the file descriptor limits on the gateway pod with 'ulimit -a' returns the following:
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 10240
coredump(blocks) unlimited
memory(kbytes) unlimited
locked memory(kbytes) unlimited
process unlimited
nofiles 1048576
vmemory(kbytes) unlimited
locks unlimited
rtprio 0
Doing the same with 2 gateway instances shows the same results, but at twice the request rate. I feel like I am maybe missing some Akka configuration.
Attached are screenshots from the test runs and the log for a single request that took >10s.
screenshots.zip
logs.csv