Use Schedulers.single() to avoid accidental thread co-location #190
Comments
I wasn't able to reproduce any computation issue regarding the pool size. The pool itself properly creates the configured number of connections:
ConnectionPoolConfiguration configuration = ConnectionPoolConfiguration.builder(connectionFactoryMock)
    .initialSize(5)
    .maxSize(5)
    .build();
ConnectionPool pool = new ConnectionPool(configuration);
List<Connection> connections = new ArrayList<>();
for (int i = 0; i < 5; i++) {
    pool.create().as(StepVerifier::create).consumeNextWith(connections::add).verifyComplete();
}
assertThat(connections).hasSize(5);
The event loop co-location is indeed an issue. It is the default setting in Reactor Netty; this likely came from allocating channels for HTTP requests. Let me know whether I missed something regarding the sizing. |
It might have the correct size, but it only provides one connection. You can run the example attached to the Micronaut Data issue and see that the queries are run sequentially.
|
And this makes total sense: if connections are co-located on the same event-loop thread, then that single thread can only run things sequentially. If you warm up the pool (
Without warmup:
|
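A rough way to see the sequential behaviour described above, in plain Java with no R2DBC or Netty involved. This is only a sketch: the class name `ColocationDemo` and the 50 ms sleep are assumptions made for illustration. The sleep stands in for work that occupies the event-loop thread itself (decoding rows, mapping results), since a real event loop does not block while waiting for the network.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ColocationDemo {

    // Runs `queries` simulated queries on the given executor and returns the
    // highest number of queries that were in flight at the same time.
    static int maxConcurrency(ExecutorService eventLoops, int queries) {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < queries; i++) {
                futures.add(eventLoops.submit(() -> {
                    int now = inFlight.incrementAndGet();
                    max.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(50); // hypothetical per-query work on the loop thread
                    } catch (InterruptedException interrupted) {
                        Thread.currentThread().interrupt();
                    }
                    inFlight.decrementAndGet();
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // wait for every simulated query
            }
            return max.get();
        } catch (Exception e) {
            throw new IllegalStateException("simulation failed", e);
        } finally {
            eventLoops.shutdown();
        }
    }

    public static void main(String[] args) {
        // All five "connections" share one thread: queries run one after another.
        System.out.println("one thread:   " + maxConcurrency(Executors.newSingleThreadExecutor(), 5));
        // Five threads: the same queries can overlap.
        System.out.println("five threads: " + maxConcurrency(Executors.newFixedThreadPool(5), 5));
    }
}
```

With a single thread the maximum is always 1, no matter how many connections the pool holds. That matches the observation that a correctly sized pool can still behave as if it provided only one connection.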
I didn't know that the pool needs to be initialized somehow, there is no mention of |
Let me just add a thought here: |
Warmup can be a slight workaround, but as mentioned by @PiotrDuz, it's by no means a fix. In an ideal world, the pool would be event-loop aware and hand out a connection that runs on the same thread. In reality, we do not have the event loop information being exposed from the connections. Additionally, pools of that size would negatively affect database servers. Disabling colocation is the only viable solution for long-lived connections. I already reached out to the Reactor Netty team for guidance and will share it once I have more details. |
Since Reactor Netty 1.0.28, Setting |
During investigation with a customized |
What does your custom loop resource look like?
where "threads" = pool max size |
@mp911de I found a problem when calling |
@PiotrDuz there's a I additionally filed a ticket to reuse the default event loop groups with a |
What is the best way to inject that |
For the Postgres driver, either via
|
@mp911de Is there an agreed fix R2DBC Postgres users can apply to avoid this? I've had a read through this thread but have found it difficult to piece together what exactly is needed. |
@dstepanov Big thanks for spotting the area of the cause! I struggled a lot (looking all the way down to PG TCP packets) to find it, until I noticed that R2DBC is really using only a single thread, like you said.
@mp911de I can confirm I'm experiencing the same. This makes the R2DBC Postgres driver 3 times slower than the blocking JDBC driver. Setting different maxSize and initialSize values makes it faster (a very weird effect), but it is still ~1.6 times slower than JDBC.
@mp911de This is a serious, major issue that makes R2DBC drivers basically unusable for any production usage. It needs serious attention. I'm honestly surprised that the r2dbc-pool library made it to 1.0.0.RELEASE over 3 years without this being noticed by maintainers or users. The benchmarks were even already there on TechEmpower, like @dstepanov mentioned, but it's also very easy to reproduce anyway. I assume it affects not only the Postgres driver (which I was testing), since the issue is in r2dbc-pool/netty.
My Reproduction
I have a 4-core (8-thread) i7-7700K CPU. I send a single request to a WebFlux endpoint. Inside, it executes a very simple SELECT (returning 1 record) 40 000 times concurrently, i.e. I create a list of 40 000 Monos of the below:
and execute them concurrently, calculating time spent once all Monos finish:
100 maxSize for R2DBC pool. For JDBC, I do the above in a fixed thread pool, with the same size of 100 (matches the connection pool size in Hikari), with blocking Callable calls, waiting for all Futures to finish at the end. Results:
The above correlates with TPS I see in Postgres dashboard. JVM warmup was done prior. Everything on the latest version (Ubuntu, Spring, drivers, Postgres). Postgres is on a separate machine. The absolute latency values don't matter, since it will be different on each machine, only the relative difference is important. 40 000 calls are done inside the endpoint (as opposed to calling 40 000 times the endpoint), to eliminate any framework/http latencies. However, I have another test with Spring MVC (JDBC) and WebFlux (R2DBC) doing the same SELECT single time during endpoint call, and endpoints are bombarded with a benchmarking tool from another machine on the local network with 1024 concurrency. In this setup, there are 8 Spring threads and 8 reactor-tcp-epoll threads. I observe the same speeds like above (even a bit worse). As for the threads, here is the WebFlux R2DBC graph - In Spring MVC JDBC there is Hibernate on top, but it makes no difference - I've done tests without it as well, same results. All records returned from DB are unique, so Hibernate has no advantage (instead, even makes things a bit slower, probably). Setting I am able to provide the full reproducibles with .jfr's, but it's very easy to reproduce. |
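For readers trying to reproduce the blocking side of this comparison, the JDBC harness described above (fixed thread pool, blocking Callables, wait for all Futures, measure total time) could look roughly like the sketch below. It is not the code from the reproduction: `BlockingBenchmarkSketch`, `run`, and the `fakeSelect` stand-in are hypothetical names, and a real run would replace `fakeSelect` with an actual JDBC query against a pooled DataSource.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlockingBenchmarkSketch {

    // Outcome of one run: how many calls completed and how long the batch took.
    static final class Result {
        final int completed;
        final long elapsedMillis;

        Result(int completed, long elapsedMillis) {
            this.completed = completed;
            this.elapsedMillis = elapsedMillis;
        }
    }

    // Submits `calls` copies of `query` to a fixed pool of `poolSize` threads,
    // waits for every Future, and measures the total wall-clock time.
    static <T> Result run(int poolSize, int calls, Callable<T> query) {
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<T>> futures = new ArrayList<>(calls);
            long start = System.nanoTime();
            for (int i = 0; i < calls; i++) {
                futures.add(executor.submit(query));
            }
            int completed = 0;
            for (Future<T> future : futures) {
                future.get(); // blocks until this call has finished
                completed++;
            }
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            return new Result(completed, elapsedMillis);
        } catch (Exception e) {
            throw new IllegalStateException("benchmark run failed", e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a single-record SELECT over JDBC.
        Callable<Integer> fakeSelect = () -> 1;
        // 100 threads to match a connection pool size of 100, 40 000 calls in total.
        Result result = run(100, 40_000, fakeSelect);
        System.out.println(result.completed + " calls in " + result.elapsedMillis + " ms");
    }
}
```

Sizing the thread pool to the connection pool keeps every connection busy without queuing inside the pool itself, which is what makes the blocking numbers comparable to a reactive run with the same maxSize.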
@artemik Thanks for your test. In my opinion, a single thread cannot support high TPS if each job (one DB query/update) is heavy. For example, when using R2dbcRepository, the framework performs serialization/deserialization and reflection to inject values, and that is heavy work. Setting colocate to false is necessary to improve performance in some circumstances. |
Hello, So what is the fix for this problem? Having separate threads per connection socket. I have proposed a custom, simple solution. During our tests it seemed to fix the issue. A picture is worth more than words, so here is the code:
This class should be used in
@artemik can you run your tests once again, using the above class?
Of course, this is theoretically slower than a single thread handling many sockets, as we are inducing CPU context switching for each thread. But load balancing the eventLoopGroup seems too complicated to implement, and the performance cost is negligible compared to the random slowness introduced by an overloaded thread.
Regards |
@PiotrDuz Thanks, I've included it in my tests. I didn't check your class in detail, but afaik it forces R2DBC to use the specified number of threads?
@mp911de I've cleaned up and prepared the reproducible: https://github.com/artemik/r2dbc-jdbc-vertx-issue-190
It tests more than just initialSize vs maxSize: it tests JDBC vs R2DBC vs Vertx (as another example of a reactive driver based on Netty) in standalone and WebFlux environments. Different environments produce different interesting results. Please check the readme, it has all the details. I'll duplicate the key parts here.
Benchmark
There are 6 applications doing the same thing: run 500 000 SELECTs and measure the total time spent. Each SELECT returns the same (for max stability) single record by ID. All SELECTs are executed with concurrency equal to the max DB connection pool size:
Results
Note: In all R2DBC tests below, all 8 setup permutations were tested: with/without custom LoopResources; with/without
Standalone
Web App
Conclusions
Apparently, because Vertx also uses Netty but doesn't suffer from issues like R2DBC under WebFlux (and doesn't require any tricky settings), there should be a way to fix R2DBC. |
@artemik in this post: #190 (comment) Answering your question: Regards |
@PiotrDuz I used number_of_cores threads for your LoopResources, because otherwise making it equal to max connection pool size breaks the reactive sense, and in theory shouldn't give much more performance. I tried it - same performance as with number_of_cores. Anyway, to your question - I confirm your LoopResources class forces the number of threads specified, and the utilization is quite even. However:
It means that the number of threads R2DBC uses doesn't seem to matter: R2DBC is just slow with WebFlux for some unclear reason. And maybe colocation, which was originally assumed to be the cause in this ticket, is not the issue.
As for initialSize vs maxSize equality, which was also the original issue raised in this ticket, it seems to be true. Your default setup is: without custom LoopResources; without ConnectionPool.warmup(); equal initialSize and maxSize. This is exactly the case where R2DBC shows the worst performance, as per the benchmarks. The workaround is to use any other setup, for example start calling warmup, or use LoopResources, etc. Just don't use the combination "without custom LoopResources; with (!) ConnectionPool.warmup()", because surprisingly it performs the worst as well. But let me highlight here that equality of initialSize and maxSize by itself doesn't make things slow: for example, the case "without LoopResources, with warmup(), with equal sizes" works fine...
And lastly, all these workarounds give you better performance, but it is still much slower than JDBC or Vertx. So two things need to be fixed here:
|
I think your benchmarks do not address the original problem of this issue. Your tests are still helpful, as they show other problems with r2dbc. But at this point it may be worth splitting the topic in two; here we could focus on colocation forcing queries onto the same thread while other cores could be idle. Am I getting it right? |
@PiotrDuz you're right that many small queries don't clearly show how concurrently connections are used. I've added additional tests.
Results (SELECT 100 records)
Standalone
Not tested. WebFlux results show it's not needed.
Web App
Both MVC JDBC and WebFlux Vertx: 51 sec (baseline). R2DBC results are more diverse, so I provide a table specifically for it with all setups. I only tested DatabaseClient.
How to Interpret R2DBC Results
Results (Connections concurrency / Threading)
In all the tests above, it was visible on DB side monitoring that R2DBC established all
You saw 4 cases with multi-record SELECTs where R2DBC performed as fast as JDBC and Vertx, but from the single-record SELECT tests we know R2DBC is slower, so the longer DB processing time of multi-record SELECTs just helps to hide the slowness of R2DBC itself. Moreover, those multi-record SELECTs were still too fast to simulate a heavy, long query. It all means that we need a test specifically for concurrent connection usage. To do that, I just modified the single-record SELECT to select
I also included observations on threading usage. They correspond to what I saw in the single/multi-record SELECT tests as well.
Standalone and Web App
Vertx - all 100 connections were active in parallel. As for threading, it used only 1 thread, like in all previous benchmarks above as well, but that doesn't seem to cause performance issues.
R2DBC
How to Interpret R2DBC Results
So to summarize:
|
@artemik Would you also mind including Micronaut V4 (latest milestone or RC) in the benchmark to see how it behaves vs. WebFlux? We may need some optimization too. |
Avoid co-location of event loops within the pool by default for drivers that use Reactor Netty with activated co-location. [resolves #190] Signed-off-by: Mark Paluch <[email protected]>
After a few iterations, the best we can do here is using a dedicated thread pool for subscribing (
There is no good place to solve this issue, as Reactor Netty wants to keep their colocated default. R2DBC Pool isn't opinionated about the underlying driver nor its threading model, yet it doesn't make sense to have connections with colocated threads in a pool, so I guess we're the ones now compensating for a default that only makes sense in certain client-request scenarios. |
@mp911de I think performance results (before/after) need to be provided from you here, before officially closing this issue. P.S. @dstepanov I'll try checking Micronaut, if I have time. |
@mp911de Once we pick up the pool parallelism changes from reactor-pool, does it make sense to add this parallelism as a config property to ConnectionPoolConfiguration? |
It makes sense to do something. Can you file a new ticket? |
It is probably time to replace my custom solution with an out-of-the-box one! My custom fix was:
|
Hi @PiotrDuz, @mp911de, while I understand that the cause of the low performance (or one of the causes) is thread colocation, any idea why I didn't see any performance degradation in the standalone driver tests (exactly the same tests, just without WebFlux)? Why does it become slow only when the series of queries is launched from a WebFlux thread? What kind of relation between r2dbc-pool and WebFlux might be causing that? |
Colocation takes effect if a connection is obtained from a process that runs on a thread which is part of the colocated event loop. Since WebFlux and most drivers use the same default event loop, any WebFlux thread requesting a database connection will create a connection that uses the request thread. Does that make sense? |
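The selection rule described in the previous comment can be modelled in a few lines of plain Java. This is a simplified sketch, not Reactor Netty's implementation: `ColocatedSelectionSketch`, `pickLoop`, and `distinctLoops` are hypothetical names, and each single-threaded executor merely stands in for one event-loop thread. The model assumes that a caller running on a loop thread gets its own loop back, while any other caller gets the next loop in round-robin order.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ColocatedSelectionSketch {

    private final ExecutorService[] loops; // each executor models one event-loop thread
    private final Thread[] loopThreads;
    private int cursor;                    // round-robin position, guarded by `this`

    ColocatedSelectionSketch(int size) {
        loops = new ExecutorService[size];
        loopThreads = new Thread[size];
        for (int i = 0; i < size; i++) {
            loops[i] = daemonExecutor();
            loopThreads[i] = call(loops[i], Thread::currentThread);
        }
    }

    // Picks the loop for a new connection: the caller's own loop if the caller
    // is a loop thread (co-location), otherwise the next loop in round-robin order.
    synchronized int pickLoop() {
        Thread caller = Thread.currentThread();
        for (int i = 0; i < loopThreads.length; i++) {
            if (loopThreads[i] == caller) {
                return i;
            }
        }
        int chosen = cursor;
        cursor = (cursor + 1) % loops.length;
        return chosen;
    }

    // Opens `connections` simulated connections, either from loop thread 0 or from
    // an unrelated thread, and returns how many distinct loops ended up being used.
    static int distinctLoops(int loopCount, int connections, boolean callFromLoopThread) {
        ColocatedSelectionSketch group = new ColocatedSelectionSketch(loopCount);
        ExecutorService caller = callFromLoopThread ? group.loops[0] : daemonExecutor();
        Set<Integer> used = call(caller, () -> {
            Set<Integer> seen = new HashSet<>();
            for (int i = 0; i < connections; i++) {
                seen.add(group.pickLoop());
            }
            return seen;
        });
        return used.size();
    }

    private static ExecutorService daemonExecutor() {
        return Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable);
            thread.setDaemon(true); // do not keep the JVM alive after main returns
            return thread;
        });
    }

    private static <T> T call(ExecutorService executor, Callable<T> task) {
        try {
            return executor.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("caller is a loop thread: " + distinctLoops(8, 100, true) + " loop(s) used");
        System.out.println("caller is outside:       " + distinctLoops(8, 100, false) + " loop(s) used");
    }
}
```

In this model, 100 connections requested from a loop thread all land on that one loop, while the same 100 requested from an unrelated thread spread over all 8. Under the same assumption, subscribing on a dedicated scheduler thread that is not part of the event loop group sidesteps co-location, and a standalone program that requests connections from its main thread would not trigger it either.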
@mp911de, "any WebFlux thread requesting a database connection will create a connection that uses the request thread" - From what I saw, the threads were created new ( Thanks for clarification. |
Though @mp911de if colocation is default behaviour, I'm still confused why it doesn't happen in standalone environment, wouldn't we similarly have 8 threads handling all 100 connections (them being colocated as well)? |
Bug Report
The issue was originally reported here micronaut-projects/micronaut-data#2136
I did debug a bit. It looks like some calculations aren't correct when min and max are the same: the pool only allows one connection. The example project has a reproducible example; setting maxSize bigger than initialSize makes the connections run in parallel. This also explains some bad performance in TechEmpower/FrameworkBenchmarks, because I configured both settings with the same value.