
High latency, increasing with higher load #4652

Open
ThomasVerhoeven1998 opened this issue Sep 19, 2024 · 8 comments
ThomasVerhoeven1998 commented Sep 19, 2024

We are experiencing high latency with ProxySQL compared to without it, and the issue becomes more pronounced under increased load. While ProxySQL is primarily used for failover in our setup (no load balancing or query rewriting), we see significant latency spikes, especially when handling more connections.

ProxySQL Version: 2.6.5 (we also tried older versions, with no significant difference)

Environment:

We have 3 MySQL servers configured (using fake mysql server names in this post):

  • sql1: Weight 150 (primary server in closest datacenter)
  • sql2: Weight 100
  • sql3: Weight 1

We use this setup to ensure ProxySQL prioritizes sql1, and fails over to sql2 or sql3 when needed.

The latency overhead is visible even at low traffic, but it spikes considerably as the load increases. We're seeing suboptimal performance despite our expectation of minimal impact from ProxySQL, as it's only being used for failover purposes.

ProxySQL configuration

query_cache_size_MB: 256
max_connections: 2048
default_query_delay: 0
default_query_timeout: 36000000
have_compress: true
poll_timeout: 2000
interfaces: "0.0.0.0:6033"
default_schema: "information_schema"
stacksize: 1048576
server_version: "8.0.20"
connect_timeout_server: 3000
monitor_history: 7200000
monitor_connect_interval: 5000
monitor_ping_interval: 2000
monitor_read_only_interval: 1500
monitor_read_only_timeout: 500
monitor_enabled: true
ping_interval_server_msec: 15000
ping_timeout_server: 500
commands_stats: true
sessions_sort: true
connect_retries_on_failure: 10
use_tcp_keepalive: true
tcp_keepalive_time: 120
set_parser_algorithm: 2
enable_server_deprecate_eof: 0
enable_client_deprecate_eof: 0
multiplexing: false
threads: 16
default_tx_isolation: "REPEATABLE-READ"
autocommit_false_is_transaction: true
eventslog_default_log: 0
have_ssl: true
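For reference, variables like the ones above are typically applied at runtime through ProxySQL's admin interface. A minimal sketch (the admin credentials and host below are placeholders, not values from this setup):

```shell
# Hypothetical sketch: changing runtime variables via the ProxySQL admin
# interface (default port 6032); credentials/host are placeholders.
mysql -u admin -padmin -h 127.0.0.1 -P 6032 <<'SQL'
SET mysql-poll_timeout = 2000;
SET mysql-multiplexing = 'false';
-- Push the new values to the running configuration, then persist them.
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;
SQL
```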

Hostgroup configuration

Hostgroup 0

| hid | hostname | port | gtid | weight | status | cmp | max_conns | max_lag | ssl | max_lat | comment |
| --- | -------- | ---- | ---- | ------ | ------ | --- | --------- | ------- | --- | ------- | ------- |
| 0   | sql1     | 3306 | 0    | 150    | 0      | 0   | 5000      | 0       | 1   | 0       |         |
| 0   | sql2     | 3306 | 0    | 100    | 1      | 0   | 5000      | 0       | 1   | 0       |         |
| 0   | sql3     | 3306 | 0    | 1      | 1      | 0   | 5000      | 0       | 1   | 0       |         |

Hostgroup 1 is empty

Hostgroup 2:

| hid | hostname | port | gtid | weight | status | cmp | max_conns | max_lag | ssl | max_lat | comment |
| --- | -------- | ---- | ---- | ------ | ------ | --- | --------- | ------- | --- | ------- | ------- |
| 2   | sql2     | 3306 | 0    | 100    | 0      | 0   | 5000      | 0       | 1   | 0       |         |
| 2   | sql3     | 3306 | 0    | 1      | 0      | 0   | 5000      | 0       | 1   | 0       |         |

Hostgroup 3:

| hid | hostname | port | gtid | weight | status | cmp | max_conns | max_lag | ssl | max_lat | comment |
| --- | -------- | ---- | ---- | ------ | ------ | --- | --------- | ------- | --- | ------- | ------- |
| 3   | sql1     | 3306 | 0    | 150    | 3      | 0   | 5000      | 0       | 1   | 0       |         |

We use fast_forward for our debezium user; for regular traffic, fast_forward is disabled.
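A hedged sketch of how per-user fast_forward is typically enabled through the admin interface (admin credentials and host are placeholders):

```shell
# Hypothetical sketch: enable fast_forward for a single user via the
# ProxySQL admin interface; credentials/host below are placeholders.
mysql -u admin -padmin -h 127.0.0.1 -P 6032 <<'SQL'
UPDATE mysql_users SET fast_forward = 1 WHERE username = 'debezium';
-- Activate and persist the updated user definition.
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;
SQL
```

With fast_forward enabled, ProxySQL passes the connection through without running the query processor, which is why it is commonly used for replication-style clients such as Debezium.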

Expected Behavior:

ProxySQL should maintain low latency, without noticeable performance degradation when the load increases.

Actual Behavior:

ProxySQL shows high latency, and as the number of connections grows and the load increases, the latency increases significantly.
Performance degradation is consistent even with basic failover functionality and no query rewriting or load balancing configured.

Steps to Reproduce:

1. Configure ProxySQL 2.6.5 with the provided settings.
2. Set up 3 MySQL servers with the described hostgroup configuration.
3. Simulate increasing load on the ProxySQL instance.

Observations

Queries are twice as slow with ProxySQL as the load increases. We have strict SLAs and a high volume of queries, so the performance drop directly impacts our SLA measurements.

Query durations without ProxySQL vs load (a higher producer count means higher load): [screenshot]

Query durations with ProxySQL vs load (a higher producer count means higher load): [screenshot]

Memory (% of 2 GB used) and CPU (number of cores used) usage during the tests: [screenshot]

Is there anything you would suggest investigating to understand why the query durations almost double in the higher-load scenarios? And what should the expected latency be?

Thanks for your consideration

@ThomasVerhoeven1998 ThomasVerhoeven1998 changed the title High Latency in ProxySQL 2.6.5, Increasing with higher load High latency, increasing with higher load Sep 19, 2024
@renecannao
Contributor

Hi @ThomasVerhoeven1998 ,

Can you please describe the network topology?
Both heatmaps show a spike at the beginning that is completely off the chart compared to the rest of the traffic: I guess there is some initialization going on at that time, and you should probably exclude it from the latency measurements.

@ThomasVerhoeven1998
Author

ThomasVerhoeven1998 commented Sep 20, 2024

Hi,

I think the heatmaps are inaccurate in the small overview; setup is excluded from the measurements. During the tests we do the same thing over and over again, just at a higher producer rate in the different scenarios.

Heatmap without ProxySQL: [screenshot]

Heatmap with ProxySQL: [screenshot]

I will also get back to you with some more details on the network topology.

@ThomasVerhoeven1998
Author

@renecannao What information about the network topology are you looking for?

@renecannao
Contributor

> What information about the network topology are you looking for?

I assume that the app, ProxySQL, and the backend are on 3 different network devices (hardware), hence the question about the latency between the three.
It is fair to assume that the network latency between app and backend is half the network latency of app to ProxySQL to backend and back.

That said, on the heatmap the difference doesn't look like double.

@ThomasVerhoeven1998
Author

ThomasVerhoeven1998 commented Sep 20, 2024

To answer the question about network topology:

They are all in the same VLAN, so the latency between those machines should be in the range of milliseconds...

Applications and ProxySQL:

All of our applications and the ProxySQL deployment are located in the same datacenter. While the applications and ProxySQL may be in different zones within the datacenter, they share the same physical location and network, ensuring low-latency communication.

MySQL Instances:

  • sql1 and sql2: Both these MySQL instances are in the same datacenter as the applications and ProxySQL. There are no firewalls between ProxySQL and the MySQL instances, allowing for direct communication.
  • sql3: This instance resides in a separate datacenter and is only used by ProxySQL in case of a disaster recovery (DR) scenario, i.e., when both sql1 and sql2 are unavailable. sql3 is not directly used under normal conditions.

@ThomasVerhoeven1998
Author

ThomasVerhoeven1998 commented Sep 20, 2024

I also added the AVG query durations, and there it is much clearer; I focused on one scenario. It is not the heatmap values that doubled but the P90 values.

Without ProxySQL: [screenshot]

With ProxySQL: [screenshot]

@renecannao
Contributor

You're experiencing increased latency when connecting to the database through ProxySQL, and that's expected behavior for any proxy server. Here's why:

ProxySQL introduces an extra 'hop' in the network communication path. Instead of your application directly connecting to the database, it now connects to ProxySQL first, which then forwards the request to the database. This means the data has to travel an extra distance, both to get to the database and to return to your application. This inherently adds latency, at least doubling the network travel time in the best-case scenario.

To make matters a bit more complex, your infrastructure uses randomly assigned zones within the datacenter for application servers, ProxySQL instances, and database servers.

  • Without ProxySQL: It's possible your application and database server were in the same zone, leading to minimal network latency.
  • With ProxySQL: Now, your application could be in one zone, ProxySQL in another, and the database in yet another. This increases the likelihood of traversing longer network paths within the datacenter, further impacting latency.
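The extra-hop arithmetic above can be made concrete with a toy calculation. All latency figures below are hypothetical one-way times in milliseconds, chosen only to illustrate the worst case where the proxy doubles the path length:

```shell
# Toy model: round-trip time (RTT) direct vs through a proxy.
# Hypothetical one-way latencies (ms): app<->db = 0.5,
# app<->proxy = 0.5, proxy<->db = 0.5.
direct=$(awk 'BEGIN { printf "%.1f", 2 * 0.5 }')           # app -> db -> app
proxied=$(awk 'BEGIN { printf "%.1f", 2 * (0.5 + 0.5) }')  # app -> proxy -> db -> back
echo "direct RTT: ${direct} ms, proxied RTT: ${proxied} ms"
```

If ProxySQL instead sits close to either the app or the database, the app-to-proxy (or proxy-to-db) term shrinks toward zero and the proxied RTT approaches the direct RTT.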

Therefore, it's completely normal to see an increase in latency when using ProxySQL. You need to pay special attention to the network topology and how your servers are distributed across zones to potentially mitigate this impact. You might explore options like co-locating application servers, ProxySQL, and databases within the same zone, or using a more optimized network path to reduce the distance data needs to travel.

However, it's crucial not to focus solely on latency. ProxySQL offers numerous features that can significantly improve overall database performance and reliability. I won't list them all because they are not relevant to this issue, but it is worth mentioning that ProxySQL's connection pooling and management can optimize connection handling, potentially reducing latency for your application by reusing existing connections.
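One way to observe the connection pool's behavior, including the latency ProxySQL itself measures toward each backend, is the standard stats table in the admin schema (the invocation below is a hypothetical sketch; credentials/host are placeholders):

```shell
# Hypothetical diagnostic: inspect per-backend connection pool state and
# ProxySQL's measured backend latency via the admin interface (port 6032).
mysql -u admin -padmin -h 127.0.0.1 -P 6032 -e "
SELECT hostgroup, srv_host, status, ConnUsed, ConnFree, Latency_us
FROM stats_mysql_connection_pool;"
```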

The impact of ProxySQL on latency and throughput can vary depending on the specific application and its interaction with the database, as well as the configuration of ProxySQL. In many scenarios, the improved connection management, load balancing, and query optimization can lead to increased throughput and even a reduction in latency.

Therefore, while the extra network hop can introduce some latency, the benefits that ProxySQL offers can outweigh this cost in many situations. You need to carefully consider the application's needs and configure ProxySQL appropriately to optimize performance for your specific use case.

@ThomasVerhoeven1998
Author

ThomasVerhoeven1998 commented Sep 20, 2024

Thanks for your detailed clarification. We just hoped we had some misconfiguration or something like that, because in the past we used an F5 as proxy and the latency was a lot lower. We have tried numerous options and config variables, but nothing comes close to the F5 performance. Currently we only want the failover. We could also look into the multiplexing feature, but we didn't because multiplexing and Hibernate don't play that well together, and we also saw issues in the past when using autocommit=0.

We will investigate whether there are other things we might be able to change. Latency is an important factor in our architecture, which is why we are focusing on it so heavily.
