Skip to content

ProxySQL Cluster

René Cannaò edited this page Sep 5, 2017 · 12 revisions

Those features are EXPERIMENTAL and subject to changes, especially because not all the features in the roadmap are implemented yet.

Preface

ProxySQL is a decentralized proxy, and it is normally advised to deploy it close to the application. This approach seems to scale pretty well to hundred(s) of nodes, as it was designed to be easily reconfigurable at runtime.
This allows to coordinate and reconfigure a farm of ProxySQL instances using a configuration management software like Ansible/Chef/Puppet/Salt (in alphabetical order), or a service discovery software like Etcd/Consul/ZooKeeper.
This allows ProxySQL to be highly customizable, and be adopted in any setup using any of these technologies, or even re-configurable by home-made tools.
Although it has some drawback too:

  • it requires and relies on external software (configuration management software itself)
  • the previous point means also that such approach it is not natively supported
  • converge time it is not predictable
  • there is no protection against network split

For this reason, ProxySQL 1.4.x attempts to support clustering natively. As pointed already, these features are EXPERIMENTAL and subject to changes, especially because not all the features in the roadmap are implemented yet.

Currently there are two main components in the ProxySQL clustering solution:

  • monitoring
  • re-configuration

Both components (monitoring and remote reconfiguration) are available for 4 tables:

  • mysql_query_rules
  • mysql_servers
  • mysql_users
  • proxysql_servers

More will be added in future, see roadmap.

Monitoring

To support Cluster Monitoring, several new tables, commands and variables were introduced.

Admin variables

Several new variables were added related to the Cluster solution. They are all Admin's variables, that means that to load them the command LOAD ADMIN VARIABLES TO RUNTIME is needed.

Variables that define what to sync:

  • admin-checksum_mysql_query_rules: boolean variable. When true (default) ProxySQL generates a new configuration checksum every time LOAD MYSQL QUERY RULES TO RUNTIME is executed. The new configuration isn't automatically propagated if variable is set to false;
  • admin-checksum_mysql_servers: boolean variable. When true (default) ProxySQL generates a new configuration checksum every time LOAD MYSQL SERVERS TO RUNTIME is executed. The new configuration isn't automatically propagated if variable is set to false;
  • admin-checksum_mysql_users: boolean variable. When true (default) ProxySQL generates a new configuration checksum every time LOAD MYSQL USERS TO RUNTIME is executed. The new configuration isn't automatically propagated if variable is set to false. If you have millions of users, disable this feature and do not rely on it, as it may be very slow;

Variables that define credentials:

  • admin-cluster_username and admin-cluster_password: to monitor other proxysql instances this credential is used. Note that the pair username/password should also be present in admin-admin_credentials, or connection will fail. If admin-cluster_username is not defined, Clustering doesn't perform any check;

Variables that define checks interval/frequency:

  • admin-cluster_check_interval_ms : this variable defines the interval between checksums checks.
    default: 1000. Min: 10 , Max: 300000
  • admin-cluster_check_status_frequency : if greater than 0, this variable defines after how many checksums checks a status check is performed.
    default: 10. Min: 0 , Max: 10000

After a remote sync, it is normally a good idea to immediately save to disk the new change. In this way, after a restart the configuration will be already in sync. Variables related to sync to disk:

  • admin-cluster_mysql_query_rules_save_to_disk: boolean variable. When true (default) after a remote sync and load to runtime, the new mysql query rules are also saved to disk;
  • admin-cluster_mysql_servers_save_to_disk: boolean variable. When true (default) after a remote sync and load to runtime, the new mysql servers are also saved to disk;
  • admin-cluster_mysql_users_save_to_disk: boolean variable. When true (default) after a remote sync and load to runtime, the new mysql users are also saved to disk;
  • admin-cluster_proxysql_servers_save_to_disk: boolean variable. When true (default) after a remote sync and load to runtime, the new proxysql servers are also saved to disk;

It is possible that multiple ProxSQL instances are being reconfigured at the same time, for different reasons.
For example it is possible that each ProxySQL instance is monitoring a MySQL replication topology and automatically detecting a failover, and within a short period of time (probably less than a second) they will all converge to the same configuration without the need of synchronize with each other.
Similarly, it is possible that a temporary network issue or a slow MySQL instance is detected by all proxies that will automatically shun the node. All proxies will take the same action without the need to synchronize with each other.
Or, as a final example, if a slave is lagging and automatically shunned because of replication lag, all proxies will take the same action independently from each other.
For this reason, ProxySQL Cluster can be configured to not synchronize immediately with a remote node, but to wait a certain number of checks before triggering a remote synchronization. If after such threshold the local and the remote configuration is still different, a synchronization is triggered:

  • admin-cluster_mysql_query_rules_diffs_before_sync: defines how many mismatching checks triggers the synchronization of mysql_query_rules
    default: 3. Min: 0 (never sync). Max: 1000
  • admin-cluster_mysql_servers_diffs_before_sync: defines how many mismatching checks triggers the synchronization of mysql_servers
    default: 3. Min: 0 (never sync). Max: 1000
  • admin-cluster_mysql_users_diffs_before_sync: defines how many mismatching checks triggers the synchronization of mysql_users
    default: 3. Min: 0 (never sync). Max: 1000
  • admin-cluster_proxysql_servers_diffs_before_sync: defines how many mismatching checks triggers the synchronization of proxysql_servers
    default: 3. Min: 0 (never sync). Max: 1000

Configuration tables

Table proxysql_servers

Table definition:

CREATE TABLE proxysql_servers (
    hostname VARCHAR NOT NULL,
    port INT NOT NULL DEFAULT 6032,
    weight INT CHECK (weight >= 0) NOT NULL DEFAULT 0,
    comment VARCHAR NOT NULL DEFAULT '',
    PRIMARY KEY (hostname, port) )

This table is a configuration table, and defines a list of ProxySQL peers.

  • hostname : peer's hostname/IP
  • port : peer's port
  • weight : currently unused, but in the roadmap for future enhancements
  • comment : free form comment field
Support for config file

Entries for proxysql_servers can be loaded form configuration file.

Below an example of how to cofigure proxysql_servers from config file:

proxysql_servers =
(
    {
        hostname="172.16.0.101"
        port=6032
        weight=0
        comment="proxysql1"
    },
    {
        hostname="172.16.0.102"
        port=6032
        weight=0
        comment="proxysql2"
    }
)

Note: ProxySQL reads from config file only if the database file doesn't exist, or if executed with --initial

  • configuration file doesn't support this table yet!
  • because this feature is still experimental, the table is not automatically loaded from disk

Table runtime_proxysql_servers

Table definition:

CREATE TABLE runtime_proxysql_servers (
    hostname VARCHAR NOT NULL,
    port INT NOT NULL DEFAULT 6032,
    weight INT CHECK (weight >= 0) NOT NULL DEFAULT 0,
    comment VARCHAR NOT NULL DEFAULT '',
    PRIMARY KEY (hostname, port) )

Like other runtime_ tables, this is a runtime representation of the base table: proxysql_servers .

Table runtime_checksums_values

Table definition:

CREATE TABLE runtime_checksums_values (
    name VARCHAR NOT NULL,
    version INT NOT NULL,
    epoch INT NOT NULL,
    checksum VARCHAR NOT NULL,
    PRIMARY KEY (name))

Table runtime_checksums_values is the first runtime_ table that is not the runtime representation of a base table. Table runtime_checksums_values shows information of when a LOAD TO RUNTIME command was executed:

  • name : name of the module
  • version : how many times LOAD TO RUNTIME was executed, either explicitly or not (executed internal due to some other event)
  • epoch : timestamp of when LOAD TO RUNTIME was executed
  • checksum : the checksum of the internal memory structure resulting from LOAD TO RUNTIME

Example:

Admin> SELECT * FROM runtime_checksums_values;
+-------------------+---------+------------+--------------------+
| name              | version | epoch      | checksum           |
+-------------------+---------+------------+--------------------+
| admin_variables   | 0       | 0          |                    |
| mysql_query_rules | 5       | 1503442167 | 0xD3BD702F8E759B1E |
| mysql_servers     | 1       | 1503440533 | 0x6F8CEF0F4BD6456E |
| mysql_users       | 1       | 1503440533 | 0xF8BDF26C65A70AC5 |
| mysql_variables   | 0       | 0          |                    |
| proxysql_servers  | 2       | 1503442214 | 0x89768E27E4931C87 |
+-------------------+---------+------------+--------------------+
6 rows in set (0,00 sec)

Note:
Only 4 of the 6 modules generate a checksum for now.

  • LOAD MYSQL QUERY RULES TO RUNTIME: generates a new checksum if admin-checksum_mysql_query_rules is true
  • LOAD MYSQL SERVERS TO RUNTIME: generates a new checksum if admin-checksum_mysql_servers is true
  • LOAD MYSQL USERS TO RUNTIME: generates a new checksum if admin-checksum_mysql_users is true
  • LOAD PROXYSQL SERVERS TO RUNTIME: generates a new checksum, always
  • LOAD ADMIN VARIABLES TO RUNTIME: does NOT generate a checksum yet
  • LOAD MYSQL VARIABLES TO RUNTIME: does NOT generate a checksum yet

New commands:

  • LOAD PROXYSQL SERVERS FROM MEMORY / LOAD PROXYSQL SERVERS TO RUNTIME
    loads ProxySQL servers from the in-memory database to the runtime data structures
  • SAVE PROXYSQL SERVERS TO MEMORY / SAVE PROXYSQL SERVERS FROM RUNTIME
    persists the ProxySQL Servers from the runtime data structures to the in-memory database
  • LOAD PROXYSQL SERVERS TO MEMORY / LOAD PROXYSQL SERVERS FROM DISK
    loads ProxySQL Servers from the on-disk database to the in-memory database
  • LOAD PROXYSQL SERVERS FROM CONFIG
    loads ProxySQL Servers from configuration file to the in-memory database
  • SAVE PROXYSQL SERVERS FROM MEMORY / SAVE PROXYSQL SERVERS TO DISK
    persists the ProxySQL Servers from the in-memory database to the on-disk database

stats tables

3 new tables were added in stats schema

Table stats_proxysql_servers_checksums

Table definition:

Admin> SHOW CREATE TABLE stats.stats_proxysql_servers_checksums\G
*************************** 1. row ***************************
       table: stats_proxysql_servers_checksums
Create Table: CREATE TABLE stats_proxysql_servers_checksums (
    hostname VARCHAR NOT NULL,
    port INT NOT NULL DEFAULT 6032,
    name VARCHAR NOT NULL,
    version INT NOT NULL,
    epoch INT NOT NULL,
    checksum VARCHAR NOT NULL,
    changed_at INT NOT NULL,
    updated_at INT NOT NULL,
    diff_check INT NOT NULL,
    PRIMARY KEY (hostname, port, name) )
1 row in set (0,00 sec)

This table shows the checksum of other proxies, and their status:

  • hostname : hostname of the peer
  • port : port of the peer
  • name : name of the module as reported in peer's runtime_checksums_values
  • version : version of checksum's module as reported in peer's runtime_checksums_values.
    Note that a ProxySQL instance just started will have version=1: for this reason, a ProxySQL instance will never sync from another instance having version=1, because it is unlikely that a ProxyQL instance just started is the source of truth. This prevent a new joining node to corrupt current Cluster configuration.
  • epoch : epoch of the checksum's module as reported in peer's runtime_checksums_values
  • checksum : the checksum's module as reported in peer's runtime_checksums_values
  • changed_at : the timestamp of when a checksum change was detected
  • updated_at : the timestamp of when this entry was last refreshed
  • diff_check : a counter that defines for how many checks the checksum of the remote peer's was different than the local checksum. The reconfiguration algorithm will wait a threshold to be reached before triggering a reconfiguration.
    This is useful in case the same configuration is applied to multiple proxies at the same time, or when proxies are reconfiguring themselves in case of a failover and they will likely converge without the need of resync.
    See also variables cluster_*_diffs_before_sync
    If diff_check increases a lot without triggering a synchronization it means that the remote peer is not a reliable source of truth, for example if version=1.
    On the other hand, it the remote peer doesn't sync with the rest of the cluster it means that the cluster doesn't have a reliable source of truth. This happen when all the proxies in a cluster starts with a different configuration, and they can't automatically decide which is the correct configuration. Running LOAD module TO RUNTIME on one of the node will automatically "elect" it to become the source of truth for that specific module.

Table stats_proxysql_servers_metrics

Table definition:

Admin> SHOW CREATE TABLE stats.stats_proxysql_servers_metrics\G
*************************** 1. row ***************************
       table: stats_proxysql_servers_metrics
Create Table: CREATE TABLE stats_proxysql_servers_metrics (
    hostname VARCHAR NOT NULL,
    port INT NOT NULL DEFAULT 6032,
    weight INT CHECK (weight >= 0) NOT NULL DEFAULT 0,
    comment VARCHAR NOT NULL DEFAULT '',
    response_time_ms INT NOT NULL,
    Uptime_s INT NOT NULL,
    last_check_ms INT NOT NULL,
    Queries INT NOT NULL,
    Client_Connections_connected INT NOT NULL,
    Client_Connections_created INT NOT NULL,
    PRIMARY KEY (hostname, port) )
1 row in set (0,00 sec)

This table shows some of the metrics that are retrieved when the clustering module executes SHOW MYSQL STATUS in its peers. Columns:

  • hostname : hostname of the peer
  • port : port of the peer
  • weight : same as reported in proxysql_servers.weight
  • comment : same as reported in proxysql_servers.comment
  • response_time_ms : response time while running SHOW MYSQL STATUS, in millisecond
  • Uptime_s : peer's uptime in second
  • last_check_ms : age of the last time a check was executed, in millisecond
  • Queries : number of queries executed by the peer
  • Client_Connections_connected : number of client's connections connected
  • Client_Connections_created : number of client's connections created

Note:
All the status variables are retrieved by the peers, but only few are monitored to be able to check if the peer is up and running and processing traffic.
Currently this feature is useful only for debugging purpose, but future versions will use these metrics to understand the health of remote peers.

Table stats_proxysql_servers_status

Currently unused

Bandwidth consideration

In the architecture described above, all nodes monitor all the other nodes. A fully mesh peer-to-peer network.
To reduce network usage, nodes do not always exchange the whole list of checksum: instead the exchange a single checksum resulting from combining all the versions and all the checksums. It this global checksum changed, a detailed list of checksums is retrieved.
Using this technique, a 200 nodes cluster monitoring each other every 1000ms, requires a bandwidth of 50KBpb in/out to/from each node.

Re-configuration

Becauses proxies monitor each others, they can immediately know when a checksum of a confguration changed, that means that the configuration itself changes. If a configuration changed, it is checked against its own configuration because it is possible that the remote peer's configuration and its own configuration have changed at the same time, or within a short period of time. If they differ:

  • if the own version is 1 , find the peer with version > 1 and with the highest epoch, and sync immediately
  • if the own version is greater than 1, starts counting for how many checks they differ
    • when the number of checks in which they differ is greater than cluster__name___diffs_before_sync and cluster__name__diffs_before_sync itself is greater than 0, find the peer with version > 1 and with the highest epoch, and sync immediately (Note: it is possible that a different is detected against a node, but the sync is performed against a different node. Because the sync is done with the node with the highest epoch, it is expected that all the nodes will converge)

The syncing process is performed as follow:

  • the same connection used to perform the health check is used to execute a series of SELECT statements in the form of SELECT _list_of_columns_ FROM runtime_module . For example:
SELECT hostgroup_id, hostname, port, status, weight, compression, max_connections, max_replication_lag, use_ssl, max_latency_ms, comment FROM runtime_mysql_servers;
SELECT writer_hostgroup, reader_hostgroup, comment FROM runtime_mysql_replication_hostgroups;
  • delete the local configuration tables. For example:
DELETE FROM mysql_servers;
DELETE FROM mysql_replication_hostgroup;
  • insert into the local configuration tables what retrieved from remote peer
  • issue an internal LOAD module_name TO RUNTIME : this will increase version number and create a new checksum
  • if cluster__name__save_to_disk is true, issue an internal SAVE module_name TO DISK

TODO

  • add support for MySQL Group Replication
  • add support for Scheduler

Roadmap

This is an overview of the features related to clustering, and not a complete list. None the following list is impletemented yet.
Implementation may be different than what listed right now:

  • support for master election: the word master was intentionally chosen instead of leader
  • only master proxy is writable/configurable
  • implementation of MySQL-like replication from master to slaves, allowing to push configuration in real-time instead pulling it
  • implementation of MySQL-like replication from master to candidate-masters
  • implementation of MySQL-like replication from candidate-masters to slaves
  • creation of a quorum with only candidate-masters: normal slaves are not part of the quorum

Q&A

What if a different configuration is loaded at the same time on each of the proxysql servers, which configuration is the one that needs to be "propagated" to all other nodes ? The last one?

The concept of master and master election is not implemented yet. That means that a LOAD command can be potentially be executed on multiple nodes at the same time (multi-master, to make some analogy), and each will trigger an automatic reconfiguration with timestamp based conflict-resolution.
If the same configuration is loaded at the same time on multiple proxysql instances, they should automatically converge.
If different configurations are loaded on multiple proxysql instances at different times, the last one will win.
If different configurations are loaded on multiple proxysql instances at the same time, the two configurations will start propagating till the point in which they won't converge as conflict resolution is not possible.
The good thing is that each ProxySQL knows the checksum of configuration of every other node, so mismatches are easy to detect and monitor.

who is writing this configuration to all those nodes?

Currently a pull mechanism is used, therefore the node that detects it needs to reconfigure itself will pull the configuration from the node with the most up-to-date configuration and apply it locally.

How are you going to implement election ? Raft consensus protocol ?

Implementation of election is in the roadmap, but probably not Raft consensus protocol.
ProxySQL uses tables to store configuration, it uses the MySQL protocol to perform requests to its peers querying their health and their configuration, it uses the MySQL protocol to implement heatbeat and much more: for these reasons, in the case of ProxySQL, the MySQL protocol itself might be a more versatile solution compared to Raft protocol.

What will happen if for some reason one of the nodes will be unable to grab the new configuration in an event of re-configuration?

Changes are propagated asynchronously. Therefore it is possible that one of nodes is not able to grab the new configuration, for example in case of network issue of ProxySQL being restarted. Yet, when a ProxySQL instance detects that one of its peers has a newer configuration, it will automatically grab it.

What about crossdc ? what will be the best practice , having a cluster in each DC?

Cluster do not have boundaries, therefore it is possible to have a single cluster across multiple DCs, or to have multiple clusters in the same DC, or multiple clusters across multiple DCs. This really depends from the specific use case.
The only limitation is that each proxysql instance needs to belong to a single cluster.
Clusters do not have names, and to make sure that a node doesn't erroneously join the wrong cluster it is important to ensure that each cluster uses different credentials. See admin-admin_credentials, admin-cluster_username and admin-cluster_password.

Could be a nice feature to somehow replicate the configuration crossdc but prefer traffic to the backend server that is closest to the local proxysql server. I am doing it now using weight.

For this specific case I think it makes more sense to create a different cluster for each DC, as configuration will be different.

How is a new proxysql going to join the cluster ?

Bootstrap is very easy: starts with at least 1 peer in proxysql_servers .

How will all other proxysql server know there is a new node ?

They do not know it automatically, and this is intentional to prevent that a new node may corrupt the cluster.
In other words, a new node can pull the configuration as soon as it joins, but cannot advertise itself as the source of truth.
To let the other proxysql instances know that there is a new node, it is enough to add the new node in proxysql_servers of any node of the current cluster and issue LOAD PROXYSQL SERVERS TO RUNTIME .

Clone this wiki locally