
rabbit_peer_discovery: Fixes and improvements for Consul and etcd (backport #11045) #11647

Merged

10 commits merged into v3.13.x from mergify/bp/v3.13.x/pr-11045 on Jul 10, 2024

Conversation


mergify[bot] commented Jul 9, 2024

Why

Node registration

After the rework of the peer discovery subsystem in RabbitMQ 3.13.0 to fix cases where clusters would be incorrectly initialized, there were regressions in the Consul and etcd backends causing nodes to hang during peer discovery and eventually boot as standalone nodes.

For Consul and etcd, the discovery process relies on node registration: these backends return whatever was previously registered. With the new checks and failsafes added in RabbitMQ 3.13.0, the fact that registration happens after discovery runs breaks these backends.

It used to work before because the first node would eventually time out waiting for a non-empty list of nodes from the backend and proceed as a standalone node, registering itself on the way. Following nodes would then discover that first node.

Among the new checks, the node running discovery expects to find itself in the list of discovered nodes. Because it hasn't registered yet, it never finds itself and rejects the peer discovery results.

Node selection

In addition to that, RabbitMQ 3.13 introduced new node sorting to make node selection deterministic. However, with Consul, the order in which nodes registered should be taken into account, because registration may happen in a different order than the one in which the nodes booted. Therefore, the patch now lets the backend select the node to join and return a single node instead of a list.

This is committed at the same time as the first part because the new testsuites may fail transiently without this.

How

The solution is to register first, then run discovery. That way, the node will at least find itself among the discovered nodes.

While here, add a testsuite in both the Consul and etcd plugins to actually test clustering against a real Consul/etcd server.

Finally, allow backends to select the node to join and return a single node name instead of a list. We still query that node’s properties to see if it is ready to be joined.
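
To make the new sequence concrete, here is a minimal sketch of it. It is an illustration only, not the actual `rabbit_peer_discovery` code: the `register/0` and `list_nodes/0` backend callbacks and the `{ok, {Nodes, NodeType}}` return shape mirror the real API, but `create_or_join_cluster/1`, `maybe_join/2` and `select_node_to_join/1` are invented helper names.

    %% Illustration only; helper names below are invented.
    create_or_join_cluster(Backend) ->
        %% 1. Register this node with the backend (Consul/etcd) first,
        %%    so discovery can at least return this node.
        ok = Backend:register(),
        %% 2. Only then run discovery.
        case Backend:list_nodes() of
            {ok, {SelectedNode, NodeType}} when is_atom(SelectedNode) ->
                %% New: the backend already picked the node to join.
                maybe_join(SelectedNode, NodeType);
            {ok, {DiscoveredNodes, NodeType}} when is_list(DiscoveredNodes) ->
                %% Classic form: the subsystem selects a node itself.
                maybe_join(select_node_to_join(DiscoveredNodes), NodeType)
        end.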


This is an automatic backport of pull request #11045 done by Mergify.

mergify bot added the bazel label Jul 9, 2024
michaelklishin added this to the 3.13.5 milestone Jul 9, 2024
dumbbell force-pushed the mergify/bp/v3.13.x/pr-11045 branch from 6d444fe to 66b50c3 on July 10, 2024 08:37
[Why]
A variable can be set to `false` to explicitly unset it in the child
process. This special meaning was ignored and the value was converted
to the string "false".

[How]
We special-case `false` and leave it as is.

(cherry picked from commit 6266e64)
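
As a rough sketch of the fix above (the function name and exact clause set are invented, not the real helper): values are stringified for the child's environment, except `false`, which is passed through so the underlying `{env, ...}` option unsets the variable.

    %% Sketch only: stringify values for the child environment, but keep
    %% `false` untouched so the variable is unset rather than set to "false".
    env_value(false)                        -> false;
    env_value(Value) when is_atom(Value)    -> atom_to_list(Value);
    env_value(Value) when is_integer(Value) -> integer_to_list(Value);
    env_value(Value) when is_binary(Value)  -> binary_to_list(Value);
    env_value(Value) when is_list(Value)    -> Value.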
[Why]
The existing testsuite checked whether communication with an etcd node
would work, but it didn't test an actual cluster formation.

[How]
The new testcases try to create a cluster using the local etcd node
started by the testsuite. The first one starts one RabbitMQ node at a
time; the second one starts all of them concurrently.

While here, use the etcd source code added as a Git submodule in a
previous commit to compile etcd locally just for the testsuite.

(cherry picked from commit 50b4901)
[Why]
The `consul_svc` parameter is used as the service name and to construct
the service ID. The problem with the way the service ID is constructed
is that it doesn't allow registering several distinct RabbitMQ nodes
with the same Consul agent.

This is a problem for testsuites where we want to run several RabbitMQ
nodes on the same host with a single local Consul agent.

[How]
The service ID now has its own parameter, `consul_svc_id`. If it is
unset, the ID falls back to the previous construction from the service
name. This keeps the behavior 100% compatible with previous versions.

(cherry picked from commit 684ec76)
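
The fallback can be pictured like this (a sketch only; `get_config/2` and `derive_service_id_from_svc_name/1` are invented names standing in for the plugin's own helpers):

    %% Sketch: prefer the new `consul_svc_id` setting; otherwise keep
    %% building the ID from `consul_svc` exactly as before.
    service_id(Config) ->
        case get_config(consul_svc_id, Config) of
            undefined -> derive_service_id_from_svc_name(Config);
            ServiceId -> ServiceId
        end.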
…stration

[Why]
This allows other nodes to discover the actual node names, instead of
deriving one from the Consul agent node name and their own node name.

This makes it possible to register several RabbitMQ nodes with the same
Consul agent, which is handy for testsuites at least.

[How]
The Erlang node name is added to the `Meta` properties list, along with
the RabbitMQ cluster name.

Note that this also fixes when the cluster name is added to `Meta`:
before this commit, a non-default cluster name was not added if the
user-configured properties list was empty at the beginning.

(cherry picked from commit 750497c)
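
A sketch of the resulting metadata construction (the key names `erlang-node-name` and `cluster` are assumptions, and `registration_meta/3` is an invented helper; only the shape of the logic follows the commit message):

    %% Sketch: always include the Erlang node name; include the cluster
    %% name only when it differs from the default; then append the
    %% user-configured properties (even if that list is empty).
    registration_meta(UserMeta, DefaultClusterName, ClusterName) ->
        Meta0 = [{<<"erlang-node-name">>, atom_to_binary(node(), utf8)}],
        Meta1 = case ClusterName of
                    DefaultClusterName -> Meta0;
                    _                  -> [{<<"cluster">>, ClusterName} | Meta0]
                end,
        Meta1 ++ UserMeta.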
[Why]
Add a `system_SUITE` testsuite, copied from
rabbitmq_peer_discovery_etcd, that attempts to start a RabbitMQ cluster
where nodes use a Consul server to discover themselves.

[How]
The new testcases try to create a cluster using the local Consul node
started by the testsuite. The first one starts one RabbitMQ node at a
time; the second one starts all of them concurrently.

While here, use the Consul source code added as a Git submodule in a
previous commit to compile Consul locally just for the testsuite.

(cherry picked from commit 27ed4d2)
[Why]
The new implementation of `rabbit_peer_discovery` acquires the lock only
when a node needs to join another one. This is meant to disappear in the
medium/long term anyway.

Here, we need to lock the query to Consul to make sure that queries
happen sequentially, not concurrently. This is a work in progress and we
may not keep it either.

(cherry picked from commit a56d82c)
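
The intent can be conveyed with a generic sketch using `global` locks; the plugin goes through its own locking rather than `global`, so treat this purely as an illustration of serializing the query:

    %% Illustration only: take a cluster-wide lock around the Consul
    %% query so concurrently booting nodes register and discover one
    %% at a time instead of racing each other.
    locked_list_nodes(Backend) ->
        LockId = {rabbitmq_peer_discovery_consul, self()},
        Nodes  = [node() | nodes()],
        global:set_lock(LockId, Nodes),
        try
            Backend:list_nodes()
        after
            global:del_lock(LockId, Nodes)
        end.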
[Why]
The two backends that use registration are Consul and etcd. The
discovery process relies on the registered nodes: these backends return
whatever was previously registered.

With the new checks and failsafes added to peer discovery in RabbitMQ
3.13.0, the fact that registration happens after running discovery
breaks the Consul and etcd backends.

It used to work before because the first node would eventually time out
waiting for a non-empty list of nodes from the backend and proceed as a
standalone node, registering itself on the way. Following nodes would
then discover that first node.

Among the new checks, the node running discovery expects to find itself
in the list of discovered nodes. Because it didn't register yet, it will
never find itself.

[How]
The solution is to register first, then run discovery. That way, the
node will at least find itself among the discovered nodes.

(cherry picked from commit cb9f0d8)
…selves

[Why]
Before, the backend would always return a list of nodes and the
subsystem would select one based on their uptimes, the nodes they are
already clustered with, and the readiness of their database.

This works well in general but has some limitations. For instance with
the Consul backend, the discoverability of nodes depends on when each
one registered and in which order. Therefore, the node with the highest
uptime might not be the first that registers. In this case, the one that
registers first will only discover itself and boot as a standalone node.
However, the one with the highest uptime that registered after will
discover both nodes. It will then select itself as the node to join
because it has the highest uptime. In the end both nodes form distinct
clusters.

Another example is the Kubernetes backend. The current solution works
fine but it could be optimized: the backend knows we always want to join
the first node ("$node-0") regardless of the order in which they are
started because picking the first node alphabetically is fine.

Therefore we want to let the backend select the node to join if it
wants to.

[How]
The `list_nodes()` callback can now return the following term:

    {ok, {SelectedNode :: node(), NodeType}}

If the subsystem sees this return value, it will consider that the
returned node is the one to join. It will still query properties because
we want to make sure the node's database is ready before joining it.

(cherry picked from commit 3147ab7)
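
For example, a backend taking advantage of this could be as small as the following (a hypothetical sketch, with a made-up node name matching the "$node-0" Kubernetes example above):

    %% Hypothetical backend sketch: always point booting nodes at the
    %% first ordinal pod, whatever order the pods started in.
    list_nodes() ->
        {ok, {'rabbit@myrabbit-server-0', disc}}.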
[Why]
The default node selection of the peer discovery subsystem doesn't work
well with Consul. The reason is that this selection is based on the
nodes' uptime. However, the node with the highest uptime may not be the
first to register in Consul.

When this happens, the node that registered first will only discover
itself and boot as a standalone node. Then, the node with the highest
uptime will discover both of them, but will select itself as the node to
join because of its uptime. In the end, we end up with two clusters
instead of one.

[How]
We use the `CreateIndex` property in the Consul response to sort
services. We then derive the name of the node to join from the service
that has the lowest `CreateIndex`, meaning it was the first to register.

(cherry picked from commit 0f054e1)
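
A sketch of that selection (assuming the Consul response has already been decoded into maps; the exact nesting of `CreateIndex` in the real response and the `node_name_from_service/1` helper are assumptions):

    %% Sketch: sort registered services by `CreateIndex` and join the
    %% node behind the service that registered first.
    select_node_to_join(Services) ->
        Sorted = lists:sort(
                   fun(A, B) ->
                       maps:get(<<"CreateIndex">>, A) =< maps:get(<<"CreateIndex">>, B)
                   end, Services),
        [First | _] = Sorted,
        node_name_from_service(First).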
…s unsupported

[How]
We must check the return value of `rabbit_ct_broker_helpers:run_steps/2`
because it may indicate that the testsuite/testgroup/testcase should be
skipped.

(cherry picked from commit e890b9d)
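
In practice that means the Common Test callbacks cannot blindly treat the return value as a config list; a sketch of the pattern (the exact setup steps passed here are assumed):

    %% Sketch: propagate a `{skip, Reason}` result instead of crashing
    %% when, e.g., no Consul/etcd binary is available.
    init_per_group(_Group, Config) ->
        case rabbit_ct_broker_helpers:run_steps(
               Config, rabbit_ct_broker_helpers:setup_steps()) of
            {skip, _Reason} = Skip        -> Skip;
            Config1 when is_list(Config1) -> Config1
        end.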
dumbbell force-pushed the mergify/bp/v3.13.x/pr-11045 branch from 0cf4655 to 4108700 on July 10, 2024 15:02
dumbbell merged commit cb288ab into v3.13.x Jul 10, 2024
246 checks passed
dumbbell deleted the mergify/bp/v3.13.x/pr-11045 branch July 10, 2024 16:36