
rabbit_peer_discovery: Rewrite the core logic #9797

Merged
merged 8 commits into main from use-join_cluster-in-peer-discovery on Dec 7, 2023

Conversation

@dumbbell (Member) commented Oct 26, 2023

Why

This work started as an effort to add peer discovery support to our Khepri integration. Indeed, as part of the task to integrate Khepri, we missed the fact that rabbit_peer_discovery:maybe_create_cluster/1 was called from the Mnesia-specific code only, even though we should have known about it: we had already hit many issues caused by the fact that join_cluster and peer discovery use different code paths to create a cluster.

To add support for Khepri, the first version of this patch moved the call to rabbit_peer_discovery:maybe_create_cluster/1 so that it is made from rabbit_db_cluster instead of rabbit_mnesia. To achieve that, it made sense to unify the code and simply call rabbit_db_cluster:join/2 instead of duplicating the work.

Unfortunately, doing so highlighted another issue: the way the node to cluster with was selected. It could cause situations where multiple clusters are created instead of one, unless out-of-band counter-measures are taken, like the 30-second delay added in the Kubernetes operator (rabbitmq/cluster-operator#1156). This problem became even more frequent when we tried to unify the code paths and call join_cluster.

After several iterations on the patch and even more discussions with the team, we decided to rewrite the algorithm to make node selection more robust and still use rabbit_db_cluster:join/2 to create the cluster.

How

We wanted the new algorithm to fulfill the following properties:

  1. rabbit_peer_discovery should provide the ability to re-trigger it easily to re-evaluate the cluster. The new public API is rabbit_peer_discovery:sync_desired_cluster/0.

  2. The selection of the node to join should be designed in a way that all nodes select the same one, regardless of the order in which they become available. The adopted solution is to sort the list of discovered nodes using the following criteria (in that order), as illustrated in the sketch after this list:

    1. the size of the cluster a discovered node is part of; sorted from bigger to smaller clusters
    2. the start time of a discovered node; sorted from older to younger nodes
    3. the name of a discovered node; sorted alphabetically

    The first node in that list will not join anyone and simply proceed with its boot process. Other nodes will try to join the first node.

  3. To reduce the chance of incorrectly having multiple standalone nodes because the discovery backend returned only a single node, we want to apply the following constraints to the list of nodes after it is filtered and sorted (see property 2 above):

    • The list must contain node() (i.e. the node running peer discovery itself).
    • If RabbitMQ's cluster size hint is greater than 1, the list must have at least two nodes. The cluster size hint is the maximum between the configured target cluster size hint and the number of elements in the nodes list returned by the backend.

    If one of the constraints is not met, the entire peer discovery process is restarted after a delay.

  4. The lock is acquired only to protect the actual join, not the discovery step where the backend is queried to get the list of peers. With the node selection described above, this lets the first node start without acquiring the lock.

  5. The cluster membership views queried as part of the algorithm to sort the list of nodes will be used to detect additional clusters or standalone nodes that did not cluster correctly. These nodes will be asked to re-evaluate peer discovery to increase the chance of forming a single cluster.

  6. After some delay, peer discovery will be re-evaluated to further eliminate the chances of having multiple clusters instead of one.
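
For illustration only, here is a minimal sketch of the node selection from property 2 and the checks from property 3. The module, the function names and the map keys (`cluster_size`, `start_time`, `node`) are assumptions made for this example; they are not the actual `rabbit_peer_discovery` API.

```erlang
-module(peer_discovery_sketch).
-export([select_seed_node/1, check_discovered_nodes/2]).

%% Sort discovered nodes by: size of the cluster they already belong to
%% (bigger first), then node start time (older first), then node name
%% (alphabetically). The head of the sorted list is the node every other
%% node will try to join.
select_seed_node(NodesAndProps) ->
    Sorted = lists:sort(
               fun(#{cluster_size := SizeA, start_time := StartA, node := NodeA},
                   #{cluster_size := SizeB, start_time := StartB, node := NodeB}) ->
                       %% Negating the cluster size turns "bigger first"
                       %% into a plain ascending tuple comparison.
                       {-SizeA, StartA, NodeA} =< {-SizeB, StartB, NodeB}
               end, NodesAndProps),
    [#{node := SeedNode} | _] = Sorted,
    SeedNode.

%% Property 3: the filtered and sorted list must contain this node, and if
%% the effective cluster size hint is greater than 1, it must contain at
%% least two nodes.
check_discovered_nodes(DiscoveredNodes, ClusterSizeHint) ->
    ThisNodeIsIncluded = lists:member(node(), DiscoveredNodes),
    HasEnoughNodes = ClusterSizeHint =< 1 orelse length(DiscoveredNodes) >= 2,
    ThisNodeIsIncluded andalso HasEnoughNodes.
```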

This pull request covers properties 1 to 4. The remaining properties will be addressed in follow-up pull requests once this one is in place.

If there is a failure at any point during discovery, filtering/sorting, locking or joining, the entire process is restarted after a delay. This is configured using the following parameters:

  • cluster_formation.discovery_retry_limit
  • cluster_formation.discovery_retry_interval

The default parameters were bumped to 30 retries with a delay of 1 second between each.
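
For illustration, the retry behaviour described above amounts to a loop of roughly this shape. This is a hypothetical helper, not the code in this PR; the real process restarts the whole discovery/filtering/locking/joining sequence.

```erlang
%% Illustrative only: retry an action up to RetriesLeft times, sleeping
%% IntervalMs between attempts. With the new defaults this corresponds to
%% cluster_formation.discovery_retry_limit = 30 and a 1-second
%% cluster_formation.discovery_retry_interval.
%%
%% e.g. retry_with_delay(fun sync_once/0, 30, 1000)  % sync_once/0 is hypothetical
retry_with_delay(_Fun, 0, _IntervalMs) ->
    {error, retries_exhausted};
retry_with_delay(Fun, RetriesLeft, IntervalMs) ->
    case Fun() of
        ok ->
            ok;
        {error, _Reason} ->
            timer:sleep(IntervalMs),
            retry_with_delay(Fun, RetriesLeft - 1, IntervalMs)
    end.
```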

The locking retries/interval parameters are not used by the new algorithm anymore.

There are extra minor changes that come with the rewrite:

  • The configured backend is cached in a persistent term. The goal is to make sure we use the same backend throughout the entire process and when we call maybe_unregister/0, even if the configuration changed for whatever reason in between (see the sketch after this list).
  • maybe_register/0 is called from rabbit_db_cluster instead of at the end of a successful peer discovery process. rabbit_db_cluster had to call maybe_register/0 if the node was not virgin anyway. So make it simpler and always call it in rabbit_db_cluster regardless of the state of the node.
  • log_configured_backend/0 is gone. maybe_init/0 can log the backend directly. There is no need to explicitly call another function for that.
  • Messages are logged using ?LOG_*() macros instead of the old rabbit_log module.
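
A tiny sketch of the backend pinning mentioned in the first bullet above; the persistent term key and the function names are made up for this example.

```erlang
-define(PT_PEER_DISC_BACKEND, {rabbit_peer_discovery, backend}).  % hypothetical key

%% Resolve the backend once, pin it, and reuse the pinned module later
%% (e.g. from maybe_unregister/0), even if the configuration changed in
%% the meantime.
pin_backend(Backend) ->
    persistent_term:put(?PT_PEER_DISC_BACKEND, Backend).

pinned_backend(DefaultBackend) ->
    persistent_term:get(?PT_PEER_DISC_BACKEND, DefaultBackend).
```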

@dumbbell dumbbell added this to the 3.13.0 milestone Oct 26, 2023
@dumbbell dumbbell self-assigned this Oct 26, 2023
@dumbbell dumbbell force-pushed the use-join_cluster-in-peer-discovery branch 3 times, most recently from d37aab6 to ed97c34 on October 27, 2023 14:07
@dumbbell dumbbell modified the milestones: 3.13.0, 4.0.0 Oct 27, 2023
@dumbbell dumbbell force-pushed the use-join_cluster-in-peer-discovery branch 7 times, most recently from 2246618 to 3eb3b34 on November 2, 2023 16:39
@dumbbell (Member, Author) commented Nov 8, 2023

After several iterations on the patch and more discussions with the team, I will restart from scratch. The idea is still to use the common join_cluster code path instead of having something specific to peer discovery. However, this highlights some issues with the current peer discovery code. In particular, the way the node to join is selected means that e.g. node 4 joins node 1, but node 1 then joins node 2, kicking node 4 out of the cluster.

The current consensus is to have the following properties:

  1. rabbit_peer_discovery should provide the ability to re-trigger it easily to re-evaluate the cluster.

  2. The selection of the node to join should be designed in a way that all nodes select the same one, regardless of the order in which they become available. The current solution is to sort the list of discovered nodes using the following criteria (in that order):

    1. the size of the cluster a discovered node is part of; sorted from bigger to smaller clusters
    2. the start time of a discovered node; sorted from older to younger nodes
    3. the name of a discovered node; sorted alphabetically

    The first node in that list does not join anyone and simply proceeds with its boot process. Other nodes try to join the first node.

  3. Only the actual join will be protected by the peer discovery lock, not the discovery step where the backend is queried to get the list of peers. With the node selection described above, this lets the first node start without acquiring the lock.

  4. To reduce the chance of incorrectly having multiple standalone nodes because the discovery backend returned only a single node, we want the following failsafes:

    • If the backend returns a list that does not contain node(), the node will wait and retry the query.
    • If RabbitMQ's cluster size hint is greater than 1 and the backend returns a single node, the node will wait and retry the query until at least two nodes are returned. The cluster size hint is the maximum between the configured target cluster size hint and the number of elements in the nodes list returned by the backend.
    • After a timeout, the node will abort peer discovery and boot as a standalone node.
  5. The cluster membership views queried as part of the algorithm to sort the list of nodes will be used to detect standalone nodes that did not cluster correctly or additional clusters. These nodes will be asked to re-evaluate peer discovery to increase the chance of forming a single cluster.

  6. After some delay, peer discovery will be re-evaluated to further eliminate the chances of having multiple clusters instead of one.

This pull request will cover properties from point 1 to point 4. The remaining properties will be addressed in follow-up pull requests once this one is in place.

@mkuratczyk (Contributor) commented:

Sounds great, thanks for all the work you are putting into this! Perhaps this is implicitly covered by 4B, but one thing I'd add is that if the cluster size hint is N, a node that doesn't see N nodes in the cluster should not consider itself successfully booted.

@dumbbell (Member, Author) commented Nov 8, 2023

This is not covered by 4B. I clarified my comment. The idea is that if the cluster size hint is 2 or more, peer discovery should expect that the backend returns at least two nodes. This is to avoid a situation where an early query of the backend returns only [node()] to every node.

a node that doesn't see N nodes in the cluster should not consider itself successfully booted.

Do you suggest that peer discovery should wait for a list of discovered nodes of size N? Or should peer discovery finish, but later in the boot process, we pause until all N nodes have joined the cluster?

What about deployments where nodes are started sequentially? Or deployments where the peer discovery backend and the configured cluster hint are out-of-sync? Should the node fail to boot after some time, or should it boot with a warning?

@michaelklishin (Member) commented Nov 8, 2023

If we wait for N nodes to appear, and N = the total number of nodes, a single node that fails to boot will cause issues.

Also, Kubernetes generally assumes that nodes do not have any inter-dependencies.

Hence the idea to re-evaluate cluster membership and retry after node boot, with a delay.

@mkuratczyk (Contributor) commented:

I guess I hadn't thought that through before. Indeed, with sequentially booted nodes, each node would need to be fully booted for the next one to even start booting (e.g. that's the default behaviour for StatefulSets on Kubernetes; our Operator sets the startup policy to Parallel, but there are many deployments that don't use this option). What I'd mostly want, and I think Michael's comment covers it, is that ideally the cluster should be somehow unusable until all expected nodes are there. E.g. we discussed in the past the possibility of rejecting any queue declaration until all expected nodes are present; that could be a way to make it clear that something went wrong (and ideally to automatically fix/retry cluster formation).

@dumbbell (Member, Author) commented Nov 9, 2023

If RabbitMQ's cluster size hint is greater than 1 and the backend returns a single node, the node will wait and retry the query until at least two nodes are returned.

Small update to this: instead of using the configured target cluster size hint alone, I take the max between this value and the number of nodes returned by the backend. This is handy with the classic config backend for instance: the static list of nodes is the cluster size hint.
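
In other words, with hypothetical names, the effective hint boils down to:

```erlang
%% The effective cluster size hint is the larger of the configured target
%% cluster size hint and the number of nodes returned by the backend.
effective_cluster_size_hint(ConfiguredTargetHint, DiscoveredNodes) ->
    erlang:max(ConfiguredTargetHint, length(DiscoveredNodes)).
```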

@dumbbell dumbbell force-pushed the use-join_cluster-in-peer-discovery branch 2 times, most recently from d35c752 to e39d53c on November 9, 2023 13:10
@dumbbell dumbbell modified the milestones: 4.0.0, 3.13.0 Nov 9, 2023
@dumbbell dumbbell force-pushed the use-join_cluster-in-peer-discovery branch from e39d53c to 89bbfc3 on November 9, 2023 15:40
@mergify mergify bot added the bazel label Nov 10, 2023
@dumbbell dumbbell force-pushed the use-join_cluster-in-peer-discovery branch from 686084b to 18be108 on November 10, 2023 15:42
@dumbbell dumbbell marked this pull request as ready for review December 6, 2023 15:26
@dumbbell (Member, Author) commented Dec 6, 2023

I finished another round of fixes. In particular:

  • Some testcases like classic_config_discovery_node_list in clustering_management_SUITE didn't reproduce a real-world use case, and peer discovery wouldn't work in these peculiar conditions. The testcases were fixed to better reflect reality.
  • If the Erlang cookie was set through $RABBITMQ_ERLANG_COOKIE instead of the default file, the temporary hidden node would not use the expected cookie, leading to a failure to communicate.

@mkuratczyk (Contributor) left a comment

We've run many tests on Kubernetes (kind and GKE) and we found no issues. It's certainly much better than current main, which has multiple issues.

Once merged, we can ask the community for additional testing in other environments.

…_subset_of_nodes_coming_online`

[Why]
The testcase was broken as part of the work on Khepri (#7206): all nodes
were started, making it an equivalent of the `successful_discovery`
testcase.

[How]
We drop the first entry in the list of nodes given to
`rabbit_ct_broker_helpers`. This way, it won't be started at all while
still being listed in the classic config parameter.
[Why]
This work started as an effort to add peer discovery support to our
Khepri integration. Indeed, as part of the task to integrate Khepri, we
missed the fact that `rabbit_peer_discovery:maybe_create_cluster/1` was
called from the Mnesia-specific code only, even though we should have
known about it: we had already hit many issues caused by the fact that
`join_cluster` and peer discovery use different code paths to create a
cluster.

To add support for Khepri, the first version of this patch moved the
call to `rabbit_peer_discovery:maybe_create_cluster/1` so that it is
made from `rabbit_db_cluster` instead of `rabbit_mnesia`. To achieve
that, it made sense to unify the code and simply call
`rabbit_db_cluster:join/2` instead of duplicating the work.

Unfortunately, doing so highlighted another issue: the way the node to
cluster with was selected. It could cause situations where multiple
clusters are created instead of one, unless out-of-band counter-measures
are taken, like the 30-second delay added in the Kubernetes operator
(rabbitmq/cluster-operator#1156). This problem became even more frequent
when we tried to unify the code paths and call `join_cluster`.

After several iterations on the patch and even more discussions with the
team, we decided to rewrite the algorithm to make node selection more
robust and still use `rabbit_db_cluster:join/2` to create the cluster.

[How]
This commit is only about the rewrite of the algorithm. Calling peer
discovery from `rabbit_db_cluster` instead of `rabbit_mnesia` (and thus
making peer discovery work with Khepri) will be done in a follow-up
commit.

We wanted the new algorithm to fulfill the following properties:

1. `rabbit_peer_discovery` should provide the ability to re-trigger it
   easily to re-evaluate the cluster. The new public API is
   `rabbit_peer_discovery:sync_desired_cluster/0`.

2. The selection of the node to join should be designed in a way that
   all nodes select the same one, regardless of the order in which they
   become available. The adopted solution is to sort the list of
   discovered nodes using the following criteria (in that order):

    1. the size of the cluster a discovered node is part of; sorted from
       bigger to smaller clusters
    2. the start time of a discovered node; sorted from older to younger
       nodes
    3. the name of a discovered node; sorted alphabetically

   The first node in that list will not join anyone and simply proceed
   with its boot process. Other nodes will try to join the first node.

3. To reduce the chance of incorrectly having multiple standalone nodes
   because the discovery backend returned only a single node, we want to
   apply the following constraints to the list of nodes after it is
   filtered and sorted (see property 2 above):

    * The list must contain `node()` (i.e. the node running peer
      discovery itself).
    * If RabbitMQ's cluster size hint is greater than 1, the list must
      have at least two nodes. The cluster size hint is the maximum
      between the configured target cluster size hint and the number of
      elements in the nodes list returned by the backend.

   If one of the constraints is not met, the entire peer discovery
   process is restarted after a delay.

4. The lock is acquired only to protect the actual join, not the
   discovery step where the backend is queried to get the list of peers.
   With the node selection described above, this lets the first node
   start without acquiring the lock.

5. The cluster membership views queried as part of the algorithm to sort
   the list of nodes will be used to detect additional clusters or
   standalone nodes that did not cluster correctly. These nodes will be
   asked to re-evaluate peer discovery to increase the chance of forming
   a single cluster.

6. After some delay, peer discovery will be re-evaluated to further
   eliminate the chances of having multiple clusters instead of one.

This commit covers properties 1 to 4. The remaining properties will be
addressed in follow-up pull requests once this one is in place.

If there is a failure at any point during discovery, filtering/sorting,
locking or joining, the entire process is restarted after a delay. This
is configured using the following parameters:
* cluster_formation.discovery_retry_limit
* cluster_formation.discovery_retry_interval

The default parameters were bumped to 30 retries with a delay of 1
second between each.

The locking retries/interval parameters are not used by the new
algorithm anymore.

There are extra minor changes that come with the rewrite:
* The configured backend is cached in a persistent term. The goal is to
  make sure we use the same backend throughout the entire process and
  when we call `maybe_unregister/0` even if the configuration changed
  for whatever reason in between.
* `maybe_register/0` is called from `rabbit_db_cluster` instead of at
  the end of a successful peer discovery process. `rabbit_db_cluster`
  had to call `maybe_register/0` if the node was not virgin anyway. So
  make it simpler and always call it in `rabbit_db_cluster` regardless
  of the state of the node.
* `log_configured_backend/0` is gone. `maybe_init/0` can log the backend
  directly. There is no need to explicitly call another function for
  that.
* Messages are logged using `?LOG_*()` macros instead of the old
  `rabbit_log` module.
[Why]
We go through a temporary hidden node to query all the other discovered
peers' properties, instead of querying them directly.

The reason is that we don't want Erlang to automatically connect all
nodes together as a side effect (by default, it forms a full mesh
network). If we let Erlang do that, we may interfere with the Feature
flags controller, which is globally registered while it performs an
operation. If all nodes become connected, it's possible that two or more
globally registered controllers end up connected before they are ready
to be clustered, and thus in the same "namespace". `global` will kill
all but one of them.

[How]
By using a temporary intermediate hidden node, we ask Erlang not to
connect everyone automatically.

V2: Set `-setcookie <cookie>` in the temporary hidden node's VM
    arguments if one was set in the RabbitMQ context. This is required
    if the Erlang cookie is not written to disk; it might be the case
    with some container deployments.
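
For illustration, starting such a temporary hidden node with an explicit cookie via the `peer` module could look roughly like this. The function name, option values and timeout are assumptions, not the exact code in this PR.

```erlang
start_temporary_hidden_node(Cookie) when is_atom(Cookie) ->
    %% "-hidden" avoids the automatic full-mesh connections; "-setcookie"
    %% covers deployments where the cookie only exists in the environment
    %% and is never written to disk.
    Args = ["-hidden", "-setcookie", atom_to_list(Cookie)],
    {ok, PeerPid, PeerNode} = peer:start_link(#{name => peer:random_name(),
                                                args => Args,
                                                wait_boot => 30_000}),
    {PeerPid, PeerNode}.
```
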
…nodes only

[Why]
A lock is acquired to protect against concurrent cluster joins.

Some backends used to use the entire list of discovered nodes and used
`global` as the lock implementation. This was a problem because a side
effect was that all discovered Erlang nodes were connected to each
other. This led to conflicts in the global process name registry and
thus processes were killed randomly.

This was the case with the feature flags controller for instance. Nodes
run some feature flags operations early in boot, before they are ready
to cluster or run the peer discovery code. But if another node was
executing peer discovery, it could cause all nodes to become connected.
Unrelated feature flags controller instances were thus killed because
another node was running peer discovery.

[How]
Acquiring a lock on the joining and the joined nodes only is enough to
achieve the goal of protecting against concurrent joins. This is
possible because of the new core logic which ensures the same node is
used as the "seed node". I.e. all nodes will join the same node.

Therefore the API of `rabbit_peer_discovery_backend:lock/1` is changed
to take a list of nodes (the two nodes mentioned above) instead of one
node (which was the current node, so not that helpful in the first
place).

These backends also used to check if the current node was part of the
discovered nodes. But that's already handled in the generic peer
discovery code.

CAUTION: This brings a breaking change in the peer discovery backend
API. The `Backend:lock/1` callback now takes a list of node names
instead of a single node name. This list will contain the current node
name.
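
As a rough sketch of the narrowed lock scope described above: the lock/unlock return values, the `disc` node type argument and the error handling are simplifying assumptions, not the actual implementation.

```erlang
%% Acquire the backend lock only for the two nodes involved in the join:
%% this node and the selected "seed" node.
join_selected_node(Backend, SelectedNode) when SelectedNode =/= node() ->
    case Backend:lock([node(), SelectedNode]) of
        {ok, LockData} ->
            try
                rabbit_db_cluster:join(SelectedNode, disc)
            after
                Backend:unlock(LockData)
            end;
        {error, _} = Error ->
            Error
    end;
join_selected_node(_Backend, _SelectedNode) ->
    %% This node is the selected node: nothing to join, no lock needed.
    ok.
```
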
[Why]
The group leader for all processes on the temporary hidden node is the
calling process' group leader on the upstream node.

When we use `erpc:call/4` (or the multicall equivalent) to execute code
on one of the given nodes, the remotely executed code will also use the
calling process' group leader by default.

We use this temporary hidden node to ensure the downstream node will not
be connected to the upstream node. Therefore, we must change the group
leader as well, otherwise any I/O from the downstream node will send a
message to the upstream node's group leader and thus open a connection.
This would defeat the entire purpose of this temporary hidden node.

[How]
To avoid this, we start a proxy process which we will use as a group
leader. This process will send all messages it receives to the group
leader on the upstream node.

There is one caveat: the logger (local to the temporary hidden node)
forwards log messages to the upstream logger (on the upstream node) only
if the group leader of that message is a remote PID. Because we set a
local PID, it stops forwarding log messages originating from that
temporary hidden node. That's why we use `with_group_leader_proxy/2` to
set the group leader to our proxy only around the use of `erpc`.

That's a lot just to keep logging working while not revealing the
upstream node to the downstream node...
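
A simplified sketch of such a proxy group leader; the `stop_proxy`/`proxy_stopped` handshake mirrors the snippet quoted in the review comments below, the rest is an assumption.

```erlang
%% Spawn a process local to the temporary hidden node that relays every
%% message it receives (I/O protocol requests included) to the group
%% leader living on the upstream node. The downstream node only ever
%% talks to the hidden node, so I/O does not open a connection to the
%% upstream node.
start_group_leader_proxy(UpstreamGroupLeader) ->
    Parent = self(),
    spawn_link(fun() -> group_leader_proxy_loop(UpstreamGroupLeader, Parent) end).

group_leader_proxy_loop(UpstreamGroupLeader, Parent) ->
    receive
        stop_proxy ->
            %% Everything received before this point has been relayed, so
            %% the caller can safely stop waiting for forwarded output.
            Parent ! proxy_stopped;
        Message ->
            UpstreamGroupLeader ! Message,
            group_leader_proxy_loop(UpstreamGroupLeader, Parent)
    end.
```
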
…art-cluster`

[Why]
So far, we use the CLI to create the cluster after starting the
individual nodes.

It's faster to use peer discovery and gives more exposure to the
feature. Thus it will be easier to quickly test changes to the peer
discovery subsystem with a simple `make start-cluster`.

[How]
We pass the classic configuration `cluster_nodes` application
environment variable to all nodes' command line.

This is only when the target is `start-cluster`, not `start-brokers`.
[Why]
If a node joins the selected node but the selected node's DB layer is
not ready, it will fail and the whole peer discovery process will
restart (until the selected node is ready).

That's fine, but scary messages are logged for a situation that is not
really an actual error at this point.

[How]
While querying the properties of all discovered nodes, we also check if
the DB layer is ready using `rabbit_db:is_init_finished/0`. We then use
this property to determine if we can try to join or if we should wait
and retry.

This avoids a join which we know will eventually fail, and thus avoids
the error messages.
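
A sketch of that readiness gate; `rabbit_db:is_init_finished/0` is the call named above, while the property map shape and the function names are assumptions.

```erlang
%% While collecting a discovered node's properties, also record whether
%% its DB layer finished initializing; only join a node that is ready,
%% otherwise wait and retry the whole process.
query_db_readiness(PeerNode, Timeout) ->
    erpc:call(PeerNode, rabbit_db, is_init_finished, [], Timeout).

decide_join(#{node := SelectedNode, db_ready := true}) ->
    {join, SelectedNode};
decide_join(#{db_ready := false}) ->
    retry_later.
```
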
... instead of `rabbit_mnesia`.

[Why]
We need it for both Mnesia and Khepri. So instead of calling it in
`rabbit_khepri` too, let's manage this from `rabbit_db` directly.
@dumbbell dumbbell force-pushed the use-join_cluster-in-peer-discovery branch from fef910b to e1261b9 on December 7, 2023 14:52
@dumbbell dumbbell merged commit 5157f25 into main Dec 7, 2023
19 checks passed
@dumbbell dumbbell deleted the use-join_cluster-in-peer-discovery branch December 7, 2023 15:05
%% Peer discovery may have been a no-op if it decided that all other nodes
%% should join this one. Therefore, we need to look at if this node is
%% still virgin and finish our use of Mnesia accordingly. In particular,
%% this second part crates all our Mnesia tables.
@lukebakken (Collaborator) commented:

Suggested change
%% this second part crates all our Mnesia tables.
%% this second part creates all our Mnesia tables.

@dumbbell (Member, Author) replied:

Addressed in #10073. Thanks!


maybe_init() ->
Backend = backend(),
?LOG_DEBUG(
@lukebakken (Collaborator) commented:

Suggested change
?LOG_DEBUG(
?LOG_INFO(

Is there another place where the configured / in-use backend is logged at an INFO level?

@dumbbell (Member, Author) replied:

Addressed in #10073. Thanks!

Comment on lines +390 to +392
Ret = erpc:call(Peer, ?MODULE, do_query_node_props, [Nodes]),
peer:stop(Pid),
Ret;
@lukebakken (Collaborator) commented:

Suggested change
Ret = erpc:call(Peer, ?MODULE, do_query_node_props, [Nodes]),
peer:stop(Pid),
Ret;
Ret = try
          erpc:call(Peer, ?MODULE, do_query_node_props, [Nodes])
      after
          peer:stop(Pid)
      end,
Ret;

Would this help ensure no dangling peers in the case of an erpc:call exception? Apologies if my suggested code doesn't compile OOTB 😬

@dumbbell (Member, Author) replied:

Addressed in #10073. Thanks!

Comment on lines +450 to +454
UpstreamGroupLeader = erlang:group_leader(),
true = erlang:group_leader(ProxyGroupLeader, self()),
Ret = Fun(),
true = erlang:group_leader(UpstreamGroupLeader, self()),
Ret.
@lukebakken (Collaborator) commented:

Suggested change
UpstreamGroupLeader = erlang:group_leader(),
true = erlang:group_leader(ProxyGroupLeader, self()),
Ret = Fun(),
true = erlang:group_leader(UpstreamGroupLeader, self()),
Ret.
UpstreamGroupLeader = erlang:group_leader(),
Ret = try
          true = erlang:group_leader(ProxyGroupLeader, self()),
          Fun()
      after
          true = erlang:group_leader(UpstreamGroupLeader, self())
      end,
Ret.

Overkill? Not sure.

@dumbbell (Member, Author) replied:

Addressed in #10073. Thanks!

NodesAndProps2 = sort_nodes_and_props(NodesAndProps1),
%% Wait for the proxy group leader to flush its inbox.
ProxyGroupLeader ! stop_proxy,
receive proxy_stopped -> ok end,
@lukebakken (Collaborator) commented:

Should there be a timeout here?

@dumbbell (Member, Author) replied:

Addressed in #10073. Thanks!

@lukebakken (Collaborator) left a comment

Only one more suggestion.


ThisNodeIsIncluded andalso HasEnoughNodes;
can_use_discovered_nodes(_DiscoveredNodes, []) ->
?LOG_DEBUG(
@lukebakken (Collaborator) commented:

This should be LOG_INFO or WARNING

@dumbbell (Member, Author) replied:

Addressed in #10073. Thanks!

dumbbell added a commit that referenced this pull request Dec 8, 2023
dumbbell added a commit that referenced this pull request Dec 8, 2023
[Why]
They both can be useful for diagnosis before debug messages are enabled.

Also, the second message tells the user to enable debug messages to get
more details, but it was itself logged at the debug level :-)

Follow up to #9797.

Submitted by: @lukebakken
dumbbell added a commit that referenced this pull request Dec 8, 2023
[Why]
We start a temporary hidden node, then ask it to execute some code, then
stop it. But if there is an exception in between, we leave the
temporary hidden node running.

Likewise when we mess with the group leader: if the function executed
after overriding the group leader temporarily raises an exception, the
group leader becomes permanent and we may miss log messages.

[How]
We simply use a try/after block to ensure the temporary things are
reverted at the end, regardless of the success or failure.

Follow up to #9797.

Submitted by: @lukebakken
dumbbell added a commit that referenced this pull request Dec 8, 2023
… leader

... to exit.

[How]
It should never be stuck obviously™. But in case planets are not
aligned, we wait for its exit with a timeout. If it stays around, this
is not the end of the world.

Follow up to #9797.

Submitted by: @lukebakken
michaelklishin pushed a commit that referenced this pull request Feb 29, 2024
michaelklishin pushed a commit that referenced this pull request Feb 29, 2024
michaelklishin pushed a commit that referenced this pull request Feb 29, 2024
michaelklishin pushed a commit that referenced this pull request Feb 29, 2024
dumbbell added a commit that referenced this pull request Mar 18, 2024
[Why]
The Consul peer discovery backend needs to create a session before it
can acquire a lock. This session is also required for nodes to discover
each other.

This session was created as part of the lock callback. However, after
pull request #9797, the lock was only acquired if and when a node had to
join another one, i.e. after the actual discovery phase. This broke
Consul peer discovery because the discovery was performed before that
Consul session was created.

[How]
We introduce two new callbacks, `pre_discovery/0` and `post_discovery/1`
to allow a backend to perform actions before and after the whole
discover/lock/join process.

To remain compatible with other peer discovery backends, the new
callbacks are optional.
dumbbell added a commit that referenced this pull request Mar 18, 2024
dumbbell added a commit that referenced this pull request Mar 18, 2024