Handle database timeouts in MQTT queue deletion (backport #12317) #12320
This fixes some crash reports when using MQTT with Khepri, spotted by @mkuratczyk. With an OMQ stress test running while a cluster restarts (`make restart-cluster`), we would see `badmatch` errors from matching on `{ok, _}` for `rabbit_queue_type:delete/4` and exits for `{normal, {gen_server2, call, [Pid, consumers, infinity]}}`. That stress test causes queue churn since QoS1 MQTT creates transient exclusive classic queues. Restarting a node leads to very many queues being deleted, which can overload Khepri and lead to timeouts.

The first commit refactors `rabbit_queue_type:delete/4` to return `{error, timeout}` for timeout errors. `{error, timeout}` could already be returned and is handled in `rabbit_amqqueue:delete_with/4`. This change is just for consistency: in some places we returned a `protocol_error` record instead. The second commit handles the `{error, timeout}` result in `rabbit_mqtt_processor`.

Also included is a fix for `rabbit_amqqueue:consumers/1` to catch exits: an exit can happen if another process asks for a classic queue's consumers while it is terminating. (With Khepri the terminate callback can take some time as it calls `rabbit_amqqueue:internal_delete/2`.)

This is an automatic backport of pull request #12317 done by Mergify.
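
For readers who have not opened the diff, here is a minimal sketch of the two patterns described above. It is not the code from this PR: the module name `delete_timeout_sketch`, the functions `delete_queue/1` and `safe_consumers/1`, and the `DeleteFun` argument are hypothetical. It uses plain `gen_server` in place of `gen_server2`, and returning `[]` when the call exits is an assumption about a reasonable fallback.

```erlang
-module(delete_timeout_sketch).
-export([delete_queue/1, safe_consumers/1]).

%% DeleteFun is a hypothetical stand-in for the real delete call. It is
%% assumed to return {ok, MsgCount} or {error, timeout}.
delete_queue(DeleteFun) when is_function(DeleteFun, 0) ->
    case DeleteFun() of
        {ok, _MsgCount} ->
            ok;
        {error, timeout} = Err ->
            %% A bare `{ok, _} = DeleteFun()` match would raise badmatch
            %% here; returning the error lets the caller decide what to do.
            Err
    end.

%% Ask a process for its consumers. If the call exits (for example because
%% the process is terminating or already gone), return an empty list
%% instead of propagating the exit to the caller.
safe_consumers(Pid) when is_pid(Pid) ->
    try
        gen_server:call(Pid, consumers, infinity)
    catch
        exit:_ ->
            []
    end.
```

Calling `delete_timeout_sketch:delete_queue(fun() -> {error, timeout} end)` returns `{error, timeout}` rather than crashing, and `safe_consumers/1` on a pid that has already exited returns `[]`.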