RabbitMQ unacked messages are messages that have been delivered to a consumer but not yet confirmed. The protocol does not guarantee that delivery and processing will always succeed, so publishers and consumers need a confirmation mechanism: acknowledgements. A message is "ready" while it waits in the queue; when a consumer connects, it receives a batch of messages (up to the channel's prefetch count), and those messages are counted as unacked while the consumer works on them.
If a consumer fails to acknowledge messages, RabbitMQ keeps delivering new ones until the number of unacked messages on the channel reaches the channel's prefetch value, at which point delivery stops. If the channel or connection then closes without acks being sent, the broker makes the "stuck" messages available again (requeues them), and the client process can reconnect and receive them. In short, unacked messages are messages a consumer has read but for which it has never sent an ack back to the broker to say it has finished processing them.
Goals
Review our use of queues to more closely match the patterns expected by the RabbitMQ developers.
As discussed in Teams, we could ack the message as soon as we receive it, but we would then need to requeue a message ourselves should the processing fail in an unexpected way. Currently there are two ways this may happen:
1. Licensing failure: this is easily resolved by a code change, as we are in control at the point of failure.
2. The bouncer_worker is terminated unexpectedly. This mostly happens when AWS reclaims the machine and the Node.js application gets killed. The question is: is there any way Node.js can get a signal before this happens so we can requeue the message? Do we get a signal that we can catch (like SIGTERM)?
If the node is going away due to AWS-initiated activity, we have the aws-node-termination-handler installed and currently configured in metadata monitoring mode, so we should get notified that the hardware is going away.
In that event, the nodes get 'tainted' (marked as about to be taken offline), Kubernetes signals the pods that they are being terminated, and the pods shut down gracefully. Under our current configuration, TERM is sent to the node process:
> The kubelet triggers the container runtime to send a TERM signal to process 1 inside each container.
so we should be able to catch it.
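If so, a graceful-shutdown hook in the worker could look roughly like this (a sketch assuming `amqplib`; `channel`, `connection` and `consumerTag` stand for whatever live objects bouncer_worker actually holds):

```javascript
// Illustrative handles, populated during worker startup in the real code.
let channel, connection, consumerTag;

process.once('SIGTERM', async () => {
  try {
    // Stop accepting new deliveries on this consumer.
    await channel.cancel(consumerTag);
    // Requeue everything delivered but not yet acked on this channel.
    channel.nackAll(true);
    await connection.close();
  } finally {
    process.exit(0);
  }
});
```

Even without this handler, closing the channel or connection (or the process dying) causes the broker to requeue that channel's unacked messages; the handler just makes the shutdown orderly instead of abrupt.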
However, occasionally there may be a hardware error underneath that will not follow this process.
Description
There was an issue on production, captured in https://github.com/3drepo/DevOps/issues/457, that was resolved with a configuration change on RabbitMQ, as the default handling of unacknowledged messages changed in RabbitMQ 3.10. This change was also backported to 3.8.15.
They've also deprecated classic mirrored queues and recommend the new quorum queues instead: https://www.rabbitmq.com/ha.html
We are also seeing higher queue depths and more unacknowledged messages as the system gets busier; due to the high number of channel exceptions, we also encountered this 'stuck' queue.
Goals
Tasks
Related Resources
RabbitMQ ChangeLog
Current queue configuration