Connection recovery hangs infinitely #52
Comments
Can you post a JVM thread dump when that happens? |
Of course. Here it is:
|
What version of Lyra and amqp-client are you using? Also, what are the exact steps to reproduce? Can it be done with a single, non-clustered instance? |
I'm using Lyra v0.5.2, amqp-client v3.5.3 and RabbitMQ v3.5.3. The behavior is NOT reproducible with a single instance. If it helps, I could provide a network packet dump. |
I have basically the exact same problem (my thread dumps look nearly the same in the Lyra code -- I can post them if needed). In my case there is no load balancer, but the connection bounced up and down a few times. The policies are built like this:

```java
RecoveryPolicy policy = new RecoveryPolicy();
RetryPolicy retryPolicy = RetryPolicies.retryAlways();
if (_maxAttempts > 0) {
    policy.withMaxAttempts(_maxAttempts);
    retryPolicy.withMaxAttempts(_maxAttempts);
}
if (_interval > 0) {
    policy.withInterval(Duration.seconds(_interval));
    retryPolicy.withInterval(Duration.seconds(_interval));
}
if (_maxDuration > 0) {
    policy.withMaxDuration(Duration.seconds(_maxDuration));
    retryPolicy.withMaxDuration(Duration.seconds(_maxDuration));
}
if (_backoffTime > 0) {
    policy.withBackoff(Duration.seconds(_backoffTime), Duration.seconds(_backoffMaxInterval));
    retryPolicy.withBackoff(Duration.seconds(_backoffTime), Duration.seconds(_backoffMaxInterval));
}
return new Config().withRecoveryPolicy(policy).withRetryPolicy(retryPolicy);
```

Currently the variables are set like this:

From my understanding, that should create Recovery and Retry policies that continuously retry (forever), waiting longer and longer between retries until there is 1 minute between each.

As a note, I have two different channels from one connection. Not sure if that is relevant.

I created connection and channel listeners with logging, and it looks like the first time the connection went down, recovery happened normally. My log shows about 10 retries, then the connection, channels, and consumers all come back up; then the connection drops back out and Lyra locks up.

Edit: forgot to post versions -- Lyra 0.4.3, amqp-client 3.4.1, RabbitMQ (server) 3.4.2 |
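For reference, a minimal sketch of the retry-forever-with-capped-backoff behavior described above, built with the Lyra API already shown in this thread. The concrete values (a 1-second starting interval growing to a 60-second cap) are illustrative assumptions, not the poster's actual settings:

```java
import net.jodah.lyra.config.*;
import net.jodah.lyra.util.Duration;

// Sketch only: recover and retry forever, backing off between attempts
// from an assumed 1 second up to an assumed 60-second ceiling.
Config config = new Config()
    .withRecoveryPolicy(RecoveryPolicies.recoverAlways()
        .withBackoff(Duration.seconds(1), Duration.seconds(60)))
    .withRetryPolicy(RetryPolicies.retryAlways()
        .withBackoff(Duration.seconds(1), Duration.seconds(60)));
```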
@Minalan are you also using a cluster? Do you know if one or all nodes go down around the same time? |
Not using a cluster. Only one server. If it just goes down and comes back …
|
@Minalan What are you doing with the two channels when the failure happens? |
One channel is reading from a queue (pretty continuously -- there are …
|
Do you know what sort of failures are happening on your server or how I might reproduce it? I've been trying to reproduce this by killing networking or restarting the machine repeatedly, but that doesn't do it. |
I don't know what the failure is specifically, but I do know that the java …
|
Here are my steps to reproduce:
```java
new Config()
    .withRecoveryPolicy(RecoveryPolicies.recoverAlways())
    .withRetryPolicy(RetryPolicies.retryAlways())
```
In debug mode I see that Lyra tries to reconnect once, then gets an IOException (with a ShutdownSignalException cause) and never reconnects again. It happens at RetryableResource:58:

```java
if (sse != null && (recovery || !recoverable))
  throw e;
```

The exception is just rethrown up and lost. No other reconnection attempts are made -- Lyra doesn't get this SSE and doesn't trigger reconnection. So the last line I see in the logs is `[ConnectionHandler.java:240] Recovering connection cxn-7 to [localhost:5672]`, and it hangs indefinitely even after I start rabbitmq-server back up.
Lyra is 0.5.2, RabbitMQ is 3.5.3. So the steps to reproduce are easy. Am I doing anything wrong? |
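For illustration, here is a self-contained sketch of the kind of setup being described; the broker host, the use of `rabbitmqctl stop_app`/`start_app` to take the node down and bring it back, and the class name are assumptions for reproducing, not details confirmed above:

```java
import net.jodah.lyra.ConnectionOptions;
import net.jodah.lyra.Connections;
import net.jodah.lyra.config.*;

import com.rabbitmq.client.Connection;

public class RecoveryRepro {
  public static void main(String[] args) throws Exception {
    // Recover and retry forever, as in the configuration quoted above.
    Config config = new Config()
        .withRecoveryPolicy(RecoveryPolicies.recoverAlways())
        .withRetryPolicy(RetryPolicies.retryAlways());

    // Assumed broker location; adjust as needed.
    Connection connection = Connections.create(
        new ConnectionOptions().withHost("localhost"), config);
    connection.createChannel();

    // With this running, stop the broker (e.g. `rabbitmqctl stop_app`),
    // wait a few seconds, then `rabbitmqctl start_app`. The report above
    // is that recovery starts once and then hangs instead of retrying.
    Thread.sleep(Long.MAX_VALUE);
  }
}
```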
@atfire @Minalan @marcuslinke What OS are you running on? As #53 brought up, this could be the result of a different exception being thrown on your platform which Lyra isn't recovering from. |
I was testing on Mac OS 10.9.5 |
Me too |
My setup is similar to marcuslinke's: I am using Lyra 0.5.2 and RabbitMQ 3.5.3 (server & client), and I am able to reproduce this issue on Mac OS 10.10.5 as well as on RHEL 6.5. I see an IOException thrown when this happens. Any solution for this issue? |
I'm having this problem too. Is there a cookbook entry that shows the proper way to recover when a rabbitmq node goes down? The process I'm following is:
This is using the following configuration:
|
I suspect this is a variation of #53 (comment). Without knowing what the exception is, and without server logs, it is difficult to reconstruct what's going on. |
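As a general illustration (not something anyone in this thread has posted), one way to surface the exception being asked about here is to attach a ShutdownListener from the standard amqp-client API to the connection and log the ShutdownSignalException it receives:

```java
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ShutdownListener;
import com.rabbitmq.client.ShutdownSignalException;

// Assumes `connection` was obtained via Lyra's Connections.create(...).
static void logShutdowns(Connection connection) {
  connection.addShutdownListener(new ShutdownListener() {
    @Override
    public void shutdownCompleted(ShutdownSignalException cause) {
      // Log whether this was a connection-level (hard) or channel-level error,
      // whether the application initiated it, and the broker's close reason.
      System.err.println("Shutdown: hardError=" + cause.isHardError()
          + ", initiatedByApplication=" + cause.isInitiatedByApplication()
          + ", reason=" + cause.getReason());
    }
  });
}
```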
During my experiments with two RabbitMQ nodes behind a load balancer, I ran into a situation where both nodes were down for a short period. In this case the connection recovery hangs indefinitely, although both nodes came back up again after a few seconds:
Here is my configuration:
Shouldn't the recovery attempt at least time out after 20 seconds so that a retry will occur, or have I misunderstood something?
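One possible reading of this question, sketched below as an assumption rather than a confirmed fix: if what hangs is an individual connect attempt against an unreachable node, a per-attempt socket connect timeout can be set on the underlying amqp-client ConnectionFactory (independent of Lyra's recovery and retry policies), so the attempt fails after 20 seconds and the retry policy can schedule the next one:

```java
import com.rabbitmq.client.ConnectionFactory;

// Illustrative only: bound each TCP connect attempt to 20 seconds so a
// hung attempt against a dead node fails instead of blocking forever.
ConnectionFactory factory = new ConnectionFactory();
factory.setConnectionTimeout(20000); // milliseconds
```

If Lyra is in use, this factory would still need to be wired through its ConnectionOptions; the exact wiring is left out of this sketch.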