Skip to content
This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Treat timeout error before pool shutting down error #359

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

wandenberg
Copy link
Contributor

When the connection_pool gem is not able to return a connection from the pool during the time configured at pool_timeout, it raise a Timeout::Error.
Which is not properly handled and result on an attempt do set the node as down.
Resulting in a invalid state transformed in to ConnectionPool::PoolShuttingDownError exception.

This pull request was done using the script posted by @InvisibleMan at #353.

I also applied this commit to operation_timeout branch

@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) to 93.77% when pulling 3e43409 on wandenberg:treat_timeout_error_before_pool_shutting_down_error into 7ef2b2c on mongoid:master.

@dblock
Copy link
Contributor

dblock commented Mar 25, 2015

@wandenberg Do you believe this permanently fixes the ConnectionPool::PoolShuttingDownError that people are seeing? In which case #353 didn't and CHANGELOG needs to be updated to to say that.

@dblock
Copy link
Contributor

dblock commented Mar 25, 2015

@arthurnn Could you please try to review this one in before 2.0.5, reduce the surface of issues that are discussed in #346?

@wandenberg
Copy link
Contributor Author

@dblock I believe yes. It was a not well handled exception that set the node as down and try to close the connections when it was not able to get a connection from the pool, it even used the connection to know if is broken, resulting in some wrong interpretations like the ConnectionPool::PoolShuttingDownError and the could not connect to a primary node in some cases.

@dblock
Copy link
Contributor

dblock commented Apr 27, 2015

@arthurnn Bump?

@jonhyman
Copy link
Contributor

jonhyman commented May 4, 2015

@arthurnn can you take a look? We just upgraded to Moped 2 in production last night and have been wrecked by this bug so far.

@jonhyman
Copy link
Contributor

jonhyman commented May 4, 2015

@wandenberg I just applied this patch to prod, will let you know if we see the connection pool shutdown. We're still getting the could not connect to a primary node every few minutes. I'm still debugging that one, this doesn't seem to have fixed that (at least for us).

@jonhyman
Copy link
Contributor

jonhyman commented May 7, 2015

@wandenberg unfortunately even with this patch, this just happened and didn't go away until we restarted services. I guess it's possible that there are other scenarios in which this would happen and your patch fixes a subset of them.

@sahin
Copy link

sahin commented May 22, 2015

+1, @arthurnn any update on this one?

@jonhyman
Copy link
Contributor

@sahin

Give this branch a try. We upgraded from 1.5 to 2.0 about 3 weeks ago and have seen absolutely horrible failover handling with Moped 2.0. We finally now can do stepdowns in production without a single error and haven't seen this error anymore. I cherry-picked in various commits from other pulls (such as @wandenberg's) that address this and also added many commits of my own to handle different failure scenarios.

https://github.com/jonhyman/moped/tree/feature/15988-and-logging

It has some extra logging in there that I've been using as we've been doing failover testing, so feel free to fork and remove if you inspect your Moped logs. We've also tested kill -9'ing the primary mongod on this branch and killing a mongos successfully, whereas on 2.0.4 it couldn't handle any of those scenarios.

@sahin
Copy link

sahin commented May 22, 2015

@jonhyman right now, we have a some issues in productions in dozens of server and websites + api that is used by many vendors, movie studios and our apps.

Right now, if any thing happens to a node in the replication , we are getting
No route to host - connect(2) for "20.0.0.16" port 27017
ConnectionPool::PoolShuttingDownError

@jonhyman
Copy link
Contributor

Give my branch a try, see if it helps.

@deepredsky
Copy link

@arthurnn Bump!

@deepredsky
Copy link

@jonhyman your branch seems to get rid of the pool shutdown error. are you using this in production?

@jonhyman
Copy link
Contributor

Yeah we are. And we've done numerous stepdowns in prod without issues with
my branch.

Sent from my mobile device
On May 24, 2015 3:11 AM, "Rajesh Sharma" [email protected] wrote:

@jonhyman https://github.com/jonhyman your branch seems to get rid of
the pool shutdown error. are you using this in production?

Reply to this email directly or view it on GitHub
#359 (comment).

@wandenberg wandenberg force-pushed the treat_timeout_error_before_pool_shutting_down_error branch from 3e43409 to 49f65ef Compare June 3, 2015 13:55
@coveralls
Copy link

Coverage Status

Coverage increased (+0.03%) to 93.92% when pulling 372f22a on wandenberg:treat_timeout_error_before_pool_shutting_down_error into 68923e0 on mongoid:master.

@agis
Copy link

agis commented Sep 18, 2015

@jonhyman Hey, are the issues you mention in your comment fixed in 2.0.7 which contains #380? Or do you still use a fork?

@jonhyman
Copy link
Contributor

Yeah it should all be fixed in 2.0.7. We're still on my fork because we've stopped putting any resources behind Moped (even if it is just gem update which conceptually should be fine, I'm not going to spend the time testing in staging). We're instead entirely focused on getting to Mongoid 5 and the official driver.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants