-
-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: make geoip client work even if redis is broken #389
Conversation
We should try to reconnect with some longer delay here as well to handle occasional 3rd party API failures, geo data updates, etc. If the probes don't attempt to reconnect at all, I'd suggest adding a retry after 1 hour or so for all such cases. |
Seems like the "stop reconnecting in case of unresolvable geoip" logic is not intentional. Here is the explicit code to stop reconnects in some cases, but if (isFatalError) {
// At that stage socket.connected=false already,
// but we want to stop reconnections for fatal errors
socket.disconnect();
} else {
socket.connect();
} If we want to persist the current not reconnect logic, we can add |
The error is changed to
and that one is on the list, so I think that works as originally intended? My point still is that it might be better to not drop probes forever, only apply some long-ish timeout. |
Size of delay surely can be set to any value, we just have 7 days by default. |
I see. Let's fix the reconnecting and reduce the timeout, then. |
Do you think it should be reduced for all fatalConnectErrors or just for unresolvable geo? |
I'd say all. 1 hour should be enough to prevent the load caused by constant reconnecting and, at the same time, reasonably short to fix any temporary failures. |
Adding that logic revealed an #390 issue. So it will be implemented along with the fix. But current PR is safe to merge and deploy right now, so lets do that. |
Fixes #388
At the time of the issue redis was in a strange state, it was rejecting all the queries (GET, SET, etc.). Our geo ip logic uses redis to get cached value before making real requests to the 3rd party geo APIs. Since get command was throwing errors the whole geo ip function was throwing and API were unable to get the geo ips of probes and sent
unresolvable geoip
to them.Probes were constantly reconnecting when API was down. But they stopped reconnecting after getting
unresolvable geoip
error from the API, which is valid behaviour from the perspective of the probe.In that fix we are catching any geo ip redis errors and making request to the 3rd party geo APIs not only if there is no value in the cache but also if the cache itself is buggy/unavailable.