fix: make geoip client work even if redis is broken #389

alexey-yarmosh · 2023-07-10T12:14:00Z

Fixes #388

At the time of the issue redis was in a strange state, it was rejecting all the queries (GET, SET, etc.). Our geo ip logic uses redis to get cached value before making real requests to the 3rd party geo APIs. Since get command was throwing errors the whole geo ip function was throwing and API were unable to get the geo ips of probes and sent unresolvable geoip to them.

Probes were constantly reconnecting when API was down. But they stopped reconnecting after getting unresolvable geoip error from the API, which is valid behaviour from the perspective of the probe.

In that fix we are catching any geo ip redis errors and making request to the 3rd party geo APIs not only if there is no value in the cache but also if the cache itself is buggy/unavailable.

MartinKolarik · 2023-07-10T12:32:56Z

But they stopped reconnecting after getting unresolvable geoip error from the API, which is valid behaviour from the perspective of the probe.

We should try to reconnect with some longer delay here as well to handle occasional 3rd party API failures, geo data updates, etc. If the probes don't attempt to reconnect at all, I'd suggest adding a retry after 1 hour or so for all such cases.

alexey-yarmosh · 2023-07-10T15:56:26Z

Seems like the "stop reconnecting in case of unresolvable geoip" logic is not intentional. Here is the explicit code to stop reconnects in some cases, but unresolvable geoip is not in the fatalConnectErrors list. Real reason why it is not reconnecting is because socketio not reconnecting for errors from middlewares. It may also not reconnect for other similar reason. So we need to add an explicit socket.connect() to fix that:

if (isFatalError) {
  // At that stage socket.connected=false already,
  // but we want to stop reconnections for fatal errors
  socket.disconnect();
} else {
  socket.connect();
}

If we want to persist the current not reconnect logic, we can add unresolvable geoip to fatalConnectErrors list. But such probes are reconnected only once a week, after a restart.

MartinKolarik · 2023-07-10T16:14:39Z

The error is changed to failed to collect probe metadata at

globalping/src/lib/ws/middleware/probe-metadata.ts

Line 20 in a09d044

let message = 'failed to collect probe metadata';

and that one is on the list, so I think that works as originally intended?

My point still is that it might be better to not drop probes forever, only apply some long-ish timeout.

alexey-yarmosh · 2023-07-10T16:25:13Z

failed to collect probe metadata is replaced with unresolvable geoip: x.x.x.x here.

Size of delay surely can be set to any value, we just have 7 days by default.

MartinKolarik · 2023-07-10T16:33:40Z

I see. Let's fix the reconnecting and reduce the timeout, then.

alexey-yarmosh · 2023-07-10T16:37:59Z

Do you think it should be reduced for all fatalConnectErrors or just for unresolvable geo?

MartinKolarik · 2023-07-10T16:42:24Z

I'd say all. 1 hour should be enough to prevent the load caused by constant reconnecting and, at the same time, reasonably short to fix any temporary failures.

alexey-yarmosh · 2023-07-11T14:26:18Z

I'd say all. 1 hour should be enough to prevent the load caused by constant reconnecting and, at the same time, reasonably short to fix any temporary failures.

Adding that logic revealed an #390 issue. So it will be implemented along with the fix. But current PR is safe to merge and deploy right now, so lets do that.

fix: make geoip client work even if redis is broken

45c141e

alexey-yarmosh requested review from jimaek and MartinKolarik July 10, 2023 12:14

MartinKolarik approved these changes Jul 10, 2023

View reviewed changes

MartinKolarik merged commit 99882e9 into master Jul 11, 2023
2 checks passed

MartinKolarik deleted the geoip-cache branch July 11, 2023 19:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: make geoip client work even if redis is broken #389

fix: make geoip client work even if redis is broken #389

alexey-yarmosh commented Jul 10, 2023

MartinKolarik commented Jul 10, 2023

alexey-yarmosh commented Jul 10, 2023

MartinKolarik commented Jul 10, 2023

alexey-yarmosh commented Jul 10, 2023 •

edited

Loading

MartinKolarik commented Jul 10, 2023

alexey-yarmosh commented Jul 10, 2023

MartinKolarik commented Jul 10, 2023

alexey-yarmosh commented Jul 11, 2023

fix: make geoip client work even if redis is broken #389

fix: make geoip client work even if redis is broken #389

Conversation

alexey-yarmosh commented Jul 10, 2023

MartinKolarik commented Jul 10, 2023

alexey-yarmosh commented Jul 10, 2023

MartinKolarik commented Jul 10, 2023

alexey-yarmosh commented Jul 10, 2023 • edited Loading

MartinKolarik commented Jul 10, 2023

alexey-yarmosh commented Jul 10, 2023

MartinKolarik commented Jul 10, 2023

alexey-yarmosh commented Jul 11, 2023

alexey-yarmosh commented Jul 10, 2023 •

edited

Loading