Unbounded MULTI transactions cause big latency spikes in Redis #149

elboulangero · 2023-10-11T15:07:06Z

I'm evaluating Mirrorbits for Kali Linux. Kali is a distro based on Debian, and we have around 500,000 files in the repository (this number can go up to a million).

So we deployed Mirrorbits (not in production yet), and I noticed a warning message in the logs: `Renewing lock for %s failed: lock disappeared`. Looking at the code, this lock is in fact a key in Redis with a 10-second TTL, renewed every 5 seconds. As it turns out, the lock disappeared because, every now and then, the Redis server was unresponsive for more than 5 seconds.

So I did some research, and I recorded and plotted the Redis latency. Here's the result and it's not pretty:

As we can see, we have very big latency spikes (up to 9 seconds).

After more investigation, it turns out that this is caused by unbounded MULTI transactions made by Mirrorbits. There are a couple of places in the code where this happens, and the number of commands in the transaction is often proportional to the number of files in the repo. Redis executes all the commands of a transaction in one go, without serving other clients in between, so a huge transaction blocks everyone else (including the lock renewal) until it completes. As said above, we have ~500k files in the Kali repo, so it's no surprise that these transactions make Redis unresponsive for so long.

So I prepared a patch in which these transactions are broken down into batches of 5k files (following the recommendation from https://redis.io/docs/manual/pipelining/, cf. the IMPORTANT NOTE). The result looks much better:
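To illustrate what "batches of 5k" means, here is a minimal shell sketch. It is not the Go code from #148: it assumes a hypothetical Redis set named `files` that receives one `SADD` per path, reads the paths on stdin, and wraps every 5,000 commands in their own `MULTI`/`EXEC` block, so that no single transaction grows with the size of the repository. It also assumes paths contain no whitespace or quotes, since `redis-cli` splits inline commands on spaces.

```shell
# Sketch: turn a list of file paths (one per line on stdin) into redis-cli
# commands, grouped into MULTI/EXEC blocks of at most BATCH commands each.
# The function name, key name and batch size are illustrative assumptions.
emit_batched_sadd() {
    key="$1"
    batch="${2:-5000}"
    awk -v key="$key" -v batch="$batch" '
        # Open a new transaction every `batch` lines, closing the previous one.
        (NR - 1) % batch == 0 {
            if (NR > 1) print "EXEC"
            print "MULTI"
        }
        # One SADD per path; assumes paths contain no whitespace or quotes.
        { print "SADD " key " " $0 }
        # Close the last transaction, if any lines were read at all.
        END { if (NR > 0) print "EXEC" }
    '
}

# Usage (hypothetical repo path and key):
#   find /srv/repo -type f | emit_batched_sadd files 5000 | redis-cli > /dev/null
```

With ~500k files and the default batch size, this sends 100 transactions of 5,000 commands each instead of a single transaction of 500,000. The trade-off is that the update as a whole is no longer atomic: each batch is, but another client may observe the set between two batches.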

We still have latency spikes, however 1) they are under a second and 2) this time they are caused by the expensive commands SDIFF and SINTERSTORE. Optimizing that might be doable, but it can be the topic of another issue and PR.

In the meantime, I propose #148 to at least fix the latency caused by the unbounded MULTI transactions.

For anyone interested, here's how I measured the latency.

In a shell on the machine where the Redis instance is running, run:

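```shell
# Bash: append "<unix timestamp> <redis-cli --latency output>" to latency.txt, 3600 times.
for i in {1..3600}; do
    { echo -n "$(date +%s) "; redis-cli -i 1 --latency; } >> latency.txt
done
```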

And here's the gnuplot script to plot it:

```
$ cat script.plg
# gnuplot -c script.plg FILENAME
set title "Redis latency, measured every second over an hour"
set xdata time
set timefmt "%s"
# Column 1 is the Unix timestamp, column 3 the latency in ms (divided by 1000 for seconds)
plot ARG1 using 1:($3/1000) with steps title "Latency in seconds"
pause mouse close
```

Run the script with `gnuplot -c script.plg latency.txt`. Replace `1:($3/1000)` with `1:3` if you want to see milliseconds instead.
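If you'd rather get numbers than a plot, a small awk summary can scan the same file. This is a sketch that assumes the layout the gnuplot script relies on: column 1 is the Unix timestamp written by `date +%s`, and column 3 is the latency in milliseconds (as far as I can tell, the maximum from redis-cli's raw min/max/avg/samples output). The function name `summarize_latency` and the default 5000 ms threshold, chosen to match the 5-second lock renewal period, are illustrative assumptions.

```shell
# Sketch: summarize latency.txt without plotting.
# Assumes column 1 = Unix timestamp and column 3 = latency in ms,
# the same columns the gnuplot script plots.
summarize_latency() {
    file="$1"
    threshold_ms="${2:-5000}"   # illustrative default: the 5 s lock renewal period
    awk -v t="$threshold_ms" '
        BEGIN { t += 0; worst = 0 }
        # Remember the worst latency seen and when it happened.
        $3 > worst { worst = $3; worst_ts = $1 }
        # Count samples at or above the threshold.
        $3 >= t { over++ }
        END { printf "worst=%dms at=%s over=%d samples=%d\n", worst, worst_ts, over, NR }
    ' "$file"
}
```

For example, `summarize_latency latency.txt 1000` counts every sample that took a second or more, which is a quick way to compare instances without looking at the plots.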

I'd be curious to see what it looks like for other instances, depending on the number of files in the repo.


jbkempf added the bug and help wanted labels on Jun 8, 2024.
