You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm evaluating Mirrorbits for Kali Linux. Kali is distro based on Debian, and we have around 500,000 files in the repository (this number can go up to a million).
So we have deployed Mirrorbits (not in production yet), and I noticed a warning messages in the logs: Renewing lock for %s failed: lock disappeared. Looking at the code, we can see that this lock is in fact a key in redis, with a 10 seconds TTL, renewed every 5 seconds. As it turns out, the lock disappeared because, every now and then, the Redis server was unavailable for more than 5 seconds.
So I did some research, and I recorded and plotted the Redis latency. Here's the result and it's not pretty:
As we can see, we have very big latency spikes (up to 9 seconds).
After more investigation, it turns out that this is caused by unbounded MULTI transactions made by Mirrorbits. There are a couple of places in the code where it's done, and often the number of commands in the transaction is a factor of the number of files in the repo. As said above, we have ~ 500k files in the Kali repo. So it's no surprise if these transactions cause Redis to be unresponsive for too long.
I could prepare a patch where these transactions are broken down in batches of 5k files (following recommendation from https://redis.io/docs/manual/pipelining/, cf. IMPORTANT NOTE). The result looks much better:
We still have latency spikes, however 1) they are under a second and 2) they are caused, this time, by expensive commands SDIFF and SINTERSTORE. Optimizing that might be doable, but it can be the topic for another issue and PR.
In the meantime, I propose #148 to at least fix the latency caused by the unbounded MULTI transactions.
For anyone interested, here's how I measured the latency.
In a shell, where the Redis instance is running, run:
for i in {1..3600}; do { echo -n "$(date +%s) "; redis-cli -i 1 --latency; } >> latency.txt; done
And here's the gnuplot script to plot it:
$ cat script.pl
# gnuplot -c script.plg FILENAME
set title "Redis latency, measured every second during an hour"
set xdata time
set timefmt "%s"
plot ARG1 using 1:($3/1000) with steps title "Latency in seconds"
pause mouse close
Run the script with gnuplot -c script.plg latency.txt. Replace 1:($3/1000) with 1:3 if you want to see milliseconds instead.
I'd be curious to see how it looks like for other instances, depending on the number of files in the repo.
The text was updated successfully, but these errors were encountered:
I'm evaluating Mirrorbits for Kali Linux. Kali is distro based on Debian, and we have around 500,000 files in the repository (this number can go up to a million).
So we have deployed Mirrorbits (not in production yet), and I noticed a warning messages in the logs:
Renewing lock for %s failed: lock disappeared
. Looking at the code, we can see that this lock is in fact a key in redis, with a 10 seconds TTL, renewed every 5 seconds. As it turns out, the lock disappeared because, every now and then, the Redis server was unavailable for more than 5 seconds.So I did some research, and I recorded and plotted the Redis latency. Here's the result and it's not pretty:
As we can see, we have very big latency spikes (up to 9 seconds).
After more investigation, it turns out that this is caused by unbounded MULTI transactions made by Mirrorbits. There are a couple of places in the code where it's done, and often the number of commands in the transaction is a factor of the number of files in the repo. As said above, we have ~ 500k files in the Kali repo. So it's no surprise if these transactions cause Redis to be unresponsive for too long.
I could prepare a patch where these transactions are broken down in batches of 5k files (following recommendation from https://redis.io/docs/manual/pipelining/, cf. IMPORTANT NOTE). The result looks much better:
We still have latency spikes, however 1) they are under a second and 2) they are caused, this time, by expensive commands
SDIFF
andSINTERSTORE
. Optimizing that might be doable, but it can be the topic for another issue and PR.In the meantime, I propose #148 to at least fix the latency caused by the unbounded
MULTI
transactions.For anyone interested, here's how I measured the latency.
In a shell, where the Redis instance is running, run:
And here's the gnuplot script to plot it:
Run the script with
gnuplot -c script.plg latency.txt
. Replace1:($3/1000)
with1:3
if you want to see milliseconds instead.I'd be curious to see how it looks like for other instances, depending on the number of files in the repo.
The text was updated successfully, but these errors were encountered: