I did some measurements with erasure coding and found that EC with a wider stripe usually has lower encoding throughput. For example, 10+4 reaches 6989 MB/s, but 20+4 reaches only 5467 MB/s.
Is this expected, and is there an explanation for why it happens?
I thought EC configurations with the same number of parity chunks would have similar encoding throughput.
Thanks!
Hello, I observed even more severe performance drops when the stripe gets wider than 64!
I ran the erasure_code_perf benchmark on an Intel(R) Xeon(R) Silver 4310 CPU with AVX512 enabled but GFNI disabled. TEST_LEN (i.e. the chunk size) is set to 32MB. When the stripe width (#data + #parity) increases from 64 to 65, the encoding/decoding throughput suddenly drops by half or more (e.g. from 6000+ MB/s to 2000+ MB/s).
However, when GFNI is enabled, I didn't observe such a drop, even when the stripe width increases to more than 260.
I don't think this performance drop can be explained simply by cache contention, because the lookup tables used by EC are much smaller than the L1 cache. Also, during coding, data reads always go to DRAM because the stripe is much larger than the LLC (Last-Level Cache).
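To put rough numbers on that argument, here is a minimal Python sketch of the arithmetic. It relies on the `gftbls` buffer size that ISA-L documents for `ec_init_tables` (32 bytes per coefficient, i.e. `32 * k * rows`). Everything else is an assumption made for illustration: the parity count of 4 for the 64- and 65-wide stripes, reading "32MB" as 32 MiB, and the cache sizes (48 KiB L1D per core, 18 MiB LLC), which should be checked against the actual machine.

```python
# Back-of-the-envelope sizes for the cache argument above.
# Assumptions (not from the thread): p = 4 for the wide stripes,
# "32MB" read as 32 MiB, and the cache sizes below.

GF_TABLE_BYTES_PER_COEFF = 32          # ISA-L gftbls buffer is 32 * k * rows bytes
CHUNK_BYTES = 32 * 1024 * 1024         # TEST_LEN from the post, read as 32 MiB

L1D_BYTES = 48 * 1024                  # ASSUMPTION: L1 data cache per core
LLC_BYTES = 18 * 1024 * 1024           # ASSUMPTION: shared last-level cache


def table_bytes(k: int, p: int) -> int:
    """Size of the expanded GF lookup tables for k data and p parity chunks."""
    return GF_TABLE_BYTES_PER_COEFF * k * p


def stripe_bytes(k: int, p: int, chunk_bytes: int = CHUNK_BYTES) -> int:
    """Total bytes touched per stripe: k source chunks plus p parity chunks."""
    return (k + p) * chunk_bytes


if __name__ == "__main__":
    # (k, p) pairs: two from the original post, two hypothetical ones
    # that give stripe widths 64 and 65 with 4 parity chunks.
    for k, p in [(10, 4), (20, 4), (60, 4), (61, 4)]:
        t = table_bytes(k, p)
        s = stripe_bytes(k, p)
        print(
            f"{k}+{p}: tables {t} B "
            f"({'fits in' if t < L1D_BYTES else 'exceeds'} assumed L1D), "
            f"stripe {s // (1024 * 1024)} MiB "
            f"({'exceeds' if s > LLC_BYTES else 'fits in'} assumed LLC)"
        )
```

Under these assumptions the tables stay at a few KiB on both sides of the 64/65 boundary, while every configuration streams far more data than the LLC can hold, so neither quantity changes character between width 64 and width 65.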
More specifically, I observe a dramatic increase in L2_cache_miss (e.g. from 31.70% to 64.09%) and LLC_miss (e.g. from 59.33% to 98.44%) when the performance drops.
Does this have something to do with the microarchitecture?