
Why does a wider stripe lead to lower encoding throughput even with the same parity? #237

Open
mengwanguc opened this issue Mar 31, 2023 · 2 comments

@mengwanguc

Hello!

I did some measurements with erasure coding, and I found that EC with a wider stripe usually has lower encoding throughput, even with the same number of parity buffers. For example, 10+4 encodes at 6989 MB/s, but 20+4 encodes at only 5467 MB/s.

Is this expected, and is there an explanation for why it happens?

I had expected EC configurations with the same number of parity buffers to have similar encoding throughput.

Thanks!
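
(For reference, here is a minimal sketch of this kind of measurement against isa-l's public erasure-coding API, modeled loosely on erasure_code_perf. The 32 MB buffer size, the iteration count, and the Cauchy matrix choice are assumptions, not necessarily the settings behind the numbers above.)

```c
/* Rough k+p encode-throughput probe using isa-l's public API.
 * Assumptions: 32 MB buffers, 20 iterations, Cauchy encode matrix.
 * Build (roughly): cc -O2 probe.c -lisal  (header path may vary by install) */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <isa-l/erasure_code.h>

#define LEN  (32 * 1024 * 1024)   /* bytes per data/parity buffer */
#define ITER 20

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

static double encode_mbps(int k, int p)
{
    int m = k + p;
    unsigned char *a = malloc(m * k);            /* encode matrix (m x k)  */
    unsigned char *g_tbls = malloc(32 * k * p);  /* expanded GF tables     */
    unsigned char *buf[m];                       /* k data + p parity bufs */
    for (int i = 0; i < m; i++) buf[i] = malloc(LEN);
    for (int i = 0; i < k; i++)
        for (int j = 0; j < LEN; j++) buf[i][j] = (unsigned char)rand();

    gf_gen_cauchy1_matrix(a, m, k);              /* identity + coefficients */
    ec_init_tables(k, p, &a[k * k], g_tbls);     /* expand coefficient rows */

    double t0 = now_sec();
    for (int it = 0; it < ITER; it++)
        ec_encode_data(LEN, k, p, g_tbls, buf, &buf[k]);
    double sec = now_sec() - t0;

    for (int i = 0; i < m; i++) free(buf[i]);
    free(a); free(g_tbls);
    return (double)LEN * k * ITER / sec / 1e6;   /* MB/s of source data */
}

int main(void)
{
    printf("10+4: %.0f MB/s\n", encode_mbps(10, 4));
    printf("20+4: %.0f MB/s\n", encode_mbps(20, 4));
    return 0;
}
```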

@pablodelara
Contributor

With more data disks/buffers, the pressure on the cache is higher, so it makes sense that throughput decays: there will be more cache misses.
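
For scale (if I have the layout right): the expanded GF tables are 32 bytes per coefficient, so 10+4 uses 10 * 4 * 32 = 1,280 bytes of tables and 20+4 uses 2,560 bytes, both tiny next to L1. The growth that seems more likely to hurt is the number of buffers in flight: each encode call streams (k+p) * chunk_size bytes, and the inner dot-product loop reads from all k source buffers for every output block, so 20+4 keeps roughly twice as many concurrent read streams alive as 10+4, costing cache capacity, TLB entries, and prefetcher slots.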

@Clide-thu

Hello, I observed an even more severe performance drop once the stripe gets wider than 64!

I ran the erasure_code_perf benchmark on an Intel(R) Xeon(R) Silver 4310 CPU with AVX512 enabled but GFNI disabled, with TEST_LEN (i.e. the chunk size) set to 32 MB. When the stripe width (#data + #parity) increases from 64 to 65, encoding/decoding performance suddenly drops by about half (e.g. from 6000+ MB/s to 2000+ MB/s).
However, when GFNI is enabled, I didn't observe such a performance drop even when the stripe width grows beyond 260.

I don't think this drop can be explained simply by cache contention, because the lookup table used by EC is much smaller than the L1 cache, and during coding the data reads always go to DRAM anyway, since the stripe is much larger than the LLC (last-level cache).

More specifically, I observe a dramatic increase in the L2 miss rate (e.g. from 31.70% to 64.09%) and the LLC miss rate (e.g. from 59.33% to 98.44%) as the performance drops.
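
For context on the lookup tables mentioned above: as far as I can tell, the non-GFNI SIMD paths use a vpshufb-style nibble-table multiply, which is why each coefficient expands to 32 bytes of table. Below is a minimal standalone sketch of that multiply with SSE intrinsics. It is not isa-l's actual kernel, and the 0x11D polynomial is an assumption made only so the sketch can self-check; the point is just how little table data is involved per coefficient.

```c
/* Nibble-table GF(2^8) multiply of 16 bytes by a constant, the same
 * shuffle trick the non-GFNI erasure-code kernels rely on.
 * Build (roughly): cc -O2 -mssse3 gfmul.c */
#include <stdint.h>
#include <stdio.h>
#include <emmintrin.h>    /* SSE2 */
#include <tmmintrin.h>    /* SSSE3: _mm_shuffle_epi8 */

/* Scalar GF(2^8) multiply; 0x11D polynomial assumed here. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1D : 0));
        b >>= 1;
    }
    return p;
}

/* Multiply each byte of src by c: one 16-entry table for the low nibble
 * and one for the high nibble (32 bytes total per coefficient),
 * combined with XOR after two byte shuffles. */
static __m128i gf_mul_vec(__m128i src, uint8_t c)
{
    uint8_t lo[16], hi[16];
    for (int i = 0; i < 16; i++) {
        lo[i] = gf_mul(c, (uint8_t)i);         /* c * (low nibble)       */
        hi[i] = gf_mul(c, (uint8_t)(i << 4));  /* c * (high nibble << 4) */
    }
    __m128i tlo  = _mm_loadu_si128((const __m128i *)lo);
    __m128i thi  = _mm_loadu_si128((const __m128i *)hi);
    __m128i mask = _mm_set1_epi8(0x0F);
    __m128i l = _mm_and_si128(src, mask);
    __m128i h = _mm_and_si128(_mm_srli_epi64(src, 4), mask);
    return _mm_xor_si128(_mm_shuffle_epi8(tlo, l),
                         _mm_shuffle_epi8(thi, h));
}

int main(void)
{
    uint8_t in[16], out[16];
    for (int i = 0; i < 16; i++) in[i] = (uint8_t)(i * 17 + 3);
    _mm_storeu_si128((__m128i *)out,
                     gf_mul_vec(_mm_loadu_si128((const __m128i *)in), 0x57));
    for (int i = 0; i < 16; i++)               /* cross-check vs scalar */
        if (out[i] != gf_mul(in[i], 0x57)) { puts("mismatch"); return 1; }
    puts("ok");
    return 0;
}
```

The GFNI path, as I understand it, does the multiply with GF2P8AFFINEQB and an 8-byte affine matrix per coefficient instead of a 32-byte shuffle table, so it is at least plausible that the two paths stress the core differently once the source count gets large.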

Could this be related to the micro-architecture?

Thanks!
