Finding the barrier toward better parallelism #11
Labels
D-medium (Difficulty: medium)
T-design (Type: discuss API design and/or research)
T-performance (Type: performance improvements)
More updates will be added to this note.
I measured the cost for Groth16 to prove a constraint system with two million constraints on varying numbers of cores, using `cargo bench` in this repo with appropriate command-line parameters. Below I focus on the main cost in my constraint system, which turns out to be in the witness map and the computation of C.
Going from 1 core to 4 cores, the improvement is significant. Beyond that, however, the time for the witness map seems to change very little as more cores are added.
Since the witness map involves many FFTs, this may suggest that the current FFT implementation has a barrier toward many-core parallelism.
Such a barrier may or may not be avoidable. I will take a look at the detailed breakdown of the cost of the witness map.
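One way to reason about a plateau like this is Amdahl's law: if some fraction of the single-core runtime cannot be parallelized, the speedup flattens out no matter how many cores are added. The sketch below is only illustrative. The function names are hypothetical, the serial fraction of 0.2 is an assumed value rather than a measurement from this benchmark, and the real cause here may be something else entirely (memory bandwidth, work-splitting granularity, and so on).

```rust
/// Amdahl's law: predicted speedup on `cores` cores when a fraction
/// `serial` (between 0.0 and 1.0) of the single-core runtime is not parallelizable.
fn amdahl_speedup(serial: f64, cores: u32) -> f64 {
    1.0 / (serial + (1.0 - serial) / cores as f64)
}

/// Inverse of Amdahl's law: estimate the serial fraction implied by a
/// measured `speedup` on `cores` cores. Returns `None` when the estimate
/// is undefined (fewer than 2 cores, or a non-positive speedup).
fn estimated_serial_fraction(speedup: f64, cores: u32) -> Option<f64> {
    if cores < 2 || speedup <= 0.0 {
        return None;
    }
    let n = cores as f64;
    // From 1/speedup = s + (1 - s)/n, solve for s.
    Some((1.0 / speedup - 1.0 / n) / (1.0 - 1.0 / n))
}

fn main() {
    // Hypothetical serial fraction, chosen only for illustration.
    let serial = 0.2;
    for cores in [1u32, 2, 4, 8, 16, 32] {
        println!(
            "{:>2} cores: predicted speedup {:.2}x",
            cores,
            amdahl_speedup(serial, cores)
        );
    }
}
```

Plugging the measured speedup at each core count into `estimated_serial_fraction` gives a rough check: if the implied serial fraction stays roughly constant across core counts, a fixed sequential portion is a plausible explanation; if it grows with the core count, contention or overhead is more likely.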