This library offers higher throughput and lower latency to generate shamir secret-sharing, compared to Hashicorp's implementation in Vault. Our benchmark shows up to 80x better throughput!
Modifications from the original Hashicorp's implementation:
- Support arbitrary length secret data.
- Straightforward API interfaces for splitting and combining secret shares, similar as the original implementation by Hashicorp.
- Optimized for doing threshold secret sharing for large data using vectorized operation which is more cache-friendly.
- For the Galois arithmatics used in the seret-shares generation, we use SIMD support for ARM64 and AMD64 architecture.
- For higher throughput, we use AES counter mode with AES-NI as the randomization source (computationally secure psuedo-random number generator/CSPRNG) instead of Go standard
math/rand
orcrypto/rand
. - Implementation of SSMS (Secret-Sharing Made Short) which combines Shamir's secret sharing and reed-solomon erasure coding for smaller shares' size.
This implementation is possible because of these amazing works:
- Hashicorp secret sharing from Vault
- GF(2^8) with SIMD support from Klaus Post's ReedSolomon Encoding implementation
Benchmark result using one core in Macbook Pro with M1 chip can be seen below. Here we generate 5 secret-shares from 1K, 10K, and 1M bytes of secret data which can be recovered by combining t=2 shares.
$ go test -cpu 1 -bench BenchmarkSplitHashicorp
goos: darwin
goarch: arm64
pkg: github.com/fadhilkurnia/shamir
BenchmarkSplitHashicorp1K 3614 328623 ns/op 3.04 MB/s
BenchmarkSplitHashicorp10K 369 3212093 ns/op 3.11 MB/s
BenchmarkSplitHashicorp1M 3 335963042 ns/op 3.12 MB/s
$ go test -cpu 1 -bench BenchmarkSplitGoShamir
goos: darwin
goarch: arm64
pkg: github.com/fadhilkurnia/shamir
BenchmarkSplitGoShamir1K 135366 8785 ns/op 113.83 MB/s
BenchmarkSplitGoShamir10K 25412 46940 ns/op 21.30 MB/s
BenchmarkSplitGoShamir1M 276 4366021 ns/op 240.17 MB/s
Here we can see that Go-Shamir provides up to 80x the throughput of Hashicorp's implementation!
Note: we remove the use of ConstantTimeSelect()
and we have not tested the implementation for any timing attacks. So use the library with caution :)
If you have taken any computer system or architecture course, you might aware that we can have faster code if we minimize CPU cache-miss by processing "consecutive" data, not processing data that are "far" frome ach other. The classic example is shown in this article; changing the iteration order when doing matrix multiplication can lower the execution time.
In the original implementation, it creates polynomial for each byte of secret data one-by-one. In this implementation, we immediately create polynomials for all the bytes.
Modified code:
func makePolynomials(intercepts []uint8, degree int) ([][]uint8, error) {
N := len(intercepts)
polynomials := newMatrix(N, degree+1)
coefficients := make([]byte, degree*N)
// Assign random co-efficients to all the N polynomials
if _, err := rand.Read(coefficients); err != nil {
return nil, err
}
startIdx := 0
for p := 0; p < N; p++ {
polynomials[p][0] = intercepts[p] // polynomials[p][0] is the intercept
copy(polynomials[p][1:], coefficients[startIdx:startIdx+degree]) // polynomials[p][1:] is the other coefficients
startIdx += degree
}
return polynomials, nil
}
Original implementation:
func makePolynomial(intercept, degree uint8) (polynomial, error) {
// Create a wrapper
p := polynomial{
coefficients: make([]byte, degree+1),
}
// Ensure the intercept is set
p.coefficients[0] = intercept
// Assign random co-efficients to the polynomial
if _, err := rand.Read(p.coefficients[1:]); err != nil {
return p, err
}
return p, nil
}
We try to extensively use SIMD instruction in this library. For example, we use SIMD for the galois add operation, as can be seen here; there we can simultaneously (in one CPU cycle) execute add operation for up to 64 bytes of data. The original implementation do the add operation one byte at a time, as can be seen below:
Original implementation:
// add combines two numbers in GF(2^8)
// This can also be used for subtraction since it is symmetric.
func add(a, b uint8) uint8 {
return a ^ b
}
Randomization is an important building block for shamir implementation, it is used to generate random polynomial and random points on the polynomial. As shown in this paper titled "How to Best Share a Big Secret" (Table 3), the randomization easily becomes the bottleneck. Using computationally secure pseudo random generator (CSPRNG), AES in counter mode, is the most performant randomization. That is also the case since most of the modern CPU provide native instruction for AES operation, such as AES-NI in Intel chip or similar instructions in other chip. Therefore, in this implementation we use AES in counter mode as the source of randomization, and it uses native AES instructions from the chip.