When are asm and intrinsics worth it? #575
Comments
There are some tough tradeoffs indeed. We get pretty frequent complaints about performance when it isn't on par with ASM implementations; see e.g. #327.

Intrinsics add per-platform testing/maintenance burden via redundant implementations of the same algorithm, which also introduces the possibility of per-platform defects. But at least they're Rust code, which makes them accessible to other Rust programmers. ASM has all of the same problems, but with the additional complications of being a separate language from Rust (and obviously lacking its many guarantees around type/memory safety) and of having to determine the correct arguments to pass at the Rust/ASM boundary.

Regarding the path forward on ASM, which is still an open question since we've removed the old non-inline ASM implementations: personally I've been interested in finding the safest possible way to consume ASM, particularly looking at projects which provide formally verified ASM implementations for a wide variety of algorithms and platforms, where we could extract those implementations in an automated manner and transform them into Rust.

This does have the disadvantage that formally verified implementations tend to lag behind the fastest hand-optimized ASM implementations, which is also a debatable tradeoff. It might also impact FIPS certification, for those who care about that.
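To make the "redundant implementations" point concrete, here is a minimal sketch, not this crate's actual layout (module and function names are hypothetical), of how per-platform intrinsics backends multiply: each one re-implements the same compression routine behind an identical signature, and each needs its own tests and review.

```rust
// Hypothetical layout illustrating redundant per-platform backends.
#[allow(dead_code)]
mod soft {
    // Portable pure-Rust fallback; always available.
    pub fn compress256(state: &mut [u32; 8], blocks: &[[u8; 64]]) {
        // ... portable SHA-256 rounds elided ...
        let _ = (state, blocks);
    }
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[allow(dead_code)]
mod x86_shani {
    // SHA-NI intrinsics backend: fast, but x86-only and needs its own tests.
    pub fn compress256(state: &mut [u32; 8], blocks: &[[u8; 64]]) {
        // ... core::arch SHA intrinsics elided ...
        let _ = (state, blocks);
    }
}

/// Every backend exposes the same signature so callers stay portable.
pub fn compress256(state: &mut [u32; 8], blocks: &[[u8; 64]]) {
    // Backend selection (compile-time features or runtime detection) elided.
    soft::compress256(state, blocks);
}
```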
I think we can close this issue as non-actionable. In general, if an ASM/intrinsics backend provides statistically significant performance improvements, we are likely to use it. We may also use verified assembly even if it does not improve performance, but we will consider that on a case-by-case basis.
I'm looking at doing a third implementation of SHA-256 for x86, targeting the x86-64-v3 ISA level (AVX, AVX2, but no AVX-512 and no SHA-NI, i.e. Haswell), because the pure-Rust soft implementation isn't doing so well without SHA-NI. This raised the question of when this effort is sensible.
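For what it's worth, a hedged sketch of how such a third path could be slotted in at runtime, assuming hypothetical `compress256_shani`/`compress256_avx2`/`compress256_soft` backends: prefer SHA-NI when present, fall back to AVX2 (x86-64-v3), and only then use the portable code.

```rust
// Runtime dispatch sketch; backend bodies are stubs and names are placeholders.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn compress256(state: &mut [u32; 8], blocks: &[[u8; 64]]) {
    if is_x86_feature_detected!("sha") && is_x86_feature_detected!("sse4.1") {
        // Dedicated SHA instructions: by far the fastest when available.
        unsafe { compress256_shani(state, blocks) }
    } else if is_x86_feature_detected!("avx2") {
        // Hypothetical AVX2 path for x86-64-v3 CPUs without SHA-NI (e.g. Haswell).
        unsafe { compress256_avx2(state, blocks) }
    } else {
        compress256_soft(state, blocks);
    }
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sha,sse4.1")]
unsafe fn compress256_shani(state: &mut [u32; 8], blocks: &[[u8; 64]]) {
    let _ = (state, blocks); // SHA-NI rounds elided
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn compress256_avx2(state: &mut [u32; 8], blocks: &[[u8; 64]]) {
    let _ = (state, blocks); // AVX2 message schedule/rounds elided
}

fn compress256_soft(state: &mut [u32; 8], blocks: &[[u8; 64]]) {
    let _ = (state, blocks); // portable rounds elided
}
```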
For example, there is a LoongArch64 ASM implementation for SHA-256, but it's actually scalar (I believe the LA64 vector instructions aren't even publicly documented) and as a result is only about 10% faster than the pure-Rust implementation. At the other end of the scale are implementations using dedicated instructions, like SHA-NI or AES-NI, which can be 1000% or more faster. Where's the line? Is there one?
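One way to put a number on "worth it" is a rough throughput measurement over the same input for each candidate backend. The snippet below is illustrative only (a real comparison would use a benchmarking harness such as criterion and control for frequency scaling); the `compress` closure argument stands in for whichever backend is being evaluated.

```rust
use std::time::Instant;

/// Rough MiB/s estimate for a compression function over a fixed batch of blocks.
/// Illustrative only: no warm-up, no statistics, no control for CPU frequency.
fn throughput_mib_s(compress: impl Fn(&mut [u32; 8], &[[u8; 64]]), blocks: &[[u8; 64]]) -> f64 {
    let mut state = [0u32; 8];
    let iters = 10_000;
    let start = Instant::now();
    for _ in 0..iters {
        compress(&mut state, blocks);
    }
    let secs = start.elapsed().as_secs_f64();
    // Each SHA-256 block is 64 bytes.
    (iters as f64 * blocks.len() as f64 * 64.0) / (secs * 1024.0 * 1024.0)
}
```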