When are asm and intrinsics worth it? #575
Comments
There are some tough tradeoffs indeed. We get pretty frequent complaints about performance when it isn't on par with ASM implementations; see e.g. #327.

Intrinsics add per-platform testing/maintenance burden via redundant implementations of the same algorithm, which also introduces the possibility of per-platform defects. But at least they're Rust code, which makes them accessible to other Rust programmers. ASM has all of the same problems, but with the additional complications of being a separate language from Rust (and obviously lacking its many guarantees around type/memory safety) and of having to determine the correct arguments to pass at the Rust/ASM boundary.

Regarding the path forward on ASM, which is still an open question since we've removed the old non-inline ASM implementations: personally I've been interested in finding the safest possible way to consume ASM, particularly looking at projects which provide formally verified ASM implementations for a wide variety of algorithms and platforms, where we could extract those implementations in an automated manner and transform them into Rust.

This does have the disadvantage that formally verified implementations tend to lag behind the fastest hand-optimized ASM implementations, which is also a debatable tradeoff. It might also impact FIPS certification, for those who care about that.
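To make the "redundant implementations" point concrete, here is a minimal sketch, not this crate's actual layout (module and function names are hypothetical), of how per-platform intrinsics backends multiply: each one re-implements the same compression routine behind an identical signature, and each needs its own tests and review.

```rust
// Hypothetical layout illustrating redundant per-platform backends.
#[allow(dead_code)]
mod soft {
    // Portable pure-Rust fallback; always available.
    pub fn compress256(state: &mut [u32; 8], blocks: &[[u8; 64]]) {
        // ... portable SHA-256 rounds elided ...
        let _ = (state, blocks);
    }
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[allow(dead_code)]
mod x86_shani {
    // SHA-NI intrinsics backend: fast, but x86-only and needs its own tests.
    pub fn compress256(state: &mut [u32; 8], blocks: &[[u8; 64]]) {
        // ... core::arch SHA intrinsics elided ...
        let _ = (state, blocks);
    }
}

/// Every backend exposes the same signature so callers stay portable.
pub fn compress256(state: &mut [u32; 8], blocks: &[[u8; 64]]) {
    // Backend selection (compile-time features or runtime detection) elided.
    soft::compress256(state, blocks);
}
```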
I think we can close this issue as non-actionable. In general, if an ASM/intrinsics backend provides statistically significant performance improvements, we are likely to use it. We may also use verified assembly even if it does not improve performance, but we will consider that on a case-by-case basis.
I'm looking at doing a third implementation of SHA-256 for x86, targeting the x86-64-v3 ISA level (AVX, AVX2, but no AVX-512 and no SHA-NI, i.e. Haswell), because the pure-Rust soft implementation isn't doing so well without SHA-NI. This raised the question of when this effort is sensible.
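For what it's worth, a hedged sketch of how such a third path could be slotted in at runtime, assuming hypothetical `compress256_shani`/`compress256_avx2`/`compress256_soft` backends: prefer SHA-NI when present, fall back to AVX2 (x86-64-v3), and only then use the portable code.

```rust
// Runtime dispatch sketch; backend bodies are stubs and names are placeholders.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn compress256(state: &mut [u32; 8], blocks: &[[u8; 64]]) {
    if is_x86_feature_detected!("sha") && is_x86_feature_detected!("sse4.1") {
        // Dedicated SHA instructions: by far the fastest when available.
        unsafe { compress256_shani(state, blocks) }
    } else if is_x86_feature_detected!("avx2") {
        // Hypothetical AVX2 path for x86-64-v3 CPUs without SHA-NI (e.g. Haswell).
        unsafe { compress256_avx2(state, blocks) }
    } else {
        compress256_soft(state, blocks);
    }
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sha,sse4.1")]
unsafe fn compress256_shani(state: &mut [u32; 8], blocks: &[[u8; 64]]) {
    let _ = (state, blocks); // SHA-NI rounds elided
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn compress256_avx2(state: &mut [u32; 8], blocks: &[[u8; 64]]) {
    let _ = (state, blocks); // AVX2 message schedule/rounds elided
}

fn compress256_soft(state: &mut [u32; 8], blocks: &[[u8; 64]]) {
    let _ = (state, blocks); // portable rounds elided
}
```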
For example, there is a LoongArch64 ASM implementation for SHA-256, but it's actually scalar (I believe the LA64 vector instructions aren't even publicly documented) and as a result is only about 10% faster than the pure-Rust implementation. At the other end of the scale are implementations using dedicated instructions, like SHA-NI or AES-NI, which can be 1000% or more faster. Where's the line? Is there one?
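One way to put a number on "worth it" is a rough throughput measurement over the same input for each candidate backend. The snippet below is illustrative only (a real comparison would use a benchmarking harness such as criterion and control for frequency scaling); the `compress` closure argument stands in for whichever backend is being evaluated.

```rust
use std::time::Instant;

/// Rough MiB/s estimate for a compression function over a fixed batch of blocks.
/// Illustrative only: no warm-up, no statistics, no control for CPU frequency.
fn throughput_mib_s(compress: impl Fn(&mut [u32; 8], &[[u8; 64]]), blocks: &[[u8; 64]]) -> f64 {
    let mut state = [0u32; 8];
    let iters = 10_000;
    let start = Instant::now();
    for _ in 0..iters {
        compress(&mut state, blocks);
    }
    let secs = start.elapsed().as_secs_f64();
    // Each SHA-256 block is 64 bytes.
    (iters as f64 * blocks.len() as f64 * 64.0) / (secs * 1024.0 * 1024.0)
}
```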