Kyber Aarch64: assembly implementations of functions #7998

Merged
merged 1 commit into from
Sep 26, 2024


SparkiDev

Description

Aarch64 assembly implementations of Kyber functions.
SHA-3 assembly implementations for builds that do not use hardware crypto.

Testing

Tested on M1 Mac.
Inline assembly tested as well.

Checklist

  • added tests
  • updated/added doxygen
  • updated appropriate READMEs
  • Updated manual and documentation

@SparkiDev SparkiDev self-assigned this Sep 20, 2024
@SparkiDev SparkiDev force-pushed the kyber_aarch64_asm branch 2 times, most recently from 436df89 to b829134 Compare September 25, 2024 00:21
configure.ac Outdated
@@ -2977,7 +2977,7 @@ then
             AM_CPPFLAGS="$AM_CPPFLAGS+sm4"
         fi
     else
-        AM_CPPFLAGS="$AM_CPPFLAGS -mcpu=generic+crypto"
+        AM_CPPFLAGS="$AM_CPPFLAGS -march=armv8.1-a+crypto"
The customer's target is not v8.1. It needs to be v8.0 (-march=armv8-a+crypto), but that does not allow sqrdmlsh. Is there a way to avoid using sqrdmlsh?

 CC       wolfcrypt/src/src_libwolfssl_la-dilithium.lo
wolfcrypt/src/port/arm/armv8-kyber-asm.S: Assembler messages:
wolfcrypt/src/port/arm/armv8-kyber-asm.S:266: Error: selected processor does not support `sqrdmlsh v21.8h,v29.8h,v4.h[0]'
wolfcrypt/src/port/arm/armv8-kyber-asm.S:267: Error: selected processor does not support `sqrdmlsh v22.8h,v30.8h,v4.h[0]'
wolfcrypt/src/port/arm/armv8-kyber-asm.S:274: Error: selected processor does not support `sqrdmlsh v23.8h,v29.8h,v4.h[0]'
wolfcrypt/src/port/arm/armv8-kyber-asm.S:275: Error: selected processor does not support `sqrdmlsh v24.8h,v30.8h,v4.h[0]'
wolfcrypt/src/port/arm/armv8-kyber-asm.S:282: Error: selected processor does not support `sqrdmlsh v25.8h,v29.8h,v4.h[0]'
wolfcrypt/src/port/arm/armv8-kyber-asm.S:283: Error: selected processor does not support `sqrdmlsh v26.8h,v30.8h,v4.h[0]'
wolfcrypt/src/port/arm/armv8-kyber-asm.S:290: Error: selected processor does not support `sqrdmlsh v27.8h,v29.8h,v4.h[0]'
wolfcrypt/src/port/arm/armv8-kyber-asm.S:291: Error: selected processor does not support `sqrdmlsh v28.8h,v30.8h,v4.h[0]'

FYI: When I run on ZCU102 A53 I get:

./configure --host=aarch64 CC="aarch64-linux-gnu-gcc" AR="aarch64-linux-gnu-ar" RANLIB="aarch64-linux-gnu-ranlib" --enable-sp=yes,asm --enable-keygen --enable-armasm --enable-experimental --enable-kyber --enable-dilithium --enable-keygen --enable-lms --enable-xmss --enable-curve25519 --enable-ed25519 --enable-curve448 --enable-ed448 --disable-shared --enable-static --disable-dh --disable-filesystem && make
./benchmark -kyber
------------------------------------------------------------------------------
 wolfSSL version 5.7.2
------------------------------------------------------------------------------
Math:   Multi-Precision: Wolf(SP) word-size=64 bits=4096 sp_int.c
        Single Precision: ecc 256 384 521 rsa 2048 3072 4096 asm sp_arm64.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
Illegal instruction
CURVE448 test passed!
ED448    test passed!
Illegal instruction
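
An "Illegal instruction" of this kind is what one would expect if the binary contains instructions the core does not implement; on an ARMv8.0 Cortex-A53, sqrdmlsh is one plausible candidate, though the log alone does not prove it. The following is a minimal sketch, not code from this PR, of how a program could check for the rounding doubling multiply (RDM) extension. It assumes the ACLE feature macro `__ARM_FEATURE_QRDMX` for the compile-time check and the Linux `HWCAP_ASIMDRDM` bit for the run-time check; the function names are hypothetical.

```c
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_ASIMDRDM */
#endif

/* Hypothetical helper: 1 if the compiler was told the target has
 * SQRDMLAH/SQRDMLSH (for example via -march=armv8.1-a), otherwise 0. */
static int rdm_enabled_at_compile_time(void)
{
#if defined(__ARM_FEATURE_QRDMX)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if the running CPU reports the RDM extension,
 * 0 if it does not, -1 if this platform offers no way to ask here. */
static int rdm_available_at_run_time(void)
{
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP_ASIMDRDM)
    return (getauxval(AT_HWCAP) & HWCAP_ASIMDRDM) ? 1 : 0;
#else
    return -1;
#endif
}
```

If the compile-time check returns 1 while the run-time check returns 0, the build was targeted at a newer architecture level than the core it is running on.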

Changed the code to not use the sqrdmlsh instruction when WOLFSSL_AARCH64_NO_SQRMLSH is defined.
Using the instruction is faster, so I want it used by default.
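
For readers unfamiliar with the instruction, here is a scalar C model of what sqrdmlsh computes in one 16-bit lane, following the Arm architecture description: the accumulator is widened, twice the product is subtracted, a rounding constant is added, and the high half is saturated. This is only an illustration of the instruction's arithmetic; it is not the fallback sequence used in the PR, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Saturate a wide intermediate value to the int16_t range. */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Divide by 2^16 rounding toward negative infinity, which matches an
 * arithmetic shift right without relying on shifting negative values. */
static int64_t floor_div_65536(int64_t v)
{
    int64_t q = v / 65536;
    if ((v % 65536) < 0) {
        q -= 1;
    }
    return q;
}

/* One 16-bit lane of SQRDMLSH:
 *   result = sat16((acc * 2^16 - 2 * a * b + 2^15) >> 16)
 * where acc is the lane's previous value in the destination register. */
static int16_t sqrdmlsh_lane(int16_t acc, int16_t a, int16_t b)
{
    int64_t wide = (int64_t)acc * 65536
                 - 2 * (int64_t)a * (int64_t)b
                 + 32768;
    return sat16(floor_div_65536(wide));
}
```

A build that defines WOLFSSL_AARCH64_NO_SQRMLSH (for example by adding -DWOLFSSL_AARCH64_NO_SQRMLSH to the compiler flags) would have to obtain an equivalent result from ARMv8.0 instructions only, which is why the author describes that path as slower.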


dgarske commented Sep 26, 2024

Tested on Xilinx UltraScale+ ZCU102:

Master:

# ./benchmark -kyber -sha3
------------------------------------------------------------------------------
 wolfSSL version 5.7.2
------------------------------------------------------------------------------
Math:   Multi-Precision: Wolf(SP) word-size=64 bits=4096 sp_int.c
        Single Precision: ecc 256 384 521 rsa 2048 3072 4096 asm sp_arm64.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
SHA3-224                    65 MiB took 1.026 seconds,   63.345 MiB/s
SHA3-256                    65 MiB took 1.080 seconds,   60.209 MiB/s
SHA3-384                    50 MiB took 1.057 seconds,   47.304 MiB/s
SHA3-512                    35 MiB took 1.039 seconds,   33.692 MiB/s
KYBER512    128  key gen      8900 ops took 1.005 sec, avg 0.113 ms, 8859.861 ops/sec
KYBER512    128    encap      7600 ops took 1.012 sec, avg 0.133 ms, 7508.676 ops/sec
KYBER512    128    decap      5400 ops took 1.011 sec, avg 0.187 ms, 5338.993 ops/sec
KYBER768    192  key gen      5200 ops took 1.005 sec, avg 0.193 ms, 5173.026 ops/sec
KYBER768    192    encap      4600 ops took 1.013 sec, avg 0.220 ms, 4539.985 ops/sec
KYBER768    192    decap      3400 ops took 1.004 sec, avg 0.295 ms, 3385.359 ops/sec
KYBER1024   256  key gen      3300 ops took 1.027 sec, avg 0.311 ms, 3213.490 ops/sec
KYBER1024   256    encap      3000 ops took 1.030 sec, avg 0.343 ms, 2911.320 ops/sec
KYBER1024   256    decap      2300 ops took 1.007 sec, avg 0.438 ms, 2284.343 ops/sec
Benchmark complete

PR 7998:

./benchmark -kyber -sha3
------------------------------------------------------------------------------
 wolfSSL version 5.7.2
------------------------------------------------------------------------------
Math:   Multi-Precision: Wolf(SP) word-size=64 bits=4096 sp_int.c
        Single Precision: ecc 256 384 521 rsa 2048 3072 4096 asm sp_arm64.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
SHA3-224                    80 MiB took 1.056 seconds,   75.769 MiB/s
SHA3-256                    75 MiB took 1.039 seconds,   72.165 MiB/s
SHA3-384                    60 MiB took 1.054 seconds,   56.931 MiB/s
SHA3-512                    45 MiB took 1.102 seconds,   40.837 MiB/s
KYBER512    128  key gen     19900 ops took 1.002 sec, avg 0.050 ms, 19863.217 ops/sec
KYBER512    128    encap     17800 ops took 1.001 sec, avg 0.056 ms, 17790.202 ops/sec
KYBER512    128    decap     12500 ops took 1.005 sec, avg 0.080 ms, 12440.752 ops/sec
KYBER768    192  key gen     12700 ops took 1.003 sec, avg 0.079 ms, 12662.974 ops/sec
KYBER768    192    encap     11400 ops took 1.005 sec, avg 0.088 ms, 11343.019 ops/sec
KYBER768    192    decap      8300 ops took 1.009 sec, avg 0.122 ms, 8228.787 ops/sec
KYBER1024   256  key gen      7600 ops took 1.013 sec, avg 0.133 ms, 7503.618 ops/sec
KYBER1024   256    encap      7000 ops took 1.012 sec, avg 0.145 ms, 6918.733 ops/sec
KYBER1024   256    decap      5400 ops took 1.007 sec, avg 0.186 ms, 5364.265 ops/sec
Benchmark complete

Cross compiled using: ./configure --host=aarch64 CC="aarch64-linux-gnu-gcc" AR="aarch64-linux-gnu-ar" RANLIB="aarch64-linux-gnu-ranlib" --enable-sp=yes,asm --enable-keygen --enable-armasm --enable-experimental --enable-kyber --enable-dilithium --enable-keygen --enable-lms --enable-xmss --enable-curve25519 --enable-ed25519 --enable-curve448 --enable-ed448 --disable-shared --enable-static --disable-dh --disable-filesystem && make
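
To make the comparison between the two runs easier to read, the throughput figures can be turned into speedup ratios. The sketch below copies the numbers from the two logs above (MiB/s for SHA-3, ops/sec for Kyber); the struct and function names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* One benchmark row: throughput on master and with PR 7998. */
struct bench_row {
    const char *name;
    double master;  /* MiB/s (SHA-3) or ops/sec (Kyber) on master */
    double pr;      /* same unit, with PR 7998 applied */
};

/* Values copied from the ZCU102 logs in this comment. */
static const struct bench_row bench_rows[] = {
    { "SHA3-224",            63.345,    75.769 },
    { "SHA3-256",            60.209,    72.165 },
    { "SHA3-384",            47.304,    56.931 },
    { "SHA3-512",            33.692,    40.837 },
    { "KYBER512 key gen",  8859.861, 19863.217 },
    { "KYBER512 encap",    7508.676, 17790.202 },
    { "KYBER512 decap",    5338.993, 12440.752 },
    { "KYBER768 key gen",  5173.026, 12662.974 },
    { "KYBER768 encap",    4539.985, 11343.019 },
    { "KYBER768 decap",    3385.359,  8228.787 },
    { "KYBER1024 key gen", 3213.490,  7503.618 },
    { "KYBER1024 encap",   2911.320,  6918.733 },
    { "KYBER1024 decap",   2284.343,  5364.265 },
};

static const size_t bench_row_count =
    sizeof(bench_rows) / sizeof(bench_rows[0]);

/* Speedup of the PR build over master for one row (higher is better). */
static double speedup(const struct bench_row *row)
{
    return row->pr / row->master;
}
```

Calling `speedup(&bench_rows[i])` for each row gives roughly 1.2x for the SHA-3 variants and between about 2.2x and 2.5x for the Kyber operations on this board.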

@dgarske dgarske merged commit 2285c02 into wolfSSL:master Sep 26, 2024
137 checks passed