Thumb-2 Poly1305: implementation in assembly #7939

Merged on Sep 12, 2024 (2 commits)

SparkiDev
Contributor

Description

Implementation of Poly1305 algorithm for ARM Thumb-2.
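For readers who want a portable baseline to compare the assembly against, here is a minimal reference sketch of Poly1305 in C. It is not the code from this PR: it uses the slow, byte-limb arithmetic popularised by TweetNaCl rather than the 32-bit multiply instructions a Thumb-2 version would rely on, and `poly1305_ref` is a hypothetical name, not a wolfSSL API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* h += c over 17 base-256 limbs; every limb ends up in 0..255. */
static void add1305(uint32_t h[17], const uint32_t c[17])
{
    uint32_t u = 0;
    for (unsigned j = 0; j < 17; j++) {
        u += h[j] + c[j];
        h[j] = u & 255;
        u >>= 8;
    }
}

/* Hypothetical reference routine (not a wolfSSL API): tag = Poly1305(key, m). */
void poly1305_ref(uint8_t tag[16], const uint8_t *m, size_t n,
                  const uint8_t key[32])
{
    /* 2^136 - (2^130 - 5): adding this subtracts p modulo 2^136. */
    static const uint32_t minusp[17] = {5, 0, 0, 0, 0, 0, 0, 0, 0,
                                        0, 0, 0, 0, 0, 0, 0, 252};
    uint32_t r[17] = {0}, h[17] = {0}, c[17], g[17], x[17], u, s;
    unsigned i, j;

    assert(tag != NULL && key != NULL && (m != NULL || n == 0));

    /* r = first 16 key bytes, clamped as the algorithm requires. */
    for (j = 0; j < 16; j++) r[j] = key[j];
    r[3] &= 15;  r[7] &= 15;  r[11] &= 15;  r[15] &= 15;
    r[4] &= 252; r[8] &= 252; r[12] &= 252;

    while (n > 0) {
        /* Load up to 16 message bytes and append the 0x01 marker byte. */
        memset(c, 0, sizeof c);
        for (j = 0; j < 16 && j < n; j++) c[j] = m[j];
        c[j] = 1;
        m += j;
        n -= j;
        add1305(h, c);

        /* x = h * r; limbs past 2^136 wrap because 2^136 = 320 (mod p). */
        for (i = 0; i < 17; i++) {
            x[i] = 0;
            for (j = 0; j < 17; j++)
                x[i] += h[j] * (j <= i ? r[i - j] : 320 * r[i + 17 - j]);
        }

        /* Carry, fold everything above bit 130 back in times 5, carry again. */
        u = 0;
        for (j = 0; j < 16; j++) { u += x[j]; h[j] = u & 255; u >>= 8; }
        u += x[16];
        h[16] = u & 3;
        u = 5 * (u >> 2);
        for (j = 0; j < 16; j++) { u += h[j]; h[j] = u & 255; u >>= 8; }
        h[16] += u;
    }

    /* Final reduction without branching: keep h - p unless it went negative. */
    memcpy(g, h, sizeof g);
    add1305(h, minusp);
    s = 0u - (h[16] >> 7);
    for (j = 0; j < 17; j++) h[j] ^= s & (g[j] ^ h[j]);

    /* tag = (h + s_key) mod 2^128, where s_key is the last 16 key bytes. */
    memset(c, 0, sizeof c);
    for (j = 0; j < 16; j++) c[j] = key[j + 16];
    add1305(h, c);
    for (j = 0; j < 16; j++) tag[j] = (uint8_t)h[j];
}
```

Running it against the test vector in RFC 8439 section 2.5.2 is a quick way to check a build before comparing its output with an optimised implementation.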

Testing

Tested with QEMU, both with and without -DWOLFSSL_SP_NO_UMAAL.
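As background on why both configurations are worth testing: UMAAL is the Arm instruction that computes `RdHi:RdLo = Rn * Rm + RdLo + RdHi` in one step, and the macro name suggests it selects a code path that avoids the instruction (that reading of the macro is an assumption, not something stated in this PR). The sketch below models the instruction in C next to an equivalent built from a plain 32x32 multiply and explicit carries; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models UMAAL RdLo, RdHi, Rn, Rm: RdHi:RdLo = Rn * Rm + RdLo + RdHi.
 * The sum always fits in 64 bits: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
void umaal_model(uint32_t *lo, uint32_t *hi, uint32_t n, uint32_t m)
{
    assert(lo != 0 && hi != 0);
    uint64_t t = (uint64_t)n * m + *lo + *hi;
    *lo = (uint32_t)t;
    *hi = (uint32_t)(t >> 32);
}

/* Same result without UMAAL: one widening multiply (UMULL-style),
 * then two 32-bit additions with the carries propagated by hand. */
void umaal_fallback(uint32_t *lo, uint32_t *hi, uint32_t n, uint32_t m)
{
    assert(lo != 0 && hi != 0);
    uint64_t p = (uint64_t)n * m;
    uint32_t pl = (uint32_t)p;
    uint32_t ph = (uint32_t)(p >> 32);

    uint32_t t = pl + *lo;      /* add old low word */
    ph += (t < pl);             /* carry out of the first addition */
    uint32_t t2 = t + *hi;      /* add old high word */
    ph += (t2 < t);             /* carry out of the second addition */

    *lo = t2;
    *hi = ph;
}
```

The fallback needs several instructions where UMAAL needs one, so the two builds exercise different multiply-accumulate sequences and each deserves its own test run.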

Checklist

  • added tests
  • updated/added doxygen
  • updated appropriate READMEs
  • updated manual and documentation

@SparkiDev SparkiDev self-assigned this Sep 4, 2024
@SparkiDev
Contributor Author

This needs to be merged first: #7935

@SparkiDev SparkiDev force-pushed the thumb2_poly1305 branch 2 times, most recently from 3b77224 to eb76034 Compare September 5, 2024 11:00
@SparkiDev SparkiDev assigned wolfSSL-Bot and unassigned SparkiDev Sep 5, 2024
Implementation of ChaCha algorithm for ARM Thumb-2.
"LDR r3, [%[key], #4]\n\t"
"LDR r4, [%[key], #8]\n\t"
"LDR r5, [%[key], #12]\n\t"
"LDM r10, {r6, r7, r8, r9}\n\t"
Contributor

Getting a fault here...
[Screenshot attached: 2024-09-10, 4:27:18 PM]

Contributor

This is on a Cortex-M7 (STM32H753ZI), with a debug build.

arm-none-eabi-gcc "../Middlewares/Third_Party/wolfSSL_wolfSSL_wolfSSL/wolfssl/wolfcrypt/src/port/arm/thumb2-poly1305-asm_c.c" -mcpu=cortex-m7 -std=gnu11 -g3 -DUSE_HAL_DRIVER -DHAVE_PKCS11_STATIC -DSTM32H753xx -DDEBUG -c -I../Middlewares/Third_Party/FreeRTOS/Source/include -I"/Users/davidgarske/Projects/WolfSSL/STM/STM32H7/STM32H753/Middlewares/Third_Party/wolfPKCS11" -I../Middlewares/Third_Party/FreeRTOS/Source/portable/GCC/ARM_CM4F -I../Drivers/CMSIS/Include -I../Core/Inc -I../Drivers/STM32H7xx_HAL_Driver/Inc/Legacy -I../Drivers/CMSIS/Device/ST/STM32H7xx/Include -I../Middlewares/Third_Party/FreeRTOS/Source/CMSIS_RTOS_V2 -I../Drivers/STM32H7xx_HAL_Driver/Inc -I../Middlewares/Third_Party/wolfSSL_wolfSSL_wolfSSL/wolfssl -I../wolfSSL/. -I../wolfSSL -I../Middlewares/Third_Party/wolfSSL_wolfSSL_wolfSSL/wolfssl/ -I../Middlewares/Third_Party/wolfSSL_wolfSSH_wolfSSH/wolfssh/ -I../Middlewares/Third_Party/wolfSSL_wolfMQTT_wolfMQTT/wolfmqtt/ -I../wolfTPM -I../Middlewares/Third_Party/wolfSSL_wolfTPM_wolfTPM/wolftpm/ -O0 -ffunction-sections -fdata-sections -Wall -fomit-frame-pointer -fstack-usage -fcyclomatic-complexity -MMD -MP -MF"Middlewares/Third_Party/wolfSSL_wolfSSL_wolfSSL/wolfssl/wolfcrypt/src/port/arm/thumb2-poly1305-asm_c.d" -MT"Middlewares/Third_Party/wolfSSL_wolfSSL_wolfSSL/wolfssl/wolfcrypt/src/port/arm/thumb2-poly1305-asm_c.o" --specs=nano.specs -mfpu=fpv5-d16 -mfloat-abi=hard -mthumb -o "Middlewares/Third_Party/wolfSSL_wolfSSL_wolfSSL/wolfssl/wolfcrypt/src/port/arm/thumb2-poly1305-asm_c.o"

Contributor Author

Fixed.

@dgarske dgarske assigned SparkiDev and unassigned dgarske Sep 10, 2024
Implementation of ChaCha algorithm for ARM Thumb-2.
Implementation of Poly1305 algorithm for ARM Thumb-2.
@SparkiDev
Contributor Author

retest this please

@SparkiDev SparkiDev assigned dgarske and unassigned SparkiDev Sep 12, 2024
@dgarske dgarske self-requested a review September 12, 2024 17:43
Contributor

@dgarske dgarske left a comment

Wonderful improvement!

On an STM32H7 Cortex-M7 at 480 MHz:

CHACHA: 49% faster
CHA-POLY: 85% faster
POLY1305: 177% faster (2.77x the throughput)

Before:

CHACHA                       6 MiB took 1.000 seconds,    6.177 MiB/s
CHA-POLY                     3 MiB took 1.004 seconds,    3.404 MiB/s
POLY1305                    12 MiB took 1.000 seconds,   12.207 MiB/s

After:

CHACHA                       9 MiB took 1.000 seconds,    9.204 MiB/s
CHA-POLY                     6 MiB took 1.000 seconds,    6.299 MiB/s
POLY1305                    34 MiB took 1.000 seconds,   33.789 MiB/s
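The speedups can be recomputed directly from the Before and After throughput figures. A small sketch follows, assuming "N% faster" means `(after / before - 1) * 100`; the function names are hypothetical and the numbers are copied from the benchmark output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Percentage by which `after` exceeds `before` (both in MiB/s). */
double percent_faster(double before, double after)
{
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}

/* Prints one line per benchmark using the figures quoted in this review. */
void print_speedups(void)
{
    static const struct {
        const char *name;
        double before, after;   /* MiB/s */
    } rows[] = {
        {"CHACHA",    6.177,  9.204},
        {"CHA-POLY",  3.404,  6.299},
        {"POLY1305", 12.207, 33.789},
    };

    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        printf("%-9s %.2fx throughput, %.1f%% faster\n",
               rows[i].name,
               rows[i].after / rows[i].before,
               percent_faster(rows[i].before, rows[i].after));
    }
}
```

Keeping the ratio and the percentage side by side avoids mixing up "2.77x the throughput" with "277% faster", which differ by exactly 100 percentage points.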

@dgarske dgarske merged commit 20e2e33 into wolfSSL:master Sep 12, 2024
133 checks passed