Replies: 4 comments 1 reply
-
Likely this: llama.cpp/ggml/src/ggml-impl.h, line 174 (at commit bdf314f)
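For context on what lives there: ggml-impl.h hand-rolls vcvtnq_s32_f32 for 32-bit ARM builds, because the real instruction only exists from ARMv8 onwards. Here is a minimal sketch of the idea, with the guard condition written as an assumption rather than the exact upstream code at line 174. On an armv8l toolchain the NDK's arm_neon.h already provides the intrinsic, so the fallback has to step aside:

#include <arm_neon.h>
#include <math.h>

/* Sketch: compatibility fallback for 32-bit ARM toolchains that lack
 * vcvtnq_s32_f32. The !defined(__ARM_FEATURE_DIRECTED_ROUNDING) check is an
 * assumed way to skip the fallback when arm_neon.h already defines the
 * intrinsic (ARMv8-A running in 32-bit mode, as on armv8l). */
#if defined(__ARM_NEON) && !defined(__aarch64__) && !defined(__ARM_FEATURE_DIRECTED_ROUNDING)
inline static int32x4_t vcvtnq_s32_f32(float32x4_t v) {
    int32x4_t res;
    /* emulate round-to-nearest, lane by lane */
    res[0] = (int32_t) roundf(vgetq_lane_f32(v, 0));
    res[1] = (int32_t) roundf(vgetq_lane_f32(v, 1));
    res[2] = (int32_t) roundf(vgetq_lane_f32(v, 2));
    res[3] = (int32_t) roundf(vgetq_lane_f32(v, 3));
    return res;
}
#endif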
-
AIDA64 CPU reports on my device: Instruction set = 64-bit ARMv8-A (32-bit Mode). I guess that is related to my issue.
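That would explain it: armv8l is an ARMv8-A core running 32-bit code, so the compiler targets 32-bit ARM (__aarch64__ is not defined) while ARMv8-only intrinsics can still show up in arm_neon.h. A small probe, assuming a C toolchain for the target, shows which side of that line a build lands on (the macro values depend on the -march/-mfpu flags used):

#include <stdio.h>

int main(void) {
    /* __aarch64__ marks a 64-bit build; __ARM_ARCH gives the architecture
     * level assumed for a 32-bit ARM build (8 on armv8l with -march=armv8-a). */
#if defined(__aarch64__)
    puts("64-bit AArch64 build");
#elif defined(__ARM_ARCH)
    printf("32-bit ARM build, __ARM_ARCH = %d\n", __ARM_ARCH);
#else
    puts("not an ARM target");
#endif
    return 0;
}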
-
Well, I got it to build. I noticed that these flags were being set by the compiler: -mfp16-format=ieee -mno-unaligned-access. That happened in this Makefile conditional: ifneq (…). Using sed commands to change the Makefile, I was able to build it locally on the device. UNAME_M is "armv8l", and the flags that need to be set are -mfpu=vfpv3 -mfloat-abi=softfp. (Forgot to mention that it runs fine.)
-
I was able to vastly improve performance by compiling with these flags instead: -mfpu=neon-vfpv4 -mfloat-abi=softfp
-
Hi
First time poster.
I'm running into NEON-related redefinition errors building for armeabi-v7a with NEON on Android-31 (Cortex-A53).
Hardware: Amlogic
Supported ABIs: armeabi-v7a, armeabi
Supported 32-bit ABIs: armeabi-v7a, armeabi
Android version: 12
Kernel architecture: armv8l
cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=armeabi-v7a\ with\ NEON -DANDROID_PLATFORM=android-31 -DCMAKE_C_FLAGS=-march=armv8l+dotprod ..
Then running "make":
In file included from /Users/USER/git/llama.cpp/ggml/src/ggml.c:4:
/Users/USER/git/llama.cpp/ggml/src/ggml-impl.h:222:25: error: redefinition of 'vcvtnq_s32_f32'
222 | inline static int32x4_t vcvtnq_s32_f32(float32x4_t v) {
| ^
/Users/USER/Library/Android/sdk/ndk/27.0.12077973/toolchains/llvm/prebuilt/darwin-x86_64/lib/clang/18/include/arm_neon.h:40371:16: note: previous definition is here
40371 | __ai int32x4_t vcvtnq_s32_f32(float32x4_t __p0) {
| ^
/Users/USER/git/llama.cpp/ggml/src/ggml.c:2166:5: warning: implicit conversion increases floating-point precision: 'float' to 'ggml_float' (aka 'double') [-Wdouble-promotion]
2166 | GGML_F16_VEC_REDUCE(sumf, sum);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/USER/git/llama.cpp/ggml/src/ggml.c:1193:41: note: expanded from macro 'GGML_F16_VEC_REDUCE'
1193 | #define GGML_F16_VEC_REDUCE GGML_F32Cx4_REDUCE
| ^
/Users/USER/git/llama.cpp/ggml/src/ggml.c:1183:38: note: expanded from macro 'GGML_F32Cx4_REDUCE'
1183 | #define GGML_F32Cx4_REDUCE GGML_F32x4_REDUCE
| ^
/Users/USER/git/llama.cpp/ggml/src/ggml.c:1113:11: note: expanded from macro 'GGML_F32x4_REDUCE'
1113 | res = GGML_F32x4_REDUCE_ONE(x[0]);
| ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/USER/git/llama.cpp/ggml/src/ggml.c:1098:34: note: expanded from macro 'GGML_F32x4_REDUCE_ONE'
1098 | #define GGML_F32x4_REDUCE_ONE(x) vaddvq_f32(x)
| ^~~~~~~~~~~~~
/Users/USER/git/llama.cpp/ggml/src/ggml.c:2214:9: warning: implicit conversion increases floating-point precision: 'float' to 'ggml_float' (aka 'double') [-Wdouble-promotion]
2214 | GGML_F16_VEC_REDUCE(sumf[k], sum[k]);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/USER/git/llama.cpp/ggml/src/ggml.c:1193:41: note: expanded from macro 'GGML_F16_VEC_REDUCE'
1193 | #define GGML_F16_VEC_REDUCE GGML_F32Cx4_REDUCE
| ^
/Users/USER/git/llama.cpp/ggml/src/ggml.c:1183:38: note: expanded from macro 'GGML_F32Cx4_REDUCE'
1183 | #define GGML_F32Cx4_REDUCE GGML_F32x4_REDUCE
| ^
/Users/USER/git/llama.cpp/ggml/src/ggml.c:1113:11: note: expanded from macro 'GGML_F32x4_REDUCE'
1113 | res = GGML_F32x4_REDUCE_ONE(x[0]);
| ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/USER/git/llama.cpp/ggml/src/ggml.c:1098:34: note: expanded from macro 'GGML_F32x4_REDUCE_ONE'
1098 | #define GGML_F32x4_REDUCE_ONE(x) vaddvq_f32(x)
Appreciate any advice!