From 87b3d9054f8a615457fa5795bbab983c24ebe029 Mon Sep 17 00:00:00 2001
From: Amrita H S
Date: Tue, 7 May 2024 11:31:36 -0500
Subject: [PATCH] Fix SAXPY regression when compiled with OpenXL compiler.

SAXPY built with OpenXL regresses when compared to SAXPY built with gcc.
The OpenXL compiler doesn't know that the SAXPY inner kernel assembly is
a 64-element loop, so it treats the remainder loop as the main loop. It
vectorizes and interleaves the remainder loop into a loop that processes
48 elements per iteration. Since the remainder loop handles at most 63
elements, the 48-element loop is mostly not going to get executed, so
the 1-element scalar loop that handles what is left after it is probably
what mostly gets executed.

This can be fixed by adding the pragma "loop interleave_count(2)", which
results in an 8-element loop.

Signed-off-by: Amrita H S
---
 kernel/power/saxpy_power10.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/power/saxpy_power10.c b/kernel/power/saxpy_power10.c
index 302b2418e3..a01e1b53da 100644
--- a/kernel/power/saxpy_power10.c
+++ b/kernel/power/saxpy_power10.c
@@ -76,6 +76,9 @@ int CNAME(BLASLONG n, BLASLONG dummy0, BLASLONG dummy1, FLOAT da, FLOAT *x, BLAS
 		saxpy_kernel_64(n1, &x[i], &y[i], da);
 	i += n1;
+#if defined(__clang__)
+#pragma clang loop interleave_count(2)
+#endif
 	while(i < n)
 	{