Fix SAXPY regression when compiled with the OpenXL compiler.
SAXPY built with OpenXL regresses compared to SAXPY
built with gcc. The OpenXL compiler doesn't know that the
SAXPY inner kernel assembly is a 64-element loop, so it
treats the remainder loop as the main loop. It vectorizes
and interleaves the remainder into a loop that processes 48
elements per iteration. The remainder has at most 63
iterations, so the 48-element loop is rarely executed, and
the 1-element scalar loop that follows it probably does
most of the work.
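
The arithmetic behind this claim can be sketched in C. This is a simplified model, not the compiler's actual code generation: it assumes a remainder of r elements is handled by r / width blocked iterations followed by r % width scalar iterations. The helper names are hypothetical.

```c
/* Simplified model (an assumption, not the compiler's real cost model):
 * a remainder of r elements runs r / width blocked iterations, then
 * r % width scalar iterations. All names here are hypothetical. */

/* Number of blocked (vectorized and interleaved) iterations. */
static int block_iters(int r, int width) { return r / width; }

/* Number of 1-element scalar iterations left after the blocked loop. */
static int scalar_iters(int r, int width) { return r % width; }

/* Scalar iterations summed over every possible remainder 0..63. */
static int total_scalar_iters(int width) {
    int total = 0;
    for (int r = 0; r < 64; r++) {
        total += scalar_iters(r, width);
    }
    return total;
}
```

Under this model, remainders below 48 never enter the 48-element loop at all, and `total_scalar_iters(48)` is several times larger than the same total for a narrower block.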

This can be fixed by adding the pragma clang loop interleave_count(2)
to the remainder loop, which results in an 8-element loop.
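
A minimal sketch of how the pragma attaches to a scalar remainder loop is given here. The function name `saxpy_tail` and the loop body are assumptions for illustration and are not copied from the OpenBLAS source; only the guarded pragma matches the change in this commit.

```c
#include <stddef.h>

/* Hypothetical stand-in for the SAXPY remainder loop: y[i] += da * x[i]
 * for i in [start, n). The body is an assumed illustration. */
static void saxpy_tail(size_t start, size_t n, float da,
                       const float *x, float *y) {
    size_t i = start;
/* The guard keeps compilers other than clang-based ones (such as gcc)
 * from seeing a pragma they do not recognize. */
#if defined(__clang__)
#pragma clang loop interleave_count(2)
#endif
    while (i < n) {
        y[i] += da * x[i];
        i++;
    }
}
```

The pragma is only a hint to the loop optimizer. It changes how the loop is unrolled, not what the loop computes, so the results are the same with or without it.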

Signed-off-by: Amrita H S <[email protected]>
amritahs-ibm committed May 13, 2024
1 parent f0560f9 commit 87b3d90
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions kernel/power/saxpy_power10.c
@@ -76,6 +76,9 @@ int CNAME(BLASLONG n, BLASLONG dummy0, BLASLONG dummy1, FLOAT da, FLOAT *x, BLAS
saxpy_kernel_64(n1, &x[i], &y[i], da);

i += n1;
#if defined(__clang__)
#pragma clang loop interleave_count(2)
#endif
while(i < n)
{
