New kernel for reciprocal #643

Ka-zam · 2023-10-13T10:04:53Z

New kernel that calculates the reciprocal:

volk_profile -R rec -i 10000

RUN_VOLK_TESTS: volk_32f_reciprocal_32f(131071,10000)
generic completed in 119.092 ms
a_avx2_fma completed in 94.7562 ms
a_avx completed in 97.1054 ms
a_avx_div completed in 118.053 ms
u_avx2_fma completed in 97.3939 ms
u_avx completed in 98.4622 ms
u_avx_div completed in 129.018 ms
Best aligned arch: a_avx2_fma
Best unaligned arch: u_avx2_fma

Signed-off-by: Magnus Lundmark <[email protected]>

jdemel

This looks like an interesting kernel. I have a few comments.
My biggest worry would be to see this kernel fail randomly with some 1.0f / 1e-xx value.

include/volk/volk_avx2_fma_intrinsics.h

kernels/volk/volk_32f_reciprocal_32f.h

Signed-off-by: Magnus Lundmark <[email protected]>

Ka-zam · 2023-10-14T12:40:25Z

This looks like an interesting kernel. I have a few comments. My biggest worry would be to see this kernel fail randomly with some 1.0f / 1e-xx value.

I ran the code for some special values:

	x	x		generic		div		rcp		avx2_fma		avx
-0.000000e+00	 -0x0p+0	    -inf	    -inf	    -inf	    -nan	    -nan
+0.000000e+00	 +0x0p+0	    +inf	    +inf	    +inf	    -nan	    -nan
+3.673420e-40	+0x1p-131	    +inf	    +inf	    +inf	    -inf	    -inf
+5.877472e-39	+0x1p-127	+1.701412e+38	+1.701412e+38	    +inf	    -inf	    -inf
+1.175494e-38	+0x1.fffffp-127	+8.507063e+37	+8.507063e+37	    +inf	    -inf	    -inf
+1.175494e-38	+0x1p-126	+8.507059e+37	+8.507059e+37	+8.504982e+37	+8.507059e+37	+8.507059e+37
+4.253530e+37	+0x1p+125	+2.350989e-38	+2.350989e-38	+2.350415e-38	+2.350989e-38	+2.350989e-38
    +inf	    +inf	+0.000000e+00	+0.000000e+00	+0.000000e+00	    -nan	    -nan

div is the _mm256_div_ps instruction
rcp is the _mm256_rcp_ps instruction

So, the proposed implementation fails for special values more or less spectactularly. However, it's 20% faster for valid inputs and more accurate than _mm256_rcp_ps.

_mm256_rcp_ps is 10% faster still but not as accurate in the valid domain.

Ka-zam · 2023-10-17T12:00:23Z

I did some more experiments and tried to add checks for nan, 0.0 and inf. Turns out it's not worth it and then it's better to just use div_ps()

So, 30% faster but

returns nan instead of inf for 0.0 input
returns nan instead of 0.0 for inf input

jdemel · 2023-11-04T08:34:13Z

How do we prevent that this kernel is constantly causing random CI failures? Values close to 0.0f are likely in our tests. We'd need to make sure that we don't see flaky kernel issues raised.

Also, the inf vs nan issue needs to be discussed. If I'd use this kernel I'd expect that 1.0f / inf = 0.0f. This is not consistent across all implementations. This would probably cause issues in the future.
We had a similar issue in #622 and eventually had to remove some kernel implementations #658 . I'm worried this kernel will see the same fate. We should find a solution for this issue before we merge it.

Ka-zam · 2023-11-21T12:51:41Z

Values close to 0.f are no problem, it's only a very slight problem for subnormals where there's decreased accuracy.

inf vs nan is more of a problem. Perhaps that can be looked at but the problem is we can't do too much work before the div_ps( 1.f / x ) instruction is faster.

I will investigate.

Ka-zam added 3 commits October 13, 2023 11:56

new kernel for reciprocal

4be7e43

Signed-off-by: Magnus Lundmark <[email protected]>

typo

295ee58

Signed-off-by: Magnus Lundmark <[email protected]>

renamed for consistency

fcc0576

Signed-off-by: Magnus Lundmark <[email protected]>

jdemel requested changes Oct 14, 2023

View reviewed changes

minor renames and fixes

6b6adca

Signed-off-by: Magnus Lundmark <[email protected]>

Ka-zam closed this Feb 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New kernel for reciprocal #643

New kernel for reciprocal #643

Ka-zam commented Oct 13, 2023

jdemel left a comment

Ka-zam commented Oct 14, 2023

Ka-zam commented Oct 17, 2023

jdemel commented Nov 4, 2023

Ka-zam commented Nov 21, 2023

New kernel for reciprocal #643

New kernel for reciprocal #643

Conversation

Ka-zam commented Oct 13, 2023

jdemel left a comment

Choose a reason for hiding this comment

Ka-zam commented Oct 14, 2023

Ka-zam commented Oct 17, 2023

jdemel commented Nov 4, 2023

Ka-zam commented Nov 21, 2023