remove buggy and slow neonv8 kernel #680

Merged
merged 2 commits into from
Nov 4, 2023

marcusmueller
Member

As argilo noticed when fixing the integer generation in #677, that kernel was buggy. It seems compilers are better at building byte-swapping code than people writing SIMD intrinsics, so falling back on generic doesn't hurt.
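For reference, a generic byte swap of the kind referred to above can be sketched in portable C. This is only an illustration, not VOLK's generic kernel; the names `byteswap_u64` and `byteswap_u64_array` are made up for this sketch. Optimizing compilers commonly recognize a shift-and-mask pattern like this and may emit a dedicated byte-reverse instruction for it, although that depends on the compiler and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 64-bit value using shifts and masks. */
static inline uint64_t byteswap_u64(uint64_t x)
{
    return ((x & 0x00000000000000FFull) << 56) |
           ((x & 0x000000000000FF00ull) << 40) |
           ((x & 0x0000000000FF0000ull) << 24) |
           ((x & 0x00000000FF000000ull) << 8) |
           ((x & 0x000000FF00000000ull) >> 8) |
           ((x & 0x0000FF0000000000ull) >> 24) |
           ((x & 0x00FF000000000000ull) >> 40) |
           ((x & 0xFF00000000000000ull) >> 56);
}

/* Byte-swap every element of an array in place (hypothetical helper). */
void byteswap_u64_array(uint64_t* values, size_t num_points)
{
    for (size_t i = 0; i < num_points; i++) {
        values[i] = byteswap_u64(values[i]);
    }
}
```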

@marcusmueller marcusmueller force-pushed the 64u_byteswap_remove_neonv8 branch from e952f32 to 34f4575 Compare October 23, 2023 17:18
@argilo
Member

argilo commented Oct 23, 2023

Looks like this kernel has been slower than generic since it was added in #196. Good riddance.

@argilo
Member

argilo commented Oct 23, 2023

For the record, the bug in the implementation was that the load (vld2q_u8) and store (vst2q_u8) it used are the interleaving variants:

VLD2 loads 2 vectors from memory. It performs a 2-way de-interleave from memory to the vectors.
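To make the de-interleave concrete, here is a plain-C model of what a 2-way de-interleaving load of 32 bytes and the matching interleaving store do to the byte layout. It uses no NEON intrinsics, and the function names are hypothetical. The point is that after such a load neither register holds whole 64-bit words: bytes 0, 2, 4, 6 of the first word sit in the first register and bytes 1, 3, 5, 7 in the second, so a shuffle written for contiguous 64-bit words would act on the wrong bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of a 2-way de-interleaving load: bytes at even offsets go to the
 * first register, bytes at odd offsets go to the second. */
void deinterleave2_load(const uint8_t src[32], uint8_t even[16], uint8_t odd[16])
{
    for (size_t i = 0; i < 16; i++) {
        even[i] = src[2 * i];
        odd[i] = src[2 * i + 1];
    }
}

/* Model of the matching 2-way interleaving store (the inverse operation). */
void interleave2_store(uint8_t dst[32], const uint8_t even[16], const uint8_t odd[16])
{
    for (size_t i = 0; i < 16; i++) {
        dst[2 * i] = even[i];
        dst[2 * i + 1] = odd[i];
    }
}
```

A load followed directly by a store round-trips the data unchanged, so the problem only shows up once the registers are permuted in between.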

@marcusmueller marcusmueller force-pushed the 64u_byteswap_remove_neonv8 branch from 49d8822 to d49a8cd Compare October 23, 2023 18:01
@marcusmueller
Member Author

This is a bit strange: it seems I need to figure out a way to skip the #ifdef LV_HAVE_NEON block when LV_HAVE_NEONV8 is defined, but I must not have an #if… LV_HAVE_NEONV8 line, otherwise the build system assumes there's a neonv8 kernel.

@argilo
Member

argilo commented Oct 23, 2023

Why do you need to avoid building the neon kernel if neonv8 is defined?

I didn't think that made sense in #668 so I removed the nested #ifdefs, which weren't working to begin with.

@marcusmueller
Member Author

Because on arm64 machines, the neon kernel malfunctions:
https://github.com/gnuradio/volk/actions/runs/6617156936/job/17972909894?pr=680#step:3:2492

@argilo
Member

argilo commented Oct 23, 2023

Ah. That seems like a bug that should be fixed.

@marcusmueller
Member Author

https://github.com/gnuradio/volk/actions/runs/6617156936/job/17972917000?pr=680#step:3:2453 Seeing that on our armv7 machine it doesn't run at all, do we test the Neon non-v8 kernels on anything, @jdemel ?

@argilo
Member

argilo commented Oct 23, 2023

In #668 I found that nested ifdefs prevent the neon kernel from running even on 32-bit ARM, so it's possible it's broken on 32-bit ARM too.

@argilo
Member

argilo commented Oct 23, 2023

Yep, it's broken. Change inputPtr += 4 to inputPtr += 8 and the kernel works fine.
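The kernel source is not quoted in this thread, so the following is only a hypothetical plain-C model of this kind of stride bug, not the removed code. It assumes each iteration byte-swaps four 64-bit values (32 bytes) and then advances a `uint32_t` pointer; the function names and the `stride` parameter are invented for the sketch. Under that assumption, advancing by 4 elements moves only 16 bytes, while advancing by 8 moves past the whole block.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the 8 bytes of one 64-bit value in place. */
static void reverse8(uint8_t* b)
{
    for (size_t i = 0; i < 4; i++) {
        uint8_t tmp = b[i];
        b[i] = b[7 - i];
        b[7 - i] = tmp;
    }
}

/* Hypothetical model, not the VOLK kernel: each iteration byte-swaps four
 * 64-bit values (32 bytes), then advances a uint32_t pointer by `stride`
 * elements. Only stride == 8 steps past the whole block. */
void swap_in_blocks(uint64_t* data, size_t num_points, size_t stride)
{
    uint32_t* ptr = (uint32_t*)data;
    for (size_t iter = 0; iter < num_points / 4; iter++) {
        uint8_t* bytes = (uint8_t*)ptr;
        for (size_t k = 0; k < 4; k++) {
            reverse8(bytes + 8 * k);
        }
        ptr += stride; /* 8 elements = 32 bytes; 4 elements = only 16 bytes */
    }
}
```

In this model, a stride of 4 makes consecutive blocks overlap by 16 bytes, so values in the overlap are reversed twice and end up unchanged, and the values at the end of the buffer are never reached.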

@argilo
Member

argilo commented Oct 23, 2023

Seeing that on our armv7 machine it doesn't run at all, do we test the Neon non-v8 kernels on anything, @jdemel ?

I too noticed that. It seems on the armv7 build, NEON is not detected at all. Perhaps a bug in platform detection?

@argilo
Member

argilo commented Oct 23, 2023

By the way, neon is slower than generic on my Raspberry Pi, 2345.84 ms vs. 1977.6 ms.

@marcusmueller
Member Author

Since that Pi and maybe an E310 would be the main target for that kernel: should we maybe just eradicate both?

@argilo
Member

argilo commented Oct 23, 2023

Yeah, I'd say get rid of it.

As far as I can tell, it's been broken since it was created in 2014 (158a6b2). It only correctly swaps the first two integers in the input vector, so I can't imagine it's ever done anyone any good.

This was hidden for 9 years due to being shadowed when NEONV8 was available

Signed-off-by: Marcus Müller <[email protected]>
@jdemel
Contributor

jdemel commented Nov 4, 2023

https://github.com/gnuradio/volk/actions/runs/6617156936/job/17972917000?pr=680#step:3:2453 Seeing that on our armv7 machine it doesn't run at all, do we test the Neon non-v8 kernels on anything, @jdemel ?

Unfortunately, no. If you happen to find some CI infrastructure for this, please add it. I didn't. We used to test these when TravisCI was still working.

Contributor

@jdemel jdemel left a comment

LGTM. The discussion concluded that these kernels were always broken.

@jdemel
Contributor

jdemel commented Nov 4, 2023

I'll close #606 after this PR is merged because this PR supersedes it. Thanks for working through this issue.

@jdemel jdemel merged commit fd20770 into gnuradio:main Nov 4, 2023
32 checks passed
Alesha72003 pushed a commit to Alesha72003/volk that referenced this pull request May 15, 2024
…e_neonv8

remove buggy and slow neonv8 kernel