fewer intrinsics in _mm_movemask_ps #653

shinjiogaki · 2024-11-08T10:36:34Z

No description provided.

aqrit · 2024-11-10T14:05:19Z

An alternative is to extract the bits using umulh:

uint64_t x = vget_lane_s64(vreinterpret_s64_s16(vqmovn_s32(vreinterpretq_s32_m128(a))),0);
const uint64_t mask = 0x8000800080008000ULL;
const uint64_t magic = 0x0002000400080010ULL;
return (uint8_t)(((__uint128_t)(x & mask) * (__uint128_t)magic) >> 64);

I don't know which is faster.
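
For readers who want to check the multiply trick without an Arm toolchain, here is a minimal scalar sketch of the same idea. It is an illustration only, not code from sse2neon: the helper names (saturate_s32_to_s16, pack_lanes, movemask4) are hypothetical, the NEON narrowing step is emulated in plain C, and it assumes a compiler that provides __uint128_t (GCC or Clang). The mask keeps the four 16-bit sign bits (bits 15, 31, 47, 63), and the magic constant has bits 4, 19, 34, 49 set, so the pairs 15+49, 31+34, 47+19, 63+4 land on bits 64..67 of the 128-bit product. No other partial product falls on or carries into those positions, so the low four bits of the high half are the mask.

```c
#include <stdint.h>

/* Emulates one lane of vqmovn_s32: saturating narrow from 32 to 16 bits.
   Saturation preserves the sign, which is all the mask needs. */
static uint16_t saturate_s32_to_s16(int32_t v)
{
    if (v > INT16_MAX) v = INT16_MAX;
    if (v < INT16_MIN) v = INT16_MIN;
    return (uint16_t)(int16_t)v;
}

/* Packs four 32-bit lanes into one 64-bit word, lane 0 in the low 16 bits
   (the lane order a little-endian AArch64 target would give). */
static uint64_t pack_lanes(const int32_t lanes[4])
{
    uint64_t x = 0;
    for (int i = 0; i < 4; i++)
        x |= (uint64_t)saturate_s32_to_s16(lanes[i]) << (16 * i);
    return x;
}

/* Gathers the sign bits at positions 15, 31, 47, 63 into bits 0..3
   by taking the high 64 bits of a 64x64 -> 128-bit multiply. */
static uint8_t movemask4(uint64_t x)
{
    const uint64_t mask  = 0x8000800080008000ULL;
    const uint64_t magic = 0x0002000400080010ULL; /* bits 4, 19, 34, 49 */
    return (uint8_t)(((__uint128_t)(x & mask) * magic) >> 64);
}
```

Calling movemask4(pack_lanes(lanes)) with lanes holding the raw 32-bit patterns of four floats should give the same 4-bit result that _mm_movemask_ps produces for them. Whether the multiply-based form is faster than the alternative on a given core would still need to be measured.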

howjmay · 2024-11-12T22:07:54Z

@shinjiogaki are you interested in giving this a try to see whether it brings a performance improvement?

fewer intrinsics in _mm_movemask_ps

04c9789

shinjiogaki requested review from jserv and howjmay as code owners November 8, 2024 10:36

shinjiogaki closed this Nov 8, 2024

shinjiogaki deleted the patch-1 branch November 8, 2024 11:25
