v0.8.0

@mr-c mr-c released this 02 May 10:06
· 80 commits to master since this release

SIMDe 0.8.0

Summary

  • A complete set of implementations for all NEON intrinsics has been finished, up from 56.46% in the previous release! (@yyctw @wewe5215)
  • SIMDe PRs are tested using Fedora Rawhide (@junaruga)

For the entire project: 656 files changed, 202635 insertions(+), 1724 deletions(-)

For just the simde folder: 295 files changed, 47053 insertions(+), 896 deletions(-)

X86

There are a total of 6876 SIMD functions on x86, 2930 (43.17%) of which have been implemented in SIMDe so far. Specifically for AVX-512, of the 5160 functions currently in AVX-512, SIMDe implements 1510 (29.26%).

Note: Intel has removed the intrinsics that were unique to Intel Xeon Phi (ER, PF, 4MAPS, and 4VNNIW) from their intrinsic list. SIMDe will retain those few implementations we already had, but this changes how our completeness statistics are calculated.

Newly added function families

  • AES: 5 of 6 (83.33%)

Newly added AVX512 function families

Additions to existing families

  • AVX512BW: 7 additional, 337 total of 790 (42.66%)
  • AVX512DQ: 5 additional, 112 total of 376 (29.79%)
  • AVX512F: 48 additional, 1087 total of 2812 (38.66%)
  • AVX512_FP16: 15 additional, 17 total of 1105 (1.54%)

Neon

SIMDe currently implements 6670 out of 6670 (100.00%) NEON functions, up from 56.46% in the previous release!

Newly added families

  • abal
  • abal_high
  • abd
  • abdh
  • abdl_high
  • addhn_high
  • aes
  • bfdot
  • bfdot_lane
  • cadd_rot
  • cale
  • calt
  • cmla_lane
  • cmla_rot_lane
  • copy_lane
  • cvt_high
  • cvt_n
  • cvta
  • cvtn
  • cvtp
  • cvtx
  • cvtx_high
  • div
  • dupb_lane
  • duph_lane
  • eor3
  • fmlal
  • fms
  • fms_lane
  • fms_n
  • ld2_dup
  • ld2_lane
  • ld3_dup
  • ld3_lane
  • ld4_dup
  • maxnmv
  • minnmv
  • mla_lane
  • mlal_high_lane
  • mls_lane
  • mlsl_high_lane
  • mmla
  • mull_high_lane
  • mull_high_n
  • mulx
  • mulx_lane
  • pmaxnm
  • pminnm
  • qdmlal
  • qdmlal_high
  • qdmlal_high_lane
  • qdmlal_high_n
  • qdmlal_lane
  • qdmlal_n
  • qdmlsl
  • qdmlsl_high
  • qdmlsl_high_lane
  • qdmlsl_high_n
  • qdmlsl_lane
  • qdmlsl_n
  • qdmlslh
  • qdmlslh_lane
  • qdmulhh
  • qdmulhh_lane
  • qdmull_high
  • qdmull_high_lane
  • qdmull_high_n
  • qdmull_lane
  • qdmull_n
  • qdmullh_lane
  • qmovun_high
  • qrdmlah
  • qrdmlah_lane
  • qrdmlahh
  • qrdmlahh_lane
  • qrdmlsh
  • qrdmlsh_lane
  • qrdmlshh
  • qrdmlshh_lane
  • qrdmulhh_lane
  • qrshl
  • qrshlh
  • qrshrn_high_n
  • qrshrnh_n
  • qrshrun_high_n
  • qrshrunh_n
  • qshl_n
  • qshlh_n
  • qshluh_n
  • qshrn_high_n
  • qshrnh_n
  • qshrun_high_n
  • qshrunh_n
  • raddhn
  • raddhn_high
  • rax
  • recp
  • rnd32x
  • rnd32z
  • rnd64x
  • rnd64z
  • rnda
  • rndx
  • rshrn_high_n
  • rsubhn
  • rsubhn_high
  • set_lane
  • sha1
  • sha1h
  • sha256
  • sha512
  • shll_high_n
  • shrn_high_n
  • sli_n
  • sm3
  • sm4
  • sqrt
  • st1_x2
  • st1_x3
  • st1_x4
  • st1q_x2
  • st1q_x3
  • st1q_x4
  • subhn_high
  • sudot_lane
  • usdot
  • usdot_lane

Finally complete families

  • cvtn
  • mla_lane

Details

  • simde-f16: improve _Float16 usage; better INFHF/NANHF defs 8910057 @mr-c
  • simde_float16: prefer __fp16 if available aba26f6 @mr-c

Implementation of Arm intrinsics

NEON

  • cvtn: vcvtnq_{s32_f32,s64_f64}: add SSE & AVX512 optimized implementations e134cc7 @mr-c
  • cvtn: vcvtnq_u32_f32 is a V8 function 8432c70 @mr-c
  • min: Remove non-working MMX specialization from simde_vmin_s16 6858b92 @M-HT
  • shll: Extend constant range in simde_vshll_n_XXX intrinsics (#1064) beb1c61 @M-HT
  • various: Implement some f16XN types and f16 related intrinsics. (#1071) aae2245 @yyctw
  • qtbl/qtbx polyfills for A32V7 a2fef9e @easyaspi314
  • arm: use SIMDE_ARCH_ARM_FMA 7198d6d @mr-c
  • arm neon: Complex operations from Armv8.3-a (#1077) d08d67c @wewe5215
  • more fp16 using intrinsics supported by architecture v7 (skip version) (#1081) 5e7c4d4 @yyctw
  • st1{,q}_*_x{2,3,4}: initial implementation (#1082) 879d1a0 @yyctw
  • part 1 of implement all intrinsics supported by architecture A64 (#1090) 2eedece @yyctw
  • Add AES instructions. 23adcd2 805ccd2 @yyctw
  • Modified simde_float16 to simde_float16_t (#1100) 8a05dc6 @yyctw
  • implement all intrinsics supported by architecture A64-remaining part (#1093) 018ba24 @yyctw
  • add enable vmlaq_laneq_f32 and vcvtq_n_f64_u64 c7d314b @yyctw
  • implement all bf16-related intrinsics (#1110) c59db7c @yyctw
  • arm/neon abs: negating INT_MIN is undefined behavior in C/C++ c200c16 @mr-c
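The last item refers to a general C/C++ pitfall: `-x` overflows when `x` is the most negative value of a signed type. As a minimal sketch of one common way to avoid it (not necessarily what commit c200c16 does), the negation can be done in unsigned arithmetic, where wrap-around is well defined. The helper name `wrapping_abs_s32` is hypothetical and is not part of SIMDe.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not SIMDe API): absolute value of one 32-bit lane
 * with wrap-around semantics, so INT32_MIN maps to INT32_MIN instead of
 * triggering signed-overflow undefined behavior. */
static int32_t wrapping_abs_s32(int32_t v) {
  uint32_t u;
  int32_t r;
  memcpy(&u, &v, sizeof u);        /* reinterpret the bits as unsigned */
  if (v < 0) {
    u = (uint32_t) (0u - u);       /* unsigned negation wraps modulo 2^32 */
  }
  memcpy(&r, &u, sizeof r);        /* reinterpret back without a lossy cast */
  return r;
}
```

Ordinary inputs behave like `abs`, while `INT32_MIN` is returned unchanged, which matches the non-saturating behavior of a wrapping absolute-value instruction.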

SVE Intrinsics

  • Improve performance of simde_mm512_add_epi32 (#1126) 6cde31c @AymenQ

WASM intrinsics

  • simd128: fix altivec_p7 version of wasm_f64x2_pmin 96d6e53 @mr-c
  • simd128: add missing unsigned functions ea5e283 @mr-c
  • simd128 f{32x4,64x2}_min: add workaround for a gcc<6 issue d5d6d10 @mr-c
  • detect support for Relaxed SIMD mode 2e66dd4 @mr-c
  • simd128/relaxed: begin MIPS implementations db8ad84 @mr-c
  • relaxed: add f{32x4,64x2}_relaxed_{min,max} 9d1a34e @mr-c
  • relaxed: updated names; reordered FMA operations 8cc8874 @mr-c

x86 intrinsics

  • sse{,2,4.1}, avx{,2} *_stream_{,load}: use __builtin_nontemporal_{load,store} 6ce6030 @mr-c
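`__builtin_nontemporal_load` and `__builtin_nontemporal_store` are Clang builtins, so code that uses them needs a fallback for compilers that lack them. The following is a rough sketch of that pattern, assuming a plain load/store is an acceptable fallback. The `DEMO_` macro and both function names are invented for illustration and do not mirror SIMDe's actual detection macros.

```c
#include <stdint.h>

/* Detect the builtins only where the compiler can answer the question. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_nontemporal_store) && \
      __has_builtin(__builtin_nontemporal_load)
#    define DEMO_HAVE_NONTEMPORAL 1
#  endif
#endif

/* Hypothetical wrapper: store with a non-temporal hint when available. */
static void demo_stream_store_i32(int32_t *dst, int32_t value) {
#if defined(DEMO_HAVE_NONTEMPORAL)
  __builtin_nontemporal_store(value, dst);
#else
  *dst = value;  /* fallback: ordinary store, same observable result */
#endif
}

/* Hypothetical wrapper: load with a non-temporal hint when available. */
static int32_t demo_stream_load_i32(int32_t *src) {
#if defined(DEMO_HAVE_NONTEMPORAL)
  return __builtin_nontemporal_load(src);
#else
  return *src;   /* fallback: ordinary load */
#endif
}
```

The hint only affects caching behavior, so both paths return the same values.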

SSE*

  • sse: Fix issues related to MXCSR register (#1060) 653aba8 @M-HT
  • sse: implement _mm_movelh_ps for Arm64 514564e @mr-c
  • sse _mm_movemask_ps: remove unused code fba97e4 @mr-c
  • sse2 _mm_pause: more archs, add a basic test 692a2e8 @mr-c
  • sse4.1: use logical OR instead of bitwise OR in neon impl of _mm_testnzc_si128 edd4678 @mr-c
  • sse4.1 _mm_testz_si128: fix backwards short circuit logic f132275 @mr-c
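The last two fixes both concern how partial results are combined into the single flag that the test intrinsics return. As a reference point, here is a scalar model of the documented semantics, assuming a 128-bit value is represented as two 64-bit halves. The `vec128` type and the `ref_` functions are illustrative only and are not SIMDe code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit integer vector. */
typedef struct { uint64_t lo, hi; } vec128;

/* ZF: 1 when (a AND b) is all zeros. */
static int ref_testz(vec128 a, vec128 b) {
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* CF: 1 when ((NOT a) AND b) is all zeros. */
static int ref_testc(vec128 a, vec128 b) {
  return ((~a.lo & b.lo) | (~a.hi & b.hi)) == 0;
}

/* testnzc: 1 only when both ZF and CF are 0. */
static int ref_testnzc(vec128 a, vec128 b) {
  return !ref_testz(a, b) && !ref_testc(a, b);
}
```

In this model the halves are merged with a bitwise OR before the single comparison against zero, and the two flags are then combined with logical operators.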

AVX

  • run test from #926 ce9708c @mr-c
  • simde_mm256_shuffle_pd fix for natural vector size < 128 1594d7c @mr-c

AVX2

  • correction of simde_mm256_sign_epi{8,16,32} (#1123) c376610 @Proudsalsa
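The notes do not say what was wrong before the correction. For orientation, the per-lane behavior of the sign intrinsics can be modeled in scalar C as below. The function name `ref_sign_i8` is hypothetical, and only the 8-bit lane width is shown.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of one 8-bit lane of a "sign" operation:
 *   b  < 0  -> negated a (wrapping, so -128 stays -128)
 *   b == 0  -> 0
 *   b  > 0  -> a unchanged */
static int8_t ref_sign_i8(int8_t a, int8_t b) {
  uint8_t u;
  int8_t r;
  if (b == 0) return 0;
  if (b > 0) return a;
  u = (uint8_t) (0u - (uint8_t) a);  /* negate in unsigned arithmetic */
  memcpy(&r, &u, sizeof r);          /* reinterpret as signed */
  return r;
}
```

A model like this is handy as a test oracle: each lane of a vector result can be compared against it.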

AVX512

  • fpclass: naive implementation 353bf5f @mr-c
  • loadu: fix native detection 305f434 @mr-c
  • set: add simde_x_mm512_set_m256{,d} 67e0c50 @mr-c
  • gather: add MSVC native fallbacks 7b7e3f6 @mr-c
  • AVX512FP16 / m512h initial support e97691c @mr-c
  • fix many native aliases 75014b9 @mr-c

CLMUL

  • fix natives, some require VPCLMULQDQ f819c52 @mr-c

SVML

  • enable SIMDE_X86_SVML_NATIVE for MSVC 2019+ 593af95 @mr-c

AES

  • aes: initial implementation of most aes instructions (#1072) 8632391 @Vineg

MIPS MSA intrinsics

  • msa neon impl: float64x2_t is not avail in A32V7 ae4c4ab @mr-c

Arch support

x86(-64)

  • fix SIMDE_ARCH_X86_SSE4_2 define 5e4b308 @cbielow

arm64

  • x86 aes: add neon implementation using the crypto extension fb3554f @mr-c

Altivec

  • neon/st1: disable last remaining AltiVec implementation 0521245 @mr-c

Power

  • sse2,wasm simd128: skip SIMDE_CONVERT_VECTOR_ implementations on PowerPC 4de999a @mr-c
  • wasm simd128: more powerpc fixes 7cb5691 @mr-c

Compiler Specific

GCC

  • GCC AVX512F: SIMDE_BUG_GCC_95399 was fixed in GCC 9.5, 10.4, 11.4, 12+ 3fa89c5 @mr-c
  • GCC x86/x64: SIMDE_BUG_GCC_98521 was fixed in 10.3 edde42e @mr-c
  • GCC x86: SIMDE_BUG_GCC_94482 was fixed in 8.5, 9.4, 10+ 43d86a3 @mr-c
  • Add workaround for GCC bug 111609 fdafd8e @M-HT
  • arm neon ld2: silence warnings at -O3 on gcc risc-v 8f56628 @mr-c
  • avx512 abs: refine GCC compiler checks for _mm512{,_mask}_abs_pd (#1118) 5405bbd @thomas-schlichter
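Several entries above record fixes that were backported to multiple GCC release branches, so a single "version at least X" comparison is not enough to decide whether a workaround is still needed. The sketch below encodes the "fixed in 8.5, 9.4, 10+" case from the SIMDE_BUG_GCC_94482 entry. It assumes every release older than the listed fixes is affected, which the notes do not state, and the function name is hypothetical.

```c
/* Hypothetical check: does a given GCC major.minor still need the
 * workaround for a bug fixed in 8.5, 9.4 and 10+?
 * Assumption: all releases before those fixes are affected. */
static int demo_needs_workaround_94482(int major, int minor) {
  if (major >= 10) return 0;          /* fixed from the start of the 10 series */
  if (major == 9)  return minor < 4;  /* 9 branch: fixed in 9.4 */
  if (major == 8)  return minor < 5;  /* 8 branch: fixed in 8.5 */
  return 1;                           /* older branches: assumed affected */
}
```

In real headers the same per-branch logic would normally be written with preprocessor version checks rather than a runtime function.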

Clang

  • clang powerpc: vec_bperm bug was fixed in clang-14 6feb28a @mr-c
  • clmul: aarch64 clang has difficulties with poly64x1_t 1e1bd76 @mr-c
  • aarch64: optimization bug 45541 was fixed in clang-15 7ca5712 @mr-c
  • A32V7: Don't trust clang for load multiple on A32V7 927f141 @easyaspi314
  • wasm: SIMDE_BUG_CLANG_60655 is fixed in the upcoming 17.0 release 25cebbe @mr-c
  • simde-detect-clang.h: add clang 17 detection 923f8ac 684baa1 50d98c1 @Coeur

ClangCL

  • fp16: don't use _Float16 on ClangCL if not supported 8a6b8c5 @mr-c
  • svml: don't enable SIMDE_X86_SVML_NATIVE for ClangCL c877fe5 @mr-c

Emscripten

  • emcc tot: set -Wno-switch-default fdbd6b2 @mr-c

MSVC

  • avx512 types: avoid using native AVX512 types on MSVC unless required 029d749 @mr-c
  • arm neon: {u,s}addh apply arm64 windows workaround only on msvc<1938 (#1121) 14311d6 @Changqing-JING

Testing with Docker/Podman & CI

  • Update recipe for qemu git mode 54b8c8f @mr-c
  • riscv64 gcc: typo fix for endian little 7423339 @mr-c
  • add new cross sets; Ubuntu Focal and Bionic support b0b9710 @mr-c
  • native tests: also AVX512, MSA; fix WASM SIMD128 path bdd075b @mr-c
  • test-flags: support the x86 microarchitecture levels 518b777 @mr-c
  • ignore common build paths b3689ea @mr-c

Appveyor

  • preserve test log 9815161 @mr-c
  • save meson log on error 5207d83 @mr-c

Circle CI

  • circleci: clang, set -Wno-unsafe-buffer-usage 24c93c2 @mr-c

GitHub Actions

  • upgrade qemu ; fixes remaining ppc64el fails! e91944b @mr-c
  • tidy matrix ordering for easier to read job names b52ac36 @mr-c
  • add clang-qemu: aarch64, riscv64, ppc64el, s390x 8a6dbab @mr-c
  • test armv7 with gcc-12 via qemu 8cd8de1 @mr-c
  • add armel to gcc and clang qemu matrices 4ca849b @mr-c
  • add armv7 to clang-qemu matrix a144aca @mr-c
  • use GCC 12 for adv x64 native testing + AVX512FP16 f156b41 @mr-c
  • expand mac-os/xcode testing matrix 8055410 @mr-c
  • fix macos-13+brew failure c6149de @mr-c
  • test with clang-16 e25ced8 @mr-c
  • add gcc-13 43ac8fc @mr-c
  • simplify x86 ISA matrix 6b7c1b3 @mr-c
  • run on commits to the primary branch to prime the cache 6055bfb @mr-c
  • build(deps): bump actions/checkout from 3 to 4 149d0af @dependabot[bot]
  • build(deps): bump github/codeql-action from 2 to 3 (#1138) 5026e66 @dependabot[bot]
  • build(deps): bump actions/setup-python from 4 to 5 (#1137) 2768da8 @dependabot[bot]
  • build(deps): bump actions/setup-dotnet from 3 to 4 (#1135) ed382cb @dependabot[bot]
  • build(deps): bump ad-m/github-push-action from 0.6.0 to 0.8.0 (#1134) 193be1b @dependabot[bot]
  • add new repo for clang-16 7ebd267 @mr-c
  • add clang-17 (#1127) d31de99 @mr-c
  • test mips64el using qemu on gcc12/clang16 934d86d @mr-c
  • disable {clang,gcc}-qemu mips64el; needs newer Ubuntu version 471a342 @mr-c
  • test WASM Relaxed SIMD da0604f @mr-c

Packit CI

  • Start testing SIMDe PRs using Fedora Rawhide d64b103 6ae0763 b309d89 4d55fc2 643c419 @junaruga

Travis

  • restart testing with Travis CI 93905f5 @mr-c

Misc

  • README: mark F16C as complete 2d87cf5 @mr-c
  • README: Give credit to creator/maintainer of the vcpkg for SIMDe ceb1e73 @mr-c
  • README: related projects: add AvxToNeon 13bf92a @mr-c
  • README: add more background links for supported ISAs c76450d @mr-c
  • README: turn Packit CI link into a deep link e9e1901 @mr-c
  • README: NEON is complete 7412139 @mr-c
  • docs: explain how to target a single test 2158ac7 @mr-c

New Contributors

Full Changelog: v0.7.6...v0.8.0