simd demo

This C++ code shows how to accelerate a simple O(N^2) direct kernel summation of the type common in integral equation solvers, fast multipole methods (FMMs), etc. It compares 3 methods: explicit SIMD vectorization via the VCL library (included in this repo for convenience), and compiler auto-vectorization with different inner/outer loop orders.
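
To make the loop-order distinction concrete, here is a minimal sketch of a direct sum written both ways. The 1D geometry, the smooth kernel 1/(1+(x-y)^2), and all function names are assumptions chosen for illustration; they are not taken from main.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel, for illustration only: K(x, y) = 1 / (1 + (x - y)^2).
// It has no singularity, so no self-interaction check is needed.
inline double kernel(double x, double y) {
    const double d = x - y;
    return 1.0 / (1.0 + d * d);
}

// Order A: outer loop over targets, inner loop over sources.
// The inner loop is a reduction into the scalar accumulator acc.
std::vector<double> sum_targets_outer(const std::vector<double>& x,
                                      const std::vector<double>& y,
                                      const std::vector<double>& q) {
    assert(y.size() == q.size());
    std::vector<double> u(x.size(), 0.0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        double acc = 0.0;
        for (std::size_t j = 0; j < y.size(); ++j)
            acc += q[j] * kernel(x[i], y[j]);
        u[i] = acc;
    }
    return u;
}

// Order B: outer loop over sources, inner loop over targets.
// The inner loop updates independent entries u[i], so there is no reduction.
std::vector<double> sum_sources_outer(const std::vector<double>& x,
                                      const std::vector<double>& y,
                                      const std::vector<double>& q) {
    assert(y.size() == q.size());
    std::vector<double> u(x.size(), 0.0);
    for (std::size_t j = 0; j < y.size(); ++j) {
        const double yj = y[j];
        const double qj = q[j];
        for (std::size_t i = 0; i < x.size(); ++i)
            u[i] += qj * kernel(x[i], yj);
    }
    return u;
}
```

Both functions compute the same sums u[i] = sum_j q[j] K(x[i], y[j]), up to floating-point rounding. Which order a given compiler vectorizes better is something to measure rather than assume: order A needs the compiler to reorder a floating-point reduction, while order B streams through the whole output array once per source.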

For more on what "SIMD vectorization" means, see here:
https://en.wikipedia.org/wiki/SIMD
https://software.intel.com/en-us/articles/improve-performance-using-vectorization-and-intel-xeon-scalable-processors
https://software.intel.com/en-us/blogs/2012/01/31/vectorization-find-out-what-it-is-find-out-more
https://www.codingame.com/playgrounds/283/sse-avx-vectorization/what-is-sse-and-avx

For some speedups in the Helmholtz kernel context, see demo_slides.pdf.
See also Wen Yan's SIMD demo, https://github.com/wenyan4work/DemoSIMD, which has SIMD speedup tests and memory cache tutorials.

Test auto-vectorization with icc and gcc:

  • icpc -fPIC -g -O3 -march=native -funroll-loops -fopenmp -std=c++17 -DVCL -I./version1 main.cc; ./a.out
  • g++ -fPIC -g -O3 -march=native -funroll-loops -fopenmp -std=c++17 -DVCL -I./version1 main.cc; ./a.out
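
Since the commands above pass -fopenmp, one way to help the compiler vectorize a reduction loop is an OpenMP simd pragma. The sketch below is an assumption about how that could look, not a description of what main.cc does; the kernel and the function name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum of q[j] * K(xi, y[j]) over all sources for a single target xi,
// with the hypothetical kernel K(x, y) = 1 / (1 + (x - y)^2).
double potential_at(double xi, const std::vector<double>& y,
                    const std::vector<double>& q) {
    assert(y.size() == q.size());
    const std::size_t n = y.size();
    const double* yp = y.data();
    const double* qp = q.data();
    double acc = 0.0;
    // Tells the compiler it may split the sum across SIMD lanes.
    // Honored with -fopenmp (or -fopenmp-simd in gcc); otherwise ignored.
    #pragma omp simd reduction(+:acc)
    for (std::size_t j = 0; j < n; ++j) {
        const double d = xi - yp[j];
        acc += qp[j] / (1.0 + d * d);
    }
    return acc;
}
```

Because the pragma permits reordering the additions, the result can differ from a strictly sequential sum in the last few bits.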

Explicit vectorization (SIMD) libraries (there are more on GitHub):