This repository contains five simple examples showing how to use AVX intrinsics to accelerate your program. Please keep in mind that I am a beginner with CPU vector instructions, and I am not claiming this code is exemplary.
To build this repository, make sure your CPU supports AVX2. To find out, run the following command:
cat /proc/cpuinfo | grep avx2
- https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions/
- https://software.intel.com/en-us/articles/benefits-of-intel-avx-for-small-matrices/
- https://www.codeproject.com/Articles/874396/Crunching-Numbers-with-AVX-and-AVX
- https://software.intel.com/en-us/node/523876
- http://sci.tuomastonteri.fi/programming/sse
- http://stackoverflow.com/questions/13577226/intel-sse-and-avx-examples-and-tutorials
- http://supercomputingblog.com/optimization/getting-started-with-sse-programming/
- https://felix.abecassis.me/2011/09/cpp-getting-started-with-sse/
- http://www.walkingrandomly.com/?p=3378
- https://software.intel.com/sites/landingpage/IntrinsicsGuide/
We assume not only that the inputs are correctly aligned, but also that their lengths are multiples of 256 bits (8 floats).
Are there any restrictions on how the two arrays must be aligned relative to each other, or can we take any two arrays of float anywhere in memory?
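As far as I can tell, the alignment requirement applies to each pointer on its own: `_mm256_load_ps` and `_mm256_store_ps` need a 32-byte aligned address, while `_mm256_loadu_ps` and `_mm256_storeu_ps` accept any address. Here is a minimal sketch of array addition under that assumption. The names `add_arrays` and `is_aligned32` are made up for illustration and are not taken from this repository, and the `target` attribute is a GCC/Clang feature used here in place of the `-mavx` compiler flag.

```cpp
#include <immintrin.h>  // AVX intrinsics
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if p sits on a 32-byte boundary, as _mm256_load_ps requires.
inline bool is_aligned32(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}

// out[i] = a[i] + b[i] for i in [0, n). Hypothetical helper.
__attribute__((target("avx")))
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    assert(n % 8 == 0);  // length must be a multiple of 256 bits (8 floats)

    // Each pointer is checked independently; nothing ties them to each other.
    const bool aligned = is_aligned32(a) && is_aligned32(b) && is_aligned32(out);

    for (std::size_t i = 0; i < n; i += 8) {
        if (aligned) {
            // Aligned path: addresses stay aligned because we step by 32 bytes.
            __m256 va = _mm256_load_ps(a + i);
            __m256 vb = _mm256_load_ps(b + i);
            _mm256_store_ps(out + i, _mm256_add_ps(va, vb));
        } else {
            // Unaligned path: works for arrays anywhere in memory.
            __m256 va = _mm256_loadu_ps(a + i);
            __m256 vb = _mm256_loadu_ps(b + i);
            _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
        }
    }
}
```

If the arrays come from `alignas(32)` storage or an aligned allocator, the first branch is taken; otherwise the code falls back to the unaligned loads.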
Let's calculate the dot product of two vectors.
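A minimal sketch of the idea, not necessarily the code in this repository: multiply 8 floats at a time, keep 8 partial sums in one register, and add the lanes together at the end. The function name `dot_product` is hypothetical, the length is assumed to be a multiple of 8, and the `target` attribute (GCC/Clang only) stands in for the `-mavx` flag.

```cpp
#include <immintrin.h>  // AVX intrinsics
#include <cassert>
#include <cstddef>

// Dot product of two float arrays of length n (n must be a multiple of 8).
__attribute__((target("avx")))
float dot_product(const float* a, const float* b, std::size_t n) {
    assert(n % 8 == 0);

    __m256 acc = _mm256_setzero_ps();  // 8 running partial sums
    for (std::size_t i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);  // unaligned loads accept any address
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }

    // Horizontal sum: spill the 8 lanes to memory and add them with scalar code.
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float sum = 0.0f;
    for (float x : lanes) sum += x;
    return sum;
}
```

Note that the additions happen in a different order than in a plain scalar loop, so the result can differ slightly from the scalar one because of floating-point rounding.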
I have seen it asserted online that brute-force linear search can beat binary search for arrays of up to 10K elements. The calculations people give to support this claim rely on vector instructions, so let's try writing a vectorized linear search.
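One way to sketch such a search, assuming 32-bit integer keys and a length that is a multiple of 8: compare 8 elements at once with the AVX2 instruction `_mm256_cmpeq_epi32`, turn the result into a bitmask, and stop at the first chunk with a hit. The name `linear_search` is made up for illustration, and `__builtin_ctz` and the `target` attribute are GCC/Clang specific.

```cpp
#include <immintrin.h>  // AVX2 intrinsics
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the index of the first element equal to key, or -1 if there is none.
__attribute__((target("avx2")))
std::ptrdiff_t linear_search(const std::int32_t* data, std::size_t n,
                             std::int32_t key) {
    assert(n % 8 == 0);  // 8 x 32-bit integers fill one 256-bit register

    const __m256i vkey = _mm256_set1_epi32(key);  // key copied into all 8 lanes
    for (std::size_t i = 0; i < n; i += 8) {
        __m256i chunk =
            _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
        __m256i eq = _mm256_cmpeq_epi32(chunk, vkey);  // all-ones where equal

        // One bit per lane (bit k corresponds to data[i + k]).
        int mask = _mm256_movemask_ps(_mm256_castsi256_ps(eq));
        if (mask != 0) {
            // Lowest set bit is the first matching lane in this chunk.
            return static_cast<std::ptrdiff_t>(i) +
                   __builtin_ctz(static_cast<unsigned>(mask));
        }
    }
    return -1;
}
```

Whether this actually beats binary search depends on the array size and the machine, so it is worth benchmarking before drawing conclusions.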
When an object is created dynamically, its address is determined at runtime. However, the C++ runtime library (before C++17) does not take the alignment specifier into account, so we need to overload `operator new` for the class.
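A minimal sketch of such an overload, assuming a hypothetical class `Vec8` that must sit on a 32-byte boundary. It uses `_mm_malloc` and `_mm_free`, which come with `<immintrin.h>` on GCC and Clang; this is one possible approach and not necessarily what the code in this repository does.

```cpp
#include <immintrin.h>  // _mm_malloc, _mm_free
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical class holding one AVX register's worth of floats.
class alignas(32) Vec8 {
public:
    float v[8];

    // Class-specific allocation: always hand back 32-byte aligned memory.
    static void* operator new(std::size_t size) {
        void* p = _mm_malloc(size, 32);
        if (p == nullptr) throw std::bad_alloc();
        assert(reinterpret_cast<std::uintptr_t>(p) % 32 == 0);
        return p;
    }

    // Memory from _mm_malloc must be released with _mm_free.
    static void operator delete(void* p) noexcept { _mm_free(p); }
};
```

With this in place, `new Vec8` goes through `Vec8::operator new`, so the object can be used with aligned loads and stores. Since C++17 the default `operator new` handles over-aligned types itself, so the overload mainly matters for older standards.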
In addition, if we dynamically create an object of a class that contains an aligned class as a member, the C++ runtime library will not call the member's overloaded `operator new`, which leads to misaligned memory. The solution is somewhat tricky: it requires users to add a macro to their own class. See the code for details.
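To illustrate the macro idea, here is a hedged sketch. The macro name `ALIGNED_OPERATOR_NEW` and the classes `Vec8` and `Particle` are invented for this example (Eigen's own macro for the same purpose is `EIGEN_MAKE_ALIGNED_OPERATOR_NEW`); the macro in this repository may look different.

```cpp
#include <immintrin.h>  // _mm_malloc, _mm_free
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical macro: injects an aligned operator new/delete pair into a class.
#define ALIGNED_OPERATOR_NEW(ALIGN)                                  \
    static void* operator new(std::size_t size) {                    \
        void* p = _mm_malloc(size, (ALIGN));                         \
        if (p == nullptr) throw std::bad_alloc();                    \
        assert(reinterpret_cast<std::uintptr_t>(p) % (ALIGN) == 0);  \
        return p;                                                    \
    }                                                                \
    static void operator delete(void* p) noexcept { _mm_free(p); }

// Aligned type, as in the previous section but without its own operator new.
struct alignas(32) Vec8 {
    float v[8];
};

// User class with an aligned member. Without the macro, `new Particle` would
// use the default allocator, which before C++17 may ignore the 32-byte need.
struct Particle {
    Vec8 position;
    int id;

    ALIGNED_OPERATOR_NEW(32)
};
```

The user only has to remember to put the macro inside every class that holds an aligned member and is created with `new`.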
Reference: http://eigen.tuxfamily.org/dox-devel/group__DenseMatrixManipulation__Alignement.html