Fast vectorized GEMM that makes full use of the YMM registers through SIMD instructions.

XiaomingXu1995/gemm

Overview

The idea of vectorizing GEMM with SIMD instructions comes from a Zhihu article (https://zhuanlan.zhihu.com/p/383115932) and from the sgemm_hsw repository (https://github.com/pigirons/sgemm_hsw).

The Zhihu article describes the method in detail, with clear diagrams.

Build

make -j8

Initialize the input

./init.sh

This script initializes the input elements (integer and float values). The input matrices A[m][n] and B[n][k] are read from the *.random files. Make sure that m*n and n*k are both less than the number of elements in the .random files.
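The size constraint above can be checked before a run. The following is a minimal sketch, not code from this repository: the helper names `matrix_fits` and `inputs_fit` and the `available` parameter (the element count of a .random file) are hypothetical, and the comparison is strict because the README says "less than".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when rows * cols does not overflow size_t
 * and is strictly less than `available`, the number of elements
 * assumed to be stored in the input file. */
static bool matrix_fits(size_t rows, size_t cols, size_t available)
{
    if (rows != 0 && cols > SIZE_MAX / rows) {
        return false;               /* rows * cols would overflow size_t */
    }
    return rows * cols < available;
}

/* Checks both operands of C[m][k] = A[m][n] x B[n][k]. */
static bool inputs_fit(size_t m, size_t n, size_t k, size_t available)
{
    return matrix_fits(m, n, available) && matrix_fits(n, k, available);
}
```

For instance, `inputs_fit(2400, 2400, 2400, available)` needs `available` to be larger than 5,760,000, since both A and B hold 2400*2400 elements.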

Run the gemm

./exe_gemm_float m n k res

This computes C[m][k] = A[m][n] x B[n][k]. As shown in the Zhihu article, n should be a multiple of 24 to make full use of the 16 logical YMM registers.

For example:
./exe_gemm_float 2400 2400 2400 res

./exe_gemm_float_multiple 24 24 64 res

./run.sh
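To make the meaning of the m, n, and k arguments concrete, here is a scalar reference for the product C[m][k] = A[m][n] x B[n][k]. It is only a sketch for checking results, not the vectorized kernel of this repository; the function name `gemm_ref` and the row-major, contiguous storage are assumptions.

```c
#include <stddef.h>

/* Scalar reference (hypothetical name): C[m][k] = A[m][n] x B[n][k].
 * All three matrices are assumed to be row-major and contiguous. */
static void gemm_ref(size_t m, size_t n, size_t k,
                     const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < m; ++i) {          /* rows of A and C      */
        for (size_t j = 0; j < k; ++j) {      /* columns of B and C   */
            float acc = 0.0f;
            for (size_t p = 0; p < n; ++p) {  /* shared dimension n   */
                acc += A[i * n + p] * B[p * k + j];
            }
            C[i * k + j] = acc;
        }
    }
}
```

A triple loop like this is slow for sizes such as 2400 x 2400 x 2400, but it gives a simple baseline to compare a vectorized result against, within a floating-point tolerance.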
