Optimized Matrix Multiplication

A program that optimizes matrix multiplication by exploiting spatial and temporal locality within memory.

Methods

The different methods of matrix multiplication is compared and its execution time and speedup is measured below. Note that these tests are ran with a 2.20 GHz CPU on Linux. First, an unoptimized kernel is measured. Second, transposition of the matrices on the unoptimized kernel is analyzed. Finally, tiling optimization is implemented, in which the matrices are partitioned into smaller, sub-matrcices that are multiplied one by one.

Execution time and Speedup of various methods

Speedup is measured against the unoptimized kernel.

Method	Execution Time	Speedup
Unoptimized Kernel	3.88 seconds
Transpose Optimization	1.93 secconds	2.01x speedup
Tiling with tile size 1	2.568 seconds	1.51x speedup
Tiling with tile size 2	1.728 seconds	2.24x speedup
Tiling with tile size 4	1.456 seconds	2.66x speedup
Tiling with tile size 8	1.434 seconds	2.70x speedup
Tiling with tile size 16	1.576 seconds	2.46x speedup
Tiling with tile size 32	1.846 seconds	2.10x speedup
Tiling with tile size 64	2.064 seconds	1.87x speedup
Tiling with tile size 128	2.730 seconds	1.42x speedup
Tiling with tile size 256	4.702 seconds	0.82x speedup
Tiling with tile size 512	4.466 seconds	0.86x speedup
Tiling with tile size 1024	4.240 seconds	0.91x speedup

It is apparent that the tiling method with a matrix size of 8 yields the highest speedup (2.7x) compared with the unoptimized matrix multiplication, saving nearly 2.5 seconds of execution time. It is interesting to note, however, with tile sizes above 256, there is a trend in execution time that is taking longer than the unoptimized matrix multiplication.

Compilation and Usage

To compile the program:

$ gcc -g -o matmul -Wall -O3 matmul.c -lm

Usage:

$ ./matmul [runTimes] [power]

where runTimes = number of times to run the tests and power = the number of tiles in powers of 2

for example, to run it 5 times with 1-1024 tiles:

$ ./matmul 5 10

or to run it 1 time with 1-64 tiles:

$ ./matmul 1 6

At the end of running, a summary with average times from each test is printed to stdout.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
matmul.c		matmul.c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Optimized Matrix Multiplication

Methods

Execution time and Speedup of various methods

Compilation and Usage

About

Releases

Packages

Languages

wchunl/OptimalMatrixMult

Folders and files

Latest commit

History

Repository files navigation

Optimized Matrix Multiplication

Methods

Execution time and Speedup of various methods

Compilation and Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages