This repository contains all the material for both homework assignments on matrix transposition from the GPU Computing course at the University of Trento (Italy), a.y. 2023/2024.
To see the report and better understand what this work is about, click Here
Download the directory
git clone https://github.com/LuCazzola/cudaMatrixTranspose.git
The hierarchy of the relevant project files is as follows:
.
├── bin # final executables
│ └── ...
├── obj # intermediate object files
│ └── ...
├── src # source code
│ ├── headers # header files
│ │ └── ...
│ ├── benchmark.c # produce an output file according to options in "run_benchmark.sh"
│ ├── benchmark_gpu.cu # produce an output file according to options in "launch_benchmark.sh"
│ .
│ ├── main.c # test the functions according to options in "run_main.sh"
│ ├── main_gpu.cu # test the functions according to options in "launch_main.sh"
│ .
│ ├── transpose.c # functions to compute the transpose of a given matrix
│ ├── transpose_gpu.c
│ .
│ ├── matrix.c # definition of methods to handle matrices
│ ├── opt_parser.c # command line parameter parsing
│ .
│ └── common_cuda.cu # defines some common functions for cuda methods
│
├── run_benchmark.sh # set parameters related to "benchmark.c" and run the script
├── run_main.sh # set parameters related to "main.c" and run the script
├── run_cache_benchmark.sh # run cachegrind to benchmark cache miss % on specified function
│ .
├── launch_benchmark.sh # set parameters related to "benchmark_gpu.cu" and run the script on SLURM system
├── launch_main.sh # set parameters related to "main_gpu.cu" and run the script on SLURM system
│ .
├── data # data gathered via "run_benchmark.sh" & "launch_benchmark.sh"
│ └── ...
├── plot_data.py # generates graphs using the data stored in "data" folder
│
├── Makefile
└── ...
Makefile defines 4 rules :
- make : builds object files and homework-1 + homework-2 executables
- make debug : builds object files and ALL executables adding debugging flags
- make benchmark : builds object files and benchmark + benchmark_gpu executables
- make clean : cleans all object files
There are many pre-set scripts to choose from :
>> CPU scripts section ( Homework-1 )
>> GPU scripts section ( Homework-2 )
Enter the repository before running the scripts
cd cudaMatrixTranspose
"run_main.sh" script sets parameters related to homework-1 executable and runs it.
To change the run parameters and better understand what the script does, see: run_main.sh
make
./run_main.sh
"run_benchmark.sh" script sets parameters related to benchmark executable and runs it.
The extracted data can be found in the data folder.
To change the run parameters and better understand what the script does, see: run_benchmark.sh
make benchmark
./run_benchmark.sh
"run_cache_benchmark.sh" script sets parameters related to homework-1 and runs Cachegrind on it, extracting localized informations about cache misses inside transpose_naive() or transpose_blocks() functions (according to the chosen parameter "method")
To change the run parameters and better understand what the script does, see: run_cache_benchmark.sh
make clean
make debug
./run_cache_benchmark.sh
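For readers unfamiliar with the two methods that Cachegrind compares, here is a minimal sketch of a naive and a blocked (tiled) transpose. It is only an illustration: the function names, signatures, the float element type and the row-major layout are assumptions, and the repository's transpose_naive() and transpose_blocks() may be written differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signatures, for illustration only: the repository's
   transpose_naive() and transpose_blocks() may take other arguments. */

/* Naive transpose of a row-major rows x cols matrix.
   dst must hold cols x rows elements and must not alias src. */
void transpose_naive_sketch(const float *src, float *dst,
                            size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            /* Reads are contiguous, writes are strided by `rows`. */
            dst[j * rows + i] = src[i * cols + j];
        }
    }
}

/* Blocked transpose: the matrix is visited in block x block tiles so
   that the strided writes stay within a small working set. */
void transpose_blocks_sketch(const float *src, float *dst,
                             size_t rows, size_t cols, size_t block)
{
    assert(block > 0);
    for (size_t bi = 0; bi < rows; bi += block) {
        for (size_t bj = 0; bj < cols; bj += block) {
            /* Clamp the tile so partial tiles at the edges are handled. */
            size_t i_end = (bi + block < rows) ? bi + block : rows;
            size_t j_end = (bj + block < cols) ? bj + block : cols;
            for (size_t i = bi; i < i_end; i++) {
                for (size_t j = bj; j < j_end; j++) {
                    dst[j * rows + i] = src[i * cols + j];
                }
            }
        }
    }
}
```

Both functions produce the same output; only the memory access order differs, which is the kind of difference a cache-miss measurement is meant to expose.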
Note that the following commands are meant to be run on the Marzola DISI cluster; if needed, modify the launch_main.sh & launch_benchmark.sh scripts to target a different partition or SLURM system.
From outside the cloned project folder, upload the project directory to the login node
scp -r cudaMatrixTranspose <YOUR USERNAME>@marzola.disi.unitn.it:/home/<YOUR USERNAME>
Then log in and enter the project folder
cd cudaMatrixTranspose
module load cuda
"launch_main.sh" script sets parameters related to homework-2 executable and runs it.
To change the run parameters and better understand what the script does, see: launch_main.sh
make
sbatch launch_main.sh
To view the results once the job has finished, run:
cat output.out
"launch_benchmark.sh" script sets parameters related to benchmark_gpu executable and runs it.
The extracted data can be found in the data folder.
To change the run parameters and better understand what the script does, see: launch_benchmark.sh
make benchmark
sbatch launch_benchmark.sh
Inside the project directory there is also a Python script that takes the contents of the data folder and generates 2 types of graphs
- x : Matrix size - y : Mean execution time
- x : Matrix size - y : Mean effective bandwidth
Test it by running (on your own device) :
python3 plot_data.py
You can customize what information to plot inside the script
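As a reference for the second graph type, effective bandwidth for a transpose is commonly computed as the bytes read plus the bytes written, divided by the execution time. The sketch below assumes that definition, a square n x n matrix, and a hypothetical function name; the formula actually used by this project's benchmark may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Effective bandwidth in GB/s for transposing an n x n matrix.
   Assumption: every element is read once and written once, so the
   data moved is 2 * n * n * elem_size bytes. Hypothetical helper,
   not taken from the repository. */
double effective_bandwidth_gbs(size_t n, size_t elem_size, double seconds)
{
    assert(seconds > 0.0);
    double bytes = 2.0 * (double)n * (double)n * (double)elem_size;
    return bytes / seconds / 1e9;
}
```

For example, a 1024 x 1024 matrix of 4-byte elements transposed in 1 ms moves 8388608 bytes, which under this definition is about 8.39 GB/s.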
It's also possible to change some other parameters at compile time (optimization level and matrix element data type) by changing some variables in the Makefile :