
LuCazzola/cudaMatrixTranspose


Matrix transposition : from sequential to parallel with CUDA

The following repository contains all the material related to both homework assignments on Matrix Transposition for the GPU Computing course at the University of Trento (Italy), a.y. 2023/2024.

To read the report and better understand what this work is about, click here.

(Figure: Matrix Transposition)


How to use

Clone the repository:

git clone https://github.com/LuCazzola/cudaMatrixTranspose.git

Here is the hierarchy of the project's relevant files:

.
├── bin                         # final executables
│    └── ...
├── obj                         # intermediate object files
│    └── ...
├── src                         # source code
│    ├── headers                # header files
│    │    └── ...
│    ├── benchmark.c            # produce an output file according to options in "run_benchmark.sh"
│    ├── benchmark_gpu.cu       # produce an output file according to options in "launch_benchmark.sh"
│    ├── main.c                 # test the functions according to options in "run_main.sh"
│    ├── main_gpu.cu            # test the functions according to options in "launch_main.sh"
│    ├── transpose.c            # functions to compute the transpose of a given matrix
│    ├── transpose_gpu.c        # GPU counterpart of transpose.c
│    .
│    ├── matrix.c               # definition of methods to handle matrices
│    ├── opt_parser.c           # command line parameter parsing
│    └── common_cuda.cu         # defines some common functions for CUDA methods
│
├── run_benchmark.sh            # set parameters related to "benchmark.c" and run the script
├── run_main.sh                 # set parameters related to "main.c" and run the script
├── run_cache_benchmark.sh      # run Cachegrind to measure the cache miss % of the specified function
├── launch_benchmark.sh         # set parameters related to "benchmark_gpu.cu" and run the script on a SLURM system
├── launch_main.sh              # set parameters related to "main_gpu.cu" and run the script on a SLURM system
├── data                        # data gathered via "run_benchmark.sh" & "launch_benchmark.sh"
│    └── ...
├── plot_data.py                # generates graphs using the data stored in "data" folder
│
├── Makefile
└── ...

Main commands

The Makefile defines 4 rules:

  • make : builds object files and homework-1 + homework-2 executables
  • make debug : builds object files and ALL executables adding debugging flags
  • make benchmark : builds object files and the benchmark + benchmark_gpu executables
  • make clean : cleans all object files

There are several preset scripts to choose from:

  • CPU scripts section (Homework-1)
  • GPU scripts section (Homework-2)

CPU test commands (Homework-1)

NOTE

Before running the scripts, move into the repository directory:

cd cudaMatrixTranspose


COMMANDS

The "run_main.sh" script sets the parameters of the homework-1 executable and runs it.
To change the run parameters and better understand its functionality, see run_main.sh.

make
./run_main.sh

The "run_benchmark.sh" script sets the parameters of the benchmark executable and runs it.
The extracted data is saved in the data folder.
To change the run parameters and better understand its functionality, see run_benchmark.sh.

make benchmark
./run_benchmark.sh

The "run_cache_benchmark.sh" script sets the parameters of the homework-1 executable and runs Cachegrind on it, extracting localized information about cache misses inside the transpose_naive() or transpose_blocks() function (according to the chosen "method" parameter).
To change the run parameters and better understand its functionality, see run_cache_benchmark.sh.

make clean
make debug
./run_cache_benchmark.sh
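To illustrate what the cache benchmark compares, here is a minimal C sketch of a naive and a blocked (tiled) transposition of an n x n row-major matrix. The names naive_transpose, blocked_transpose and elem_t, as well as the signatures, are hypothetical. They do not necessarily match transpose_naive() and transpose_blocks() in transpose.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type; the repository selects its own type at compile time. */
typedef float elem_t;

/* Naive transpose of an n x n row-major matrix: dst[j][i] = src[i][j].
 * Reads of src are sequential, but consecutive writes to dst are n elements
 * apart, so for large n almost every write touches a different cache line. */
void naive_transpose(const elem_t *src, elem_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked (tiled) transpose: the matrix is processed in block x block tiles,
 * so the cache lines of src and dst touched by one tile can stay in cache
 * while that tile is transposed. Edge tiles are clipped when block does not
 * divide n. */
void blocked_transpose(const elem_t *src, elem_t *dst, size_t n, size_t block)
{
    assert(block > 0);
    for (size_t bi = 0; bi < n; bi += block)
        for (size_t bj = 0; bj < n; bj += block) {
            size_t i_end = (bi + block < n) ? bi + block : n;
            size_t j_end = (bj + block < n) ? bj + block : n;
            for (size_t i = bi; i < i_end; i++)
                for (size_t j = bj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
}
```

Both functions produce the same result. The blocked version is generally expected to show fewer data cache misses under Cachegrind for large matrices, but the best block size depends on the cache hierarchy of the machine.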


GPU test commands (Homework-2)

NOTE

Please note that the following commands are meant to be run on the Marzola DISI cluster. If needed, modify the launch_main.sh and launch_benchmark.sh scripts to change the partition or to adapt them to a different SLURM system.

From outside the cloned project folder, upload the project's directory to the login node:

scp -r cudaMatrixTranspose <YOUR USERNAME>@marzola.disi.unitn.it:/home/<YOUR USERNAME>

Then log in, move into the project's folder and load the CUDA module:

cd cudaMatrixTranspose
module load cuda


COMMANDS

The "launch_main.sh" script sets the parameters of the homework-2 executable and runs it.
To change the run parameters and better understand its functionality, see launch_main.sh.

make
sbatch launch_main.sh

Once the job has finished, display the results with:

cat output.out

The "launch_benchmark.sh" script sets the parameters of the benchmark_gpu executable and runs it.
The extracted data is saved in the data folder.
To change the run parameters and better understand its functionality, see launch_benchmark.sh.

make benchmark
sbatch launch_benchmark.sh


Graph Plotting

The project's directory also contains a Python script that reads the contents of the data folder and generates 2 types of graphs:

  • x : Matrix size - y : Mean execution time
  • x : Matrix size - y : Mean effective bandwidth
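As a sketch of the quantities on the y axes, the following C functions compute a mean over repeated timings and an effective bandwidth. The bandwidth uses the common definition (bytes read + bytes written) / (time x 10^9), in GB/s. A transpose reads each element once and writes it once, so the traffic is 2 x rows x cols x sizeof(element). The function names are hypothetical, and benchmark.c, benchmark_gpu.cu and plot_data.py may compute these values differently.

```c
#include <assert.h>
#include <stddef.h>

/* Effective bandwidth of one transposition in GB/s:
 * (bytes read + bytes written) / (seconds * 1e9).
 * Every element is read once from the source and written once to the destination. */
double effective_bandwidth_gbs(size_t rows, size_t cols, size_t elem_size, double seconds)
{
    assert(seconds > 0.0);
    double bytes = 2.0 * (double)rows * (double)cols * (double)elem_size;
    return bytes / (seconds * 1e9);
}

/* Arithmetic mean of `count` timings (in seconds) of repeated runs. */
double mean_seconds(const double *times, size_t count)
{
    assert(count > 0);
    double sum = 0.0;
    for (size_t k = 0; k < count; k++)
        sum += times[k];
    return sum / (double)count;
}
```

For example, transposing a 1024 x 1024 matrix of 4-byte floats in 1 ms moves 8,388,608 bytes. That corresponds to an effective bandwidth of about 8.39 GB/s.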

Run it (on your own device) with:

python3 plot_data.py

You can customize which information to plot inside the script.



Extra Customization

It's also possible to change some other parameters at compile time (optimization level and matrix element data type) by editing some variables in the Makefile.
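One common way to make the element type selectable at compile time is a preprocessor macro with a default value, which is then overridden with -D on the compiler command line. The optimization level is likewise just the -O flag passed to the compiler. The sketch below assumes a hypothetical ELEM_TYPE macro and a hypothetical matrix_bytes helper; the variable names actually used in the Makefile may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: the element type defaults to float and can be
 * overridden when compiling, e.g. gcc -O3 -DELEM_TYPE=double ... */
#ifndef ELEM_TYPE
#define ELEM_TYPE float
#endif

typedef ELEM_TYPE elem_t;

/* Number of bytes needed to store an n x n matrix of elem_t.
 * The assertion guards against size_t overflow for very large n. */
size_t matrix_bytes(size_t n)
{
    assert(n == 0 || n <= SIZE_MAX / n / sizeof(elem_t));
    return n * n * sizeof(elem_t);
}
```

Compiling with, for example, gcc -O3 -DELEM_TYPE=double would switch every elem_t to double without touching the source.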

About

Optimizing matrix transposition on GPU with CUDA (University of Trento, Italy)
