This repository investigates the various scheduling policies available in OpenMP. The investigation is performed for two different workloads: one that is slightly unbalanced, called `loop1`, and one that is very unbalanced, with most of the work concentrated in the first few iterations, called `loop2`. The two imbalance patterns are sketched below.
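The exact loop bodies live in `src/loops/`; purely as an illustration of the two imbalance patterns (this is not the repository's code), workloads of this shape could look like:

```c
/* Hypothetical sketch of the two workload shapes (not the actual code
 * in src/loops/): loop1's cost varies mildly across iterations, while
 * loop2 concentrates almost all of the work in the first few. */
void loop1_like(int n, double *a)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < i % 64; j++)   /* cost varies slightly with i */
            a[i] += 1e-6 * (double)j;
}

void loop2_like(int n, double *a)
{
    for (int i = 0; i < n; i++) {
        int work = (i < 8) ? n : 1;        /* first iterations dominate */
        for (int j = 0; j < work; j++)
            a[i] += 1e-6 * (double)j;
    }
}
```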
The following schedulers provided by OpenMP are investigated (the sketch after this list shows how they map onto the `schedule` clause):

- `STATIC,n`
- `DYNAMIC,n`
- `GUIDED,n`

where `n` is the selected chunksize.
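For reference, these options correspond to the standard OpenMP `schedule` clause. A minimal sketch with a placeholder loop body and an arbitrary chunksize of 4:

```c
#include <omp.h>

void schedule_sketch(int n, double *a)
{
    #pragma omp parallel for schedule(static, 4)   /* STATIC,4  */
    for (int i = 0; i < n; i++)
        a[i] *= 2.0;

    #pragma omp parallel for schedule(dynamic, 4)  /* DYNAMIC,4 */
    for (int i = 0; i < n; i++)
        a[i] *= 2.0;

    #pragma omp parallel for schedule(guided, 4)   /* GUIDED,4  */
    for (int i = 0; i < n; i++)
        a[i] *= 2.0;
}
```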
Additionally, a scheduler called `affinity` was designed by hand, aiming to combine the characteristics of the aforementioned schedulers so that its performance can be compared against them. The general idea is outlined below.
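The actual implementation lives in `src/affinity/`; the following is only a minimal sketch of the general idea, assuming each of the `P` threads starts with a contiguous local set of iterations, repeatedly claims `remaining/P` of it, and steals from the most loaded thread once its own set is empty (all names here are illustrative, not the repository's):

```c
#include <omp.h>
#include <stdio.h>

#define N 1024
#define MAX_THREADS 64   /* illustrative upper bound on the team size */

typedef struct { int lo, hi; omp_lock_t lock; } set_t;
static set_t sets[MAX_THREADS];

/* Claim ceil(remaining/P) iterations from sets[owner]; 0 if empty. */
static int claim(int owner, int P, int *lo, int *hi)
{
    int got = 0;
    omp_set_lock(&sets[owner].lock);
    int rem = sets[owner].hi - sets[owner].lo;
    if (rem > 0) {
        got = (rem + P - 1) / P;
        *lo = sets[owner].lo;
        *hi = *lo + got;
        sets[owner].lo = *hi;
    }
    omp_unset_lock(&sets[owner].lock);
    return got;
}

int main(void)
{
    static double a[N];
    int P = omp_get_max_threads();

    /* split the iteration space into P contiguous local sets */
    for (int t = 0; t < P; t++) {
        sets[t].lo = t * N / P;
        sets[t].hi = (t + 1) * N / P;
        omp_init_lock(&sets[t].lock);
    }

    #pragma omp parallel num_threads(P)
    {
        int t = omp_get_thread_num(), lo, hi;
        for (;;) {
            if (!claim(t, P, &lo, &hi)) {
                /* own set empty: find the most loaded thread
                 * (unsynchronised scan; good enough for a sketch) */
                int victim = -1, most = 0;
                for (int v = 0; v < P; v++) {
                    int rem = sets[v].hi - sets[v].lo;
                    if (rem > most) { most = rem; victim = v; }
                }
                if (victim < 0) break;                     /* all done */
                if (!claim(victim, P, &lo, &hi)) continue; /* raced    */
            }
            for (int i = lo; i < hi; i++)   /* placeholder "workload" */
                a[i] += (double)i;
        }
    }
    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}
```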
The repository is structured as follows:

- `includes/`: Contains the header file `resources.h`, necessary for the development of the code. Additionally, contains `affinity_structs.h` and `macros.h`, necessary for the development of the affinity scheduler.
- `src/main.c`: The main source file, used to execute each scheduling option for the two available workloads.
- `src/loops/`: Contains all the functions relevant to the workloads, i.e. their initialisation and validation as well as their execution.
- `src/omplib/`: Contains wrapper functions around OpenMP calls, in an effort to hide the API's functions.
- `src/affinity/`: Contains all the functions used to implement the affinity scheduler.
- `scripts/performance/`: Contains all the performance tests available to measure the performance of the code.
- `scripts/pbs/`: Contains the performance tests to be run on the back-end of CIRRUS.
- `scripts/plots/`: Contains the scripts used to plot the results of the performance tests.
- `res/`: Contains the raw results and plots for each test.
The designed affinity scheduler comes in two versions: the first uses critical regions to synchronise the threads, while the second uses locks. One can choose between the two versions by compiling the code with a different `DEFINE` flag. Moreover, one can also choose which scheduler to measure: another `DEFINE` flag selects between the `best_schedule` option chosen for each workload, or determining the scheduling option at runtime.
The following options are available:

- `-DRUNTIME`: Select the scheduling option at runtime.
- `-DBEST_SCHEDULE`: Use the best scheduling option determined for each workload.
- `-DBEST_SCHEDULE_LOOP2`: Use the best scheduling option determined for each workload after a further investigation of `loop2`.
- `-DAFFINITY`: Use the affinity scheduler.
- `-DLOCK`: If set, the affinity scheduler with locks is used; otherwise, the one with critical regions (a sketch of the two styles follows this list).
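To illustrate what `-DLOCK` toggles, here is a minimal sketch of the two synchronisation styles around a shared counter (illustrative only; the repository's real code is in `src/affinity/`):

```c
#include <omp.h>

static int next_chunk = 0;
#ifdef LOCK
static omp_lock_t chunk_lock;   /* omp_init_lock() must run once first */
#endif

/* Hypothetical helper: hand out chunk indices to competing threads. */
int grab_chunk(void)
{
    int mine;
#ifdef LOCK
    omp_set_lock(&chunk_lock);     /* lock-based version (-DLOCK) */
    mine = next_chunk++;
    omp_unset_lock(&chunk_lock);
#else
    #pragma omp critical           /* critical-region version */
    mine = next_chunk++;
#endif
    return mine;
}
```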
Note that only one of the four main options shown above should be selected. If no option is selected, the `serial` version of the code is executed.
To compile all the available versions of the code use:
$ make all
This will create all the necessary directories for the code to be executed. All the versions of the code are compiled using the different options shown above, resulting in the following executables:
- `bin/serial`: Serial version of the code.
- `bin/runtime`: Parallel version of the code, where the scheduling can be determined at runtime. Note that only the scheduling options provided by OpenMP can be selected.
- `bin/best_schedule`: The best scheduling options provided by OpenMP are used for each workload.
- `bin/best_schedule_loop2`: The best scheduling options provided by OpenMP are used for each workload, after the best scheduling option for `loop2` was tuned based on its chunksize.
- `bin/affinity`: The affinity scheduler with critical regions is used.
- `bin/affinity_lock`: The affinity scheduler with locks is used.
Alternatively, one can compile each version separately. Create the required directories using:
$ make dir
Build the serial version:
$ make bin/serial -B
icc -O3 -qopenmp -std=c99 -Wall -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/omplib.o -c src/omplib/omplib.c
icc -O3 -qopenmp -std=c99 -Wall -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/workload.o -c src/loops/workload.c
icc -O3 -qopenmp -std=c99 -Wall -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/main.o -c src/main.c
icc obj/omplib.o obj/workload.o obj/main.o -o bin/serial -lm -qopenmp
Build the runtime version:
$ make bin/runtime DEFINE=-DRUNTIME -B
icc -O3 -qopenmp -std=c99 -Wall -DRUNTIME -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/omplib.o -c src/omplib/omplib.c
icc -O3 -qopenmp -std=c99 -Wall -DRUNTIME -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/workload.o -c src/loops/workload.c
icc -O3 -qopenmp -std=c99 -Wall -DRUNTIME -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/main.o -c src/main.c
icc obj/omplib.o obj/workload.o obj/main.o -o bin/runtime -lm -qopenmp
Build the best_schedule version:
$ make bin/best_schedule DEFINE=-DBEST_SCHEDULE -B
icc -O3 -qopenmp -std=c99 -Wall -DBEST_SCHEDULE -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/omplib.o -c src/omplib/omplib.c
icc -O3 -qopenmp -std=c99 -Wall -DBEST_SCHEDULE -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/workload.o -c src/loops/workload.c
icc -O3 -qopenmp -std=c99 -Wall -DBEST_SCHEDULE -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/main.o -c src/main.c
icc -O3 -qopenmp -std=c99 -Wall obj/omplib.o obj/workload.o obj/main.o -o bin/best_schedule -lm -qopenmp
Build the best_schedule version for `loop2`:
$ make bin/best_schedule_loop2 DEFINE=-DBEST_SCHEDULE_LOOP2 -B
icc -O3 -qopenmp -std=c99 -Wall -DBEST_SCHEDULE_LOOP2 -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/omplib.o -c src/omplib/omplib.c
icc -O3 -qopenmp -std=c99 -Wall -DBEST_SCHEDULE_LOOP2 -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/workload.o -c src/loops/workload.c
icc -O3 -qopenmp -std=c99 -Wall -DBEST_SCHEDULE_LOOP2 -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/main.o -c src/main.c
icc -O3 -qopenmp -std=c99 -Wall obj/omplib.o obj/workload.o obj/main.o -o bin/best_schedule_loop2 -lm -qopenmp
Build the affinity version with critical regions:
$ make bin/affinity DEFINE=-DAFFINITY -B
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/omplib.o -c src/omplib/omplib.c
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/workload.o -c src/loops/workload.c
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/affinity.o -c src/affinity/affinity.c
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/mem.o -c src/affinity/mem.c
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/main.o -c src/main.c
icc -O3 -qopenmp -std=c99 -Wall obj/omplib.o obj/workload.o obj/affinity.o obj/mem.o obj/main.o -o bin/affinity -lm -qopenmp
Build the affinity version with locks:
$ make bin/affinity_lock DEFINE=-DAFFINITY DEFINE+=-DLOCK -B
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -DLOCK -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/omplib.o -c src/omplib/omplib.c
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -DLOCK -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/workload.o -c src/loops/workload.c
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -DLOCK -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/affinity.o -c src/affinity/affinity.c
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -DLOCK -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/mem.o -c src/affinity/mem.c
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -DLOCK -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/main.o -c src/main.c
icc -O3 -qopenmp -std=c99 -Wall obj/omplib.o obj/workload.o obj/affinity.o obj/mem.o obj/main.o -o bin/affinity_lock -lm -qopenmp
To clean the project run:
$ make clean
To execute the serial code:
$ ./bin/serial
To execute the parallel code, one has to choose the number of threads the code will run on. This can be done using:
$ export OMP_NUM_THREADS=$(THREADS)
where `$(THREADS)` is the selected number of threads.
To execute the runtime version:
$ export OMP_SCHEDULE=$(KIND,n)
$ ./bin/runtime
where `$(KIND,n)` is the selected scheduling option and chunksize.
The available scheduling options are:

- `STATIC,n`: Static scheduler
- `DYNAMIC,n`: Dynamic scheduler
- `GUIDED,n`: Guided scheduler

where `n` is the selected chunksize.
Example:
$ export OMP_NUM_THREADS=4
$ export OMP_SCHEDULE=DYNAMIC,2
$ ./bin/runtime
This will execute the code on 4 threads, using a dynamic scheduler with a chunksize of 2 for each workload.
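The `runtime` build can defer this choice to `OMP_SCHEDULE` because OpenMP provides the `schedule(runtime)` clause; a minimal sketch (the loop body is a placeholder, not the repository's workload):

```c
#include <omp.h>

void run_workload(int n, double *a)
{
    /* kind and chunksize are read from the OMP_SCHEDULE
     * environment variable when the loop executes */
    #pragma omp parallel for schedule(runtime)
    for (int i = 0; i < n; i++)
        a[i] += 1.0;
}
```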
To execute the best_schedule version:
$ ./bin/best_schedule
This will execute the code with `GUIDED,16` for `loop1` and `DYNAMIC,8` for `loop2`.
To execute the best_schedule_loop2 version:
$ ./bin/best_schedule_loop2
This will execute the code with `GUIDED,16` for `loop1` and `DYNAMIC,4` for `loop2`.
To execute the affinity version with critical regions use:
$ ./bin/affinity
To execute the affinity version with locks use:
$ ./bin/affinity_lock
This test executes the `bin/runtime` executable multiple times, measuring the performance of each OpenMP scheduling option for different chunksizes. The number of threads is kept constant, in order to determine the best scheduling option and chunksize for each workload.
Running on the front-end:
$ make runtime_test
Submitting a job on the back-end of CIRRUS:
$ make runtime_test_back
To plot the results once the test is finished run:
$ make plot_runtime_test
Based on the results of the previous test, the best scheduling option is selected for each workload. This test runs the `bin/best_schedule` executable multiple times over a set of thread counts. The performance is then evaluated for each thread count and each workload.
Running on the front-end:
$ make best_schedule_test
Submitting a job on the back-end of CIRRUS:
$ make best_schedule_test_back
To plot the results once the test is finished run:
$ make plot_runtime_test
As the performance of `loop2` saturates for the scheduling option and chunksize selected after the best_schedule test, a further investigation is performed. The executable `bin/best_schedule_loop2` is executed multiple times over a set of thread counts and chunksizes, for the scheduling option selected as best for `loop2`.
Running on the front-end:
$ make best_schedule_loop2_test
Submitting a job on the back-end of CIRRUS:
$ make best_schedule_loop2_test_back
To plot the results once the test is finished run:
$ make plot_runtime_test
The performance of the affinity scheduler is investigated for the two available versions, i.e. the one using critical regions and the one using locks.
Running on the front-end:
$ make affinity_schedule_test
Submitting a job on the back-end of CIRRUS:
$ make affinity_schedule_test_back
To plot the results once the test is finished run:
$ make plot_runtime_test
The performance of all the implemented versions is evaluated and compared for each loop.
Running on the front-end:
$ make performance_comparison_test
Submitting a job on the back-end of CIRRUS:
$ make performance_comparison_test_back
To plot the results once the test is finished run:
$ make plot_performance_comparison_test
Instead of submitting the test scripts one by one, one can use the following to perform all the tests together:
$ make run_tests_front
If one wants to submit the tests on the back-end, one can run:
$ make run_tests_back
Once all the tests are finished, the results can be plotted using:
$ make plot_tests