This directory contains input files for CP2K's benchmarks.
For measurements from different machines, please refer to the CP2K benchmark suite, and for documentation on CP2K's input files, please refer to the Input Reference Manual. Python scripts for generating the scaling graphs are provided in tools/benchmark_plots/.
Note: the benchmark names make common use of acronyms. For explanations, please refer to the Glossary of Acronyms and Abbreviations.
The purpose of the CP2K benchmark suite is to provide performance data that can guide users towards the best configuration (e.g. machine, number of MPI processes, number of OpenMP threads) for a particular problem, and to give a good estimate of the code's parallel performance for different types of methods.
The systems used to obtain the benchmark results are described on the systems page.
See the README.md inside each benchmark sub-directory for a description of each benchmark along with performance numbers.
Benchmarks currently available:
- Fayalite-FIST
- QS
- QS_DM_LS
- QS_HFX
- QS_diag
- QS_mp2_rpa
- QS_ot_ls
- QS_pao_ml_tio2
- QS_rubri
- QS_single_node
- QS_stmv
Some benchmarks require a preliminary step to generate an input file, e.g. a wavefunction. When that is the case, it is specified in the README.md inside the benchmark's sub-directory.
The general way to run the benchmarks with the hybrid parallel executable is, e.g. for 2 threads per rank:

```shell
export OMP_NUM_THREADS=2
parallel_launcher launcher_options path_to_cp2k.psmp -i inputfile.inp -o logfile.log
```
where:

- `parallel_launcher` is `mpirun`, `mpiexec`, or some variant such as `aprun` on Cray systems or `srun` when using Slurm.
- `launcher_options` specifies the parallel placement in terms of total number of nodes, MPI ranks/tasks, tasks per node, and OpenMP threads per task (which should be equal to the value given to `OMP_NUM_THREADS`). This is not necessary if the parallel runtime options are picked up by the launcher from the job environment.
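As a concrete illustration, such a launch line under Slurm could look like the sketch below; the node, task, and thread counts are hypothetical and should be adapted to your system and job script:

```shell
# Hypothetical placement: 2 nodes, 8 MPI ranks per node, 2 OpenMP
# threads per rank (so OMP_NUM_THREADS matches --cpus-per-task).
export OMP_NUM_THREADS=2
CMD="srun --nodes=2 --ntasks-per-node=8 --cpus-per-task=$OMP_NUM_THREADS \
path_to_cp2k.psmp -i inputfile.inp -o logfile.log"
# Echo instead of executing, since srun only works inside a job allocation:
echo "$CMD"
```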
The reported walltime for a given run can be obtained by querying the resulting `.log` file for CP2K's internal timing, as follows:

```shell
grep "CP2K " *.log
```
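For instance, the total walltime can be pulled out of that line with a little awk. The log excerpt below is fabricated to mimic the shape of CP2K's final timing table (a real table has more columns and many more rows):

```shell
# Fabricated stand-in for CP2K's final timing table; in a real log the
# "CP2K" row is the top-level entry of the timing report, and its last
# column is the total walltime in seconds.
cat > demo.log <<'EOF'
 SUBROUTINE                CALLS  SELF TIME  TOTAL TIME
 CP2K                          1      0.012     123.456
EOF
# Same query as above, plus awk to keep only the last column:
WALLTIME=$(grep "CP2K " demo.log | awk '{print $NF}')
echo "$WALLTIME"
rm demo.log
```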
Moreover, the end of the resulting `.log` file contains some performance numbers:

- `DBCSR STATISTICS`: statistics on DBCSR's computation and communication performance. The first few lines report the number of flops spent on the different small dense block sizes, and which proportion of them ran on BLAS, the Small Matrix-Matrix multiplication library (`SMM`), and the GPU (`ACC`).
- `DBCSR MESSAGE PASSING PERFORMANCE`: statistics on MPI calls in DBCSR.
- `MESSAGE PASSING PERFORMANCE`: statistics on MPI calls in CP2K.
- `T I M I N G`: timing and number of calls of CP2K functions.
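Since these report headers are stable strings, a quick way to jump to them in a long log is to search for them by name. The skeleton log below is fabricated just to show the idea (real logs have the full tables underneath each header):

```shell
# Skeleton log containing only the four report headers.
cat > run.log <<'EOF'
 DBCSR STATISTICS
 DBCSR MESSAGE PASSING PERFORMANCE
 MESSAGE PASSING PERFORMANCE
 T I M I N G
EOF
# Show each header with its line number:
grep -n -E "STATISTICS|MESSAGE PASSING PERFORMANCE|T I M I N G" run.log
# Count the matching headers (all four sections match):
COUNT=$(grep -c -E "STATISTICS|MESSAGE PASSING PERFORMANCE|T I M I N G" run.log)
rm run.log
```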
We encourage you to contribute benchmark results from your own local cluster or HPC system: simply run the inputs and add your timings to the relevant sections. Please also update the list of machines for which benchmark data is provided.