OpenMP offload

Welcome to the miniqmc for OpenMP offload wiki!

Build

Check out the OMP_offload branch:

git checkout OMP_offload

See build options in miniQMC How-to Guides.

A new CMake option, ENABLE_OFFLOAD, turns offloading on or off.

 -DENABLE_OFFLOAD=ON   # offload to accelerators like GPU
 -DENABLE_OFFLOAD=OFF  # default, CPU only

OFFLOAD_TARGET selects an offload target when the compiler supports more than one, for example Clang and GNU.
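As a minimal sketch of how the two options combine, the helper below prints the CMake flags for a given setting. The function name `offload_cmake_args` is hypothetical and not part of miniQMC; `spir64` is the target value used in the Intel OneAPI recipe further down.

```shell
# Hypothetical helper (not part of miniQMC): print the CMake flags for a
# given offload setting and an optional offload target.
offload_cmake_args() {
  enable="${1:-OFF}"   # ON or OFF; OFF is the documented default
  target="${2:-}"      # optional, e.g. spir64
  printf '%s' "-DENABLE_OFFLOAD=${enable}"
  # OFFLOAD_TARGET only makes sense when offloading is enabled.
  if [ "$enable" = "ON" ] && [ -n "$target" ]; then
    printf ' %s' "-DOFFLOAD_TARGET=${target}"
  fi
  printf '\n'
}

# Print the configure command instead of running it.
echo "cmake $(offload_cmake_args ON spir64) .."
```

Printing the command rather than executing it makes it easy to inspect the flags before configuring a real build directory.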

Run

The offload feature is currently implemented in the miniqmc miniapp, which accepts the command line arguments -g, -w, -a, -m, and -n:

-g adjusts supercell size
-w number of walkers. Equal to the number of CPU threads if not specified.
-a tiling (cache blocking) size. Equal to the number of splines if not specified.
-m spline mesh "px py pz"
-n number of iterations
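The options can be combined on one command line. The sketch below assembles such a command as a string; the flag names are those listed above, while the values for -w and -n are illustrative assumptions, not recommended settings.

```shell
# Illustrative values only; the flag names come from the list above.
SUPERCELL="2 2 1"   # -g: supercell size
WALKERS=4           # -w: number of walkers (assumed value)
ITERATIONS=10       # -n: number of iterations (assumed value)

# Build the command as a string so it can be inspected before running.
CMD="./bin/miniqmc -g \"$SUPERCELL\" -w $WALKERS -n $ITERATIONS"
echo "$CMD"

# To actually run it from the build directory:
#   eval "$CMD"
```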

The old check_spo has been renamed to check_spo_batched. The following option is available only with check_spo_batched:

-f avoid transferring data back for checking. Must be used when measuring performance.
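A small sketch of switching between a correctness run and a timing run follows. The helper name `spo_batched_cmd` is hypothetical, and the `./bin/` location of check_spo_batched is an assumption based on where miniqmc is found in the benchmark example.

```shell
# Hypothetical helper: print a check_spo_batched command line.
# Mode "perf" adds -f so results are not transferred back for checking;
# mode "check" (the default) keeps the transfer.
spo_batched_cmd() {
  mode="${1:-check}"
  cmd="./bin/check_spo_batched"   # binary location is an assumption
  if [ "$mode" = "perf" ]; then
    cmd="$cmd -f"
  fi
  printf '%s\n' "$cmd"
}

spo_batched_cmd check
spo_batched_cmd perf
```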

Benchmark example

OMP_NUM_THREADS=10 ./bin/miniqmc -g "2 2 1"

Build recipes

Update on Nov 17th 2019

IBM XL

Last verified on 16.1.1-5

cmake -DCMAKE_CXX_COMPILER=xlC_r -DENABLE_OFFLOAD=ON ..

With old versions of CMake (<3.11), XL is identified as Clang. The following workaround solves the issue:

cmake -DCMAKE_CXX_COMPILER=xlC_r -DCMAKE_CXX_COMPILER_ID='XL' -DENABLE_OFFLOAD=1 ..
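Whether the workaround is needed can be decided from the CMake version. The sketch below is one way to do that; the helper names `version_lt` and `xl_extra_flags` are hypothetical, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
# Hypothetical helper: succeed if version $1 is strictly older than version $2.
# Relies on `sort -V`, which is assumed to be available.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Print the extra flag needed for XL when the given CMake version is
# older than 3.11; print nothing otherwise.
xl_extra_flags() {
  if version_lt "$1" "3.11"; then
    printf '%s\n' "-DCMAKE_CXX_COMPILER_ID=XL"
  fi
}

# Example with a fixed version string. On a real system the version could
# be taken from: cmake --version | head -n 1 | awk '{print $3}'
xl_extra_flags 3.10.2
```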

LLVM Clang

Last verified on 16

# NVIDIA
cmake -D CMAKE_CXX_COMPILER=clang++ -D ENABLE_OFFLOAD=ON -D QMC_GPU_ARCHS=sm_80 ..
# AMD
cmake -D CMAKE_CXX_COMPILER=clang++ -D ENABLE_OFFLOAD=ON -D QMC_GPU_ARCHS=gfx906 ..

~~-D USE_OBJECT_TARGET=ON is used to work around a static linking issue.~~ Not needed since LLVM 15.

Intel OneAPI

Last verified on beta08

cmake -D CMAKE_CXX_COMPILER=icpx -D ENABLE_OFFLOAD=ON -D OFFLOAD_TARGET=spir64 ..

On some systems, forcing LIBOMPTARGET_PLUGIN=OPENCL is needed at runtime.
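The variable can be set for a single command or exported for the whole shell session. The sketch below shows both forms; the miniqmc invocation in the comment reuses the benchmark example from the Run section and is not executed here.

```shell
# Set the variable for one command only:
#   LIBOMPTARGET_PLUGIN=OPENCL ./bin/miniqmc -g "2 2 1"

# Or export it so every later command in this shell inherits it.
export LIBOMPTARGET_PLUGIN=OPENCL

# Confirm that a child process sees the setting.
sh -c 'echo "LIBOMPTARGET_PLUGIN=$LIBOMPTARGET_PLUGIN"'
```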

AMD AOMP

Last verified on 17.0-0

cmake -D CMAKE_CXX_COMPILER=clang++ \
      -D ENABLE_OFFLOAD=ON \
      -D QMC_GPU_ARCHS=gfx906 ..

GNU GCC

Last verified on 13 develop

cmake -D CMAKE_CXX_COMPILER=g++ -D ENABLE_OFFLOAD=ON ..

Cray Clang

Last verified on 14. There is no need to load any architectural module like craype-accel-amd-gfx90a.

cmake -D CMAKE_CXX_COMPILER=crayCC \
      -D ENABLE_OFFLOAD=ON \
      -D OFFLOAD_TARGET=amdgcn-amd-amdhsa \
      -D OFFLOAD_ARCH=gfx90a \
      -D QMC_MIXED_PRECISION=ON ..

NVHPC

cmake -DCMAKE_CXX_COMPILER=nvc++ -DENABLE_OFFLOAD=ON -DQMC_GPU_ARCHS=sm_80 -DQMC_MIXED_PRECISION=ON -DLAPACK_LIBRARIES="-llapack -lblas" -DCMAKE_EXE_LINKER_FLAGS=-pgf90libs ..

Pass/fail dashboard

Compiler	Clang 12.0.0rc3	AOMP 11.12-0	XL 16.1.1-5	OneAPI 2021.2.0	Cray 11.0.2	GCC 11dev 20210315	NVHPC 21.02
device	NVIDIA	AMD	NVIDIA	Intel	NVIDIA	NVIDIA	NVIDIA
math header conflict	Pass	Pass	Pass	Pass	Pass	Pass	Pass
complex arithmetic	Pass	Pass	Pass	Pass	Fail	Pass	Pass
math linker error	Pass	Pass	Pass	Pass	Pass	Pass	Fail
static linking	Fail	Pass	Pass	Pass	Pass	Pass	Pass
Async tasking	Pass	FC	Pass	FC	FC	FC	Fail
multiple streams	Pass	Pass	Pass	FC	FC	FC	Pass
check_spo	Pass	Pass	Pass	Pass(R)	Pass	Pass	Fail
check_spo_batched	Pass	Pass	Pass	Pass(R)	Pass	Pass	Fail
miniqmc_sync_move	Pass	Pass	Pass	Pass	Pass	Pass	Pass

Pass: the intended feature is supported and runs correctly.
Fail: a failure at compile, link, or run time, or incorrect results.
FC: functionally correct; runs with correct results.
(R): regression in the current release.
