OpenMP offload

Ye Luo edited this page Mar 24, 2023 · 53 revisions

Welcome to the miniqmc for OpenMP offload wiki!

Build

Check out the OMP_offload branch:

git checkout OMP_offload

See build options in miniQMC How-to Guides.

The CMake option ENABLE_OFFLOAD turns offloading on or off:

 -DENABLE_OFFLOAD=ON   # offload to accelerators like GPU
 -DENABLE_OFFLOAD=OFF  # default, CPU only

OFFLOAD_TARGET selects an offload target when the compiler supports more than one, as Clang and GCC do.
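
As a rough illustration of how the two options combine, the following shell sketch assembles a configure line and prints it without running it. The helper name offload_configure_cmd is hypothetical, and the spir64 value is borrowed from the Intel recipe further down this page; substitute whatever target your compiler expects.

```shell
#!/bin/sh
# Hypothetical helper: prints a cmake configure line for miniQMC (dry run).
#   $1 = C++ compiler, $2 = ON or OFF, $3 = optional OFFLOAD_TARGET value
offload_configure_cmd() {
  cmd="cmake -D CMAKE_CXX_COMPILER=$1 -D ENABLE_OFFLOAD=$2"
  # Add OFFLOAD_TARGET only when a target was passed in.
  if [ -n "${3:-}" ]; then
    cmd="$cmd -D OFFLOAD_TARGET=$3"
  fi
  echo "$cmd .."
}

# CPU-only build (the default setting spelled out).
offload_configure_cmd g++ OFF
# Offload build with an explicit target.
offload_configure_cmd icpx ON spir64
```

Printing the command first makes it easy to inspect the flags before configuring in a fresh build directory.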

Run

The offload feature is currently implemented in the miniqmc miniapp, which accepts the command line arguments -g, -w, -a, -m, and -n:

-g adjusts supercell size
-w number of walkers. Defaults to the number of CPU threads.
-a tiling (cache blocking) size. Defaults to the number of splines.
-m spline mesh "px py pz"
-n number of iterations

The old check_spo has been renamed to check_spo_batched. The following option is available only with check_spo_batched:

-f skip transferring data back for checking. Must be used when measuring performance.
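
A small dry-run sketch of how these flags fit together on a command line. It only prints the commands; the walker and iteration counts are arbitrary illustrative values, and miniqmc_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: prints a miniqmc command line (dry run, nothing is executed).
#   $1 = supercell size for -g, $2 = walkers for -w, $3 = iterations for -n
miniqmc_cmd() {
  echo "./bin/miniqmc -g \"$1\" -w $2 -n $3"
}

# "2 2 1" supercell, 16 walkers, 5 iterations (values chosen only for illustration).
miniqmc_cmd "2 2 1" 16 5

# For timing runs of check_spo_batched, -f skips the copy back used for checking.
echo "./bin/check_spo_batched -f"
```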

Benchmark example

OMP_NUM_THREADS=10 ./bin/miniqmc -g "2 2 1"

Build recipes

Update on Nov 17th 2019

IBM XL

Last verified on 16.1.1-5

cmake -DCMAKE_CXX_COMPILER=xlC_r -DENABLE_OFFLOAD=ON ..

With old versions of CMake (<3.11), XL is misidentified as Clang. The following workaround solves the issue:

cmake -DCMAKE_CXX_COMPILER=xlC_r -DCMAKE_CXX_COMPILER_ID='XL' -DENABLE_OFFLOAD=1 ..

LLVM Clang

Last verified on 16

# NVIDIA
cmake -D CMAKE_CXX_COMPILER=clang++ -D ENABLE_OFFLOAD=ON -D QMC_GPU_ARCHS=sm_80 ..
# AMD
cmake -D CMAKE_CXX_COMPILER=clang++ -D ENABLE_OFFLOAD=ON -D QMC_GPU_ARCHS=gfx906 ..

-D USE_OBJECT_TARGET=ON works around a static linking issue. It has not been needed since LLVM 15.
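
The version dependence of this workaround can be sketched as a small shell helper. This is only an illustration: object_target_flag is a hypothetical name, the LLVM major version is passed in by hand rather than detected, and the sm_80 architecture is taken from the NVIDIA recipe above.

```shell
#!/bin/sh
# Hypothetical helper: prints the extra CMake flag for a given LLVM major version.
# Prints nothing for LLVM 15 and newer, where the workaround is not needed.
object_target_flag() {
  if [ "$1" -lt 15 ]; then
    echo "-D USE_OBJECT_TARGET=ON"
  fi
}

# Dry run for LLVM 14: the workaround flag is appended to the configure line.
echo "cmake -D CMAKE_CXX_COMPILER=clang++ -D ENABLE_OFFLOAD=ON -D QMC_GPU_ARCHS=sm_80 $(object_target_flag 14) .."
```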

Intel OneAPI

Last verified on beta08

cmake -D CMAKE_CXX_COMPILER=icpx -D ENABLE_OFFLOAD=ON -D OFFLOAD_TARGET=spir64 ..

On some systems, forcing LIBOMPTARGET_PLUGIN=OPENCL is needed at runtime.
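
One way to apply this setting to a single run without changing the calling shell's environment is a thin wrapper, sketched below. The wrapper name run_with_opencl_plugin is hypothetical, and the sh -c command is only a stand-in for a miniapp such as ./bin/miniqmc.

```shell
#!/bin/sh
# Hypothetical wrapper: runs any command with LIBOMPTARGET_PLUGIN=OPENCL set
# for that command only, leaving the calling shell's environment untouched.
run_with_opencl_plugin() {
  env LIBOMPTARGET_PLUGIN=OPENCL "$@"
}

# Stand-in command that shows what the child process sees.
run_with_opencl_plugin sh -c 'echo "plugin=$LIBOMPTARGET_PLUGIN"'
```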

AMD AOMP

Last verified on 17.0-0

cmake -D CMAKE_CXX_COMPILER=clang++ \
      -D ENABLE_OFFLOAD=ON \
      -D QMC_GPU_ARCHS=gfx906 ..

GNU GCC

Last verified on 13 (development version)

cmake -D CMAKE_CXX_COMPILER=g++ -D ENABLE_OFFLOAD=ON ..

Cray Clang

Last verified on 14. There is no need to load an architecture module such as craype-accel-amd-gfx90a.

cmake -D CMAKE_CXX_COMPILER=crayCC \
      -D ENABLE_OFFLOAD=ON \
      -D OFFLOAD_TARGET=amdgcn-amd-amdhsa \
      -D OFFLOAD_ARCH=gfx90a \
      -D QMC_MIXED_PRECISION=ON ..

NVHPC

cmake -DCMAKE_CXX_COMPILER=nvc++ \
      -DENABLE_OFFLOAD=ON \
      -DQMC_GPU_ARCHS=sm_80 \
      -DQMC_MIXED_PRECISION=ON \
      -DLAPACK_LIBRARIES="-llapack -lblas" \
      -DCMAKE_EXE_LINKER_FLAGS=-pgf90libs ..

Pass/fail dashboard

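| Compiler             | Clang 12.0.0rc3 | AOMP 11.12-0 | XL 16.1.1-5 | OneAPI 2021.2.0 | Cray 11.0.2 | GCC 11dev 20210315 | NVHPC 21.02 |
|----------------------|-----------------|--------------|-------------|-----------------|-------------|--------------------|-------------|
| Device               | NVIDIA          | AMD          | NVIDIA      | Intel           | NVIDIA      | NVIDIA             | NVIDIA      |
| math header conflict | Pass            | Pass         | Pass        | Pass            | Pass        | Pass               | Pass        |
| complex arithmetic   | Pass            | Pass         | Pass        | Pass            | Fail        | Pass               | Pass        |
| math linker error    | Pass            | Pass         | Pass        | Pass            | Pass        | Pass               | Fail        |
| static linking       | Fail            | Pass         | Pass        | Pass            | Pass        | Pass               | Pass        |
| Async tasking        | Pass            | FC           | Pass        | FC              | FC          | FC                 | Fail        |
| multiple streams     | Pass            | Pass         | Pass        | FC              | FC          | FC                 | Pass        |
| check_spo            | Pass            | Pass         | Pass        | Pass(R)         | Pass        | Pass               | Fail        |
| check_spo_batched    | Pass            | Pass         | Pass        | Pass(R)         | Pass        | Pass               | Fail        |
| miniqmc_sync_move    | Pass            | Pass         | Pass        | Pass            | Pass        | Pass               | Pass        |
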
Pass: the intended feature is supported and runs correctly.
Fail: failure at compile, link, or run time, or incorrect results.
FC: functionally correct; runs with correct results.
(R): regression in the current release.