- There are 3 offloading options for OpenMP: NVIDIA, AMD and Intel.
- If a user provides a value for `cuda_arch`, the execution will be automatically offloaded to NVIDIA.
- If a user provides a value for `amdgpu_target`, the operation will be offloaded to AMD.
- In the absence of `cuda_arch` and `amdgpu_target`, the execution will be offloaded to Intel.
| Flag | Definition |
|---|---|
| `cuda_arch` | List of supported compute capabilities are provided here. Useful link for matching CUDA gencodes with NVIDIA architectures. |
| `amdgpu_target` | List of supported architectures are provided here |
# Example 1: for Intel offload
$ spack install babelstream%oneapi +omp
# Example 2: for NVIDIA Volta GPU (sm_70)
$ spack install babelstream +omp cuda_arch=70
# Example 3: for AMD GPU gfx701
$ spack install babelstream +omp amdgpu_target=gfx701
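To double-check which offload path a spec resolves to before installing, Spack's standard `spack spec` command can be used, e.g. for the NVIDIA case above:
# Inspect the concretized spec before installing
$ spack spec babelstream +omp cuda_arch=70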
- There is no need to specify `amdgpu_target` or `cuda_arch` here, since the AMD and CUDA platforms are selected through the `backend` flag instead.
| Flag | Definition |
|---|---|
| `backend` | 4 different backend options: `cuda`, `amd`, `intel`, `pocl` |
# Example 1: CUDA backend
$ spack install babelstream%gcc +ocl backend=cuda
# Example 2: AMD backend
$ spack install babelstream%gcc +ocl backend=amd
# Example 3: Intel backend
$ spack install babelstream%gcc +ocl backend=intel
# Example 4: POCL backend
$ spack install babelstream%gcc +ocl backend=pocl
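The accepted values for `backend` (and for the other variants described in this document) can also be queried from Spack itself:
# List babelstream variants and their allowed values
$ spack info babelstream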
- Minimum GCC version requirement: 10.1.0
- NVHPC offload will be added in a future release
# Example 1: data
$ spack install babelstream +stddata
# Example 2: ranges
$ spack install babelstream +stdranges
# Example 3: indices
$ spack install babelstream +stdindices
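Because of the GCC 10.1.0 minimum noted above, it can help to constrain the compiler in the spec; this is a sketch using Spack's version-range syntax (the exact GCC version depends on what is available on your system):
# Example: requesting any GCC satisfying the 10.1.0 minimum (illustrative)
$ spack install babelstream%gcc@10.1.0: +stddata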
- `amdgpu_target` and `flags` are optional here.
| Flag | Definition |
|---|---|
| `amdgpu_target` | List of supported architectures are provided here |
| `flags` | Extra flags to pass |
# Example 1: ROCM default
$ spack install babelstream +rocm
# Example 2: ROCM with GPU target
$ spack install babelstream +rocm amdgpu_target=<gfx701>
# Example 3: ROCM with extra flags option
$ spack install babelstream +rocm flags=<xxx>
# Example 4: ROCM with GPU target and extra flags
$ spack install babelstream +rocm amdgpu_target=<gfx701> flags=<xxx>
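As a concrete instance of Example 4 (gfx90a and -O3 are illustrative values only, not recommendations):
# Example (illustrative): ROCM with a specific GPU target and an extra optimization flag
$ spack install babelstream +rocm amdgpu_target=gfx90a flags="-O3"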
- The `cuda_arch` value is mandatory here.
- If a user provides a value for `mem`, the device memory mode will be chosen accordingly.
- If a user provides a value for `flags`, additional CUDA flags will be passed to NVCC.
- In the absence of `mem` and `flags`, the DEFAULT device memory mode is chosen and no additional flags are passed.
| Flag | Definition |
|---|---|
| `cuda_arch` | List of supported compute capabilities are provided here. Useful link for matching CUDA gencodes with NVIDIA architectures. |
| `mem` | Device memory mode: DEFAULT (allocate host and device memory pointers), MANAGED (use CUDA Managed Memory), PAGEFAULT (shared memory, only host pointers allocated) |
| `flags` | Extra flags to pass |
# Example 1: CUDA no mem and flags specified
$ spack install babelstream +cuda cuda_arch=<70>
# Example 2: CUDA with managed memory mode
$ spack install babelstream +cuda cuda_arch=<70> mem=<managed>
# Example 3: CUDA with mem and flags specified
$ spack install babelstream +cuda cuda_arch=<70> mem=<managed> flags=<CUDA_EXTRA_FLAGS>
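For completeness, a sketch of the remaining memory mode from the table above (the value casing follows the managed example):
# Example 4: CUDA with pagefault (shared) memory mode
$ spack install babelstream +cuda cuda_arch=<70> mem=<pagefault>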
- The Kokkos implementation requires the Kokkos source folder to be provided, because it builds Kokkos from scratch.
| Flag | Definition |
|---|---|
| `dir` | Download a Kokkos release from the GitHub repository ( https://github.com/kokkos/kokkos ), extract the archive to a directory of your choice, and point the `dir` flag at that directory (a download sketch follows this table) |
| `backend` | 2 different backend options: `cuda`, `omp` |
| `cuda_arch` | List of supported compute capabilities are provided here. Useful link for matching CUDA gencodes with NVIDIA architectures. |
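A minimal sketch of fetching and unpacking a Kokkos release; the archive URL follows GitHub's standard release-tag pattern, and x.x.xx stands in for the version you want:
# Sketch: download and extract a Kokkos release for use with the dir flag
$ wget https://github.com/kokkos/kokkos/archive/refs/tags/x.x.xx.tar.gz
$ tar -xzf x.x.xx.tar.gz -C /home/user/Downloads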
# Example 1: No Backend option specified
$ spack install babelstream +kokkos dir=</home/user/Downloads/kokkos-x.x.xx>
# Example 2: CUDA backend
$ spack install babelstream +kokkos backend=cuda cuda_arch=70 dir=</home/user/Downloads/kokkos-x.x.xx>
# Example 3: OMP backend
$ spack install babelstream +kokkos backend=omp dir=</home/user/Downloads/kokkos-x.x.xx>
- Instructions for installing the Intel compilers are provided here
| Flag | Definition |
|---|---|
| `implementation` | 3 different implementation options: `OneAPI-ICPX`, `OneAPI-DPCPP`, `Compute-CPP` |
# Example 1: No implementation option specified (build for OneAPI-ICPX)
$ spack install babelstream%oneapi +sycl2020
# Example 2: OneAPI-DPCPP implementation
$ spack install babelstream +sycl2020 implementation=ONEAPI-DPCPP
| Flag | Definition |
|---|---|
| `implementation` | 2 different implementation options: `OneAPI-DPCPP`, `Compute-CPP` |
# Example 1: OneAPI-DPCPP implementation
$ spack install babelstream +sycl implementation=ONEAPI-DPCPP
- Target device selection is automatic, with 2 options:
  - gpu : Globally set the target device to an NVIDIA GPU automatically if `cuda_arch` is specified
  - multicore : Globally set the target device to the host CPU automatically if `cpu_arch` is specified
| Flag | Definition |
|---|---|
| `cuda_arch` | List of supported compute capabilities are provided here. Useful link for matching CUDA gencodes with NVIDIA architectures. |
| `cpu_arch` | Sets the -tp (target processor) flag. Possible values: px (generic x86 processor), bulldozer (AMD Bulldozer), piledriver (AMD Piledriver), zen (AMD Zen architecture: Epyc, Ryzen), zen2 (AMD Zen 2 architecture: Ryzen 2), sandybridge (Intel Sandy Bridge), haswell (Intel Haswell), knl (Intel Knights Landing), skylake (Intel Skylake Xeon), host (link native version of HPC SDK CPU math library), native (alias for -tp host) |
# Example 1: For GPU Run
$ spack install babelstream +acc cuda_arch=<70>
# Example 2: For Multicore CPU Run
$ spack install babelstream +acc cpu_arch=<bulldozer>
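When the exact host microarchitecture is not known in advance, the native alias from the table above is a reasonable choice:
# Example 3: Multicore CPU run targeting the build host
$ spack install babelstream +acc cpu_arch=<native>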
- The RAJA implementation requires the RAJA source folder to be provided, because it builds RAJA from scratch.
| Flag | Definition |
|---|---|
| `dir` | Download a RAJA release from the GitHub repository, extract the archive to a directory of your choice, and point the `dir` flag at that directory |
| `backend` | 2 different backend options: `cuda`, `omp` |
| `offload` | Choose the offloading platform: `cpu` or `nvidia` |
# Example 1: For CPU offload with backend OMP
$ spack install babelstream +raja offload=cpu backend=omp dir=/home/dir/raja
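The `offload` flag also accepts nvidia; assuming it pairs with the cuda backend in the same way as the CPU example pairs with omp, the spec would look like this sketch:
# Example 2 (sketch): For NVIDIA offload with backend CUDA
$ spack install babelstream +raja offload=nvidia backend=cuda dir=/home/dir/raja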
# Example:
$ spack install babelstream +tbb
| Flag | Definition |
|---|---|
| `implementation` | Choose one of the implementations for Thrust: `cuda` or `rocm` |
| `backend` | CUDA's Thrust implementation supports the following backends: `cuda`, `omp`, `tbb` |
| `cuda_arch` | List of supported compute capabilities are provided here. Useful link for matching CUDA gencodes with NVIDIA architectures. |
| `flags` | Additional CUDA flags passed to nvcc; appended after CUDA_ARCH |
# Example 1: CUDA implementation
$ spack install babelstream +thrust implementation=cuda backend=cuda cuda_arch=<70> flags=<option>
$ spack install babelstream +thrust implementation=cuda backend=omp cuda_arch=<70> flags=<option>
$ spack install babelstream +thrust implementation=cuda backend=tbb cuda_arch=<70> flags=<option>
# Example 2: ROCM implementation
$ spack install babelstream +thrust implementation=rocm backend=<option>