This repository has been archived by the owner on Sep 28, 2021. It is now read-only.
felipe zapata edited this page Oct 16, 2019 · 1 revision
When the USE_CUDA flag is set to true when compiling and running the XTP functionality, the library distributes some matrix multiplications across the NVIDIA GPU and the available CPUs.
If the MKL module was not activated during compilation, you may notice that performance degrades as more OpenMP threads are used. This issue is solved by pinning each OpenMP thread to a CPU. If you are using GCC, you can pin the threads with the GOMP_CPU_AFFINITY environment variable. If you are using the Intel compiler, have a look at the KMP_AFFINITY environment variable.
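As a minimal sketch of thread pinning, the variables can be exported in the shell before launching the calculation. The core range `0-3`, the thread count of 4, and the particular KMP_AFFINITY value are assumptions chosen for illustration on a 4-core machine; they are not values prescribed by XTP, so adjust them to your hardware.

```shell
# Assumption: a machine with 4 cores numbered 0-3; adjust to your hardware.
export OMP_NUM_THREADS=4

# GCC (libgomp): bind the OpenMP threads, in order, to the CPUs in this list.
export GOMP_CPU_AFFINITY="0-3"

# Intel OpenMP runtime: one possible setting that also binds threads to cores.
export KMP_AFFINITY="granularity=fine,compact"

# Launch the XTP calculation from this same shell so it inherits the variables.
```

Only the variable that matches your OpenMP runtime takes effect, so setting both is harmless when you are not sure which compiler built the library.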