
Query: QUDA Feature-SYCL branch #1332

Open
Soujanyajanga opened this issue Nov 4, 2022 · 19 comments

Comments

@Soujanyajanga

In the QUDA feature/sycl branch, is the SYCL backend fully functional?
Does it work on NVIDIA as well, or is it intended only for Intel architectures?
Please share the steps to exercise the tests on Intel and NVIDIA platforms.

@jcosborn
Contributor

jcosborn commented Nov 4, 2022

It is essentially fully functional. Depending on which version of oneAPI and the hardware you run with, there may be some issues, though. It requires Intel SYCL, since it uses some Intel extensions. I've only tried it on Intel hardware, but it might run with the CUDA backend for Intel LLVM as well. Note that there are some changes to follow the SYCL 2020 spec in the upstream Intel LLVM repo which I haven't updated the code for yet. It should work with the current public oneAPI release, though. Example build and test commands (which will need updating soon) are below.

export QUDA_TARGET=SYCL
export CXX=dpcpp
o=""                                   # collect cmake options in a shell variable
o="$o -DCMAKE_INSTALL_PREFIX="         # install prefix intentionally left unset here
o="$o -DQUDA_FAST_COMPILE_REDUCE=ON"
o="$o -DQUDA_BUILD_SHAREDLIB=OFF"
cmake $o                               # run from a build directory; the QUDA source path is omitted here

make
make test

@Soujanyajanga
Author

It is essentially fully functional. Depending on which version of oneAPI and the hardware you run with, there may be some issues, though. It requires Intel SYCL, since it uses some Intel extensions. I've only tried it on Intel hardware, but it might run with the CUDA backend for Intel LLVM as well. Note that there are some changes to follow the SYCL 2020 spec in the upstream Intel LLVM repo which I haven't updated the code for yet. It should work with the current public oneAPI release, though. Example build and test commands (which will need updating soon) are below.

export QUDA_TARGET=SYCL
export CXX=dpcpp
o="$o -DCMAKE_INSTALL_PREFIX="
o="$o -DQUDA_FAST_COMPILE_REDUCE=ON"
o="$o -DQUDA_BUILD_SHAREDLIB=OFF"
cmake $o

make
make test

Was this SYCL backend tested with the Clang compiler?

@jcosborn
Contributor

I've only tested it with dpcpp/icpx.

@Soujanyajanga
Author

I've only tested it with dpcpp/icpx.

The following error is observed with the latest code:
[ 12%] Building CXX object lib/CMakeFiles/quda_cpp.dir/inv_ca_gcr.cpp.o
[ 12%] Building CXX object lib/CMakeFiles/quda_cpp.dir/dirac.cpp.o
clang-16: error: unknown argument: '-fhonor-nan-compares'
clang-16: error: unknown argument: '-fhonor-nan-compares'

@Soujanyajanga
Author

@jcosborn, with the latest Intel LLVM compiler:

The following error is observed with the latest code:
[ 12%] Building CXX object lib/CMakeFiles/quda_cpp.dir/inv_ca_gcr.cpp.o
[ 12%] Building CXX object lib/CMakeFiles/quda_cpp.dir/dirac.cpp.o
clang-16: error: unknown argument: '-fhonor-nan-compares'
clang-16: error: unknown argument: '-fhonor-nan-compares'

This error comes from the file "quda/lib/targets/sycl/target_sycl.cmake":
if("x${CMAKE_CXX_COMPILER_ID}" STREQUAL "xClang" OR
   "x${CMAKE_CXX_COMPILER_ID}" STREQUAL "xIntelLLVM")
  #target_compile_options(quda INTERFACE -fhonor-nan-compares)
  #target_compile_options(quda PRIVATE -fhonor-nan-compares)
  target_compile_options(quda PUBLIC -fhonor-nan-compares)   # <<< Clang does not support this flag
  target_compile_options(quda PUBLIC -Wno-tautological-constant-compare)

@jcosborn
Contributor

Thanks for reporting that. This is fixed now.
I have successfully tested it on Intel, but had issues on NVIDIA.

@maddyscientist
Member

@jcosborn what are the issues on NVIDIA?

@jcosborn
Contributor

I get a bunch of errors like:
ptxas error : Entry function '_ZTSZZN4quda6launchINS_9Kernel3DSINS_14dslash_functorENS_18dslash_functor_argINS_19domainWall4DFusedM5ENS_9packShmemELi2ELb0ELb1ELNS_10KernelTypeE5ENS_22DomainWall4DFusedM5ArgIsLi3ELi4EL21QudaReconstructType_s8ELNS_11Dslash5TypeE8EEEEELb0EEESB_EENSt9enable_ifIXntclsr6deviceE14use_kernel_argIT0_EEE11qudaError_tE4typeERKNS_12qudaStream_tERN4sycl3_V18nd_rangeILi3EEERKSE_ENKUlRNSM_7handlerEE_clEST_EUlNSM_7nd_itemILi3EEEE__with_offset' uses too much shared data (0x18000 bytes, 0xc000 max)

@maddyscientist
Member

maddyscientist commented Dec 14, 2022

Ok, it looks like you (or the SYCL backend) are using static shared memory as opposed to dynamic shared memory: the former has a limit of 48 KiB per thread block, while the latter has a much larger limit (96 KiB on Volta, ~164 KiB on Ampere, ~228 KiB on Hopper). Is this something one has control over with SYCL on NVIDIA, or is it out of your hands?

@jcosborn
Contributor

I wasn't setting the compute capability before; I'm trying again with sm_80. I'm not sure what else I can change yet.

@jxy
Contributor

jxy commented Dec 14, 2022

I thought this line controls the size, no?

constexpr int shared_memory_size() { return 32768; }

@maddyscientist
Member

maddyscientist commented Dec 14, 2022

@jcosborn the compute capability shouldn't matter here, as the static limit is 48 KiB for all CUDA GPUs since Fermi (2010).
The fact that the compiler throws this error indicates that static shared memory is being used as opposed to dynamic, and that is the first red flag here. With dynamic shared memory, the compiler doesn't know the shared memory per block, so it can't throw an error like this.

At least with the CUDA target, with static shared memory, it doesn't surprise me that an excess amount would be produced, as the SharedMemoryCacheHelper with a static allocation will request as much shared memory as is required for the maximum block size (1024 threads).

@jcosborn
Contributor

Yes, it seems it will only use static shared memory:
intel/llvm#3329

I'll see what I can get to compile now, and look into setting a limit for it.

@sy3394

sy3394 commented Feb 3, 2023

I also have several issues compiling this branch of QUDA, as well as some questions.

Questions:

  1. Do you assume the user compiles this software using dpcpp, in particular the one from oneAPI-2022.1.0?
    I ask because some files include sycl/ext/oneapi/experimental/builtins.hpp, which can be found in the 2022 version of the oneAPI distribution but not in version 2021.2.0.
  2. What command line options did you use when installing oneAPI? I ask because lib/targets/sycl/blas_lapack_mkl.cpp includes the file oneapi/mkl.hpp when QUDA_NATIVE_LAPACK is set to True, which is the default. I assume this is part of oneAPI, as the path contains oneapi; however, I was not able to locate this file in my oneAPI distribution.

There are some error messages when I try to compile this branch of QUDA.

  1. lib/targets/sycl/device.cpp:105:91: error: 'max_work_item_sizes' does not name a template but is followed by template arguments. max_work_item_sizes is set in include/sycl/CL/sycl/info/info_desc.hpp from oneAPI to be max_work_item_sizes = CL_DEVICE_MAX_WORK_ITEM_SIZES. In turn, CL_DEVICE_MAX_WORK_ITEM_SIZES is set in include/sycl/CL/cl.h using #define. I'm not sure why I got this error. Is this due to an incorrect installation of oneAPI or some missing command line argument for cmake when compiling QUDA?
  2. There are other errors like the one above such as
    lib/targets/sycl/device.cpp:81:16: error: no matching conversion for functional-style cast from 'int (*)(const sycl::device &)' to 'sycl::platform'
    These seem to suggest that I am using a SYCL version or implementation different from the one assumed by this branch of QUDA.
The list of similar errors:
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:81:16: error: no matching conversion for functional-style cast from 'int (*)(const sycl::device &)' to 'sycl::platform'
    auto p = sycl::platform(mySelector);
             ^~~~~~~~~~~~~~~~~~~~~~~~~
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:56:12: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'const sycl::device_selector' for 1st argument
explicit platform(const device_selector &DeviceSelector);
         ^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:58:3: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'const sycl::platform' for 1st argument
platform(const platform &rhs) = default;
^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:60:3: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'sycl::platform' for 1st argument
platform(platform &&rhs) = default;
^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:148:3: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'std::shared_ptr<detail::platform_impl>' for 1st argument
platform(std::shared_ptr<detail::platform_impl> impl) : impl(impl) {}
^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:37:3: note: candidate constructor not viable: requires 0 arguments, but 1 was provided
platform();
^
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:105:110: error: invalid operands to binary expression ('sycl::info::device' and 'int')
    printfQuda("  Max work item sizes: %s\n", str(myDevice.get_info<sycl::info::device::max_work_item_sizes<3>>()).c_str());
                                                                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
/cyclamen/home/syamamoto/src/quda_sycl/lib/../include/util_quda.h:91:29: note: expanded from macro 'printfQuda'
sprintf(getPrintBuffer(), __VA_ARGS__);              \
                          ^~~~~~~~~~~
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:105:91: error: 'max_work_item_sizes' does not name a template but is followed by template arguments
    printfQuda("  Max work item sizes: %s\n", str(myDevice.get_info<sycl::info::device::max_work_item_sizes<3>>()).c_str());
                                                                                        ^                  ~~~~
/cyclamen/home/syamamoto/src/quda_sycl/lib/../include/util_quda.h:91:29: note: expanded from macro 'printfQuda'
sprintf(getPrintBuffer(), __VA_ARGS__);              \
                          ^~~~~~~~~~~
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/info/info_desc.hpp:55:3: note: non-template declaration found by name lookup
max_work_item_sizes = CL_DEVICE_MAX_WORK_ITEM_SIZES,
^
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:146:16: error: no matching conversion for functional-style cast from 'int (*)(const sycl::device &)' to 'sycl::platform'
    auto p = sycl::platform(mySelector);
             ^~~~~~~~~~~~~~~~~~~~~~~~~
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:56:12: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'const sycl::device_selector' for 1st argument
explicit platform(const device_selector &DeviceSelector);
         ^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:58:3: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'const sycl::platform' for 1st argument
platform(const platform &rhs) = default;
^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:60:3: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'sycl::platform' for 1st argument
platform(platform &&rhs) = default;
^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:148:3: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'std::shared_ptr<detail::platform_impl>' for 1st argument
platform(std::shared_ptr<detail::platform_impl> impl) : impl(impl) {}
^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:37:3: note: candidate constructor not viable: requires 0 arguments, but 1 was provided
platform();
^
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:154:16: error: no matching conversion for functional-style cast from 'int (*)(const sycl::device &)' to 'sycl::platform'
    auto p = sycl::platform(mySelector);
             ^~~~~~~~~~~~~~~~~~~~~~~~~
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:56:12: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'const sycl::device_selector' for 1st argument
explicit platform(const device_selector &DeviceSelector);
         ^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:58:3: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'const sycl::platform' for 1st argument
platform(const platform &rhs) = default;
^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:60:3: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'sycl::platform' for 1st argument
platform(platform &&rhs) = default;
^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:148:3: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'std::shared_ptr<detail::platform_impl>' for 1st argument
platform(std::shared_ptr<detail::platform_impl> impl) : impl(impl) {}
^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:37:3: note: candidate constructor not viable: requires 0 arguments, but 1 was provided
platform();
^
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:161:17: error: no namespace named 'device' in namespace 'sycl::info'; did you mean simply 'device'?
      namespace id = sycl::info::device;
                     ^~~~~~~~~~~~~~~~~~
                     device
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:59:13: note: namespace 'device' defined here
namespace device
          ^
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:164:81: error: no member named 'name' in namespace 'quda::device'
      printfQuda("%d - name:                    %s\n", device, d.get_info<id::name>().c_str());
                                                                          ~~~~^
/cyclamen/home/syamamoto/src/quda_sycl/lib/../include/util_quda.h:91:29: note: expanded from macro 'printfQuda'
sprintf(getPrintBuffer(), __VA_ARGS__);              \
                          ^~~~~~~~~~~
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:364:75: error: invalid operands to binary expression ('sycl::info::device' and 'int')
    auto val = myDevice.get_info<sycl::info::device::max_work_item_sizes<3>>();
                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:364:56: error: 'max_work_item

@jcosborn
Contributor

jcosborn commented Feb 3, 2023

Yes, it generally requires the latest version of oneAPI (or intel-llvm). I'm currently testing with 2023.0.0. The issues you are seeing are due to differences in the older version of oneAPI.

@sy3394

sy3394 commented Feb 6, 2023

Thank you for your prompt reply. I will install the new version and try it out.

Meanwhile, I have another simple question. I am trying to compile QUDA targeting SYCL because I want to use QUDA in an environment possibly without GPUs, for testing purposes. Performance is not my main concern; I just need to run QUDA without GPUs. I assume this branch of QUDA works on CPUs. Am I correct?

@jcosborn
Contributor

jcosborn commented Feb 6, 2023

Yes, it works with the opencl:cpu backend, though performance isn't very good.

@li12242

li12242 commented Mar 16, 2024

@jcosborn I tried the following build commands, but encountered some errors when linking.

It is essentially fully functional. Depending on which version of oneAPI and the hardware you run with, there may be some issues, though. It requires Intel SYCL, since it uses some Intel extensions. I've only tried it on Intel hardware, but it might run with the CUDA backend for Intel LLVM as well. Note that there are some changes to follow the SYCL 2020 spec in the upstream Intel LLVM repo which I haven't updated the code for yet. It should work with the current public oneAPI release, though. Example build and test commands (which will need updating soon) are below.

export QUDA_TARGET=SYCL
export CXX=dpcpp
o="$o -DCMAKE_INSTALL_PREFIX="
o="$o -DQUDA_FAST_COMPILE_REDUCE=ON"
o="$o -DQUDA_BUILD_SHAREDLIB=OFF"
cmake $o

make
make test

The following errors occur at linking. The compiler I used is icpx-2023.2.4, together with oneMKL 2023.0.0.

[ 82%] Linking CXX executable gauge_alg_test
icpx: warning: use of 'dpcpp' is deprecated and will be removed in a future release. Use 'icpx -fsycl' [-Wdeprecated]
/usr/bin/ld: ../lib/libquda.a(timer.cpp.o): unable to initialize decompress status for section .debug_loc
/usr/bin/ld: ../lib/libquda.a(timer.cpp.o): unable to initialize decompress status for section .debug_loc
../lib/libquda.a: error adding symbols: File format not recognized
icpx: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [tests/CMakeFiles/gauge_alg_test.dir/build.make:99: tests/gauge_alg_test] Error 1
make[1]: *** [CMakeFiles/Makefile2:1088: tests/CMakeFiles/gauge_alg_test.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

@li12242

li12242 commented Mar 18, 2024


Sorry for the mistakes. I updated the binutils tools and the errors disappeared.
