
Query: QUDA Feature-SYCL branch #1332

Open
Soujanyajanga opened this issue Nov 4, 2022 · 19 comments

Comments

@Soujanyajanga

In the QUDA feature/sycl branch, is the SYCL backend fully functional?
Does it work on NVIDIA as well, or is it intended only for Intel architectures?
Please share the steps to exercise the tests on Intel and NVIDIA platforms.

@jcosborn
Contributor

jcosborn commented Nov 4, 2022

It is essentially fully functional. Depending on which version of oneAPI and the hardware you run with, there may be some issues, though. It requires Intel SYCL, since it uses some Intel extensions. I've only tried it on Intel hardware, but it might run with the CUDA backend for Intel LLVM as well. Note that there are some changes to follow the SYCL 2020 spec in the upstream Intel LLVM repo which I haven't updated the code for yet. It should work with the current public oneAPI release, though. Example build and test commands (which will need updating soon) are below.

export QUDA_TARGET=SYCL
export CXX=dpcpp
o=""                                   # collect cmake options in a shell variable
o="$o -DCMAKE_INSTALL_PREFIX="         # install prefix intentionally left unset here
o="$o -DQUDA_FAST_COMPILE_REDUCE=ON"
o="$o -DQUDA_BUILD_SHAREDLIB=OFF"
cmake $o                               # run from a build directory; the QUDA source path is omitted here

make
make test

@Soujanyajanga
Author

It is essentially fully functional. Depending on which version of oneAPI and the hardware you run with, there may be some issues, though. It requires Intel SYCL, since it uses some Intel extensions. I've only tried it on Intel hardware, but it might run with the CUDA backend for Intel LLVM as well. Note that there are some changes to follow the SYCL 2020 spec in the upstream Intel LLVM repo which I haven't updated the code for yet. It should work with the current public oneAPI release, though. Example build and test commands (which will need updating soon) are below.

export QUDA_TARGET=SYCL
export CXX=dpcpp
o="$o -DCMAKE_INSTALL_PREFIX="
o="$o -DQUDA_FAST_COMPILE_REDUCE=ON"
o="$o -DQUDA_BUILD_SHAREDLIB=OFF"
cmake $o

make
make test

Was this SYCL backend tested with the Clang compiler?

@jcosborn
Contributor

I've only tested it with dpcpp/icpx.

@Soujanyajanga
Author

I've only tested it with dpcpp/icpx.

The following error is observed with the latest code:
[ 12%] Building CXX object lib/CMakeFiles/quda_cpp.dir/inv_ca_gcr.cpp.o
[ 12%] Building CXX object lib/CMakeFiles/quda_cpp.dir/dirac.cpp.o
clang-16: error: unknown argument: '-fhonor-nan-compares'
clang-16: error: unknown argument: '-fhonor-nan-compares'

@Soujanyajanga
Author

@jcosborn, with the latest Intel LLVM compiler:

The following error is observed with the latest code:
[ 12%] Building CXX object lib/CMakeFiles/quda_cpp.dir/inv_ca_gcr.cpp.o
[ 12%] Building CXX object lib/CMakeFiles/quda_cpp.dir/dirac.cpp.o
clang-16: error: unknown argument: '-fhonor-nan-compares'
clang-16: error: unknown argument: '-fhonor-nan-compares'

This error comes from the file "quda/lib/targets/sycl/target_sycl.cmake":
if("x${CMAKE_CXX_COMPILER_ID}" STREQUAL "xClang" OR
   "x${CMAKE_CXX_COMPILER_ID}" STREQUAL "xIntelLLVM")
  #target_compile_options(quda INTERFACE -fhonor-nan-compares)
  #target_compile_options(quda PRIVATE -fhonor-nan-compares)
  target_compile_options(quda PUBLIC -fhonor-nan-compares)   # <<< Clang does not support this flag
  target_compile_options(quda PUBLIC -Wno-tautological-constant-compare)

@jcosborn
Contributor

Thanks for reporting that. This is fixed now.
I have successfully tested it on Intel, but had issues on NVIDIA.

@maddyscientist
Member

@jcosborn what are the issues on NVIDIA?

@jcosborn
Contributor

I get a bunch of errors like:
ptxas error : Entry function '_ZTSZZN4quda6launchINS_9Kernel3DSINS_14dslash_functorENS_18dslash_functor_argINS_19domainWall4DFusedM5ENS_9packShmemELi2ELb0ELb1ELNS_10KernelTypeE5ENS_22DomainWall4DFusedM5ArgIsLi3ELi4EL21QudaReconstructType_s8ELNS_11Dslash5TypeE8EEEEELb0EEESB_EENSt9enable_ifIXntclsr6deviceE14use_kernel_argIT0_EEE11qudaError_tE4typeERKNS_12qudaStream_tERN4sycl3_V18nd_rangeILi3EEERKSE_ENKUlRNSM_7handlerEE_clEST_EUlNSM_7nd_itemILi3EEEE__with_offset' uses too much shared data (0x18000 bytes, 0xc000 max)

@maddyscientist
Member

maddyscientist commented Dec 14, 2022

Ok, it looks like you (or the SYCL backend) are using static shared memory as opposed to dynamic shared memory: the former has a limit of 48 KiB per thread block, while the latter has a much larger limit (96 KiB on Volta, ~164 KiB on Ampere, ~228 KiB on Hopper). Is this something one has control over with SYCL on NVIDIA, or is it out of your hands?

@jcosborn
Contributor

I wasn't setting the compute capability before; I'm trying again with sm_80. I'm not sure what else I can change yet.

@jxy
Contributor

jxy commented Dec 14, 2022

I thought this line controls the size, no?

constexpr int shared_memory_size() { return 32768; }

@maddyscientist
Member

maddyscientist commented Dec 14, 2022

@jcosborn the compute capability shouldn't matter here, as the static limit is 48 KiB for all CUDA GPUs since Fermi (2010).
The fact that the compiler throws this error indicates that static shared memory is being used as opposed to dynamic, and that is the first red flag here. With dynamic shared memory, the compiler doesn't know the shared memory per block, so it can't throw an error like this.

At least with the CUDA target, with static shared memory, it doesn't surprise me that an excess amount would be produced, as the SharedMemoryCacheHelper with a static allocation will request as much shared memory as is required for the maximum block size (1024 threads).

@jcosborn
Contributor

Yes, it seems it will only use static shared memory:
intel/llvm#3329

I'll see what I can get to compile now, and look into setting a limit for it.

@sy3394

sy3394 commented Feb 3, 2023

I also have several issues compiling this branch of QUDA, as well as some questions.

Questions:

  1. Do you assume the user compiles this software using dpcpp, in particular the one from oneAPI-2022.1.0?
    I ask because some files include sycl/ext/oneapi/experimental/builtins.hpp, which can be found in the 2022 version of the oneAPI distribution but not in version 2021.2.0.
  2. What command line options did you use when installing oneAPI? I ask because lib/targets/sycl/blas_lapack_mkl.cpp includes the file oneapi/mkl.hpp when QUDA_NATIVE_LAPACK is set to True, which is the default. I assume this is part of oneAPI, as the path contains oneapi; however, I was not able to locate this file in my oneAPI distribution.

There are some error messages when I try to compile this branch of QUDA.

  1. lib/targets/sycl/device.cpp:105:91: error: 'max_work_item_sizes' does not name a template but is followed by template arguments. max_work_item_sizes is set in include/sycl/CL/sycl/info/info_desc.hpp from oneAPI to be max_work_item_sizes = CL_DEVICE_MAX_WORK_ITEM_SIZES. In turn, CL_DEVICE_MAX_WORK_ITEM_SIZES is set in include/sycl/CL/cl.h using #define. I'm not sure why I got this error. Is this due to an incorrect installation of oneAPI or some missing command line argument for cmake when compiling QUDA?
  2. There are other errors like the one above such as
    lib/targets/sycl/device.cpp:81:16: error: no matching conversion for functional-style cast from 'int (*)(const sycl::device &)' to 'sycl::platform'
    These seem to suggest that I am using a SYCL version or implementation different from the one assumed by this branch of QUDA.
The list of similar errors:
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:81:16: error: no matching conversion for functional-style cast from 'int (*)(const sycl::device &)' to 'sycl::platform'
    auto p = sycl::platform(mySelector);
             ^~~~~~~~~~~~~~~~~~~~~~~~~
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:56:12: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'const sycl::device_selector' for 1st argument
explicit platform(const device_selector &DeviceSelector);
         ^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:58:3: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'const sycl::platform' for 1st argument
platform(const platform &rhs) = default;
^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:60:3: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'sycl::platform' for 1st argument
platform(platform &&rhs) = default;
^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:148:3: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'std::shared_ptr<detail::platform_impl>' for 1st argument
platform(std::shared_ptr<detail::platform_impl> impl) : impl(impl) {}
^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:37:3: note: candidate constructor not viable: requires 0 arguments, but 1 was provided
platform();
^
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:105:110: error: invalid operands to binary expression ('sycl::info::device' and 'int')
    printfQuda("  Max work item sizes: %s\n", str(myDevice.get_info<sycl::info::device::max_work_item_sizes<3>>()).c_str());
                                                                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
/cyclamen/home/syamamoto/src/quda_sycl/lib/../include/util_quda.h:91:29: note: expanded from macro 'printfQuda'
sprintf(getPrintBuffer(), __VA_ARGS__);              \
                          ^~~~~~~~~~~
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:105:91: error: 'max_work_item_sizes' does not name a template but is followed by template arguments
    printfQuda("  Max work item sizes: %s\n", str(myDevice.get_info<sycl::info::device::max_work_item_sizes<3>>()).c_str());
                                                                                        ^                  ~~~~
/cyclamen/home/syamamoto/src/quda_sycl/lib/../include/util_quda.h:91:29: note: expanded from macro 'printfQuda'
sprintf(getPrintBuffer(), __VA_ARGS__);              \
                          ^~~~~~~~~~~
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/info/info_desc.hpp:55:3: note: non-template declaration found by name lookup
max_work_item_sizes = CL_DEVICE_MAX_WORK_ITEM_SIZES,
^
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:146:16: error: no matching conversion for functional-style cast from 'int (*)(const sycl::device &)' to 'sycl::platform'
    auto p = sycl::platform(mySelector);
             ^~~~~~~~~~~~~~~~~~~~~~~~~
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:56:12: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'const sycl::device_selector' for 1st argument
explicit platform(const device_selector &DeviceSelector);
         ^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:58:3: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'const sycl::platform' for 1st argument
platform(const platform &rhs) = default;
^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:60:3: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'sycl::platform' for 1st argument
platform(platform &&rhs) = default;
^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:148:3: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'std::shared_ptr<detail::platform_impl>' for 1st argument
platform(std::shared_ptr<detail::platform_impl> impl) : impl(impl) {}
^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:37:3: note: candidate constructor not viable: requires 0 arguments, but 1 was provided
platform();
^
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:154:16: error: no matching conversion for functional-style cast from 'int (*)(const sycl::device &)' to 'sycl::platform'
    auto p = sycl::platform(mySelector);
             ^~~~~~~~~~~~~~~~~~~~~~~~~
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:56:12: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'const sycl::device_selector' for 1st argument
explicit platform(const device_selector &DeviceSelector);
         ^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:58:3: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'const sycl::platform' for 1st argument
platform(const platform &rhs) = default;
^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:60:3: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'sycl::platform' for 1st argument
platform(platform &&rhs) = default;
^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:148:3: note: candidate constructor not viable: no known conversion from 'int (*)(const sycl::device &)' to 'std::shared_ptr<detail::platform_impl>' for 1st argument
platform(std::shared_ptr<detail::platform_impl> impl) : impl(impl) {}
^
/onyx/buildsets/eb_cyclamen/software/intel-compilers/2022.1.0/compiler/2022.1.0/linux/include/sycl/CL/sycl/platform.hpp:37:3: note: candidate constructor not viable: requires 0 arguments, but 1 was provided
platform();
^
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:161:17: error: no namespace named 'device' in namespace 'sycl::info'; did you mean simply 'device'?
      namespace id = sycl::info::device;
                     ^~~~~~~~~~~~~~~~~~
                     device
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:59:13: note: namespace 'device' defined here
namespace device
          ^
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:164:81: error: no member named 'name' in namespace 'quda::device'
      printfQuda("%d - name:                    %s\n", device, d.get_info<id::name>().c_str());
                                                                          ~~~~^
/cyclamen/home/syamamoto/src/quda_sycl/lib/../include/util_quda.h:91:29: note: expanded from macro 'printfQuda'
sprintf(getPrintBuffer(), __VA_ARGS__);              \
                          ^~~~~~~~~~~
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:364:75: error: invalid operands to binary expression ('sycl::info::device' and 'int')
    auto val = myDevice.get_info<sycl::info::device::max_work_item_sizes<3>>();
                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
/cyclamen/home/syamamoto/src/quda_sycl/lib/targets/sycl/device.cpp:364:56: error: 'max_work_item

@jcosborn
Contributor

jcosborn commented Feb 3, 2023

Yes, it generally requires the latest version of oneAPI (or intel-llvm). I'm currently testing with 2023.0.0. The issues you are seeing are due to differences in the older version of oneAPI.

@sy3394

sy3394 commented Feb 6, 2023

Thank you for your prompt reply. I will install the new version and try it out.

Meanwhile, I have another simple question. I am trying to compile QUDA targeting SYCL because I want to use QUDA in an environment possibly without GPUs, for testing purposes. Performance is not my main concern; I just need to run QUDA without GPUs. I assume this branch of QUDA works on CPUs. Am I correct?

@jcosborn
Contributor

jcosborn commented Feb 6, 2023

Yes, it works with the opencl:cpu backend, though performance isn't very good.

@li12242

li12242 commented Mar 16, 2024

@jcosborn I tried the following build commands, but encountered some errors when linking.

It is essentially fully functional. Depending on which version of oneAPI and the hardware you run with, there may be some issues, though. It requires Intel SYCL, since it uses some Intel extensions. I've only tried it on Intel hardware, but it might run with the CUDA backend for Intel LLVM as well. Note that there are some changes to follow the SYCL 2020 spec in the upstream Intel LLVM repo which I haven't updated the code for yet. It should work with the current public oneAPI release, though. Example build and test commands (which will need updating soon) are below.

export QUDA_TARGET=SYCL
export CXX=dpcpp
o="$o -DCMAKE_INSTALL_PREFIX="
o="$o -DQUDA_FAST_COMPILE_REDUCE=ON"
o="$o -DQUDA_BUILD_SHAREDLIB=OFF"
cmake $o

make
make test

The following errors occur at linking. The compiler I used is icpx-2023.2.4, together with oneMKL 2023.0.0.

[ 82%] Linking CXX executable gauge_alg_test
icpx: warning: use of 'dpcpp' is deprecated and will be removed in a future release. Use 'icpx -fsycl' [-Wdeprecated]
/usr/bin/ld: ../lib/libquda.a(timer.cpp.o): unable to initialize decompress status for section .debug_loc
/usr/bin/ld: ../lib/libquda.a(timer.cpp.o): unable to initialize decompress status for section .debug_loc
../lib/libquda.a: error adding symbols: File format not recognized
icpx: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [tests/CMakeFiles/gauge_alg_test.dir/build.make:99: tests/gauge_alg_test] Error 1
make[1]: *** [CMakeFiles/Makefile2:1088: tests/CMakeFiles/gauge_alg_test.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

@li12242

li12242 commented Mar 18, 2024


Sorry for the mistakes. I updated the binutils tools and the errors disappeared.
