Query: QUDA Feature-SYCL branch #1332
It is essentially fully functional, though depending on which version of oneAPI and which hardware you run with, there may be some issues. It requires Intel SYCL since it uses some Intel extensions. I've only tried it on Intel hardware, but it might run with the CUDA backend for Intel LLVM as well. Note that there are some changes to follow the SYCL 2020 spec in the upstream Intel LLVM repo which I haven't updated the code for yet; it should work with the current public oneAPI release, though. Example build and test commands (which will need updating soon) are below.
export QUDA_TARGET=SYCL
make
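A fuller build sketch along these lines might look as follows. The repository URL and branch name come from this thread; the cmake option names and paths are assumptions based on typical QUDA builds, not commands confirmed here, so check the branch's own CMake files for the authoritative names.

```shell
# Hypothetical build sketch for the feature/sycl branch with oneAPI's icpx.
git clone -b feature/sycl https://github.com/lattice/quda.git
cd quda && mkdir build && cd build

# QUDA_TARGET selects the backend, as in the commands above.
export QUDA_TARGET=SYCL

# The cmake flags below are illustrative assumptions, not confirmed option names.
cmake .. -DQUDA_TARGET_TYPE=SYCL -DCMAKE_CXX_COMPILER=icpx -DCMAKE_BUILD_TYPE=RELEASE
make -j8
```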
Was this SYCL backend tested with the CLANG compiler?
I've only tested it with dpcpp/icpx.
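For what it's worth, the open-source Intel LLVM clang driver accepts SYCL code via `-fsycl`; whether this branch builds that way is untested in this thread. A minimal sketch, assuming a self-built intel-llvm toolchain on the PATH:

```shell
# Sketch only: compile a SYCL source with the intel-llvm clang driver.
# icpx/dpcpp are the tested compilers in this thread; clang++ -fsycl is the
# equivalent entry point in the open-source intel-llvm toolchain (untested here).
clang++ -fsycl -O2 test.cpp -o test

# For an NVIDIA device, intel-llvm also supports a CUDA backend target:
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -O2 test.cpp -o test
```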
@jcosborn With the latest Intel LLVM compiler, the following error is observed with the latest code. This error is from the file "quda/lib/targets/sycl/target_sycl.cmake":
Thanks for reporting that. This is fixed now.
@jcosborn what are the issues on NVIDIA?
I get a bunch of errors like:
Ok, it looks like you (or the SYCL backend) are using static shared memory as opposed to dynamic shared memory: the former has a limit of 48 KiB per thread block, while the latter has a much larger limit (96 KiB on Volta, ~164 KiB on Ampere, ~228 KiB on Hopper). Is this something one has control over with SYCL on NVIDIA, or is it out of your hands?
I wasn't setting the compute capability before; I'm trying again with sm_80. I'm not sure what else I can change yet.
I thought this line controls the size, no? quda/include/targets/sycl/target_device.h, line 196 in aa2ea41
@jcosborn the compute capability shouldn't matter here, as the static limit is 48 KiB for all CUDA GPUs since Fermi (2010). At least with the CUDA target, with static shared memory, it doesn't surprise me that an excess amount would be produced, as the …
Yes, it seems it will only use static shared memory. I'll see what I can get to compile now, and look into setting a limit for it.
I also have several issues compiling this branch of QUDA, as well as some questions. Questions:
There are some error messages when I try to compile this branch of QUDA.
The list of similar errors:
Yes, it generally requires the latest version of oneAPI (or intel-llvm). I'm currently testing with 2023.0.0. The issues you are seeing are due to differences in the older version of oneAPI.
Thank you for your prompt reply. I will install the new version and try it out. Meanwhile, I have another simple question. I am trying to compile QUDA targeting SYCL because I want to use QUDA in an environment possibly without GPUs, for testing purposes. Performance is not my main concern; I just need to run QUDA without GPUs. I assume this branch of QUDA works on CPUs. Am I correct?
Yes, it works with the opencl:cpu backend, though performance isn't very good.
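Selecting the CPU backend at run time is typically done with the SYCL runtime's device filter; the exact environment variable depends on the oneAPI version (`SYCL_DEVICE_FILTER` in older releases, `ONEAPI_DEVICE_SELECTOR` in newer ones). A sketch, where the test binary name is a hypothetical example:

```shell
# Run on the OpenCL CPU backend (variable name depends on oneAPI version).
export SYCL_DEVICE_FILTER=opencl:cpu        # older oneAPI releases
# export ONEAPI_DEVICE_SELECTOR=opencl:cpu  # newer oneAPI releases

./tests/invert_test   # hypothetical QUDA test binary path
```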
@jcosborn I have tried the following compile commands, but encountered some errors when linking.
The following errors occur at linking. The compiler I used is icpx-2023.2.4, together with oneMKL-2023.0.0.
Sorry for the mistakes. I updated the binutils tools and the errors disappeared.
In the QUDA feature/sycl branch, is this SYCL backend fully functional?
Does it work on NVIDIA as well, or is it intended only for Intel architectures?
Please share the steps to exercise the tests on Intel/NVIDIA platforms.