Differences between alpaka, Kokkos, RAJA, etc. #1764
Hi @etiennemlb, I'll try to answer your question.

**Programming models**

All of the mentioned projects aim for the same goal of performance-portable single-source programming, but they take different paths to achieve it (namely their programming models). RAJA parallelizes through a series of loop transformations which are mapped to the underlying hardware. Kokkos offers you ready-made abstractions, while alpaka gives you the tools to write any portable kernel; however, it is your job as the programmer to fill in the details of the algorithm. If you are interested in a more high-level approach to coding, I recommend taking a look at our vikunja project (which is based on alpaka): https://github.com/alpaka-group/vikunja

**Other abstractions**

Both Kokkos and RAJA try to abstract away the gritty details of device management, memory management, and so on. alpaka always gives you full control over everything happening in the program: you decide when buffer allocations, offloading to devices, and so on happen. Thanks to alpaka's design it is also user-extensible. For example, if you don't like the way our device queue implementation works, you can provide your own. If it fulfills all requirements mandated by our internal concepts, it will integrate nicely with the other alpaka utilities.

**Performance**

Since all projects are tailored for the HPC crowd and are maintained by HPC experts, they are usually in the same ballpark of performance. They are also similar (to each other and to native programming models) in their capabilities, i.e. what can be expressed in code and mapped to hardware. For a recent study of alpaka vs. other programming models see here: https://link.springer.com/chapter/10.1007/978-3-031-10419-0_6

Most of alpaka's abstractions are resolved at compile time (thus they don't result in runtime overhead). You can therefore assume that alpaka offers a level of performance very close to native models like CUDA, since the generated machine code is very similar.
**Other aspects**

We don't use Kokkos and RAJA very often, so I won't comment on their downsides. Please get in touch with their respective developers; they know their strengths and weaknesses much better than we do.

alpaka is somewhat verbose (compared to the other programming models). This is not because we are bad API designers but because alpaka is very customizable and offers the user a lot of control. It requires a certain familiarity with modern C++, though, and you shouldn't be afraid of using C++ templates.

**OpenCL and SYCL**

These are somewhat different. Both are "just" API specifications provided by an industry consortium. Industry players (hardware vendors and sometimes third parties) need to provide an implementation suitable for a specific set of hardware. The degree of support varies across vendors: sometimes they don't (yet) support a newer revision of the standard, sometimes they rely heavily on their own extensions for performance. This makes true portability hard to achieve in practice unless you restrict yourself to the lowest common denominator (or maintain different code paths for different runtimes). In addition, OpenCL is a split-source language (all of the others are single-source C++ APIs): your host program is written in C, C++, or another language, while the device code is written in the OpenCL C(++) dialect and requires separate compilation at some point.

I hope that cleared things up for you.
Hi @etiennemlb, as an alpaka user and sometimes contributor, let me add the reasons the CMS Collaboration has decided to adopt Alpaka rather than Kokkos or SYCL/oneAPI as a performance portability solution for the next 3-5 years.

**Programming Model**

In our experience, programming with Alpaka is closer to using CUDA than Kokkos and SYCL are. Kokkos strongly advises you to use its abstractions, which may or may not map well to the algorithms and code base at hand.

**Performance**

For us it was essential to achieve near-native performance on CPUs and NVIDIA GPUs.

**Maturity**

After a year-long investigation we concluded that Alpaka and Kokkos are well-established products, while oneAPI was still kind of a work in progress, and the other SYCL backends were even less ready, especially for targeting NVIDIA or AMD hardware. We will continue to monitor their progress, of course.

**Single binary, multiple backends**

Our software distribution model greatly benefits from being able to ship a single binary that can target multiple backends (e.g. CPUs, NVIDIA and AMD GPUs) at runtime. We could achieve this using native CUDA and ROCm, and using Alpaka.
Thanks for asking this @etiennemlb. I'm pinning this issue for the time being since it serves nicely as a form of documentation.
Hi,
I'm looking at different flavors of "Abstraction Library for Parallel Kernel Acceleration".
How is Alpaka different from SYCL, Kokkos, RAJA, or OpenCL? Pros, cons.
Thanks