Add support for device and constant global variables in the SYCL backend #2242
Many thanks for this work!
As it introduces breaking changes to the API, I would ask for enough time to consider all the impacts before merging the changes.
//!
//! struct DeviceMemoryKernel
//! {
//!     ALPAKA_NO_HOST_ACC_WARNING
Why `ALPAKA_NO_HOST_ACC_WARNING`?
//!
//! struct DeviceMemoryKernel
//! {
//!     ALPAKA_NO_HOST_ACC_WARNING
Why `ALPAKA_NO_HOST_ACC_WARNING`?
same as above
include/alpaka/mem/view/Traits.hpp
Outdated
/* Copyright 2023 Axel Hübl, Benjamin Worpitz, Matthias Werner, Andrea Bocci, Jan Stephan, Bernhard Manfred Gruber,
 * Aurora Perego
- /* Copyright 2023 Axel Hübl, Benjamin Worpitz, Matthias Werner, Andrea Bocci, Jan Stephan, Bernhard Manfred Gruber,
-  * Aurora Perego
+ /* Copyright 2024 Axel Hübl, Benjamin Worpitz, Matthias Werner, Andrea Bocci, Jan Stephan, Bernhard Manfred Gruber,
+  * Aurora Perego
d596e0b to 81aa05e
namespace alpaka
{
    using sycl::ext::oneapi::experimental::device_global;
This makes `device_global` visible to everything in the `alpaka` namespace, which may not be a good idea.
Could you at least move it inside the `detail` namespace (and use it as `detail::device_global` below)?
Or just use it fully expanded everywhere.
I've used it fully expanded
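To illustrate the two options discussed in this thread, here is a minimal sketch. The namespaces `vendor` and `mylib` are hypothetical stand-ins for the SYCL extension namespace and `alpaka`; this is not the actual alpaka code. It shows that a using-declaration confined to `detail` and the fully expanded spelling name the same type, without adding `device_global` to the outer namespace.

```cpp
#include <type_traits>

// Hypothetical stand-in for the vendor namespace that provides device_global.
namespace vendor
{
    template<typename T>
    struct device_global
    {
        T value{};
    };
} // namespace vendor

// Hypothetical stand-in for the library namespace.
namespace mylib
{
    namespace detail
    {
        // Option 1: the using-declaration is confined to the detail namespace.
        using vendor::device_global;
    } // namespace detail

    template<typename T>
    using GlobalViaDetail = detail::device_global<T>;

    // Option 2: spell the name out in full at the point of use.
    template<typename T>
    using GlobalFull = vendor::device_global<T>;
} // namespace mylib

// Both spellings refer to the same type; mylib::device_global is never declared.
static_assert(std::is_same_v<mylib::GlobalViaDetail<int>, mylib::GlobalFull<int>>);
```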
template<typename T>
struct DevGlobalTrait<TagGpuHipRt, T>
{
    // CUDA implementation
-     // CUDA implementation
+     // HIP/ROCm implementation
template<typename TAcc, typename TViewSrc, typename TViewDstFwd, typename TQueue>
ALPAKA_FN_HOST auto memcpy(
    TQueue& queue,
    alpaka::detail::DevGlobalImplGeneric<TAcc, TViewDstFwd>& viewDst,
    TViewSrc const& viewSrc) -> void
This approach seems too generic, as in principle it can also match a CUDA `DevGlobalImplGeneric`.
Also, `TAcc` should be `TTag`.
Can you add a constraint that `TAcc` (actually `TTag`) is one of `TagCpuOmp2Blocks`, `TagCpuOmp2Threads`, `TagCpuSerial`, `TagCpuTbbBlocks`, `TagCpuThreads`?
There may be a smarter way, but at least this should avoid the wrong matches.
By the way, IIUC `TViewDstFwd` here is not really a `View`, but the underlying type of the global variable.
If that's the case, could you rename it to `TType`?
> Can you add a constraint that `TAcc` (actually `TTag`) is one of `TagCpuOmp2Blocks`, `TagCpuOmp2Threads`, `TagCpuSerial`, `TagCpuTbbBlocks`, `TagCpuThreads`?

OK, I'll try (and I'll probably also have to prevent the test from running on `AccCpuSerial` when it is compiled with the flags to enable CUDA/HIP/SYCL).

> By the way, IIUC `TViewDstFwd` here is not really a `View`, but the underlying type of the global variable. If that's the case, could you rename it to `TType`?

You are right, I'll change it.
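One possible way to express the requested constraint is plain SFINAE over a list of tags. The sketch below is only an illustration: the tag structs, `DevGlobalImplGeneric`, `copyToGlobal`, `isCpuTag` and `CanCopyToGlobal` are simplified, hypothetical stand-ins defined here, not the alpaka types or the code that was eventually merged.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for alpaka's accelerator tags.
struct TagCpuSerial {};
struct TagCpuThreads {};
struct TagCpuOmp2Blocks {};
struct TagCpuOmp2Threads {};
struct TagCpuTbbBlocks {};
struct TagGpuCudaRt {};

// True only for the CPU tags listed in the review comment.
template<typename TTag>
inline constexpr bool isCpuTag = std::disjunction_v<
    std::is_same<TTag, TagCpuSerial>,
    std::is_same<TTag, TagCpuThreads>,
    std::is_same<TTag, TagCpuOmp2Blocks>,
    std::is_same<TTag, TagCpuOmp2Threads>,
    std::is_same<TTag, TagCpuTbbBlocks>>;

// Hypothetical wrapper around a global variable of type TType.
template<typename TTag, typename TType>
struct DevGlobalImplGeneric
{
    TType value;
};

// This overload takes part in overload resolution only for CPU tags.
template<typename TTag, typename TType, typename = std::enable_if_t<isCpuTag<TTag>>>
void copyToGlobal(DevGlobalImplGeneric<TTag, TType>& dst, TType const& src)
{
    dst.value = src;
}

// Detection helper: is copyToGlobal callable for a global variable with this tag?
template<typename TTag, typename = void>
struct CanCopyToGlobal : std::false_type
{
};

template<typename TTag>
struct CanCopyToGlobal<
    TTag,
    std::void_t<decltype(copyToGlobal(
        std::declval<DevGlobalImplGeneric<TTag, int>&>(),
        std::declval<int const&>()))>> : std::true_type
{
};

static_assert(CanCopyToGlobal<TagCpuSerial>::value);
static_assert(!CanCopyToGlobal<TagGpuCudaRt>::value);
```

With this shape, a CUDA-tagged variable no longer matches the generic CPU overload, so a backend-specific overload can be selected instead.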
template<typename TAcc, typename TApi, bool TBlocking, typename TViewDst, typename TViewSrc>
ALPAKA_FN_HOST auto memcpy(
    uniform_cuda_hip::detail::QueueUniformCudaHipRt<TApi, TBlocking>& queue,
    TViewDst& viewDst,
    alpaka::detail::DevGlobalImplGeneric<TAcc, TViewSrc>& viewSrc)
`TAcc` should be `TTag`, and it should somehow match the `TApi` (we don't want to use CUDA on a HIP/ROCm global variable).
I don't know if there is a map between `TTag` and `TApi`, but I'll look into it.
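If no such map exists, a small trait could provide one. The following is a sketch under that assumption: `TagToApi`, `Queue`, `copyFromGlobal` and the tag and API structs are hypothetical names defined here for illustration, and are not taken from alpaka.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the backend tags and the API types.
struct TagGpuCudaRt {};
struct TagGpuHipRt {};
struct ApiCudaRt {};
struct ApiHipRt {};

// Primary template intentionally left undefined: only tags with a known API are mapped.
template<typename TTag>
struct TagToApi;

template<>
struct TagToApi<TagGpuCudaRt>
{
    using type = ApiCudaRt;
};

template<>
struct TagToApi<TagGpuHipRt>
{
    using type = ApiHipRt;
};

// Hypothetical queue and global-variable wrapper.
template<typename TApi>
struct Queue
{
};

template<typename TTag, typename TType>
struct DevGlobalImplGeneric
{
    TType value;
};

// Viable only when the queue's API is the one associated with the variable's tag.
template<
    typename TTag,
    typename TApi,
    typename TType,
    typename = std::enable_if_t<std::is_same_v<typename TagToApi<TTag>::type, TApi>>>
void copyFromGlobal(Queue<TApi>& /* queue */, TType& dst, DevGlobalImplGeneric<TTag, TType> const& src)
{
    dst = src.value;
}

static_assert(std::is_same_v<TagToApi<TagGpuHipRt>::type, ApiHipRt>);
```

With this constraint, passing a CUDA queue together with a HIP-tagged variable would not find a viable overload, which is the mismatch the review comment wants to rule out.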
81aa05e to 315a980
968f567 to ec8cb63
ec8cb63 to 8071966
Device global variables in oneAPI versions Now the only failures in the tests are some warnings (that become errors when compiling the tests) in the SYCL
@psychocoderHPC can we remove
Discussed offline: yes, we can remove the options if necessary.
alpaka::memcpy(
    queueAcc,
    bufHost2,
    g_globalMemory2DUninitialized<typename alpaka::trait::AccToTag<Acc>::type>,
I would prefer passing the `Acc` type directly instead of using the type trait `alpaka::trait::AccToTag`. I think the common case is that we have the `Acc` type available and not the tag type, so we would otherwise end up with a lot of copy/paste code.
Depending on whether it makes sense to support tags in the user API, you can either move `typename alpaka::trait::AccToTag<Acc>::type` into the implementation (no support for tags in the user API) or provide some kind of overload and check whether the given type is an alpaka accelerator type or a tag.
I've moved it to the implementation
template<typename TAcc, typename T>
using DevGlobal = typename detail::DevGlobalTrait<typename alpaka::trait::AccToTag<TAcc>::type, T>::Type;
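The effect of this alias can be sketched in isolation. In the example below only the alias itself mirrors the snippet above; the accelerator types, tags, `AccToTag` specializations and `DevGlobalTrait` primary template are simplified, hypothetical stand-ins written for this illustration.

```cpp
#include <type_traits>

// Hypothetical tags and accelerator types (templated on dimension and index type).
struct TagCpuSerial {};
struct TagGpuCudaRt {};

template<typename TDim, typename TIdx>
struct AccCpuSerial {};

template<typename TDim, typename TIdx>
struct AccGpuCudaRt {};

namespace trait
{
    // Maps an accelerator type to its tag.
    template<typename TAcc>
    struct AccToTag;

    template<typename TDim, typename TIdx>
    struct AccToTag<AccCpuSerial<TDim, TIdx>>
    {
        using type = TagCpuSerial;
    };

    template<typename TDim, typename TIdx>
    struct AccToTag<AccGpuCudaRt<TDim, TIdx>>
    {
        using type = TagGpuCudaRt;
    };
} // namespace trait

namespace detail
{
    // Hypothetical generic storage, selected per tag.
    template<typename TTag, typename T>
    struct DevGlobalImplGeneric
    {
        T value;
    };

    template<typename TTag, typename T>
    struct DevGlobalTrait
    {
        using Type = DevGlobalImplGeneric<TTag, T>;
    };
} // namespace detail

// The user passes the accelerator type; the tag lookup happens inside the alias.
template<typename TAcc, typename T>
using DevGlobal = typename detail::DevGlobalTrait<typename trait::AccToTag<TAcc>::type, T>::Type;

static_assert(std::is_same_v<
              DevGlobal<AccCpuSerial<int, unsigned>, float>,
              detail::DevGlobalImplGeneric<TagCpuSerial, float>>);
```

User code then only needs `DevGlobal<Acc, T>`, and the `AccToTag` lookup stays an implementation detail.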
@alpaka-group/alpaka-maintainers IMO we can merge this PR. Are there any voices against merging it?
OK for me. Any further developments can happen in follow-up PRs.
@@ -16,92 +16,101 @@ using Elem = std::uint32_t;
using Dim = alpaka::DimInt<2u>;
using Idx = std::uint32_t;

#if !defined(ALPAKA_ACC_SYCL_ENABLED)
ALPAKA_STATIC_ACC_MEM_GLOBAL alpaka::DevGlobal<TAcc, Elem[3][2]> g_globalMemory2DUninitialized;
@AuroraPerego We need to review this change again because of the bug shown in #2259.
In the original code we had constant global memory only and in the new code we have global memory and constant memory.
With your changes, is there a difference between `ALPAKA_STATIC_ACC_MEM_GLOBAL` and `ALPAKA_STATIC_ACC_MEM_CONSTANT`?
IMO in this section we should only use `ALPAKA_STATIC_ACC_MEM_CONSTANT`, and later we have a test for `ALPAKA_STATIC_ACC_MEM_GLOBAL`.
> In the original code we had constant global memory only and in the new code we have global memory and constant memory.

They were different also in the original code, with `ALPAKA_STATIC_ACC_MEM_GLOBAL` expanding to `__device__` and `ALPAKA_STATIC_ACC_MEM_CONSTANT` expanding to `__constant__`.

> With your changes is there a difference between `ALPAKA_STATIC_ACC_MEM_GLOBAL` and `ALPAKA_STATIC_ACC_MEM_CONSTANT`?

Yes, they are using the `__device__` and `__constant__` attributes respectively (plus the `inline` attribute).

> IMO in this section we should only use `ALPAKA_STATIC_ACC_MEM_CONSTANT` and later we have a test for `ALPAKA_STATIC_ACC_MEM_GLOBAL`

OK, but why? We were testing both here also before this PR.
> ok, but why? we were testing both here also before this PR

This is a good question.
What if we define `ALPAKA_STATIC_ACC_MEM_GLOBAL` as `static` instead of `inline`? This should create a single instance too.
It compiles and also solves the issue in #2259. I can make a PR with the change if you think it's correct (I don't know enough about `static`, `inline` and `extern` to be sure of that).
In general, what macros should be used to support all use cases?
file1.cu
static __device__ int i = 0;
and
file2.h
extern __device__ int i;
file2.cu
#include "file2.h"
__device__ int i = 0;
?
Thinking about it, maybe `static` is not what we want. The ideal would be to make the `extern` keyword work also with the new implementation to allow the `file2.h`/`file2.cu` case, which IIUC is not possible with `static`.
This tries to fix #2070.

This implementation changes the current API.
The declaration of device memory is done with the macro `ALPAKA_STATIC_ACC_MEM_GLOBAL(type, name)`, which wraps the CUDA/HIP/serial variables in a struct `alpaka::DevGlobal` and declares a `sycl::ext::oneapi::experimental::device_global<type>` for SYCL. The `inline` attribute is used to ensure that only one instance of that variable exists across different translation units. `name` is used for the `memcpy`, while `name.get()` must be used in the kernel to align with the behavior of the SYCL backend.

The `memcpy` has been specialized for the device global variables.

The test with the SYCL backend failed with the original `KernelExecutionFixture` because it creates a new `queue` instead of using the one used for the `memcpy`. I added a constructor that also takes the `queue` as input.

Another issue with the test is that, being compiled with the flags to enable the SYCL backend, the macro for the device global variable expands to the SYCL one (`sycl::ext::oneapi::experimental::device_global<type>`) and therefore it fails when running on the `AccCpuSerial`. I have disabled this accelerator just for this test.

Note that this happens also with the CUDA backend (with the macro expanded to `__device__ alpaka::DevGlobal<type> name`), but for some strange reason it works on the serial backend.

Thanks to @fwyzard for the help :)
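The `name` versus `name.get()` convention described above can be illustrated with a host-only mock. Everything in this sketch (`GlobalWrapper`, `copyToGlobal`, `kernelBody`, `runExample`, `g_counter`) is a hypothetical stand-in defined here; it is not the alpaka implementation and does not touch any device memory.

```cpp
// Hypothetical host-only stand-in for a wrapped global variable.
template<typename T>
struct GlobalWrapper
{
    T value;

    // Accessor intended for kernel code, mirroring the get() member of the SYCL device_global.
    constexpr T& get()
    {
        return value;
    }

    constexpr T const& get() const
    {
        return value;
    }
};

// A C++17 inline variable: one instance is shared by all translation units that include it.
inline GlobalWrapper<int> g_counter{0};

// Host side: the copy helper receives the wrapper itself (the `name`), not name.get().
template<typename T>
constexpr void copyToGlobal(GlobalWrapper<T>& dst, T const& src)
{
    dst.value = src;
}

// "Kernel" side: the value is read through get().
constexpr int kernelBody(GlobalWrapper<int> const& global)
{
    return global.get() * 2;
}

// Copies 21 into a local wrapper and lets the "kernel" double it.
constexpr int runExample()
{
    GlobalWrapper<int> local{0};
    copyToGlobal(local, 21);
    return kernelBody(local);
}

static_assert(runExample() == 42);
```

Keeping the wrapper as the `memcpy` argument and `get()` as the in-kernel accessor is what lets the same user code compile whether the macro expands to a wrapper struct or to a SYCL `device_global`.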