[NFC] Transfer host variable to sycl kernel to avoid using unsupported memory capabilities. #930

Closed · wants to merge 2 commits
28 changes: 26 additions & 2 deletions tests/atomic_fence/atomic_fence.cpp
File mode changed: 100644 → 100755
@@ -194,6 +194,7 @@ class run_atomic_fence {
" and test_type = " + test_type_name)
.create()) {
auto queue = once_per_unit::get_queue();
// Early return for unsupported memory order or memory scope.
if (!check_memory_order_scope_capabilities(queue, MemoryOrder,
MemoryScope, memory_order_name,
memory_scope_name)) {
@@ -226,11 +227,32 @@ class run_atomic_fence {
sycl::buffer<bool> res_buf(&res, sycl::range<1>(1));
sycl::buffer<int> sync_buffer(&sync, sycl::range<1>(1));
sycl::buffer<int> data_buffer(&data, sycl::range<1>(1));
// Using the const host variable MemoryScope in the kernel directly
// may cause a compile failure for AOT builds. We transfer MemoryScope
// to device code instead; at run time the device code will not run if
// the host code returned early for unsupported memory capabilities.
[Inline review thread on the comment above]

keryell (Member) commented:
Is this a SYCL limitation or a compiler bug?
From https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:language.restrictions.kernels:

"Variables with static storage duration that are odr-used inside a device function, must be either const or constexpr, and must also be either zero-initialized or constant-initialized."

So, why not use MemoryOrder directly instead of the intermediate order_write and so on?
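
For reference, the direct use being asked about would look roughly like this inside the kernel (a sketch; MemoryOrder and MemoryScope appear to be the test's template parameters, and the ranges come from the surrounding test code):

cgh.parallel_for(sycl::nd_range<1>(global_range, local_range),
                 [=](sycl::nd_item<1> nditem) {
                   // MemoryOrder and MemoryScope are compile-time constants
                   // here, so the device compiler must compile this fence for
                   // every instantiation, supported by the target or not.
                   sycl::atomic_fence(MemoryOrder, MemoryScope);
                 });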

haonanya (Author) commented on Sep 3, 2024:
Hi @keryell, this is neither a SYCL limitation nor a compiler bug. The early return in the host code happens at run time. With AOT compilation, using MemoryOrder directly may cause a failure at compile time if MemoryOrder is not supported by the device, so the test can fail at the build stage and never produce a binary to run.
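
A minimal sketch of the failure mode being described (assuming an AOT target without seq_cst fence support; run and order_supported are illustrative names, not part of the test):

#include <sycl/sycl.hpp>

void run(sycl::queue& queue, bool order_supported) {
  // Hypothetical value that the AOT target does not support.
  constexpr auto MemoryOrder = sycl::memory_order::seq_cst;

  // At run time this early return skips the submission...
  if (!order_supported) return;

  queue.submit([&](sycl::handler& cgh) {
    cgh.single_task([=] {
      // ...but an AOT device compiler still compiles this kernel body at
      // build time, so an unsupported MemoryOrder can fail the device
      // build even though the kernel is never launched on that device.
      sycl::atomic_fence(MemoryOrder, sycl::memory_scope::device);
    });
  });
}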

gmlueck (Contributor) commented:

Is the problem that the test tries to use an atomic fence order or scope that is not supported by the device? If that is the case, the test should use info::device::atomic_fence_order_capabilities and info::device::atomic_fence_scope_capabilities to determine the supported order and scope values, and it should avoid submitting the kernel to the device if the device doesn't support that fence order or scope.
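
A sketch of the suggested capability check, assuming MemoryOrder and MemoryScope are the values the test is about to use (std::find requires <algorithm>):

auto dev = queue.get_device();
auto orders =
    dev.get_info<sycl::info::device::atomic_fence_order_capabilities>();
auto scopes =
    dev.get_info<sycl::info::device::atomic_fence_scope_capabilities>();

bool supported =
    std::find(orders.begin(), orders.end(), MemoryOrder) != orders.end() &&
    std::find(scopes.begin(), scopes.end(), MemoryScope) != scopes.end();
if (!supported) {
  return;  // don't submit the kernel on this device
}

This mirrors what check_memory_order_scope_capabilities already does in the test's host code.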

haonanya (Author) replied:
@gmlueck, yes, there is an early return in the original test (https://github.com/KhronosGroup/SYCL-CTS/blob/SYCL-2020/tests/atomic_fence/atomic_fence.cpp#L197). That check works at run time, but with AOT compilation the device build stage may still fail to compile, so the test can fail to build the final binary.

[End of review thread; the diff continues.]
sycl::buffer<sycl::memory_scope> memory_scope_buffer(&MemoryScope,
sycl::range<1>(1));
// Transfer order_write to device code.
sycl::buffer<sycl::memory_order> memory_order_write_buffer(
&order_write, sycl::range<1>(1));
// Transfer order_read to device code.
sycl::buffer<sycl::memory_order> memory_order_read_buffer(
&order_read, sycl::range<1>(1));
queue.submit([&](sycl::handler& cgh) {
auto res_acc =
res_buf.template get_access<sycl::access_mode::write>(cgh);
auto sync_flag_acc = get_accessor(cgh, sync_buffer);
auto data_acc = get_accessor(cgh, data_buffer);
auto memory_scope_acc =
memory_scope_buffer.template get_access<sycl::access_mode::read>(
cgh);
auto memory_order_write_acc =
memory_order_write_buffer
.template get_access<sycl::access_mode::read>(cgh);
auto memory_order_read_acc =
memory_order_read_buffer
.template get_access<sycl::access_mode::read>(cgh);
cgh.parallel_for(sycl::nd_range<1>(global_range, local_range),
[=](sycl::nd_item<1> nditem) {
auto g = nditem.get_group();
@@ -246,7 +268,8 @@ class run_atomic_fence {
*data = value;
// Use atomic_fence to guarantee the ordering of
// instruction execution
- sycl::atomic_fence(order_write, MemoryScope);
+ sycl::atomic_fence(memory_order_write_acc[0],
+                    memory_scope_acc[0]);
// Use an atomic sync flag to avoid data races
sync_flag = 1;
} else {
@@ -257,7 +280,8 @@ class run_atomic_fence {
break;
}
}
- sycl::atomic_fence(order_read, MemoryScope);
+ sycl::atomic_fence(memory_order_read_acc[0],
+                    memory_scope_acc[0]);
// After the fence, non-atomic reading is safe
if (write_happened) {
// Non-atomic read of data