Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NFC] Transfer host variable to sycl kernel to avoid using unsupported memory capabilities. #930

Closed
wants to merge 2 commits into from

Conversation

haonanya
Copy link

@haonanya haonanya commented Sep 2, 2024

Test checks if memory capabilities are supported in the host code, and then early return in the final run time for unsupported capabilities. When use const host variable MemoryScope in the sycl kernel directly, aot compile will fail in the compile time if MemoryScope is not supported. So this patch transfers host variable to sycl kernel to avoid using unsupported memory capabilities.

@haonanya haonanya requested a review from a team as a code owner September 2, 2024 03:29
@CLAassistant
Copy link

CLAassistant commented Sep 2, 2024

CLA assistant check
All committers have signed the CLA.

@haonanya haonanya marked this pull request as draft September 2, 2024 03:53
@haonanya haonanya marked this pull request as ready for review September 2, 2024 06:09
@haonanya
Copy link
Author

haonanya commented Sep 2, 2024

@keryell, @ProGTX, can you please take a look? Thanks very much.

…d memory capabilities.

Test checks if memory capabilities are supported in the host code, and then
early return in the final run time for unsupported capabilities. When use
const host variable MemoryScope in the sycl kernel directly, aot
compile would fail in the compile time if MemoryScope is not supported.
So this patch transfers host variable to sycl kernel to avoid using
unsupported memory capabilities.
@haonanya haonanya changed the title [NFC] Do not use const host variable in the sycl kernel. [NFC] Transfer host variable to sycl kernel to avoid using unsupported memory capabilities. Sep 2, 2024
tests/atomic_fence/atomic_fence.cpp Outdated Show resolved Hide resolved
@@ -226,11 +227,32 @@ class run_atomic_fence {
sycl::buffer<bool> res_buf(&res, sycl::range<1>(1));
sycl::buffer<int> sync_buffer(&sync, sycl::range<1>(1));
sycl::buffer<int> data_buffer(&data, sycl::range<1>(1));
// Using the const host variable MemoryScope in the kernel directly
// may cause compile fail for AOT build. We transfer MemoryScope to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a SYCL limitation or a compiler bug?
In https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:language.restrictions.kernels:

Variables with static storage duration that are odr-used inside a device function, must be either const or constexpr, and must also be either zero-initialized or constant-initialized.
So, why not using directly MemoryOrder instead of intermediate order_write and so on?

Copy link
Author

@haonanya haonanya Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @keryell , not a SYCL limitation or a compiler bug. The early return in the host code occurs in the run time. For aot compile, use MemoryOrder directly may cause compile failure in compile time if MemoryOrder is not supported by device. So the test may fail in build stage and fail to generate a binary to run.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the problem that the test tries to use an atomic fence order or scope that is not supported by the device? If that is the case, the test should use info::device::atomic_fence_order_capabilities and info::device::atomic_fence_scope_capabilities to determine the supported order and scope values, and it should avoid submitting the kernel to the device if the device doesn't support that fence order or scope.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmlueck, yes, there is early return https://github.com/KhronosGroup/SYCL-CTS/blob/SYCL-2020/tests/atomic_fence/atomic_fence.cpp#L197 in the original tests, the check works in the run time, but for aot compile it may compile fail in device build stage, so the test may fail to build final binary.

Copy link
Member

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@keryell keryell self-requested a review September 10, 2024 19:39
Copy link
Member

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still some formatting issues to fix.
Thanks.

@haonanya haonanya marked this pull request as draft September 13, 2024 06:24
@keryell
Copy link
Member

keryell commented Sep 25, 2024

@haonanya you can see the change to do about the formatting on https://github.com/KhronosGroup/SYCL-CTS/actions/runs/10674975860?pr=930
Could you apply it and remove the Draft as a blessed PR?

@haonanya
Copy link
Author

@haonanya you can see the change to do about the formatting on https://github.com/KhronosGroup/SYCL-CTS/actions/runs/10674975860?pr=930 Could you apply it and remove the Draft as a blessed PR?

@keryell , thanks for your patience! We didn't plan to submit the PR according to https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:optional-kernel-features:

In order to guarantee source code portability of SYCL applications that use optional kernel features, all SYCL implementations must be able to compile device code that uses these optional features regardless of whether the implementation supports the 
features on any of its devices.

@keryell
Copy link
Member

keryell commented Sep 25, 2024

I see.
But actually "compiling" for all the devices might just be the compiler workflow not crashing on the FPGA and just generating an empty kernel for this device for example, since the runtime will check at launch time either it can run or not because of an unsupported feature.
Does it mean you can close the PR then?

@haonanya haonanya closed this Sep 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants