[oneDPL] A fix: added missing synchronization between two SYCL patterns #1261
Conversation
I agree with you that the long-term fix is to use explicit dependencies between these pattern calls, and that sort of refactor is necessary.
However, I think that we may be able to improve the stopgap for the __pattern_stable_partition case, as described in my comment.
__pattern_walk2(
    __par_backend_hetero::make_wrapped_policy<copy_back_wrapper>(__exec),
Here it seems that these two __pattern_walk2 calls are independent of each other (they don't require a specific ordering), but both do need to be waited on before returning from __pattern_stable_partition.
Instead of enforcing ordering here, could we make them both async calls and then wait on the policy's queue here?
I agree here...
Yes, we have to wait for two events here, but we still don't have such a oneDPL API, and using the SYCL API directly is not allowed at this oneDPL code layer. (As an alternative to queue::wait, there is the possibility of calling event::wait(list of events).)
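For illustration, a minimal sketch of that alternative in plain SYCL, assuming the two pattern calls could hand back their events (the function name here is invented):

```cpp
#include <sycl/sycl.hpp>

// SYCL 2020 provides a static wait over a list of events, so the two
// independent patterns could run concurrently and be joined here instead
// of draining the policy's whole queue.
void join_two_patterns(sycl::event e1, sycl::event e2)
{
    sycl::event::wait({e1, e2});
}
```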
Yes, you are correct. I agree that, similar to the others, improving this requires a rewrite / refactor so that __pattern_walk2 returns some event rather than the simple true/false async flag we have.
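For illustration, a minimal sketch of what an event-returning walk2 could look like. This is not the current oneDPL API: the name __pattern_walk2_event, the queue() accessor on the policy, and the raw-pointer form of the brick application are all assumptions.

```cpp
#include <sycl/sycl.hpp>
#include <cstddef>

// Hypothetical sketch: instead of the compile-time true/false _IsSync flag,
// the pattern returns the event of the kernel it submits, so callers can
// join or chain dependencies explicitly (e.g. sycl::event::wait({e1, e2})).
template <typename _Policy, typename _Tp, typename _Brick>
sycl::event
__pattern_walk2_event(_Policy&& __policy, const _Tp* __in, _Tp* __out,
                      std::size_t __n, _Brick __brick)
{
    return __policy.queue().parallel_for(
        sycl::range<1>(__n),
        [=](sycl::id<1> __i) { __out[__i] = __brick(__in[__i]); });
}
```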
- auto __out_end = __pattern_walk2</*_IsSync=*/::std::false_type>(
+ auto __out_end = __pattern_walk2(
      __par_backend_hetero::make_wrapped_policy<__initial_copy_1>(__exec), __first, __last, __out_first,
This __pattern_walk2 and the subsequent __pattern_sort do have a dependency through __out_first, where __pattern_walk2 needs to finish before __pattern_sort can begin.
My first thought is that the accessors involved here provide the implicit data dependency to order these kernels appropriately and make this work as a pipeline (as described in the _IsSync comment in the definition of __pattern_walk2).
However, with USM pointers as input data, I believe we lose these implicit dependencies and the ordering of kernels. Is that also your understanding (and the reason this extra synchronization is required)?
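To illustrate the distinction, a sketch in plain SYCL rather than oneDPL internals (all names invented): the buffer case is ordered by the accessor dependency graph, while the USM case must state its ordering explicitly.

```cpp
#include <sycl/sycl.hpp>
#include <cstddef>

// Buffers: the runtime serializes the two kernels via their accessors.
void buffer_case(sycl::queue& q, sycl::buffer<int>& buf)
{
    const sycl::range<1> n = buf.get_range();
    q.submit([&](sycl::handler& cgh) {          // stands in for __pattern_walk2
        sycl::accessor out(buf, cgh, sycl::write_only);
        cgh.parallel_for(n, [=](sycl::id<1> i) { out[i] = int(i[0]); });
    });
    q.submit([&](sycl::handler& cgh) {          // stands in for __pattern_sort
        sycl::accessor io(buf, cgh, sycl::read_write);
        cgh.parallel_for(n, [=](sycl::id<1> i) { io[i] += 1; });
    }); // implicitly ordered after the first kernel; no wait() needed
}

// USM: no accessors, so the ordering must be made explicit.
void usm_case(sycl::queue& q, int* data, std::size_t n)
{
    sycl::event e = q.parallel_for(sycl::range<1>(n),
                                   [=](sycl::id<1> i) { data[i] = int(i[0]); });
    q.submit([&](sycl::handler& cgh) {
        cgh.depends_on(e); // or e.wait(); otherwise the kernels may race
        cgh.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) { data[i] += 1; });
    });
}
```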
Actually, the case is more complicated than what we see here...

1) USM: the ideal solution is via depends_on(first_event) (preferable) or wait() on the first event (a future).
2) hetero iterators: you are right, we have an implicit SYCL data-access dependency (the data dependency graph) and we don't need explicit synchronization.
3) host iterators: because the first pattern contains a temporary sycl::buffer in its own scope, we cannot rely on the SYCL data dependency graph. On the second pattern call we get another temporary sycl::buffer, and the SYCL runtime will not be aware that it is the same data, I think (see the sketch below).

Regarding (3): I think that to fix it we have to rewrite __pattern_partial_sort_copy by calling SYCL backend patterns directly.

And there is
4) In the case of host iterators we don't need to call wait() even on the last pattern call, because we have an implicit wait (and a copy of the data back to the host) in the sycl::buffer destructor. (I "love" SYCL, a kind of sarcasm. :-)
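A plain-SYCL sketch of (3) and (4), with invented names: each scope stands in for one pattern call that wraps the same host data in its own temporary buffer.

```cpp
#include <sycl/sycl.hpp>
#include <vector>

void two_patterns(sycl::queue& q, std::vector<int>& host_data)
{
    const sycl::range<1> n(host_data.size());
    {   // first "pattern": its temporary buffer lives only in this scope
        sycl::buffer<int> buf(host_data.data(), n);
        q.submit([&](sycl::handler& cgh) {
            sycl::accessor acc(buf, cgh, sycl::read_write);
            cgh.parallel_for(n, [=](sycl::id<1> i) { acc[i] += 1; });
        });
    }   // (4): ~buffer() blocks until the kernel finishes and copies back

    {   // second "pattern": a *new* buffer over the same host memory; the
        // runtime does not connect it to the first kernel (3), so without
        // the destructor's implicit wait above this could race
        sycl::buffer<int> buf(host_data.data(), n);
        q.submit([&](sycl::handler& cgh) {
            sycl::accessor acc(buf, cgh, sycl::read_write);
            cgh.parallel_for(n, [=](sycl::id<1> i) { acc[i] *= 2; });
        });
    }
}
```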
That makes sense. For us to rely on implicit dependencies in case (3), we would need a single call to __get_sycl_range whose result is reused. However, this is important anyway to avoid extra copies to and from the device with host data.
Perhaps that rewrite does not belong in this PR, but it seems like something we need to audit. If we are using multiple high-level patterns with host iterators, we may be copying data to and from the device for each pattern unnecessarily.
oneapi::dpl::__ranges::all_view<_Tp, __par_backend_hetero::access_mode::write>(__temp_buf.get_buffer());

const auto __shift = __new_first - __first;
oneapi::dpl::__par_backend_hetero::__parallel_for(
    oneapi::dpl::__par_backend_hetero::make_wrapped_policy<__rotate_wrapper>(__exec),
    unseq_backend::__rotate_copy<typename ::std::iterator_traits<_Iterator>::difference_type>{__n, __shift}, __n,
    __buf.all_view(), __temp_rng);
This falls into a similar category as the partial_sort_copy case above.
For buffers, I believe the implicit data dependencies of the accessor to __temp_buf should order these appropriately, and the wait on the last __parallel_for should remove any race with the return.
However, considering USM pointer inputs, I think we lose those, and this can be a bug that requires the wait().
Yes, the wait is redundant here, because __temp_rng is a temporary sycl::buffer.
Moreover, I guess we don't need the wait below, on line 1638, because the destructor of the temporary buffer is blocking.
(To tell the truth, I don't like the situation where we have to know the data types (USM/buffer/host/etc.), the SYCL sync rules, and other details.)
Yes, it is not a great situation, but you are correct: because __temp_buf is a temporary sycl::buffer, these waits are unnecessary. Assuming we remove both wait() calls, we should add comments for each to describe the implicit synchronization, as it is not clear. (I missed it, even with this topic in my mind, when looking back at this PR.)
After taking another look through this, I agree that this should be applied as is, since we cannot improve it further without significant refactoring to provide some method of pipelining patterns via events. It may result in a slowdown in some usages, but it will fix bugs in others, which is the preferred option between the two.
We should create an issue to properly describe the intricacies of the requirements for a refactor, specifically from this topic:
https://github.com/oneapi-src/oneDPL/pull/1261/files/2a9d0f0af2f64d582f9b0d770d94fe3085ea4112#r1383348739
@@ -1621,20 +1624,23 @@ __pattern_rotate(__hetero_tag<_BackendTag>, _ExecutionPolicy&& __exec, _Iterator
  auto __buf = __keep(__first, __last);
  auto __temp_buf = oneapi::dpl::__par_backend_hetero::__buffer<_ExecutionPolicy, _Tp>(__exec, __n);

- auto __temp_rng =
+ auto __temp_rng_w =
Currently, I don't see any reason to change the variable name here. I would suggest undoing the change (lines 1627 and 1634).
        __brick_copy<__hetero_tag<_BackendTag>, _ExecutionPolicy>{});
auto __out_end =
    __pattern_walk2(__tag, __par_backend_hetero::make_wrapped_policy<__initial_copy_1>(__exec), __first, __last,
                    __out_first, __brick_copy<__hetero_tag<_BackendTag>, _ExecutionPolicy>{});

// Use regular sort as partial_sort isn't required to be stable
__pattern_sort(
In the case of a USM pointer, we have to wait for the result of __pattern_walk2 on line 1489...
But if we do that, we get a performance degradation with the sycl_iterator type (sycl::buffer).
So, since we are not wanting to do some sort of large refactor to use event dependencies to pipeline patterns at this point...
What about something like this to avoid the perf degradation:
Create something like __type_requires_exterior_synchronization<__out_first>, which would resolve to true_type for any non-SYCL backend; for the SYCL backend it could return bool_constant<!is_sycl_iterator_v<__out_first>>.
We can do this without mentioning SYCL at this level.
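A minimal sketch of that trait, assuming a detection helper like the is_sycl_iterator mentioned above (both names are hypothetical here, not existing oneDPL internals):

```cpp
#include <type_traits>

// Hypothetical stand-in: real detection of oneDPL's sycl_iterator would
// live in the SYCL backend headers.
template <typename _Iterator>
struct __is_sycl_iterator : std::false_type
{
};

// Proposed trait: by default (including non-SYCL backends) an exterior wait
// is required; a sycl_iterator opts out, because its buffer accessors
// already order the kernels through the SYCL dependency graph.
template <typename _Iterator>
struct __type_requires_exterior_synchronization
    : std::bool_constant<!__is_sycl_iterator<_Iterator>::value>
{
};
```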
This still leaves a lot to be desired vs the refactor (for host_iterators especially), but it at least removes the degradation.
Regarding is_sycl_iterator_v...
What if we have a "fancy pipe" over a sycl_iterator? Or what should we do in the case of a mix: zip(zip(sycl_iter, usm_ptr), usm_ptr), and so on?
Yes, it would miss these; it's not a perfect solution. It provides some improvement, though, for a raw sycl_iterator or the legacy hetero iterators. The better solution is using events, which can be passed as dependencies to patterns and then into SYCL kernels.
The solution described could be made better by adding a __contains_sycl_iterator trait, which has specializations for our fancy iterators that pass through and check whether any child iterator __contains_sycl_iterator and would therefore provide the synchronization.
zip and permutation would return the OR of their children's results; transform would return its base's result. If it is not a fancy iterator, it would just be defined as described above.
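A sketch of that composite trait, building on the hypothetical __is_sycl_iterator stand-in from the earlier sketch; the fancy iterator templates are forward-declared here only to keep the example self-contained:

```cpp
#include <type_traits>

// Forward declarations standing in for oneDPL's fancy iterators.
template <typename... _Its>
class zip_iterator;
template <typename _It, typename _F>
class transform_iterator;

// Base case: a plain iterator "contains" a sycl_iterator only if it is one.
template <typename _It>
struct __contains_sycl_iterator : __is_sycl_iterator<_It>
{
};

// zip: the OR of all child iterators' results.
template <typename... _Its>
struct __contains_sycl_iterator<zip_iterator<_Its...>>
    : std::bool_constant<(__contains_sycl_iterator<_Its>::value || ...)>
{
};

// transform: defer to the base iterator's result.
template <typename _It, typename _F>
struct __contains_sycl_iterator<transform_iterator<_It, _F>>
    : __contains_sycl_iterator<_It>
{
};
```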
Actually, it's a bit more complicated, because we need the access pattern to actually dictate a dependency. Probably the permutation iterator should only pass through to its base, not its map, because the map will be read-only.
It is, of course, the responsibility of the user of this structure to make sure the usage of the sequence will generate the synchronization, but a permutation iterator won't require write access for its map even if it is used in write mode.
Read-only followed by read-only wouldn't require an ordering. We need to be most careful to avoid believing there is a synchronization where there isn't one.
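Extending the sketch above with that refinement (again, the forward declaration is only for self-containment):

```cpp
// Forward declaration standing in for oneDPL's permutation_iterator.
template <typename _Base, typename _Map>
class permutation_iterator;

// permutation: pass through only to the base. The map is read-only, so it
// cannot be the side of the access pattern that creates a dependency, even
// when the permutation iterator itself is used in write mode.
template <typename _Base, typename _Map>
struct __contains_sycl_iterator<permutation_iterator<_Base, _Map>>
    : __contains_sycl_iterator<_Base>
{
};
```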
I believe I agree with your removal of these synchronizations.
From https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:buf-sync-rules :
"When the buffer is destroyed, the destructor will block until all work in queues on the buffer have completed, then copy the contents of the buffer back to the host memory (if required) and then return."
It doesn't matter what mode the accessor is in; it will always block in the destructor.
Other than the performance degradation for that one case, this looks good. We can discuss in that thread whether we want to try to fix that in this PR or leave it as an issue / TODO.
I've left a proper //TODO.
LGTM
…ns, removed unnecessary sync, comments #1261
In some algorithm patterns there are calls to more than one SYCL backend pattern. Since SYCL backend patterns are asynchronous, we have to synchronize between them, either by setting SYCL dependencies or, at least, by waiting on the first pattern.
"Setting SYCL dependencies" is the preferable and correct approach. But currently the SYCL backend API does not provide a way to pass any dependencies; it should be redesigned in the future... For now we can apply a simple fix: wait on the first SYCL backend pattern call.