[oneDPL][ranges] support size limit for output for merge algorithm #1942

MikeDvorskiy · 2024-11-20T14:30:46Z

[oneDPL][ranges] support size limit for output for merge algorithm.
The change is according to https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3179r2.html#range_as_output

serial pattern
parallel pattern (tbb)
parallel pattern (openMP)
parallel pattern (serial backend)
parallel pattern (DPCPP backend)

Update: Changed to draft status because of a design issue with the different return types of the merge patterns: __result_and_scratch_storage vs. __result_and_scratch_storage_base. One option is to have a single common __result_and_scratch_storage type for all needs (at least for the DPC++ merge patterns).
Update 2: the issue mentioned above has been resolved.
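
For readers unfamiliar with the "range as output" idea, here is a minimal sketch of a merge that stops when the output is full. It is an illustration only, not the oneDPL implementation: the name bounded_merge and its signature are hypothetical. It returns the positions reached in both inputs, so the caller can see how much of each input was consumed.

```cpp
#include <functional>
#include <utility>

// Hypothetical helper (not a oneDPL API): merges two sorted inputs into
// [out, out_end) and stops as soon as the output is full or both inputs
// are exhausted. Returns the positions reached in each input.
template <typename It1, typename It2, typename OutIt, typename Comp = std::less<>>
std::pair<It1, It2>
bounded_merge(It1 first1, It1 last1, It2 first2, It2 last2, OutIt out, OutIt out_end, Comp comp = {})
{
    while (out != out_end && (first1 != last1 || first2 != last2))
    {
        // Take from the second input only when the first is exhausted or the
        // second element is strictly ordered before the first one; this keeps
        // the merge stable.
        if (first1 == last1 || (first2 != last2 && std::invoke(comp, *first2, *first1)))
            *out++ = *first2++;
        else
            *out++ = *first1++;
    }
    return {first1, first2};
}
```

With inputs {1, 3, 5, 7} and {2, 4, 6, 8} and room for five output elements, the sketch writes 1, 2, 3, 4, 5 and reports that three elements of the first input and two of the second were consumed.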

include/oneapi/dpl/pstl/glue_algorithm_ranges_impl.h

include/oneapi/dpl/pstl/hetero/algorithm_ranges_impl_hetero.h

include/oneapi/dpl/pstl/glue_algorithm_ranges_impl.h

SergeyKopienko · 2025-01-15T11:01:50Z

include/oneapi/dpl/pstl/algorithm_impl.h

@@ -2948,6 +2949,49 @@ __pattern_remove_if(__parallel_tag<_IsVector> __tag, _ExecutionPolicy&& __exec,
 // merge
 //------------------------------------------------------------------------

+template<typename It1, typename It2, typename ItOut, typename _Comp>
+std::pair<It1, It2>
+__brick_merge_2(It1 __it_1, It1 __it_1_e, It2 __it_2, It2 __it_2_e, ItOut __it_out, ItOut __it_out_e, _Comp __comp,


The existing implementation of __serial_merge is probably faster than this.

SergeyKopienko · 2025-01-15T11:03:52Z

include/oneapi/dpl/pstl/hetero/algorithm_ranges_impl_hetero.h

-    auto __n = __n1 + __n2;
-    if (__n == 0)
-        return 0;
+    if (__rng3.size() == 0)


Suggested change

if (__rng3.size() == 0)

if (__rng3.empty())

SergeyKopienko · 2025-01-15T11:05:11Z

include/oneapi/dpl/pstl/hetero/algorithm_ranges_impl_hetero.h

-    if (__n == 0)
-        return 0;
+    if (__rng3.size() == 0)
+        return {0, 0};

    //To consider the direct copying pattern call in case just one of sequences is empty.
    if (__n1 == 0)


We can add an optimization here for the case when last(rng1) < first(rng2).
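
A minimal sketch of that fast path on host containers, assuming sorted inputs. The helper name merge_with_fast_path is hypothetical, and the check used is slightly broader than last(rng1) < first(rng2): it also accepts equivalent boundary elements, for which concatenation still matches a stable merge.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical helper: merges two sorted vectors, with a shortcut for inputs
// that do not interleave.
template <typename T, typename Comp = std::less<>>
std::vector<T>
merge_with_fast_path(const std::vector<T>& a, const std::vector<T>& b, Comp comp = {})
{
    std::vector<T> out;
    out.reserve(a.size() + b.size());

    // If the first element of b is not ordered before the last element of a,
    // the merged result is simply a followed by b.
    if (a.empty() || b.empty() || !comp(b.front(), a.back()))
    {
        out.insert(out.end(), a.begin(), a.end());
        out.insert(out.end(), b.begin(), b.end());
        return out;
    }

    // General case: regular stable merge.
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out), comp);
    return out;
}
```

The check costs one comparison, so it is cheap relative to the merge itself; whether it pays off on a device backend is a separate question that this sketch does not answer.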

SergeyKopienko · 2025-01-15T11:08:49Z

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge.h

 {
    const _Index __rng1_size = std::min<_Index>(__n1 > __start1 ? __n1 - __start1 : _Index{0}, __chunk);
    const _Index __rng2_size = std::min<_Index>(__n2 > __start2 ? __n2 - __start2 : _Index{0}, __chunk);
    const _Index __rng3_size = std::min<_Index>(__rng1_size + __rng2_size, __chunk);

    const _Index __rng1_idx_end = __start1 + __rng1_size;
    const _Index __rng2_idx_end = __start2 + __rng2_size;
-    const _Index __rng3_idx_end = __start3 + __rng3_size;
+    const _Index __rng3_idx_end = std::min<_Index>(__n3, __start3 + __rng3_size);


This looks like a logical error: __n3 is the size, but __rng3_idx_end is the last index.

__n3 is also the last index (limit) of range3, just as __n1 is the last index (limit) of range1 and __n2 is the last index (limit) of range2.
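
To make the point concrete, here is a small sketch that assumes the end index is exclusive (one past the last written element), which is how the reply above reads. Under that assumption the size of the output is itself a valid upper bound for the end index. The function name chunk_end is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: exclusive end index of an output chunk that starts at
// start3, for an output holding n3 elements (valid indices 0 .. n3 - 1).
inline std::size_t
chunk_end(std::size_t start3, std::size_t chunk_size, std::size_t n3)
{
    // n3 is both the size and the one-past-the-last index,
    // so clamping to it never allows a write beyond the output.
    return std::min(n3, start3 + chunk_size);
}
```

For an output of 10 elements and a chunk of 4, a chunk starting at index 8 ends at 10 rather than 12.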

SergeyKopienko · 2025-01-15T11:09:13Z

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge.h

            });
-        // We should return the same thing in the second param of __future for compatibility


Please restore this comment

SergeyKopienko · 2025-01-15T11:12:08Z

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge.h

@@ -320,8 +335,13 @@ struct __parallel_merge_submitter_large<_IdType, _CustomName,
                        __start = __base_diagonals_sp_global_ptr[__diagonal_idx];
                    }

-                    __serial_merge(__rng1, __rng2, __rng3, __start.first, __start.second, __i_elem,
-                                   __nd_range_params.chunk, __n1, __n2, __comp);
+                    auto __ends = __serial_merge(__rng1, __rng2, __rng3, __start.first, __start.second, __i_elem,


const auto

We know the return type here, so why use auto?

Because auto is shorter and "faster" to write (a developer doesn't need to look up the exact return type of the __serial_merge(...) call).

As I understand it, this doesn't correspond to the recommendations of @akukanov.

SergeyKopienko · 2025-01-15T11:14:06Z

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge.h

@@ -391,7 +415,7 @@ __parallel_merge(oneapi::dpl::__internal::__device_backend_tag, _ExecutionPolicy

    using __value_type = oneapi::dpl::__internal::__value_t<_Range3>;

-    const std::size_t __n = __rng1.size() + __rng2.size();
+    const std::uint64_t __n = std::min<std::uint64_t>(__rng1.size() + __rng2.size(), __rng3.size());


Why can't we use std::size_t here, as before?

SergeyKopienko · 2025-01-15T11:15:18Z

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_utils.h

@@ -522,6 +522,7 @@ struct __usm_or_buffer_accessor
 struct __result_and_scratch_storage_base
 {
    virtual ~__result_and_scratch_storage_base() = default;
+    virtual std::size_t __get_data(sycl::event, std::size_t* __p_buf) const = 0;


Since __result_and_scratch_storage_base already has __ in its name, I believe an additional __ isn't required in the method name.

SergeyKopienko · 2025-01-15T11:17:06Z

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_utils.h

+        return 0;
+    }
+
+    virtual std::size_t __get_data(sycl::event __event, std::size_t* __p_buf) const override


Technically this declaration is correct.
But for consistency with the rest of the code: as far as I have seen, virtual isn't used together with override in our code.
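
For reference, a small sketch of the style being asked for. The type names are hypothetical stand-ins, not the oneDPL classes. A function marked override is already virtual because it overrides a virtual function of the base, so repeating virtual adds nothing.

```cpp
#include <cstddef>

// Hypothetical stand-in for the base class under discussion.
struct storage_base
{
    virtual ~storage_base() = default;
    virtual std::size_t get_data(std::size_t* buf) const = 0;
};

struct storage : storage_base
{
    // `override` alone is enough: the function is implicitly virtual,
    // and the compiler rejects it if nothing in the base matches.
    std::size_t
    get_data(std::size_t* buf) const override
    {
        buf[0] = 3;
        buf[1] = 2;
        return 2; // number of values written
    }
};
```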

SergeyKopienko · 2025-01-15T11:21:00Z

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_utils.h

@@ -729,6 +755,16 @@ class __future : private std::tuple<_Args...>
        return __storage.__wait_and_get_value(__my_event);
    }

+    constexpr auto
+    __wait_and_get_value(const std::shared_ptr<__result_and_scratch_storage_base>& __p_storage)


I propose to rewrite this overload:

__wait_and_get_value(__result_and_scratch_storage_base* __p_result_and_scratch_storage_base)

SergeyKopienko · 2025-01-15T11:23:52Z

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_utils.h

+    constexpr auto
+    __wait_and_get_value(const std::shared_ptr<__result_and_scratch_storage_base>& __p_storage)
+    {
+        std::size_t __buf[2] = {0, 0};


Here you imply that this method will always return a pair of std::size_t values.
So it is better to state this in the method declaration too:

constexpr std::pair<std::size_t, std::size_t>

But will this method be applicable to all use cases? It looks specific to your functionality.

SergeyKopienko · 2025-01-15T11:28:44Z

include/oneapi/dpl/pstl/unseq_backend_simd.h

+            *__k = *__x;
+            ++__x;
+        }
+        else if(std::invoke(__comp, *__x, *__y))


Is it mandatory to call std::invoke here?
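
One reason it may matter: the standard ranges algorithms accept any invocable as a comparator, including pointers to members, and a plain call expression does not compile for those. A minimal sketch with hypothetical names:

```cpp
#include <functional>

// Hypothetical element type with a member-function comparison.
struct Item
{
    int key;

    bool
    before(const Item& other) const
    {
        return key < other.key;
    }
};

template <typename Comp, typename T>
bool
ordered_before(Comp comp, const T& x, const T& y)
{
    // std::invoke handles function objects and pointers to members alike;
    // comp(x, y) would fail to compile when comp is &Item::before.
    return std::invoke(comp, x, y);
}
```

If the comparator has already been wrapped in a callable object before it reaches this code, a direct call would be sufficient; whether that holds here depends on the calling pattern.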

SergeyKopienko · 2025-01-15T11:29:25Z

make/Makefile.common

@@ -49,13 +49,13 @@ endif # !os_name

 cfg ?= release

-device_type ?= GPU
+device_type ?= level_zero:gpu


Are the changes in this file really related to this PR?

SergeyKopienko · 2025-01-15T11:31:22Z

include/oneapi/dpl/pstl/algorithm_impl.h

@@ -31,6 +31,7 @@
 #include "parallel_backend.h"
 #include "parallel_impl.h"
 #include "iterator_impl.h"
+#include "../functional"


Should we still have #include <functional> above?

SergeyKopienko · 2025-01-15T11:32:25Z

include/oneapi/dpl/pstl/algorithm_impl.h

+
+                                                        oneapi::dpl::__internal::__compare<_Comp, oneapi::dpl::identity>
+                                                            __cmp{__comp, oneapi::dpl::identity{}};
+                                                        const auto __res = (__cmp(__it_1[__r], __it_2[__c]) ? 1 : 0);


Suggested change

const auto __res = (__cmp(__it_1[__r], __it_2[__c]) ? 1 : 0);

const auto __res = __cmp(__it_1[__r], __it_2[__c]) ? 1 : 0;

SergeyKopienko · 2025-01-15T11:33:26Z

include/oneapi/dpl/pstl/algorithm_impl.h

+                                                        const auto __res = (__cmp(__it_1[__r], __it_2[__c]) ? 1 : 0);
+
+                                                        return __res < __val;


or the second variant:

return !__cmp(__it_1[__r], __it_2[__c]);

SergeyKopienko · 2025-01-15T11:34:04Z

include/oneapi/dpl/pstl/algorithm_impl.h

+                                            }
+
+                                            //serial merge n elements, starting from input x and y, to [i, j) output range
+                                            auto __res = __brick_merge_2(__it_1 + __r, __it_1 + __n_1,


Suggested change

auto __res = __brick_merge_2(__it_1 + __r, __it_1 + __n_1,

const auto __res = __brick_merge_2(__it_1 + __r, __it_1 + __n_1,

SergeyKopienko · 2025-01-15T13:46:55Z

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_utils.h

@@ -663,17 +674,17 @@ struct __result_and_scratch_storage : __result_and_scratch_storage_base
    // Note: this member function assumes the result is *ready*, since the __future has already
    // waited on the relevant event.
    _T
-    __get_value(size_t idx = 0) const
+    __get_value() const


Should we really not support more than one value in our code?

MikeDvorskiy marked this pull request as draft November 20, 2024 14:30

MikeDvorskiy force-pushed the dev/mdvorski/merge_sized_output branch 9 times, most recently from 33cd332 to d443dbe Compare November 27, 2024 12:03

MikeDvorskiy force-pushed the dev/mdvorski/merge_sized_output branch 4 times, most recently from 9ebcfb6 to 0066210 Compare November 28, 2024 11:55

MikeDvorskiy marked this pull request as ready for review November 28, 2024 15:24

MikeDvorskiy requested review from dmitriy-sobolev and danhoeflinger November 28, 2024 16:29

MikeDvorskiy force-pushed the dev/mdvorski/merge_sized_output branch 7 times, most recently from 3f648a7 to 5b078ad Compare November 29, 2024 17:24

dmitriy-sobolev reviewed Dec 19, 2024

include/oneapi/dpl/pstl/glue_algorithm_ranges_impl.h

include/oneapi/dpl/pstl/hetero/algorithm_ranges_impl_hetero.h

include/oneapi/dpl/pstl/glue_algorithm_ranges_impl.h

MikeDvorskiy force-pushed the dev/mdvorski/merge_sized_output branch from 98a7acb to c81b4c1 Compare December 23, 2024 13:50

MikeDvorskiy marked this pull request as draft December 29, 2024 10:00

MikeDvorskiy force-pushed the dev/mdvorski/merge_sized_output branch from 76c3c16 to c0c8ba4 Compare January 14, 2025 13:49

MikeDvorskiy marked this pull request as ready for review January 14, 2025 13:49

[oneDPL][make] + usage ONEAPI_DEVICE_SELECTOR variable

c1ff14b

MikeDvorskiy force-pushed the dev/mdvorski/merge_sized_output branch from c0c8ba4 to ffea24a Compare January 14, 2025 13:51

MikeDvorskiy added this to the 2022.8.0 milestone Jan 14, 2025

MikeDvorskiy force-pushed the dev/mdvorski/merge_sized_output branch 4 times, most recently from 02ed111 to 45a9bef Compare January 15, 2025 10:04

SergeyKopienko reviewed Jan 15, 2025


[oneDPL][ranges][merge] support size limit for output

bd19d40

MikeDvorskiy force-pushed the dev/mdvorski/merge_sized_output branch from 45a9bef to bd19d40 Compare January 15, 2025 13:30

SergeyKopienko reviewed Jan 15, 2025



