-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[oneDPL][ranges] support size limit for output for merge algorithm #1942
base: main
Are you sure you want to change the base?
Conversation
33cd332
to
d443dbe
Compare
9ebcfb6
to
0066210
Compare
3f648a7
to
5b078ad
Compare
98a7acb
to
c81b4c1
Compare
76c3c16
to
c0c8ba4
Compare
c0c8ba4
to
ffea24a
Compare
02ed111
to
45a9bef
Compare
@@ -2948,6 +2949,49 @@ __pattern_remove_if(__parallel_tag<_IsVector> __tag, _ExecutionPolicy&& __exec, | |||
// merge | |||
//------------------------------------------------------------------------ | |||
|
|||
template<typename It1, typename It2, typename ItOut, typename _Comp> | |||
std::pair<It1, It2> | |||
__brick_merge_2(It1 __it_1, It1 __it_1_e, It2 __it_2, It2 __it_2_e, ItOut __it_out, ItOut __it_out_e, _Comp __comp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably the existing implementation of __serial_merge
is more faster then this.
auto __n = __n1 + __n2; | ||
if (__n == 0) | ||
return 0; | ||
if (__rng3.size() == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (__rng3.size() == 0) | |
if (__rng3.empty()) |
if (__n == 0) | ||
return 0; | ||
if (__rng3.size() == 0) | ||
return {0, 0}; | ||
|
||
//To consider the direct copying pattern call in case just one of sequences is empty. | ||
if (__n1 == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can make additional optimization here for the case when last(rng1) < first(rng2)
{ | ||
const _Index __rng1_size = std::min<_Index>(__n1 > __start1 ? __n1 - __start1 : _Index{0}, __chunk); | ||
const _Index __rng2_size = std::min<_Index>(__n2 > __start2 ? __n2 - __start2 : _Index{0}, __chunk); | ||
const _Index __rng3_size = std::min<_Index>(__rng1_size + __rng2_size, __chunk); | ||
|
||
const _Index __rng1_idx_end = __start1 + __rng1_size; | ||
const _Index __rng2_idx_end = __start2 + __rng2_size; | ||
const _Index __rng3_idx_end = __start3 + __rng3_size; | ||
const _Index __rng3_idx_end = std::min<_Index>(__n3, __start3 + __rng3_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like a logical error, because __n3
is the size but __rng3_idx_end
is the last index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__n3
also is the last index (limit) of range3.
Also as __n1
is the last index(limit) for range1 and __n2
is the last index(limit) for range2.
}); | ||
// We should return the same thing in the second param of __future for compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please restore this comment
@@ -320,8 +335,13 @@ struct __parallel_merge_submitter_large<_IdType, _CustomName, | |||
__start = __base_diagonals_sp_global_ptr[__diagonal_idx]; | |||
} | |||
|
|||
__serial_merge(__rng1, __rng2, __rng3, __start.first, __start.second, __i_elem, | |||
__nd_range_params.chunk, __n1, __n2, __comp); | |||
auto __ends = __serial_merge(__rng1, __rng2, __rng3, __start.first, __start.second, __i_elem, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- const auto
- we know return type here, why you are using
auto
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- cause
auto
is shorter and "faster" to write (a developer doesn't need searching for exact return type for __serial_merge(...) call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, it doesn't correspondent with recommendations of @akukanov
@@ -391,7 +415,7 @@ __parallel_merge(oneapi::dpl::__internal::__device_backend_tag, _ExecutionPolicy | |||
|
|||
using __value_type = oneapi::dpl::__internal::__value_t<_Range3>; | |||
|
|||
const std::size_t __n = __rng1.size() + __rng2.size(); | |||
const std::uint64_t __n = std::min<std::uint64_t>(__rng1.size() + __rng2.size(), __rng3.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we can't use std::size_t
here as were before?
@@ -522,6 +522,7 @@ struct __usm_or_buffer_accessor | |||
struct __result_and_scratch_storage_base | |||
{ | |||
virtual ~__result_and_scratch_storage_base() = default; | |||
virtual std::size_t __get_data(sycl::event, std::size_t* __p_buf) const = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as __result_and_scratch_storage_base
already has __
in their name, I believe additional __
isn't required in method name.
return 0; | ||
} | ||
|
||
virtual std::size_t __get_data(sycl::event __event, std::size_t* __p_buf) const override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically this declaration is correct.
But for compatibility with the other code, as I seen, virtual
aren't used together with override
in our code.
@@ -729,6 +755,16 @@ class __future : private std::tuple<_Args...> | |||
return __storage.__wait_and_get_value(__my_event); | |||
} | |||
|
|||
constexpr auto | |||
__wait_and_get_value(const std::shared_ptr<__result_and_scratch_storage_base>& __p_storage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose to rewrite this overload:
__wait_and_get_value(__result_and_scratch_storage_base* __p_result_and_scratch_storage_base)
constexpr auto | ||
__wait_and_get_value(const std::shared_ptr<__result_and_scratch_storage_base>& __p_storage) | ||
{ | ||
std::size_t __buf[2] = {0, 0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this place you means that this method will always return some pair of std::size_t + std::size_t
So better to declare this in the method declaration too:
constexpr std::pair<std::size_t, std::size_t>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But will this method applicable for all use-cases? It's looks like some specific for your functional....
*__k = *__x; | ||
++__x; | ||
} | ||
else if(std::invoke(__comp, *__x, *__y)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we mandatory call std::invoke
here?
@@ -49,13 +49,13 @@ endif # !os_name | |||
|
|||
cfg ?= release | |||
|
|||
device_type ?= GPU | |||
device_type ?= level_zero:gpu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the changes fin this file really linked with this PR ?
@@ -31,6 +31,7 @@ | |||
#include "parallel_backend.h" | |||
#include "parallel_impl.h" | |||
#include "iterator_impl.h" | |||
#include "../functional" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we still have #include <functional>
above?
|
||
oneapi::dpl::__internal::__compare<_Comp, oneapi::dpl::identity> | ||
__cmp{__comp, oneapi::dpl::identity{}}; | ||
const auto __res = (__cmp(__it_1[__r], __it_2[__c]) ? 1 : 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const auto __res = (__cmp(__it_1[__r], __it_2[__c]) ? 1 : 0); | |
const auto __res = __cmp(__it_1[__r], __it_2[__c]) ? 1 : 0; |
const auto __res = (__cmp(__it_1[__r], __it_2[__c]) ? 1 : 0); | ||
|
||
return __res < __val; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or the second variant:
retrun !__cmp(__it_1[__r], __it_2[__c]);
} | ||
|
||
//serial merge n elements, starting from input x and y, to [i, j) output range | ||
auto __res = __brick_merge_2(__it_1 + __r, __it_1 + __n_1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto __res = __brick_merge_2(__it_1 + __r, __it_1 + __n_1, | |
const auto __res = __brick_merge_2(__it_1 + __r, __it_1 + __n_1, |
45a9bef
to
bd19d40
Compare
@@ -663,17 +674,17 @@ struct __result_and_scratch_storage : __result_and_scratch_storage_base | |||
// Note: this member function assumes the result is *ready*, since the __future has already | |||
// waited on the relevant event. | |||
_T | |||
__get_value(size_t idx = 0) const | |||
__get_value() const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really shouldn't support more then one value in our code?
[oneDPL][ranges] support size limit for output for merge algorithm.
The change is according to https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3179r2.html#range_as_output
Update: Changes to draft status, causing faced to design issue, connected with different return types from the merge patterns -
__result_and_scratch_storage/__result_and_scratch_storage_base
. As an option - to have one common type of __result_and_scratch_storage for the all needs (ate least for pattern dpcpp merge patterns).Update 2: the issue mentioned above has been resolved.