-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize merge
algorithm for largest data sizes
#1933
Conversation
a6164fd
to
d4721ca
Compare
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge.h
Outdated
Show resolved
Hide resolved
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge.h
Outdated
Show resolved
Hide resolved
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge.h
Outdated
Show resolved
Hide resolved
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge.h
Outdated
Show resolved
Hide resolved
d8366c3
to
58eacbf
Compare
8f756f0
to
1b6cd34
Compare
…introduce new function __find_start_point_in Signed-off-by: Sergey Kopienko <[email protected]>
…introduce __parallel_merge_submitter_large for merge of biggest data sizes Signed-off-by: Sergey Kopienko <[email protected]>
…using __parallel_merge_submitter_large for merge data equal or greater then 4M items Signed-off-by: Sergey Kopienko <[email protected]>
Signed-off-by: Sergey Kopienko <[email protected]>
…fix compile error Signed-off-by: Sergey Kopienko <[email protected]>
…fix Kernel names Signed-off-by: Sergey Kopienko <[email protected]>
…rename template parameter names in __parallel_merge_submitter Signed-off-by: Sergey Kopienko <[email protected]>
Signed-off-by: Sergey Kopienko <[email protected]>
…fix review comment Signed-off-by: Sergey Kopienko <[email protected]>
…fix review comment Signed-off-by: Sergey Kopienko <[email protected]>
eebf508
to
400f695
Compare
…introduce __starting_size_limit_for_large_submitter into __parallel_merge Signed-off-by: Sergey Kopienko <[email protected]>
…fix warning: warning C4804: '<': unsafe use of type 'bool' in operation Signed-off-by: Sergey Kopienko <[email protected]>
…remove extra comments before __find_start_point_in function Signed-off-by: Sergey Kopienko <[email protected]>
8465a4b
to
2f9a357
Compare
I have fixed two moments in the So I propose to save May be make sense only one question here: do we prefer to rename |
Signed-off-by: Sergey Kopienko <[email protected]>
2f9a357
to
6dd8e51
Compare
…fix self-review comment: we should describe lambda here as mutable 1) for compatibility with previous implementation 2) because at https://en.cppreference.com/w/cpp/algorithm/merge (for example) we see that bool cmp(const Type1& a, const Type2& b); isn't const Signed-off-by: Sergey Kopienko <[email protected]>
Signed-off-by: Sergey Kopienko <[email protected]>
Signed-off-by: Sergey Kopienko <[email protected]>
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge.h
Outdated
Show resolved
Hide resolved
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge.h
Outdated
Show resolved
Hide resolved
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge.h
Outdated
Show resolved
Hide resolved
1bae441
to
e55ee66
Compare
Signed-off-by: Sergey Kopienko <[email protected]>
Signed-off-by: Sergey Kopienko <[email protected]>
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge.h
Outdated
Show resolved
Hide resolved
…fix review comment: constexpr bool kValue = false; has been removed Signed-off-by: Sergey Kopienko <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Great optimization work! Please wait for the final reviews from others as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I think it worked out nicely.
As mentioned by @mmichel11, I still think there are couple places we have room for more optimization with more effort but this is a big step in the right direction.
Thanks @SergeyKopienko !
: (__i_elem < __n ? __find_start_point(__rng1, _IdType{0}, __n1, __rng2, | ||
_IdType{0}, __n2, __i_elem, __comp) | ||
: _split_point_t<_IdType>{__n1, __n2}); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
? :
operator is used instead of if.. else
in my proposal. It is by performance reasons? I've just interesting in...
(if.. else
was more readable, IMHO)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of __serial_merge
on ternary operator gave us good profit. So I hope some small profit we may have in this place too. But I am not ready to prepare any precision data about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
In this PR we optimize
merge
algorithm for data sizes equal or greater then8M
items (16M
items forint
type).The main idea - we doing two submits:
submit
we find split point in some"base"
diagonal's subset.submit
we find split points in all other diagonal and runserial merge
for each diagonal (as before).But when we find split point on the current diagonal, we setup some indexes limits for
rng1
andrng2
: by this way we decrease the amount of reading data from source data ranges. For these limits we load split point's data from previous and next"base"
diagonals, calculated on the step (1).Applying this approach we have good perf profit for biggest data sizes with
float
andint
data types.Details:
Probably I should explain why I have create
and why I inherit
__result_and_scratch_storage
from__result_and_scratch_storage_base
:Let me explain the reason.
In the
__parallel_merge(oneapi::dpl::__internal::__device_backend_tag, ...)
function we checks in run-time some conditions to make decision which submitter we will call:__parallel_merge_submitter<std::uint32_t, ...>
__parallel_merge_submitter_large<std::uint32_t, ...>
__parallel_merge_submitter_large<std::uint64_t, ...>
In the cases (2) and (3) we should extend the life-time of different
__result_and_scratch_storage
instances which were created inside__result_and_scratch_storage<std::uint32_t>::operator()
or__result_and_scratch_storage<std::uint64_t>::operator()
:__result_and_scratch_storage
;__result_and_scratch_storage<std::uint32_t>
;__result_and_scratch_storage<std::uint64_t>
.So the
std::shared_ptr<__result_and_scratch_storage_base>
is some common interface for the both of them which can extend their life-time.About
virtual ~__result_and_scratch_storage_base() = default
- it's needed to properly destroy instances of inherited classes, which are really owned by shared-pointer instance in the cases (2) and (3). Withoutvirtual
here we will have memory-leaks.