merge-sort: optimize leaf stage #1735
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge_sort.h
I've found that the changes result in an incorrect sort on NVIDIA GPUs. I am debugging it. Update: it has been fixed in d92c9de.
@dmitriy-sobolev, in this PR you have two calls of …
Thanks. I've fixed it.
@dmitriy-sobolev, please take a look at code like this:
const auto& __rng1 = oneapi::dpl::__ranges::drop_view_simple(__dst, __offset);
const auto& __rng2 = oneapi::dpl::__ranges::drop_view_simple(__dst, __offset + __n1);
It can be written as:
const oneapi::dpl::__ranges::drop_view_simple __rng1(__dst, __offset);
const oneapi::dpl::__ranges::drop_view_simple __rng2(__dst, __offset + __n1);
There are four places...
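Both forms in the comment above are valid C++: binding a `const auto&` to a temporary extends the temporary's lifetime to the scope of the reference, while direct initialization with class template argument deduction (C++17) constructs the object in place and names its type up front. The sketch below contrasts the two styles using `drop_view_sketch`, a hypothetical, simplified stand-in for `oneapi::dpl::__ranges::drop_view_simple` (the real class is not shown in this thread and may differ).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified "skip the first N elements" view.
// NOT oneapi::dpl::__ranges::drop_view_simple, only an analogue for illustration.
template <typename Range>
struct drop_view_sketch
{
    Range& base;        // underlying range (not owned)
    std::size_t offset; // number of leading elements skipped

    drop_view_sketch(Range& r, std::size_t n) : base(r), offset(n) { assert(n <= r.size()); }

    std::size_t size() const { return base.size() - offset; }
    auto& operator[](std::size_t i) const { return base[offset + i]; }
};

// Style A: bind a const reference to a temporary view.
// Legal, because the temporary's lifetime is extended to that of `rng`.
int first_of_tail_ref(std::vector<int>& dst, std::size_t offset)
{
    const auto& rng = drop_view_sketch(dst, offset);
    return rng[0];
}

// Style B (as suggested in the review): construct the view directly.
// CTAD deduces Range from the constructor arguments; no reference is involved.
int first_of_tail_direct(std::vector<int>& dst, std::size_t offset)
{
    const drop_view_sketch rng(dst, offset);
    return rng[0];
}
```

Both functions behave identically; the direct form is simply shorter and avoids relying on lifetime extension, which readers must otherwise reason about.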
Done
Signed-off-by: Dmitriy Sobolev
LGTM, although you may want to check with other reviewers before merging.
LGTM
Is it critical for performance that data-per-workitem is a template parameter of …
No, it is not. As far as I remember, the performance impact is marginal. It is probably better to get rid of it for now.
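The exchange above is a compile-time versus run-time parameter trade-off. A minimal host-side sketch of the two designs follows, assuming that `leaf_sort_ct` and `leaf_sort_rt` (hypothetical names, not oneDPL functions) stand in for the leaf stage, that each chunk models one work-item's share, and that each chunk is sorted with `std::stable_sort`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical leaf-stage helpers for illustration only; not oneDPL code.
// Each chunk of consecutive elements models the share of one work-item.

// Design 1: data-per-workitem is a template parameter. The compiler sees a
// constant (which may allow loop unrolling), but every distinct value is a
// separate instantiation that has to be compiled.
template <std::size_t DataPerWorkItem, typename T>
void leaf_sort_ct(std::vector<T>& data)
{
    static_assert(DataPerWorkItem > 0, "chunk size must be positive");
    for (std::size_t start = 0; start < data.size(); start += DataPerWorkItem)
    {
        const std::size_t end = std::min(start + DataPerWorkItem, data.size());
        std::stable_sort(data.begin() + start, data.begin() + end);
    }
}

// Design 2: data-per-workitem is a run-time argument. A single instantiation
// serves every chunk size, and the value can be chosen at run time.
template <typename T>
void leaf_sort_rt(std::vector<T>& data, std::size_t data_per_workitem)
{
    assert(data_per_workitem > 0);
    for (std::size_t start = 0; start < data.size(); start += data_per_workitem)
    {
        const std::size_t end = std::min(start + data_per_workitem, data.size());
        std::stable_sort(data.begin() + start, data.begin() + end);
    }
}
```

Both variants produce the same result. Given the reply that the measured benefit of the template parameter was marginal, the run-time form is the simpler choice.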
LGTM
LGTM
Process more items per work-item at the leaf-sort stage. This substantially improves performance on GPU devices.
The proposed implementation is the most generic one: it sorts the data stably and relies only on the move-assignability and move-constructibility of the type, as required by the sort algorithms. It can be specialized in the future.
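The leaf stage described above can be illustrated with a sequential host-side sketch. It is not the oneDPL kernel: `stable_insertion_sort`, `leaf_sort`, `merge_passes`, and `stable_merge_sort_sketch` are hypothetical names, each chunk of `data_per_workitem` elements stands in for one work-item's share, and insertion sort is an assumed choice of leaf algorithm that needs only move construction, move assignment, and the comparator. Because each bottom-up merge pass doubles the sorted run length, starting from runs of `data_per_workitem` elements instead of single elements removes roughly log2(`data_per_workitem`) merge passes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Stable insertion sort of [first, last). Needs only move construction, move
// assignment and the comparator. An element is shifted only past neighbours
// that compare strictly greater, so equal elements keep their relative order.
template <typename It, typename Compare>
void stable_insertion_sort(It first, It last, Compare comp)
{
    if (first == last)
        return;
    for (It i = std::next(first); i != last; ++i)
    {
        auto tmp = std::move(*i); // move construction
        It j = i;
        while (j != first && comp(tmp, *std::prev(j)))
        {
            *j = std::move(*std::prev(j)); // move assignment
            --j;
        }
        *j = std::move(tmp); // move assignment
    }
}

// Hypothetical leaf stage: each chunk of data_per_workitem consecutive elements
// models one work-item's share. On a GPU the chunks would be sorted in
// parallel; here they are processed one after another.
template <typename T, typename Compare>
void leaf_sort(std::vector<T>& data, std::size_t data_per_workitem, Compare comp)
{
    assert(data_per_workitem > 0);
    for (std::size_t start = 0; start < data.size(); start += data_per_workitem)
    {
        const std::size_t end = std::min(start + data_per_workitem, data.size());
        stable_insertion_sort(data.begin() + start, data.begin() + end, comp);
    }
}

// Bottom-up merge passes over sorted runs of length `run`. Each pass doubles
// the run length. Returns the number of passes performed.
template <typename T, typename Compare>
std::size_t merge_passes(std::vector<T>& data, std::size_t run, Compare comp)
{
    assert(run > 0);
    const std::size_t n = data.size();
    std::size_t passes = 0;
    for (; run < n; run *= 2, ++passes)
    {
        for (std::size_t start = 0; start + run < n; start += 2 * run)
        {
            const auto first = data.begin() + start;
            const auto mid = first + run;
            const auto last = data.begin() + std::min(start + 2 * run, n);
            std::vector<T> buf;
            buf.reserve(static_cast<std::size_t>(last - first));
            // std::merge is stable: on ties, elements of the left run come first.
            std::merge(std::make_move_iterator(first), std::make_move_iterator(mid),
                       std::make_move_iterator(mid), std::make_move_iterator(last),
                       std::back_inserter(buf), comp);
            std::move(buf.begin(), buf.end(), first);
        }
    }
    return passes;
}

// Leaf stage followed by merge passes. Returns the number of merge passes.
template <typename T, typename Compare>
std::size_t stable_merge_sort_sketch(std::vector<T>& data, std::size_t data_per_workitem, Compare comp)
{
    leaf_sort(data, data_per_workitem, comp);
    return merge_passes(data, data_per_workitem, comp);
}
```

For example, with 10 elements and `data_per_workitem = 4`, two merge passes remain (runs of 4, then 8, then 10), compared with four passes when the leaf stage sorts single elements.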