Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge-sort: optimize leaf stage #1735

Merged
merged 38 commits into from
Aug 12, 2024

Conversation

dmitriy-sobolev
Copy link
Contributor

@dmitriy-sobolev dmitriy-sobolev commented Jul 29, 2024

Process more items at the leaf-sort stage. It substantially improves the performance on GPU devices.

The proposed implementation is the most genetic since it sorts the data stably, and relies only on the move-assignability and move-copyability of the type (according to the sort algorithm requirements). It can be specialized in the future.

@dmitriy-sobolev
Copy link
Contributor Author

dmitriy-sobolev commented Aug 2, 2024

I've found that the changes result in incorrect sort on Nvidia GPUs. I am debugging it.

Upd. It's been fixed in d92c9de.

@SergeyKopienko
Copy link
Contributor

SergeyKopienko commented Aug 5, 2024

@dmitriy-sobolev , in this PR you have two calls of depends_on : mandatory required to capture their params by value (now it captured by reference so it's will work incorrectly).
UPD: already fixed.

@dmitriy-sobolev
Copy link
Contributor Author

@dmitriy-sobolev , in this PR you have two calls of depends_on : mandatory required to capture their params by value (now it captured by reference so it's will work incorrectly).

Thanks. I've fixed it.

@SergeyKopienko
Copy link
Contributor

SergeyKopienko commented Aug 6, 2024

@dmitriy-sobolev please take a look to the code like this:

                            const auto& __rng1 = oneapi::dpl::__ranges::drop_view_simple(__dst, __offset);
                            const auto& __rng2 = oneapi::dpl::__ranges::drop_view_simple(__dst, __offset + __n1);
  1. & isn't required here;
  2. Really we create the new instance of drop_view_simple in this code, so I propose to rewrite it:
                            const oneapi::dpl::__ranges::drop_view_simple __rng1(__dst, __offset);
                            const oneapi::dpl::__ranges::drop_view_simple __rng2(__dst, __offset + __n1);

There are four places...

@dmitriy-sobolev
Copy link
Contributor Author

@dmitriy-sobolev please take a look to the code like this:

                            const auto& __rng1 = oneapi::dpl::__ranges::drop_view_simple(__dst, __offset);
                            const auto& __rng2 = oneapi::dpl::__ranges::drop_view_simple(__dst, __offset + __n1);
  1. & isn't required here;
  2. Really we create the new instance of drop_view_simple in this code, so I propose to rewrite it:
                            const oneapi::dpl::__ranges::drop_view_simple __rng1(__dst, __offset);
                            const oneapi::dpl::__ranges::drop_view_simple __rng2(__dst, __offset + __n1);

There are four places...

Done

Signed-off-by: Dmitriy Sobolev <[email protected]>
Signed-off-by: Dmitriy Sobolev <[email protected]>
Signed-off-by: Dmitriy Sobolev <[email protected]>
Signed-off-by: Dmitriy Sobolev <[email protected]>
Signed-off-by: Dmitriy Sobolev <[email protected]>
Signed-off-by: Dmitriy Sobolev <[email protected]>
Signed-off-by: Dmitriy Sobolev <[email protected]>
Signed-off-by: Dmitriy Sobolev <[email protected]>
@dmitriy-sobolev dmitriy-sobolev force-pushed the dev/dmitriy-sobolev/merge-sort-leaf branch from 546fdb5 to 5078687 Compare August 7, 2024 11:03
Signed-off-by: Dmitriy Sobolev <[email protected]>
Signed-off-by: Dmitriy Sobolev <[email protected]>
@dmitriy-sobolev dmitriy-sobolev force-pushed the dev/dmitriy-sobolev/merge-sort-leaf branch from 4e0dfd0 to 4efce9c Compare August 7, 2024 11:22
Signed-off-by: Dmitriy Sobolev <[email protected]>
Signed-off-by: Dmitriy Sobolev <[email protected]>
mmichel11
mmichel11 previously approved these changes Aug 8, 2024
Copy link
Contributor

@mmichel11 mmichel11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, although you may want to check with other reviewers before merging

Signed-off-by: Dmitriy Sobolev <[email protected]>
SergeyKopienko
SergeyKopienko previously approved these changes Aug 12, 2024
Copy link
Contributor

@SergeyKopienko SergeyKopienko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@akukanov
Copy link
Contributor

Is it critical for performance that data-per-workitem is a template parameter of __leaf_sorter and __group_merge_path_sorter, instead of a runtime parameter? I am concerned about the proliferation of kernel instantiations, and the only reason why leaf sorters have different C++ type is that parameter.

@dmitriy-sobolev
Copy link
Contributor Author

dmitriy-sobolev commented Aug 12, 2024

Is it critical for performance that data-per-workitem is a template parameter of __leaf_sorter and __group_merge_path_sorter, instead of a runtime parameter? I am concerned about the proliferation of kernel instantiations, and the only reason why leaf sorters have different C++ type is that parameter.

No it is not. As far as I remember the performance impact is marginal. Probably, it is better to get rid of it for now.

Signed-off-by: Dmitriy Sobolev <[email protected]>
Signed-off-by: Dmitriy Sobolev <[email protected]>
Copy link
Contributor

@SergeyKopienko SergeyKopienko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@mmichel11 mmichel11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dmitriy-sobolev dmitriy-sobolev merged commit 7597d5c into main Aug 12, 2024
20 of 21 checks passed
@dmitriy-sobolev dmitriy-sobolev deleted the dev/dmitriy-sobolev/merge-sort-leaf branch August 12, 2024 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants