Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tune the amount of groups in __parallel_find_or pattern #1723

Merged
merged 40 commits into from
Aug 12, 2024

Conversation

SergeyKopienko
Copy link
Contributor

@SergeyKopienko SergeyKopienko commented Jul 23, 2024

In this PR we tune the amount of groups in __parallel_find_or pattern.
This approach give us some performance boost on bigger data sizes.

@SergeyKopienko SergeyKopienko marked this pull request as ready for review July 23, 2024 12:08
@SergeyKopienko SergeyKopienko added this to the 2022.7.0 milestone Jul 23, 2024
Copy link
Contributor

@adamfidel adamfidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions:

  • It is stated that the performance is better for larger input sizes. Does this have any affect on smaller input sizes?
  • For which devices do we see a performance benefit?

@SergeyKopienko SergeyKopienko requested a review from adamfidel July 24, 2024 07:39
Copy link
Contributor

@julianmi julianmi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a simpler tuning approach you might want to try is to fix __iters_per_work_item to powers of two based on the input sizes. This might generate more optimized code and would remove the need for the complex approach of calculating __n_groups.

@SergeyKopienko SergeyKopienko requested a review from julianmi July 24, 2024 10:20
@SergeyKopienko SergeyKopienko force-pushed the dev/skopienko/tune_n_groups branch from b1544cd to 50e4789 Compare July 25, 2024 10:06
@SergeyKopienko
Copy link
Contributor Author

SergeyKopienko commented Jul 25, 2024

@julianmi, @danhoeflinger, @adamfidel implementation has been updated.

  • I wrote calculation of __required_iters_per_work_item in __parallel_find_or_n_groups_tuner<oneapi::dpl::__internal::__device_backend_tag>::operator() in the formula form.

We still have good perf profit for a lot of sizes.

@SergeyKopienko SergeyKopienko removed this from the 2022.7.0 milestone Jul 30, 2024
@SergeyKopienko SergeyKopienko marked this pull request as draft July 30, 2024 16:15
@SergeyKopienko SergeyKopienko force-pushed the dev/skopienko/tune_n_groups branch 8 times, most recently from 6130b17 to 44bcf79 Compare August 7, 2024 13:37
@SergeyKopienko SergeyKopienko marked this pull request as ready for review August 8, 2024 11:09
@SergeyKopienko SergeyKopienko added this to the 2022.7.0 milestone Aug 8, 2024
@SergeyKopienko
Copy link
Contributor Author

@danhoeflinger, @julianmi, @adamfidel Could you please take a look again?

…iters_per_work_item > 1

Signed-off-by: Sergey Kopienko <[email protected]>
@SergeyKopienko SergeyKopienko force-pushed the dev/skopienko/tune_n_groups branch from aedac45 to e8f59e8 Compare August 9, 2024 08:18
SergeyKopienko and others added 7 commits August 9, 2024 15:05
Signed-off-by: Sergey Kopienko <[email protected]>
…ve loop in __parallel_find_or_nd_range_tuner<oneapi::dpl::__internal::__device_backend_tag>::operator()
@SergeyKopienko SergeyKopienko force-pushed the dev/skopienko/tune_n_groups branch from 5d266b3 to 4922f46 Compare August 9, 2024 16:10
Signed-off-by: Sergey Kopienko <[email protected]>
Signed-off-by: Sergey Kopienko <[email protected]>
Copy link
Contributor

@danhoeflinger danhoeflinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SergeyKopienko SergeyKopienko merged commit 52efa5b into main Aug 12, 2024
20 of 21 checks passed
@SergeyKopienko SergeyKopienko deleted the dev/skopienko/tune_n_groups branch August 12, 2024 12:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants