Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Perf] Break loop in __early_exit_find_or after first found element #1644

Merged

Conversation

SergeyKopienko
Copy link
Contributor

@SergeyKopienko SergeyKopienko commented Jun 26, 2024

In this PR we implement break of for-loop inside __early_exit_find_or after first found element.
The approach from the PR #1624 is applicable for the first/last element search.
This approach gives us a good performance boost.
Technically this break call implemented thought additional local variable with check inside of continue-condition in for-loop due performance results.

Also in this PR, we analyze the result of compare_exchange_strong() call and break loop if the value of atomic has been changed correctly to avoid extra atomic value load() calls.

… break for forward and backward search into __early_exit_find_or

Signed-off-by: Sergey Kopienko <[email protected]>
…view comment: call break only once

Signed-off-by: Sergey Kopienko <[email protected]>
…e compare_exchange_strong result

Signed-off-by: Sergey Kopienko <[email protected]>
Copy link
Contributor

@danhoeflinger danhoeflinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets have a conversation offline if any of these comments don't make sense. I think what I'm proposing should still be correct, and has the opportunity for performance improvement, but its worth discussing if you see any problems.

@SergeyKopienko SergeyKopienko requested a review from julianmi June 26, 2024 13:32
…/skopienko/insert_missed_breaks_into_parallel_find_or"

This reverts commit 42eb498, reversing
changes made to 39a7b6e.
…view comments: additional comments for the break.

Signed-off-by: Sergey Kopienko <[email protected]>
@SergeyKopienko SergeyKopienko requested a review from julianmi June 27, 2024 11:32
…uce __found_local_state variable to reduce the amount of operations with atomic

Signed-off-by: Sergey Kopienko <[email protected]>
@SergeyKopienko
Copy link
Contributor Author

@danhoeflinger, @julianmi how do you think, are we ready to merge this PR ?

danhoeflinger
danhoeflinger previously approved these changes Jun 27, 2024
Copy link
Contributor

@danhoeflinger danhoeflinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (wait for green CI)

julianmi
julianmi previously approved these changes Jun 27, 2024
…rformance issue in some cases for middle data sizes

Signed-off-by: Sergey Kopienko <[email protected]>
@SergeyKopienko SergeyKopienko dismissed stale reviews from julianmi and danhoeflinger via e64dab7 June 28, 2024 09:18
…rformance issue in some cases for middle data sizes

Signed-off-by: Sergey Kopienko <[email protected]>
@SergeyKopienko SergeyKopienko force-pushed the dev/skopienko/insert_missed_breaks_into_parallel_find_or branch from 441659c to 343b61c Compare June 28, 2024 09:41
@SergeyKopienko SergeyKopienko requested review from julianmi and danhoeflinger and removed request for dmitriy-sobolev June 28, 2024 09:56
Copy link
Contributor

@julianmi julianmi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. However, I'd like to understand why this is faster than break statements. There might be potential performance benefits in other code parts where we use break statements in case this is a general issue.

@SergeyKopienko
Copy link
Contributor Author

LGTM. However, I'd like to understand why this is faster than break statements. There might be potential performance benefits in other code parts where we use break statements in case this is a general issue.

Just now I found one place with break pattern in Kernel code and going to investigate this place separatelly.
It's inside struct multiple_match_pred

Copy link
Contributor

@danhoeflinger danhoeflinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, It seems fine from a correctness perspective. It should simplify when we make the switch to fetch_min / fetch_max.

@danhoeflinger
Copy link
Contributor

My guess about the break statements is that it is more difficult for thread divergence within a subgroup. It does seem like the compiler should be able to generate code for a break; which is equivalently performant as a loop exit flag though.
It would be interesting to boil down to a very simple example and examine the resulting generated code.

@SergeyKopienko SergeyKopienko merged commit e1fee64 into main Jun 28, 2024
20 checks passed
@SergeyKopienko SergeyKopienko deleted the dev/skopienko/insert_missed_breaks_into_parallel_find_or branch June 28, 2024 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants