[Perf] Break loop in __early_exit_find_or after first found element #1644
… break for forward and backward search into __early_exit_find_or Signed-off-by: Sergey Kopienko <[email protected]>
…view comment: call break only once Signed-off-by: Sergey Kopienko <[email protected]>
…thing_was_found Signed-off-by: Sergey Kopienko <[email protected]>
…e compare_exchange_strong result Signed-off-by: Sergey Kopienko <[email protected]>
…- __something_was_found" This reverts commit 3538d15.
Let's have a conversation offline if any of these comments don't make sense. I think what I'm proposing should still be correct and has the opportunity for performance improvement, but it's worth discussing if you see any problems.
…onal comments Signed-off-by: Sergey Kopienko <[email protected]>
…fetch_min and fetch_max
…uce __found_local_state variable Signed-off-by: Sergey Kopienko <[email protected]>
…ko/insert_missed_breaks_into_parallel_find_or
…onal comments Signed-off-by: Sergey Kopienko <[email protected]>
…view comments: additional comments for the break. Signed-off-by: Sergey Kopienko <[email protected]>
…onal comments Signed-off-by: Sergey Kopienko <[email protected]>
…uce __found_local_state variable to reduce the amount of operations with atomic Signed-off-by: Sergey Kopienko <[email protected]>
Co-authored-by: Julian Miller <[email protected]>
Co-authored-by: Julian Miller <[email protected]>
Co-authored-by: Julian Miller <[email protected]>
Co-authored-by: Julian Miller <[email protected]>
Co-authored-by: Julian Miller <[email protected]>
@danhoeflinger, @julianmi what do you think, are we ready to merge this PR?
LGTM (wait for green CI)
…rformance issue in some cases for middle data sizes Signed-off-by: Sergey Kopienko <[email protected]>
e64dab7
…rformance issue in some cases for middle data sizes Signed-off-by: Sergey Kopienko <[email protected]>
…GitHUB clang format Signed-off-by: Sergey Kopienko <[email protected]>
441659c to 343b61c
LGTM. However, I'd like to understand why this is faster than break statements. There might be potential performance benefits in other code parts where we use break statements in case this is a general issue.
I have just found one place with a break pattern in kernel code and am going to investigate it separately.
LGTM. It seems fine from a correctness perspective. It should simplify when we make the switch to fetch_min / fetch_max.
My guess about the break statements is that they make thread divergence within a subgroup harder to handle. It does seem, though, that the compiler should be able to generate code for a break; that performs as well as a loop exit flag.
In this PR we break out of the for-loop inside `__early_exit_find_or` after the first found element.
The approach from PR #1624 is applicable to the first/last element search.
This approach gives us a good performance boost.
Technically, this break is implemented through an additional local variable that is checked in the continue-condition of the for-loop, because this form showed better performance results.
Also in this PR, we analyze the result of the `compare_exchange_strong()` call and break the loop if the value of the atomic has been changed correctly, to avoid extra atomic `load()` calls.
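The two ideas in this description can be illustrated with a minimal host-side sketch. This is not the oneDPL code: `store_min_sketch`, `scan_chunk_sketch`, and `find_first_sketch` are hypothetical names, the "work-items" are plain sequential chunks, and plain `std::atomic` stands in for the device atomic. It only shows the shape of the loop-exit flag and of reusing the `compare_exchange_strong()` result instead of issuing an extra `load()`.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical helper: lower `found` to `candidate` if `candidate` is smaller.
// A failed compare_exchange_strong() writes the current value into `expected`,
// so the loop never needs a separate load() after the first one.
inline void store_min_sketch(std::atomic<std::size_t>& found, std::size_t candidate)
{
    std::size_t expected = found.load();
    bool updated = false;
    // Exit flag in the continue-condition instead of a break statement.
    while (!updated && candidate < expected)
    {
        updated = found.compare_exchange_strong(expected, candidate);
    }
}

// Hypothetical stand-in for one work-item: scan [begin, end) and stop
// after the first match, again via a local flag in the loop condition.
inline void scan_chunk_sketch(const std::vector<int>& data, std::size_t begin, std::size_t end,
                              int target, std::atomic<std::size_t>& found)
{
    bool done = false;
    for (std::size_t i = begin; !done && i < end; ++i)
    {
        if (data[i] == target)
        {
            store_min_sketch(found, i);
            done = true; // later elements of this chunk cannot improve the result
        }
    }
}

// Returns the index of the first element equal to `target`, or data.size().
// Assumes chunk_size > 0. Chunks are visited in reverse order on purpose,
// to show that the result does not depend on which chunk reports first.
inline std::size_t find_first_sketch(const std::vector<int>& data, int target, std::size_t chunk_size)
{
    const std::size_t n = data.size();
    std::atomic<std::size_t> found{n};
    const std::size_t num_chunks = (n + chunk_size - 1) / chunk_size;
    for (std::size_t c = num_chunks; c-- > 0;)
    {
        const std::size_t begin = c * chunk_size;
        const std::size_t end = std::min(begin + chunk_size, n);
        scan_chunk_sketch(data, begin, end, target, found);
    }
    return found.load();
}
```

For a backward (last element) search the same structure would apply with the comparison reversed and the chunk scanned from its end; that variant is not shown here.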