-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance improvements of __pattern_any_of
, __pattern_find_if
#1622
Conversation
return std::get<0>(op1) ? op1 : op2; | ||
} | ||
}; | ||
|
||
template <typename _BackendTag, typename _ExecutionPolicy, typename _Iterator, typename _Pred> | ||
bool | ||
__pattern_any_of(__hetero_tag<_BackendTag>, _ExecutionPolicy&& __exec, _Iterator __first, _Iterator __last, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__pattern_any_of impl on __parallel_transform_reduce
@@ -715,16 +771,44 @@ _Iterator | |||
__pattern_find_if(__hetero_tag<_BackendTag>, _ExecutionPolicy&& __exec, _Iterator __first, _Iterator __last, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__pattern_find_if impl on __parallel_transform_reduce
return std::get<0>(op1) ? op1 : op2; | ||
} | ||
}; | ||
|
||
template <typename _BackendTag, typename _ExecutionPolicy, typename _Iterator, typename _Pred> | ||
bool | ||
__pattern_any_of(__hetero_tag<_BackendTag>, _ExecutionPolicy&& __exec, _Iterator __first, _Iterator __last, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SergeyKopienko did you check a performance gain after your previous perf improvement for oneapi::dpl::__par_backend_hetero::__parallel_find_or
pattern?
(#1617)
My question here is: which base pattern is better (with performance perspective) for __pattern_any_of
?
The improved __parallel_find_or
or __parallel_transform_reduce
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have measured this case.
The overall performance is the same for these two approaches.
But in details there are some differences.
return std::get<0>(op1) ? op1 : op2; | ||
} | ||
}; | ||
|
||
template <typename _BackendTag, typename _ExecutionPolicy, typename _Iterator, typename _Pred> | ||
bool | ||
__pattern_any_of(__hetero_tag<_BackendTag>, _ExecutionPolicy&& __exec, _Iterator __first, _Iterator __last, | ||
_Pred __pred) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With code design perspective I would introduce some new (oneDPL middle patterns level) range based pattern, which would recall range based oneapi::dpl::__par_backend_hetero::__parallel_transform_reduce
, and be shared between iterator based __pattern_any_of
and range based __pattern_any_of
.
return std::get<0>(op1) ? op1 : op2; | ||
} | ||
}; | ||
|
||
template <typename _BackendTag, typename _ExecutionPolicy, typename _Iterator, typename _Pred> | ||
bool | ||
__pattern_any_of(__hetero_tag<_BackendTag>, _ExecutionPolicy&& __exec, _Iterator __first, _Iterator __last, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, if we have an intention to use "reduction" we can use a functional approach, shorter, readable and easer to further support.
For, example "any_of" can be expressed via
reduce(data | transform([pred](auto x) { return pred(x);}), false, std::logical_or<bool>{});
Please, have a look: https://godbolt.org/z/csMfaE54h
// __counting_iterator_t - iterate position (index) in source data | ||
using __counting_iterator_t = oneapi::dpl::counting_iterator<_difference_type>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not following the reason to include the counting iterator and the rest of this that tracks the index of the output for __pattern_any_of
, which only returns the true
or false
information, and discards the index information. It seems we are adding lots of complexity and computation.
If we want to handle this with a transform_reduce, it seems like we can avoid the counting iterator, zipping, tuples, etc.
and just have a reduction operation std::logical_and
.
I should note that I'm not sure a reduction is what we want here necessarily. There may be some better "early exit" routine which checks for true
and short circuits remaining work if it encounters one. As discussed though, this would need to balance contention on some synchronization mechanism with the potential savings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, point, fixed.
f3bd4e9
to
716dfe3
Compare
3c9f8ea
to
b603c82
Compare
907fb29
to
57e7d25
Compare
57e7d25
to
f4f2141
Compare
…es __find_if_unary_transform_op, __find_if_binary_reduce_op
…on of __pattern_any_of on __parallel_transform_reduce
…on of __pattern_find_if on __parallel_transform_reduce
…on of __pattern_any_of on __parallel_transform_reduce - fix review comment: counting iterator not required. Signed-off-by: Sergey Kopienko <[email protected]>
f4f2141
to
f6d73e1
Compare
Our |
In this PR we made some performance improvements:
__pattern_any_of(__hetero_tag
has been implemented on__parallel_transform_reduce
;__pattern_find_if(__hetero_tag
has been implemented on__parallel_transform_reduce
(gave us the same results as without this change but with Performance improvements of__parallel_find_or
+__device_backend_tag
#1617).My experiments shows significant performance boost for subset of algorithms after these changes.