Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements of __pattern_any_of, __pattern_find_if #1622

Closed
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
112 changes: 96 additions & 16 deletions include/oneapi/dpl/pstl/hetero/algorithm_impl_hetero.h
Original file line number Diff line number Diff line change
Expand Up @@ -634,24 +634,45 @@ __pattern_count(__hetero_tag<_BackendTag>, _ExecutionPolicy&& __exec, _Iterator
// any_of
//------------------------------------------------------------------------

struct __any_of_binary_reduce_op
{
bool
operator()(bool op1, bool op2) const
{
return op1 || op2;
}
};

template <typename _BackendTag, typename _ExecutionPolicy, typename _Iterator, typename _Pred>
bool
__pattern_any_of(__hetero_tag<_BackendTag>, _ExecutionPolicy&& __exec, _Iterator __first, _Iterator __last,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__pattern_any_of impl on __parallel_transform_reduce

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SergeyKopienko did you check a performance gain after your previous perf improvement for oneapi::dpl::__par_backend_hetero::__parallel_find_or pattern?
(#1617)

My question here is: which base pattern is better (with performance perspective) for __pattern_any_of?
The improved __parallel_find_or or __parallel_transform_reduce?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have measured this case.
The overall performance is the same for these two approaches.
But in details there are some differences.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, if we have an intention to use "reduction" we can use a functional approach, shorter, readable and easer to further support.
For, example "any_of" can be expressed via
reduce(data | transform([pred](auto x) { return pred(x);}), false, std::logical_or<bool>{});
Please, have a look: https://godbolt.org/z/csMfaE54h

_Pred __pred)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With code design perspective I would introduce some new (oneDPL middle patterns level) range based pattern, which would recall range based oneapi::dpl::__par_backend_hetero::__parallel_transform_reduce, and be shared between iterator based __pattern_any_of and range based __pattern_any_of.

if (__first == __last)
using _difference_type = typename ::std::iterator_traits<_Iterator>::difference_type;

const _difference_type __n = __last - __first;
if (__n == 0)
return false;

using _Predicate = oneapi::dpl::unseq_backend::single_match_pred<_ExecutionPolicy, _Pred>;
using _data_type = typename ::std::iterator_traits<_Iterator>::value_type;

auto __keep = oneapi::dpl::__ranges::__get_sycl_range<__par_backend_hetero::access_mode::read, _Iterator>();
auto __buf = __keep(__first, __last);
using _result_type = oneapi::dpl::__internal::tuple<bool, _difference_type>;
const bool __init = false;

return oneapi::dpl::__par_backend_hetero::__parallel_find_or(
_BackendTag{},
__par_backend_hetero::make_wrapped_policy<__par_backend_hetero::__or_policy_wrapper>(
::std::forward<_ExecutionPolicy>(__exec)),
_Predicate{__pred}, __par_backend_hetero::__parallel_or_tag{}, __buf.all_view());
__any_of_binary_reduce_op __reduce_op;

using _Functor = unseq_backend::walk_n<_ExecutionPolicy, decltype(__pred)>;
using _RepackedTp = __par_backend_hetero::__repacked_tuple_t<_result_type>;

auto __keep_src_data =
oneapi::dpl::__ranges::__get_sycl_range<__par_backend_hetero::access_mode::read, _Iterator>();
auto __buf_src_data = __keep_src_data(__first, __last);

return oneapi::dpl::__par_backend_hetero::__parallel_transform_reduce<bool, std::true_type /*is_commutative*/>(
_BackendTag{}, std::forward<_ExecutionPolicy>(__exec), __reduce_op, _Functor{__pred},
unseq_backend::__init_value<bool>{__init}, // initial value
__buf_src_data.all_view())
.get();
}

//------------------------------------------------------------------------
Expand Down Expand Up @@ -698,21 +719,80 @@ __pattern_equal(__hetero_tag<_BackendTag> __tag, _ExecutionPolicy&& __exec, _Ite
// find_if
//------------------------------------------------------------------------

template <typename _Tuple, typename _UnaryTransformOp>
struct __find_if_unary_transform_op
{
_UnaryTransformOp __transform_op;

template <typename Arg>
_Tuple
operator()(const Arg& arg) const
{
return {__transform_op(std::get<0>(arg)), std::get<1>(arg)};
}
};

template <typename _Tuple, typename _IsFirst>
struct __find_if_binary_reduce_op
{
_Tuple
operator()(const _Tuple& op1, const _Tuple& op2) const
{
if (std::get<0>(op1) && std::get<0>(op2))
{
if constexpr (_IsFirst{})
return {true, std::min(std::get<1>(op1), std::get<1>(op2))};
else
return {true, std::max(std::get<1>(op1), std::get<1>(op2))};
}

return std::get<0>(op1) ? op1 : op2;
}
};

template <typename _BackendTag, typename _ExecutionPolicy, typename _Iterator, typename _Pred>
_Iterator
__pattern_find_if(__hetero_tag<_BackendTag>, _ExecutionPolicy&& __exec, _Iterator __first, _Iterator __last,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__pattern_find_if impl on __parallel_transform_reduce

_Pred __pred)
{
if (__first == __last)
using _difference_type = typename ::std::iterator_traits<_Iterator>::difference_type;

const _difference_type __n = __last - __first;
if (__n == 0)
return __last;

using _Predicate = oneapi::dpl::unseq_backend::single_match_pred<_ExecutionPolicy, _Pred>;
using _result_type = oneapi::dpl::__internal::tuple<bool, _difference_type>;
const auto __init = _result_type{false, __n};

return __par_backend_hetero::__parallel_find(
_BackendTag{}, ::std::forward<_ExecutionPolicy>(__exec),
__par_backend_hetero::make_iter_mode<__par_backend_hetero::access_mode::read>(__first),
__par_backend_hetero::make_iter_mode<__par_backend_hetero::access_mode::read>(__last), _Predicate{__pred},
::std::true_type{});
// __counting_iterator_t - iterate position (index) in source data
using __counting_iterator_t = oneapi::dpl::counting_iterator<_difference_type>;

using _zipped_data_type = typename std::iterator_traits<decltype(oneapi::dpl::make_zip_iterator(
__first, __counting_iterator_t{0}))>::value_type;

__find_if_binary_reduce_op<_zipped_data_type, /*_IsFirst*/ std::true_type> __reduce_op;
__find_if_unary_transform_op<_zipped_data_type, _Pred> __transform_op{__pred};

using _Functor = unseq_backend::walk_n<_ExecutionPolicy, decltype(__transform_op)>;
using _RepackedTp = __par_backend_hetero::__repacked_tuple_t<_result_type>;

auto __keep_src_data =
oneapi::dpl::__ranges::__get_sycl_range<__par_backend_hetero::access_mode::read, _Iterator>();
auto __buf_src_data = __keep_src_data(__first, __last);

const __counting_iterator_t __counting_it_first{0}, __counting_it_last{__n};
auto __keep_counting_it =
oneapi::dpl::__ranges::__get_sycl_range<__par_backend_hetero::access_mode::read, __counting_iterator_t>();
auto __buf_counting_it = __keep_counting_it(__counting_it_first, __counting_it_last);

auto res =
oneapi::dpl::__par_backend_hetero::__parallel_transform_reduce<_RepackedTp, std::true_type /*is_commutative*/>(
_BackendTag{}, std::forward<_ExecutionPolicy>(__exec), __reduce_op, _Functor{__transform_op},
unseq_backend::__init_value<_RepackedTp>{__init}, // initial value
oneapi::dpl::__ranges::make_zip_view(__buf_src_data.all_view(), __buf_counting_it.all_view()))
.get();

return std::get<0>(res) ? __first + std::get<1>(res) : __last;
}

//------------------------------------------------------------------------
Expand Down
Loading