Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements of __pattern_any_of, __pattern_find_if #1622

Closed

Conversation

SergeyKopienko
Copy link
Contributor

@SergeyKopienko SergeyKopienko commented Jun 11, 2024

In this PR we made some performance improvements:

My experiments shows significant performance boost for subset of algorithms after these changes.

return std::get<0>(op1) ? op1 : op2;
}
};

template <typename _BackendTag, typename _ExecutionPolicy, typename _Iterator, typename _Pred>
bool
__pattern_any_of(__hetero_tag<_BackendTag>, _ExecutionPolicy&& __exec, _Iterator __first, _Iterator __last,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__pattern_any_of impl on __parallel_transform_reduce

@@ -715,16 +771,44 @@ _Iterator
__pattern_find_if(__hetero_tag<_BackendTag>, _ExecutionPolicy&& __exec, _Iterator __first, _Iterator __last,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__pattern_find_if impl on __parallel_transform_reduce

return std::get<0>(op1) ? op1 : op2;
}
};

template <typename _BackendTag, typename _ExecutionPolicy, typename _Iterator, typename _Pred>
bool
__pattern_any_of(__hetero_tag<_BackendTag>, _ExecutionPolicy&& __exec, _Iterator __first, _Iterator __last,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SergeyKopienko did you check a performance gain after your previous perf improvement for oneapi::dpl::__par_backend_hetero::__parallel_find_or pattern?
(#1617)

My question here is: which base pattern is better (with performance perspective) for __pattern_any_of?
The improved __parallel_find_or or __parallel_transform_reduce?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have measured this case.
The overall performance is the same for these two approaches.
But in details there are some differences.

return std::get<0>(op1) ? op1 : op2;
}
};

template <typename _BackendTag, typename _ExecutionPolicy, typename _Iterator, typename _Pred>
bool
__pattern_any_of(__hetero_tag<_BackendTag>, _ExecutionPolicy&& __exec, _Iterator __first, _Iterator __last,
_Pred __pred)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With code design perspective I would introduce some new (oneDPL middle patterns level) range based pattern, which would recall range based oneapi::dpl::__par_backend_hetero::__parallel_transform_reduce, and be shared between iterator based __pattern_any_of and range based __pattern_any_of.

return std::get<0>(op1) ? op1 : op2;
}
};

template <typename _BackendTag, typename _ExecutionPolicy, typename _Iterator, typename _Pred>
bool
__pattern_any_of(__hetero_tag<_BackendTag>, _ExecutionPolicy&& __exec, _Iterator __first, _Iterator __last,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, if we have an intention to use "reduction" we can use a functional approach, shorter, readable and easer to further support.
For, example "any_of" can be expressed via
reduce(data | transform([pred](auto x) { return pred(x);}), false, std::logical_or<bool>{});
Please, have a look: https://godbolt.org/z/csMfaE54h

Comment on lines 694 to 695
// __counting_iterator_t - iterate position (index) in source data
using __counting_iterator_t = oneapi::dpl::counting_iterator<_difference_type>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following the reason to include the counting iterator and the rest of this that tracks the index of the output for __pattern_any_of, which only returns the true or false information, and discards the index information. It seems we are adding lots of complexity and computation.
If we want to handle this with a transform_reduce, it seems like we can avoid the counting iterator, zipping, tuples, etc.
and just have a reduction operation std::logical_and.

I should note that I'm not sure a reduction is what we want here necessarily. There may be some better "early exit" routine which checks for true and short circuits remaining work if it encounters one. As discussed though, this would need to balance contention on some synchronization mechanism with the potential savings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, point, fixed.

@SergeyKopienko SergeyKopienko force-pushed the dev/skopienko/patterns_on_transform_reduce branch from f3bd4e9 to 716dfe3 Compare June 19, 2024 13:10
@SergeyKopienko SergeyKopienko force-pushed the dev/skopienko/patterns_on_transform_reduce branch 2 times, most recently from 3c9f8ea to b603c82 Compare July 10, 2024 13:07
@SergeyKopienko SergeyKopienko marked this pull request as draft July 10, 2024 15:39
@SergeyKopienko SergeyKopienko removed this from the 2022.7.0 milestone Jul 10, 2024
@SergeyKopienko SergeyKopienko force-pushed the dev/skopienko/patterns_on_transform_reduce branch 2 times, most recently from 907fb29 to 57e7d25 Compare July 12, 2024 14:50
@SergeyKopienko SergeyKopienko force-pushed the dev/skopienko/patterns_on_transform_reduce branch from 57e7d25 to f4f2141 Compare July 25, 2024 12:29
…es __find_if_unary_transform_op, __find_if_binary_reduce_op
…on of __pattern_any_of on __parallel_transform_reduce
…on of __pattern_find_if on __parallel_transform_reduce
…on of __pattern_any_of on __parallel_transform_reduce - fix review comment: counting iterator not required.

Signed-off-by: Sergey Kopienko <[email protected]>
@SergeyKopienko SergeyKopienko force-pushed the dev/skopienko/patterns_on_transform_reduce branch from f4f2141 to f6d73e1 Compare August 1, 2024 07:31
@SergeyKopienko
Copy link
Contributor Author

Our parallel_find_or-based algorithms implementation is more perform now then transform_reduce-based implementation.
So this PR is not required anymore.

@SergeyKopienko SergeyKopienko deleted the dev/skopienko/patterns_on_transform_reduce branch August 5, 2024 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants