Implementation of tag dispatching on current codebase #1239

SergeyKopienko · 2023-10-20T15:50:23Z

This PR based on prototype from @rarutyun

Source prototype has been placed here:

!!! ATTENTION !!!

in this PR we disable `for_loop` staff for hetero policies:

for_loop;
for_loop_n;
for_loop_strided;
for_loop_n_strided.

Tag dispatching mechanism:

Allows to select parallel pattern to go (with select_backend function)
- Decision is made once basing on the execution policy and Iterator category
- Provides nested tags for the next level dispatching (parallel backend, vectorization, etc.): For example: __serial_backend_tag, __tbb_backend_tag, __omp_backend_tag, __device_backend_tag, __fpga_backend_tag, is_vector.
Patterns are selected based on the tag
- overload with generic tag (customization point)
- overloads for concrete tag types with optimized implementation
Parallel backend as well as vectorized vs non-vectorized bricks are selected basing on the nested tags.

Overall schema of tag dispatching:

Algorithm level - first level:

call const auto __dispatch_tag = oneapi::dpl::__internal::__select_backend(__exec, ...); on algorithm level;
pass __dispatch_tag into patterns;

Pattern level - second level:

get _BackendTag from the fist __Tag parameter type;
pass _BackendTag{} (instance of the backend tag into backend implementation.

Host backend tags:

struct __serial_backend_tag { }; // For serial backend
struct __tbb_backend_tag { }; // For TBB backend
struct __omp_backend_tag { }; // For OMP backend

---
title: Host backend tags
---
classDiagram
    class __serial_backend_tag {
    }
    class __tbb_backend_tag {
    }
    class __omp_backend_tag {
    }

Hetero backend tags:

struct __device_backend_tag { };

#if _ONEDPL_FPGA_DEVICE
struct __fpga_backend_tag : __device_backend_tag { };
#endif // _ONEDPL_FPGA_DEVICE

---
title: Hetero backend tags
---
classDiagram
__device_backend_tag <|-- __fpga_backend_tag 
    class __device_backend_tag {
    }
    class __fpga_backend_tag {
    }

Types of dispatching tags (host tags):

template <class _IsVector>
struct __serial_tag
{
    using __is_vector = _IsVector;
};

template <class _IsVector>
struct __parallel_tag
{
    using __is_vector = _IsVector;
    using __backend_tag = __par_backend_tag;
};

struct __parallel_forward_tag
{
    using __is_vector = ::std::false_type;
    using __backend_tag = __par_backend_tag;
};

Types of dispatching tags (hetero tags):

template <typename _BackendTag>
struct __hetero_tag
{
    using __backend_tag = _BackendTag;
};

How we define `__par_backend_tag` :

#if _ONEDPL_PAR_BACKEND_TBB
using __par_backend_tag = __tbb_backend_tag;
#elif _ONEDPL_PAR_BACKEND_OPENMP
using __par_backend_tag = __omp_backend_tag;
#elif _ONEDPL_PAR_BACKEND_SERIAL
using __par_backend_tag = __serial_backend_tag;
#else
#    error "Parallel backend was not specified"
#endif

Typical changes in the code

Changes in pattern calls:

before:

template <class _ExecutionPolicy, class _ForwardIterator, class _Predicate>
oneapi::dpl::__internal::__enable_if_execution_policy<_ExecutionPolicy, bool>
any_of(_ExecutionPolicy&& __exec, _ForwardIterator __first, _ForwardIterator __last, _Predicate __pred)
{
    return oneapi::dpl::__internal::__pattern_any_of(
        ::std::forward<_ExecutionPolicy>(__exec), __first, __last, __pred,
        oneapi::dpl::__internal::__is_vectorization_preferred<_ExecutionPolicy, _ForwardIterator>(__exec),
        oneapi::dpl::__internal::__is_parallelization_preferred<_ExecutionPolicy, _ForwardIterator>(__exec));
}

after:

template <class _ExecutionPolicy, class _ForwardIterator, class _Predicate>
oneapi::dpl::__internal::__enable_if_execution_policy<_ExecutionPolicy, bool>
any_of(_ExecutionPolicy&& __exec, _ForwardIterator __first, _ForwardIterator __last, _Predicate __pred)
{
    // 1. Build required dispatching tag by call __select_backend function:
    //  - we should pass into `__select_backend` execution policy and one iterator of each iterator types from params.
    const auto __dispatch_tag = oneapi::dpl::__internal::__select_backend(__exec, __first);

    // 2. Pass dispatching tag into pattern call
    // So __is_vectorization_preferred and __is_parallelization_preferred calls aren't required anymore
    return oneapi::dpl::__internal::__pattern_any_of(__dispatch_tag, ::std::forward<_ExecutionPolicy>(__exec), __first,
                                                     __last, __pred);
}

Functions with `enable_if_..._policy<...>` and `/parallel=/::std::false_type`

before:

template <class _ExecutionPolicy, class _ForwardIterator, class _Pred, class _IsVector>
oneapi::dpl::__internal::__enable_if_host_execution_policy<_ExecutionPolicy, bool>
__pattern_any_of(_ExecutionPolicy&&, _ForwardIterator, _ForwardIterator, _Pred, _IsVector,
                 /*parallel=*/::std::false_type) noexcept;

after:

template <class _Tag, class _ExecutionPolicy, class _ForwardIterator, class _Pred>
bool
__pattern_any_of(_Tag, _ExecutionPolicy&&, _ForwardIterator, _ForwardIterator, _Pred) noexcept;

in these functions we able to get _IsVector type as typename _Tag::__is_vector

Functions with `__enable_if_host_execution_policy_conditional`, `__is_random_access_iterator_v` and `/parallel=/::std::true_type`

before:

template <class _ExecutionPolicy, class _RandomAccessIterator, class _Function, class _IsVector>
oneapi::dpl::__internal::__enable_if_host_execution_policy_conditional<
    _ExecutionPolicy, __is_random_access_iterator_v<_RandomAccessIterator>>
__pattern_walk1(_ExecutionPolicy&&, _RandomAccessIterator, _RandomAccessIterator, _Function, _IsVector, /*parallel=*/::std::true_type);

after:

template <class _ExecutionPolicy, class _RandomAccessIterator, class _Function>
void
__pattern_walk1(__parallel_forward_tag, _ExecutionPolicy&&, _RandomAccessIterator, _RandomAccessIterator, _Function);

Functions with `enable_if_..._policy<...>` and `/parallel=/::std::true_type`

before:

template <class _ExecutionPolicy, class _RandomAccessIterator, class _Pred, class _IsVector>
oneapi::dpl::__internal::__enable_if_host_execution_policy<_ExecutionPolicy, bool>
__pattern_any_of(_ExecutionPolicy&&, _RandomAccessIterator, _RandomAccessIterator, _Pred, _IsVector,
                 /*parallel=*/::std::true_type);

after:

template <class _IsVector, class _ExecutionPolicy, class _RandomAccessIterator, class _Pred>
bool
__pattern_any_of(__parallel_tag<_IsVector>, _ExecutionPolicy&&, _RandomAccessIterator, _RandomAccessIterator, _Pred);

in these functions we move class _IsVector to first template param place.

Functions with `__enable_if_device_execution_policy`

before:

template <typename _ExecutionPolicy, typename _ForwardIterator, typename _Function,
          oneapi::dpl::__internal::__enable_if_device_execution_policy<_ExecutionPolicy, int> = 0>
auto
__pattern_walk1_async(_ExecutionPolicy&& __exec, _ForwardIterator __first, _ForwardIterator __last, _Function __f);

after:

template <typename _BackendTag, typename _ExecutionPolicy, typename _ForwardIterator, typename _Function>
auto
__pattern_walk1_async(__hetero_tag<_BackendTag>, _ExecutionPolicy&& __exec, _ForwardIterator __first, _ForwardIterator __last, _Function __f);

Changes in the `oneDPL` host policy classes

As result of this work now we have more clear host policy classes:

// 2.4, Sequential execution policy
class sequenced_policy
{
};

// 2.5, Parallel execution policy
class parallel_policy
{
};

// 2.6, Parallel+Vector execution policy
class parallel_unsequenced_policy
{
};

class unsequenced_policy
{
};

All functions like __allow_unsequenced, __allow_vector and __allow_parallel has been removed from these classes as not required anymore.

`__select_backend()` functions

`__select_backend()` functions for host policies and iterators

template <class... _IteratorTypes>
__serial_tag<std::false_type>
__select_backend(oneapi::dpl::execution::sequenced_policy, _IteratorTypes&&...)
{
    return {};
}

template <class... _IteratorTypes>
__serial_tag<__internal::__is_random_access_iterator<_IteratorTypes...>>
__select_backend(oneapi::dpl::execution::unsequenced_policy, _IteratorTypes&&...)
{
    return {};
}

template <class... _IteratorTypes>
__tag_type<std::false_type, _IteratorTypes...>
__select_backend(oneapi::dpl::execution::parallel_policy, _IteratorTypes&&...)
{
    return {};
}

template <class... _IteratorTypes>
__tag_type<__internal::__is_random_access_iterator<_IteratorTypes...>, _IteratorTypes...>
__select_backend(oneapi::dpl::execution::parallel_unsequenced_policy, _IteratorTypes&&...)
{
    return {};
}

`__select_backend()` functions for hetero policies and iterators

template <class... _IteratorTypes, typename _KernelName>
::std::enable_if_t<__is_random_access_iterator_v<_IteratorTypes...>, __hetero_tag<__device_backend_tag>>
__select_backend(const execution::device_policy<_KernelName>&, _IteratorTypes&&...)
{
    return {};
}

#if _ONEDPL_FPGA_DEVICE
template <class... _IteratorTypes, unsigned int _Factor, typename _KernelName>
::std::enable_if_t<__is_random_access_iterator_v<_IteratorTypes...>, __hetero_tag<__fpga_backend_tag>>
__select_backend(const execution::fpga_policy<_Factor, _KernelName>&, _IteratorTypes&&...)
{
    return {};
}
#endif

`__select_backend()` functions for hetero policies and ranges (in the namespace `__ranges`)

template <typename _KernelName, typename... _Ranges>
oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__device_backend_tag>
__select_backend(const execution::device_policy<_KernelName>&, _Ranges&&...)
{
    return {};
}

#if _ONEDPL_FPGA_DEVICE
template <unsigned int _Factor, typename _KernelName, typename... _Ranges>
oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__fpga_backend_tag>
__select_backend(const execution::fpga_policy<_Factor, _KernelName>&, _Ranges&&...)
{
    return {};
}
#endif

rarutyun

Please check that the code can be compiled with both dpcpp and tbb backends

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl.h

include/oneapi/dpl/pstl/execution_impl.h

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_fpga.h

include/oneapi/dpl/pstl/glue_memory_impl.h

…ing tag first is more logical

…eview comment: why do we add ranges namespace?

rarutyun

After a hard work or everybody, it looks good! Thanks.

MikeDvorskiy

LGTM

danhoeflinger

Some minor questions / feedback. Not yet a full review.

I tried to catch all the places where __parallel_tag<_IsVector> is mixed with _ForwardIterator though.

include/oneapi/dpl/pstl/execution_impl.h

include/oneapi/dpl/pstl/algorithm_fwd.h

include/oneapi/dpl/pstl/algorithm_impl.h

include/oneapi/dpl/pstl/execution_impl.h

include/oneapi/dpl/internal/exclusive_scan_by_segment_impl.h

include/oneapi/dpl/pstl/experimental/internal/for_loop.h

include/oneapi/dpl/pstl/experimental/internal/for_loop_impl.h

danhoeflinger · 2024-03-19T17:34:42Z

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_histogram.h

-__histogram_general_registers_local_reduction(_ExecutionPolicy&& __exec, const sycl::event& __init_event,
-                                              ::std::uint16_t __work_group_size, _Range1&& __input, _Range2&& __bins,
-                                              const _BinHashMgr& __binhash_manager)
+__histogram_general_registers_local_reduction(oneapi::dpl::__internal::__device_backend_tag, _ExecutionPolicy&& __exec,


What is the purpose of adding the tag to the signature of internal implementation details beyond all the "decisions" which the tag participates? (general question, not specific to histogram)

For histogram in particular, it seems that any function past __parallel_histogram has already made all the decisions it will make based on tag, and there is no alternative to oneapi::dpl::__internal::__device_backend_tag.

Or is there some decision that the tag extends as far as the policy extends?

I have added oneapi::dpl::__internal::__device_backend_tag here to have guaranties that this code will work only for device backend. It's something like "common practic" in this PR. But I agree with you, all decisions about tags has been made earlier in code.

About alternatives - we also have

#if _ONEDPL_FPGA_DEVICE struct __fpga_backend_tag : __device_backend_tag { }; #endif

Yes, I suppose there is a chance in the future we may want to specialize individual details based on FPGA vs general device tags. This is not a blocker for me, but in practice it does add an extra parameter to already long signature.

One could argue that we could omit this beyond current existing "decisions" the tag makes, and if we decide to specialize for FPGA in the future, we could then easily add it to the necessary calls to provide the tag information to the specialization point. Just raising the point, but I'll leave it up to you.

danhoeflinger

I raised the point offline, but I do see some risk of "silent" miss-dispatching from this PR.

With a huge PR, and potential for small errors in function signatures for specialized tags, we could fall back silently to less performant implementations. The result would still build and pass correctness testing, but would be less performant. static_asserts have been added to combat this and attempt to detect if these cases arise with the wrong tag appearing in the general code.

This isn't a blocker for me, but I wanted raise this in the PR to see if there are any other ideas to mitigate this risk.

danhoeflinger

With limited time, and an approval from a couple others, I haven't gone through every line in incredible detail, but I have looked through the whole PR. All my feedback has been addressed, explained, or an issue created.

With a merge of #1455, this generally looks good to me.

SergeyKopienko · 2024-03-20T11:33:10Z

UPD: __is_random_access_iterator_t has been removed as unused anymore in oneDPL code.

…random_access_iterator_t as unused anymore

danhoeflinger

In the context of the other approvals from @rarutyun @MikeDvorskiy, with my
review, and inspection of the changes since their approval, this LGTM.

As mentioned offline, I think we should merge #1456 or something similar in the future, including some comments documenting the expectations for usage of the tag dispatching system. However, I don't see these as blockers for this PR, as they are not fixing existing bugs, but rather preventing potential pitfalls of future extension.

MikeDvorskiy

LGTM

SergeyKopienko requested review from rarutyun and MikeDvorskiy October 20, 2023 15:51

SergeyKopienko force-pushed the dev/skopienko/tag_dispatching branch 3 times, most recently from bd679bd to 3335f24 Compare October 24, 2023 16:33

rarutyun reviewed Oct 24, 2023

View reviewed changes

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl.h Outdated Show resolved Hide resolved

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl.h Outdated Show resolved Hide resolved

SergeyKopienko force-pushed the dev/skopienko/tag_dispatching branch from daa96fa to 1622733 Compare October 25, 2023 08:34

rarutyun reviewed Oct 25, 2023

View reviewed changes

include/oneapi/dpl/pstl/execution_impl.h Outdated Show resolved Hide resolved

include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_fpga.h Outdated Show resolved Hide resolved

SergeyKopienko mentioned this pull request Nov 9, 2023

Fix serial execution of parallel algorithms with C++20 iterators #1266

Merged

SergeyKopienko force-pushed the dev/skopienko/tag_dispatching branch from ce40383 to 19a0f87 Compare November 16, 2023 16:19

SergeyKopienko force-pushed the dev/skopienko/tag_dispatching branch 8 times, most recently from 926c100 to 2cf29f9 Compare January 31, 2024 10:55

SergeyKopienko commented Jan 31, 2024

View reviewed changes

include/oneapi/dpl/pstl/glue_memory_impl.h Outdated Show resolved Hide resolved

SergeyKopienko force-pushed the dev/skopienko/tag_dispatching branch 11 times, most recently from 7a43fb9 to 810ff85 Compare February 1, 2024 11:24

Sergey Kopienko added 2 commits March 15, 2024 09:31

include/oneapi/dpl/pstl/algorithm_impl.h - fix review comment: captur…

dae0918

…ing tag first is more logical

include/oneapi/dpl/pstl/hetero/algorithm_ranges_impl_hetero.h - fix r…

ff10491

…eview comment: why do we add ranges namespace?

SergeyKopienko dismissed MikeDvorskiy’s stale review via ff10491 March 15, 2024 08:44

Merge branch 'main' into dev/skopienko/tag_dispatching

f1eba8a

rarutyun previously approved these changes Mar 15, 2024

View reviewed changes

SergeyKopienko requested a review from MikeDvorskiy March 15, 2024 15:07

MikeDvorskiy previously approved these changes Mar 15, 2024

View reviewed changes

Fix compile errors in ranges tests (#1450)

86a59e6

SergeyKopienko dismissed stale reviews from MikeDvorskiy and rarutyun via 86a59e6 March 18, 2024 15:36

danhoeflinger reviewed Mar 18, 2024

View reviewed changes

Fix review comments for tag dispatching (#1451)

ea59cb3

SergeyKopienko mentioned this pull request Mar 19, 2024

Tune vectorized binary_search, lower_bound, and upper_bound algorithms #1446

Merged

danhoeflinger reviewed Mar 19, 2024

View reviewed changes

include/oneapi/dpl/internal/exclusive_scan_by_segment_impl.h Show resolved Hide resolved

danhoeflinger reviewed Mar 19, 2024

View reviewed changes

include/oneapi/dpl/pstl/experimental/internal/for_loop.h Show resolved Hide resolved

SergeyKopienko mentioned this pull request Mar 19, 2024

What's mean host policy ? #1454

Closed

danhoeflinger reviewed Mar 19, 2024

View reviewed changes

include/oneapi/dpl/pstl/experimental/internal/for_loop_impl.h Show resolved Hide resolved

danhoeflinger reviewed Mar 19, 2024

View reviewed changes

include/oneapi/dpl/pstl/iterator_defs.h - remove template alias __is_…

e26873b

…random_access_iterator_t as unused anymore

SergeyKopienko requested review from danhoeflinger, rarutyun and MikeDvorskiy March 20, 2024 11:44

danhoeflinger approved these changes Mar 20, 2024

View reviewed changes

MikeDvorskiy approved these changes Mar 20, 2024

View reviewed changes

SergeyKopienko merged commit 7eae79e into main Mar 20, 2024
20 checks passed

SergeyKopienko deleted the dev/skopienko/tag_dispatching branch March 20, 2024 13:28

SergeyKopienko mentioned this pull request May 22, 2024

Fix __pattern_partial_sort_copy forward declaration #1598

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implementation of tag dispatching on current codebase #1239

Implementation of tag dispatching on current codebase #1239

SergeyKopienko commented Oct 20, 2023 •

edited

Loading

rarutyun left a comment •

edited

Loading

rarutyun left a comment

MikeDvorskiy left a comment

danhoeflinger left a comment

danhoeflinger Mar 19, 2024

danhoeflinger Mar 19, 2024

SergeyKopienko Mar 19, 2024 •

edited

Loading

SergeyKopienko Mar 19, 2024

danhoeflinger Mar 19, 2024

danhoeflinger left a comment

danhoeflinger left a comment

SergeyKopienko commented Mar 20, 2024 •

edited

Loading

danhoeflinger left a comment

MikeDvorskiy left a comment

Implementation of tag dispatching on current codebase #1239

Implementation of tag dispatching on current codebase #1239

Conversation

SergeyKopienko commented Oct 20, 2023 • edited Loading

!!! ATTENTION !!!

in this PR we disable for_loop staff for hetero policies:

Tag dispatching mechanism:

Overall schema of tag dispatching:

Algorithm level - first level:

Pattern level - second level:

Host backend tags:

Hetero backend tags:

Types of dispatching tags (host tags):

Types of dispatching tags (hetero tags):

How we define __par_backend_tag :

Typical changes in the code

Changes in pattern calls:

Functions with enable_if_..._policy<...> and /*parallel=*/::std::false_type

Functions with __enable_if_host_execution_policy_conditional, __is_random_access_iterator_v and /*parallel=*/::std::true_type

Functions with enable_if_..._policy<...> and /*parallel=*/::std::true_type

Functions with __enable_if_device_execution_policy

Changes in the oneDPL host policy classes

__select_backend() functions

__select_backend() functions for host policies and iterators

__select_backend() functions for hetero policies and iterators

__select_backend() functions for hetero policies and ranges (in the namespace __ranges)

rarutyun left a comment • edited Loading

Choose a reason for hiding this comment

rarutyun left a comment

Choose a reason for hiding this comment

MikeDvorskiy left a comment

Choose a reason for hiding this comment

danhoeflinger left a comment

Choose a reason for hiding this comment

danhoeflinger Mar 19, 2024

Choose a reason for hiding this comment

danhoeflinger Mar 19, 2024

Choose a reason for hiding this comment

SergeyKopienko Mar 19, 2024 • edited Loading

Choose a reason for hiding this comment

SergeyKopienko Mar 19, 2024

Choose a reason for hiding this comment

danhoeflinger Mar 19, 2024

Choose a reason for hiding this comment

danhoeflinger left a comment

Choose a reason for hiding this comment

danhoeflinger left a comment

Choose a reason for hiding this comment

SergeyKopienko commented Mar 20, 2024 • edited Loading

danhoeflinger left a comment

Choose a reason for hiding this comment

MikeDvorskiy left a comment

Choose a reason for hiding this comment

SergeyKopienko commented Oct 20, 2023 •

edited

Loading

in this PR we disable `for_loop` staff for hetero policies:

How we define `__par_backend_tag` :

Functions with `enable_if_..._policy<...>` and `/parallel=/::std::false_type`

Functions with `__enable_if_host_execution_policy_conditional`, `__is_random_access_iterator_v` and `/parallel=/::std::true_type`

Functions with `enable_if_..._policy<...>` and `/parallel=/::std::true_type`

Functions with `__enable_if_device_execution_policy`

Changes in the `oneDPL` host policy classes

`__select_backend()` functions

`__select_backend()` functions for host policies and iterators

`__select_backend()` functions for hetero policies and iterators

`__select_backend()` functions for hetero policies and ranges (in the namespace `__ranges`)

rarutyun left a comment •

edited

Loading

SergeyKopienko Mar 19, 2024 •

edited

Loading

SergeyKopienko commented Mar 20, 2024 •

edited

Loading