Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of tag dispatching on current codebase #1239

Merged
merged 588 commits into from
Mar 20, 2024

Conversation

SergeyKopienko
Copy link
Contributor

@SergeyKopienko SergeyKopienko commented Oct 20, 2023

This PR based on prototype from @rarutyun

Source prototype has been placed here:

!!! ATTENTION !!!

in this PR we disable for_loop staff for hetero policies:

  • for_loop;
  • for_loop_n;
  • for_loop_strided;
  • for_loop_n_strided.

Tag dispatching mechanism:

  • Allows to select parallel pattern to go (with select_backend function)
    • Decision is made once basing on the execution policy and Iterator category
    • Provides nested tags for the next level dispatching (parallel backend, vectorization, etc.): For example: __serial_backend_tag, __tbb_backend_tag, __omp_backend_tag, __device_backend_tag, __fpga_backend_tag, is_vector.
  • Patterns are selected based on the tag
    • overload with generic tag (customization point)
    • overloads for concrete tag types with optimized implementation
  • Parallel backend as well as vectorized vs non-vectorized bricks are selected basing on the nested tags.

Overall schema of tag dispatching:

Algorithm level - first level:

  • call const auto __dispatch_tag = oneapi::dpl::__internal::__select_backend(__exec, ...); on algorithm level;
  • pass __dispatch_tag into patterns;

Pattern level - second level:

  • get _BackendTag from the fist __Tag parameter type;
  • pass _BackendTag{} (instance of the backend tag into backend implementation.

Host backend tags:

struct __serial_backend_tag { }; // For serial backend
struct __tbb_backend_tag { }; // For TBB backend
struct __omp_backend_tag { }; // For OMP backend
---
title: Host backend tags
---
classDiagram
    class __serial_backend_tag {
    }
    class __tbb_backend_tag {
    }
    class __omp_backend_tag {
    }
Loading

Hetero backend tags:

struct __device_backend_tag { };

#if _ONEDPL_FPGA_DEVICE
struct __fpga_backend_tag : __device_backend_tag { };
#endif // _ONEDPL_FPGA_DEVICE
---
title: Hetero backend tags
---
classDiagram
__device_backend_tag <|-- __fpga_backend_tag 
    class __device_backend_tag {
    }
    class __fpga_backend_tag {
    }
Loading

Types of dispatching tags (host tags):

template <class _IsVector>
struct __serial_tag
{
    using __is_vector = _IsVector;
};

template <class _IsVector>
struct __parallel_tag
{
    using __is_vector = _IsVector;
    using __backend_tag = __par_backend_tag;
};

struct __parallel_forward_tag
{
    using __is_vector = ::std::false_type;
    using __backend_tag = __par_backend_tag;
};

Types of dispatching tags (hetero tags):

template <typename _BackendTag>
struct __hetero_tag
{
    using __backend_tag = _BackendTag;
};

How we define __par_backend_tag :

#if _ONEDPL_PAR_BACKEND_TBB
using __par_backend_tag = __tbb_backend_tag;
#elif _ONEDPL_PAR_BACKEND_OPENMP
using __par_backend_tag = __omp_backend_tag;
#elif _ONEDPL_PAR_BACKEND_SERIAL
using __par_backend_tag = __serial_backend_tag;
#else
#    error "Parallel backend was not specified"
#endif

Typical changes in the code

Changes in pattern calls:

  • before:
template <class _ExecutionPolicy, class _ForwardIterator, class _Predicate>
oneapi::dpl::__internal::__enable_if_execution_policy<_ExecutionPolicy, bool>
any_of(_ExecutionPolicy&& __exec, _ForwardIterator __first, _ForwardIterator __last, _Predicate __pred)
{
    return oneapi::dpl::__internal::__pattern_any_of(
        ::std::forward<_ExecutionPolicy>(__exec), __first, __last, __pred,
        oneapi::dpl::__internal::__is_vectorization_preferred<_ExecutionPolicy, _ForwardIterator>(__exec),
        oneapi::dpl::__internal::__is_parallelization_preferred<_ExecutionPolicy, _ForwardIterator>(__exec));
}
  • after:
template <class _ExecutionPolicy, class _ForwardIterator, class _Predicate>
oneapi::dpl::__internal::__enable_if_execution_policy<_ExecutionPolicy, bool>
any_of(_ExecutionPolicy&& __exec, _ForwardIterator __first, _ForwardIterator __last, _Predicate __pred)
{
    // 1. Build required dispatching tag by call __select_backend function:
    //  - we should pass into `__select_backend` execution policy and one iterator of each iterator types from params.
    const auto __dispatch_tag = oneapi::dpl::__internal::__select_backend(__exec, __first);

    // 2. Pass dispatching tag into pattern call
    // So __is_vectorization_preferred and __is_parallelization_preferred calls aren't required anymore
    return oneapi::dpl::__internal::__pattern_any_of(__dispatch_tag, ::std::forward<_ExecutionPolicy>(__exec), __first,
                                                     __last, __pred);
}

Functions with enable_if_..._policy<...> and /*parallel=*/::std::false_type

before:

template <class _ExecutionPolicy, class _ForwardIterator, class _Pred, class _IsVector>
oneapi::dpl::__internal::__enable_if_host_execution_policy<_ExecutionPolicy, bool>
__pattern_any_of(_ExecutionPolicy&&, _ForwardIterator, _ForwardIterator, _Pred, _IsVector,
                 /*parallel=*/::std::false_type) noexcept;

after:

template <class _Tag, class _ExecutionPolicy, class _ForwardIterator, class _Pred>
bool
__pattern_any_of(_Tag, _ExecutionPolicy&&, _ForwardIterator, _ForwardIterator, _Pred) noexcept;
  • in these functions we able to get _IsVector type as typename _Tag::__is_vector

Functions with __enable_if_host_execution_policy_conditional, __is_random_access_iterator_v and /*parallel=*/::std::true_type

before:

template <class _ExecutionPolicy, class _RandomAccessIterator, class _Function, class _IsVector>
oneapi::dpl::__internal::__enable_if_host_execution_policy_conditional<
    _ExecutionPolicy, __is_random_access_iterator_v<_RandomAccessIterator>>
__pattern_walk1(_ExecutionPolicy&&, _RandomAccessIterator, _RandomAccessIterator, _Function, _IsVector, /*parallel=*/::std::true_type);

after:

template <class _ExecutionPolicy, class _RandomAccessIterator, class _Function>
void
__pattern_walk1(__parallel_forward_tag, _ExecutionPolicy&&, _RandomAccessIterator, _RandomAccessIterator, _Function);

Functions with enable_if_..._policy<...> and /*parallel=*/::std::true_type

before:

template <class _ExecutionPolicy, class _RandomAccessIterator, class _Pred, class _IsVector>
oneapi::dpl::__internal::__enable_if_host_execution_policy<_ExecutionPolicy, bool>
__pattern_any_of(_ExecutionPolicy&&, _RandomAccessIterator, _RandomAccessIterator, _Pred, _IsVector,
                 /*parallel=*/::std::true_type);

after:

template <class _IsVector, class _ExecutionPolicy, class _RandomAccessIterator, class _Pred>
bool
__pattern_any_of(__parallel_tag<_IsVector>, _ExecutionPolicy&&, _RandomAccessIterator, _RandomAccessIterator, _Pred);

in these functions we move class _IsVector to first template param place.

Functions with __enable_if_device_execution_policy

before:

template <typename _ExecutionPolicy, typename _ForwardIterator, typename _Function,
          oneapi::dpl::__internal::__enable_if_device_execution_policy<_ExecutionPolicy, int> = 0>
auto
__pattern_walk1_async(_ExecutionPolicy&& __exec, _ForwardIterator __first, _ForwardIterator __last, _Function __f);

after:

template <typename _BackendTag, typename _ExecutionPolicy, typename _ForwardIterator, typename _Function>
auto
__pattern_walk1_async(__hetero_tag<_BackendTag>, _ExecutionPolicy&& __exec, _ForwardIterator __first, _ForwardIterator __last, _Function __f);

Changes in the oneDPL host policy classes

As result of this work now we have more clear host policy classes:

// 2.4, Sequential execution policy
class sequenced_policy
{
};

// 2.5, Parallel execution policy
class parallel_policy
{
};

// 2.6, Parallel+Vector execution policy
class parallel_unsequenced_policy
{
};

class unsequenced_policy
{
};

All functions like __allow_unsequenced, __allow_vector and __allow_parallel has been removed from these classes as not required anymore.

__select_backend() functions

__select_backend() functions for host policies and iterators

template <class... _IteratorTypes>
__serial_tag<std::false_type>
__select_backend(oneapi::dpl::execution::sequenced_policy, _IteratorTypes&&...)
{
    return {};
}

template <class... _IteratorTypes>
__serial_tag<__internal::__is_random_access_iterator<_IteratorTypes...>>
__select_backend(oneapi::dpl::execution::unsequenced_policy, _IteratorTypes&&...)
{
    return {};
}

template <class... _IteratorTypes>
__tag_type<std::false_type, _IteratorTypes...>
__select_backend(oneapi::dpl::execution::parallel_policy, _IteratorTypes&&...)
{
    return {};
}

template <class... _IteratorTypes>
__tag_type<__internal::__is_random_access_iterator<_IteratorTypes...>, _IteratorTypes...>
__select_backend(oneapi::dpl::execution::parallel_unsequenced_policy, _IteratorTypes&&...)
{
    return {};
}

__select_backend() functions for hetero policies and iterators

template <class... _IteratorTypes, typename _KernelName>
::std::enable_if_t<__is_random_access_iterator_v<_IteratorTypes...>, __hetero_tag<__device_backend_tag>>
__select_backend(const execution::device_policy<_KernelName>&, _IteratorTypes&&...)
{
    return {};
}

#if _ONEDPL_FPGA_DEVICE
template <class... _IteratorTypes, unsigned int _Factor, typename _KernelName>
::std::enable_if_t<__is_random_access_iterator_v<_IteratorTypes...>, __hetero_tag<__fpga_backend_tag>>
__select_backend(const execution::fpga_policy<_Factor, _KernelName>&, _IteratorTypes&&...)
{
    return {};
}
#endif

__select_backend() functions for hetero policies and ranges (in the namespace __ranges)

template <typename _KernelName, typename... _Ranges>
oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__device_backend_tag>
__select_backend(const execution::device_policy<_KernelName>&, _Ranges&&...)
{
    return {};
}

#if _ONEDPL_FPGA_DEVICE
template <unsigned int _Factor, typename _KernelName, typename... _Ranges>
oneapi::dpl::__internal::__hetero_tag<oneapi::dpl::__internal::__fpga_backend_tag>
__select_backend(const execution::fpga_policy<_Factor, _KernelName>&, _Ranges&&...)
{
    return {};
}
#endif

@SergeyKopienko SergeyKopienko force-pushed the dev/skopienko/tag_dispatching branch 3 times, most recently from bd679bd to 3335f24 Compare October 24, 2023 16:33
Copy link
Contributor

@rarutyun rarutyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check that the code can be compiled with both dpcpp and tbb backends

@SergeyKopienko SergeyKopienko force-pushed the dev/skopienko/tag_dispatching branch 11 times, most recently from 7a43fb9 to 810ff85 Compare February 1, 2024 11:24
Sergey Kopienko added 2 commits March 15, 2024 09:31
rarutyun
rarutyun previously approved these changes Mar 15, 2024
Copy link
Contributor

@rarutyun rarutyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a hard work or everybody, it looks good! Thanks.

MikeDvorskiy
MikeDvorskiy previously approved these changes Mar 15, 2024
Copy link
Contributor

@MikeDvorskiy MikeDvorskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@danhoeflinger danhoeflinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor questions / feedback. Not yet a full review.

I tried to catch all the places where __parallel_tag<_IsVector> is mixed with _ForwardIterator though.

include/oneapi/dpl/pstl/execution_impl.h Show resolved Hide resolved
include/oneapi/dpl/pstl/algorithm_fwd.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/algorithm_fwd.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/algorithm_fwd.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/algorithm_fwd.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/algorithm_impl.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/algorithm_impl.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/algorithm_impl.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/algorithm_impl.h Outdated Show resolved Hide resolved
include/oneapi/dpl/pstl/execution_impl.h Show resolved Hide resolved
__histogram_general_registers_local_reduction(_ExecutionPolicy&& __exec, const sycl::event& __init_event,
::std::uint16_t __work_group_size, _Range1&& __input, _Range2&& __bins,
const _BinHashMgr& __binhash_manager)
__histogram_general_registers_local_reduction(oneapi::dpl::__internal::__device_backend_tag, _ExecutionPolicy&& __exec,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of adding the tag to the signature of internal implementation details beyond all the "decisions" which the tag participates? (general question, not specific to histogram)

For histogram in particular, it seems that any function past __parallel_histogram has already made all the decisions it will make based on tag, and there is no alternative to oneapi::dpl::__internal::__device_backend_tag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is there some decision that the tag extends as far as the policy extends?

Copy link
Contributor Author

@SergeyKopienko SergeyKopienko Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added oneapi::dpl::__internal::__device_backend_tag here to have guaranties that this code will work only for device backend. It's something like "common practic" in this PR. But I agree with you, all decisions about tags has been made earlier in code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About alternatives - we also have

#if _ONEDPL_FPGA_DEVICE
struct __fpga_backend_tag : __device_backend_tag
{
};
#endif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I suppose there is a chance in the future we may want to specialize individual details based on FPGA vs general device tags. This is not a blocker for me, but in practice it does add an extra parameter to already long signature.

One could argue that we could omit this beyond current existing "decisions" the tag makes, and if we decide to specialize for FPGA in the future, we could then easily add it to the necessary calls to provide the tag information to the specialization point. Just raising the point, but I'll leave it up to you.

Copy link
Contributor

@danhoeflinger danhoeflinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I raised the point offline, but I do see some risk of "silent" miss-dispatching from this PR.

With a huge PR, and potential for small errors in function signatures for specialized tags, we could fall back silently to less performant implementations. The result would still build and pass correctness testing, but would be less performant. static_asserts have been added to combat this and attempt to detect if these cases arise with the wrong tag appearing in the general code.

This isn't a blocker for me, but I wanted raise this in the PR to see if there are any other ideas to mitigate this risk.

Copy link
Contributor

@danhoeflinger danhoeflinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With limited time, and an approval from a couple others, I haven't gone through every line in incredible detail, but I have looked through the whole PR. All my feedback has been addressed, explained, or an issue created.

With a merge of #1455, this generally looks good to me.

@SergeyKopienko
Copy link
Contributor Author

SergeyKopienko commented Mar 20, 2024

UPD: __is_random_access_iterator_t has been removed as unused anymore in oneDPL code.

…random_access_iterator_t as unused anymore
Copy link
Contributor

@danhoeflinger danhoeflinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the context of the other approvals from @rarutyun @MikeDvorskiy, with my
review, and inspection of the changes since their approval, this LGTM.

As mentioned offline, I think we should merge #1456 or something similar in the future, including some comments documenting the expectations for usage of the tag dispatching system. However, I don't see these as blockers for this PR, as they are not fixing existing bugs, but rather preventing potential pitfalls of future extension.

Copy link
Contributor

@MikeDvorskiy MikeDvorskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SergeyKopienko SergeyKopienko merged commit 7eae79e into main Mar 20, 2024
20 checks passed
@SergeyKopienko SergeyKopienko deleted the dev/skopienko/tag_dispatching branch March 20, 2024 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants