update of dr::shp::sort() (#1614)
lslusarczyk authored Jun 25, 2024
1 parent f7c82cc commit 7a0673b
Showing 15 changed files with 9,003 additions and 8,417 deletions.
2 changes: 1 addition & 1 deletion .clang-format
@@ -1,6 +1,6 @@
BasedOnStyle: LLVM

Standard: c++17
Standard: c++20

IndentWidth: 4
ColumnLimit: 120
128 changes: 128 additions & 0 deletions CODE_OF_CONDUCT.md
@@ -0,0 +1,128 @@
# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.

## Our Standards

Examples of behavior that contributes to a positive environment for our
community include:

* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
overall community

Examples of unacceptable behavior include:

* The use of sexualized language or imagery, and sexual attention or
advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting

## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.

Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.

## Scope

This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
[email protected].
All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the
reporter of any incident.

## Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:

### 1. Correction

**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.

**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.

### 2. Warning

**Community Impact**: A violation through a single incident or series
of actions.

**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.

### 3. Temporary Ban

**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.

**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.

### 4. Permanent Ban

**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.

**Consequence**: A permanent ban from any sort of public interaction within
the community.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.

Community Impact Guidelines were inspired by [Mozilla's code of conduct
enforcement ladder](https://github.com/mozilla/diversity).

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see the FAQ at
https://www.contributor-covenant.org/faq. Translations are available at
https://www.contributor-covenant.org/translations.
6 changes: 4 additions & 2 deletions documentation/library_guide/parallel_api/iterators.rst
@@ -78,9 +78,11 @@ header. All iterators are implemented in the ``oneapi::dpl`` namespace.
The ``transform_iterator`` class provides the following constructors:

* ``transform_iterator()``: instantiates the iterator using a default constructed base iterator and unary functor.
This constructor participates in overload resolution only if the base iterator and unary functor are both default constructible.
This constructor participates in overload resolution only if the base iterator and unary functor are both default constructible.

* ``transform_iterator(iter)``: instantiates the iterator using the base iterator provided and a default constructed
unary functor. This constructor participates in overload resolution only if the unary functor is default constructible.
unary functor. This constructor participates in overload resolution only if the unary functor is default constructible.

* ``transform_iterator(iter, func)``: instantiates the iterator using the base iterator and unary functor provided.

To simplify the construction of the iterator, ``oneapi::dpl::make_transform_iterator`` is provided. The
89 changes: 87 additions & 2 deletions documentation/release_notes.rst
@@ -8,6 +8,90 @@ The Intel® oneAPI DPC++ Library (oneDPL) accompanies the Intel® oneAPI DPC++/C
and provides high-productivity APIs aimed to minimize programming efforts of C++ developers
creating efficient heterogeneous applications.

New in 2022.6.0
===============
News
------------
- `oneAPI DPC++ Library Manual Migration Guide`_ to simplify the migration of Thrust* and CUB* APIs from CUDA*.
- ``radix_sort`` and ``radix_sort_by_key`` kernel templates were moved into
``oneapi::dpl::experimental::kt::gpu::esimd`` namespace. The former ``oneapi::dpl::experimental::kt::esimd``
namespace is deprecated and will be removed in a future release.
- The ``for_loop``, ``for_loop_strided``, ``for_loop_n``, ``for_loop_n_strided`` algorithms
in the ``oneapi::dpl::experimental`` namespace are now enforced to fail with device execution policies.

New Features
------------
- Added experimental ``inclusive_scan`` kernel template algorithm residing in
the ``oneapi::dpl::experimental::kt::gpu`` namespace.
- ``radix_sort`` and ``radix_sort_by_key`` kernel templates are extended with overloads for out-of-place sorting.
These overloads preserve the input sequence and sort data into the user provided output sequence.
- Improved performance of the ``reduce``, ``min_element``, ``max_element``, ``minmax_element``, ``is_partitioned``,
``lexicographical_compare``, ``binary_search``, ``lower_bound``, and ``upper_bound`` algorithms with device policies.
- ``sort``, ``stable_sort``, ``sort_by_key`` algorithms now use Radix sort [#fnote1]_
for sorting ``sycl::half`` elements compared with ``std::less`` or ``std::greater``.

Fixed Issues
------------
- Fixed compilation errors when using ``reduce``, ``min_element``, ``max_element``, ``minmax_element``,
``is_partitioned``, and ``lexicographical_compare`` with Intel oneAPI DPC++/C++ compiler 2023.0 and earlier.
- Fixed possible data races in the following algorithms used with device execution policies:
``remove_if``, ``unique``, ``inplace_merge``, ``stable_partition``, ``partial_sort_copy``, ``rotate``.
- Fixed excessive copying of data in ``std::vector`` allocated with a USM allocator for standard library
implementations which have allocator information in the ``std::vector::iterator`` type.
- Fixed an issue where checking ``std::is_default_constructible`` for ``transform_iterator`` with a functor
that is not default-constructible could cause a build error or an incorrect result.
- Fixed handling of `sycl device copyable`_ for internal and public oneDPL types.
- Fixed handling of ``std::reverse_iterator`` as input to oneDPL algorithms using a device policy.
- Fixed ``set_intersection`` to always copy from the first input sequence to the output,
where previously some calls would copy from the second input sequence.
- Fixed compilation errors when using ``oneapi::dpl::zip_iterator`` with the oneTBB backend and C++20.

Known Issues and Limitations
----------------------------
New in This Release
^^^^^^^^^^^^^^^^^^^
- ``histogram`` algorithm requires the output value type to be an integral type no larger than 4 bytes
when used with an FPGA policy.

Existing Issues
^^^^^^^^^^^^^^^
See oneDPL Guide for other `restrictions and known limitations`_.

- When compiled with ``-fsycl-pstl-offload`` option of Intel oneAPI DPC++/C++ compiler and with
``libstdc++`` version 8 or ``libc++``, ``oneapi::dpl::execution::par_unseq`` offloads
standard parallel algorithms to the SYCL device similarly to ``std::execution::par_unseq``
in accordance with the ``-fsycl-pstl-offload`` option value.
- When using the dpl modulefile to initialize the user's environment and compiling with ``-fsycl-pstl-offload``
option of Intel® oneAPI DPC++/C++ compiler, a linking issue or program crash may be encountered due to the directory
containing libpstloffload.so not being included in the search path. Use the env/vars.sh to configure the working
environment to avoid the issue.
- Compilation issues may be encountered when passing zip iterators to ``exclusive_scan_by_segment`` on Windows.
- For ``transform_exclusive_scan`` and ``exclusive_scan`` to run in-place (that is, with the same data
used for both input and destination) and with an execution policy of ``unseq`` or ``par_unseq``,
it is required that the provided input and destination iterators are equality comparable.
Furthermore, the equality comparison of the input and destination iterator must evaluate to true.
If these conditions are not met, the result of these algorithm calls is undefined.
- ``sort``, ``stable_sort``, ``sort_by_key``, ``partial_sort_copy`` algorithms may work incorrectly or cause
a segmentation fault when used with a DPC++ execution policy for a CPU device, and built
on Linux with Intel® oneAPI DPC++/C++ Compiler and ``-O0 -g`` compiler options.
To avoid the issue, pass ``-fsycl-device-code-split=per_kernel`` option to the compiler.
- Incorrect results may be produced by ``exclusive_scan``, ``inclusive_scan``, ``transform_exclusive_scan``,
``transform_inclusive_scan``, ``exclusive_scan_by_segment``, ``inclusive_scan_by_segment``, ``reduce_by_segment``
with ``unseq`` or ``par_unseq`` policy when compiled by Intel® oneAPI DPC++/C++ Compiler
with ``-fiopenmp``, ``-fiopenmp-simd``, ``-qopenmp``, ``-qopenmp-simd`` options on Linux.
To avoid the issue, pass ``-fopenmp`` or ``-fopenmp-simd`` option instead.
- Incorrect results may be produced by ``reduce``, ``reduce_by_segment``, and ``transform_reduce``
with 64-bit data types when compiled by Intel® oneAPI DPC++/C++ Compiler versions 2021.3 and newer
and executed on GPU devices.
For a workaround, define the ``ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION`` macro to ``1`` before
including oneDPL header files.
- ``std::tuple``, ``std::pair`` cannot be used with SYCL buffers to transfer data between host and device.
- ``std::array`` cannot be swapped in DPC++ kernels with ``std::swap`` function or ``swap`` member function
in the Microsoft* Visual C++ standard library.
- The ``oneapi::dpl::experimental::ranges::reverse`` algorithm is not available with ``-fno-sycl-unnamed-lambda`` option.
- STL algorithm functions (such as ``std::for_each``) used in DPC++ kernels do not compile with the debug version of
the Microsoft* Visual C++ standard library.

New in 2022.5.0
===============

@@ -661,8 +745,8 @@ Known Issues and Limitations
(including ``std::ldexp``, ``std::frexp``, ``std::sqrt(std::complex<float>)``) require device support
for double precision.

.. [#fnote1] The sorting algorithms in oneDPL use Radix sort for arithmetic data types compared with
``std::less`` or ``std::greater``, otherwise Merge sort.
.. [#fnote1] The sorting algorithms in oneDPL use Radix sort for arithmetic data types and
``sycl::half`` (since oneDPL 2022.6) compared with ``std::less`` or ``std::greater``, otherwise Merge sort.
.. _`the oneDPL Specification`: https://spec.oneapi.com/versions/latest/elements/oneDPL/source/index.html
.. _`oneDPL Guide`: https://oneapi-src.github.io/oneDPL/index.html
.. _`Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes`: https://www.intel.com/content/www/us/en/developer/articles/release-notes/intel-oneapi-threading-building-blocks-release-notes.html
@@ -671,3 +755,4 @@ Known Issues and Limitations
.. _`Macros`: https://oneapi-src.github.io/oneDPL/macros.html
.. _`2022.0 Changes`: https://oneapi-src.github.io/oneDPL/oneDPL_2022.0_changes.html
.. _`sycl device copyable`: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec::device.copyable
.. _`oneAPI DPC++ Library Manual Migration Guide`: https://www.intel.com/content/www/us/en/developer/articles/guide/oneapi-dpcpp-library-manual-migration.html
Original file line number Diff line number Diff line change
@@ -10,29 +10,25 @@ namespace oneapi::dpl::experimental::dr
{

template <typename I>
concept remote_iterator = std::forward_iterator<I>&&
requires(I& iter)
concept remote_iterator = std::forward_iterator<I> && requires(I& iter)
{
ranges::rank(iter);
};

template <typename R>
concept remote_range = rng::forward_range<R>&&
requires(R& r)
concept remote_range = rng::forward_range<R> && requires(R& r)
{
ranges::rank(r);
};

template <typename R>
concept distributed_range = rng::forward_range<R>&&
requires(R& r)
concept distributed_range = rng::forward_range<R> && requires(R& r)
{
ranges::segments(r);
};

template <typename I>
concept remote_contiguous_iterator = std::random_access_iterator<I>&&
requires(I& iter)
concept remote_contiguous_iterator = std::random_access_iterator<I> && requires(I& iter)
{
ranges::rank(iter);
{
@@ -41,39 +37,34 @@ requires(I& iter)
};

template <typename I>
concept distributed_iterator = std::forward_iterator<I>&&
requires(I& iter)
concept distributed_iterator = std::forward_iterator<I> && requires(I& iter)
{
ranges::segments(iter);
};

template <typename R>
concept remote_contiguous_range = remote_range<R>&& rng::random_access_range<R>&&
requires(R& r)
concept remote_contiguous_range = remote_range<R> && rng::random_access_range<R> && requires(R& r)
{
{
ranges::local(r)
} -> rng::contiguous_range;
};

template <typename R>
concept distributed_contiguous_range = distributed_range<R>&& rng::random_access_range<R>&&
requires(R& r)
concept distributed_contiguous_range = distributed_range<R> && rng::random_access_range<R> && requires(R& r)
{
{
ranges::segments(r)
} -> rng::random_access_range;
}
&&remote_contiguous_range<rng::range_value_t<decltype(ranges::segments(std::declval<R>()))>>;
} && remote_contiguous_range<rng::range_value_t<decltype(ranges::segments(std::declval<R>()))>>;

template <typename Iter>
concept distributed_contiguous_iterator = distributed_iterator<Iter>&& std::random_access_iterator<Iter>&&
requires(Iter& iter)
concept distributed_contiguous_iterator = distributed_iterator<Iter> && std::random_access_iterator<Iter> &&
requires(Iter& iter)
{
{
ranges::segments(iter)
} -> rng::random_access_range;
}
&&remote_contiguous_range<rng::range_value_t<decltype(ranges::segments(std::declval<Iter>()))>>;
} && remote_contiguous_range<rng::range_value_t<decltype(ranges::segments(std::declval<Iter>()))>>;

} // namespace oneapi::dpl::experimental::dr