diff --git a/documentation/release_notes.rst b/documentation/release_notes.rst
index a86891531a8..3b5cb852553 100644
--- a/documentation/release_notes.rst
+++ b/documentation/release_notes.rst
@@ -8,6 +8,90 @@ The Intel® oneAPI DPC++ Library (oneDPL) accompanies the Intel® oneAPI DPC++/C
 and provides high-productivity APIs aimed to minimize programming efforts of C++ developers
 creating efficient heterogeneous applications.
 
+New in 2022.6.0
+===============
+News
+------------
+- `oneAPI DPC++ Library Manual Migration Guide`_ to simplify the migration of Thrust* and CUB* APIs from CUDA*.
+- ``radix_sort`` and ``radix_sort_by_key`` kernel templates were moved into the
+  ``oneapi::dpl::experimental::kt::gpu::esimd`` namespace. The former ``oneapi::dpl::experimental::kt::esimd``
+  namespace is deprecated and will be removed in a future release.
+- The ``for_loop``, ``for_loop_strided``, ``for_loop_n``, and ``for_loop_n_strided`` algorithms
+  in the ``oneapi::dpl::experimental`` namespace now intentionally fail when used with device execution policies.
+
+New Features
+------------
+- Added an experimental ``inclusive_scan`` kernel template algorithm residing in
+  the ``oneapi::dpl::experimental::kt::gpu`` namespace.
+- The ``radix_sort`` and ``radix_sort_by_key`` kernel templates are extended with overloads for out-of-place sorting.
+  These overloads preserve the input sequence and sort data into the user-provided output sequence.
+- Improved performance of the ``reduce``, ``min_element``, ``max_element``, ``minmax_element``, ``is_partitioned``,
+  ``lexicographical_compare``, ``binary_search``, ``lower_bound``, and ``upper_bound`` algorithms with device policies.
+- The ``sort``, ``stable_sort``, and ``sort_by_key`` algorithms now use Radix sort [#fnote1]_
+  for sorting ``sycl::half`` elements compared with ``std::less`` or ``std::greater``.
+
+Fixed Issues
+------------
+- Fixed compilation errors when using ``reduce``, ``min_element``, ``max_element``, ``minmax_element``,
+  ``is_partitioned``, and ``lexicographical_compare`` with Intel® oneAPI DPC++/C++ Compiler 2023.0 and earlier.
+- Fixed possible data races in the following algorithms used with device execution policies:
+  ``remove_if``, ``unique``, ``inplace_merge``, ``stable_partition``, ``partial_sort_copy``, and ``rotate``.
+- Fixed excessive copying of data in ``std::vector`` allocated with a USM allocator for standard library
+  implementations that have allocator information in the ``std::vector::iterator`` type.
+- Fixed an issue where checking ``std::is_default_constructible`` for ``transform_iterator`` with a functor
+  that is not default-constructible could cause a build error or an incorrect result.
+- Fixed handling of `sycl device copyable`_ for internal and public oneDPL types.
+- Fixed handling of ``std::reverse_iterator`` as input to oneDPL algorithms using a device policy.
+- Fixed ``set_intersection`` to always copy from the first input sequence to the output,
+  where previously some calls would copy from the second input sequence.
+- Fixed compilation errors when using ``oneapi::dpl::zip_iterator`` with the oneTBB backend and C++20.
+
+Known Issues and Limitations
+----------------------------
+New in This Release
+^^^^^^^^^^^^^^^^^^^
+- The ``histogram`` algorithm requires the output value type to be an integral type no larger than 4 bytes
+  when used with an FPGA policy.
+
+Existing Issues
+^^^^^^^^^^^^^^^
+See the oneDPL Guide for other `restrictions and known limitations`_.
+
+- When compiled with the ``-fsycl-pstl-offload`` option of the Intel® oneAPI DPC++/C++ Compiler and with
+  ``libstdc++`` version 8 or ``libc++``, ``oneapi::dpl::execution::par_unseq`` offloads
+  standard parallel algorithms to the SYCL device similarly to ``std::execution::par_unseq``
+  in accordance with the ``-fsycl-pstl-offload`` option value.
+- When using the dpl modulefile to initialize the user's environment and compiling with the ``-fsycl-pstl-offload``
+  option of the Intel® oneAPI DPC++/C++ Compiler, a linking issue or program crash may occur because the directory
+  containing ``libpstloffload.so`` is not included in the search path. To avoid the issue, use ``env/vars.sh``
+  to configure the working environment.
+- Compilation issues may be encountered when passing zip iterators to ``exclusive_scan_by_segment`` on Windows.
+- For ``transform_exclusive_scan`` and ``exclusive_scan`` to run in-place (that is, with the same data
+  used for both input and destination) with an execution policy of ``unseq`` or ``par_unseq``,
+  the provided input and destination iterators must be equality comparable.
+  Furthermore, the equality comparison of the input and destination iterators must evaluate to ``true``.
+  If these conditions are not met, the result of these algorithm calls is undefined.
+- The ``sort``, ``stable_sort``, ``sort_by_key``, and ``partial_sort_copy`` algorithms may work incorrectly or cause
+  a segmentation fault when used with a DPC++ execution policy for a CPU device and built
+  on Linux with the Intel® oneAPI DPC++/C++ Compiler and the ``-O0 -g`` compiler options.
+  To avoid the issue, pass the ``-fsycl-device-code-split=per_kernel`` option to the compiler.
+- Incorrect results may be produced by ``exclusive_scan``, ``inclusive_scan``, ``transform_exclusive_scan``,
+  ``transform_inclusive_scan``, ``exclusive_scan_by_segment``, ``inclusive_scan_by_segment``, and ``reduce_by_segment``
+  with the ``unseq`` or ``par_unseq`` policy when compiled by the Intel® oneAPI DPC++/C++ Compiler
+  with the ``-fiopenmp``, ``-fiopenmp-simd``, ``-qopenmp``, or ``-qopenmp-simd`` options on Linux.
+  To avoid the issue, pass the ``-fopenmp`` or ``-fopenmp-simd`` option instead.
+- Incorrect results may be produced by ``reduce``, ``reduce_by_segment``, and ``transform_reduce``
+  with 64-bit data types when compiled by the Intel® oneAPI DPC++/C++ Compiler versions 2021.3 and newer
+  and executed on GPU devices.
+  As a workaround, define the ``ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION`` macro to ``1`` before
+  including oneDPL header files.
+- ``std::tuple`` and ``std::pair`` cannot be used with SYCL buffers to transfer data between host and device.
+- ``std::array`` cannot be swapped in DPC++ kernels with the ``std::swap`` function or the ``swap`` member function
+  in the Microsoft* Visual C++ standard library.
+- The ``oneapi::dpl::experimental::ranges::reverse`` algorithm is not available with the ``-fno-sycl-unnamed-lambda`` option.
+- STL algorithm functions (such as ``std::for_each``) used in DPC++ kernels do not compile with the debug version of
+  the Microsoft* Visual C++ standard library.
+
 New in 2022.5.0
 ===============
 
@@ -661,8 +745,8 @@ Known Issues and Limitations
   (including ``std::ldexp``, ``std::frexp``, ``std::sqrt(std::complex)``)
   require device support for double precision.
 
-.. [#fnote1] The sorting algorithms in oneDPL use Radix sort for arithmetic data types compared with
-   ``std::less`` or ``std::greater``, otherwise Merge sort.
+.. [#fnote1] The sorting algorithms in oneDPL use Radix sort for arithmetic data types and
+   ``sycl::half`` (since oneDPL 2022.6) compared with ``std::less`` or ``std::greater``, otherwise Merge sort.
 .. _`the oneDPL Specification`: https://spec.oneapi.com/versions/latest/elements/oneDPL/source/index.html
 .. _`oneDPL Guide`: https://oneapi-src.github.io/oneDPL/index.html
 .. _`Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes`: https://www.intel.com/content/www/us/en/developer/articles/release-notes/intel-oneapi-threading-building-blocks-release-notes.html
@@ -671,3 +755,4 @@ Known Issues and Limitations
 .. _`Macros`: https://oneapi-src.github.io/oneDPL/macros.html
 .. _`2022.0 Changes`: https://oneapi-src.github.io/oneDPL/oneDPL_2022.0_changes.html
 .. _`sycl device copyable`: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec::device.copyable
+.. _`oneAPI DPC++ Library Manual Migration Guide`: https://www.intel.com/content/www/us/en/developer/articles/guide/oneapi-dpcpp-library-manual-migration.html
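The move of the ``radix_sort`` and ``radix_sort_by_key`` kernel templates into ``oneapi::dpl::experimental::kt::gpu::esimd`` can be absorbed at a single point in user code with a namespace alias. This is a minimal sketch, assuming the kernel templates are declared by the ``<oneapi/dpl/experimental/kernel_templates>`` header and that their signatures are otherwise unchanged; the alias name ``kt_esimd`` is hypothetical and only illustrates the pattern.

```cpp
// Assumption: this header declares the oneDPL kernel templates.
#include <oneapi/dpl/experimental/kernel_templates>

// Deprecated location (to be removed in a future release):
//   namespace kt_esimd = oneapi::dpl::experimental::kt::esimd;

// New location of radix_sort and radix_sort_by_key as of oneDPL 2022.6.0.
// kt_esimd is a hypothetical alias name chosen for this sketch.
namespace kt_esimd = oneapi::dpl::experimental::kt::gpu::esimd;

// Call sites spelled kt_esimd::radix_sort(...) or kt_esimd::radix_sort_by_key(...)
// now resolve to the new namespace without further edits.
```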
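The ``set_intersection`` fix is only observable when elements that compare equivalent are still distinguishable. The C++ standard specifies that matching elements are copied from the first input range, which is the behavior the fix aligns with. This minimal sketch uses the standard ``std::set_intersection`` on the host, without an execution policy, so it builds with only the standard library; a oneDPL call would additionally take an execution policy as its first argument. The names ``Item``, ``ByKey``, and ``intersect_by_key`` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical element type: (key, tag). Only the key takes part in comparisons,
// so equal-key elements from the two inputs are equivalent but distinguishable.
using Item = std::pair<int, char>;

// Orders items by key only, ignoring the tag.
struct ByKey {
    bool operator()(const Item& a, const Item& b) const { return a.first < b.first; }
};

// Intersects two key-sorted sequences. Each element written to the output
// is copied from the FIRST input sequence.
std::vector<Item> intersect_by_key(const std::vector<Item>& first,
                                   const std::vector<Item>& second) {
    std::vector<Item> out;
    std::set_intersection(first.begin(), first.end(),
                          second.begin(), second.end(),
                          std::back_inserter(out), ByKey{});
    return out;
}
```

For example, intersecting ``{{1, 'a'}, {2, 'a'}, {4, 'a'}}`` with ``{{2, 'b'}, {3, 'b'}, {4, 'b'}}`` keeps only keys 2 and 4, and both surviving elements carry the tag ``'a'`` from the first input.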
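The in-place requirement for ``exclusive_scan`` and ``transform_exclusive_scan`` with ``unseq`` or ``par_unseq`` is easiest to meet by passing the very same iterator as both the input start and the destination start. This minimal sketch uses the standard C++17 ``std::exclusive_scan`` without an execution policy so it builds with only the standard library; with oneDPL the call would additionally take a policy such as ``oneapi::dpl::execution::par_unseq`` as its first argument. The helper name ``exclusive_scan_in_place`` is hypothetical. By contrast, passing ``v.data()`` as the destination while reading through ``v.begin()`` addresses the same memory, but in common standard library implementations those two iterator types cannot be compared for equality, so that form would not satisfy the requirement.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: replaces v with its exclusive prefix sum, starting from init.
// The destination iterator is a copy of the input iterator, so the two have the
// same type (equality comparable) and compare equal, which is what the known
// issue requires for in-place scans with unseq or par_unseq policies.
void exclusive_scan_in_place(std::vector<int>& v, int init) {
    auto first = v.begin();
    auto d_first = first;       // same iterator type and same position as the input
    assert(first == d_first);   // the condition described in the known issue
    // Standard overload without a policy; the standard permits d_first == first.
    std::exclusive_scan(first, v.end(), d_first, init);
}
```

For example, ``{1, 2, 3, 4}`` with ``init`` equal to ``0`` becomes ``{0, 1, 3, 6}``.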
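The ``ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION`` workaround must be visible before the first oneDPL header is processed. This is a minimal sketch of the top of one translation unit, assuming it uses the ``<oneapi/dpl/execution>`` and ``<oneapi/dpl/numeric>`` headers. Passing ``-DONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION=1`` on the compiler command line has the same effect and applies to every translation unit built with that command.

```cpp
// Define the workaround macro before any oneDPL header is included.
#define ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION 1

// oneDPL headers come only after the macro definition.
#include <oneapi/dpl/execution>
#include <oneapi/dpl/numeric>
```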