Add code change workaround for 64-bit reduce_by_segment bug #1791

mmichel11 · 2024-08-22T18:25:36Z

There is an IGC bug that affects reduce_by_segment with 64-bit types on GPU Series Max devices, which previously required us to provide the ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION macro workaround. This workaround invokes the legacy implementation, which is roughly 3x slower but produces correct results.

The IGC bug still exists, but I have found a workaround within our reduce_by_segment implementation that has negligible performance impact. This lets users invoke the faster reduce_by_segment implementation without correctness issues.

By first initializing the private memory arrays to the known identity element before loading real data into some of the array indices, we avoid the register filling bug. I have verified this with the oneDPL tests (which previously caught the issue) and with external tests.

I have also removed the macro workaround and its associated test.

I've collected performance data showing that the impact is negligible. Feel free to ask if you would like to see it.

Filling the SYCL private memory array with the identity prior to loading data works around the encountered IGC bug. No real performance impact can be measured with this change. The current macro workaround is also removed.

Signed-off-by: Matthew Michel <[email protected]>

mmichel11 · 2024-08-22T18:27:59Z

include/oneapi/dpl/pstl/hetero/dpcpp/unseq_backend_sycl.h

@@ -34,21 +34,11 @@ namespace unseq_backend
 //This optimization depends on Intel(R) oneAPI DPC++ Compiler implementation such as support of binary operators from std namespace.
 //We need to use defined(SYCL_IMPLEMENTATION_INTEL) macro as a guard.

-template <typename _Tp>
-inline constexpr bool __can_use_known_identity =
-#    if ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION


Is it okay if we directly remove this macro check, or should it be deprecated first with a #warning or something similar? From the user's perspective, they should just see reduce_by_segment speedup with 64-bit types.
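If the deprecation route were taken, one option is a small preprocessor shim that diagnoses the removed macro instead of silently ignoring it. This is a hypothetical sketch, not part of this PR: the flag name legacy_workaround_requested is illustrative only. `#warning` is accepted by GCC and Clang-based compilers (and is standard in C++23), while MSVC uses `#pragma message` instead.

```cpp
#include <cassert>

// Hypothetical deprecation shim: if a user still defines the removed macro,
// emit a compile-time diagnostic rather than silently ignoring it.
#if defined(ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION)
#    if defined(_MSC_VER)
#        pragma message("ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION is deprecated and no longer has any effect.")
#    else
#        warning "ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION is deprecated and no longer has any effect."
#    endif
// Hypothetical flag, for illustration only: records that the macro was set.
inline constexpr bool legacy_workaround_requested = true;
#else
inline constexpr bool legacy_workaround_requested = false;
#endif
```

Since the fast path is now always correct, the shim would only inform users; it would not change which implementation runs.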

adamfidel · 2024-08-26T16:08:44Z

include/oneapi/dpl/internal/reduce_by_segment_impl.h

+                // TODO: Remove this initialization to the identity when possible. We load real data to __loc_partials
+                // in the first loop below but this initialization to the identity works around an IGC register
+                // filling bug.
+                std::array<__val_type, __vals_per_item> __loc_partials = {__identity};


Is this meant to fill the array with the identity value? As it is currently written, I believe only the first element of the array would be populated and the rest would be uninitialized. If the intent is for all of the elements to be the identity, then this can be written as:

Suggested change

std::array<__val_type, __vals_per_item> __loc_partials = {__identity};

std::array<__val_type, __vals_per_item> __loc_partials;

std::fill(__loc_partials.begin(), __loc_partials.end(), __identity);

Thanks. It looks like the remaining elements are initialized to 0 rather than left uninitialized: https://en.cppreference.com/w/c/language/array_initialization.
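A minimal host-side check of the initialization rule under discussion, assuming `std::int64_t` as the element type and a hypothetical array size of 4. With brace (aggregate) initialization, elements without an explicit initializer are value-initialized, which is zero for arithmetic types. The second function shows the fill-after-definition form; per this thread, that form reintroduced the IGC bug on device, so the sketch only illustrates host semantics.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// `= {identity}` sets element 0 to `identity`; the remaining elements are
// value-initialized (0 for std::int64_t), not left uninitialized.
inline std::array<std::int64_t, 4> partial_brace_init(std::int64_t identity)
{
    std::array<std::int64_t, 4> a = {identity};
    return a;
}

// Giving every element the same value needs an explicit fill after definition.
inline std::array<std::int64_t, 4> full_fill(std::int64_t identity)
{
    std::array<std::int64_t, 4> a; // elements are indeterminate until filled
    a.fill(identity);
    return a;
}
```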

The fix as implemented still worked, since it does not matter what the array holds beforehand as long as it is initialized with something. However, I switched to your suggestion for consistency.

Filling the array after its definition seems to reintroduce the bug, so I will see if I can find a better solution. I suppose what we originally had adds a default-constructibility requirement on __val_type that we do not want.
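One possible way to give every element the identity at the point of definition, without requiring the element type to be default-constructible, is to expand the value over a `std::index_sequence`. This is a plain host C++ sketch with a hypothetical helper name (`make_filled_array`). It is not what this PR adopted, and whether this form also avoids the IGC register filling bug on device is untested.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper (not part of oneDPL): every element is copy-initialized
// from `value` directly in the initializer, so T only needs to be
// copy-constructible, not default-constructible.
template <typename T, std::size_t... Is>
constexpr std::array<T, sizeof...(Is)>
make_filled_array_impl(const T& value, std::index_sequence<Is...>)
{
    // The comma expression discards each index and yields `value`, so the
    // pack expands to N copies of `value`. The void cast blocks any
    // overloaded comma operator.
    return {{(static_cast<void>(Is), value)...}};
}

template <std::size_t N, typename T>
constexpr std::array<T, N> make_filled_array(const T& value)
{
    return make_filled_array_impl(value, std::make_index_sequence<N>{});
}

// A type with no default constructor, to show that requirement is avoided.
struct NoDefault
{
    explicit NoDefault(int x) : v(x) {}
    int v;
};
```

In the kernel, the analogous line would construct `__loc_partials` from such a helper instead of declaring it and filling it afterward.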

Signed-off-by: Matthew Michel <[email protected]>

SergeyKopienko · 2024-09-04T08:02:24Z

include/oneapi/dpl/internal/reduce_by_segment_impl.h

@@ -351,7 +351,12 @@ __sycl_reduce_by_segment(__internal::__hetero_tag<_BackendTag>, _ExecutionPolicy
            __seg_reduce_wg_kernel,
 #endif
            sycl::nd_range<1>{__n_groups * __wgroup_size, __wgroup_size}, [=](sycl::nd_item<1> __item) {
-                ::std::array<__val_type, __vals_per_item> __loc_partials;
+                auto __identity = unseq_backend::__known_identity<_BinaryOperator, __val_type>;


Let's use __val_type instead of auto; I think it will be more readable.
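To illustrate the suggestion, here is a simplified, hypothetical analogue of a known-identity lookup (`known_identity_trait`, `known_identity`, and `load_identity` are illustrative names, not oneDPL's `unseq_backend::__known_identity`). Spelling out the value type makes the declaration self-describing; with `auto` the deduced type would be the same, because `auto` drops the top-level `const` of the `constexpr` variable.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical, simplified identity lookup for illustration only.
template <typename BinaryOp, typename T>
struct known_identity_trait; // left undefined: unknown operations have no identity

template <typename T>
struct known_identity_trait<std::plus<T>, T>
{
    static constexpr T value = T{0};
};

template <typename T>
struct known_identity_trait<std::multiplies<T>, T>
{
    static constexpr T value = T{1};
};

template <typename BinaryOp, typename T>
inline constexpr T known_identity = known_identity_trait<BinaryOp, T>::value;

template <typename BinaryOp, typename ValT>
ValT load_identity()
{
    // Naming ValT explicitly (instead of auto) shows the element type
    // right at the declaration.
    ValT identity = known_identity<BinaryOp, ValT>;
    return identity;
}
```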

mmichel11 added 2 commits August 20, 2024 14:32

Add comment explaining workaround

ce29407

Signed-off-by: Matthew Michel <[email protected]>

mmichel11 added the bug label Aug 22, 2024

mmichel11 commented Aug 22, 2024


mmichel11 marked this pull request as ready for review August 22, 2024 18:30

mmichel11 requested review from julianmi, dmitriy-sobolev, SergeyKopienko, danhoeflinger and adamfidel August 22, 2024 18:43

adamfidel reviewed Aug 26, 2024


Fill all array elements with __identity instead of just the first

3e08d9d

Signed-off-by: Matthew Michel <[email protected]>

SergeyKopienko reviewed Sep 4, 2024

