-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Out of Place Esimd Radix Sort and Tests #1439
Out of Place Esimd Radix Sort and Tests #1439
Conversation
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
49465ec
to
12cd664
Compare
radix_sort_out_of_place(sycl::queue __q, _KeysRng1&& __keys_rng, _KeysRng2&& __keys_rng_out, _KernelParam __param = {}) | ||
{ | ||
__impl::__check_esimd_sort_params<__radix_bits, _KernelParam::data_per_workitem, _KernelParam::workgroup_size>(); | ||
if (__keys_rng.size() <= 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why you wrote here <=
?
Usually we use the code like
if (__n == 0)
return;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, thanks.
|
||
__event_chain = __radix_sort_onesweep_scan_submitter<__stage_count, __bin_count, _EsimdRadixSortScan>()( | ||
__q, __mem_holder.__global_hist_ptr(), __n, __event_chain); | ||
|
||
for (::std::uint32_t __stage = 0; __stage < __stage_count; __stage++) | ||
auto __submit_iteration = [&](auto __input_, auto __output_, auto __p_global_hist_, auto __p_group_hists_, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably the constructions like this may decrease performance, I think.
Sometimes they are required, of course. But in this place it's sooks like for me as the simple code style.
Of course, it's only my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention is to reduce repetitive code to make more apparent what is actually different between these invocations of __radix_sort_onesweep_submitter
, but I don't think it is required technically.
Also, my changes unroll the first iteration of the submission before the loop, as its inputs / outputs are special cased so there are three redundant calls, rather than just two.
I can go back to something more similar to the previous way of doing it without the lambda. I don't have a strong preference here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the lambda __submit_iteration
.
template <::std::uint8_t __radix_bits, ::std::uint16_t __data_per_work_item, ::std::uint16_t __work_group_size, | ||
typename _RngPack, typename _MemHolder> | ||
void | ||
__allocate_temp(_MemHolder& __mem_holder, ::std::size_t __n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be better to return _MemHolder
instance from this method?
In this case we are able to not declare some their methods as const
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally abstracted these two functions out to be separate functions to be able to reuse from multiple places, but then later unified out_of_place and in_place versions together anyway pulling that decision up the stack, so they don't even need to be separate functions. I'll return them both to be within __onesweep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've restored this to be within the __onesweep
function.
auto expected_first = oneapi::dpl::make_zip_iterator(std::begin(expected_keys), std::begin(expected_values)); | ||
std::stable_sort(expected_first, expected_first + size, CompareKey<isAscending>{}); | ||
|
||
std::string parameters_msg = ", n: " + std::to_string(size) + ", sizeof(key): " + std::to_string(sizeof(KeyT)) + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use std::strstream
you may to build this message without a lot of std::to_string
calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, thanks.
using _EsimRadixSortKernel = | ||
oneapi::dpl::__par_backend_hetero::__internal::__kernel_name_provider<__esimd_radix_sort_one_wg<_KernelName>>; | ||
|
||
return __radix_sort_one_wg_submitter<__is_ascending, __radix_bits, __data_per_work_item, __work_group_size, _KeyT, | ||
_EsimRadixSortKernel>()(__q, ::std::forward<_RngPack>(__pack), __n); | ||
_EsimRadixSortKernel>()(__q, __pack_in, __pack_out, __n); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why you not use std::forward
for pass __pack_in
and __pack_out
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, thanks.
include/oneapi/dpl/experimental/kt/internal/esimd_radix_sort_dispatchers.h
Outdated
Show resolved
Hide resolved
include/oneapi/dpl/experimental/kt/internal/esimd_radix_sort_dispatchers.h
Outdated
Show resolved
Hide resolved
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
include/oneapi/dpl/experimental/kt/internal/esimd_radix_sort_dispatchers.h
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR looks good to me besides _out_of_place
function name part, which is mentioned in the description.
if (__stage % 2 != 0) | ||
{ | ||
__event_chain = __radix_sort_onesweep_submitter<__is_ascending, __radix_bits, __data_per_work_item, | ||
__work_group_size, _EsimdRadixSortSweep>()( | ||
__q, __virt_pack1, __virt_pack2, __p_global_hist, __p_group_hists, __sweep_work_group_count, __n, | ||
__stage, __event_chain); | ||
} | ||
else | ||
{ | ||
__event_chain = __radix_sort_onesweep_submitter<__is_ascending, __radix_bits, __data_per_work_item, | ||
__work_group_size, _EsimdRadixSortSweep>()( | ||
__q, __virt_pack2, __virt_pack1, __p_global_hist, __p_group_hists, __sweep_work_group_count, __n, | ||
__stage, __event_chain); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I miss something obvious, these two kernel submissions will not generate identical kernels, and so should really have distinct names (like with the Even/Odd suffixes before the patch).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now in hindsight, I think you are correct, because the types of __input_pack
, __virt_pack1
and __virt_pack2
may not match.
However, if they do match, do you agree that these would generate an identical kernel?
My proposal would be to merely include the types of the input & output in the kernel name, that way they aren't required to generate two identical kernels with different names when the types match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm doing some testing with no unnamed lambda, and seeing the issue. I'll provide an update when it is fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've fixed the issue as proposed by including the decayed RngPack typenames in the kernel name.
I've also fixed kernel name clashing in the tests for this and the in-place version of esimd radix sort, and added a test utility to generate a new kernel_params struct with a new kernelname.
Thanks for catching this.
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
@dmitriy-sobolev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach to get rid of out_of_place
suffix looks good to me. I've left some minor comments, though.
using _EsimdRadixSortSweepInitial = oneapi::dpl::__par_backend_hetero::__internal::__kernel_name_provider< | ||
::std::conditional_t<__has_values, __esimd_radix_sort_onesweep_by_key<std::decay_t<_RngPack1>, std::decay_t<_RngPack2>, _KernelName>, | ||
__esimd_radix_sort_onesweep<std::decay_t<_RngPack1>, std::decay_t<_RngPack2>, _KernelName>>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not strictly opposed of that approach, but there is a drawback: kernel names will become longer (I guess, much longer).
The motivation of the change is to separate kernel names for the potential arrangements of _RngPack1
, _RngPack2
, _RngPack3
, isn't it? If so, what do you think about using some constant instead, e.g.:
__esimd_radix_sort_onesweep<0, ...>
instead of__esimd_radix_sort_onesweep<std::decay_t<_RngPack1>, std::decay_t<_RngPack2>, ...>
,__esimd_radix_sort_onesweep<1, ...>
instead of__esimd_radix_sort_onesweep<std::decay_t<_RngPack3>, std::decay_t<_RngPack2>, ...>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is to have a unique kernel name if and only if we have a unique kernel. We want to avoid same mangled name error, but also we don't want to add extra kernel compilation only to support a differently named kernel when the types match exactly.
Lets say _RngPack1
is the exact same as _RngPack3
, we then want __esimd_radix_sort_onesweep<std::decay_t<_RngPack1>, std::decay_t<_RngPack2>, ...>
and __esimd_radix_sort_onesweep<std::decay_t<_RngPack3>, std::decay_t<_RngPack2>, ...>
to be the exact same as well. If not, I think we require an extra kernel compilation only to satisfy having a different name.
If all three exactly match, we only want one kernel and one kernel name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's reasonable, thanks for describing.
Probably, it is still possible to use shorter kernel names. I will think about it separately as it is rather a "nice to have" thing.
Currently, I see the following options:
Rng1 type | Rng2 type | Rng3 type | Kernels | Constants |
---|---|---|---|---|
A | B | C | A->B, B->C, C->B | 0,1, 2 |
A | A | A | A->A, A->A, A->A | 3, 3, 3 |
A | A | B | A->A, A->B, B->A | 3, 0, 4 |
A | B | A | A->B, B->A, A->B | 0, 4, 0 |
B | A | A | B->A, A->A, A->A | 4, 3, 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm interested in the idea, but to create such constants, you would need to basically check is_same<decay_t<_RngPack1>, decay_t<_RngPack2>>
for all three combos and assign constants appropriately.
The code is ugly either way. Perhaps the shorter kernel name is worth it.
include/oneapi/dpl/experimental/kt/internal/esimd_radix_sort_dispatchers.h
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've checked the code once again. I'm ready to approve once minor issues noted above have been solved.
Signed-off-by: Dan Hoeflinger <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
APIs are disambiguated between iterator and range interfaces using
enable_if
to check if a type is an iterator. Documentation updates will be made in a separate PR.