Use __result_and_scratch_storage within scan kernels #1770
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_utils.h
I believe we should pass And may be the same in
include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_utils.h
It is a deep copy in the location you mention, although I don't think it's that expensive, since it only copies smart pointers, not the data they point to. In that case, I think we do use it after the fact (to send it to the The only deep copies we have in this PR are into
Exactly: by deep copy I mean copying the smart pointers here, with all the atomic counter operations in their copy implementations. But we can avoid these entirely.
Deep copy: I agree that it is probably not that expensive but still unneeded. I will change it in a separate PR. Let's focus on getting the reduce-then-scan implementation in.
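To make the cost under discussion concrete, here is a minimal sketch in plain C++ (no SYCL). Storage and the two functions are hypothetical illustrations, not oneDPL types: the storage owns its memory through a std::shared_ptr, so passing it by value copies the pointer and touches the thread-safe (typically atomic) reference count on entry and again on exit, while passing by const reference, or moving ownership in with std::move, avoids the increment.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical storage object that owns its memory through a shared_ptr
// (an illustration only, not the oneDPL type).
struct Storage
{
    std::shared_ptr<int> buf;
};

// By value: the shared_ptr is copied, so the reference count is incremented
// here and decremented again when `s` is destroyed on return.
long use_count_by_value(Storage s) { return s.buf.use_count(); }

// By const reference: no copy is made, so the reference count is untouched.
long use_count_by_ref(const Storage& s) { return s.buf.use_count(); }
```

With a single owner, use_count_by_value reports 2 (the caller's copy plus the parameter) while use_count_by_ref reports 1; the data itself is never copied in either case, which matches the point that the cost is small but unnecessary.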
LGTM
Thanks for taking a look at these.
LGTM
Waiting to merge this until #1769 is merged to avoid rebase complexity, unless that PR gets stuck for a long time.
The base branch was changed.
Signed-off-by: Dan Hoeflinger <[email protected]>
LGTM
This PR adjusts the existing scan kernels to use __result_and_scratch_storage rather than a temporary buffer, both for scratch space and for the return value held in the resulting future. This is important because the reduce_then_scan kernels added later in this series (#1762) use this structure for their temporary and return storage. Since we have runtime branches selecting which algorithm to use (existing scan, single-WG scan, single-WG copy_if, and soon reduce_then_scan), these all need to have the same return future type. This means they should all use __result_and_scratch_storage for their return (and therefore also for any temporary storage).

Single-WG scan has no return value or temporary storage, so it simply creates a dummy __result_and_scratch_storage to produce a consistent return type; no changes are needed in the kernel. For the copy_if, unique, and partition scan kernels, we need to write to the output result storage. Before this PR, we simply used the last element of the temporary storage buffer in the returned future.
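The sketch below illustrates why a shared storage type matters, in plain C++ without SYCL. ResultAndScratch, Future, and the copy_if_* functions are hypothetical stand-ins, not the oneDPL API: the storage is one shared allocation holding scratch_n scratch elements plus a dedicated trailing result slot. The single-work-group path builds a zero-scratch dummy only to obtain the common return type, and the blocked path keeps per-block counts in scratch while writing the final count to the result slot.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical, simplified analogue of a result-and-scratch storage: one shared
// allocation with `scratch_n` scratch elements followed by one result slot.
template <typename T>
class ResultAndScratch
{
    std::shared_ptr<std::vector<T>> mem_;
    std::size_t scratch_n_;

  public:
    explicit ResultAndScratch(std::size_t scratch_n)
        : mem_(std::make_shared<std::vector<T>>(scratch_n + 1)), scratch_n_(scratch_n) {}

    T* scratch() { return mem_->data(); }
    T* result() { return mem_->data() + scratch_n_; }
    T get_value() const { return (*mem_)[scratch_n_]; }
};

// Hypothetical future: it keeps the storage alive, so every branch that
// returns one of these has exactly the same type.
template <typename T>
struct Future
{
    ResultAndScratch<T> storage;
    T get() const { return storage.get_value(); }
};

// Single-work-group stand-in: no scratch is needed, so a zero-scratch "dummy"
// storage is created purely to match the common return type.
template <typename T, typename Pred>
Future<std::size_t> copy_if_single_wg(const std::vector<T>& in, std::vector<T>& out, Pred pred)
{
    ResultAndScratch<std::size_t> s(0);
    auto end = std::copy_if(in.begin(), in.end(), out.begin(), pred);
    *s.result() = static_cast<std::size_t>(end - out.begin());
    return Future<std::size_t>{s};
}

// Multi-block stand-in: per-block counts live in scratch; the final count goes
// to the dedicated result slot instead of the last scratch element.
template <typename T, typename Pred>
Future<std::size_t> copy_if_blocked(const std::vector<T>& in, std::vector<T>& out, Pred pred, std::size_t block)
{
    const std::size_t blocks = (in.size() + block - 1) / block;
    ResultAndScratch<std::size_t> s(blocks);
    std::size_t written = 0;
    for (std::size_t b = 0; b < blocks; ++b)
    {
        const auto first = in.begin() + b * block;
        const auto last = in.begin() + std::min((b + 1) * block, in.size());
        auto end = std::copy_if(first, last, out.begin() + written, pred);
        s.scratch()[b] = static_cast<std::size_t>(end - (out.begin() + written));
        written += s.scratch()[b];
    }
    *s.result() = std::accumulate(s.scratch(), s.scratch() + blocks, std::size_t{0});
    return Future<std::size_t>{s};
}

// Runtime dispatch compiles only because both branches return Future<size_t>.
template <typename T, typename Pred>
Future<std::size_t> copy_if_dispatch(const std::vector<T>& in, std::vector<T>& out, Pred pred)
{
    if (in.size() <= 4)
        return copy_if_single_wg(in, out, pred);
    return copy_if_blocked(in, out, pred, 4);
}
```

If each path instead returned a future tied to its own buffer type, the branches in copy_if_dispatch would need a common wrapper such as std::variant; giving every path the same storage type sidesteps that.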
This PR also avoids double dereferences by passing pointers around and using them as pointers, rather than as const lvalue references to "accessors" (as mentioned in #1751). Before this PR these really were accessors; now they are pointers.
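A minimal sketch of the indirection difference, assuming a hypothetical AccessorLike wrapper that just stores a pointer (it is not sycl::accessor, whose internals differ). Through a const reference, each access reaches the data via the reference, then the stored pointer; passing the raw pointer by value leaves a single indirection. Whether the extra load survives optimization depends on the compiler and on what it can prove about aliasing.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical accessor-like wrapper (illustration only, not sycl::accessor):
// it simply stores a pointer to the underlying memory.
template <typename T>
struct AccessorLike
{
    T* ptr;
    T& operator[](std::size_t i) const { return ptr[i]; }
};

// Before: the helper takes a const lvalue reference to the wrapper, so each
// access goes reference -> wrapper's pointer -> element (two indirections).
template <typename T>
T sum_via_accessor_ref(const AccessorLike<T>& acc, std::size_t n)
{
    T total{};
    for (std::size_t i = 0; i < n; ++i)
        total += acc[i];
    return total;
}

// After: the helper takes the raw pointer by value and indexes it directly.
template <typename T>
T sum_via_pointer(const T* ptr, std::size_t n)
{
    T total{};
    for (std::size_t i = 0; i < n; ++i)
        total += ptr[i];
    return total;
}
```

Both functions compute the same result; the change only removes the intermediate object from the access path.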
This PR targets #1769 to allow for a clean diff, and is part of the following sequence of PRs meant to be merged in order:
#1769 [MERGED] Relocate __lazy_ctor_storage to utils header
#1770 Use __result_and_scratch_storage within scan kernels (This PR)
#1762 Add reduce_then_scan algorithm for transform scan family
#1763 Make Copy_if family of APIs use reduce_then_scan algorithm
#1764 Make Partition family of APIs use reduce_then_scan algorithm
#1765 Make Unique family of APIs use reduce_then_scan algorithm
This work is a collaboration between @mmichel11 @adamfidel and @danhoeflinger, and based upon an original prototype by Ted Painter.