-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Single-pass scan kernel template #1320
Conversation
…o dev/adamfidel/scan_kt
2ff21c9
to
f1e0709
Compare
Co-authored-by: Dmitriy Sobolev <[email protected]>
Co-authored-by: Dmitriy Sobolev <[email protected]>
…adamfidel/scan_kt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some minor comments.
Meanwhile, I'm looking at the kernel internals and will be back with the comments soon if any.
// Group reduction produces wrong results with multiplication of 64-bit for certain driver versions | ||
// TODO: When a driver fix is provided to resolve this issue, consider altering this macro or checking the driver version at runtime | ||
// of the underlying sycl::device to determine whether to include or exclude 64-bit type tests. | ||
#define _PSTL_GROUP_REDUCTION_MULT_INT64_BROKEN 1 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this for the same problem as the macro right above (which is also documented as a known issue), or for a different one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it was a different problem, but looking at it more, I think it's the same issue that the macros _PSTL_ICPX_TEST_RED_BY_SEG_BROKEN_64BIT_TYPES
and ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION
are meant for.
I know the internal ticket number for this issue, but it's hard to correlate that this is the same issue that these other macros are for because we do not (rightfully so) associate our internal issue tracking with these comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After an offline discussion with @mmichel11, we found that _PSTL_ICPX_TEST_RED_BY_SEG_BROKEN_64BIT_TYPES
is for a separate issue.
The macro ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION
seems to be from the same root issue, but I feel that we should still use a separate macro for the following reasons:
- The
ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION
macro is expected to be defined externally through CMake rather than defined intest_config.h
with the other macros - The macro's name suggests that when defined, it will apply some workaround for 64-bit group reductions with multiplies, whereas in this case, we want to explictly disable the tests cases with 64-bit types and the
std::multiplies
binary operator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've checked everything besides atomics, especially the specifics of the used memory ordering. Everything what I've checked looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM. I checked it far more thoroughly earlier on, but I have been reviewing delta commits and following the comments, and they look good to me.
My outstanding issues have been:
- init value, which we have decided to address in a later PR
- Memory footprint in SLM of joint algorithms, and how to communicate memory footprint requirements to the end user to provide them the proper guidance for selecting kernel params. I think this remains somewhat unresolved (as the sycl feature is a black box), but I think we can make a GH issue, and improve on this in the future (if we feel unsatisfied with our guidance in the docs with the current code). I don't think it is worth holding back this PR for this.
It would probably be good to get another approval from one of the others who have been working on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I've reviewed the atomic operations and found no race conditions.
One more suggestions to consider doing in a separate PR: use regular (non-atomic) variables for values.
Thanks all for the reviews! @danhoeflinger and @dmitriy-sobolev, I will create GH issues to address the next steps that you have mentioned in your approvals. |
This PR provides an implementation of the single-pass scan algorithm as a kernel template.