Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KT] Single Pass Copy_if Kernel Template #1616

Draft
wants to merge 136 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from 132 commits
Commits
Show all changes
136 commits
Select commit Hold shift + click to select a range
e5c121b
Enable pragma unroll for open-source DPC++
joeatodd Nov 29, 2023
7c1cb0f
clang-format
joeatodd Nov 29, 2023
154161f
Start of single-pass scan kernel template
adamfidel Aug 18, 2023
16ec5ad
Fix hang in inclusive scan
adamfidel Aug 24, 2023
bd89601
Debug statements for scan kernel template
adamfidel Aug 31, 2023
60a69fc
Update scan kernel template test
adamfidel Sep 6, 2023
d526f04
Only have a single work-item per group query for previous tile status
adamfidel Sep 14, 2023
09e9bbf
First attempt at parallel lookback
adamfidel Sep 18, 2023
30e0da7
Working cooperative lookback
adamfidel Sep 22, 2023
2311929
Fix correctness issue with non-power-of-2 sizes
adamfidel Oct 25, 2023
0f58c07
Scan_kt Flags and Values separated (#15)
Alcpz Nov 7, 2023
8af98d6
Refactored Scan_kt code (#16)
Alcpz Nov 7, 2023
3de596e
Scan_kt: Single memory allocation for device_memory (#17) and async f…
Alcpz Nov 8, 2023
2d6ff78
Replace sycl::range with sycl::nd_range for fill
joeatodd Nov 8, 2023
124a912
Bug fix
joeatodd Nov 8, 2023
d716bbd
Global to local then perform op
Nov 8, 2023
6a474c7
Update based on feedback
Nov 10, 2023
ba7be34
Refactored cooperative_loopback and memory implementation (#24)
Alcpz Nov 21, 2023
69cc2fa
[Scan_kt] Atomic64 flags + value implementation (#25)
Alcpz Nov 23, 2023
b5851ce
constexpr, types and remove an unneeded check
joeatodd Nov 23, 2023
c9736c1
Correct static_cast ?
joeatodd Nov 23, 2023
0e450f7
Defer group comms in lookback
joeatodd Nov 23, 2023
95b5552
Disable dynamic tile ID by default
joeatodd Nov 23, 2023
3f30ec8
Reduce from register sums instead of local mem
joeatodd Nov 23, 2023
c147f05
Unrolled version of joint_inclusive_scan
joeatodd Nov 23, 2023
ab69568
Update include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_scan.h
joeatodd Nov 23, 2023
b992b84
Add TODO
joeatodd Nov 23, 2023
37726be
Changing fill kernel for a memset
Alcpz Nov 28, 2023
d7c3c78
Single wg implementation
Alcpz Nov 29, 2023
e42e68d
Add phase 1
Nov 21, 2023
54c0ae9
Add phase 2
Nov 27, 2023
ba543ed
Add phase 3
Nov 28, 2023
cdf74d0
Add count datatype _SizeT
Dec 5, 2023
c5670d8
Move away from atomics
Dec 5, 2023
8918b42
Sort out test logic
Dec 5, 2023
625f315
Remove unnecessary load and store functions
Dec 5, 2023
ca7a830
Release scratch mem
Dec 6, 2023
25238eb
Add single wg copy if
Dec 8, 2023
2f2ccb2
Fix unrolls and use memset
Dec 8, 2023
021fb9a
apply changes to single wg
Dec 8, 2023
c4b05a4
Remove unused variables
Dec 8, 2023
5d1ed8e
Clang-format copy_if_kt commits
Dec 8, 2023
6382acd
Merge pull request #28 from AidanBeltonS/dev/aidan/copy_if
AidanBeltonS Dec 8, 2023
70e751a
Enable pragma unroll for open-source DPC++
joeatodd Nov 29, 2023
a9fdaa3
Start of single-pass scan kernel template
adamfidel Aug 18, 2023
dfef06f
Fix hang in inclusive scan
adamfidel Aug 24, 2023
555f6f9
Debug statements for scan kernel template
adamfidel Aug 31, 2023
10cfc68
Update scan kernel template test
adamfidel Sep 6, 2023
53faf10
Only have a single work-item per group query for previous tile status
adamfidel Sep 14, 2023
dc63d16
First attempt at parallel lookback
adamfidel Sep 18, 2023
f8c3f2b
Working cooperative lookback
adamfidel Sep 22, 2023
1d72d3f
Fix correctness issue with non-power-of-2 sizes
adamfidel Oct 25, 2023
567a50e
Scan_kt Flags and Values separated (#15)
Alcpz Nov 7, 2023
0c91640
Refactored Scan_kt code (#16)
Alcpz Nov 7, 2023
78d2d7d
Scan_kt: Single memory allocation for device_memory (#17) and async f…
Alcpz Nov 8, 2023
55dc287
Replace sycl::range with sycl::nd_range for fill
joeatodd Nov 8, 2023
37bfd1d
Bug fix
joeatodd Nov 8, 2023
21038df
Global to local then perform op
Nov 8, 2023
bdcc9c9
Update based on feedback
Nov 10, 2023
9717e09
Refactored cooperative_loopback and memory implementation (#24)
Alcpz Nov 21, 2023
8d23836
[Scan_kt] Atomic64 flags + value implementation (#25)
Alcpz Nov 23, 2023
c3c3218
constexpr, types and remove an unneeded check
joeatodd Nov 23, 2023
d257702
Correct static_cast ?
joeatodd Nov 23, 2023
43e17ba
Defer group comms in lookback
joeatodd Nov 23, 2023
e5b3ca4
Disable dynamic tile ID by default
joeatodd Nov 23, 2023
ab346da
Reduce from register sums instead of local mem
joeatodd Nov 23, 2023
f87573c
Unrolled version of joint_inclusive_scan
joeatodd Nov 23, 2023
621adf7
Update include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_scan.h
joeatodd Nov 23, 2023
b8c837f
Add TODO
joeatodd Nov 23, 2023
8367be7
Changing fill kernel for a memset
Alcpz Nov 28, 2023
02ff9f3
Single wg implementation
Alcpz Nov 29, 2023
25a93ff
Add phase 1
Nov 21, 2023
aea6009
Add phase 2
Nov 27, 2023
d5c2cb5
Add phase 3
Nov 28, 2023
1f574b8
Add count datatype _SizeT
Dec 5, 2023
16ef9c2
Move away from atomics
Dec 5, 2023
45a1fb7
Sort out test logic
Dec 5, 2023
cec32d7
Remove unnecessary load and store functions
Dec 5, 2023
b7d659c
Release scratch mem
Dec 6, 2023
fdb1824
Add single wg copy if
Dec 8, 2023
1df5fbb
Fix unrolls and use memset
Dec 8, 2023
d8b77fe
apply changes to single wg
Dec 8, 2023
5b53de6
Remove unused variables
Dec 8, 2023
acc4f9b
Clang-format copy_if_kt commits
Dec 8, 2023
faae196
partial changes (fix and amend)
danhoeflinger May 13, 2024
9d39fc6
refactor to share lookback and memory mgr
danhoeflinger May 24, 2024
1b10214
formatting
danhoeflinger May 24, 2024
593c218
remove launder in favor of lazy ctor union
danhoeflinger May 24, 2024
5a1752c
distinguishing kernel names
danhoeflinger May 24, 2024
a0576c3
uglify
danhoeflinger May 24, 2024
c5065a7
format
danhoeflinger May 24, 2024
6a7291a
change single wg scan to submitter and kernel operator
danhoeflinger May 24, 2024
1b78dcd
change scan to submitter and kernel operator
danhoeflinger May 29, 2024
246dbf3
formatting
danhoeflinger May 29, 2024
3023fec
remove unnecessary variable
danhoeflinger May 29, 2024
52f6d82
renaming public APIs
danhoeflinger May 29, 2024
8b504d3
sync with scan for asychronicity
danhoeflinger May 29, 2024
6f60f10
sycl::event returns
danhoeflinger May 29, 2024
0dfcd15
naming and minor fixes
danhoeflinger May 29, 2024
12db722
removing single_wg public api
danhoeflinger May 29, 2024
4850fcf
temporarily disable single wg version
danhoeflinger May 29, 2024
a8ccbd1
wait after call for async algs
danhoeflinger May 30, 2024
70ad489
reenable single wg
danhoeflinger May 30, 2024
e78b72c
only need single phase for single wg
danhoeflinger May 30, 2024
2ca4287
reusing single workgroup copy_if from oneDPL main
danhoeflinger May 31, 2024
7538ed6
add option to opt out of compiling single wg
danhoeflinger May 31, 2024
293d724
adding opt out for single wg inclusive scan
danhoeflinger May 31, 2024
565ba3b
remove single_wg kt, in favor of main oneDPL version
danhoeflinger May 31, 2024
b3fbfe2
trying scalar version of copy_if
danhoeflinger Jun 4, 2024
e49bc9f
fix
danhoeflinger Jun 4, 2024
92438ee
fix
danhoeflinger Jun 4, 2024
5495da2
full sum
danhoeflinger Jun 4, 2024
a7ca1b5
switching arg to const ref
danhoeflinger Jun 4, 2024
2d53ead
branch by tile, not by workitem
danhoeflinger Jun 4, 2024
f26aff0
removing unused block_strided version
danhoeflinger Jun 4, 2024
c0ab651
range API and formatting
danhoeflinger Jun 4, 2024
517c341
removing unnecessary stuff
danhoeflinger Jun 4, 2024
427a5f4
naming consistency
danhoeflinger Jun 4, 2024
e9091e1
formatting
danhoeflinger Jun 4, 2024
7315d2b
reverting overreach
danhoeflinger Jun 4, 2024
0983043
upgrading tests to match scan, cmake
danhoeflinger Jun 5, 2024
e09ccae
test bugfix
danhoeflinger Jun 5, 2024
96acd30
bugfix for non-full case
danhoeflinger Jun 5, 2024
f15c759
fix range to check
danhoeflinger Jun 5, 2024
b195e4d
adjust data generation
danhoeflinger Jun 5, 2024
4528cbb
better fix for non-full case
danhoeflinger Jun 5, 2024
831f9c9
removing old test
danhoeflinger Jun 5, 2024
7be4594
undo change to unroll version check
danhoeflinger Jun 6, 2024
68c258d
formatting
danhoeflinger Jun 6, 2024
347bcf5
allowing alg to dictate active threads
danhoeflinger Jun 6, 2024
62f58e2
bugfix for indexes
danhoeflinger Jun 6, 2024
5c4bd74
clang format
danhoeflinger Jun 6, 2024
98324ed
address reviewer comments
danhoeflinger Jun 18, 2024
4377c2d
simplify data generation and cutoff calculation.
danhoeflinger Jun 18, 2024
496ed4d
strip out single workgroup opt out
danhoeflinger Jun 18, 2024
2d9e8a7
minimal data generation changes
danhoeflinger Jun 18, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion include/oneapi/dpl/experimental/kt/kernel_param.h
Original file line number Diff line number Diff line change
Expand Up @@ -18,12 +18,13 @@ namespace oneapi::dpl::experimental::kt
{

template <std::uint16_t __data_per_work_item, std::uint16_t __work_group_size,
typename _KernelName = oneapi::dpl::execution::DefaultKernelName>
typename _KernelName = oneapi::dpl::execution::DefaultKernelName, typename _SingleWgOptOut = std::false_type>
dmitriy-sobolev marked this conversation as resolved.
Show resolved Hide resolved
struct kernel_param
{
static constexpr std::uint16_t data_per_workitem = __data_per_work_item;
static constexpr std::uint16_t workgroup_size = __work_group_size;
using kernel_name = _KernelName;
using single_wg_opt_out = _SingleWgOptOut;
};

} // namespace oneapi::dpl::experimental::kt
Expand Down
Loading
Loading