Cannot use device iterators in oneDPL algorithms #855
Comments
Regarding containers based on USM: oneDPL supports only std::vector with a USM allocator. See https://oneapi-src.github.io/oneDPL/parallel_api/pass_data_algorithms.html
Currently, only USM pointers, std::vector<..., USM_allocator>::begin, or begin/end over a sycl::buffer are supported.
Thanks to @MikeDvorskiy for the pointer. My proposal to address this issue, along with #854, is to introduce an iterator wrapper, created with `make_direct_iterator` in the example below:
    // Allocate USM device buffer
    int* x_d = sycl::malloc_device<int>(100, q);
    // Fill buffer pointed to by `x_d` with data.
    // ...
    // Create span from buffer
    std::span<int> x(x_d, 100);
    // Pass iterators from `std::span` into oneDPL reduce.
    auto sum = oneapi::dpl::reduce(oneapi::dpl::execution::make_device_policy(q),
                                   oneapi::dpl::make_direct_iterator(x.begin()),
                                   oneapi::dpl::make_direct_iterator(x.end()),
                                   0, std::plus());
I have implemented something similar internally in our distributed ranges codebase, but I've also written a quick draft of this in a PR.
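To make the proposal concrete, here is a minimal host-only sketch of what such a wrapper might look like. It is not the code from the PR and not oneDPL's API: the class `direct_iterator`, the nested marker type `is_passed_directly`, the trait `passes_directly`, and the helper `sum_wrapped` are all names assumed for illustration. The idea is only that the wrapper forwards random-access operations to the wrapped iterator and carries a marker that a library could detect in order to skip the host-side copy.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <type_traits>

// Hypothetical wrapper (not oneDPL API): forwards random-access operations to
// the wrapped iterator and exposes a marker type a library could detect.
template <typename Iter>
class direct_iterator {
    using traits = std::iterator_traits<Iter>;

public:
    using value_type = typename traits::value_type;
    using difference_type = typename traits::difference_type;
    using reference = typename traits::reference;
    using pointer = typename traits::pointer;
    using iterator_category = std::random_access_iterator_tag;
    using is_passed_directly = std::true_type;  // assumed marker name

    direct_iterator() = default;
    explicit direct_iterator(Iter it) : it_(it) {}

    Iter base() const { return it_; }
    reference operator*() const { return *it_; }
    reference operator[](difference_type n) const { return it_[n]; }

    direct_iterator& operator++() { ++it_; return *this; }
    direct_iterator operator++(int) { direct_iterator tmp = *this; ++it_; return tmp; }
    direct_iterator& operator--() { --it_; return *this; }
    direct_iterator operator--(int) { direct_iterator tmp = *this; --it_; return tmp; }
    direct_iterator& operator+=(difference_type n) { it_ += n; return *this; }
    direct_iterator& operator-=(difference_type n) { it_ -= n; return *this; }

    friend direct_iterator operator+(direct_iterator a, difference_type n) { return a += n; }
    friend direct_iterator operator+(difference_type n, direct_iterator a) { return a += n; }
    friend direct_iterator operator-(direct_iterator a, difference_type n) { return a -= n; }
    friend difference_type operator-(const direct_iterator& a, const direct_iterator& b) {
        return a.it_ - b.it_;
    }

    friend bool operator==(const direct_iterator& a, const direct_iterator& b) { return a.it_ == b.it_; }
    friend bool operator!=(const direct_iterator& a, const direct_iterator& b) { return !(a == b); }
    friend bool operator<(const direct_iterator& a, const direct_iterator& b) { return a.it_ < b.it_; }
    friend bool operator>(const direct_iterator& a, const direct_iterator& b) { return b < a; }
    friend bool operator<=(const direct_iterator& a, const direct_iterator& b) { return !(b < a); }
    friend bool operator>=(const direct_iterator& a, const direct_iterator& b) { return !(a < b); }

private:
    Iter it_{};
};

// Convenience factory, mirroring the name used in the proposal above.
template <typename Iter>
direct_iterator<Iter> make_direct_iterator(Iter it) {
    return direct_iterator<Iter>(it);
}

// Hypothetical trait: true only for types that declare the marker.
template <typename T, typename = void>
struct passes_directly : std::false_type {};

template <typename T>
struct passes_directly<T, std::void_t<typename T::is_passed_directly>>
    : T::is_passed_directly {};

// Host-only demo: sums n ints through wrapped raw pointers.
inline int sum_wrapped(const int* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    return std::accumulate(make_direct_iterator(p), make_direct_iterator(p + n), 0);
}
```

A library could branch on `passes_directly<Iter>::value` to decide whether an iterator is handed to the kernel as-is. Whether oneDPL would detect the marker this way is an assumption of the sketch, not something stated in this thread.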
I'd like to pass device iterators---by which I mean random access iterators that work in device kernels---into oneDPL algorithms. Currently this doesn't work.
Here's a minimal example of what I'd like to do:
[full code tarball]
Here, instead of passing an `int*` to a USM device buffer into oneDPL reduce, I'm passing in the iterator type of `std::span`, which happens to be GCC's `__normal_iterator`. Currently, this results in a seg fault, I believe because oneDPL is creating a CPU-side copy of the buffer before launching the algorithm. (And the CPU-side access of a USM device allocation causes a seg fault.) Looking through some of the oneDPL code, it seems like this is what happens with most iterators, except for raw pointers and some special iterator types.
In this specific example, I could of course call `.data()` instead of `.begin()` to get raw pointers, which would have the desired behavior. However, I'm interested in using more complicated device iterator types that can't be represented by raw pointers.
Is there any way to have oneDPL directly launch the kernel with my iterators, instead of copying the data CPU-side?
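For reference, the raw-pointer workaround mentioned above can be sketched on the host as follows. This is only an illustration under stated assumptions: `raw_pointer_range` and `sum_via_raw_pointers` are hypothetical helpers, not part of oneDPL, and `std::accumulate` over a `std::vector` stands in for the device algorithm call. The same `std::data` / `std::size` calls also accept a `std::span` when compiling as C++20.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical helper (not part of oneDPL): turn a contiguous range into a
// [first, last) pair of raw pointers via std::data / std::size.
template <typename Range>
auto raw_pointer_range(Range& r) {
    auto first = std::data(r);
    static_assert(std::is_pointer_v<decltype(first)>,
                  "std::data must yield a raw pointer for this range");
    // An empty range may legitimately report a null data pointer.
    assert(first != nullptr || std::size(r) == 0);
    return std::make_pair(first, first + std::size(r));
}

// Host-only stand-in for the device algorithm call: sums through raw pointers
// rather than through the container's iterator type.
inline int sum_via_raw_pointers(std::vector<int>& v) {
    auto [first, last] = raw_pointer_range(v);
    return std::accumulate(first, last, 0);
}
```

This only covers ranges that are contiguous in memory, which is exactly the limitation raised above: iterator types that cannot be reduced to a raw pointer need a different mechanism.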