Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot use device iterators in oneDPL algorithms #855

Open
BenBrock opened this issue Mar 24, 2023 · 3 comments
Open

Cannot use device iterators in oneDPL algorithms #855

BenBrock opened this issue Mar 24, 2023 · 3 comments

Comments

@BenBrock
Copy link
Contributor

I'd like to pass device iterators---by which I mean random access iterators that work in device kernels---into oneDPL algorithms. Currently this doesn't work.

Here's a minimal example of what I'd like to do:

// Allocate USM device buffer
int* x_d = sycl::malloc_device<int>(100, q);

// Fill buffer pointed to by `x_d` with data.
. . .

// Create span from buffer
std::span<int> x(x_d, 100);

// Pass iterators from `std::span` into oneDPL reduce.
auto sum = oneapi::dpl::reduce(oneapi::dpl::execution::make_device_policy(q),
                               x.begin(), x.end(), 0, std::plus());

[full code tarball]

Here, instead of passing an int* to a USM device buffer into oneDPL reduce, I'm passing in the iterator type of std::span, which happens to be GCC's __normal_iterator. Currently, this results in a seg fault, I believe because oneDPL is creating a CPU-side copy of the buffer before launching the algorithm. (And the CPU-side access of a USM device allocation causes a seg fault.) Looking through some of the oneDPL code, it seems like this is what happens with most iterators, except for raw pointers and some special iterator types.

In this specific example, I could of course call .data() instead of .begin() to get raw pointers, which would have the desired behavior. However, I'm interested in using more complicated device iterator types that can't be represented by raw pointers.

Is there any way to have oneDPL directly launch the kernel with my iterators, instead of copying the data CPU-side?

@MikeDvorskiy
Copy link
Contributor

Talking about any containers based on USM - oneDPL supports just std::vector with USM allocator.

https://oneapi-src.github.io/oneDPL/parallel_api/pass_data_algorithms.html

Pass Data to Algorithms
You can use one of the following ways to pass data to an algorithm executed with a device policy:
- oneapi:dpl::begin and oneapi::dpl::end functions
- Unified shared memory (USM) pointers and std::vector with USM allocators
- Iterators of host-side std::vector

@MikeDvorskiy
Copy link
Contributor

MikeDvorskiy commented Mar 24, 2023

Is there any way to have oneDPL directly launch the kernel with my iterators, instead of copying the data CPU-side?

Currently, just USM pointers, std::vector<..., USM_allocator>::begin or begin/end over a sycl::buffer.

@BenBrock
Copy link
Contributor Author

Thanks to @MikeDvorskiy for pointing out oneapi::dpl::__internal::is_passed_directly, which identifies whether an iterator can be directly passed into a SYCL kernel when a oneDPL algorithm is executed.

My proposal to address this issue, along with #854, is to introduce a direct_iterator that wraps device iterators before they are passed into oneDPL. The helper function make_direct_iterator() returns one of these direct_iterators, and achieves the desired behavior when used with oneDPL algorithms like the example above.

// Allocate USM device buffer
int* x_d = sycl::malloc_device<int>(100, q);

// Fill buffer pointed to by `x_d` with data.
. . .

// Create span from buffer
std::span<int> x(x_d, 100);

// Pass iterators from `std::span` into oneDPL reduce.
auto sum = oneapi::dpl::reduce(oneapi::dpl::execution::make_device_policy(q),
                               oneapi::dpl::make_direct_iterator(x.begin()),
                               oneapi::dpl::make_direct_iterator(x.end()),
                               0, std::plus());

I have implemented something similar internally in our distributed ranges codebase, but I've also written a quick draft of this in a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants