Cannot use device iterators in oneDPL algorithms #855

BenBrock · 2023-03-24T00:14:09Z

I'd like to pass device iterators---by which I mean random access iterators that work in device kernels---into oneDPL algorithms. Currently this doesn't work.

Here's a minimal example of what I'd like to do:

// Allocate USM device buffer
int* x_d = sycl::malloc_device<int>(100, q);

// Fill buffer pointed to by `x_d` with data.
. . .

// Create span from buffer
std::span<int> x(x_d, 100);

// Pass iterators from `std::span` into oneDPL reduce.
auto sum = oneapi::dpl::reduce(oneapi::dpl::execution::make_device_policy(q),
                               x.begin(), x.end(), 0, std::plus());

[full code tarball]

Here, instead of passing an int* to a USM device buffer into oneDPL reduce, I'm passing in the iterator type of std::span, which happens to be GCC's __normal_iterator. Currently, this results in a seg fault, I believe because oneDPL is creating a CPU-side copy of the buffer before launching the algorithm. (And the CPU-side access of a USM device allocation causes a seg fault.) Looking through some of the oneDPL code, it seems like this is what happens with most iterators, except for raw pointers and some special iterator types.

In this specific example, I could of course call .data() instead of .begin() to get raw pointers, which would have the desired behavior. However, I'm interested in using more complicated device iterator types that can't be represented by raw pointers.

Is there any way to have oneDPL directly launch the kernel with my iterators, instead of copying the data CPU-side?

The text was updated successfully, but these errors were encountered:

MikeDvorskiy · 2023-03-24T14:23:01Z

Talking about any containers based on USM - oneDPL supports just std::vector with USM allocator.

https://oneapi-src.github.io/oneDPL/parallel_api/pass_data_algorithms.html

Pass Data to Algorithms
You can use one of the following ways to pass data to an algorithm executed with a device policy:
- oneapi:dpl::begin and oneapi::dpl::end functions
- Unified shared memory (USM) pointers and std::vector with USM allocators
- Iterators of host-side std::vector

MikeDvorskiy · 2023-03-24T14:24:46Z

Is there any way to have oneDPL directly launch the kernel with my iterators, instead of copying the data CPU-side?

Currently, just USM pointers, std::vector<..., USM_allocator>::begin or begin/end over a sycl::buffer.

BenBrock · 2023-03-25T00:52:52Z

Thanks to @MikeDvorskiy for pointing out oneapi::dpl::__internal::is_passed_directly, which identifies whether an iterator can be directly passed into a SYCL kernel when a oneDPL algorithm is executed.

My proposal to address this issue, along with #854, is to introduce a direct_iterator that wraps device iterators before they are passed into oneDPL. The helper function make_direct_iterator() returns one of these direct_iterators, and achieves the desired behavior when used with oneDPL algorithms like the example above.

// Allocate USM device buffer
int* x_d = sycl::malloc_device<int>(100, q);

// Fill buffer pointed to by `x_d` with data.
. . .

// Create span from buffer
std::span<int> x(x_d, 100);

// Pass iterators from `std::span` into oneDPL reduce.
auto sum = oneapi::dpl::reduce(oneapi::dpl::execution::make_device_policy(q),
                               oneapi::dpl::make_direct_iterator(x.begin()),
                               oneapi::dpl::make_direct_iterator(x.end()),
                               0, std::plus());

I have implemented something similar internally in our distributed ranges codebase, but I've also written a quick draft of this in a PR.

This was referenced Mar 24, 2023

Fix use of device iterators with oneDPL oneapi-src/distributed-ranges#137

Merged

Implement direct_iterator and make_direct_iterator #861

Open

masterleinad mentioned this issue Nov 1, 2024

oneDPL: Sort on device using Kokkos::RandomAccessIterator kokkos/kokkos#7502

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cannot use device iterators in oneDPL algorithms #855

Cannot use device iterators in oneDPL algorithms #855

BenBrock commented Mar 24, 2023

MikeDvorskiy commented Mar 24, 2023

MikeDvorskiy commented Mar 24, 2023 •

edited by akukanov

Loading

BenBrock commented Mar 25, 2023

Cannot use device iterators in oneDPL algorithms #855

Cannot use device iterators in oneDPL algorithms #855

Comments

BenBrock commented Mar 24, 2023

MikeDvorskiy commented Mar 24, 2023

MikeDvorskiy commented Mar 24, 2023 • edited by akukanov Loading

BenBrock commented Mar 25, 2023

MikeDvorskiy commented Mar 24, 2023 •

edited by akukanov

Loading