-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement direct_iterator
and make_direct_iterator
#861
base: main
Are you sure you want to change the base?
Changes from all commits
b6ce547
2f955d4
9728854
e48ae04
9a7a99e
da75e3a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
// -*- C++ -*- | ||
//===-- direct_iterator.pass.cpp ------------------------------------------===// | ||
// | ||
// Copyright (C) Intel Corporation | ||
// | ||
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
// | ||
// This file incorporates work covered by the following copyright and permission | ||
// notice: | ||
// | ||
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
// See https://llvm.org/LICENSE.txt for license information. | ||
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
#include <oneapi/dpl/execution> | ||
#include <oneapi/dpl/algorithm> | ||
#include <oneapi/dpl/iterator> | ||
#include <oneapi/dpl/functional> | ||
|
||
#include "support/utils.h" | ||
|
||
#include <iostream> | ||
#include <vector> | ||
#include <numeric> | ||
|
||
#if __cpp_lib_span >= 202002L | ||
#include <span> | ||
#endif | ||
|
||
int | ||
main() | ||
{ | ||
#if TEST_DPCPP_BACKEND_PRESENT | ||
using T = int; | ||
|
||
const int n = 1000; | ||
|
||
std::vector<T> v(n); | ||
std::iota(v.begin(), v.end(), 0); | ||
|
||
sycl::queue q(sycl::default_selector_v); | ||
|
||
T* p = sycl::malloc_device<T>(n, q); | ||
|
||
q.memcpy(p, v.data(), n * sizeof(T)).wait(); | ||
|
||
auto v_ref = std::reduce(v.begin(), v.end(), 0); | ||
|
||
dpl::make_direct_iterator d_first(p); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Happy to remove this part of the test if you prefer. My idea was that this could serve as a temporary workaround for #854. (Although as I mention in the issue, there is unfortunately a bug in the level zero that keeps this workaround from working on Intel multi-GPU systems.) |
||
dpl::make_direct_iterator d_last(p + n); | ||
|
||
auto v_dev = dpl::reduce(d_first, d_last, 0); | ||
|
||
EXPECT_EQ(v_ref, v_dev); | ||
|
||
#if __cpp_lib_span >= 202002L | ||
|
||
std::span<T> x(p, n); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems span There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oops, just fixed that typo to actually initialize |
||
|
||
dpl::make_direct_iterator s_first(x.begin()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably we can avoid this "identical" iterator-wrapper, by introducing just specialization for the trait like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You don't always want to pass in std::vector<int> v(...);
std::span s(v);
// Runtime error, since `s.begin()` is a host iterator and cannot
// be used directly on the device.
dpl::reduce(policy, s.begin(), s.end()); You can't in general know whether a span is accessible on the device, and this holds for most ranges you might encounter. There are a lot of iterator types that users might want to pass into oneDPL directly, and I don't think we can automatically most of them. I will add a better motivating example below. |
||
dpl::make_direct_iterator s_last(x.end()); | ||
|
||
auto s_dev = dpl::reduce(s_first, s_last, 0); | ||
|
||
EXPECT_EQ(v_ref, s_dev); | ||
#endif | ||
#endif | ||
|
||
return TestUtils::done(TEST_DPCPP_BACKEND_PRESENT); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To tell the truth, I really don't understand an essence of that wrapper over
_Iter
.That wrapper repeats the all standard RA iterator functionality, including dereferencing. If
_Iter
is not accessible on a device,direct_iterator
also is not accessible on a device... So, what's an essence here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MikeDvorskiy Sorry for being late getting back to you; this slipped past my inbox.
The idea here is that you have a range/iterator accessible on the device. Let's say a
std::span<int>
to device memory. Then, you create a view based on that range. For example:This code works, but has terrible performance. The reason is that oneDPL does not know
transform_view<...>::iterator
is device accessible, so it copies all the elements one-by-one from the device to the host, then uses a buffer to copy it back to the device. We usedirect_iterator
to force oneDPL to use the iterator directly, since we know that it can be used directly on the device.This example is a bit simplified. In the use case in distributed ranges, we have an actual
device_ptr
as the underlying iterator type, so we do know that the data lives on the device. It might be worth thinking about how we could integrate distributed range's concepts of device vs. host memory with distributed ranges, but I think there will always be some cases where a user wants to explicitly "promote" a range to being directly accessible on the device. Using a standard library view is a prime example of this, as we're unlikely to be able to hardwire locality information into a view without modifying the standard. (Or providing our own implementation of all views.)