
Adding std::vector<T, usm_allocator>::iterator to is_passed_directly types #1438

Merged 27 commits into main from dev/dhoeflin/improve_usm_shared_alloc_vectors on Apr 4, 2024

Conversation

@danhoeflinger (Contributor) commented Mar 7, 2024

It seems that iterators of std::vector with a usm_allocator using sycl::usm::alloc::shared were being processed as if they were host iterators when passed to oneDPL with a device policy and the dpcpp backend.

This means they were being used to initialize a sycl::buffer, which involves an extra copy through intermediate host memory. That copy is unnecessary: iterators to such data can be passed directly and used within a kernel as-is.

This PR adds a specialization of the is_passed_directly trait so that such iterators are properly passed directly, avoiding the extra copy.

This was uncovered while investigating #1429 and, following a suggestion from @mmichel11, will be covered by those tests.
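For illustration, here is a minimal sketch of the affected pattern (the algorithm and sizes are arbitrary; any device-policy call over such a vector takes the same path):

```cpp
#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <sycl/sycl.hpp>
#include <vector>

int main()
{
    sycl::queue q;
    sycl::usm_allocator<int, sycl::usm::alloc::shared> alloc(q);
    std::vector<int, decltype(alloc)> vec(1000, 0, alloc);

    // Without this PR: vec.begin() is classified as a host iterator, so the
    // shared USM data is first copied into a sycl::buffer.
    // With this PR: the iterator is recognized as passed directly and the
    // kernel operates on the USM data in place, with no intermediate copy.
    oneapi::dpl::fill(oneapi::dpl::execution::make_device_policy(q),
                      vec.begin(), vec.end(), 42);
    return 0;
}
```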

@mmichel11 (Contributor) commented:

Our documentation only makes mention of the case where we use a sycl::usm_allocator with sycl::usm::alloc::shared, but should we also provide a specialization for sycl::usm_allocator with sycl::usm::alloc::host to be passed directly since SYCL host memory is device accessible?

@danhoeflinger (Contributor, Author) commented Mar 7, 2024

> Our documentation only makes mention of the case where we use a sycl::usm_allocator with sycl::usm::alloc::shared, but should we also provide a specialization for sycl::usm_allocator with sycl::usm::alloc::host to be passed directly since SYCL host memory is device accessible?

It's a good question; I will need to look into it. @MikeDvorskiy @rarutyun @akukanov, do you have an opinion on how we should handle std::vector input data allocated with a sycl::usm_allocator using sycl::usm::alloc::host memory?

@danhoeflinger (Contributor, Author) commented Mar 8, 2024

> Our documentation only makes mention of the case where we use a sycl::usm_allocator with sycl::usm::alloc::shared, but should we also provide a specialization for sycl::usm_allocator with sycl::usm::alloc::host to be passed directly since SYCL host memory is device accessible?
>
> It's a good question; I will need to look into it. @MikeDvorskiy @rarutyun @akukanov, do you have an opinion on how we should handle std::vector input data allocated with a sycl::usm_allocator using sycl::usm::alloc::host memory?

After some thought, I believe we should support sycl::usm::alloc::host allocators as passed directly as well (and I have added it). The alternative is treating them as host iterators and using them to initialize a sycl::buffer. Of course, it is the user's responsibility to use host USM memory responsibly.
I am interested in hearing other opinions on this as well.

@danhoeflinger (Contributor, Author) commented:
When run through the input data sweep tests from #1429, both shared and host usm_allocator iterators encounter issues when used as the source iterator for a permutation_iterator. I don't yet understand why, but this should be understood before merging this PR.

@danhoeflinger marked this pull request as draft March 8, 2024 17:13
@danhoeflinger (Contributor, Author) commented:

> When run through the input data sweep tests from #1429, both shared and host usm_allocator iterators encounter issues when used as the source iterator for a permutation_iterator. I don't yet understand why, but this should be understood before merging this PR.

There was a misplaced `>` that was causing these usm_allocator iterators not to be treated as passed directly. It is now fixed, the tests from #1429 pass, and I've confirmed that the iterators are handled via the correct path through the input data processing code.

@danhoeflinger marked this pull request as ready for review March 8, 2024 17:25
@danhoeflinger added this to the 2022.6.0 milestone Mar 11, 2024
@danhoeflinger (Contributor, Author) commented:

@MikeDvorskiy @mmichel11 Actually, there is a fundamental problem with this, and it will not work. It breaks host iterators because std::vector::iterator with the default allocator is indistinguishable from std::vector::iterator with a usm_allocator.

This is problematic, because as it stands (without this PR) we have an extra copy to a host memory location, and these iterators also cannot be used as the source iterator for permutation iterators (the same as host_iterators). Sadly, I see no apparent solution to this issue.
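A hypothetical illustration of the hazard (simplified; not code from this PR):

```cpp
#include <sycl/sycl.hpp>
#include <type_traits>
#include <vector>

// If the standard library does not encode the allocator in
// std::vector::iterator, these two aliases name the same type. A trait
// specialized on the USM iterator type would then also match ordinary
// host vectors and wrongly mark them as passed directly.
using host_iter = std::vector<int>::iterator;
using usm_iter =
    std::vector<int, sycl::usm_allocator<int, sycl::usm::alloc::shared>>::iterator;

// std::is_same_v<host_iter, usm_iter> being true is exactly the problem case.
```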

@danhoeflinger reopened this Mar 11, 2024
@danhoeflinger marked this pull request as draft March 11, 2024 18:21
@danhoeflinger marked this pull request as ready for review March 11, 2024 19:45
@danhoeflinger (Contributor, Author) commented Mar 11, 2024

Update:
Actually, for icpx, std::vector::iterator does distinguish the default allocator from a usm_allocator, except in the case of std::vector<bool>::iterator (which, it seems, is always std::_Bit_iterator regardless of the allocator).

I've provided a solution which should allow support where possible, without breaking host_iterators.

It first checks whether we can distinguish a host iterator from the iterator of a vector with a usm_allocator, using a new struct __vector_impl_distinguishes_usm_allocator_from_default. If we can distinguish these types for the value type in question, we detect usm_allocators and mark their iterators as passed directly. If we cannot distinguish the two, we leave the passed-directly status unchanged.
This should not "spoil" any host_iterators, and it should allow better support for usm_allocators where possible.
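A simplified sketch of the detection idea (names here are illustrative; the PR's actual trait is more thorough):

```cpp
#include <sycl/sycl.hpp>
#include <type_traits>
#include <vector>

// The allocator is detectable from the iterator type exactly when a
// default-allocator vector and a usm_allocator vector yield distinct
// iterator types for the value type in question.
template <typename T>
inline constexpr bool vector_iter_distinguishes_by_allocator_v = !std::is_same_v<
    typename std::vector<T>::iterator,
    typename std::vector<T, sycl::usm_allocator<T, sycl::usm::alloc::shared>>::iterator>;

// With icpx's standard library this holds for most value types, but not for
// bool: std::vector<bool>::iterator is std::_Bit_iterator regardless of the
// allocator, so those iterators keep the host_iterator treatment.
static_assert(vector_iter_distinguishes_by_allocator_v<int>);
static_assert(!vector_iter_distinguishes_by_allocator_v<bool>);
```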

@MikeDvorskiy Please take a look and see if you can find problems with this approach.

@MikeDvorskiy (Contributor) left a comment:

LGTM

```cpp
typename ::std::vector<ValueType, typename sycl::usm_allocator<ValueType, sycl::usm::alloc::host>>::iterator;

template <typename Iter>
struct __vector_iter_distinguishes_by_allocator<
```
Contributor:

I guess we played it extra safe here, but let it be so.

@danhoeflinger (Contributor, Author):

Yes, but this provides more and better evidence that unique allocators (not defined by the standard) result in iterator types that are distinct from each other. All of it is done at compile time, so I don't see a downside.

@MikeDvorskiy (Contributor) left a comment:

Looks good, with a couple of minor comments.

```cpp
template <typename Iter>
struct is_passed_directly<
    Iter, std::enable_if_t<(std::is_same_v<Iter, oneapi::dpl::__internal::__usm_shared_alloc_vec_iter<Iter>> ||
                            std::is_same_v<Iter, oneapi::dpl::__internal::__usm_host_alloc_vec_iter<Iter>>)&&oneapi::
```
Contributor:

`oneapi::dpl::__internal::__usm_host_alloc_vec_iter<Iter>>)&&oneapi::`
=> `oneapi::dpl::__internal::__usm_host_alloc_vec_iter<Iter>>) && oneapi::`

@danhoeflinger (Contributor, Author) commented Apr 4, 2024:

Fixed formatting. (clang-format will now complain.)

> …on details of the C++ standard library implementation and what information about the allocator is included in the ``std::vector::iterator`` type definition. If USM allocated vector iterators are not detectable with your C++ standard library, they will still function as inputs to oneDPL, but they will be treated as if they were host-side…
Contributor:

It probably makes sense to add a clarification that, in case "USM allocated vector iterators are not detectable with your C++ standard library", a user can pass vec.data() instead of vec.begin(). That guarantees the data is processed without extra host-device copying.
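For example, a hypothetical usage sketch (raw USM pointers are always passed directly, independent of the standard library):

```cpp
#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <sycl/sycl.hpp>
#include <vector>

int main()
{
    sycl::queue q;
    sycl::usm_allocator<int, sycl::usm::alloc::shared> alloc(q);
    std::vector<int, decltype(alloc)> vec(1000, 1, alloc);

    // vec.data() is a plain pointer into shared USM. Unlike vec.begin(), its
    // type carries no allocator information that must be detected, so oneDPL
    // passes it directly to the device with no extra host-device copy.
    oneapi::dpl::sort(oneapi::dpl::execution::make_device_policy(q),
                      vec.data(), vec.data() + vec.size());
    return 0;
}
```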

@danhoeflinger (Contributor, Author):

Yes, this is a good idea.

@danhoeflinger (Contributor, Author):

After offline discussion, I've made these changes, but also separated the documentation changes out into another PR, #1477.

@MikeDvorskiy (Contributor) left a comment:

LGTM

@danhoeflinger merged commit 6a45be7 into main Apr 4, 2024
19 of 20 checks passed
@danhoeflinger deleted the dev/dhoeflin/improve_usm_shared_alloc_vectors branch April 4, 2024 15:48