ddc::for_each should be marked KOKKOS_FUNCTION #695

blegouix · 2024-12-04T19:01:39Z

Am I correct? Is the issue related to std::array member functions not being marked __device__?

tpadioleau · 2024-12-04T21:09:01Z

It would be possible to make it KOKKOS_FUNCTION, but at the cost of a lot of warnings from nvcc. The compiler sometimes compiles both the CPU and the GPU version even if only one of the two is being used. So if one uses for_each with a host-only functor, nvcc will warn that the GPU version is calling a host-only function.

We tried to work on that a while ago, see https://github.com/CExA-project/ddc/pull/174/files, but it was never merged. So far we have not found a trick to make it work with a single name.
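To make the situation described above concrete, here is a minimal sketch of it. It is not DDC code: `DEMO_HD_FUNCTION`, `demo_for_each` and `collect_indices` are hypothetical names, and the macro only stands in for what KOKKOS_FUNCTION is assumed to expand to under nvcc. With a host-only compiler the macro is empty and the code builds and runs normally.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for KOKKOS_FUNCTION: __host__ __device__ under nvcc,
// nothing with a host-only compiler.
#if defined(__CUDACC__)
#define DEMO_HD_FUNCTION __host__ __device__
#else
#define DEMO_HD_FUNCTION
#endif

// Hypothetical 1D analogue of an annotated for_each: calls f(i) for i in [0, n).
template <class Functor>
DEMO_HD_FUNCTION void demo_for_each(std::size_t n, Functor&& f)
{
    for (std::size_t i = 0; i < n; ++i) {
        f(i);
    }
}

// Host-only work: std::vector::push_back has no device version.
inline std::vector<std::size_t> collect_indices(std::size_t n)
{
    std::vector<std::size_t> out;
    // This lambda is host-only. Under nvcc, instantiating the annotated
    // demo_for_each with it is the case where a warning about calling a
    // host function from a host/device function is expected, even though
    // the device version is never used.
    demo_for_each(n, [&out](std::size_t i) { out.push_back(i); });
    return out;
}
```

The host path is perfectly valid here; the warning only concerns the device instantiation that nobody asked for.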

blegouix · 2024-12-05T12:44:18Z

Oh OK, you have looked at it already then. Do you know how Kokkos deals with the problem, which should appear in Kokkos::parallel_for too? It seems this is just inline but still callable from a GPU kernel (i.e. with a TeamThreadRange policy):

https://github.com/kokkos/kokkos/blob/b2f0fa0aa6ebf8d36306c913dd7442e4222d375c/core/src/Kokkos_Parallel.hpp#L152

tpadioleau · 2024-12-05T14:49:20Z

Regarding the team policy, you see that Kokkos also annotates the functions https://github.com/kokkos/kokkos/blob/14be07bb436da168206b6040bf6a4d4da4f470eb/core/src/Cuda/Kokkos_Cuda_Team.hpp#L488-L504. Kokkos also does it for a host team policy, https://github.com/kokkos/kokkos/blob/14be07bb436da168206b6040bf6a4d4da4f470eb/core/src/impl/Kokkos_HostThreadTeam.hpp#L780-L790.

A difference between Kokkos and DDC is that in DDC we wanted to provide a for_each that would work with host-only functors, whereas Kokkos always asks users to annotate the functors they pass with KOKKOS_FUNCTION (and lambdas in a KOKKOS_FUNCTION are also implicitly KOKKOS_FUNCTION). If we stick to the Kokkos policy, we could also provide a KOKKOS_FUNCTION for_each.

blegouix · 2024-12-06T11:44:11Z

OK I see, thanks for the explanation, this is clear! I will try to see if I can get an additional KOKKOS_FUNCTION version of ddc::for_each which takes a KOKKOS_FUNCTION functor and can coexist with the non-KOKKOS_FUNCTION version using SFINAE.

Also, do you anticipate an issue due to the usage of std::array inside ddc::for_each? At least std::array::operator[] is not callable from device.

tpadioleau · 2024-12-13T17:54:11Z

> Also, do you anticipate an issue due to the usage of std::array inside ddc::for_each? At least std::array::operator[] is not callable from device.

As discussed on Slack: not particularly, because compilers have been able to handle constexpr functions correctly on the device for a while. That said, it remains an experimental feature in CUDA nvcc, so we never know.
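As an illustration of the constexpr point, here is a small sketch of an annotated function that indexes a std::array. The names `DEMO_HD_FUNCTION` and `product` are hypothetical. The assumption is that, when built with nvcc, this relies on the experimental `--expt-relaxed-constexpr` flag so that device code may call the unannotated constexpr `std::array::operator[]`. With a host-only compiler the macro is empty.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for KOKKOS_FUNCTION.
#if defined(__CUDACC__)
#define DEMO_HD_FUNCTION __host__ __device__
#else
#define DEMO_HD_FUNCTION
#endif

// Multiplies all extents together. The const overload of
// std::array::operator[] is constexpr but carries no __device__ annotation,
// which is why relaxed constexpr support is assumed for a device build.
template <std::size_t N>
DEMO_HD_FUNCTION constexpr std::size_t product(std::array<std::size_t, N> const& extents)
{
    std::size_t result = 1;
    for (std::size_t i = 0; i < N; ++i) {
        result *= extents[i];
    }
    return result;
}

// The same function is usable in a constant expression on the host.
static_assert(product(std::array<std::size_t, 3>{2, 3, 4}) == 24, "2 * 3 * 4");
```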

blegouix · 2024-12-15T12:39:59Z

I am trying to solve the problem and I agree this does not seem to be feasible with a single-name function. Can I make an MR where I add an annotated_for_each (and an annotated_transform_reduce) which have the same implementations but with the KOKKOS_FUNCTION annotation?
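A rough sketch of what the two-name approach could look like, reduced to a 1D index range. This is only an illustration under assumptions: the `demo` namespace, `DEMO_HD_FUNCTION` and `AccumulateSquares` are hypothetical, and the real ddc::for_each iterates over a discrete domain rather than a pair of integers.

```cpp
#include <cstddef>

// Hypothetical stand-in for KOKKOS_FUNCTION.
#if defined(__CUDACC__)
#define DEMO_HD_FUNCTION __host__ __device__
#else
#define DEMO_HD_FUNCTION
#endif

namespace demo {

// Unannotated version: accepts any functor, including host-only ones.
template <class Functor>
void for_each(std::size_t begin, std::size_t end, Functor const& f)
{
    for (std::size_t i = begin; i < end; ++i) {
        f(i);
    }
}

// Annotated version with the same body: callable from device code, and
// expects f itself to be callable there.
template <class Functor>
DEMO_HD_FUNCTION void annotated_for_each(std::size_t begin, std::size_t end, Functor const& f)
{
    for (std::size_t i = begin; i < end; ++i) {
        f(i);
    }
}

// A functor annotated for both host and device.
struct AccumulateSquares
{
    long* sum;

    DEMO_HD_FUNCTION void operator()(std::size_t i) const
    {
        *sum += static_cast<long>(i * i);
    }
};

// Host-side usage of the annotated version.
inline long sum_of_squares(std::size_t n)
{
    long sum = 0;
    annotated_for_each(0, n, AccumulateSquares{&sum});
    return sum;
}

} // namespace demo
```

The duplication of the loop body is the price of keeping the host-only version free of nvcc warnings.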

blegouix · 2024-12-15T12:51:21Z

Otherwise annotating the existing for_each with:

#pragma hd_warning_disable

suppresses the warnings.

tpadioleau · 2024-12-16T11:11:13Z

> Otherwise annotating the existing for_each with:
> #pragma hd_warning_disable
> suppresses the warnings.

We cannot take this approach. There is no documentation about CUDA pragmas. And what about a warning that would be triggered for a good reason?

tpadioleau · 2024-12-29T17:00:24Z

Closing as it is a duplicate of #172

blegouix mentioned this issue Dec 15, 2024: KOKKOS_FUNCTION-annotated for_each and transform_reduce #708 (Draft)

tpadioleau closed this as not planned Dec 29, 2024
