Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ddc::for_each should be marked KOKKOS_FUNCTION #695

Closed
blegouix opened this issue Dec 4, 2024 · 9 comments
Closed

ddc::for_each should be marked KOKKOS_FUNCTION #695

blegouix opened this issue Dec 4, 2024 · 9 comments

Comments

@blegouix
Copy link
Collaborator

blegouix commented Dec 4, 2024

Am I correct ? Is the issue related to std::array member functions not being marked __device__ ?

@tpadioleau
Copy link
Member

It could be possible to make it KOKKOS_FUNCTION but at the cost of a lot of warnings from nvcc. The compiler sometimes compiles both the CPU and the GPU version even if only one the two is being used. So if one uses for_each with a host-only functor, nvcc will warn that the GPU version is calling a host-only function.

We tried to work on that a moment ago, see https://github.com/CExA-project/ddc/pull/174/files, it has never been merged. We have never found a trick to make it work with a single name so far.

@blegouix
Copy link
Collaborator Author

blegouix commented Dec 5, 2024

Oh ok you looked at it already then. Do you know how Kokkos deals with the problem, which should appear in Kokkos::parallel_for too ? It seems this is just inline but still callable from a GPU kernel (ie. with TeamThreadRange policy) :

https://github.com/kokkos/kokkos/blob/b2f0fa0aa6ebf8d36306c913dd7442e4222d375c/core/src/Kokkos_Parallel.hpp#L152

@tpadioleau
Copy link
Member

tpadioleau commented Dec 5, 2024

Regarding the team policy, you see that Kokkos also annotates the functions https://github.com/kokkos/kokkos/blob/14be07bb436da168206b6040bf6a4d4da4f470eb/core/src/Cuda/Kokkos_Cuda_Team.hpp#L488-L504. Kokkos also does it for a host team policy, https://github.com/kokkos/kokkos/blob/14be07bb436da168206b6040bf6a4d4da4f470eb/core/src/impl/Kokkos_HostThreadTeam.hpp#L780-L790.

A difference between Kokkos and DDC is that in DDC we wanted to provide a for_each that would work with host-only functors whereas Kokkos always ask the users to annotate KOKKOS_FUNCTION the functors you pass (and lambdas in a KOKKOS_FUNCTION are also implicitly KOKKOS_FUNCTION). If we stick to the Kokkos policy we could also provide a KOKKOS_FUNCTION for_each.

@blegouix
Copy link
Collaborator Author

blegouix commented Dec 6, 2024

Ok I see, thanks for the explanation this is clear! I will try to see if I can get an additional KOKKOS_FUNCTION version of ddc::for_each which takes a KOKKOS_FUNCTION functor and can coexist with the non-KOKKOS_FUNCTION version using Sfinae.

Also, do you anticipate an issue due to the usage of std::array inside ddc::for_each ? At least std::array::operator[] is not callable from device.

@tpadioleau
Copy link
Member

tpadioleau commented Dec 13, 2024

Ok I see, thanks for the explanation this is clear! I will try to see if I can get an additional KOKKOS_FUNCTION version of ddc::for_each which takes a KOKKOS_FUNCTION functor and can coexist with the non-KOKKOS_FUNCTION version using Sfinae.

Also, do you anticipate an issue due to the usage of std::array inside ddc::for_each ? At least std::array::operator[] is not callable from device.

As discussed on the slack, not particularly because compilers have been able to handle constexpr functions correctly on the device for a while. That said it remains an experimental feature in CUDA nvcc so we never know.

@blegouix
Copy link
Collaborator Author

blegouix commented Dec 15, 2024

I am trying to solve the problem and I agree this does not seem to be feasible with a single-name function. Can I make a MR where I add a annotated_for_each (and annotated_transform_reduce) which has the same implementations but with the KOKKOS_FUNCTION annotation ?

@blegouix
Copy link
Collaborator Author

blegouix commented Dec 15, 2024

Otherwise annotating the existing for_each with:

#pragma hd_warning_disable

Suppress the warnings.

@tpadioleau
Copy link
Member

Otherwise annotating the existing for_each with:

#pragma hd_warning_disable

Suppress the warnings.

We cannot take this approach. There is no documentation about CUDA pragmas. And what about a warning that would be triggered for a good reason ?

@tpadioleau
Copy link
Member

Closing as it is a duplicate of #172

@tpadioleau tpadioleau closed this as not planned Won't fix, can't repro, duplicate, stale Dec 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants