Investigate std::execution for alpaka #2235
Meeting notes
Do you have a source that says that the C++17 parallel STL (PSTL) == std::execution? In my mind, they are completely separate things. The PSTL is a parallelisation of STL algorithms, whereas std::execution refers to senders & receivers (and previously to executors), a lower-level task-graph building framework.
Oh, I think it's perfectly feasible. This is what vikunja is, or should be. I imagine it like:
Where ...
The problem turned out to be that after each transform call, the result is copied back to the CPU so as not to break the C++ standard. It will not be a "task graph"-like solution (as @SimeonEhrig said). Parameters can be passed via beginFilter() instead of begin() (as @psychocoderHPC proposed), or, in my opinion, via a lambda using a lambda generator.
The C++17 parallel STL lives in the header called <execution>, hence the execution policies of C++17 are in the std::execution namespace: https://en.cppreference.com/w/cpp/header/execution
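For reference, a minimal, self-contained example of those C++17 execution policies (standard C++, no alpaka involved; note that e.g. libstdc++ needs TBB installed for these to actually run in parallel):

```cpp
#include <algorithm>
#include <execution>   // std::execution::seq, par, par_unseq
#include <numeric>
#include <vector>

int main() {
    std::vector<double> v(1'000'000, 1.0);

    // Parallel element-wise transform, selected via an execution policy.
    std::transform(std::execution::par, v.begin(), v.end(), v.begin(),
                   [](double x) { return 2.0 * x; });

    // Parallel (and vectorisable) reduction, a single call.
    double sum = std::reduce(std::execution::par_unseq, v.begin(), v.end());
    return sum > 0.0 ? 0 : 1;
}
```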
That's why we have C++20 ranges, so we can build lazy graphs to then schedule them on a PSTL algorithm:

auto graph = data | transform(...) | filter(...) | transform(...);
for_each(with_alpaka(... params ...), begin(data), end(data));

However, there are certain problems with this design as well, e.g. some implementations struggle to accelerate non-random-access ranges, like filter ranges. Still, sometimes a simple transform or reduce is all you need, and you would be happy to write that in a single line :)
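For illustration, a minimal sketch of such a lazy ranges pipeline (standard C++20; since with_alpaka above is only a hypothetical API, the view here is simply iterated sequentially):

```cpp
#include <cstdio>
#include <ranges>
#include <vector>

int main() {
    std::vector<int> data{1, 2, 3, 4, 5, 6};

    // Lazy view pipeline: nothing is computed until the view is iterated.
    auto graph = data
        | std::views::transform([](int x) { return x * x; })
        | std::views::filter([](int x) { return x % 2 == 0; })
        | std::views::transform([](int x) { return x + 1; });

    for (int v : graph)
        std::printf("%d ", v);  // prints: 5 17 37
}
```

The filter view is exactly the kind of non-random-access range mentioned above: its iterators are at most bidirectional, which is what makes offloading such pipelines to a parallel algorithm hard.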
That is correct. However, the C++ proposal ...
Yes, using ranges would simplify the alpaka pipeline a lot, which I think is needed, but it requires C++20, as you said.
The first part of std::execution is the C++17 STL execution policies. Can STL functions like std::transform be used by alpaka so that they use different backends?
The new std::execution includes a sender/receiver graph structure.
std::execution, or the sender/receiver framework, is a proposal (expected to be included in C++26) that creates an abstraction mainly for asynchronous task scheduling, by building a task graph and a scheduler to run those tasks. The scheduler is similar to the alpaka accelerator concept. It does not create a mechanism to access the thread index or block index.
Each task is called a sender, and senders of different types (created by specific sender adaptors) form the nodes of a directed acyclic graph. Some examples (see the sketch after this list):
"Then" sender adapter, returns a sender which waits for the result of previous sender in the task graph and uses that result as an argument to a given function.
"Bulk" sender adapter, returns a sender describing the task of invoking the provided function with every index in the provided shape along with the values sent by the input sender.
"when_all" returns a sender that completes once all of the input senders have completed.
As a preliminary result: std::execution is another abstraction similar to alpaka; in my opinion it is an alternative to alpaka, because it creates an abstraction over different execution resources. Namely, by creating the scheduler concept it provides adaptability to different backends. On the other hand, it does not provide direct access to the thread or block indices; it leaves "the thread management issue" to the different backend implementations of the schedulers.
By defining task types that interpret a previous task's results in a structured way, it helps create structured parallelism. NVIDIA's implementation of std::execution (with the stdexec backend for GPU support) shows that it has NVIDIA support. After many discussions, this proposal was not included in C++23; it is expected to be included in C++26.
Sources
A detailed and informative talk:
https://youtu.be/QSaUCzL7nCU?si=g4kl_DXrAa4Sd_ZD
A recent revision of the proposal:
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2300r7.html
The previous (executor-based) paper:
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0443r14.html
NVIDIA's implementation, with stdexec providing a GPU backend (scheduler):
https://github.com/NVIDIA/stdexec?tab=readme-ov-file