Investigate std::execution for alpaka #2235
Meeting notes
Do you have a source that says that the C++17 parallel STL (PSTL) == std::execution? In my mind, they are completely separate things. The PSTL is a parallelisation of STL algorithms, whereas std::execution refers to senders & receivers (and previously to executors), a lower-level task-graph building framework.
Oh, I think it's perfectly feasible. This is what vikunja is, or should be. I imagine it like:
Where ...
The problem turned out to be that after each transform call, the result is copied back to the CPU so as not to break the C++ standard. It will not be a "task graph"-like solution (as @SimeonEhrig said). Parameters can be passed via beginFilter() instead of begin() (as @psychocoderHPC proposed), or, in my opinion, via a lambda using a lambda generator.
The C++17 parallel STL lives in the header called <execution>, hence the execution policies of C++17 are in the std::execution namespace: https://en.cppreference.com/w/cpp/header/execution
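For reference, a minimal, self-contained example of those C++17 execution policies (standard C++, no alpaka involved; note that e.g. libstdc++ needs TBB installed for these to actually run in parallel):

```cpp
#include <algorithm>
#include <execution>   // std::execution::seq, par, par_unseq
#include <numeric>
#include <vector>

int main() {
    std::vector<double> v(1'000'000, 1.0);

    // Parallel element-wise transform, selected via an execution policy.
    std::transform(std::execution::par, v.begin(), v.end(), v.begin(),
                   [](double x) { return 2.0 * x; });

    // Parallel (and vectorisable) reduction, a single call.
    double sum = std::reduce(std::execution::par_unseq, v.begin(), v.end());
    return sum > 0.0 ? 0 : 1;
}
```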
That's why we have C++20 ranges, so we can build lazy graphs to then schedule them on a PSTL algorithm:

auto graph = data | transform(...) | filter(...) | transform(...);
for_each(with_alpaka(... params ...), begin(data), end(data));

However, there are certain problems with this design as well, e.g. some implementations struggle to accelerate non-random-access ranges, like filter ranges. Still, sometimes a simple transform or reduce is all you need, and you would be happy to write that in a single line :)
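For illustration, a minimal sketch of such a lazy ranges pipeline (standard C++20; since with_alpaka above is only a hypothetical API, the view here is simply iterated sequentially):

```cpp
#include <cstdio>
#include <ranges>
#include <vector>

int main() {
    std::vector<int> data{1, 2, 3, 4, 5, 6};

    // Lazy view pipeline: nothing is computed until the view is iterated.
    auto graph = data
        | std::views::transform([](int x) { return x * x; })
        | std::views::filter([](int x) { return x % 2 == 0; })
        | std::views::transform([](int x) { return x + 1; });

    for (int v : graph)
        std::printf("%d ", v);  // prints: 5 17 37
}
```

The filter view is exactly the kind of non-random-access range mentioned above: its iterators are at most bidirectional, which is what makes offloading such pipelines to a parallel algorithm hard.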
That is correct. However, the C++ proposal ...
Yes, using ranges would simplify the alpaka pipeline a lot, which I think is needed, but it requires C++20, as you said.
The first part of std::execution is the C++17 STL execution policies. Can STL functions like std::transform be used by alpaka so that they use different backends?
The new std::execution includes a sender/receiver graph structure.
std::execution, or the sender/receiver framework, is a proposal (expected to be included in C++26) that creates an abstraction mainly for asynchronous task scheduling, by building a task graph and a scheduler to run those tasks. The scheduler is similar to the alpaka accelerator concept. It does not create a mechanism to access the thread index or block index.
Each task is called a sender, and senders of different types (created by specific sender adaptors) form the nodes of a directed acyclic graph. Some examples (see the sketch after this list):
"Then" sender adapter, returns a sender which waits for the result of previous sender in the task graph and uses that result as an argument to a given function.
"Bulk" sender adapter, returns a sender describing the task of invoking the provided function with every index in the provided shape along with the values sent by the input sender.
"when_all" returns a sender that completes once all of the input senders have completed.
As a preliminary result: std::execution is another abstraction similar to alpaka; in my opinion it is an alternative to alpaka, because it creates an abstraction over different execution resources. Namely, by creating the scheduler concept it provides adaptability to different backends. On the other hand, it does not provide direct access to the thread or block indices; it leaves "the thread management issue" to the different backend implementations of the schedulers.
By defining task types that interpret a previous task's results in a structured way, it helps create structured parallelism. NVIDIA's implementation of std::execution (with the stdexec backend for GPU support) shows that it has NVIDIA support. After many discussions, this proposal was not included in C++23; it is expected to be included in C++26.
Sources
A detailed and informative talk:
https://youtu.be/QSaUCzL7nCU?si=g4kl_DXrAa4Sd_ZD
A recent revision of the proposal:
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2300r7.html
The previous (executor-based) paper:
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0443r14.html
NVIDIA's implementation, with stdexec providing a GPU backend (scheduler):
https://github.com/NVIDIA/stdexec?tab=readme-ov-file