Merge pull request #291 from TESSEorg/evaleev/dox/hw-example
devsamp/main -> devsamp/helloworld
evaleev authored Jun 10, 2024
2 parents e7ba5a6 + 7af06de commit 86317b9
Showing 15 changed files with 112 additions and 150 deletions.
12 changes: 9 additions & 3 deletions .github/workflows/cmake.yml
@@ -45,7 +45,9 @@ jobs:

- name: Install prerequisite MacOS packages
if: ${{ matrix.os == 'macos-latest' }}
run: brew install ninja boost eigen open-mpi bison ccache
run: |
brew install ninja boost eigen open-mpi bison ccache
echo "MPIEXEC=/opt/homebrew/bin/mpiexec" >> $GITHUB_ENV
- name: Install prerequisites Ubuntu packages
if: ${{ matrix.os == 'ubuntu-22.04' }}
@@ -54,6 +56,7 @@ jobs:
sudo apt-add-repository "deb https://apt.kitware.com/ubuntu/ $(lsb_release -cs) main"
sudo apt-get update
sudo apt-get -y install ninja-build g++-12 liblapack-dev libboost-dev libboost-serialization-dev libboost-random-dev libeigen3-dev openmpi-bin libopenmpi-dev libtbb-dev ccache flex bison cmake doxygen
echo "MPIEXEC=/usr/bin/mpiexec" >> $GITHUB_ENV
- name: Create Build Environment
# Some projects don't allow in-source building, so create a separate build directory
@@ -110,10 +113,13 @@ jobs:
working-directory: ${{github.workspace}}/build
shell: bash
run: |
cmake -S $GITHUB_WORKSPACE/doc/dox/dev/devsamp/main -B test_install_devsamp_main -DCMAKE_PREFIX_PATH=${{github.workspace}}/install || (cat test_install_devsamp_main/CMakeFiles/CMakeOutput.log && cat test_install_devsamp_main/CMakeFiles/CMakeError.log)
cmake --build test_install_devsamp_main
cmake -S $GITHUB_WORKSPACE/doc/dox/dev/devsamp/helloworld -B test_install_devsamp_helloworld -DCMAKE_PREFIX_PATH=${{github.workspace}}/install || (cat test_install_devsamp_helloworld/CMakeFiles/CMakeOutput.log && cat test_install_devsamp_helloworld/CMakeFiles/CMakeError.log)
cmake --build test_install_devsamp_helloworld
$MPIEXEC -n 2 test_install_devsamp_helloworld/helloworld-parsec
$MPIEXEC -n 2 test_install_devsamp_helloworld/helloworld-mad
cmake -S $GITHUB_WORKSPACE/doc/dox/dev/devsamp/fibonacci -B test_install_devsamp_fibonacci -DCMAKE_PREFIX_PATH=${{github.workspace}}/install || (cat test_install_devsamp_fibonacci/CMakeFiles/CMakeOutput.log && cat test_install_devsamp_fibonacci/CMakeFiles/CMakeError.log)
cmake --build test_install_devsamp_fibonacci
$MPIEXEC -n 2 test_install_devsamp_fibonacci/fibonacci-parsec
cmake -E make_directory test_install_userexamples
cat > test_install_userexamples/CMakeLists.txt <<EOF
cmake_minimum_required(VERSION 3.14)
40 changes: 28 additions & 12 deletions README.md
@@ -1,4 +1,4 @@
![Build Status](https://github.com/TESSEorg/ttg/workflows/CMake/badge.svg)
![Build Status](https://github.com/TESSEorg/ttg/actions/workflows/cmake.yml/badge.svg)

# TTG
This is the C++ API for the Template Task Graph (TTG) programming model for flowgraph-based composition of high-performance algorithms executable on distributed heterogeneous computer platforms. The TTG API abstracts out the details of the underlying task and data flow runtime; the current realization is implemented using [MADNESS](https://github.com/m-a-d-n-e-s-s/madness) and [PaRSEC](https://bitbucket.org/icldistcomp/parsec.git) runtimes as backends.
@@ -30,7 +30,7 @@ The development of TTG was motivated by _irregular_ scientific applications like
int main(int argc, char *argv[]) {
ttg::initialize(argc, argv);

auto tt = ttg::make_tt([]() { std::cout << "Hello, World!"; });
auto tt = ttg::make_tt([]() { std::cout << "Hello, World!\n"; });

ttg::make_graph_executable(tt);
ttg::execute();
@@ -55,17 +55,19 @@ if (NOT TARGET ttg-parsec) # else build from source
FetchContent_MakeAvailable( ttg )
endif()
add_executable(hw-parsec helloworld.cpp)
add_executable(helloworld-parsec helloworld.cpp)
target_link_libraries(hw-parsec PRIVATE ttg-parsec)
target_compile_definitions(hw-parsec PRIVATE TTG_USE_PARSEC=1)
```

Configure + build:

```sh
> cmake -S . -B build && cmake --build build --target hw-parsec
> cmake -S . -B build && cmake --build build --target helloworld-parsec
```

The complete example, including the CMake build harness using a slightly easier way to build the executable (using `add_ttg_executable` CMake macro), can be found in [dox examples](https://github.com/TESSEorg/ttg/tree/master/doc/dox/dev/devsamp/helloworld).

## "Hello, World!" Walkthrough

Although it does not involve any useful flow of computation and/or data, the above "Hello, World!" TTG program introduces several key TTG concepts and illustrates what you need to do to write a complete TTG program. So let's walk through it.
@@ -95,7 +97,7 @@ Every TTG program must:
- make TTG executable and kickstart the execution by sending a control or data message to the TTG,
- shut down the runtime

Let's go over each of these steps using the "Hello, World!" example.
Let's go over each of these steps using the "Hello, World!" example. The complete example, including the CMake build harness, can be found in [dox examples](https://github.com/TESSEorg/ttg/tree/master/doc/dox/dev/devsamp/helloworld).

### Select the TTG Backend

@@ -138,12 +140,12 @@ To make a TTG create and connect one or more TTs. The simplest TTG consists of a
The "Hello, World!" example contains a single TT that executes a single task (hence, task ID can be omitted, i.e., void) that does not take or produce any data. The easiest way to make such a TT is by wrapping a callable (e.g., a lambda) with `ttg::make_tt`:

```cpp
auto tt = ttg::make_tt([]() { std::cout << "Hello, World!"; });
auto tt = ttg::make_tt([]() { std::cout << "Hello, World!\n"; });
```
## Execute TTG
To execute a TTG we must make it executable (this will declare the TTG complete). To execute the TTG its root TT must receive at least one message; since in this case the task does not receive either task ID or data the message is empty (i.e., void):
To execute a TTG we must make it executable (this will declare the TTG program complete so no additional changes to the flowgraph are possible). To execute the TTG its root TT must receive at least one message; since in this case the task does not receive either task ID or data the message is empty (i.e., void):
```cpp
ttg::make_graph_executable(tt);
@@ -152,7 +154,7 @@ To execute a TTG we must make it executable (this will declare the TTG complete)
tt->invoke();
```

Note that we must ensure that only one such message must be generated. Since TTG execution uses the Single Program Multiple Data (SPMD) model,
`ttg::execute()` must occur before, not after, sending any messages. Note also that we must ensure that only one such message is generated. Since TTG execution uses the Single Program Multiple Data (SPMD) model,
when launching the TTG program as multiple processes only the first process (rank) gets to send the message.
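
As a rough illustration of why the rank guard matters, here is a toy model that does not use TTG at all. The function `kickoff_messages_sent` and its parameters are hypothetical names invented for this sketch; the loop merely stands in for `nranks` processes running the same program text, which is an assumption about how to picture SPMD execution rather than a description of how TTG implements it.

```cpp
#include <cassert>

// Toy model (hypothetical, not TTG API): counts how many kickoff messages
// would be generated if `nranks` SPMD processes all ran the same program.
int kickoff_messages_sent(int nranks, bool guard_on_rank0) {
  assert(nranks >= 1 && "an SPMD launch has at least one process");
  int sent = 0;
  for (int rank = 0; rank < nranks; ++rank) {
    // Every rank executes the same statement; with the guard enabled,
    // only rank 0 actually sends the message.
    if (!guard_on_rank0 || rank == 0) ++sent;
  }
  return sent;
}
```

With four ranks, the guarded variant yields exactly one kickoff message, whereas the unguarded variant yields four, one per process.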

## Finalize TTG
@@ -243,6 +245,7 @@ $F_{n-1},F_{n-2} \to F_{n}$).
To illustrate the real power of TTG let's tweak the problem slightly: instead of computing the first $N$ Fibonacci numbers, let's find the largest Fibonacci number smaller than some $N$. The key difference is that in the latter case the number of tasks is NOT known a priori; furthermore, to decide whether we need to compute the next Fibonacci number we must examine the value returned by the previous task. This is an example of data-dependent tasking, where the decision which (if any) task to execute next depends on the values produced by previous tasks. The ability to compose regular as well as data-dependent task graphs is a distinguishing strength of TTG.
To make things even more interesting, we will demonstrate how to implement such a program for execution on CPUs as well as on accelerators (GPUs).
The complete examples, including the CMake build harness, can be found in [dox examples](https://github.com/TESSEorg/ttg/tree/master/doc/dox/dev/devsamp/fibonacci).
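
For comparison, a plain sequential version of the same problem (no TTG involved) makes the data dependence easy to see: whether another step runs depends on the value just computed. `largest_fib_below` is a hypothetical helper written for this sketch, not a function from the TTG examples, and it assumes $N \ge 1$ and that intermediate sums fit in `int64_t`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of TTG): returns the largest Fibonacci
// number strictly smaller than n, using F_0 = 0 and F_1 = 1.
// Assumes n >= 1 and n no larger than F_92 (about 7.5e18), so that
// prev + curr never overflows std::int64_t.
std::int64_t largest_fib_below(std::int64_t n) {
  assert(n >= 1 && "F_0 = 0 is the smallest Fibonacci number");
  std::int64_t prev = 0;  // F_{k-1}, always < n
  std::int64_t curr = 1;  // F_k
  // Data-dependent loop: the number of iterations is not known up front;
  // each step inspects the value produced by the previous one.
  while (curr < n) {
    const std::int64_t next = prev + curr;
    prev = curr;
    curr = next;
  }
  return prev;
}
```

For instance, `largest_fib_below(1000)` evaluates to 987; 1000 is also the default value of `N` in the example's `main`.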
### The CPU Version
@@ -300,12 +303,11 @@ int main(int argc, char* argv[]) {
auto fib = make_ttg_fib_lt(N);
ttg::make_graph_executable(fib.get());
ttg::execute();
if (ttg::default_execution_context().rank() == 0)
fib->template in<0>()->send(1, Fn{});;
ttg::execute();
ttg::fence();
ttg::finalize();
return 0;
}
@@ -394,6 +396,22 @@ auto make_ttg_fib_lt(const int64_t F_n_max = 1000) {
ops.emplace_back(std::move(print));
return make_ttg(std::move(ops), ins, std::make_tuple(), "Fib_n < N");
}

int main(int argc, char* argv[]) {
ttg::initialize(argc, argv, -1);
int64_t N = 1000;
if (argc > 1) N = std::atol(argv[1]);

auto fib = make_ttg_fib_lt(N);
ttg::make_graph_executable(fib.get());
ttg::execute();
if (ttg::default_execution_context().rank() == 0)
fib->template in<0>()->send(1, Fn{});;

ttg::fence();
ttg::finalize();
return 0;
}
```
Although the structure of the device-capable program is nearly identical to the CPU version, there are important differences:
@@ -450,8 +468,6 @@ Here's the CUDA version of the device kernel and its host-side wrapper; ROCm and

`cu_next_value` is the device kernel that evaluates $F_{n+1}$ from $F_{n}$ and $F_{n-1}$. `next_value` is a host function that launches `cu_next_value`; this is the function called in the `fib` task.

The complete example, including the CMake build harness, can be found in [dox examples](https://github.com/TESSEorg/ttg/tree/master/doc/dox/dev/devsamp/fibonacci).

## Debugging TTG Programs

### TTG Visualization
7 changes: 6 additions & 1 deletion doc/dox/dev/devsamp/fibonacci/CMakeLists.txt
@@ -2,6 +2,11 @@ cmake_minimum_required(VERSION 3.14)
project(ttg-devsample-fibonacci)

find_package(ttg REQUIRED)
if (NOT TARGET ttg-parsec) # else build from source
include(FetchContent)
FetchContent_Declare(ttg GIT_REPOSITORY https://github.com/TESSEorg/ttg.git)
FetchContent_MakeAvailable( ttg )
endif()

add_ttg_executable(fibonacci fibonacci.cc NOT_EXCLUDE_FROM_ALL)
# Fib device test
@@ -11,4 +16,4 @@ if (TTG_HAVE_CUDA)
fibonacci_cuda_kernel.h
fibonacci_cuda_kernel.cu
LINK_LIBRARIES std::coroutine RUNTIMES "parsec" NOT_EXCLUDE_FROM_ALL)
endif()
endif()
17 changes: 17 additions & 0 deletions doc/dox/dev/devsamp/fibonacci/README.md
@@ -0,0 +1,17 @@
# Largest Fibonacci number

This directory contains TTG programs computing the largest Fibonacci number smaller than $N$:

- CPU version: `fibonacci.cc`
- Device version: `fibonacci_device.cc`
- CUDA kernel: `fibonacci_cuda_kernel.{cu,h}`

## Build

After TTG has been installed to `/path/to/ttg`, do this:

- configure: `cmake -S . -B build -DCMAKE_PREFIX_PATH="/path/to/ttg"`
- build:
  - CPU version: `cmake --build build --target fibonacci`
  - CUDA version (TTG must have been configured with CUDA support): `cmake --build build --target fibonacci_cuda`
- run: `./build/fibonacci N` or `./build/fibonacci_cuda N`
8 changes: 6 additions & 2 deletions doc/dox/dev/devsamp/fibonacci/fibonacci.cc
@@ -47,12 +47,16 @@ int main(int argc, char* argv[]) {
ttg::initialize(argc, argv, -1);
int64_t N = (argc > 1) ? std::atol(argv[1]) : 1000;

// make TTG
auto fib = make_ttg_fib_lt(N);
// program complete, declare it executable
ttg::make_graph_executable(fib.get());
// start execution
ttg::execute();
// start the computation by sending the first message
if (ttg::default_execution_context().rank() == 0)
fib->template in<0>()->send(1, Fn{});;

ttg::execute();
// wait for the computation to finish
ttg::fence();

ttg::finalize();
10 changes: 7 additions & 3 deletions doc/dox/dev/devsamp/fibonacci/fibonacci_device.cc
@@ -74,13 +74,17 @@ int main(int argc, char* argv[]) {
ttg::trace_on();
int64_t N = 1000;
if (argc > 1) N = std::atol(argv[1]);
auto fib = make_ttg_fib_lt(N); // computes largest F_n < N

// make TTG
auto fib = make_ttg_fib_lt(N); // computes largest F_n < N
// program complete, declare it executable
ttg::make_graph_executable(fib.get());
// start execution
ttg::execute(ttg::ttg_default_execution_context());
// start the computation by sending the first message
if (ttg::default_execution_context().rank() == 0)
fib->template in<0>()->send(1, Fn{});;

ttg::execute(ttg::ttg_default_execution_context());
// wait for the computation to finish
ttg::fence(ttg::ttg_default_execution_context());

ttg::finalize();
11 changes: 11 additions & 0 deletions doc/dox/dev/devsamp/helloworld/CMakeLists.txt
@@ -0,0 +1,11 @@
cmake_minimum_required(VERSION 3.14)
project(ttg-devsample-helloworld)

find_package(ttg REQUIRED)
if (NOT TARGET ttg-parsec) # else build from source
include(FetchContent)
FetchContent_Declare(ttg GIT_REPOSITORY https://github.com/TESSEorg/ttg.git)
FetchContent_MakeAvailable( ttg )
endif()

add_ttg_executable(helloworld helloworld.cpp NOT_EXCLUDE_FROM_ALL)
11 changes: 11 additions & 0 deletions doc/dox/dev/devsamp/helloworld/README.md
@@ -0,0 +1,11 @@
# TTG "Hello World"

This directory contains the TTG "Hello World" program.

## Build

After TTG has been installed to `/path/to/ttg`, do this:

- configure: `cmake -S . -B build -DCMAKE_PREFIX_PATH="/path/to/ttg"`
- build: `cmake --build build`
- run: `./build/helloworld-parsec` or `./build/helloworld-mad`
17 changes: 17 additions & 0 deletions doc/dox/dev/devsamp/helloworld/helloworld.cpp
@@ -0,0 +1,17 @@
#include <ttg.h>

using namespace ttg;

int main(int argc, char *argv[]) {
ttg::initialize(argc, argv);

auto tt = ttg::make_tt([]() { std::cout << "Hello, World!\n"; });

ttg::make_graph_executable(tt);
ttg::execute();
if (ttg::get_default_world().rank() == 0) tt->invoke();
ttg::fence();

ttg::finalize();
return 0;
}
6 changes: 0 additions & 6 deletions doc/dox/dev/devsamp/main/CMakeLists.txt

This file was deleted.

8 changes: 0 additions & 8 deletions doc/dox/dev/devsamp/main/test.cpp

This file was deleted.

8 changes: 0 additions & 8 deletions tests/unit/CMakeLists.txt
@@ -32,14 +32,6 @@ add_ttg_executable(serialization serialization.cc unit_main.cpp
add_ttg_executable(serialization_boost serialization_boost.cc
LINK_LIBRARIES ttg-serialization-boost RUNTIMES "parsec")

# Fib device test
if (TTG_HAVE_CUDA)
add_ttg_executable(fibonacci_device fibonacci_device.cc
fibonacci_cuda_kernel.h
fibonacci_cuda_kernel.cu
LINK_LIBRARIES std::coroutine RUNTIMES "parsec")
endif()

# TODO: convert into unit test
#if (TARGET MADworld)
#add_executable(splitmd_serialization splitmd_serialization.cc unit_main.cpp)
15 changes: 0 additions & 15 deletions tests/unit/fibonacci_cuda_kernel.cu

This file was deleted.

4 changes: 0 additions & 4 deletions tests/unit/fibonacci_cuda_kernel.h

This file was deleted.
