Add device hints and return to PaRSEC main repo #283

devreal · 2024-05-22T19:21:48Z

This PR does two things that got intermingled:

Add a mapper to device-enabled TTs to hint at which device to execute on. This almost doubled performance on Frontier.
Example hinting at keeping all tiles of a row on the same device (see potrf.h for actual use):

tt->set_devicemapper([&A](const Key& key){ return (key[0] / A.P()) % ttg::device::num_devices(); });
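The mapping arithmetic can be tested outside TTG. The sketch below assumes a 2D block-cyclic distribution in which rows are dealt round-robin over `P` process rows. Under that assumption, `key[0] / P` is the local row index on the owning process, and taking it modulo the device count spreads local rows over the devices. `Key` and `make_row_devicemap` are hypothetical stand-ins, not TTG API.

```cpp
#include <array>
#include <cassert>
#include <functional>

// Hypothetical tile key: (row, column).
using Key = std::array<int, 2>;

// Hypothetical helper: builds a device mapper that keeps every tile of a row
// on the same device. P is the number of process rows (A.P() in the PR) and
// num_devices stands in for ttg::device::num_devices().
std::function<int(const Key&)> make_row_devicemap(int P, int num_devices) {
  assert(P > 0 && num_devices > 0);
  return [P, num_devices](const Key& key) {
    // key[0] / P: local row index under a block-cyclic row distribution.
    return (key[0] / P) % num_devices;
  };
}
```

The column index `key[1]` is ignored, so all tiles in a row get the same device hint. Because the PR's lambda reads `A`, it has to capture `A` (here by value through `P`).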

Return us to PaRSEC master after @abouteiller and @therault massaged PaRSEC long enough to make it fit our needs. Thanks! We now collect the input data in the evaluate function so that device selection can happen before the main task callback executes.

abouteiller · 2024-05-22T19:43:47Z

ttg/ttg/parsec/device.h

+
+  inline
+  int num_devices() {
+    return parsec_nb_devices - detail::first_device_id;


We now have parsec_context_query(ctx, PARSEC_CONTEXT_QUERY_DEVICES, PARSEC_DEV_CUDA), which can count devices without knowing the first device ID.

I still need the first device ID to convert between TTG device IDs (which match the CUDA/HIP/L0 ID space) and PaRSEC's ID space. What is the best way to do that without knowing the first device ID?

I don't think we have something nice for that atm.

BTW, you may still want to cache the result, because parsec_context_query runs a loop.
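The ID conversion in this thread is a fixed offset. The sketch below assumes that PaRSEC lists non-accelerator devices first and accelerators from `first_device_id` up to `parsec_nb_devices - 1`, as the `num_devices()` diff implies. It also caches the device count as suggested above. `DeviceTable`, `ttg_to_parsec_device`, `parsec_to_ttg_device`, and `cached_num_devices` are hypothetical names, and `query` stands in for a wrapper around the real PaRSEC query.

```cpp
#include <cassert>

// Hypothetical snapshot of the PaRSEC device state used by the conversion.
struct DeviceTable {
  int parsec_nb_devices;  // all devices known to PaRSEC
  int first_device_id;    // PaRSEC index of the first accelerator
};

// Number of accelerators visible to TTG (mirrors the diff above).
int num_devices(const DeviceTable& t) {
  return t.parsec_nb_devices - t.first_device_id;
}

// TTG IDs follow the CUDA/HIP/L0 numbering (0, 1, ...).
int ttg_to_parsec_device(const DeviceTable& t, int ttg_id) {
  assert(ttg_id >= 0 && ttg_id < num_devices(t));
  return ttg_id + t.first_device_id;
}

int parsec_to_ttg_device(const DeviceTable& t, int parsec_id) {
  assert(parsec_id >= t.first_device_id && parsec_id < t.parsec_nb_devices);
  return parsec_id - t.first_device_id;
}

// Runs the (looping) query once and reuses the result afterwards. The
// function-local static is initialized exactly once, thread-safely, so only
// the first caller's query is ever invoked.
int cached_num_devices(int (*query)()) {
  static const int count = query();
  return count;
}
```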

abouteiller · 2024-05-22T19:50:04Z

ttg/ttg/parsec/ttg.h

@@ -765,7 +766,7 @@ namespace ttg_parsec {
        parsec_ttg_task_t<TT> *me = (parsec_ttg_task_t<TT> *)parsec_task;
        return me->template invoke_op<ttg::ExecutionSpace::CUDA>();
      } else {
-        throw std::runtime_error("PaRSEC CUDA hook invoked on a TT that does not support CUDA operations!");
+        return PARSEC_HOOK_RETURN_NEXT;


It is now incorrect to return NEXT from a hook; it can be returned only during evaluate. Returning it from a hook will cause PaRSEC to raise a fatal error. You should return PARSEC_HOOK_RETURN_ERROR instead, but that will also cause a fatal error, because it is too late to change your mind: the data has already been positioned on that device, and the owner device has been set.

Will be fixed, thanks!
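The rule from this review can be modeled as two phases. Only the evaluate step may decline an incarnation, and the hook runs after data placement, so it must not decline. The sketch uses a local `HookReturn` enum that merely mirrors the PaRSEC codes named above. It is not the PaRSEC header, and `evaluate`/`cuda_hook` are hypothetical functions.

```cpp
#include <cassert>

// Local mirror of the return codes discussed above (not PaRSEC's own enum).
enum class HookReturn { Done, Next, Error };
enum class Space { Host, CUDA };

// Phase 1 (evaluate): the only place where declining is legal. A TT without
// CUDA support declines the CUDA incarnation before any data is staged.
HookReturn evaluate(Space incarnation, bool tt_supports_cuda) {
  if (incarnation == Space::CUDA && !tt_supports_cuda) {
    return HookReturn::Next;  // let the runtime try the next incarnation
  }
  return HookReturn::Done;    // accept this incarnation
}

// Phase 2 (hook): data is already on the device and the owner is set, so the
// hook cannot change its mind. Reaching it for an unsupported TT is a bug.
HookReturn cuda_hook(bool tt_supports_cuda) {
  if (!tt_supports_cuda) {
    return HookReturn::Error;  // fatal in PaRSEC; should be unreachable
  }
  // ... submit the device work here ...
  return HookReturn::Done;
}
```

In this model, a correctly behaving evaluate never lets `cuda_hook` see an unsupported TT, which is why the CPU-only case is handled by returning NEXT from evaluate rather than from the hook.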

abouteiller · 2024-05-22T19:54:05Z

ttg/ttg/parsec/ttg.h

+        PARSEC_OBJ_CONSTRUCT(gpu_task, parsec_list_item_t);
+        gpu_task->ec = parsec_task;
+        gpu_task->task_type = 0; // user task
+        //gpu_task->load = 1;    // TODO: can we do better?


The load will be filled in by the default time-estimate function, which uses the compute capacity of the target device to express the load in normalized units: a load of 1 on the fastest device, and a load of x on slower devices, where x is the performance ratio of that device with respect to the fastest device. In short, you should not need this commented-out line at all. If you want to be more precise about the number of flops individual tasks incur, you can provide better time_estimate functions (see how it is done in PTG).

I removed the comment.
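The normalized load described above can be illustrated with a small calculation. The sketch assumes that load is measured in normalized time, so a device with half the throughput of the fastest device gets a load of 2. `normalized_loads` and the `capacity` figures are hypothetical. This is not PaRSEC's default time-estimate function.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical: capacity[i] is a throughput figure for device i (e.g. GFLOP/s).
// Returns 1.0 for the fastest device and fastest / capacity[i] for the others.
std::vector<double> normalized_loads(const std::vector<double>& capacity) {
  assert(!capacity.empty());
  const double fastest = *std::max_element(capacity.begin(), capacity.end());
  std::vector<double> load;
  load.reserve(capacity.size());
  for (double c : capacity) {
    assert(c > 0.0);
    load.push_back(fastest / c);
  }
  return load;
}
```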


devreal · 2024-05-30T15:33:31Z

This PR now also includes detection of broken coroutine support in GCC and fixes for building without coroutine support. I think this is ready for review/shipment.

@abouteiller We'll fix the device ID computation in a separate PR.

devreal requested review from evaleev and therault May 22, 2024 19:21

abouteiller reviewed May 22, 2024


devreal and others added 17 commits May 23, 2024 10:30

Move copy version increment to CPU hook

6c69a79

The new copy versioning in PaRSEC requires this only for CPU tasks to make sure the results are handled correctly. Signed-off-by: Joseph Schuchart <[email protected]>

Initialize parsec_task data[].data_in to NULL

de6d8c1

Not sure if this is really needed. I would prefer to remove it again. Signed-off-by: Joseph Schuchart <[email protected]>

Remove use of parsec_get_best_device

6492573

This is now handled by parsec directly. Signed-off-by: Joseph Schuchart <[email protected]>

Use the evaluate hook to query input data before calling the compute hook

3be65cf

PaRSEC needs to know what input data we have, and this information must be available in the task class, so put a copy of the task class into the task object. Signed-off-by: Joseph Schuchart <[email protected]>

Properly construct the parsec task

e42e3bd

Signed-off-by: Joseph Schuchart <[email protected]>

Use correct evaluate function signatures

0216894

Signed-off-by: Joseph Schuchart <[email protected]>

Properly construct parsec task object

2e57419

Signed-off-by: Joseph Schuchart <[email protected]>

Fix dataflags binary operators

edf48fc

Signed-off-by: Joseph Schuchart <[email protected]>

Fix logic for detecting device-aware MPI

0e62352

Signed-off-by: Joseph Schuchart <[email protected]>

Add override keyword to fence and make_executable

38e7efb

The HIP compiler complains about the missing override keyword. Signed-off-by: Joseph Schuchart <[email protected]>

POTRF: use correct keymap for SYRK

5dc442d

The second element of the key contains the tile ID, the first one contains the iteration. Signed-off-by: Joseph Schuchart <[email protected]>
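The effect of this fix can be shown with a keymap that reads the tile ID from `key[1]`. The sketch assumes that a SYRK task keyed `(k, m)` updates diagonal tile `(m, m)` at iteration `k`, as in a right-looking tiled Cholesky, and that tiles are owned in a 2D block-cyclic layout over a `P x Q` process grid. `SyrkKey`, `tile_owner`, and `syrk_keymap` are hypothetical, not the keymap in potrf.h.

```cpp
#include <array>
#include <cassert>

// Hypothetical SYRK key: (k, m) = (iteration, row of the updated diagonal tile).
using SyrkKey = std::array<int, 2>;

// Hypothetical 2D block-cyclic owner of tile (row, col) on a P x Q grid.
int tile_owner(int row, int col, int P, int Q) {
  assert(P > 0 && Q > 0);
  return (row % P) * Q + (col % Q);
}

// All SYRK updates of tile (m, m) run on that tile's owner, regardless of the
// iteration k. Using key[0] here would scatter them by iteration instead.
int syrk_keymap(const SyrkKey& key, int P, int Q) {
  const int m = key[1];  // tile ID, not the iteration in key[0]
  return tile_owner(m, m, P, Q);
}
```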

Remove unused variable

ec2b796

Signed-off-by: Joseph Schuchart <[email protected]>

Add device hint to TT and buffer

b7c5088

For POTRF, we want to provide a hint that tasks on the same column should be executed on the same device, to reduce data movement and provide a hint on load balancing up front. Signed-off-by: Joseph Schuchart <[email protected]>

Fixes to device hint implementation

27f3fde

Signed-off-by: Joseph Schuchart <[email protected]>

Point back to PaRSEC main repository

ee6fb2a

Signed-off-by: Joseph Schuchart <[email protected]>

Add missing fwd decl in madness backend

f3e8b14

Signed-off-by: Joseph Schuchart <[email protected]>

Cleanup return values in parsec backend

2384e4d

We should not return PARSEC_HOOK_RETURN_NEXT in hooks, only in the evaluate callback. Signed-off-by: Joseph Schuchart <[email protected]>

devreal force-pushed the new_parsec_device_code branch from 6aca5b2 to 2384e4d May 23, 2024 14:30

devreal added 3 commits May 24, 2024 14:00

Install madness backend device.h header

4b50c68

Signed-off-by: Joseph Schuchart <[email protected]>

Add missing fwd.h header to device.h

72aa980

Signed-off-by: Joseph Schuchart <[email protected]>

Check for broken GCC versions and disable Coroutine support if found

755ab26

Signed-off-by: Joseph Schuchart <[email protected]>

devreal force-pushed the new_parsec_device_code branch from 0745ea0 to 755ab26 May 28, 2024 15:41

devreal added 2 commits May 28, 2024 12:09

Protect unit tests for cases where coroutines are not found

8947d84

Signed-off-by: Joseph Schuchart <[email protected]>

Fix macro TTG_PROCESS_TT_OP_RETURN if coros are not available

40595c1

Signed-off-by: Joseph Schuchart <[email protected]>

evaleev approved these changes May 29, 2024


devreal added 2 commits May 30, 2024 09:28

Only use suspended_task_address if we have coroutines

1421c4f

Signed-off-by: Joseph Schuchart <[email protected]>

Replace TTG_HAS_COROUTINE with TTG_HAVE_COROUTINE and add to config.h

df32b1e

Signed-off-by: Joseph Schuchart <[email protected]>

Add TTG_ENABLE_COROUTINES CMake option and fix non-coro builds

a9c33d4

Signed-off-by: Joseph Schuchart <[email protected]>
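The coroutine commits above gate coroutine-only code on a configuration macro. This sketch assumes that config.h defines `TTG_HAVE_COROUTINE` only when coroutine support is enabled, which is why it tests with `defined(...)` and does not depend on the macro's value. `TaskState` and `can_suspend` are hypothetical. Only the field name `suspended_task_address` comes from the commit list.

```cpp
#include <cassert>

#if defined(TTG_HAVE_COROUTINE)
#include <coroutine>  // C++20; only pulled in when coroutines are enabled
#endif

// Reports at compile time whether coroutine support was configured.
constexpr bool have_coroutines() {
#if defined(TTG_HAVE_COROUTINE)
  return true;
#else
  return false;
#endif
}

// Hypothetical task state: the suspension address exists only in coroutine
// builds, so non-coroutine builds never reference it.
struct TaskState {
  int id = 0;
#if defined(TTG_HAVE_COROUTINE)
  void* suspended_task_address = nullptr;
#endif
};

bool can_suspend(const TaskState& t) {
#if defined(TTG_HAVE_COROUTINE)
  return t.suspended_task_address != nullptr;
#else
  (void)t;       // unused in non-coroutine builds
  return false;  // without coroutines, tasks never suspend
#endif
}
```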

abouteiller approved these changes May 31, 2024


devreal merged commit 754c7d7 into TESSEorg:master Jun 4, 2024
4 checks passed
