Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add device hints and return to PaRSEC main repo #283

Merged
merged 25 commits into from
Jun 4, 2024

Conversation

devreal
Copy link
Contributor

@devreal devreal commented May 22, 2024

This PR does two things that got intermingled:

  1. Add a mapper to device-enabled TTs to hint at which device to execute on. This almost doubled performance on Frontier.
    Example hinting at keeping all tiles of a row on the same device (see potrf.h for actual use):
tt->set_devicemapper([](const Key& key){ return (key[0] / A.P()) % ttg::device::num_devices(); });
  1. Return us to PaRSEC master after @abouteiller and @therault massaged PaRSEC long enough to make it fit our needs. Thanks! We now collect the input data in the evaluate function so that device selection can happen before the main task callback executes.

@devreal devreal requested review from evaleev and therault May 22, 2024 19:21

inline
int num_devices() {
return parsec_nb_devices - detail::first_device_id;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now have parsec_context_query(ctx, PARSEC_CONTEXT_QUERY_DEVICES, PARSEC_DEV_CUDA) that can be used to count devices without having to know about the first device id.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need the first device ID for the conversion between TTG device IDs (matching the CUDA/HIP/L0 ID space) and parsec's ID space. What is the best way for doing that without knowing the first device ID?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have something nice for that atm.

BTW you may still want to cache the result because context_query is running a loop.

@@ -765,7 +766,7 @@ namespace ttg_parsec {
parsec_ttg_task_t<TT> *me = (parsec_ttg_task_t<TT> *)parsec_task;
return me->template invoke_op<ttg::ExecutionSpace::CUDA>();
} else {
throw std::runtime_error("PaRSEC CUDA hook invoked on a TT that does not support CUDA operations!");
return PARSEC_HOOK_RETURN_NEXT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now incorrect to return NEXT from a hook, it can be returned only during evaluate. This will cause parsec to invoke a fatal error. You shall return PARSEC_HOOK_RETURN_ERROR instead, but. that will also cause a fatal error, as it is too late in the game to change your mind (the data has been positioned on that device, and the owner device has been set).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be fixed, thanks!

PARSEC_OBJ_CONSTRUCT(gpu_task, parsec_list_item_t);
gpu_task->ec = parsec_task;
gpu_task->task_type = 0; // user task
//gpu_task->load = 1; // TODO: can we do better?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the load will be filled up by the default-time-estimate function, which by default will use the compute capacity of the target device to compute in normalized units: 1 load on the fastest device, x load on slower devices where x is the performance ratio of that device w.r.t. the fastest device. In short you should not need to do this commented out thing at all. You can provide better time_estimate functions (see how it's done in ptg) if you want to be more precise w.r.t. the amount of flops individual tasks incur.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the comment.

devreal and others added 17 commits May 23, 2024 10:30
The new copy versioning in PaRSEC requires this only for
CPU tasks to make sure the results are handled correctly.

Signed-off-by: Joseph Schuchart <[email protected]>
Not sure if this really needed. Would prefer to remove this again.


Signed-off-by: Joseph Schuchart <[email protected]>
This is now handled by parsec directly.

Signed-off-by: Joseph Schuchart <[email protected]>
…hook

PaRSEC needs to know what input data we have and needs to
be available in the taskclass, so put a copy of the task-class
into the task object.

Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
The hip compiler complains about the override keyword missing.

Signed-off-by: Joseph Schuchart <[email protected]>
The second element of the key contains the tile ID, the first
one contains the iteration.

Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
For POTRF, we want to provide a hint that tasks on the same column
should be executed on the same device, to reduce data movement
and provide a hint on load balancing up front.

Signed-off-by: Joseph Schuchart <[email protected]>
Signed-off-by: Joseph Schuchart <[email protected]>
We should not return PARSEC_HOOK_RETURN_NEXT in hooks, only in
the evaluate callback.


Signed-off-by: Joseph Schuchart <[email protected]>
@devreal
Copy link
Contributor Author

devreal commented May 30, 2024

This PR now also includes detection of broken coro support in GCC and fixes for building without coroutine support. I think this is ready for review/shipment.

@abouteiller We'll fix the device ID computation in a separate PR.

@devreal devreal merged commit 754c7d7 into TESSEorg:master Jun 4, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants