document GPU temporary copy
albestro committed Nov 4, 2021
1 parent a94417a commit a0f9199
Showing 1 changed file with 9 additions and 0 deletions.
9 changes: 9 additions & 0 deletions include/dlaf/communication/kernels/broadcast.h
@@ -89,6 +89,15 @@ struct ScheduleRecvBcast {
using matrix::duplicateIfNeeded;
using matrix::copy_o;

// Note:
//
// TILE_GPU -+-> duplicateIfNeeded<CPU> ---> TILE_CPU ---> recvBcast ---> TILE_CPU -+-> copy
//           |                                                                      |
//           +----------------------------------------------------------------------+
//
// In this specialization `duplicateIfNeeded` always makes a copy: the input tile lives in
// GPU memory, and MPI without CUDA_RDMA can only operate on CPU memory.

auto tile_gpu = tile.share();
auto tile_cpu = duplicateIfNeeded<Device::CPU>(tile_gpu);
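The staging path in the diagram (GPU tile, CPU temporary, receive, copy back) can be illustrated with a host-only sketch. Everything here is hypothetical and is not DLA-Future's API: `Device`, `Tile`, `duplicate_if_needed`, `recv_bcast`, `copy_tile` and `recv_bcast_into_gpu_tile` are illustrative stand-ins, "GPU" is just a tag on an ordinary `std::vector`, and `recv_bcast` only mimics the constraint that MPI without CUDA_RDMA can write only into CPU memory.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical device tag (illustrative stand-in, not DLA-Future's Device).
enum class Device { CPU, GPU };

// Hypothetical tile: a buffer tagged with the memory space it notionally lives in.
struct Tile {
  Device device;
  std::vector<double> data;
};

// Sketch of the "duplicate if needed" idea: return the input unchanged when it
// already lives on Target, otherwise create a new tile on Target holding a copy.
template <Device Target>
std::shared_ptr<Tile> duplicate_if_needed(const std::shared_ptr<Tile>& tile) {
  if (tile->device == Target)
    return tile;  // same memory space: no temporary is created
  return std::make_shared<Tile>(Tile{Target, tile->data});  // staging copy
}

// Hypothetical host-only receive: refuses non-CPU tiles, mimicking MPI
// without CUDA_RDMA. Writes the broadcast payload into the CPU buffer.
bool recv_bcast(Tile& cpu_tile, const std::vector<double>& payload) {
  if (cpu_tile.device != Device::CPU || cpu_tile.data.size() != payload.size())
    return false;
  std::copy(payload.begin(), payload.end(), cpu_tile.data.begin());
  return true;
}

// Copy the received values from the temporary back into the destination tile.
void copy_tile(const Tile& src, Tile& dst) {
  std::copy(src.data.begin(), src.data.end(), dst.data.begin());
}

// Full path from the diagram: GPU tile -> CPU temporary -> receive -> copy back.
bool recv_bcast_into_gpu_tile(const std::shared_ptr<Tile>& tile_gpu,
                              const std::vector<double>& payload) {
  auto tile_cpu = duplicate_if_needed<Device::CPU>(tile_gpu);
  if (!recv_bcast(*tile_cpu, payload))
    return false;
  copy_tile(*tile_cpu, *tile_gpu);
  return true;
}
```

With a GPU-tagged input the helper always allocates a temporary, which is the situation the note describes; with a CPU-tagged input it returns the same pointer and the copy back would be redundant.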
