Spmm ctlflow nomem #8

therault · 2021-10-13T23:30:45Z

The plan splits each matrix into regular squares; use that fact to replace memoization of the plan with inline computation.
The loops that compute the big things (the number of GEMMs in a phase, the list of GEMMs in a phase) are in dim^3 in the worst case; to be exact, their cost is the number of GEMMs in each phase.
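
To illustrate the idea of computing the GEMMs of a phase inline instead of storing them, here is a minimal sketch. It is not the code from this PR: `TileMask`, `for_each_gemm`, `count_gemms`, and the assumption that a phase is a cube of `dim` tiles per side starting at `(i0, j0, k0)` are all hypothetical names and layouts chosen for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tile-level sparsity metadata (an assumption, not the PR's type):
// nonzero[r * cols + c] is true when tile (r, c) of the matrix is stored.
struct TileMask {
  std::size_t rows, cols;
  std::vector<bool> nonzero;
  bool has(std::size_t r, std::size_t c) const {
    return r < rows && c < cols && nonzero[r * cols + c];
  }
};

// Visit every GEMM C(i,j) += A(i,k) * B(k,j) of one phase without storing a list.
// Assumed phase layout: a cube of `dim` tiles per side starting at (i0, j0, k0).
// The loop nest is dim^3 in the worst case; empty tiles of A prune the inner loop.
template <typename Fn>
void for_each_gemm(const TileMask& a, const TileMask& b,
                   std::size_t i0, std::size_t j0, std::size_t k0,
                   std::size_t dim, Fn&& fn) {
  for (std::size_t i = i0; i < i0 + dim; ++i) {
    for (std::size_t k = k0; k < k0 + dim; ++k) {
      if (!a.has(i, k)) continue;  // no A tile: no GEMM for any j
      for (std::size_t j = j0; j < j0 + dim; ++j) {
        if (b.has(k, j)) fn(i, j, k);
      }
    }
  }
}

// Number of GEMMs in a phase, recomputed on demand from the metadata.
inline std::size_t count_gemms(const TileMask& a, const TileMask& b,
                               std::size_t i0, std::size_t j0, std::size_t k0,
                               std::size_t dim) {
  std::size_t n = 0;
  for_each_gemm(a, b, i0, j0, k0, dim,
                [&n](std::size_t, std::size_t, std::size_t) { ++n; });
  return n;
}
```

With this shape, both the count and the list of GEMMs come from the same traversal, so nothing has to be allocated or kept sorted at construction time; the trade-off is that the traversal is repeated each time the information is needed.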

This is not ideal yet. Previously, we built the communication plan at the same time as the computational plan. We can now skip building the computational plan, but we still need to build the communication plan: each task needs to know exactly which other tasks it passes data to, and because tasks are named by plan index, the communication tasks need to remember which communication phase is connected to which computation phase.

The communication plan is much smaller to store, though, and its objects don't need to be sorted or ordered.
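
As a rough illustration of why an unordered container is enough here, the sketch below keeps only the link from a communication phase to the computation phase it feeds, plus the destinations. `CommStep`, `CommPlan`, `connect`, and `find_step` are hypothetical names invented for this example, and the record layout is an assumption rather than what the PR stores.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical record (assumption): one communication step and the
// computation phase that consumes the data it delivers.
struct CommStep {
  std::size_t comp_phase = 0;     // computation phase fed by this step
  std::vector<int> destinations;  // ranks that receive the tile
};

// Keyed by communication-phase index. Entries are only looked up by key,
// so a hash map with no ordering guarantee is sufficient.
using CommPlan = std::unordered_map<std::size_t, CommStep>;

// Record that communication phase `comm_phase` delivers data to `dest_rank`
// for use in computation phase `comp_phase`.
inline void connect(CommPlan& plan, std::size_t comm_phase,
                    std::size_t comp_phase, int dest_rank) {
  CommStep& step = plan[comm_phase];
  step.comp_phase = comp_phase;
  step.destinations.push_back(dest_rank);
}

// Return the step for a communication phase, or nullptr if none was recorded.
inline const CommStep* find_step(const CommPlan& plan, std::size_t comm_phase) {
  auto it = plan.find(comm_phase);
  return it == plan.end() ? nullptr : &it->second;
}
```

A communication task would call `find_step` with its own phase index to learn which computation phase to name when forwarding data.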

Hopefully this already reduces the time spent building the plan significantly. Still working on removing the plan building altogether.


therault and others added 11 commits October 13, 2021 16:19

WIP to merge -- replace memoization with online computation

03684c4

Merge conflict

ad82012

Get rid of the pre-computation of the computational steps... We still have the communication steps pre-computed

6546bcb

Dependencies between A or B and the steps in which the corresponding GEMMs run are computed from the matrix metadata and the dimension of the strategy, no need for memorizing those at construction time

13c502c

A faster way to iterate over the local gemms

5ad7c7e

Remove need for any gemmset_t; remove many memory allocations.

0e07eb2

SPMM: inline local_gemm callback

79a3b3d

Signed-off-by: Joseph Schuchart <[email protected]>

SPMM: fix compiler issues with duration_cast and ttg_sum

47931d2

Signed-off-by: Joseph Schuchart <[email protected]>

Merge pull request #9 from devreal/spmm-ctlflow-nomem-inline

b1690cf

SPMM: inline the local_gemm callback and fix some compiler issues

Cleanup: remove all pre-computation of the computational phases; use the more efficient algorithm to build the communication plan; display the time spent in the constructor, and compute the flops with this time

e56b545

Stop iterating as soon as we have seen all tiles from A and B

5028fb0
