
Support two outputs in split_reduce #3097

Merged: 49 commits into develop from split-reduce2 on Jul 31, 2024

Conversation

pfultz2
Collaborator

@pfultz2 pfultz2 commented May 16, 2024

No description provided.

@pfultz2 pfultz2 requested a review from causten as a code owner May 16, 2024 18:26
@TedThemistokleous TedThemistokleous added enhancement New feature or request roadmap Tasks to finish for a release labels May 16, 2024
test/split_reduce.cpp, comment on lines +299 to +304:
return {rsum2, rsum1};
});
auto rsum2 = mm->add_instruction(migraphx::make_op("get_tuple_elem", {{"index", 0}}), rsum);
auto rsum1 = mm->add_instruction(migraphx::make_op("get_tuple_elem", {{"index", 1}}), rsum);
auto mul =
add_pointwise(p2, mm, "main:pointwise1", {rsum1, rsum2}, single_pointwise("mul"));
Member

rsum2 is at tuple element index 0 and rsum1 is at tuple element index 1, which can be error-prone.

test/split_reduce.cpp: two further review threads (outdated, resolved)
@umangyadav
Member

This PR fails to compile if I run:
MIGRAPHX_DISABLE_LAYERNORM_FUSION=1 MIGRAPHX_ENABLE_SPLIT_REDUCE=1 ./bin/test_verify "test_layernorm_large"

@umangyadav
Member

This currently will not fuse a pointwise op preceding the reduce ops if the pointwise result is still live after the reduce ops.

This is what happens with #3212

@6 = convolution[padding={1, 1, 1, 1},stride={1, 1},dilation={1, 1},group=1,padding_mode=0](sample,@0) -> float_type, {2, 320, 64, 64}, {1310720, 4096, 64, 1}
@7 = broadcast[axis=1,out_lens={2, 320, 64, 64}](@1) -> float_type, {2, 320, 64, 64}, {0, 1, 0, 0}
@8 = contiguous(@7) -> float_type, {2, 320, 64, 64}, {1310720, 4096, 64, 1}
@9 = reshape[dims={2, 32, 10, 64, 64}](@6) -> float_type, {2, 32, 10, 64, 64}, {1310720, 40960, 4096, 64, 1}
@10 = reshape[dims={2, 32, 10, 64, 64}](@8) -> float_type, {2, 32, 10, 64, 64}, {1310720, 40960, 4096, 64, 1}
@11 = multibroadcast[out_lens={2, 320, 64, 64},out_dyn_dims={}](@3) -> float_type, {2, 320, 64, 64}, {0, 1, 0, 0}
@12 = contiguous(@11) -> float_type, {2, 320, 64, 64}, {1310720, 4096, 64, 1}
@13 = multibroadcast[out_lens={2, 320, 64, 64},out_dyn_dims={}](@4) -> float_type, {2, 320, 64, 64}, {0, 1, 0, 0}
@14 = contiguous(@13) -> float_type, {2, 320, 64, 64}, {1310720, 4096, 64, 1}
@15 = reshape[dims={2, 32, 10, 64, 64}](@12) -> float_type, {2, 32, 10, 64, 64}, {1310720, 40960, 4096, 64, 1}
@16 = reshape[dims={2, 32, 10, 64, 64}](@14) -> float_type, {2, 32, 10, 64, 64}, {1310720, 40960, 4096, 64, 1}
@17 = pointwise(@9,@10), [main:pointwise0] -> float_type, {2, 32, 10, 64, 64}, {1310720, 40960, 4096, 64, 1}
@18 = split_fused_reduce[axes={2, 3, 4},assign=assign_add](@17), [main:pointwise0:main:pointwise2:main:reduce_sum1:main:pointwise4:main:pointwise6:main:pointwise1:main:reduce_sum0_reshape_reshape:main:pointwise10_split] -> [float_type, {2, 32, 1, 1, 1}, {32, 1, 1, 1, 1}, float_type, {2, 32, 1, 1, 1}, {32, 1, 1, 1, 1}]
@19 = get_tuple_elem[index=0](@18) -> float_type, {2, 32, 1, 1, 1}, {32, 1, 1, 1, 1}
@20 = get_tuple_elem[index=1](@18) -> float_type, {2, 32, 1, 1, 1}, {32, 1, 1, 1, 1}
@21 = multibroadcast[out_lens={2, 32, 10, 64, 64},out_dyn_dims={}](@19) -> float_type, {2, 32, 10, 64, 64}, {32, 1, 0, 0, 0}
@22 = multibroadcast[out_lens={2, 32, 10, 64, 64},out_dyn_dims={}](@20) -> float_type, {2, 32, 10, 64, 64}, {32, 1, 0, 0, 0}
@23 = pointwise(@21,@22,@17,@21,@15,@16), [main:pointwise4] -> float_type, {2, 32, 10, 64, 64}, {1310720, 40960, 4096, 64, 1}
@24 = reshape[dims={2, 320, 64, 64}](@23) -> float_type, {2, 320, 64, 64}, {1310720, 4096, 64, 1}
@25 = convolution[padding={1, 1, 1, 1},stride={1, 1},dilation={1, 1},group=1,padding_mode=0](@24,@2) -> float_type, {2, 320, 64, 64}, {1310720, 4096, 64, 1}

@17 is not fused with @18 because @17 is used later at @23.

@pfultz2
Collaborator Author

pfultz2 commented Jul 24, 2024

@17 is not fused with @18 because @17 is used later at @23.

Yeah, we need to support multi-output fusion in the future. We can probably just do the fusion initially with MLIR.

@pfultz2
Collaborator Author

pfultz2 commented Jul 24, 2024

This PR fails to compile if i run MIGRAPHX_DISABLE_LAYERNORM_FUSION=1 MIGRAPHX_ENABLE_SPLIT_REDUCE=1 ./bin/test_verify "test_layernorm_large"

This is fixed.

@causten causten merged commit 403ee86 into develop Jul 31, 2024
34 of 40 checks passed
@causten causten deleted the split-reduce2 branch July 31, 2024 13:15
5 participants