
Rewrite reduce mean/variance #2883

Merged · 49 commits · Apr 27, 2024
Conversation

pfultz2 (Collaborator) commented Mar 12, 2024

Rewrites mean/variance to use reduce_mean(x) and reduce_mean(x*x) so both can be fused into the same reduction.

@pfultz2 pfultz2 requested a review from causten as a code owner March 12, 2024 21:14
migraphx-bot (Collaborator) commented Mar 12, 2024

| Test | Batch | Rate new (52fe8e) | Rate old (ee68f7) | Diff |
|---|---|---|---|---|
| torchvision-resnet50 | 64 | 2,824.97 | 2,821.87 | 0.11% |
| torchvision-resnet50_fp16 | 64 | 6,405.88 | 6,407.36 | -0.02% |
| torchvision-densenet121 | 32 | 2,092.78 | 2,096.07 | -0.16% |
| torchvision-densenet121_fp16 | 32 | 3,703.81 | 3,687.09 | 0.45% |
| torchvision-inceptionv3 | 32 | 1,604.11 | 1,605.13 | -0.06% |
| torchvision-inceptionv3_fp16 | 32 | 2,555.85 | 2,551.45 | 0.17% |
| cadene-inceptionv4 | 16 | 718.11 | 717.52 | 0.08% |
| cadene-resnext64x4 | 16 | 680.69 | 680.72 | -0.00% |
| slim-mobilenet | 64 | 5,947.46 | 5,944.92 | 0.04% |
| slim-nasnetalarge | 64 | 154.18 | 154.07 | 0.07% |
| slim-resnet50v2 | 64 | 2,589.24 | 2,583.14 | 0.24% |
| bert-mrpc-onnx | 8 | 920.32 | 921.72 | -0.15% |
| bert-mrpc-tf | 1 | 396.61 | 397.23 | -0.16% |
| pytorch-examples-wlang-gru | 1 | 403.27 | 395.90 | 1.86% |
| pytorch-examples-wlang-lstm | 1 | 428.73 | 374.49 | 14.48% 🔆 |
| torchvision-resnet50_1 | 1 | 608.81 | 609.69 | -0.14% |
| cadene-dpn92_1 | 1 | 390.69 | 390.37 | 0.08% |
| cadene-resnext101_1 | 1 | 332.01 | 333.18 | -0.35% |
| onnx-taau-downsample | 1 | 306.56 | 307.34 | -0.25% |
| dlrm-criteoterabyte | 1 | 28.87 | 28.87 | 0.02% |
| dlrm-criteoterabyte_fp16 | 1 | 48.27 | 48.32 | -0.10% |
| agentmodel | 1 | 7,222.46 | 7,330.30 | -1.47% |
| unet_fp16 | 2 | 57.55 | 57.54 | 0.02% |
| resnet50v1_fp16 | 1 | 906.34 | 911.10 | -0.52% |
| resnet50v1_int8 | 1 | 816.74 | 813.46 | 0.40% |
| bert_base_cased_fp16 | 64 | 1,034.77 | 1,033.99 | 0.08% |
| bert_large_uncased_fp16 | 32 | 300.48 | 300.56 | -0.03% |
| bert_large_fp16 | 1 | 159.95 | 160.04 | -0.05% |
| distilgpt2_fp16 | 16 | 1,854.32 | 1,854.56 | -0.01% |
| yolov5s | 1 | 475.96 | 474.48 | 0.31% |
| tinyllama | 1 | 32.99 | 32.97 | 0.04% |
| vicuna-fastchat | 1 | 157.86 | 158.01 | -0.10% |
| whisper-tiny-encoder | 1 | 348.54 | 346.15 | 0.69% |
| whisper-tiny-decoder | 1 | 396.66 | 396.01 | 0.16% |

Check results before merge 🔆

migraphx-bot (Collaborator) commented Mar 12, 2024


- ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
- ✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance
- ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
- ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
- ✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
- ✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance
- ✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance
- ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
- ✅ agentmodel: PASSED: MIGraphX meets tolerance
- ✅ unet: PASSED: MIGraphX meets tolerance
- ✅ resnet50v1: PASSED: MIGraphX meets tolerance
- ✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance
- 🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
- ✅ bert_large: PASSED: MIGraphX meets tolerance
- ✅ yolov5s: PASSED: MIGraphX meets tolerance
- ✅ tinyllama: PASSED: MIGraphX meets tolerance
- ✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance
- ✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance
- ✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance
- ✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

@pfultz2 pfultz2 requested a review from a team as a code owner March 29, 2024 00:33
```cpp
});
auto preduce = m.insert_instruction(last, parallel_reduce{op}, inputs);
int i = 0;
for(auto reduce : reduces)
```
Collaborator:

Might be a slight nit-pick in naming, but rename "reduces" to something like "reductions". That makes this a bit more readable and makes it clearer when you grab a reduce from the list of reductions.

Same goes for "preduce": rename it to parallel_reductions, since preduce might be read as pointer-to-reduce.

Collaborator (Author):

I use the name reduce since the operator is named reduce. It seems confusing to call it reduction when the operator is not named reduction. I guess I could make these single letters so you can interpret them however makes sense to you.

```cpp
}
EXPECT(p1.sort() == p2.sort());
}

TEST_CASE(reduce_reduce_mismatch_axis)
```
Collaborator:

This test case is good, and I'm sure it can be massaged to create the other tests that handle cases where matching fails, as well as some of the other coverage warnings.

TedThemistokleous (Collaborator) left a comment:

  • Add additional test coverage in rewrite_reduce.cpp
  • Improve readability in prepare_reduce.cpp

TedThemistokleous (Collaborator) left a comment:
Looks good.

@pfultz2 pfultz2 merged commit 56d341d into develop Apr 27, 2024
39 of 42 checks passed
@pfultz2 pfultz2 deleted the reduce-mean-variance branch April 27, 2024 02:19
5 participants