Add FLUX e2e example #3619

shivadbhavsar · 2024-11-13T21:49:28Z

No description provided.

codecov · 2024-11-13T23:20:57Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.18%. Comparing base (ebf82f6) to head (f7908ea).
Report is 1 commit behind head on develop.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #3619   +/-   ##
========================================
  Coverage    92.18%   92.18%           
========================================
  Files          514      514           
  Lines        21780    21780           
========================================
  Hits         20078    20078           
  Misses        1702     1702


kahmed10 · 2024-11-15T17:41:26Z

Can we add a reference image to this PR, generated with the same prompt that is used in the README?

kahmed10 · 2024-11-15T17:42:02Z

examples/diffusion/python_flux/README.md

@@ -0,0 +1,27 @@
+## Setup
+
+Make sure python interpreter can find migraphx. Default location:


Can we add some instructions on setting up a python virtual environment?
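One possible shape for such instructions, sketched in Python with only the standard library. The helper names `create_env` and `can_import`, the `.venv` directory, and the idea of passing the MIGraphX library directory explicitly are assumptions for illustration; the README's actual default location is not reproduced here.

```python
import importlib.util
import sys
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir.

    system_site_packages=True lets the environment also see packages
    installed outside of it, which may matter if migraphx is installed
    system-wide rather than via pip (an assumption, not confirmed here).
    """
    venv.create(env_dir, with_pip=with_pip, system_site_packages=True)
    return Path(env_dir)


def can_import(module_name, extra_path=None):
    """Return True if module_name can be located by this interpreter.

    extra_path is an optional directory to add to sys.path first, e.g.
    the directory that holds the migraphx Python module.
    """
    if extra_path is not None and str(extra_path) not in sys.path:
        sys.path.insert(0, str(extra_path))
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    env = create_env(".venv")  # hypothetical directory name
    print(f"created virtual environment at {env}")
    # Pass the MIGraphX library directory as the first argument, if needed.
    lib_dir = sys.argv[1] if len(sys.argv) > 1 else None
    print("migraphx importable:", can_import("migraphx", lib_dir))
```

After activating the environment, the same `can_import` check can be rerun to confirm that the interpreter inside it still finds migraphx.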

migraphx-bot · 2024-11-21T21:50:10Z

| Test | Batch | Rate new fb2086 | Rate old 0f36aa | Diff | Compare |
| --- | --- | --- | --- | --- | --- |
| torchvision-resnet50 | 64 | 3,255.22 | 3,261.99 | -0.21% | ✅ |
| torchvision-resnet50_fp16 | 64 | 6,994.20 | 6,984.41 | 0.14% | ✅ |
| torchvision-densenet121 | 32 | 2,434.53 | 2,434.46 | 0.00% | ✅ |
| torchvision-densenet121_fp16 | 32 | 4,067.68 | 4,068.77 | -0.03% | ✅ |
| torchvision-inceptionv3 | 32 | 1,629.44 | 1,630.14 | -0.04% | ✅ |
| torchvision-inceptionv3_fp16 | 32 | 2,745.56 | 2,746.22 | -0.02% | ✅ |
| cadene-inceptionv4 | 16 | 765.51 | 765.59 | -0.01% | ✅ |
| cadene-resnext64x4 | 16 | 810.85 | 809.78 | 0.13% | ✅ |
| slim-mobilenet | 64 | 7,467.09 | 7,474.57 | -0.10% | ✅ |
| slim-nasnetalarge | 64 | 208.52 | 208.58 | -0.03% | ✅ |
| slim-resnet50v2 | 64 | 3,440.60 | 3,441.49 | -0.03% | ✅ |
| bert-mrpc-onnx | 8 | 1,148.56 | 1,150.80 | -0.19% | ✅ |
| bert-mrpc-tf | 1 | 463.84 | 465.54 | -0.37% | ✅ |
| pytorch-examples-wlang-gru | 1 | 423.77 | 420.06 | 0.88% | ✅ |
| pytorch-examples-wlang-lstm | 1 | 482.78 | 381.98 | 26.39% | 🔆 |
| torchvision-resnet50_1 | 1 | 801.36 | 750.44 | 6.79% | 🔆 |
| cadene-dpn92_1 | 1 | 399.50 | 398.35 | 0.29% | ✅ |
| cadene-resnext101_1 | 1 | 383.29 | 382.96 | 0.09% | ✅ |
| onnx-taau-downsample | 1 | 345.95 | 346.08 | -0.04% | ✅ |
| dlrm-criteoterabyte | 1 | 33.34 | 33.35 | -0.03% | ✅ |
| dlrm-criteoterabyte_fp16 | 1 | 52.72 | 52.68 | 0.07% | ✅ |
| agentmodel | 1 | 10,173.73 | 8,091.53 | 25.73% | 🔆 |
| unet_fp16 | 2 | 58.88 | 58.77 | 0.18% | ✅ |
| resnet50v1_fp16 | 1 | 943.56 | 943.16 | 0.04% | ✅ |
| resnet50v1_int8 | 1 | 1,018.66 | 1,012.12 | 0.65% | ✅ |
| bert_base_cased_fp16 | 64 | 1,170.84 | 1,169.97 | 0.07% | ✅ |
| bert_large_uncased_fp16 | 32 | 363.78 | 363.75 | 0.01% | ✅ |
| bert_large_fp16 | 1 | 200.63 | 199.03 | 0.81% | ✅ |
| distilgpt2_fp16 | 16 | 2,202.68 | 2,201.98 | 0.03% | ✅ |
| yolov5s | 1 | 529.66 | 539.79 | -1.88% | ✅ |
| tinyllama | 1 | 43.36 | 43.42 | -0.14% | ✅ |
| vicuna-fastchat | 1 | 171.63 | 175.75 | -2.34% | ✅ |
| whisper-tiny-encoder | 1 | 418.15 | 418.02 | 0.03% | ✅ |
| whisper-tiny-decoder | 1 | 422.93 | 428.37 | -1.27% | ✅ |

Check results before merge 🔆
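For readers checking the flagged rows by hand: the Diff column appears to be the relative change of the new rate against the old rate, in percent. This is inferred from the numbers in the report, not from the bot's source, and the function name `rate_diff_percent` is hypothetical.

```python
def rate_diff_percent(rate_new, rate_old):
    """Relative change of rate_new versus rate_old, in percent, rounded to 2 places.

    Assumed formula (inferred from the reported values):
        (new - old) / old * 100
    """
    return round((rate_new - rate_old) / rate_old * 100.0, 2)


if __name__ == "__main__":
    # Rows taken from the report above: (test name, new rate, old rate).
    rows = [
        ("pytorch-examples-wlang-lstm", 482.78, 381.98),
        ("torchvision-resnet50_1", 801.36, 750.44),
        ("agentmodel", 10173.73, 8091.53),
    ]
    for name, new, old in rows:
        print(f"{name}: {rate_diff_percent(new, old):.2f}%")
```

The rule the bot uses to decide between ✅ and 🔆 is not stated in the report, so no threshold is sketched here.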

migraphx-bot · 2024-11-21T21:50:12Z

✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

✅ agentmodel: PASSED: MIGraphX meets tolerance

✅ unet: PASSED: MIGraphX meets tolerance

✅ resnet50v1: PASSED: MIGraphX meets tolerance

✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

✅ bert_large: PASSED: MIGraphX meets tolerance

✅ yolov5s: PASSED: MIGraphX meets tolerance

✅ tinyllama: PASSED: MIGraphX meets tolerance

✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

Although this prevents simplifying as much, it does help preserve the permutation of the broadcasted axes. So if we have a tensor of `{2, 16, 10240}` that goes into a reduction along the last axis, it will output `{2, 16, 1}`, which may be broadcasted back into `{2, 16, 10240}`, but there could be more shape transformations after the reduce and before a pointwise operator:

```
@1 = multibroadcast[out_lens={2, 16, 10240},out_dyn_dims={}](@0) -> int64_type, {2, 16, 10240}, {16, 1, 0}
@2 = reshape[dims={2, 160, 32, 32}](@1) -> int64_type, {2, 160, 32, 32}, {163840, 1024, 32, 1}
@3 = transpose[permutation={0, 2, 3, 1}](@2) -> int64_type, {2, 32, 32, 160}, {163840, 32, 1, 1024}
```

On develop this would be simplified to:

```
@1 = unsqueeze[axes={1, 2, 5},steps={}](@0) -> int64_type, {2, 1, 1, 16, 1, 1}, {16, 16, 16, 1, 1, 1}
@2 = multibroadcast[out_lens={2, 1, 1, 16, 1, 10},out_dyn_dims={}](@1) -> int64_type, {2, 1, 1, 16, 1, 10}, {16, 16, 16, 1, 1, 0}
@3 = reshape[dims={2, 1, 1, 160}](@2) -> int64_type, {2, 1, 1, 160}, {160, 160, 160, 1}
@4 = multibroadcast[out_lens={2, 32, 32, 160},out_dyn_dims={}](@3) -> int64_type, {2, 32, 32, 160}, {160, 0, 0, 1}
```

Ideally, we would want to apply these transformations without the broadcast before the reduction, but it is simplified like above because the shape_transform_descriptor doesn't track the permutation of the broadcasted axes.

With this PR, it will simplify to:

```
@1 = unsqueeze[axes={3, 4},steps={}](@0) -> int64_type, {2, 16, 1, 1, 1}, {16, 1, 1, 1, 1}
@2 = transpose[permutation={0, 3, 4, 1, 2}](@1) -> int64_type, {2, 1, 1, 16, 1}, {16, 1, 1, 1, 1}
@3 = multibroadcast[out_lens={2, 1, 1, 16, 10},out_dyn_dims={}](@2) -> int64_type, {2, 1, 1, 16, 10}, {16, 1, 1, 1, 0}
@4 = reshape[dims={2, 1, 1, 160}](@3) -> int64_type, {2, 1, 1, 160}, {160, 160, 160, 1}
@5 = multibroadcast[out_lens={2, 32, 32, 160},out_dyn_dims={}](@4) -> int64_type, {2, 32, 32, 160}, {160, 0, 0, 1}
```

This has a transpose because the shape_transform_descriptor understands that the output will be in NHWC layout, which means we can make the input to the reduction NHWC layout as well. This PR doesn't enable such rewriting; it only modifies the shape_transform_descriptor to track such layouts.

There are also some updates to the tests:

- Validate that a simplified transformation produces the same result
- Check that the simplification cannot be simplified further
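The first test idea, that a simplified transformation produces the same result, can be illustrated outside MIGraphX with NumPy. This is only a sketch under assumptions: the MIGraphX operators are approximated by `np.broadcast_to`, `np.expand_dims`, `reshape`, and `transpose`, and the function names are hypothetical. It assumes the input `@0` has shape `{2, 16, 1}`, as the strides in the listings suggest.

```python
import numpy as np


def original_sequence(x):
    """multibroadcast -> reshape -> transpose, as in the unsimplified listing."""
    y = np.broadcast_to(x, (2, 16, 10240))
    y = y.reshape(2, 160, 32, 32)
    return y.transpose(0, 2, 3, 1)


def simplified_on_develop(x):
    """unsqueeze -> multibroadcast -> reshape -> multibroadcast."""
    y = np.expand_dims(x, axis=(1, 2, 5))            # (2, 1, 1, 16, 1, 1)
    y = np.broadcast_to(y, (2, 1, 1, 16, 1, 10))
    y = y.reshape(2, 1, 1, 160)
    return np.broadcast_to(y, (2, 32, 32, 160))


def simplified_with_pr(x):
    """unsqueeze -> transpose -> multibroadcast -> reshape -> multibroadcast."""
    y = np.expand_dims(x, axis=(3, 4))               # (2, 16, 1, 1, 1)
    y = y.transpose(0, 3, 4, 1, 2)                   # (2, 1, 1, 16, 1)
    y = np.broadcast_to(y, (2, 1, 1, 16, 10))
    y = y.reshape(2, 1, 1, 160)
    return np.broadcast_to(y, (2, 32, 32, 160))


def make_input():
    """Distinct int64 values so that any axis mix-up changes the result."""
    return np.arange(2 * 16, dtype=np.int64).reshape(2, 16, 1)


if __name__ == "__main__":
    x = make_input()
    ref = original_sequence(x)
    print(ref.shape)
    print(np.array_equal(ref, simplified_on_develop(x)))
    print(np.array_equal(ref, simplified_with_pr(x)))
```

All three sequences compute the same values; the difference described above lies in the layout information the descriptor keeps, which a value comparison like this does not capture.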

Initial commit

77443b5

add readme and deps

5bc406f

shivadbhavsar self-assigned this Nov 14, 2024

shivadbhavsar marked this pull request as ready for review November 14, 2024 00:44

shivadbhavsar requested review from a team and causten as code owners November 14, 2024 00:44

shivadbhavsar requested review from richagadgil and kahmed10 November 14, 2024 00:45

kahmed10 reviewed Nov 15, 2024

shivadbhavsar and others added 6 commits November 15, 2024 19:10

add exhaustive tune flag

4dcb302

Merge branch 'develop' into flux_example

e7e75b5

update for batch size benchmarking

4ab9368

Merge branch 'develop' into flux_example

2f3d397

add clipping wraper to fix fp16 inference

e61e070

Merge branch 'develop' into flux_example

fb2086b

pfultz2 and others added 11 commits December 18, 2024 12:14

Disable dot/mul optimizations when there is int4 weights (#3645)

fac0cb2

Raise MIGX VRM to 2.12.0 (#3651)

44b336e

fix mlir attention reject flag for mi300 (#3652)

c224288

Update onnxruntime main b1ccbe2a8efed30b749207b1a29ae03c50289040 (#3653)

70369ee

remove bf16 (#3654)

8c12f3e

Fuse reshapes across concat (#3637)

9d27955

Bump rocm-docs-core from 1.8.5 to 1.9.0 in /docs/sphinx (#3656)

b736656

Enable fp8e5m2fnuz type (#3570)

5e18b64

Remove redundant call to dead_code_elimination pass (#3661)

d004dc2

Catch invalid broadcasts in pointwise reduce fusion (#3659)

80e6d37

Trim down tests (#3674)

c0a5863

spolifroni-amd and others added 15 commits December 18, 2024 12:14

Added onnx_operators to TOC and landing page (#3668)

88ae0b6

updated metadata (#3667)

2cccb73

Refactor GPU math functions (#3657)

ac79e03

Increase timeout to 3 hours (#3675)

0a1e9ec

bit_cast operator (#3655)

60e4757

Update onnxruntime main 1128882bfd2a97c20f8a2a5ddb26cb0d42d9ebba (#3669)

c72720a

Update rocMLIR main c443cf85a09f289c147d7b01f93c1e51390ff65f (#3670)

0115120

GEMM pointwise fusion for hipBLASLt (#3662)

260cfa2

Bump rocm-docs-core from 1.9.0 to 1.10.0 in /docs/sphinx (#3671)

4804348

Print python from onxx backend test (#3672)

9a305fc

Enable non-packed inputs for mlir (#3541)

22307b0

Merge remote-tracking branch 'origin/develop' into flux_example

28747e2

revert merge errors

6c1d01f

revert remaining merge errors

f7908ea
