
Add concat fusions #2460

Merged: 49 commits merged into develop from concat2 on Jan 4, 2024

Conversation

pfultz2 (Collaborator) commented on Nov 20, 2023

This will fuse the inputs to concat if they are pointwise operators.
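To make the idea concrete, here is a minimal, self-contained C++ sketch of what the fusion buys. This is not MIGraphX code: the relu and sigmoid pointwise operators and the flat float buffers are arbitrary illustrative assumptions. Without the fusion, each pointwise input is materialized in its own buffer and then copied into the concat output; with it, each pointwise computation writes directly into its slice of the output.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Unfused: each pointwise op materializes its own buffer, then concat copies.
std::vector<float> unfused(const std::vector<float>& a, const std::vector<float>& b)
{
    std::vector<float> ra(a.size()), rb(b.size());
    for(std::size_t i = 0; i < a.size(); ++i)
        ra[i] = std::max(a[i], 0.0f);                        // relu
    for(std::size_t i = 0; i < b.size(); ++i)
        rb[i] = 1.0f / (1.0f + std::exp(-b[i]));             // sigmoid
    std::vector<float> out;                                  // concat = two extra copies
    out.insert(out.end(), ra.begin(), ra.end());
    out.insert(out.end(), rb.begin(), rb.end());
    return out;
}

// Fused: each pointwise op writes straight into its slice of the concat
// output, so the intermediate buffers and the copies disappear.
std::vector<float> fused(const std::vector<float>& a, const std::vector<float>& b)
{
    std::vector<float> out(a.size() + b.size());
    for(std::size_t i = 0; i < a.size(); ++i)
        out[i] = std::max(a[i], 0.0f);                       // relu -> first slice
    for(std::size_t i = 0; i < b.size(); ++i)
        out[a.size() + i] = 1.0f / (1.0f + std::exp(-b[i])); // sigmoid -> second slice
    return out;
}

int main()
{
    std::vector<float> a{-1.0f, 2.0f};
    std::vector<float> b{0.0f, 3.0f};
    for(float v : fused(a, b))
        std::cout << v << ' ';
    std::cout << '\n';
    return 0;
}
```

On a GPU the same restructuring saves the intermediate buffers and the extra kernel launches and copies for the concat inputs, which is where a fusion like this would be expected to pay off.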

migraphx-bot (Collaborator) commented on Nov 21, 2023

| Test | Batch | Rate new (525826) | Rate old (c3049e) | Diff | Compare |
| --- | --- | --- | --- | --- | --- |
| torchvision-resnet50 | 64 | 2,835.40 | nan | nan% | |
| torchvision-resnet50_fp16 | 64 | 6,502.28 | 6,508.68 | -0.10% | |
| torchvision-densenet121 | 32 | 2,083.41 | 2,096.89 | -0.64% | |
| torchvision-densenet121_fp16 | 32 | 3,665.74 | 3,663.69 | 0.06% | |
| torchvision-inceptionv3 | 32 | 1,598.15 | 1,597.69 | 0.03% | |
| torchvision-inceptionv3_fp16 | 32 | 2,565.21 | 2,570.26 | -0.20% | |
| cadene-inceptionv4 | 16 | 723.06 | 722.53 | 0.07% | |
| cadene-resnext64x4 | 16 | 690.05 | 692.47 | -0.35% | |
| slim-mobilenet | 64 | 8,326.62 | 8,340.32 | -0.16% | |
| slim-nasnetalarge | 64 | 231.93 | 230.68 | 0.54% | |
| slim-resnet50v2 | 64 | 2,665.83 | 2,666.08 | -0.01% | |
| bert-mrpc-onnx | 8 | 813.52 | 812.68 | 0.10% | |
| bert-mrpc-tf | 1 | 387.87 | 388.68 | -0.21% | |
| pytorch-examples-wlang-gru | 1 | 302.51 | 307.81 | -1.72% | |
| pytorch-examples-wlang-lstm | 1 | 318.59 | 321.36 | -0.86% | |
| torchvision-resnet50_1 | 1 | 608.24 | 600.35 | 1.31% | |
| torchvision-inceptionv3_1 | 1 | 343.47 | 344.92 | -0.42% | |
| cadene-dpn92_1 | 1 | 402.30 | 402.49 | -0.05% | |
| cadene-resnext101_1 | 1 | 328.54 | 327.84 | 0.21% | |
| slim-vgg16_1 | 1 | 459.46 | 459.46 | 0.00% | |
| slim-mobilenet_1 | 1 | 2,178.00 | 2,148.27 | 1.38% | |
| slim-inceptionv4_1 | 1 | 213.93 | 214.50 | -0.27% | |
| onnx-taau-downsample | 1 | 305.39 | 305.58 | -0.06% | |
| dlrm-criteoterabyte | 1 | 21.62 | 21.58 | 0.15% | |
| dlrm-criteoterabyte_fp16 | 1 | 40.66 | 40.69 | -0.06% | |
| agentmodel | 1 | 6,037.26 | 6,010.48 | 0.45% | |
| unet_fp16 | 2 | 54.79 | 54.81 | -0.04% | |
| resnet50v1_fp16 | 1 | 931.58 | 932.09 | -0.05% | |
| bert_base_cased_fp16 | 64 | 924.67 | 924.64 | 0.00% | |
| bert_large_uncased_fp16 | 32 | 290.67 | 290.67 | 0.00% | |
| bert_large_fp16 | 1 | 171.81 | 171.98 | -0.10% | |
| distilgpt2_fp16 | 16 | 1,280.85 | 1,281.64 | -0.06% | |

This build is not recommended to merge 🔴

migraphx-bot (Collaborator) commented on Nov 21, 2023


     ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

     ✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

     ✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

     ✅ torchvision-inceptionv3_1: PASSED: MIGraphX meets tolerance

     ✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

     ✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

     ✅ slim-vgg16_1: PASSED: MIGraphX meets tolerance

     ✅ slim-mobilenet_1: PASSED: MIGraphX meets tolerance

     ✅ slim-inceptionv4_1: PASSED: MIGraphX meets tolerance

     ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

     ✅ agentmodel: PASSED: MIGraphX meets tolerance

     ✅ unet: PASSED: MIGraphX meets tolerance

     ✅ resnet50v1: PASSED: MIGraphX meets tolerance

     ✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

     ✅ bert_large_uncased_fp16: PASSED: MIGraphX meets tolerance

     ✅ bert_large: PASSED: MIGraphX meets tolerance

     🔴 distilgpt2_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2022 Advanced Micro Devices, Inc. All rights reserved.

Is the old license stamper script being run? The date should be 2023

/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2022 Advanced Micro Devices, Inc. All rights reserved.

same as the other file

/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2022 Advanced Micro Devices, Inc. All rights reserved.

same as the other file

/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2022 Advanced Micro Devices, Inc. All rights reserved.

same as the other file

@@ -167,6 +168,8 @@ std::vector<pass> target::get_passes(migraphx::context& gctx, const compile_opti
dead_code_elimination{},
enable_pass(not enabled(MIGRAPHX_DISABLE_REDUCE_FUSION{}), fuse_reduce{}),
dead_code_elimination{},
fuse_concat{},

Asking the question here more out of curiosity. I assume order doesn't matter here for fuse_pointwise vs fuse_concat? Is there a benefit to swapping order ever around the fuse_reduce?

pfultz2 (Collaborator, Author) replied:

It needs to run after fuse_pointwise, otherwise there will be no pointwise modules to fuse with.

> Is there a benefit to swapping order ever around the fuse_reduce?

I don't think so. In either case we will have two kernels that need to be run if these two passes overlap.
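To make the ordering concrete, here is a toy, self-contained sketch. The `pass` struct is a stand-in, not the real MIGraphX pass type; only the relative order of fuse_pointwise, fuse_reduce, and fuse_concat is taken from the diff and the discussion above, and the surrounding passes are assumptions.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Toy stand-in for a compiler pass; the real passes are MIGraphX classes.
struct pass { std::string name; };

int main()
{
    // fuse_pointwise runs first so that pointwise modules exist for
    // fuse_concat (added by this PR) to absorb; fuse_concat is placed
    // after fuse_reduce, as in the diff above.
    std::vector<pass> pipeline = {
        {"fuse_pointwise"},
        {"dead_code_elimination"},
        {"fuse_reduce"},
        {"dead_code_elimination"},
        {"fuse_concat"},
        {"dead_code_elimination"},
    };
    for(const auto& p : pipeline)
        std::cout << p.name << '\n';
    return 0;
}
```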

TedThemistokleous (Collaborator) commented:

Fix CI, but otherwise I get the idea.

causten merged commit 7532007 into develop on Jan 4, 2024
14 of 15 checks passed
causten deleted the concat2 branch on January 4, 2024 at 16:23
Labels: enhancement (New feature or request)
6 participants