
Enable simplify qdq to work with FP8 types and fix bug in pass #2528

Merged — 1 commit merged into develop on Dec 7, 2023

Conversation

@umangyadav (Member) commented on Dec 7, 2023

match_find_quantizable_ops assumed that dequantizelinear only ever has one use, so the pass fails when its output has multiple uses.

This PR fixes that issue and adds the FP8 dtypes as allowed quantized types in the simplify_qdq pass.

Together with the changes in #2506, this set of changes helps run FP8 quantized models.
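For context, here is a minimal, self-contained C++ sketch of the failure mode and one possible way to restore the single-use assumption. This is not the MIGraphX matcher code; the node struct and the split_multi_use_dq helper are hypothetical illustrations only, and the actual fix in the pass may work differently.

```cpp
// Illustrative sketch only: models a dequantizelinear node whose output feeds
// two consumers, and duplicates it so each consumer sees a single-use copy.
// The node struct and split_multi_use_dq helper are hypothetical; they are not
// part of MIGraphX.
#include <cstddef>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

struct node
{
    std::string op;              // operator name, e.g. "dequantizelinear"
    std::vector<node*> inputs;   // producers feeding this node
    std::vector<node*> outputs;  // consumers reading this node
};

// Give every extra consumer its own copy of a multi-use dequantizelinear so a
// later pattern match that assumes "exactly one use" stays valid.
// (For brevity, producer output lists are not updated for the copies.)
void split_multi_use_dq(std::vector<std::unique_ptr<node>>& graph, node* dq)
{
    while(dq->outputs.size() > 1)
    {
        node* user = dq->outputs.back();
        dq->outputs.pop_back();

        auto copy     = std::make_unique<node>(*dq); // same op and inputs
        copy->outputs = {user};
        for(node*& in : user->inputs)
            if(in == dq)
                in = copy.get();
        graph.push_back(std::move(copy));
    }
}

int main()
{
    std::vector<std::unique_ptr<node>> graph;
    graph.push_back(std::make_unique<node>(node{"dequantizelinear", {}, {}}));
    graph.push_back(std::make_unique<node>(node{"dot", {}, {}}));
    graph.push_back(std::make_unique<node>(node{"convolution", {}, {}}));

    // Wire both the dot and the convolution to read the same dequantizelinear,
    // i.e. the multi-use case that broke the single-use assumption.
    node* dq = graph[0].get();
    for(std::size_t i = 1; i < graph.size(); ++i)
    {
        graph[i]->inputs.push_back(dq);
        dq->outputs.push_back(graph[i].get());
    }

    split_multi_use_dq(graph, dq); // dq now has exactly one consumer
    std::cout << "nodes after split: " << graph.size() << "\n"; // prints 4
}
```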

@umangyadav self-assigned this on Dec 7, 2023
@umangyadav added the label "FP8: issues related to FP8 implementation" on Dec 7, 2023
@migraphx-bot (Collaborator) commented:
| Test | Batch | Rate new (95616a) | Rate old (a09dc5) | Diff | Compare |
| --- | --- | --- | --- | --- | --- |
| torchvision-resnet50 | 64 | 2,834.92 | 2,833.31 | 0.06% | |
| torchvision-resnet50_fp16 | 64 | 6,502.95 | 6,501.92 | 0.02% | |
| torchvision-densenet121 | 32 | 2,072.36 | 2,094.20 | -1.04% | |
| torchvision-densenet121_fp16 | 32 | 3,667.31 | 3,665.08 | 0.06% | |
| torchvision-inceptionv3 | 32 | 1,598.30 | 1,594.40 | 0.24% | |
| torchvision-inceptionv3_fp16 | 32 | 2,559.88 | 2,557.61 | 0.09% | |
| cadene-inceptionv4 | 16 | 722.37 | 721.79 | 0.08% | |
| cadene-resnext64x4 | 16 | 692.35 | 692.75 | -0.06% | |
| slim-mobilenet | 64 | 8,322.56 | 8,332.19 | -0.12% | |
| slim-nasnetalarge | 64 | 230.55 | 230.63 | -0.03% | |
| slim-resnet50v2 | 64 | 2,665.09 | 2,664.91 | 0.01% | |
| bert-mrpc-onnx | 8 | 823.76 | 823.07 | 0.08% | |
| bert-mrpc-tf | 1 | 388.65 | 389.27 | -0.16% | |
| pytorch-examples-wlang-gru | 1 | 297.92 | 299.82 | -0.63% | |
| pytorch-examples-wlang-lstm | 1 | 313.30 | 308.52 | 1.55% | |
| torchvision-resnet50_1 | 1 | 608.00 | 602.66 | 0.89% | |
| torchvision-inceptionv3_1 | 1 | 342.26 | 343.14 | -0.26% | |
| cadene-dpn92_1 | 1 | 402.28 | 404.59 | -0.57% | |
| cadene-resnext101_1 | 1 | 328.33 | 328.32 | 0.00% | |
| slim-vgg16_1 | 1 | 459.12 | 460.67 | -0.34% | |
| slim-mobilenet_1 | 1 | 2,112.58 | 2,099.38 | 0.63% | |
| slim-inceptionv4_1 | 1 | 214.86 | 213.62 | 0.58% | |
| onnx-taau-downsample | 1 | 305.53 | 304.86 | 0.22% | |
| dlrm-criteoterabyte | 1 | 21.59 | 21.62 | -0.14% | |
| dlrm-criteoterabyte_fp16 | 1 | 40.63 | 40.65 | -0.06% | |
| agentmodel | 1 | 5,652.14 | 5,935.95 | -4.78% | 🔴 |
| unet_fp16 | 2 | 54.79 | 54.68 | 0.20% | |
| resnet50v1_fp16 | 1 | 943.90 | 932.98 | 1.17% | |
| bert_base_cased_fp16 | 64 | 903.60 | 903.11 | 0.06% | |
| bert_large_uncased_fp16 | 32 | 285.61 | 285.63 | -0.00% | |
| bert_large_fp16 | 1 | 166.63 | 166.70 | -0.04% | |
| distilgpt2_fp16 | 16 | 1,281.23 | 1,281.25 | -0.00% | |

This build is not recommended to merge 🔴

@migraphx-bot (Collaborator) commented:


✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
✅ torchvision-inceptionv3_1: PASSED: MIGraphX meets tolerance
✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance
✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance
✅ slim-vgg16_1: PASSED: MIGraphX meets tolerance
✅ slim-mobilenet_1: PASSED: MIGraphX meets tolerance
✅ slim-inceptionv4_1: PASSED: MIGraphX meets tolerance
✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
✅ agentmodel: PASSED: MIGraphX meets tolerance
✅ unet: PASSED: MIGraphX meets tolerance
✅ resnet50v1: PASSED: MIGraphX meets tolerance
🔴 bert_base_cased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
✅ bert_large_uncased_fp16: PASSED: MIGraphX meets tolerance
✅ bert_large: PASSED: MIGraphX meets tolerance
🔴 distilgpt2_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

@causten merged commit 22f60c2 into develop on Dec 7, 2023
34 of 41 checks passed
@causten deleted the qdq_fp8 branch on December 8, 2023 at 17:27
Labels: FP8 (issues related to FP8 implementation)
5 participants