Add QLinearMul operator #2430

Merged: 4 commits from qlinearmul into develop on Nov 17, 2023
Conversation

gyulaz-htec
Collaborator

The QLinearMul operator is required for the DenseNet-int8 model from the ONNX Model Zoo.

Fixes: migraphx-benchmark#150

@pfultz2
Collaborator

pfultz2 commented Nov 10, 2023

This should reuse the parse_qlinearadd class, but it should be renamed to parse_qlinearbinary; you can use opd.op_name to get the operator name:

struct parse_qlinearbinary : op_parser<parse_qlinearbinary>
{
    std::vector<op_desc> operators() const { return {{"QLinearAdd", "add"}, {"QLinearMul", "mul"}}; }

    void check_inputs(...) const
    {
        ...
    }

    instruction_ref parse(const op_desc& opd,
                          const onnx_parser& /*parser*/,
                          const onnx_parser::node_info& info,
                          const std::vector<instruction_ref>& args) const
    {
        check_inputs(args);

        // A
        const auto& in_a         = args[0];
        const auto& in_scale_a   = args[1];
        const auto& in_zero_pt_a = args[2];

        auto dquant_a = bcast_qdq_instr("dequantizelinear", in_a, in_scale_a, in_zero_pt_a, info);

        // B
        const auto& in_b         = args[3];
        const auto& in_scale_b   = args[4];
        const auto& in_zero_pt_b = args[5];
        auto dquant_b = bcast_qdq_instr("dequantizelinear", in_b, in_scale_b, in_zero_pt_b, info);

        // C = A (op) B, where op is add or mul depending on opd.op_name
        auto out_c = info.add_common_op(opd.op_name, dquant_a, dquant_b);

        const auto& in_scale_c = args[6];

        // zero_pt for C is supplied as the last optional argument.
        if(args.size() == 8)
            return (bcast_qdq_instr("quantizelinear", out_c, in_scale_c, args[7], info));

        // if no zero_pt: just broadcast the scale.
        auto bcast_scale_c = bcast_scalar_instr(out_c->get_shape(), in_scale_c, info);
        return (info.add_instruction(migraphx::make_op("quantizelinear"), out_c, bcast_scale_c));
    }
};
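
For reference, the args[0..7] indexing above follows the input order of the QLinearMul/QLinearAdd contrib ops: A, A_scale, A_zero_point, B, B_scale, B_zero_point, C_scale, and an optional C_zero_point. A minimal sketch of building such a node with the onnx helper API (tensor names, shapes, and values here are illustrative placeholders, not the actual gen_onnx.py test case):

import onnx
from onnx import helper, TensorProto

# Illustrative tensors; shapes and values are placeholders.
a = helper.make_tensor_value_info('A', TensorProto.UINT8, [64])
b = helper.make_tensor_value_info('B', TensorProto.UINT8, [64])
c = helper.make_tensor_value_info('C', TensorProto.UINT8, [64])

sc_a = helper.make_tensor('A_scale', TensorProto.FLOAT, [], [0.05])
zp_a = helper.make_tensor('A_zero_point', TensorProto.UINT8, [], [0])
sc_b = helper.make_tensor('B_scale', TensorProto.FLOAT, [], [0.05])
zp_b = helper.make_tensor('B_zero_point', TensorProto.UINT8, [], [128])
sc_c = helper.make_tensor('C_scale', TensorProto.FLOAT, [], [0.05])
zp_c = helper.make_tensor('C_zero_point', TensorProto.UINT8, [], [64])

# QLinearMul lives in the com.microsoft domain; the input order matches args[0..7] above.
node = helper.make_node('QLinearMul',
                        inputs=['A', 'A_scale', 'A_zero_point',
                                'B', 'B_scale', 'B_zero_point',
                                'C_scale', 'C_zero_point'],
                        outputs=['C'],
                        domain='com.microsoft')

graph = helper.make_graph([node], 'qlinearmul_test', [a, b], [c],
                          initializer=[sc_a, zp_a, sc_b, zp_b, sc_c, zp_c])
onnx.save(helper.make_model(graph), 'qlinearmul_test.onnx')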

@migraphx-bot
Collaborator

migraphx-bot commented Nov 10, 2023

Test    Batch    Rate new (cd4b84)    Rate old (5488b4)    Diff
torchvision-resnet50 64 2,830.97 2,829.72 0.04%
torchvision-resnet50_fp16 64 6,493.61 6,489.77 0.06%
torchvision-densenet121 32 2,092.86 2,097.04 -0.20%
torchvision-densenet121_fp16 32 3,672.18 3,666.45 0.16%
torchvision-inceptionv3 32 1,586.29 1,585.55 0.05%
torchvision-inceptionv3_fp16 32 2,567.95 2,565.93 0.08%
cadene-inceptionv4 16 702.85 703.34 -0.07%
cadene-resnext64x4 16 691.80 691.44 0.05%
slim-mobilenet 64 8,327.42 8,325.11 0.03%
slim-nasnetalarge 64 225.49 225.43 0.03%
slim-resnet50v2 64 2,664.56 2,662.69 0.07%
bert-mrpc-onnx 8 823.16 823.96 -0.10%
bert-mrpc-tf 1 388.87 389.58 -0.18%
pytorch-examples-wlang-gru 1 296.57 300.70 -1.37%
pytorch-examples-wlang-lstm 1 320.81 311.20 3.09% 🔆
torchvision-resnet50_1 1 606.28 606.23 0.01%
torchvision-inceptionv3_1 1 340.40 343.25 -0.83%
cadene-dpn92_1 1 401.46 398.53 0.73%
cadene-resnext101_1 1 326.62 325.19 0.44%
slim-vgg16_1 1 460.42 459.24 0.26%
slim-mobilenet_1 1 2,113.09 2,137.74 -1.15%
slim-inceptionv4_1 1 220.41 217.92 1.14%
onnx-taau-downsample 1 304.04 303.77 0.09%
dlrm-criteoterabyte 1 21.62 21.59 0.15%
dlrm-criteoterabyte_fp16 1 40.56 40.66 -0.23%
agentmodel 1 nan nan nan%
unet_fp16 2 54.72 54.72 -0.01%
resnet50v1_fp16 1 954.12 960.84 -0.70%
bert_base_cased_fp16 64 902.93 903.18 -0.03%
bert_large_uncased_fp16 32 285.55 285.56 -0.00%
bert_large_fp16 1 166.74 166.91 -0.11%
distilgpt2_fp16 16 1,281.71 1,280.50 0.09%

This build is not recommended to merge 🔴

@migraphx-bot
Collaborator


✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
✅ torchvision-inceptionv3_1: PASSED: MIGraphX meets tolerance
✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance
✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance
✅ slim-vgg16_1: PASSED: MIGraphX meets tolerance
✅ slim-mobilenet_1: PASSED: MIGraphX meets tolerance
✅ slim-inceptionv4_1: PASSED: MIGraphX meets tolerance
✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

❌ agentmodel: ERROR - check error output
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 336, in
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 254, in main
    pred_migx = np.array(model.run(params)[-1])
RuntimeError: /src/AMDMIGraphX/src/targets/gpu/device/include/migraphx/gpu/device/visit.hpp:140: hip_visit_views_impl: Ranks must be the same


✅ unet: PASSED: MIGraphX meets tolerance
✅ resnet50v1: PASSED: MIGraphX meets tolerance
🔴 bert_base_cased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
✅ bert_large: PASSED: MIGraphX meets tolerance
🔴 distilgpt2_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

@lakhinderwalia
Contributor

Could you please help me understand how you can use the exact same test data-set from the qlinearadd tests to verify qlinearmul_test & qlinear_bcast_test? The computed results for Add and Mul operations should be (very) different. Thanks.

@gyulaz-htec
Collaborator Author

@lakhinderwalia I've lowered the scales for A and B from 0.05 to 0.005 and increased the scale for C compared to QLinearAdd.
With these changes the output of ((A - A_zero_point) * (B - B_zero_point)) * (A_scale * B_scale) / C_scale is below 0.5, so adding it to C_zero_point will always output the value of C_zero_point.
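
A quick numeric sketch of that effect (the values below are hypothetical, chosen only to mirror the described scenario, not the actual test data in this PR):

# Hypothetical uint8 inputs with small A/B scales and a larger C scale.
A, A_zero_point, A_scale = 100, 0, 0.005
B, B_zero_point, B_scale = 148, 128, 0.005
C_scale, C_zero_point = 0.5, 64

# Dequantize, multiply, then requantize: the same pattern the parser lowers QLinearMul to.
real_c = (A - A_zero_point) * A_scale * (B - B_zero_point) * B_scale  # 0.05
scaled = real_c / C_scale                                             # 0.1, below 0.5
C = min(max(round(scaled) + C_zero_point, 0), 255)

print(scaled, C)  # 0.1 64 -> the quantized output is just C_zero_point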

@gyulaz-htec
Collaborator Author

@pfultz2 @lakhinderwalia I've updated the PR with the proposed parse_qlinearbinary struct. Also, if you want me to change the test to use more diverse results/scale values/zero points, let me know and I will update it.

@gyulaz-htec gyulaz-htec force-pushed the qlinearmul branch 2 times, most recently from 2fe89ff to 846ccc3 on November 13, 2023 11:27
@lakhinderwalia
Contributor

> @lakhinderwalia I've lowered the scales for A and B from 0.05 to 0.005 and increased the scale for C compared to QLinearAdd. With these changes the output of ((A - A_zero_point) * (B - B_zero_point)) * (A_scale * B_scale) / C_scale is below 0.5, so adding it to C_zero_point will always output the value of C_zero_point.

Please note, all the output values that you are testing/verifying are zeros. This test vector, originally designed for Add, isn't right for Multiplication. It is likely that your scale is so small that everything is effectively converted to zeros -- and the test passes. Thanks.

@gyulaz-htec
Collaborator Author

@lakhinderwalia I've changed the scale values so the intermediate values are not zeros, and updated the expected results accordingly.

@gyulaz-htec gyulaz-htec requested a review from pfultz2 November 14, 2023 12:54
zero_pt_a = helper.make_tensor('A_zero_point', TensorProto.UINT8, [], [0])

b = helper.make_tensor_value_info('B', TensorProto.UINT8, [64])
sc_b = helper.make_tensor('B_scale', TensorProto.FLOAT, [], [-0.05])
Contributor

Quantization scale is always a positive number.

Collaborator Author

Changed it to a positive number, and updated the test results accordingly.

@lakhinderwalia
Contributor

Thank you.

@causten causten merged commit 0102d44 into develop Nov 17, 2023
8 of 9 checks passed
@causten causten deleted the qlinearmul branch January 5, 2024 19:56
Successfully merging this pull request may close these issues: QLinearMul is not supported.