Add QLinearMul operator #2430

Merged: 4 commits from qlinearmul into develop on Nov 17, 2023
Conversation

gyulaz-htec
Collaborator

The QLinearMul operator is required for the DenseNet-int8 model from the ONNX Model Zoo.

Fixes: migraphx-benchmark#150

@pfultz2
Collaborator

pfultz2 commented Nov 10, 2023

This should reuse the parse_qlinearadd class, but it should be renamed to parse_qlinearbinary; you can use opd.op_name to get the operator name:

struct parse_qlinearbinary : op_parser<parse_qlinearbinary>
{
    std::vector<op_desc> operators() const { return {{"QLinearAdd", "add"}, {"QLinearMul", "mul"}}; }

    void check_inputs(...) const
    {
        ...
    }

    instruction_ref parse(const op_desc& opd,
                          const onnx_parser& /*parser*/,
                          const onnx_parser::node_info& info,
                          const std::vector<instruction_ref>& args) const
    {
        check_inputs(args);

        // A
        const auto& in_a         = args[0];
        const auto& in_scale_a   = args[1];
        const auto& in_zero_pt_a = args[2];

        auto dquant_a = bcast_qdq_instr("dequantizelinear", in_a, in_scale_a, in_zero_pt_a, info);

        // B
        const auto& in_b         = args[3];
        const auto& in_scale_b   = args[4];
        const auto& in_zero_pt_b = args[5];
        auto dquant_b = bcast_qdq_instr("dequantizelinear", in_b, in_scale_b, in_zero_pt_b, info);

        // C = A (op) B, where op is add or mul depending on opd.op_name
        auto out_c = info.add_common_op(opd.op_name, dquant_a, dquant_b);

        const auto& in_scale_c = args[6];

        // zero_pt for C is supplied as the last optional argument.
        if(args.size() == 8)
            return (bcast_qdq_instr("quantizelinear", out_c, in_scale_c, args[7], info));

        // if no zero_pt: just broadcast the scale.
        auto bcast_scale_c = bcast_scalar_instr(out_c->get_shape(), in_scale_c, info);
        return (info.add_instruction(migraphx::make_op("quantizelinear"), out_c, bcast_scale_c));
    }
};
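
For reference, the args[0..7] indexing above follows the input order of the QLinearMul/QLinearAdd contrib ops: A, A_scale, A_zero_point, B, B_scale, B_zero_point, C_scale, and an optional C_zero_point. A minimal sketch of building such a node with the onnx helper API (tensor names, shapes, and values here are illustrative placeholders, not the actual gen_onnx.py test case):

import onnx
from onnx import helper, TensorProto

# Illustrative tensors; shapes and values are placeholders.
a = helper.make_tensor_value_info('A', TensorProto.UINT8, [64])
b = helper.make_tensor_value_info('B', TensorProto.UINT8, [64])
c = helper.make_tensor_value_info('C', TensorProto.UINT8, [64])

sc_a = helper.make_tensor('A_scale', TensorProto.FLOAT, [], [0.05])
zp_a = helper.make_tensor('A_zero_point', TensorProto.UINT8, [], [0])
sc_b = helper.make_tensor('B_scale', TensorProto.FLOAT, [], [0.05])
zp_b = helper.make_tensor('B_zero_point', TensorProto.UINT8, [], [128])
sc_c = helper.make_tensor('C_scale', TensorProto.FLOAT, [], [0.05])
zp_c = helper.make_tensor('C_zero_point', TensorProto.UINT8, [], [64])

# QLinearMul lives in the com.microsoft domain; the input order matches args[0..7] above.
node = helper.make_node('QLinearMul',
                        inputs=['A', 'A_scale', 'A_zero_point',
                                'B', 'B_scale', 'B_zero_point',
                                'C_scale', 'C_zero_point'],
                        outputs=['C'],
                        domain='com.microsoft')

graph = helper.make_graph([node], 'qlinearmul_test', [a, b], [c],
                          initializer=[sc_a, zp_a, sc_b, zp_b, sc_c, zp_c])
onnx.save(helper.make_model(graph), 'qlinearmul_test.onnx')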

@migraphx-bot
Collaborator

migraphx-bot commented Nov 10, 2023

Test    Batch    Rate new (cd4b84)    Rate old (5488b4)    Diff
torchvision-resnet50 64 2,830.97 2,829.72 0.04%
torchvision-resnet50_fp16 64 6,493.61 6,489.77 0.06%
torchvision-densenet121 32 2,092.86 2,097.04 -0.20%
torchvision-densenet121_fp16 32 3,672.18 3,666.45 0.16%
torchvision-inceptionv3 32 1,586.29 1,585.55 0.05%
torchvision-inceptionv3_fp16 32 2,567.95 2,565.93 0.08%
cadene-inceptionv4 16 702.85 703.34 -0.07%
cadene-resnext64x4 16 691.80 691.44 0.05%
slim-mobilenet 64 8,327.42 8,325.11 0.03%
slim-nasnetalarge 64 225.49 225.43 0.03%
slim-resnet50v2 64 2,664.56 2,662.69 0.07%
bert-mrpc-onnx 8 823.16 823.96 -0.10%
bert-mrpc-tf 1 388.87 389.58 -0.18%
pytorch-examples-wlang-gru 1 296.57 300.70 -1.37%
pytorch-examples-wlang-lstm 1 320.81 311.20 3.09% 🔆
torchvision-resnet50_1 1 606.28 606.23 0.01%
torchvision-inceptionv3_1 1 340.40 343.25 -0.83%
cadene-dpn92_1 1 401.46 398.53 0.73%
cadene-resnext101_1 1 326.62 325.19 0.44%
slim-vgg16_1 1 460.42 459.24 0.26%
slim-mobilenet_1 1 2,113.09 2,137.74 -1.15%
slim-inceptionv4_1 1 220.41 217.92 1.14%
onnx-taau-downsample 1 304.04 303.77 0.09%
dlrm-criteoterabyte 1 21.62 21.59 0.15%
dlrm-criteoterabyte_fp16 1 40.56 40.66 -0.23%
agentmodel 1 nan nan nan%
unet_fp16 2 54.72 54.72 -0.01%
resnet50v1_fp16 1 954.12 960.84 -0.70%
bert_base_cased_fp16 64 902.93 903.18 -0.03%
bert_large_uncased_fp16 32 285.55 285.56 -0.00%
bert_large_fp16 1 166.74 166.91 -0.11%
distilgpt2_fp16 16 1,281.71 1,280.50 0.09%

This build is not recommended to merge 🔴

@migraphx-bot
Collaborator


✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
✅ torchvision-inceptionv3_1: PASSED: MIGraphX meets tolerance
✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance
✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance
✅ slim-vgg16_1: PASSED: MIGraphX meets tolerance
✅ slim-mobilenet_1: PASSED: MIGraphX meets tolerance
✅ slim-inceptionv4_1: PASSED: MIGraphX meets tolerance
✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

❌ agentmodel: ERROR - check error output
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 336, in
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 254, in main
    pred_migx = np.array(model.run(params)[-1])
RuntimeError: /src/AMDMIGraphX/src/targets/gpu/device/include/migraphx/gpu/device/visit.hpp:140: hip_visit_views_impl: Ranks must be the same


✅ unet: PASSED: MIGraphX meets tolerance
✅ resnet50v1: PASSED: MIGraphX meets tolerance
🔴 bert_base_cased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
✅ bert_large: PASSED: MIGraphX meets tolerance
🔴 distilgpt2_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

@lakhinderwalia
Contributor

Could you please help me understand how you can use the exact same test data-set from the qlinearadd tests to verify qlinearmul_test & qlinear_bcast_test? The computed results for Add and Mul operations should be (very) different. Thanks.

@gyulaz-htec
Collaborator Author

@lakhinderwalia I've lowered the scales for A and B from 0.05 to 0.005 and increased the scale for C compared to QLinearAdd.
With these changes the output of ((A - A_zero_point) * (B - B_zero_point)) * (A_scale * B_scale) / C_scale is below 0.5, so adding it to C_zero_point will always output the value of C_zero_point.
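
A quick numeric sketch of that effect (the values below are hypothetical, chosen only to mirror the described scenario, not the actual test data in this PR):

# Hypothetical uint8 inputs with small A/B scales and a larger C scale.
A, A_zero_point, A_scale = 100, 0, 0.005
B, B_zero_point, B_scale = 148, 128, 0.005
C_scale, C_zero_point = 0.5, 64

# Dequantize, multiply, then requantize: the same pattern the parser lowers QLinearMul to.
real_c = (A - A_zero_point) * A_scale * (B - B_zero_point) * B_scale  # 0.05
scaled = real_c / C_scale                                             # 0.1, below 0.5
C = min(max(round(scaled) + C_zero_point, 0), 255)

print(scaled, C)  # 0.1 64 -> the quantized output is just C_zero_point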

@gyulaz-htec
Collaborator Author

@pfultz2 @lakhinderwalia I've updated the PR with the proposed parse_qlinearbinary struct. Also, if you want me to change the test to use more diverse results/scale values/zero points, let me know and I will update it.

@gyulaz-htec gyulaz-htec force-pushed the qlinearmul branch 2 times, most recently from 2fe89ff to 846ccc3 on November 13, 2023 11:27
@lakhinderwalia
Contributor

> @lakhinderwalia I've lowered the scales for A and B from 0.05 to 0.005 and increased the scale for C compared to QLinearAdd. With these changes the output of ((A - A_zero_point) * (B - B_zero_point)) * (A_scale * B_scale) / C_scale is below 0.5, so adding it to C_zero_point will always output the value of C_zero_point.

Please note, all the output values that you are testing/verifying are zeros. This test vector, originally designed for Add, isn't right for Multiplication. It is likely that your scale is so small that everything is effectively converted to zeros -- and the test passes. Thanks.

@gyulaz-htec
Collaborator Author

@lakhinderwalia I've changed the scale values so the intermediate values are not zeros, and updated the expected results accordingly.

@gyulaz-htec gyulaz-htec requested a review from pfultz2 November 14, 2023 12:54
zero_pt_a = helper.make_tensor('A_zero_point', TensorProto.UINT8, [], [0])

b = helper.make_tensor_value_info('B', TensorProto.UINT8, [64])
sc_b = helper.make_tensor('B_scale', TensorProto.FLOAT, [], [-0.05])
Contributor

Quantization scale is always a positive number.

Collaborator Author

Changed it to a positive number, and updated the test results accordingly.

@lakhinderwalia
Contributor

Thank you.

@causten causten merged commit 0102d44 into develop Nov 17, 2023
8 of 9 checks passed
@causten causten deleted the qlinearmul branch January 5, 2024 19:56
Successfully merging this pull request may close these issues: QLinearMul is not supported.