Add DynamicQuantizeLinear op #2489

Merged: 5 commits merged into develop from dynamic_quantize_linear on Dec 12, 2023
Conversation

gyulaz-htec (Collaborator):

Add support for DynamicQuantizeLinear operator.
This implementation only works with static shapes due to the use of reshape. Reshape is needed to get the max and min values across the entire input tensor. Any ideas on how to solve that are welcome.

Fixes: migraphx-benchmark#91
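
For context, here is a minimal sketch of the flatten-then-reduce idea described above. It is purely illustrative: the names x, x_shape, and info, and the use of reduce_min/reduce_max, are assumptions (the compiled module quoted later in this thread suggests topk is what actually gets emitted), so the merged parser may differ.

// Hypothetical sketch: flatten X to 1-D so a single reduction spans every element.
// The reshape dims must be known at parse time, which is why only static shapes work.
std::vector<int64_t> flat_dims{static_cast<int64_t>(x_shape.elements())};
std::vector<int64_t> axes{0};
auto flat_x = info.add_instruction(migraphx::make_op("reshape", {{"dims", flat_dims}}), x);
auto min_x  = info.add_instruction(migraphx::make_op("reduce_min", {{"axes", axes}}), flat_x);
auto max_x  = info.add_instruction(migraphx::make_op("reduce_max", {{"axes", axes}}), flat_x);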

migraphx-bot (Collaborator) commented Nov 30, 2023:

Test    Batch    Rate new (7cb098)    Rate old (9d2003)    Diff
torchvision-resnet50 64 2,832.70 2,834.89 -0.08%
torchvision-resnet50_fp16 64 6,496.94 6,504.77 -0.12%
torchvision-densenet121 32 2,095.30 2,096.39 -0.05%
torchvision-densenet121_fp16 32 3,663.36 3,663.79 -0.01%
torchvision-inceptionv3 32 1,597.77 1,593.15 0.29%
torchvision-inceptionv3_fp16 32 2,563.84 2,561.29 0.10%
cadene-inceptionv4 16 722.21 722.57 -0.05%
cadene-resnext64x4 16 691.66 692.10 -0.06%
slim-mobilenet 64 8,333.46 8,334.20 -0.01%
slim-nasnetalarge 64 230.55 230.62 -0.03%
slim-resnet50v2 64 2,663.05 2,665.22 -0.08%
bert-mrpc-onnx 8 822.96 823.69 -0.09%
bert-mrpc-tf 1 385.53 389.31 -0.97%
pytorch-examples-wlang-gru 1 303.41 303.55 -0.05%
pytorch-examples-wlang-lstm 1 311.63 313.03 -0.45%
torchvision-resnet50_1 1 609.61 607.81 0.30%
torchvision-inceptionv3_1 1 343.60 345.51 -0.55%
cadene-dpn92_1 1 404.00 404.79 -0.20%
cadene-resnext101_1 1 328.15 328.36 -0.06%
slim-vgg16_1 1 459.24 459.17 0.02%
slim-mobilenet_1 1 2,074.25 2,110.55 -1.72%
slim-inceptionv4_1 1 212.51 214.65 -1.00%
onnx-taau-downsample 1 305.11 306.30 -0.39%
dlrm-criteoterabyte 1 21.59 21.63 -0.18%
dlrm-criteoterabyte_fp16 1 40.62 40.54 0.21%
agentmodel 1 5,905.94 5,884.50 0.36%
unet_fp16 2 54.78 54.75 0.05%
resnet50v1_fp16 1 931.21 945.09 -1.47%
bert_base_cased_fp16 64 903.21 903.34 -0.01%
bert_large_uncased_fp16 32 285.67 285.72 -0.02%
bert_large_fp16 1 166.59 166.68 -0.05%
distilgpt2_fp16 16 1,279.20 1,281.90 -0.21%

This build is OK for merge ✅

migraphx-bot (Collaborator) commented Nov 30, 2023:

     ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

     ✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

     ✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

     ✅ torchvision-inceptionv3_1: PASSED: MIGraphX meets tolerance

     ✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

     ✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

     ✅ slim-vgg16_1: PASSED: MIGraphX meets tolerance

     ✅ slim-mobilenet_1: PASSED: MIGraphX meets tolerance

     ✅ slim-inceptionv4_1: PASSED: MIGraphX meets tolerance

     ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

     ✅ agentmodel: PASSED: MIGraphX meets tolerance

     ✅ unet: PASSED: MIGraphX meets tolerance

     ✅ resnet50v1: PASSED: MIGraphX meets tolerance

     ✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

     ✅ bert_large_uncased_fp16: PASSED: MIGraphX meets tolerance

     ✅ bert_large: PASSED: MIGraphX meets tolerance

     🔴 distilgpt2_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

namespace migraphx {
inline namespace MIGRAPHX_INLINE_NS {
namespace onnx {

Contributor:

(If you edit this file again, please copy and paste the reference information into comments, as in some recent operators, e.g. qlinearconcat.)

My basic design question is: the operator reference says "A Function to fuse calculation for Scale, Zero Point and FP32->8Bit conversion of FP32 Input data."

But we are not fusing any calculations here... any thoughts on that? Thanks.

Collaborator (author):


I've checked the compiled GPU version of dynamicquantizelinear_2d_test.onnx.
From that it seems there are a lot of fused instructions in the kernels:

module: "main"
@0 = check_context::migraphx::gpu::context -> float_type, {}, {}, target_id=0
@1 = hip::hip_allocate_memory[shape=int8_type, {96}, {1},id=main:scratch] -> int8_type, {96}, {1}, target_id=0
@2 = load[offset=0,end=48](@1) -> float_type, {3, 4}, {4, 1}, target_id=0
x = @param:x -> float_type, {3, 4}, {4, 1}, target_id=0
@4 = hip::copy_to_gpu(x,@2) -> float_type, {3, 4}, {4, 1}, target_id=0
@5 = reshape_lazy[dims={12}](@4) -> float_type, {12}, {1}, target_id=0
@6 = load[offset=48,end=60](@1) -> [float_type, {1}, {1}, int64_type, {1}, {1}], target_id=0
@7 = gpu::topk[k=1,axis=0,largest=0](@5,@6) -> [float_type, {1}, {1}, int64_type, {1}, {1}], target_id=0
@8 = load[offset=64,end=68](@1) -> float_type, {1}, {1}, target_id=0
@9 = get_tuple_elem[index=0](@7) -> float_type, {1}, {1}, target_id=0
@10 = gpu::code_object[code_object=4536,symbol_name=min_kernel,global=1,local=1024,](@9,@8) -> float_type, {1}, {1}, target_id=0
@11 = load[offset=80,end=92](@1) -> [float_type, {1}, {1}, int64_type, {1}, {1}], target_id=0
@12 = gpu::topk[k=1,axis=0,largest=1](@5,@11) -> [float_type, {1}, {1}, int64_type, {1}, {1}], target_id=0
@13 = load[offset=48,end=52](@1) -> float_type, {1}, {1}, target_id=0
@14 = get_tuple_elem[index=0](@12) -> float_type, {1}, {1}, target_id=0
@15 = gpu::code_object[code_object=4872,symbol_name=max_sub_mul_kernel,global=1,local=1024,](@14,@10,@13) -> float_type, {1}, {1}, target_id=0
@16 = load[offset=80,end=81](@1) -> uint8_type, {1}, {1}, target_id=0
@17 = gpu::code_object[code_object=4976,symbol_name=neg_div_clip_nearbyint_convert_kernel,global=1,local=1024,](@10,@15,@16) -> uint8_type, {1}, {1}, target_id=0
@18 = hip::copy_from_gpu(@15) -> float_type, {1}, {1}, target_id=0
@19 = hip::copy_from_gpu(@17) -> uint8_type, {1}, {1}, target_id=0
@20 = load[offset=64,end=76](@1) -> uint8_type, {3, 4}, {4, 1}, target_id=0
@21 = multibroadcast[out_lens={3, 4},out_dyn_dims={}](@17) -> uint8_type, {3, 4}, {0, 0}, target_id=0
@22 = multibroadcast[out_lens={3, 4},out_dyn_dims={}](@15) -> float_type, {3, 4}, {0, 0}, target_id=0
@23 = gpu::code_object[code_object=5072,symbol_name=quantizelinear_kernel,global=6,local=1024,](@4,@22,@21,@20) -> uint8_type, {3, 4}, {4, 1}, target_id=0
@24 = hip::copy_from_gpu(@23) -> uint8_type, {3, 4}, {4, 1}, target_id=0
@25 = hip::sync_stream(@24,@18,@19) -> uint8_type, {3, 4}, {4, 1}, target_id=0
@26 = @return(@25,@18,@19), target_id=0

From my point of view, that satisfies the requirement of fusing.
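
For reference, here is a scalar sketch of the math those fused kernels implement, per the ONNX DynamicQuantizeLinear spec (uint8 only). It is illustrative C++ rather than MIGraphX code; the max_sub_mul and neg_div_clip_nearbyint_convert kernels above appear to correspond to the y_scale and y_zero_point computations, respectively.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>

// Returns {y_scale, y_zero_point} for a float input (assumes x is non-empty and not all zero).
std::pair<float, uint8_t> dql_params(const std::vector<float>& x)
{
    float x_min   = std::min(0.0f, *std::min_element(x.begin(), x.end()));
    float x_max   = std::max(0.0f, *std::max_element(x.begin(), x.end()));
    float y_scale = (x_max - x_min) / 255.0f; // qmax - qmin for uint8
    float zp      = std::clamp(std::nearbyint(-x_min / y_scale), 0.0f, 255.0f);
    return {y_scale, static_cast<uint8_t>(zp)};
}
// Each output element is then y[i] = clip(round(x[i] / y_scale) + y_zero_point, 0, 255).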

gyulaz-htec (Collaborator, author) commented Dec 4, 2023:


> (If you edit this file again, please copy and paste the reference information into comments, as in some recent operators, e.g. qlinearconcat.)

Added the comments after the onnx namespace.

migraphx::literal{migraphx::shape{x_type}, {std::numeric_limits<uint8_t>::max()}});
auto q_min = info.add_literal(
migraphx::literal{migraphx::shape{x_type}, {std::numeric_limits<uint8_t>::min()}});
auto x_reshape =
Contributor:

Would this step be necessary (for static shapes) if X is 1-D? Thanks.

Collaborator (author):

No, it's not needed in that case, I will add a check to skip the conversion in the 1-D case.

Collaborator:

The optimizer will remove the redundant reshapes automatically, so it's not necessary to do this here.

No need to revert the change if you already updated it; just a note for the future.

gyulaz-htec force-pushed the dynamic_quantize_linear branch from 76ab520 to 2a43a71 on December 4, 2023, 12:17
auto q_min = info.add_literal(
migraphx::literal{migraphx::shape{x_type}, {std::numeric_limits<uint8_t>::min()}});
auto x_reshape = x;
if(not(x_shape.lens().size() == 1))
Contributor:

Optional: Just one comparison would suffice: if(x_shape.lens().size() != 1)
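
For illustration, the suggested form applied to the hunk above might read as follows. This is a hypothetical sketch: x, x_shape, and info are assumed to be in scope, and the reshape body is a guess at what the truncated hunk continues with.

// Hypothetical completion of the hunk above, using the single-comparison form.
std::vector<int64_t> flat_dims{static_cast<int64_t>(x_shape.elements())};
auto x_reshape = x;
if(x_shape.lens().size() != 1)
    x_reshape = info.add_instruction(migraphx::make_op("reshape", {{"dims", flat_dims}}), x);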


// y_scale = (maximum(0, max(x)) - minimum(0, min(x))) / (qmax - qmin)
auto sub0 = info.add_instruction(migraphx::make_op("sub"), max_x, min_x);
auto y_scale = info.add_instruction(migraphx::make_op("div"), sub0, q_max);
Contributor:

Optional. (q_max - q_min) instead of just q_max on line 130.

Contributor:

Excuse me, this is a compile-time step, not a new (additional) compute instruction; hence the above suggestion. Sorry about any confusion.

Collaborator (author):

Fixed

gyulaz-htec force-pushed the dynamic_quantize_linear branch 2 times, most recently from e2ca869 to bf8eadd on December 6, 2023, 08:51
auto div = info.add_instruction(migraphx::make_op("sub"), q_max, q_min);
auto sub0 = info.add_instruction(migraphx::make_op("sub"), max_x, min_x);
// qmax - qmin is always 255
auto div = q_max;
Contributor:

// https://onnx.ai/onnx/operators/onnx__QuantizeLinear.html is no longer just for uint8. Please remove the comment on line 129.
auto div = q_max - q_min; // line 130.

Collaborator (author):

The link you provided is for QuantizeLinear; DynamicQuantizeLinear only supports uint8: https://onnx.ai/onnx/operators/onnx__DynamicQuantizeLinear.html

Contributor:

You are right. There is (still) a disconnect between these two operators, and there shouldn't be!

But please do change line 130 as suggested: the calculation then applies logically to any type, including uint8, and that way qmax - qmin is still 255 here. This is not a compute step, just a compile-time expression. Thanks.

Collaborator (author):

auto div = q_max - q_min; // line 130 <- this doesn't compile (q_max and q_min are instruction references, not values), so I've changed the implementation and added a third literal called scale with the value of q_max - q_min.
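
A minimal sketch of what that third-literal approach might look like (illustrative only; x_type, info, max_x, and min_x are assumed from the hunks quoted above, and the exact names in the merged code may differ):

// Hypothetical sketch: fold qmax - qmin (always 255 for uint8) into a single literal at parse time.
auto scale = info.add_literal(migraphx::literal{
    migraphx::shape{x_type},
    {std::numeric_limits<uint8_t>::max() - std::numeric_limits<uint8_t>::min()}});
auto sub0    = info.add_instruction(migraphx::make_op("sub"), max_x, min_x);
auto y_scale = info.add_instruction(migraphx::make_op("div"), sub0, scale);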

gyulaz-htec force-pushed the dynamic_quantize_linear branch from 6008732 to 16e51d1 on December 8, 2023, 09:27
gyulaz-htec force-pushed the dynamic_quantize_linear branch from 16e51d1 to 7cb098b on December 11, 2023, 09:35
codecov-commenter commented Dec 11, 2023:

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (9d2003a) 91.50% compared to head (3de3b2f) 91.50%.
Report is 6 commits behind head on develop.

❗ Current head 3de3b2f differs from pull request most recent head 1ec583d. Consider uploading reports for the commit 1ec583d to get more accurate results

File | Patch % | Lines missing
src/onnx/parse_dynamicquantizelinear.cpp | 96.00% | 1 ⚠️


Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #2489   +/-   ##
========================================
  Coverage    91.50%   91.50%           
========================================
  Files          453      454    +1     
  Lines        17183    17208   +25     
========================================
+ Hits         15723    15747   +24     
- Misses        1460     1461    +1     


lakhinderwalia (Contributor) left a comment:

Thank you.

causten merged commit 5fe1b07 into develop on Dec 12, 2023; 8 of 9 checks passed.
causten deleted the dynamic_quantize_linear branch on December 12, 2023, 21:58.
Successfully merging this pull request may close these issues.

DynamicQuantizeLinear operator is unsupported
7 participants