Add support for MaxPool unit8 type #2973

nives-vukovic · 2024-04-15T14:28:35Z

umangyadav

Is there a test that you want to enable or add?

nives-vukovic · 2024-04-15T15:44:43Z

@umangyadav That is why I left it in draft state, to check with you about this. Currently it fixes crashes for a number of models listed in https://github.com/gyulaz-htec/models/blob/migraphx_testing/MIGRAPHX_fp32-int8.md and also one Python backend test, which I will remove from the disabled list. This causes issues only on the GPU. Is there any other test that you would prefer to have for this kind of fix?

migraphx-bot · 2024-04-15T16:29:30Z

Test	Batch	Rate new 2caa6a	Rate old a19fdf	Diff	Compare
torchvision-resnet50	64	2,814.88	2,820.45	-0.20%	✅
torchvision-resnet50_fp16	64	6,407.33	6,408.42	-0.02%	✅
torchvision-densenet121	32	2,089.45	2,088.75	0.03%	✅
torchvision-densenet121_fp16	32	3,693.47	3,698.96	-0.15%	✅
torchvision-inceptionv3	32	1,605.72	1,601.98	0.23%	✅
torchvision-inceptionv3_fp16	32	2,551.33	2,553.71	-0.09%	✅
cadene-inceptionv4	16	718.59	718.66	-0.01%	✅
cadene-resnext64x4	16	681.11	680.56	0.08%	✅
slim-mobilenet	64	5,946.46	5,944.64	0.03%	✅
slim-nasnetalarge	64	154.07	154.05	0.01%	✅
slim-resnet50v2	64	2,588.13	2,591.45	-0.13%	✅
bert-mrpc-onnx	8	921.14	921.31	-0.02%	✅
bert-mrpc-tf	1	396.43	395.25	0.30%	✅
pytorch-examples-wlang-gru	1	395.64	392.84	0.71%	✅
pytorch-examples-wlang-lstm	1	377.79	373.94	1.03%	✅
torchvision-resnet50_1	1	605.24	608.05	-0.46%	✅
cadene-dpn92_1	1	392.10	389.03	0.79%	✅
cadene-resnext101_1	1	331.79	331.79	-0.00%	✅
onnx-taau-downsample	1	307.67	306.95	0.23%	✅
dlrm-criteoterabyte	1	28.85	28.88	-0.10%	✅
dlrm-criteoterabyte_fp16	1	48.27	48.21	0.13%	✅
agentmodel	1	7,365.72	7,240.40	1.73%	✅
unet_fp16	2	57.16	57.50	-0.59%	✅
resnet50v1_fp16	1	896.39	913.30	-1.85%	✅
resnet50v1_int8	1	792.46	810.79	-2.26%	✅
bert_base_cased_fp16	64	1,033.96	1,033.73	0.02%	✅
bert_large_uncased_fp16	32	300.55	300.40	0.05%	✅
bert_large_fp16	1	159.90	160.21	-0.19%	✅
distilgpt2_fp16	16	1,854.88	1,854.48	0.02%	✅
yolov5s	1	470.10	475.20	-1.07%	✅
tinyllama	1	32.99	32.99	-0.00%	✅
vicuna-fastchat	1	158.21	157.55	0.42%	✅
whisper-tiny-encoder	1	347.86	347.38	0.14%	✅
whisper-tiny-decoder	1	396.11	397.49	-0.35%	✅

This build is OK for merge ✅

migraphx-bot · 2024-04-15T16:29:31Z

✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

✅ agentmodel: PASSED: MIGraphX meets tolerance

✅ unet: PASSED: MIGraphX meets tolerance

✅ resnet50v1: PASSED: MIGraphX meets tolerance

✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

✅ bert_large: PASSED: MIGraphX meets tolerance

✅ yolov5s: PASSED: MIGraphX meets tolerance

✅ tinyllama: PASSED: MIGraphX meets tolerance

✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

TedThemistokleous · 2024-04-15T17:24:59Z

> @umangyadav That is why I left it in draft state to check with you about this. Currently it fixes crashes for a number of models listed in https://github.com/gyulaz-htec/models/blob/migraphx_testing/MIGRAPHX_fp32-int8.md and also one python backend test that I will remove from disabled list. So this causes issues only on the gpu, is there any other test that you would prefer to have for this kind of fix?

If there are any ONNX backend tests that this breaks, or that it now allows us to run, we should turn those on for coverage and run them in our CI.

Secondly, which models from that list does this fix? Do you have a comprehensive list? Add that to this ticket and we can determine whether we need to add coverage for them here or make them part of another set of runs.

codecov · 2024-04-15T17:25:35Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.77%. Comparing base (e4013bb) to head (a1bf49b).
Report is 7 commits behind head on develop.

❗ Current head a1bf49b differs from pull request most recent head 9f5d573. Consider uploading reports for the commit 9f5d573 to get more accurate results

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #2973   +/-   ##
========================================
  Coverage    91.77%   91.77%           
========================================
  Files          484      484           
  Lines        18711    18715    +4     
========================================
+ Hits         17172    17176    +4     
  Misses        1539     1539

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

nives-vukovic · 2024-04-16T14:37:05Z

@TedThemistokleous I added a full list of models that no longer break with the unsupported error. They compile and run successfully, but they do have accuracy issues, as stated in the comment migraphx-benchmark#165.
I have also enabled one backend test that passes now.

TedThemistokleous · 2024-04-16T16:58:59Z

> @TedThemistokleous I added a full list of models that don't break due to the unsupported error anymore, they compile and run successfully, but they do have accuracy issues as stated in the comment migraphx-benchmark#165. I have also enabled one backend test that passes now.

The accuracy issues could be related to the uint8-to-int8 conversion, since you need to shift the data and zero points accordingly. See #2903.
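
For readers unfamiliar with the shift mentioned here, a minimal sketch follows. It assumes the usual affine quantization formula `real = scale * (q - zero_point)` and a fixed offset of 128 between the uint8 and int8 ranges. The helper names `shift_to_int8` and `dequantize` are hypothetical and are not MIGraphX functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a MIGraphX function): map a uint8 quantized value
// onto the int8 range by subtracting 128.
inline std::int8_t shift_to_int8(std::uint8_t q)
{
    return static_cast<std::int8_t>(static_cast<int>(q) - 128);
}

// Affine dequantization: real = scale * (q - zero_point).
inline float dequantize(int q, int zero_point, float scale)
{
    return scale * static_cast<float>(q - zero_point);
}
```

If both the data and the zero point are shifted by the same offset, the dequantized value is unchanged; shifting only one of them moves every value by `128 * scale`, which would show up as an accuracy problem.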

umangyadav · 2024-04-17T12:04:32Z

> @umangyadav That is why I left it in draft state to check with you about this. Currently it fixes crashes for a number of models listed in https://github.com/gyulaz-htec/models/blob/migraphx_testing/MIGRAPHX_fp32-int8.md and also one python backend test that I will remove from disabled list. So this causes issues only on the gpu, is there any other test that you would prefer to have for this kind of fix?

Having an ONNX backend test is good. In addition to that, can you make a few of the verify tests run with the uint8 type?
For example, you can templatize the following tests to test them with uint8:
https://github.com/ROCm/AMDMIGraphX/blob/develop/test/verify/test_avg_pooling_3d.cpp
https://github.com/ROCm/AMDMIGraphX/blob/develop/test/verify/test_global_max_pooling.cpp
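
As a rough illustration of what "templatize" means here, the sketch below writes one test body and instantiates it for several element types. It is a standalone analogue only: it does not use MIGraphX's verify-test harness, and `global_max_pool` and `check_global_max_pool` are hypothetical names introduced for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference global max pooling over a flat buffer laid out as
// [channels][spatial]; returns one maximum per channel.
template <class T>
std::vector<T> global_max_pool(const std::vector<T>& input, std::size_t channels)
{
    assert(!input.empty() && channels > 0 && input.size() % channels == 0);
    const std::size_t spatial = input.size() / channels;
    std::vector<T> out;
    out.reserve(channels);
    for(std::size_t c = 0; c < channels; ++c)
    {
        const auto first = input.begin() + static_cast<std::ptrdiff_t>(c * spatial);
        const auto last  = first + static_cast<std::ptrdiff_t>(spatial);
        out.push_back(*std::max_element(first, last));
    }
    return out;
}

// One test body, parameterized on the element type, so the same check
// can run for float and for uint8 without duplicating code.
template <class T>
bool check_global_max_pool()
{
    const std::vector<T> input    = {T(1), T(7), T(3), T(4), T(2), T(9), T(0), T(5)};
    const std::vector<T> expected = {T(7), T(9)};
    return global_max_pool(input, 2) == expected;
}
```

Calling `check_global_max_pool<float>()` and `check_global_max_pool<std::uint8_t>()` exercises the same logic for both types, which is the coverage being asked for in the verify tests.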

nives-vukovic · 2024-04-18T12:58:55Z

@umangyadav I added two verify tests for MaxPool as requested. The uint8 type, like any other integer type, is not supported for AveragePool, as can be seen at https://onnx.ai/onnx/operators/onnx__AveragePool.html.

umangyadav · 2024-04-18T13:17:32Z

> @umangyadav I added two verify tests for MaxPool as requested. Uint8 type or any integer type is not supported for AveragePool, as it can be seen https://onnx.ai/onnx/operators/onnx__AveragePool.html.

ONNX doesn't seem to support that, but inside MIGraphX we should be able to create avgpool with the uint8 dtype.

nives-vukovic · 2024-04-18T15:03:31Z

@umangyadav @TedThemistokleous Currently, as far as I can see, test_avg_pooling_3d_opt passes for uint8_type, while test_avg_pooling_3d fails with:
RMS Error: 0.00927837
ref: 14, 16, 12, 18, 22, 13, 12, 17, 14, 18, 15, 14, 12, 18, 13, 15, 10, 16, 18, 19, 15, 18, 13, 14
target: 14, 16, 12, 18, 22, 13, 12, 17, 14, 17, 15, 14, 12, 18, 13, 15, 10, 16, 18, 19, 15, 18, 13, 14
Max diff: 1
Mismatch at 9: 18 != 17

The failing test uses non-default padding.
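
The thread does not identify the cause of this off-by-one. One generic way an integer average can differ by exactly 1 between two implementations is truncation versus rounding of the quotient. The sketch below only illustrates that effect; it is an assumption, not a diagnosis of the MIGraphX failure, and the function names and the numbers used with them are made up for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Integer division discards the fractional part of the average.
inline std::uint8_t average_truncated(unsigned sum, unsigned count)
{
    return static_cast<std::uint8_t>(sum / count);
}

// Averaging in floating point and rounding to nearest can land one higher.
inline std::uint8_t average_rounded(unsigned sum, unsigned count)
{
    return static_cast<std::uint8_t>(std::lround(static_cast<double>(sum) / count));
}
```

With an illustrative window sum of 140 over 8 elements, the exact average is 17.5, so the truncated result is 17 and the rounded result is 18. Whether padded elements are counted in the divisor would also change the quotient, which may be why only the padded test is affected.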

umangyadav · 2024-04-23T12:21:13Z

> @umangyadav @TedThemistokleous Currently as I see test_avg_pooling_3d_opt passes for uint8_type, while test_avg_pooling_3d fails with RMS Error: 0.00927837 ref:14, 16, 12, 18, 22, 13, 12, 17, 14, 18, 15, 14, 12, 18, 13, 15, 10, 16, 18, 19, 15, 18, 13, 14 target:14, 16, 12, 18, 22, 13, 12, 17, 14, 17, 15, 14, 12, 18, 13, 15, 10, 16, 18, 19, 15, 18, 13, 14 Max diff: 1 Mismatch at 9: !=
>
> The test that fails has padding that is not default.

@nives-vukovic can you open an issue about the avgpool failure with padding? I'll approve this PR for now.

TedThemistokleous · 2024-04-23T16:20:01Z

> > @umangyadav @TedThemistokleous Currently as I see test_avg_pooling_3d_opt passes for uint8_type, while test_avg_pooling_3d fails with RMS Error: 0.00927837 ref:14, 16, 12, 18, 22, 13, 12, 17, 14, 18, 15, 14, 12, 18, 13, 15, 10, 16, 18, 19, 15, 18, 13, 14 target:14, 16, 12, 18, 22, 13, 12, 17, 14, 17, 15, 14, 12, 18, 13, 15, 10, 16, 18, 19, 15, 18, 13, 14 Max diff: 1 Mismatch at 9: !=
> > The test that fails has padding that is not default.
>
> @nives-vukovic can you open an issue about the avgpool failure with padding ? I'll approve this PR for now.

Agree with @umangyadav on this. Let's get this in. Adding support shouldn't be difficult.

causten · 2024-04-25T13:16:19Z

@nives-vukovic there seems to be a failure in the clang_asan run. Can you investigate?

nives-vukovic · 2024-04-25T13:28:50Z

> @nives-vukovic there seems to be failure in the clang_asan run. Can you investigate

Unfortunately, I don't have access to the output of these errors. Can it be provided?

nives-vukovic · 2024-04-29T15:02:24Z

@TedThemistokleous @umangyadav It seems that clang_asan fails for the test test_max_pooling_ceil_3d<migraphx::shape::uint8_type>. I managed to reproduce this issue on our system. As far as I can tell, with this CMake configuration all tests run on the CPU instead of the GPU, and this particular test case fails with the error:
/home/jenkins/workspace/AMDMIGraphX_PR-2973/src/include/migraphx/shape.hpp:338:25: runtime error: -3.40282e+38 is outside the range of representable values of type 'unsigned char'

Did you encounter such issues before? I see that on the CPU all types should be converted to float, so I am not sure why this is reported. I noticed that in previous PRs where you added support for the uint8 type you didn't add verify tests, only different ONNX tests, so is this expected?

TedThemistokleous · 2024-04-30T02:12:53Z

> @TedThemistokleous @umangyadav It seems that clang_asan fails for test test_max_pooling_ceil_3d<migraphx::shape::uint8_type>, I managed to reproduce this issue on our system, as far as I can with this cmake configuration it runs all tests on cpu, instead of gpu and fails for this particular test case with error: /home/jenkins/workspace/AMDMIGraphX_PR-2973/src/include/migraphx/shape.hpp:338:25: runtime error: -3.40282e+38 is outside the range of representable values of type 'unsigned char'

I haven't seen that before. Likely this implies we're missing some sort of conversion? That value looks like what numeric_limits lowest() returns for float. https://en.cppreference.com/w/cpp/types/numeric_limits

> Did you encountered such issues before? I see that on cpu, there should be conversion of all types to float, so I am not sure why this is reported. I noticed that in previous PRs where you added support of uint8 type you didn't add verify tests, but only different onnx tests, so is this expected?

This isn't expected. We shouldn't be hitting this case. I'll have to dig more.

nives-vukovic · 2024-05-08T11:32:47Z

> > @TedThemistokleous @umangyadav It seems that clang_asan fails for test test_max_pooling_ceil_3d<migraphx::shape::uint8_type>, I managed to reproduce this issue on our system, as far as I can with this cmake configuration it runs all tests on cpu, instead of gpu and fails for this particular test case with error: /home/jenkins/workspace/AMDMIGraphX_PR-2973/src/include/migraphx/shape.hpp:338:25: runtime error: -3.40282e+38 is outside the range of representable values of type 'unsigned char'
>
> I haven't seen that before. Likely this implies we're missing some sort of conversion? That looks like we're getting something smaller than numeric limits off lowest() for float. https://en.cppreference.com/w/cpp/types/numeric_limits
>
> > Did you encountered such issues before? I see that on cpu, there should be conversion of all types to float, so I am not sure why this is reported. I noticed that in previous PRs where you added support of uint8 type you didn't add verify tests, but only different onnx tests, so is this expected?
>
> This isn't expected. We shouldn't be hitting this case. I'll have to dig more

Did you happen to look into this issue, or do you have any advice on where to dig for the problem?

umangyadav · 2024-05-08T13:00:55Z

> > > @TedThemistokleous @umangyadav It seems that clang_asan fails for test test_max_pooling_ceil_3d<migraphx::shape::uint8_type>, I managed to reproduce this issue on our system, as far as I can with this cmake configuration it runs all tests on cpu, instead of gpu and fails for this particular test case with error: /home/jenkins/workspace/AMDMIGraphX_PR-2973/src/include/migraphx/shape.hpp:338:25: runtime error: -3.40282e+38 is outside the range of representable values of type 'unsigned char'
> >
> > I haven't seen that before. Likely this implies we're missing some sort of conversion? That looks like we're getting something smaller than numeric limits off lowest() for float. https://en.cppreference.com/w/cpp/types/numeric_limits
> >
> > > Did you encountered such issues before? I see that on cpu, there should be conversion of all types to float, so I am not sure why this is reported. I noticed that in previous PRs where you added support of uint8 type you didn't add verify tests, but only different onnx tests, so is this expected?
> >
> > This isn't expected. We shouldn't be hitting this case. I'll have to dig more
>
> Did you happen to look into this issue or have an advice where to dig for the problem?

Clang ASAN is complaining because the test uses ceil mode. The last maxpool window lies entirely over the padded region, whose elements are initialized with numeric_limits<float>::lowest().
Later, "convert" tries to convert numeric_limits<float>::lowest() to uint8_type, and Clang ASAN complains. I think this error can be ignored or suppressed, but I am not sure how that can be done.
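
The situation described above can be reproduced in miniature. The sketch below is a simplified 1-D pooling loop, not MIGraphX's implementation: `max_pool_1d` and `fits_in_uint8` are hypothetical names, and the output length is passed in directly to force a window past the end of the input rather than being derived from a real ceil-mode formula.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// 1-D max pooling computed in float. Positions outside the input act as
// padding and contribute nothing, so a window that lies entirely in padding
// keeps its initial value, numeric_limits<float>::lowest().
inline std::vector<float> max_pool_1d(const std::vector<float>& in,
                                      std::size_t kernel,
                                      std::size_t stride,
                                      std::size_t out_len)
{
    std::vector<float> out(out_len, std::numeric_limits<float>::lowest());
    for(std::size_t o = 0; o < out_len; ++o)
    {
        const std::size_t start = o * stride;
        for(std::size_t k = 0; k < kernel; ++k)
        {
            const std::size_t i = start + k;
            if(i < in.size())
                out[o] = std::max(out[o], in[i]);
        }
    }
    return out;
}

// True when casting x to unsigned char is well defined: the value truncated
// toward zero must be representable, so x must lie strictly in (-1, 256).
inline bool fits_in_uint8(float x)
{
    return x > -1.0f && x < 256.0f;
}
```

For an input of four elements with kernel 2, stride 2, and three requested outputs, the third window starts past the end of the data, so its result stays at `lowest()`. Casting that value directly to `unsigned char` is undefined behaviour in C++, which is what the sanitizer message in this thread reports.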

umangyadav · 2024-05-08T13:10:53Z

Discussed potential solutions:

1. Check whether MIGraphX is running in sanitizer mode by checking some macro. If it is, disable the sanitizer on the "convert" operation.
2. Do a sanitizer-safe convert operation for downcasts from float to integers when running in sanitizer mode.
3. Disable the specific sanitizer check on the convert operation. I am not sure if this is possible.
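
A sanitizer-safe conversion of the kind listed among these options could look roughly like the sketch below, which clamps the value into the target range before casting. This is an assumption about the general technique, not the code from the follow-up PR referenced later in this thread (#3071); `saturating_convert` is a hypothetical name, and mapping NaN to 0 is a choice made for this example only.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical saturating float-to-integer conversion. Out-of-range inputs
// clamp to the nearest representable value instead of triggering undefined
// behaviour in the cast.
template <class To>
To saturating_convert(float x)
{
    static_assert(std::numeric_limits<To>::is_integer, "To must be an integer type");
    if(std::isnan(x))
        return To{0}; // assumption: NaN maps to zero in this sketch
    const double v  = static_cast<double>(x);
    const double lo = static_cast<double>(std::numeric_limits<To>::lowest());
    const double hi = static_cast<double>(std::numeric_limits<To>::max());
    if(v <= lo)
        return std::numeric_limits<To>::lowest();
    if(v >= hi)
        return std::numeric_limits<To>::max();
    return static_cast<To>(v); // in range, so truncation is well defined
}
```

With this helper, `numeric_limits<float>::lowest()` converts to 0 for `uint8_t` rather than producing a sanitizer report. The trade-off is an extra pair of comparisons per element, which is why the list above also considers restricting the safe path to sanitizer builds.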

Add pooling to list of operations that do not support unit8 type

2c7a103

nives-vukovic requested review from TedThemistokleous and umangyadav April 15, 2024 14:28

umangyadav reviewed Apr 15, 2024

TedThemistokleous added bugfix Fixes a bug found in the code. Onnx Operators Adding or modifying an Onnx Operator in the MIGraphX codebase labels Apr 15, 2024

Enable maxpool uint8 backend test

e57136b

nives-vukovic and others added 2 commits April 18, 2024 12:56

Add uint8 type maxpool verify tests

073a39b

Merge branch 'develop' into pooling_uint8_error

7128823

nives-vukovic marked this pull request as ready for review April 18, 2024 12:59

nives-vukovic requested a review from causten as a code owner April 18, 2024 12:59

Verify test licencing fix

2a67e34

nives-vukovic requested a review from umangyadav April 22, 2024 15:15

umangyadav approved these changes Apr 23, 2024

nives-vukovic changed the title ~~Add pooling to list of operations that do not support unit8 type~~ Add support for MaxPool unit8 type Apr 23, 2024

nives-vukovic mentioned this pull request Apr 23, 2024

Issue with int8/uint8 type for AveragePool and int8 for MaxPool #2991

Open

TedThemistokleous approved these changes Apr 23, 2024

nives-vukovic and others added 3 commits April 24, 2024 13:25

Merge branch 'develop' into pooling_uint8_error

244428e

Merge branch 'develop' into pooling_uint8_error

a1bf49b

Merge branch 'develop' into pooling_uint8_error

9f5d573

Merge branch 'develop' into pooling_uint8_error

2caa6aa

causten merged commit c7d5f09 into develop May 8, 2024
38 of 42 checks passed

causten deleted the pooling_uint8_error branch May 8, 2024 19:00

causten added a commit that referenced this pull request May 8, 2024

Revert "Add support for MaxPool unit8 type (#2973)"

7b4d9fa

This reverts commit c7d5f09.

causten mentioned this pull request May 8, 2024

Revert "Add support for MaxPool unit8 type" #3057

Merged

nives-vukovic mentioned this pull request May 16, 2024

Fix Clang ASAN issue by handling float to integer overflow in convert operator #3071

Merged
