Add support for MaxPool unit8 type #2973

Merged
merged 9 commits into from
May 8, 2024

Conversation

nives-vukovic
Collaborator

Member

@umangyadav umangyadav left a comment

Is there a test that you want to enable or add?

@nives-vukovic
Collaborator Author

@umangyadav That is why I left it in draft state, to check with you about this. Currently it fixes crashes for a number of models listed in https://github.com/gyulaz-htec/models/blob/migraphx_testing/MIGRAPHX_fp32-int8.md and also one Python backend test, which I will remove from the disabled list. Since this causes issues only on the GPU, is there any other test that you would prefer to have for this kind of fix?

@migraphx-bot
Collaborator

migraphx-bot commented Apr 15, 2024

| Test | Batch | Rate new (2caa6a) | Rate old (a19fdf) | Diff |
|---|---|---|---|---|
| torchvision-resnet50 | 64 | 2,814.88 | 2,820.45 | -0.20% |
| torchvision-resnet50_fp16 | 64 | 6,407.33 | 6,408.42 | -0.02% |
| torchvision-densenet121 | 32 | 2,089.45 | 2,088.75 | 0.03% |
| torchvision-densenet121_fp16 | 32 | 3,693.47 | 3,698.96 | -0.15% |
| torchvision-inceptionv3 | 32 | 1,605.72 | 1,601.98 | 0.23% |
| torchvision-inceptionv3_fp16 | 32 | 2,551.33 | 2,553.71 | -0.09% |
| cadene-inceptionv4 | 16 | 718.59 | 718.66 | -0.01% |
| cadene-resnext64x4 | 16 | 681.11 | 680.56 | 0.08% |
| slim-mobilenet | 64 | 5,946.46 | 5,944.64 | 0.03% |
| slim-nasnetalarge | 64 | 154.07 | 154.05 | 0.01% |
| slim-resnet50v2 | 64 | 2,588.13 | 2,591.45 | -0.13% |
| bert-mrpc-onnx | 8 | 921.14 | 921.31 | -0.02% |
| bert-mrpc-tf | 1 | 396.43 | 395.25 | 0.30% |
| pytorch-examples-wlang-gru | 1 | 395.64 | 392.84 | 0.71% |
| pytorch-examples-wlang-lstm | 1 | 377.79 | 373.94 | 1.03% |
| torchvision-resnet50_1 | 1 | 605.24 | 608.05 | -0.46% |
| cadene-dpn92_1 | 1 | 392.10 | 389.03 | 0.79% |
| cadene-resnext101_1 | 1 | 331.79 | 331.79 | -0.00% |
| onnx-taau-downsample | 1 | 307.67 | 306.95 | 0.23% |
| dlrm-criteoterabyte | 1 | 28.85 | 28.88 | -0.10% |
| dlrm-criteoterabyte_fp16 | 1 | 48.27 | 48.21 | 0.13% |
| agentmodel | 1 | 7,365.72 | 7,240.40 | 1.73% |
| unet_fp16 | 2 | 57.16 | 57.50 | -0.59% |
| resnet50v1_fp16 | 1 | 896.39 | 913.30 | -1.85% |
| resnet50v1_int8 | 1 | 792.46 | 810.79 | -2.26% |
| bert_base_cased_fp16 | 64 | 1,033.96 | 1,033.73 | 0.02% |
| bert_large_uncased_fp16 | 32 | 300.55 | 300.40 | 0.05% |
| bert_large_fp16 | 1 | 159.90 | 160.21 | -0.19% |
| distilgpt2_fp16 | 16 | 1,854.88 | 1,854.48 | 0.02% |
| yolov5s | 1 | 470.10 | 475.20 | -1.07% |
| tinyllama | 1 | 32.99 | 32.99 | -0.00% |
| vicuna-fastchat | 1 | 158.21 | 157.55 | 0.42% |
| whisper-tiny-encoder | 1 | 347.86 | 347.38 | 0.14% |
| whisper-tiny-decoder | 1 | 396.11 | 397.49 | -0.35% |

This build is OK for merge ✅

@migraphx-bot
Collaborator

migraphx-bot commented Apr 15, 2024


✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance
✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance
✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
✅ agentmodel: PASSED: MIGraphX meets tolerance
✅ unet: PASSED: MIGraphX meets tolerance
✅ resnet50v1: PASSED: MIGraphX meets tolerance
✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance
🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
✅ bert_large: PASSED: MIGraphX meets tolerance
✅ yolov5s: PASSED: MIGraphX meets tolerance
✅ tinyllama: PASSED: MIGraphX meets tolerance
✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance
✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance
✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance
✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

@TedThemistokleous
Collaborator

Well, if there are any ONNX backend tests that this breaks or allows us to run, we should turn those on for coverage and run them in our CI if this fixes them.

Secondly, which models from that list does this fix? Do you have a comprehensive list? Add it to this ticket and we can determine whether we need to add coverage for them here or make them part of another set of runs.

@TedThemistokleous TedThemistokleous added bugfix Fixes a bug found in the code. Onnx Operators Adding or modifying an Onnx Operator in the MIGraphX codebase labels Apr 15, 2024

codecov bot commented Apr 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.77%. Comparing base (e4013bb) to head (a1bf49b).
Report is 7 commits behind head on develop.

❗ Current head a1bf49b differs from pull request most recent head 9f5d573. Consider uploading reports for the commit 9f5d573 to get more accurate results

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #2973   +/-   ##
========================================
  Coverage    91.77%   91.77%           
========================================
  Files          484      484           
  Lines        18711    18715    +4     
========================================
+ Hits         17172    17176    +4     
  Misses        1539     1539           


@nives-vukovic
Collaborator Author

@TedThemistokleous I added a full list of models that no longer break due to the unsupported error. They compile and run successfully, but they do have accuracy issues, as stated in the comment migraphx-benchmark#165.
I have also enabled one backend test that passes now.

@TedThemistokleous
Collaborator

The accuracy issues could be related to the uint8-to-int8 conversion, since you need to shift the data and zero points accordingly. See #2903.
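
For readers following along, here is a minimal standalone sketch of the shift being referred to. It is not MIGraphX code, and the helper names are hypothetical. It assumes the usual affine quantization formula, real = scale * (q - zero_point), and shows that subtracting 128 from both the data and the zero point moves uint8 values into the int8 range without changing the dequantized result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of MIGraphX: move a uint8 quantized value
// (or zero point) into the int8 range by subtracting 128.
inline std::int8_t shift_to_int8(std::uint8_t v)
{
    return static_cast<std::int8_t>(static_cast<int>(v) - 128);
}

// Affine dequantization: real = scale * (q - zero_point).
inline float dequantize(int q, int zero_point, float scale)
{
    assert(scale > 0.0f); // quantization scales are assumed positive here
    return scale * static_cast<float>(q - zero_point);
}

// True when shifting BOTH the data and the zero point leaves the
// dequantized value unchanged. Shifting only one of them would not.
inline bool shift_preserves_value(std::uint8_t q, std::uint8_t zero_point, float scale)
{
    const float before = dequantize(q, zero_point, scale);
    const float after  = dequantize(shift_to_int8(q), shift_to_int8(zero_point), scale);
    return before == after;
}
```

If only the data were shifted and the zero point left alone, every dequantized value would be off by 128 * scale, which is one plausible source of accuracy problems of this kind.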

@umangyadav
Member

Having an ONNX backend test is good. But in addition to that, can you make a few of the verify tests with uint8 type?
For example, you can templatize the following tests to test them with uint8:
https://github.com/ROCm/AMDMIGraphX/blob/develop/test/verify/test_avg_pooling_3d.cpp
https://github.com/ROCm/AMDMIGraphX/blob/develop/test/verify/test_global_max_pooling.cpp
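
As an illustration of the templatizing pattern only, here is a standalone sketch that does not use the MIGraphX test harness. The struct name and its `run` function are hypothetical stand-ins; the real verify tests build a `migraphx::program` instead. The idea is one test body parameterized on the element type, followed by one explicit instantiation per type to be covered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a verify test, parameterized on element type.
template <class T>
struct global_max_pooling_case
{
    // Reference result for global max pooling: the maximum of the input.
    static T run(const std::vector<T>& input)
    {
        assert(!input.empty());
        return *std::max_element(input.begin(), input.end());
    }
};

// One test body, several element types: each explicit instantiation
// corresponds to one generated test case.
template struct global_max_pooling_case<float>;
template struct global_max_pooling_case<std::uint8_t>;
```

Adding another data type then only needs one more instantiation line rather than a copy of the test.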

@nives-vukovic
Collaborator Author

@umangyadav I added two verify tests for MaxPool as requested. Uint8 type, or any integer type, is not supported for AveragePool, as can be seen at https://onnx.ai/onnx/operators/onnx__AveragePool.html.

@nives-vukovic nives-vukovic marked this pull request as ready for review April 18, 2024 12:59
@nives-vukovic nives-vukovic requested a review from causten as a code owner April 18, 2024 12:59
@umangyadav
Member

ONNX doesn't seem to support that but inside MIGraphX we should be able to create avgpool with uint8 dtype.

@nives-vukovic
Collaborator Author

nives-vukovic commented Apr 18, 2024

@umangyadav @TedThemistokleous As far as I can see, test_avg_pooling_3d_opt currently passes for uint8_type, while test_avg_pooling_3d fails with:
RMS Error: 0.00927837
ref:14, 16, 12, 18, 22, 13, 12, 17, 14, 18, 15, 14, 12, 18, 13, 15, 10, 16, 18, 19, 15, 18, 13, 14
target:14, 16, 12, 18, 22, 13, 12, 17, 14, 17, 15, 14, 12, 18, 13, 15, 10, 16, 18, 19, 15, 18, 13, 14
Max diff: 1
Mismatch at 9: !=

The test that fails has padding that is not default.

@umangyadav
Member

@nives-vukovic can you open an issue about the avgpool failure with padding? I'll approve this PR for now.

@nives-vukovic nives-vukovic changed the title Add pooling to list of operations that do not support unit8 type Add support for MaxPool unit8 type Apr 23, 2024
@TedThemistokleous
Collaborator

Agree with @umangyadav on this. Let's get this in. Adding support shouldn't be difficult.

@causten
Collaborator

causten commented Apr 25, 2024

@nives-vukovic there seems to be a failure in the clang_asan run. Can you investigate?

@nives-vukovic
Collaborator Author

Unfortunately, I don't have access to the outputs of these errors. Can those be provided?

@nives-vukovic
Collaborator Author

nives-vukovic commented Apr 29, 2024

@TedThemistokleous @umangyadav It seems that clang_asan fails for the test test_max_pooling_ceil_3d<migraphx::shape::uint8_type>. I managed to reproduce this issue on our system. As far as I can tell, with this cmake configuration it runs all tests on the CPU instead of the GPU, and fails for this particular test case with the error:
/home/jenkins/workspace/AMDMIGraphX_PR-2973/src/include/migraphx/shape.hpp:338:25: runtime error: -3.40282e+38 is outside the range of representable values of type 'unsigned char'

Did you encounter such issues before? I see that on the CPU all types should be converted to float, so I am not sure why this is reported. I noticed that in previous PRs where you added support for the uint8 type you didn't add verify tests, only different ONNX tests, so is this expected?

@TedThemistokleous
Collaborator

@TedThemistokleous @umangyadav It seems that clang_asan fails for the test test_max_pooling_ceil_3d<migraphx::shape::uint8_type>. I managed to reproduce this issue on our system. As far as I can tell, with this cmake configuration it runs all tests on the CPU instead of the GPU, and fails for this particular test case with the error: /home/jenkins/workspace/AMDMIGraphX_PR-2973/src/include/migraphx/shape.hpp:338:25: runtime error: -3.40282e+38 is outside the range of representable values of type 'unsigned char'

I haven't seen that before. Likely this implies we're missing some sort of conversion? That looks like we're getting something smaller than numeric limits off lowest() for float. https://en.cppreference.com/w/cpp/types/numeric_limits

Did you encounter such issues before? I see that on the CPU all types should be converted to float, so I am not sure why this is reported. I noticed that in previous PRs where you added support for the uint8 type you didn't add verify tests, only different ONNX tests, so is this expected?

This isn't expected. We shouldn't be hitting this case. I'll have to dig more

@nives-vukovic
Collaborator Author

Did you happen to look into this issue, or do you have any advice on where to dig for the problem?

@umangyadav
Member

Clang ASAN is complaining because the test uses ceil mode. The last maxpool window operates entirely on the padded region, which is initialized with numeric_limits<float>::lowest().
Later, "convert" tries to convert numeric_limits<float>::lowest() to uint8_type, and Clang ASAN complains. I think this error can be ignored or suppressed, but I am not sure how that can be done.
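
To make the failure mode concrete, here is a minimal 1-D sketch of a float max pool with ceil mode. It is a standalone illustration, not the MIGraphX implementation. The function names and the sizes in the usage note (input 5, kernel 3, stride 3, padding 1) are assumptions chosen to reproduce the effect, not values taken from the test.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Number of output positions along one axis. With ceil_mode the division
// is rounded up, which can add a trailing window that starts past the input.
inline int pooled_length(int in, int kernel, int stride, int pad, bool ceil_mode)
{
    assert(stride > 0 && in + 2 * pad >= kernel);
    const int span  = in + 2 * pad - kernel;
    const int steps = ceil_mode ? (span + stride - 1) / stride : span / stride;
    return steps + 1;
}

// 1-D max pool computed in float. Positions outside the input are padding
// and are skipped, so a window lying fully in padding keeps its initial
// value, numeric_limits<float>::lowest().
inline std::vector<float>
max_pool_1d(const std::vector<float>& x, int kernel, int stride, int pad, bool ceil_mode)
{
    const int in      = static_cast<int>(x.size());
    const int out_len = pooled_length(in, kernel, stride, pad, ceil_mode);
    std::vector<float> out(out_len, std::numeric_limits<float>::lowest());
    for(int o = 0; o < out_len; ++o)
    {
        const int start = o * stride - pad;
        for(int i = std::max(start, 0); i < std::min(start + kernel, in); ++i)
            out[o] = std::max(out[o], x[i]);
    }
    return out;
}
```

With those assumed sizes, ceil mode yields three windows starting at -1, 2 and 5. The third covers positions 5 to 7, all beyond the five input elements, so its result stays at lowest(), which is about -3.40282e+38 and cannot be represented as an unsigned 8-bit value.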

@umangyadav
Member

Discussed potential solutions:

  • Check whether MIGraphX is running in sanitizer mode by checking some macro. If it is, disable the sanitizer on the "convert" operation.
  • Do a sanitizer-safe convert operation for downcasts from float to integers when running in sanitizer mode.
  • Disable the specific sanitizer check on the convert operation. I am not sure if this is possible.
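
The second option could look roughly like the sketch below. It assumes a free-standing helper named `saturating_convert`, which is hypothetical and not an existing MIGraphX function, and it clamps the float to the target range before casting so that the out-of-range cast never happens. Mapping NaN to zero is an arbitrary choice made for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: convert a float to integer type T without performing
// an out-of-range float-to-integer cast, which is undefined behaviour in C++.
template <class T>
T saturating_convert(float v)
{
    static_assert(std::numeric_limits<T>::is_integer, "T must be an integer type");
    if(std::isnan(v))
        return T{0}; // arbitrary policy for this sketch
    const float lo = static_cast<float>(std::numeric_limits<T>::min());
    const float hi = static_cast<float>(std::numeric_limits<T>::max());
    if(v <= lo)
        return std::numeric_limits<T>::min();
    if(v >= hi)
        return std::numeric_limits<T>::max();
    return static_cast<T>(v); // in range: truncates toward zero
}

// Usage examples, including the value reported in the sanitizer error.
inline void saturating_convert_examples()
{
    assert(saturating_convert<std::uint8_t>(std::numeric_limits<float>::lowest()) == 0);
    assert(saturating_convert<std::uint8_t>(300.0f) == 255);
    assert(saturating_convert<std::uint8_t>(17.9f) == 17);
}
```

A design note: clamping changes the result for out-of-range inputs from unspecified to a defined saturated value, so if it were applied only in sanitizer builds the two build modes could produce different outputs for such inputs.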

@causten causten merged commit c7d5f09 into develop May 8, 2024
38 of 42 checks passed
@causten causten deleted the pooling_uint8_error branch May 8, 2024 19:00
causten added a commit that referenced this pull request May 8, 2024
Labels
bugfix Fixes a bug found in the code. Onnx Operators Adding or modifying an Onnx Operator in the MIGraphX codebase
Development

Successfully merging this pull request may close these issues.

Unsupported uint8_type error with int8 models
5 participants