Fix Clang ASAN issue by handling float to integer overflow in convert operator #3071

Merged: 4 commits, Jun 10, 2024

Conversation

nives-vukovic
Collaborator

No description provided.

@nives-vukovic
Collaborator Author

Explanation can be found in the link: https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
-fsanitize=float-cast-overflow: Conversion to, from, or between floating-point types which would overflow the destination. Because the range of representable values for all floating-point types supported by Clang is [-inf, +inf], the only cases detected are conversions from floating point to integer types.
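
To make the quoted check concrete, here is a minimal sketch of the condition it guards against: a floating-point value whose truncated result does not fit in the destination integer type. The helper name `fits_after_truncation` is hypothetical and not part of MIGraphX, and the sketch is limited to integer types of at most 32 bits so that their limits are exactly representable as `double`.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not from MIGraphX): returns true when converting `x`
// to the integer type `To` is well defined, i.e. the truncated value fits.
template <class To>
bool fits_after_truncation(double x)
{
    static_assert(std::numeric_limits<To>::is_integer, "To must be an integer type");
    // Assumption of this sketch: min()/max() of To convert to double exactly,
    // which holds for integer types of at most 32 bits.
    static_assert(sizeof(To) <= 4, "limits must be exactly representable as double");
    if(std::isnan(x))
        return false; // NaN has no integer equivalent
    const double t = std::trunc(x); // the conversion discards the fractional part
    return t >= static_cast<double>(std::numeric_limits<To>::min()) &&
           t <= static_cast<double>(std::numeric_limits<To>::max());
}
```

With this helper, `fits_after_truncation<std::uint8_t>(255.9)` is true because the value truncates to 255, while `256.0` and `-1.0` are rejected; those are the kinds of inputs the sanitizer would flag when cast directly.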


codecov bot commented May 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.92%. Comparing base (30cab64) to head (4572474).
Report is 158 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #3071   +/-   ##
========================================
  Coverage    91.92%   91.92%           
========================================
  Files          489      489           
  Lines        19275    19278    +3     
========================================
+ Hits         17719    17722    +3     
  Misses        1556     1556           


@migraphx-bot
Collaborator

migraphx-bot commented May 10, 2024

| Test | Batch | Rate new (457247) | Rate old (30cab6) | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 1,713.75 | 1,713.85 | -0.01% | |
| torchvision-resnet50_fp16 | 64 | 3,809.82 | 3,810.00 | -0.00% | |
| torchvision-densenet121 | 32 | 1,452.84 | 1,453.61 | -0.05% | |
| torchvision-densenet121_fp16 | 32 | 2,436.81 | 2,437.45 | -0.03% | |
| torchvision-inceptionv3 | 32 | 883.48 | 883.14 | 0.04% | |
| torchvision-inceptionv3_fp16 | 32 | 1,412.60 | 1,415.16 | -0.18% | |
| cadene-inceptionv4 | 16 | 407.14 | 407.50 | -0.09% | |
| cadene-resnext64x4 | 16 | 413.67 | 413.59 | 0.02% | |
| slim-mobilenet | 64 | 3,823.06 | 3,820.83 | 0.06% | |
| slim-nasnetalarge | 64 | 97.00 | 97.03 | -0.04% | |
| slim-resnet50v2 | 64 | 1,651.45 | 1,651.24 | 0.01% | |
| bert-mrpc-onnx | 8 | 591.46 | 590.73 | 0.12% | |
| bert-mrpc-tf | 1 | 288.99 | 289.90 | -0.31% | |
| pytorch-examples-wlang-gru | 1 | 333.00 | 351.58 | -5.28% | 🔴 |
| pytorch-examples-wlang-lstm | 1 | 295.34 | 299.62 | -1.43% | |
| torchvision-resnet50_1 | 1 | 451.64 | 455.92 | -0.94% | |
| cadene-dpn92_1 | 1 | 244.62 | 244.63 | -0.00% | |
| cadene-resnext101_1 | 1 | 189.05 | 187.95 | 0.59% | |
| onnx-taau-downsample | 1 | 204.07 | 204.11 | -0.02% | |
| dlrm-criteoterabyte | 1 | 22.28 | 22.30 | -0.07% | |
| dlrm-criteoterabyte_fp16 | 1 | 41.63 | 41.61 | 0.05% | |
| agentmodel | 1 | 6,119.61 | 6,092.92 | 0.44% | |
| unet_fp16 | 2 | 33.73 | 33.74 | -0.05% | |
| resnet50v1_fp16 | 1 | 560.36 | 570.97 | -1.86% | |
| resnet50v1_int8 | 1 | 463.11 | 463.87 | -0.16% | |
| bert_base_cased_fp16 | 64 | 620.83 | 620.81 | 0.00% | |
| bert_large_uncased_fp16 | 32 | 193.79 | 193.83 | -0.02% | |
| bert_large_fp16 | 1 | 103.89 | 103.97 | -0.08% | |
| distilgpt2_fp16 | 16 | 1,186.27 | 1,189.15 | -0.24% | |
| yolov5s | 1 | 298.12 | 298.14 | -0.01% | |
| tinyllama | 1 | 23.33 | 23.33 | 0.01% | |
| vicuna-fastchat | 1 | 133.82 | 133.18 | 0.49% | |
| whisper-tiny-encoder | 1 | 241.43 | 240.81 | 0.26% | |
| whisper-tiny-decoder | 1 | 245.67 | 245.89 | -0.09% | |

This build is not recommended to merge 🔴

@migraphx-bot
Collaborator

migraphx-bot commented May 10, 2024


     ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

     ✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

     ✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

     ✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

     ✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

     ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

     ✅ agentmodel: PASSED: MIGraphX meets tolerance

     ✅ unet: PASSED: MIGraphX meets tolerance

     ✅ resnet50v1: PASSED: MIGraphX meets tolerance

     ✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ bert_large: PASSED: MIGraphX meets tolerance

     ✅ yolov5s: PASSED: MIGraphX meets tolerance

     ✅ tinyllama: PASSED: MIGraphX meets tolerance

     ✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

     ✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

@nives-vukovic nives-vukovic requested a review from umangyadav May 13, 2024 12:57
@nives-vukovic nives-vukovic marked this pull request as ready for review May 13, 2024 15:44
@nives-vukovic nives-vukovic requested a review from causten as a code owner May 13, 2024 15:44
@causten causten requested a review from pfultz2 May 14, 2024 13:36
Jenkinsfile Outdated
@@ -165,7 +165,8 @@ rocmtest clang_debug: rocmnode('mi100+') { cmake_build ->
}, clang_asan: rocmnode('nogpu') { cmake_build ->
stage('Clang ASAN') {
def sanitizers = "undefined,address"
def debug_flags = "-g -O2 -fno-omit-frame-pointer -fsanitize=${sanitizers} -fno-sanitize-recover=${sanitizers}"
def sanitizers_disabled = "float-cast-overflow"
Collaborator

Doing it this way means we lose asan coverage of float-cast-overflow on the entire code base. Is there a way to target just the function/file?

Member

Agree with Chris.

@nives-vukovic can you try disabling the sanitizer just on the convert op's compute function and see if that works?
https://clang.llvm.org/docs/AddressSanitizer.html#disabling-instrumentation-with-attribute-no-sanitize-address

Collaborator Author

@nives-vukovic nives-vukovic May 14, 2024

The issue is reported in AMDMIGraphX/src/include/migraphx/shape.hpp. When I add __attribute__((no_sanitize("float-cast-overflow"))) above:

    type operator()(U u) const
    {
        return type(u);
    }

in the 'as' struct, the issue is no longer reported on our system. However, this is a common function used in many places, so I don't know whether this would be a satisfactory solution for you.
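
For readers unfamiliar with the attribute, the following is a minimal sketch of what that experiment could look like. `as_example` is a hypothetical stand-in for the `as` struct, not the MIGraphX code, and the macro name is made up for this example. The attribute is applied only under Clang, since that is the compiler discussed here.

```cpp
#include <cstdint>

// Clang-specific attribute; the macro (a hypothetical name) expands to nothing
// on other compilers so the sketch still builds there.
#if defined(__clang__)
#define EXAMPLE_NO_FLOAT_CAST_CHECK __attribute__((no_sanitize("float-cast-overflow")))
#else
#define EXAMPLE_NO_FLOAT_CAST_CHECK
#endif

// Hypothetical stand-in for the `as` struct's call operator.
template <class T>
struct as_example
{
    using type = T;

    // The sanitizer no longer instruments the cast inside this function,
    // so every caller loses the check, not only the convert operator.
    template <class U>
    EXAMPLE_NO_FLOAT_CAST_CHECK type operator()(U u) const
    {
        return type(u);
    }
};
```

Note that the attribute only silences the report. An out-of-range conversion inside the function is still undefined behaviour, which is one reason a targeted fix at the call site may be preferable.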

Member

I do not think it is a good idea to disable the sanitizer on type operator()(U u) either.

We can conditionally handle floating-point to integer conversion inside the convert operator itself.

    shape::visit(type, [&](auto as) {
        // clamping value between target_type's max and min doesn't work for NaNs,
        if(std::isnan(static_cast<double>(x)))
        {
            y = as.nan();
        }
        // ---> Here: if "type" is integer and "x" is floating point, first do the
        //      clamping and then do the conversion.
        else
        {
            // clamp overflowing/underflowing values to min()/max() instead of +/-infinity
            // during downcasting
            y = std::min(std::max(as(x), as.min()), as.max());
        }
    });

Member

Doing this inside convert.hpp's apply() function works:

    else if(shape::is_integral(type) and std::is_floating_point_v<decltype(x)>)
    {
        // for the floating point to integer conversion, clamp first and then convert to
        // avoid undefined behaviour
        y = as(std::min(std::max(static_cast<double>(x), static_cast<double>(as.min())),
                        static_cast<double>(as.max())));
    }
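
The clamp-then-convert idea can be tried in isolation with a small standalone sketch. `clamp_then_convert` is a hypothetical free function, not the MIGraphX implementation, and returning 0 for NaN is an assumption made only for this example. The sketch is restricted to integer types of at most 32 bits: for 64-bit types, `static_cast<double>` of the maximum value rounds up to a value outside the integer range, so the same pattern would need extra care there.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical free function illustrating "clamp in floating point, then cast".
template <class To>
To clamp_then_convert(double x)
{
    static_assert(std::numeric_limits<To>::is_integer, "To must be an integer type");
    // Assumption of this sketch: min()/max() of To are exactly representable
    // as double, which holds for integer types of at most 32 bits.
    static_assert(sizeof(To) <= 4, "limits must be exactly representable as double");
    if(std::isnan(x))
        return To{0}; // assumption: map NaN to 0 for this example only
    const double lo = static_cast<double>(std::numeric_limits<To>::min());
    const double hi = static_cast<double>(std::numeric_limits<To>::max());
    // After clamping, the value lies in [lo, hi], so the cast is well defined.
    return static_cast<To>(std::min(std::max(x, lo), hi));
}
```

For instance, `clamp_then_convert<std::uint8_t>(300.0)` saturates to 255 and `clamp_then_convert<std::uint8_t>(-5.0)` to 0, while an in-range input such as 12.7 is truncated to 12 as usual.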

Contributor

@lakhinderwalia lakhinderwalia May 21, 2024

It was compiled as : g++ -fsanitize=address

Member

> It was compiled as : g++ -fsanitize=address

Try adding -fsanitize=undefined as well

Contributor

Member

(Yes that option does highlight the issue. Thanks.)

One answer is: https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#disabling-instrumentation-with-attribute-no-sanitize-undefined

That has been tried, but it would disable the sanitizer on the entire type-cast function in MIGraphX, which is not desired. Therefore we need to handle it inside convert.hpp itself.

Contributor

@lakhinderwalia lakhinderwalia May 21, 2024

Okay. Thanks.
Maybe try, on a per-test-case basis, passing an environment flag/variable:
https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#runtime-suppressions

@nives-vukovic nives-vukovic changed the title Fix Clang ASAN issue by disabling float-cast-overflow check in sanitizer Fix Clang ASAN issue by handling float to integer overflow in covert operator May 15, 2024
@umangyadav umangyadav self-requested a review May 16, 2024 14:27
@lakhinderwalia lakhinderwalia changed the title Fix Clang ASAN issue by handling float to integer overflow in covert operator Fix Clang ASAN issue by handling float to integer overflow in convert operator May 21, 2024
@causten causten merged commit 5fcf86e into develop Jun 10, 2024
46 of 47 checks passed
@causten causten deleted the clang_asan_uint8_fix branch June 10, 2024 13:35
lajagapp pushed a commit to lajagapp/AMDMIGraphX that referenced this pull request Jul 8, 2024
6 participants