Fix mod op compile issue on GPU #3086

Merged: 3 commits merged into develop from mod_op_fix on May 27, 2024

Conversation

gyulaz-htec (Collaborator)

Lifting the failing int8, uint8 and int32 data types to float to work around the problem described in #2268 (comment).
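
As a rough sketch of what that lifting amounts to (a hypothetical helper with illustrative names, not MIGraphX's actual lowering code): promote the integer operands to float, compute a mod with divisor-sign semantics, and convert the result back.

```cpp
// Hypothetical sketch of the float-lifting workaround; names are illustrative only.
#include <cmath>
#include <cstdint>

int32_t mod_via_float(int32_t a, int32_t b)
{
    const float fa = static_cast<float>(a);
    const float fb = static_cast<float>(b);
    // ONNX Mod (fmod=0) semantics: the result takes the sign of the divisor.
    const float r = std::fmod(std::fmod(fa, fb) + fb, fb);
    return static_cast<int32_t>(r);
}
```

A float round trip is exact for int8/uint8 but can lose precision for int32 values beyond float's 24-bit significand, which aligns with the suggestion later in this thread to compute the mod natively on the GPU instead.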

@gyulaz-htec gyulaz-htec requested a review from causten as a code owner May 15, 2024 12:21
@gyulaz-htec gyulaz-htec changed the title from "Fix mod compile issue on GPU (#2268)" to "Fix mod op compile issue on GPU (#2268)" May 15, 2024
@gyulaz-htec gyulaz-htec changed the title from "Fix mod op compile issue on GPU (#2268)" to "Fix mod op compile issue on GPU" May 15, 2024
migraphx-bot (Collaborator) commented May 15, 2024

| Test | Batch | Rate new (a27482) | Rate old (c2167f) | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 1,714.12 | 1,714.50 | -0.02% | |
| torchvision-resnet50_fp16 | 64 | 3,810.36 | 3,812.07 | -0.04% | |
| torchvision-densenet121 | 32 | 1,456.56 | 1,453.98 | 0.18% | |
| torchvision-densenet121_fp16 | 32 | 2,430.13 | 2,436.13 | -0.25% | |
| torchvision-inceptionv3 | 32 | 883.76 | 883.63 | 0.02% | |
| torchvision-inceptionv3_fp16 | 32 | 1,382.20 | 1,417.34 | -2.48% | |
| cadene-inceptionv4 | 16 | 408.08 | 407.89 | 0.05% | |
| cadene-resnext64x4 | 16 | 413.72 | 414.01 | -0.07% | |
| slim-mobilenet | 64 | 3,822.90 | 3,825.99 | -0.08% | |
| slim-nasnetalarge | 64 | 97.05 | 97.09 | -0.03% | |
| slim-resnet50v2 | 64 | 1,651.52 | 1,651.41 | 0.01% | |
| bert-mrpc-onnx | 8 | 589.51 | 590.42 | -0.15% | |
| bert-mrpc-tf | 1 | 288.82 | 280.89 | 2.82% | |
| pytorch-examples-wlang-gru | 1 | 330.05 | 333.91 | -1.16% | |
| pytorch-examples-wlang-lstm | 1 | 264.58 | 305.41 | -13.37% | 🔴 |
| torchvision-resnet50_1 | 1 | 434.55 | 452.26 | -3.91% | 🔴 |
| cadene-dpn92_1 | 1 | 228.70 | 244.50 | -6.46% | 🔴 |
| cadene-resnext101_1 | 1 | 189.05 | 188.04 | 0.54% | |
| onnx-taau-downsample | 1 | 204.11 | 204.42 | -0.15% | |
| dlrm-criteoterabyte | 1 | 21.85 | 22.30 | -2.01% | |
| dlrm-criteoterabyte_fp16 | 1 | 40.67 | 41.62 | -2.29% | |
| agentmodel | 1 | 7,250.47 | 5,802.65 | 24.95% | 🔆 |
| unet_fp16 | 2 | 33.74 | 34.27 | -1.54% | |
| resnet50v1_fp16 | 1 | 569.50 | 553.42 | 2.91% | |
| resnet50v1_int8 | 1 | 464.01 | 463.00 | 0.22% | |
| bert_base_cased_fp16 | 64 | 620.80 | 623.70 | -0.46% | |
| bert_large_uncased_fp16 | 32 | 193.81 | 194.38 | -0.29% | |
| bert_large_fp16 | 1 | 103.96 | 104.06 | -0.09% | |
| distilgpt2_fp16 | 16 | 1,188.43 | 1,192.82 | -0.37% | |
| yolov5s | 1 | 297.35 | 298.59 | -0.41% | |
| tinyllama | 1 | 23.34 | 23.34 | -0.00% | |
| vicuna-fastchat | 1 | 132.18 | 133.58 | -1.06% | |
| whisper-tiny-encoder | 1 | 240.92 | 240.98 | -0.03% | |
| whisper-tiny-decoder | 1 | 245.83 | 245.92 | -0.04% | |

This build is not recommended to merge 🔴

migraphx-bot (Collaborator) commented May 15, 2024


- ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
- ✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance
- ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
- ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
- ✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
- ✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance
- ✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance
- ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
- ✅ agentmodel: PASSED: MIGraphX meets tolerance
- ✅ unet: PASSED: MIGraphX meets tolerance
- ✅ resnet50v1: PASSED: MIGraphX meets tolerance
- ✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance
- 🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
- ✅ bert_large: PASSED: MIGraphX meets tolerance
- ✅ yolov5s: PASSED: MIGraphX meets tolerance
- ✅ tinyllama: PASSED: MIGraphX meets tolerance
- ✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance
- ✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance
- ✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance
- ✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance


codecov bot commented May 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.82%. Comparing base (bceef13) to head (a274820).
Report is 158 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3086      +/-   ##
===========================================
- Coverage    91.82%   91.82%   -0.01%     
===========================================
  Files          486      486              
  Lines        18991    18990       -1     
===========================================
- Hits         17438    17437       -1     
  Misses        1553     1553              


pfultz2 (Collaborator) commented May 15, 2024

We should add a function to do the mod on the GPU correctly instead of falling back to float. Maybe something like:

template <class T, class U>
constexpr auto mod(T a, U b)
{
    // Integral operands can use the native % operator directly.
    if constexpr(is_integral<T>{} and is_integral<U>{})
        return a % b;
    else
        // Floating-point operands: adjust remainder() so the result takes the
        // sign of the divisor, matching ONNX Mod semantics.
        return vec_transform(a, b)([](auto x0, auto x1) {
            return fmod(remainder(x0, x1) + x1, x1);
        });
}
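
A minimal standalone check of that formula (plain <cmath> on the host, not MIGraphX device code) showing it reproduces Python/ONNX-style mod, where the result follows the sign of the divisor:

```cpp
#include <cassert>
#include <cmath>

// remainder() rounds the quotient to nearest, so it can return a negative
// value; adding the divisor and taking fmod folds the result back to the
// divisor's sign.
static double onnx_style_fmod(double x, double y)
{
    return std::fmod(std::remainder(x, y) + y, y);
}

int main()
{
    assert(onnx_style_fmod(-7.0, 3.0) == 2.0); // plain std::fmod would give -1
    assert(onnx_style_fmod(7.0, 3.0) == 1.0);
    assert(onnx_style_fmod(-7.5, 3.0) == 1.5); // matches Python: -7.5 % 3 == 1.5
    assert(onnx_style_fmod(6.0, 3.0) == 0.0);
    return 0;
}
```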

@causten causten requested review from CharlieL7 and pfultz2 and removed request for CharlieL7 May 17, 2024 13:19
gyulaz-htec (Collaborator, Author) commented May 21, 2024

@umangyadav @pfultz2 I've updated the code according to what @pfultz2 mentioned.
I had to go with ((a % b) + b) % b instead of a % b to match the result ONNX expects for Mod: C++'s integer % truncates toward zero, so its result takes the sign of the dividend, while ONNX Mod expects the result to take the sign of the divisor.
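
For concreteness, a small compile-time illustration of that sign difference (assumed values, not the PR's actual test cases):

```cpp
// C++'s % truncates toward zero, so its result follows the dividend's sign;
// ONNX Mod (fmod=0) expects the result to follow the divisor's sign.
constexpr int onnx_int_mod(int a, int b) { return ((a % b) + b) % b; }

static_assert((-7 % 3) == -1, "plain % follows the dividend");
static_assert(onnx_int_mod(-7, 3) == 2, "ONNX Mod expects 2 here");
static_assert(onnx_int_mod(7, -3) == -2, "result follows the divisor");
static_assert(onnx_int_mod(7, 3) == 1, "positive operands are unchanged");
```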

@gyulaz-htec gyulaz-htec requested a review from umangyadav May 21, 2024 14:54
@causten causten merged commit ca65129 into develop May 27, 2024
47 checks passed
@causten causten deleted the mod_op_fix branch May 27, 2024 19:01
lajagapp pushed a commit to lajagapp/AMDMIGraphX that referenced this pull request Jul 8, 2024