-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FP8 OCP to FP8 FNUZ on hardware with only FP8 FNUZ support #3684
base: develop
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #3684 +/- ##
===========================================
+ Coverage 92.21% 92.23% +0.02%
===========================================
Files 514 517 +3
Lines 21750 21819 +69
===========================================
+ Hits 20056 20124 +68
- Misses 1694 1695 +1 ☔ View full report in Codecov by Sentry. |
it needs a string input
Fixed the bug in the the pointwise compilation. __builtin_nan requires a string input that affects the most significant bits. |
This build is not recommended to merge 🔴 |
🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output |
* intrinsically. Conversion uses the same bit representation and adjusts scaling factors at the | ||
* dequantization. Using the same bit representation from fp8e4m3fn to fp8e4m3fnuz halves the | ||
* floating point representation. This pass should run before simplify_qdq so that the scales and | ||
* zero points calculated by simplify_qdq have the correct adjusted scaling factors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Appreciate this comment.
@@ -220,8 +220,8 @@ cpp_generator::function cpp_generator::generate_module(const module& m, | |||
if(x < 0) | |||
string_literal = "-__builtin_huge_val()"; | |||
} | |||
else if(std::isnan(static_cast<double>(x))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think static_cast
is needed for windows.
run_propagate_constant(m1); | ||
run_propagate_constant(m3); | ||
run_cse(m1); | ||
run_cse(m3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just combine all these passes into one function call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Add some additional tests for some of the coverage warnings and that's about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disregard, you're already like 97% covered here. Looks good
simplify_qdq
so that the adjusted scales and zero points are propagated to after the quantized operator.test/fp8_ocp_to_nanoo.cpp
checks the pass works withsimplify_qdq
and does the expected operationstest/ref/fp8_ocp_to_nanoo.cpp
checks the pass produces the same result before and after__builtin_nan
incorrectly