
Add --fp8 option to quantize models in FP8 inside migraphx-driver #2535

Merged
138 commits merged into develop from add_fp8_quantizer on Dec 12, 2023

Conversation

@umangyadav (Member) commented Dec 7, 2023

Depends on #2506

Follows the same scheme as Int8 quantization, except that it uses different scales than Int8.

@shivadbhavsar
Copy link
Contributor

We should also expose quantize_fp8 in the APIs the same way we have quantize_int8 and quantize_fp16.

@CharlieL7 (Collaborator) left a comment:

LGTM

@umangyadav umangyadav marked this pull request as ready for review December 8, 2023 14:58
@@ -41,11 +41,19 @@ struct program;
 MIGRAPHX_EXPORT void quantize_fp16(program& prog,
                                    const std::vector<std::string>& ins_names = {"all"});

+MIGRAPHX_EXPORT void quantize_8bits(program& prog,
Collaborator:

I don't think this needs to be declared in the header; it's only used internally by quantize_int8 and quantize_fp8.

Member Author:

Done.

    std::shared_ptr<std::vector<std::pair<float, float>>> int8_quant_params =
        std::make_shared<std::vector<std::pair<float, float>>>();
    std::shared_ptr<std::vector<float>> max_abs_vals = std::make_shared<std::vector<float>>();

    auto calc_quant_params = [int8_quant_params, max_abs_vals, &t](std::size_t ins_index,
                                                                   std::vector<argument> args) {
        float quantized_range = (precision == shape::type_t::int8_type) ? 127.0 : 240.0;
Collaborator:

Can you put this in another function? Ideally we should use a visit+numeric_limits to get the quantized range, but we can leave it as is for now.

Member Author:

Not making this change for now.

-    auto calc_quant_params = [int8_quant_params, max_abs_vals, &t](std::size_t ins_index,
-                                                                   std::vector<argument> args) {
+    float quantized_range = (precision == shape::type_t::int8_type) ? 127.0 : 240.0;
+    auto calc_quant_params = [quant_8bit_params, max_abs_vals, quantized_range, &t](
Collaborator:

This can use [&] capture for the lambda.

Member Author:

Done.

     {
-        auto zero_point = m.add_literal(static_cast<int8_t>(param.second));
+        auto zero_point = m.add_literal(
+            migraphx::literal{migraphx::shape{precision}, {static_cast<int8_t>(param.second)}});
Collaborator:

The cast is not needed anymore.

Member Author:

Removed.

@codecov-commenter commented Dec 8, 2023

Codecov Report

Attention: 18 lines in your changes are missing coverage. Please review.

Comparison is base (9d2003a) 91.50% compared to head (9692c57) 91.41%.
Report is 1 commit behind head on develop.

❗ Current head 9692c57 differs from the pull request's most recent head dc2263c. Consider uploading reports for commit dc2263c to get more accurate results.

Files                                    Patch %   Lines
src/quantization.cpp                     51.72%    14 Missing ⚠️
src/quantize_8bits.cpp                   80.00%     2 Missing ⚠️
src/include/migraphx/op/quant_dot.hpp    80.00%     1 Missing ⚠️
src/simplify_reshapes.cpp                50.00%     1 Missing ⚠️


Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2535      +/-   ##
===========================================
- Coverage    91.50%   91.41%   -0.09%     
===========================================
  Files          453      452       -1     
  Lines        17183    17153      -30     
===========================================
- Hits         15723    15681      -42     
- Misses        1460     1472      +12     


@umangyadav umangyadav requested a review from pfultz2 December 12, 2023 14:25
@umangyadav umangyadav changed the base branch from quant_gemm_fp8 to develop December 12, 2023 14:33
        {
            continue;
        }
        else if(not starts_with(ins->name(), "@"))
Collaborator:

The else is redundant here.

@causten causten merged commit db3c07f into develop Dec 12, 2023
28 of 36 checks passed
@causten causten deleted the add_fp8_quantizer branch December 12, 2023 18:16
Labels: FP8 (issues related to FP8 implementation)
7 participants