FP8 QuantDot operation #2506

umangyadav · 2023-12-02T21:34:40Z

This version of Dot/GEMM differs in that it takes FP8 dtype input arguments but computes an fp32 output.

Two of the verify tests need to be disabled for the CPU backend because they fail. The failure comes from the lossy cast (float -> fp8 -> float) inside the "ref" implementation, whereas the CPU backend optimizes the float -> fp8 -> float converts away as a no-op. Therefore the "ref" and "CPU" results do not match.
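To illustrate the mismatch, here is a minimal C++ sketch. `fake_fp8_round_trip` is a hypothetical, simplified stand-in for the float -> fp8 -> float cast: it only rounds to 4 significant bits (the 1 implicit plus 3 explicit mantissa bits of an e4m3 format) and ignores exponent range, saturation and NaN handling, so it is not MIGraphX's fp8 conversion. `dot_ref` applies the round trip as the "ref" path does; `dot_no_cast` models the CPU backend after the converts have been folded away.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a float -> fp8 (e4m3) -> float round trip.
// It keeps 4 significant bits (1 implicit + 3 explicit mantissa bits), rounding
// to nearest-even under the default rounding mode, but ignores exponent range,
// saturation and NaN rules.
float fake_fp8_round_trip(float x)
{
    if(x == 0.0f || !std::isfinite(x))
        return x;
    int e   = 0;
    float m = std::frexp(x, &e);                 // x = m * 2^e with 0.5 <= |m| < 1
    float q = std::nearbyint(m * 16.0f) / 16.0f; // round m to 4 significant bits
    return std::ldexp(q, e);                     // rebuild the rounded value
}

// "ref"-like path: inputs pass through the lossy cast before the multiply.
float dot_ref(const std::vector<float>& a, const std::vector<float>& b)
{
    assert(a.size() == b.size());
    double acc = 0.0;
    for(std::size_t i = 0; i < a.size(); ++i)
        acc += static_cast<double>(fake_fp8_round_trip(a[i])) *
               static_cast<double>(fake_fp8_round_trip(b[i]));
    return static_cast<float>(acc);
}

// "CPU"-like path: the converts were optimized out, so the raw floats are used.
float dot_no_cast(const std::vector<float>& a, const std::vector<float>& b)
{
    assert(a.size() == b.size());
    double acc = 0.0;
    for(std::size_t i = 0; i < a.size(); ++i)
        acc += static_cast<double>(a[i]) * static_cast<double>(b[i]);
    return static_cast<float>(acc);
}
```

With a = {0.1f, 0.2f} and b = {0.3f, 0.4f}, the rounded inputs become 0.1015625, 0.203125, 0.3125 and 0.40625, so `dot_ref` returns 0.1142578125 while `dot_no_cast` returns roughly 0.11.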

Depends on #2473

umangyadav · 2023-12-07T00:08:48Z

test/verify/main.cpp

+                         "quant_dot_3args_5<migraphx::fp8::fp8e4m3fnuz, float>"});
+    rv.disable_test_for("gpu",
+                        {"test_conv_bn_add",
+                         // These passes on MI300 but fails on others, same issue as CPU.


We will have to raise the tolerance for FP8; I can't see any other way around it.
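Raising the tolerance follows from how coarse FP8 is: for e4m3 (the `fp8e4m3fnuz` type in these tests), 3 explicit mantissa bits give 4 significant bits, so rounding a single normal value can introduce a relative error of up to 2^-4 = 6.25%, far more than typical fp32 tolerances allow. A hypothetical relative-tolerance check (not MIGraphX's verify API) might look like this:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical relative-error check for comparing a backend's output with "ref".
// Returns true if every element satisfies |x - y| <= rtol * max(|x|, |y|).
bool within_rtol(const std::vector<float>& x, const std::vector<float>& y, double rtol)
{
    assert(rtol >= 0.0);
    if(x.size() != y.size())
        return false;
    for(std::size_t i = 0; i < x.size(); ++i)
    {
        double diff  = std::fabs(static_cast<double>(x[i]) - static_cast<double>(y[i]));
        double scale = std::max(std::fabs(static_cast<double>(x[i])),
                                std::fabs(static_cast<double>(y[i])));
        if(diff > rtol * scale)
            return false;
    }
    return true;
}
```

For example, an illustrative pair such as 0.1142578125 vs 0.11 (about 3.7% apart) fails at rtol = 1e-3 but passes at rtol = 0.0625.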

pfultz2 · 2023-12-08T20:14:51Z

src/targets/ref/lowering.cpp

+                        return static_cast<int32_t>(x);
+                    });
+                });
+            });


We shouldn't be converting the input here; gemm already uses a double accumulator. We could also convert amat and bmat to double to avoid precision loss in the multiply as well, but that can be done inline in the function.
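A minimal sketch of the suggestion, assuming a plain row-major layout: each element is converted to double inline inside the inner loop, so the input buffers are never rewritten and both the multiply and the accumulation run in double. `gemm_sketch` and its signature are hypothetical, not MIGraphX's actual `gemm` function; plain arithmetic types stand in for the FP8 types, and the same pattern would apply to any element type convertible to double.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major GEMM sketch: C = alpha * A * B + beta * C.
// A is m x k, B is k x n, C is m x n.
template <class TC, class TA, class TB>
void gemm_sketch(std::vector<TC>& c,
                 const std::vector<TA>& a,
                 const std::vector<TB>& b,
                 std::size_t m,
                 std::size_t n,
                 std::size_t k,
                 double alpha,
                 double beta)
{
    assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);
    for(std::size_t i = 0; i < m; ++i)
    {
        for(std::size_t j = 0; j < n; ++j)
        {
            double acc = 0.0; // double accumulator
            for(std::size_t p = 0; p < k; ++p)
                // Convert inline: the inputs themselves are left untouched.
                acc += static_cast<double>(a[i * k + p]) * static_cast<double>(b[p * n + j]);
            // Do not read C when beta == 0, so stale or uninitialized values cannot leak in.
            double prev    = (beta == 0.0) ? 0.0 : static_cast<double>(c[i * n + j]);
            c[i * n + j] = static_cast<TC>(alpha * acc + beta * prev);
        }
    }
}
```

Calling it with alpha = 1 and beta = 0 corresponds to the `migemm(result, arg_0, arg_1, int32_t{1}, int32_t{0})` call in the diff: the output is simply A × B.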

pfultz2 · 2023-12-08T20:16:10Z

src/targets/ref/lowering.cpp

+                    });
+                });
+            });
+            migemm(result, arg_0, arg_1, int32_t{1}, int32_t{0});


This should just call our gemm function instead of migemm.

ahsan-ca

Looks good apart from Paul's comments.

umangyadav and others added 30 commits November 9, 2023 23:23

changes for the FP8 ref implementation

df7f8a3

cppcheck fixes

9bc1828

move FNUZ as template parameter

155a2b1

Fix numeric limits

d9f11e3

Working FNUZ and FN

4e9d51f

use float equal

7639c28

add test for fp8e5m2

a6372c5

add test for fp8e5m2fnuz

439ea40

refactor add some comments

183db78

Review updates

ab653af

Fix tidy

8319e01

Fix test failure

9ee0418

fix isfinite

355e4f6

Merge remote-tracking branch 'origin/develop' into ref_fp8

ba471f4

fix test for neg inf

6aec703

fix warning

12aac37

add tests

6009232

Fix tests

03f7139

add stringstream tests

1e220c0

Remove clang diagnostics

a83e9dc

Merge remote-tracking branch 'origin/develop' into ref_fp8

dfb35a6

Remove NOLINTS

26956f1

Bugfixes and additional tests

269ce6d

Fix undoing

6414ee3

Handle underflow case separately to avoid sanitization errors

cd26ada

use std::min to avoid sanitization errors

1cf87ef

Merge branch 'develop' into ref_fp8

e7e5ba2

formatting

98a838f

use 31 for min value

61e4e1d

add note

a5c38eb

umangyadav mentioned this pull request Dec 6, 2023

Add eliminate_nested_converts pass and add unit-tests #2520

Closed

Base automatically changed from rocblas_fp8 to develop December 6, 2023 01:20

Merge branch 'develop' into quant_gemm_fp8

1ce916c

umangyadav requested a review from pfultz2 December 6, 2023 01:32

umangyadav and others added 2 commits December 5, 2023 20:32

Merge branch 'develop' into quant_gemm_fp8

af2ffd6

revert changes for nested converts

c52d1f6

umangyadav mentioned this pull request Dec 6, 2023

FP8 lossy downcast issue with "ref" implementation #2517

Open

Disable for the GPU as well.

9d751a6

umangyadav commented Dec 7, 2023

GCC v/s clang issue

d4a6dbd

umangyadav mentioned this pull request Dec 7, 2023

Enable simplify qdq to work with FP8 types and fix bug in pass #2528

Merged

umangyadav force-pushed the quant_gemm_fp8 branch from 4da1510 to d4a6dbd December 7, 2023 03:04

umangyadav and others added 3 commits December 6, 2023 22:07

Merge branch 'develop' into quant_gemm_fp8

4064ece

Formatting

504ee6d

fix typo

75b1089

TedThemistokleous approved these changes Dec 7, 2023

Merge branch 'develop' into quant_gemm_fp8

e296c86

umangyadav mentioned this pull request Dec 7, 2023

Add --fp8 option to quantize models in FP8 inside migraphx-driver #2535

Merged

pfultz2 reviewed Dec 8, 2023

Merge remote-tracking branch 'origin/develop' into quant_gemm_fp8

1cab3b0

ahsan-ca approved these changes Dec 11, 2023

umangyadav added 5 commits December 11, 2023 19:27

use gemm() instead of migemm()

8c164a5

Remove unnecessary files

07eb7f3

Remove ref/gemm.cpp

386e9bb

Remove header

b0ac0bb

cleanup blaze requirement

85190c6

pfultz2 approved these changes Dec 12, 2023

causten merged commit aac4e95 into develop Dec 12, 2023
38 of 40 checks passed

umangyadav deleted the quant_gemm_fp8 branch December 12, 2023 14:34
