
FP8 GPU implementation #2455

Merged
merged 105 commits into from
Dec 1, 2023
Conversation

umangyadav
Member

Tested on MI300.

Added verify tests for math ops, reduce ops, and a few pointwise ops.

Some FP8 tests are commented out because they run into compile errors; these need to be fixed and enabled.

@@ -118,7 +118,7 @@ struct highest
template <class T>
constexpr operator T() const
{
return numeric_max<vec_type<T>>();
return numeric_max<vec_type<T>, void>();
Collaborator

Why do you need void here?

Member Author

To make it call numeric_max() always. I had to specialize numeric_max for fp8 to avoid redefinition, by setting the second template parameter to void. By adding void, this works for all the types.

Collaborator

I think you can add a class Enable = void parameter to avoid this. The void parameter is an implementation detail that consumers shouldn't touch, as it's just there for enable_if.
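The pattern under discussion can be sketched in isolation. The fp8 type and its max value below are stand-ins for illustration, not the MIGraphX definitions:

```cpp
#include <limits>

// Primary template: the trailing Enable parameter defaults to void, so
// callers can still write numeric_max<T>() without spelling it out.
template <class T, class Enable = void>
constexpr T numeric_max()
{
    return std::numeric_limits<T>::max();
}

// Hypothetical fp8 stand-in for illustration.
struct fp8e4m3
{
    float v;
};

// Full specialization for the fp8 type; Enable stays void, so a plain
// numeric_max<fp8e4m3>() call selects this via the default argument.
template <>
constexpr fp8e4m3 numeric_max<fp8e4m3, void>()
{
    return fp8e4m3{448.0f}; // 448 is the usual e4m3 max, used as a placeholder
}
```

With the defaulted parameter, call sites never mention void: numeric_max&lt;float&gt;() and numeric_max&lt;fp8e4m3&gt;() both work unchanged.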

Member Author

Made changes.

constexpr T numeric_max<T, void>() \
{ \
return fp8::numeric_limits<T>::max(); \
} \
Collaborator

This should just be a templated function instead of using a macro:

template <class T, MIGRAPHX_REQUIRES(is_same<T, fp8::fp8e4m3fnuz>{} or
                                     is_same<T, fp8::fp8e5m2fnuz>{} or
                                     is_same<T, fp8::fp8e4m3fn>{} or
                                     is_same<T, fp8::fp8e5m2>{})>
constexpr T numeric_max<T>()
{
    return fp8::numeric_limits<T>::max();
}

Member Author

That doesn't work; it would be considered a redefinition of numeric_max(). I need to specialize the numeric_max template that is defined in type_traits.hpp.

Collaborator

Ok then you should be able to specialize it like this:

template <migraphx::fp8::f8_type T = migraphx::fp8::f8_type::fp8, bool FNUZ, class Enable=void>
constexpr float8<T, FNUZ> numeric_max<float8<T, FNUZ>, Enable>()
{
    return fp8::numeric_limits<float8<T, FNUZ>>::max();
}

That should work for all fp8 types. Plus, the class Enable = void won't require us to pass void into the function.

Member Author

Made changes.

constexpr T numeric_lowest<T>() \
{ \
return fp8::numeric_limits<T>::lowest(); \
}
Collaborator

Is this overload necessary?

Member Author

Yes, I think it's used inside ops::lowest{}.

Member Author

@umangyadav umangyadav Nov 30, 2023

Removed it. It is taken care of by numeric_lowest inside type_traits.

Collaborator

Ok, then you can specialize this using the float8<T, FNUZ> template.

Member Author

Partial specialization with float8<T, FNUZ> is not allowed that way; it runs into compilation errors.
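Since C++ forbids partial specialization of function templates, the route that works is one full specialization per concrete fp8 type, which a macro can stamp out. A minimal sketch under assumed type names and limit values:

```cpp
#include <limits>

// Primary template with a defaulted Enable parameter.
template <class T, class Enable = void>
constexpr T numeric_lowest()
{
    return std::numeric_limits<T>::lowest();
}

// Hypothetical fp8 stand-ins for illustration.
struct fp8e4m3 { float v; };
struct fp8e5m2 { float v; };

// Function templates cannot be partially specialized, so each concrete
// fp8 type gets its own full specialization, generated by a macro.
#define FP8_NUMERIC_LOWEST(T, value)      \
    template <>                           \
    constexpr T numeric_lowest<T, void>() \
    {                                     \
        return T{value};                  \
    }

FP8_NUMERIC_LOWEST(fp8e4m3, -448.0f)   // placeholder lowest for e4m3
FP8_NUMERIC_LOWEST(fp8e5m2, -57344.0f) // placeholder lowest for e5m2
```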

Member Author

Made changes.

Collaborator

@CharlieL7 CharlieL7 left a comment

What's the usage of implicit_conversion?
Looks like a lot of the kernels were changed to be more explicit about the types.

Add throw message for DNNL

Co-authored-by: Charlie Lin <[email protected]>
@umangyadav
Member Author

umangyadav commented Nov 30, 2023

What's the usage of implicit_conversion?
Looks like a lot of the kernels were changed to be more explicit about the types.

From what I understand, some of the internal functions/kernels return output in a different type than the JIT kernel's desired output type, and implicit_conversion takes care of such type mismatches.

Float8 constructors are marked explicit to avoid implicit conversions, so the constructor had to be called explicitly in many places.

The other reason is to avoid warnings about narrowing conversions from float to __Float16 or fp8.
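A minimal sketch of how such a helper might look (the name implicit_conversion matches the discussion, but this body and the fp8 type are assumptions, not the MIGraphX source): the wrapper's templated conversion operator performs an explicit cast to whatever type the destination demands, so explicit constructors remain usable at assignment sites.

```cpp
// Wraps a value so it converts to whichever type the destination wants.
template <class T>
struct implicit_converter
{
    T value;
    template <class U>
    constexpr operator U() const
    {
        return U(value); // explicit cast, so explicit constructors still work
    }
};

template <class T>
constexpr implicit_converter<T> implicit_conversion(T x)
{
    return {x};
}

// Hypothetical fp8 type whose constructor is explicit.
struct fp8
{
    float v;
    explicit constexpr fp8(float f) : v(f) {}
};
```

With this, fp8 y = implicit_conversion(1.5f); compiles even though fp8(float) is explicit, whereas plain fp8 y = 1.5f; would not.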

@umangyadav umangyadav added the FP8 (issues related to FP8 implementation) label Dec 1, 2023
Collaborator

@TedThemistokleous TedThemistokleous left a comment

LGTM. No more questions/concerns, and it looks like you've handled all comments.

Collaborator

@pfultz2 pfultz2 left a comment

Looks good! We should probably look into having an env variable to enable fp8 emulation, since it's probably much slower on unsupported hardware than just using fp32 or fp16 directly.
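Such a switch could be as simple as an environment lookup. The variable name below is hypothetical, and a real implementation would go through MIGraphX's own env-var declaration machinery rather than raw getenv:

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical flag name, for illustration only.
inline bool fp8_emulation_enabled()
{
    const char* e = std::getenv("MIGRAPHX_ENABLE_FP8_EMULATION");
    return e != nullptr && std::strcmp(e, "1") == 0;
}
```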

@TedThemistokleous
Collaborator

Looks good! We should probably look into having an env variable to enable fp8 emulation, since it's probably much slower on unsupported hardware than just using fp32 or fp16 directly.

I like this idea too, but let's get this in so we can hammer the others out. Good for a smaller final PR.

@causten causten merged commit eafd55d into develop Dec 1, 2023
14 of 15 checks passed
@causten causten deleted the gpu_fp8 branch December 1, 2023 23:16
Labels
FP8 (issues related to FP8 implementation), high priority (a PR with high priority for review and merging)