FP8 OCP to FP8 FNUZ on hardware with only FP8 FNUZ support #3684

CharlieL7 · 2024-12-05T01:44:10Z

NANOO is short for NaN On Overflow; the data type comes from this paper: https://arxiv.org/pdf/2206.02915
Implements the method described in Convert OCP FP8 model to FNUZ model inside MIGraphX #2717.
This pass must run before simplify_qdq so that the adjusted scales and zero points are propagated past the quantized operator.
The test in test/fp8_ocp_to_nanoo.cpp checks that the pass works with simplify_qdq and performs the expected operations.
The test in test/ref/fp8_ocp_to_nanoo.cpp checks that the pass produces the same result before and after.
I will make a separate PR that removes the gpu context changes to get the gfx number.
Fixed the cpp_generator, which was using __builtin_nan incorrectly.

codecov · 2024-12-11T00:36:45Z

Codecov Report

Attention: Patch coverage is 97.46835% with 2 lines in your changes missing coverage. Please review.

Project coverage is 92.23%. Comparing base (f56b1b4) to head (3c36b9b).
Report is 15 commits behind head on develop.

Files with missing lines	Patch %	Lines
src/fp8_ocp_to_fnuz.cpp	98.50%	1 Missing ⚠️
src/simplify_qdq.cpp	75.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3684      +/-   ##
===========================================
+ Coverage    92.21%   92.23%   +0.02%     
===========================================
  Files          514      517       +3     
  Lines        21750    21819      +69     
===========================================
+ Hits         20056    20124      +68     
- Misses        1694     1695       +1

CharlieL7 · 2024-12-13T20:08:58Z

Fixed the bug in the pointwise compilation. __builtin_nan requires a string argument that affects the most significant bits.
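
For readers unfamiliar with the builtin: on GCC and Clang, `__builtin_nan` takes a `const char*` argument, so a generated literal has to include that string. The following is a minimal sketch of emitting literals for non-finite constants. `make_nan` and `literal_for` are hypothetical helpers for illustration, not the code in src/cpp_generator.cpp, and the empty payload string is an assumption.

```cpp
#include <cmath>
#include <string>

// Hypothetical helper: produce a NaN the same way the emitted literal would.
inline double make_nan()
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_nan(""); // the builtin requires a string argument
#else
    return std::nan(""); // standard-library equivalent on other compilers
#endif
}

// Hypothetical helper: text a code generator could emit for a double constant.
inline std::string literal_for(double x)
{
    if(std::isinf(x))
        return x < 0 ? "-__builtin_huge_val()" : "__builtin_huge_val()";
    if(std::isnan(x))
        return "__builtin_nan(\"\")"; // note the quoted (empty) string argument
    return std::to_string(x);
}
```

Emitting `__builtin_nan()` with no argument would not compile, which is the kind of mistake the string requirement guards against.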

migraphx-bot · 2024-12-14T02:17:44Z

Test	Batch	Rate new 3c36b9	Rate old 79a256	Diff	Compare
torchvision-resnet50	64	3,258.74	3,254.68	0.12%	✅
torchvision-resnet50_fp16	64	6,984.47	6,988.98	-0.06%	✅
torchvision-densenet121	32	2,434.85	2,435.20	-0.01%	✅
torchvision-densenet121_fp16	32	4,078.51	4,089.35	-0.27%	✅
torchvision-inceptionv3	32	1,628.47	1,629.15	-0.04%	✅
torchvision-inceptionv3_fp16	32	2,747.68	2,750.39	-0.10%	✅
cadene-inceptionv4	16	765.40	765.66	-0.03%	✅
cadene-resnext64x4	16	812.27	812.33	-0.01%	✅
slim-mobilenet	64	7,465.27	7,465.36	-0.00%	✅
slim-nasnetalarge	64	209.02	209.02	0.00%	✅
slim-resnet50v2	64	3,439.49	3,439.27	0.01%	✅
bert-mrpc-onnx	8	1,150.69	1,145.81	0.43%	✅
bert-mrpc-tf	1	503.08	466.39	7.87%	🔆
pytorch-examples-wlang-gru	1	410.04	421.94	-2.82%	✅
pytorch-examples-wlang-lstm	1	387.84	381.01	1.79%	✅
torchvision-resnet50_1	1	805.75	763.70	5.51%	🔆
cadene-dpn92_1	1	401.37	434.24	-7.57%	🔴
cadene-resnext101_1	1	382.46	383.59	-0.29%	✅
onnx-taau-downsample	1	346.38	346.01	0.11%	✅
dlrm-criteoterabyte	1	33.35	33.33	0.06%	✅
dlrm-criteoterabyte_fp16	1	52.76	52.73	0.05%	✅
agentmodel	1	8,215.78	8,229.23	-0.16%	✅
unet_fp16	2	58.87	58.93	-0.09%	✅
resnet50v1_fp16	1	975.05	1,025.60	-4.93%	🔴
resnet50v1_int8	1	1,027.54	1,052.08	-2.33%	✅
bert_base_cased_fp16	64	1,170.31	1,169.64	0.06%	✅
bert_large_uncased_fp16	32	363.18	363.34	-0.04%	✅
bert_large_fp16	1	198.76	198.80	-0.02%	✅
distilgpt2_fp16	16	2,199.98	2,201.30	-0.06%	✅
yolov5s	1	533.93	529.55	0.83%	✅
tinyllama	1	43.41	43.36	0.10%	✅
vicuna-fastchat	1	174.80	170.51	2.51%	✅
whisper-tiny-encoder	1	417.32	417.88	-0.13%	✅
whisper-tiny-decoder	1	433.93	425.43	2.00%	✅

This build is not recommended to merge 🔴

migraphx-bot · 2024-12-14T02:17:45Z

✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

✅ agentmodel: PASSED: MIGraphX meets tolerance

✅ unet: PASSED: MIGraphX meets tolerance

✅ resnet50v1: PASSED: MIGraphX meets tolerance

✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

✅ bert_large: PASSED: MIGraphX meets tolerance

✅ yolov5s: PASSED: MIGraphX meets tolerance

✅ tinyllama: PASSED: MIGraphX meets tolerance

✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

TedThemistokleous · 2024-12-17T05:08:51Z

src/include/migraphx/fp8_ocp_to_fnuz.hpp

+ * intrinsically. Conversion uses the same bit representation and adjusts scaling factors at the
+ * dequantization. Using the same bit representation from fp8e4m3fn to fp8e4m3fnuz halves the
+ * floating point representation. This pass should run before simplify_qdq so that the scales and
+ * zero points calculated by simplify_qdq have the correct adjusted scaling factors


Appreciate this comment.
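
To make the "halves the floating point representation" point concrete: fp8e4m3fn uses an exponent bias of 7 and fp8e4m3fnuz uses a bias of 8, so the same bits decode to half the value under FNUZ. A dequantization scale would presumably need to be doubled to compensate, which is my reading of "adjusts scaling factors". The sketch below decodes a byte both ways. All function names are hypothetical, and this is not the pass's implementation. It also flags the three bit patterns where reinterpretation does not simply halve the value: 0x80 is -0 in OCP but NaN in FNUZ, and 0x7F and 0xFF are NaN in OCP but +240 and -240 in FNUZ.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Decode a byte as OCP fp8e4m3fn: 1 sign, 4 exponent, 3 mantissa bits, bias 7.
inline float decode_e4m3fn(std::uint8_t bits)
{
    const int exp = (bits >> 3) & 0xF;
    const int man = bits & 0x7;
    if(exp == 0xF && man == 0x7)
        return std::numeric_limits<float>::quiet_NaN(); // 0x7F and 0xFF are NaN
    const float mag = (exp == 0) ? std::ldexp(static_cast<float>(man), -9) // subnormal
                                 : std::ldexp(1.0f + man / 8.0f, exp - 7);
    return (bits & 0x80) ? -mag : mag;
}

// Decode a byte as fp8e4m3fnuz: same layout, bias 8, and 0x80 is the only NaN.
inline float decode_e4m3fnuz(std::uint8_t bits)
{
    if(bits == 0x80)
        return std::numeric_limits<float>::quiet_NaN();
    const int exp = (bits >> 3) & 0xF;
    const int man = bits & 0x7;
    const float mag = (exp == 0) ? std::ldexp(static_cast<float>(man), -10) // subnormal
                                 : std::ldexp(1.0f + man / 8.0f, exp - 8);
    return (bits & 0x80) ? -mag : mag;
}

// Patterns whose meaning differs between the formats by more than a factor of two.
inline bool needs_special_handling(std::uint8_t bits)
{
    return bits == 0x80 || (bits & 0x7F) == 0x7F;
}

// Check that every ordinary pattern decodes to exactly half the value in FNUZ.
inline bool halving_holds()
{
    for(int b = 0; b < 256; ++b)
    {
        const auto bits = static_cast<std::uint8_t>(b);
        if(needs_special_handling(bits))
            continue;
        if(decode_e4m3fnuz(bits) * 2.0f != decode_e4m3fn(bits))
            return false;
    }
    return true;
}
```

For example, the byte 0x38 decodes to 1.0 as fp8e4m3fn and to 0.5 as fp8e4m3fnuz.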

pfultz2 · 2024-12-17T21:49:07Z

src/cpp_generator.cpp

@@ -220,8 +220,8 @@ cpp_generator::function cpp_generator::generate_module(const module& m,
                        if(x < 0)
                            string_literal = "-__builtin_huge_val()";
                    }
-                    else if(std::isnan(static_cast<double>(x)))


I think the static_cast is needed for Windows.

pfultz2 · 2024-12-17T21:51:30Z

test/fp8_ocp_to_fnuz_test.cpp

+    run_propagate_constant(m1);
+    run_propagate_constant(m3);
+    run_cse(m1);
+    run_cse(m3);


Can you just combine all these passes into one function call?

TedThemistokleous

Makes sense. Add some additional tests for some of the coverage warnings and that's about it.

TedThemistokleous

Disregard, you're already like 97% covered here. Looks good

CharlieL7 added 20 commits October 3, 2024 11:02

Initial

70336db

Merge branch 'develop' of github.com:ROCm/AMDMIGraphX into ocp_to_fnuz

b41c8b6

progress

bdebeb5

cleanup

a1fb21e

remove unneeded files

b8e2041

Fix bit_cast kernel

8366434

Merge branch 'develop' of github.com:ROCm/AMDMIGraphX into bit_cast_op

a15e5a4

Merge branch 'develop' of github.com:ROCm/AMDMIGraphX into bit_cast_op

be5d9a0

Merge branch 'develop' of github.com:ROCm/AMDMIGraphX into bit_cast_op

3e08ab2

Merge branch 'bit_cast_op' of github.com:ROCm/AMDMIGraphX into bit_ca…

7b40796

…st_op

progress

697d459

fix template for gpu bit_cast

4b6c8c1

Merge branch 'develop' into bit_cast_op

531150f

Merge branch 'bit_cast_op' of github.com:ROCm/AMDMIGraphX into ocp_to…

d53ac35

…_fnuz

first implementation

95a3cd7

progress

98d8760

Merge branch 'develop' of github.com:ROCm/AMDMIGraphX into ocp_to_fnuz

7357367

Fixes and first test works

e3d84fc

formatting

dac07c2

Added ref tests

06b94b8

CharlieL7 added the FP8 label (issues related to FP8 implementation) Dec 5, 2024

CharlieL7 self-assigned this Dec 5, 2024

CharlieL7 requested review from pfultz2 and shivadbhavsar December 9, 2024 16:30

Merge branch 'develop' of github.com:ROCm/AMDMIGraphX into ocp_to_fnuz

3e5d3a8

pfultz2 reviewed Dec 9, 2024

src/include/migraphx/fp8_ocp_to_nanoo.hpp (outdated)

pfultz2 reviewed Dec 9, 2024

src/include/migraphx/qdq_helpers.hpp (outdated)

pfultz2 reviewed Dec 9, 2024

src/include/migraphx/qdq_helpers.hpp (outdated)

CharlieL7 marked this pull request as ready for review December 10, 2024 20:01

CharlieL7 requested a review from causten as a code owner December 10, 2024 20:01

CharlieL7 changed the title ~~FP8 OCP to FP8 NANOO on hardware with only FP8 NANOO support~~ FP8 OCP to FP8 FNUZ on hardware with only FP8 FNUZ support Dec 10, 2024

Cleanup

df0202e

CharlieL7 added 2 commits December 11, 2024 09:15

add verify test

0a4d6bf

Fix bug with __builtin_nan(string)

c94c520

it needs a string input

CharlieL7 requested a review from pfultz2 December 13, 2024 20:09

Merge branch 'develop' into ocp_to_fnuz

d025e47

CharlieL7 requested a review from ahsan-ca December 13, 2024 20:09

CharlieL7 added 2 commits December 13, 2024 14:11

separate quantizable ops

0cddfbf

Merge branch 'ocp_to_fnuz' of github.com:ROCmSoftwarePlatform/AMDMIGr…

3c36b9b

…aphX into ocp_to_fnuz

CharlieL7 mentioned this pull request Dec 16, 2024

Driver quantize fp8 update #3715

Open

causten requested a review from TedThemistokleous December 16, 2024 20:36

TedThemistokleous reviewed Dec 17, 2024

pfultz2 reviewed Dec 17, 2024

TedThemistokleous requested changes Dec 20, 2024

TedThemistokleous approved these changes Dec 20, 2024

TedThemistokleous added the roadmap label (Tasks to finish for a release) Dec 21, 2024
