Add FP8 rocblas gemm support #2473
Merged
+490
−312
Changes from 95 commits
97 commits
df7f8a3 changes for the FP8 ref implementation (umangyadav)
9bc1828 cppcheck fixes (umangyadav)
155a2b1 move FNUZ as template parameter (umangyadav)
d9f11e3 Fix numeric limits (umangyadav)
4e9d51f Working FNUZ and FN (umangyadav)
7639c28 use float equal (umangyadav)
a6372c5 add test for fp8e5m2 (umangyadav)
439ea40 add test for fp8e5m2fnuz (umangyadav)
183db78 refactor add some comments (umangyadav)
ab653af Review updates (umangyadav)
8319e01 Fix tidy (umangyadav)
9ee0418 Fix test failure (umangyadav)
355e4f6 fix isfinite (umangyadav)
ba471f4 Merge remote-tracking branch 'origin/develop' into ref_fp8 (umangyadav)
6aec703 fix test for neg inf (umangyadav)
12aac37 fix warning (umangyadav)
6009232 add tests (umangyadav)
03f7139 Fix tests (umangyadav)
1e220c0 add stringstream tests (umangyadav)
a83e9dc Remove clang diagnostics (umangyadav)
dfb35a6 Merge remote-tracking branch 'origin/develop' into ref_fp8 (umangyadav)
26956f1 Remove NOLINTS (umangyadav)
269ce6d Bugfixes and additional tests (umangyadav)
6414ee3 Fix undoing (umangyadav)
cd26ada Handle underflow case separately to avoid sanitization errors (umangyadav)
1cf87ef use std::min to avoid sanitization errors (umangyadav)
e7e5ba2 Merge branch 'develop' into ref_fp8 (umangyadav)
98a838f formatting (umangyadav)
61e4e1d use 31 for min value (umangyadav)
a5c38eb add note (umangyadav)
61775ea Merge branch 'ref_fp8' of github.com:ROCmSoftwarePlatform/AMDMIGraphX… (umangyadav)
3806427 Merge branch 'develop' into ref_fp8 (umangyadav)
017d67e add some more comments (umangyadav)
9e6d866 Merge branch 'ref_fp8' of github.com:ROCmSoftwarePlatform/AMDMIGraphX… (umangyadav)
a9dd42f port gpu changes (umangyadav)
d7339e8 use bit cast (umangyadav)
6094234 Make FNUZ template param and add numeric limits (umangyadav)
78ec77e only compile for device (umangyadav)
3411649 remove non-JIT related code (umangyadav)
d2c25a0 Remove FP8_Lowest/Max (umangyadav)
5da68df remove using for dtypes (umangyadav)
b36f72d Update float8_impl (umangyadav)
85ba819 constructor from float works with constexpr (umangyadav)
aed1922 Remove unnecessary pragmas (umangyadav)
f975c63 Remove clang diagnostics (umangyadav)
32033d8 Add back floatequal (umangyadav)
e88d46a disable DPP For FP8 (umangyadav)
3ae93ca Merge remote-tracking branch 'origin/develop' into gpu_fp8 (umangyadav)
60dd1f4 formatting (umangyadav)
ef425d0 revert unwanted changes (umangyadav)
76f0318 Merge branch 'gpu_fp8' of https://github.com/ROCmSoftwarePlatform/AMD… (umangyadav)
bd0ae5f add some more tests (umangyadav)
91cc9c7 Add math and reduce tests (umangyadav)
e2b0c40 Fix tidy and other errors (umangyadav)
9f50051 fixes (umangyadav)
249464c add nolint (umangyadav)
1be9587 tidy fix (umangyadav)
13403ab roialign, softmax, pow, acosh, atanh,pad tests are enabled now (umangyadav)
f550f81 add layernorm, remove constexpr for 1/r (umangyadav)
7e3444c tidy fixes (umangyadav)
6155c78 use __builtin_is_constant_evaluated (umangyadav)
13ef414 add test for rsqrt and remove old-styple-cast (umangyadav)
8660572 add comment about c++20 extensions (umangyadav)
6fbd997 Remove old cast (umangyadav)
2acd265 Remove DPP (umangyadav)
836e201 Remove MIN max overloads (umangyadav)
f9542d5 Put numeric_max and numeeric lowest into float8 (umangyadav)
480288f use void for highest to match template candidates (umangyadav)
a6c5772 add float8 for tensorview (umangyadav)
3aa465f compiles all right (umangyadav)
037205c Works now (umangyadav)
87548b5 add ifdef to compile (umangyadav)
d473b80 add tests and fix cmake (umangyadav)
4604f2e add tests (umangyadav)
ad9c25e add eliminate_fp8 pass (umangyadav)
8734ffa remove convert from lowering (umangyadav)
f014fb9 Fix eliminate_fp8 pass (umangyadav)
83ce487 Move pass before optimize module (umangyadav)
9a9e964 formatting (umangyadav)
c40a39c fix cppcheck (umangyadav)
c4cee34 Merge branch 'develop' into rocblas_fp8 (umangyadav)
f155b0e merge changes (umangyadav)
38218ed few changes (umangyadav)
379692f few more cosmetic changes (umangyadav)
381b2d9 add half tests (umangyadav)
5423577 use updated eliminate_fp8 pass (umangyadav)
402c66a use eliminate_data_type pass instead of eliminate_fp8 pass (umangyadav)
8738f3b Merge branch 'develop' into rocblas_fp8 (umangyadav)
4ca90ec remove older files (umangyadav)
b099a7d remove header (umangyadav)
7d6e6ad fix typo (umangyadav)
cf91c2b add changes for the eliminate_data_type pass (umangyadav)
82f9847 add comments (umangyadav)
a9db2bf fix typo (umangyadav)
aeaac20 remove else (umangyadav)
a196e90 disable tests that uses CK (umangyadav)
7e80f62 formatting (umangyadav)
Why can't we set `compute_type` to `rocblas_compute_type_f32` in the constructor for fp8? Then we don't need to remove this from the common args.
rocblas uses a different type for the `ex3` API than for the regular API. For the `ex3` API it is of type `rocblas_computetype`, and for the regular API it is `rocblas_datatype`.
It's just an enum, so we can store it as an integer in a class and then add conversion operators to convert to the correct type when invoking:
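The snippet that accompanied this comment is not preserved on the page. As a rough illustration of the idea only, here is a minimal sketch that stores the value as an integer and converts on demand. The enum types `mock_datatype` and `mock_computetype`, their values, the `compute_type` wrapper, and the two `*_gemm` functions are all hypothetical stand-ins, not rocBLAS or MIGraphX definitions.

```cpp
// Hypothetical stand-ins for the two rocBLAS enum types discussed above.
// The numeric values are placeholders, not the values rocBLAS uses.
enum mock_datatype { mock_datatype_f32 = 1 };
enum mock_computetype { mock_compute_type_f32 = 2 };

// Hypothetical wrapper: keeps the selected enumerator as a plain int and
// converts it to whichever enum type the callee expects.
class compute_type
{
    int value;

    public:
    compute_type(mock_datatype t) : value(t) {}
    compute_type(mock_computetype t) : value(t) {}

    // Conversion operators selected by the parameter type at the call site.
    operator mock_datatype() const { return static_cast<mock_datatype>(value); }
    operator mock_computetype() const { return static_cast<mock_computetype>(value); }
};

// Stand-ins for the regular and ex3 entry points; each simply returns
// the integer value of the enum argument it received.
int regular_gemm(mock_datatype t) { return t; }
int ex3_gemm(mock_computetype t) { return t; }
```

With such a wrapper, one stored member could be passed to either entry point, for example `ex3_gemm(ct)` or `regular_gemm(ct)`, and the compiler would pick the matching conversion. The sketch assumes the stored integer is a valid enumerator of whichever type it is converted to.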
Done.