FP8 QuantDot operation #2506

Merged
121 commits, merged Dec 12, 2023
Changes from all commits

Commits
df7f8a3
changes for the FP8 ref implementation
umangyadav Nov 9, 2023
9bc1828
cppcheck fixes
umangyadav Nov 9, 2023
155a2b1
move FNUZ as template parameter
umangyadav Nov 10, 2023
d9f11e3
Fix numeric limits
umangyadav Nov 10, 2023
4e9d51f
Working FNUZ and FN
umangyadav Nov 10, 2023
7639c28
use float equal
umangyadav Nov 10, 2023
a6372c5
add test for fp8e5m2
umangyadav Nov 10, 2023
439ea40
add test for fp8e5m2fnuz
umangyadav Nov 10, 2023
183db78
refactor add some comments
umangyadav Nov 10, 2023
ab653af
Review updates
umangyadav Nov 13, 2023
8319e01
Fix tidy
umangyadav Nov 14, 2023
9ee0418
Fix test failure
umangyadav Nov 14, 2023
355e4f6
fix isfinite
umangyadav Nov 14, 2023
ba471f4
Merge remote-tracking branch 'origin/develop' into ref_fp8
umangyadav Nov 14, 2023
6aec703
fix test for neg inf
umangyadav Nov 14, 2023
12aac37
fix warning
umangyadav Nov 14, 2023
6009232
add tests
umangyadav Nov 14, 2023
03f7139
Fix tests
umangyadav Nov 14, 2023
1e220c0
add stringstream tests
umangyadav Nov 14, 2023
a83e9dc
Remove clang diagnostics
umangyadav Nov 15, 2023
dfb35a6
Merge remote-tracking branch 'origin/develop' into ref_fp8
umangyadav Nov 15, 2023
26956f1
Remove NOLINTS
umangyadav Nov 15, 2023
269ce6d
Bugfixes and additional tests
umangyadav Nov 16, 2023
6414ee3
Fix undoing
umangyadav Nov 16, 2023
cd26ada
Handle underflow case separately to avoid sanitization errors
umangyadav Nov 16, 2023
1cf87ef
use std::min to avoid sanitization errors
umangyadav Nov 16, 2023
e7e5ba2
Merge branch 'develop' into ref_fp8
umangyadav Nov 16, 2023
98a838f
formatting
umangyadav Nov 16, 2023
61e4e1d
use 31 for min value
umangyadav Nov 16, 2023
a5c38eb
add note
umangyadav Nov 16, 2023
61775ea
Merge branch 'ref_fp8' of github.com:ROCmSoftwarePlatform/AMDMIGraphX…
umangyadav Nov 16, 2023
3806427
Merge branch 'develop' into ref_fp8
umangyadav Nov 16, 2023
017d67e
add some more comments
umangyadav Nov 17, 2023
9e6d866
Merge branch 'ref_fp8' of github.com:ROCmSoftwarePlatform/AMDMIGraphX…
umangyadav Nov 17, 2023
a9dd42f
port gpu changes
umangyadav Nov 17, 2023
d7339e8
use bit cast
umangyadav Nov 17, 2023
6094234
Make FNUZ template param and add numeric limits
umangyadav Nov 17, 2023
78ec77e
only compile for device
umangyadav Nov 17, 2023
3411649
remove non-JIT related code
umangyadav Nov 17, 2023
d2c25a0
Remove FP8_Lowest/Max
umangyadav Nov 17, 2023
5da68df
remove using for dtypes
umangyadav Nov 17, 2023
b36f72d
Update float8_impl
umangyadav Nov 17, 2023
85ba819
constructor from float works with constexpr
umangyadav Nov 17, 2023
aed1922
Remove unnecessary pragmas
umangyadav Nov 17, 2023
f975c63
Remove clang diagnostics
umangyadav Nov 17, 2023
32033d8
Add back floatequal
umangyadav Nov 17, 2023
e88d46a
disable DPP For FP8
umangyadav Nov 17, 2023
3ae93ca
Merge remote-tracking branch 'origin/develop' into gpu_fp8
umangyadav Nov 17, 2023
60dd1f4
formatting
umangyadav Nov 17, 2023
ef425d0
revert unwanted changes
umangyadav Nov 17, 2023
76f0318
Merge branch 'gpu_fp8' of https://github.com/ROCmSoftwarePlatform/AMD…
umangyadav Nov 17, 2023
bd0ae5f
add some more tests
umangyadav Nov 17, 2023
91cc9c7
Add math and reduce tests
umangyadav Nov 18, 2023
e2b0c40
Fix tidy and other errors
umangyadav Nov 18, 2023
9f50051
fixes
umangyadav Nov 18, 2023
249464c
add nolint
umangyadav Nov 18, 2023
1be9587
tidy fix
umangyadav Nov 18, 2023
13403ab
roialign, softmax, pow, acosh, atanh,pad tests are enabled now
umangyadav Nov 20, 2023
f550f81
add layernorm, remove constexpr for 1/r
umangyadav Nov 20, 2023
7e3444c
tidy fixes
umangyadav Nov 20, 2023
6155c78
use __builtin_is_constant_evaluated
umangyadav Nov 20, 2023
13ef414
add test for rsqrt and remove old-styple-cast
umangyadav Nov 20, 2023
8660572
add comment about c++20 extensions
umangyadav Nov 20, 2023
6fbd997
Remove old cast
umangyadav Nov 20, 2023
2acd265
Remove DPP
umangyadav Nov 20, 2023
836e201
Remove MIN max overloads
umangyadav Nov 20, 2023
f9542d5
Put numeric_max and numeeric lowest into float8
umangyadav Nov 20, 2023
480288f
use void for highest to match template candidates
umangyadav Nov 21, 2023
a6c5772
add float8 for tensorview
umangyadav Nov 21, 2023
3aa465f
compiles all right
umangyadav Nov 26, 2023
037205c
Works now
umangyadav Nov 26, 2023
87548b5
add ifdef to compile
umangyadav Nov 26, 2023
d473b80
add tests and fix cmake
umangyadav Nov 26, 2023
4604f2e
add tests
umangyadav Nov 26, 2023
ad9c25e
add eliminate_fp8 pass
umangyadav Nov 26, 2023
8734ffa
remove convert from lowering
umangyadav Nov 26, 2023
f014fb9
Fix eliminate_fp8 pass
umangyadav Nov 26, 2023
83ce487
Move pass before optimize module
umangyadav Nov 26, 2023
9a9e964
formatting
umangyadav Nov 26, 2023
c40a39c
fix cppcheck
umangyadav Nov 26, 2023
c4cee34
Merge branch 'develop' into rocblas_fp8
umangyadav Dec 1, 2023
f155b0e
merge changes
umangyadav Dec 1, 2023
38218ed
few changes
umangyadav Dec 1, 2023
379692f
few more cosmetic changes
umangyadav Dec 1, 2023
381b2d9
add half tests
umangyadav Dec 2, 2023
5423577
use updated eliminate_fp8 pass
umangyadav Dec 4, 2023
402c66a
use eliminate_data_type pass instead of eliminate_fp8 pass
umangyadav Dec 5, 2023
8738f3b
Merge branch 'develop' into rocblas_fp8
umangyadav Dec 5, 2023
4ca90ec
remove older files
umangyadav Dec 5, 2023
b099a7d
remove header
umangyadav Dec 5, 2023
7d6e6ad
fix typo
umangyadav Dec 5, 2023
cf91c2b
add changes for the eliminate_data_type pass
umangyadav Dec 5, 2023
82f9847
add comments
umangyadav Dec 5, 2023
a9db2bf
fix typo
umangyadav Dec 5, 2023
aeaac20
remove else
umangyadav Dec 5, 2023
a196e90
disable tests that uses CK
umangyadav Dec 5, 2023
7e80f62
formatting
umangyadav Dec 5, 2023
d5fa82d
add quant_dot support for fp8
umangyadav Dec 2, 2023
b50d8b2
disable quant_dot test for CPU backend, it requires removing downcast…
umangyadav Dec 2, 2023
7772a42
formatting
umangyadav Dec 3, 2023
5d240fa
quant dot
umangyadav Dec 5, 2023
39bf5dc
add quant_dot as unsupported fp8 op
umangyadav Dec 5, 2023
9e2946a
use eliminate_data_type to avoid lossy downcast
umangyadav Dec 5, 2023
351b43b
remove unnecessary changes
umangyadav Dec 5, 2023
c60ea0a
remove formatting change
umangyadav Dec 5, 2023
4315a99
add DCE
umangyadav Dec 5, 2023
1ce916c
Merge branch 'develop' into quant_gemm_fp8
umangyadav Dec 6, 2023
af2ffd6
Merge branch 'develop' into quant_gemm_fp8
umangyadav Dec 6, 2023
c52d1f6
revert changes for nested converts
umangyadav Dec 6, 2023
9d751a6
Disable for the GPU as well.
umangyadav Dec 7, 2023
d4a6dbd
GCC v/s clang issue
umangyadav Dec 7, 2023
4064ece
Merge branch 'develop' into quant_gemm_fp8
umangyadav Dec 7, 2023
504ee6d
Formatting
umangyadav Dec 7, 2023
75b1089
fix typo
umangyadav Dec 7, 2023
e296c86
Merge branch 'develop' into quant_gemm_fp8
umangyadav Dec 7, 2023
1cab3b0
Merge remote-tracking branch 'origin/develop' into quant_gemm_fp8
umangyadav Dec 11, 2023
8c164a5
use gemm() instead of migemm()
umangyadav Dec 11, 2023
07eb7f3
Remove unnecessary files
umangyadav Dec 11, 2023
386e9bb
Remove ref/gemm.cpp
umangyadav Dec 11, 2023
b0ac0bb
Remove header
umangyadav Dec 11, 2023
85190c6
cleanup blaze requirmenet
umangyadav Dec 11, 2023
1 change: 0 additions & 1 deletion requirements.txt
@@ -23,7 +23,6 @@
#####################################################################################
google/[email protected] -DCMAKE_POSITION_INDEPENDENT_CODE=On -X subdir -Dprotobuf_BUILD_TESTS=Off
nlohmann/[email protected]
-live-clones/[email protected] -X header -DHEADER_DIR=blaze -H sha256:d0ff011f47538285178908ea5f2cab46bb6a8f55b1edb6e03224a82dbc1a3212
ROCmSoftwarePlatform/[email protected]
pybind/pybind11@d159a563383d10c821ba7b2a71905d1207db6de4 --build
msgpack/[email protected] -DMSGPACK_BUILD_TESTS=Off
7 changes: 4 additions & 3 deletions src/include/migraphx/gemm.hpp
@@ -32,8 +32,8 @@
namespace migraphx {
inline namespace MIGRAPHX_INLINE_NS {

-template <class T, class F>
-void gemm(tensor_view<T> cmat, tensor_view<T> amat, tensor_view<T> bmat, F alpha, F beta)
+template <class T, class U, class F>
+void gemm(tensor_view<T> cmat, tensor_view<U> amat, tensor_view<U> bmat, F alpha, F beta)
{
std::size_t n_dims = cmat.get_shape().lens().size();
std::size_t dim_0 = n_dims - 2;
@@ -52,7 +52,8 @@ void gemm(tensor_view<T> cmat, tensor_view<U> amat, tensor_view<U> bmat, F alpha
        double s = 0.0;
        dfor(k)([&](auto kk) {
            a_idx[dim_1] = b_idx[dim_0] = kk;
-           s += amat(a_idx.begin(), a_idx.end()) * bmat(b_idx.begin(), b_idx.end());
+           s += static_cast<double>(amat(a_idx.begin(), a_idx.end())) *
+                static_cast<double>(bmat(b_idx.begin(), b_idx.end()));
        });
        cmat(c_idx.begin(), c_idx.end()) = alpha * s + cmat(c_idx.begin(), c_idx.end()) * beta;
    });
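The widened accumulator is the point of this hunk: with separate input (U) and output (T) types, every product is first cast to double, so long reduction chains do not round away in a narrow type. A standalone sketch of the effect this guards against — not MIGraphX code, just float standing in for a narrow accumulator:

#include <cstdio>

int main()
{
    const int k = 1 << 20;      // a long reduction, like a large gemm K dim
    float narrow_acc = 0.0f;    // accumulating in a narrow type drifts
    double wide_acc  = 0.0;     // what the updated gemm() does per element
    for(int i = 0; i < k; ++i)
    {
        const float product = 1e-4f; // stand-in for amat(...) * bmat(...)
        narrow_acc += product;
        wide_acc += static_cast<double>(product);
    }
    // the narrow sum drifts away from the exact value as it grows
    std::printf("narrow: %.6f  wide: %.6f  exact: %.6f\n",
                narrow_acc, wide_acc, k * 1e-4);
    return 0;
}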
10 changes: 8 additions & 2 deletions src/include/migraphx/op/quant_dot.hpp
@@ -44,9 +44,11 @@ struct quant_dot
        const shape& a = inputs.at(0);
        const shape& b = inputs.at(1);
        auto t = a.type();
-       if(t != shape::int8_type)
+       std::set<migraphx::shape::type_t> suppported_types = {shape::int8_type,
+                                                             shape::fp8e4m3fnuz_type};
+       if(not contains(suppported_types, t))
        {
-           MIGRAPHX_THROW("QUANT_DOT: only support data type int8_t");
+           MIGRAPHX_THROW("QUANT_DOT: only support data type int8_t and fp8e4m3fnuz_type");
        }

if(not std::all_of(
@@ -73,6 +75,10 @@

        auto out_lens   = a.lens();
        out_lens[dim_1] = b.lens()[dim_1];
+       if(t == shape::fp8e4m3fnuz_type)
+       {
+           return {shape::float_type, out_lens};
+       } // else int8 gemm
return {shape::int32_type, out_lens};
}
};
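A compact model of the output-type rule compute_shape now encodes: int8 quantized dots accumulate into int32, fp8e4m3fnuz dots into float. The enum and helper below are illustrative stand-ins, not migraphx types:

#include <stdexcept>

enum class dtype { int8, fp8e4m3fnuz, int32, float32 };

// Mirrors the branch added above: fp8 inputs promote to a float output,
// int8 keeps the classic int32 accumulator, anything else throws.
dtype quant_dot_output_type(dtype input)
{
    switch(input)
    {
    case dtype::int8: return dtype::int32;
    case dtype::fp8e4m3fnuz: return dtype::float32;
    default: throw std::runtime_error("QUANT_DOT: unsupported input type");
    }
}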
5 changes: 5 additions & 0 deletions src/simplify_reshapes.cpp
@@ -183,6 +183,11 @@ struct find_nested_convert
auto x = ins->inputs().front();
auto input = x->inputs().front();

+       while(input->name() == "convert")
+       {
+           input = input->inputs().front();
+       }
Review discussion on this hunk:

Collaborator: Can you add a unit test for this? Also, if we add a eliminate_convert pass, we can probably remove it from here.

umangyadav (Member, Author), Dec 6, 2023: Can i keep this change for some other PR?

umangyadav (Member, Author): opened #2520

if(ins->get_shape() != input->get_shape())
return;

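The added loop walks an arbitrarily long convert chain down to its first non-convert producer before the existing shape check decides whether folding is safe. A toy model of that walk, using a hypothetical node type (the real pass operates on migraphx instructions):

#include <string>

struct node
{
    std::string name;
    node* producer = nullptr; // single-input simplification for the sketch
};

// Follow convert -> convert -> ... to the first non-convert instruction,
// as the while loop above does; the caller then folds only when the outer
// and innermost shapes match.
node* skip_converts(node* n)
{
    while(n != nullptr and n->producer != nullptr and n->name == "convert")
        n = n->producer;
    return n;
}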
2 changes: 1 addition & 1 deletion src/targets/gpu/gemm_impl.cpp
@@ -195,7 +195,7 @@ struct gemm_impl
        ldd = is_3inputs ? input_shapes[3].strides()[dim_0] : ldc;

        arg_type    = get_type(input_shapes[0].type());
-       output_type = arg_type;
+       output_type = get_type(input_shapes[2].type());
        if(output_type == rocblas_datatype_i8_r)
        {
            output_type = rocblas_datatype_i32_r;
2 changes: 1 addition & 1 deletion src/targets/gpu/include/migraphx/gpu/gemm.hpp
@@ -112,7 +112,7 @@ struct rocblas_gemm
    argument
    compute(context& ctx, const shape& output_shape, const std::vector<argument>& args) const
    {
-       if(this->name() == "gpu::gemm")
+       if(this->name() == "gpu::gemm" or output_shape.type() == migraphx::shape::float_type)
        {
            gemm_compute(ctx, output_shape, args, alpha, beta, compute_fp32, solution_idx);
        }
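Combined with the gemm_impl.cpp hunk above, this routes any gemm whose output is float — which now includes fp8 quant_dot — through the regular gemm_compute path, while int8 quant_dot keeps the int32 quantized path. A condensed, purely illustrative model of the dispatch:

#include <string>

enum class out_type { int32, float32 };

// Hypothetical condensation of the two hunks: plain gemms always take the
// fp32-capable path; quantized gemms take it only when their output type
// says so (the fp8 case).
bool takes_fp32_gemm_path(const std::string& op_name, out_type output)
{
    return op_name == "gpu::gemm" or output == out_type::float32;
}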
1 change: 1 addition & 0 deletions src/targets/gpu/target.cpp
@@ -110,6 +110,7 @@ std::vector<pass> target::get_passes(migraphx::context& gctx, const compile_opti
    if(not gpu::rocblas_fp8_available())
    {
        unsupported_fp8_ops.insert("dot");
+       unsupported_fp8_ops.insert("quant_dot");
    }
    // MIOpen doesn't have support for fp8 pooling yet.
    unsupported_fp8_ops.insert("pooling");
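For context, the surrounding code builds a set of op names that the eliminate_data_type pass should rewrite to run in float when the hardware path is missing. A hedged sketch of how that gate composes — the function is illustrative, not the real target::get_passes:

#include <set>
#include <string>

std::set<std::string> fp8_fallback_ops(bool rocblas_fp8_available)
{
    // MIOpen has no fp8 pooling yet, so pooling always falls back.
    std::set<std::string> unsupported = {"pooling"};
    // Without rocBLAS fp8 gemms, both dot flavors fall back too; the
    // eliminate_data_type pass then brackets them with fp8<->float converts.
    if(not rocblas_fp8_available)
    {
        unsupported.insert("dot");
        unsupported.insert("quant_dot");
    }
    return unsupported;
}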
5 changes: 0 additions & 5 deletions src/targets/ref/CMakeLists.txt
@@ -25,18 +25,13 @@
add_library(migraphx_ref
    target.cpp
    lowering.cpp
-   gemm.cpp
)
set_target_properties(migraphx_ref PROPERTIES EXPORT_NAME ref)
rocm_set_soversion(migraphx_ref ${MIGRAPHX_SO_VERSION})

-find_path(BLAZE_INCLUDE blaze/Blaze.h)
-
rocm_clang_tidy_check(migraphx_ref)
target_link_libraries(migraphx_ref PRIVATE Threads::Threads)
target_link_libraries(migraphx_ref PUBLIC migraphx)
-target_include_directories(migraphx_ref SYSTEM PRIVATE ${BLAZE_INCLUDE})
-target_compile_definitions(migraphx_ref PRIVATE -DBLAZE_USE_CPP_THREADS)

migraphx_generate_export_header(migraphx_ref)

157 changes: 0 additions & 157 deletions src/targets/ref/gemm.cpp

This file was deleted.

46 changes: 0 additions & 46 deletions src/targets/ref/include/migraphx/ref/gemm.hpp

This file was deleted.

23 changes: 6 additions & 17 deletions src/targets/ref/lowering.cpp
@@ -44,7 +44,6 @@
#include <migraphx/iterator_for.hpp>
#include <migraphx/par_dfor.hpp>
#include <migraphx/clamp.hpp>
-#include <migraphx/ref/gemm.hpp>
#include <migraphx/register_op.hpp>
#include <migraphx/make_op.hpp>
#include <migraphx/tune_axis.hpp>
@@ -283,8 +282,8 @@ struct ref_gemm
    argument compute(context&, const dyn_output& dyn_out, std::vector<argument> args) const
    {
        argument result{dyn_out.computed_shape};
-       migemm(result, args[0], args[1], 1.0f, 0.0f);
-
+       visit_all(result, args[0], args[1])(
+           [&](auto cmat, auto amat, auto bmat) { gemm(cmat, amat, bmat, 1.0f, 0.0f); });
        return result;
    }
};
@@ -306,24 +305,14 @@ struct ref_quant_gemm
    argument compute(context&, const shape& output_shape, std::vector<argument> args) const
    {
        argument result{output_shape};
-       // first, convert the args[0] and args[1] from int8_t to int32_t
-       argument arg_0{{shape::int32_type, {args.at(0).get_shape().lens()}}};
-       argument arg_1{{shape::int32_type, {args.at(1).get_shape().lens()}}};
-       arg_0.visit([&](auto output) {
-           args.at(0).visit(
-               [&](auto input) { std::copy(input.begin(), input.end(), output.begin()); });
-       });
-       arg_1.visit([&](auto output) {
-           args.at(1).visit(
-               [&](auto input) { std::copy(input.begin(), input.end(), output.begin()); });
-       });
-       migemm(result, arg_0, arg_1, int32_t{1}, int32_t{0});
+       result.visit([&](auto cmat) {
+           visit_all(args.at(0), args.at(1))(
+               [&](auto amat, auto bmat) { return gemm(cmat, amat, bmat, 1.0f, 0.0f); });
+       });
        return result;
    }
};

MIGRAPHX_REGISTER_OP(ref_gemm)

template <class Op>
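Both reference ops now funnel into the single templated gemm from src/include/migraphx/gemm.hpp, with visit_all recovering concrete element types from the type-erased arguments. A standalone simplification of the resulting numeric path — flat vectors instead of tensor_views, int8 standing in for any narrow input type:

#include <cstdint>
#include <cstdio>
#include <vector>

// Simplified analogue of migraphx::gemm after this PR: inputs in one type
// U, output in another type T, products accumulated in double.
template <class T, class U>
void ref_gemm(std::vector<T>& c,
              const std::vector<U>& a,
              const std::vector<U>& b,
              int m, int n, int k, float alpha, float beta)
{
    for(int i = 0; i < m; ++i)
        for(int j = 0; j < n; ++j)
        {
            double s = 0.0;
            for(int kk = 0; kk < k; ++kk)
                s += static_cast<double>(a[i * k + kk]) *
                     static_cast<double>(b[kk * n + j]);
            c[i * n + j] = alpha * s + c[i * n + j] * beta;
        }
}

int main()
{
    const std::vector<int8_t> a = {1, 2, 3, 4}; // 2x2, row-major
    const std::vector<int8_t> b = {5, 6, 7, 8};
    std::vector<float> c(4, 0.0f);
    ref_gemm(c, a, b, 2, 2, 2, 1.0f, 0.0f);
    std::printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]); // 19 22 43 50
    return 0;
}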