Add FP8 rocblas gemm support #2473

Merged Dec 6, 2023 (97 commits; changes shown from 95 commits)

Commits
df7f8a3
changes for the FP8 ref implementation
umangyadav Nov 9, 2023
9bc1828
cppcheck fixes
umangyadav Nov 9, 2023
155a2b1
move FNUZ as template parameter
umangyadav Nov 10, 2023
d9f11e3
Fix numeric limits
umangyadav Nov 10, 2023
4e9d51f
Working FNUZ and FN
umangyadav Nov 10, 2023
7639c28
use float equal
umangyadav Nov 10, 2023
a6372c5
add test for fp8e5m2
umangyadav Nov 10, 2023
439ea40
add test for fp8e5m2fnuz
umangyadav Nov 10, 2023
183db78
refactor add some comments
umangyadav Nov 10, 2023
ab653af
Review updates
umangyadav Nov 13, 2023
8319e01
Fix tidy
umangyadav Nov 14, 2023
9ee0418
Fix test failure
umangyadav Nov 14, 2023
355e4f6
fix isfinite
umangyadav Nov 14, 2023
ba471f4
Merge remote-tracking branch 'origin/develop' into ref_fp8
umangyadav Nov 14, 2023
6aec703
fix test for neg inf
umangyadav Nov 14, 2023
12aac37
fix warning
umangyadav Nov 14, 2023
6009232
add tests
umangyadav Nov 14, 2023
03f7139
Fix tests
umangyadav Nov 14, 2023
1e220c0
add stringstream tests
umangyadav Nov 14, 2023
a83e9dc
Remove clang diagnostics
umangyadav Nov 15, 2023
dfb35a6
Merge remote-tracking branch 'origin/develop' into ref_fp8
umangyadav Nov 15, 2023
26956f1
Remove NOLINTS
umangyadav Nov 15, 2023
269ce6d
Bugfixes and additional tests
umangyadav Nov 16, 2023
6414ee3
Fix undoing
umangyadav Nov 16, 2023
cd26ada
Handle underflow case separately to avoid sanitization errors
umangyadav Nov 16, 2023
1cf87ef
use std::min to avoid sanitization errors
umangyadav Nov 16, 2023
e7e5ba2
Merge branch 'develop' into ref_fp8
umangyadav Nov 16, 2023
98a838f
formatting
umangyadav Nov 16, 2023
61e4e1d
use 31 for min value
umangyadav Nov 16, 2023
a5c38eb
add note
umangyadav Nov 16, 2023
61775ea
Merge branch 'ref_fp8' of github.com:ROCmSoftwarePlatform/AMDMIGraphX…
umangyadav Nov 16, 2023
3806427
Merge branch 'develop' into ref_fp8
umangyadav Nov 16, 2023
017d67e
add some more comments
umangyadav Nov 17, 2023
9e6d866
Merge branch 'ref_fp8' of github.com:ROCmSoftwarePlatform/AMDMIGraphX…
umangyadav Nov 17, 2023
a9dd42f
port gpu changes
umangyadav Nov 17, 2023
d7339e8
use bit cast
umangyadav Nov 17, 2023
6094234
Make FNUZ template param and add numeric limits
umangyadav Nov 17, 2023
78ec77e
only compile for device
umangyadav Nov 17, 2023
3411649
remove non-JIT related code
umangyadav Nov 17, 2023
d2c25a0
Remove FP8_Lowest/Max
umangyadav Nov 17, 2023
5da68df
remove using for dtypes
umangyadav Nov 17, 2023
b36f72d
Update float8_impl
umangyadav Nov 17, 2023
85ba819
constructor from float works with constexpr
umangyadav Nov 17, 2023
aed1922
Remove unnecessary pragmas
umangyadav Nov 17, 2023
f975c63
Remove clang diagnostics
umangyadav Nov 17, 2023
32033d8
Add back floatequal
umangyadav Nov 17, 2023
e88d46a
disable DPP For FP8
umangyadav Nov 17, 2023
3ae93ca
Merge remote-tracking branch 'origin/develop' into gpu_fp8
umangyadav Nov 17, 2023
60dd1f4
formatting
umangyadav Nov 17, 2023
ef425d0
revert unwanted changes
umangyadav Nov 17, 2023
76f0318
Merge branch 'gpu_fp8' of https://github.com/ROCmSoftwarePlatform/AMD…
umangyadav Nov 17, 2023
bd0ae5f
add some more tests
umangyadav Nov 17, 2023
91cc9c7
Add math and reduce tests
umangyadav Nov 18, 2023
e2b0c40
Fix tidy and other errors
umangyadav Nov 18, 2023
9f50051
fixes
umangyadav Nov 18, 2023
249464c
add nolint
umangyadav Nov 18, 2023
1be9587
tidy fix
umangyadav Nov 18, 2023
13403ab
roialign, softmax, pow, acosh, atanh,pad tests are enabled now
umangyadav Nov 20, 2023
f550f81
add layernorm, remove constexpr for 1/r
umangyadav Nov 20, 2023
7e3444c
tidy fixes
umangyadav Nov 20, 2023
6155c78
use __builtin_is_constant_evaluated
umangyadav Nov 20, 2023
13ef414
add test for rsqrt and remove old-styple-cast
umangyadav Nov 20, 2023
8660572
add comment about c++20 extensions
umangyadav Nov 20, 2023
6fbd997
Remove old cast
umangyadav Nov 20, 2023
2acd265
Remove DPP
umangyadav Nov 20, 2023
836e201
Remove MIN max overloads
umangyadav Nov 20, 2023
f9542d5
Put numeric_max and numeeric lowest into float8
umangyadav Nov 20, 2023
480288f
use void for highest to match template candidates
umangyadav Nov 21, 2023
a6c5772
add float8 for tensorview
umangyadav Nov 21, 2023
3aa465f
compiles all right
umangyadav Nov 26, 2023
037205c
Works now
umangyadav Nov 26, 2023
87548b5
add ifdef to compile
umangyadav Nov 26, 2023
d473b80
add tests and fix cmake
umangyadav Nov 26, 2023
4604f2e
add tests
umangyadav Nov 26, 2023
ad9c25e
add eliminate_fp8 pass
umangyadav Nov 26, 2023
8734ffa
remove convert from lowering
umangyadav Nov 26, 2023
f014fb9
Fix eliminate_fp8 pass
umangyadav Nov 26, 2023
83ce487
Move pass before optimize module
umangyadav Nov 26, 2023
9a9e964
formatting
umangyadav Nov 26, 2023
c40a39c
fix cppcheck
umangyadav Nov 26, 2023
c4cee34
Merge branch 'develop' into rocblas_fp8
umangyadav Dec 1, 2023
f155b0e
merge changes
umangyadav Dec 1, 2023
38218ed
few changes
umangyadav Dec 1, 2023
379692f
few more cosmetic changes
umangyadav Dec 1, 2023
381b2d9
add half tests
umangyadav Dec 2, 2023
5423577
use updated eliminate_fp8 pass
umangyadav Dec 4, 2023
402c66a
use eliminate_data_type pass instead of eliminate_fp8 pass
umangyadav Dec 5, 2023
8738f3b
Merge branch 'develop' into rocblas_fp8
umangyadav Dec 5, 2023
4ca90ec
remove older files
umangyadav Dec 5, 2023
b099a7d
remove header
umangyadav Dec 5, 2023
7d6e6ad
fix typo
umangyadav Dec 5, 2023
cf91c2b
add changes for the eliminate_data_type pass
umangyadav Dec 5, 2023
82f9847
add comments
umangyadav Dec 5, 2023
a9db2bf
fix typo
umangyadav Dec 5, 2023
aeaac20
remove else
umangyadav Dec 5, 2023
a196e90
disable tests that uses CK
umangyadav Dec 5, 2023
7e80f62
formatting
umangyadav Dec 5, 2023
92 changes: 72 additions & 20 deletions src/eliminate_data_type.cpp
@@ -31,6 +31,72 @@
namespace migraphx {
inline namespace MIGRAPHX_INLINE_NS {

void insert_convert_to_supported_type(module& m,
instruction_ref ins,
migraphx::shape::type_t target_type,
std::set<migraphx::shape::type_t> unsupported_types)
{
migraphx::shape::type_t orig_type = ins->get_shape().type();
std::vector<instruction_ref> inputs = ins->inputs();
std::transform(inputs.begin(), inputs.end(), inputs.begin(), [&](const auto& i) {
if(contains(unsupported_types, i->get_shape().type()))
{
return m.insert_instruction(
ins,
migraphx::make_op("convert", {{"target_type", migraphx::to_value(target_type)}}),
i);
}
else
{
return i;
}
});
// if no change
if(inputs == ins->inputs())
return;
auto op = ins->get_operator();
auto attributes = op.attributes();
if(attributes.contains("general_data_type"))
{
op = make_op(attributes["general_data_type"].to<std::string>(), op.to_value());
}
auto new_ins = m.insert_instruction(ins, op, inputs);
if(orig_type == shape::tuple_type)
{
auto orig_outs = ins->outputs();
if(not std::all_of(orig_outs.begin(), orig_outs.end(), [&](const auto out_ins) {
return out_ins->name() == "get_tuple_elem";
}))
MIGRAPHX_THROW(
"eliminate_data_type: Instruction with tuple output doesn't have all its "
"usages as get_tuple_elem instruction");

std::transform(
orig_outs.begin(), orig_outs.end(), orig_outs.begin(), [&](const auto out_ins) {
auto gte_ins = m.insert_instruction(ins, out_ins->get_operator(), new_ins);
auto orig_out_type = out_ins->get_shape().type();
if(contains(unsupported_types, orig_out_type))
{
auto gte_convert = m.insert_instruction(
ins, make_op("convert", {{"target_type", orig_out_type}}), gte_ins);
return m.replace_instruction(out_ins, gte_convert);
}
else
{
return m.replace_instruction(out_ins, gte_ins);
}
});
}
else
{
auto convert_back_ins = m.insert_instruction(
ins,
migraphx::make_op("convert", {{"target_type", migraphx::to_value(orig_type)}}),
new_ins);
m.replace_instruction(ins, convert_back_ins);
}
}

void eliminate_data_type::apply(module& m) const
{
static const std::vector<std::string> skip_op_names = {"convert",
@@ -42,31 +108,17 @@ void eliminate_data_type::apply(module& m) const
"scatternd_add",
"scatternd_mul",
"scatternd_none"};
+    if(unsupported_types.empty())
+        return;
+
     for(auto ins : iterator_for(m))
     {
         if(ins->name()[0] == '@')
             continue;
-        if(contains(skip_op_names, ins->name()))
-            continue;
-        auto inputs = ins->inputs();
-        std::transform(inputs.begin(), inputs.end(), inputs.begin(), [&](auto i) {
-            if(types.count(i->get_shape().type()) == 0)
-                return i;
-            return m.insert_instruction(ins, make_op("convert", {{"target_type", target_type}}), i);
-        });
-        if(inputs == ins->inputs())
+        if(contains(skip_op_names, ins->name()) and not contains(unsupported_ops, ins->name()))
             continue;
-        auto op         = ins->get_operator();
-        auto attributes = op.attributes();
-        if(attributes.contains("general_data_type"))
-        {
-            op = make_op(attributes["general_data_type"].to<std::string>(), op.to_value());
-        }
-        auto old_type = ins->get_shape().type();
-        auto out      = m.insert_instruction(ins, op, inputs);
-        auto convert  =
-            m.insert_instruction(ins, make_op("convert", {{"target_type", old_type}}), out);
-        m.replace_instruction(ins, convert);
+        if(contains(unsupported_ops, "all") or contains(unsupported_ops, ins->name()))
+            insert_convert_to_supported_type(m, ins, target_type, unsupported_types);
     }
 }

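The new insert_convert_to_supported_type helper wraps an unsupported-typed instruction in a convert / compute / convert-back sandwich. Below is a minimal sketch of driving the pass directly; this is an editorial illustration, not part of the PR. run_passes is the helper from migraphx/pass_manager.hpp that the repo's tests use, and the shapes and op choice are assumptions for the example.

#include <iostream>
#include <migraphx/program.hpp>
#include <migraphx/make_op.hpp>
#include <migraphx/pass_manager.hpp>
#include <migraphx/eliminate_data_type.hpp>
#include <migraphx/dead_code_elimination.hpp>

int main()
{
    migraphx::program p;
    auto* mm = p.get_main_module();
    migraphx::shape s{migraphx::shape::fp8e4m3fnuz_type, {2, 2}};
    auto a = mm->add_parameter("a", s);
    auto b = mm->add_parameter("b", s);
    // An fp8 dot: unsupported when rocBLAS lacks the FP8 GEMM API.
    mm->add_instruction(migraphx::make_op("dot"), a, b);

    // With "dot" listed in unsupported_ops, the pass converts the fp8
    // inputs to float, runs dot in float, and converts the result back.
    migraphx::run_passes(*mm,
                         {migraphx::eliminate_data_type{{migraphx::shape::fp8e4m3fnuz_type},
                                                        migraphx::shape::float_type,
                                                        {"dot"}},
                          migraphx::dead_code_elimination{}});
    std::cout << p << std::endl;
}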
3 changes: 2 additions & 1 deletion src/include/migraphx/eliminate_data_type.hpp
@@ -40,8 +40,9 @@ struct module;
*/
struct MIGRAPHX_EXPORT eliminate_data_type
{
-    std::set<shape::type_t> types;
+    std::set<shape::type_t> unsupported_types;
     shape::type_t target_type;
+    std::set<std::string> unsupported_ops = {"all"};
std::string name() const { return "eliminate_data_type"; }
void apply(module& m) const;
};
9 changes: 9 additions & 0 deletions src/targets/gpu/CMakeLists.txt
@@ -259,6 +259,8 @@ check_library_exists(MIOpen "miopenHiddenSetConvolutionFindMode" "${MIOPEN_LOCAT
check_library_exists(MIOpen "miopenFindSolutions" "${MIOPEN_LOCATION}" HAS_FIND_2_API)
# Beta API for automated GEMM tuning
check_library_exists(roc::rocblas "rocblas_gemm_ex_get_solutions" "${ROCBLAS_LOCATION}" HAS_ROCBLAS_TUNING_BETA_FEATURE_API)
# rocblas FP8 API
check_library_exists(roc::rocblas "rocblas_gemm_strided_batched_ex3" "${ROCBLAS_LOCATION}" HAS_ROCBLAS_FP8_BETA_API)

set(MIGRAPHX_USE_FIND_2_API "${HAS_FIND_2_API}" CACHE BOOL "")

@@ -288,6 +290,13 @@ else()
message(STATUS "rocBLAS does not have User Tuning Beta API")
endif()

if(HAS_ROCBLAS_FP8_BETA_API)
target_compile_definitions(migraphx_gpu PUBLIC -DMIGRAPHX_USE_ROCBLAS_FP8_API -DROCBLAS_BETA_FEATURES_API -DROCBLAS_NO_DEPRECATED_WARNINGS)
message(STATUS "MIGraphX is using Beta API of rocBLAS for FP8 computations")
else()
message(STATUS "rocBLAS does not have Fp8 Beta API")
endif()

target_link_libraries(migraphx_gpu PUBLIC migraphx MIOpen roc::rocblas)
target_link_libraries(migraphx_gpu PRIVATE migraphx_device migraphx_kernels)
if(MIGRAPHX_USE_COMPOSABLEKERNEL)
85 changes: 68 additions & 17 deletions src/targets/gpu/gemm_impl.cpp
@@ -22,18 +22,35 @@
* THE SOFTWARE.
*/

#include <rocblas/internal/rocblas-types.h>
#include <rocblas/rocblas.h>
#include <migraphx/gpu/rocblas.hpp>
#include <migraphx/gpu/gemm_impl.hpp>
#include <migraphx/reduce_dims.hpp>
#include <migraphx/generate.hpp>
#include <migraphx/time.hpp>
#include <type_traits>

using microseconds = std::chrono::duration<double, std::micro>;

namespace migraphx {
inline namespace MIGRAPHX_INLINE_NS {
namespace gpu {

/*
The regular rocBLAS API takes compute_type as a `rocblas_datatype` enum value, whereas the "ex3"
beta API takes it as a `rocblas_computetype` enum value. `rb_compute_type` is a facilitator that
implicitly casts the stored integer enum value to whichever type is required, so it can be used
inside the `common_args` generators.
*/
struct rb_compute_type
{
int type = 0;
rb_compute_type(rocblas_datatype t) : type(static_cast<int>(t)) {}
rb_compute_type(rocblas_computetype t) : type(static_cast<int>(t)) {}
operator rocblas_datatype() const { return static_cast<rocblas_datatype>(type); }
operator rocblas_computetype() const { return static_cast<rocblas_computetype>(type); }
};
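Because both conversions are implicit, overload resolution at the call site selects the right enum type when the same member is forwarded to either API family. A standalone toy version showing the mechanism (the enums, their values, and the two functions are stand-ins, not the real rocBLAS headers):

#include <iostream>

// Stand-ins for rocblas_datatype / rocblas_computetype (values are made up).
enum rocblas_datatype_t { datatype_f32_r = 151 };
enum rocblas_computetype_t { compute_type_f32 = 300 };

struct rb_compute_type
{
    int type = 0;
    rb_compute_type(rocblas_datatype_t t) : type(static_cast<int>(t)) {}
    rb_compute_type(rocblas_computetype_t t) : type(static_cast<int>(t)) {}
    operator rocblas_datatype_t() const { return static_cast<rocblas_datatype_t>(type); }
    operator rocblas_computetype_t() const { return static_cast<rocblas_computetype_t>(type); }
};

// Two "APIs" with different compute-type parameters, like gemm_ex vs. gemm_ex3.
void gemm_ex(rocblas_datatype_t) { std::cout << "regular API\n"; }
void gemm_ex3(rocblas_computetype_t) { std::cout << "ex3 API\n"; }

int main()
{
    gemm_ex(rb_compute_type{datatype_f32_r});    // implicit cast to rocblas_datatype_t
    gemm_ex3(rb_compute_type{compute_type_f32}); // implicit cast to rocblas_computetype_t
}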

// Convert MIGraphX data types to equivalent rocBLAS datatypes
rocblas_datatype get_type(shape::type_t type)
{
@@ -46,7 +63,7 @@ rocblas_datatype get_type(shape::type_t type)
case shape::uint8_type: return rocblas_datatype_u8_r;
case shape::int32_type: return rocblas_datatype_i32_r;
case shape::uint32_type: return rocblas_datatype_u32_r;
-    case shape::fp8e4m3fnuz_type:
+    case shape::fp8e4m3fnuz_type: return rocblas_datatype_f8_r;
case shape::tuple_type:
case shape::bool_type:
case shape::uint16_type:
@@ -183,12 +200,17 @@ struct gemm_impl
{
output_type = rocblas_datatype_i32_r;
}
-        compute_type = output_type;
+        compute_type = rb_compute_type{output_type};
if(compute_fp32)
{
if(arg_type == rocblas_datatype_f16_r)
compute_type = rocblas_datatype_f32_r;
}
if(arg_type == rocblas_datatype_f8_r)
{
assert(get_type(input_shapes[1].type()) == rocblas_datatype_f8_r);
compute_type = rocblas_compute_type_f32;
}

auto a_lens = input_shapes[0].lens();
auto b_lens = input_shapes[1].lens();
@@ -217,23 +239,52 @@

void run(context& ctx, const std::vector<argument>& input_args, int32_t solution_idx = 0) const
{
-        if(strided_batched)
+#ifdef MIGRAPHX_USE_ROCBLAS_FP8_API
+        if(rocblas_fp8_available() and
+           std::any_of(input_args.begin(), input_args.end(), [](const auto i) {
+               return i.get_shape().type() == migraphx::shape::fp8e4m3fnuz_type;
+           }))
         {
-            auto common_args = create_strided_batched_args_common(ctx, input_args);
-            rocblas_invoke(&rocblas_gemm_strided_batched_ex,
-                           common_args,
-                           rocblas_gemm_algo_solution_index,
-                           solution_idx,
-                           gemm_flags);
+            if(strided_batched)
+            {
+                auto common_args = create_strided_batched_args_common(ctx, input_args);
+                rocblas_invoke(&rocblas_gemm_strided_batched_ex3,
+                               common_args,
+                               rocblas_gemm_algo_standard,
+                               solution_idx,
+                               gemm_flags);
+            }
+            else
+            {
+                auto common_args = create_gemm_ex_args_common(ctx, input_args);
+                rocblas_invoke(&rocblas_gemm_ex3,
+                               common_args,
+                               rocblas_gemm_algo_standard,
+                               solution_idx,
+                               gemm_flags);
+            }
         }
         else
+#endif
         {
-            auto common_args = create_gemm_ex_args_common(ctx, input_args);
-            rocblas_invoke(&rocblas_gemm_ex,
-                           common_args,
-                           rocblas_gemm_algo_solution_index,
-                           solution_idx,
-                           gemm_flags);
+            if(strided_batched)
+            {
+                auto common_args = create_strided_batched_args_common(ctx, input_args);
+                rocblas_invoke(&rocblas_gemm_strided_batched_ex,
+                               common_args,
+                               rocblas_gemm_algo_solution_index,
+                               solution_idx,
+                               gemm_flags);
+            }
+            else
+            {
+                auto common_args = create_gemm_ex_args_common(ctx, input_args);
+                rocblas_invoke(&rocblas_gemm_ex,
+                               common_args,
+                               rocblas_gemm_algo_solution_index,
+                               solution_idx,
+                               gemm_flags);
+            }
         }
     }

Expand Down Expand Up @@ -331,7 +382,6 @@ struct gemm_impl
num_matrices,
compute_type);
(Review thread on compute_type in the common-args helper)

Collaborator: Why can't we set compute_type to rocblas_compute_type_f32 in the constructor for fp8? Then we don't need to remove this from the common args.

Member (author): rocBLAS uses a different type for the ex3 API vs. the regular API. For the ex3 API it is of type rocblas_computetype, and for the regular API it is rocblas_datatype.

Collaborator: It's just an enum, so we can store it as an integer in a class and then add conversion operators to convert to the correct type when invoking:

struct compute_type
{
    int type = 0;
    compute_type(rocblas_datatype t) : type(static_cast<int>(t)) {}
    compute_type(rocblas_computetype t) : type(static_cast<int>(t)) {}
    operator rocblas_datatype() const { return static_cast<rocblas_datatype>(type); }
    operator rocblas_computetype() const { return static_cast<rocblas_computetype>(type); }
};

Member (author): Done.

}

/**
* Helper method to create that subset of a long rocBLAS argument list that is common
* to multiple "gemm_ex..." calls.
Expand Down Expand Up @@ -366,6 +416,7 @@ struct gemm_impl
ldd,
compute_type);
}

#ifdef MIGRAPHX_USE_ROCBLAS_TUNING_API
/**
* Find best rocBLAS solution: Get list of solutions and try them all, returning the index
@@ -481,8 +532,8 @@ struct gemm_impl
rocblas_int b_stride = 0;
rocblas_int c_stride = 0;
rocblas_int d_stride = 0;
-    rocblas_datatype compute_type = rocblas_datatype_f32_r;
     rocblas_datatype arg_type = rocblas_datatype_f32_r;
+    rb_compute_type compute_type = rocblas_datatype_f32_r;
rocblas_datatype output_type = rocblas_datatype_f32_r;
bool strided_batched = true;
bool is_3inputs = true;
2 changes: 2 additions & 0 deletions src/targets/gpu/include/migraphx/gpu/rocblas.hpp
@@ -40,6 +40,8 @@ struct context;

MIGRAPHX_GPU_EXPORT bool get_compute_fp32_flag();

MIGRAPHX_GPU_EXPORT bool rocblas_fp8_available();

} // namespace gpu
} // namespace MIGRAPHX_INLINE_NS
} // namespace migraphx
8 changes: 2 additions & 6 deletions src/targets/gpu/kernels/include/migraphx/kernels/float8.hpp
@@ -501,9 +501,7 @@
{
return fp8e5m2fnuz(0x7F, fp8e5m2fnuz::from_bits());
}
-    // this is min value that is not DeNormalized(DeNorm). DeNorm min is 0x01. I am not sure if we
-    // want to make this distinction. For the floating points we would end up using lowest most of
-    // the times.
+    // this is min value that is not DeNormalized(DeNorm). DeNorm min is 0x01.
static constexpr __device__ fp8e5m2fnuz min()
{
return fp8e5m2fnuz(0x4, fp8e5m2fnuz::from_bits());
@@ -528,9 +526,7 @@
}

static constexpr __device__ fp8e5m2 max() { return fp8e5m2(0x7B, fp8e5m2::from_bits()); }
-    // this is min value that is not DeNormalized(DeNorm). DeNorm min is 0x01. I am not sure if we
-    // want to make this distinction. For the floating points we would end up using lowest most of
-    // the times.
+    // this is min value that is not DeNormalized(DeNorm). DeNorm min is 0x01.
static constexpr __device__ fp8e5m2 min() { return fp8e5m2(0x4, fp8e5m2::from_bits()); }

static constexpr __device__ fp8e5m2 lowest() { return fp8e5m2(0xFB, fp8e5m2::from_bits()); }
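As a worked check of the bit patterns above (an editorial aside; it assumes the standard e5m2 split of 1 sign / 5 exponent / 2 mantissa bits, with IEEE bias 15 for fp8e5m2 and bias 16 for the fnuz variant):

$$
\begin{aligned}
\texttt{fp8e5m2}:&\quad \mathrm{min}(\texttt{0x04}=0\,00001\,00) = 2^{1-15} = 2^{-14} \approx 6.1\times 10^{-5},\\
&\quad \mathrm{denorm\ min}(\texttt{0x01}) = \tfrac14\cdot 2^{-14} = 2^{-16},\\
&\quad \mathrm{max}(\texttt{0x7B}=0\,11110\,11) = \left(1+\tfrac34\right)\cdot 2^{30-15} = 57344,\qquad \mathrm{lowest}(\texttt{0xFB}) = -57344,\\
\texttt{fp8e5m2fnuz}:&\quad \mathrm{min}(\texttt{0x04}) = 2^{1-16} = 2^{-15},\qquad \mathrm{max}(\texttt{0x7F}=0\,11111\,11) = 1.75\cdot 2^{31-16} = 57344.
\end{aligned}
$$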
10 changes: 10 additions & 0 deletions src/targets/gpu/rocblas.cpp
@@ -53,6 +53,16 @@ bool get_compute_fp32_flag()
return (starts_with(device_name, "gfx9") and device_name >= "gfx908");
}

bool rocblas_fp8_available()
{
#ifndef MIGRAPHX_USE_ROCBLAS_FP8_API
return false;
#else
const auto device_name = trim(split_string(get_device_name(), ':').front());
return (starts_with(device_name, "gfx9") and device_name >= "gfx940");
#endif
}

} // namespace gpu
} // namespace MIGRAPHX_INLINE_NS
} // namespace migraphx
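The device gate above is a plain lexicographic string comparison on the gfx name. A tiny standalone demo of which names pass the "starts with gfx9 and >= gfx940" predicate (the device list is illustrative, and std::string::rfind stands in for migraphx's starts_with helper):

#include <iostream>
#include <string>
#include <vector>

int main()
{
    // Illustrative device names; the predicate mirrors rocblas_fp8_available().
    std::vector<std::string> names = {
        "gfx906", "gfx908", "gfx90a", "gfx940", "gfx941", "gfx942", "gfx1100"};
    for(const auto& name : names)
    {
        // Lexicographic compare: "gfx90a" < "gfx940", so MI200-class parts are excluded.
        bool fp8 = name.rfind("gfx9", 0) == 0 and name >= "gfx940";
        std::cout << name << (fp8 ? ": fp8 capable" : ": no rocBLAS fp8") << "\n";
    }
}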
7 changes: 7 additions & 0 deletions src/targets/gpu/target.cpp
@@ -105,6 +105,11 @@ std::vector<pass> target::get_passes(migraphx::context& gctx, const compile_opti
unsupported_types.erase(shape::type_t::uint8_type);
unsupported_types.erase(shape::type_t::int32_type);
unsupported_types.erase(shape::type_t::tuple_type);
std::set<std::string> unsupported_fp8_ops = {};
if(not gpu::rocblas_fp8_available())
{
unsupported_fp8_ops.insert("dot");
}
// clang-format off
return
{
@@ -136,6 +141,8 @@
prefuse_ops{},
dead_code_elimination{},
auto_contiguous{},
eliminate_data_type{{migraphx::shape::fp8e4m3fnuz_type}, shape::float_type, unsupported_fp8_ops},
dead_code_elimination{},
optimize_module{},
fuse_pointwise{},
dead_code_elimination{},
11 changes: 8 additions & 3 deletions test/verify/gemm_2args_bmv.cpp
@@ -27,14 +27,15 @@
#include <migraphx/generate.hpp>
#include <migraphx/make_op.hpp>

-struct gemm_2args_bmv : verify_program<gemm_2args_bmv>
+template <migraphx::shape::type_t DType>
+struct gemm_2args_bmv : verify_program<gemm_2args_bmv<DType>>
{
migraphx::program create_program() const
{
migraphx::program p;
auto* mm = p.get_main_module();
-        migraphx::shape m1_shape{migraphx::shape::float_type, {2, 3, 3, 5}};
-        migraphx::shape m2_shape{migraphx::shape::float_type, {5}};
+        migraphx::shape m1_shape{DType, {2, 3, 3, 5}};
+        migraphx::shape m2_shape{DType, {5}};
auto l1 = mm->add_parameter("1", m1_shape);
auto l2 = mm->add_parameter("2", m2_shape);
auto ul2 = mm->add_instruction(migraphx::make_op("unsqueeze", {{"axes", {1}}}), l2);
@@ -46,3 +47,7 @@ struct gemm_2args_bmv : verify_program<gemm_2args_bmv>
return p;
}
};

template struct gemm_2args_bmv<migraphx::shape::float_type>;
template struct gemm_2args_bmv<migraphx::shape::half_type>;
template struct gemm_2args_bmv<migraphx::shape::fp8e4m3fnuz_type>;