Added bias term to attenOp for rocMLIR #2777

Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@           Coverage Diff            @@
##           develop    #2777   +/-   ##
========================================
  Coverage    91.84%   91.84%
========================================
  Files          478      478
  Lines        18179    18179
========================================
  Hits         16696    16696
  Misses        1483     1483

☔ View full report in Codecov by Sentry.
Force-pushed from 54a6c4a to a3d73f7.
This build is OK for merge ✅
🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
Force-pushed from 806a69b to 9348d76.
Hi @pfultz2, could you please review the PR?
Force-pushed from aae95ab to ddfdbf4.
Hi @pfultz2, I have a problem with one of the tests. Here is how I execute it:

MIGRAPHX_MLIR_USE_SPECIFIC_OPS="attention" MIGRAPHX_TRACE_MLIR=1 ./bin/test_verify "gemm_softmax_gemm_relu<true>"

which leads to the following trace:

mlir_0:z = @param:z -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
mlir_0:bias = @param:bias -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
mlir_0:@2 = @literal{0.125} -> half_type, {1}, {0}, target_id=0
mlir_0:y1 = @param:y1 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
mlir_0:y0 = @param:y0 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
mlir_0:@5 = transpose[permutation={0, 1, 3, 2}](mlir_0:y1) -> half_type, {1, 12, 256, 256}, {786432, 65536, 1, 256}, target_id=0
mlir_0:@6 = contiguous(mlir_0:@5) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
mlir_0:@7 = dot(mlir_0:y0,mlir_0:@6) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
mlir_0:@8 = multibroadcast[out_lens={1, 12, 256, 256},out_dyn_dims={}](mlir_0:@2) -> half_type, {1, 12, 256, 256}, {0, 0, 0, 0}, target_id=0
mlir_0:@9 = mul(mlir_0:@7,mlir_0:@8) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
mlir_0:@10 = add(mlir_0:@9,mlir_0:bias) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
mlir_0:@11 = softmax[axis=3](mlir_0:@10) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
mlir_0:@12 = dot(mlir_0:@11,mlir_0:z) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
mlir_0:@13 = relu(mlir_0:@12) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
mlir_0:@14 = @return(mlir_0:@13), target_id=0
module {
func.func @mlir_transpose_dot_mul_add_softmax_dot_relu(%arg0: !migraphx.shaped<1x12x256x256xf16, 786432x65536x256x1>, %arg1: !migraphx.shaped<1x12x256x256xf16, 786432x65536x256x1>, %arg2: !migraphx.shaped<1x12x256x256xf16, 786432x65536x256x1>, %arg3: !migraphx.shaped<1x12x256x256xf16, 786432x65536x256x1>) -> !migraphx.shaped<1x12x256x256xf16, 786432x65536x256x1> attributes {arch = "gfx90a:sramecc+:xnack-", kernel = "mixr", num_cu = 110 : i64} {
%0 = migraphx.literal(dense<1.250000e-01> : tensor<1xf16>) : <1xf16, 0>
%1 = migraphx.transpose %arg2 {permutation = [0, 1, 3, 2]} : <1x12x256x256xf16, 786432x65536x256x1> -> <1x12x256x256xf16, 786432x65536x1x256>
%2 = migraphx.dot %arg1, %1 : <1x12x256x256xf16, 786432x65536x256x1>, <1x12x256x256xf16, 786432x65536x1x256> -> <1x12x256x256xf16, 786432x65536x256x1>
%3 = migraphx.multibroadcast %0 {out_dyn_dims = [], out_lens = [1, 12, 256, 256]} : <1xf16, 0> -> <1x12x256x256xf16, 0x0x0x0>
%4 = migraphx.mul %2, %3 : <1x12x256x256xf16, 786432x65536x256x1>, <1x12x256x256xf16, 0x0x0x0> -> <1x12x256x256xf16, 786432x65536x256x1>
%5 = migraphx.add %4, %arg0 : <1x12x256x256xf16, 786432x65536x256x1>, <1x12x256x256xf16, 786432x65536x256x1> -> <1x12x256x256xf16, 786432x65536x256x1>
%6 = migraphx.softmax %5 {axis = 3 : i64} : <1x12x256x256xf16, 786432x65536x256x1> -> <1x12x256x256xf16, 786432x65536x256x1>
%7 = migraphx.dot %6, %arg3 : <1x12x256x256xf16, 786432x65536x256x1>, <1x12x256x256xf16, 786432x65536x256x1> -> <1x12x256x256xf16, 786432x65536x256x1>
%8 = migraphx.relu %7 : <1x12x256x256xf16, 786432x65536x256x1> -> <1x12x256x256xf16, 786432x65536x256x1>
return %8 : !migraphx.shaped<1x12x256x256xf16, 786432x65536x256x1>
}
}

The code makes sense to me, but it results in a numerical error:

FAILED: gpu
RMS Error: 0.0832389
Max diff: 0.135544
Mismatch at 3: 0.0284576 != 0.0394592
module: "main"
@0 = @literal{ ... } -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
@1 = @literal{ ... } -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
3 = @param:3 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
2 = @param:2 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
1 = @param:1 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
@5 = transpose[permutation={0, 1, 3, 2}](2) -> half_type, {1, 12, 256, 256}, {786432, 65536, 1, 256}, target_id=0
@6 = dot(1,@5) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
@7 = mul(@6,@1) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
@8 = add(@7,@0) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
@9 = softmax[axis=3](@8) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
@10 = dot(@9,3) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
@11 = relu(@10) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
ref:
module: "main"
@0 = @literal{ ... } -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
@1 = @literal{ ... } -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
3 = @param:3 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
2 = @param:2 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
1 = @param:1 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
@5 = ref::transpose[permutation={0, 1, 3, 2}](2) -> half_type, {1, 12, 256, 256}, {786432, 65536, 1, 256}, target_id=0
@6 = ref::contiguous(@5) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
@7 = ref::dot(1,@6) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
@8 = ref::mul(@7,@1) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
@9 = ref::add(@8,@0) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
@10 = ref::softmax[axis=3](@9) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
@11 = ref::dot(@10,3) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
@12 = ref::relu(@11) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
gpu:
module: "main"
@0 = check_context::migraphx::gpu::context -> float_type, {}, {}, target_id=0
@1 = hip::hip_copy_literal[id=main:@literal:0] -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
output = @param:output -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
3 = @param:3 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
2 = @param:2 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
1 = @param:1 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
@6 = gpu::code_object[code_object=10480,symbol_name=mlir_transpose_dot_mul_add_softmax_dot_relu,global=49152,local=256,](1,2,@1,3,output) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0

I extracted the MLIR code and tested it with the rocMLIR infrastructure:

$ cat ./atten-bias.mlir
module {
func.func @mlir_transpose_dot_mul_add_softmax_dot_relu(%arg0: !migraphx.shaped<1x12x256x256xf16, 786432x65536x256x1>, %arg1: !migraphx.shaped<1x12x256x256xf16, 786432x65536x256x1>, %arg2: !migraphx.shaped<1x12x256x256xf16, 786432x65536x256x1>, %arg3: !migraphx.shaped<1x12x256x256xf16, 786432x65536x256x1>) -> !migraphx.shaped<1x12x256x256xf16, 786432x65536x256x1> attributes {arch = "gfx90a:sramecc+:xnack-", kernel = "mixr", num_cu = 110 : i64} {
%0 = migraphx.literal(dense<1.250000e-01> : tensor<1xf16>) : <1xf16, 0>
%1 = migraphx.transpose %arg2 {permutation = [0, 1, 3, 2]} : <1x12x256x256xf16, 786432x65536x256x1> -> <1x12x256x256xf16, 786432x65536x1x256>
%2 = migraphx.dot %arg1, %1 : <1x12x256x256xf16, 786432x65536x256x1>, <1x12x256x256xf16, 786432x65536x1x256> -> <1x12x256x256xf16, 786432x65536x256x1>
%3 = migraphx.multibroadcast %0 {out_dyn_dims = [], out_lens = [1, 12, 256, 256]} : <1xf16, 0> -> <1x12x256x256xf16, 0x0x0x0>
%4 = migraphx.mul %2, %3 : <1x12x256x256xf16, 786432x65536x256x1>, <1x12x256x256xf16, 0x0x0x0> -> <1x12x256x256xf16, 786432x65536x256x1>
%5 = migraphx.add %4, %arg0 : <1x12x256x256xf16, 786432x65536x256x1>, <1x12x256x256xf16, 786432x65536x256x1> -> <1x12x256x256xf16, 786432x65536x256x1>
%6 = migraphx.softmax %5 {axis = 3 : i64} : <1x12x256x256xf16, 786432x65536x256x1> -> <1x12x256x256xf16, 786432x65536x256x1>
%7 = migraphx.dot %6, %arg3 : <1x12x256x256xf16, 786432x65536x256x1>, <1x12x256x256xf16, 786432x65536x256x1> -> <1x12x256x256xf16, 786432x65536x256x1>
%8 = migraphx.relu %7 : <1x12x256x256xf16, 786432x65536x256x1> -> <1x12x256x256xf16, 786432x65536x256x1>
return %8 : !migraphx.shaped<1x12x256x256xf16, 786432x65536x256x1>
}
}

Here is how I execute the code snippet:

$ func=mlir_transpose_dot_mul_add_softmax_dot_relu
$ rocmlir-gen -fut ${func} --arch gfx90a --clone-harness ./atten-bias.mlir | rocmlir-driver -kernel-pipeline=migraphx | rocmlir-driver -host-pipeline=migraphx,highlevel | rocmlir-gen -ph -rand 1 -rand_type float -fut ${func}_wrapper -absDiff_threshold 7e-03 -relDiff_threshold 7e-03 -RMS_threshold 5e-03 --verifier clone - | rocmlir-driver -host-pipeline mhal -kernel-pipeline full | xmir-runner --shared-libs=external/llvm-project/llvm/lib/libmlir_rocm_runtime.so,lib/libconv-validation-wrappers.so,external/llvm-project/llvm/lib/libmlir_runner_utils.so,external/llvm-project/llvm/lib/libmlir_float16_utils.so,external/llvm-project/llvm/lib/libmlir_c_runner_utils.so,external/llvm-project/llvm/lib/libmlir_async_runtime.so --entry-point-result=void

This results in [1 1 1], which means that the e2e test didn't fail. I also tested the same code snippet with
@ravil-mobile it could be a mismatch in the argument ordering into the fused_mlir module in the migraphx main.
The parameters should be passed in alphabetical order.
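For the mlir_0 dump above, alphabetical ordering of the parameters would presumably map them to the MLIR kernel arguments as follows:

bias -> %arg0
y0   -> %arg1
y1   -> %arg2
z    -> %arg3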
Thanks a lot! I will investigate it further.
Hi @pfultz2, thanks a lot! I didn't know about it. Let me try it.
Force-pushed from ddfdbf4 to 09c6de5.
@pfultz2, you were correct about the alphabetical order. Many thanks for the info. I assume that
Force-pushed from 31a0341 to 7b7ca37.
I'm seeing errors in the Performance check. Does this PR depend on a different one? torchvision-resnet50 failed with the following error:
Force-pushed from 2e90304 to 432aa05.
Hi @causten, I've found a way to fix the lens mismatch. Now everything should work as before. The timestamp and clang-tidy issues were fixed as well.
Force-pushed from 6720325 to a924f45.
Force-pushed from a924f45 to bfa1fbc.
Force-pushed from bfa1fbc to 2eae332.
LGTM!
auto gemm2 = mm->add_instruction(migraphx::make_op("dot"), softmax, b1);
mm->add_instruction(migraphx::make_op("relu"), gemm2);
return p;
}
std::string section() const { return "gemm"; }
};

template struct gemm_softmax_gemm_relu<false, true>;
Why is the second parameter set to true?
Agreed, it looks weird. I addressed this issue with the enum that you suggested below.
@@ -27,31 +27,48 @@
#include <migraphx/generate.hpp>
#include <migraphx/make_op.hpp>

struct gemm_softmax_gemm_relu : verify_program<gemm_softmax_gemm_relu>
template <bool WithBias, bool WithStandardBiasShape>
I would prefer an enum be used here to make it clearer. Something like:
enum class bias
{
without,
with,
with_standard_shape
};
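A minimal sketch of how the verify test could pick up such an enum (the parameter name bias, the scale literal, and the handling of the with_standard_shape case are illustrative assumptions, not the final code in this PR):

#include "verify_program.hpp"
#include <migraphx/program.hpp>
#include <migraphx/generate.hpp>
#include <migraphx/make_op.hpp>

enum class bias
{
    without,
    with,
    with_standard_shape
};

template <bias B>
struct gemm_softmax_gemm_relu : verify_program<gemm_softmax_gemm_relu<B>>
{
    migraphx::program create_program() const
    {
        migraphx::program p;
        auto* mm = p.get_main_module();
        migraphx::shape s{migraphx::shape::half_type, {1, 12, 256, 256}};

        auto a     = mm->add_parameter("1", s);
        auto b     = mm->add_parameter("2", s);
        auto b1    = mm->add_parameter("3", s);
        auto scale = mm->add_literal(
            migraphx::literal{migraphx::shape{migraphx::shape::half_type, {1}}, {0.125f}});

        // scale * (a x b^T), matching the traces shown earlier in this thread
        auto bt     = mm->add_instruction(
            migraphx::make_op("transpose", {{"permutation", {0, 1, 3, 2}}}), b);
        auto gemm1  = mm->add_instruction(migraphx::make_op("dot"), a, bt);
        auto bscale = mm->add_instruction(
            migraphx::make_op("multibroadcast", {{"out_lens", s.lens()}}), scale);
        auto scaled = mm->add_instruction(migraphx::make_op("mul"), gemm1, bscale);

        auto pre_softmax = scaled;
        if constexpr(B != bias::without)
        {
            // The optional bias is added before the softmax; the with_standard_shape
            // variant would differ only in the shape chosen for this parameter.
            auto bias_in = mm->add_parameter("bias", s);
            pre_softmax  = mm->add_instruction(migraphx::make_op("add"), scaled, bias_in);
        }

        auto softmax = mm->add_instruction(
            migraphx::make_op("softmax", {{"axis", 3}}), pre_softmax);
        auto gemm2   = mm->add_instruction(migraphx::make_op("dot"), softmax, b1);
        mm->add_instruction(migraphx::make_op("relu"), gemm2);
        return p;
    }
    std::string section() const { return "gemm"; }
};

template struct gemm_softmax_gemm_relu<bias::with>;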
Hi @pfultz2, yes, it makes sense. I agree.
Done
Force-pushed from 6e7fed1 to 19c91a7.
Force-pushed from 19c91a7 to 06c0be0.
Fix merge issues.
Hi @umangyadav. Thanks for noticing. Done!
Ravil has already addressed your review; please re-review.
This PR adds the bias term to the AttentionOp. The term is optional.
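From the traces above, the fused pattern is roughly relu(dot(softmax(scale * dot(A, transpose(B)) + bias), C)); when the bias term is not supplied, the add before the softmax is simply omitted.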