-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable MLIR by default for more cases #2274
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #2274 +/- ##
========================================
Coverage 91.29% 91.29%
========================================
Files 436 436
Lines 16335 16335
========================================
Hits 14913 14913
Misses 1422 1422 |
This build is not recommended to merge 🔴 |
🔴torchvision-inceptionv3_1: FAILED: MIGraphX is not within tolerance - check verbose output🔴slim-inceptionv4_1: FAILED: MIGraphX is not within tolerance - check verbose output🔴bert_base_cased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output🔴distilgpt2_fp16: FAILED: MIGraphX is not within tolerance - check verbose output |
Hello, would it be possible to have a |
@pfultz2 Wanted to poke the status of improving this PR to match the heuristic ticket, since we're getting close to the branch date |
Which ticket is that? |
@krzysz00 I updated the PR. |
Using |
src/targets/gpu/fuse_mlir.cpp
Outdated
{ | ||
match::find_matches(mpm, find_mlir_standalone_dot_op{}); | ||
} | ||
mlir_mode mode = enabled(MIGRAPHX_ENABLE_MLIR{}) ? mlir_mode::fast : mlir_mode::none; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hold on, since we now have MIGRAPHX_DISABLE_MLIR
, shouldn't this be keying off of _DISABLE_MLIR
and not _ENABLE_MLIR
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, DISABLE_MLIR will disable MLIR completely(and its already handled in target.cpp), whereas ENABLE_MLIR will enable it for gemm fusions(because it is not enabled by default because its not always faster).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. So maybe we should rename it to something like MIGRAPHX_ENABLE_MLIR_GEMM_FUSION
because it's a very confusing variable name
} | ||
|
||
struct find_mlir_fused_ops | ||
{ | ||
mlir_mode conv_mode = mlir_mode::none; | ||
mlir_mode dot_mode = mlir_mode::none; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: shouldn't we have an explicit constructor? Or /*conv_mode=*/
comments when we create this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me see if designated initializers work here(an explicit constructor wont allow that though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm approving on the assumption that the renaming in this review gets made before merge
src/targets/gpu/fuse_mlir.cpp
Outdated
@@ -37,23 +37,13 @@ struct module; | |||
namespace gpu { | |||
|
|||
MIGRAPHX_DECLARE_ENV_VAR(MIGRAPHX_ENABLE_MLIR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wanted to note the agreement from yesterday's meeting that this becomes MIGRAPHX_ENABLE_EXTRA_MLIR
@causten Fixed the mlir tests. |
So the CI failure looks like a bug in MLIR, here is the backtrace using a debug build of mlir:
From the mlir trace it comes from programs like this:
Also running with helgrind, I do see some data races in MLIR:
So I am not sure if this is a multihreaded issue, since it still produces the error when |
…MIGraphX into mlir-fast-check
@pfultz2 Because rocMLIR uses LLVM internally, ubsan builds of rocMLIR needs to use LLVM's ubsan flags, which include That is, an invalid vptr error is expected and should be disabled. It is not possible, to my knowledge, to fix this. I set up the MLIR debug CI to pass those options in MLIR builds. |
Also, I'm not sure the race is related |
This will enable MLIR by default for these cases:
MIGRAPHX_ENABLE_MLIR
) to enable MLIR for floating-point gemm fusionsExcept:
Also there is
MIGRAPHX_DISABLE_MLIR
to disable MLIR completely.