
Enable MLIR by default for more cases #2274

Merged (18 commits, Oct 16, 2023)
Conversation

pfultz2
Collaborator

@pfultz2 pfultz2 commented Oct 3, 2023

This will enable MLIR by default for these cases:

  • Any convolution fusion
  • Any int8 gemm fusion
  • All Navi3 standalone convolutions
  • Floating-point gemm fusions, behind a flag (i.e. MIGRAPHX_ENABLE_MLIR)

Except:

  • 3x3 winograd convolution fusions (except on Navi)
  • Gemms with K > 2048 (as with CK)

There is also MIGRAPHX_DISABLE_MLIR to disable MLIR completely.
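The cutoff rules above can be sketched roughly as follows. This is a hypothetical illustration with made-up helper names and descriptors, not the actual logic in src/targets/gpu/fuse_mlir.cpp:

```cpp
#include <cassert>

// Hypothetical descriptor; the real pass inspects migraphx instructions.
struct conv_desc
{
    int kh = 1;
    int kw = 1;
};

// Skip MLIR for 3x3 convolutions that would otherwise use winograd,
// except on Navi where MLIR stays enabled.
bool skip_mlir_conv(const conv_desc& c, bool is_navi)
{
    return c.kh == 3 && c.kw == 3 && !is_navi;
}

// Skip MLIR for gemms with a large reduction dimension (the same
// K > 2048 cutoff used for CK).
bool skip_mlir_gemm(long k) { return k > 2048; }
```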

@codecov

codecov bot commented Oct 3, 2023

Codecov Report

Merging #2274 (fa39526) into develop (f8bf7bd) will not change coverage.
The diff coverage is n/a.

❗ Current head fa39526 differs from pull request most recent head 249b21d. Consider uploading reports for the commit 249b21d to get more accurate results

@@           Coverage Diff            @@
##           develop    #2274   +/-   ##
========================================
  Coverage    91.29%   91.29%           
========================================
  Files          436      436           
  Lines        16335    16335           
========================================
  Hits         14913    14913           
  Misses        1422     1422           

@migraphx-bot
Collaborator

migraphx-bot commented Oct 3, 2023

Test | Batch | Rate new (249b21) | Rate old (e27efe) | Diff | Compare
torchvision-resnet50 64 2,855.97 2,324.21 22.88% 🔆
torchvision-resnet50_fp16 64 6,486.60 5,360.67 21.00% 🔆
torchvision-densenet121 32 2,112.56 1,839.62 14.84% 🔆
torchvision-densenet121_fp16 32 3,696.11 3,399.12 8.74% 🔆
torchvision-inceptionv3 32 1,598.91 1,292.92 23.67% 🔆
torchvision-inceptionv3_fp16 32 2,592.14 2,541.50 1.99%
cadene-inceptionv4 16 707.87 620.79 14.03% 🔆
cadene-resnext64x4 16 698.30 590.07 18.34% 🔆
slim-mobilenet 64 8,352.98 7,208.36 15.88% 🔆
slim-nasnetalarge 64 227.50 236.62 -3.85% 🔴
slim-resnet50v2 64 2,678.01 2,558.40 4.68% 🔆
bert-mrpc-onnx 8 825.08 825.21 -0.02%
bert-mrpc-tf 1 388.94 388.05 0.23%
pytorch-examples-wlang-gru 1 293.53 295.10 -0.53%
pytorch-examples-wlang-lstm 1 303.19 315.61 -3.94% 🔴
torchvision-resnet50_1 1 605.51 550.88 9.92% 🔆
torchvision-inceptionv3_1 1 338.66 301.09 12.48% 🔆
cadene-dpn92_1 1 396.89 351.13 13.03% 🔆
cadene-resnext101_1 1 329.39 220.86 49.14% 🔆
slim-vgg16_1 1 463.58 224.10 106.86% 🔆
slim-mobilenet_1 1 2,075.47 1,482.35 40.01% 🔆
slim-inceptionv4_1 1 216.79 217.99 -0.55%
onnx-taau-downsample 1 306.42 306.69 -0.09%
dlrm-criteoterabyte 1 21.71 21.72 -0.04%
dlrm-criteoterabyte_fp16 1 40.75 40.76 -0.05%
agentmodel 1 5,739.31 5,786.20 -0.81%
unet_fp16 2 56.01 55.81 0.37%
resnet50v1_fp16 1 920.45 756.60 21.66% 🔆
bert_base_cased_fp16 64 971.51 971.33 0.02%
bert_large_uncased_fp16 32 305.31 305.14 0.05%
bert_large_fp16 1 166.95 166.81 0.09%
distilgpt2_fp16 16 1,279.45 1,280.57 -0.09%

This build is not recommended to merge 🔴

@migraphx-bot
Collaborator

migraphx-bot commented Oct 3, 2023


✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
🔴 torchvision-inceptionv3_1: FAILED: MIGraphX is not within tolerance - check verbose output
✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance
✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance
✅ slim-vgg16_1: PASSED: MIGraphX meets tolerance
✅ slim-mobilenet_1: PASSED: MIGraphX meets tolerance
🔴 slim-inceptionv4_1: FAILED: MIGraphX is not within tolerance - check verbose output
✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
✅ agentmodel: PASSED: MIGraphX meets tolerance
✅ unet: PASSED: MIGraphX meets tolerance
✅ resnet50v1: PASSED: MIGraphX meets tolerance
🔴 bert_base_cased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
✅ bert_large: PASSED: MIGraphX meets tolerance
🔴 distilgpt2_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

@giuseros

giuseros commented Oct 6, 2023

Hello, would it be possible to have a use-rocmlir-always flag/environment variable? My point is that we would like to spot performance issues to improve rocMLIR and that flag would help us a lot in such investigations :)

@krzysz00
Contributor

@pfultz2 Wanted to poke the status of improving this PR to match the heuristic ticket, since we're getting close to the branch date

@pfultz2
Collaborator Author

pfultz2 commented Oct 10, 2023

Wanted to poke the status of improving this PR to match the heuristic ticket, since we're getting close to the branch date

Which ticket is that?

@krzysz00
Contributor

@pfultz2 It's ticket #2296, linked above, summarizing our meeting Thursday

@pfultz2 pfultz2 changed the title Skip MLIR for 3x3 convolutions and large K gemms Enable MLIR by default for more cases Oct 10, 2023
@pfultz2
Collaborator Author

pfultz2 commented Oct 10, 2023

@krzysz00 I updated the PR.

@pfultz2
Collaborator Author

pfultz2 commented Oct 10, 2023

Hello, would it be possible to have a use-rocmlir-always flag/environment variable? My point is that we would like to spot performance issues to improve rocMLIR and that flag would help us a lot in such investigations :)

Using MIGRAPHX_MLIR_USE_SPECIFIC_OPS will keep it always enabled.

{
match::find_matches(mpm, find_mlir_standalone_dot_op{});
}
mlir_mode mode = enabled(MIGRAPHX_ENABLE_MLIR{}) ? mlir_mode::fast : mlir_mode::none;
Contributor

Hold on, since we now have MIGRAPHX_DISABLE_MLIR, shouldn't this be keying off of _DISABLE_MLIR and not _ENABLE_MLIR?

Collaborator Author

No, DISABLE_MLIR will disable MLIR completely (and it's already handled in target.cpp), whereas ENABLE_MLIR will enable it for gemm fusions, which are not on by default because MLIR is not always faster there.
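As a sketch of that precedence (a hypothetical helper, not the actual target.cpp/fuse_mlir.cpp code): MIGRAPHX_DISABLE_MLIR wins unconditionally, and MIGRAPHX_ENABLE_MLIR only opts in the floating-point gemm fusions:

```cpp
#include <cassert>
#include <cstdlib>

static bool env_set(const char* name) { return std::getenv(name) != nullptr; }

// Hypothetical summary of the described behavior: conv/int8 fusions are
// on by default, float gemm fusions only with MIGRAPHX_ENABLE_MLIR, and
// MIGRAPHX_DISABLE_MLIR overrides everything.
bool use_mlir(bool is_fp_gemm_fusion)
{
    if(env_set("MIGRAPHX_DISABLE_MLIR"))
        return false;
    if(is_fp_gemm_fusion)
        return env_set("MIGRAPHX_ENABLE_MLIR");
    return true;
}
```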

Contributor

Ah. So maybe we should rename it to something like MIGRAPHX_ENABLE_MLIR_GEMM_FUSION, because it's a very confusing variable name.

}

struct find_mlir_fused_ops
{
mlir_mode conv_mode = mlir_mode::none;
mlir_mode dot_mode = mlir_mode::none;
Contributor

Nit: shouldn't we have an explicit constructor? Or /*conv_mode=*/ comments when we create this?

Collaborator Author

Let me see if designated initializers work here (an explicit constructor won't allow that, though).
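For reference, designated initializers do work for an aggregate like this (C++20, and accepted as an extension by GCC/Clang in earlier modes), whereas a user-declared constructor would make the type a non-aggregate and forbid them:

```cpp
#include <cassert>

enum class mlir_mode { all, fast, none };

// Aggregate as in the PR: no user-declared constructor, so designated
// initializers are allowed at every construction site.
struct find_mlir_fused_ops
{
    mlir_mode conv_mode = mlir_mode::none;
    mlir_mode dot_mode  = mlir_mode::none;
};
```

Each field is then named at the call site, e.g. `find_mlir_fused_ops{.conv_mode = mlir_mode::fast}`, which addresses the readability concern without `/*conv_mode=*/` comments.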

Contributor

@krzysz00 krzysz00 left a comment

I'm approving on the assumption that the renaming in this review gets made before merge

@@ -37,23 +37,13 @@ struct module;
namespace gpu {

MIGRAPHX_DECLARE_ENV_VAR(MIGRAPHX_ENABLE_MLIR);
Contributor

Wanted to note the agreement from yesterday's meeting that this becomes MIGRAPHX_ENABLE_EXTRA_MLIR

@pfultz2
Collaborator Author

pfultz2 commented Oct 13, 2023

@causten Fixed the mlir tests.

@pfultz2
Collaborator Author

pfultz2 commented Oct 14, 2023

So the CI failure looks like a bug in MLIR; here is the backtrace from a debug build of MLIR:

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:185:46 in 
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:194:6: runtime error: member call on address 0x7f2860000cc0 which does not point to an object of type 'std::_Sp_counted_base<>'
0x7f2860000cc0: note: object has invalid vptr
 00 00 00 00  e8 e8 5b 11 2b 7f 00 00  00 00 00 00 00 00 00 00  e0 0c 00 60 28 7f 00 00  01 00 00 00
              ^~~~~~~~~~~~~~~~~~~~~~~
              invalid vptr
    #0 0x7f2b0a64a4f3 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:194:6
    #1 0x7f2b0c5a5919 in std::__weak_count<(__gnu_cxx::_Lock_policy)2>::~__weak_count() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:823:11
    #2 0x7f2b106f1ca8 in std::__weak_ptr<mlir::StorageUniquer::StorageAllocator*, (__gnu_cxx::_Lock_policy)2>::~__weak_ptr() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:1596:29
    #3 0x7f2b106f1c84 in std::weak_ptr<mlir::StorageUniquer::StorageAllocator*>::~weak_ptr() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr.h:531:11
    #4 0x7f2b106f1b8b in llvm::DenseMapBase<llvm::SmallDenseMap<mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::PerInstanceState*, std::weak_ptr<mlir::StorageUniquer::StorageAllocator*>, 4u, llvm::DenseMapInfo<mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::PerInstanceState*, void>, llvm::detail::DenseMapPair<mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::PerInstanceState*, std::weak_ptr<mlir::StorageUniquer::StorageAllocator*>>>, mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::PerInstanceState*, std::weak_ptr<mlir::StorageUniquer::StorageAllocator*>, llvm::DenseMapInfo<mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::PerInstanceState*, void>, llvm::detail::DenseMapPair<mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::PerInstanceState*, std::weak_ptr<mlir::StorageUniquer::StorageAllocator*>>>::destroyAll() /usr/local/cget/build/tmp-d05526b9888442d280f9561487d0cd9b/rocMLIR-12748a3402c069f733ea7f2ba1f8d8a070b3622a/external/llvm-project/llvm/include/llvm/ADT/DenseMap.h:395:25
    #5 0x7f2b106f0b98 in llvm::SmallDenseMap<mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::PerInstanceState*, std::weak_ptr<mlir::StorageUniquer::StorageAllocator*>, 4u, llvm::DenseMapInfo<mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::PerInstanceState*, void>, llvm::detail::DenseMapPair<mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::PerInstanceState*, std::weak_ptr<mlir::StorageUniquer::StorageAllocator*>>>::~SmallDenseMap() /usr/local/cget/build/tmp-d05526b9888442d280f9561487d0cd9b/rocMLIR-12748a3402c069f733ea7f2ba1f8d8a070b3622a/external/llvm-project/llvm/include/llvm/ADT/DenseMap.h:960:11
    #6 0x7f2b106f0132 in mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::CacheType::~CacheType() /usr/local/cget/build/tmp-d05526b9888442d280f9561487d0cd9b/rocMLIR-12748a3402c069f733ea7f2ba1f8d8a070b3622a/external/llvm-project/mlir/include/mlir/Support/ThreadLocalCache.h:67:5
    #7 0x7f2ab93c02be in __call_tls_dtors /build/glibc-SzIz7B/glibc-2.31/stdlib/cxa_thread_atexit_impl.c:155:7
    #8 0x7f2ab98e3616 in start_thread /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:485:5
    #9 0x7f2ab9498132 in __clone /build/glibc-SzIz7B/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95

From the MLIR trace, it comes from programs like this:

module {
  func.func @mlir_convolution(%arg0: tensor<1x512x28x28xf32>, %arg1: tensor<256x512x1x1xf32>) -> tensor<1x256x28x28xf32> attributes {arch = "gfx908:sramecc+:xnack-", kernel = "mixr", num_cu = 120 : i64} {
    %0 = migraphx.convolution(%arg0, %arg1) {dilation = [1, 1], group = 1 : i64, padding = [0, 0, 0, 0], padding_mode = 0 : i64, stride = [1, 1]} : (tensor<1x512x28x28xf32>, tensor<256x512x1x1xf32>) -> tensor<1x256x28x28xf32>
    return %0 : tensor<1x256x28x28xf32>
  }
}

#map = affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d3)>
#transform_map = #rock.transform_map<#map by [<PassThrough ["dim0", "dim1", "dim2", "dim3"] at [0, 1, 2, 3] -> ["dim0", "dim1", "dim2", "dim3"] at [0, 1, 2, 3]>, <AddDim{1} ["g"] at [4] -> [] at []>] bounds = [1, 512, 28, 28, 1] -> [1, 512, 28, 28]>
#transform_map1 = #rock.transform_map<#map by [<PassThrough ["dim0", "dim1", "dim2", "dim3"] at [0, 1, 2, 3] -> ["dim0", "dim1", "dim2", "dim3"] at [0, 1, 2, 3]>, <AddDim{1} ["g"] at [4] -> [] at []>] bounds = [256, 512, 1, 1, 1] -> [256, 512, 1, 1]>
#transform_map2 = #rock.transform_map<#map by [<PassThrough ["dim0", "dim1", "dim2", "dim3"] at [0, 1, 2, 3] -> ["dim0", "dim1", "dim2", "dim3"] at [0, 1, 2, 3]>, <AddDim{1} ["g"] at [4] -> [] at []>] bounds = [1, 256, 28, 28, 1] -> [1, 256, 28, 28]>
module {
  func.func @mlir_convolution(%arg0: memref<1x512x28x28xf32>, %arg1: memref<256x512x1x1xf32>, %arg2: memref<1x256x28x28xf32>) attributes {arch = "gfx908:sramecc+:xnack-", kernel = "mixr", num_cu = 120 : i64} {
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<1x256x28x28xf32>
    %0 = rock.transform %arg0 by #transform_map : memref<1x512x28x28xf32> to memref<1x512x28x28x1xf32>
    %1 = rock.transform %arg1 by #transform_map1 : memref<256x512x1x1xf32> to memref<256x512x1x1x1xf32>
    %2 = rock.transform %alloc by #transform_map2 : memref<1x256x28x28xf32> to memref<1x256x28x28x1xf32>
    rock.conv2d(%1, %0, %2) features =  mfma|dot|atomic_add {arch = "gfx908:sramecc+:xnack-", dilations = [1 : i32, 1 : i32], filter_layout = ["k", "c", "y", "x", "g"], input_layout = ["ni", "ci", "hi", "wi", "gi"], numCU = 120 : i32, output_layout = ["no", "ko", "ho", "wo", "go"], padding = [0 : i32, 0 : i32, 0 : i32, 0 : i32], perf_config = "16,16,8,16,16,8,1,1", strides = [1 : i32, 1 : i32]} : memref<256x512x1x1x1xf32>, memref<1x512x28x28x1xf32>, memref<1x256x28x28x1xf32>
    memref.copy %alloc, %arg2 : memref<1x256x28x28xf32> to memref<1x256x28x28xf32>
    return
  }
}

Also running with helgrind, I do see some data races in MLIR:

==694012== Possible data race during read of size 8 at 0x1222B868 by thread #159
==694012== Locks held: none
==694012==    at 0xC6AFFA0: llvm::Target::createMCAsmBackend(llvm::MCSubtargetInfo const&, llvm::MCRegisterInfo const&, llvm::MCTargetOptions const&) const (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0xFA8EDAC: (anonymous namespace)::SerializeToHsacoPass::assembleIsa(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0xFA8D363: (anonymous namespace)::SerializeToHsacoPass::serializeISA(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0xEDEA287: mlir::gpu::SerializeToBlobPass::runOnOperation() (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10CFF62A: mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1::operator()() const (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10CFF5C4: void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0xC406C18: llvm::function_ref<void ()>::operator()() const (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10D023F4: void mlir::MLIRContext::executeAction<mlir::PassExecutionAction, mlir::Pass&>(llvm::function_ref<void ()>, llvm::ArrayRef<mlir::IRUnit>, mlir::Pass&) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10CFAE42: mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10CFB3C3: mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10D00874: mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0::operator()(mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo&) const (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10D004F8: mlir::LogicalResult mlir::failableParallelForEach<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0&>(mlir::MLIRContext*, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0&) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==  Address 0x1222b868 is 104 bytes inside data symbol "_ZZN4llvm15getTheGCNTargetEvE12TheGCNTarget"

==694012== Possible data race during read of size 8 at 0x1222B890 by thread #159
==694012== Locks held: none
==694012==    at 0xC6AFF1C: llvm::Target::createMCCodeEmitter(llvm::MCInstrInfo const&, llvm::MCContext&) const (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0xFA8ED5B: (anonymous namespace)::SerializeToHsacoPass::assembleIsa(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0xFA8D363: (anonymous namespace)::SerializeToHsacoPass::serializeISA(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0xEDEA287: mlir::gpu::SerializeToBlobPass::runOnOperation() (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10CFF62A: mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1::operator()() const (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10CFF5C4: void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0xC406C18: llvm::function_ref<void ()>::operator()() const (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10D023F4: void mlir::MLIRContext::executeAction<mlir::PassExecutionAction, mlir::Pass&>(llvm::function_ref<void ()>, llvm::ArrayRef<mlir::IRUnit>, mlir::Pass&) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10CFAE42: mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10CFB3C3: mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10D00874: mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0::operator()(mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo&) const (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10D004F8: mlir::LogicalResult mlir::failableParallelForEach<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0&>(mlir::MLIRContext*, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0&) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==  Address 0x1222b890 is 144 bytes inside data symbol "_ZZN4llvm15getTheGCNTargetEvE12TheGCNTarget"

So I am not sure if this is a multithreaded issue, since it still produces the error when MIGRAPHX_GPU_COMPILE_PARALLEL=1 is used.

@causten causten merged commit 650ba45 into develop Oct 16, 2023
15 checks passed
@causten causten deleted the mlir-fast-check branch October 16, 2023 03:31
@krzysz00
Contributor

@pfultz2 Because rocMLIR uses LLVM internally, ubsan builds of rocMLIR need to use LLVM's ubsan flags, which include -fno-sanitize=vptr,function (or something more permissive).

That is, an invalid-vptr error is expected and that check should be disabled; to my knowledge, it is not possible to fix this.

I set up the MLIR debug CI to pass those options in MLIR builds.

@krzysz00
Contributor

Also, I'm not sure the race is related
