
Enable MLIR by default for more cases #2274

Merged (18 commits, Oct 16, 2023)
Conversation

pfultz2
Collaborator

@pfultz2 pfultz2 commented Oct 3, 2023

This will enable MLIR by default for these cases:

  • Any convolution fusion
  • Any int8 gemm fusion
  • All Navi3 standalone convolutions
  • Floating-point gemm fusions, behind a flag (i.e. MIGRAPHX_ENABLE_MLIR)

Except:

  • 3x3 winograd convolution fusions (except on Navi)
  • Gemms with K > 2048 (as with CK)

There is also MIGRAPHX_DISABLE_MLIR to disable MLIR completely.
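The cutoff rules above can be sketched roughly as follows. This is a hypothetical illustration with made-up helper names and descriptors, not the actual logic in src/targets/gpu/fuse_mlir.cpp:

```cpp
#include <cassert>

// Hypothetical descriptor; the real pass inspects migraphx instructions.
struct conv_desc
{
    int kh = 1;
    int kw = 1;
};

// Skip MLIR for 3x3 convolutions that would otherwise use winograd,
// except on Navi where MLIR stays enabled.
bool skip_mlir_conv(const conv_desc& c, bool is_navi)
{
    return c.kh == 3 && c.kw == 3 && !is_navi;
}

// Skip MLIR for gemms with a large reduction dimension (the same
// K > 2048 cutoff used for CK).
bool skip_mlir_gemm(long k) { return k > 2048; }
```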

@codecov

codecov bot commented Oct 3, 2023

Codecov Report

Merging #2274 (fa39526) into develop (f8bf7bd) will not change coverage.
The diff coverage is n/a.

❗ Current head fa39526 differs from pull request most recent head 249b21d. Consider uploading reports for the commit 249b21d to get more accurate results

@@           Coverage Diff            @@
##           develop    #2274   +/-   ##
========================================
  Coverage    91.29%   91.29%           
========================================
  Files          436      436           
  Lines        16335    16335           
========================================
  Hits         14913    14913           
  Misses        1422     1422           

@migraphx-bot
Collaborator

migraphx-bot commented Oct 3, 2023

Test | Batch | Rate new (249b21) | Rate old (e27efe) | Diff | Compare
torchvision-resnet50 64 2,855.97 2,324.21 22.88% 🔆
torchvision-resnet50_fp16 64 6,486.60 5,360.67 21.00% 🔆
torchvision-densenet121 32 2,112.56 1,839.62 14.84% 🔆
torchvision-densenet121_fp16 32 3,696.11 3,399.12 8.74% 🔆
torchvision-inceptionv3 32 1,598.91 1,292.92 23.67% 🔆
torchvision-inceptionv3_fp16 32 2,592.14 2,541.50 1.99%
cadene-inceptionv4 16 707.87 620.79 14.03% 🔆
cadene-resnext64x4 16 698.30 590.07 18.34% 🔆
slim-mobilenet 64 8,352.98 7,208.36 15.88% 🔆
slim-nasnetalarge 64 227.50 236.62 -3.85% 🔴
slim-resnet50v2 64 2,678.01 2,558.40 4.68% 🔆
bert-mrpc-onnx 8 825.08 825.21 -0.02%
bert-mrpc-tf 1 388.94 388.05 0.23%
pytorch-examples-wlang-gru 1 293.53 295.10 -0.53%
pytorch-examples-wlang-lstm 1 303.19 315.61 -3.94% 🔴
torchvision-resnet50_1 1 605.51 550.88 9.92% 🔆
torchvision-inceptionv3_1 1 338.66 301.09 12.48% 🔆
cadene-dpn92_1 1 396.89 351.13 13.03% 🔆
cadene-resnext101_1 1 329.39 220.86 49.14% 🔆
slim-vgg16_1 1 463.58 224.10 106.86% 🔆
slim-mobilenet_1 1 2,075.47 1,482.35 40.01% 🔆
slim-inceptionv4_1 1 216.79 217.99 -0.55%
onnx-taau-downsample 1 306.42 306.69 -0.09%
dlrm-criteoterabyte 1 21.71 21.72 -0.04%
dlrm-criteoterabyte_fp16 1 40.75 40.76 -0.05%
agentmodel 1 5,739.31 5,786.20 -0.81%
unet_fp16 2 56.01 55.81 0.37%
resnet50v1_fp16 1 920.45 756.60 21.66% 🔆
bert_base_cased_fp16 64 971.51 971.33 0.02%
bert_large_uncased_fp16 32 305.31 305.14 0.05%
bert_large_fp16 1 166.95 166.81 0.09%
distilgpt2_fp16 16 1,279.45 1,280.57 -0.09%

This build is not recommended to merge 🔴

@migraphx-bot
Collaborator

migraphx-bot commented Oct 3, 2023


✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
🔴 torchvision-inceptionv3_1: FAILED: MIGraphX is not within tolerance - check verbose output
✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance
✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance
✅ slim-vgg16_1: PASSED: MIGraphX meets tolerance
✅ slim-mobilenet_1: PASSED: MIGraphX meets tolerance
🔴 slim-inceptionv4_1: FAILED: MIGraphX is not within tolerance - check verbose output
✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
✅ agentmodel: PASSED: MIGraphX meets tolerance
✅ unet: PASSED: MIGraphX meets tolerance
✅ resnet50v1: PASSED: MIGraphX meets tolerance
🔴 bert_base_cased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
✅ bert_large: PASSED: MIGraphX meets tolerance
🔴 distilgpt2_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

@giuseros

giuseros commented Oct 6, 2023

Hello, would it be possible to have a use-rocmlir-always flag/environment variable? My point is that we would like to spot performance issues to improve rocMLIR and that flag would help us a lot in such investigations :)

@krzysz00
Contributor

@pfultz2 Wanted to poke the status of improving this PR to match the heuristic ticket, since we're getting close to the branch date

@pfultz2
Collaborator Author

pfultz2 commented Oct 10, 2023

Wanted to poke the status of improving this PR to match the heuristic ticket, since we're getting close to the branch date

Which ticket is that?

@krzysz00
Contributor

@pfultz2 It's ticket #2296, linked above, summarizing our meeting Thursday

@pfultz2 pfultz2 changed the title Skip MLIR for 3x3 convolutions and large K gemms Enable MLIR by default for more cases Oct 10, 2023
@pfultz2
Collaborator Author

pfultz2 commented Oct 10, 2023

@krzysz00 I updated the PR.

@pfultz2
Collaborator Author

pfultz2 commented Oct 10, 2023

Hello, would it be possible to have a use-rocmlir-always flag/environment variable? My point is that we would like to spot performance issues to improve rocMLIR and that flag would help us a lot in such investigations :)

Using MIGRAPHX_MLIR_USE_SPECIFIC_OPS will keep it always enabled.

{
match::find_matches(mpm, find_mlir_standalone_dot_op{});
}
mlir_mode mode = enabled(MIGRAPHX_ENABLE_MLIR{}) ? mlir_mode::fast : mlir_mode::none;
Contributor

Hold on, since we now have MIGRAPHX_DISABLE_MLIR, shouldn't this be keying off of _DISABLE_MLIR and not _ENABLE_MLIR?

Collaborator Author

No, DISABLE_MLIR will disable MLIR completely (and it's already handled in target.cpp), whereas ENABLE_MLIR will enable it for gemm fusions, which are not on by default because MLIR is not always faster there.
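As a sketch of that precedence (a hypothetical helper, not the actual target.cpp/fuse_mlir.cpp code): MIGRAPHX_DISABLE_MLIR wins unconditionally, and MIGRAPHX_ENABLE_MLIR only opts in the floating-point gemm fusions:

```cpp
#include <cassert>
#include <cstdlib>

static bool env_set(const char* name) { return std::getenv(name) != nullptr; }

// Hypothetical summary of the described behavior: conv/int8 fusions are
// on by default, float gemm fusions only with MIGRAPHX_ENABLE_MLIR, and
// MIGRAPHX_DISABLE_MLIR overrides everything.
bool use_mlir(bool is_fp_gemm_fusion)
{
    if(env_set("MIGRAPHX_DISABLE_MLIR"))
        return false;
    if(is_fp_gemm_fusion)
        return env_set("MIGRAPHX_ENABLE_MLIR");
    return true;
}
```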

Contributor

Ah. So maybe we should rename it to something like MIGRAPHX_ENABLE_MLIR_GEMM_FUSION, because it's a very confusing variable name.

}

struct find_mlir_fused_ops
{
mlir_mode conv_mode = mlir_mode::none;
mlir_mode dot_mode = mlir_mode::none;
Contributor

Nit: shouldn't we have an explicit constructor? Or /*conv_mode=*/ comments when we create this?

Collaborator Author

Let me see if designated initializers work here (an explicit constructor won't allow that, though).
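For reference, designated initializers do work for an aggregate like this (C++20, and accepted as an extension by GCC/Clang in earlier modes), whereas a user-declared constructor would make the type a non-aggregate and forbid them:

```cpp
#include <cassert>

enum class mlir_mode { all, fast, none };

// Aggregate as in the PR: no user-declared constructor, so designated
// initializers are allowed at every construction site.
struct find_mlir_fused_ops
{
    mlir_mode conv_mode = mlir_mode::none;
    mlir_mode dot_mode  = mlir_mode::none;
};
```

Each field is then named at the call site, e.g. `find_mlir_fused_ops{.conv_mode = mlir_mode::fast}`, which addresses the readability concern without `/*conv_mode=*/` comments.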

Contributor

@krzysz00 krzysz00 left a comment

I'm approving on the assumption that the renaming in this review gets made before merge

@@ -37,23 +37,13 @@ struct module;
namespace gpu {

MIGRAPHX_DECLARE_ENV_VAR(MIGRAPHX_ENABLE_MLIR);
Contributor

Wanted to note the agreement from yesterday's meeting that this becomes MIGRAPHX_ENABLE_EXTRA_MLIR

@pfultz2
Collaborator Author

pfultz2 commented Oct 13, 2023

@causten Fixed the mlir tests.

@pfultz2
Collaborator Author

pfultz2 commented Oct 14, 2023

So the CI failure looks like a bug in MLIR; here is the backtrace from a debug build of MLIR:

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:185:46 in 
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:194:6: runtime error: member call on address 0x7f2860000cc0 which does not point to an object of type 'std::_Sp_counted_base<>'
0x7f2860000cc0: note: object has invalid vptr
 00 00 00 00  e8 e8 5b 11 2b 7f 00 00  00 00 00 00 00 00 00 00  e0 0c 00 60 28 7f 00 00  01 00 00 00
              ^~~~~~~~~~~~~~~~~~~~~~~
              invalid vptr
    #0 0x7f2b0a64a4f3 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:194:6
    #1 0x7f2b0c5a5919 in std::__weak_count<(__gnu_cxx::_Lock_policy)2>::~__weak_count() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:823:11
    #2 0x7f2b106f1ca8 in std::__weak_ptr<mlir::StorageUniquer::StorageAllocator*, (__gnu_cxx::_Lock_policy)2>::~__weak_ptr() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:1596:29
    #3 0x7f2b106f1c84 in std::weak_ptr<mlir::StorageUniquer::StorageAllocator*>::~weak_ptr() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr.h:531:11
    #4 0x7f2b106f1b8b in llvm::DenseMapBase<llvm::SmallDenseMap<mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::PerInstanceState*, std::weak_ptr<mlir::StorageUniquer::StorageAllocator*>, 4u, llvm::DenseMapInfo<mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::PerInstanceState*, void>, llvm::detail::DenseMapPair<mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::PerInstanceState*, std::weak_ptr<mlir::StorageUniquer::StorageAllocator*>>>, mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::PerInstanceState*, std::weak_ptr<mlir::StorageUniquer::StorageAllocator*>, llvm::DenseMapInfo<mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::PerInstanceState*, void>, llvm::detail::DenseMapPair<mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::PerInstanceState*, std::weak_ptr<mlir::StorageUniquer::StorageAllocator*>>>::destroyAll() /usr/local/cget/build/tmp-d05526b9888442d280f9561487d0cd9b/rocMLIR-12748a3402c069f733ea7f2ba1f8d8a070b3622a/external/llvm-project/llvm/include/llvm/ADT/DenseMap.h:395:25
    #5 0x7f2b106f0b98 in llvm::SmallDenseMap<mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::PerInstanceState*, std::weak_ptr<mlir::StorageUniquer::StorageAllocator*>, 4u, llvm::DenseMapInfo<mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::PerInstanceState*, void>, llvm::detail::DenseMapPair<mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::PerInstanceState*, std::weak_ptr<mlir::StorageUniquer::StorageAllocator*>>>::~SmallDenseMap() /usr/local/cget/build/tmp-d05526b9888442d280f9561487d0cd9b/rocMLIR-12748a3402c069f733ea7f2ba1f8d8a070b3622a/external/llvm-project/llvm/include/llvm/ADT/DenseMap.h:960:11
    #6 0x7f2b106f0132 in mlir::ThreadLocalCache<mlir::StorageUniquer::StorageAllocator*>::CacheType::~CacheType() /usr/local/cget/build/tmp-d05526b9888442d280f9561487d0cd9b/rocMLIR-12748a3402c069f733ea7f2ba1f8d8a070b3622a/external/llvm-project/mlir/include/mlir/Support/ThreadLocalCache.h:67:5
    #7 0x7f2ab93c02be in __call_tls_dtors /build/glibc-SzIz7B/glibc-2.31/stdlib/cxa_thread_atexit_impl.c:155:7
    #8 0x7f2ab98e3616 in start_thread /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:485:5
    #9 0x7f2ab9498132 in __clone /build/glibc-SzIz7B/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95

From the MLIR trace, it comes from programs like this:

module {
  func.func @mlir_convolution(%arg0: tensor<1x512x28x28xf32>, %arg1: tensor<256x512x1x1xf32>) -> tensor<1x256x28x28xf32> attributes {arch = "gfx908:sramecc+:xnack-", kernel = "mixr", num_cu = 120 : i64} {
    %0 = migraphx.convolution(%arg0, %arg1) {dilation = [1, 1], group = 1 : i64, padding = [0, 0, 0, 0], padding_mode = 0 : i64, stride = [1, 1]} : (tensor<1x512x28x28xf32>, tensor<256x512x1x1xf32>) -> tensor<1x256x28x28xf32>
    return %0 : tensor<1x256x28x28xf32>
  }
}

#map = affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d3)>
#transform_map = #rock.transform_map<#map by [<PassThrough ["dim0", "dim1", "dim2", "dim3"] at [0, 1, 2, 3] -> ["dim0", "dim1", "dim2", "dim3"] at [0, 1, 2, 3]>, <AddDim{1} ["g"] at [4] -> [] at []>] bounds = [1, 512, 28, 28, 1] -> [1, 512, 28, 28]>
#transform_map1 = #rock.transform_map<#map by [<PassThrough ["dim0", "dim1", "dim2", "dim3"] at [0, 1, 2, 3] -> ["dim0", "dim1", "dim2", "dim3"] at [0, 1, 2, 3]>, <AddDim{1} ["g"] at [4] -> [] at []>] bounds = [256, 512, 1, 1, 1] -> [256, 512, 1, 1]>
#transform_map2 = #rock.transform_map<#map by [<PassThrough ["dim0", "dim1", "dim2", "dim3"] at [0, 1, 2, 3] -> ["dim0", "dim1", "dim2", "dim3"] at [0, 1, 2, 3]>, <AddDim{1} ["g"] at [4] -> [] at []>] bounds = [1, 256, 28, 28, 1] -> [1, 256, 28, 28]>
module {
  func.func @mlir_convolution(%arg0: memref<1x512x28x28xf32>, %arg1: memref<256x512x1x1xf32>, %arg2: memref<1x256x28x28xf32>) attributes {arch = "gfx908:sramecc+:xnack-", kernel = "mixr", num_cu = 120 : i64} {
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<1x256x28x28xf32>
    %0 = rock.transform %arg0 by #transform_map : memref<1x512x28x28xf32> to memref<1x512x28x28x1xf32>
    %1 = rock.transform %arg1 by #transform_map1 : memref<256x512x1x1xf32> to memref<256x512x1x1x1xf32>
    %2 = rock.transform %alloc by #transform_map2 : memref<1x256x28x28xf32> to memref<1x256x28x28x1xf32>
    rock.conv2d(%1, %0, %2) features =  mfma|dot|atomic_add {arch = "gfx908:sramecc+:xnack-", dilations = [1 : i32, 1 : i32], filter_layout = ["k", "c", "y", "x", "g"], input_layout = ["ni", "ci", "hi", "wi", "gi"], numCU = 120 : i32, output_layout = ["no", "ko", "ho", "wo", "go"], padding = [0 : i32, 0 : i32, 0 : i32, 0 : i32], perf_config = "16,16,8,16,16,8,1,1", strides = [1 : i32, 1 : i32]} : memref<256x512x1x1x1xf32>, memref<1x512x28x28x1xf32>, memref<1x256x28x28x1xf32>
    memref.copy %alloc, %arg2 : memref<1x256x28x28xf32> to memref<1x256x28x28xf32>
    return
  }
}

Also running with helgrind, I do see some data races in MLIR:

==694012== Possible data race during read of size 8 at 0x1222B868 by thread #159
==694012== Locks held: none
==694012==    at 0xC6AFFA0: llvm::Target::createMCAsmBackend(llvm::MCSubtargetInfo const&, llvm::MCRegisterInfo const&, llvm::MCTargetOptions const&) const (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0xFA8EDAC: (anonymous namespace)::SerializeToHsacoPass::assembleIsa(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0xFA8D363: (anonymous namespace)::SerializeToHsacoPass::serializeISA(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0xEDEA287: mlir::gpu::SerializeToBlobPass::runOnOperation() (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10CFF62A: mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1::operator()() const (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10CFF5C4: void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0xC406C18: llvm::function_ref<void ()>::operator()() const (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10D023F4: void mlir::MLIRContext::executeAction<mlir::PassExecutionAction, mlir::Pass&>(llvm::function_ref<void ()>, llvm::ArrayRef<mlir::IRUnit>, mlir::Pass&) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10CFAE42: mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10CFB3C3: mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10D00874: mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0::operator()(mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo&) const (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10D004F8: mlir::LogicalResult mlir::failableParallelForEach<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0&>(mlir::MLIRContext*, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0&) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==  Address 0x1222b868 is 104 bytes inside data symbol "_ZZN4llvm15getTheGCNTargetEvE12TheGCNTarget"

==694012== Possible data race during read of size 8 at 0x1222B890 by thread #159
==694012== Locks held: none
==694012==    at 0xC6AFF1C: llvm::Target::createMCCodeEmitter(llvm::MCInstrInfo const&, llvm::MCContext&) const (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0xFA8ED5B: (anonymous namespace)::SerializeToHsacoPass::assembleIsa(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0xFA8D363: (anonymous namespace)::SerializeToHsacoPass::serializeISA(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0xEDEA287: mlir::gpu::SerializeToBlobPass::runOnOperation() (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10CFF62A: mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1::operator()() const (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10CFF5C4: void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0xC406C18: llvm::function_ref<void ()>::operator()() const (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10D023F4: void mlir::MLIRContext::executeAction<mlir::PassExecutionAction, mlir::Pass&>(llvm::function_ref<void ()>, llvm::ArrayRef<mlir::IRUnit>, mlir::Pass&) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10CFAE42: mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10CFB3C3: mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10D00874: mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0::operator()(mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo&) const (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==    by 0x10D004F8: mlir::LogicalResult mlir::failableParallelForEach<__gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0&>(mlir::MLIRContext*, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, __gnu_cxx::__normal_iterator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo*, std::vector<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo, std::allocator<mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::OpPMInfo> > >, mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool)::$_0&) (in /data/build/lib/libmigraphx_gpu.so.2.8.0)
==694012==  Address 0x1222b890 is 144 bytes inside data symbol "_ZZN4llvm15getTheGCNTargetEvE12TheGCNTarget"

So I am not sure if this is a multithreaded issue, since it still produces the error when MIGRAPHX_GPU_COMPILE_PARALLEL=1 is used.

@causten causten merged commit 650ba45 into develop Oct 16, 2023
15 checks passed
@causten causten deleted the mlir-fast-check branch October 16, 2023 03:31
@krzysz00
Contributor

@pfultz2 Because rocMLIR uses LLVM internally, ubsan builds of rocMLIR need to use LLVM's ubsan flags, which include -fno-sanitize=vptr,function (or something more permissive).

That is, an invalid-vptr error is expected and that check should be disabled; to my knowledge, it is not possible to fix this.

I set up the MLIR debug CI to pass those options in MLIR builds.

@krzysz00
Contributor

Also, I'm not sure the race is related
