[compilation][cpu]: failed to legalize operation onnx.Multinomial #18268

Closed · pdhirajkumarprasad opened this issue Aug 19, 2024 · 18 comments
Labels: bug 🐞 Something isn't working · hal/cpu Runtime Host/CPU-based HAL backend · integrations/onnx ONNX integration work

@pdhirajkumarprasad commented Aug 19, 2024:

What happened?

For the given IR:

module {
  func.func @"torch-jit-export"( %arg6: !torch.vtensor<[?,4],f32>) -> (!torch.vtensor<[?,1],si64>) attributes {torch.onnx_meta.ir_version = 6 : si64, torch.onnx_meta.opset_version = 21 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "1.7"} {
    %82 = torch.operator "onnx.Multinomial"(%arg6) {torch.onnx.dtype = 7 : si64, torch.onnx.sample_size = 1 : si64} : (!torch.vtensor<[?,4],f32>) -> !torch.vtensor<[?,1],si64> 
    return %82: !torch.vtensor<[?,1],si64>
  }
}

I get the following error:

t1.mlir:3:11: error: 'arith.addf' op requires the same type for all operands and results
    %82 = torch.operator "onnx.Multinomial"(%arg6) {torch.onnx.dtype = 7 : si64, torch.onnx.sample_size = 1 : si64} : (!torch.vtensor<[?,4],f32>) -> !torch.vtensor<[?,1],si64> 
          ^
// -----// IR Dump After ConvertTorchToLinalg Failed (convert-torch-to-linalg) //----- //
"func.func"() <{function_type = (!torch.vtensor<[?,4],f32>) -> !torch.vtensor<[?,1],si64>, sym_name = "torch-jit-export"}> ({
^bb0(%arg0: !torch.vtensor<[?,4],f32> loc("t1.mlir":2:34)):
  %0 = "torch_c.to_builtin_tensor"(%arg0) : (!torch.vtensor<[?,4],f32>) -> tensor<?x4xf32> loc("t1.mlir":3:11)
  %1 = "torch.constant.int"() <{value = 1 : i64}> : () -> !torch.int loc("t1.mlir":3:11)
  %2 = "arith.constant"() <{value = 1 : i64}> : () -> i64 loc("t1.mlir":3:11)
  %3 = "torch.constant.none"() : () -> !torch.none loc("t1.mlir":3:11)
  %4 = "torch.constant.bool"() <{value = true}> : () -> !torch.bool loc("t1.mlir":3:11)
  %5 = "arith.constant"() <{value = 0 : i64}> : () -> i64 loc("t1.mlir":3:11)
  %6 = "arith.constant"() <{value = 1 : i64}> : () -> i64 loc("t1.mlir":3:11)
  %7 = "arith.constant"() <{value = 0 : index}> : () -> index loc("t1.mlir":3:11)
  %8 = "arith.constant"() <{value = 1 : index}> : () -> index loc("t1.mlir":3:11)
  %9 = "arith.index_cast"(%2) : (i64) -> index loc("t1.mlir":3:11)
  %10 = "tensor.dim"(%0, %7) : (tensor<?x4xf32>, index) -> index loc("t1.mlir":3:11)
  %11 = "tensor.dim"(%0, %8) : (tensor<?x4xf32>, index) -> index loc("t1.mlir":3:11)

IREE Version:
IREE compiler version 20240819.990 @ aeda149
LLVM version 20.0.0git

Steps to reproduce your issue

Command to reproduce the issue:

iree-compile --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=llvm-cpu model.torch_onnx.mlir

What component(s) does this issue relate to?

Compiler

Version information

No response

Additional context

No response

pdhirajkumarprasad added the bug 🐞 Something isn't working label Aug 19, 2024
@nirvedhmeshram (Contributor) commented:

@pdhirajkumarprasad For this and all the other bugs you filed recently, could you please add the --mlir-print-ir-after-all flag to the iree-compile command, copy the dump to a file, and share a link on the issue? It will help us understand which layer the issue is in and who to assign it to. cc @MaheshRavishankar
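
For reference, a command along these lines should produce the dump (this just combines the flag named above with the repro command from this issue; MLIR's IR printing goes to stderr, so redirect that to a file):

iree-compile --mlir-print-ir-after-all --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=llvm-cpu model.torch_onnx.mlir 2> ir_dump.txt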

lialan added the hal/cpu Runtime Host/CPU-based HAL backend label Aug 19, 2024
ScottTodd added the integrations/onnx ONNX integration work label Aug 19, 2024
@pdhirajkumarprasad (Author) commented:

> @pdhirajkumarprasad For this and all the other bugs you filed recently, could you please add the --mlir-print-ir-after-all flag to the iree-compile command, copy the dump to a file, and share a link on the issue? It will help us understand which layer the issue is in and who to assign it to. cc @MaheshRavishankar

@nirvedhmeshram I have updated all my issues with the stage where they fail, along with the version used.

@pdhirajkumarprasad (Author) commented:

Also, this may be related to llvm/torch-mlir#3630.

@lialan (Contributor) commented Aug 20, 2024:

@zjgarvey @rsuderman Can you provide some context on this issue? As @pdhirajkumarprasad mentioned, it is possible that it is caused by llvm/torch-mlir#3630.

@zjgarvey (Contributor) commented Aug 20, 2024:

Yeah, even with the PR linked above, this op will still fail to compile. I'll post a linalg reproducer for the next compilation issue here today. This is another op that compiles successfully through our ref-backend pipeline in torch-mlir e2e testing, but not through iree-compile.

@pashu123 (Contributor) commented:

@zjgarvey The error showing here is orthogonal to the IREE issue: the op is not being converted from torch to linalg. Maybe the lowering was added recently and the torch-mlir submodule hasn't been bumped in IREE.

@zjgarvey (Contributor) commented:

@pashu123 The PR llvm/torch-mlir#3630 resolves the torch-mlir error.

@zjgarvey (Contributor) commented:

If someone reviews that PR, we can get it merged and work on the next compilation issue for this op.

@lialan (Contributor) commented Aug 20, 2024:

@zjgarvey So @pashu123 has approved the PR, and it looks clear. Please proceed.

@zjgarvey (Contributor) commented:

Ah, right. Here is the IR generated from an e2e test in torch-mlir: "MultinomialModule2D_basic".

This compiles successfully through the ref-backend pipeline in torch-mlir, but fails to compile with IREE for llvm-cpu.

#map = affine_map<(d0, d1) -> (d0, d1)>
#map1 = affine_map<(d0, d1) -> (d0)>
#map2 = affine_map<(d0, d1) -> ()>
#map3 = affine_map<() -> ()>
module attributes {torch.debug_module_name = "MultinomialModule2D"} {
  ml_program.global private mutable @global_seed(dense<0> : tensor<i64>) : tensor<i64>
  func.func @forward(%arg0: tensor<?x?xf64>) -> tensor<f64> {
    %c1048576_i64 = arith.constant 1048576 : i64
    %c0_i64 = arith.constant 0 : i64
    %c1_i64 = arith.constant 1 : i64
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %cst = arith.constant 0.000000e+00 : f64
    %c32_i64 = arith.constant 32 : i64
    %c2_i64 = arith.constant 2 : i64
    %c1048576 = arith.constant 1048576 : index
    %cst_0 = arith.constant 5.4210107999999998E-20 : f64
    %c6364136223846793005_i64 = arith.constant 6364136223846793005 : i64
    %c1442695040888963407_i64 = arith.constant 1442695040888963407 : i64
    %dim = tensor.dim %arg0, %c0 : tensor<?x?xf64>
    %dim_1 = tensor.dim %arg0, %c1 : tensor<?x?xf64>
    %0 = arith.index_cast %dim : index to i64
    %1 = arith.index_cast %dim_1 : index to i64
    %2 = tensor.empty(%dim) : tensor<?x1048576xi64>
    %3 = tensor.empty(%dim) : tensor<?xf64>
    %4 = linalg.fill ins(%cst : f64) outs(%3 : tensor<?xf64>) -> tensor<?xf64>
    %5 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "reduction"]} ins(%arg0 : tensor<?x?xf64>) outs(%4 : tensor<?xf64>) {
    ^bb0(%in: f64, %out: f64):
      %13 = arith.addf %in, %out : f64
      linalg.yield %13 : f64
    } -> tensor<?xf64>
    %6 = scf.for %arg1 = %c0_i64 to %0 step %c1_i64 iter_args(%arg2 = %2) -> (tensor<?x1048576xi64>)  : i64 {
      %13 = arith.index_cast %arg1 : i64 to index
      %extracted = tensor.extract %5[%13] : tensor<?xf64>
      %14 = tensor.empty(%dim_1) : tensor<?xf64>
      %15 = scf.for %arg3 = %c0_i64 to %1 step %c1_i64 iter_args(%arg4 = %14) -> (tensor<?xf64>)  : i64 {
        %19 = arith.cmpi sgt, %arg3, %c0_i64 : i64
        %20 = arith.index_cast %arg3 : i64 to index
        %extracted_4 = tensor.extract %arg0[%13, %20] : tensor<?x?xf64>
        %21 = arith.divf %extracted_4, %extracted : f64
        %22 = scf.if %19 -> (f64) {
          %23 = arith.subi %arg3, %c1_i64 : i64
          %24 = arith.index_cast %23 : i64 to index
          %extracted_6 = tensor.extract %arg4[%24] : tensor<?xf64>
          %25 = arith.addf %21, %extracted_6 : f64
          scf.yield %25 : f64
        } else {
          scf.yield %21 : f64
        }
        %inserted_5 = tensor.insert %22 into %arg4[%20] : tensor<?xf64>
        scf.yield %inserted_5 : tensor<?xf64>
      }
      %global_seed = ml_program.global_load @global_seed : tensor<i64>
      %extracted_3 = tensor.extract %global_seed[] : tensor<i64>
      %16 = arith.muli %extracted_3, %c6364136223846793005_i64 : i64
      %17 = arith.addi %16, %c1442695040888963407_i64 : i64
      %inserted = tensor.insert %17 into %global_seed[] : tensor<i64>
      ml_program.global_store @global_seed = %inserted : tensor<i64>
      %18 = scf.for %arg3 = %c0_i64 to %c1048576_i64 step %c1_i64 iter_args(%arg4 = %arg2) -> (tensor<?x1048576xi64>)  : i64 {
        %19 = arith.muli %arg3, %17 : i64
        %20 = arith.addi %19, %17 : i64
        %21 = arith.muli %19, %19 : i64
        %22 = arith.addi %21, %19 : i64
        %23 = arith.shli %22, %c32_i64 : i64
        %24 = arith.shrui %22, %c32_i64 : i64
        %25 = arith.ori %23, %24 : i64
        %26 = arith.muli %25, %25 : i64
        %27 = arith.addi %26, %20 : i64
        %28 = arith.shli %27, %c32_i64 : i64
        %29 = arith.shrui %27, %c32_i64 : i64
        %30 = arith.ori %28, %29 : i64
        %31 = arith.muli %30, %30 : i64
        %32 = arith.addi %31, %19 : i64
        %33 = arith.shli %32, %c32_i64 : i64
        %34 = arith.shrui %32, %c32_i64 : i64
        %35 = arith.ori %33, %34 : i64
        %36 = arith.muli %35, %35 : i64
        %37 = arith.addi %36, %20 : i64
        %38 = arith.shli %37, %c32_i64 : i64
        %39 = arith.shrui %37, %c32_i64 : i64
        %40 = arith.ori %38, %39 : i64
        %41 = arith.muli %40, %40 : i64
        %42 = arith.addi %41, %19 : i64
        %43 = arith.shrui %42, %c32_i64 : i64
        %44 = arith.xori %37, %43 : i64
        %45 = arith.uitofp %44 : i64 to f64
        %46 = arith.mulf %45, %cst_0 : f64
        %47 = arith.addf %46, %cst : f64
        %48:2 = scf.while (%arg5 = %c0_i64, %arg6 = %1) : (i64, i64) -> (i64, i64) {
          %50 = arith.cmpi sgt, %arg6, %arg5 : i64
          scf.condition(%50) %arg5, %arg6 : i64, i64
        } do {
        ^bb0(%arg5: i64, %arg6: i64):
          %50 = arith.subi %arg6, %arg5 : i64
          %51 = arith.divsi %50, %c2_i64 : i64
          %52 = arith.addi %arg5, %51 : i64
          %53 = arith.index_cast %52 : i64 to index
          %extracted_5 = tensor.extract %15[%53] : tensor<?xf64>
          %54 = arith.cmpf olt, %extracted_5, %47 : f64
          %55 = arith.select %54, %arg6, %52 : i64
          %56 = scf.if %54 -> (i64) {
            %57 = arith.addi %52, %c1_i64 : i64
            scf.yield %57 : i64
          } else {
            scf.yield %arg5 : i64
          }
          scf.yield %56, %55 : i64, i64
        }
        %49 = arith.index_cast %arg3 : i64 to index
        %inserted_4 = tensor.insert %48#0 into %arg4[%13, %49] : tensor<?x1048576xi64>
        scf.yield %inserted_4 : tensor<?x1048576xi64>
      }
      scf.yield %18 : tensor<?x1048576xi64>
    }
    %7 = tensor.empty() : tensor<f64>
    %8 = linalg.fill ins(%cst : f64) outs(%7 : tensor<f64>) -> tensor<f64>
    %9 = linalg.generic {indexing_maps = [#map, #map2], iterator_types = ["reduction", "reduction"]} ins(%6 : tensor<?x1048576xi64>) outs(%8 : tensor<f64>) {
    ^bb0(%in: i64, %out: f64):
      %13 = arith.sitofp %in : i64 to f64
      %14 = arith.addf %13, %out : f64
      linalg.yield %14 : f64
    } -> tensor<f64>
    %dim_2 = tensor.dim %6, %c0 : tensor<?x1048576xi64>
    %10 = arith.muli %dim_2, %c1048576 : index
    %11 = arith.index_cast %10 : index to i64
    %12 = linalg.generic {indexing_maps = [#map3, #map3], iterator_types = []} ins(%9 : tensor<f64>) outs(%7 : tensor<f64>) {
    ^bb0(%in: f64, %out: f64):
      %13 = arith.sitofp %11 : i64 to f64
      %14 = arith.divf %in, %13 : f64
      linalg.yield %14 : f64
    } -> tensor<f64>
    return %12 : tensor<f64>
  }
}

repro:

iree-compile --iree-hal-target-backends=llvm-cpu repro.mlir -o repro.vmfb

hits an assertion:

iree-compile: iree/compiler/src/iree/compiler/Dialect/Util/Transforms/HoistIntoGlobals.cpp:105: auto mlir::iree_compiler::IREE::Util::(anonymous namespace)::HoistIntoGlobalsPass::runOnOperation()::(anonymous class)::operator()(mlir::Operation *) const: Assertion `resultInfo && "must have const-expr info"' failed.

@MaheshRavishankar (Contributor) commented:

Ok... we haven't looked at programs like this. I have no idea what happens when we compile this. Might take some effort. Would be good to get a signal on the priority of this.

@MaheshRavishankar (Contributor) commented:

Assigned it to me for now. It will take me a bit to look at this, but I can reprioritize it if needed.

@zjgarvey (Contributor) commented:

> Ok... we haven't looked at programs like this. I have no idea what happens when we compile this. Might take some effort. Would be good to get a signal on the priority of this.

It's a failure in one of the MIGraphX models, so I imagine it's moderate to high priority.

@MaheshRavishankar (Contributor) commented:

Ok, but this seems like it is pretty much lowered to scalar code from the outset... I am not sure what the computation is and whether we can lower it to something like linalg. All the scf.if and scf.for at this level is not what you want coming into IREE. There isn't anything really good we can do with such code (and yes, we should be able to compile it and get shitty performance, but that will take cycles).

@benvanik (Collaborator) commented:

Yeah, that loop is going to be a problem (it's going to run entirely on the host in the VM interpreter) - tensorizing may help a bit, but it really just shouldn't be scalarized like that.

@zjgarvey (Contributor) commented:

For context:

The outermost scf loop is essentially over batches. Inside that, the first scf loop computes a cumulative sum of the input distribution for the given batch; maybe this could be converted to a TM tensor scan op instead. This step generates a cumulative distribution function (CDF) from the input distribution.

Afterwards, for each desired sample, it computes a "random" probability value in [0, 1], then performs a binary search through the CDF outputs to find the last event whose cumulative probability is less than or equal to the randomly selected probability value.
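
As an illustration of that scheme, here is a minimal NumPy sketch (a hypothetical reference implementation with made-up names, not the actual torch-mlir lowering; it uses the common "first CDF entry >= r" convention for the binary search):

import numpy as np

def multinomial_sample(probs, num_samples, rng=None):
    """Per batch row: normalize, build a CDF via cumulative sum,
    draw a uniform value in [0, 1), and binary-search the CDF
    for the sampled event index."""
    if rng is None:
        rng = np.random.default_rng()
    batch = probs.shape[0]
    out = np.empty((batch, num_samples), dtype=np.int64)
    for b in range(batch):
        # Cumulative distribution function of the normalized row.
        cdf = np.cumsum(probs[b] / probs[b].sum())
        for s in range(num_samples):
            r = rng.random()
            # Binary search: first index whose cumulative
            # probability reaches r.
            out[b, s] = np.searchsorted(cdf, r, side="left")
    return out

# Example: one sample from each of two 4-event distributions,
# matching the [?,4] input / [?,1] output of the original repro.
print(multinomial_sample(np.array([[0.1, 0.2, 0.3, 0.4],
                                   [0.25, 0.25, 0.25, 0.25]]), 1))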

@benvanik (Collaborator) commented:

I'd start by looking at converting the ops if there are equivalents - it's absolutely required that those loops end up inside a dispatch region, and doing that with linalg ops (or things that lower to them, or something adjacent) is pretty much the only way. Basic loops around tensor operations can be ok (though we have some things to do to make those better), but doing global stores inside the loop, plus any tensor.extract/tensor.insert_slice, is in "it'd be faster to do this work with pen and paper" territory and a big red flag when looking at models. You can get in the habit of grepping for tensor.extract, and if you find any of them, know that your performance is going to be somewhere between bad and indistinguishable-from-hangs :P
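
For instance, a command like this over an IR dump would surface them (a hypothetical invocation; adjust the file name to wherever your dump lives):

grep -n 'tensor\.extract' ir_dump.txt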

@pdhirajkumarprasad (Author) commented:

Closing this one; nod-ai/SHARK-ModelDev#854 is open to track the front-end issue.
