
tileConsumerAndFuseProducersUsingScf stuck in an infinite loop #18875

Closed
pashu123 opened this issue Oct 23, 2024 · 5 comments
Labels: bug 🐞 Something isn't working

@pashu123 (Contributor)

What happened?

The pass calls tileConsumerAndFuseProducersUsingScf, which gets stuck in the while loop here: https://github.com/llvm/llvm-project/blob/ac5a2010ad35a72de3e75a1883e2495345b92a73/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp#L1482
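
For context, tileConsumerAndFuseProducersUsingScf drives fusion with a worklist of candidate tensor.extract_slice ops: it pops a candidate, tries to fuse the producer feeding it, and pushes any newly created slice candidates back onto the queue. The self-contained C++ sketch below is not the actual MLIR code (Candidate and tryFuse are hypothetical stand-ins); it only illustrates how such a while loop fails to terminate when fusing a candidate re-enqueues an identical candidate instead of making progress, which is a plausible mechanism for the hang seen with the tensor.unpack producer here, not a confirmed diagnosis.

  // Simplified model of a worklist-driven fusion loop. NOT the real
  // tileConsumerAndFuseProducersUsingScf; all names are illustrative only.
  #include <deque>
  #include <iostream>
  #include <optional>
  #include <string>

  struct Candidate {
    std::string producer; // op that produces the slice being considered
  };

  // Stand-in for the fusion step: on success, returns the new candidates
  // exposed by fusing the producer; on failure, returns std::nullopt.
  std::optional<std::deque<Candidate>> tryFuse(const Candidate &c) {
    // Model a producer whose "fusion" re-materializes the very candidate
    // it came from, so no forward progress is ever made.
    if (c.producer == "tensor.unpack")
      return std::deque<Candidate>{{"tensor.unpack"}};
    return std::nullopt;
  }

  int main() {
    std::deque<Candidate> worklist{{"tensor.unpack"}};
    size_t iterations = 0;
    while (!worklist.empty()) { // the loop that never drains
      Candidate c = worklist.front();
      worklist.pop_front();
      if (auto next = tryFuse(c))
        for (const Candidate &n : *next)
          worklist.push_back(n); // identical candidate goes right back in
      if (++iterations > 10) {   // guard so this demo terminates
        std::cout << "stuck: same candidate re-enqueued forever\n";
        break;
      }
    }
  }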

Steps to reproduce your issue

Input IR:

  func.func @time_out_dispatch_0_unpack_elementwise_1x1x1152_f32() attributes {translation_info = #iree_codegen.translation_info<CPUDoubleTilingExpert>} {
    %c0 = arith.constant 0 : index
    %0 = hal.interface.binding.subspan layout(<bindings = [#hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, Indirect>], flags = Indirect>) binding(0) alignment(64) offset(%c0) flags("ReadOnly|Indirect") : !flow.dispatch.tensor<readonly:tensor<1x1x288x8x4xf32>>
    %1 = hal.interface.binding.subspan layout(<bindings = [#hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, Indirect>], flags = Indirect>) binding(1) alignment(64) offset(%c0) flags("ReadOnly|Indirect") : !flow.dispatch.tensor<readonly:tensor<1x1x1152xf32>>
    %2 = hal.interface.binding.subspan layout(<bindings = [#hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, Indirect>], flags = Indirect>) binding(2) alignment(64) offset(%c0) flags(Indirect) : !flow.dispatch.tensor<writeonly:tensor<1x1x1152xf32>>
    %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0, 0], sizes = [1, 1, 288, 8, 4], strides = [1, 1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x1x288x8x4xf32>> -> tensor<1x1x288x8x4xf32>
    %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [1, 1, 1152], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x1x1152xf32>> -> tensor<1x1x1152xf32>
    %5 = tensor.empty() : tensor<1x1x1152xf32>
    %unpack = tensor.unpack %3 outer_dims_perm = [0, 1, 2] inner_dims_pos = [1, 2] inner_tiles = [8, 4] into %5 {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[0, 0, 1152], [1, 8, 16], [0, 0, 0], [0, 0, 0]]>} : tensor<1x1x288x8x4xf32> -> tensor<1x1x1152xf32>
    %6 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} ins(%4, %unpack : tensor<1x1x1152xf32>, tensor<1x1x1152xf32>) outs(%5 : tensor<1x1x1152xf32>) attrs =  {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[0, 0, 1152], [1, 8, 16], [0, 0, 0], [0, 0, 0]]>} {
    ^bb0(%in: f32, %in_0: f32, %out: f32):
      %7 = arith.addf %in, %in_0 : f32
      linalg.yield %7 : f32
    } -> tensor<1x1x1152xf32>
    flow.dispatch.tensor.store %6, %2, offsets = [0, 0, 0], sizes = [1, 1, 1152], strides = [1, 1, 1] : tensor<1x1x1152xf32> -> !flow.dispatch.tensor<writeonly:tensor<1x1x1152xf32>>
    return
  }

Pass:
iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-tile-and-distribute-to-workgroups-using-forall-op, cse))" --mlir-print-local-scope --split-input-file input.mlir

What component(s) does this issue relate to?

No response

Version information

No response

Additional context

No response

@pashu123 (Contributor, Author)

With the tensor.unpack rewritten to have explicit slicing semantics (the unpack writes the full 1x8x1152 tile and a separate tensor.extract_slice extracts the 1x1x1152 result), the pipeline completes:

  func.func @time_out(%arg0: tensor<1x1x288x8x4xf32>, %arg1: tensor<1152xf32>) -> tensor<1x1x1152xf32> {
    %0 = tensor.empty() : tensor<1x1x1152xf32>
    %1 = tensor.empty() : tensor<1x8x1152xf32>
    %unpack = tensor.unpack %arg0 outer_dims_perm = [0, 1, 2] inner_dims_pos = [1, 2] inner_tiles = [8, 4] into %1 {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[0, 0, 1152], [1, 8, 16], [0, 0, 0], [0, 0, 0]]>} : tensor<1x1x288x8x4xf32> -> tensor<1x8x1152xf32>
    %extracted_slice = tensor.extract_slice %unpack[0, 0, 0] [1, 1, 1152] [1, 1, 1] : tensor<1x8x1152xf32> to tensor<1x1x1152xf32>
    %2 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} ins(%arg1, %extracted_slice : tensor<1152xf32>, tensor<1x1x1152xf32>) outs(%0 : tensor<1x1x1152xf32>) {
    ^bb0(%in: f32, %in_0: f32, %out: f32):
      %3 = arith.addf %in, %in_0 : f32
      linalg.yield %3 : f32
    } -> tensor<1x1x1152xf32>
    return %2 : tensor<1x1x1152xf32>
  }

@Max191 (Contributor) commented Oct 24, 2024

I opened a PR upstream which fixes this: llvm/llvm-project#113571

@pashu123 (Contributor, Author)

> I opened a PR upstream which fixes this: llvm/llvm-project#113571

Thanks, @Max191, for the fix!

@pashu123 (Contributor, Author)

Closing this; the fix is merged.

@Max191 (Contributor) commented Oct 25, 2024

> Closing this; the fix is merged.

Sounds good. Note that it will not be fixed in IREE until #18897 lands.
