Support i1 datatype #18713

Draft · lialan wants to merge 4 commits into main

Conversation

lialan (Contributor) commented on Oct 7, 2024:

This patch enables i1 datatype support.

  • Previously, i1 was treated as i8 in memory; this patch avoids padding each i1 out to a full byte.
  • Fixes corner-case issues with i1 and i2 where the vector size is not a multiple of 8 bits.
  • Must land together with an upstream change that handles sub-byte-sized vector and memref types.
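
For intuition, the storage math this implies can be sketched as below. This is illustrative only, not code from the patch (the actual logic lives in calculateStorageElementCountInBytes), and packedStorageBytes is a hypothetical name:

#include <cstdint>
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Support/MathExtras.h"

// Hypothetical sketch: bytes needed for a sub-byte tensor when the innermost
// dimension is padded up to a byte boundary.
int64_t packedStorageBytes(llvm::ArrayRef<int64_t> shape, unsigned bitWidth) {
  // Pad the last dimension so each row occupies a whole number of bytes.
  int64_t rowElems = llvm::alignTo(shape.back(), 8 / bitWidth);
  int64_t rows = 1;
  for (int64_t dim : shape.drop_back())
    rows *= dim;
  return rows * rowElems * bitWidth / 8;
}

For example, tensor<2x3xi1> pads each 3-bit row to 8 bits, giving 2 bytes total, versus the 6 bytes used when every i1 was stored as an i8.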

@@ -99,6 +99,11 @@ Value calculateStorageElementCountInBytes(Location loc,
}
}

// make sure the last dimension is byte aligned.
Collaborator commented:

style: proper punctuation (here and elsewhere) in comments: https://google.github.io/styleguide/cppguide.html#Punctuation,_Spelling_and_Grammar

lialan linked an issue on Oct 7, 2024 that may be closed by this pull request.
hanhanW (Contributor) left a comment:

Alan and I had an offline sync, and he is revisiting the codegen-side changes. I'm not an expert on the host-side changes, so we need some input from Ben.

Comment on lines +940 to +953
// Align the tensor type to a multiple of 8 bits.
auto rankedTensorType = tensorType.asRankedTensorType();
auto elementSize = rankedTensorType.getElementType().getIntOrFloatBitWidth();
auto typeSize = tensorType.getNumElements() * elementSize;

if (typeSize % 8 != 0) { // typeSize is already in bits; don't re-scale by elementSize.
  SmallVector<int64_t> newShape(rankedTensorType.getShape());
  newShape.back() = llvm::alignTo(newShape.back(), 8 / elementSize);

  auto newTensorType = IREE::Flow::DispatchTensorType::get(
      tensorType.getAccess(), newShape,
      rankedTensorType.getElementType(), rankedTensorType.getEncoding());
  tensorType = newTensorType;
}
hanhanW commented:

We need some input from @benvanik about how to land this properly. My understanding is that we want to align i1 shapes with bytes, e.g., 6xi1 becomes 8xi1 on both the stream allocation and dispatch sides. The current approach replaces the flow.dispatch.tensor type with 8xi1 while leaving the 6xi1 type in the stream.tensor.sizeof op; see the snippet below for details. This looks off to me because:

  1. I don't think it works with dynamic shapes, because the arguments of DispatchTieShapeOp are not taken into account.
  2. It leaks the stream.tensor.sizeof lowering logic into the FlowToStream conversion. Is that okay?

Ben knows more of the details, so please correct me if I'm wrong. I think we can still keep all the logic in the FlowToStream conversion: we either need a type converter, or we introduce a legalizePackedType method in ElementPackingUtils.[h|cpp] that shares logic between the buildResultSizeOf method and the ConvertExecutableOp patterns. legalizePackedType would take the tensor type and dynamicDims and do something similar to calculateStorageElementCountInBytes.
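
For concreteness, a rough sketch of what such a helper could look like, handling only static shapes for brevity (illustrative only; the real version would also consume dynamicDims, as calculateStorageElementCountInBytes does):

// Sketch only (uses mlir/IR/BuiltinTypes.h, llvm/ADT/SmallVector.h, and
// llvm/Support/MathExtras.h): align the innermost dimension of a sub-byte
// tensor type so each row starts on a byte boundary.
static mlir::RankedTensorType legalizePackedType(mlir::RankedTensorType type) {
  unsigned bitWidth = type.getElementTypeBitWidth();
  if (bitWidth >= 8 || type.getRank() == 0 ||
      mlir::ShapedType::isDynamic(type.getShape().back()))
    return type; // Dynamic sizes would need the dynamicDims values.
  llvm::SmallVector<int64_t> shape(type.getShape());
  shape.back() = llvm::alignTo(shape.back(), 8 / bitWidth);
  return mlir::RankedTensorType::get(shape, type.getElementType(),
                                     type.getEncoding());
}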

The other approach is to update the logic in EncodeTensors.cpp. That pass has logic for encoding both host tensors and device tensors, and we probably just need to update the alignTensorType logic. (I don't know for sure; please do some study.)
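
Under either approach, the expectation stated above (6xi1 becomes 8xi1 on both sides) means the host-side size computation should also see the padded type, i.e. something like the following hypothetical IR, shown only to illustrate the alignment; the actual dump below still has tensor<6xi1> here:

%6 = stream.tensor.sizeof on(#hal.device.affinity<@__device_0>) tensor<8xi1> : index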

@benvanik do you have any suggestions about where the change should happen?

// -----// IR Dump Before ConvertToStreamPass (iree-stream-conversion) //----- //
#executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "generic", cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", native_vector_size = 16 : i64, target_triple = "x86_64-unknown-unknown-eabi-elf"}>
#map = affine_map<(d0) -> (d0)>
#device_target_local = #hal.device.target<"local", [#executable_target_embedded_elf_x86_64_]> : !hal.device
module attributes {stream.affinity.default = #hal.device.affinity<@__device_0>} {
  util.global private @__device_0 = #device_target_local
  flow.executable private @add_tensors_dispatch_0 {
    flow.executable.export public @add_tensors_dispatch_0_elementwise_6_i1 workgroups() -> (index, index, index) {
      %x, %y, %z = flow.dispatch.workgroup_count_from_slice 
      flow.return %x, %y, %z : index, index, index
    }
    builtin.module {
      func.func @add_tensors_dispatch_0_elementwise_6_i1(%arg0: !flow.dispatch.tensor<readonly:tensor<6xi1>>, %arg1: !flow.dispatch.tensor<readonly:tensor<6xi1>>, %arg2: !flow.dispatch.tensor<writeonly:tensor<6xi1>>) {
        %0 = flow.dispatch.tensor.load %arg0, offsets = [0], sizes = [6], strides = [1] : !flow.dispatch.tensor<readonly:tensor<6xi1>> -> tensor<6xi1>
        %1 = flow.dispatch.tensor.load %arg1, offsets = [0], sizes = [6], strides = [1] : !flow.dispatch.tensor<readonly:tensor<6xi1>> -> tensor<6xi1>
        %2 = tensor.empty() : tensor<6xi1>
        %3 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel"]} ins(%0, %1 : tensor<6xi1>, tensor<6xi1>) outs(%2 : tensor<6xi1>) {
        ^bb0(%in: i1, %in_0: i1, %out: i1):
          %4 = arith.addi %in, %in_0 : i1
          linalg.yield %4 : i1
        } -> tensor<6xi1>
        flow.dispatch.tensor.store %3, %arg2, offsets = [0], sizes = [6], strides = [1] : tensor<6xi1> -> !flow.dispatch.tensor<writeonly:tensor<6xi1>>
        return
      }
    }
  }
  util.func public @add_tensors(%arg0: !hal.buffer_view, %arg1: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub, iree.reflection = {iree.abi.declaration = "sync func @add_tensors(%input0: tensor<2x3xi1>, %input1: tensor<2x3xi1>) -> (%output0: tensor<2x3xi1>)"}} {
    %0 = hal.tensor.import %arg0 "input0" : !hal.buffer_view -> tensor<2x3xi1>
    %1 = hal.tensor.import %arg1 "input1" : !hal.buffer_view -> tensor<2x3xi1>
    %2 = flow.tensor.reshape %0 : tensor<2x3xi1> -> tensor<6xi1>
    %3 = flow.tensor.reshape %1 : tensor<2x3xi1> -> tensor<6xi1>
    %4 = flow.dispatch @add_tensors_dispatch_0::@add_tensors_dispatch_0_elementwise_6_i1(%2, %3) : (tensor<6xi1>, tensor<6xi1>) -> tensor<6xi1>
    %5 = flow.tensor.reshape %4 : tensor<6xi1> -> tensor<2x3xi1>
    %6 = hal.tensor.export %5 "output0" : tensor<2x3xi1> -> !hal.buffer_view
    util.return %6 : !hal.buffer_view
  }
}


// -----// IR Dump Before VerifyLoweringToTensorsPass (iree-stream-verify-lowering-to-tensors) //----- //
#executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "generic", cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", native_vector_size = 16 : i64, target_triple = "x86_64-unknown-unknown-eabi-elf"}>
#map = affine_map<(d0) -> (d0)>
#device_target_local = #hal.device.target<"local", [#executable_target_embedded_elf_x86_64_]> : !hal.device
module attributes {stream.affinity.default = #hal.device.affinity<@__device_0>} {
  util.global private @__device_0 = #device_target_local
  stream.executable private @add_tensors_dispatch_0 {
    stream.executable.export public @add_tensors_dispatch_0_elementwise_6_i1 workgroups() -> (index, index, index) {
      %x, %y, %z = flow.dispatch.workgroup_count_from_slice 
      stream.return %x, %y, %z : index, index, index
    }
    builtin.module {
      func.func @add_tensors_dispatch_0_elementwise_6_i1(%arg0: !stream.binding, %arg1: !stream.binding, %arg2: !stream.binding) {
        %c0 = arith.constant 0 : index
        %0 = stream.binding.subspan %arg0[%c0] : !stream.binding -> !flow.dispatch.tensor<readonly:tensor<8xi1>>
        %1 = stream.binding.subspan %arg1[%c0] : !stream.binding -> !flow.dispatch.tensor<readonly:tensor<8xi1>>
        %2 = stream.binding.subspan %arg2[%c0] : !stream.binding -> !flow.dispatch.tensor<writeonly:tensor<8xi1>>
        %3 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [6], strides = [1] : !flow.dispatch.tensor<readonly:tensor<8xi1>> -> tensor<6xi1>
        %4 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [6], strides = [1] : !flow.dispatch.tensor<readonly:tensor<8xi1>> -> tensor<6xi1>
        %5 = tensor.empty() : tensor<6xi1>
        %6 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel"]} ins(%3, %4 : tensor<6xi1>, tensor<6xi1>) outs(%5 : tensor<6xi1>) {
        ^bb0(%in: i1, %in_0: i1, %out: i1):
          %7 = arith.addi %in, %in_0 : i1
          linalg.yield %7 : i1
        } -> tensor<6xi1>
        flow.dispatch.tensor.store %6, %2, offsets = [0], sizes = [6], strides = [1] : tensor<6xi1> -> !flow.dispatch.tensor<writeonly:tensor<8xi1>>
        return
      }
    }
  }
  util.func public @add_tensors(%arg0: !hal.buffer_view, %arg1: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub, iree.reflection = {iree.abi.declaration = "sync func @add_tensors(%input0: tensor<2x3xi1>, %input1: tensor<2x3xi1>) -> (%output0: tensor<2x3xi1>)"}} {
    %element_type_i1 = hal.element_type<i1> : i32
    %dense_row_major = hal.encoding_type<dense_row_major> : i32
    %c2 = arith.constant 2 : index
    %c3 = arith.constant 3 : index
    hal.buffer_view.assert<%arg0 : !hal.buffer_view> message("input0") shape([%c2, %c3]) type(%element_type_i1) encoding(%dense_row_major)
    %0 = stream.tensor.sizeof on(#hal.device.affinity<@__device_0>) tensor<2x3xi1> : index
    %1 = stream.tensor.import on(#hal.device.affinity<@__device_0>) %arg0 : !hal.buffer_view -> tensor<2x3xi1> in !stream.resource<external>{%0}
    %2 = stream.async.transfer %1 : !stream.resource<external>{%0} from(#hal.device.affinity<@__device_0>) -> to(#hal.device.affinity<@__device_0>) !stream.resource<*>{%0}
    %element_type_i1_0 = hal.element_type<i1> : i32
    %dense_row_major_1 = hal.encoding_type<dense_row_major> : i32
    %c2_2 = arith.constant 2 : index
    %c3_3 = arith.constant 3 : index
    hal.buffer_view.assert<%arg1 : !hal.buffer_view> message("input1") shape([%c2_2, %c3_3]) type(%element_type_i1_0) encoding(%dense_row_major_1)
    %3 = stream.tensor.sizeof on(#hal.device.affinity<@__device_0>) tensor<2x3xi1> : index
    %4 = stream.tensor.import on(#hal.device.affinity<@__device_0>) %arg1 : !hal.buffer_view -> tensor<2x3xi1> in !stream.resource<external>{%3}
    %5 = stream.async.transfer %4 : !stream.resource<external>{%3} from(#hal.device.affinity<@__device_0>) -> to(#hal.device.affinity<@__device_0>) !stream.resource<*>{%3}
    %6 = stream.tensor.sizeof on(#hal.device.affinity<@__device_0>) tensor<6xi1> : index
    %7 = stream.tensor.clone on(#hal.device.affinity<@__device_0>) %2 : tensor<2x3xi1> in !stream.resource<*>{%0} -> tensor<6xi1> in !stream.resource<*>{%6}
    %8 = stream.tensor.sizeof on(#hal.device.affinity<@__device_0>) tensor<6xi1> : index
    %9 = stream.tensor.clone on(#hal.device.affinity<@__device_0>) %5 : tensor<2x3xi1> in !stream.resource<*>{%3} -> tensor<6xi1> in !stream.resource<*>{%8}
    %c0 = arith.constant 0 : index
    %10 = stream.tensor.sizeof on(#hal.device.affinity<@__device_0>) tensor<6xi1> : index
    %11 = stream.async.dispatch on(#hal.device.affinity<@__device_0>) @add_tensors_dispatch_0::@add_tensors_dispatch_0_elementwise_6_i1(%7[%c0 to %6 for %6], %9[%c0 to %8 for %8]) : (!stream.resource<*>{%6}, !stream.resource<*>{%8}) -> !stream.resource<*>{%10}
    %12 = stream.tensor.sizeof on(#hal.device.affinity<@__device_0>) tensor<2x3xi1> : index
    %13 = stream.tensor.clone on(#hal.device.affinity<@__device_0>) %11 : tensor<6xi1> in !stream.resource<*>{%10} -> tensor<2x3xi1> in !stream.resource<*>{%12}
    %14 = stream.async.transfer %13 : !stream.resource<*>{%12} from(#hal.device.affinity<@__device_0>) -> to(#hal.device.affinity<@__device_0>) !stream.resource<external>{%12}
    %15 = stream.tensor.export on(#hal.device.affinity<@__device_0>) %14 : tensor<2x3xi1> in !stream.resource<external>{%12} -> !hal.buffer_view
    util.return %15 : !hal.buffer_view
  }
}

Successfully merging this pull request may close these issues:

  • Plumb i1 datatype through the compilation pipeline