We'd like to be able to represent more complex size/stride combinations in the perf config without adding too many special cases like the transpose flags or the layout string.
This'll allow us to easily support things like computing on tiles of a larger tensor and NCHWC.
It'll also clean up the code and give us some more generality.
I propose:
Standard layouts for the rock operations
I propose that rock.conv2d will always take a GNHWC view of the underlying memory, though that choice is somewhat arbitrary and we could go with NGHWC, GNCHW, or what have you.
rock.gemm will take matrix A as M x K and matrix B as K x N.
The {filter,input,output}_layout and transpose{A,B,C} will be removed.
Argument passing
Kernels will be given a 1D memref of size [actual underlying memory size]xT. You could even make it an i8 buffer if you want it extra spicy.
This'll then be passed to an operation I'm going to call rock.interpret_memory {sizes = [l0, l1, ..., lN], strides = [s0, s1, ..., sN]} : tensor<LxT> -> tensor<l0xl1x...xlNxT> (which bufferizes and, early in the kernel pipeline, expands out to rock.transform).
Then, you get to do transposes, reshapes, what have you, to break that apart and recombine it however you like.
Heck, you might not always need interpret_memory - the common cases are just reshape.
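To make the sizes/strides semantics concrete, here's a minimal Python sketch of what I'd expect interpret_memory to mean: element (i0, ..., iN) of the view lives at flat offset i0*s0 + ... + iN*sN. The Python function and its dict-based "view" are purely illustrative assumptions, not the op's actual lowering.

```python
from itertools import product


def interpret_memory(flat, sizes, strides):
    """Map every multi-index of the view to the flat element it addresses.

    Hypothetical model of rock.interpret_memory: the element at index
    (i0, ..., iN) is flat[i0 * s0 + ... + iN * sN].
    """
    if len(sizes) != len(strides):
        raise ValueError("sizes and strides must have the same rank")
    view = {}
    for index in product(*(range(n) for n in sizes)):
        offset = sum(i * s for i, s in zip(index, strides))
        view[index] = flat[offset]
    return view


# A 20-element buffer viewed as M x K = 5 x 4 with strides [1, 5],
# i.e. a K x M row-major matrix seen through a transpose.
raw_a = list(range(20))
trans_a = interpret_memory(raw_a, sizes=[5, 4], strides=[1, 5])
```

Here trans_a[(m, k)] reads raw_a[m + 5 * k], which is the same element a 4x5 reshape followed by a transpose would give you.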
What we change in our code later
We add the function rock::sizesAndStridesFor([pile of transforms], SmallVectorImpl<SmallVector<int64_t>> &sizes, SmallVectorImpl<SmallVector<int64_t>> &strides).
The goal of this is to traverse the transform stack and give you the size(s) and stride(s) of the component dimension(s) of each dimension in the input.
For example, if I have
(%rawA : memref<20xf32>, ...) {
  %matA = reshape %rawA : memref<20xf32> -> memref<4x5xf32> // K x M
  %transA = transpose %matA ([1, 0]) : memref<4x5xf32> -> memref<5x4xf32> // M x K
  rock.gemm ... = %transA * ...
}
then sizesAndStridesFor(%transA, sizes, strides) would set sizes to [[5], [4]] and strides to [[1], [5]].
As a more complex example, with a transform stack that ends in a collapsed value %collapsedI, we'd have sizesAndStridesFor(%collapsedI, sizes, strides) producing sizes = [[1], [3], [3], [2, 4]] and strides = [[72], [12], [4], [36, 1]], thus expressing the NCHWC layout.
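As a rough sketch of the bookkeeping, here's a small Python model in which every dimension carries a list of (size, stride) components and each transform rearranges those lists. The helper names (row_major, transpose, collapse, sizes_and_strides_for) are made up for illustration, and the reshape/transpose/collapse chain used for the NCHWC case is just one assumed way to arrive at the sizes and strides quoted above, not necessarily the IR I had in mind.

```python
def row_major(sizes):
    """Describe a contiguous buffer: one (size, stride) component per dim."""
    dims, stride = [], 1
    for size in reversed(sizes):
        dims.append([(size, stride)])
        stride *= size
    return dims[::-1]


def transpose(dims, perm):
    """Output dimension i is input dimension perm[i]."""
    return [dims[p] for p in perm]


def collapse(dims, first, last):
    """Merge dims first..last (inclusive) into one, keeping every component."""
    merged = [comp for d in dims[first:last + 1] for comp in d]
    return dims[:first] + [merged] + dims[last + 1:]


def sizes_and_strides_for(dims):
    """Split the component lists into the sizes / strides shape used above."""
    sizes = [[size for size, _ in d] for d in dims]
    strides = [[stride for _, stride in d] for d in dims]
    return sizes, strides


# First example: 20 elements -> 4x5 (K x M) -> transposed to 5x4 (M x K).
gemm_a = transpose(row_major([4, 5]), [1, 0])
gemm_sizes, gemm_strides = sizes_and_strides_for(gemm_a)

# Assumed NCHWC chain: 72 elements -> [N, Co, H, W, ci] = [1, 2, 3, 3, 4],
# permuted to [N, H, W, Co, ci], then Co and ci collapsed into a single C.
nchwc = row_major([1, 2, 3, 3, 4])
nhwc_view = collapse(transpose(nchwc, [0, 2, 3, 1, 4]), 3, 4)
conv_sizes, conv_strides = sizes_and_strides_for(nhwc_view)
```

The point of keeping a list per dimension is that a collapsed dimension with a discontinuity (like C here) keeps both of its components instead of being forced into a single stride.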
We would use this information when generating problem config strings (see below) and when making the arbitrary decisions of how to construct gemm{M,N,K} during conv-to-gemm.
Problem configs
We would still support the old form (stuff like -in_layout nchw) but translate it to the new form quickly on contact, and we might decide we want to deprecate it.
Instead, in problem keys, we'll have (using gemm as an example) keys such as -a_m_size, -a_m_stride, -b_n_size, -b_n_stride and so on.
In the simple case where there aren't discontinuities, we'd have (using our first gemm example) keys like ... -a_m_size 5 -a_m_stride 1 -a_k_size 4 -a_k_stride 5 ...
For more complex cases, like my second example, we'd have problem key entries like -in_c_size [2, 4] -in_c_stride [36, 1] (or we could drop the brackets).
This'd give us much more generality while simplifying our input format.
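For illustration, here's a small Python sketch of turning per-dimension sizes and strides into problem-key text. The function names and the bracket formatting are assumptions (the format isn't settled), and key names beyond the ones spelled out above, such as -in_n_size, are extrapolated from the same pattern.

```python
def format_components(values):
    """Single-component dims print as a bare number, others as a list."""
    if len(values) == 1:
        return str(values[0])
    return "[" + ", ".join(str(v) for v in values) + "]"


def problem_keys(prefix, dim_names, sizes, strides):
    """Build '-<prefix>_<dim>_size ... -<prefix>_<dim>_stride ...' pairs."""
    parts = []
    for name, dim_sizes, dim_strides in zip(dim_names, sizes, strides):
        parts.append(f"-{prefix}_{name}_size {format_components(dim_sizes)}")
        parts.append(f"-{prefix}_{name}_stride {format_components(dim_strides)}")
    return " ".join(parts)


# Matrix A from the first gemm example.
gemm_keys = problem_keys("a", ["m", "k"], [[5], [4]], [[1], [5]])

# Convolution input from the NCHWC example.
conv_keys = problem_keys(
    "in",
    ["n", "h", "w", "c"],
    [[1], [3], [3], [2, 4]],
    [[72], [12], [4], [36, 1]],
)
```

With these inputs, gemm_keys comes out as the simple-case string above, and conv_keys ends with the bracketed -in_c_size / -in_c_stride entries.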
This is all a very high-level rough sketch of what I'm thinking; please feel free to lob clarifying questions at it.