Add support for a variety of (data tiled) convolution strategies #63

Merged
qedawkins merged 12 commits into shark_frozen from shark_staging on Jul 20, 2023

Conversation

qedawkins

No description provided.

Adds the ability to use the transform dialect strategy builders behind
`iree-spirv-enable-transform-dialect-jit`, mirroring the existing flags
for LLVMCPU/GPU.
DetachElementwiseFromNamedOps is used to replace pre-filled outputs with
a zero-fill + add for contraction ops (gemm, conv). This extends the
pattern to the convolution interface to allow non-named cases; renaming
the pass can happen as a follow-up if/when this is upstreamed.
This works towards pad-fused convolution strategies.
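
For reference, a minimal before/after sketch of what the detaching looks like on a convolution. The shapes are made up and a plain `linalg.generic` stands in for the detached elementwise add; this is an illustration of the rewrite, not IR taken from this PR:

```mlir
// Before: the convolution accumulates directly into the pre-filled bias tensor.
func.func @before(%input: tensor<1x18x18x16xf32>, %filter: tensor<3x3x16x32xf32>,
                  %bias: tensor<1x16x16x32xf32>) -> tensor<1x16x16x32xf32> {
  %0 = linalg.conv_2d_nhwc_hwcf
         {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>}
         ins(%input, %filter : tensor<1x18x18x16xf32>, tensor<3x3x16x32xf32>)
         outs(%bias : tensor<1x16x16x32xf32>) -> tensor<1x16x16x32xf32>
  return %0 : tensor<1x16x16x32xf32>
}

// After: the output is detached into a zero-fill, the convolution, and an
// elementwise add of the original bias.
#map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
func.func @after(%input: tensor<1x18x18x16xf32>, %filter: tensor<3x3x16x32xf32>,
                 %bias: tensor<1x16x16x32xf32>) -> tensor<1x16x16x32xf32> {
  %zero = arith.constant 0.0 : f32
  %empty = tensor.empty() : tensor<1x16x16x32xf32>
  %fill = linalg.fill ins(%zero : f32)
            outs(%empty : tensor<1x16x16x32xf32>) -> tensor<1x16x16x32xf32>
  %conv = linalg.conv_2d_nhwc_hwcf
            {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>}
            ins(%input, %filter : tensor<1x18x18x16xf32>, tensor<3x3x16x32xf32>)
            outs(%fill : tensor<1x16x16x32xf32>) -> tensor<1x16x16x32xf32>
  %add = linalg.generic {
           indexing_maps = [#map, #map, #map],
           iterator_types = ["parallel", "parallel", "parallel", "parallel"]}
           ins(%conv, %bias : tensor<1x16x16x32xf32>, tensor<1x16x16x32xf32>)
           outs(%empty : tensor<1x16x16x32xf32>) {
  ^bb0(%a: f32, %b: f32, %out: f32):
    %sum = arith.addf %a, %b : f32
    linalg.yield %sum : f32
  } -> tensor<1x16x16x32xf32>
  return %add : tensor<1x16x16x32xf32>
}
```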
Removes the restriction to named ops only on the convolution matcher,
matching on the convolution interface instead.
Adds a builder for mapping data tiled convolutions to a direct
tensor core approach (mainly targeting WMMA for now). This generates
a loop over the input channels, promotion of the padded input tile to
shared memory, and then two more inner loops over the convolution
filter window.
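
As a rough illustration of that loop structure only, the skeleton below shows one accumulator-carried loop nest; all names, shapes, and tile sizes are assumptions, the extracted slice merely stands in for the tile the strategy promotes to shared memory, and the actual tensor core computation is elided to a comment:

```mlir
func.func @conv_loop_structure(%input: tensor<1x34x34x64xf16>,
                               %filter: tensor<3x3x64x64xf16>,
                               %init: tensor<1x32x32x64xf32>) -> tensor<1x32x32x64xf32> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c3 = arith.constant 3 : index
  %c16 = arith.constant 16 : index
  %c64 = arith.constant 64 : index
  // Outer loop over the input channels, stepped by the promoted tile size.
  %result = scf.for %ic = %c0 to %c64 step %c16
      iter_args(%acc = %init) -> (tensor<1x32x32x64xf32>) {
    // Stand-in for the padded input tile that gets promoted to shared memory.
    %tile = tensor.extract_slice %input[0, 0, 0, %ic] [1, 34, 34, 16] [1, 1, 1, 1]
        : tensor<1x34x34x64xf16> to tensor<1x34x34x16xf16>
    // Two inner loops over the convolution filter window.
    %acc_kh = scf.for %kh = %c0 to %c3 step %c1
        iter_args(%a0 = %acc) -> (tensor<1x32x32x64xf32>) {
      %acc_kw = scf.for %kw = %c0 to %c3 step %c1
          iter_args(%a1 = %a0) -> (tensor<1x32x32x64xf32>) {
        // The remaining computation is a small matmul over the promoted tile
        // that maps onto WMMA-sized tensor core operations (elided here).
        scf.yield %a1 : tensor<1x32x32x64xf32>
      }
      scf.yield %acc_kw : tensor<1x32x32x64xf32>
    }
    scf.yield %acc_kh : tensor<1x32x32x64xf32>
  }
  return %result : tensor<1x32x32x64xf32>
}
```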
Adds a direct SIMT (fma/dot4) conv approach without shared memory.
Allows matching non-named contraction ops, using the same
`MatmulOpCaptures` struct that exists for matmul and batch matmul
…r strategies
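
As an example of the kind of op this now covers, here is a non-named contraction written as a plain `linalg.generic` (shapes and element types are illustrative); with interface-based matching it can be captured the same way a named `linalg.matmul` would be:

```mlir
#map_lhs = affine_map<(d0, d1, d2) -> (d0, d2)>   // (M, N, K) -> (M, K)
#map_rhs = affine_map<(d0, d1, d2) -> (d2, d1)>   // (M, N, K) -> (K, N)
#map_out = affine_map<(d0, d1, d2) -> (d0, d1)>   // (M, N, K) -> (M, N)
func.func @generic_matmul(%lhs: tensor<128x64xf16>, %rhs: tensor<64x256xf16>,
                          %init: tensor<128x256xf32>) -> tensor<128x256xf32> {
  %0 = linalg.generic {
         indexing_maps = [#map_lhs, #map_rhs, #map_out],
         iterator_types = ["parallel", "parallel", "reduction"]}
         ins(%lhs, %rhs : tensor<128x64xf16>, tensor<64x256xf16>)
         outs(%init : tensor<128x256xf32>) {
  ^bb0(%a: f16, %b: f16, %acc: f32):
    // f16 multiply with f32 accumulation, as a non-named contraction body.
    %a_ext = arith.extf %a : f16 to f32
    %b_ext = arith.extf %b : f16 to f32
    %mul = arith.mulf %a_ext, %b_ext : f32
    %sum = arith.addf %acc, %mul : f32
    linalg.yield %sum : f32
  } -> tensor<128x256xf32>
  return %0 : tensor<128x256xf32>
}
```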

Maps data tiled matmuls to tensor core, assuming no distribution is
expected to happen over the inner tile.
Additionally improves the distribution of pad copies for the convolution
strategy by greedily distributing over the outermost dimensions of the copy.
Currently pad fusion only applies to named convolutions. This allows it
to apply based on the interface.
Sub-32-bit types are handled on the SPIR-V side by introducing
bitcasts to and from i32 and bubbling them toward the center of the kernel
in the hope that they cancel. This adds a pattern for a bitcast on the result
of an `scf.if`, which arises from the way padding is handled (a `transfer_read`
in the `then` branch, with the `else` branch yielding a splat constant).
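
A minimal sketch of the IR shape this pattern targets, with illustrative names, shapes, and element types; the pattern sinks the bitcast into both branches of the `scf.if` so it has a chance to cancel with neighboring bitcasts:

```mlir
// Before: padding handled with an scf.if that either reads the tile or yields
// a splat, with the sub-32-bit value bitcast to i32 afterwards.
func.func @before(%cond: i1, %src: tensor<128x128xf16>,
                  %i: index, %j: index) -> vector<2xi32> {
  %pad = arith.constant 0.0 : f16
  %if = scf.if %cond -> (vector<4xf16>) {
    %read = vector.transfer_read %src[%i, %j], %pad {in_bounds = [true]}
        : tensor<128x128xf16>, vector<4xf16>
    scf.yield %read : vector<4xf16>
  } else {
    %splat = arith.constant dense<0.0> : vector<4xf16>
    scf.yield %splat : vector<4xf16>
  }
  %cast = vector.bitcast %if : vector<4xf16> to vector<2xi32>
  return %cast : vector<2xi32>
}

// After: the bitcast is pushed into both branches of the scf.if.
func.func @after(%cond: i1, %src: tensor<128x128xf16>,
                 %i: index, %j: index) -> vector<2xi32> {
  %pad = arith.constant 0.0 : f16
  %if = scf.if %cond -> (vector<2xi32>) {
    %read = vector.transfer_read %src[%i, %j], %pad {in_bounds = [true]}
        : tensor<128x128xf16>, vector<4xf16>
    %cast = vector.bitcast %read : vector<4xf16> to vector<2xi32>
    scf.yield %cast : vector<2xi32>
  } else {
    %splat = arith.constant dense<0> : vector<2xi32>
    scf.yield %splat : vector<2xi32>
  }
  return %if : vector<2xi32>
}
```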
@qedawkins merged commit f8976fd into shark_frozen on Jul 20, 2023
8 of 11 checks passed
@qedawkins deleted the shark_staging branch on July 20, 2023 16:51
powderluv pushed a commit that referenced this pull request Sep 25, 2023
I had built these out while root causing a regression, then cleaned them
up for mainlining. They are controlled by compile-time variables for the
moment; we can do something smarter later.