ConvTranspose using cuDNN Frontend with NHWC support (#21752)
### Description
Added the cuDNN Frontend and used it for the NHWC ConvTranspose op,
including an option for bias fusion. Similar to this [Conv
PR](#19470).

### Backward compatibility
If ORT is built with cuDNN 8, the cuDNN frontend will not be built into
the binary; the old kernels (using cuDNN backend APIs) are used instead.

### Major Changes
For cuDNN 9, the cuDNN frontend is enabled to fuse the data gradient
convolution and bias when the provider option `fuse_conv_bias=1` is set.
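Assuming the ONNX Runtime Python API, such provider options are passed as a dict when creating a session; a sketch (the model path is hypothetical, and session creation is left as a comment since it needs a GPU build):

```python
# Sketch: enabling the ConvTranspose + bias fusion via CUDA EP options.
# Provider option values are passed as strings.
cuda_options = {
    "fuse_conv_bias": "1",  # fuse data gradient convolution + bias (this PR)
    "prefer_nhwc": "1",     # prefer the NHWC kernels
}
providers = [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]

# import onnxruntime as ort
# sess = ort.InferenceSession("model.onnx", providers=providers)
print(providers[0][0])  # CUDAExecutionProvider
```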

### Potential Issues
The cuDNN frontend uses TF32 by default. It can be disabled with the
`use_tf32` CUDA provider option, but if the cuDNN frontend encounters
issues building an operation graph, it will fall back to using TF32.
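To illustrate what TF32 trades away, a numpy sketch (not ORT code): TF32 keeps float32's 8 exponent bits but only 10 explicit mantissa bits, so this simulates it by truncating the low 13 mantissa bits (real hardware rounds rather than truncates).

```python
import numpy as np

def tf32_round(x):
    """Simulate TF32 precision: zero the low 13 of float32's 23 mantissa bits."""
    a = np.asarray(x, dtype=np.float32).copy()
    bits = a.view(np.uint32)
    bits &= np.uint32(0xFFFFE000)  # keep sign, exponent, top 10 mantissa bits
    return a

print(tf32_round(1.0 + 2**-10))  # 1.0009765625 -- representable in TF32
print(tf32_round(1.0 + 2**-11))  # 1.0          -- increment truncated away
```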

### Follow ups
This is one of a series of PRs targeting enabling NHWC (here for the
ConvTranspose operation) in the CUDA EP by default if the device supports it.
Further changes will follow to make this possible:
(1) Enable prefer_nhwc by default for devices with sm >= 70.
(2) Make fuse_conv_bias=1 the default after more testing.
(3) Add other NHWC operators (like Resize or UpSample).
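For context on the layout itself, NHWC moves the channel dim last; a minimal numpy sketch of the NCHW-to-NHWC permutation (shapes are hypothetical):

```python
import numpy as np

# NCHW (framework-default) vs NHWC (channels-last) layout of the same tensor.
x_nchw = np.zeros((2, 16, 32, 32), dtype=np.float32)  # (N, C, H, W)
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))           # (N, H, W, C)
print(x_nhwc.shape)  # (2, 32, 32, 16)
```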

### Motivation and Context
The new cuDNN Frontend library provides the functionality to fuse
operations as well as new heuristics for kernel selection. Here it
fuses the convolution data gradient operation (ConvTranspose) with the
pointwise bias operation.
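For reference, the two operations being fused are the data gradient of a convolution (each input element scatters a scaled copy of the kernel into the output) followed by a pointwise bias add; a minimal single-channel numpy sketch (stride 1, no padding, hypothetical helper name):

```python
import numpy as np

def conv_transpose2d(x, w, bias=0.0):
    """Single-channel ConvTranspose, stride 1, no padding: scatter-add
    the kernel per input element, then add a pointwise bias -- the two
    steps this PR fuses into one cuDNN frontend graph."""
    H, W = x.shape
    KH, KW = w.shape
    out = np.zeros((H + KH - 1, W + KW - 1), dtype=x.dtype)
    for i in range(H):
        for j in range(W):
            out[i:i + KH, j:j + KW] += x[i, j] * w
    return out + bias

x = np.ones((2, 2), dtype=np.float32)
w = np.ones((2, 2), dtype=np.float32)
print(conv_transpose2d(x, w, bias=1.0))  # [[2 3 2] [3 5 3] [2 3 2]]
```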

### Minor Change
Fixed a small bug in the CUDA convolution operation that occurred when
`GetCudnnConv1dPadToNc1d` was enabled.
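For context, that option handles a 1-D convolution as a degenerate 2-D one by inserting a dummy spatial dimension (NCL becomes NC1L), which is the dims-insertion code the fix touches; a numpy sketch of the reshape (shapes are hypothetical):

```python
import numpy as np

# Sketch: view a (N, C, L) 1-D conv input as (N, C, 1, L) by inserting
# a dummy height of 1 at axis 2, mirroring the dims.insert(begin() + 2, 1)
# calls in the CUDA conv kernel.
x_ncl = np.zeros((4, 8, 128), dtype=np.float32)  # (N, C, L)
x_nc1l = np.expand_dims(x_ncl, axis=2)           # (N, C, 1, L)
print(x_nc1l.shape)  # (4, 8, 1, 128)
```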
JTischbein authored Sep 10, 2024
1 parent f633caa commit 20d9464
Showing 5 changed files with 702 additions and 224 deletions.
3 changes: 2 additions & 1 deletion onnxruntime/core/providers/cuda/cuda_execution_provider.cc

```diff
@@ -2473,7 +2473,8 @@ static bool RNNNeedFallbackToCPU(const onnxruntime::Node& node,
   return false;
 }
 
-static bool ConvTransposeNeedFallbackToCPU(const onnxruntime::Node& node, const logging::Logger& logger,
+static bool ConvTransposeNeedFallbackToCPU([[maybe_unused]] const onnxruntime::Node& node,
+                                           [[maybe_unused]] const logging::Logger& logger,
                                            [[maybe_unused]] const GraphViewer& graph_viewer,
                                            [[maybe_unused]] const bool prefer_nhwc) {
   const auto& node_attributes = node.GetAttributes();
```
2 changes: 1 addition & 1 deletion onnxruntime/core/providers/cuda/nn/conv.cc

```diff
@@ -385,7 +385,7 @@ Status Conv<T, Layout>::UpdateState(OpKernelContext* context, bool bias_expected
   if (cuda_ep->GetCudnnConv1dPadToNc1d()) {
     x_dims_cudnn.insert(x_dims_cudnn.begin() + 2, 1);
     y_dims_cudnn.insert(y_dims_cudnn.begin() + 2, 1);
-    w_dims_cudnn.insert(w_dims.begin() + 2, 1);
+    w_dims_cudnn.insert(w_dims_cudnn.begin() + 2, 1);
     pads.insert(pads.begin() + kernel_rank, 0);
     pads.insert(pads.begin(), 0);
     kernel_shape.insert(kernel_shape.begin(), 1);
```
