ConvTranspose using CUDNN Frontend with NHWC support (#21752)
### Description
Added the cuDNN Frontend and used it for the NHWC ConvTranspose op, including an option for bias fusion. Similar to this [Conv PR](#19470).

### Backward compatibility
If ORT is built with cuDNN 8, the cuDNN frontend will not be built into the binary, and the old kernels (using cuDNN backend APIs) are used.

### Major Changes
For cuDNN 9, the cuDNN frontend is enabled to fuse the data-gradient convolution with the bias when the provider option fuse_conv_bias=1 is set.

### Potential Issues
The cuDNN frontend uses TF32 by default. It can be disabled with the use_tf32 CUDA provider option, but if the cuDNN frontend encounters issues building an operation graph, it falls back to using TF32.

### Follow-ups
This is one of the PRs that aim to enable NHWC (here, the ConvTranspose operation) in the CUDA EP by default when the device supports it. Further changes will follow to make that possible:
(1) Enable prefer_nhwc by default for devices with SM >= 70.
(2) Change the default to fuse_conv_bias=1 after more testing.
(3) Add other NHWC operators (like Resize or Upsample).

### Motivation and Context
The new cuDNN Frontend library provides the functionality to fuse operations and new heuristics for kernel selection. Here it fuses the convolution data-gradient operation (ConvTranspose) with the pointwise bias operation.

### Minor Change
Fixed a small bug in the CUDA convolution operation that occurred when `GetCudnnConv1dPadToNc1d` was enabled.
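As a rough illustration of how the options mentioned above fit together, the sketch below sets them through the onnxruntime Python API when creating a CUDA EP session. It assumes `fuse_conv_bias`, `use_tf32`, and `prefer_nhwc` are accepted as string-valued CUDA provider options and that a `model.onnx` file exists; it is not a definitive usage prescription.

```python
# Minimal sketch: enabling the cuDNN-frontend bias fusion and NHWC layout
# via CUDA execution provider options. Option names follow this PR's
# description; treat the exact accepted values as an assumption.
import onnxruntime as ort

cuda_options = {
    "fuse_conv_bias": "1",  # fuse ConvTranspose data gradient + bias (cuDNN 9 frontend)
    "prefer_nhwc": "1",     # prefer NHWC layout for convolution ops
    "use_tf32": "0",        # request TF32 off; see the caveat below
}

session = ort.InferenceSession(
    "model.onnx",
    providers=[("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"],
)
```

Per the Potential Issues section, even with use_tf32 set to 0 the cuDNN frontend may still fall back to TF32 if it fails to build an operation graph, so the option is a request rather than a guarantee.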