Summary
The current single-chip end-to-end performance of UNet is 243 fps. To improve it, we need to do the following:

- Reduce input volume: Input channels are currently padded from 4 to 32 because convolution does not support `in_channels=4`.
  - `pad` does not support last dimension #12896 - possible workaround
- Reduce output volume: UNet's output has only a single channel, but because of the output tensor's TILE layout, we copy an extra 31 channels of data back to host.
  - `to_layout` does not support UNet Shallow output shape #12705
- Improve host read-back performance: Based on what we are seeing in the Tracy profile, UNet seems to be spending a long time in the host read-back.
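
To put rough numbers on the first two items, here is a minimal back-of-the-envelope sketch of the extra data moved per inference. Only the channel counts (4 padded to 32 on input, 1 useful plus 31 extra on output) come from this issue. The spatial size, the 2-byte element size, and the helper names `transfer_bytes` and `overhead` are hypothetical and used purely for illustration.

```python
def transfer_bytes(height, width, channels, bytes_per_element=2, batch=1):
    """Bytes moved between host and device for one NHWC-shaped tensor."""
    return batch * height * width * channels * bytes_per_element


def overhead(useful_channels, transferred_channels):
    """Ratio of channels actually transferred to channels the model needs."""
    return transferred_channels / useful_channels


# Hypothetical spatial size; the issue does not state the real resolution.
H, W = 256, 256

# Input: 4 useful channels, padded to 32 before being sent to the device.
input_useful = transfer_bytes(H, W, 4)
input_actual = transfer_bytes(H, W, 32)

# Output: 1 useful channel, plus 31 extra channels copied back to host.
output_useful = transfer_bytes(H, W, 1)
output_actual = transfer_bytes(H, W, 1 + 31)

print(f"input:  {input_actual} B sent vs {input_useful} B needed "
      f"({overhead(4, 32):.0f}x)")
print(f"output: {output_actual} B read vs {output_useful} B needed "
      f"({overhead(1, 32):.0f}x)")
```

Under these assumptions the ratios depend only on the channel counts: the padded input carries 8x the needed data and the read-back carries 32x, regardless of resolution or element size.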