Improve UNet Shallow end-to-end performance #12857

Open
4 tasks
esmalTT opened this issue Sep 18, 2024 · 0 comments
esmalTT commented Sep 18, 2024

Summary

The current single-chip end-to-end performance of UNet is 243 fps. To improve it, we need to do the following:

  1. Reduce input volume: Currently, input channels are padded from 4 to 32 because convolution does not support in_channels=4.

  2. Reduce output volume: UNet's output has only a single channel, but because the output tensor uses TILE layout, we copy an extra 31 channels of data back to host.

  3. Improve host read-back performance: Based on the Tracy profile, UNet appears to spend a long time in host read-back.
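
To put rough numbers on the first two items, here is a minimal sketch of the transfer overhead caused by channel padding. The channel counts (4 to 32 on input, 1 to 32 on output) and the 243 fps figure come from the summary above. The batch size, spatial resolution, and 2-byte element size are hypothetical placeholders chosen only for illustration, and the helper names are not part of any existing API.

```python
def padded_volume_ratio(logical_channels: int, padded_channels: int) -> float:
    """Return how many times more data is moved with padding than without."""
    if logical_channels <= 0 or padded_channels < logical_channels:
        raise ValueError("expected 0 < logical_channels <= padded_channels")
    return padded_channels / logical_channels


def transfer_bytes(batch: int, height: int, width: int, channels: int,
                   bytes_per_element: int = 2) -> int:
    """Return the size in bytes of a dense tensor with the given dimensions."""
    return batch * height * width * channels * bytes_per_element


# Hypothetical shape, NOT taken from the issue; substitute the real model shape.
BATCH, HEIGHT, WIDTH = 1, 256, 256

# Channel counts taken from the issue summary.
input_ratio = padded_volume_ratio(logical_channels=4, padded_channels=32)
output_ratio = padded_volume_ratio(logical_channels=1, padded_channels=32)

# Per-frame time budget implied by the reported 243 fps.
frame_time_ms = 1000.0 / 243

print(f"input volume is {input_ratio:.0f}x the logical size")
print(f"output volume is {output_ratio:.0f}x the logical size")
print(f"per-frame budget at 243 fps: {frame_time_ms:.2f} ms")
print("padded output bytes:", transfer_bytes(BATCH, HEIGHT, WIDTH, 32))
print("logical output bytes:", transfer_bytes(BATCH, HEIGHT, WIDTH, 1))
```

The ratios are independent of the assumed resolution: padding costs 8x on the input side and 32x on the output side, which suggests the output read-back has the most to gain from trimming the padded channels.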
