Conv Layer Incorrect output #14236

Open · 2 tasks
shwetankTT opened this issue Oct 24, 2024 · 6 comments
Assignees: shwetankTT
Labels: bug · CNN_bug · conv generality · op_cat: conv2D · P1

Comments
shwetankTT (Contributor) commented Oct 24, 2024

  • out_channels > 256 with output in row-major layout: incorrect output from pack_untilize_dst
  • has_bias, packer_l1_acc, and fp32_accum flags all enabled for use_max_rows (output_layout is ROW_MAJOR)

Will add more details.
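For reference, a hedged sketch of the two failing configurations as a single predicate; every name below is an illustrative stand-in, not the actual tt-metal conv2d field:

```cpp
// Sketch only: names are hypothetical stand-ins for the conv2d op's config.
enum class Layout { ROW_MAJOR, TILE };

bool hits_reported_conv2d_bugs(int out_channels, Layout output_layout,
                               bool has_bias, bool packer_l1_acc, bool fp32_accum) {
    const bool row_major = (output_layout == Layout::ROW_MAJOR);
    const bool pack_untilize_case = (out_channels > 256) && row_major;           // bullet 1
    const bool max_rows_case =
        has_bias && packer_l1_acc && fp32_accum && row_major;                    // bullet 2
    return pack_untilize_case || max_rows_case;
}
```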

shwetankTT added the bug label Oct 24, 2024
shwetankTT self-assigned this Oct 24, 2024
shwetankTT changed the title from "Conv output mismatch" to "Incorrect output" Oct 24, 2024
shwetankTT changed the title from "Incorrect output" to "Conv Layer Incorrect output" Oct 24, 2024
shwetankTT (Contributor, Author) commented:

Attachment: test_new_conv2d.txt

jvasilje (Collaborator) commented:

@shwetankTT you opened a P0 bug and assigned it to yourself? P0 bugs are company-wide show stoppers - is that what this is?

jvasilje added P1 and removed P0 labels Oct 30, 2024
shwetankTT (Contributor, Author) commented Nov 2, 2024

> @shwetankTT you opened a P0 bug and assigned it to yourself? P0 bugs are company-wide show stoppers - is that what this is?

Yeah, it was definitely not a P0. Thanks for changing it to P1.

shwetankTT (Contributor, Author) commented:

Seems like a hardware limitation? Let's take an example where the input and output are 8x320x16x16, distributed across 64 cores, so that each core holds a shard of shape (32, 320). That equates to 1 tile row and 10 tile columns. Since the output is in row-major order, completing each output row requires data from all 10 tiles, but we only have 8 dst tiles. @mywoodstock
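To make the arithmetic concrete, a small self-contained sketch (the 32-wide tile is the standard Tensix tile size; the 8-tile dst capacity is taken from the comment above, not queried from hardware):

```cpp
#include <cstdio>

int main() {
    constexpr int TILE_W = 32;       // standard Tensix tile width
    constexpr int DST_TILES = 8;     // dst register capacity, per the comment above
    constexpr int shard_w = 320;     // width of the (32, 320) per-core shard
    constexpr int tiles_per_row = shard_w / TILE_W;  // = 10 tile columns
    // Untilizing one row-major output row needs all 10 tiles resident at once,
    // but only 8 fit in dst, so the op would have to iterate in 8-tile blocks.
    std::printf("tiles per row = %d, dst capacity = %d, fits = %s\n",
                tiles_per_row, DST_TILES, tiles_per_row <= DST_TILES ? "yes" : "no");
}
```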

shwetankTT added a commit that referenced this issue Nov 11, 2024
shwetankTT added a commit that referenced this issue Nov 11, 2024
mywoodstock (Contributor) commented:

Thanks @shwetankTT, yes: if the width is > 8 tiles, we will need to iterate over the 8-tile blocks. Can you add a TT_FATAL for this for now?
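A minimal sketch of the requested guard. TT_FATAL is the real tt-metal assert macro; output_shard_width_ntiles and MAX_DST_TILES are assumed names, not the actual conv2d internals:

```cpp
// Hedged sketch: reject row-major conv2d outputs wider than the dst capacity
// until 8-tile-block iteration is implemented. Names are illustrative.
constexpr uint32_t MAX_DST_TILES = 8;
TT_FATAL(
    output_shard_width_ntiles <= MAX_DST_TILES,
    "Row-major conv2d output shard is {} tiles wide, but pack_untilize can only "
    "handle {} dst tiles; widths > 8 tiles are not yet supported",
    output_shard_width_ntiles,
    MAX_DST_TILES);
```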

rtawfik01 (Contributor) commented:

Hi guys, I see that pack_untilize has been tested at the unit level with a Float32 destination register and Float32 output format: https://github.com/tenstorrent/tt-metal/blob/main/tests/tt_metal/tt_metal/unit_tests/compute/test_untilize_tilize.cpp

But it seems that in the use case above, the input to the copy + pack_untilize is Float16_b, the input to the packer is Float32, and the output of the packer is possibly back to Float16_b. We can add a unit test case that replicates the behavior you see to verify whether it is a kernel issue; in the meantime @mywoodstock, can you make sure the data format reconfigs are being used correctly with the copy + pack_untilize above?

@ncvetkovicTT @nvelickovicTT The issue above is that once the Float32 destination register + Float32 packer input format is set, the results start being incorrect, and the pattern looks like each row of the Float32 destination register (failing result) holds half the datums of the Float16_b register (passing result). If possible, it would be good to add their specific test to our unit test infra.
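For reference, a rough sketch of the reconfig ordering being asked about, written as a tt-metal compute-kernel fragment. The calls shown (reconfig_data_format, copy_tile, pack_reconfig_data_format, pack_untilize_dst) exist in tt-metal, but their exact overloads vary between versions, and the CB indices and block size here are illustrative assumptions:

```cpp
// Assumed setup: cb_in holds Float16_b tiles, dst is Float32 (fp32_dest_acc_en),
// cb_out is Float16_b again. CB ids and block_ct_dim are illustrative.
constexpr uint32_t cb_in = 0, cb_out = 16;
constexpr uint32_t block_ct_dim = 8;

reconfig_data_format(cb_in, cb_in);        // unpacker: expect Float16_b input
copy_tile_init(cb_in);
tile_regs_acquire();
copy_tile(cb_in, /*tile_index=*/0, /*dst_index=*/0);  // lands in Float32 dst
tile_regs_commit();

pack_reconfig_data_format(cb_out);         // packer: Float32 dst in, Float16_b out
pack_untilize_dst_init_short<block_ct_dim>(cb_out);
tile_regs_wait();
pack_untilize_dst<block_ct_dim>(cb_out);   // suspect path when dst is Float32
tile_regs_release();
```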
