
[Bug Report] ttnn.concat - Tile padding along concatenated dim is not supported #13667

Open
chandrasekaranpradeep opened this issue Oct 10, 2024 · 8 comments
Labels: bug (Something isn't working), forge, P0

Comments

@chandrasekaranpradeep

Describe the bug
The ttnn.concat op throws a Tile padding along concatenated dim (3) not supported for concat yet (tensor: 0) error when a tuple of two input tensors of the same shape (1, 32, 12, 50) is passed with concat dim = -1 (dim 3 for these rank-4 tensors). The error is raised while validating the input tensors of the concat op, in the ttnn::operations::data_movement::ConcatDeviceOperation::validate function.

For more context, here is the exact error message:

E       RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/operations/data_movement/concat/device/concat_device_operation.cpp:43: !in_ref.get_shape().has_tile_padding(this->dim)
E       info:
E       Tile padding along concatenated dim (3) not supported for concat yet (tensor: 0).
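
For context on why this validation fires (a sketch, assuming the standard 32x32 tile size used by TILE_LAYOUT): the last two dims are padded up to multiples of 32, so a (1, 32, 12, 50) tensor is stored with padded shape (1, 32, 32, 64), and the 14 padded columns along dim 3 are exactly the tile padding the check rejects:

import math

# Assuming 32x32 tiles: TILE_LAYOUT pads the last two dims up to multiples of 32.
logical_shape = (1, 32, 12, 50)
TILE = 32
padded_shape = logical_shape[:-2] + tuple(
    math.ceil(d / TILE) * TILE for d in logical_shape[-2:]
)
print(padded_shape)  # (1, 32, 32, 64): dim 3 is padded from 50 to 64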

To Reproduce
Run the following test:

import torch
import ttnn
from tests.ttnn.utils_for_testing import assert_with_pcc

def test_concat_rotary_embedding(device):
    concat_dim = 3
    # Last two dims (12, 50) are not multiples of the 32x32 tile size,
    # so in TILE_LAYOUT both tensors carry tile padding.
    torch_input_tensor_a = torch.rand((1, 32, 12, 50), dtype=torch.float32)
    torch_input_tensor_b = torch.rand((1, 32, 12, 50), dtype=torch.float32)
    torch_output_tensor = torch.concat([torch_input_tensor_a, torch_input_tensor_b], dim=concat_dim)

    input_tensor_a = ttnn.from_torch(torch_input_tensor_a, layout=ttnn.TILE_LAYOUT, device=device)
    input_tensor_b = ttnn.from_torch(torch_input_tensor_b, layout=ttnn.TILE_LAYOUT, device=device)

    output = ttnn.concat([input_tensor_a, input_tensor_b], dim=concat_dim)  # raises the TT_FATAL above
    output = ttnn.to_torch(output)

    assert_with_pcc(torch_output_tensor, output, 0.9999)
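
Until the fix lands on main, one possible workaround (a sketch, not verified against this issue) is to concat in ROW_MAJOR_LAYOUT, which carries no tile padding, and convert back afterwards; ttnn.to_layout, ttnn.ROW_MAJOR_LAYOUT, and ttnn.TILE_LAYOUT are existing ttnn APIs, though the cost of the extra layout conversions is untested here:

# Workaround sketch: concatenate in row-major layout to sidestep tile padding.
a_rm = ttnn.to_layout(input_tensor_a, ttnn.ROW_MAJOR_LAYOUT)
b_rm = ttnn.to_layout(input_tensor_b, ttnn.ROW_MAJOR_LAYOUT)
output = ttnn.concat([a_rm, b_rm], dim=concat_dim)
output = ttnn.to_layout(output, ttnn.TILE_LAYOUT)  # re-tilize if needed downstream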
@ntarafdar
Contributor

@jaykru-tt can you see if this has been covered by the concat merge you did?

@ntarafdar
Contributor

@chandrasekaranpradeep I spoke to @jaykru-tt; this will be fixed by a PR he has already implemented, which is in the process of being merged. I will close this issue once it's merged.

@jvasilje
Collaborator

jvasilje commented Nov 5, 2024

Did not hear back on the reason for the P0 label. Setting to P1.

@jvasilje jvasilje added P1 and removed P0 labels Nov 5, 2024
@jvasilje
Collaborator

jvasilje commented Nov 6, 2024

@ntarafdar let's update the Forge team on the status of this issue. :)

During the bugs meeting this morning, the Forge team reported that this is their top priority, needed to unblock a single model they want to push through; it is not part of the generality and sweeps milestone.

Setting back to P0, as per Forge's request.

@jvasilje jvasilje added P0 and removed P1 labels Nov 6, 2024
@ntarafdar
Contributor

The fix can be cherry-picked right now from Jay's branch. He is currently trying to push it through CI and is running into perf issues (allocator). #14306

We need to investigate whether this is an infra issue or an issue in concat.

Let us know if cherry-picking will work in the meantime; Jay will be back next Wednesday to push it in. If not, we can assign someone else to it in the meantime.

@tt-mpantic

Hi Naif, thanks for the update. At the moment we are consuming only metal main, so we can't consume the custom branch you shared; we will wait for the fix to land on main. I understand that it takes some time to get this in, and that is fine. The important part is that this is being worked on with priority.
Assigning someone else at this point seems inefficient. Wednesday sounds fine.

@dgolubovicTT

Any updates on this one? We are still blocked on Llama bringup due to this.

@ntarafdar
Contributor

@jaykru-tt is back and working on this. He has seen unexpected bugs in model perf tests caused by concat allocating large tensors. His ETA for getting this fixed and merged is tomorrow.
