
Failed to lower some tensor manipulation ops from aten to ttnn #12853

Open
jdh8 opened this issue Aug 22, 2024 · 8 comments

Comments

@jdh8
Contributor

jdh8 commented Aug 22, 2024

The following ops refuse to lower to ttnn and stay in aten:

============================================================================== short test summary info ===============================================================================
FAILED tests/lowering/tensor_manipulation/test_concat.py::test_concat[input_shapes0-1] - AssertionError: assert 0 == 1
FAILED tests/lowering/tensor_manipulation/test_expand.py::test_expand[input_shape0-new_shape0] - AssertionError: assert 0 == 1
FAILED tests/lowering/tensor_manipulation/test_expand.py::test_expand_after_op[input_shape0-new_shape0] - AssertionError: assert 0 == 1
FAILED tests/lowering/tensor_manipulation/test_expand.py::test_expand_before_op[input_shape0-new_shape0] - AssertionError: assert 0 == 1
FAILED tests/lowering/tensor_manipulation/test_expand.py::test_expand_between_ops[input_shape0-new_shape0] - AssertionError: assert 0 == 1
FAILED tests/lowering/tensor_manipulation/test_repeat.py::test_repeat[input_shape0-sizes0] - AssertionError: assert 0 == 1
FAILED tests/lowering/tensor_manipulation/test_reshape.py::test_reshape[input_shape0-new_shape0] - AssertionError: assert 0 == 1
============================================================================ 7 failed, 9 passed in 8.42s =============================================================================
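These failures come from the lowering tests asserting that at least one ttnn op appears in the compiled graph; `assert 0 == 1` means zero were found. A minimal pure-Python sketch of that counting check (the dict-shaped nodes and the helper name are hypothetical illustrations, not the actual test code):

```python
def count_ttnn_ops(nodes):
    """Count call nodes whose target lives in the ttnn namespace."""
    return sum(1 for n in nodes
               if n["op"] == "call_function"
               and str(n["target"]).startswith("ttnn."))

# A graph where the op stayed in aten (no lowering happened):
unlowered = [
    {"op": "call_function", "target": "aten.view.default"},
    {"op": "output", "target": "output"},
]
assert count_ttnn_ops(unlowered) == 0  # reproduces the "assert 0 == 1" shape

# What a passing test expects after lowering:
lowered = [
    {"op": "call_function", "target": "ttnn.reshape"},
    {"op": "output", "target": "output"},
]
assert count_ttnn_ops(lowered) == 1
```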
@ayerofieiev-tt
Member

ayerofieiev-tt commented Aug 22, 2024

@jdh8 need more details so we can pass the info to the Op team.

Can you please specify individual problems with each op?
Ideally we should have a single ticket per issue, with each issue categorized:

Op Category:

  • TM (tensor manipulation/data movement),
  • MatMul,
  • Conv,
  • Eltwise,
  • Pool,
  • Reduction,
  • Misc

Issue Category:

  • Arguments order mismatch (lower priority)
  • Arguments mismatch (no way to pass configuration to the op)
  • Unsupported case (op rejects the input)
  • Crash
  • Wrong answer
  • Low PCC (close to expected value, but something is off)

Specific input: [params].
Link to an xfail test.

github-merge-queue bot referenced this issue in tenstorrent/pytorch2.0_ttnn Aug 22, 2024
* Change the tracer type to a decorator; its API changes too

* torch_ttnn.backend apply tracer config

* mv json dump out of parse_fx_graph

* Wrap leaf func with function form instead of decorator

* Rename TorchTtnnOption to TenstorrentBackendOption

* Revert "Use subgraph_rewriter"

This reverts commit fc09080.

* Extract mock_ttnn to a standalone package

* Register ttnn backend

* Add setup.py and pyproject.toml

* Update README.md

* mv torch_ttnn/tracer.py tracer/tracer.py

* Add model_sow2_list1 in tools/stat_models

* Fix counter bug

* fix try except

* detr_resnet50 retain_graph=True

* Update README.md for package building

* Update test case for ttnn interface change

- ttnn.open(0) -> ttnn.open_device(device_id=0)
- ttnn.close(d) -> ttnn.close_device(d)

* Convert 14 pointwise binary operations

Add conversion and unit test cases

- add
- eq
- gt
- logical_and
- logical_or
- logical_xor
- lt
- maximum
- minimum
- mul
- ne
- pow
- sub
- xlogy

* Convert 35 pointwise unary operations

- aten.abs
- aten.acos
- aten.acosh
- aten.asin
- aten.asinh
- aten.atan
- aten.atan2  # binary
- aten.atanh
- aten.clone
- aten.cos
- aten.cosh
- aten.erf
- aten.exp
- aten.expm1
- aten.gelu
- aten.hardtanh
- aten.isinf
- aten.isnan
- aten.leaky_relu
- aten.log
- aten.log10
- aten.log1p
- aten.log2
- aten.logical_not
- aten.neg
- aten.reciprocal
- aten.relu
- aten.rsqrt
- aten.sigmoid
- aten.sign
- aten.sin
- aten.sinh
- aten.sqrt
- aten.tan
- aten.tanh

* Convert 3 pointwise trinary operations

- addcdiv
- addcmul
- where

* Convert 2 matmul operations

Also use fx.subgraph_rewriter

- matmul
- linear

* Simplify op conversion

* Fix wrap for ttnn & update test

- ttnn.add (and other ops) don't have __name__, so torch.compile will fail. We hard-patch the op with __name__
- Now ttnn needs a to_layout before computation

* Fix more ops unit test

* Simplify pass insertion for gen_graphviz

* Update test cases for to_layout

* Fix add_data_move support kwargs

* Support linear without bias

* Don't gen graphviz for pointwise unary test

* Fix three ops, and verify some ops

3 ops are fixed
- clone
- hardtanh
- leaky_relu

The following ops are verified and won't be fixed
- atan2: ttnn bug when x or y is 0
- pow: exponent doesn't support tensor type
- xlogy: y == 1 should be 0

* Simplify binary test, Add scalar support

* Support addcmul & addcdiv

* Fix support of addcdiv and where

- addcdiv with option is a special case in torch and needs special pattern matching
- ttnn.where(c, x, y): c cannot be bool, needs a cast

* Convert and test repeat op

* Add silu conversion

* Update new test cases according to torch.compile interface change

* Simplify unary test cases, reuse impl code

* Update test_stat import statement

* Add group_norm conversion

* Try converting layer_norm, but the result is different

* Support repeat

* Update trinary tests

* Support concat

* Support split

* Update format

* [wip] group_norm

* group_norm use customized replacement

The transformer method doesn't work because it replaces native_group_norm->getitem with ttnn.group_norm.
Pattern replacement doesn't work because it uses symbolic tracing, which operates on proxies, not values;
  however, we need to do the value-dependent conversion.

* Add more test_group_norm

* Add more test_layer_norm

* Remove unused patterns/norm.py

* refactor, rename, add comment

* Support x.t()

* move layer_norm impl to customized_replace

* Update for ttnn version 3023ec0f7

* Refactor test case

Don't know why, but if I reuse some code in a test case by wrapping it in a function, it fails when vscode runs more than 8 cases at the same time. It is really strange.

I currently have no idea how this happens,
so I refactored the cases to duplicate the code instead.

* Merge .gitignore

* Resolve conflicts in README.md

* Use incoming test_fall_back.py

* Use our tests/tools/test_stats.py

* Resolve generate_report.py

* Preserve intermediate node meta for composite ops and add checks in respective tests (#50)

Co-authored-by: Artem Yerofieiev <[email protected]>

* Resolve torch_ttnn/__init__.py

* Resolve conflict

* Remove duplicate entries from .gitignore

* Update metadata in setup.py

* Correct the name of the test module in test_datamove.py

* Remove duplicate test for softmax

* Fix merge errors, now binary test passed.

* Remove test_if.py as `if` is already tested by lowering/misc/test_if.py

* Remove test_only_add_matmul.py, superseded by lowering/matmul/test_only_add_matmul.py

* Test group_norm

* Test torch.matmul -> ttnn.matmul

* Test compiling torch.nn.functional.linear

* Refactor test for CSE

* Remove test_norm.py as we've done its migration

* This test no longer tests falling back to torch op but division, which should be handled in lowering/eltwise/binary/test_div.py instead

* Convert tests for unary eltwise ops

* Fix loading pytest arguments

* Convert tests for binary eltwise ops

* Fix precision test for bivariate ops

* Fix precision test for univariate ops

* Remove test_pointwise_trinary.py for flexibility

Trivariate functions differ too much to share the same testing protocol

* Test torch.reshape -> ttnn.reshape

* Test compiling aten.repeat

* Test compiling torch.cat

* Remove test_datamove.py because all its tests have been moved to lowering/tensor_manipulation/

* Remove test already covered by test_resnet.py

* Use more descriptive names in torch_ttnn/patterns/add.py

Co-authored-by: Artem Yerofieiev <[email protected]>

* Use more descriptive names in torch_ttnn/patterns/addcdiv.py

Co-authored-by: Artem Yerofieiev <[email protected]>

* Simpler path to resolve ttnn.addcdiv

Co-authored-by: Artem Yerofieiev <[email protected]>

* Make test names unique for easier batch testing with pytest

* Fix import target_wrappers

* Move code in add_coreops_pass.py to add_data_move_pass.py first to help refactoring

* Refactor lowering univariate functions

* Simplify control flow in lowering

* Refactor lowering bivariate functions

* Sort ops to lower

* Lower to ttnn.atan2

* Lower to ttnn.leaky_relu

* Lower default hardtanh to ttnn.clip for now

* Lower addcdiv and addcmul

* Remove migrated lowering code

* Fix names of ttnn ops

* Remove duplicate entries in the op list

* Remove unused pattern file

* Remove the file that is already merged into add_data_move_pass.py

* Test broadcasting for bivariate ops

* Test broadcasting for all bivariate ops

* Remove intermediate test for `div` so we can test broadcasting

* Regroup ops based on API docs

https://docs.tenstorrent.com/tt-metal/latest/ttnn/ttnn/api.html

* Remove ops not working:

- Comparison ops
- Logical ops
- aten.pow.Tensor_Tensor

* Apply @kevinwuTT's patch on tt.repeat at model teardown

* Format the code with `black` for a consistent style

* Reformat with the specified config

* Mark tests xfail based on #64

* Remove test for unsupported pow(tensor, tensor)

* Mark broadcasting issues (#64) with atan2 and xlogy

* Reformat the code

* Mark broadcasting issues (#64) with (min|max)imum

* Mark broadcasting issues (#64) with subtraction

* Mark numerical issues with atan2

* Try setting realistic tolerance for low precision math ops

* Tolerate more with pointwise unary math ops

* Reflect that we convert torch.hardtanh to ttnn.clip for now

* Remove test for unsupported group norm

* Mark conversion failure with `linear`

* Fix test_clone.py for the patch

* Mark argument mismatch in `arange` (#65)

* Link #66 to test_linear.py

* Mark lowering issues with tensor manipulation (#67)

* Reciprocal needs an offset because it has a pole at 0

* More tolerance for matmul for its accumulated error

* Symmetrically mark xfail for (min|max)imum

* Merge human-made docs from README.md to docs/README.md.in

* Do not use braces for shell variables to avoid clashing with .format

* Generate README.md with new metrics

* Mark xfail for xlogy involving broadcasting

xlogy asserts the same size for inputs for now

---------

Co-authored-by: swimdi <[email protected]>
Co-authored-by: yoco <[email protected]>
Co-authored-by: Zahid Wakeel <[email protected]>
Co-authored-by: Artem Yerofieiev <[email protected]>
Co-authored-by: yoco <[email protected]>
@ayerofieiev-tt
Member

@jdh8 please provide more details so we can file individual tickets with the Op lead

@boris-drazic
Contributor

@jdh8 For reshape, your test (tests/lowering/tensor_manipulation/test_reshape.py) is not actually dispatching aten.reshape to the compiler, but rather aten.view.

Your test module:

def forward(self, x, new_shape):
    return torch.reshape(x, new_shape)

is passed to compiler as

def forward(self, arg0_1):
    view = torch.ops.aten.view.default(arg0_1, [21, 5]);  arg0_1 = None
    return (view,)

which will yield the same result, but without using reshape.
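In other words, a lowering (or a test) keyed only on aten.reshape will never match what the compiler actually sees. A small illustrative sketch of that aliasing (the table and helper are hypothetical, but the torch.reshape -> aten.view.default mapping is exactly what the traced graph above shows):

```python
# Torch-level APIs and the aten op they typically arrive as after tracing.
TRACED_AS = {
    "torch.reshape": "aten.view.default",   # what the compiler actually sees
    "torch.cat": "aten.cat.default",
    "torch.Tensor.repeat": "aten.repeat.default",
}

def expected_aten_target(api):
    """Return the aten target a lowering pass must match for this API."""
    return TRACED_AS.get(api, api)

assert expected_aten_target("torch.reshape") == "aten.view.default"
```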

@boris-drazic
Contributor

I don't see any code for handling the aten ops reshape, concat, and repeat in torch_ttnn/passes/lowering/to_tt_pass.py. That would explain why they are not lowered to ttnn during compilation.
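For context, a lowering pass of this kind rewrites call targets it has entries for and leaves everything else in aten, which is the failure mode reported here. A minimal pure-Python sketch (the real pass operates on torch.fx graphs; the table contents and dict-shaped nodes are placeholders, not the project's actual code):

```python
# Hypothetical lowering table; missing entries mean the op stays in aten.
LOWERING_TABLE = {
    "aten.view.default": "ttnn.reshape",
    "aten.cat.default": "ttnn.concat",
}

def to_tt_pass(nodes, table=LOWERING_TABLE):
    """Rewrite call targets listed in `table`; unlisted ops stay in aten."""
    for n in nodes:
        if n["op"] == "call_function" and n["target"] in table:
            n["target"] = table[n["target"]]
    return nodes

graph = [
    {"op": "call_function", "target": "aten.view.default"},
    {"op": "call_function", "target": "aten.expand.default"},  # no entry
]
to_tt_pass(graph)
assert graph[0]["target"] == "ttnn.reshape"          # lowered
assert graph[1]["target"] == "aten.expand.default"   # stayed in aten
```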

@jdh8
Contributor Author

jdh8 commented Sep 12, 2024

@boris-drazic, I removed the conversion for these ops to keep tenstorrent/pytorch2.0_ttnn#54 small enough to review. Now I'm making a separate PR for each op.

@ayerofieiev-tt ayerofieiev-tt transferred this issue from tenstorrent/pytorch2.0_ttnn Sep 18, 2024
@ntarafdar
Contributor

@jdh8 for each sub-issue, can you give us a ttnn unit test that's failing? CC @ayerofieiev-tt

@jdh8
Contributor Author

jdh8 commented Sep 19, 2024

@tarafdarTT, I've made each sub-issue a PR, so we can experiment on tests without affecting other ops. In each PR, I 'unlocked' some tests by removing @pytest.mark.xfail, and these are the failing tests.
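For readers unfamiliar with the pattern: the known-failing lowerings are kept in the suite but marked expected-to-fail, and a PR "unlocks" a test by deleting the marker. A minimal sketch, assuming pytest (the test body and reason string are illustrative, not the repository's actual test):

```python
import pytest

@pytest.mark.xfail(reason="aten.reshape stays in aten, not yet lowered to ttnn")
def test_reshape():
    # Placeholder body; the real test compiles a module with the ttnn
    # backend and asserts on the ops in the resulting graph.
    assert False
```

Removing the `@pytest.mark.xfail` line turns the expected failure back into a hard failure, which is what surfaces in the summary at the top of this issue.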

@ntarafdar
Contributor

ntarafdar commented Sep 19, 2024

@jdh8 oh okay, do you need anything from me (or the TM team) right now, or is that after you lower to ttnn?
