eraseInputDistinctRootDomains supports general logical-to-allocation transforms #3458

Merged: 10 commits merged into main from wjy/root on Nov 26, 2024

Conversation

@wujingyue (Collaborator, Author) commented on Nov 21, 2024:

This is a spin-off from #3444.

The current code assumes that logical-to-allocation has to be a permutation. This assumption will no longer hold with #2563, so this PR extends eraseInputDistinctRootDomains to support more general transforms.

This can also happen in single-GPU fusions, although it's less common there. The tests added in this PR are single-GPU only because #3444 hasn't landed yet; #3444 will add multi-GPU tests.
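
For illustration, here is a minimal, hypothetical sketch (not one of the tests added in this PR) of a fusion where logical-to-allocation is more than a permutation, written with the usual nvFuser C++ test helpers:

// Hypothetical sketch: the output's allocation domain is derived from its
// logical domain via a split, so logical-to-allocation is not a permutation.
Fusion fusion;
FusionGuard fg(&fusion);

TensorView* in = makeContigConcreteTensor({2, 6});
fusion.addInput(in);
TensorView* out = set(in);
fusion.addOutput(out);

// Split the inner dimension and use the resulting loop domain as the
// allocation domain.
out->split(1, 3);
out->setAllocationDomain(out->getLoopDomain(), /*new_contiguity=*/true);

If a fusion like this gets segmented, eraseInputDistinctRootDomains presumably needs to replay the split on the replaced domains of a segment input rather than merely permute IterDomains.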

@wujingyue (Author):

!test

csrc/fusion_segmenter.cpp (review thread resolved)
Comment on lines +1441 to +1444:
// The test fails as is. The symbolic IterDomains in loop/allocation are not
// concretized. I tried to change DynamicTransformConcretizer::mutate to grab
// all expressions between root and allocation but still couldn't get it to
// work.

Collaborator:

I'm not sure why this could cause that problem. It may be just because of the concern I mentioned below.

@wujingyue (Author), Nov 21, 2024:

This issue persisted after I moved the split after addInput/addOutput. FYI, below is my failed attempt to fix dynamic transform for this test.

diff --git a/csrc/dynamic_transform.cpp b/csrc/dynamic_transform.cpp
index 24404db8..d287149d 100644
--- a/csrc/dynamic_transform.cpp
+++ b/csrc/dynamic_transform.cpp
@@ -1056,7 +1056,7 @@ void DynamicTransformConcretizer::mutate(TensorView* tv) {
     // beyond the logical domain as asserted above
     auto all_id_exprs = StmtSort::getExprsBetween(
         {tv->getRootDomain().begin(), tv->getRootDomain().end()},
-        {tv->getLogicalDomain().begin(), tv->getLogicalDomain().end()});
+        {tv->getMaybeAllocationDomain().begin(), tv->getMaybeAllocationDomain().end()});
     for (auto expr : all_id_exprs) {
       // Assume outputs of IterDomain exprs are always IterDomains. If
       // the assumption is invalidated, the logic here would need to

@jacobhinkle (Collaborator), Nov 22, 2024:

Previously we assumed there were no loop or allocation transforms, so each of those domains would just be a permutation of logical. I initially thought we should use IRBFS here to propagate from root to all of the other domains (logical, loop, and allocation). Actually, IRBFS isn't needed: at concretization we can safely assume we don't have an uncommon situation like a loop domain that is a producer of transforms leading to the root rather than the other direction. If we assume that root is a producer for all of the other domains, we can just use StmtSort::getExprsBetween like above, but we need to pass not just the logical or allocation domain but all three other domains as the "to" argument, i.e.:

std::unordered_set<Val*> to{tv->getLogicalDomain().begin(), tv->getLogicalDomain().end()};
to.insert(tv->getMaybeAllocationDomain().begin(), tv->getMaybeAllocationDomain().end());
to.insert(tv->getLoopDomain().begin(), tv->getLoopDomain().end());
auto all_id_exprs =
    StmtSort::getExprsBetween({tv->getRootDomain().begin(), tv->getRootDomain().end()}, to);

@wujingyue (Author):

Did you try it? While I agree with what you said, I doubt it helps this particular test case, where root-to-maybe-allocation already includes all expressions. Btw, I believe TensorDomain::allExprs() (std::vector<Expr*> TensorDomain::allExprs() const) can be used to capture all Exprs in a TensorView.
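
For example, a hedged one-liner of what that could look like in DynamicTransformConcretizer::mutate, assuming allExprs() covers everything between root and loop/allocation:

// Hypothetical alternative to the StmtSort::getExprsBetween call in the diff
// above, assuming TensorDomain::allExprs() returns every IterDomain
// expression within this TensorView's domains:
std::vector<Expr*> all_id_exprs = tv->domain()->allExprs();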

@jacobhinkle (Collaborator), Nov 22, 2024:

Yes, the issue here is that allocation and logical are disconnected because a replacement has been performed on logical but not on allocation. I think the problem is that StmtSort::getStmts only gives us the logical domains of the input TVs, whereas we would expect it to provide all IDs for processing before the input TensorDomain and the TV itself.

Collaborator:

Ha, looks like it's actually just loop domains.

https://github.com/NVIDIA/Fuser/blob/main/csrc/iter_visitor.cpp#L73

We should probably change this to visit all of root, logical and loop domains.
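
For illustration, a hypothetical sketch (not the actual iter_visitor.cpp code) of gathering IDs from all three domains, using only the TensorView accessors already quoted in this thread:

// Hypothetical sketch: collect IterDomains from the root, logical, and loop
// domains instead of only the loop domain, so replacements on any of them
// get visited.
std::vector<Val*> ids_to_visit;
for (const std::vector<IterDomain*>& dom :
     {tv->getRootDomain(), tv->getLogicalDomain(), tv->getLoopDomain()}) {
  ids_to_visit.insert(ids_to_visit.end(), dom.begin(), dom.end());
}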

Collaborator:

In this particular case, though, loop is equal to allocation; I originally thought that was the issue here. I agree that that line is a problem if we have unconventional tensor domains, e.g., loop domains that are producers of logical, and we should make sure all the domains are available there, but I think in this case the bad behavior is specific to input TVs.

Collaborator:

@wujingyue Is this a blocker?

@wujingyue (Author):

Not for this PR. I can stick with concrete sizes for the tests for now.

@wujingyue (Author):

!test

@wujingyue (Author):

!test

csrc/fusion_segmenter.cpp (3 review threads, outdated and resolved)
@wujingyue requested a review from naoyam on November 22, 2024 at 07:12.
csrc/ir/nodes.cpp (review thread resolved)
tests/cpp/test_allocation_domain.cpp (review thread resolved)
@wujingyue (Author):

!test

@naoyam (Collaborator) left a review:

LGTM

@wujingyue (Author):

!test

@wujingyue merged commit 58e1514 into main on Nov 26, 2024 (36 of 37 checks passed).
@wujingyue deleted the wjy/root branch on November 26, 2024 at 03:49.
@wujingyue added the enhancement (New feature or request) label on Nov 28, 2024.