eraseInputDistinctRootDomains supports general logical-to-allocation transforms #3458
Conversation
!test
// The test fails as is. The symbolic IterDomains in loop/allocation are not
// concretized. I tried to change DynamicTransformConcretizer::mutate to grab
// all expressions between root and allocation but still couldn't get it to
// work.
@jacobhinkle FYI
I'm not sure why this could cause that problem. It may be just because of the concern I mentioned below.
This issue persisted after I moved the split to after addInput/addOutput. FYI, below is my failed attempt to fix the dynamic transform for this test.
diff --git a/csrc/dynamic_transform.cpp b/csrc/dynamic_transform.cpp
index 24404db8..d287149d 100644
--- a/csrc/dynamic_transform.cpp
+++ b/csrc/dynamic_transform.cpp
@@ -1056,7 +1056,7 @@ void DynamicTransformConcretizer::mutate(TensorView* tv) {
// beyond the logical domain as asserted above
auto all_id_exprs = StmtSort::getExprsBetween(
{tv->getRootDomain().begin(), tv->getRootDomain().end()},
- {tv->getLogicalDomain().begin(), tv->getLogicalDomain().end()});
+ {tv->getMaybeAllocationDomain().begin(), tv->getMaybeAllocationDomain().end()});
for (auto expr : all_id_exprs) {
// Assume outputs of IterDomain exprs are always IterDomains. If
// the assumption is invalidated, the logic here would need to
Previously we had assumed there were no loop or allocation transforms, so each of those domains would just be a permutation of logical. My first thought was to use IRBFS here to propagate from root to all of the other domains (logical, loop, and allocation). However, IRBFS is not actually needed, since at concretization we can safely assume we don't have an uncommon situation like a loop domain that is a producer of transforms leading to the root, rather than the other direction. If we assume that root is a producer for all of the other domains, we can just use StmtSort::getExprsBetween
like above, but we need to pass not just logical or allocation, but all three other domains as the "to" argument, i.e.
std::unordered_set<Val*> to{tv->getLogicalDomain().begin(), tv->getLogicalDomain().end()};
to.insert(tv->getMaybeAllocationDomain().begin(), tv->getMaybeAllocationDomain().end());
to.insert(tv->getLoopDomain().begin(), tv->getLoopDomain().end());
auto all_id_exprs =
StmtSort::getExprsBetween({tv->getRootDomain().begin(), tv->getRootDomain().end()}, to);
Did you try it? While I agree with what you said, I doubt that helps this particular test case, where root-to-maybe-allocation includes all expressions. Btw, I believe TensorDomain::allExprs may be relevant here (line 3762 in b5e5182):
std::vector<Expr*> TensorDomain::allExprs() const {
Yes, the issue here is that allocation and logical are disconnected because a replacement has been performed on logical but not on allocation. I think the problem is that StmtSort::getStmts is only giving us the logical domains of the input TVs, whereas we should expect it to provide all IDs for processing before the input TensorDomain and the TV itself.
Ha, looks like it's actually just loop domains.
https://github.com/NVIDIA/Fuser/blob/main/csrc/iter_visitor.cpp#L73
We should probably change this to visit all of root, logical and loop domains.
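To make the intent concrete, here is a rough sketch of the kind of gathering being proposed. This is a hypothetical helper for illustration only, not the actual code at that line; the accessors (getRootDomain, getLogicalDomain, getLoopDomain) are the TensorView methods already used elsewhere in this thread.

```cpp
// Hypothetical sketch: collect root, logical, and loop IterDomains of a
// TensorView so that traversal does not depend on all IDs being reachable
// from the loop domain alone.
std::vector<Val*> domainsToVisit(TensorView* tv) {
  std::vector<Val*> ids;
  const auto& root = tv->getRootDomain();
  const auto& logical = tv->getLogicalDomain();
  const auto& loop = tv->getLoopDomain();
  ids.insert(ids.end(), root.begin(), root.end());
  ids.insert(ids.end(), logical.begin(), logical.end());
  ids.insert(ids.end(), loop.begin(), loop.end());
  return ids;
}
```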
In this particular case, though, loop is equal to allocation, and I originally thought that was the issue here. I agree that that line is a problem if we have unconventional tensor domains, like loop domains that are producers of logical, and we should ensure all the domains are available there, but I think in this case the bad behavior is specific to input TVs.
@wujingyue Is this a blocker?
Not for this PR. I can stick with concrete sizes for the tests for now.
!test
!test
!test
LGTM
!test
This is a spin-off from #3444.
The current code assumes that logical-to-allocation has to be a permutation. This assumption will no longer hold with #2563, so this PR extends eraseInputDistinctRootDomains to support more general transforms.
This can also happen in single-GPU fusions, although it's less common. The tests added in this PR are single-GPU because #3444 hasn't landed; #3444 will add some multi-GPU tests.
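For illustration, here is a minimal sketch of the kind of single-GPU input this PR targets, where logical-to-allocation is a split rather than a permutation. It assumes the usual nvFuser C++ test helpers (makeContigConcreteTensor, set, setAllocationDomain); the actual tests added in this PR may differ.

```cpp
// Hedged sketch: an input TensorView whose allocation domain is a split of
// its logical domain, so logical-to-allocation is not a permutation.
Fusion fusion;
FusionGuard fg(&fusion);

TensorView* in = makeContigConcreteTensor({2, 6});
TensorView* out = set(in);
fusion.addInput(in);
fusion.addOutput(out);

// Split the inner logical dimension and use the result as the allocation
// domain: logical stays {2, 6} while allocation becomes {2, 3, 2}.
in->split(1, 2);
in->setAllocationDomain(in->getLoopDomain(), /*new_contiguity=*/true);
```

Before this change, eraseInputDistinctRootDomains could not handle such an input, since it assumed the allocation domain is a permutation of the logical domain.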