
Unshard tensor sizes before binding. #3444

Merged: 18 commits merged into main from wjy/forward on Nov 30, 2024
Conversation

@wujingyue (Collaborator) commented on Nov 19, 2024:

Fixes #3282

With this PR, we'll still try to bind tensors to logical domains. However, tensor sizes are "unsharded" before binding.
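To illustrate the idea, here is a minimal sketch only, not the PR's actual implementation; the helper name, the per-axis flag, and num_devices below are assumptions made for the example. "Unsharding" amounts to multiplying the extent of every DIDx-sharded logical axis by the mesh size before that extent is bound:

#include <cstdint>
#include <vector>

// Hypothetical sketch: recover the unsharded ("global") sizes from the
// per-device sizes before binding them to the logical domain.
// is_didx_sharded[i] is assumed to say whether logical axis i is sharded
// across DIDx, and num_devices is the mesh size D.
std::vector<int64_t> unshardSizesSketch(
    const std::vector<int64_t>& local_sizes,
    const std::vector<bool>& is_didx_sharded,
    int64_t num_devices) {
  std::vector<int64_t> unsharded = local_sizes;
  for (size_t i = 0; i < unsharded.size(); ++i) {
    if (is_didx_sharded[i]) {
      unsharded[i] *= num_devices; // undo the DIDx split on this axis
    }
  }
  return unsharded;
}

The PR itself derives which axes are sharded from the split between the logical and allocation domains, as discussed later in this thread; the per-axis flag only keeps the sketch short.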

@wujingyue force-pushed the wjy/forward branch 2 times, most recently from 2111355 to 176ed8e on November 19, 2024 06:47
csrc/expr_evaluator.cpp: outdated review thread (resolved)
@wujingyue marked this pull request as ready for review on November 25, 2024 23:26
@wujingyue (Collaborator, Author) commented:

!test

TensorView* b = makeSymbolicTensor(3);
b->split(1, 4);
b->axis(1)->parallelize(ParallelType::DIDx);
EXPECT_TRUE(isSharded(b)) << "DIDx on loop domain";
@wujingyue (Collaborator, Author) commented on the diff above:

Due to the change in isSharded, this test now looks at the allocation domain. I also split the test into three and expanded the error messages.
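As a rough sketch of what one of the reworked checks could look like after the change (assumed, not necessarily the exact test that landed): since isSharded now inspects the allocation domain, the tensor's allocation domain has to contain the DIDx axis before the assertion.

TensorView* b = makeSymbolicTensor(3);
b->split(1, 4);
b->axis(1)->parallelize(ParallelType::DIDx);
// isSharded now looks at the allocation domain, so expose the DIDx axis there.
b->setAllocationDomain(b->getLoopDomain(), true);
EXPECT_TRUE(isSharded(b)) << "expected b to be sharded: DIDx on allocation domain";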

csrc/fusion_segmenter.cpp: outdated review thread (resolved)
csrc/transform_replay.cpp: outdated review thread (resolved)
@naoyam (Collaborator) commented on Nov 26, 2024:

Can you also run !test --diff just in case?

wujingyue added a commit that referenced this pull request Nov 26, 2024
…transforms (#3458)

This is a spin-off from #3444.

The current code assumes that logical-to-allocation has to be a permutation. That assumption won't hold anymore with #2563, so this PR extends eraseInputDistinctRootDomains to support more general transforms.

This can also happen on single-GPU, although it's less common. The tests added in this PR are single-GPU because #3444 hasn't landed; #3444 will add some multi-GPU tests.
Base automatically changed from wjy/root to main November 26, 2024 03:49
@wujingyue (Collaborator, Author) commented:

!test --diff

@wujingyue (Collaborator, Author) commented:

> Can you also run !test --diff just in case?

All passing.

@wujingyue requested a review from naoyam on November 26, 2024 07:24
@wujingyue (Collaborator, Author) commented:

!test

@wujingyue (Collaborator, Author) commented:

!test

@wujingyue (Collaborator, Author) commented:

!test

@samnordmann (Collaborator) left a review:

LGTM! I only left some minor comments.

I agree with the approach taken and how it's implemented; however, let me nit-pick the wording: I would say that the implemented approach binds the at::Tensor to the allocation domain, not the logical domain as written in the comments. Indeed, the shape of the tensor matches the shape of the allocation domain, even though we still need to revert some divisions to "solve the equation" and obtain the size to which the symbolic extent must be bound. I guess it's OK if we don't use the same wording; I just wanted to express how I interpret what is implemented here.

tests/cpp/test_sharding.cpp: review threads (resolved)
tests/cpp/test_multidevice_sharding.cpp: review thread (resolved)
for (auto* tv : {in, out}) {
  tv->split(0, num_devices, /*inner_split=*/false);
  tv->axis(0)->parallelize(ParallelType::DIDx);
  tv->setAllocationDomain(tv->getLoopDomain(), true);
}
A reviewer (Collaborator) commented on the diff above:
Are there some cases where we want Allocation and Loop domains to be different?

@wujingyue (Collaborator, Author) replied:

Yes, and it's a current limitation that they have to be the same in certain cases; see #3479.

The reviewer replied:

In the linked issue I read:

> The loop domain, ideally, shouldn't be set or used because a fusion/segment input comes from outside and is not generated by a loop.

That makes sense. But I am still curious whether we know of cases where we want loop and allocation to be different.

@wujingyue (Collaborator, Author) replied:

I can't think of a case for DID, but there are certainly cases for the host-loop parallel type, as we discussed before: each iteration reads/writes a slice of the fully allocated input/output tensor.

tests/cpp/test_multidevice_sharding.cpp: review thread (resolved)
csrc/multidevice/utils.h: outdated review thread (resolved)
// For example, when `tv` is
// logical: iM, iN
// allocation: iDIDx{D}, iN/D, iM
// and `sizes` is [2, 3], the returned shape will be [2, 3D]. This is because,
A reviewer (Collaborator) commented on the diff above:

I think there is a mistake here: if we bind {2, 3} to {N/D, M}, then M = 3 and N = 2D, so according to the comment it should return the shape corresponding to the logical domain, i.e., [3, 2D]. Am I missing something?

The reviewer added:

Moreover, do we support transposition?

@wujingyue (Collaborator, Author) replied:

The comment is correct and is consistent with the code.

ExpressionEvaluator::bindTensorDomain basically does the following:

unsharded_sizes = unshardedSizes(t.sizes());
for (i : range(t.dim())) {
  bind(logical_domain[i], unsharded_sizes[i]);
}

That's also why I prefer to say we bind the unsharded sizes to the logical domain instead of allocation.
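To tie this back to the code comment discussed above, here is a worked version of that example (values taken from the comment; the bind calls only spell out the loop shown above):

// logical:    iM, iN
// allocation: iDIDx{D}, iN/D, iM
//
// t.sizes() is given in logical order, so sizes == [2, 3] means M == 2 and the
// local slice of N has extent 3. iN was split by D and the outer output is
// DIDx-parallelized, so unshardedSizes() returns [2, 3 * D], and the loop binds
//
//   bind(iM, 2);      // axis 0 was never sharded
//   bind(iN, 3 * D);  // axis 1 gets its full, unsharded extent back
//
// Binding [2, 3] positionally to the allocation order (iN/D, iM), which would
// give [3, 2D], is not what the code does: the sizes stay in logical order.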

The reviewer replied:

> The comment is correct and is consistent with the code.

I see. IMO it is error-prone to silently discard transposition. We should either assert that only splits have been applied, or support transposition, which shouldn't be too hard...

> ExpressionEvaluator::bindTensorDomain basically does the following:
>
>     unsharded_sizes = unshardedSizes(t.sizes());
>     for (i : range(t.dim())) {
>       bind(logical_domain[i], unsharded_sizes[i]);
>     }
>
> That's also why I prefer to say we bind the unsharded sizes to the logical domain instead of allocation.

I would say in this case that we bind to neither the logical nor the allocation domain, but to some hybrid domain obtained from the logical domain by applying only the splits. This is a bit counter-intuitive to me.

In your snippet above, everything is contained in unsharded_sizes, which basically embeds a mapping from the allocation domain (or, more precisely, the hybrid domain I described earlier) to the logical domain.

@wujingyue (Collaborator, Author) replied:

> it is error prone to silently discard transposition.

I believe the code as-is supports transposition. (I assume by transposition you mean TensorView::reorder.) To assure you of that, I added a test in the latest commit.
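For reference, a sketch of what such a reorder test might set up (the exact reorder call and shapes are assumed here; the test actually added in the commit may differ):

TensorView* in = makeSymbolicTensor(2);
in->split(0, num_devices, /*inner_split=*/false);
in->axis(0)->parallelize(ParallelType::DIDx);
// Swap the two non-DID axes so the allocation order differs from the logical order.
in->reorder({{1, 2}, {2, 1}});
in->setAllocationDomain(in->getLoopDomain(), true);

Unsharding the per-device sizes should still recover the full logical extents even with the reorder in place.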

csrc/multidevice/utils.h: review threads (one outdated; resolved)
@wujingyue (Collaborator, Author) commented:

!test


int64_t inner_size;
int64_t outer_size;
if (split->innerSplit()) {
@wujingyue (Collaborator, Author) commented on the diff above:

@jjsjann123 this check was missing. I believe there's a similar problem in BackwardTraverseFromLogicalToAlloc, which I didn't fix in this PR.
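For readers outside the thread, a self-contained illustration of the inner/outer distinction being checked here (illustrative names and values only; this is not the PR's code):

#include <cassert>
#include <cstdint>

// A split of extent d by factor f produces (outer, inner):
//   inner split: split(d, f)                  -> outer = d / f, inner = f
//   outer split: split(d, f, /*inner*/ false) -> outer = f,     inner = d / f
// Assigning the factor to the wrong output would attribute the device-mesh
// extent to the wrong axis when unsharding, which is the kind of check the
// comment above says was missing in the forward traversal.
void splitOutputExtents(
    bool inner_split,
    int64_t in_extent,
    int64_t factor,
    int64_t& outer_size,
    int64_t& inner_size) {
  if (inner_split) {
    inner_size = factor;
    outer_size = in_extent / factor; // assume divisibility for the sketch
  } else {
    outer_size = factor;
    inner_size = in_extent / factor;
  }
}

int main() {
  int64_t outer = 0;
  int64_t inner = 0;
  // N = 12, outer split by D = 4: the DIDx (outer) side has extent 4 and each
  // device holds 12 / 4 = 3 elements.
  splitOutputExtents(/*inner_split=*/false, /*in_extent=*/12, /*factor=*/4, outer, inner);
  assert(outer == 4 && inner == 3);
  return 0;
}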

A collaborator replied:
I think we need to have a test exposing that first before fixing anything.

@wujingyue (Collaborator, Author) replied:
Yes, that's why I didn't fix backward in this PR. The problem in forward was indeed exposed by LoopSplitWithReorder.

The collaborator replied:
Yeah no worries. We'll patch it when it shows up.

@wujingyue (Collaborator, Author) commented:

!test

@wujingyue (Collaborator, Author) commented:

!test

@wujingyue merged commit c154e90 into main on Nov 30, 2024
39 of 47 checks passed
@wujingyue deleted the wjy/forward branch on November 30, 2024 05:28
Successfully merging this pull request may close these issues:

Bind sharded input/output tensors with DID-parallelized allocation domains.