Disallow async nestings that violate read after write dependencies #7868

abadams · 2023-09-27T18:38:19Z

Discovered while trying to improve the async schedule in the async_gpu performance test so that it's less flaky.

zvookin · 2023-09-28T16:03:05Z

Does this have any connection to #5202? There is also #5195. It would be great to either move those forward or close them. @vksnk

abadams · 2023-09-28T16:11:06Z

Looks strongly related to #5202, which I had forgotten about. Not sure about #5195.

vksnk · 2023-09-28T16:15:44Z

#5202 looks related for sure, but #5195 is a separate issue.

I was planning to resurrect the PRs above (I finally have some spare cycles), but I don't think this PR should be blocked on them. I can update them accordingly once this is merged.

abadams · 2023-09-28T16:16:32Z

I think this is the same as #5202

I like this solution slightly better because it runs after the producer/consumer nodes are tightened, so it seems less likely to capture false dependencies. Open to other opinions, though. I should at least incorporate the test from #5202, as this PR has the error test but not the correctness test.

abadams · 2023-09-28T16:23:59Z

This also fixes #5201 (makes it error out), but that revealed that my error message is wrong. I think the constraint is that if you have a producer-consumer pair where the consumer is async, then the store_at of the producer can never be between the compute_at and store_at levels of the consumer, and the compute_at of the producer can only be between the compute_at and store_at levels of the consumer if the producer is also async.

abadams · 2023-09-28T16:36:21Z

Correction:

If you have a producer-consumer pair where the consumer is async, and the compute_at of the producer is in between the store_at and compute_at of the consumer, then the producer must both be async and have its store_at outside the store_at level of the consumer.
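
To make the corrected constraint concrete, here is a minimal sketch that models it outside of Halide. Everything in it is an illustrative assumption rather than the check this PR implements: the `Schedule` struct, the integer depths (0 is the root, larger means more deeply nested), the helper names `strictly_between` and `nesting_allowed`, and the choice to read "in between" as strictly between, since the comment above does not spell out the boundary cases.

```cpp
#include <cassert>

// Hypothetical model: a loop level is represented by its nesting depth.
// Depth 0 is the root; a larger depth is more deeply nested.
struct Schedule {
    int store_depth;    // depth of the store_at level
    int compute_depth;  // depth of the compute_at level
    bool is_async;      // whether the Func is scheduled async
};

// Assumption: "in between" excludes both end points.
bool strictly_between(int depth, int outer, int inner) {
    return outer < depth && depth < inner;
}

// Returns true if the producer/consumer pair satisfies the constraint as
// stated in the corrected comment. The rule only applies when the consumer
// is async and the producer's compute_at lies between the consumer's
// store_at and compute_at levels.
bool nesting_allowed(const Schedule &producer, const Schedule &consumer) {
    return !consumer.is_async ||
           !strictly_between(producer.compute_depth,
                             consumer.store_depth, consumer.compute_depth) ||
           (producer.is_async && producer.store_depth < consumer.store_depth);
}

// A few example pairs, with an async consumer stored at depth 1 and
// computed at depth 3.
void demo_nesting_rule() {
    const Schedule consumer{1, 3, true};
    assert(nesting_allowed(Schedule{0, 2, true}, consumer));    // async, stored outside
    assert(!nesting_allowed(Schedule{0, 2, false}, consumer));  // producer not async
    assert(!nesting_allowed(Schedule{1, 2, true}, consumer));   // not stored outside
}
```

In this model a non-async consumer, or a producer computed at or inside the consumer's compute_at level, is never rejected, because the stated rule says nothing about those cases.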

abadams · 2023-09-28T16:40:45Z

I cherry-picked the tests from #5201, and one of them fails, so clearly something is still wrong :(

vksnk · 2023-09-28T16:54:48Z

The only thing I remember is that this bug was pretty tricky and required some graph-level reasoning, but the rest is pretty foggy for now :( I'll need to reread the PR to remember it again.

I also remember there was another failing test that I found. It's not in the PR (all tests in the PR should be passing), but I vaguely remember what was happening there. I'm going to try to reproduce it.

abadams · 2023-09-28T16:55:35Z

I think the failing test is actually caused by the new sliding window behavior. There are too many release calls to the storage folding semaphores because the consumer has acquired extra loop iterations.
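
As a rough illustration of how extra consumer iterations can unbalance a semaphore, here is a toy counting model. It is not Halide's runtime or the lowering in this PR. It assumes the folding semaphore counts free slots in a folded (ring) buffer, that the producer acquires one slot per row it writes, and that the consumer releases one slot per loop iteration. The function names and parameters are hypothetical.

```cpp
#include <cassert>

// Hypothetical model: returns the semaphore count (free slots) after the
// producer has written rows_produced rows and the consumer loop has run
// consumer_iterations iterations.
int free_slots_after(int fold_factor, int rows_produced, int consumer_iterations) {
    int free_slots = fold_factor;
    int steps = rows_produced > consumer_iterations ? rows_produced
                                                    : consumer_iterations;
    for (int i = 0; i < steps; i++) {
        if (i < rows_produced) free_slots--;        // acquire before writing row i
        if (i < consumer_iterations) free_slots++;  // release after iteration i
    }
    return free_slots;
}

void demo_release_imbalance() {
    // Matched acquires and releases: the count returns to the fold factor.
    assert(free_slots_after(4, 16, 16) == 4);
    // Two extra consumer iterations: two releases with no matching acquire.
    assert(free_slots_after(4, 16, 18) == 6);
}
```

In this model, a count above the fold factor means the semaphore claims more free slots than the buffer has, so a producer could run ahead and overwrite rows that have not been consumed yet.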

vksnk · 2023-09-28T21:00:38Z

I think I found a failing test, it's hanging in my branch for #5202:

    {
        Func producer1, producer2, consumer;
        Var x, y;

        producer1(x, y) = x + y;
        producer2(x, y) = producer1(x, y);
        consumer(x, y) = producer1(x, y - 1) + producer2(x, y + 1);

        consumer.compute_root();

        producer1.store_at(consumer, Var::outermost()).compute_at(consumer, y).async();
        producer2.store_root().compute_at(consumer, y).async();
        consumer.bound(x, 0, 16).bound(y, 0, 16);

        Buffer<int> out = consumer.realize({16, 16});

        out.for_each_element([&](int x, int y) {
            int correct = 2 * (x + y);
            if (out(x, y) != correct) {
                printf("out(%d, %d) = %d instead of %d\n",
                       x, y, out(x, y), correct);
                exit(-1);
            }
        });
    }

vksnk · 2023-09-28T21:05:39Z

Ok, I just tried it in your branch and it does produce an error message instead of hanging.

abadams · 2023-10-04T19:06:03Z

Dillon doesn't remember the reasoning behind that line, so I just removed it. This will need testing inside Google to see if it breaks anything.

vksnk · 2023-11-14T18:25:33Z

> Dillon doesn't remember the reasoning behind that line, so I just removed it. This will need testing inside Google to see if it breaks anything.

I can do testing inside Google, if it's ready.

abadams · 2023-11-14T23:54:18Z

Should be ready. I just did a merge with main.

steven-johnson · 2023-11-29T20:43:34Z

I'll pull this into Google to test

steven-johnson · 2023-11-30T16:16:24Z

AFAICT, the async() directive is not used anywhere inside Google

abadams · 2023-11-30T17:14:19Z

I was under the impression that @vksnk was using it.

vksnk · 2023-11-30T17:21:07Z

I thought we wanted to test it in Google, because no one could remember what the removed lines in storage_folding were for.

I do plan to use async for something, but not quite yet.

steven-johnson · 2023-11-30T17:25:47Z

There were some usages of it for a long-defunct project, but that code was deleted a while back.

abadams · 2023-11-30T17:28:48Z

Ah, right. This is a change to storage folding, and it needs to be tested for that reason.

steven-johnson · 2023-12-01T01:08:01Z

I don't see any google3 failures.

abadams · 2023-12-01T01:19:49Z

Just needs a review then.

vksnk · 2023-12-01T19:33:28Z

src/AsyncProducers.cpp

            // Add post-synchronization
            internal_assert(!sema.empty()) << "Duplicate produce node: " << op->name << "\n";
            Stmt body = op->body;
+
+            // We don't currently support waiting on producers to the producer


"to the producer" -> "in the producer"?

abadams added 2 commits September 27, 2023 11:37

Disallow async nestings that violate read after write dependencies

9c6c062

Fixes #7867

Add test

4ccbe8d

abadams and others added 2 commits September 28, 2023 09:39

Add another failure case, and improve error message

4f60796

Add some more tests

50d9470

abadams added 2 commits September 28, 2023 10:44

Update test

3a2f087

Add new test to cmakelists

fc45139

abadams added 4 commits October 4, 2023 12:02

Fix for llvm trunk

33fa8a6

Merge remote-tracking branch 'origin/main' into abadams/fix_7867

1342c4d

Always acquire the folding semaphore, even if unused

14c7d73

Merge branch 'abadams/fix_riscv_vx_vi' into abadams/fix_7867

22c0565

Merge remote-tracking branch 'origin/main' into abadams/fix_7867

eb73a44

abadams mentioned this pull request Oct 6, 2023

HTML Stmt IR with conceptual code and device code. #7843

Merged

8 tasks

Merge remote-tracking branch 'origin/main' into abadams/fix_7867

5ade4c6

Merge remote-tracking branch 'origin/main' into abadams/fix_7867

edcc260

Skip async_order test under wasm

34d1832

steven-johnson added 2 commits November 28, 2023 07:19

Merge branch 'main' into abadams/fix_7867

6c278e1

Merge branch 'main' into abadams/fix_7867

fb72bad

abadams mentioned this pull request Nov 29, 2023

Pipeline with two async producers produce incorrect results #7965

Closed

steven-johnson added 2 commits November 29, 2023 12:43

Merge branch 'main' into abadams/fix_7867

4181f3e

trigger buildbots

153709b

vksnk approved these changes Dec 1, 2023

abadams merged commit 674e6cc into main Dec 1, 2023
19 checks passed

BrewTestBot mentioned this pull request Feb 2, 2024

halide 17.0.0 Homebrew/homebrew-core#161602

Closed
