Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow async nestings that violate read after write dependencies #7868

Merged
merged 18 commits into from
Dec 1, 2023

Conversation

abadams
Copy link
Member

@abadams abadams commented Sep 27, 2023

Fixes #7867
Fixes #5175

Discovered while trying to improve the async schedule in the async_gpu performance test so that it's less flaky.

@zvookin
Copy link
Member

zvookin commented Sep 28, 2023

Does this have any connection to #5202 ? There is also #5195 . It would be great to either get those moved forward or closed. @vksnk .

@abadams
Copy link
Member Author

abadams commented Sep 28, 2023

Looks strongly related to #5202, which I had forgotten about. Not sure about #5195

@vksnk
Copy link
Member

vksnk commented Sep 28, 2023

#5202 looks related for sure, but #5195 is a sepatate issue.

I was planning to ressurect the PRs above (finally have some spare cycles), but I don't think this PR should be blocked on these. I can update accordingly once this is merged in.

@abadams
Copy link
Member Author

abadams commented Sep 28, 2023

I think this is the same as #5202

I like this solution slightly better, because it happens after tightening producer consumer nodes so it seems less possible for it to capture false dependencies. Open to other opinions though. I should at least incorporate the test from #5202 , and this PR has the error test but not the correctness test.

@abadams
Copy link
Member Author

abadams commented Sep 28, 2023

This also fixes #5201 (makes it error out), but that revealed my error message is wrong. I think the constraint is that if you have a producer consumer pair, where the consumer is async, then the store_at of the producer can never be between the compute_at and store_at levels of the consumer, and the compute_at of the producer can only be between the compute_at and store_at levels of the consumer if the producer is also async.

@abadams
Copy link
Member Author

abadams commented Sep 28, 2023

Correction:

If you have a producer consumer pair, where the consumer is async, and the compute_at of the producer is in between the store_at and compute_at of the consumer, then the producer must be both async and store_at outside the store_at level of the consumer.

@abadams
Copy link
Member Author

abadams commented Sep 28, 2023

I cherry-picked the tests from #5201, and one of them fails, so clearly something is still wrong :(

@vksnk
Copy link
Member

vksnk commented Sep 28, 2023

The only thing I remember is that this bug was pretty tricky and required some graph level reasoning, but the rest is pretty foggy for now:( I'll need to reread the PR to remember it again.

I also remember there was another test I found which was failing, it's not in PR (all tests in the PR should be passing), but I sort of vaguely remember what was happening there. I'm going to try to reproduce it again.

@abadams
Copy link
Member Author

abadams commented Sep 28, 2023

I think the failing test is actually caused by the new sliding window behavior. There are too many release calls to the storage folding semaphores because the consumer has acquired extra loop iterations.

abadams referenced this pull request Sep 28, 2023
* Implement sliding window warmups by backing up the loop min.

* Fix indirect sliding windows.

* Improve is_monotonic.

* Small cleanups.

* Avoid generating vector valued bounds.

* Fix build error on some compilers.

* Fix loop bounds.

* Don't try to slide things that should just be compute_at the store_at location.

* Print condition when printing boxes.

* Less things broken.

* Add/fix comments.

* Comments

* Fix async by moving if inside consume (and so inside acquires).

* Fix division.

* This doesn't work on master either.

* Add TODO

* Acquire is not a no-op.

* Add comment about unfortunate simplification.

* Remove debug(0)

* Add simplification of for { acquire { noop } }

* Fix folding factors finally!

* Update storage_folding test.

* Fix bug when cloning a semaphore used more than once.

* Disable failing test.

* Work around bad complexity in is_monotonic.

* Fix sub bug

* Significantly faster schedule for blur.

* Update tracing test.

* New simplifications that help with upsampled and downsampled sliding windows.

* This doesn't need explicit folding any more.

* Fix new simplifier rules.

* Fix simplifier div rule

* Remove ancient brittle test.

* Fix simplify rule again

* More LT -> EQ rules for mod

* Fix nested sliding windows with upsamples.

* Replace hack with better solution.

* Add missing override

* Don't rewrite loop variable if the min doesn't change.

* Refactor sliding window lowering.

* Fixed bounds growing redundantly for independent producers.

* Don't take the union unless possibly needed.

* Respect conditional provide/required.

* Add missing overrides

* Much better schedule.

* Use a smaller image for blur benchmarking so that different schedules have different perf

* Replace Interval with ConstantInterval for is_monotonic.

* Don't try to handle unsigned deltas.

* Add failing test.

* Remove unused new code.

* Remove weird debugging code.

* Avoid expanding bounds of split producers

* Remove stray likely_if_innermost.

* Remove old autotune tests.

* Update test for guarded producers.

* Reenable test.

* Update trace for guarding producers.

* Don't overwrite required.used

* Handle LE/LT in bounds of lanes in vectorize

* Fix acquire and release of warmups

* Earlier fix for multiply cloned acquires was wrong.

* Handle nested vectorization.

* clang-format

* Remove autotune_bug_* tests

* Fix shadowing error on some compilers.

* Appease overzealous clang-tidy warning.

* clang-format

* Don't use silly hack.

* clang-tidy...

* It's no longer safe to assume monotonic means bounds_of_expr_in_scope is exact

* Address review comments

* Add comment

* Add missing override.

* Fix constant interval issues.

* Revert and remove empty interval

* Fix multiply!?

* Reduce need for simplifications.

* Simplifications from dsharletg/sliding-window branch

* Don't learn likely(x) and x.

* Add comment

* Add some min/max rules.

* Also substitute facts from asserts

* Remove is_empty from header too.

* More rules

* Add double stairstep rule.

* Disable rule that uncovers bugs.

* Consider anded expressions as if they were independent nested ifs.

* Add promise_clamped to producer guards.

* Revert "Consider anded expressions as if they were independent nested ifs."

This reverts commit 03efb3f.

* Don't combine ifs, split them instead.

* Update trace

* clang-tidy/clang-format

* Remove splitting of ifs, it breaks brittle tests.

* Safer check on old conditions.

* Fix producer guard condition.

* Interval fixes.

* Handle sliding backwards

* Handle transitive dependencies.

* Backport abadams' fix from abadams/slide_over_split_loop

* Fix select visitor.

* More simplifier rules.

* Bring back old logic as a fallback.

* Avoid specializations corrupting sliding

* Fix boneheaded rule errors.

* Fix slightly conservative bounds at the max for split case.

* This pattern is too sensitive to the simplifier. In a real use case, it's just a sum, and the result can be subtracted after doing a reduction.

* Add missing clamp rule

* Don't count unlikely loops as inner loops for likely_if_innermost

* Use <= instead of == to solve for the new loop min

Useful when the warmup is a partial vector or something

* Verify simplifier changes and add variants as suggested by synthesizer

* Make implicit assumption explicit, for clarity

* Use find_constant_bounds

* Guard against expanded bounds more effectively.

* Update tracing test

* Small cleanup.

* Don't simplify/prove using lets that might change value.

* Stronger solving without expanding lets.

* New simplifier rule for alignment

* Fix case where no warmup needed

* Add some useful rules.

* Add safety check on when we can use the new loop min.

* Better proof to avoid hacky condition that is hard to prove.

* Small cleanup and use the nice new folding factors.

* Bring back unrolled producer test.

* clang-format

* Expand comment.

* Fix sliding backwards condition.

* min(new_loop_min, loop_min) isn't needed any more.

* We need that min, but we can be more conservative about it.

* Stronger handling of previous loop mins.

* Remove unused is_monotonic_strong.

* Remove ConstantInterval::make_intersection.

* Avoid need to handle uint specially.

* Add cache for depends_on.

* Reduce unnecessarily large cache scope

* The first part of the key is always the same

Co-authored-by: Andrew Adams <[email protected]>
@vksnk
Copy link
Member

vksnk commented Sep 28, 2023

I think I found a failing test, it's hanging in my branch for #5202:

    {
        Func producer1, producer2, consumer;
        Var x, y;

        producer1(x, y) = x + y;
        producer2(x, y) = producer1(x, y);
        consumer(x, y) = producer1(x, y - 1) + producer2(x, y + 1);

        consumer.compute_root();

        producer1.store_at(consumer, Var::outermost()).compute_at(consumer, y).async();
        producer2.store_root().compute_at(consumer, y).async();
        consumer.bound(x, 0, 16).bound(y, 0, 16);

        Buffer<int> out = consumer.realize({16, 16});

        out.for_each_element([&](int x, int y) {
            int correct = 2 * (x + y);
            if (out(x, y) != correct) {
                printf("out(%d, %d) = %d instead of %d\n",
                       x, y, out(x, y), correct);
                exit(-1);
            }
        });
    }

@vksnk
Copy link
Member

vksnk commented Sep 28, 2023

Ok, I just tried it in your branch and it does produce an error message instead of hanging.

@abadams
Copy link
Member Author

abadams commented Oct 4, 2023

Dillon doesn't remember the reasoning behind that line, so I just removed it. This will need testing inside Google to see if it breaks anything.

@vksnk
Copy link
Member

vksnk commented Nov 14, 2023

Dillon doesn't remember the reasoning behind that line, so I just removed it. This will need testing inside Google to see if it breaks anything.

I can do testing inside Google, if it's ready.

@abadams
Copy link
Member Author

abadams commented Nov 14, 2023

Should be ready. I just did a merge with main.

@steven-johnson
Copy link
Contributor

I'll pull this into Google to test

@steven-johnson
Copy link
Contributor

AFAICT, the async() directive is not used anywhere inside Google

@abadams
Copy link
Member Author

abadams commented Nov 30, 2023

I was under the impression @vksnk was using it

@vksnk
Copy link
Member

vksnk commented Nov 30, 2023

I thought we wanted to test it in Google, because no one could remember what the removed lines in storage_folding were for.

I am only planning to use async for something, but not quite yet.

@steven-johnson
Copy link
Contributor

There were some usages of it for a long-defunct project, but that code was deleted a while back.

@abadams
Copy link
Member Author

abadams commented Nov 30, 2023

Ah, right. This is a change to storage folding, and it needs to be tested for that reason.

@steven-johnson
Copy link
Contributor

I don't see any google3 failures.

@abadams
Copy link
Member Author

abadams commented Dec 1, 2023

Just needs a review then.

// Add post-synchronization
internal_assert(!sema.empty()) << "Duplicate produce node: " << op->name << "\n";
Stmt body = op->body;

// We don't currently support waiting on producers to the producer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"to the producer" -> "in the producer"?

@abadams abadams merged commit 674e6cc into main Dec 1, 2023
19 checks passed
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
…alide#7868)

* Disallow async nestings that violate read after write dependencies

Fixes halide#7867

* Add test

* Add another failure case, and improve error message

* Add some more tests

* Update test

* Add new test to cmakelists

* Fix for llvm trunk

* Always acquire the folding semaphore, even if unused

* Skip async_order test under wasm

* trigger buildbots

---------

Co-authored-by: Volodymyr Kysenko <[email protected]>
Co-authored-by: Steven Johnson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants