-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disallow async nestings that violate read after write dependencies #7868
Conversation
I think this is the same as #5202 I like this solution slightly better, because it happens after tightening producer consumer nodes so it seems less possible for it to capture false dependencies. Open to other opinions though. I should at least incorporate the test from #5202 , and this PR has the error test but not the correctness test. |
This also fixes #5201 (makes it error out), but that revealed my error message is wrong. I think the constraint is that if you have a producer consumer pair, where the consumer is async, then the store_at of the producer can never be between the compute_at and store_at levels of the consumer, and the compute_at of the producer can only be between the compute_at and store_at levels of the consumer if the producer is also async. |
Correction: If you have a producer consumer pair, where the consumer is async, and the compute_at of the producer is in between the store_at and compute_at of the consumer, then the producer must be both async and store_at outside the store_at level of the consumer. |
I cherry-picked the tests from #5201, and one of them fails, so clearly something is still wrong :( |
The only thing I remember is that this bug was pretty tricky and required some graph level reasoning, but the rest is pretty foggy for now:( I'll need to reread the PR to remember it again. I also remember there was another test I found which was failing, it's not in PR (all tests in the PR should be passing), but I sort of vaguely remember what was happening there. I'm going to try to reproduce it again. |
I think the failing test is actually caused by the new sliding window behavior. There are too many release calls to the storage folding semaphores because the consumer has acquired extra loop iterations. |
* Implement sliding window warmups by backing up the loop min. * Fix indirect sliding windows. * Improve is_monotonic. * Small cleanups. * Avoid generating vector valued bounds. * Fix build error on some compilers. * Fix loop bounds. * Don't try to slide things that should just be compute_at the store_at location. * Print condition when printing boxes. * Less things broken. * Add/fix comments. * Comments * Fix async by moving if inside consume (and so inside acquires). * Fix division. * This doesn't work on master either. * Add TODO * Acquire is not a no-op. * Add comment about unfortunate simplification. * Remove debug(0) * Add simplification of for { acquire { noop } } * Fix folding factors finally! * Update storage_folding test. * Fix bug when cloning a semaphore used more than once. * Disable failing test. * Work around bad complexity in is_monotonic. * Fix sub bug * Significantly faster schedule for blur. * Update tracing test. * New simplifications that help with upsampled and downsampled sliding windows. * This doesn't need explicit folding any more. * Fix new simplifier rules. * Fix simplifier div rule * Remove ancient brittle test. * Fix simplify rule again * More LT -> EQ rules for mod * Fix nested sliding windows with upsamples. * Replace hack with better solution. * Add missing override * Don't rewrite loop variable if the min doesn't change. * Refactor sliding window lowering. * Fixed bounds growing redundantly for independent producers. * Don't take the union unless possibly needed. * Respect conditional provide/required. * Add missing overrides * Much better schedule. * Use a smaller image for blur benchmarking so that different schedules have different perf * Replace Interval with ConstantInterval for is_monotonic. * Don't try to handle unsigned deltas. * Add failing test. * Remove unused new code. * Remove weird debugging code. * Avoid expanding bounds of split producers * Remove stray likely_if_innermost. * Remove old autotune tests. * Update test for guarded producers. * Reenable test. * Update trace for guarding producers. * Don't overwrite required.used * Handle LE/LT in bounds of lanes in vectorize * Fix acquire and release of warmups * Earlier fix for multiply cloned acquires was wrong. * Handle nested vectorization. * clang-format * Remove autotune_bug_* tests * Fix shadowing error on some compilers. * Appease overzealous clang-tidy warning. * clang-format * Don't use silly hack. * clang-tidy... * It's no longer safe to assume monotonic means bounds_of_expr_in_scope is exact * Address review comments * Add comment * Add missing override. * Fix constant interval issues. * Revert and remove empty interval * Fix multiply!? * Reduce need for simplifications. * Simplifications from dsharletg/sliding-window branch * Don't learn likely(x) and x. * Add comment * Add some min/max rules. * Also substitute facts from asserts * Remove is_empty from header too. * More rules * Add double stairstep rule. * Disable rule that uncovers bugs. * Consider anded expressions as if they were independent nested ifs. * Add promise_clamped to producer guards. * Revert "Consider anded expressions as if they were independent nested ifs." This reverts commit 03efb3f. * Don't combine ifs, split them instead. * Update trace * clang-tidy/clang-format * Remove splitting of ifs, it breaks brittle tests. * Safer check on old conditions. * Fix producer guard condition. * Interval fixes. * Handle sliding backwards * Handle transitive dependencies. * Backport abadams' fix from abadams/slide_over_split_loop * Fix select visitor. * More simplifier rules. * Bring back old logic as a fallback. * Avoid specializations corrupting sliding * Fix boneheaded rule errors. * Fix slightly conservative bounds at the max for split case. * This pattern is too sensitive to the simplifier. In a real use case, it's just a sum, and the result can be subtracted after doing a reduction. * Add missing clamp rule * Don't count unlikely loops as inner loops for likely_if_innermost * Use <= instead of == to solve for the new loop min Useful when the warmup is a partial vector or something * Verify simplifier changes and add variants as suggested by synthesizer * Make implicit assumption explicit, for clarity * Use find_constant_bounds * Guard against expanded bounds more effectively. * Update tracing test * Small cleanup. * Don't simplify/prove using lets that might change value. * Stronger solving without expanding lets. * New simplifier rule for alignment * Fix case where no warmup needed * Add some useful rules. * Add safety check on when we can use the new loop min. * Better proof to avoid hacky condition that is hard to prove. * Small cleanup and use the nice new folding factors. * Bring back unrolled producer test. * clang-format * Expand comment. * Fix sliding backwards condition. * min(new_loop_min, loop_min) isn't needed any more. * We need that min, but we can be more conservative about it. * Stronger handling of previous loop mins. * Remove unused is_monotonic_strong. * Remove ConstantInterval::make_intersection. * Avoid need to handle uint specially. * Add cache for depends_on. * Reduce unnecessarily large cache scope * The first part of the key is always the same Co-authored-by: Andrew Adams <[email protected]>
I think I found a failing test, it's hanging in my branch for #5202:
|
Ok, I just tried it in your branch and it does produce an error message instead of hanging. |
Dillon doesn't remember the reasoning behind that line, so I just removed it. This will need testing inside Google to see if it breaks anything. |
I can do testing inside Google, if it's ready. |
Should be ready. I just did a merge with main. |
I'll pull this into Google to test |
AFAICT, the async() directive is not used anywhere inside Google |
I was under the impression @vksnk was using it |
I thought we wanted to test it in Google, because no one could remember what the removed lines in storage_folding were for. I am only planning to use async for something, but not quite yet. |
There were some usages of it for a long-defunct project, but that code was deleted a while back. |
Ah, right. This is a change to storage folding, and it needs to be tested for that reason. |
I don't see any google3 failures. |
Just needs a review then. |
// Add post-synchronization | ||
internal_assert(!sema.empty()) << "Duplicate produce node: " << op->name << "\n"; | ||
Stmt body = op->body; | ||
|
||
// We don't currently support waiting on producers to the producer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"to the producer" -> "in the producer"?
…alide#7868) * Disallow async nestings that violate read after write dependencies Fixes halide#7867 * Add test * Add another failure case, and improve error message * Add some more tests * Update test * Add new test to cmakelists * Fix for llvm trunk * Always acquire the folding semaphore, even if unused * Skip async_order test under wasm * trigger buildbots --------- Co-authored-by: Volodymyr Kysenko <[email protected]> Co-authored-by: Steven Johnson <[email protected]>
Fixes #7867
Fixes #5175
Discovered while trying to improve the async schedule in the async_gpu performance test so that it's less flaky.