Cache in DeepRejectCtxt
#133524
Conversation
@bors try @rust-timer queue
Cache in `DeepRejectCtxt`

We could alternatively just keep track of a depth and return `true` once we hit it. I checked that this works too, and it may impact perf less on the old-solver side. Let's investigate how this stacks up, though.

r? lcnr
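For reference, a minimal sketch of the shape of this change. The `SimpleTy` stand-in, the struct layout, and the `main` driver are illustrative assumptions rather than rustc's actual `DeepRejectCtxt`; only the cached `(lhs, rhs)` check mirrors the PR.

```rust
use std::collections::HashSet;

// Simplified stand-in for a type tree; rustc's real `Ty<'tcx>` is interned,
// so pairs would be cheap-to-hash pointers rather than deep clones.
#[derive(Clone, PartialEq, Eq, Hash)]
enum SimpleTy {
    Param,                   // a generic parameter: may unify with anything
    Adt(u32, Vec<SimpleTy>), // a nominal type with its generic arguments
}

struct DeepRejectCtxt {
    // Pairs we have already started (or finished) comparing.
    cache: HashSet<(SimpleTy, SimpleTy)>,
}

impl DeepRejectCtxt {
    fn types_may_unify(&mut self, lhs: &SimpleTy, rhs: &SimpleTy) -> bool {
        // A cache hit means we've already visited this pair; conservatively
        // answer "may unify" instead of recursing into it again.
        if self.cache.contains(&(lhs.clone(), rhs.clone())) {
            return true;
        }
        self.cache.insert((lhs.clone(), rhs.clone()));

        match (lhs, rhs) {
            // Generic parameters can be instantiated to anything.
            (SimpleTy::Param, _) | (_, SimpleTy::Param) => true,
            // Different nominal types can never unify; same ones may unify
            // only if all of their generic arguments may unify.
            (SimpleTy::Adt(l, l_args), SimpleTy::Adt(r, r_args)) => {
                l == r
                    && l_args.len() == r_args.len()
                    && l_args
                        .iter()
                        .zip(r_args)
                        .all(|(l, r)| self.types_may_unify(l, r))
            }
        }
    }
}

fn main() {
    let mut cx = DeepRejectCtxt { cache: HashSet::new() };
    let lhs = SimpleTy::Adt(0, vec![SimpleTy::Param]);
    let rhs = SimpleTy::Adt(0, vec![SimpleTy::Adt(1, vec![])]);
    assert!(cx.types_may_unify(&lhs, &rhs));
    let other = SimpleTy::Adt(2, vec![]);
    assert!(!cx.types_may_unify(&lhs, &other));
}
```

A cache hit is answered with `true` because the fast-reject path is only allowed to be conservative in that direction: it may fail to reject, but it must never reject types that could actually unify.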
☀️ Try build successful - checks-actions
Finished benchmarking commit (9c86b88): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next steps: if you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never.

- Instruction count: this is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
- Max RSS (memory usage): Results (secondary 3.7%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
- Cycles: this benchmark run did not return any relevant results for this metric.
- Binary size: this benchmark run did not return any relevant results for this metric.

Bootstrap: 792.856s -> 794.08s (0.15%)
// Conservatively treat already-visited pairs as possibly unifying.
if self.cache.contains(&(lhs, rhs)) {
    return true;
}
You can move this cache access (and the inserts) to after the rhs check. If `types_may_unify` returns right away we won't need to cache; we only need to cache for branches which may recur deeply.
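A sketch of that suggested reordering, reusing the illustrative `SimpleTy` and `DeepRejectCtxt` definitions from the sketch above (the method name `types_may_unify_cached_late` is made up for illustration): the cheap checks run first, and the cache is consulted and populated only on the branch that actually recurses into generic arguments.

```rust
impl DeepRejectCtxt {
    /// Variant following the review suggestion: do the cheap rejection
    /// checks first and only touch the cache on the recursive branch.
    fn types_may_unify_cached_late(&mut self, lhs: &SimpleTy, rhs: &SimpleTy) -> bool {
        match (lhs, rhs) {
            // These arms return immediately; caching them would only add
            // hashing overhead without preventing any deep recursion.
            (SimpleTy::Param, _) | (_, SimpleTy::Param) => true,
            (SimpleTy::Adt(l, l_args), SimpleTy::Adt(r, r_args)) => {
                if l != r || l_args.len() != r_args.len() {
                    return false;
                }
                // Only the branch that recurses into the generic arguments
                // can blow up, so the cache check/insert lives here.
                // `insert` returns false if the pair was already present.
                if !self.cache.insert((lhs.clone(), rhs.clone())) {
                    // Already visited: conservatively say "may unify".
                    return true;
                }
                l_args
                    .iter()
                    .zip(r_args)
                    .all(|(l, r)| self.types_may_unify_cached_late(l, r))
            }
        }
    }
}
```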
[DO NOT MERGE] bootstrap with `-Znext-solver=globally`

A revival of rust-lang#124812. Current status: `./x.py b --stage 2` passes 🎉

### commits

- rust-lang#133501
- rust-lang#133493
- 9456bfe and b21b116 reimplement candidate preference based on rust-lang#132325, not yet a separate PR
- c3ef9cd is a rebased version of rust-lang#125334, unsure whether I actually want to land this PR for now
- rust-lang#133517
- rust-lang#133518
- rust-lang#133519
- rust-lang#133520
- rust-lang#133521
- rust-lang#133524

r? `@ghost`
856c616 to 3eccdbb (Compare)
@bors try @rust-timer queue
Cache in `DeepRejectCtxt`

We could alternatively just keep track of a depth and return `true` once we hit it. I checked that this works too, and it may impact perf less on the old-solver side. Let's investigate how this stacks up, though.

- Old solver: 5.31 seconds
- New solver with this approach: 17.29 seconds
- New solver with an (alternative) depth-based approach: 16.87 seconds

r? lcnr
☀️ Try build successful - checks-actions
Finished benchmarking commit (3a95639): comparison URL.

Overall result: ❌ regressions - please read the text below.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next steps: if you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never.

- Instruction count: this is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
- Max RSS (memory usage): Results (primary -2.8%, secondary -0.1%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
- Cycles: Results (primary -1.7%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
- Binary size: this benchmark run did not return any relevant results for this metric.

Bootstrap: 793.974s -> 792.032s (-0.24%)
3eccdbb to 77dceb9 (Compare)
@bors try @rust-timer queue
☀️ Try build successful - checks-actions
Finished benchmarking commit (4bcdf3c): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next steps: if you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never.

- Instruction count: this is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
- Max RSS (memory usage): Results (primary -1.8%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
- Cycles: Results (primary -1.3%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
- Binary size: this benchmark run did not return any relevant results for this metric.

Bootstrap: 793.974s -> 794.313s (0.04%)
fast-reject: add counter to avoid hangs

Alternative approach to rust-lang#133524.

r? `@compiler-errors`
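A minimal sketch of what such a counter-based cutoff could look like, again reusing the illustrative `SimpleTy` from the earlier sketch; the `CountingRejectCtxt` name and the budget field are assumptions, not the actual implementation in that PR. Instead of a cache, the context carries a remaining-recursion budget and conservatively returns `true` once it is exhausted.

```rust
struct CountingRejectCtxt {
    // Remaining recursion budget; the concrete limit is a tuning knob,
    // not a value taken from the PR.
    remaining: usize,
}

impl CountingRejectCtxt {
    fn types_may_unify(&mut self, lhs: &SimpleTy, rhs: &SimpleTy) -> bool {
        // Budget exhausted: give up on rejecting and conservatively
        // report that the types may unify.
        if self.remaining == 0 {
            return true;
        }
        self.remaining -= 1;

        match (lhs, rhs) {
            (SimpleTy::Param, _) | (_, SimpleTy::Param) => true,
            (SimpleTy::Adt(l, l_args), SimpleTy::Adt(r, r_args)) => {
                l == r
                    && l_args.len() == r_args.len()
                    && l_args
                        .iter()
                        .zip(r_args)
                        .all(|(l, r)| self.types_may_unify(l, r))
            }
        }
    }
}
```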
fast-reject: add cache

Slightly modified version of rust-lang#133524. I originally wanted to simply bail after recursing a certain number of times; however, looking at the number of steps taken while compiling different crates, we get the following results[^1]:

typenum

```
1098842 counts
( 1) 670511 (61.0%, 61.0%): dropping after 1
( 2) 358785 (32.7%, 93.7%): dropping after 0
( 3) 25191 ( 2.3%, 96.0%): dropping after 2
( 4) 10912 ( 1.0%, 97.0%): dropping after 4
( 5) 6461 ( 0.6%, 97.5%): dropping after 3
( 6) 5239 ( 0.5%, 98.0%): dropping after 5
( 7) 2528 ( 0.2%, 98.3%): dropping after 8
( 8) 2188 ( 0.2%, 98.5%): dropping after 1094
( 9) 2097 ( 0.2%, 98.6%): dropping after 6
( 10) 1179 ( 0.1%, 98.7%): dropping after 34
( 11) 1148 ( 0.1%, 98.9%): dropping after 7
( 12) 822 ( 0.1%, 98.9%): dropping after 10
```

bitmaps

```
533346 counts
( 1) 526166 (98.7%, 98.7%): dropping after 1
( 2) 4562 ( 0.9%, 99.5%): dropping after 0
( 3) 2072 ( 0.4%, 99.9%): dropping after 1024
( 4) 305 ( 0.1%,100.0%): dropping after 2
( 5) 106 ( 0.0%,100.0%): dropping after 4
( 6) 30 ( 0.0%,100.0%): dropping after 8
( 7) 18 ( 0.0%,100.0%): dropping after 3
( 8) 17 ( 0.0%,100.0%): dropping after 44
( 9) 15 ( 0.0%,100.0%): dropping after 168
( 10) 8 ( 0.0%,100.0%): dropping after 14
( 11) 7 ( 0.0%,100.0%): dropping after 13
( 12) 7 ( 0.0%,100.0%): dropping after 24
```

The stage 2 compiler is mostly trivial, but has a few cases where we get >5000:

```
12987156 counts
( 1) 9280476 (71.5%, 71.5%): dropping after 0
( 2) 2277841 (17.5%, 89.0%): dropping after 1
( 3) 724888 ( 5.6%, 94.6%): dropping after 2
( 4) 204005 ( 1.6%, 96.2%): dropping after 4
( 5) 146537 ( 1.1%, 97.3%): dropping after 3
( 6) 64287 ( 0.5%, 97.8%): dropping after 5
( 7) 43938 ( 0.3%, 98.1%): dropping after 6
( 8) 43758 ( 0.3%, 98.4%): dropping after 8
( 9) 27220 ( 0.2%, 98.7%): dropping after 7
( 10) 17374 ( 0.1%, 98.8%): dropping after 9
( 11) 16015 ( 0.1%, 98.9%): dropping after 10
( 12) 12855 ( 0.1%, 99.0%): dropping after 12
( 13) 10494 ( 0.1%, 99.1%): dropping after 11
( 14) 7553 ( 0.1%, 99.2%): dropping after 14
```

Given that we have crates which frequently rely on fairly deep recursion, actually using a cache seems better than an arbitrary cutoff here. Having an impl which is large enough to trigger a cutoff instead of getting rejected noticeably impacts perf, so just using a cache in these cases seems better to me. It does not matter too much in the end; we only have to make sure we don't regress crates which don't recurse deeply.

[^1]: I've incremented a counter in the place I now call `if cache.get(&(lhs, rhs))` and then printed it on drop.

r? `@compiler-errors`
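The instrumentation described in the footnote could look roughly like the following hypothetical sketch (none of these names come from the actual measurement code): a counter is bumped at the site where the cache lookup now lives and printed when the context is dropped, producing one "dropping after N" line per invocation, which can then be aggregated into the histograms above.

```rust
struct RecursionCounter {
    steps: usize,
}

impl RecursionCounter {
    // Called from the same place the cache lookup now lives.
    fn bump(&mut self) {
        self.steps += 1;
    }
}

impl Drop for RecursionCounter {
    fn drop(&mut self) {
        // One line per fast-reject invocation; aggregating these lines
        // yields the "dropping after N" counts shown above.
        eprintln!("dropping after {}", self.steps);
    }
}
```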
fast-reject: add cache

Slightly modified version of rust-lang#133524. I tried a few alternatives:

- simply bail after recursing a certain number of times; however, looking at the number of steps taken while compiling different crates, we get the counts shown in the description above[^2]
- add a cache: results in a bigger performance impact

The typenum, bitmaps, and stage 2 compiler counts are the same as those listed above.

[^2]: I've incremented a counter in the place I now decrement the depth at and then printed it on drop.

r? `@compiler-errors`
Used to avoid a hang with the new solver when compiling itertools
We tried some alternative approaches in #133566.
r? lcnr