Cache in `DeepRejectCtxt` #133524

compiler-errors · 2024-11-27T03:32:35Z

Used to avoid a hang with the new solver when compiling itertools

Old solver: 5.31 seconds
New solver with this approach: 16.87 seconds

We've tried some alternative approaches in #133566.

r? lcnr

compiler-errors · 2024-11-27T03:32:43Z

@bors try @rust-timer queue

bors · 2024-11-27T03:33:53Z

⌛ Trying commit 856c616 with merge 9c86b88...

Cache in `DeepRejectCtxt`

We could alternatively just keep track of a depth (and return `true` if we hit the depth). I checked and this works too, and may impact perf less on the old solver side. Let's investigate how this stacks up, tho.

r? lcnr

bors · 2024-11-27T05:16:46Z

☀️ Try build successful - checks-actions
Build commit: 9c86b88 (9c86b880cb739fbb8233cc0482755510699129bf)

rust-timer · 2024-11-27T07:27:46Z

Finished benchmarking commit (9c86b88): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

	mean	range	count
Regressions ❌ (primary)	0.8%	[0.1%, 1.8%]	17
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.2%	[-0.2%, -0.2%]	1
All ❌✅ (primary)	0.8%	[0.1%, 1.8%]	17

Max RSS (memory usage)

Results (secondary 3.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	3.7%	[3.7%, 3.7%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 792.856s -> 794.08s (0.15%)
Artifact size: 336.23 MiB -> 336.23 MiB (0.00%)

lcnr · 2024-11-27T07:40:55Z

compiler/rustc_type_ir/src/fast_reject.rs

+        if self.cache.contains(&(lhs, rhs)) {
+            return true;
+        }


you can move this cache access (and inserts) to after the rhs check.

if `types_may_unify` returns right away we don't need the cache; we only need it for branches which may recurse deeply
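A minimal sketch of the suggestion above, written against a toy type rather than the real `fast_reject.rs`. The `Ty` enum, the `RejectCtxt` struct, and the `cache_lookups` counter are all hypothetical and exist only for illustration; the real `DeepRejectCtxt` works on interned types, so its cache keys are cheap to copy, unlike the clones used here. The sketch also assumes that only pairs which were found to possibly unify get inserted, so a cache hit can return `true`.

```rust
use std::collections::HashSet;

/// Toy stand-in for a type (hypothetical, not rustc's `Ty`).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Ty {
    Int,
    Bool,
    /// A placeholder that may unify with anything.
    Infer,
    Tuple(Vec<Ty>),
}

/// Hypothetical context holding the cache.
#[derive(Default)]
struct RejectCtxt {
    /// Pairs that were already found to possibly unify.
    cache: HashSet<(Ty, Ty)>,
    /// Number of times the cache was consulted (for illustration only).
    cache_lookups: usize,
}

impl RejectCtxt {
    fn types_may_unify(&mut self, lhs: &Ty, rhs: &Ty) -> bool {
        // Cheap checks first: these arms return right away and never
        // touch the cache.
        let (l_args, r_args) = match (lhs, rhs) {
            (Ty::Infer, _) | (_, Ty::Infer) => return true,
            (Ty::Int, Ty::Int) | (Ty::Bool, Ty::Bool) => return true,
            (Ty::Tuple(l), Ty::Tuple(r)) if l.len() == r.len() => (l, r),
            _ => return false,
        };

        // Only branches that are about to recurse reach the cache.
        self.cache_lookups += 1;
        let key = (lhs.clone(), rhs.clone());
        if self.cache.contains(&key) {
            return true;
        }

        let result = l_args
            .iter()
            .zip(r_args)
            .all(|(l, r)| self.types_may_unify(l, r));
        if result {
            self.cache.insert(key);
        }
        result
    }
}
```

With this ordering, comparisons of leaf types never pay for hashing, and a repeated nested pair is answered from the cache instead of being walked again.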

[DO NOT MERGE] bootstrap with `-Znext-solver=globally`

A revival of rust-lang#124812.

Current status: `./x.py b --stage 2` passes 🎉

### commits

- rust-lang#133501
- rust-lang#133493
- 9456bfe and b21b116 reimplement candidate preference based on rust-lang#132325, not yet a separate PR
- c3ef9cd is a rebased version of rust-lang#125334, unsure whether I actually want to land this PR for now
- rust-lang#133517
* rust-lang#133518
* rust-lang#133519
* rust-lang#133520
* rust-lang#133521
* rust-lang#133524

r? `@ghost`

compiler-errors · 2024-11-27T20:37:39Z

@bors try @rust-timer queue

bors · 2024-11-27T20:38:49Z

⌛ Trying commit 3eccdbb with merge 3a95639...

Cache in `DeepRejectCtxt`

We could alternatively just keep track of a depth (and return `true` if we hit the depth). I checked and this works too, and may impact perf less on the old solver side. Let's investigate how this stacks up, tho.

- Old solver: 5.31 seconds
- New solver with this approach: 17.29 seconds
- New solver with an (alternative) depth based approach: 16.87 seconds

r? lcnr
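For comparison, the depth-based alternative mentioned in this description could look roughly like the sketch below. It uses a toy `Ty` enum, and the `MAX_DEPTH` value and the function names are assumptions made for illustration, not taken from the actual patch. Returning `true` once the budget runs out is the conservative answer for a fast-reject check: the caller just falls back to the full, slower unification.

```rust
/// Toy stand-in for a type (hypothetical, not rustc's `Ty`).
#[derive(Clone, Debug, PartialEq, Eq)]
enum Ty {
    Int,
    Bool,
    /// A placeholder that may unify with anything.
    Infer,
    Tuple(Vec<Ty>),
}

/// Hypothetical recursion budget; the real value would need tuning.
const MAX_DEPTH: usize = 8;

fn types_may_unify(lhs: &Ty, rhs: &Ty) -> bool {
    types_may_unify_inner(lhs, rhs, MAX_DEPTH)
}

fn types_may_unify_inner(lhs: &Ty, rhs: &Ty, depth: usize) -> bool {
    match (lhs, rhs) {
        (Ty::Infer, _) | (_, Ty::Infer) => true,
        (Ty::Int, Ty::Int) | (Ty::Bool, Ty::Bool) => true,
        (Ty::Tuple(l), Ty::Tuple(r)) if l.len() == r.len() => {
            // Budget exhausted: stop recursing and answer "may unify".
            if depth == 0 {
                return true;
            }
            l.iter()
                .zip(r)
                .all(|(l, r)| types_may_unify_inner(l, r, depth - 1))
        }
        _ => false,
    }
}
```

The trade-off against a cache is that a mismatch nested deeper than the budget is no longer rejected early, while shallow comparisons carry no hashing cost.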

bors · 2024-11-27T22:21:38Z

☀️ Try build successful - checks-actions
Build commit: 3a95639 (3a956397f266188b18cddabc6e664245334a5bb4)

rust-timer · 2024-11-27T23:39:19Z

Finished benchmarking commit (3a95639): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

	mean	range	count
Regressions ❌ (primary)	1.2%	[0.1%, 2.1%]	14
Regressions ❌ (secondary)	1.7%	[1.7%, 1.7%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	1.2%	[0.1%, 2.1%]	14

Max RSS (memory usage)

Results (primary -2.8%, secondary -0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	3.1%	[3.1%, 3.1%]	1
Improvements ✅ (primary)	-2.8%	[-4.4%, -1.3%]	2
Improvements ✅ (secondary)	-3.2%	[-3.2%, -3.2%]	1
All ❌✅ (primary)	-2.8%	[-4.4%, -1.3%]	2

Cycles

Results (primary -1.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.7%	[-1.7%, -1.7%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.7%	[-1.7%, -1.7%]	1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 793.974s -> 792.032s (-0.24%)
Artifact size: 335.91 MiB -> 335.94 MiB (0.01%)

compiler-errors · 2024-11-28T00:33:36Z

@bors try @rust-timer queue

bors · 2024-11-28T00:34:47Z

⌛ Trying commit 77dceb9 with merge 4bcdf3c...

bors · 2024-11-28T02:21:33Z

☀️ Try build successful - checks-actions
Build commit: 4bcdf3c (4bcdf3c335b7d9d01b1bd7ccae4c4ac51dbc91ae)

rust-timer · 2024-11-28T03:38:43Z

Finished benchmarking commit (4bcdf3c): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

	mean	range	count
Regressions ❌ (primary)	0.7%	[0.4%, 1.3%]	13
Regressions ❌ (secondary)	1.4%	[1.2%, 1.7%]	2
Improvements ✅ (primary)	-0.1%	[-0.2%, -0.1%]	3
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.6%	[-0.2%, 1.3%]	16

Max RSS (memory usage)

Results (primary -1.8%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.4%	[2.4%, 2.4%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-3.9%	[-4.6%, -3.3%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.8%	[-4.6%, 2.4%]	3

Cycles

Results (primary -1.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.3%	[-1.9%, -0.7%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.3%	[-1.9%, -0.7%]	2

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 793.974s -> 794.313s (0.04%)
Artifact size: 335.91 MiB -> 335.91 MiB (-0.00%)

fast-reject: add counter to avoid hangs

alternative approach to rust-lang#133524

r? `@compiler-errors`

fast-reject: add cache

slightly modified version of rust-lang#133524

I originally wanted to simply bail after recursion for a certain amount of times, however, looking at the number of steps taken while compiling different crates we get the following results[^1]:

typenum

```rust
1098842 counts
(  1)   670511 (61.0%, 61.0%): dropping after 1
(  2)   358785 (32.7%, 93.7%): dropping after 0
(  3)    25191 ( 2.3%, 96.0%): dropping after 2
(  4)    10912 ( 1.0%, 97.0%): dropping after 4
(  5)     6461 ( 0.6%, 97.5%): dropping after 3
(  6)     5239 ( 0.5%, 98.0%): dropping after 5
(  7)     2528 ( 0.2%, 98.3%): dropping after 8
(  8)     2188 ( 0.2%, 98.5%): dropping after 1094
(  9)     2097 ( 0.2%, 98.6%): dropping after 6
( 10)     1179 ( 0.1%, 98.7%): dropping after 34
( 11)     1148 ( 0.1%, 98.9%): dropping after 7
( 12)      822 ( 0.1%, 98.9%): dropping after 10
```

bitmaps

```rust
533346 counts
(  1)   526166 (98.7%, 98.7%): dropping after 1
(  2)     4562 ( 0.9%, 99.5%): dropping after 0
(  3)     2072 ( 0.4%, 99.9%): dropping after 1024
(  4)      305 ( 0.1%,100.0%): dropping after 2
(  5)      106 ( 0.0%,100.0%): dropping after 4
(  6)       30 ( 0.0%,100.0%): dropping after 8
(  7)       18 ( 0.0%,100.0%): dropping after 3
(  8)       17 ( 0.0%,100.0%): dropping after 44
(  9)       15 ( 0.0%,100.0%): dropping after 168
( 10)        8 ( 0.0%,100.0%): dropping after 14
( 11)        7 ( 0.0%,100.0%): dropping after 13
( 12)        7 ( 0.0%,100.0%): dropping after 24
```

stage 2 compiler is mostly trivial, but has a few cases where we get >5000

```rust
12987156 counts
(  1)  9280476 (71.5%, 71.5%): dropping after 0
(  2)  2277841 (17.5%, 89.0%): dropping after 1
(  3)   724888 ( 5.6%, 94.6%): dropping after 2
(  4)   204005 ( 1.6%, 96.2%): dropping after 4
(  5)   146537 ( 1.1%, 97.3%): dropping after 3
(  6)    64287 ( 0.5%, 97.8%): dropping after 5
(  7)    43938 ( 0.3%, 98.1%): dropping after 6
(  8)    43758 ( 0.3%, 98.4%): dropping after 8
(  9)    27220 ( 0.2%, 98.7%): dropping after 7
( 10)    17374 ( 0.1%, 98.8%): dropping after 9
( 11)    16015 ( 0.1%, 98.9%): dropping after 10
( 12)    12855 ( 0.1%, 99.0%): dropping after 12
( 13)    10494 ( 0.1%, 99.1%): dropping after 11
( 14)     7553 ( 0.1%, 99.2%): dropping after 14
```

Given that we have crates which frequently rely on fairly deep recursion, actually using a cache seems better than using an arbitrary cutoff here. Having an impl which is large enough to trigger a cutoff instead of getting rejected noticeably impacts perf, so just using a cache in these cases seems better to me. Does not matter too much in the end, we only have to make sure we don't regress crates which don't recurse deeply.

[^1]: i've incremented a counter in the place I now call `if cache.get(&(lhs, rhs))` and then printed it on drop

r? `@compiler-errors`
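The measurement described in the footnote (increment a counter at the recursion point, print it when the context is dropped) can be sketched as follows. `StepCounter` and `visit` are hypothetical names, and `visit` is only a stand-in for a recursive comparison that branches twice per level; the actual instrumentation lived inside the fast-reject code.

```rust
/// Hypothetical instrumentation: counts recursion steps and reports the
/// total when the value goes out of scope.
struct StepCounter {
    steps: usize,
}

impl StepCounter {
    fn new() -> Self {
        StepCounter { steps: 0 }
    }

    fn step(&mut self) {
        self.steps += 1;
    }
}

impl Drop for StepCounter {
    fn drop(&mut self) {
        // One line per dropped counter; the lines can then be aggregated
        // into a histogram like the ones above.
        eprintln!("dropping after {}", self.steps);
    }
}

/// Stand-in for a recursive check that visits two children per level.
fn visit(counter: &mut StepCounter, depth: u32) {
    counter.step();
    if depth > 0 {
        visit(counter, depth - 1);
        visit(counter, depth - 1);
    }
}
```

Because the step count of such a walk doubles with every extra level, a handful of deeply nested comparisons can dominate the totals even when most counters stay at 0 or 1.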

…rors fast-reject: add cache

slightly modified version of rust-lang#133524

I tried a few alternatives:

- simply bail after recursion for a certain amount of times, however, looking at the number of steps taken while compiling different crates we get the following results[^1]:
- add a cache: results in a bigger performance impact

typenum

```rust
1098842 counts
(  1)   670511 (61.0%, 61.0%): dropping after 1
(  2)   358785 (32.7%, 93.7%): dropping after 0
(  3)    25191 ( 2.3%, 96.0%): dropping after 2
(  4)    10912 ( 1.0%, 97.0%): dropping after 4
(  5)     6461 ( 0.6%, 97.5%): dropping after 3
(  6)     5239 ( 0.5%, 98.0%): dropping after 5
(  7)     2528 ( 0.2%, 98.3%): dropping after 8
(  8)     2188 ( 0.2%, 98.5%): dropping after 1094
(  9)     2097 ( 0.2%, 98.6%): dropping after 6
( 10)     1179 ( 0.1%, 98.7%): dropping after 34
( 11)     1148 ( 0.1%, 98.9%): dropping after 7
( 12)      822 ( 0.1%, 98.9%): dropping after 10
```

bitmaps

```rust
533346 counts
(  1)   526166 (98.7%, 98.7%): dropping after 1
(  2)     4562 ( 0.9%, 99.5%): dropping after 0
(  3)     2072 ( 0.4%, 99.9%): dropping after 1024
(  4)      305 ( 0.1%,100.0%): dropping after 2
(  5)      106 ( 0.0%,100.0%): dropping after 4
(  6)       30 ( 0.0%,100.0%): dropping after 8
(  7)       18 ( 0.0%,100.0%): dropping after 3
(  8)       17 ( 0.0%,100.0%): dropping after 44
(  9)       15 ( 0.0%,100.0%): dropping after 168
( 10)        8 ( 0.0%,100.0%): dropping after 14
( 11)        7 ( 0.0%,100.0%): dropping after 13
( 12)        7 ( 0.0%,100.0%): dropping after 24
```

stage 2 compiler is mostly trivial, but has a few cases where we get >5000

```rust
12987156 counts
(  1)  9280476 (71.5%, 71.5%): dropping after 0
(  2)  2277841 (17.5%, 89.0%): dropping after 1
(  3)   724888 ( 5.6%, 94.6%): dropping after 2
(  4)   204005 ( 1.6%, 96.2%): dropping after 4
(  5)   146537 ( 1.1%, 97.3%): dropping after 3
(  6)    64287 ( 0.5%, 97.8%): dropping after 5
(  7)    43938 ( 0.3%, 98.1%): dropping after 6
(  8)    43758 ( 0.3%, 98.4%): dropping after 8
(  9)    27220 ( 0.2%, 98.7%): dropping after 7
( 10)    17374 ( 0.1%, 98.8%): dropping after 9
( 11)    16015 ( 0.1%, 98.9%): dropping after 10
( 12)    12855 ( 0.1%, 99.0%): dropping after 12
( 13)    10494 ( 0.1%, 99.1%): dropping after 11
( 14)     7553 ( 0.1%, 99.2%): dropping after 14
```

[^1]: i've incremented a counter in the place I now decrement the depth at and then printed it on drop

r? `@compiler-errors`

rustbot assigned lcnr Nov 27, 2024