Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache in DeepRejectCtxt #133524

Closed
wants to merge 1 commit into from

Conversation

compiler-errors
Copy link
Member

@compiler-errors compiler-errors commented Nov 27, 2024

Used to avoid a hang with the new solver when compiling itertools

  • Old solver: 5.31 seconds
  • New solver with this approach: 16.87 seconds

we're tried some alternative approaches in #133566

r? lcnr

@compiler-errors
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Nov 27, 2024
@bors
Copy link
Contributor

bors commented Nov 27, 2024

⌛ Trying commit 856c616 with merge 9c86b88...

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 27, 2024
Cache in `DeepRejectCtxt`

We could alternatively just keep track of a depth (and return `true` if we hit the depth). I checked and this works too, and may impact perf less on the old solver side. Let's investigate how this stacks up, tho.

r? lcnr
@bors
Copy link
Contributor

bors commented Nov 27, 2024

☀️ Try build successful - checks-actions
Build commit: 9c86b88 (9c86b880cb739fbb8233cc0482755510699129bf)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (9c86b88): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
0.8% [0.1%, 1.8%] 17
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.2% [-0.2%, -0.2%] 1
All ❌✅ (primary) 0.8% [0.1%, 1.8%] 17

Max RSS (memory usage)

Results (secondary 3.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
3.7% [3.7%, 3.7%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 792.856s -> 794.08s (0.15%)
Artifact size: 336.23 MiB -> 336.23 MiB (0.00%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Nov 27, 2024
Comment on lines 240 to 257
if self.cache.contains(&(lhs, rhs)) {
return true;
}
Copy link
Contributor

@lcnr lcnr Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can move this cache access (and inserts) to after the rhs check.

if types_may_unify returns right away we won't need to cache, only need to cache for branches which may deeply recur

@lcnr lcnr added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 27, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 27, 2024
[DO NOT MERGE] bootstrap with `-Znext-solver=globally`

A revival of rust-lang#124812.

Current status:

`./x.py b --stage 2` passes 🎉

### commits

- rust-lang#133501
- rust-lang#133493
- 9456bfe and b21b116 reimplement candidate preference based on rust-lang#132325, not yet a separate PR
- c3ef9cd is a rebased version of rust-lang#125334, unsure whether I actually want to land this PR for now
- rust-lang#133517
* rust-lang#133518
* rust-lang#133519
* rust-lang#133520
* rust-lang#133521
* rust-lang#133524

r? `@ghost`
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 27, 2024
[DO NOT MERGE] bootstrap with `-Znext-solver=globally`

A revival of rust-lang#124812.

Current status:

`./x.py b --stage 2` passes 🎉

### commits

- rust-lang#133501
- rust-lang#133493
- 9456bfe and b21b116 reimplement candidate preference based on rust-lang#132325, not yet a separate PR
- c3ef9cd is a rebased version of rust-lang#125334, unsure whether I actually want to land this PR for now
- rust-lang#133517
* rust-lang#133518
* rust-lang#133519
* rust-lang#133520
* rust-lang#133521
* rust-lang#133524

r? `@ghost`
@compiler-errors
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 27, 2024
@bors
Copy link
Contributor

bors commented Nov 27, 2024

⌛ Trying commit 3eccdbb with merge 3a95639...

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 27, 2024
Cache in `DeepRejectCtxt`

We could alternatively just keep track of a depth (and return `true` if we hit the depth). I checked and this works too, and may impact perf less on the old solver side. Let's investigate how this stacks up, tho.

- Old solver: 5.31 seconds
- New solver with this approach: 17.29 seconds
- New solver with an (alternative) depth based approach: 16.87 seconds

r? lcnr
@bors
Copy link
Contributor

bors commented Nov 27, 2024

☀️ Try build successful - checks-actions
Build commit: 3a95639 (3a956397f266188b18cddabc6e664245334a5bb4)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (3a95639): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
1.2% [0.1%, 2.1%] 14
Regressions ❌
(secondary)
1.7% [1.7%, 1.7%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 1.2% [0.1%, 2.1%] 14

Max RSS (memory usage)

Results (primary -2.8%, secondary -0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
3.1% [3.1%, 3.1%] 1
Improvements ✅
(primary)
-2.8% [-4.4%, -1.3%] 2
Improvements ✅
(secondary)
-3.2% [-3.2%, -3.2%] 1
All ❌✅ (primary) -2.8% [-4.4%, -1.3%] 2

Cycles

Results (primary -1.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.7% [-1.7%, -1.7%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.7% [-1.7%, -1.7%] 1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 793.974s -> 792.032s (-0.24%)
Artifact size: 335.91 MiB -> 335.94 MiB (0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 27, 2024
@compiler-errors
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 28, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 28, 2024
Cache in `DeepRejectCtxt`

We could alternatively just keep track of a depth (and return `true` if we hit the depth). I checked and this works too, and may impact perf less on the old solver side. Let's investigate how this stacks up, tho.

- Old solver: 5.31 seconds
- New solver with this approach: 17.29 seconds
- New solver with an (alternative) depth based approach: 16.87 seconds

r? lcnr
@bors
Copy link
Contributor

bors commented Nov 28, 2024

⌛ Trying commit 77dceb9 with merge 4bcdf3c...

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 28, 2024
[DO NOT MERGE] bootstrap with `-Znext-solver=globally`

A revival of rust-lang#124812.

Current status:

`./x.py b --stage 2` passes 🎉

### commits

- rust-lang#133501
- rust-lang#133493
- 9456bfe and b21b116 reimplement candidate preference based on rust-lang#132325, not yet a separate PR
- c3ef9cd is a rebased version of rust-lang#125334, unsure whether I actually want to land this PR for now
- rust-lang#133517
* rust-lang#133518
* rust-lang#133519
* rust-lang#133520
* rust-lang#133521
* rust-lang#133524

r? `@ghost`
@bors
Copy link
Contributor

bors commented Nov 28, 2024

☀️ Try build successful - checks-actions
Build commit: 4bcdf3c (4bcdf3c335b7d9d01b1bd7ccae4c4ac51dbc91ae)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (4bcdf3c): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
0.7% [0.4%, 1.3%] 13
Regressions ❌
(secondary)
1.4% [1.2%, 1.7%] 2
Improvements ✅
(primary)
-0.1% [-0.2%, -0.1%] 3
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.6% [-0.2%, 1.3%] 16

Max RSS (memory usage)

Results (primary -1.8%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.4% [2.4%, 2.4%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-3.9% [-4.6%, -3.3%] 2
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.8% [-4.6%, 2.4%] 3

Cycles

Results (primary -1.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.3% [-1.9%, -0.7%] 2
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.3% [-1.9%, -0.7%] 2

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 793.974s -> 794.313s (0.04%)
Artifact size: 335.91 MiB -> 335.91 MiB (-0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 28, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 28, 2024
fast-reject: add counter to avoid hangs

alternative approach to rust-lang#133524

r? `@compiler-errors`
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 29, 2024
fast-reject: add cache

slightly modified version of rust-lang#133524

I originally wanted to simply bail after recursion for a certain amount of times, however, looking at the number of steps taken while compiling different crates we get the following results[^1]:

typenum
```rust
1098842 counts
(  1)   670511 (61.0%, 61.0%): dropping after 1
(  2)   358785 (32.7%, 93.7%): dropping after 0
(  3)    25191 ( 2.3%, 96.0%): dropping after 2
(  4)    10912 ( 1.0%, 97.0%): dropping after 4
(  5)     6461 ( 0.6%, 97.5%): dropping after 3
(  6)     5239 ( 0.5%, 98.0%): dropping after 5
(  7)     2528 ( 0.2%, 98.3%): dropping after 8
(  8)     2188 ( 0.2%, 98.5%): dropping after 1094
(  9)     2097 ( 0.2%, 98.6%): dropping after 6
( 10)     1179 ( 0.1%, 98.7%): dropping after 34
( 11)     1148 ( 0.1%, 98.9%): dropping after 7
( 12)      822 ( 0.1%, 98.9%): dropping after 10
```
bitmaps
```rust
533346 counts
(  1)   526166 (98.7%, 98.7%): dropping after 1
(  2)     4562 ( 0.9%, 99.5%): dropping after 0
(  3)     2072 ( 0.4%, 99.9%): dropping after 1024
(  4)      305 ( 0.1%,100.0%): dropping after 2
(  5)      106 ( 0.0%,100.0%): dropping after 4
(  6)       30 ( 0.0%,100.0%): dropping after 8
(  7)       18 ( 0.0%,100.0%): dropping after 3
(  8)       17 ( 0.0%,100.0%): dropping after 44
(  9)       15 ( 0.0%,100.0%): dropping after 168
( 10)        8 ( 0.0%,100.0%): dropping after 14
( 11)        7 ( 0.0%,100.0%): dropping after 13
( 12)        7 ( 0.0%,100.0%): dropping after 24
```
stage 2 compiler is mostly trivial, but has a few cases where we get >5000
```rust
12987156 counts
(  1)  9280476 (71.5%, 71.5%): dropping after 0
(  2)  2277841 (17.5%, 89.0%): dropping after 1
(  3)   724888 ( 5.6%, 94.6%): dropping after 2
(  4)   204005 ( 1.6%, 96.2%): dropping after 4
(  5)   146537 ( 1.1%, 97.3%): dropping after 3
(  6)    64287 ( 0.5%, 97.8%): dropping after 5
(  7)    43938 ( 0.3%, 98.1%): dropping after 6
(  8)    43758 ( 0.3%, 98.4%): dropping after 8
(  9)    27220 ( 0.2%, 98.7%): dropping after 7
( 10)    17374 ( 0.1%, 98.8%): dropping after 9
( 11)    16015 ( 0.1%, 98.9%): dropping after 10
( 12)    12855 ( 0.1%, 99.0%): dropping after 12
( 13)    10494 ( 0.1%, 99.1%): dropping after 11
( 14)     7553 ( 0.1%, 99.2%): dropping after 14
```

Given that we have crates which frequently rely on fairly deep recursion, actually using a cache seems better than using an arbitrary cutoff here. Having an impl which is large enough to trigger a cutoff instead of getting rejected noticeably impacts perf, so just using a cache in these cases seems better to me. Does not matter too much in the end, we only have to make sure we don't regress crates which don't recurse deeply.

[^1]: i've incremented a counter in the place I now call `if cache.get(&(lhs, rhs))` and then printed it on drop
r? `@compiler-errors`
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 29, 2024
fast-reject: add cache

slightly modified version of rust-lang#133524

I originally wanted to simply bail after recursion for a certain amount of times, however, looking at the number of steps taken while compiling different crates we get the following results[^1]:

typenum
```rust
1098842 counts
(  1)   670511 (61.0%, 61.0%): dropping after 1
(  2)   358785 (32.7%, 93.7%): dropping after 0
(  3)    25191 ( 2.3%, 96.0%): dropping after 2
(  4)    10912 ( 1.0%, 97.0%): dropping after 4
(  5)     6461 ( 0.6%, 97.5%): dropping after 3
(  6)     5239 ( 0.5%, 98.0%): dropping after 5
(  7)     2528 ( 0.2%, 98.3%): dropping after 8
(  8)     2188 ( 0.2%, 98.5%): dropping after 1094
(  9)     2097 ( 0.2%, 98.6%): dropping after 6
( 10)     1179 ( 0.1%, 98.7%): dropping after 34
( 11)     1148 ( 0.1%, 98.9%): dropping after 7
( 12)      822 ( 0.1%, 98.9%): dropping after 10
```
bitmaps
```rust
533346 counts
(  1)   526166 (98.7%, 98.7%): dropping after 1
(  2)     4562 ( 0.9%, 99.5%): dropping after 0
(  3)     2072 ( 0.4%, 99.9%): dropping after 1024
(  4)      305 ( 0.1%,100.0%): dropping after 2
(  5)      106 ( 0.0%,100.0%): dropping after 4
(  6)       30 ( 0.0%,100.0%): dropping after 8
(  7)       18 ( 0.0%,100.0%): dropping after 3
(  8)       17 ( 0.0%,100.0%): dropping after 44
(  9)       15 ( 0.0%,100.0%): dropping after 168
( 10)        8 ( 0.0%,100.0%): dropping after 14
( 11)        7 ( 0.0%,100.0%): dropping after 13
( 12)        7 ( 0.0%,100.0%): dropping after 24
```
stage 2 compiler is mostly trivial, but has a few cases where we get >5000
```rust
12987156 counts
(  1)  9280476 (71.5%, 71.5%): dropping after 0
(  2)  2277841 (17.5%, 89.0%): dropping after 1
(  3)   724888 ( 5.6%, 94.6%): dropping after 2
(  4)   204005 ( 1.6%, 96.2%): dropping after 4
(  5)   146537 ( 1.1%, 97.3%): dropping after 3
(  6)    64287 ( 0.5%, 97.8%): dropping after 5
(  7)    43938 ( 0.3%, 98.1%): dropping after 6
(  8)    43758 ( 0.3%, 98.4%): dropping after 8
(  9)    27220 ( 0.2%, 98.7%): dropping after 7
( 10)    17374 ( 0.1%, 98.8%): dropping after 9
( 11)    16015 ( 0.1%, 98.9%): dropping after 10
( 12)    12855 ( 0.1%, 99.0%): dropping after 12
( 13)    10494 ( 0.1%, 99.1%): dropping after 11
( 14)     7553 ( 0.1%, 99.2%): dropping after 14
```

Given that we have crates which frequently rely on fairly deep recursion, actually using a cache seems better than using an arbitrary cutoff here. Having an impl which is large enough to trigger a cutoff instead of getting rejected noticeably impacts perf, so just using a cache in these cases seems better to me. Does not matter too much in the end, we only have to make sure we don't regress crates which don't recurse deeply.

[^1]: i've incremented a counter in the place I now call `if cache.get(&(lhs, rhs))` and then printed it on drop

r? `@compiler-errors`
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 29, 2024
fast-reject: add cache

slightly modified version of rust-lang#133524

I originally wanted to simply bail after recursion for a certain amount of times, however, looking at the number of steps taken while compiling different crates we get the following results[^1]:

typenum
```rust
1098842 counts
(  1)   670511 (61.0%, 61.0%): dropping after 1
(  2)   358785 (32.7%, 93.7%): dropping after 0
(  3)    25191 ( 2.3%, 96.0%): dropping after 2
(  4)    10912 ( 1.0%, 97.0%): dropping after 4
(  5)     6461 ( 0.6%, 97.5%): dropping after 3
(  6)     5239 ( 0.5%, 98.0%): dropping after 5
(  7)     2528 ( 0.2%, 98.3%): dropping after 8
(  8)     2188 ( 0.2%, 98.5%): dropping after 1094
(  9)     2097 ( 0.2%, 98.6%): dropping after 6
( 10)     1179 ( 0.1%, 98.7%): dropping after 34
( 11)     1148 ( 0.1%, 98.9%): dropping after 7
( 12)      822 ( 0.1%, 98.9%): dropping after 10
```
bitmaps
```rust
533346 counts
(  1)   526166 (98.7%, 98.7%): dropping after 1
(  2)     4562 ( 0.9%, 99.5%): dropping after 0
(  3)     2072 ( 0.4%, 99.9%): dropping after 1024
(  4)      305 ( 0.1%,100.0%): dropping after 2
(  5)      106 ( 0.0%,100.0%): dropping after 4
(  6)       30 ( 0.0%,100.0%): dropping after 8
(  7)       18 ( 0.0%,100.0%): dropping after 3
(  8)       17 ( 0.0%,100.0%): dropping after 44
(  9)       15 ( 0.0%,100.0%): dropping after 168
( 10)        8 ( 0.0%,100.0%): dropping after 14
( 11)        7 ( 0.0%,100.0%): dropping after 13
( 12)        7 ( 0.0%,100.0%): dropping after 24
```
stage 2 compiler is mostly trivial, but has a few cases where we get >5000
```rust
12987156 counts
(  1)  9280476 (71.5%, 71.5%): dropping after 0
(  2)  2277841 (17.5%, 89.0%): dropping after 1
(  3)   724888 ( 5.6%, 94.6%): dropping after 2
(  4)   204005 ( 1.6%, 96.2%): dropping after 4
(  5)   146537 ( 1.1%, 97.3%): dropping after 3
(  6)    64287 ( 0.5%, 97.8%): dropping after 5
(  7)    43938 ( 0.3%, 98.1%): dropping after 6
(  8)    43758 ( 0.3%, 98.4%): dropping after 8
(  9)    27220 ( 0.2%, 98.7%): dropping after 7
( 10)    17374 ( 0.1%, 98.8%): dropping after 9
( 11)    16015 ( 0.1%, 98.9%): dropping after 10
( 12)    12855 ( 0.1%, 99.0%): dropping after 12
( 13)    10494 ( 0.1%, 99.1%): dropping after 11
( 14)     7553 ( 0.1%, 99.2%): dropping after 14
```

Given that we have crates which frequently rely on fairly deep recursion, actually using a cache seems better than using an arbitrary cutoff here. Having an impl which is large enough to trigger a cutoff instead of getting rejected noticeably impacts perf, so just using a cache in these cases seems better to me. Does not matter too much in the end, we only have to make sure we don't regress crates which don't recurse deeply.

[^1]: i've incremented a counter in the place I now call `if cache.get(&(lhs, rhs))` and then printed it on drop

r? `@compiler-errors`
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 29, 2024
fast-reject: add cache

slightly modified version of rust-lang#133524

I originally wanted to simply bail after recursion for a certain amount of times, however, looking at the number of steps taken while compiling different crates we get the following results[^1]:

typenum
```rust
1098842 counts
(  1)   670511 (61.0%, 61.0%): dropping after 1
(  2)   358785 (32.7%, 93.7%): dropping after 0
(  3)    25191 ( 2.3%, 96.0%): dropping after 2
(  4)    10912 ( 1.0%, 97.0%): dropping after 4
(  5)     6461 ( 0.6%, 97.5%): dropping after 3
(  6)     5239 ( 0.5%, 98.0%): dropping after 5
(  7)     2528 ( 0.2%, 98.3%): dropping after 8
(  8)     2188 ( 0.2%, 98.5%): dropping after 1094
(  9)     2097 ( 0.2%, 98.6%): dropping after 6
( 10)     1179 ( 0.1%, 98.7%): dropping after 34
( 11)     1148 ( 0.1%, 98.9%): dropping after 7
( 12)      822 ( 0.1%, 98.9%): dropping after 10
```
bitmaps
```rust
533346 counts
(  1)   526166 (98.7%, 98.7%): dropping after 1
(  2)     4562 ( 0.9%, 99.5%): dropping after 0
(  3)     2072 ( 0.4%, 99.9%): dropping after 1024
(  4)      305 ( 0.1%,100.0%): dropping after 2
(  5)      106 ( 0.0%,100.0%): dropping after 4
(  6)       30 ( 0.0%,100.0%): dropping after 8
(  7)       18 ( 0.0%,100.0%): dropping after 3
(  8)       17 ( 0.0%,100.0%): dropping after 44
(  9)       15 ( 0.0%,100.0%): dropping after 168
( 10)        8 ( 0.0%,100.0%): dropping after 14
( 11)        7 ( 0.0%,100.0%): dropping after 13
( 12)        7 ( 0.0%,100.0%): dropping after 24
```
stage 2 compiler is mostly trivial, but has a few cases where we get >5000
```rust
12987156 counts
(  1)  9280476 (71.5%, 71.5%): dropping after 0
(  2)  2277841 (17.5%, 89.0%): dropping after 1
(  3)   724888 ( 5.6%, 94.6%): dropping after 2
(  4)   204005 ( 1.6%, 96.2%): dropping after 4
(  5)   146537 ( 1.1%, 97.3%): dropping after 3
(  6)    64287 ( 0.5%, 97.8%): dropping after 5
(  7)    43938 ( 0.3%, 98.1%): dropping after 6
(  8)    43758 ( 0.3%, 98.4%): dropping after 8
(  9)    27220 ( 0.2%, 98.7%): dropping after 7
( 10)    17374 ( 0.1%, 98.8%): dropping after 9
( 11)    16015 ( 0.1%, 98.9%): dropping after 10
( 12)    12855 ( 0.1%, 99.0%): dropping after 12
( 13)    10494 ( 0.1%, 99.1%): dropping after 11
( 14)     7553 ( 0.1%, 99.2%): dropping after 14
```

Given that we have crates which frequently rely on fairly deep recursion, actually using a cache seems better than using an arbitrary cutoff here. Having an impl which is large enough to trigger a cutoff instead of getting rejected noticeably impacts perf, so just using a cache in these cases seems better to me. Does not matter too much in the end, we only have to make sure we don't regress crates which don't recurse deeply.

[^1]: i've incremented a counter in the place I now call `if cache.get(&(lhs, rhs))` and then printed it on drop

r? `@compiler-errors`
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 2, 2024
…rors

fast-reject: add cache

slightly modified version of rust-lang#133524

I tried a few alternatives:
- simply bail after recursion for a certain amount of times, however, looking at the number of steps taken while compiling different crates we get the following results[^1]:
- add a cache: results in a bigger performance impact

typenum
```rust
1098842 counts
(  1)   670511 (61.0%, 61.0%): dropping after 1
(  2)   358785 (32.7%, 93.7%): dropping after 0
(  3)    25191 ( 2.3%, 96.0%): dropping after 2
(  4)    10912 ( 1.0%, 97.0%): dropping after 4
(  5)     6461 ( 0.6%, 97.5%): dropping after 3
(  6)     5239 ( 0.5%, 98.0%): dropping after 5
(  7)     2528 ( 0.2%, 98.3%): dropping after 8
(  8)     2188 ( 0.2%, 98.5%): dropping after 1094
(  9)     2097 ( 0.2%, 98.6%): dropping after 6
( 10)     1179 ( 0.1%, 98.7%): dropping after 34
( 11)     1148 ( 0.1%, 98.9%): dropping after 7
( 12)      822 ( 0.1%, 98.9%): dropping after 10
```
bitmaps
```rust
533346 counts
(  1)   526166 (98.7%, 98.7%): dropping after 1
(  2)     4562 ( 0.9%, 99.5%): dropping after 0
(  3)     2072 ( 0.4%, 99.9%): dropping after 1024
(  4)      305 ( 0.1%,100.0%): dropping after 2
(  5)      106 ( 0.0%,100.0%): dropping after 4
(  6)       30 ( 0.0%,100.0%): dropping after 8
(  7)       18 ( 0.0%,100.0%): dropping after 3
(  8)       17 ( 0.0%,100.0%): dropping after 44
(  9)       15 ( 0.0%,100.0%): dropping after 168
( 10)        8 ( 0.0%,100.0%): dropping after 14
( 11)        7 ( 0.0%,100.0%): dropping after 13
( 12)        7 ( 0.0%,100.0%): dropping after 24
```
stage 2 compiler is mostly trivial, but has a few cases where we get >5000
```rust
12987156 counts
(  1)  9280476 (71.5%, 71.5%): dropping after 0
(  2)  2277841 (17.5%, 89.0%): dropping after 1
(  3)   724888 ( 5.6%, 94.6%): dropping after 2
(  4)   204005 ( 1.6%, 96.2%): dropping after 4
(  5)   146537 ( 1.1%, 97.3%): dropping after 3
(  6)    64287 ( 0.5%, 97.8%): dropping after 5
(  7)    43938 ( 0.3%, 98.1%): dropping after 6
(  8)    43758 ( 0.3%, 98.4%): dropping after 8
(  9)    27220 ( 0.2%, 98.7%): dropping after 7
( 10)    17374 ( 0.1%, 98.8%): dropping after 9
( 11)    16015 ( 0.1%, 98.9%): dropping after 10
( 12)    12855 ( 0.1%, 99.0%): dropping after 12
( 13)    10494 ( 0.1%, 99.1%): dropping after 11
( 14)     7553 ( 0.1%, 99.2%): dropping after 14
```

[^1]: i've incremented a counter in the place I now decrement the depth at and then printed it on drop

r? `@compiler-errors`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf-regression Performance regression. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants