
Performance improvement: use provisional state for better cache reuse; refactor TermRef widening logic #21278

Closed
noti0na1 wants to merge 3 commits

Conversation

@noti0na1 (Member) commented Jul 26, 2024

Fixes #20217

The main issue is that if a type is provisional, its denotation, signature, and other values will not be cached. As a result, using `widen` on provisional types repeatedly recomputes these values.

To enhance cache reuse, I introduced the concept of a provisional state, which maps all provisional type components (TypeRef, LazyRef, and TypeVar) to their current information. For example, by comparing two provisional states, we can determine whether a TypeVar has been instantiated since the last computation, even if the resulting type is still provisional. For non-provisional types, the state is empty.
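Below is a self-contained toy sketch of the idea in Scala, to make the mechanism concrete. `MyType`, `MyVar`, and `Named` are illustrative stand-ins, not the compiler's classes; the real implementation works on `Type` and dotty's `util.HashMap` (see the snippets quoted in the review below).

```scala
// Toy model: a "provisional" component is a type variable that may be
// instantiated later; the provisional state maps each such component
// to its current information.
sealed trait MyType
case class Named(name: String) extends MyType
class MyVar(val id: Int, var instance: Option[MyType] = None) extends MyType

type ProvisionalState = Map[MyVar, Option[MyType]]

// Recomputed from scratch on each call; empty for non-provisional types.
def currentProvisionalState(tp: MyType): ProvisionalState = tp match
  case v: MyVar => Map(v -> v.instance)
  case _        => Map.empty

@main def compareStates(): Unit =
  val v = MyVar(1)
  val before = currentProvisionalState(v)
  v.instance = Some(Named("Int"))    // the variable gets instantiated
  val after = currentProvisionalState(v)
  println(before == after)           // false: the change is detected
```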

This PR includes two performance improvements:

  • Rewrite provisional testing: we can now collect all provisional type components during the check. A provisional state is recorded alongside each cached value (denot, signature, etc.), and the cached value is reused if the corresponding state has not changed (see the sketch after this list).
  • Refactor TermRef widening logic: I inlined the logic here to avoid computing the provisional state twice.
  • ~~Add a local cache for widening results inside resolveOverloaded: given that the denotation will not change in this part, we can safely cache the widening result.~~ Since resolveOverloaded uses constrainResult and normalizedCompatible, which can constrain TypeVars, the denotation may change as well, so we should not cache the result here.
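A minimal sketch of the state-guarded caching described in the first bullet, continuing the toy model above (`StateGuardedCache` is an illustrative name; the PR stores the cached value and its provisional state directly as fields on types):

```scala
// Recompute a cached value only when the provisional state recorded at
// caching time no longer matches the current one.
final class StateGuardedCache[A](tp: MyType, compute: MyType => A):
  private var cached: Option[A] = None
  private var savedState: ProvisionalState = Map.empty
  def get: A =
    val state = currentProvisionalState(tp)
    if cached.isEmpty || savedState != state then
      cached = Some(compute(tp))  // first call, or the state changed
      savedState = state
    cached.get
```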

The compilation time in #20217 can be reduced to Scala 2 levels. We can test the results by publishing locally and compiling the project from #20217. Here are the (average) results on my laptop:

  • 2.13.14: 12s
  • 3.4.2: 67s
  • 3.6.0-RC1-bin-20240724-f0b3a2b-NIGHTLY: 62s
  • This PR: 16s

@noti0na1 changed the title from "Refactor TermRef widening logic; Cache widening results during overloading resolving" to "Performance Improvement: Refactor TermRef widening logic; Cache widening results during overloading resolving" on Jul 26, 2024
@noti0na1 marked this pull request as ready for review July 26, 2024 12:46
@noti0na1 requested a review from odersky July 26, 2024 12:46
@noti0na1 (Member, Author) commented:

I am still not satisfied with the performance of the denotation computations. It appears that this is actually a performance regression introduced by #18092. The nightly builds 3.3.2-RC1-bin-20230707-7ede150-NIGHTLY and 3.3.2-RC1-bin-20230710-ed319e8-NIGHTLY confirm this issue.

Since merging denotations requires computing signatures, and #18092 disabled the signature cache for provisional method types, computing an overloaded denotation has become slower since that PR.

If I remove `&& !isProvisional` from `MethodOrPoly` (see the sketch below), the compilation time of the project from #20217 improves as well. However, even though all tests pass, I believe this approach is incorrect given the semantics of `isProvisional`.
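For context, a simplified sketch of the cache in question, based on the description above (an approximation of the pattern in `Types.scala`, not the exact compiler code):

```scala
// In MethodOrPoly: the signature is cached per run, but since #18092 the
// cache is only retained for non-provisional types, so provisional method
// types recompute their signature on every call.
final override def signature(using Context): Signature =
  if ctx.runId != mySignatureRunId then
    mySignature = computeSignature
    if !isProvisional then mySignatureRunId = ctx.runId
  mySignature
```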

CC @smarter

@noti0na1 assigned noti0na1 and unassigned odersky Jul 30, 2024
@noti0na1 (Member, Author) commented Jul 30, 2024:

I'm trying another approach, where a cached value (denotation, signature, etc.) remains valid as long as the provisional state has not changed.

The provisional state of a type would store, in a map, every component type that might be provisional, so that their changes can be tracked, similar to the `reductionContext` in `MatchType`.

@noti0na1 changed the title from "Performance Improvement: Refactor TermRef widening logic; Cache widening results during overloading resolving" to "Performance improvement: use provisional state for better cache reuse; refactor TermRef widening logic" on Aug 6, 2024
@noti0na1 assigned odersky and unassigned noti0na1 Aug 7, 2024
@dwijnand (Member) left a review:

Generally looks great to me! A few minor questions, plus one more important question about the cost of the new calls to `currentProvisionalState` in a number of hot paths.

(7 review threads on compiler/src/dotty/tools/dotc/core/Types.scala, since resolved)
@EugeneFlesselle (Contributor) left a review:

I added a few more minor questions in addition to Dale's review.

Independently of those: why do the methods accessing/updating `lastSymbol` not need to check that `lastDenotationProvState` is up to date as well, given that it depends on `denot`?

```scala
def currentProvisionalState(using Context): ProvisionalState =
  val state: ProvisionalState = util.HashMap()
  // Compared to `testProvisional`, we don't use short-circuiting or,
  // because we want to collect all provisional types.
  class ProAcc extends TypeAccumulator[Boolean]:
```
A Contributor commented:

Is there a reason to keep the Boolean accumulator in the new scheme with a state? Why not use a TypeTraverser?

@noti0na1 (Member Author) replied:

We use the recursive result to set the `mightBeProvisional` field.

A Member replied:

Good point, but then maybe `TypeAccumulator[ProvisionalState]` with `mightBeProvisional = x.nonEmpty`?
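For illustration, a hypothetical shape of that suggestion (not code from the PR; whether `TypeVar` is the only case to record, and the exact accessor for its current information, are assumptions):

```scala
// Sketch: accumulate the provisional state directly instead of a Boolean.
class ProAcc(using Context) extends TypeAccumulator[ProvisionalState]:
  def apply(state: ProvisionalState, tp: Type): ProvisionalState =
    tp match
      case tp: TypeVar => state(tp) = tp.instanceOpt // record current info
      case _           =>
    foldOver(state, tp)

// mightBeProvisional could then be derived from the result:
// tp.mightBeProvisional = state.nonEmpty
```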

```scala
private def testProvisional(using Context): Boolean =
type ProvisionalState = util.HashMap[Type, Type]

def currentProvisionalState(using Context): ProvisionalState =
```
@EugeneFlesselle (Contributor) commented Sep 3, 2024:

As I understand it, it is important in the current implementation that we recompute the state from scratch each time, and not try to reuse the result of the previous currentProvisionalState computation itself. It might be worth documenting this too

@noti0na1 (Member, Author) commented Sep 6, 2024:

test performance please

@WojciechMazur (Contributor) commented Sep 8, 2024:

Hey, there seem to be no new regressions found in the Open CB when compared with the latest nightly (3.6.0-RC1-bin-20240905-f285199-NIGHTLY).

Total execution time for each of the builds (summed execution time from all GH Actions runners running in parallel):

  • Nightly: 3d 0h 13m 57s
  • PR run #1: 3d 3h 7m 7s
  • PR run #2: 3d 1h 46m 13s

I've taken a look at the previous nightly run (3.6.0-RC1-bin-20240902-f774497-NIGHTLY): it executed in 3d 0h 49m 56s.
At first glance, the PR seems to execute about 3h longer in accumulated time, but to confirm that I'd need to re-run it. It's also possible that the PR runs were unlucky and were scheduled on less performant runners.
We ran it over 1611 projects, so the average difference is ~7s per project.

@odersky (Contributor) commented Sep 8, 2024:

Thanks for the statistics @WojciechMazur! That's good to know. I had a nagging suspicion that this is probably too expensive in this form, so it seems we need to target it better (or decide it's not worth trying).

But, it's fantastic that we can get these large codebase statistics. This is a gamechanger for future attempts at optimizations!

@WojciechMazur (Contributor) commented:

Just a reminder that the execution times from the OpenCB need to be taken with a grain of salt. The actual compilation times might differ because we don't have a stable, reproducible environment. Repeating the same build shortened the execution by ~1.5h.

@mbovel (Member) commented Sep 16, 2024:

test performance please

@dottybot (Member) commented:

performance test scheduled: 1 job(s) in queue, 0 running.

@dottybot (Member) commented:

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/21278/ to see the changes.

The benchmarks are based on merging with main (ad8c21a).

noti0na1 added a commit that referenced this pull request Sep 18, 2024
A simple performance improvement extracted from #21278. This PR reduces `denot` calls in `widen`s.

This is a simpler approach which can reduce the compilation time of
#20217 from 60s to about 40s.
@odersky (Contributor) left a review:

Since the overall performance improvements are unclear, and might even be negative, while this adds considerable complexity, I think we should not merge this at this time.

@odersky closed this Sep 23, 2024
Successfully merging this pull request may close the following issue: ~5x compilation slowdown compared to 2.13 on ~400 files (#20217)