Performance improvement: use provisional state for better cache reuse; refactor TermRef widening logic #21278
I am still not satisfied with the performance of the denotation computations. It appears that this is actually a performance regression introduced by #18092. The nightly builds Since the If I remove CC @smarter
I'm trying another approach where a cache (denotation or signature, etc.) would be valid as long as the provisional state is not changed. The provisional state of a type would store any component type that might be provisional in a map to track their changes, similar to the
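To make the idea above concrete, here is a minimal sketch of a cache that stays valid as long as the provisional state is unchanged. It runs on a toy type model, not on the compiler: `Tpe`, `TVar`, `Named`, `Applied`, `ProvState`, `currentState`, `show`, and `SnapshotCache` are all hypothetical names invented for illustration, and the real PR tracks more kinds of components than type variables. The snapshot is recomputed from scratch on every lookup and compared with the one stored next to the cached value.

```scala
// Toy stand-ins for compiler types; hypothetical, not the real dotc classes.
sealed trait Tpe
final class TVar(val name: String) extends Tpe:
  var inst: Option[Tpe] = None // None while the variable is uninstantiated
final case class Named(name: String) extends Tpe
final case class Applied(ctor: Tpe, args: List[Tpe]) extends Tpe

// Snapshot: each reachable type variable mapped to its current instance.
type ProvState = Map[TVar, Option[Tpe]]

// Recomputed from scratch on every call; nothing is reused between calls.
def currentState(tp: Tpe): ProvState = tp match
  case tv: TVar =>
    val self: ProvState = Map(tv -> tv.inst)
    tv.inst.fold(self)(inst => self ++ currentState(inst))
  case Named(_) => Map.empty
  case Applied(ctor, args) =>
    args.foldLeft(currentState(ctor))((acc, arg) => acc ++ currentState(arg))

// Stand-in for an expensive computation such as a denotation or signature.
def show(tp: Tpe): String = tp match
  case tv: TVar       => tv.inst.fold("?" + tv.name)(show)
  case Named(n)       => n
  case Applied(c, as) => as.map(show).mkString(show(c) + "[", ", ", "]")

// Reuses the cached value only if the snapshot taken now equals the stored one.
final class SnapshotCache[A](tp: Tpe, compute: Tpe => A):
  private var cached: Option[(ProvState, A)] = None
  var computations = 0 // counts cache misses, for demonstration only

  def get: A =
    val now = currentState(tp)
    cached match
      case Some((state, value)) if state == now => value
      case _ =>
        computations += 1
        val value = compute(tp)
        cached = Some((now, value))
        value

@main def snapshotCacheDemo(): Unit =
  val x = TVar("X")
  val cache = SnapshotCache[String](Applied(Named("List"), List(x)), show)
  println(cache.get)          // List[?X], computed
  println(cache.get)          // List[?X], served from the cache
  x.inst = Some(Named("Int")) // the provisional state changes
  println(cache.get)          // List[Int], recomputed
  println(cache.computations) // 2
```

Note that every lookup still pays for a full traversal to build the snapshot, which is the kind of cost on hot paths that the review questions in this thread are concerned with.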
Generally looks great to me! A few minor questions, and one more important question about the cost of the new calls to `currentProvisionalState` in a bunch of hot paths.
I added a few more minor questions in addition to Dale's review.
Independently of those, why do the methods accessing/updating `lastSymbol` not need to check that `lastDenotationProvState` is up to date as well, given that it depends on `denot`?
def currentProvisionalState(using Context): ProvisionalState =
  val state: ProvisionalState = util.HashMap()
  // Compared to `testProvisional`, we don't use short-circuiting or,
  // because we want to collect all provisional types.
  class ProAcc extends TypeAccumulator[Boolean]:
Is there a reason to keep the `Boolean` accumulator in the new scheme with a `state`? Why not use a `TypeTraverser`?
We use the recursive result to set the `mightBeProvisional` field.
Good point, but then maybe `TypeAccumulator[ProvisionalState]` and `mightBeProvisional = x.nonEmpty`?
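A rough sketch of that suggestion on a toy type tree, to show that accumulating the state itself makes the separate Boolean result redundant. Everything here is hypothetical: `Ty`, `Acc`, `State`, `CollectState`, and `AnyProvisional` only imitate the shape of a type accumulator and are not the compiler's `TypeAccumulator` API. The sketch treats every `Var` node as provisional.

```scala
// Toy type tree; hypothetical stand-ins, not dotc's Type hierarchy.
enum Ty:
  case Var(id: Int, instance: Option[Ty]) // treated as provisional here
  case Ref(name: String)
  case App(fn: Ty, args: List[Ty])

import Ty.*

// Minimal analogue of an accumulator: thread a value of type T through the tree.
abstract class Acc[T]:
  def apply(x: T, tp: Ty): T
  def foldOver(x: T, tp: Ty): T = tp match
    case Var(_, inst)  => inst.fold(x)(apply(x, _))
    case Ref(_)        => x
    case App(fn, args) => args.foldLeft(apply(x, fn))(apply)

// Each provisional component (keyed by variable id) mapped to its current info.
type State = Map[Int, Option[Ty]]

// Collects every provisional component; no short-circuiting.
object CollectState extends Acc[State]:
  def apply(x: State, tp: Ty): State = tp match
    case Var(id, inst) => foldOver(x + (id -> inst), tp)
    case _             => foldOver(x, tp)

// The Boolean variant, which stops as soon as one provisional part is found.
object AnyProvisional extends Acc[Boolean]:
  def apply(x: Boolean, tp: Ty): Boolean =
    if x then true
    else tp match
      case Var(_, _) => true
      case _         => foldOver(false, tp)

@main def accumulatorDemo(): Unit =
  val pending = App(Ref("List"), List(Var(1, None)))
  val ground = App(Ref("List"), List(Ref("Int")))
  // The flag can be derived from the accumulated state.
  println(CollectState(Map.empty, pending).nonEmpty == AnyProvisional(false, pending)) // true
  println(CollectState(Map.empty, ground).nonEmpty == AnyProvisional(false, ground))   // true
```

The trade-off is that the state-collecting version always visits the whole tree, while the Boolean version can stop early.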
private def testProvisional(using Context): Boolean =
type ProvisionalState = util.HashMap[Type, Type]

def currentProvisionalState(using Context): ProvisionalState =
As I understand it, it is important in the current implementation that we recompute the state from scratch each time, and not try to reuse the result of the previous `currentProvisionalState` computation itself. It might be worth documenting this too.
test performance please
Hey, there seem to be no new regressions found in the Open CB when compared with the latest nightly (3.6.0-RC1-bin-20240905-f285199-NIGHTLY) Total execution time for each of the builds (summed execution time from all GH Actions runners running in parallel): I've taken a look at the nightly from the previous nightly run (3.6.0-RC1-bin-20240902-f774497-NIGHTLY) - it executed in 3d 0h 49m 56s
Thanks for the statistics @WojciechMazur! That's good to know. I had a nagging suspicion that this is probably too expensive in this form, so it seems we need to target it better (or decide it's not worth trying). But it's fantastic that we can get these large codebase statistics. This is a game changer for future attempts at optimizations!
Just a reminder that the execution times from the OpenCB need to be taken with a grain of salt. The actual compilation times might differ because we don't have a stable, reproducible environment. Repeating the same build again has shortened the execution by ~1.5h
test performance please
performance test scheduled: 1 job(s) in queue, 0 running.
Performance test finished successfully: Visit https://dotty-bench.epfl.ch/21278/ to see the changes. Benchmarks is based on merging with main (ad8c21a)
Since the overall performance improvements are unclear and might even be negative, and this adds considerable complexity, I think we should not merge this at this time.
Fixes #20217

The main issue is that if a type is provisional, its denotation, signature, and other values will not be cached. As a result, using `widen` on provisional types will repeatedly recompute these values.

To enhance cache reuse, I introduced the concept of a provisional state, which maps all provisional type components (`TypeRef`, `LazyRef`, and `TypeVar`) to their information. For example, by comparing two provisional states, we can determine whether a `TypeVar` has been instantiated to a type since last time, even if that type is still provisional. For non-provisional types, the states are empty.

This PR includes two performance improvements:

- … `denot`, `signature`, etc.), and we reuse the cache value if the corresponding state has not changed.
- ~~Add a local cache for widening results inside `resolveOverloaded`: Given that the denotation will not change in this part, we can safely cache the widening result.~~ Since `resolveOverloaded` uses `constrainResult` and `normalizedCompatible`, which could constrain TypeVars, the denotation may change as well, so we should not cache the result here.

The compilation time in #20217 can be reduced to Scala 2 levels. We can test the results by publishing locally and compiling the project from #20217. Here are the (average) results on my laptop: