
Performance improvement: use provisional state for better cache reuse; refactor TermRef widening logic #21278

Closed
noti0na1 wants to merge 3 commits

Conversation

@noti0na1 (Member) commented Jul 26, 2024

Fixes #20217

The main issue is that if a type is provisional, its denotation, signature, and other values will not be cached. As a result, using `widen` on provisional types repeatedly recomputes these values.

To enhance cache reuse, I introduced the concept of a provisional state, which maps all provisional type components (TypeRef, LazyRef, and TypeVar) to their current information. For example, by comparing two provisional states, we can determine whether a TypeVar has been instantiated since the last computation, even if the resulting type is still provisional. For non-provisional types, the state is empty.
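Below is a self-contained toy sketch of the idea in Scala, to make the mechanism concrete. `MyType`, `MyVar`, and `Named` are illustrative stand-ins, not the compiler's classes; the real implementation works on `Type` and dotty's `util.HashMap` (see the snippets quoted in the review below).

```scala
// Toy model: a "provisional" component is a type variable that may be
// instantiated later; the provisional state maps each such component
// to its current information.
sealed trait MyType
case class Named(name: String) extends MyType
class MyVar(val id: Int, var instance: Option[MyType] = None) extends MyType

type ProvisionalState = Map[MyVar, Option[MyType]]

// Recomputed from scratch on each call; empty for non-provisional types.
def currentProvisionalState(tp: MyType): ProvisionalState = tp match
  case v: MyVar => Map(v -> v.instance)
  case _        => Map.empty

@main def compareStates(): Unit =
  val v = MyVar(1)
  val before = currentProvisionalState(v)
  v.instance = Some(Named("Int"))    // the variable gets instantiated
  val after = currentProvisionalState(v)
  println(before == after)           // false: the change is detected
```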

This PR includes two performance improvements:

  • Rewrite provisional testing: we can now collect all provisional type components during the check. A provisional state is recorded alongside each cached value (denot, signature, etc.), and the cached value is reused if the corresponding state has not changed (see the sketch after this list).
  • Refactor TermRef widening logic: I inlined the logic here to avoid computing the provisional state twice.
  • ~~Add a local cache for widening results inside resolveOverloaded: given that the denotation will not change in this part, we can safely cache the widening result.~~ Since resolveOverloaded uses constrainResult and normalizedCompatible, which can constrain TypeVars, the denotation may change as well, so we should not cache the result here.
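A minimal sketch of the state-guarded caching described in the first bullet, continuing the toy model above (`StateGuardedCache` is an illustrative name; the PR stores the cached value and its provisional state directly as fields on types):

```scala
// Recompute a cached value only when the provisional state recorded at
// caching time no longer matches the current one.
final class StateGuardedCache[A](tp: MyType, compute: MyType => A):
  private var cached: Option[A] = None
  private var savedState: ProvisionalState = Map.empty
  def get: A =
    val state = currentProvisionalState(tp)
    if cached.isEmpty || savedState != state then
      cached = Some(compute(tp))  // first call, or the state changed
      savedState = state
    cached.get
```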

The compilation time in #20217 can be reduced to Scala 2 levels. We can test the results by publishing locally and compiling the project from #20217. Here are the (average) results on my laptop:

  • 2.13.14: 12s
  • 3.4.2: 67s
  • 3.6.0-RC1-bin-20240724-f0b3a2b-NIGHTLY: 62s
  • This PR: 16s

@noti0na1 changed the title from "Refactor TermRef widening logic; Cache widening results during overloading resolving" to "Performance Improvement: Refactor TermRef widening logic; Cache widening results during overloading resolving" on Jul 26, 2024
@noti0na1 marked this pull request as ready for review July 26, 2024 12:46
@noti0na1 requested a review from odersky July 26, 2024 12:46
@noti0na1 (Member, Author) commented:

I am still not satisfied with the performance of the denotation computations. It appears that this is actually a performance regression introduced by #18092. The nightly builds 3.3.2-RC1-bin-20230707-7ede150-NIGHTLY and 3.3.2-RC1-bin-20230710-ed319e8-NIGHTLY confirm this issue.

Since merging denotations requires computing signatures, and #18092 disabled the signature cache for provisional method types, computing an overloaded denotation has become slower since that PR.

If I remove `&& !isProvisional` from `MethodOrPoly` (see the sketch below), the compilation time of the project from #20217 improves as well. However, even though all tests pass, I believe this approach is incorrect given the semantics of `isProvisional`.
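For context, a simplified sketch of the cache in question, based on the description above (an approximation of the pattern in `Types.scala`, not the exact compiler code):

```scala
// In MethodOrPoly: the signature is cached per run, but since #18092 the
// cache is only retained for non-provisional types, so provisional method
// types recompute their signature on every call.
final override def signature(using Context): Signature =
  if ctx.runId != mySignatureRunId then
    mySignature = computeSignature
    if !isProvisional then mySignatureRunId = ctx.runId
  mySignature
```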

CC @smarter

@noti0na1 assigned noti0na1 and unassigned odersky Jul 30, 2024
@noti0na1 (Member, Author) commented Jul 30, 2024:

I'm trying another approach, where a cached value (denotation, signature, etc.) remains valid as long as the provisional state has not changed.

The provisional state of a type would store, in a map, every component type that might be provisional, so that their changes can be tracked, similar to the `reductionContext` in `MatchType`.

@noti0na1 changed the title from "Performance Improvement: Refactor TermRef widening logic; Cache widening results during overloading resolving" to "Performance improvement: use provisional state for better cache reuse; refactor TermRef widening logic" on Aug 6, 2024
@noti0na1 assigned odersky and unassigned noti0na1 Aug 7, 2024
@dwijnand (Member) left a review:

Generally looks great to me! A few minor questions, plus one more important question about the cost of the new calls to `currentProvisionalState` in a number of hot paths.

(7 review threads on compiler/src/dotty/tools/dotc/core/Types.scala, since resolved)
@EugeneFlesselle (Contributor) left a review:

I added a few more minor questions in addition to Dale's review.

Independently of those: why do the methods accessing/updating `lastSymbol` not need to check that `lastDenotationProvState` is up to date as well, given that it depends on `denot`?

```scala
def currentProvisionalState(using Context): ProvisionalState =
  val state: ProvisionalState = util.HashMap()
  // Compared to `testProvisional`, we don't use short-circuiting or,
  // because we want to collect all provisional types.
  class ProAcc extends TypeAccumulator[Boolean]:
```
A Contributor commented:

Is there a reason to keep the Boolean accumulator in the new scheme with a state? Why not use a TypeTraverser?

@noti0na1 (Member Author) replied:

We use the recursive result to set the `mightBeProvisional` field.

A Member replied:

Good point, but then maybe `TypeAccumulator[ProvisionalState]` with `mightBeProvisional = x.nonEmpty`?
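For illustration, a hypothetical shape of that suggestion (not code from the PR; whether `TypeVar` is the only case to record, and the exact accessor for its current information, are assumptions):

```scala
// Sketch: accumulate the provisional state directly instead of a Boolean.
class ProAcc(using Context) extends TypeAccumulator[ProvisionalState]:
  def apply(state: ProvisionalState, tp: Type): ProvisionalState =
    tp match
      case tp: TypeVar => state(tp) = tp.instanceOpt // record current info
      case _           =>
    foldOver(state, tp)

// mightBeProvisional could then be derived from the result:
// tp.mightBeProvisional = state.nonEmpty
```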

```scala
private def testProvisional(using Context): Boolean =
type ProvisionalState = util.HashMap[Type, Type]

def currentProvisionalState(using Context): ProvisionalState =
```
@EugeneFlesselle (Contributor) commented Sep 3, 2024:

As I understand it, it is important in the current implementation that we recompute the state from scratch each time, and not try to reuse the result of the previous currentProvisionalState computation itself. It might be worth documenting this too

@noti0na1 (Member, Author) commented Sep 6, 2024:

test performance please

@WojciechMazur (Contributor) commented Sep 8, 2024:

Hey, there seem to be no new regressions found in the Open CB when compared with the latest nightly (3.6.0-RC1-bin-20240905-f285199-NIGHTLY).

Total execution time for each of the builds (summed execution time from all GH Actions runners running in parallel):

  • Nightly: 3d 0h 13m 57s
  • PR run #1: 3d 3h 7m 7s
  • PR run #2: 3d 1h 46m 13s

I've taken a look at the previous nightly run (3.6.0-RC1-bin-20240902-f774497-NIGHTLY): it executed in 3d 0h 49m 56s.
At first glance, the PR seems to execute about 3h longer in accumulated time, but to confirm that I'd need to re-run it. It's also possible that the PR runs were unlucky and were scheduled on less performant runners.
We ran it over 1611 projects, so the average difference is ~7s per project.

@odersky (Contributor) commented Sep 8, 2024:

Thanks for the statistics @WojciechMazur! That's good to know. I had a nagging suspicion that this is probably too expensive in this form, so it seems we need to target it better (or decide it's not worth trying).

But, it's fantastic that we can get these large codebase statistics. This is a gamechanger for future attempts at optimizations!

@WojciechMazur (Contributor) commented:

Just a reminder that the execution times from the OpenCB need to be taken with a grain of salt. The actual compilation times might differ because we don't have a stable, reproducible environment. Repeating the same build shortened the execution by ~1.5h.

@mbovel (Member) commented Sep 16, 2024:

test performance please

@dottybot (Member) commented:

performance test scheduled: 1 job(s) in queue, 0 running.

@dottybot (Member) commented:

Performance test finished successfully:

Visit https://dotty-bench.epfl.ch/21278/ to see the changes.

The benchmarks are based on merging with main (ad8c21a).

noti0na1 added a commit that referenced this pull request Sep 18, 2024
A simple performance improvement extracted from #21278. This PR reduces `denot` calls in `widen`s.

This is a simpler approach which can reduce the compilation time of
#20217 from 60s to about 40s.
@odersky (Contributor) left a review:

Since the overall performance improvements are unclear, and might even be negative, while this adds considerable complexity, I think we should not merge this at this time.

@odersky closed this Sep 23, 2024
Successfully merging this pull request may close the following issue: ~5x compilation slowdown compared to 2.13 on ~400 files (#20217)