
Speeding up "time to first result"? #746

Open
fingolfin opened this issue Jul 8, 2022 · 23 comments

Comments

@fingolfin
Contributor

On my M1 Mac with Julia 1.8-rc1:

julia> @time @eval maximal_order(quadratic_field(-1)[1]);
  5.577463 seconds (2.85 M allocations: 140.472 MiB, 0.34% gc time, 99.97% compilation time)

julia> @time @eval maximal_order(quadratic_field(-1)[1]);
  0.000288 seconds (121 allocations: 4.948 KiB)

This is not a simple matter of caching, as it is also fast with different arguments, e.g.

julia> @time @eval maximal_order(quadratic_field(-3)[1]);
  0.003174 seconds (635 allocations: 30.745 KiB)

Most of this time seems to be spent in compiling. Indeed, starting Julia with -O1 things are much faster:

julia> @time @eval maximal_order(quadratic_field(-1)[1]);
  2.370229 seconds (554.39 k allocations: 25.480 MiB, 99.94% compilation time)

I've used @snoopl from SnoopCompile to get some insights into this. So I did

@snoopl "func_names.csv" "llvm_timings.yaml" begin
    using Hecke
    maximal_order(quadratic_field(-1)[1])
end

The resulting data files can be loaded separately:

julia> times, info = SnoopCompile.read_snoopl("func_names.csv", "llvm_timings.yaml");

julia> sort(times)[end][1]   # where is most time spent?
0.428102083

julia> sort(times)[end-1][1]  # second worst place?
0.068432708

julia> sum(k for (k,v) in times)
5.172362951000003

So yeah, we spend a lot of time in compilation. It seems to compile 999 method instances.

Unfortunately there don't seem to be many functions to help analyze this further...
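Lacking dedicated tooling, one can still poke at the data by hand. A hedged sketch, reusing the `times` pairs returned by `read_snoopl` above (the 10 ms threshold is arbitrary, and the exact shape of the entries is inferred from the calls shown here):

```julia
# Tally the method instances that dominate LLVM codegen time.
# `times` iterates as (seconds, info) pairs, as in the sum above.
big = [(t, v) for (t, v) in times if t > 0.01]   # instances costing > 10 ms each
println(length(big), " instances account for ", sum(first, big), " seconds")
```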

Attached is the list of method instances, sorted alphabetically; perhaps this already gives some ideas...
methods.txt

@fingolfin fingolfin changed the title Speeding up "time to first result" Speeding up "time to first result"? Jul 8, 2022
@fingolfin
Contributor Author

As @thofma pointed out: it may well be that in the end, we can't do anything about this other than building a custom baseimage. Quite true, but perhaps @noinline and @nospecialize or some other tricks can also help at least a bit?

@thofma
Owner

thofma commented Oct 29, 2022

@fingolfin @fieker There is a PR over at julia, and someone (don't want to ping them) used Hecke as a benchmark:
JuliaLang/julia#44527 (comment). The improved timings are impressive.

P.S.: Caching the compiled stuff will yield

/Users/mose/.julia/compiled/v1.9/Hecke/lhBAK_LpNp2.dylib
-rwxr-xr-x  1 mose  staff   131M 20 Oct 00:39 

Amazing how much stuff we have.

@giordano

P.S.: Caching the compiled stuff will yield

/Users/mose/.julia/compiled/v1.9/Hecke/lhBAK_LpNp2.dylib
-rwxr-xr-x  1 mose  staff   131M 20 Oct 00:39 

Amazing how much stuff we have.

People are looking into reducing the bloat. From what I heard, usually most of those libraries include serialised types in the data section; the machine code is only a fraction of the whole library.

@fingolfin
Contributor Author

Just tried JuliaLang/julia#47184 and with it, I get:

du -hs ~/.julia/compiled/v1.10/Hecke/*
 79M	/Users/mhorn/.julia/compiled/v1.10/Hecke/lhBAK_YE8QW.dylib
5.2M	/Users/mhorn/.julia/compiled/v1.10/Hecke/lhBAK_YE8QW.ji
 79M	/Users/mhorn/.julia/compiled/v1.10/Hecke/lhBAK_kKnCR.dylib
5.2M	/Users/mhorn/.julia/compiled/v1.10/Hecke/lhBAK_kKnCR.ji

which is already better (and in the end, who cares). It's really amazing how much it speeds things up. Indeed:

julia> @time @eval maximal_order(quadratic_field(-1)[1]);
  0.060866 seconds (81.06 k allocations: 5.365 MiB, 111.94% compilation time)

julia> @time @eval maximal_order(quadratic_field(-1)[1]);
  0.000282 seconds (124 allocations: 5.183 KiB)

So instead of 5.5 seconds for the first call, it's now 0.06 seconds. Fantastic!

Unfortunately, we don't benefit as much in Oscar, due to CxxWrap: if I first load Hecke and then load CxxWrap, we get this:

julia> using CxxWrap

julia> @time @eval maximal_order(quadratic_field(-1)[1]);
  3.255026 seconds (3.88 M allocations: 259.868 MiB, 1.72% gc time, 115.45% compilation time: 85% of which was recompilation)

I.e. CxxWrap invalidates tons of stuff. I've reported this before at the CxxWrap repository, but zero reaction so far. I think we'll have to handle this ourselves if we want to see it improved... AFAICT most (?) of the overhead comes from the TONS of automagic convert methods in CxxWrap, several of which apply far too generically. If we removed or restricted many of these, I think we'd get better results. Unfortunately this might break some CxxWrap users (including ours)... And in the end, it might not be possible w/o stripping out a major part of the CxxWrap functionality (all those automagic cconvert calls...)

Another hypothetical "solution" would be to have AbstractAlgebra and GAP.jl depend on CxxWrap and also load it -- of course nobody wants that, I am just saying it would solve the issue, because then all those invalidations introduced by CxxWrap would have happened at precompile time for all (?) our packages (I verified that it does).

@PallHaraldsson

PallHaraldsson commented Dec 7, 2022

Indeed, starting Julia with -O1 things are much faster:

You can force your module to always use -O1 (or even -O0 and/or --compile=min) if helpful. It helps with TTFX, but would you be worried about runtime speed afterwards with lower optimization? Is the speed low on the default setting, or, as far as you know, on that non-default one? I'm not sure about this package, but is it mostly used interactively, and would even 2x slower after first use be ok (not saying that will happen, just asking about your tolerance for slowdown)? The PR for it is trivial (see how it's done in e.g. Plots.jl, or ask me, maybe I'll do a PR; you may need to add it for some or all of your dependencies in addition or instead). It's great to see 92x faster first use because of the new PR in Base! Still:

Unfortunately, we don't benefit as much in Oscar, due to CxxWrap [..] I.e. CxxWrap invalidates tons of stuff

As you describe, the order of `using` affects/causes invalidations.

I don't know for sure about invalidations, I just have a feeling it has a lot to do with inlining of code, which is the default on -O2. You can check just that with:

julia -O2 --inline=no

I don't know about the different optimization levels, e.g. -O1 (I don't think it's even documented what they mean exactly; maybe in LLVM, it might just be passed on to it). At that level, or at least at -O0, I'm pretty sure inlining is off: possibly always off at the lowest level, otherwise the inlining threshold is lowered. Compilation need not be slow at all (it isn't in fully non-optimizing compilers). With no inlining I have a hard time seeing a reason for invalidations.

However, the best optimization implies "whole program optimization", i.e. inlining, and that means compilation is always going to be slow unless the resulting machine code is stored. We are getting there now in Julia, for packages and the sysimage, but I think individually, and that rules out inlining across packages (as e.g. for Python packages written in C, which seem plenty fast enough). I'm not sure if the new package code caching forbids such inlining, or, more likely, in some cases still has to reoptimize/invalidate because of the inlining your default optimization allows.
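For reference, the per-module optimization override suggested above (the approach used by e.g. Plots.jl) is a one-liner at the top of the package's main module; `MyPackage` is a placeholder name:

```julia
module MyPackage

# Compile everything in this module at -O1, regardless of the
# optimization level the Julia process was started with.
Base.Experimental.@optlevel 1

# ... package code ...

end
```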

@fingolfin
Contributor Author

I am working on various fixes for CxxWrap which considerably improve the situation for me. See this comment for an overview.

As a datapoint, the timings printed by

using Hecke; using CxxWrap ; @time @eval maximal_order(quadratic_field(-3)[1]);

gives the following timings (in seconds) and allocations, when run with various Julia and CxxWrap pull requests, and with latest AbstractAlgebra DEV (which has some invalidation fixes).

Note that timings fluctuate quite a bit, and I did just two measurements each (the first one triggered precompilation and "warmed the caches" and I then noted down the second timings). So only take the relative order of magnitude into account. The number of allocations of course is stable, and thus perhaps a better proxy for the actual impact.

|                  | Julia 1.8 seconds | Julia 1.8 allocations | Julia master (a40e24e1) seconds | Julia master allocations |
|------------------|-------------------|-----------------------|---------------------------------|--------------------------|
| CxxWrap main     | 7.565491          | 10.49 M               | 1.996047                        | 2.50 M                   |
| with #335        | 7.554132          | 10.43 M               | 2.054217                        | 2.50 M                   |
| with #337        | 7.382975          | 10.49 M               | 1.991450                        | 2.50 M                   |
| with #338        | 7.069640          | 3.37 M                | 1.190185                        | 1.32 M                   |
| with #335 + #337 | 7.550222          | 10.52 M               | 2.032760                        | 2.50 M                   |
| with #335 + #338 | 6.759887          | 4.76 M                | 0.025311                        | 74.92 k                  |
| with #337 + #338 | 6.720705          | 3.37 M                | 1.218763                        | 1.32 M                   |
| with all three   | 6.944414          | 4.77 M                | 0.026655                        | 74.92 k                  |

So JuliaInterop/CxxWrap.jl#338 has a major impact across the board, but only together with JuliaInterop/CxxWrap.jl#335 does it achieve its full potential. JuliaInterop/CxxWrap.jl#337 seems less important (though I think this depends on what exactly you measure; it definitely does get rid of invalidations).

Interestingly (and annoyingly), with Julia 1.8.4 it seems better to leave out JuliaInterop/CxxWrap.jl#335... I have no idea why (but I also haven't made any effort to find out so far).

@giordano

I'm a bit confused: in JuliaLang/julia#44527 (comment) I got a time of the order of 0.02 seconds without having to do anything, what changed since?

@fingolfin
Contributor Author

You got those numbers without the using CxxWrap.

@giordano

Oh, cool 😄 So your work is basically making sure that loading CxxWrap as well doesn't defeat all the gains from native code caching?

@fingolfin
Contributor Author

@giordano indeed. I've long had my sights on CxxWrap as a likely culprit for slowdowns / invalidations, see JuliaInterop/CxxWrap.jl#278 -- but past attempts had mixed to no effect on timings (as my table above illustrates ;-) ). Now with native code caching it suddenly becomes possible to benefit, yay!

@fingolfin
Contributor Author

Just had a look at this again. Sadly, it seems the situation has gotten worse again.

With Julia 1.10.3, Hecke master and CxxWrap 0.15.1:

  • using Hecke ; @time @eval maximal_order(quadratic_field(-3)[1]); reports

      0.010076 seconds (16.18 k allocations: 1.077 MiB, 91.93% compilation time)
    
  • using Hecke; using CxxWrap ; @time @eval maximal_order(quadratic_field(-3)[1]); reports

      6.100303 seconds (19.22 M allocations: 1.274 GiB, 9.22% gc time, 99.88% compilation time: 48% of which was recompilation)
    

So I inspected the output of this:

using Hecke
using SnoopCompileCore
invalidations = @snoopr begin using CxxWrap end
using SnoopCompile
trees = invalidation_trees(invalidations)

which reports thousands of invalidations. I disabled the top method causing these and repeated the process once. Afterwards the timings were fine again.
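For experimentation only, the "disable the top method" step can be sketched like this; the `method` field on the entries returned by `invalidation_trees` is an assumption about SnoopCompile's internals, and deleting methods will of course break whatever relied on them:

```julia
# trees is sorted so the worst offender (most invalidated children) comes last.
worst = trees[end]
Base.delete_method(worst.method)   # remove it, then re-run the @snoopr experiment
```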

For reference here are the changes I made to CxxWrap to avoid the invalidations in this specific example (not claiming I covered all):

diff --git a/src/CxxWrap.jl b/src/CxxWrap.jl
index 421bffe..d902fd4 100644
--- a/src/CxxWrap.jl
+++ b/src/CxxWrap.jl
@@ -118,7 +118,6 @@ Base.AbstractFloat(x::CxxNumber) = Base.AbstractFloat(to_julia_int(x))
 # Convenience constructors
 (::Type{T})(x::Number) where {T<:CxxNumber} = reinterpret(T, convert(julia_int_type(T), x))
 (::Type{T})(x::Rational) where {T<:CxxNumber} = reinterpret(T, convert(julia_int_type(T), x))
-(::Type{T1})(x::T2) where {T1<:Number, T2<:CxxNumber} = T1(reinterpret(julia_int_type(T2), x))::T1
 (::Type{T1})(x::T2) where {T1<:CxxNumber, T2<:CxxNumber} = T1(reinterpret(julia_int_type(T2), x))::T1
 Base.Bool(x::T) where {T<:CxxNumber} = Bool(reinterpret(julia_int_type(T), x))::Bool
 (::Type{T})(x::T) where {T<:CxxNumber} = x
@@ -137,7 +136,6 @@ end
 
 # Enum type interface
 abstract type CppEnum <: Integer end
-(::Type{T})(x::CppEnum) where {T <: Integer} = T(reinterpret(Int32, x))::T
 (::Type{T})(x::Integer) where {T <: CppEnum} = reinterpret(T, Int32(x))
 (::Type{T})(x::T) where {T <: CppEnum} = x
 import Base: +, |

Of course I am not saying that making this change to CxxWrap is a solution (it probably would break functionality), nor even that CxxWrap is at fault (I dunno who is -- CxxWrap, Hecke, Julia stdlib, ...)

@fingolfin
Contributor Author

For the record, here are the top invalidations:

 inserting (::Type{T1})(x::T2) where {T1<:Number, T2<:Union{CxxSigned, CxxUnsigned}} @ CxxWrap.CxxWrapCore ~/Projekte/Julia/packages/JuliaInterop/CxxWrap.jl/src/CxxWrap.jl:121 invalidated:
   mt_backedges:  1: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for findnext(::typeof(Base.JuliaSyntax._has_nested_error), ::AbstractString, ::Integer) (0 children)
                  2: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for resize!(::BitVector, ::Integer) (0 children)
                  3: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for findnext(::Hecke.var"#4569#4571", ::AbstractString, ::Integer) (0 children)
                  4: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for findnext(::Hecke.var"#4250#4251", ::AbstractString, ::Integer) (0 children)
                  5: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for Nemo._generic_power(::AbsSimpleNumFieldOrderIdeal, ::Integer) (0 children)
                  6: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for findnext(::Hecke.var"#4260#4261", ::AbstractString, ::Integer) (0 children)
                  7: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for findnext(::Hecke.var"#1995#1997", ::AbstractString, ::Integer) (0 children)
                  8: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for findnext(::Hecke.var"#4253#4254", ::AbstractString, ::Integer) (0 children)
                  9: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for Base._array_for(::Type{T}, ::Base.HasLength, ::Integer) where T<:AbsNumFieldOrderIdeal (0 children)
                 10: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for unsafe_load(::Ptr{UInt64}, ::Integer) (1 children)
                 11: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for SubString(::AbstractString, ::Int64, ::Integer) (1 children)
                 12: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for pointer(::String, ::Integer) (2 children)
                 13: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for unsafe_store!(::Ptr{UInt16}, ::UInt16, ::Integer) (2 children)
                 14: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for unsafe_store!(::Ptr{UInt8}, ::UInt8, ::Integer) (2 children)
                 15: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for unsafe_store!(::Ptr{UInt32}, ::UInt32, ::Integer) (2 children)
                 16: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for Base.Iterators.partition(::UnitRange{Int64}, ::Integer) (3 children)
                 17: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for Base.to_shape(::Base.OneTo) (27 children)
                 18: signature Tuple{Type{Int64}, Integer} triggered MethodInstance for convert(::Type{Int64}, ::Integer) (596 children)
                 19: signature Tuple{Type{UInt64}, Integer} triggered MethodInstance for convert(::Type{UInt64}, ::Integer) (3369 children)
   backedges: 1: superseding Integer(x::Integer) @ Core boot.jl:817 with MethodInstance for Integer(::Integer) (2 children)

Of particular note:

  • convert(::Type{Int64}, ::Integer) (596 children)
  • convert(::Type{UInt64}, ::Integer) (3369 children)

I then tried to use JET to find out more:

julia> using JET

julia> ascend(trees[end].mt_backedges[end][2])
Choose a call for analysis (q to quit):
     convert(::Type{UInt64}, ::Integer)
       cconvert(::Type{UInt64}, ::Integer)
         _growend!(::Vector, ::Integer)
           append!(::Vector, ::AbstractVector)
 >           _unit_group_generators_quaternion(::Union{Hecke.AlgAssAbsOrd, Hecke.AlgAssRelOrd})
               _unit_group_generators_maximal_simple(::Any)
                 __unit_reps_simple(::Any, ::Hecke.AlgAssAbsOrdIdl{StructureConstantAlgebra{QQFieldElem}, AssociativeAlgebraElem{QQFieldElem, StructureConstantAlgebra{QQFieldElem}}})
...

which suggests a type stability issue in Hecke (for this very first instance): in the call to append! it can't deduce the type of the second argument well enough, hence it can't prove that length applied to that second argument will produce an Int, hence it goes through general dispatch.

The calling function itself looks like this:

function _unit_group_generators_quaternion(O::Union{AlgAssRelOrd, AlgAssAbsOrd})
  gens1 = unit_group_modulo_scalars(O)
  u, mu = unit_group(base_ring(O))
  A = algebra(O)
  gens2 = [ O(A(elem_in_nf(mu(u[i])))) for i in 1:ngens(u) ]
  return append!(gens1, gens2)  # FIXME: generic dispatch
end

but unfortunately it doesn't have a concrete type for O; it then can't deduce the type of u, and hence can't prove that ngens(u) is an Int...
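One hypothetical way out (not a tested patch, just a sketch of the usual function-barrier trick applied to the snippet above; `_append_scalar_gens!` is a made-up helper name) would be to split the poorly-inferred tail into its own function, so that by the time `append!` and `ngens` run, the runtime types are concrete:

```julia
function _unit_group_generators_quaternion(O::Union{AlgAssRelOrd, AlgAssAbsOrd})
  gens1 = unit_group_modulo_scalars(O)
  u, mu = unit_group(base_ring(O))
  # Function barrier: dispatch re-specializes on the concrete types of u and mu,
  # so in the callee ngens(u) infers as Int and append! avoids generic dispatch.
  return _append_scalar_gens!(gens1, O, u, mu)
end

function _append_scalar_gens!(gens1, O, u, mu)
  A = algebra(O)
  gens2 = [O(A(elem_in_nf(mu(u[i])))) for i in 1:ngens(u)]
  return append!(gens1, gens2)
end
```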

Tracing further up, the type instability begins inside a call to _find_quaternion_algebra(::AbsSimpleNumFieldElem, ::Vector{AbsSimpleNumFieldOrderIdeal}, ::Vector{InfPlc}) (the arguments here are all concrete and fine) which then calls is_principal_with_data(::Hecke.AlgAssAbsOrdIdl) which is not fine:

        fl, x = is_principal_with_data(q * prod(__P[i]^Int(c.coeff[i]) for i in 1:length(__P)))

I'll dig some more, but it seems we are taxing JET (?) quite a bit here, it is slow as molasses.

@fingolfin
Contributor Author

Unfortunately this has regressed again :-(

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.1 (2024-10-16)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using Hecke
 _    _           _
| |  | |         | |         |  Software package for
| |__| | ___  ___| | _____   |  algorithmic algebraic number theory
|  __  |/ _ \/ __| |/ / _ \  |
| |  | |  __/ (__|   <  __/  |  Manual: https://thofma.github.io/Hecke.jl
|_|  |_|\___|\___|_|\_\___|  |  Version 0.34.8

julia> @time @eval maximal_order(quadratic_field(-3)[1]);
 19.297584 seconds (93.85 M allocations: 4.378 GiB, 5.26% gc time, 99.99% compilation time: 73% of which was recompilation)

@fingolfin
Contributor Author

It seems this is a regression in Julia. If I use the exact same Hecke but with Julia 1.10.7 it is fast:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.7 (2024-11-26)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using Hecke ; @time @eval maximal_order(quadratic_field(-3)[1]);
 _    _           _
| |  | |         | |         |  Software package for
| |__| | ___  ___| | _____   |  algorithmic algebraic number theory
|  __  |/ _ \/ __| |/ / _ \  |
| |  | |  __/ (__|   <  __/  |  Manual: https://thofma.github.io/Hecke.jl
|_|  |_|\___|\___|_|\_\___|  |  Version 0.34.8
  0.012206 seconds (16.21 k allocations: 1.078 MiB, 93.83% compilation time)

So I guess we should file a Julia issue and then bisect it there.

@giordano

I guess "73% of which was recompilation" is the more worrying part, since that accounts for 14 out of 19 seconds

@giordano

giordano commented Nov 29, 2024

Side note, on master I get

julia> @time @eval using Hecke; @time @eval maximal_order(quadratic_field(-3)[1]);
  2.681794 seconds (5.30 M allocations: 283.301 MiB, 10.47% gc time, 2.68% compilation time: 27% of which was recompilation)
  0.036642 seconds (65.44 k allocations: 3.338 MiB, 98.06% compilation time: 83% of which was recompilation)

julia> versioninfo()
Julia Version 1.12.0-DEV.1693
Commit a17db2b138* (2024-11-26 18:38 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin23.4.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

(tmp) pkg> st
Status `/private/tmp/Project.toml`
  [3e1990a7] Hecke v0.34.8

while I can confirm the issue on v1.11.1:

julia> @time @eval using Hecke; @time @eval maximal_order(quadratic_field(-3)[1]);
  4.029264 seconds (4.26 M allocations: 251.417 MiB, 5.96% gc time, 9.69% compilation time: 96% of which was recompilation)
 17.165332 seconds (86.44 M allocations: 4.015 GiB, 5.59% gc time, 99.99% compilation time: 71% of which was recompilation)

julia> versioninfo()
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)

Sounds like the regression has already been addressed on master

@fingolfin
Contributor Author

Thanks for checking so quickly, that's a relief. 1.11 seems to have been a somewhat cursed release overall (at least I experienced a lot more severe regressions than with past releases; entirely subjective of course)

I wonder what fixed it, and whether it would make sense to figure it out and get it backported to 1.11.2

@giordano

giordano commented Nov 29, 2024

This is with 1.11.0-alpha1:

julia> @time @eval using Hecke; @time @eval maximal_order(quadratic_field(-3)[1]);
  3.721566 seconds (2.65 M allocations: 173.050 MiB, 5.17% gc time, 1.93% compilation time)
  0.011111 seconds (22.56 k allocations: 1.142 MiB, 88.68% compilation time)

bisecting the culprit may not be that bad.

Edit: more datapoints
v1.11.0-beta1:

julia> @time @eval using Hecke; @time @eval maximal_order(quadratic_field(-3)[1]);
  2.063193 seconds (2.71 M allocations: 175.740 MiB, 9.24% gc time, 2.35% compilation time)
  0.010677 seconds (22.57 k allocations: 1.142 MiB, 93.52% compilation time)

v1.11.0-beta2:

julia> @time @eval using Hecke; @time @eval maximal_order(quadratic_field(-3)[1]);
  2.440868 seconds (4.30 M allocations: 252.547 MiB, 10.10% gc time, 16.13% compilation time: 97% of which was recompilation)
  9.064353 seconds (41.89 M allocations: 2.073 GiB, 6.73% gc time, 99.99% compilation time: 47% of which was recompilation)

v1.11.0-rc1:

julia> @time @eval using Hecke; @time @eval maximal_order(quadratic_field(-3)[1]);
  2.427240 seconds (4.26 M allocations: 250.697 MiB, 9.45% gc time, 15.02% compilation time: 97% of which was recompilation)
  9.204946 seconds (41.74 M allocations: 2.063 GiB, 9.21% gc time, 99.99% compilation time: 47% of which was recompilation)

v1.11.0:

julia> @time @eval using Hecke; @time @eval maximal_order(quadratic_field(-3)[1]);
  2.320969 seconds (4.27 M allocations: 252.483 MiB, 11.11% gc time, 17.26% compilation time: 97% of which was recompilation)
 17.290070 seconds (86.42 M allocations: 4.014 GiB, 5.35% gc time, 100.00% compilation time: 71% of which was recompilation)

Diff between beta1 and beta2: JuliaLang/julia@v1.11.0-beta1...v1.11.0-beta2. I wouldn't be surprised if this was due to StyledStrings: that package wreaked havoc in terms of invalidations, a good chunk of which has been addressed on master, with some hope of backporting the major fixes to the v1.11 series.

@thofma
Owner

thofma commented Nov 29, 2024

I cannot reproduce the -beta2 timings. Am I doing something wrong?

bla/julia$ git status
HEAD detached at v1.11.0-beta2
nothing to commit, working tree clean

bla/julia$ make -j10 -l; ./julia -e 'using Hecke;exit()'; ./julia -e '@time @eval using Hecke; @time @eval maximal_order(quadratic_field(-3)[1]);'
  2.038123 seconds (2.41 M allocations: 161.547 MiB, 12.15% gc time, 3.62% compilation time: 40% of which was recompilation)
  0.012277 seconds (22.56 k allocations: 1.144 MiB, 92.80% compilation time)

@giordano

I was using the official binaries through Juliaup on aarch64-darwin, nothing special

@giordano

I also get the same qualitative results on linux:

% julia +1.11.0-beta2 --project=/tmp -q
julia> @time @eval using Hecke; @time @eval maximal_order(quadratic_field(-3)[1]);
  2.543063 seconds (4.29 M allocations: 252.277 MiB, 12.02% gc time, 19.36% compilation time: 97% of which was recompilation)
 11.921326 seconds (41.72 M allocations: 2.066 GiB, 6.89% gc time, 99.99% compilation time: 39% of which was recompilation)

julia> versioninfo()
Julia Version 1.11.0-beta2
Commit edb3c92d6a6 (2024-05-29 09:37 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 22 × Intel(R) Core(TM) Ultra 7 155H
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 22 virtual cores)

@thofma
Owner

thofma commented Nov 29, 2024

Ah. My mistake was timing it via -e; then the REPL is probably not loaded. I can reproduce it now with

./julia -e 'using REPL; @time @eval using Hecke; @time @eval maximal_order(quadratic_field(-3)[1]);'

@thofma
Owner

thofma commented Nov 29, 2024

Regression between beta1 and beta2: 480dfabb676bf02a9d281c908419a1e5d67caf5a (opened JuliaLang/julia#56719)

Regression between beta2 and rc1: d1b1a5dd254e127edc994076cce61ae1cca16979
