A*A' significantly slower than A*A for ArbMatrices #119
Comments
Arb only implements matrix multiplication A*B and no special versions where A or B are transposed. The easiest thing is probably to realize A' as an ArbMatrix before multiplying:
```julia
julia> @btime A*A;
  11.988 ms (10089 allocations: 4.11 MiB)

julia> @btime A*A';
  786.593 ms (8160005 allocations: 436.25 MiB)

julia> C = similar(A); D = similar(A); @btime Arblib.mul!($D, $A, Arblib.transpose!($C, $A));
  12.785 ms (85 allocations: 3.20 MiB)
```

With `A*A'` we're hitting the generic method:

```julia
julia> @which A*A
*(A::T, B::T) where T<:Union{AcbMatrix, AcbRefMatrix, ArbMatrix, ArbRefMatrix} in Arblib at /home/kalmar/.julia/dev/Arblib/src/matrix.jl:166

julia> @which A*A'
*(A::AbstractMatrix{T} where T, B::AbstractMatrix{T} where T) in LinearAlgebra at /home/kalmar/packages/julias/julia-1.6/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:151
```
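Following that suggestion, here is a minimal sketch of a user-side workaround, assuming square matrices of matching size; `mul_adjoint` is a hypothetical helper name, not something provided by Arblib:

```julia
using Arblib

# Hypothetical helper (not part of Arblib): compute A*B' by materializing the
# transpose into a plain ArbMatrix, so the product dispatches to Arb's native
# matrix multiplication instead of the generic LinearAlgebra fallback.
# For real Arb entries, adjoint and transpose coincide.
function mul_adjoint(A::ArbMatrix, B::ArbMatrix)
    Bt = Arblib.transpose!(similar(B), B)  # explicit copy of B'
    C = similar(A)                         # assumes A and B are square and equal-sized
    return Arblib.mul!(C, A, Bt)
end
```

Calling `mul_adjoint(A, A)` should then run at roughly the speed of `A*A`, as in the `Arblib.mul!`/`Arblib.transpose!` timing above.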
Hi,
I was trying to see if I could use this instead of BigFloats for some of my computations, for which I (among other things, like Cholesky decompositions and computing the minimum eigenvalue) need to compute A*A' for matrices of size up to about 300x300, which is currently the bottleneck.
I came across the remarkable difference shown in the timings above: multiplying with a transpose is significantly slower than a normal multiplication, maybe because of all the allocations that happen in the multiplication with the transpose.
Comparing it to BigFloats: there, multiplying with the transpose is also a bit slower, but only by about 30%, compared to about 2500% here. Another way would be to use normal Julia matrices with Arb entries: that is more comparable to the BigFloats, with only a factor of 2 difference in both cases. (Interestingly, it uses only about half the memory/allocations.)
To me it seems like the ArbMatrices fall back to the generic matrix multiplication in the case with the transpose. Is there an easy way around this?
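For reference, the comparison described above could be set up along the following lines. This is only a sketch: the 300x300 size comes from the description, while the precision, the random Float64 test data, and the `ArbMatrix(M; prec = ...)` / `Arb.(M; prec = ...)` constructors are assumptions, and no timings are implied.

```julia
using Arblib, BenchmarkTools

n = 300         # size from the description above
prec = 256      # precision is an assumption (Arblib's default)
M = rand(n, n)  # random Float64 test data (assumption)

A = ArbMatrix(M; prec = prec)  # Arb's matrix type (assumes the constructor accepts a plain matrix)
B = BigFloat.(M)               # Matrix{BigFloat}
C = Arb.(M; prec = prec)       # Matrix{Arb}: a normal Julia matrix with Arb entries

# ArbMatrix: A*A uses Arb's multiplication, A*A' falls back to the generic method
@btime $A * $A
@btime $A * $A'

# Matrix{BigFloat}
@btime $B * $B
@btime $B * $B'

# Matrix{Arb}
@btime $C * $C
@btime $C * $C'
```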
PS: I also tried Nemo (which has similar speed for `A*A` and `A*A'`), but I also need the minimum eigenvalue and the Cholesky decomposition, which are currently not in Nemo. So I would need to convert back and forth to BigFloats, or compute those myself. With Nemo I'm not sure how to do this, but with Arblib the conversion is easy. Hence I would very much like a similar speed for `A*A'` as for `A*A`.