You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have a big BanedBlockBandedMatrix and need to multiply it with a vector. However it seems to be rather slow and require a lot of allocation. Here's a MWE
using BlockArrays: BlockRange
using ApproxFunOrthogonalPolynomials
using LinearAlgebra
using SparseArrays
using BenchmarkTools
# Making one of my Matrices
CC = Chebyshev(-1..1) ⊗ Chebyshev(-1..1)
UC = Ultraspherical(1, -1..1) ⊗ Chebyshev(-1..1)
CU = Chebyshev(-1..1) ⊗ Ultraspherical(1, -1..1)
UU = Ultraspherical(1, -1..1) ⊗ Ultraspherical(1, -1..1)
degree = 500
Du = Derivative(CC, [1,0])[BlockRange(1:degree), BlockRange(1:degree+1)]
UCtoC2 = Conversion(UC, UU)[BlockRange(1:degree), BlockRange(1:degree)]
DutoC2 = UCtoC2 * Du
# Benchmarks
@benchmark mul!(out, $DutoC2, v) setup=begin
out = Vector{Float64}(undef, size(DutoC2, 1))
v = rand(size(DutoC2, 2))
end
# 3.2 ms with 257.62 KiB allocated
@benchmark mul!(out, $(sparse(DutoC2)), v) setup=begin
out = Vector{Float64}(undef, size(DutoC2, 1))
v = rand(size(DutoC2, 2))
end
# For comparison: 1.5 ms with 0 bytes allocated
I wonder why the BandedBlockBandedMatrix multiplication allocates so much and why it's slower than using a sparse matrix. Any ideas?
The text was updated successfully, but these errors were encountered:
OpenBLAS has very slow banded linear algebra. MKL is about 5x faster I think. Though the cost is not dominated by the actual banded solve.
The allocations and computational cost are from view(A, Block(K,J)) calls. These should in theory be non-allocating but in practice can sometimes be too complicated for the compiler to realise.
For very small bandwidths sparse is extremely fast for matrix multiplication. I don't think we'd be able to beat it.
Interesting, thanks! I'll stick with SparseMatrix then. So, should ApproxFun have some heuristic to pick sparse matrices instead of BandedBlockBandedMatrix?
I have a big
BanedBlockBandedMatrix
and need to multiply it with a vector. However it seems to be rather slow and require a lot of allocation. Here's a MWEI wonder why the
BandedBlockBandedMatrix
multiplication allocates so much and why it's slower than using a sparse matrix. Any ideas?The text was updated successfully, but these errors were encountered: