Performance of `BandedBlockBandedMatrix` `Vector` Multiplication #121

Luapulu · 2021-09-21T08:02:58Z

I have a big BanedBlockBandedMatrix and need to multiply it with a vector. However it seems to be rather slow and require a lot of allocation. Here's a MWE

using BlockArrays: BlockRange
using ApproxFunOrthogonalPolynomials
using LinearAlgebra
using SparseArrays

using BenchmarkTools

# Making one of my Matrices
CC = Chebyshev(-1..1)         ⊗ Chebyshev(-1..1)
UC = Ultraspherical(1, -1..1) ⊗ Chebyshev(-1..1)
CU = Chebyshev(-1..1)         ⊗ Ultraspherical(1, -1..1)
UU = Ultraspherical(1, -1..1) ⊗ Ultraspherical(1, -1..1)

degree = 500

Du = Derivative(CC, [1,0])[BlockRange(1:degree), BlockRange(1:degree+1)]
UCtoC2 = Conversion(UC, UU)[BlockRange(1:degree), BlockRange(1:degree)]
DutoC2 = UCtoC2 * Du

# Benchmarks

@benchmark mul!(out, $DutoC2, v) setup=begin
    out = Vector{Float64}(undef, size(DutoC2, 1))
    v = rand(size(DutoC2, 2))
end
# 3.2 ms with 257.62 KiB allocated

@benchmark mul!(out, $(sparse(DutoC2)), v) setup=begin
    out = Vector{Float64}(undef, size(DutoC2, 1))
    v = rand(size(DutoC2, 2))
end
# For comparison: 1.5 ms with 0 bytes allocated

I wonder why the BandedBlockBandedMatrix multiplication allocates so much and why it's slower than using a sparse matrix. Any ideas?

The text was updated successfully, but these errors were encountered:

dlfivefifty · 2021-09-21T10:01:20Z

Some things:

OpenBLAS has very slow banded linear algebra. MKL is about 5x faster I think. Though the cost is not dominated by the actual banded solve.
The allocations and computational cost are from view(A, Block(K,J)) calls. These should in theory be non-allocating but in practice can sometimes be too complicated for the compiler to realise.
For very small bandwidths sparse is extremely fast for matrix multiplication. I don't think we'd be able to beat it.

Luapulu · 2021-09-21T10:05:08Z

Interesting, thanks! I'll stick with SparseMatrix then. So, should ApproxFun have some heuristic to pick sparse matrices instead of BandedBlockBandedMatrix?

Edit: if the bands are thin enough.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance of `BandedBlockBandedMatrix` `Vector` Multiplication #121

Performance of `BandedBlockBandedMatrix` `Vector` Multiplication #121

Luapulu commented Sep 21, 2021

dlfivefifty commented Sep 21, 2021

Luapulu commented Sep 21, 2021 •

edited

Loading

Performance of BandedBlockBandedMatrix Vector Multiplication #121

Performance of BandedBlockBandedMatrix Vector Multiplication #121

Comments

Luapulu commented Sep 21, 2021

dlfivefifty commented Sep 21, 2021

Luapulu commented Sep 21, 2021 • edited Loading

Performance of `BandedBlockBandedMatrix` `Vector` Multiplication #121

Performance of `BandedBlockBandedMatrix` `Vector` Multiplication #121

Luapulu commented Sep 21, 2021 •

edited

Loading