
Work out good approach for automatic Hessian sparsity detection #13

Closed

ElOceanografo opened this issue Jun 1, 2023 · 14 comments

@ElOceanografo
Owner

SparsityDetection.jl is no longer maintained, and the sparsity detection functionality in Symbolics.jl still seems somewhat brittle. Currently this package just uses ForwardDiff, but that is a suboptimal solution (very slow in high dimensions).

@ElOceanografo
Owner Author

Should be solved by some combination of SparseConnectivityTracer.jl and the solution to gdalle/DifferentiationInterface.jl#263

@gdalle

gdalle commented May 30, 2024

Now that SparseConnectivityTracer supports linear algebra, can you see if this is enough for your use case?
Note that to support linear algebra, you will have to use local sparsity detection, which means the sparsity pattern cannot be reused between calls because it depends on the value of x. If you need a global sparsity pattern, do tell us which linear-algebraic functions you use, and we can add specific overloads for them.
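
As a minimal sketch (assuming SparseConnectivityTracer's `TracerLocalSparsityDetector` and a made-up test function), local detection through the ADTypes interface looks like this:

using ADTypes: hessian_sparsity
using SparseConnectivityTracer

f(x) = sum(abs2, diff(x))  # made-up test function with a tridiagonal Hessian

# Local detection traces f at this particular x, so the resulting
# pattern is only guaranteed to be valid at (or near) that point.
detector = TracerLocalSparsityDetector()
pattern = hessian_sparsity(f, rand(10), detector)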

@ElOceanografo
Owner Author

That's great! The main functions that are both important to have and difficult to handle until now are log-determinants (of both dense and sparse matrices). I need them to calculate log-likelihoods of Gaussian Markov random fields (i.e., multivariate normals parameterized by a precision matrix instead of a covariance matrix). If I can trace through the dense and sparse versions of the MWE function in gdalle/DifferentiationInterface.jl#263, I will be pretty happy:

using LinearAlgebra, SparseArrays

const y = randn(10)

# Dense version: diagm builds a dense diagonal matrix
function f_dense(u)
    Q = diagm(exp.(u))
    return logdet(Q) - y' * Q * y
end

# Sparse version: spdiagm builds a SparseMatrixCSC
function f_sparse(u)
    Q = spdiagm(exp.(u))
    return logdet(Q) - y' * Q * y
end

In most problems, the structure of Q will be constant, so in theory the sparsity pattern should not change between calls.

Currently, SCT can handle the dense one with local tracing, but gets a stack overflow error with the sparse one.

@gdalle

gdalle commented May 31, 2024

I opened an issue to keep track. It's weird that it doesn't happen with `det`, only `logdet`.

@gdalle

gdalle commented May 31, 2024

Can you use this code in the meantime? I still haven't decided whether it belongs in DI itself or just in the docs as an example, given how short and easy it is.

using ADTypes
using DifferentiationInterface
using SparseArrays

# Compute a dense Jacobian/Hessian with any backend, then threshold
# its entries to recover a sparsity pattern.
struct DenseSparsityDetector{B} <: ADTypes.AbstractSparsityDetector
    backend::B
    atol::Float64
end

function ADTypes.jacobian_sparsity(f, x, detector::DenseSparsityDetector)
    J = jacobian(f, detector.backend, x)
    return sparse(abs.(J) .> detector.atol)
end

# In-place variant for functions of the form f!(y, x)
function ADTypes.jacobian_sparsity(f!, y, x, detector::DenseSparsityDetector)
    J = jacobian(f!, y, detector.backend, x)
    return sparse(abs.(J) .> detector.atol)
end

function ADTypes.hessian_sparsity(f, x, detector::DenseSparsityDetector)
    H = hessian(f, detector.backend, x)
    return sparse(abs.(H) .> detector.atol)
end
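
As a hypothetical usage of the detector defined above (assuming ForwardDiff is loaded so `AutoForwardDiff()` works as the inner backend):

using ADTypes: AutoForwardDiff
using ForwardDiff  # loads DI's ForwardDiff extension

f(x) = sum(abs2, diff(x))  # made-up test function

detector = DenseSparsityDetector(AutoForwardDiff(), 1e-8)
pattern = ADTypes.hessian_sparsity(f, rand(10), detector)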

@ElOceanografo
Owner Author

ElOceanografo commented Jun 1, 2024

Yes, that solution is probably good enough to close this issue, and I already have it as a patch on a development branch here. My $0.02 is that it would be great to have it in DI unless there's a compelling reason not to.

@gdalle

gdalle commented Jun 4, 2024

DenseSparsityDetector is part of the newly released DifferentiationInterface v0.5.3. Can you take it out for a spin?

@ElOceanografo
Owner Author

Working here, will close this issue once I merge that branch. Thanks for the quick fix!

@gdalle

gdalle commented Jun 5, 2024

Awesome! I have a few remarks on the PR, will put them here:

- `hess_adtype = nothing` : Specifies how to calculate the Hessian of the marginalized
variables. If not specified, defaults to a sparse second-order method using finite
differences over the AD type given in the `method` (`AutoForwardDiff()` is the default).
Other backends can be set by loading the appropriate AD package and using the ADTypes.jl
interface.

Why pick finite differences over forward as the default second order method? Forward over reverse would be much faster for large problems

- `sparsity_detector = DenseSparsityDetector(method.adtype, atol=cbrt(eps))` : How to
perform the sparsity detection. Detecting sparsity takes some time and may not be worth it
for small problems, but for larger problems it can be very worthwhile. The default
`DenseSparsityDetector` is the most robust, but if it's too slow, or if you're running out of
memory on a larger problem, try the tracing-based detectors from SparseConnectivityTracer.jl.
- `coloring_algorithm = GreedyColoringAlgorithm()` : How to determine the matrix "colors"
to compress the sparse Hessian.

Why separate the sparsity detector, coloring algorithm and backend? With the ADTypes AutoSparse struct, the user can provide all of them at once, with less bookkeeping for you.
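
For concreteness, a sketch of the bundled approach (assuming `GreedyColoringAlgorithm` comes from SparseMatrixColorings.jl; for a logdet-style function you would swap in the local tracer, per the discussion above):

using ADTypes: AutoSparse, AutoForwardDiff
using SparseConnectivityTracer, SparseMatrixColorings
using ForwardDiff

# One struct bundles the dense backend, the sparsity detector
# and the coloring algorithm
backend = AutoSparse(
    AutoForwardDiff();
    sparsity_detector=TracerSparsityDetector(),
    coloring_algorithm=GreedyColoringAlgorithm(),
)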

Also keep in mind that the DenseSparsityDetector is local by nature. To avoid accidental cancellations and incorrect sparsity patterns, you should make sure that your function has no value-dependent control flow, and pick a random point x for evaluating it.
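
To illustrate the control-flow caveat with a made-up example:

# At any x with x[1] <= 0 the branch hides the x[1] * x[2] coupling,
# so a local detector evaluated there would miss the (1, 2) Hessian entry.
f_branchy(x) = x[1] > 0 ? x[1] * x[2] : x[1]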

extras = prepare_hessian(w -> f(w, p2), hess_adtype, w)
H = hessian(w -> f(w, p2), hess_adtype, w, extras)

I'm not entirely sure that preparation is correct when you have different function objects such as two anonymous functions. I think it's better to define g = Base.Fix2(f, p2) for instance, and use it several times.
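
A hypothetical rewrite of the snippet along those lines (same DI calling convention as quoted above):

g = Base.Fix2(f, p2)  # g(w) == f(w, p2), a single stable function object
extras = prepare_hessian(g, hess_adtype, w)
H = hessian(g, hess_adtype, w, extras)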

@gdalle

gdalle commented Jun 5, 2024

Also, I don't think your test failures on nightly are related to DI. I think it's Reexport.jl interacting badly with the new `public` keyword.

@ElOceanografo
Owner Author

Most of these decisions were made to err on the side of reliability (at least for now). If I'm showing this package to someone who is currently using R/TMB, I'd rather they encounter something that runs slower than they'd like than something that breaks in a situation they're used to having work. In my anecdotal experience, it's not uncommon to try some model that should work but breaks because of an AD corner case, hence my conservatism. What I really need to do at this point is work up a more comprehensive set of realistic problems and test them systematically, both for performance comparisons and to find bugs.

Why pick finite differences over forward as the default second order method? Forward over reverse would be much faster for large problems

I made FiniteDiff the default because it is guaranteed to work with any inner backend, and is often not actually much slower than ForwardDiff if the Hessian is sparse enough.
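
For comparison, a sketch of opting into forward-over-reverse (assuming DI's `SecondOrder` wrapper with the outer backend as the first argument, and Zygote loaded for the inner reverse pass):

using DifferentiationInterface: SecondOrder
using ADTypes: AutoForwardDiff, AutoZygote
using ForwardDiff, Zygote

# Forward-mode outer pass over a reverse-mode inner gradient
hess_adtype = SecondOrder(AutoForwardDiff(), AutoZygote())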

Why separate the sparsity detector, coloring algorithm and backend? With the ADTypes AutoSparse struct, the user can provide all of them at once, with less bookkeeping for you.

True, not sure why I made it this way now. Will probably change it.

Also keep in mind that the DenseSparsityDetector is local by nature. To avoid accidental cancellations and incorrect sparsity patterns, you should make sure that your function has no value-dependent control flow, and pick a random point x for evaluating it.

Good to know.

I'm not entirely sure that preparation is correct when you have different function objects such as two anonymous functions. I think it's better to define g = Base.Fix2(f, p2) for instance, and use it several times.

Also good to know, and might explain some odd timings I've seen. Will try it out!

@ElOceanografo
Owner Author

Closed via #25

@gdalle

gdalle commented Jun 5, 2024

Also keep in mind that the DenseSparsityDetector is local by nature. To avoid accidental cancellations and incorrect sparsity patterns, you should make sure that your function has no value-dependent control flow, and pick a random point x for evaluating it.

Good to know.

The docstring for DenseSparsityDetector explains this in more detail:

https://gdalle.github.io/DifferentiationInterface.jl/DifferentiationInterface/stable/api/#DifferentiationInterface.DenseSparsityDetector
