symmetric positive definite matrix transforms (aka covariance matrices) #15

mjhajharia opened this issue Jul 11, 2022 · 8 comments

@mjhajharia (Owner) commented Jul 11, 2022

  • implement SPD transforms as Stan programs
  • code test distribution programs
    • Wishart
    • inverse Wishart
    • something spectral (distribution on eigenvalues)?

The set of SPD matrices is not compact, so we need to add proper distributions. The Wisharts are conjugate for covariance in a multivariate normal model, so they're essentially like testing multivariate normal posteriors.
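As a concrete reference for those test targets, here is a minimal Julia sketch of the unnormalized Wishart log density for an SPD matrix X (the function name is a placeholder, not one of the Stan programs this issue asks for):

using LinearAlgebra

# Unnormalized Wishart log density for a p×p SPD matrix X with SPD scale V
# and degrees of freedom ν > p - 1:
#   log p(X) = (ν - p - 1)/2 * logdet(X) - tr(V \ X)/2 + const
function wishart_logpdf_unnorm(X::AbstractMatrix, V::AbstractMatrix, ν::Real)
    p = size(X, 1)
    return (ν - p - 1) / 2 * logdet(Symmetric(X)) - tr(V \ X) / 2
end

The inverse Wishart target is analogous, with exponent -(ν + p + 1)/2 on the determinant and tr(Ψ X⁻¹) inside the exponential.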

@mjhajharia (Owner, Author)

@sethaxen maybe I can start with correlation matrices and you can see covariance matrices

@mjhajharia changed the title from "orthogonal transforms" to "covariance/correlation matrix transforms" on Jul 11, 2022
@bob-carpenter (Collaborator)

Is this for Cholesky factors? Or for the whole thing? We usually go with Cholesky-factor parameterizations, so to me, those are more important. But for symmetric positive definite, that's just an unconstraining transform from (0, inf) to (-inf, inf) for the diagonals.
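For reference, a minimal Julia sketch of that Cholesky-factor idea (a hypothetical helper, not code from this repo): the strictly lower-triangular entries stay unconstrained, the diagonal goes through exp, and the log-Jacobian contributed by the diagonal map is just the sum of the unconstrained diagonal entries.

using LinearAlgebra

# Hypothetical sketch: map an unconstrained vector x of length n(n+1)/2 to a
# lower-triangular Cholesky factor L with positive diagonal. The log-Jacobian
# of exp-ing the diagonal is the sum of the unconstrained diagonal entries.
function cholesky_factor_constrain(x::AbstractVector, n::Integer)
    T = float(eltype(x))
    L = zeros(T, n, n)
    logJ = zero(T)
    k = 1
    for j in 1:n, i in j:n        # fill the lower triangle column by column
        if i == j
            L[i, j] = exp(x[k])   # positive diagonal
            logJ += x[k]
        else
            L[i, j] = x[k]        # unconstrained strictly-lower entries
        end
        k += 1
    end
    return LowerTriangular(L), logJ
end

Mapping L onward to Σ = L L' would add its own Jacobian term on top of this; the sketch stops at the Cholesky factor.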

The alternative is to take a symmetric matrix and push it through matrix exp. I have no clue as to what the Jacobian determinant looks like there.
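And a minimal sketch of that matrix-exp map (the packing scheme here is arbitrary and the helper is hypothetical); the Jacobian determinant is worked out in the comments below.

using LinearAlgebra

# Hypothetical sketch: pack n(n+1)/2 unconstrained values into a symmetric
# matrix and map it to an SPD matrix via the matrix exponential.
function symexp_constrain(x::AbstractVector, n::Integer)
    A = zeros(float(eltype(x)), n, n)
    k = 1
    for j in 1:n, i in 1:j        # fill the upper triangle column by column
        A[i, j] = x[k]
        k += 1
    end
    return exp(Symmetric(A))      # symmetric positive definite
end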

@bob-carpenter (Collaborator)

I think correlation matrices should be broken out into a separate issue. Adam Haber has looked at those.

@sethaxen (Collaborator) commented Jul 12, 2022

> @sethaxen maybe I can start with correlation matrices and you can see covariance matrices

Works for me!

> But for symmetric positive definite, that's just an unconstraining transform from (0, inf) to (-inf, inf) for the diagonals.

Agreed, we could just combine the "best" Cholesky transform with the transform for the positive diagonals.

Another alternative is to use the "best" transform for orthonormal matrices for the eigenvectors and then a transform for the positive eigenvalues. I think I've seen other approaches in the literature before and will look into it.
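For concreteness, a hedged Julia sketch of that construction, leaving the orthonormal-matrix transform abstract (Q is assumed to come from whichever orthonormal transform is chosen; the helper name is hypothetical):

using LinearAlgebra

# Hypothetical sketch: given an orthonormal Q (from some transform onto
# orthonormal matrices) and unconstrained values y, exponentiate y to get
# positive eigenvalues and assemble an SPD matrix.
function spd_from_eigen(Q::AbstractMatrix, y::AbstractVector)
    λ = exp.(y)                   # positive eigenvalues
    return Symmetric(Q * Diagonal(λ) * Q')
end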

> The alternative is to take a symmetric matrix and push it through matrix exp. I have no clue as to what the Jacobian determinant looks like there.

From http://eprints.maths.manchester.ac.uk/2754/1/catalog.pdf, the eigendecomposition is still the preferred way to compute exp(A) for symmetric A, i.e. if A = U Λ U', then exp(A) = U exp(Λ) U'. This is the composition of 3 bijective maps: eigendecomposition, diagonal exponential, then inverse of eigendecomposition, so we only need to work out the Jacobian correction for the symmetric eigendecomposition. From Alan Edelman's lecture notes (Section 2.5 of http://web.mit.edu/18.325/www/handouts/handout3.pdf), the Jacobian correction for symmetric eigendecomposition is $\prod_{i < j} |\lambda_j - \lambda_i|$. Composing the three maps, the eigendecomposition contributes a factor of $\prod_{i < j} |\lambda_j - \lambda_i|^{-1}$, the diagonal exponential contributes $\prod_i e^{\lambda_i}$, and the recomposition contributes $\prod_{i < j} |e^{\lambda_j} - e^{\lambda_i}|$, so the correction for the symmetric matrix exponential would be

$$\prod_{i=1}^n \left( e^{\lambda_i} \prod_{j=i+1}^n \left| \frac{e^{\lambda_j} - e^{\lambda_i}}{\lambda_j - \lambda_i} \right| \right)$$

I verified this using finite differences in Julia:

julia> using FiniteDifferences, LinearAlgebra, LogExpFunctions

julia> function sparse_to_vec(U)
           # collect the structurally nonzero entries of U into a vector and
           # return a closure that scatters such a vector back into U's pattern
           inds = findall(!iszero, U)
           v = U[inds]
           function vec_to_sparse(v)
               Unew = similar(U)
               fill!(Unew, 0)
               Unew[inds] .= v
               return Unew
           end
           return v, vec_to_sparse
       end
sparse_to_vec (generic function with 1 method)

julia> function symexp_logdetjac(A)
           # log |det J| for A ↦ exp(A) on symmetric matrices:
           # sum(λ) + Σ_{i<j} log|(exp(λⱼ) - exp(λᵢ)) / (λⱼ - λᵢ)|
           λ = eigvals(A)
           T = float(eltype(λ))
           n = size(A, 1)
           s = zero(T)
           for i in 1:n
               λᵢ = λ[i]
               s += λᵢ
               for j in (i + 1):n
                   λₗ, λᵤ = minmax(λᵢ, λ[j])
                   s += logsubexp(λᵤ, λₗ) - log(λᵤ - λₗ)
               end
           end
           return s
       end
symexp_logdetjac (generic function with 1 method)

julia> n = 5;

julia> U = triu(randn(n, n))
5×5 Matrix{Float64}:
 -0.500465  2.08902    0.782754   0.844179  -0.766196
  0.0       0.688097   0.929935  -1.54167    0.114989
  0.0       0.0       -0.24768    1.41416    0.830064
  0.0       0.0        0.0       -1.26386    0.915003
  0.0       0.0        0.0        0.0       -0.220671

julia> A = Symmetric(U)
5×5 Symmetric{Float64, Matrix{Float64}}:
 -0.500465   2.08902    0.782754   0.844179  -0.766196
  2.08902    0.688097   0.929935  -1.54167    0.114989
  0.782754   0.929935  -0.24768    1.41416    0.830064
  0.844179  -1.54167    1.41416   -1.26386    0.915003
 -0.766196   0.114989   0.830064   0.915003  -0.220671

julia> λ = eigvals(A);

julia> v, vec_to_sparse = sparse_to_vec(U);

julia> J = jacobian(central_fdm(5, 1), first ∘ sparse_to_vec ∘ triu ∘ exp ∘ Symmetric ∘ vec_to_sparse, v)[1]
15×15 Matrix{Float64}:
  3.20214      5.12366     1.56993     …  -0.531845   -0.253399   -0.0277042   0.0378789
  2.56183      6.45721     3.58887     …  -0.658365   -0.197366    0.0777662   0.0283516
  1.56993      7.17774     7.2442      …  -0.748569   -0.136485    0.130263    0.0115425
  1.21801      2.00235     0.751506    …   0.378459   -0.0642349  -0.0627738  -0.0795216
  0.740135     2.4536      1.75339     …   0.987444   -0.0265266  -0.262457   -0.050647
  0.348811     0.707586    0.359101    …   0.489996    1.33695     0.403651    0.159073
  0.100272    -0.734982   -0.636234    …   0.601058    0.14057    -0.191995   -0.0795725
  0.139963    -0.442792   -1.29323     …   1.12791     0.176198   -0.420207   -0.0500219
  0.0677437   -0.232091   -0.308252    …   0.0485569   0.880119    0.719764    0.158394
 -0.00103785  -0.0805224   0.237911    …  -0.389536    0.402673    1.05731     0.157784
 -0.338023    -0.545782   -0.145159    …   2.16254     0.939959   -0.0392903  -0.249754
 -0.265922    -0.658365   -0.374284    …   3.7016      1.0117     -0.725274   -0.248751
 -0.1267      -0.197366   -0.0682424   …   1.0117      2.09928     0.841366    0.569505
 -0.0138521    0.0777662   0.0651313   …  -0.725274    0.841366    1.50725     0.580339
  0.0378789    0.0567031   0.0115425   …  -0.497502    1.13901     1.16068     1.44321

julia> logabsdet(J)[1]
0.3457424809406372

julia> symexp_logdetjac(A)
0.3457424809559124

I'd expect AD-ing through this transform to be quite a bit more expensive than the Cholesky transform, whose log-Jacobian only requires summing the unconstrained parameters behind the positive diagonal.

@bob-carpenter (Collaborator)

I think @adamhaber already has the correlation matrix and Cholesky for correlation matrix transforms.

See: tensorflow/probability#694

@sethaxen (Collaborator)

@bob-carpenter I'm not certain if your comment is in reply to me, but I was only referring to symmetric positive definite matrices.

@bob-carpenter (Collaborator) commented Jul 12, 2022

> I verified this using finite differences in Julia:

Super cool! That correction is $\mathcal{O}(n^2)$. We have matrix_exp implemented in Stan using the Al-Mohy and Higham algorithm.

@bob-carpenter changed the title from "covariance/correlation matrix transforms" to "symmetric positive definite matrix transforms (aka covariance matrices)" on Jul 12, 2022
@sethaxen (Collaborator)

f780af0 adds derivations for this section
