Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multinomial randomness #702

Merged
merged 19 commits into from
Jan 25, 2024
Merged
Show file tree
Hide file tree
Changes from 18 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,12 +14,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
* introduce `rand(:HamiltonianMatrices)`
* extend `rand` to also `rand!` for `HamiltonianMatrices`, `SymplecticMatrices` and `SymplecticStiefel`
* implement `riemannian_gradient` conversion for `SymplecticMatrices` and `SymplecticGrassmann`
* the new manifold of `MultinomialSymmetricPositiveDefinite` matrices
* `rand!` for `MultinomialDoublyStochastic` and `MultinomialSymmetric`

### Deprecated

* Rename `Symplectic` to `SimplecticMatrices` in order to have a `Symplectic` wrapper for such matrices as well in the future for the next breaking change.
* Rename `SymplecticMatrix` to `SymplecticElement` to clarify that it is the special matrix ``J_{2n}`` and not an arbitrary symplectic matrix.

### Fixed

* a bug that cause `project` for tangent vectors to return wrong results on `MultinomialDoublyStochastic`

## [0.9.12] – 2024-01-21

### Fixed
Expand Down
1 change: 1 addition & 0 deletions docs/make.jl
Original file line number Diff line number Diff line change
Expand Up @@ -119,6 +119,7 @@ makedocs(;
"Multinomial doubly stochastic matrices" => "manifolds/multinomialdoublystochastic.md",
"Multinomial matrices" => "manifolds/multinomial.md",
"Multinomial symmetric matrices" => "manifolds/multinomialsymmetric.md",
"Multinomial symmetric positive definite matrices" => "manifolds/multinomialsymmetricpositivedefinite.md",
"Oblique manifold" => "manifolds/oblique.md",
"Probability simplex" => "manifolds/probabilitysimplex.md",
"Positive numbers" => "manifolds/positivenumbers.md",
Expand Down
5 changes: 5 additions & 0 deletions docs/src/manifolds/multinomialdoublystochastic.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,3 +7,8 @@ Order = [:type, :function]
```

## Literature

```@bibliography
Pages = ["multinomialdoublystochastic.md"]
Canonical=false
```
5 changes: 5 additions & 0 deletions docs/src/manifolds/multinomialsymmetric.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,3 +7,8 @@ Order = [:type, :function]
```

## Literature

```@bibliography
Pages = ["multinomialsymmetric.md"]
Canonical=false
```
14 changes: 14 additions & 0 deletions docs/src/manifolds/multinomialsymmetricpositivedefinite.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
# Multinomial symmetric positive definite matrices

```@autodocs
Modules = [Manifolds]
Pages = ["manifolds/MultinomialSymmetricPositiveDefinite.jl"]
Order = [:type, :function]
```

## Literature

```@bibliography
Pages = ["multinomialsymmetricpositivedefinite.md"]
Canonical=false
```
1 change: 1 addition & 0 deletions docs/src/references.bib
Original file line number Diff line number Diff line change
Expand Up @@ -275,6 +275,7 @@ @article{DouikHassibi:2019
TITLE = {Manifold Optimization Over the Set of Doubly Stochastic Matrices: A Second-Order Geometry},
JOURNAL = {IEEE Transactions on Signal Processing}
}

#
# E
# ----------------------------------------------------------------------------------------
Expand Down
5 changes: 4 additions & 1 deletion src/Manifolds.jl
Original file line number Diff line number Diff line change
Expand Up @@ -252,6 +252,7 @@ using ManifoldsBase:
IsEmbeddedSubmanifold,
IsExplicitDecorator,
LogarithmicInverseRetraction,
ManifoldDomainError,
ManifoldsBase,
NestedPowerRepresentation,
NestedReplacingPowerRepresentation,
Expand Down Expand Up @@ -426,6 +427,7 @@ include("manifolds/GeneralizedStiefel.jl")
include("manifolds/Hyperbolic.jl")
include("manifolds/MultinomialDoublyStochastic.jl")
include("manifolds/MultinomialSymmetric.jl")
include("manifolds/MultinomialSymmetricPositiveDefinite.jl")
include("manifolds/PositiveNumbers.jl")
include("manifolds/ProjectiveSpace.jl")
include("manifolds/SkewHermitian.jl")
Expand Down Expand Up @@ -646,6 +648,7 @@ export Euclidean,
MultinomialDoubleStochastic,
MultinomialMatrices,
MultinomialSymmetric,
MultinomialSymmetricPositiveDefinite,
Oblique,
OrthogonalMatrices,
PositiveArrays,
Expand Down Expand Up @@ -781,7 +784,7 @@ export CachedBasis,
InducedBasis,
ProjectedOrthonormalBasis
# Errors on Manifolds
export ComponentManifoldError, CompositeManifoldError
export ComponentManifoldError, CompositeManifoldError, ManifoldDomainError
# Functions on Manifolds
export ×,
^,
Expand Down
27 changes: 26 additions & 1 deletion src/manifolds/Multinomial.jl
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,11 @@ The multinomial manifold consists of `m` column vectors, where each column is of
`n` and unit norm, i.e.

````math
\mathcal{MN}(n,m) \coloneqq \bigl\{ p ∈ ℝ^{n×m}\ \big|\ p_{i,j} > 0 \text{ for all } i=1,…,n, j=1,…,m \text{ and } p^{\mathrm{T}}\mathbb{1}_m = \mathbb{1}_n\bigr\},
\mathcal{MN}(n,m) \coloneqq \bigl\{
p ∈ ℝ^{n×m}\ \big|\ p_{i,j} > 0 \text{ for all } i=1,…,n, j=1,…,m
\text{ and } p^{\mathrm{T}}\mathbb{1}_m = \mathbb{1}_n\bigr\},
````

where ``\mathbb{1}_k`` is the vector of length ``k`` containing ones.

This yields exactly the same metric as
Expand Down Expand Up @@ -85,6 +88,28 @@ function power_dimensions(M::MultinomialMatrices)
return (m,)
end

@doc raw"""
riemannian_gradient(M::MultinomialMatrices, p, Y; kwargs...)

Let ``Y`` denote the Euclidean gradient of a function ``\tilde f`` defined in the
embedding neighborhood of `M`, then the Riemannian gradient is given by
Equation 5 of [DouikHassibi:2019](@cite) as

```math
\operatorname{grad} f(p) = \proj_{T_p\mathcal M}(Y⊙p)
```

where ``⊙`` denotes the Hadamard or elementwise product.

"""
riemannian_gradient(M::MultinomialMatrices, p, Y; kwargs...)

function riemannian_gradient!(M::MultinomialMatrices, X, p, Y; kwargs...)
X .= p .* Y
project!(M, X, p, X)
return X
end

representation_size(M::MultinomialMatrices) = get_parameter(M.size)

function Base.show(io::IO, ::MultinomialMatrices{TypeParameter{Tuple{n,m}}}) where {n,m}
Expand Down
124 changes: 89 additions & 35 deletions src/manifolds/MultinomialDoublyStochastic.jl
Original file line number Diff line number Diff line change
Expand Up @@ -3,19 +3,57 @@

A common type for manifolds that are doubly stochastic, for example by direct constraint
[`MultinomialDoubleStochastic`](@ref) or by symmetry [`MultinomialSymmetric`](@ref),
or additionally by symmetric positive definiteness [`MultinomialSymmetricPositiveDefinite`](@ref)
as long as they are also modeled as [`IsIsometricEmbeddedManifold`](https://juliamanifolds.github.io/ManifoldsBase.jl/stable/decorator.html#ManifoldsBase.IsIsometricEmbeddedManifold).

That way they share the inner product (just by restriction), and even the Riemannian gradient
"""
abstract type AbstractMultinomialDoublyStochastic <: AbstractDecoratorManifold{ℝ} end

function active_traits(f, ::AbstractMultinomialDoublyStochastic, args...)
return merge_traits(IsIsometricEmbeddedManifold())
end

@doc raw"""
representation_size(M::AbstractMultinomialDoublyStochastic)

return the representation size of doubly stochastic matrices, whic are embedded
in the ``ℝ^{n×n}`` matrices and hence the answer here is ``
"""
function representation_size(M::AbstractMultinomialDoublyStochastic)
n = get_parameter(M.size)[1]
return (n, n)
end

@doc raw"""
riemannian_gradient(M::AbstractMultinomialDoublyStochastic, p, Y; kwargs...)

Let ``Y`` denote the Euclidean gradient of a function ``\tilde f`` defined in the
embedding neighborhood of `M`, then the Riemannian gradient is given by
Lemma 1 [DouikHassibi:2019](@cite) as

```math
\operatorname{grad} f(p) = \proj_{T_p\mathcal M}(Y⊙p)
```

where ``⊙`` denotes the Hadamard or elementwise product, and the projection
is the projection onto the tangent space of the corresponding manifold.

"""
riemannian_gradient(M::AbstractMultinomialDoublyStochastic, p, Y; kwargs...)

function riemannian_gradient!(M::AbstractMultinomialDoublyStochastic, X, p, Y; kwargs...)
X .= p .* Y
project!(M, X, p, X)
return X
end

@doc raw"""
MultinomialDoublyStochastic{T} <: AbstractMultinomialDoublyStochastic

The set of doubly stochastic multinomial matrices consists of all ``n×n`` matrices with
stochastic columns and rows, i.e.

````math
\begin{aligned}
\mathcal{DP}(n) \coloneqq \bigl\{p ∈ ℝ^{n×n}\ \big|\ &p_{i,j} > 0 \text{ for all } i=1,…,n, j=1,…,m,\\
Expand Down Expand Up @@ -60,44 +98,27 @@ end
Checks whether `p` is a valid point on the [`MultinomialDoubleStochastic`](@ref)`(n)` `M`,
i.e. is a matrix with positive entries whose rows and columns sum to one.
"""
function check_point(
M::MultinomialDoubleStochastic,
p::T;
atol::Real=sqrt(prod(representation_size(M))) * eps(real(float(number_eltype(T)))),
kwargs...,
) where {T}
function check_point(M::MultinomialDoubleStochastic, p; kwargs...)
n = get_parameter(M.size)[1]
r = sum(p, dims=2)
if !isapprox(norm(r - ones(n, 1)), 0.0; atol=atol, kwargs...)
return DomainError(
r,
"The point $(p) does not lie on $M, since its rows do not sum up to one.",
)
end
return nothing
s = check_point(MultinomialMatrices(n, n), p; kwargs...)
!isnothing(s) && return s
s2 = check_point(MultinomialMatrices(n, n), p'; kwargs...)
return s2
end

@doc raw"""
check_vector(M::MultinomialDoubleStochastic p, X; kwargs...)

Checks whether `X` is a valid tangent vector to `p` on the [`MultinomialDoubleStochastic`](@ref) `M`.
This means, that `p` is valid, that `X` is of correct dimension and sums to zero along any
column or row.
"""
function check_vector(
M::MultinomialDoubleStochastic,
p,
X::T;
atol::Real=sqrt(prod(representation_size(M))) * eps(real(float(number_eltype(T)))),
kwargs...,
) where {T}
r = sum(X, dims=2) # check for stochastic rows
if !isapprox(norm(r), 0.0; atol=atol, kwargs...)
return DomainError(
r,
"The matrix $(X) is not a tangent vector to $(p) on $(M), since its rows do not sum up to zero.",
)
end
return nothing
function check_vector(M::MultinomialDoubleStochastic, p, X; kwargs...)
n = get_parameter(M.size)[1]
s = check_vector(MultinomialMatrices(n, n), p, X; kwargs...)
!isnothing(s) && return s
s2 = check_vector(MultinomialMatrices(n, n), p, X'; kwargs...)
return s2
end

function get_embedding(::MultinomialDoubleStochastic{TypeParameter{Tuple{n}}}) where {n}
Expand Down Expand Up @@ -134,11 +155,14 @@ end

Project `Y` onto the tangent space at `p` on the [`MultinomialDoubleStochastic`](@ref) `M`, return the result in `X`.
The formula reads

````math
\operatorname{proj}_p(Y) = Y - (α\mathbf{1}_n^{\mathrm{T}} + \mathbf{1}_nβ^{\mathrm{T}}) ⊙ p,
````

where ``⊙`` denotes the Hadamard or elementwise product and ``\mathbb{1}_n`` is the vector of length ``n`` containing ones.
The two vectors ``α,β ∈ ℝ^{n×n}`` are computed as a solution (typically using the left pseudo inverse) of

````math
\begin{pmatrix} I_n & p\\p^{\mathrm{T}} & I_n \end{pmatrix}
\begin{pmatrix} α\\ β\end{pmatrix}
Expand All @@ -152,8 +176,8 @@ project(::MultinomialDoubleStochastic, ::Any, ::Any)

function project!(M::MultinomialDoubleStochastic, X, p, Y)
n = get_parameter(M.size)[1]
ζ = [I p; p I] \ [sum(Y, dims=2); sum(Y, dims=1)'] # Formula (25) from 1802.02628
return X .= Y .- (repeat(ζ[1:n], 1, 3) .+ repeat(ζ[(n + 1):end]', 3, 1)) .* p
ζ = [I p; p' I] \ [sum(Y, dims=2); sum(Y, dims=1)'] # Formula (25) from 1802.02628
return X .= Y .- (repeat(ζ[1:n], 1, n) .+ repeat(ζ[(n + 1):end]', n, 1)) .* p
end

@doc raw"""
Expand Down Expand Up @@ -201,15 +225,45 @@ function project!(
return q
end

function representation_size(M::MultinomialDoubleStochastic)
n = get_parameter(M.size)[1]
return (n, n)
@doc raw"""
rand(::MultinomialDoubleStochastic; vector_at=nothing, σ::Real=1.0, kwargs...)

Generate random points on the [`MultinomialDoubleStochastic`](@ref) manifold
or tangent vectors at the point `vector_at` if that is not `nothing`.

Let ``n×n`` denote the matrix dimension of the [`MultinomialDoubleStochastic`](@ref).

When `vector_at` is nothing, this is done by generating a random matrix`rand(n,n)`
with positive entries and projecting it onto the manifold. The `kwargs...` are
passed to this projection.

When `vector_at` is not `nothing`, a random matrix in the ambient space is generated
and projected onto the tangent space
"""
rand(::MultinomialDoubleStochastic; σ::Real=1.0)

function Random.rand!(
rng::AbstractRNG,
M::MultinomialDoubleStochastic,
pX;
vector_at=nothing,
σ::Real=one(real(eltype(pX))),
kwargs...,
)
rand!(rng, pX)
pX .*= σ
if vector_at === nothing
project!(M, pX, pX; kwargs...)
else
project!(M, pX, vector_at, pX)
end
return pX
end

@doc raw"""
retract(M::MultinomialDoubleStochastic, p, X, ::ProjectionRetraction)

compute a projection based retraction by projecting $p\odot\exp(X⨸p)$ back onto the manifold,
compute a projection based retraction by projecting ``p\odot\exp(X⨸p)`` back onto the manifold,
where ``⊙,⨸`` are elementwise multiplication and division, respectively. Similarly, ``\exp``
refers to the elementwise exponentiation.
"""
Expand Down
Loading
Loading