Adaptive proposals #39

Open

arzwa wants to merge 34 commits into master from arzwa:adaptive

Commits (34)
e9fd602  adaptive proposal (Nov 22, 2020)
5f0ddfa  Adaptive univariate proposals tests (Nov 22, 2020)
f42b784  swp files from vim... (Nov 22, 2020)
a188780  test ESS (Nov 22, 2020)
d8989aa  export AdaptiveProposal (Nov 22, 2020)
565f12a  Update src/adaptive.jl (arzwa, Nov 22, 2020)
cc16195  Update src/adaptive.jl (arzwa, Nov 22, 2020)
802ec67  Update src/adaptive.jl (arzwa, Nov 22, 2020)
c7623c4  Update src/adaptive.jl (arzwa, Nov 22, 2020)
a33937e  Update src/adaptive.jl (arzwa, Nov 22, 2020)
4007bd0  Update src/adaptive.jl (arzwa, Nov 22, 2020)
71e010b  Update src/adaptive.jl (arzwa, Nov 22, 2020)
2d59ede  Update src/adaptive.jl (arzwa, Nov 22, 2020)
279aea7  Update src/adaptive.jl (arzwa, Nov 22, 2020)
93f17c5  Update src/adaptive.jl (arzwa, Nov 23, 2020)
16715e1  Update src/adaptive.jl (arzwa, Nov 23, 2020)
046c21b  Update src/adaptive.jl (arzwa, Nov 23, 2020)
b91fcc0  test update (Nov 23, 2020)
387eff4  Merge branch 'adaptive' of https://github.com/arzwa/AdvancedMH.jl int… (Nov 23, 2020)
8fb1048  Adaptor docstring (Nov 23, 2020)
4071675  Update test/runtests.jl (arzwa, Nov 23, 2020)
a63262d  updated tests (Nov 23, 2020)
afd3ed1  updated tests (Nov 23, 2020)
fe8562c  Merge branch 'adaptive' of https://github.com/arzwa/AdvancedMH.jl int… (Nov 23, 2020)
f66f647  adaptive MvNormal (Dec 10, 2020)
2988ff2  merge with master (Dec 11, 2020)
e5ad041  AdaptiveMvNormal with tests (Dec 11, 2020)
7aa8631  Update src/adaptivemvnormal.jl (arzwa, Dec 11, 2020)
4999349  Update src/adaptivemvnormal.jl (arzwa, Dec 11, 2020)
d4b3f6b  Update src/adaptivemvnormal.jl (arzwa, Dec 11, 2020)
cb52c7f  Update test/Project.toml (arzwa, Dec 11, 2020)
5a2a175  Update src/mh-core.jl (arzwa, Dec 11, 2020)
b2be967  Update src/adaptivemvnormal.jl (arzwa, Dec 11, 2020)
518aab1  removed test Project.toml (Dec 11, 2020)
2 changes: 2 additions & 0 deletions Project.toml
@@ -5,6 +5,8 @@ version = "0.5.6"
[deps]
AbstractMCMC = "80f14c24-f653-4e6a-9b94-39d6b0f70001"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
PDMats = "90014a1f-27ba-587c-ab20-58faa44d9150"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Requires = "ae029012-a4dd-5104-9daa-d747884805df"

7 changes: 6 additions & 1 deletion src/AdvancedMH.jl
@@ -4,12 +4,15 @@ module AdvancedMH
using AbstractMCMC
using Distributions
using Requires
using LinearAlgebra
using PDMats

import Random

# Exports
 export MetropolisHastings, DensityModel, RWMH, StaticMH, StaticProposal,
-    RandomWalkProposal, Ensemble, StretchProposal, MALA
+    RandomWalkProposal, Ensemble, StretchProposal, MALA, AdaptiveProposal,
+    AdaptiveMvNormal

# Reexports
export sample, MCMCThreads, MCMCDistributed
@@ -110,5 +113,7 @@ end
include("proposal.jl")
include("mh-core.jl")
include("emcee.jl")
include("adaptive.jl")
include("adaptivemvnormal.jl")

end # module AdvancedMH
110 changes: 110 additions & 0 deletions src/adaptive.jl
@@ -0,0 +1,110 @@
"""
Adaptor(; tune=25, target=0.44, bound=10., δmax=0.2)

A helper struct for univariate adaptive proposal kernels. It tracks the number
of accepted proposals and the total number of attempted proposals. The
proposal kernel is tuned every `tune` proposals: at tuning step `n`, the scale
of the proposal (log(σ) for a Normal kernel, log(b) for a Uniform kernel) is
increased by `δ(n) = min(δmax, 1/√n)` if the estimated acceptance rate exceeds
`target`, and decreased by the same amount otherwise. The target acceptance
rate defaults to 0.44, which is the commonly cited optimum for one-dimensional
proposals. To ensure ergodicity, the scale of the proposal has to be bounded
(by `bound`), although this is often not required in practice.
"""
mutable struct Adaptor
    accepted::Int
    total::Int
    tune::Int       # tuning interval
    target::Float64 # target acceptance rate
    bound::Float64  # bound on logσ of Gaussian kernel
    δmax::Float64   # maximum adaptation step
end

function Adaptor(; tune=25, target=0.44, bound=10., δmax=0.2)
    return Adaptor(0, 0, tune, target, bound, δmax)
end

"""
AdaptiveProposal{P}

An adaptive Metropolis-Hastings proposal. For this to work, the proposal
kernel should implement the `adapted(proposal, δ, bound)` method, where `δ`
is the increment/decrement applied to the scale of the proposal distribution
during adaptation (e.g. for a Normal distribution the scale is `log(σ)`, so
that after adaptation the proposal is `Normal(0, exp(log(σ) + δ))`).

# Example
```julia
julia> p = AdaptiveProposal(Uniform(-0.2, 0.2));

julia> rand(p)
0.07975590594518434
```

# References

Roberts, Gareth O., and Jeffrey S. Rosenthal. "Examples of adaptive MCMC."
Journal of Computational and Graphical Statistics 18.2 (2009): 349-367.
"""
mutable struct AdaptiveProposal{P} <: Proposal{P}
    proposal::P
    adaptor::Adaptor
end

function AdaptiveProposal(p; kwargs...)
    return AdaptiveProposal(p, Adaptor(; kwargs...))
end

# Adaptive proposals are only defined for symmetric proposal distributions
is_symmetric_proposal(::AdaptiveProposal) = true
Comment on lines +58 to +59

Member: If this is the case, it would be good to check this property in an inner constructor.
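For illustration, the suggested check could look roughly like this (a sketch only; `issymmetric_kernel` is a hypothetical helper, since symmetry cannot be verified for arbitrary distributions):

```julia
# Sketch: enforce the symmetry assumption at construction time.
# issymmetric_kernel is a hypothetical helper, not part of this PR.
mutable struct AdaptiveProposal{P} <: Proposal{P}
    proposal::P
    adaptor::Adaptor
    function AdaptiveProposal{P}(proposal::P, adaptor::Adaptor) where {P}
        issymmetric_kernel(proposal) ||
            throw(ArgumentError("AdaptiveProposal requires a symmetric kernel"))
        return new{P}(proposal, adaptor)
    end
end

AdaptiveProposal(p::P, a::Adaptor) where {P} = AdaptiveProposal{P}(p, a)

# Symmetric about zero for the kernels supported in this file:
issymmetric_kernel(d::Normal) = iszero(d.μ)
issymmetric_kernel(d::Uniform) = d.a == -d.b
issymmetric_kernel(::Any) = false
```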


accepted!(p::AdaptiveProposal) = p.adaptor.accepted += 1
accepted!(p::Vector{<:AdaptiveProposal}) = map(accepted!, p)
accepted!(p::NamedTuple{names}) where names = map(x->accepted!(getfield(p, x)), names)

# this is defined because the first draw has no transition yet (I think)
function propose(rng::Random.AbstractRNG, p::AdaptiveProposal, m::DensityModel)
    return rand(rng, p.proposal)
end

# the actual proposal happens here
function propose(
    rng::Random.AbstractRNG,
    proposal::AdaptiveProposal{<:Union{Distribution,Proposal}},
    model::DensityModel,
    t
)
    consider_adaptation!(proposal)
    return t + rand(rng, proposal.proposal)
end

function q(proposal::AdaptiveProposal, t, t_cond)
    return logpdf(proposal, t - t_cond)
end

Comment on lines +81 to +84

Member: Is this actually needed? It seems like the default for Proposal should cover this.

Author: Currently there seems to be no default q for Proposal. However, it's also unnecessary because these adaptive proposals are always symmetric.

function consider_adaptation!(p)
    (p.adaptor.total % p.adaptor.tune == 0) && adapt!(p)
    p.adaptor.total += 1
end

function adapt!(p::AdaptiveProposal)
    a = p.adaptor
    a.total == 0 && return
    δ = min(a.δmax, sqrt(a.tune / a.total))  # diminishing adaptation
    α = a.accepted / a.tune                  # acceptance ratio
    p_ = adapted(p.proposal, α > a.target ? δ : -δ, a.bound)
    a.accepted = 0
    p.proposal = p_
end

function adapted(d::Normal, δ, bound=Inf)
    _lσ = log(d.σ) + δ
    lσ = sign(_lσ) * min(bound, abs(_lσ))
    return Normal(d.μ, exp(lσ))
end

function adapted(d::Uniform, δ, bound=Inf)
    lσ = log(d.b) + δ
    σ = exp(sign(lσ) * min(bound, abs(lσ)))
    return Uniform(-σ, σ)
end
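For reviewers who want to try the above out, a minimal usage sketch (it assumes the `MetropolisHastings`/`DensityModel` API already exported by the package; the target and settings are illustrative):

```julia
using AdvancedMH, Distributions, Random

Random.seed!(1)

# Standard normal target via its log-density
model = DensityModel(x -> logpdf(Normal(), x))

# Univariate adaptive random-walk proposal, retuned every 25 draws
# toward the 0.44 target acceptance rate
sampler = MetropolisHastings(AdaptiveProposal(Normal(0, 1)))
chain = sample(model, sampler, 10_000)
```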
82 changes: 82 additions & 0 deletions src/adaptivemvnormal.jl
@@ -0,0 +1,82 @@
"""
AdaptiveMvNormal(constant_component::MvNormal; σ=2.38, β=0.05)

Adaptive multivariate normal mixture proposal as described in Haario et al.
(2001) and Roberts & Rosenthal (2009). Uses a two-component mixture of
MvNormal distributions: one component (with mixture weight `β`) remains
constant, while the other is adapted to the covariance structure of the
target. The proposal is initialized by providing the constant component to
the constructor.

`σ` is the scale factor for the adapted covariance matrix (the adaptive
component has covariance `σ^2 * Σ / d` for a `d`-dimensional target), where
σ = 2.38 is optimal in a high-dimensional context according to Roberts &
Rosenthal.

# References

- Haario, Heikki, Eero Saksman, and Johanna Tamminen.
"An adaptive Metropolis algorithm." Bernoulli 7.2 (2001): 223-242.
- Roberts, Gareth O., and Jeffrey S. Rosenthal. "Examples of adaptive MCMC."
Journal of Computational and Graphical Statistics 18.2 (2009): 349-367.
"""
mutable struct AdaptiveMvNormal{T1,T2,V} <: Proposal{T1}
    d::Int        # dimensionality
    n::Int        # iteration
    β::Float64    # constant component mixture weight
    σ::Float64    # scale factor for adapted covariance matrix
    constant::T1
    adaptive::T2
    Ex::Vector{V} # rolling mean vector
    EX::Matrix{V} # scatter matrix of previous draws
end

function AdaptiveMvNormal(dist::MvNormal; σ=2.38, β=0.05)
    n = length(dist)
    adaptive = MvNormal(cov(dist))
    AdaptiveMvNormal(n, -1, β, σ, dist, adaptive, zeros(n), zeros(n, n))
end

is_symmetric_proposal(::AdaptiveMvNormal) = true

"""
adapt!(p::AdaptiveMvNormal, x::AbstractVector)

Adaptation for the adaptive multivariate normal mixture proposal as described
in Haario et al. (2001) and Roberts & Rosenthal (2009). Performs an online
estimation of the target covariance matrix and mean. The code for this routine
is largely based on `Mamba.jl`.
"""
function adapt!(p::AdaptiveMvNormal, x::AbstractVector)
    p.n += 1
    # adapt mean vector and scatter matrix
    f = p.n / (p.n + 1)
    p.Ex = f * p.Ex + (1 - f) * x
    p.EX = f * p.EX + (1 - f) * x * x'
    # compute adapted covariance matrix
    Σ = (p.σ^2 / (p.d * f)) * (p.EX - p.Ex * p.Ex')
    F = cholesky(Hermitian(Σ), check=false)
    if rank(F.L) == p.d
        p.adaptive = MvNormal(PDMat(Σ, F))
    end
Comment on lines +56 to +59

Member: Also here, can't we just use the default constructors of MvNormal which call PDMat implicitly?

Suggested change:
-    F = cholesky(Hermitian(Σ), check=false)
-    if rank(F.L) == p.d
-        p.adaptive = MvNormal(PDMat(Σ, F))
-    end
+    p.adaptive = MvNormal(Σ)

Author: I implemented it this way (inspired by the Mamba implementation) to check whether the empirical estimate of the covariance matrix is a proper covariance matrix. If it is, we can adapt the proposal distribution without additional computation by feeding the Cholesky decomposition to MvNormal this way. How would you do it otherwise? I guess you could use try/catch, but that doesn't seem nicer.

Member: No, I don't think a try-catch block would be any better. IMO it is not good to just suppress the errors here and use an incorrect Cholesky decomposition with possibly negative eigenvalues. I would suggest the following:

  • If possible, update the Cholesky decomposition (or some other parametrization of the covariance matrix) iteratively. Usually this is more efficient than recomputing the factorization in every step, and it avoids the numerical problems.
  • Often calling Matrix(Symmetric(Sigma)) is sufficient to avoid problems with matrices that are not completely symmetric and for which the Cholesky decomposition therefore fails.
  • Alternatively, use PositiveFactorizations.

IMO the first alternative would clearly be the best, if possible.

Member: I just had a quick look at Haario et al. and the update formula for the covariance matrix in eq. (3). It seems that the update could indeed be performed with only d + 3 rank-1 up- and downdates, each of which runs in O(d^2). In fact, if eps = 0, only 3 rank-1 up- and downdates are needed, so iteratively updating the Cholesky decomposition would be much cheaper than recomputing the decomposition in O(d^3) operations. The rank-1 up- and downdates are implemented in LinearAlgebra.lowrankupdate! and LinearAlgebra.lowrankdowndate! (and their corresponding out-of-place methods).

I think this would be the preferable method for updating the Cholesky decomposition. It would probably also be cheaper to store just the Cholesky decomposition rather than wrapping it in an MvNormal object, since it is trivial to sample from a multivariate normal distribution given its Cholesky decomposition.
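To make the rank-1 route concrete, a minimal illustration of the primitives referenced above (this sketches the mechanism only, not the full Haario update, which also involves the mean and scaling terms):

```julia
using LinearAlgebra

A = [4.0 1.0; 1.0 3.0]      # current covariance estimate
F = cholesky(Symmetric(A))  # O(d^3), done once
v = [0.5, -0.2]             # e.g. a rescaled new draw

F = lowrankupdate(F, v)     # Cholesky of A + v * v', in O(d^2)
F = lowrankdowndate(F, v)   # Cholesky of A - v * v' (back to A here)

# Sampling N(0, A) straight from the factor, without an MvNormal wrapper:
x = F.L * randn(2)
```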

Author (arzwa, Dec 12, 2020): That sounds interesting and worthwhile; I'll have to look at it. FWIW, I just based my code on the description in the Roberts & Rosenthal paper and the Mamba implementation, since I simply wanted to use this for a problem I'm having.

I had the impression from the code that there is a preference for using Distributions.jl distributions for sampling; that's why I wrapped them in MvNormal objects instead of using the Cholesky factor directly (when both the full matrix and the Cholesky factors are available, it is as cheap to put them in an MvNormal). When updating the Cholesky factor directly, I agree it would be better to do as you suggest.

end

function Base.rand(rng::Random.AbstractRNG, p::AdaptiveMvNormal)
    return if p.n > 2 * p.d
        # mixture draw: the constant component with probability β,
        # the adapted component otherwise
        rand(rng) < p.β ? rand(rng, p.constant) : rand(rng, p.adaptive)
    else
        rand(rng, p.constant)
    end
end

function propose(rng::Random.AbstractRNG, proposal::AdaptiveMvNormal, m::DensityModel)
    return rand(rng, proposal)
end

Comment on lines +70 to +73

Member: Is this not covered by the default for Proposal?

function propose(
    rng::Random.AbstractRNG,
    proposal::AdaptiveMvNormal,
    model::DensityModel,
    t
)
    adapt!(proposal, t)
    return t + rand(rng, proposal)
end
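Analogously, a minimal usage sketch for the multivariate proposal (again against the exported API, with an illustrative correlated target):

```julia
using AdvancedMH, Distributions, Random

Random.seed!(1)

# Correlated bivariate normal target
Σ = [1.0 0.8; 0.8 1.0]
model = DensityModel(x -> logpdf(MvNormal(zeros(2), Σ), x))

# The constant mixture component is a small isotropic MvNormal; the other
# component adapts to the empirical covariance after 2d initial draws.
proposal = AdaptiveMvNormal(MvNormal(2, 0.1))
chain = sample(model, MetropolisHastings(proposal), 10_000)
```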
3 changes: 3 additions & 0 deletions src/mh-core.jl
@@ -240,8 +240,11 @@ function AbstractMCMC.step(

    # Decide whether to return the previous params or the new one.
    if -Random.randexp(rng) < logα
        accepted!(spl.proposal)
        return params, params
    else
        return params_prev, params_prev
    end
end

accepted!(::Proposal) = nothing
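With this no-op default, any stateful proposal can opt into acceptance tracking by extending `accepted!`; roughly (with a hypothetical `MyProposal`):

```julia
# Hypothetical proposal type opting into the accepted! hook added above.
mutable struct MyProposal{P} <: Proposal{P}
    proposal::P
    naccepted::Int
end

accepted!(p::MyProposal) = p.naccepted += 1
```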
2 changes: 1 addition & 1 deletion src/proposal.jl
@@ -103,4 +103,4 @@ function q(
    t_cond
)
    return q(proposal(t_cond), t, t_cond)
-end
\ No newline at end of file
+end
4 changes: 2 additions & 2 deletions test/emcee.jl
@@ -18,7 +18,7 @@
Random.seed!(100)
sampler = Ensemble(1_000, StretchProposal([InverseGamma(2, 3), Normal(0, 1)]))
chain = sample(model, sampler, 1_000;
-    param_names = ["s", "m"], chain_type = Chains)
+    param_names = ["s", "m"], chain_type = Chains, progress = false)
@test mean(chain["s"]) ≈ 49/24 atol=0.1
@test mean(chain["m"]) ≈ 7/6 atol=0.1
@@ -43,7 +43,7 @@
Random.seed!(100)
sampler = Ensemble(1_000, StretchProposal(MvNormal(2, 1)))
chain = sample(model, sampler, 1_000;
-    param_names = ["logs", "m"], chain_type = Chains)
+    param_names = ["logs", "m"], chain_type = Chains, progress = false)

@test mean(exp, chain["logs"]) ≈ 49/24 atol=0.1
@test mean(chain["m"]) ≈ 7/6 atol=0.1