Merge pull request #87 from TuringLang/tor/using-jmd
[WIP] Implementation of #86
torfjelde authored Sep 25, 2020
2 parents 4339740 + 6d38bcd commit f780124
Showing 19 changed files with 251 additions and 1,942 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -9,4 +9,4 @@ Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Weave = "44d3d7a6-8a23-5bf8-98c5-b353f8df5ec9"

[compat]
julia = "1"
julia = "1"
109 changes: 13 additions & 96 deletions tutorials/bayesian-deep-learning/01_bayesian-neural-network.jmd

Large diffs are not rendered by default.

8 changes: 8 additions & 0 deletions tutorials/bayesian-deep-learning/Project.toml
@@ -0,0 +1,8 @@
[deps]
AdvancedVI = "b5ca4192-6429-45e5-a2d9-87aec30a685c"
Bijectors = "76274a88-744f-5084-9051-94815aaf08c4"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
ReverseDiff = "37e2e3b7-166d-5795-8a7a-e32c996b4267"
Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
427 changes: 2 additions & 425 deletions tutorials/differential-equations/01_bayesian-diff-eq.jmd

Large diffs are not rendered by default.

12 changes: 12 additions & 0 deletions tutorials/differential-equations/Project.toml
@@ -0,0 +1,12 @@
[deps]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
DiffEqBase = "2b5f629d-d688-5b77-993f-72d75c75574e"
DiffEqBayes = "ebbdde9d-f333-5424-9be2-dbf1e9acfb5e"
DiffEqSensitivity = "41bf760c-e81c-5289-8e54-58b1f1f8abe2"
DifferentialEquations = "0c46a032-eb83-5123-abaf-570d42b7fbaa"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
MCMCChains = "c7f686f2-ff18-58e9-bc7b-31028e88f75d"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
63 changes: 14 additions & 49 deletions tutorials/graphical-models/01_hidden-markov-model.jmd
@@ -36,13 +36,6 @@ N = length(y); K = 3;
plot(y, xlim = (0,15), ylim = (-1,5), size = (500, 250))
```

![svg](/tutorials/4_BayesHmm_files/4_BayesHmm_3_0.svg)

We can see that we have three states, one for each height of the plot (1, 2, 3). This height is also our emission parameter, so state one produces a value of one, state two produces a value of two, and so on.

Ultimately, we would like to understand three major parameters:
@@ -53,9 +46,9 @@

With this in mind, let's set up our model. We are going to use some of our knowledge as modelers to provide additional information about our system. This takes the form of the prior on our emission parameter.

\$\$
m_i \sim Normal(i, 0.5), \space m = \{1,2,3\}
\$\$
$$
m_i \sim \mathrm{Normal}(i, 0.5) \quad \text{where} \quad m = \{1,2,3\}
$$

Simply put, this says that we expect state one to emit values in a Normally distributed manner, where the mean of each state's emissions is that state's value. The variance of 0.5 helps the model converge more quickly — consider the case where we have a variance of 1 or 2. In this case, the likelihood of observing a 2 when we are in state 1 is actually quite high, as it is within a standard deviation of the true emission value. Applying the prior that we are likely to be tightly centered around the mean prevents our model from being too confused about the state that is generating our observations.
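
To make the notation concrete, here is a minimal sketch (not the tutorial's actual model definition, which is collapsed in this diff) of how such an emission prior can be declared in Turing; note that `Normal(i, 0.5)` in Distributions.jl takes the standard deviation as its second argument:

```julia
using Turing

# Sketch only: declare an emission mean for each of K states,
# centered on the state's own index.
@model EmissionPriorSketch(K) = begin
    m = Vector(undef, K)
    for i in 1:K
        m[i] ~ Normal(i, 0.5)
    end
end
```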

@@ -120,8 +113,8 @@ The code below generates an animation showing the graph of the data above, and t
using StatsPlots

# Extract our m and s parameters from the chain.
m_set = c[:m].value.data
s_set = c[:s].value.data
m_set = MCMCChains.group(c, :m).value
s_set = MCMCChains.group(c, :s).value

# Iterate through the MCMC samples.
Ns = 1:length(c)
@@ -130,7 +123,7 @@
animation = @animate for i in Ns
m = m_set[i, :];
s = Int.(s_set[i,:]);
emissions = collect(skipmissing(m[s]))
emissions = m[s]

p = plot(y, c = :red,
size = (500, 250),
@@ -139,60 +132,32 @@ animation = @animate for i in Ns
legend = :topright, label = "True data",
xlim = (0,15),
ylim = (-1,5));
plot!(emissions, color = :blue, label = "Sample $$N")
end every 10;
plot!(emissions, color = :blue, label = "Sample $N")
end every 10
```

![animation](https://user-images.githubusercontent.com/422990/50612436-de588980-0e8e-11e9-8635-4e3e97c0d7f9.gif)

Looks like our model did a pretty good job, but we should also check to make sure our chain converges. A quick check is to examine whether the diagonal (representing the probability of remaining in the current state) of the transition matrix appears to be stationary. The code below extracts the diagonal and shows a traceplot of each persistence probability.


```julia
# Index the chain with the persistence probabilities.
subchain = c[:,["T[$$i][$$i]" for i in 1:K],:]
subchain = MCMCChains.group(c, :T)

# TODO: This doesn't work anymore. Not sure what it was originally doing.
# Plot the chain.
plot(subchain,
plot(
subchain,
colordim = :parameter,
seriestype=:traceplot,
title = "Persistence Probability",
legend=:right
)
)
```

![svg](/tutorials/4_BayesHmm_files/4_BayesHmm_11_0.svg)

A cursory examination of the traceplot above indicates that at least `T[3,3]` and possibly `T[2,2]` have converged to something resembling stationarity. `T[1,1]`, on the other hand, has a slight "wobble", and seems less consistent than the others. We can use the diagnostic functions provided by [MCMCChain](https://github.com/TuringLang/MCMCChain.jl) to engage in some formal tests, like the Heidelberg and Welch diagnostic:


```julia
heideldiag(c[:T])
heideldiag(MCMCChains.group(c, :T))
```

1-element Array{ChainDataFrame{NamedTuple{(:parameters, Symbol("Burn-in"), :Stationarity, Symbol("p-value"), :Mean, :Halfwidth, :Test),Tuple{Array{String,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1}}}},1}:
Heidelberger and Welch Diagnostic - Chain 1
  parameters  Burn-in  Stationarity  p-value    Mean  Halfwidth    Test
  ──────────  ───────  ────────────  ───────  ──────  ─────────  ──────
     T[1][1]  50.0000        0.0000   0.0001  0.5329     0.0063  1.0000
     T[1][2]  50.0000        0.0000   0.0189  0.1291     0.0043  1.0000
     T[1][3]  50.0000        0.0000   0.0230  0.3381     0.0032  1.0000
     T[2][1]  30.0000        1.0000   0.2757  0.0037     0.0000  1.0000
     T[2][2]   0.0000        1.0000   0.1689  0.0707     0.0022  1.0000
     T[2][3]   0.0000        1.0000   0.1365  0.9255     0.0022  1.0000
     T[3][1]  50.0000        0.0000   0.0454  0.4177     0.0147  1.0000
     T[3][2]  40.0000        1.0000   0.0909  0.2549     0.0080  1.0000
     T[3][3]  50.0000        0.0000   0.0098  0.3274     0.0067  1.0000

The p-values on the test suggest that we cannot reject the hypothesis that the observed sequence comes from a stationary distribution, so we can be somewhat more confident that our transition matrix has converged to something reasonable.
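
Heidelberger-Welch is only one of the diagnostics MCMCChains provides; as a further, hedged example, a Geweke diagnostic could be run on the same parameters:

```julia
# Sketch only (not part of the original tutorial): compare the means of the
# early and late portions of the chain; z-scores near zero are consistent
# with stationarity.
gewekediag(MCMCChains.group(c, :T))
```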
5 changes: 5 additions & 0 deletions tutorials/graphical-models/Project.toml
@@ -0,0 +1,5 @@
[deps]
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
43 changes: 5 additions & 38 deletions tutorials/introduction/01_introduction.jmd
@@ -53,18 +53,6 @@ data = rand(Bernoulli(p_true), last(Ns))
data[1:5]
```

5-element Array{Bool,1}:
 1
 0
 1
 1
 0

After flipping all our coins, we want to set a prior belief about what we think the distribution of coin flips looks like. In this case, we are going to choose a common prior distribution called the [Beta](https://en.wikipedia.org/wiki/Beta_distribution) distribution.


@@ -81,11 +69,11 @@ For the mathematically inclined, the `Beta` distribution is updated by adding ea

This works because the mean of the `Beta` distribution is defined as the following:

\$\$ \text{E}[\text{Beta}] = \dfrac{\alpha}{\alpha+\beta} \$\$
$$\text{E}[\text{Beta}] = \dfrac{\alpha}{\alpha+\beta}$$

This is 0.5 when $$\alpha = \beta$$, as we expect for a large enough number of coin flips. As we increase the number of samples, our variance will also decrease, such that the distribution will reflect less uncertainty about the probability of receiving a heads. The definition of the variance for the `Beta` distribution is the following:

\$\$ \text{var}[\text{Beta}] = \dfrac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)} \$\$
$$\text{var}[\text{Beta}] = \dfrac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$

The intuition about this definition is that the variance of the distribution will approach 0 with more and more samples, as the denominator will grow faster than will the numerator. More samples means less variance.
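
As a small worked example (with made-up flip counts, not part of the tutorial's code), the conjugate update can be checked directly with Distributions.jl:

```julia
using Distributions

prior = Beta(1, 1)                     # flat prior over the probability of heads
heads, tails = 53, 47                  # hypothetical outcome of 100 flips
posterior = Beta(prior.α + heads, prior.β + tails)

mean(posterior)  # ≈ 0.53, close to the empirical frequency
var(posterior)   # ≈ 0.0024, and it keeps shrinking as flips accumulate
```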

Expand All @@ -107,23 +95,16 @@ animation = @gif for (i, N) in enumerate(Ns)
# Plotting
plot(updated_belief,
size = (500, 250),
title = "Updated belief after $$N observations",
title = "Updated belief after $N observations",
xlabel = "probability of heads",
ylabel = "",
legend = nothing,
xlim = (0,1),
fill=0, α=0.3, w=3)
vline!([p_true])
end;
end
```

┌ Info: Saved animation to
│ fn = /home/cameron/code/TuringTutorials/tmp.gif
└ @ Plots /home/cameron/.julia/packages/Plots/Xnzc7/src/animation.jl:104


![animation](https://user-images.githubusercontent.com/7974003/44995702-37c1b200-af9c-11e8-8b26-c88a528956af.gif)

The animation above shows that with increasing evidence our belief about the probability of heads in a coin flip slowly adjusts towards the true value. The orange line in the animation represents the true probability of seeing heads on a single coin flip, while the mode of the distribution shows what the model believes the probability of a heads is given the evidence it has seen.
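
For reference, the exact posterior after all the flips can also be computed in closed form (a sketch, with Distributions.jl loaded as in the tutorial and reusing the `data` vector and the flat `Beta(1, 1)` prior):

```julia
heads = sum(data)
exact_posterior = Beta(1 + heads, 1 + length(data) - heads)
```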

### Coin Flipping With Turing
@@ -151,7 +132,7 @@ First, we define the coin-flip model using Turing.
@model coinflip(y) = begin

# Our prior belief about the probability of heads in a coin.
p ~ Beta(1, 1)
p ~ Beta(1, 1)

# The number of observations.
N = length(y)
@@ -184,13 +165,6 @@ p_summary = chain[:p]
plot(p_summary, seriestype = :histogram)
```

![svg](/tutorials/0_Introduction_files/0_Introduction_21_0.svg)

Now we can build our plot:


@@ -212,11 +186,4 @@ plot!(p, range(0, stop = 1, length = 100), pdf.(Ref(updated_belief), range(0, st
vline!(p, [p_true], label = "True probability", c = :red)
```

![svg](/tutorials/0_Introduction_files/0_Introduction_23_0.svg)

As we can see, the Turing model closely approximates the true probability. Hopefully this tutorial has provided an easy-to-follow, yet informative introduction to Turing's simpler applications. More advanced usage will be demonstrated in later tutorials.
84 changes: 15 additions & 69 deletions tutorials/introduction/02_gaussian-mixture-model.jmd
@@ -27,31 +27,24 @@ x = mapreduce(c -> rand(MvNormal([μs[c], μs[c]], 1.), N), hcat, 1:2)
scatter(x[1,:], x[2,:], legend = false, title = "Synthetic Dataset")
```

![svg](/tutorials/1_GaussianMixtureModel_files/1_GaussianMixtureModel_2_0.svg)

## Gaussian Mixture Model in Turing

To cluster the data points shown above, we use a model that consists of two mixture components (clusters) and assigns each datum to one of the components. The assignment thereof determines the distribution that the data point is generated from.

In particular, in a Bayesian Gaussian mixture model with $$1 \leq k \leq K$$ components for 1-D data, each data point $$x_i$$ with $$1 \leq i \leq N$$ is generated according to the following generative process.
First we draw the parameters for each cluster, i.e. in our example we draw the locations of the distributions from a Normal:
\$\$
\mu_k \sim Normal() \, , \; \forall k \\
\$\$
$$
\mu_k \sim \mathrm{Normal}() \, , \; \forall k
$$
and then draw the mixing weights for the $$K$$ clusters from a Dirichlet distribution, i.e.
\$\$
w \sim Dirichlet(K, \alpha) \, . \\
\$\$
$$
w \sim \mathrm{Dirichlet}(K, \alpha) \, .
$$
After having constructed all the necessary model parameters, we can generate an observation by first selecting one of the clusters and then drawing the datum accordingly, i.e.
\$\$
z_i \sim Categorical(w) \, , \; \forall i \\
x_i \sim Normal(\mu_{z_i}, 1.) \, , \; \forall i
\$\$
$$
z_i \sim \mathrm{Categorical}(w) \, , \; \forall i \\
x_i \sim \mathrm{Normal}(\mu_{z_i}, 1.) \, , \; \forall i
$$

For more details on Gaussian mixture models, we refer to Christopher M. Bishop, *Pattern Recognition and Machine Learning*, Section 9.
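
Read as code, the generative process above might look like the following sketch (the number of clusters, sample size, and `α` are illustrative values, and the names are local to this sketch rather than the tutorial's variables):

```julia
using Distributions, Random

Random.seed!(3)
n_clusters, n_points, α = 2, 30, 1.0
μ_sim = rand(Normal(), n_clusters)          # cluster locations, μ_k ~ Normal()
w_sim = rand(Dirichlet(n_clusters, α))      # mixing weights,    w ~ Dirichlet(K, α)
z_sim = rand(Categorical(w_sim), n_points)  # assignments,       z_i ~ Categorical(w)
x_sim = rand.(Normal.(μ_sim[z_sim], 1.0))   # observations,      x_i ~ Normal(μ_{z_i}, 1)
```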

@@ -60,21 +53,9 @@ using Turing, MCMCChains
using Turing, MCMCChains

# Turn off the progress monitor.
Turing.turnprogress(false)
Turing.turnprogress(false);
```

┌ Info: [Turing]: progress logging is disabled globally
└ @ Turing /home/cameron/.julia/packages/Turing/cReBm/src/Turing.jl:22

false

```julia
@model GaussianMixtureModel(x) = begin

@@ -107,13 +88,6 @@
end
```

##GaussianMixtureModel#361 (generic function with 2 methods)

After having specified the model in Turing, we can construct the model function and run an MCMC simulation to obtain assignments of the data points.
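
For reference, constructing the model and sampling might look roughly like the sketch below; the sampler composition and iteration count are illustrative assumptions, not necessarily the exact settings used in this tutorial:

```julia
# Sketch only: Particle Gibbs for the discrete assignments k, HMC for the
# continuous locations. This assumes μ1, μ2 and k are the parameters being
# sampled; adjust the Gibbs components if the model samples other variables.
gmm_model = GaussianMixtureModel(x)
gmm_sampler = Gibbs(PG(100, :k), HMC(0.05, 10, :μ1, :μ2))
tchain = sample(gmm_model, gmm_sampler, 100);
```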


@@ -139,17 +113,10 @@ In particular, in this example we consider the sample values of the location par


```julia
ids = findall(map(name -> occursin("μ", name), names(tchain)));
ids = findall(map(name -> occursin("μ", string(name)), names(tchain)));
p=plot(tchain[:, ids, :], legend=true, labels = ["Mu 1" "Mu 2"], colordim=:parameter)
```

![svg](/tutorials/1_GaussianMixtureModel_files/1_GaussianMixtureModel_13_0.svg)

You'll note here that it appears the location means are switching between chains. We will address this in future tutorials. For those who are keenly interested, see [this](https://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html) article on potential solutions.

For the moment, we will just use the first chain to ensure the validity of our inference.
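
One way to do this (a sketch; a `Chains` object can be indexed as `[iterations, parameters, chains]`):

```julia
# Keep only the first chain for the plots below.
tchain = tchain[:, :, 1];
```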
@@ -173,44 +140,23 @@ function predict(x, y, w, μ)
end
```




predict (generic function with 1 method)




```julia
contour(range(-5, stop = 3), range(-6, stop = 2),
(x, y) -> predict(x, y, [0.5, 0.5], [mean(tchain[:μ1].value), mean(tchain[:μ2].value)])
(x, y) -> predict(x, y, [0.5, 0.5], [mean(tchain[:μ1]), mean(tchain[:μ2])])
)
scatter!(x[1,:], x[2,:], legend = false, title = "Synthetic Dataset")
```

![svg](/tutorials/1_GaussianMixtureModel_files/1_GaussianMixtureModel_18_0.svg)

## Inferred Assignments

Finally, we can inspect the assignments of the data points inferred using Turing. As we can see, the dataset is partitioned into two distinct groups.


```julia
assignments = collect(skipmissing(mean(tchain[:k].value, dims=1).data))
# TODO: is there a better way than this icky `.nt.mean` stuff?
assignments = mean(MCMCChains.group(tchain, :k)).nt.mean
scatter(x[1,:], x[2,:],
legend = false,
title = "Assignments on Synthetic Dataset",
zcolor = assignments)
```

![svg](/tutorials/1_GaussianMixtureModel_files/1_GaussianMixtureModel_21_0.svg)

7 changes: 7 additions & 0 deletions tutorials/introduction/Project.toml
@@ -0,0 +1,7 @@
[deps]
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
MCMCChains = "c7f686f2-ff18-58e9-bc7b-31028e88f75d"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"