Merge pull request #87 from TuringLang/tor/using-jmd
[WIP] Implementation of #86
torfjelde authored Sep 25, 2020
2 parents 4339740 + 6d38bcd commit f780124
Showing 19 changed files with 251 additions and 1,942 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -9,4 +9,4 @@ Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Weave = "44d3d7a6-8a23-5bf8-98c5-b353f8df5ec9"

[compat]
julia = "1"
julia = "1"
109 changes: 13 additions & 96 deletions tutorials/bayesian-deep-learning/01_bayesian-neural-network.jmd

Large diffs are not rendered by default.

8 changes: 8 additions & 0 deletions tutorials/bayesian-deep-learning/Project.toml
@@ -0,0 +1,8 @@
[deps]
AdvancedVI = "b5ca4192-6429-45e5-a2d9-87aec30a685c"
Bijectors = "76274a88-744f-5084-9051-94815aaf08c4"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
ReverseDiff = "37e2e3b7-166d-5795-8a7a-e32c996b4267"
Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
427 changes: 2 additions & 425 deletions tutorials/differential-equations/01_bayesian-diff-eq.jmd

Large diffs are not rendered by default.

12 changes: 12 additions & 0 deletions tutorials/differential-equations/Project.toml
@@ -0,0 +1,12 @@
[deps]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
DiffEqBase = "2b5f629d-d688-5b77-993f-72d75c75574e"
DiffEqBayes = "ebbdde9d-f333-5424-9be2-dbf1e9acfb5e"
DiffEqSensitivity = "41bf760c-e81c-5289-8e54-58b1f1f8abe2"
DifferentialEquations = "0c46a032-eb83-5123-abaf-570d42b7fbaa"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
MCMCChains = "c7f686f2-ff18-58e9-bc7b-31028e88f75d"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
63 changes: 14 additions & 49 deletions tutorials/graphical-models/01_hidden-markov-model.jmd
@@ -36,13 +36,6 @@ N = length(y); K = 3;
plot(y, xlim = (0,15), ylim = (-1,5), size = (500, 250))
```

![svg](/tutorials/4_BayesHmm_files/4_BayesHmm_3_0.svg)

We can see that we have three states, one for each height of the plot (1, 2, 3). This height is also our emission parameter, so state one produces a value of one, state two produces a value of two, and so on.

Ultimately, we would like to understand three major parameters:
@@ -53,9 +46,9 @@

With this in mind, let's set up our model. We are going to use some of our knowledge as modelers to provide additional information about our system. This takes the form of the prior on our emission parameter.

\$\$
m_i \sim Normal(i, 0.5), \space m = \{1,2,3\}
\$\$
$$
m_i \sim \mathrm{Normal}(i, 0.5) \quad \text{where} \quad m = \{1,2,3\}
$$

Simply put, this says that we expect state one to emit values in a Normally distributed manner, where the mean of each state's emissions is that state's value. The variance of 0.5 helps the model converge more quickly — consider the case where we have a variance of 1 or 2. In this case, the likelihood of observing a 2 when we are in state 1 is actually quite high, as it is within a standard deviation of the true emission value. Applying the prior that we are likely to be tightly centered around the mean prevents our model from being too confused about the state that is generating our observations.
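
To make the notation concrete, here is a minimal sketch (not the tutorial's actual model definition, which is collapsed in this diff) of how such an emission prior can be declared in Turing; note that `Normal(i, 0.5)` in Distributions.jl takes the standard deviation as its second argument:

```julia
using Turing

# Sketch only: declare an emission mean for each of K states,
# centered on the state's own index.
@model EmissionPriorSketch(K) = begin
    m = Vector(undef, K)
    for i in 1:K
        m[i] ~ Normal(i, 0.5)
    end
end
```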

@@ -120,8 +113,8 @@ The code below generates an animation showing the graph of the data above, and t
using StatsPlots

# Extract our m and s parameters from the chain.
m_set = c[:m].value.data
s_set = c[:s].value.data
m_set = MCMCChains.group(c, :m).value
s_set = MCMCChains.group(c, :s).value

# Iterate through the MCMC samples.
Ns = 1:length(c)
@@ -130,7 +123,7 @@
animation = @animate for i in Ns
m = m_set[i, :];
s = Int.(s_set[i,:]);
emissions = collect(skipmissing(m[s]))
emissions = m[s]

p = plot(y, c = :red,
size = (500, 250),
@@ -139,60 +132,32 @@ animation = @animate for i in Ns
legend = :topright, label = "True data",
xlim = (0,15),
ylim = (-1,5));
plot!(emissions, color = :blue, label = "Sample $$N")
end every 10;
plot!(emissions, color = :blue, label = "Sample $N")
end every 10
```

![animation](https://user-images.githubusercontent.com/422990/50612436-de588980-0e8e-11e9-8635-4e3e97c0d7f9.gif)

Looks like our model did a pretty good job, but we should also check to make sure our chain converges. A quick check is to examine whether the diagonal (representing the probability of remaining in the current state) of the transition matrix appears to be stationary. The code below extracts the diagonal and shows a traceplot of each persistence probability.


```julia
# Index the chain with the persistence probabilities.
subchain = c[:,["T[$$i][$$i]" for i in 1:K],:]
subchain = MCMCChains.group(c, :T)

# TODO: This doesn't work anymore. Not sure what it was originally doing.
# Plot the chain.
plot(subchain,
plot(
subchain,
colordim = :parameter,
seriestype=:traceplot,
title = "Persistence Probability",
legend=:right
)
)
```

![svg](/tutorials/4_BayesHmm_files/4_BayesHmm_11_0.svg)

A cursory examination of the traceplot above indicates that at least `T[3,3]` and possibly `T[2,2]` have converged to something resembling stationarity. `T[1,1]`, on the other hand, has a slight "wobble", and seems less consistent than the others. We can use the diagnostic functions provided by [MCMCChain](https://github.com/TuringLang/MCMCChain.jl) to engage in some formal tests, like the Heidelberg and Welch diagnostic:


```julia
heideldiag(c[:T])
heideldiag(MCMCChains.group(c, :T))
```

1-element Array{ChainDataFrame{NamedTuple{(:parameters, Symbol("Burn-in"), :Stationarity, Symbol("p-value"), :Mean, :Halfwidth, :Test),Tuple{Array{String,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1}}}},1}:
Heidelberger and Welch Diagnostic - Chain 1
  parameters  Burn-in  Stationarity  p-value    Mean  Halfwidth    Test
  ──────────  ───────  ────────────  ───────  ──────  ─────────  ──────
     T[1][1]  50.0000        0.0000   0.0001  0.5329     0.0063  1.0000
     T[1][2]  50.0000        0.0000   0.0189  0.1291     0.0043  1.0000
     T[1][3]  50.0000        0.0000   0.0230  0.3381     0.0032  1.0000
     T[2][1]  30.0000        1.0000   0.2757  0.0037     0.0000  1.0000
     T[2][2]   0.0000        1.0000   0.1689  0.0707     0.0022  1.0000
     T[2][3]   0.0000        1.0000   0.1365  0.9255     0.0022  1.0000
     T[3][1]  50.0000        0.0000   0.0454  0.4177     0.0147  1.0000
     T[3][2]  40.0000        1.0000   0.0909  0.2549     0.0080  1.0000
     T[3][3]  50.0000        0.0000   0.0098  0.3274     0.0067  1.0000

The p-values on the test suggest that we cannot reject the hypothesis that the observed sequence comes from a stationary distribution, so we can be somewhat more confident that our transition matrix has converged to something reasonable.
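
Heidelberger-Welch is only one of the diagnostics MCMCChains provides; as a further, hedged example, a Geweke diagnostic could be run on the same parameters:

```julia
# Sketch only (not part of the original tutorial): compare the means of the
# early and late portions of the chain; z-scores near zero are consistent
# with stationarity.
gewekediag(MCMCChains.group(c, :T))
```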
5 changes: 5 additions & 0 deletions tutorials/graphical-models/Project.toml
@@ -0,0 +1,5 @@
[deps]
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
43 changes: 5 additions & 38 deletions tutorials/introduction/01_introduction.jmd
@@ -53,18 +53,6 @@ data = rand(Bernoulli(p_true), last(Ns))
data[1:5]
```

5-element Array{Bool,1}:
 1
 0
 1
 1
 0

After flipping all our coins, we want to set a prior belief about what we think the distribution of coin flips looks like. In this case, we are going to choose a common prior distribution called the [Beta](https://en.wikipedia.org/wiki/Beta_distribution) distribution.


@@ -81,11 +69,11 @@ For the mathematically inclined, the `Beta` distribution is updated by adding ea

This works because the mean of the `Beta` distribution is defined as the following:

\$\$ \text{E}[\text{Beta}] = \dfrac{\alpha}{\alpha+\beta} \$\$
$$\text{E}[\text{Beta}] = \dfrac{\alpha}{\alpha+\beta}$$

This is 0.5 when $$\alpha = \beta$$, as we expect for a large enough number of coin flips. As we increase the number of samples, our variance will also decrease, such that the distribution will reflect less uncertainty about the probability of receiving a heads. The definition of the variance for the `Beta` distribution is the following:

\$\$ \text{var}[\text{Beta}] = \dfrac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)} \$\$
$$\text{var}[\text{Beta}] = \dfrac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$

The intuition about this definition is that the variance of the distribution will approach 0 with more and more samples, as the denominator will grow faster than will the numerator. More samples means less variance.
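
As a small worked example (with made-up flip counts, not part of the tutorial's code), the conjugate update can be checked directly with Distributions.jl:

```julia
using Distributions

prior = Beta(1, 1)                     # flat prior over the probability of heads
heads, tails = 53, 47                  # hypothetical outcome of 100 flips
posterior = Beta(prior.α + heads, prior.β + tails)

mean(posterior)  # ≈ 0.53, close to the empirical frequency
var(posterior)   # ≈ 0.0024, and it keeps shrinking as flips accumulate
```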

Expand All @@ -107,23 +95,16 @@ animation = @gif for (i, N) in enumerate(Ns)
# Plotting
plot(updated_belief,
size = (500, 250),
title = "Updated belief after $$N observations",
title = "Updated belief after $N observations",
xlabel = "probability of heads",
ylabel = "",
legend = nothing,
xlim = (0,1),
fill=0, α=0.3, w=3)
vline!([p_true])
end;
end
```

┌ Info: Saved animation to
│ fn = /home/cameron/code/TuringTutorials/tmp.gif
└ @ Plots /home/cameron/.julia/packages/Plots/Xnzc7/src/animation.jl:104


![animation](https://user-images.githubusercontent.com/7974003/44995702-37c1b200-af9c-11e8-8b26-c88a528956af.gif)

The animation above shows that with increasing evidence our belief about the probability of heads in a coin flip slowly adjusts towards the true value. The orange line in the animation represents the true probability of seeing heads on a single coin flip, while the mode of the distribution shows what the model believes the probability of a heads is given the evidence it has seen.
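
For reference, the exact posterior after all the flips can also be computed in closed form (a sketch, with Distributions.jl loaded as in the tutorial and reusing the `data` vector and the flat `Beta(1, 1)` prior):

```julia
heads = sum(data)
exact_posterior = Beta(1 + heads, 1 + length(data) - heads)
```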

### Coin Flipping With Turing
@@ -151,7 +132,7 @@ First, we define the coin-flip model using Turing.
@model coinflip(y) = begin

# Our prior belief about the probability of heads in a coin.
p ~ Beta(1, 1)
p ~ Beta(1, 1)

# The number of observations.
N = length(y)
@@ -184,13 +165,6 @@ p_summary = chain[:p]
plot(p_summary, seriestype = :histogram)
```

![svg](/tutorials/0_Introduction_files/0_Introduction_21_0.svg)

Now we can build our plot:


@@ -212,11 +186,4 @@ plot!(p, range(0, stop = 1, length = 100), pdf.(Ref(updated_belief), range(0, st
vline!(p, [p_true], label = "True probability", c = :red)
```

![svg](/tutorials/0_Introduction_files/0_Introduction_23_0.svg)

As we can see, the Turing model closely approximates the true probability. Hopefully this tutorial has provided an easy-to-follow, yet informative introduction to Turing's simpler applications. More advanced usage will be demonstrated in later tutorials.
84 changes: 15 additions & 69 deletions tutorials/introduction/02_gaussian-mixture-model.jmd
@@ -27,31 +27,24 @@ x = mapreduce(c -> rand(MvNormal([μs[c], μs[c]], 1.), N), hcat, 1:2)
scatter(x[1,:], x[2,:], legend = false, title = "Synthetic Dataset")
```

![svg](/tutorials/1_GaussianMixtureModel_files/1_GaussianMixtureModel_2_0.svg)

## Gaussian Mixture Model in Turing

To cluster the data points shown above, we use a model that consists of two mixture components (clusters) and assigns each datum to one of the components. The assignment thereof determines the distribution that the data point is generated from.

In particular, in a Bayesian Gaussian mixture model with $$1 \leq k \leq K$$ components for 1-D data, each data point $$x_i$$ with $$1 \leq i \leq N$$ is generated according to the following generative process.
First we draw the parameters for each cluster, i.e. in our example we draw the locations of the distributions from a Normal:
\$\$
\mu_k \sim Normal() \, , \; \forall k \\
\$\$
$$
\mu_k \sim \mathrm{Normal}() \, , \; \forall k
$$
and then draw the mixing weights for the $$K$$ clusters from a Dirichlet distribution, i.e.
\$\$
w \sim Dirichlet(K, \alpha) \, . \\
\$\$
$$
w \sim \mathrm{Dirichlet}(K, \alpha) \, .
$$
After having constructed all the necessary model parameters, we can generate an observation by first selecting one of the clusters and then drawing the datum accordingly, i.e.
\$\$
z_i \sim Categorical(w) \, , \; \forall i \\
x_i \sim Normal(\mu_{z_i}, 1.) \, , \; \forall i
\$\$
$$
z_i \sim \mathrm{Categorical}(w) \, , \; \forall i \\
x_i \sim \mathrm{Normal}(\mu_{z_i}, 1.) \, , \; \forall i
$$

For more details on Gaussian mixture models, we refer to Christopher M. Bishop, *Pattern Recognition and Machine Learning*, Section 9.
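
Read as code, the generative process above might look like the following sketch (the number of clusters, sample size, and `α` are illustrative values, and the names are local to this sketch rather than the tutorial's variables):

```julia
using Distributions, Random

Random.seed!(3)
n_clusters, n_points, α = 2, 30, 1.0
μ_sim = rand(Normal(), n_clusters)          # cluster locations, μ_k ~ Normal()
w_sim = rand(Dirichlet(n_clusters, α))      # mixing weights,    w ~ Dirichlet(K, α)
z_sim = rand(Categorical(w_sim), n_points)  # assignments,       z_i ~ Categorical(w)
x_sim = rand.(Normal.(μ_sim[z_sim], 1.0))   # observations,      x_i ~ Normal(μ_{z_i}, 1)
```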

@@ -60,21 +53,9 @@ using Turing, MCMCChains
using Turing, MCMCChains

# Turn off the progress monitor.
Turing.turnprogress(false)
Turing.turnprogress(false);
```

┌ Info: [Turing]: progress logging is disabled globally
└ @ Turing /home/cameron/.julia/packages/Turing/cReBm/src/Turing.jl:22

false

```julia
@model GaussianMixtureModel(x) = begin

@@ -107,13 +88,6 @@
end
```

##GaussianMixtureModel#361 (generic function with 2 methods)

After having specified the model in Turing, we can construct the model function and run an MCMC simulation to obtain assignments of the data points.
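
For reference, constructing the model and sampling might look roughly like the sketch below; the sampler composition and iteration count are illustrative assumptions, not necessarily the exact settings used in this tutorial:

```julia
# Sketch only: Particle Gibbs for the discrete assignments k, HMC for the
# continuous locations. This assumes μ1, μ2 and k are the parameters being
# sampled; adjust the Gibbs components if the model samples other variables.
gmm_model = GaussianMixtureModel(x)
gmm_sampler = Gibbs(PG(100, :k), HMC(0.05, 10, :μ1, :μ2))
tchain = sample(gmm_model, gmm_sampler, 100);
```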


@@ -139,17 +113,10 @@ In particular, in this example we consider the sample values of the location par


```julia
ids = findall(map(name -> occursin("μ", name), names(tchain)));
ids = findall(map(name -> occursin("μ", string(name)), names(tchain)));
p=plot(tchain[:, ids, :], legend=true, labels = ["Mu 1" "Mu 2"], colordim=:parameter)
```

![svg](/tutorials/1_GaussianMixtureModel_files/1_GaussianMixtureModel_13_0.svg)

You'll note here that it appears the location means are switching between chains. We will address this in future tutorials. For those who are keenly interested, see [this](https://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html) article on potential solutions.

For the moment, we will just use the first chain to ensure the validity of our inference.
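
One way to do this (a sketch; a `Chains` object can be indexed as `[iterations, parameters, chains]`):

```julia
# Keep only the first chain for the plots below.
tchain = tchain[:, :, 1];
```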
@@ -173,44 +140,23 @@ function predict(x, y, w, μ)
end
```




predict (generic function with 1 method)




```julia
contour(range(-5, stop = 3), range(-6, stop = 2),
(x, y) -> predict(x, y, [0.5, 0.5], [mean(tchain[:μ1].value), mean(tchain[:μ2].value)])
(x, y) -> predict(x, y, [0.5, 0.5], [mean(tchain[:μ1]), mean(tchain[:μ2])])
)
scatter!(x[1,:], x[2,:], legend = false, title = "Synthetic Dataset")
```

![svg](/tutorials/1_GaussianMixtureModel_files/1_GaussianMixtureModel_18_0.svg)

## Inferred Assignments

Finally, we can inspect the assignments of the data points inferred using Turing. As we can see, the dataset is partitioned into two distinct groups.


```julia
assignments = collect(skipmissing(mean(tchain[:k].value, dims=1).data))
# TODO: is there a better way than this icky `.nt.mean` stuff?
assignments = mean(MCMCChains.group(tchain, :k)).nt.mean
scatter(x[1,:], x[2,:],
legend = false,
title = "Assignments on Synthetic Dataset",
zcolor = assignments)
```

![svg](/tutorials/1_GaussianMixtureModel_files/1_GaussianMixtureModel_21_0.svg)

7 changes: 7 additions & 0 deletions tutorials/introduction/Project.toml
@@ -0,0 +1,7 @@
[deps]
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
MCMCChains = "c7f686f2-ff18-58e9-bc7b-31028e88f75d"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"