diff --git a/search.json b/search.json index 01c8eebf6..a9f547ca2 100644 --- a/search.json +++ b/search.json @@ -723,7 +723,7 @@ "href": "tutorials/03-bayesian-neural-network/index.html", "title": "Bayesian Neural Networks", "section": "", - "text": "In this tutorial, we demonstrate how one can implement a Bayesian Neural Network using a combination of Turing and Flux, a suite of machine learning tools. We will use Flux to specify the neural network’s layers and Turing to implement the probabilistic inference, with the goal of implementing a classification algorithm.\nWe will begin with importing the relevant libraries.\nusing Turing\nusing FillArrays\nusing Flux\nusing Plots\nusing ReverseDiff\n\nusing LinearAlgebra\nusing Random\nOur goal here is to use a Bayesian neural network to classify points in an artificial dataset. The code below generates data points arranged in a box-like pattern and displays a graph of the dataset we will be working with.\n# Number of points to generate.\nN = 80\nM = round(Int, N / 4)\nRandom.seed!(1234)\n\n# Generate artificial data.\nx1s = rand(M) * 4.5;\nx2s = rand(M) * 4.5;\nxt1s = Array([[x1s[i] + 0.5; x2s[i] + 0.5] for i in 1:M])\nx1s = rand(M) * 4.5;\nx2s = rand(M) * 4.5;\nappend!(xt1s, Array([[x1s[i] - 5; x2s[i] - 5] for i in 1:M]))\n\nx1s = rand(M) * 4.5;\nx2s = rand(M) * 4.5;\nxt0s = Array([[x1s[i] + 0.5; x2s[i] - 5] for i in 1:M])\nx1s = rand(M) * 4.5;\nx2s = rand(M) * 4.5;\nappend!(xt0s, Array([[x1s[i] - 5; x2s[i] + 0.5] for i in 1:M]))\n\n# Store all the data for later.\nxs = [xt1s; xt0s]\nts = [ones(2 * M); zeros(2 * M)]\n\n# Convert xs to Float32\nxs = hcat(xs...)\nxs = convert(Array{Float32}, xs)\n\n# Plot data points.\nfunction plot_data()\n x1 = map(e -> e[1], xt1s)\n y1 = map(e -> e[2], xt1s)\n x2 = map(e -> e[1], xt0s)\n y2 = map(e -> e[2], xt0s)\n\n Plots.scatter(x1, y1; color=\"red\", clim=(0, 1))\n return Plots.scatter!(x2, y2; color=\"blue\", clim=(0, 1))\nend\n\nplot_data()", + "text": "In this tutorial, we 
demonstrate how one can implement a Bayesian Neural Network using a combination of Turing and Flux, a suite of machine learning tools. We will use Flux to specify the neural network’s layers and Turing to implement the probabilistic inference, with the goal of implementing a classification algorithm.\nWe will begin with importing the relevant libraries.\nusing Turing\nusing FillArrays\nusing Lux\nusing Plots\nusing Tracker\nusing Functors\n\nusing LinearAlgebra\nusing Random\nOur goal here is to use a Bayesian neural network to classify points in an artificial dataset. The code below generates data points arranged in a box-like pattern and displays a graph of the dataset we will be working with.\n# Number of points to generate\nN = 80\nM = round(Int, N / 4)\nrng = Random.default_rng()\nRandom.seed!(rng, 1234)\n\n# Generate artificial data\nx1s = rand(rng, Float32, M) * 4.5f0;\nx2s = rand(rng, Float32, M) * 4.5f0;\nxt1s = Array([[x1s[i] + 0.5f0; x2s[i] + 0.5f0] for i in 1:M])\nx1s = rand(rng, Float32, M) * 4.5f0;\nx2s = rand(rng, Float32, M) * 4.5f0;\nappend!(xt1s, Array([[x1s[i] - 5.0f0; x2s[i] - 5.0f0] for i in 1:M]))\n\nx1s = rand(rng, Float32, M) * 4.5f0;\nx2s = rand(rng, Float32, M) * 4.5f0;\nxt0s = Array([[x1s[i] + 0.5f0; x2s[i] - 5.0f0] for i in 1:M])\nx1s = rand(rng, Float32, M) * 4.5f0;\nx2s = rand(rng, Float32, M) * 4.5f0;\nappend!(xt0s, Array([[x1s[i] - 5.0f0; x2s[i] + 0.5f0] for i in 1:M]))\n\n# Store all the data for later\nxs = [xt1s; xt0s]\nts = [ones(2 * M); zeros(2 * M)]\n\n# Plot data points.\nfunction plot_data()\n x1 = map(e -> e[1], xt1s)\n y1 = map(e -> e[2], xt1s)\n x2 = map(e -> e[1], xt0s)\n y2 = map(e -> e[2], xt0s)\n\n Plots.scatter(x1, y1; color=\"red\", clim=(0, 1))\n return Plots.scatter!(x2, y2; color=\"blue\", clim=(0, 1))\nend\n\nplot_data()", "crumbs": [ "Documentation", "Using Turing - Tutorials", @@ -735,7 +735,7 @@ "href": "tutorials/03-bayesian-neural-network/index.html#building-a-neural-network", "title": "Bayesian Neural 
Networks", "section": "Building a Neural Network", - "text": "Building a Neural Network\nThe next step is to define a feedforward neural network where we express our parameters as distributions, and not single points as with traditional neural networks. For this we will use Dense to define liner layers and compose them via Chain, both are neural network primitives from Flux. The network nn_initial we created has two hidden layers with tanh activations and one output layer with sigmoid (σ) activation, as shown below.\n\n\n\n\n\n\n\nG\n\nInput layer                   Hidden layers                  Output layer\n\ncluster_input\n\n\n\ncluster_hidden1\n\n\n\ncluster_hidden2\n\n\n\ncluster_output\n\n\n\n\ninput1\n\n\n\n\nhidden11\n\n\n\n\ninput1--hidden11\n\n\n\n\nhidden12\n\n\n\n\ninput1--hidden12\n\n\n\n\nhidden13\n\n\n\n\ninput1--hidden13\n\n\n\n\ninput2\n\n\n\n\ninput2--hidden11\n\n\n\n\ninput2--hidden12\n\n\n\n\ninput2--hidden13\n\n\n\n\nhidden21\n\n\n\n\nhidden11--hidden21\n\n\n\n\nhidden22\n\n\n\n\nhidden11--hidden22\n\n\n\n\nhidden12--hidden21\n\n\n\n\nhidden12--hidden22\n\n\n\n\nhidden13--hidden21\n\n\n\n\nhidden13--hidden22\n\n\n\n\noutput1\n\n\n\n\nhidden21--output1\n\n\n\n\nhidden22--output1\n\n\n\n\n\n\n\n\n\nThe nn_initial is an instance that acts as a function and can take data as inputs and output predictions. We will define distributions on the neural network parameters and use destructure from Flux to extract the parameters as parameters_initial. 
The function destructure also returns another function reconstruct that can take (new) parameters in and return us a neural network instance whose architecture is the same as nn_initial but with updated parameters.\n\n# Construct a neural network using Flux\nnn_initial = Chain(Dense(2, 3, tanh), Dense(3, 2, tanh), Dense(2, 1, σ)) |> f32\n\n# Extract weights and a helper function to reconstruct NN from weights\nparameters_initial, reconstruct = Flux.destructure(nn_initial)\n\nlength(parameters_initial) # number of paraemters in NN\n\n20\n\n\nThe probabilistic model specification below creates a parameters variable, which has IID normal variables. The parameters vector represents all parameters of our neural net (weights and biases).\n\n@model function bayes_nn(xs, ts, nparameters, reconstruct; alpha=0.09)\n # Create the weight and bias vector.\n parameters ~ MvNormal(Zeros(nparameters), I / alpha)\n\n # Construct NN from parameters\n nn = reconstruct(parameters)\n # Forward NN to make predictions\n preds = nn(xs)\n\n # Observe each prediction.\n for i in 1:length(ts)\n ts[i] ~ Bernoulli(preds[i])\n end\nend;\n\nInference can now be performed by calling sample. We use the NUTS Hamiltonian Monte Carlo sampler here.\n\nsetprogress!(false)\n\n\n# Perform inference.\nN = 5000\nch = sample(bayes_nn(xs, ts, length(parameters_initial), reconstruct), NUTS(;adtype=AutoReverseDiff()), N);\n\n┌ Info: Found initial step size\n└ ϵ = 0.2\n\n\nNow we extract the parameter samples from the sampled chain as theta (this is of size 5000 x 20 where 5000 is the number of iterations and 20 is the number of parameters). 
We’ll use these primarily to determine how good our model’s classifier is.\n\n# Extract all weight and bias parameters.\ntheta = convert(Array{Float32}, MCMCChains.group(ch, :parameters).value)\n\n5000×20×1 Array{Float32, 3}:\n[:, :, 1] =\n -2.76754 4.33585 0.202649 … -2.83911 5.0453 3.58483 3.63858\n -1.22286 2.81332 0.647483 -4.37165 4.60901 6.17425 5.1122\n -1.28501 3.35274 0.2071 -3.39033 5.12901 5.3266 5.09537\n -0.939225 3.84344 1.6232 -3.00103 5.19338 5.38463 5.00454\n -1.10654 7.12114 0.261793 -3.91847 4.43435 4.35116 2.63475\n -2.31501 6.4595 0.480459 … -3.29219 2.28614 3.00919 2.63038\n -0.188994 2.56895 -0.0450458 -4.24586 5.93678 6.58049 5.05256\n -2.7119 3.76934 -0.70893 -4.79915 4.10487 3.6214 3.52587\n -0.0057128 2.371 -0.251854 -5.65489 5.08035 6.32055 5.58202\n 0.176594 2.81241 -1.13014 -5.36425 5.5353 5.81991 5.45871\n ⋮ ⋱ \n 0.614511 6.33012 -0.263359 -2.76436 5.37732 5.38015 3.49351\n 0.321701 1.60552 1.30103 -3.79229 3.78463 3.42431 3.463\n 0.468764 1.42767 1.4261 -3.73313 3.7815 3.49601 3.22722\n 2.85275 6.0128 0.156228 -3.18823 5.69763 7.10259 5.52247\n -6.92752 2.60902 0.356687 … -2.25084 6.97216 4.37522 4.86558\n -2.60802 5.23288 0.324784 -2.26749 4.97215 6.90851 5.91779\n -2.68243 6.44173 -0.888499 -2.83151 4.6619 6.69534 5.54244\n 0.248891 5.80957 2.21389 -3.42257 4.88962 6.57245 6.11376\n -1.16777 9.65887 -0.692307 -3.36406 5.24506 4.36318 4.32127", + "text": "Building a Neural Network\nThe next step is to define a feedforward neural network where we express our parameters as distributions, and not single points as with traditional neural networks. For this we will use Dense to define liner layers and compose them via Chain, both are neural network primitives from Lux. 
The network nn_initial we created has two hidden layers with tanh activations and one output layer with sigmoid (σ) activation, as shown below.\n\n\n\n\n\n\n\nG\n\nInput layer                   Hidden layers                  Output layer\n\ncluster_input\n\n\n\ncluster_hidden1\n\n\n\ncluster_hidden2\n\n\n\ncluster_output\n\n\n\n\ninput1\n\n\n\n\nhidden11\n\n\n\n\ninput1--hidden11\n\n\n\n\nhidden12\n\n\n\n\ninput1--hidden12\n\n\n\n\nhidden13\n\n\n\n\ninput1--hidden13\n\n\n\n\ninput2\n\n\n\n\ninput2--hidden11\n\n\n\n\ninput2--hidden12\n\n\n\n\ninput2--hidden13\n\n\n\n\nhidden21\n\n\n\n\nhidden11--hidden21\n\n\n\n\nhidden22\n\n\n\n\nhidden11--hidden22\n\n\n\n\nhidden12--hidden21\n\n\n\n\nhidden12--hidden22\n\n\n\n\nhidden13--hidden21\n\n\n\n\nhidden13--hidden22\n\n\n\n\noutput1\n\n\n\n\nhidden21--output1\n\n\n\n\nhidden22--output1\n\n\n\n\n\n\n\n\n\nThe nn_initial is an instance that acts as a function and can take data as inputs and output predictions. We will define distributions on the neural network parameters. \n\n# Construct a neural network using Lux\nnn_initial = Chain(Dense(2 => 3, tanh), Dense(3 => 2, tanh), Dense(2 => 1, σ))\n\n# Initialize the model weights and state\nps, st = Lux.setup(rng, nn_initial)\n\nLux.parameterlength(nn_initial) # number of paraemters in NN\n\n20\n\n\nThe probabilistic model specification below creates a parameters variable, which has IID normal variables. The parameters vector represents all parameters of our neural net (weights and biases).\n\n# Create a regularization term and a Gaussian prior variance term.\nalpha = 0.09\nsigma = sqrt(1.0 / alpha)\n\n3.3333333333333335\n\n\nConstruct named tuple from a sampled parameter vector. We could also use ComponentArrays here and simply broadcast to avoid doing this. 
But let’s do it this way to avoid dependencies.\n\nfunction vector_to_parameters(ps_new::AbstractVector, ps::NamedTuple)\n @assert length(ps_new) == Lux.parameterlength(ps)\n i = 1\n function get_ps(x)\n z = reshape(view(ps_new, i:(i + length(x) - 1)), size(x))\n i += length(x)\n return z\n end\n return fmap(get_ps, ps)\nend\n\nvector_to_parameters (generic function with 1 method)\n\n\nTo interface with external libraries it is often desirable to use the StatefulLuxLayer to automatically handle the neural network states.\n\nconst nn = StatefulLuxLayer(nn_initial, st)\n\n# Specify the probabilistic model.\n@model function bayes_nn(xs, ts; sigma = sigma, ps = ps, nn = nn)\n # Sample the parameters\n nparameters = Lux.parameterlength(nn_initial)\n parameters ~ MvNormal(zeros(nparameters), Diagonal(abs2.(sigma .* ones(nparameters))))\n\n # Forward NN to make predictions\n preds = Lux.apply(nn, xs, vector_to_parameters(parameters, ps))\n\n # Observe each prediction.\n for i in eachindex(ts)\n ts[i] ~ Bernoulli(preds[i])\n end\nend\n\nbayes_nn (generic function with 2 methods)\n\n\nInference can now be performed by calling sample. We use the NUTS Hamiltonian Monte Carlo sampler here.\n\nsetprogress!(false)\n\n\n# Perform inference.\nN = 2_000\nch = sample(bayes_nn(reduce(hcat, xs), ts), NUTS(; adtype=AutoTracker()), N);\n\n┌ Info: Found initial step size\n└ ϵ = 0.4\n\n\nNow we extract the parameter samples from the sampled chain as θ (this is of size 5000 x 20 where 5000 is the number of iterations and 20 is the number of parameters). 
We’ll use these primarily to determine how good our model’s classifier is.\n\n# Extract all weight and bias parameters.\nθ = MCMCChains.group(ch, :parameters).value;", "crumbs": [ "Documentation", "Using Turing - Tutorials", @@ -747,7 +747,7 @@ "href": "tutorials/03-bayesian-neural-network/index.html#prediction-visualization", "title": "Bayesian Neural Networks", "section": "Prediction Visualization", - "text": "Prediction Visualization\nWe can use MAP estimation to classify our population by using the set of weights that provided the highest log posterior.\n\n# A helper to create NN from weights `theta` and run it through data `x`\nnn_forward(x, theta) = reconstruct(theta)(x)\n\n# Plot the data we have.\nplot_data()\n\n# Find the index that provided the highest log posterior in the chain.\n_, i = findmax(ch[:lp])\n\n# Extract the max row value from i.\ni = i.I[1]\n\n# Plot the posterior distribution with a contour plot\nx1_range = collect(range(-6; stop=6, length=25))\nx2_range = collect(range(-6; stop=6, length=25))\nZ = [nn_forward(Float32[x1, x2], theta[i, :])[1] for x1 in x1_range, x2 in x2_range]\ncontour!(x1_range, x2_range, Z)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nThe contour plot above shows that the MAP method is not too bad at classifying our data.\nNow we can visualize our predictions.\n\\[\np(\\tilde{x} | X, \\alpha) = \\int_{\\theta} p(\\tilde{x} | \\theta) p(\\theta | X, \\alpha) \\approx \\sum_{\\theta \\sim p(\\theta | X, \\alpha)}f_{\\theta}(\\tilde{x})\n\\]\nThe nn_predict function takes the average predicted value from a network parameterized by weights drawn from the MCMC chain.\n\n# Return the average predicted 
value across\n# multiple weights.\nfunction nn_predict(x, theta, num)\n x = convert(Vector{Float32}, x) # Ensure x is Float32\n return mean([nn_forward(x, theta[i, :])[1] for i in 1:10:num])\nend\n\nnn_predict (generic function with 1 method)\n\n\nNext, we use the nn_predict function to predict the value at a sample of points where the x1 and x2 coordinates range between -6 and 6. As we can see below, we still have a satisfactory fit to our data, and more importantly, we can also see where the neural network is uncertain about its predictions much easier—those regions between cluster boundaries.\n\n# Plot the average prediction.\nplot_data()\n\nn_end = 1500\nx1_range = collect(range(-6, stop=6, length=25))\nx2_range = collect(range(-6, stop=6, length=25))\n\n# Ensure x1, x2 are Float32 within the comprehension\nZ = [nn_predict(Float32[x1, x2], theta, n_end)[1] for x1 in x1_range, x2 in x2_range]\ncontour!(x1_range, x2_range, Z)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSuppose we are interested in how the predictive power of our Bayesian neural network evolved between samples. 
In that case, the following graph displays an animation of the contour plot generated from the network weights in samples 1 to 1,000.\n\n# Number of iterations to plot.\nn_end = 500\n\nanim = @gif for i in 1:n_end\n plot_data()\n Z = [nn_forward(Float32[x1, x2], theta[i, :])[1] for x1 in x1_range, x2 in x2_range]\n contour!(x1_range, x2_range, Z; title=\"Iteration $i\", clim=(0, 1))\nend every 5\n\n[ Info: Saved animation to /tmp/jl_8l4buVEbwv.gif\n\n\n\n\n\nThis has been an introduction to the applications of Turing and Flux in defining Bayesian neural networks.", + "text": "Prediction Visualization\nWe can use MAP estimation to classify our population by using the set of weights that provided the highest log posterior.\n\n# A helper to run the nn through data `x` using parameters `θ`\nnn_forward(x, θ) = nn(x, vector_to_parameters(θ, ps))\n\n# Plot the data we have.\nfig = plot_data()\n\n# Find the index that provided the highest log posterior in the chain.\n_, i = findmax(ch[:lp])\n\n# Extract the max row value from i.\ni = i.I[1]\n\n# Plot the posterior distribution with a contour plot\nx1_range = collect(range(-6; stop=6, length=25))\nx2_range = collect(range(-6; stop=6, length=25))\nZ = [nn_forward([x1, x2], θ[i, :])[1] for x1 in x1_range, x2 in x2_range]\ncontour!(x1_range, x2_range, Z; linewidth=3, colormap=:seaborn_bright)\nfig\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nThe contour plot above shows that the MAP method is not too bad at classifying our data.\nNow we can visualize our predictions.\n\\[\np(\\tilde{x} | X, \\alpha) = \\int_{\\theta} p(\\tilde{x} | \\theta) p(\\theta | X, \\alpha) \\approx \\sum_{\\theta \\sim p(\\theta | 
X, \\alpha)}f_{\\theta}(\\tilde{x})\n\\]\nThe nn_predict function takes the average predicted value from a network parameterized by weights drawn from the MCMC chain.\n\n# Return the average predicted value across\n# multiple weights.\nfunction nn_predict(x, θ, num)\n num = min(num, size(θ, 1)) # make sure num does not exceed the number of samples\n return mean([first(nn_forward(x, view(θ, i, :))) for i in 1:10:num])\nend\n\nnn_predict (generic function with 1 method)\n\n\nNext, we use the nn_predict function to predict the value at a sample of points where the x1 and x2 coordinates range between -6 and 6. As we can see below, we still have a satisfactory fit to our data, and more importantly, we can also see where the neural network is uncertain about its predictions much easier—those regions between cluster boundaries.\n\n# Plot the average prediction.\nfig = plot_data()\n\nn_end = 1500\nx1_range = collect(range(-6; stop=6, length=25))\nx2_range = collect(range(-6; stop=6, length=25))\nZ = [nn_predict([x1, x2], θ, n_end)[1] for x1 in x1_range, x2 in x2_range]\ncontour!(x1_range, x2_range, Z; linewidth=3, colormap=:seaborn_bright)\nfig\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSuppose we are interested in how the predictive power of our Bayesian neural network evolved between samples. 
In that case, the following graph displays an animation of the contour plot generated from the network weights in samples 1 to 1,000.\n\n# Number of iterations to plot.\nn_end = 500\n\nanim = @gif for i in 1:n_end\n plot_data()\n Z = [nn_forward([x1, x2], θ[i, :])[1] for x1 in x1_range, x2 in x2_range]\n contour!(x1_range, x2_range, Z; title=\"Iteration $i\", clim=(0, 1))\nend every 5\n\n[ Info: Saved animation to /tmp/jl_NAT2FUXkVe.gif\n\n\n\n\n\nThis has been an introduction to the applications of Turing and Flux in defining Bayesian neural networks.", "crumbs": [ "Documentation", "Using Turing - Tutorials", diff --git a/sitemap.xml b/sitemap.xml index 257f8c0fb..bb5fde681 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,126 +2,126 @@ https://turinglang.org/tutorials/02-logistic-regression/index.html - 2024-05-29T11:39:20.161Z + 2024-05-29T19:13:53.713Z https://turinglang.org/tutorials/docs-14-using-turing-quick-start/index.html - 2024-05-29T11:39:20.169Z + 2024-05-29T19:13:53.721Z https://turinglang.org/tutorials/12-gplvm/index.html - 2024-05-29T11:39:20.165Z + 2024-05-29T19:13:53.717Z https://turinglang.org/tutorials/docs-11-using-turing-dynamichmc/index.html - 2024-05-29T11:39:20.169Z + 2024-05-29T19:13:53.717Z https://turinglang.org/tutorials/docs-09-using-turing-advanced/index.html - 2024-05-29T11:39:20.169Z + 2024-05-29T19:13:53.717Z https://turinglang.org/tutorials/docs-12-using-turing-guide/index.html - 2024-05-29T11:39:20.169Z + 2024-05-29T19:13:53.717Z https://turinglang.org/tutorials/15-gaussian-processes/index.html - 2024-05-29T11:39:20.165Z + 2024-05-29T19:13:53.717Z https://turinglang.org/tutorials/docs-05-for-developers-compiler/index.html - 2024-05-29T11:39:20.165Z + 2024-05-29T19:13:53.717Z https://turinglang.org/tutorials/docs-16-using-turing-external-samplers/index.html - 2024-05-29T11:39:20.169Z + 2024-05-29T19:13:53.721Z https://turinglang.org/tutorials/13-seasonal-time-series/index.html - 2024-05-29T11:39:20.165Z + 2024-05-29T19:13:53.717Z 
https://turinglang.org/tutorials/docs-10-using-turing-autodiff/index.html - 2024-05-29T11:39:20.169Z + 2024-05-29T19:13:53.717Z https://turinglang.org/tutorials/11-probabilistic-pca/index.html - 2024-05-29T11:39:20.165Z + 2024-05-29T19:13:53.717Z https://turinglang.org/tutorials/08-multinomial-logistic-regression/index.html - 2024-05-29T11:39:20.165Z + 2024-05-29T19:13:53.713Z https://turinglang.org/tutorials/docs-00-getting-started/index.html - 2024-05-29T11:39:20.165Z + 2024-05-29T19:13:53.717Z https://turinglang.org/tutorials/docs-01-contributing-guide/index.html - 2024-05-29T11:39:20.165Z + 2024-05-29T19:13:53.717Z https://turinglang.org/tutorials/00-introduction/index.html - 2024-05-29T11:39:20.161Z + 2024-05-29T19:13:53.713Z https://turinglang.org/tutorials/09-variational-inference/index.html - 2024-05-29T11:39:20.165Z + 2024-05-29T19:13:53.713Z https://turinglang.org/tutorials/05-linear-regression/index.html - 2024-05-29T11:39:20.161Z + 2024-05-29T19:13:53.713Z https://turinglang.org/tutorials/03-bayesian-neural-network/index.html - 2024-05-29T11:39:20.161Z + 2024-05-29T19:13:53.713Z https://turinglang.org/tutorials/docs-13-using-turing-performance-tips/index.html - 2024-05-29T11:39:20.169Z + 2024-05-29T19:13:53.717Z https://turinglang.org/tutorials/10-bayesian-differential-equations/index.html - 2024-05-29T11:39:20.165Z + 2024-05-29T19:13:53.713Z https://turinglang.org/tutorials/07-poisson-regression/index.html - 2024-05-29T11:39:20.165Z + 2024-05-29T19:13:53.713Z https://turinglang.org/tutorials/14-minituring/index.html - 2024-05-29T11:39:20.165Z + 2024-05-29T19:13:53.717Z https://turinglang.org/tutorials/01-gaussian-mixture-model/index.html - 2024-05-29T11:39:20.161Z + 2024-05-29T19:13:53.713Z https://turinglang.org/tutorials/04-hidden-markov-model/index.html - 2024-05-29T11:39:20.161Z + 2024-05-29T19:13:53.713Z https://turinglang.org/tutorials/docs-06-for-developers-interface/index.html - 2024-05-29T11:39:20.165Z + 2024-05-29T19:13:53.717Z 
https://turinglang.org/tutorials/docs-08-using-turing/index.html - 2024-05-29T11:39:20.165Z + 2024-05-29T19:13:53.717Z https://turinglang.org/tutorials/docs-04-for-developers-abstractmcmc-turing/index.html - 2024-05-29T11:39:20.165Z + 2024-05-29T19:13:53.717Z https://turinglang.org/tutorials/docs-15-using-turing-sampler-viz/index.html - 2024-05-29T11:39:20.169Z + 2024-05-29T19:13:53.721Z https://turinglang.org/tutorials/06-infinite-mixture-model/index.html - 2024-05-29T11:39:20.161Z + 2024-05-29T19:13:53.713Z https://turinglang.org/tutorials/docs-07-for-developers-variational-inference/index.html - 2024-05-29T11:39:20.165Z + 2024-05-29T19:13:53.717Z diff --git a/tutorials/03-bayesian-neural-network/index.html b/tutorials/03-bayesian-neural-network/index.html index d3c26afe5..335ff1db8 100644 --- a/tutorials/03-bayesian-neural-network/index.html +++ b/tutorials/03-bayesian-neural-network/index.html @@ -487,187 +487,185 @@

Bayesian Neural Networks

using Turing
 using FillArrays
-using Flux
+using Lux
 using Plots
-using ReverseDiff
-
-using LinearAlgebra
-using Random
+using Tracker +using Functors + +using LinearAlgebra +using Random

Our goal here is to use a Bayesian neural network to classify points in an artificial dataset. The code below generates data points arranged in a box-like pattern and displays a graph of the dataset we will be working with.

-
# Number of points to generate.
+
# Number of points to generate
 N = 80
 M = round(Int, N / 4)
-Random.seed!(1234)
-
-# Generate artificial data.
-x1s = rand(M) * 4.5;
-x2s = rand(M) * 4.5;
-xt1s = Array([[x1s[i] + 0.5; x2s[i] + 0.5] for i in 1:M])
-x1s = rand(M) * 4.5;
-x2s = rand(M) * 4.5;
-append!(xt1s, Array([[x1s[i] - 5; x2s[i] - 5] for i in 1:M]))
-
-x1s = rand(M) * 4.5;
-x2s = rand(M) * 4.5;
-xt0s = Array([[x1s[i] + 0.5; x2s[i] - 5] for i in 1:M])
-x1s = rand(M) * 4.5;
-x2s = rand(M) * 4.5;
-append!(xt0s, Array([[x1s[i] - 5; x2s[i] + 0.5] for i in 1:M]))
-
-# Store all the data for later.
-xs = [xt1s; xt0s]
-ts = [ones(2 * M); zeros(2 * M)]
-
-# Convert xs to Float32
-xs = hcat(xs...)
-xs = convert(Array{Float32}, xs)
-
-# Plot data points.
-function plot_data()
-    x1 = map(e -> e[1], xt1s)
-    y1 = map(e -> e[2], xt1s)
-    x2 = map(e -> e[1], xt0s)
-    y2 = map(e -> e[2], xt0s)
-
-    Plots.scatter(x1, y1; color="red", clim=(0, 1))
-    return Plots.scatter!(x2, y2; color="blue", clim=(0, 1))
-end
-
-plot_data()
+rng = Random.default_rng() +Random.seed!(rng, 1234) + +# Generate artificial data +x1s = rand(rng, Float32, M) * 4.5f0; +x2s = rand(rng, Float32, M) * 4.5f0; +xt1s = Array([[x1s[i] + 0.5f0; x2s[i] + 0.5f0] for i in 1:M]) +x1s = rand(rng, Float32, M) * 4.5f0; +x2s = rand(rng, Float32, M) * 4.5f0; +append!(xt1s, Array([[x1s[i] - 5.0f0; x2s[i] - 5.0f0] for i in 1:M])) + +x1s = rand(rng, Float32, M) * 4.5f0; +x2s = rand(rng, Float32, M) * 4.5f0; +xt0s = Array([[x1s[i] + 0.5f0; x2s[i] - 5.0f0] for i in 1:M]) +x1s = rand(rng, Float32, M) * 4.5f0; +x2s = rand(rng, Float32, M) * 4.5f0; +append!(xt0s, Array([[x1s[i] - 5.0f0; x2s[i] + 0.5f0] for i in 1:M])) + +# Store all the data for later +xs = [xt1s; xt0s] +ts = [ones(2 * M); zeros(2 * M)] + +# Plot data points. +function plot_data() + x1 = map(e -> e[1], xt1s) + y1 = map(e -> e[2], xt1s) + x2 = map(e -> e[1], xt0s) + y2 = map(e -> e[2], xt0s) + + Plots.scatter(x1, y1; color="red", clim=(0, 1)) + return Plots.scatter!(x2, y2; color="blue", clim=(0, 1)) +end + +plot_data()

Building a Neural Network

-

The next step is to define a feedforward neural network where we express our parameters as distributions, and not single points as with traditional neural networks. For this we will use Dense to define liner layers and compose them via Chain, both are neural network primitives from Flux. The network nn_initial we created has two hidden layers with tanh activations and one output layer with sigmoid (σ) activation, as shown below.

+

The next step is to define a feedforward neural network whose parameters are expressed as distributions rather than as single points, as in traditional neural networks. For this we will use Dense to define linear layers and compose them via Chain; both are neural network primitives from Lux. The network nn_initial has two hidden layers with tanh activations and one output layer with sigmoid (σ) activation, as shown below.

@@ -811,280 +809,286 @@

Building a Neura

-

The nn_initial is an instance that acts as a function and can take data as inputs and output predictions. We will define distributions on the neural network parameters and use destructure from Flux to extract the parameters as parameters_initial. The function destructure also returns another function reconstruct that can take (new) parameters in and return us a neural network instance whose architecture is the same as nn_initial but with updated parameters.

+

The nn_initial is an instance that acts as a function and can take data as inputs and output predictions. We will define distributions on the neural network parameters.

-
# Construct a neural network using Flux
-nn_initial = Chain(Dense(2, 3, tanh), Dense(3, 2, tanh), Dense(2, 1, σ)) |> f32
+
# Construct a neural network using Lux
+nn_initial = Chain(Dense(2 => 3, tanh), Dense(3 => 2, tanh), Dense(2 => 1, σ))
 
-# Extract weights and a helper function to reconstruct NN from weights
-parameters_initial, reconstruct = Flux.destructure(nn_initial)
+# Initialize the model weights and state
+ps, st = Lux.setup(rng, nn_initial)
 
-length(parameters_initial) # number of paraemters in NN
+Lux.parameterlength(nn_initial) # number of parameters in NN
20

The probabilistic model specification below creates a parameters variable, which has IID normal variables. The parameters vector represents all parameters of our neural net (weights and biases).

-
@model function bayes_nn(xs, ts, nparameters, reconstruct; alpha=0.09)
-    # Create the weight and bias vector.
-    parameters ~ MvNormal(Zeros(nparameters), I / alpha)
-
-    # Construct NN from parameters
-    nn = reconstruct(parameters)
-    # Forward NN to make predictions
-    preds = nn(xs)
-
-    # Observe each prediction.
-    for i in 1:length(ts)
-        ts[i] ~ Bernoulli(preds[i])
-    end
-end;
+
# Create a regularization term and the standard deviation of the Gaussian prior.
+alpha = 0.09
+sigma = sqrt(1.0 / alpha)
+
+
3.3333333333333335
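For reference, the prior implied by these two values can be written out as a short worked equation. The symbols n (the number of network parameters, 20 here) and I_n (the n × n identity matrix) are notation introduced only for this sketch, and θ denotes the parameter vector; α and σ are the values defined above.

```latex
\[
\sigma = \sqrt{1/\alpha} = \sqrt{1/0.09} \approx 3.33,
\qquad
\sigma^{2} = 1/\alpha \approx 11.11,
\qquad
\theta \sim \mathcal{N}\!\left(0,\ \sigma^{2} I_{n}\right)
\]
```

Under this reading, the diagonal covariance built from abs2.(sigma .* ones(nparameters)) in the new model equals the I / alpha covariance of the removed Flux version, so the prior itself should be unchanged by the migration.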
-

Inference can now be performed by calling sample. We use the NUTS Hamiltonian Monte Carlo sampler here.

+
+

Next, we construct a named tuple from a sampled parameter vector. We could also use ComponentArrays here and simply broadcast, but we do it this way to avoid an extra dependency.

-
setprogress!(false)
+
function vector_to_parameters(ps_new::AbstractVector, ps::NamedTuple)
+    @assert length(ps_new) == Lux.parameterlength(ps)
+    i = 1
+    function get_ps(x)
+        z = reshape(view(ps_new, i:(i + length(x) - 1)), size(x))
+        i += length(x)
+        return z
+    end
+    return fmap(get_ps, ps)
+end
+
+
vector_to_parameters (generic function with 1 method)
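To make the unpacking concrete, here is a minimal sketch of the same view-and-reshape idea on a toy layout. The names unflatten, template, and flat are hypothetical and not part of the tutorial; only fmap from Functors is assumed, and the layout mimics a single small dense layer rather than the Lux parameters above.

```julia
using Functors

# Hypothetical parameter layout, shaped like one small dense layer.
template = (weight = zeros(Float32, 3, 2), bias = zeros(Float32, 3))

# Walk the template in field order and carve matching slices out of `flat`.
function unflatten(flat::AbstractVector, template::NamedTuple)
    i = 1
    function take(x)
        # View of the next length(x) entries, reshaped to the leaf's size.
        z = reshape(view(flat, i:(i + length(x) - 1)), size(x))
        i += length(x)
        return z
    end
    return fmap(take, template)
end

flat = collect(1.0f0:9.0f0)          # 6 weight entries followed by 3 bias entries
ps_new = unflatten(flat, template)

@assert size(ps_new.weight) == (3, 2)
@assert ps_new.weight[:, 1] == [1, 2, 3]   # column-major fill
@assert ps_new.bias == [7, 8, 9]
```

Because each leaf is a view into flat, nothing is copied; that is presumably why the tutorial's version also uses view.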
+
+

To interface with external libraries it is often desirable to use the StatefulLuxLayer to automatically handle the neural network states.

-
# Perform inference.
-N = 5000
-ch = sample(bayes_nn(xs, ts, length(parameters_initial), reconstruct), NUTS(;adtype=AutoReverseDiff()), N);
-
-
┌ Info: Found initial step size
-└   ϵ = 0.2
+
const nn = StatefulLuxLayer(nn_initial, st)
+
+# Specify the probabilistic model.
+@model function bayes_nn(xs, ts; sigma = sigma, ps = ps, nn = nn)
+    # Sample the parameters
+    nparameters = Lux.parameterlength(nn_initial)
+    parameters ~ MvNormal(zeros(nparameters), Diagonal(abs2.(sigma .* ones(nparameters))))
+
+    # Forward NN to make predictions
+    preds = Lux.apply(nn, xs, vector_to_parameters(parameters, ps))
+
+    # Observe each prediction.
+    for i in eachindex(ts)
+        ts[i] ~ Bernoulli(preds[i])
+    end
+end
+
+
bayes_nn (generic function with 2 methods)
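The model above can be summarised as a posterior that is proportional to the Gaussian prior times a Bernoulli likelihood. This is a sketch in notation assumed here rather than taken from the tutorial: f_θ is the network output for parameters θ, x_i and t_i are the i-th input and label, and the product runs over the 80 labelled points.

```latex
\[
p(\theta \mid X, t, \alpha) \;\propto\;
\mathcal{N}\!\left(\theta \mid 0,\ \alpha^{-1} I\right)
\prod_{i} f_{\theta}(x_i)^{t_i}\,\bigl(1 - f_{\theta}(x_i)\bigr)^{1 - t_i}
\]
```

Each factor in the product corresponds to one ts[i] ~ Bernoulli(preds[i]) statement in the loop, and the Gaussian factor corresponds to the MvNormal line.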
-

Now we extract the parameter samples from the sampled chain as theta (this is of size 5000 x 20 where 5000 is the number of iterations and 20 is the number of parameters). We’ll use these primarily to determine how good our model’s classifier is.

+

Inference can now be performed by calling sample. We use the NUTS Hamiltonian Monte Carlo sampler here.

-
# Extract all weight and bias parameters.
-theta = convert(Array{Float32}, MCMCChains.group(ch, :parameters).value)
-
-
5000×20×1 Array{Float32, 3}:
-[:, :, 1] =
- -2.76754    4.33585   0.202649   …  -2.83911  5.0453   3.58483  3.63858
- -1.22286    2.81332   0.647483      -4.37165  4.60901  6.17425  5.1122
- -1.28501    3.35274   0.2071        -3.39033  5.12901  5.3266   5.09537
- -0.939225   3.84344   1.6232        -3.00103  5.19338  5.38463  5.00454
- -1.10654    7.12114   0.261793      -3.91847  4.43435  4.35116  2.63475
- -2.31501    6.4595    0.480459   …  -3.29219  2.28614  3.00919  2.63038
- -0.188994   2.56895  -0.0450458     -4.24586  5.93678  6.58049  5.05256
- -2.7119     3.76934  -0.70893       -4.79915  4.10487  3.6214   3.52587
- -0.0057128  2.371    -0.251854      -5.65489  5.08035  6.32055  5.58202
-  0.176594   2.81241  -1.13014       -5.36425  5.5353   5.81991  5.45871
-  ⋮                               ⋱                              
-  0.614511   6.33012  -0.263359      -2.76436  5.37732  5.38015  3.49351
-  0.321701   1.60552   1.30103       -3.79229  3.78463  3.42431  3.463
-  0.468764   1.42767   1.4261        -3.73313  3.7815   3.49601  3.22722
-  2.85275    6.0128    0.156228      -3.18823  5.69763  7.10259  5.52247
- -6.92752    2.60902   0.356687   …  -2.25084  6.97216  4.37522  4.86558
- -2.60802    5.23288   0.324784      -2.26749  4.97215  6.90851  5.91779
- -2.68243    6.44173  -0.888499      -2.83151  4.6619   6.69534  5.54244
-  0.248891   5.80957   2.21389       -3.42257  4.88962  6.57245  6.11376
- -1.16777    9.65887  -0.692307      -3.36406  5.24506  4.36318  4.32127
+
setprogress!(false)
+
+
+
# Perform inference.
+N = 2_000
+ch = sample(bayes_nn(reduce(hcat, xs), ts), NUTS(; adtype=AutoTracker()), N);
+
+
┌ Info: Found initial step size
+└   ϵ = 0.4
+
+

Now we extract the parameter samples from the sampled chain as θ (this is of size 2000 x 20 where 2000 is the number of iterations and 20 is the number of parameters). We’ll use these primarily to determine how good our model’s classifier is.

+
+
# Extract all weight and bias parameters.
+θ = MCMCChains.group(ch, :parameters).value;

Prediction Visualization

We can use MAP estimation to classify our population by using the set of weights that provided the highest log posterior.

-
-
# A helper to create NN from weights `theta` and run it through data `x`
-nn_forward(x, theta) = reconstruct(theta)(x)
-
-# Plot the data we have.
-plot_data()
-
-# Find the index that provided the highest log posterior in the chain.
-_, i = findmax(ch[:lp])
-
-# Extract the max row value from i.
-i = i.I[1]
-
-# Plot the posterior distribution with a contour plot
-x1_range = collect(range(-6; stop=6, length=25))
-x2_range = collect(range(-6; stop=6, length=25))
-Z = [nn_forward(Float32[x1, x2], theta[i, :])[1] for x1 in x1_range, x2 in x2_range]
-contour!(x1_range, x2_range, Z)
+
+
# A helper to run the nn through data `x` using parameters `θ`
+nn_forward(x, θ) = nn(x, vector_to_parameters(θ, ps))
+
+# Plot the data we have.
+fig = plot_data()
+
+# Find the index that provided the highest log posterior in the chain.
+_, i = findmax(ch[:lp])
+
+# Extract the max row value from i.
+i = i.I[1]
+
+# Plot the posterior distribution with a contour plot
+x1_range = collect(range(-6; stop=6, length=25))
+x2_range = collect(range(-6; stop=6, length=25))
+Z = [nn_forward([x1, x2], θ[i, :])[1] for x1 in x1_range, x2 in x2_range]
+contour!(x1_range, x2_range, Z; linewidth=3, colormap=:seaborn_bright)
+fig
[Figure: scatter plot of the two classes with contour lines of the MAP prediction]

The contour plot above shows that the MAP method is not too bad at classifying our data.
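
The index lookup used for the MAP estimate can be illustrated on a plain matrix. This is a sketch with made-up log-posterior values, and it assumes the log-posterior column behaves like an iterations × chains matrix; the variable names are hypothetical.

```julia
# Made-up log-posterior values for 4 draws from 1 chain (iterations × chains).
lp = reshape([-12.3, -9.8, -10.4, -11.0], 4, 1)

# findmax on a matrix returns the maximum and a CartesianIndex (row, column).
best_lp, idx = findmax(lp)

# The first component of the index is the iteration (row) number.
best_iteration = idx.I[1]
```

The row picked this way is then used to select the matching parameter draw, as `θ[i, :]` does in the tutorial code.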

@@ -1093,224 +1097,224 @@

Prediction Visual

\[ p(\tilde{x} | X, \alpha) = \int_{\theta} p(\tilde{x} | \theta) p(\theta | X, \alpha) \approx \sum_{\theta \sim p(\theta | X, \alpha)}f_{\theta}(\tilde{x}) \]

The nn_predict function takes the average predicted value from a network parameterized by weights drawn from the MCMC chain.

-
-
# Return the average predicted value across
-# multiple weights.
-function nn_predict(x, theta, num)
-    x = convert(Vector{Float32}, x) # Ensure x is Float32
-    return mean([nn_forward(x, theta[i, :])[1] for i in 1:10:num])
-end
+
+
# Return the average predicted value across
+# multiple weights.
+function nn_predict(x, θ, num)
+    num = min(num, size(θ, 1))  # make sure num does not exceed the number of samples
+    return mean([first(nn_forward(x, view(θ, i, :))) for i in 1:10:num])
+end
nn_predict (generic function with 1 method)
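
The averaging step can be tried without a trained network. The sketch below swaps the neural network for a hypothetical one-layer logistic model (`toy_forward`) and uses a small made-up matrix of "posterior draws", so only the thinning and averaging logic mirrors `nn_predict`. The `thin` keyword is an added assumption; the tutorial hard-codes a step of 10.

```julia
using Statistics

# Hypothetical stand-in for the network: a logistic model with weights θ[1:2]
# and bias θ[3].
logistic(z) = 1 / (1 + exp(-z))
toy_forward(x, θ) = logistic(θ[1] * x[1] + θ[2] * x[2] + θ[3])

# Average the prediction over every `thin`-th draw among the first `num` rows.
function toy_predict(x, θ, num; thin=10)
    num = min(num, size(θ, 1))   # never index past the available draws
    return mean(toy_forward(x, view(θ, i, :)) for i in 1:thin:num)
end

# Made-up draws: one row per sample, one column per parameter.
θ_samples = [1.0 -1.0 0.0; 2.0 -2.0 0.0; 0.0 0.0 0.0]

p_origin = toy_predict([0.0, 0.0], θ_samples, 10; thin=1)
```

At the origin every draw predicts 0.5, so the average is 0.5 as well. Points where the draws disagree give averages pulled toward 0.5, which is how the averaged prediction exposes uncertainty.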

Next, we use the nn_predict function to predict the value at a sample of points where the x1 and x2 coordinates range between -6 and 6. As we can see below, we still have a satisfactory fit to our data. More importantly, we can now see much more easily where the neural network is uncertain about its predictions: the regions between cluster boundaries.

-
-
# Plot the average prediction.
-plot_data()
-
-n_end = 1500
-x1_range = collect(range(-6, stop=6, length=25))
-x2_range = collect(range(-6, stop=6, length=25))
-
-# Ensure x1, x2 are Float32 within the comprehension
-Z = [nn_predict(Float32[x1, x2], theta, n_end)[1] for x1 in x1_range, x2 in x2_range]
-contour!(x1_range, x2_range, Z)
+
+
# Plot the average prediction.
+fig = plot_data()
+
+n_end = 1500
+x1_range = collect(range(-6; stop=6, length=25))
+x2_range = collect(range(-6; stop=6, length=25))
+Z = [nn_predict([x1, x2], θ, n_end)[1] for x1 in x1_range, x2 in x2_range]
+contour!(x1_range, x2_range, Z; linewidth=3, colormap=:seaborn_bright)
+fig
[Figure: scatter plot of the two classes with contour lines of the averaged posterior prediction]

Suppose we are interested in how the predictive power of our Bayesian neural network evolved between samples. In that case, the following graph displays an animation of the contour plot generated from the network weights in samples 1 to 500, showing every fifth sample.

-
-
# Number of iterations to plot.
-n_end = 500
-
-anim = @gif for i in 1:n_end
-    plot_data()
-    Z = [nn_forward(Float32[x1, x2], theta[i, :])[1] for x1 in x1_range, x2 in x2_range]
-    contour!(x1_range, x2_range, Z; title="Iteration $i", clim=(0, 1))
-end every 5
+
+
# Number of iterations to plot.
+n_end = 500
+
+anim = @gif for i in 1:n_end
+    plot_data()
+    Z = [nn_forward([x1, x2], θ[i, :])[1] for x1 in x1_range, x2 in x2_range]
+    contour!(x1_range, x2_range, Z; title="Iteration $i", clim=(0, 1))
+end every 5
-
[ Info: Saved animation to /tmp/jl_8l4buVEbwv.gif
+
[ Info: Saved animation to /tmp/jl_NAT2FUXkVe.gif

This has been an introduction to the applications of Turing and Flux in defining Bayesian neural networks.