diff --git a/_sources/book/topology/002-topological-spaces.md b/_sources/book/topology/002-topological-spaces.md
index 1422c22d..4efff7f6 100644
--- a/_sources/book/topology/002-topological-spaces.md
+++ b/_sources/book/topology/002-topological-spaces.md
@@ -32,7 +32,7 @@ We refer to the topology associated with a given metric as the induced topology.
:::{prf:definition} Induced topology
:label: topology:def-induced-topology
-Let $(X, d)$ be a metric space.
+Let $(X, d)$ be a {prf:ref}`metric space
Importance sampling
\(q\) and \(p\) are, the larger the variance will be.
In particular, we can show that the variance of the importance weights can be lower bounded by a quantity that scales exponentially with the KL divergence.
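As a rough numerical illustration of this exponential scaling (not of the bound itself), the sketch below assumes \(p = \mathcal{N}(0, 1)\) and \(q = \mathcal{N}(\mu, 1)\). For this pair the weights \(w = p/q\) satisfy \(\mathrm{Var}_q[w] = e^{\mu^2} - 1 = e^{2\,\mathrm{KL}(p \| q)} - 1\), and a Monte Carlo estimate recovers this closed form. The Gaussian choice and the function name `importance_weight_variance` are illustrative assumptions.

```python
import numpy as np

def importance_weight_variance(mu, num_samples=200_000, seed=0):
    """Monte Carlo estimate of Var_q[p(x)/q(x)] for p = N(0, 1) and q = N(mu, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=mu, scale=1.0, size=num_samples)  # samples x ~ q
    # log p(x) - log q(x); the Gaussian normalising constants cancel
    log_w = -0.5 * x**2 + 0.5 * (x - mu) ** 2
    return np.var(np.exp(log_w))

for mu in [0.25, 0.5, 1.0]:
    kl = 0.5 * mu**2  # KL(p || q) for two unit-variance Gaussians
    exact = np.exp(2.0 * kl) - 1.0
    print(f"KL={kl:.3f}  MC variance={importance_weight_variance(mu):.3f}  exact={exact:.3f}")
```

Larger \(\mu\) makes the Monte Carlo estimate itself very noisy, since the weights become heavy-tailed under \(q\); that is the same phenomenon the bound describes.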
+ (Lower bound to importance weight variance)
(Lower bound to importance weight variance)
Given distributions \(p\) and \(q\), it holds that
+ (Importance weighted MCMC algorithm)
(Importance weighted MCMC algorithm)
Given a proposal density \(q\), a target density \(p\) and a sequence of transition kernels \(T_1(x, x'), \dots, T_K(x, x')\) such that each \(T_k\) leaves \(p\) invariant, sampling \(x_0 \sim q(x)\) followed by
@@ -737,7 +737,7 @@
+ (Annealed Importance Sampling)
(Annealed Importance Sampling)
Given a target density \(p\), a proposal density \(q\) and a sequence \(0 = \beta_0 \leq \dots \leq \beta_K = 1\), define
In order to define the stochastic component of the transition rule of a stochastic system, we must define an appropriate noise model. The Wiener process is a stochastic process that is commonly used for this purpose.
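A common way to simulate a Wiener process, assuming its usual defining properties (\(W(0) = 0\) and independent increments \(W(t) - W(s) \sim \mathcal{N}(0, t - s)\)), is to sample Gaussian increments on a time grid and accumulate them. The sketch below does this on a uniform grid; the function name and grid choice are hypothetical.

```python
import numpy as np

def sample_wiener_path(T=1.0, num_steps=1000, seed=0):
    """Sample W(t) on the uniform grid t_n = n T / num_steps, n = 0, ..., num_steps."""
    rng = np.random.default_rng(seed)
    dt = T / num_steps
    # Independent increments W(t_{n+1}) - W(t_n) ~ N(0, dt)
    dW = rng.normal(loc=0.0, scale=np.sqrt(dt), size=num_steps)
    # Start at W(0) = 0 and accumulate the increments
    W = np.concatenate([[0.0], np.cumsum(dW)])
    t = np.linspace(0.0, T, num_steps + 1)
    return t, W

t, W = sample_wiener_path(T=1.0, num_steps=1000)
```

Across many independent paths, the endpoint \(W(T)\) should have mean close to \(0\) and variance close to \(T\).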
+ (Wiener process)
(Wiener process)
A standard Wiener process over \([0, T]\) is a random variable \(W(t)\) that depends continuously on \(t \in [0, T]\) and satisfies:
The Euler-Maruyama method is the stochastic analogue of the Euler method for ordinary differential equations.
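A minimal sketch of the Euler-Maruyama update \(X_{n+1} = X_n + f(X_n)\,\Delta t + g(X_n)\,\Delta W_n\) with \(\Delta W_n \sim \mathcal{N}(0, \Delta t)\), assuming for simplicity that \(f\) and \(g\) depend on \(X\) only and not on \(t\). The function name and signature are hypothetical.

```python
import numpy as np

def euler_maruyama(f, g, x0, T, num_steps, rng):
    """Approximate dX = f(X) dt + g(X) dW on [0, T] with num_steps equal steps."""
    dt = T / num_steps
    x = np.empty(num_steps + 1)
    x[0] = x0
    for n in range(num_steps):
        dW = rng.normal(0.0, np.sqrt(dt))  # Wiener increment, dW ~ N(0, dt)
        x[n + 1] = x[n] + f(x[n]) * dt + g(x[n]) * dW
    return x

# Example: Ornstein-Uhlenbeck process dX = -X dt + 0.5 dW, started at X(0) = 1
rng = np.random.default_rng(0)
path = euler_maruyama(lambda x: -x, lambda x: 0.5, x0=1.0, T=1.0, num_steps=1000, rng=rng)
```

Setting \(g \equiv 0\) recovers the deterministic Euler method, which makes a convenient sanity check.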
+ (Euler-Maruyama method)
(Euler-Maruyama method)
Given a scalar SDE with drift and diffusion functions \(f\) and \(g\)
Since the number of bins \(N\) used in the discretisation affects the accuracy of our method, we are interested in how quickly the approximation converges to the exact solution as \(N\) grows. To make this precise, we must first define what convergence means in the stochastic case, which leads to two distinct notions of convergence: convergence in the strong sense and in the weak sense.
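Strong convergence concerns the pathwise error \(\mathbb{E}|X_N - X(T)|\). To see its rate numerically, the sketch below assumes geometric Brownian motion \(dX = \lambda X\,dt + \sigma X\,dW\) as a test problem (it is not taken from the text above), because its exact solution \(X(T) = X_0 \exp\left((\lambda - \sigma^2/2)T + \sigma W(T)\right)\) can be evaluated on the same Brownian path. It estimates the strong error of Euler-Maruyama at several step sizes and fits the slope on a log-log scale; Euler-Maruyama is known to have strong order \(1/2\), so the slope should come out near \(0.5\). Function name and parameter values are illustrative.

```python
import numpy as np

def em_strong_error(num_steps, num_paths=2000, lam=2.0, sigma=1.0, x0=1.0, T=1.0, seed=0):
    """Monte Carlo estimate of E|X_N - X(T)| for Euler-Maruyama on dX = lam X dt + sigma X dW."""
    rng = np.random.default_rng(seed)
    dt = T / num_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=(num_paths, num_steps))
    x = np.full(num_paths, x0)
    for n in range(num_steps):
        x = x + lam * x * dt + sigma * x * dW[:, n]
    # Exact solution on the same Brownian paths, using W(T) = sum of the increments
    x_exact = x0 * np.exp((lam - 0.5 * sigma**2) * T + sigma * dW.sum(axis=1))
    return np.mean(np.abs(x - x_exact))

step_counts = np.array([32, 64, 128, 256, 512])
errors = np.array([em_strong_error(n) for n in step_counts])
# The slope of log(error) against log(dt) estimates the strong order gamma
order = np.polyfit(np.log(1.0 / step_counts), np.log(errors), deg=1)[0]
print(f"estimated strong order: {order:.2f}")
```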
+ (Strong convergence)
(Strong convergence)
A method for approximating a stochastic process \(X(t)\) is said to have strong order of convergence \(\gamma\) if there exists a constant such that
+ (Weak convergence)
(Weak convergence)
A method for approximating a stochastic process \(X(t)\) is said to have weak order of convergence \(\gamma\) if there exists a constant such that
Just as higher order methods for ODEs exist for obtaining refined estimates of the solution, so do methods for SDEs, such as Milstein’s higher order method.
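Milstein's method adds the correction term \(\tfrac{1}{2} g(X_n) g'(X_n)\left((\Delta W_n)^2 - \Delta t\right)\) to the Euler-Maruyama update, which raises the strong order from \(1/2\) to \(1\). A minimal sketch, again assuming geometric Brownian motion as the test problem (so \(g(x) = \sigma x\) and \(g'(x) = \sigma\)), compares both schemes on the same Brownian paths; the function name and parameters are hypothetical.

```python
import numpy as np

def strong_errors(num_steps, num_paths=2000, lam=2.0, sigma=1.0, x0=1.0, T=1.0, seed=0):
    """Return (Euler-Maruyama error, Milstein error) of E|X_N - X(T)| for GBM."""
    rng = np.random.default_rng(seed)
    dt = T / num_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=(num_paths, num_steps))
    x_em = np.full(num_paths, x0)
    x_mil = np.full(num_paths, x0)
    for n in range(num_steps):
        dw = dW[:, n]
        x_em = x_em + lam * x_em * dt + sigma * x_em * dw
        # Milstein correction 0.5 * g(x) * g'(x) * (dW^2 - dt) with g(x) = sigma * x
        x_mil = (x_mil + lam * x_mil * dt + sigma * x_mil * dw
                 + 0.5 * sigma**2 * x_mil * (dw**2 - dt))
    x_exact = x0 * np.exp((lam - 0.5 * sigma**2) * T + sigma * dW.sum(axis=1))
    return np.mean(np.abs(x_em - x_exact)), np.mean(np.abs(x_mil - x_exact))

em_err, mil_err = strong_errors(num_steps=256)
print(f"Euler-Maruyama: {em_err:.4f}  Milstein: {mil_err:.4f}")
```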
+ (Milstein’s method)
(Milstein’s method)
Given a scalar SDE with drift and diffusion functions \(f\) and \(g\)
+ (Random Fourier Features)
(Random Fourier Features)
Given a translation invariant kernel \(k\) that is the Fourier transform of a probability measure \(p\), we have the unbiased real-valued estimator
There remains the question of how large the error of the RFF estimator is. In other words, how closely does RFF approximate the exact kernel \(k\)? Since \(-\sqrt{2} \leq z_{\omega, \phi} \leq \sqrt{2}\), we can use Hoeffding’s inequality [Grimmett, 2020] to obtain the following high-probability bound on the absolute error of our estimate of \(k\).
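Before the formal bound, a minimal sketch of the estimator for one assumed kernel, the squared-exponential kernel \(k(x, x') = \exp(-\|x - x'\|^2 / 2\ell^2)\), whose spectral measure \(p\) is \(\mathcal{N}(0, \ell^{-2} I)\). Each feature is taken to be \(z_{\omega, \phi}(x) = \sqrt{2}\cos(\omega^\top x + \phi)\) with \(\omega \sim p\) and \(\phi \sim \mathcal{U}[0, 2\pi]\), consistent with the range \(-\sqrt{2} \leq z_{\omega, \phi} \leq \sqrt{2}\) used above; averaging \(M\) products \(z(x) z(x')\) gives the estimate. The kernel choice, lengthscale parameter and function names are illustrative assumptions.

```python
import numpy as np

def sample_rff_params(dim, num_features, lengthscale=1.0, seed=0):
    """Draw omega ~ N(0, I / lengthscale^2) and phi ~ U[0, 2 pi]."""
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, 1.0 / lengthscale, size=(num_features, dim))
    phi = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return omega, phi

def rff_features(X, omega, phi):
    """z_{omega, phi}(x) = sqrt(2) cos(omega^T x + phi), one column per (omega, phi) pair."""
    return np.sqrt(2.0) * np.cos(X @ omega.T + phi)

def rff_kernel(X, Y, omega, phi):
    """Average of z(x) z(y) over the M sampled (omega, phi) pairs."""
    return rff_features(X, omega, phi) @ rff_features(Y, omega, phi).T / len(phi)

def exact_se_kernel(X, Y, lengthscale=1.0):
    sq_dists = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

X = np.random.default_rng(1).normal(size=(5, 3))
omega, phi = sample_rff_params(dim=3, num_features=20_000)
max_error = np.max(np.abs(rff_kernel(X, X, omega, phi) - exact_se_kernel(X, X)))
print(f"max absolute error: {max_error:.4f}")
```

The error shrinks roughly like \(1/\sqrt{M}\) as more feature pairs are drawn, which is the behaviour the Hoeffding-type bound quantifies.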
+ (Hoeffding for RFF)
(Hoeffding for RFF)
The RFF estimator of \(k\), using \(M\) pairs of \(\omega, \phi\), obeys
+ (Uniform convergence of RFF)
(Uniform convergence of RFF)
Let \(\mathcal{M}\) be a compact subset of \(\mathbb{R}^D\). Then the RFF estimator of \(k\), using \(M\) pairs of \(\omega, \phi\), converges uniformly to \(k\) on \(\mathcal{M}\) according to
+ (Score matching objective)
(Score matching objective)
Given a data distribution \(p_d(x)\) and an approximating distribution \(p_\theta(x)\) with parameters \(\theta\), we define the score matching objective as
Now, if we approximate \(q\) by a finite set of \(N\) particles at locations \(x_n^{(i)}, n = 1, \dots, N\), at the \(i^{th}\) iteration, we arrive at the following iterative algorithm.
+ (Stein variational gradient descent)
(Stein variational gradient descent)
Given a distribution \(p(x)\), a positive definite kernel \(k(x, x')\) and a set of particles with initial positions \(\{x_n^{(0)}\}_{n=1}^N\), Stein variational gradient descent evolves the particles according to
(Induced topology)
Let \((X, d)\) be a metric space. Then, the topology induced by \(d\) is the set of all open sets in \(X\) with respect to the metric \(d.\)
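For a finite metric space, the open sets of the induced topology can be enumerated by brute force: a subset \(U\) is open when every \(x \in U\) has some open ball \(B(x, r) = \{y : d(x, y) < r\}\) contained in \(U\). The sketch below (function names hypothetical, \(X\) assumed finite) checks this for every subset. Because the ball whose radius equals the smallest positive distance from \(x\) is just \(\{x\}\), every subset passes, so the induced topology of a finite metric space is always the discrete topology.

```python
from itertools import chain, combinations

def open_ball(points, d, x, r):
    """B(x, r) = {y : d(x, y) < r} as a frozenset."""
    return frozenset(y for y in points if d(x, y) < r)

def is_open(U, points, d):
    """U is open if every x in U has some open ball around it contained in U."""
    for x in U:
        # In a finite space it suffices to try the distances from x to other points as radii
        radii = sorted({d(x, y) for y in points if y != x}) or [1.0]
        if not any(open_ball(points, d, x, r) <= U for r in radii):
            return False
    return True

def induced_topology(points, d):
    subsets = chain.from_iterable(combinations(points, k) for k in range(len(points) + 1))
    return {frozenset(S) for S in subsets if is_open(frozenset(S), points, d)}

def d(a, b):
    return abs(a - b)

points = [0.0, 1.0, 3.0]
topology = induced_topology(points, d)
print(len(topology))  # 8: every subset is open
```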
We now also re-define continuity in terms of open sets.
@@ -480,7 +481,103 @@
+ (Composition preserves continuity)
If \(f: X \to Y\) and \(g: Y \to Z\) are continuous functions between topological spaces, then the composition \(g \circ f: X \to Z\) is continuous.
+In topology, we are interested in studying the properties of spaces that are preserved under continuous deformations.
+Therefore, from a topology perspective, two spaces are considered essentially the same if they are related by a bijection that is continuous in both directions.
+This is captured by the notion of homeomorphism.
++ (Homeomorphism)
A function \(f: X \to Y\) between topological spaces is a homeomorphism if it is bijective, continuous, and its inverse \(f^{-1}\) is also continuous.
+Equivalently, \(f\) is a homeomorphism if \(f\) is a bijection and, for every \(U \subseteq X\), \(U\) is open in \(X\) if and only if \(f(U)\) is open in \(Y\).
+We say two spaces are homeomorphic if there exists a homeomorphism between them.
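For finite spaces the definition can be checked by brute force. The sketch below encodes a topology as a set of `frozenset`s (an assumed representation, as are all function names), checks continuity through preimages of open sets, and checks bijectivity and continuity of the inverse. The two-point Sierpiński topology \(\{\emptyset, \{a\}, \{a, b\}\}\) also shows why a continuous bijection is not enough: the identity from the discrete topology to the Sierpiński topology is continuous, but its inverse is not.

```python
from itertools import chain, combinations

def powerset(X):
    """All subsets of a finite set X as frozensets, i.e. the discrete topology."""
    items = list(X)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))}

def is_continuous(f, top_X, top_Y):
    """f is a dict X -> Y; continuous iff every open V in Y has an open preimage in X."""
    return all(frozenset(x for x in f if f[x] in V) in top_X for V in top_Y)

def is_bijection(f, X, Y):
    """f is defined on all of X, hits all of Y, and |X| == |Y| (so it is injective)."""
    return set(f) == set(X) and set(f.values()) == set(Y) and len(X) == len(Y)

def is_homeomorphism(f, X, Y, top_X, top_Y):
    """Bijective, continuous, and with a continuous inverse."""
    if not is_bijection(f, X, Y):
        return False
    f_inv = {y: x for x, y in f.items()}
    return is_continuous(f, top_X, top_Y) and is_continuous(f_inv, top_Y, top_X)

X = {"a", "b"}
discrete = powerset(X)
sierpinski = {frozenset(), frozenset({"a"}), frozenset(X)}
identity = {"a": "a", "b": "b"}
swap = {"a": "b", "b": "a"}

print(is_continuous(identity, discrete, sierpinski))             # True
print(is_homeomorphism(identity, X, X, discrete, sierpinski))    # False: inverse is not continuous
print(is_homeomorphism(identity, X, X, sierpinski, sierpinski))  # True
print(is_homeomorphism(swap, X, X, sierpinski, sierpinski))      # False: preimage of {"a"} is {"b"}
```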
++ (Homeomorphism is an equivalence relation)
Homeomorphism is an equivalence relation between topological spaces.
+Reflexivity:
+The identity map \(I_X: X \to X\) is a homeomorphism, because it is bijective, continuous, and its inverse is itself.
+Therefore \(X \equiv X.\)
+Symmetry:
+If \(f: X \to Y\) is a homeomorphism, then \(f^{-1}: Y \to X\) is also a homeomorphism, because it is bijective, continuous, and its inverse \(f\) is continuous.
+Therefore \(X \equiv Y\) implies \(Y \equiv X.\)
+Transitivity:
+If \(f: X \to Y\) and \(g: Y \to Z\) are homeomorphisms, then \(g \circ f: X \to Z\) is a homeomorphism, because it is a bijection, it is continuous as a composition of continuous functions, and its inverse \(f^{-1} \circ g^{-1}\) is continuous for the same reason.
+Therefore \(X \equiv Y\) and \(Y \equiv Z\) imply \(X \equiv Z.\)
+In general, the approach for showing that two spaces are homeomorphic is to find a homeomorphism between them.
+Showing that two spaces are not homeomorphic is more difficult, since there is no simple recipe for ruling out every candidate map.
+Instead, we resort to topological properties that are preserved under homeomorphisms.
+Whenever one space has such a property and the other does not, we can conclude that they are not homeomorphic.
+Two such properties are connectedness and compactness.
+In the remainder of this chapter we give definitions and results building up to these properties.
+
+We now turn to re-defining concepts from metric spaces in terms of topological spaces, starting with sequences.
+First we re-define the following shorthand for open sets.
++ (Open neighbourhood)
An open neighbourhood of a point \(x \in X\) in a topological space \((X, \mathcal{U})\) is an open set \(U \in \mathcal{U}\) such that \(x \in U.\)
+In topological spaces, convergent sequences are defined directly in terms of open neighbourhoods, rather than using open balls.
++ (Convergent sequence)
A sequence \((x_n)\) converges to \(x\), written \(x_n \to x\), if for every open neighbourhood \(U\) of \(x,\) there exists \(N \in \mathbb{N}\) such that \(x_n \in U\) for all \(n > N.\)
+We now turn to uniqueness of limits.
+In a general topological space, limits need not be unique.
+For example, given a set \(X\) with the coarse topology \(\mathcal{U} = \{\emptyset, X\},\) every sequence converges to every point, since the only open neighbourhood of any point is \(X\) itself.
+However, further assumptions on the topology can result in unique limits.
++ (Hausdorff space)
A topological space \((X, \mathcal{U})\) is Hausdorff if for every pair of distinct points \(x_1, x_2 \in X,\) there exist open neighbourhoods \(U_1, U_2\) of \(x_1, x_2\) respectively such that \(U_1 \cap U_2 = \emptyset.\)
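A brute-force Hausdorff check for finite topologies, again encoding open sets as `frozenset`s (an assumed representation; the helper names are hypothetical). The coarse topology from the example above fails the test, consistent with limits there not being unique, while the discrete topology passes.

```python
from itertools import chain, combinations

def powerset(X):
    """All subsets of a finite set X as frozensets, i.e. the discrete topology."""
    items = list(X)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))}

def is_hausdorff(X, topology):
    """True if every pair of distinct points has disjoint open neighbourhoods."""
    points = list(X)
    for i, x1 in enumerate(points):
        for x2 in points[i + 1:]:
            separated = any(
                x1 in U1 and x2 in U2 and not (U1 & U2)
                for U1 in topology
                for U2 in topology
            )
            if not separated:
                return False
    return True

X = {1, 2, 3}
coarse = {frozenset(), frozenset(X)}
print(is_hausdorff(X, powerset(X)))  # True
print(is_hausdorff(X, coarse))       # False
```

In fact, a finite Hausdorff space is always discrete, so among finite spaces this check mostly serves to rule examples out.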
++ (Limits are unique in Hausdorff spaces)
If \(X\) is Hausdorff and \((x_n)\) is a sequence in \(X\) such that \(x_n \to x\) and \(x_n \to x',\) then \(x = x'.\)
+Let \((x_n)\) be a sequence in \(X\) such that \(x_n \to x\) and \(x_n \to x'.\)
+Suppose \(x \neq x'.\)
+Since \(X\) is Hausdorff, there exist open neighbourhoods \(U, U'\) of \(x, x'\) respectively such that \(U \cap U' = \emptyset.\)
+Since \(x_n \to x,\) there exists \(N \in \mathbb{N}\) such that \(x_n \in U\) for all \(n > N.\)
+Similarly, since \(x_n \to x',\) there exists \(N' \in \mathbb{N}\) such that \(x_n \in U'\) for all \(n > N'.\)
+Then, for all \(n > \max(N, N'),\) we have \(x_n \in U \cap U' = \emptyset,\) which is a contradiction.
+Therefore, \(x = x'.\)
+