diff --git a/_sources/book/topology/002-topological-spaces.md b/_sources/book/topology/002-topological-spaces.md
index 1422c22d..4efff7f6 100644
--- a/_sources/book/topology/002-topological-spaces.md
+++ b/_sources/book/topology/002-topological-spaces.md
@@ -32,7 +32,7 @@ We refer to the topology associated with a given metric as the induced topology.

:::{prf:definition} Induced topology
:label: topology:def-induced-topology
-Let $(X, d)$ be a metric space.
+Let $(X, d)$ be a {prf:ref}`metric space`.
Then, the topology induced by $d$ is the set of all open sets in $X$ with respect to the metric $d.$
:::

@@ -45,3 +45,99 @@ Let $f: X \to Y$ be a function between topological spaces.
Then, $f$ is continuous if for every open set $U \subseteq Y,$ the pre-image $f^{-1}(U)$ is an open set in $X.$
:::
+
+:::{prf:lemma} Composition preserves continuity
+:label: topology:lemma-composition-preserves-continuity
+If $f: X \to Y$ and $g: Y \to Z$ are {prf:ref}`continuous functions` between {prf:ref}`topological spaces`, then the composition $g \circ f: X \to Z$ is continuous.
+:::
+
+
+In topology, we study the properties of spaces that are preserved under continuous deformations.
+Therefore, from a topological perspective, two spaces are essentially the same if there is a continuous bijection between them whose inverse is also continuous.
+This is captured by the notion of homeomorphism.
+
+:::{prf:definition} Homeomorphism
+:label: topology:def-homeomorphism
+A function $f: X \to Y$ between {prf:ref}`topological spaces` is a {prf:ref}`homeomorphism` if it is bijective, {prf:ref}`continuous`, and its inverse $f^{-1}$ is also continuous.
+Equivalently, $f$ is a homeomorphism if $f$ is a bijection such that a set $U \subseteq X$ is {prf:ref}`open` if and only if $f(U) \subseteq Y$ is open.
+We say two spaces are homeomorphic if there exists a homeomorphism between them.
+:::
+
+:::{prf:lemma} Homeomorphism is an equivalence relation
+:label: topology:lemma-homeomorphism-equivalence-relation
+{prf:ref}`Homeomorphism` is an equivalence relation between topological spaces.
+:::
+
+:::{dropdown} Proof: Homeomorphism is an equivalence relation
+Write $X \equiv Y$ if $X$ and $Y$ are homeomorphic.
+
+__Reflexivity:__
+The identity map $I_X: X \to X$ is a homeomorphism, because it is bijective, continuous, and its own inverse.
+Therefore $X \equiv X.$
+
+__Symmetry:__
+If $f: X \to Y$ is a homeomorphism, then $f^{-1}: Y \to X$ is a continuous bijection whose inverse $f$ is continuous, so $f^{-1}$ is also a homeomorphism.
+Therefore $X \equiv Y$ implies $Y \equiv X.$
+
+__Transitivity:__
+If $f: X \to Y$ and $g: Y \to Z$ are homeomorphisms, then $g \circ f: X \to Z$ is a bijection with inverse $f^{-1} \circ g^{-1}: Z \to X.$
+Both of these maps are compositions of continuous maps, so they are continuous because composition preserves continuity, and hence $g \circ f$ is a homeomorphism.
+Therefore $X \equiv Y$ and $Y \equiv Z$ imply $X \equiv Z.$
+:::
+
+In general, to show that two spaces are homeomorphic, we exhibit a homeomorphism between them.
+Showing that two spaces are _not_ homeomorphic is harder, because there is no simple recipe for ruling out every possible homeomorphism.
+Instead, we rely on topological properties that are preserved under homeomorphisms.
+If one space has such a property and the other does not, the two spaces cannot be homeomorphic.
+Two such properties are connectedness and compactness.
+In the remainder of this chapter, we build up to their definitions and to results about them.
+
+
+## Sequences
+
+We now re-define concepts from metric spaces for general topological spaces, starting with sequences.
+First, we re-define the notion of an open neighbourhood.
+
+:::{prf:definition} Open neighbourhood
+:label: topology:def-open-neighbourhood-topology
+An open neighbourhood of a point $x \in X$ in a {prf:ref}`topological space` $(X, \mathcal{U})$ is an open set $U \in \mathcal{U}$ such that $x \in U.$
+:::
+
+
+In topological spaces, convergent sequences are defined directly in terms of open neighbourhoods, rather than using open balls.
+
+:::{prf:definition} Convergent sequence
+:label: topology:def-convergent-sequence-topology
+A sequence $(x_n)$ in a topological space $X$ converges to $x \in X,$ written $x_n \to x,$ if for every open neighbourhood $U$ of $x,$ there exists $N \in \mathbb{N}$ such that $x_n \in U$ for all $n > N.$
+:::
+
+
+We now turn to the uniqueness of limits.
+In a general topological space, limits need not be unique.
+For example, if $X$ has the coarse topology $\mathcal{U} = \{\emptyset, X\},$ then the only open neighbourhood of any point is $X$ itself, so every sequence converges to every point.
+However, additional assumptions on the topology can guarantee that limits are unique.
+
+:::{prf:definition} Hausdorff space
+:label: topology:def-hausdorff-space
+A topological space $(X, \mathcal{U})$ is Hausdorff if for every pair of distinct points $x_1, x_2 \in X,$ there exist open neighbourhoods $U_1, U_2$ of $x_1, x_2$ respectively such that $U_1 \cap U_2 = \emptyset.$
+:::
+
+:::{margin}
+Earlier, we proved that {prf:ref}`limits in metric spaces are unique`.
+The property we used in that proof was that, in a {prf:ref}`metric space `, open balls centered around distinct points are disjoint if their radii are small enough, for instance at most half the distance between the points.
+This was the Hausdorff property in disguise.
+Metric spaces are always {prf:ref}`Hausdorff`, and therefore have unique limits.
+:::
+
+:::{prf:lemma} Limits are unique in Hausdorff spaces
+:label: topology:lemma-limits-unique-hausdorff
+If $X$ is {prf:ref}`Hausdorff` and $(x_n)$ is a sequence in $X$ such that $x_n \to x$ and $x_n \to x',$ then $x = x'.$
+:::
+
+:::{dropdown} Proof: Limits are unique in Hausdorff spaces
+Let $(x_n)$ be a sequence in $X$ such that $x_n \to x$ and $x_n \to x'.$
+Suppose $x \neq x'.$
+Since $X$ is Hausdorff, there exist open neighbourhoods $U, U'$ of $x, x'$ respectively such that $U \cap U' = \emptyset.$
+Since $x_n \to x,$ there exists $N \in \mathbb{N}$ such that $x_n \in U$ for all $n > N.$
+Similarly, since $x_n \to x',$ there exists $N' \in \mathbb{N}$ such that $x_n \in U'$ for all $n > N'.$
+Then, for all $n > \max(N, N'),$ we have $x_n \in U \cap U' = \emptyset,$ which is a contradiction.
+Therefore, $x = x'.$
+:::
\ No newline at end of file
diff --git a/book/papers/ais/ais.html b/book/papers/ais/ais.html
index 6d509c63..b12c07c2 100644
--- a/book/papers/ais/ais.html
+++ b/book/papers/ais/ais.html
@@ -591,7 +591,7 @@

Importance sampling

\(q\) and \(p\) are, the larger the variance will be. In particular, we can show that the variance of the importance weights can be lower bounded by a quantity that scales exponentially with the KL divergence.
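To see the exponential scaling concretely, here is a minimal NumPy sketch under an assumed toy setup that is not part of the original text: \(p = \mathcal{N}(0, 1)\) and \(q = \mathcal{N}(\mu, 1)\). For this pair, \(\mathrm{KL}(p \,\|\, q) = \mu^2 / 2\) and the weight variance has the closed form \(\mathrm{Var}_q[p/q] = e^{\mu^2} - 1 = e^{2\,\mathrm{KL}} - 1\). The lemma's exact statement is cut off in this diff, so this only illustrates the trend; the function names are hypothetical.

```python
import numpy as np


def exact_weight_variance(mu):
    """Var_q[p(x)/q(x)] for p = N(0, 1), q = N(mu, 1); equals exp(mu^2) - 1."""
    return np.expm1(mu**2)


def mc_weight_variance(mu, n_samples=1_000_000, seed=0):
    """Monte Carlo estimate of the importance weight variance, sampling x ~ q."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=mu, scale=1.0, size=n_samples)
    # log p(x) - log q(x); the Gaussian normalising constants cancel.
    log_w = -0.5 * x**2 + 0.5 * (x - mu) ** 2
    return np.exp(log_w).var()


for mu in [0.5, 1.0, 2.0, 3.0]:
    kl = 0.5 * mu**2
    print(f"KL = {kl:.3f}  exact Var[w] = {exact_weight_variance(mu):.3f}")
print(mc_weight_variance(1.0))  # close to e - 1 for this mild mismatch
```

For larger \(\mu\) the Monte Carlo estimate itself becomes unreliable: the spread of the sample variance is governed by \(\mathbb{E}_q[w^4] = e^{6\mu^2}\), which grows even faster.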

-Lemma 15 (Lower bound to importance weight variance)
+Lemma 18 (Lower bound to importance weight variance)

Given distributions \(p\) and \(q\), it holds that

@@ -649,7 +649,7 @@

Importance-weighted MCMC

-Definition 83 (Importance weighted MCMC algorithm)
+Definition 87 (Importance weighted MCMC algorithm)

Given a proposal density \(q\), a target density \(p\) and a sequence of transition kernels \(T_1(x, x'), \dots, T_K(x, x')\) such that each \(T_k\) leaves \(p\) invariant, sampling \(x_0 \sim q(x)\) followed by

@@ -737,7 +737,7 @@

Annealed Importance Sampling

\(q\) and \(p\) as we vary \(\beta\). AIS then proceeds in a similar way to the importance weighted MCMC algorithm we highlighted above, except that it requires that each \(T_k\) leaves \(\pi_k\), instead of \(p\), invariant.

-Definition 84 (Annealed Importance Sampling)
+Definition 88 (Annealed Importance Sampling)

Given a target density \(p\), a proposal density \(q\) and a sequence \(0 = \beta_0 \leq \dots \leq \beta_K = 1\), define

diff --git a/book/papers/num-sde/num-sde.html b/book/papers/num-sde/num-sde.html
index 704c3cb3..3c230c2e 100644
--- a/book/papers/num-sde/num-sde.html
+++ b/book/papers/num-sde/num-sde.html
@@ -467,7 +467,7 @@

Why stochastic differential equations

In order to define the stochastic component of the transition rule of a stochastic system, we must define an appropriate noise model. The Wiener process is a stochastic process that is commonly used for this purpose.

-Definition 88 (Wiener process)
+Definition 92 (Wiener process)

A standard Wiener process over [0, T] is a random variable \(W(t)\) that depends continuously on \(t \in [0, T]\) and satisfies:

@@ -650,7 +650,7 @@

Evaluating a stochastic integral

The Euler-Maruyama method is the analogue of the Euler method for deterministic integrals, applied to the stochastic case.
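A minimal vectorised sketch of the update \(X_{n+1} = X_n + f(X_n)\,\Delta t + g(X_n)\,\Delta W_n\) with \(\Delta W_n \sim \mathcal{N}(0, \Delta t)\), written from the standard form of the method because the definition's statement is cut off in this diff. The test SDE, geometric Brownian motion \(dX = \lambda X\,dt + \mu X\,dW\) with \(\mathbb{E}[X(T)] = X(0)e^{\lambda T}\), its parameters and the function name are assumptions for illustration.

```python
import numpy as np


def euler_maruyama(f, g, x0, T, n_steps, n_paths, rng):
    """Approximate X(T) for dX = f(X) dt + g(X) dW on [0, T], one value per path."""
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)
    for _ in range(n_steps):
        # Wiener increments are independent N(0, dt) draws.
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x = x + f(x) * dt + g(x) * dW
    return x


# Geometric Brownian motion: dX = lam * X dt + mu * X dW.
lam, mu = 0.5, 0.3
rng = np.random.default_rng(0)
x_T = euler_maruyama(lambda x: lam * x, lambda x: mu * x, 1.0, 1.0, 100, 20_000, rng)
print(x_T.mean(), np.exp(lam))  # sample mean vs exact E[X(1)]
```

Setting \(g \equiv 0\) recovers the deterministic Euler method, which is a quick sanity check on the update.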

-Definition 89 (Euler-Maruyama method)
+Definition 93 (Euler-Maruyama method)

Given a scalar SDE with drift and diffusion functions \(f\) and \(g\)

@@ -770,7 +770,7 @@

Euler-Maruyama method

Strong and weak convergence

Since the choice of the number of bins \(N\) of the discretisation affects the accuracy of our method, we are interested in how quickly the approximation converges to the exact solution as a function of \(N\). To do so, we must first define what convergence means in the stochastic case, which leads to two distinct notions of convergence: the strong sense and the weak sense.

-Definition 90 (Strong convergence)
+Definition 94 (Strong convergence)

A method for approximating a stochastic process \(X(t)\) is said to have strong order of convergence \(\gamma\) if there exists a constant such that

@@ -780,7 +780,7 @@

Strong and weak convergence

\(X_n\) to the exact solution \(X(\tau_n)\) as \(\Delta t \to 0\), in expectation. A weaker notion of convergence concerns the rate at which the expected value of the approximation converges to the true expected value as \(\Delta t \to 0\), as given below.

-Definition 91 (Weak convergence)
+Definition 95 (Weak convergence)

A method for approximating a stochastic process \(X(t)\) is said to have weak order of convergence \(\gamma\) if there exists a constant such that

@@ -799,7 +799,7 @@

Strong and weak convergence

Just as higher-order methods exist for ODEs to obtain more accurate estimates of the solution, so do higher-order methods for SDEs, such as Milstein’s method.
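A small experiment contrasting the two schemes, assuming the standard scalar Milstein update, which adds the correction \(\tfrac{1}{2} g(X_n) g'(X_n)(\Delta W_n^2 - \Delta t)\) to the Euler-Maruyama step (the definition's own statement is cut off in this diff). Geometric Brownian motion is used because its exact solution \(X(T) = X(0)\exp((\lambda - \mu^2/2)T + \mu W(T))\) can be evaluated on the same Brownian path, so the strong error \(\mathbb{E}|X_N - X(T)|\) can be estimated directly. Parameters and names are illustrative.

```python
import numpy as np


def strong_errors(lam, mu, x0=1.0, T=1.0, n_steps=64, n_paths=2000, seed=0):
    """Mean |X_N - X(T)| for Euler-Maruyama and Milstein on geometric Brownian motion."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))

    f = lambda x: lam * x  # drift
    g = lambda x: mu * x   # diffusion
    dg = lambda x: mu      # derivative of the diffusion, g'(x)

    x_em = np.full(n_paths, x0)
    x_mil = np.full(n_paths, x0)
    for n in range(n_steps):
        dw = dW[:, n]
        x_em = x_em + f(x_em) * dt + g(x_em) * dw
        x_mil = (x_mil + f(x_mil) * dt + g(x_mil) * dw
                 + 0.5 * g(x_mil) * dg(x_mil) * (dw**2 - dt))

    # Exact solution driven by the same Brownian path, W(T) = sum of the increments.
    x_exact = x0 * np.exp((lam - 0.5 * mu**2) * T + mu * dW.sum(axis=1))
    return np.abs(x_em - x_exact).mean(), np.abs(x_mil - x_exact).mean()


for n_steps in [16, 64, 256]:
    e_em, e_mil = strong_errors(lam=2.0, mu=1.0, n_steps=n_steps)
    print(f"dt = 1/{n_steps}: EM error {e_em:.4f}, Milstein error {e_mil:.4f}")
```

Reusing the same increments for the numerical and exact solutions is what makes this a measure of strong (pathwise) rather than weak error.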

-Definition 92 (Milstein’s method)
+Definition 96 (Milstein’s method)

Given a scalar SDE with drift and diffusion functions \(f\) and \(g\)

diff --git a/book/papers/rff/rff.html b/book/papers/rff/rff.html
index f38156c8..e856ab93 100644
--- a/book/papers/rff/rff.html
+++ b/book/papers/rff/rff.html
@@ -514,7 +514,7 @@

The RFF approximation

This is also an unbiased estimate of the kernel, but its variance is lower than in the \(M = 1\) case, since the variance of the average of \(M\) i.i.d. random variables is \(1/M\) times the variance of a single one of them. We therefore arrive at the following algorithm for estimating \(k\).
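A minimal sketch of this averaging, assuming the RBF kernel \(k(x, y) = \exp(-\lVert x - y \rVert^2 / (2\ell^2))\), whose spectral measure \(p\) is \(\mathcal{N}(0, \ell^{-2} I)\), and single-feature estimates \(z_{\omega, \phi}(x) z_{\omega, \phi}(y)\) with \(z_{\omega, \phi}(x) = \sqrt{2}\cos(\omega^\top x + \phi)\) and \(\phi \sim \mathcal{U}[0, 2\pi]\) (assumed here; it matches the bound \(|z_{\omega, \phi}| \leq \sqrt{2}\)). The kernel, lengthscale, inputs and function names are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(x, y, lengthscale):
    """Exact RBF kernel exp(-||x - y||^2 / (2 l^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * lengthscale**2))


def rff_estimate(x, y, M, lengthscale, rng):
    """Average of M single-feature estimates z(x) z(y) of the RBF kernel."""
    D = x.shape[0]
    omegas = rng.normal(0.0, 1.0 / lengthscale, size=(M, D))  # omega ~ N(0, l^-2 I)
    phis = rng.uniform(0.0, 2.0 * np.pi, size=M)              # phi ~ U[0, 2 pi]
    z_x = np.sqrt(2.0) * np.cos(omegas @ x + phis)
    z_y = np.sqrt(2.0) * np.cos(omegas @ y + phis)
    return np.mean(z_x * z_y)


rng = np.random.default_rng(0)
x, y, ell = np.array([0.0, 1.0, -0.5]), np.array([0.5, 0.2, 0.0]), 1.5

# Spread of the estimator over repeated draws, for M = 1 and M = 100 features.
est_1 = np.array([rff_estimate(x, y, 1, ell, rng) for _ in range(2000)])
est_100 = np.array([rff_estimate(x, y, 100, ell, rng) for _ in range(2000)])
print(rbf_kernel(x, y, ell), est_1.mean(), est_100.mean())
print(est_1.var(), est_100.var())  # variance shrinks roughly by a factor of 100
```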

-Definition 85 (Random Fourier Features)
+Definition 89 (Random Fourier Features)

Given a translation invariant kernel \(k\) that is the Fourier transform of a probability measure \(p\), we have the unbiased real-valued estimator

@@ -536,7 +536,7 @@

RFF and Bayesian regression

Now there remains the question of how large the error of the RFF estimator is. In other words, how closely does RFF estimate the exact kernel \(k\)? Since \(-\sqrt{2} \leq z_{\omega, \phi} \leq \sqrt{2}\), we can use Hoeffding’s inequality [Grimmett, 2020] to obtain the following high-probability bound on the absolute error of our estimate of \(k\).
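Because each single-feature product \(z_{\omega, \phi}(x) z_{\omega, \phi}(y)\) lies in \([-2, 2]\), the textbook form of Hoeffding's inequality for the mean of \(M\) i.i.d. bounded terms gives \(P(|\hat{k} - k| \geq \epsilon) \leq 2\exp(-M\epsilon^2 / 8)\). The lemma's exact constants are cut off in this diff, so the sketch below assumes this standard form; the helper names are hypothetical.

```python
import math


def hoeffding_tail(M, eps, a=-2.0, b=2.0):
    """Upper bound on P(|mean of M i.i.d. terms in [a, b] - expectation| >= eps)."""
    return 2.0 * math.exp(-2.0 * M * eps**2 / (b - a) ** 2)


def features_needed(eps, delta, a=-2.0, b=2.0):
    """Smallest M for which the Hoeffding bound is at most delta."""
    return math.ceil((b - a) ** 2 * math.log(2.0 / delta) / (2.0 * eps**2))


M = features_needed(eps=0.1, delta=0.05)
print(M, hoeffding_tail(M, 0.1))
```

The required \(M\) grows like \(\epsilon^{-2}\) but only logarithmically in \(1/\delta\), so tightening the failure probability is cheap compared with tightening the accuracy.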

-Lemma 16 (Hoeffding for RFF)
+Lemma 19 (Hoeffding for RFF)

The RFF estimator of \(k\), using \(M\) pairs of \(\omega, \phi\), obeys

@@ -548,7 +548,7 @@

Rates of convergence

\(z^\top(x)z(y)\) and \(k(x, y)\) for any two input pairs, rather than the closeness of these functions over the whole input space. In fact, it is possible [Rahimi et al., 2007] to make a stronger statement about the uniform convergence of the estimator.

-Lemma 17 (Uniform convergence of RFF)
+Lemma 20 (Uniform convergence of RFF)

Let \(\mathcal{M}\) be a compact subset of \(\mathbb{R}^D\). Then the RFF estimator of \(k\), using \(M\) pairs of \(\omega, \phi\), converges uniformly to \(k\) according to

diff --git a/book/papers/score-matching/score-matching.html b/book/papers/score-matching/score-matching.html
index 849b9d22..d1c39621 100644
--- a/book/papers/score-matching/score-matching.html
+++ b/book/papers/score-matching/score-matching.html
@@ -471,7 +471,7 @@

The score matching trick

\(\psi_\theta(x)\) along with some observed data, to estimate the parameters \(\theta\). We can achieve this by defining the following score matching objective.

-Definition 86 (Score matching objective)
+Definition 90 (Score matching objective)

Given a data distribution \(p_d(x)\) and an approximating distribution \(p_\theta(x)\) with parameters \(\theta\), we define the score matching objective as

diff --git a/book/papers/svgd/svgd.html b/book/papers/svgd/svgd.html
index e89beeac..7b462cdb 100644
--- a/book/papers/svgd/svgd.html
+++ b/book/papers/svgd/svgd.html
@@ -583,7 +583,7 @@

Direction of steepest descent

Now, if we approximate \(q\) by a finite set of \(N\) particles at locations \(x_n^{(i)}, n = 1, \dots, N\), at the \(i^{th}\) iteration, we obtain the following iterative algorithm.
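A minimal one-dimensional sketch of the particle update, assuming the standard SVGD direction \(\hat{\phi}(x) = \frac{1}{N}\sum_j [k(x_j, x)\nabla \log p(x_j) + \nabla_{x_j} k(x_j, x)]\), an RBF kernel \(k(a, b) = \exp(-(a - b)^2 / h)\) with a fixed bandwidth \(h\), and a Gaussian target \(p = \mathcal{N}(2, 1)\). The definition itself is cut off in this diff, so these choices, the step size and the names are illustrative rather than the chapter's.

```python
import numpy as np


def svgd_step(x, grad_log_p, step_size, h):
    """One SVGD update for 1-D particles x with kernel k(a, b) = exp(-(a - b)^2 / h)."""
    diff = x[:, None] - x[None, :]  # diff[j, i] = x_j - x_i
    k = np.exp(-diff**2 / h)        # k[j, i] = k(x_j, x_i)
    grad_k = -2.0 * diff / h * k    # derivative of k(x_j, x_i) with respect to x_j
    # The first term drives particles towards high density; grad_k repels nearby particles.
    phi = (k * grad_log_p(x)[:, None] + grad_k).mean(axis=0)
    return x + step_size * phi


def grad_log_p(x):
    """Score of the target N(2, 1)."""
    return -(x - 2.0)


rng = np.random.default_rng(0)
particles = rng.normal(-2.0, 0.5, size=100)
for _ in range(2000):
    particles = svgd_step(particles, grad_log_p, step_size=0.05, h=1.0)
print(particles.mean(), particles.std())
```

Because the repulsive terms are antisymmetric in each pair of particles, they cancel when summed over all particles, so only the driving term moves the particle mean.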

-Definition 87 (Stein variational gradient descent)
+Definition 91 (Stein variational gradient descent)

Given a distribution \(p(x)\), a positive definite kernel \(k(x, x')\) and a set of particles with initial positions \(\{x_n^{(0)}\}_{n=1}^N\), Stein variational gradient descent evolves the particles according to

diff --git a/book/topology/002-topological-spaces.html b/book/topology/002-topological-spaces.html
index 9f941124..19fcc2f4 100644
--- a/book/topology/002-topological-spaces.html
+++ b/book/topology/002-topological-spaces.html
@@ -428,6 +428,7 @@

Contents

@@ -470,7 +471,7 @@

Topologies

Definition 81 (Induced topology)

-Let \((X, d)\) be a metric space.
+Let \((X, d)\) be a metric space.
Then, the topology induced by \(d\) is the set of all open sets in \(X\) with respect to the metric \(d.\)

We now also re-define continuity in terms of open sets.

@@ -480,7 +481,103 @@

Topologies

\(f: X \to Y\) be a function between topological spaces. Then, \(f\) is continuous if for every open set \(U \subseteq Y,\) the pre-image \(f^{-1}(U)\) is an open set in \(X.\)
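For finite spaces, the open-set definition of continuity can be checked mechanically. The sketch below is illustrative only: it assumes a topology is stored as a set of frozensets of points and a function as a Python dict, and all names are hypothetical. It also checks, on one example, that composing continuous maps gives a continuous map.

```python
def preimage(f, V):
    """Pre-image f^{-1}(V) of a set V under a function stored as a dict."""
    return frozenset(x for x in f if f[x] in V)


def is_continuous(f, opens_X, opens_Y):
    """f: X -> Y is continuous iff the pre-image of every open set of Y is open in X."""
    return all(preimage(f, V) in opens_X for V in opens_Y)


def compose(g, f):
    """The composition g o f, assuming every value of f is a key of g."""
    return {x: g[f[x]] for x in f}


# Two topologies on the two-point set {0, 1}.
discrete = {frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})}
sierpinski = {frozenset(), frozenset({1}), frozenset({0, 1})}
identity = {0: 0, 1: 1}
swap = {0: 1, 1: 0}
const_one = {0: 1, 1: 1}

print(is_continuous(identity, discrete, sierpinski))  # every map out of a discrete space is continuous
print(is_continuous(identity, sierpinski, discrete))  # {0} is open in the discrete topology, but its pre-image is not open
```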

+In topology, we study the properties of spaces that are preserved under continuous deformations.
+Therefore, from a topological perspective, two spaces are essentially the same if there is a continuous bijection between them whose inverse is also continuous.
+This is captured by the notion of homeomorphism.

+Definition 83 (Homeomorphism)
+A function \(f: X \to Y\) between topological spaces is a homeomorphism if it is bijective, continuous, and its inverse \(f^{-1}\) is also continuous.
+Equivalently, \(f\) is a homeomorphism if \(f\) is a bijection such that a set \(U \subseteq X\) is open if and only if \(f(U) \subseteq Y\) is open.
+We say two spaces are homeomorphic if there exists a homeomorphism between them.
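The two characterisations in the definition can be compared on small finite spaces. The sketch below assumes topologies are stored as sets of frozensets and maps as dicts (all names hypothetical). It implements both forms and shows why the inverse must be continuous: the identity from the discrete topology to the Sierpiński topology on \(\{0, 1\}\) is a continuous bijection but not a homeomorphism.

```python
def preimage(f, V):
    """Pre-image f^{-1}(V) of a set V under a map stored as a dict."""
    return frozenset(x for x in f if f[x] in V)


def is_continuous(f, opens_X, opens_Y):
    return all(preimage(f, V) in opens_X for V in opens_Y)


def is_bijection(f, X, Y):
    return set(f) == set(X) and set(f.values()) == set(Y) and len(X) == len(Y)


def is_homeomorphism(f, X, opens_X, Y, opens_Y):
    """First form: bijective, continuous, with continuous inverse."""
    if not is_bijection(f, X, Y):
        return False
    f_inv = {y: x for x, y in f.items()}
    return is_continuous(f, opens_X, opens_Y) and is_continuous(f_inv, opens_Y, opens_X)


def is_homeomorphism_via_images(f, X, opens_X, Y, opens_Y):
    """Second form: bijective, and U is open in X iff f(U) is open in Y.

    For a bijection this says that f maps the open sets of X exactly onto those of Y.
    """
    if not is_bijection(f, X, Y):
        return False
    return {frozenset(f[x] for x in U) for U in opens_X} == set(opens_Y)


X = {0, 1}
discrete = {frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})}
sierpinski = {frozenset(), frozenset({1}), frozenset({0, 1})}
sierpinski_flipped = {frozenset(), frozenset({0}), frozenset({0, 1})}
identity, swap = {0: 0, 1: 1}, {0: 1, 1: 0}

# A continuous bijection whose inverse is not continuous.
print(is_continuous(identity, discrete, sierpinski), is_homeomorphism(identity, X, discrete, X, sierpinski))
# Relabelling the points gives a homeomorphism between the two Sierpinski topologies.
print(is_homeomorphism(swap, X, sierpinski, X, sierpinski_flipped))
```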

+Lemma 16 (Homeomorphism is an equivalence relation)
+Homeomorphism is an equivalence relation between topological spaces.

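+Proof: Homeomorphism is an equivalence relation
+Write \(X \equiv Y\) if \(X\) and \(Y\) are homeomorphic.
+Reflexivity:
+The identity map \(I_X: X \to X\) is a homeomorphism, because it is bijective, continuous, and its own inverse.
+Therefore \(X \equiv X.\)
+Symmetry:
+If \(f: X \to Y\) is a homeomorphism, then \(f^{-1}: Y \to X\) is a continuous bijection whose inverse \(f\) is continuous, so \(f^{-1}\) is also a homeomorphism.
+Therefore \(X \equiv Y\) implies \(Y \equiv X.\)
+Transitivity:
+If \(f: X \to Y\) and \(g: Y \to Z\) are homeomorphisms, then \(g \circ f: X \to Z\) is a bijection with inverse \(f^{-1} \circ g^{-1}: Z \to X.\)
+Both of these maps are compositions of continuous maps, so they are continuous because composition preserves continuity, and hence \(g \circ f\) is a homeomorphism.
+Therefore \(X \equiv Y\) and \(Y \equiv Z\) imply \(X \equiv Z.\)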

+In general, to show that two spaces are homeomorphic, we exhibit a homeomorphism between them.
+Showing that two spaces are not homeomorphic is harder, because there is no simple recipe for ruling out every possible homeomorphism.
+Instead, we rely on topological properties that are preserved under homeomorphisms.
+If one space has such a property and the other does not, the two spaces cannot be homeomorphic.
+Two such properties are connectedness and compactness.
+In the remainder of this chapter, we build up to their definitions and to results about them.
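For very small finite spaces, both strategies can be tried by brute force. The sketch below (hypothetical names; topologies stored as sets of frozensets) searches every bijection for a homeomorphism, and also compares one simple invariant: a homeomorphism maps open sets bijectively onto open sets, so homeomorphic spaces have the same number of open sets. Brute force is only feasible for a handful of points, since there are \(n!\) bijections between two \(n\)-point sets.

```python
from itertools import permutations


def find_homeomorphism(X, opens_X, Y, opens_Y):
    """Return some homeomorphism X -> Y as a dict, or None if there is none."""
    X, Y = list(X), list(Y)
    if len(X) != len(Y):
        return None
    target = set(opens_Y)
    for image in permutations(Y):
        f = dict(zip(X, image))
        # A bijection is a homeomorphism iff it maps the open sets of X exactly onto those of Y.
        if {frozenset(f[x] for x in U) for U in opens_X} == target:
            return f
    return None


def open_set_count(opens):
    """A simple homeomorphism invariant for finite spaces."""
    return len(opens)


X = {0, 1}
discrete = {frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})}
sierpinski = {frozenset(), frozenset({1}), frozenset({0, 1})}
sierpinski_flipped = {frozenset(), frozenset({0}), frozenset({0, 1})}

print(find_homeomorphism(X, sierpinski, X, sierpinski_flipped))
# Different numbers of open sets already rule out a homeomorphism.
print(open_set_count(sierpinski), open_set_count(discrete))
print(find_homeomorphism(X, sierpinski, X, discrete))
```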

+Sequences
+We now re-define concepts from metric spaces for general topological spaces, starting with sequences.
+First, we re-define the notion of an open neighbourhood.

+Definition 84 (Open neighbourhood)
+An open neighbourhood of a point \(x \in X\) in a topological space \((X, \mathcal{U})\) is an open set \(U \in \mathcal{U}\) such that \(x \in U.\)

+In topological spaces, convergent sequences are defined directly in terms of open neighbourhoods, rather than using open balls.

+Definition 85 (Convergent sequence)
+A sequence \((x_n)\) in a topological space \(X\) converges to \(x \in X,\) written \(x_n \to x,\) if for every open neighbourhood \(U\) of \(x,\) there exists \(N \in \mathbb{N}\) such that \(x_n \in U\) for all \(n > N.\)

+We now turn to the uniqueness of limits.
+In a general topological space, limits need not be unique.
+For example, if \(X\) has the coarse topology \(\mathcal{U} = \{\emptyset, X\},\) then the only open neighbourhood of any point is \(X\) itself, so every sequence converges to every point.
+However, additional assumptions on the topology can guarantee that limits are unique.
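In a finite space, convergence can be decided exactly for eventually periodic sequences, that is, sequences that after some finite prefix cycle through a fixed list of points (the "tail"). The prefix does not matter, and an open set contains all terms beyond some index exactly when it contains every point of the tail, so \(x_n \to x\) if and only if every open neighbourhood of \(x\) contains the tail. The sketch below (hypothetical names; topologies stored as sets of frozensets) uses this to compute all limits, and confirms that in the coarse topology every sequence converges to every point.

```python
def open_neighbourhoods(x, opens):
    """All open sets containing the point x."""
    return [U for U in opens if x in U]


def limits(X, opens, tail):
    """All limits of an eventually periodic sequence that cycles through `tail`.

    x_n -> x iff every open neighbourhood of x contains every point of the tail.
    """
    tail = frozenset(tail)
    return {x for x in X if all(tail <= U for U in open_neighbourhoods(x, opens))}


X = frozenset({0, 1, 2})
coarse = {frozenset(), X}
discrete = {frozenset(s) for s in [(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]}

print(limits(X, coarse, tail=[0, 1]))    # converges to every point
print(limits(X, discrete, tail=[2]))     # eventually constant: a unique limit
print(limits(X, discrete, tail=[0, 1]))  # oscillates: no limit at all
```

Non-uniqueness is not limited to the coarse topology: in the Sierpiński space \(\{0, 1\}\) with open sets \(\emptyset, \{1\}, \{0, 1\}\), the constant sequence \(1, 1, \dots\) converges to both points.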

+Definition 86 (Hausdorff space)
+A topological space \((X, \mathcal{U})\) is Hausdorff if for every pair of distinct points \(x_1, x_2 \in X,\) there exist open neighbourhoods \(U_1, U_2\) of \(x_1, x_2\) respectively such that \(U_1 \cap U_2 = \emptyset.\)
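The Hausdorff condition can also be checked by brute force on finite spaces. The sketch below (hypothetical names; topologies stored as sets of frozensets) enumerates every topology on a three-point set and keeps the Hausdorff ones. It illustrates a standard fact: a finite Hausdorff space is discrete, because a point \(x\) can be separated from each of the finitely many other points by an open set containing \(x\) but not that point, and the intersection of these finitely many open sets is \(\{x\}\), which is therefore open.

```python
from itertools import combinations


def is_hausdorff(X, opens):
    """Every pair of distinct points has disjoint open neighbourhoods."""
    return all(
        any(x1 in U1 and x2 in U2 and not (U1 & U2) for U1 in opens for U2 in opens)
        for x1, x2 in combinations(X, 2)
    )


def all_topologies(X):
    """Every topology on a small finite set X, each as a set of frozensets."""
    X = frozenset(X)
    subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(sorted(X), r)]
    topologies = []
    for r in range(len(subsets) + 1):
        for family in combinations(subsets, r):
            opens = set(family)
            if frozenset() not in opens or X not in opens:
                continue
            # For a finite family, closure under pairwise unions and intersections suffices.
            if all((U | V) in opens and (U & V) in opens for U in opens for V in opens):
                topologies.append(opens)
    return topologies


X = {0, 1, 2}
tops = all_topologies(X)
hausdorff = [T for T in tops if is_hausdorff(X, T)]
print(len(tops), len(hausdorff))
```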

+Lemma 17 (Limits are unique in Hausdorff spaces)
+If \(X\) is Hausdorff and \((x_n)\) is a sequence in \(X\) such that \(x_n \to x\) and \(x_n \to x',\) then \(x = x'.\)

+Proof: Limits are unique in Hausdorff spaces
+Let \((x_n)\) be a sequence in \(X\) such that \(x_n \to x\) and \(x_n \to x'.\)
+Suppose \(x \neq x'.\)
+Since \(X\) is Hausdorff, there exist open neighbourhoods \(U, U'\) of \(x, x'\) respectively such that \(U \cap U' = \emptyset.\)
+Since \(x_n \to x,\) there exists \(N \in \mathbb{N}\) such that \(x_n \in U\) for all \(n > N.\)
+Similarly, since \(x_n \to x',\) there exists \(N' \in \mathbb{N}\) such that \(x_n \in U'\) for all \(n > N'.\)
+Then, for all \(n > \max(N, N'),\) we have \(x_n \in U \cap U' = \emptyset,\) which is a contradiction.
+Therefore, \(x = x'.\)
