orthogonal projections
gwthomas committed Jan 2, 2018
1 parent 583d109 commit d8f8996
Showing 11 changed files with 292 additions and 137 deletions.
common.tex (6 changes: 5 additions & 1 deletion)
@@ -18,6 +18,7 @@
\DeclareMathOperator*{\dom}{dom}
\DeclareMathOperator*{\range}{range}
\DeclareMathOperator*{\diag}{diag}
\DeclareMathOperator*{\Null}{null}
\newcommand{\C}{\mathbb{C}}
\newcommand{\F}{\mathbb{F}}
\newcommand{\N}{\mathbb{N}}
@@ -40,12 +41,14 @@
\renewcommand{\vec}[1]{\mathbf{#1}}
\newcommand{\mat}[1]{\mathbf{#1}}
\newcommand{\matlit}[1]{\begin{bmatrix}#1\end{bmatrix}}
-\newcommand{\tran}{^\top}
+\newcommand{\tran}{^{\!\top\!}}
\newcommand{\inv}{^{-1}}
\newcommand{\halfpow}{^{\frac{1}{2}}}
\newcommand{\neghalfpow}{^{-\frac{1}{2}}}
\renewcommand{\angle}[1]{\langle #1 \rangle}
\newcommand{\bigangle}[1]{\left\langle #1 \right\rangle}
\newcommand{\inner}[2]{\angle{#1, #2}}
\newcommand{\biginner}[2]{\bigangle{#1, #2}}
\renewcommand{\P}{\mathbb{P}}
\newcommand{\pr}[1]{\P(#1)}
\newcommand{\prbig}[1]{\P\big(#1\big)}
@@ -66,6 +69,7 @@
\newcommand{\tab}{\hspace{0.5cm}}
\renewcommand{\a}{\vec{a}}
\renewcommand{\b}{\vec{b}}
\newcommand{\e}{\vec{e}}
\newcommand{\g}{\vec{g}}
\newcommand{\h}{\vec{h}}
\renewcommand{\o}{\vec{o}}
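% Usage sketch (illustrative only): the macros above are typically used as
% $\inner{\x}{\y}$ for an inner product, $\biginner{\x}{\sum_i \e_i}$ when
% auto-sized delimiters are needed, and $\A\tran$ for a transpose with
% tightened spacing.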
cs189-calculus-optimization.tex (101 changes: 94 additions & 7 deletions)
@@ -8,12 +8,14 @@ \subsection{Extrema}
Otherwise the problem is \term{constrained} and may be much harder to solve, depending on the nature of the feasible set.

Suppose $f : \R^d \to \R$.
-A point $\x$ is said to be a \term{local minimum} (resp. \term{local maximum}) of $f$ in $\calX$ if $f(\x) \leq f(\y)$ (resp. $f(\x) \geq f(\y)$) for all $\y$ in some neighborhood $\calN \subseteq \calX$ that contains $\x$.
+A point $\x$ is said to be a \term{local minimum} (resp. \term{local maximum}) of $f$ in $\calX$ if $f(\x) \leq f(\y)$ (resp. $f(\x) \geq f(\y)$) for all $\y$ in some neighborhood $N \subseteq \calX$ about $\x$.\footnote{
+A \textbf{neighborhood} about $\x$ is an open set which contains $\x$.
+}
Furthermore, if $f(\x) \leq f(\y)$ for all $\y \in \calX$, then $\x$ is a \term{global minimum} of $f$ in $\calX$ (similarly for global maximum).
If the phrase ``in $\calX$'' is unclear from context, assume we are optimizing over the whole domain of the function.

The qualifier \term{strict} (as in e.g. a strict local minimum) means that the inequality in the definition is strict ($<$ or $>$) for all $\y \neq \x$, so equality is attained only at $\x$ itself.
-This indicates that the extremum is unique.
+This indicates that the extremum is unique within some neighborhood.

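As a small illustration (the two functions here are chosen purely as examples): $f(x) = x^2$ has a strict global minimum at $x = 0$, since $f(0) < f(y)$ for every $y \neq 0$; by contrast, $g(x) = \max(x, 0)^2$ attains its minimum value $0$ at every $x \leq 0$, so each such point is a global minimum, but not a strict one.
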
Observe that maximizing a function $f$ is equivalent to minimizing $-f$, so optimization problems are typically phrased in terms of minimization without loss of generality.
This convention (which we follow here) eliminates the need to discuss minimization and maximization separately.
@@ -42,9 +44,9 @@ \subsection{The Jacobian}
\subsection{The Hessian}
The \term{Hessian} matrix of $f : \R^d \to \R$ is a matrix of second-order partial derivatives:
\[\nabla^2 f = \matlit{
-\pdv[2]{f}{x_1} & \hdots & \pdv{f}{x_1}{x_n} \\
+\pdv[2]{f}{x_1} & \hdots & \pdv{f}{x_1}{x_d} \\
\vdots & \ddots & \vdots \\
-\pdv{f}{x_n}{x_1} & \hdots & \pdv[2]{f}{x_n}}
+\pdv{f}{x_d}{x_1} & \hdots & \pdv[2]{f}{x_d}}
\tab\text{i.e.}\tab
[\nabla^2 f]_{ij} = {\pdv{f}{x_i}{x_j}}\]
Recall that if the partial derivatives are continuous, the order of differentiation can be interchanged (Clairaut's theorem), so the Hessian matrix will be symmetric.
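
For concreteness, here is a small example (the function is chosen purely for illustration): if $f(x_1, x_2) = x_1^2 + x_1 x_2^2$, then
\[\nabla f = \matlit{2x_1 + x_2^2 \\ 2 x_1 x_2}
\tab\text{and}\tab
\nabla^2 f = \matlit{2 & 2x_2 \\ 2x_2 & 2x_1}\]
which is symmetric, as guaranteed by Clairaut's theorem since the partial derivatives are continuous.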
@@ -91,7 +93,7 @@ \subsection{Taylor's theorem}
Then there exists $t \in (0,1)$ such that
\[f(\x + \h) = f(\x) + \nabla f(\x + t\h)\tran\h\]
Furthermore, if $f$ is twice continuously differentiable, then
-\[\nabla f(\x + \h) = \nabla f(\x) + \int_0^1 \nabla^2 f(\x + t\h)\h \dif{t}\]
+\[\nabla f(\x + \h) = \nabla f(\x) + \int_0^1 \nabla^2 f(\x + t\h)\h \dd{t}\]
and there exists $t \in (0,1)$ such that
\[f(\x + \h) = f(\x) + \nabla f(\x)\tran\h + \frac{1}{2}\h\tran\nabla^2f(\x+t\h)\h\]
\end{theorem}
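
As a quick one-dimensional sanity check (the function is chosen purely for illustration), take $f(x) = x^3$ with $d = 1$. The second-order statement asserts that
\[(x+h)^3 = x^3 + 3x^2 h + \frac{1}{2} \cdot 6(x + th) h^2\]
for some $t \in (0,1)$; expanding and cancelling leaves $h^3 = 3t h^3$, so (for $h \neq 0$) $t = \frac{1}{3}$ works, independently of $x$ and $h$.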
@@ -150,14 +152,14 @@ \subsection{Conditions for local minima}
Furthermore if $\nabla^2 f(\x^*)$ is positive definite, then $\x^*$ is a strict local minimum.
\end{proposition}
\begin{proof}
-Let $\calB$ be an open ball of radius $r > 0$ centered at $\x^*$ which is contained in the neighborhood.
+Let $B$ be an open ball of radius $r > 0$ centered at $\x^*$ which is contained in the neighborhood.
Applying Taylor's theorem, we have that for any $\h$ with $\|\h\|_2 < r$, there exists $t \in (0,1)$ such that
\[f(\x^* + \h) = f(\x^*) + \underbrace{\h\tran\nabla f(\x^*)}_0 + \frac{1}{2}\h\tran\nabla^2 f(\x^* + t\h)\h \geq f(\x^*)\]
The last inequality holds because $\nabla^2 f(\x^* + t\h)$ is positive semi-definite (since $\|t\h\|_2 = t\|\h\|_2 < \|\h\|_2 < r$), so $\h\tran\nabla^2 f(\x^* + t\h)\h \geq 0$.
Since $f(\x^*) \leq f(\x^* + \h)$ for all directions $\h$ with $\|\h\|_2 < r$, we conclude that $\x^*$ is a local minimum.

Now further suppose that $\nabla^2 f(\x^*)$ is strictly positive definite.
-Since the Hessian is continuous we can choose another ball $\calB'$ with radius $r' > 0$ centered at $\x^*$ such that $\nabla^2 f(\x)$ is positive definite for all $\x \in \calB'$.
+Since the Hessian is continuous we can choose another ball $B'$ with radius $r' > 0$ centered at $\x^*$ such that $\nabla^2 f(\x)$ is positive definite for all $\x \in B'$.
Then following the same argument as above (except with a strict inequality now since the Hessian is positive definite) we have $f(\x^* + \h) > f(\x^*)$ for all $\h$ with $0 < \|\h\|_2 < r'$.
Hence $\x^*$ is a strict local minimum.
\end{proof}
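
Note that positive semi-definiteness of the Hessian at $\x^*$ alone (rather than on a neighborhood) is not sufficient. A standard one-dimensional example, included purely for illustration: $f(x) = x^3$ satisfies $f'(0) = 0$ and $f''(0) = 0 \geq 0$, yet $0$ is not a local minimum of $f$; the hypothesis of the proposition fails because $f''(x) = 6x < 0$ for all $x < 0$, so no neighborhood of $0$ has a positive semi-definite Hessian throughout.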
@@ -173,3 +175,88 @@ \subsection{Conditions for local minima}

\subsection{Convexity}
\input{cs189-convexity.tex}

\subsection{Orthogonal projections}
We now consider a class of optimization problems that is particularly well-understood and can often be solved in closed form: given some point $\x$ in an inner product space $V$, find the closest point to $\x$ in a subspace $S$ of $V$.
This process is referred to as \term{projection onto a subspace}.

The following diagram should make it geometrically clear that, at least in Euclidean space, the solution is intimately related to orthogonality and the Pythagorean theorem:
\begin{center}
\includegraphics[width=0.5\linewidth]{orthogonal-projection}
\end{center}
Here $\y$ is an arbitrary element of the subspace $S$, and $\y^*$ is the point in $S$ such that $\x-\y^*$ is perpendicular to $S$.
The hypotenuse of a right triangle (in this case $\|\x-\y\|$) is always longer than either of the legs (in this case $\|\x-\y^*\|$ and $\|\y^*-\y\|$), and when $\y \neq \y^*$ there always exists such a triangle between $\x$, $\y$, and $\y^*$.
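
To make the picture concrete, consider a small example in $\R^2$ (the specific point and subspace are chosen purely for illustration): let $\x = (3, 4)$ and let $S = \{(t, 0) : t \in \R\}$ be the horizontal axis. The point $\y^* = (3, 0)$ satisfies $\x - \y^* = (0, 4) \perp S$, and for any other $\y = (t, 0) \in S$,
\[\|\x - \y\|^2 = (3-t)^2 + 4^2 = \|\y^* - \y\|^2 + \|\x - \y^*\|^2 \geq \|\x - \y^*\|^2\]
which is exactly the Pythagorean relationship depicted above.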

Our intuition from Euclidean space suggests that the closest point to $\x$ in $S$ has the perpendicularity property described above, and we now show that this is indeed the case.
\begin{proposition}
Suppose $\x \in V$ and $\y^* \in S$.
Then $\y^*$ is the unique minimizer of $\|\x-\y\|$ over $\y \in S$ if and only if $\x-\y^* \perp S$.
\end{proposition}
\begin{proof}
$(\implies)$
Suppose $\y^*$ is the unique minimizer of $\|\x-\y\|$ over $\y \in S$.
That is, $\|\x-\y^*\| \leq \|\x-\y\|$ for all $\y \in S$, with equality only if $\y = \y^*$.
Fix $\vec{v} \in S$ and observe that
\begin{align*}
g(t) &:= \|\x-(\y^*+t\vec{v})\|^2 \\
&= \inner{\x-\y^*-t\vec{v}}{\x-\y^*-t\vec{v}} \\
&= \inner{\x-\y^*}{\x-\y^*} - 2t\inner{\x-\y^*}{\vec{v}} + t^2\inner{\vec{v}}{\vec{v}} \\
&= \|\x-\y^*\|^2 - 2t\inner{\x-\y^*}{\vec{v}} + t^2\|\vec{v}\|^2
\end{align*}
must have a minimum at $t = 0$ as a consequence of this assumption, since $\y^* + t\vec{v} \in S$ for every $t$ and hence $g(t) = \|\x-(\y^*+t\vec{v})\|^2 \geq \|\x-\y^*\|^2 = g(0)$.
Thus
\[0 = g'(0) = \left.-2\inner{\x-\y^*}{\vec{v}} + 2t\|\vec{v}\|^2\right|_{t=0} = -2\inner{\x-\y^*}{\vec{v}}\]
giving $\x-\y^* \perp \vec{v}$.
Since $\vec{v}$ was arbitrary in $S$, we have $\x-\y^* \perp S$ as claimed.

$(\impliedby)$
Suppose $\x-\y^* \perp S$.
Observe that for any $\y \in S$, $\y^*-\y \in S$ because $\y^* \in S$ and $S$ is closed under subtraction.
Under the hypothesis, $\x-\y^* \perp \y^*-\y$, so by the Pythagorean theorem,
\[\|\x-\y\|^2 = \|\x-\y^*+\y^*-\y\|^2 = \|\x-\y^*\|^2 + \|\y^*-\y\|^2 \geq \|\x - \y^*\|^2\]
and in fact the inequality is strict when $\y \neq \y^*$ since this implies $\|\y^*-\y\|^2 > 0$.
Taking square roots gives $\|\x-\y\| \geq \|\x-\y^*\|$, again with strict inequality when $\y \neq \y^*$.
Thus $\y^*$ is the unique minimizer of $\|\x-\y\|$ over $\y \in S$.
\end{proof}
Since a unique minimizer in $S$ can be found for any $\x \in V$, we can define an operator
\[P\x = \argmin_{\y \in S} \|\x-\y\|\]
Observe that $P\y = \y$ for any $\y \in S$, since $\y$ has distance zero from itself and every other point in $S$ has positive distance from $\y$.
Thus $P(P\x) = P\x$ for any $\x$ (i.e., $P^2 = P$) because $P\x \in S$.
The identity $P^2 = P$ is actually one of the defining properties of a \term{projection}, the other being linearity.

An immediate consequence of the previous result is that $\x - P\x \perp S$ for any $\x \in V$, and conversely that $P$ is the unique operator that satisfies this property for all $\x \in V$.
For this reason, $P$ is known as an \term{orthogonal projection}.
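
In matrix terms, a minimal sketch (with the space and subspace chosen purely for illustration): if $V = \R^2$ with the usual dot product and $S$ is again the horizontal axis, then $P$ acts as the matrix
\[P = \matlit{1 & 0 \\ 0 & 0}, \tab P^2 = \matlit{1 & 0 \\ 0 & 0}\matlit{1 & 0 \\ 0 & 0} = P\]
and for any $\x = (x_1, x_2)$, the residual $\x - P\x = (0, x_2)$ is orthogonal to $S$, consistent with the discussion above.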

If we choose an orthonormal basis for the target subspace $S$, it is possible to write down a more specific expression for $P$.
\begin{proposition}
If $\e_1, \dots, \e_m$ is an orthonormal basis for $S$, then
\[P\x = \sum_{i=1}^m \inner{\x}{\e_i}\e_i\]
\end{proposition}
\begin{proof}
Let $\e_1, \dots, \e_m$ be an orthonormal basis for $S$, and suppose $\x \in V$.
Then for all $j = 1, \dots, m$,
\begin{align*}
\biginner{\x-\sum_{i=1}^m \inner{\x}{\e_i}\e_i}{\e_j} &= \inner{\x}{\e_j} - \sum_{i=1}^m \inner{\x}{\e_i}\underbrace{\inner{\e_i}{\e_j}}_{\delta_{ij}} \\
&= \inner{\x}{\e_j} - \inner{\x}{\e_j} \\
&= 0
\end{align*}
We have shown that the claimed expression, call it $\tilde{P}\x$, satisfies $\x - \tilde{P}\x \perp \e_j$ for every element $\e_j$ of the orthonormal basis for $S$.
It follows (by linearity of the inner product) that $\x - \tilde{P}\x \perp S$. Since $\tilde{P}\x$ is a linear combination of the $\e_i$, it lies in $S$, so the previous result implies $P\x = \tilde{P}\x$ for every $\x$, i.e. $P = \tilde{P}$.
\end{proof}
The fact that $P$ is a linear operator (and thus a proper projection, as earlier we showed $P^2 = P$) follows readily from this result.
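
As a quick numerical check of this formula (the vectors are chosen purely for illustration), let $V = \R^3$ with the usual dot product, and let $S$ be spanned by the orthonormal vectors $\e_1 = \frac{1}{\sqrt{2}}(1, 1, 0)$ and $\e_2 = (0, 0, 1)$. For $\x = (1, 3, 5)$,
\[P\x = \inner{\x}{\e_1}\e_1 + \inner{\x}{\e_2}\e_2 = \frac{4}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(1, 1, 0) + 5 \cdot (0, 0, 1) = (2, 2, 5)\]
and indeed $\x - P\x = (-1, 1, 0)$ is orthogonal to both $\e_1$ and $\e_2$, hence to all of $S$.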

%Another useful fact about the orthogonal projection operator is that the metric it induces is \term{non-expansive}, i.e. $1$-Lipschitz.
%\begin{proposition}
%For any $\x \in V$,
%\[\|P\x\| \leq \|\x\|\]
%Thus for any $\x, \xye \in V$,
%\[\|P\x - P\xye\| \leq \|\x-\xye\|\]
%\end{proposition}
%\begin{proof}
%Suppose $\x \in V$.
%Then
%\[\|P\x\|^2 = \inner{P\x}{P\x} = \inner{\x}{P^2\x} = \inner{\x}{P\x} \leq \|\x\|\|P\x\|\]
%using respectively the self-adjointness of $P$, the fact that $P^2 = P$, and the Cauchy-Schwarz inequality.
%If $\|P\x\| = 0$, the inequality holds vacuously; otherwise we can divide both sides by $\|P\x\|$ to obtain $\|P\x\| \leq \|\x\|$.
%
%The second statement follows immediately from the first by linearity of $P$.
%\end{proof}
cs189-convexity.tex (18 changes: 9 additions & 9 deletions)
@@ -93,15 +93,15 @@ \subsubsection{Consequences of convexity}
\end{proposition}
\begin{proof}
Suppose $f$ is convex, and let $\x^*$ be a local minimum of $f$ in $\calX$.
-Then for some neighborhood $\calN \subseteq \calX$ about $\x^*$, we have $f(\x) \geq f(\x^*)$ for all $\x \in \calN$.
+Then for some neighborhood $N \subseteq \calX$ about $\x^*$, we have $f(\x) \geq f(\x^*)$ for all $\x \in N$.
Suppose towards a contradiction that there exists $\xye \in \calX$ such that $f(\xye) < f(\x^*)$.

Consider the line segment $\x(t) = t\x^* + (1-t)\xye, ~ t \in [0,1]$, noting that $\x(t) \in \calX$ by the convexity of $\calX$.
Then by the convexity of $f$,
\[f(\x(t)) \leq tf(\x^*) + (1-t)f(\xye) < tf(\x^*) + (1-t)f(\x^*) = f(\x^*)\]
for all $t \in (0,1)$.

-We can pick $t$ to be sufficiently close to $1$ that $\x(t) \in \calN$; then $f(\x(t)) \geq f(\x^*)$ by the definition of $\calN$, but $f(\x(t)) < f(\x^*)$ by the above inequality, a contradiction.
+We can pick $t$ to be sufficiently close to $1$ that $\x(t) \in N$; then $f(\x(t)) \geq f(\x^*)$ by the definition of $N$, but $f(\x(t)) < f(\x^*)$ by the above inequality, a contradiction.

It follows that $f(\x^*) \leq f(\x)$ for all $\x \in \calX$, so $\x^*$ is a global minimum of $f$ in $\calX$.
\end{proof}
@@ -153,7 +153,7 @@ \subsubsection{Showing that a function is convex}
Norms are convex.
\end{proposition}
\begin{proof}
-Let $\|\cdot\|$ be a norm on $\R^d$. Then for all $\x, \y \in \R^d$ and $t \in [0,1]$,
+Let $\|\cdot\|$ be a norm on a vector space $V$. Then for all $\x, \y \in V$ and $t \in [0,1]$,
\[\|t\x + (1-t)\y\| \leq \|t\x\| + \|(1-t)\y\| = |t|\|\x\| + |1-t|\|\y\| = t\|\x\| + (1-t)\|\y\|\]
where we have used respectively the triangle inequality, the homogeneity of norms, and the fact that $t$ and $1-t$ are nonnegative.
Hence $\|\cdot\|$ is convex.
Expand Down Expand Up @@ -228,16 +228,16 @@ \subsubsection{Showing that a function is convex}
\end{proof}

\begin{proposition}
-If $f$ is convex, then $g(\vec{x}) \equiv f(A\x + \vec{b})$ is convex for any appropriately-sized $A$ and $\b$.
+If $f$ is convex, then $g(\vec{x}) \equiv f(\A\x + \vec{b})$ is convex for any appropriately-sized $\A$ and $\b$.
\end{proposition}
\begin{proof}
Suppose $f$ is convex and $g$ is defined like so. Then for all $\x, \y \in \dom g$,
\begin{align*}
-g(t\x + (1-t)\y) &= f(A(t\x + (1-t)\y) + \b) \\
-&= f(tA\x + (1-t)A\y + \b) \\
-&= f(tA\x + (1-t)A\y + t\b + (1-t)\b) \\
-&= f(t(A\x + \b) + (1-t)(A\y + \b)) \\
-&\leq tf(A\x + \b) + (1-t)f(A\y + \b) & \text{convexity of $f$} \\
+g(t\x + (1-t)\y) &= f(\A(t\x + (1-t)\y) + \b) \\
+&= f(t\A\x + (1-t)\A\y + \b) \\
+&= f(t\A\x + (1-t)\A\y + t\b + (1-t)\b) \\
+&= f(t(\A\x + \b) + (1-t)(\A\y + \b)) \\
+&\leq tf(\A\x + \b) + (1-t)f(\A\y + \b) & \text{convexity of $f$} \\
&= tg(\x) + (1-t)g(\y)
\end{align*}
Thus $g$ is convex.
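
As a simple application (stated here only as an illustration), combining this result with the earlier fact that norms are convex shows that $\x \mapsto \|\A\x - \b\|$ is convex for any appropriately-sized $\A$ and $\b$, since it is a norm composed with the affine map $\x \mapsto \A\x - \b$.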