orthogonal projections
gwthomas committed Jan 2, 2018
1 parent 583d109 commit d8f8996
Showing 11 changed files with 292 additions and 137 deletions.
common.tex (6 changes: 5 additions & 1 deletion)
@@ -18,6 +18,7 @@
\DeclareMathOperator*{\dom}{dom}
\DeclareMathOperator*{\range}{range}
\DeclareMathOperator*{\diag}{diag}
\DeclareMathOperator*{\Null}{null}
\newcommand{\C}{\mathbb{C}}
\newcommand{\F}{\mathbb{F}}
\newcommand{\N}{\mathbb{N}}
@@ -40,12 +41,14 @@
\renewcommand{\vec}[1]{\mathbf{#1}}
\newcommand{\mat}[1]{\mathbf{#1}}
\newcommand{\matlit}[1]{\begin{bmatrix}#1\end{bmatrix}}
-\newcommand{\tran}{^\top}
+\newcommand{\tran}{^{\!\top\!}}
\newcommand{\inv}{^{-1}}
\newcommand{\halfpow}{^{\frac{1}{2}}}
\newcommand{\neghalfpow}{^{-\frac{1}{2}}}
\renewcommand{\angle}[1]{\langle #1 \rangle}
\newcommand{\bigangle}[1]{\left\langle #1 \right\rangle}
\newcommand{\inner}[2]{\angle{#1, #2}}
\newcommand{\biginner}[2]{\bigangle{#1, #2}}
\renewcommand{\P}{\mathbb{P}}
\newcommand{\pr}[1]{\P(#1)}
\newcommand{\prbig}[1]{\P\big(#1\big)}
@@ -66,6 +69,7 @@
\newcommand{\tab}{\hspace{0.5cm}}
\renewcommand{\a}{\vec{a}}
\renewcommand{\b}{\vec{b}}
\newcommand{\e}{\vec{e}}
\newcommand{\g}{\vec{g}}
\newcommand{\h}{\vec{h}}
\renewcommand{\o}{\vec{o}}
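% Usage sketch (illustrative only): the macros above are typically used as
% $\inner{\x}{\y}$ for an inner product, $\biginner{\x}{\sum_i \e_i}$ when
% auto-sized delimiters are needed, and $\A\tran$ for a transpose with
% tightened spacing.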
cs189-calculus-optimization.tex (101 changes: 94 additions & 7 deletions)
@@ -8,12 +8,14 @@ \subsection{Extrema}
Otherwise the problem is \term{constrained} and may be much harder to solve, depending on the nature of the feasible set.

Suppose $f : \R^d \to \R$.
-A point $\x$ is said to be a \term{local minimum} (resp. \term{local maximum}) of $f$ in $\calX$ if $f(\x) \leq f(\y)$ (resp. $f(\x) \geq f(\y)$) for all $\y$ in some neighborhood $\calN \subseteq \calX$ that contains $\x$.
+A point $\x$ is said to be a \term{local minimum} (resp. \term{local maximum}) of $f$ in $\calX$ if $f(\x) \leq f(\y)$ (resp. $f(\x) \geq f(\y)$) for all $\y$ in some neighborhood $N \subseteq \calX$ about $\x$.\footnote{
+A \textbf{neighborhood} about $\x$ is an open set which contains $\x$.
+}
Furthermore, if $f(\x) \leq f(\y)$ for all $\y \in \calX$, then $\x$ is a \term{global minimum} of $f$ in $\calX$ (similarly for global maximum).
If the phrase ``in $\calX$'' is unclear from context, assume we are optimizing over the whole domain of the function.

The qualifier \term{strict} (as in e.g. a strict local minimum) means that the inequality in the definition is strict ($<$ or $>$) for all $\y \neq \x$, so equality is attained only at $\x$ itself.
-This indicates that the extremum is unique.
+This indicates that the extremum is unique within some neighborhood.

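As a small illustration (the two functions here are chosen purely as examples): $f(x) = x^2$ has a strict global minimum at $x = 0$, since $f(0) < f(y)$ for every $y \neq 0$; by contrast, $g(x) = \max(x, 0)^2$ attains its minimum value $0$ at every $x \leq 0$, so each such point is a global minimum, but not a strict one.
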
Observe that maximizing a function $f$ is equivalent to minimizing $-f$, so optimization problems are typically phrased in terms of minimization without loss of generality.
This convention (which we follow here) eliminates the need to discuss minimization and maximization separately.
@@ -42,9 +44,9 @@ \subsection{The Jacobian}
\subsection{The Hessian}
The \term{Hessian} matrix of $f : \R^d \to \R$ is a matrix of second-order partial derivatives:
\[\nabla^2 f = \matlit{
-\pdv[2]{f}{x_1} & \hdots & \pdv{f}{x_1}{x_n} \\
+\pdv[2]{f}{x_1} & \hdots & \pdv{f}{x_1}{x_d} \\
\vdots & \ddots & \vdots \\
-\pdv{f}{x_n}{x_1} & \hdots & \pdv[2]{f}{x_n}}
+\pdv{f}{x_d}{x_1} & \hdots & \pdv[2]{f}{x_d}}
\tab\text{i.e.}\tab
[\nabla^2 f]_{ij} = {\pdv{f}{x_i}{x_j}}\]
Recall that if the partial derivatives are continuous, the order of differentiation can be interchanged (Clairaut's theorem), so the Hessian matrix will be symmetric.
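
For concreteness, here is a small example (the function is chosen purely for illustration): if $f(x_1, x_2) = x_1^2 + x_1 x_2^2$, then
\[\nabla f = \matlit{2x_1 + x_2^2 \\ 2 x_1 x_2}
\tab\text{and}\tab
\nabla^2 f = \matlit{2 & 2x_2 \\ 2x_2 & 2x_1}\]
which is symmetric, as guaranteed by Clairaut's theorem since the partial derivatives are continuous.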
@@ -91,7 +93,7 @@ \subsection{Taylor's theorem}
Then there exists $t \in (0,1)$ such that
\[f(\x + \h) = f(\x) + \nabla f(\x + t\h)\tran\h\]
Furthermore, if $f$ is twice continuously differentiable, then
-\[\nabla f(\x + \h) = \nabla f(\x) + \int_0^1 \nabla^2 f(\x + t\h)\h \dif{t}\]
+\[\nabla f(\x + \h) = \nabla f(\x) + \int_0^1 \nabla^2 f(\x + t\h)\h \dd{t}\]
and there exists $t \in (0,1)$ such that
\[f(\x + \h) = f(\x) + \nabla f(\x)\tran\h + \frac{1}{2}\h\tran\nabla^2f(\x+t\h)\h\]
\end{theorem}
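
As a quick one-dimensional sanity check (the function is chosen purely for illustration), take $f(x) = x^3$ with $d = 1$. The second-order statement asserts that
\[(x+h)^3 = x^3 + 3x^2 h + \frac{1}{2} \cdot 6(x + th) h^2\]
for some $t \in (0,1)$; expanding and cancelling leaves $h^3 = 3t h^3$, so (for $h \neq 0$) $t = \frac{1}{3}$ works, independently of $x$ and $h$.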
@@ -150,14 +152,14 @@ \subsection{Conditions for local minima}
Furthermore if $\nabla^2 f(\x^*)$ is positive definite, then $\x^*$ is a strict local minimum.
\end{proposition}
\begin{proof}
-Let $\calB$ be an open ball of radius $r > 0$ centered at $\x^*$ which is contained in the neighborhood.
+Let $B$ be an open ball of radius $r > 0$ centered at $\x^*$ which is contained in the neighborhood.
Applying Taylor's theorem, we have that for any $\h$ with $\|\h\|_2 < r$, there exists $t \in (0,1)$ such that
\[f(\x^* + \h) = f(\x^*) + \underbrace{\h\tran\nabla f(\x^*)}_0 + \frac{1}{2}\h\tran\nabla^2 f(\x^* + t\h)\h \geq f(\x^*)\]
The last inequality holds because $\nabla^2 f(\x^* + t\h)$ is positive semi-definite (since $\|t\h\|_2 = t\|\h\|_2 < \|\h\|_2 < r$), so $\h\tran\nabla^2 f(\x^* + t\h)\h \geq 0$.
Since $f(\x^*) \leq f(\x^* + \h)$ for all directions $\h$ with $\|\h\|_2 < r$, we conclude that $\x^*$ is a local minimum.

Now further suppose that $\nabla^2 f(\x^*)$ is strictly positive definite.
-Since the Hessian is continuous we can choose another ball $\calB'$ with radius $r' > 0$ centered at $\x^*$ such that $\nabla^2 f(\x)$ is positive definite for all $\x \in \calB'$.
+Since the Hessian is continuous we can choose another ball $B'$ with radius $r' > 0$ centered at $\x^*$ such that $\nabla^2 f(\x)$ is positive definite for all $\x \in B'$.
Then following the same argument as above (except with a strict inequality now since the Hessian is positive definite) we have $f(\x^* + \h) > f(\x^*)$ for all $\h$ with $0 < \|\h\|_2 < r'$.
Hence $\x^*$ is a strict local minimum.
\end{proof}
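
Note that positive semi-definiteness of the Hessian at $\x^*$ alone (rather than on a neighborhood) is not sufficient. A standard one-dimensional example, included purely for illustration: $f(x) = x^3$ satisfies $f'(0) = 0$ and $f''(0) = 0 \geq 0$, yet $0$ is not a local minimum of $f$; the hypothesis of the proposition fails because $f''(x) = 6x < 0$ for all $x < 0$, so no neighborhood of $0$ has a positive semi-definite Hessian throughout.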
@@ -173,3 +175,88 @@ \subsection{Conditions for local minima}

\subsection{Convexity}
\input{cs189-convexity.tex}

\subsection{Orthogonal projections}
We now consider a class of optimization problems that is particularly well-understood and can often be solved in closed form: given some point $\x$ in an inner product space $V$, find the closest point to $\x$ in a subspace $S$ of $V$.
This process is referred to as \term{projection onto a subspace}.

The following diagram should make it geometrically clear that, at least in Euclidean space, the solution is intimately related to orthogonality and the Pythagorean theorem:
\begin{center}
\includegraphics[width=0.5\linewidth]{orthogonal-projection}
\end{center}
Here $\y$ is an arbitrary element of the subspace $S$, and $\y^*$ is the point in $S$ such that $\x-\y^*$ is perpendicular to $S$.
The hypotenuse of a right triangle (in this case $\|\x-\y\|$) is always longer than either of the legs (in this case $\|\x-\y^*\|$ and $\|\y^*-\y\|$), and when $\y \neq \y^*$ there always exists such a triangle between $\x$, $\y$, and $\y^*$.
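
To make the picture concrete, consider a small example in $\R^2$ (the specific point and subspace are chosen purely for illustration): let $\x = (3, 4)$ and let $S = \{(t, 0) : t \in \R\}$ be the horizontal axis. The point $\y^* = (3, 0)$ satisfies $\x - \y^* = (0, 4) \perp S$, and for any other $\y = (t, 0) \in S$,
\[\|\x - \y\|^2 = (3-t)^2 + 4^2 = \|\y^* - \y\|^2 + \|\x - \y^*\|^2 \geq \|\x - \y^*\|^2\]
which is exactly the Pythagorean relationship depicted above.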

Our intuition from Euclidean space suggests that the closest point to $\x$ in $S$ has the perpendicularity property described above, and we now show that this is indeed the case.
\begin{proposition}
Suppose $\x \in V$ and $\y^* \in S$.
Then $\y^*$ is the unique minimizer of $\|\x-\y\|$ over $\y \in S$ if and only if $\x-\y^* \perp S$.
\end{proposition}
\begin{proof}
$(\implies)$
Suppose $\y^*$ is the unique minimizer of $\|\x-\y\|$ over $\y \in S$.
That is, $\|\x-\y^*\| \leq \|\x-\y\|$ for all $\y \in S$, with equality only if $\y = \y^*$.
Fix $\vec{v} \in S$ and observe that
\begin{align*}
g(t) &:= \|\x-(\y^*+t\vec{v})\|^2 \\
&= \inner{\x-\y^*-t\vec{v}}{\x-\y^*-t\vec{v}} \\
&= \inner{\x-\y^*}{\x-\y^*} - 2t\inner{\x-\y^*}{\vec{v}} + t^2\inner{\vec{v}}{\vec{v}} \\
&= \|\x-\y^*\|^2 - 2t\inner{\x-\y^*}{\vec{v}} + t^2\|\vec{v}\|^2
\end{align*}
must have a minimum at $t = 0$ as a consequence of this assumption, since $\y^* + t\vec{v} \in S$ for every $t$ and hence $g(t) = \|\x-(\y^*+t\vec{v})\|^2 \geq \|\x-\y^*\|^2 = g(0)$.
Thus
\[0 = g'(0) = \left.-2\inner{\x-\y^*}{\vec{v}} + 2t\|\vec{v}\|^2\right|_{t=0} = -2\inner{\x-\y^*}{\vec{v}}\]
giving $\x-\y^* \perp \vec{v}$.
Since $\vec{v}$ was arbitrary in $S$, we have $\x-\y^* \perp S$ as claimed.

$(\impliedby)$
Suppose $\x-\y^* \perp S$.
Observe that for any $\y \in S$, $\y^*-\y \in S$ because $\y^* \in S$ and $S$ is closed under subtraction.
Under the hypothesis, $\x-\y^* \perp \y^*-\y$, so by the Pythagorean theorem,
\[\|\x-\y\|^2 = \|\x-\y^*+\y^*-\y\|^2 = \|\x-\y^*\|^2 + \|\y^*-\y\|^2 \geq \|\x - \y^*\|^2\]
and in fact the inequality is strict when $\y \neq \y^*$ since this implies $\|\y^*-\y\|^2 > 0$.
Taking square roots gives $\|\x-\y\| \geq \|\x-\y^*\|$, again with strict inequality when $\y \neq \y^*$.
Thus $\y^*$ is the unique minimizer of $\|\x-\y\|$ over $\y \in S$.
\end{proof}
Since a unique minimizer in $S$ can be found for any $\x \in V$, we can define an operator
\[P\x = \argmin_{\y \in S} \|\x-\y\|\]
Observe that $P\y = \y$ for any $\y \in S$, since $\y$ has distance zero from itself and every other point in $S$ has positive distance from $\y$.
Thus $P(P\x) = P\x$ for any $\x$ (i.e., $P^2 = P$) because $P\x \in S$.
The identity $P^2 = P$ is actually one of the defining properties of a \term{projection}, the other being linearity.

An immediate consequence of the previous result is that $\x - P\x \perp S$ for any $\x \in V$, and conversely that $P$ is the unique operator that satisfies this property for all $\x \in V$.
For this reason, $P$ is known as an \term{orthogonal projection}.
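
In matrix terms, a minimal sketch (with the space and subspace chosen purely for illustration): if $V = \R^2$ with the usual dot product and $S$ is again the horizontal axis, then $P$ acts as the matrix
\[P = \matlit{1 & 0 \\ 0 & 0}, \tab P^2 = \matlit{1 & 0 \\ 0 & 0}\matlit{1 & 0 \\ 0 & 0} = P\]
and for any $\x = (x_1, x_2)$, the residual $\x - P\x = (0, x_2)$ is orthogonal to $S$, consistent with the discussion above.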

If we choose an orthonormal basis for the target subspace $S$, it is possible to write down a more specific expression for $P$.
\begin{proposition}
If $\e_1, \dots, \e_m$ is an orthonormal basis for $S$, then
\[P\x = \sum_{i=1}^m \inner{\x}{\e_i}\e_i\]
\end{proposition}
\begin{proof}
Let $\e_1, \dots, \e_m$ be an orthonormal basis for $S$, and suppose $\x \in V$.
Then for all $j = 1, \dots, m$,
\begin{align*}
\biginner{\x-\sum_{i=1}^m \inner{\x}{\e_i}\e_i}{\e_j} &= \inner{\x}{\e_j} - \sum_{i=1}^m \inner{\x}{\e_i}\underbrace{\inner{\e_i}{\e_j}}_{\delta_{ij}} \\
&= \inner{\x}{\e_j} - \inner{\x}{\e_j} \\
&= 0
\end{align*}
We have shown that the claimed expression, call it $\tilde{P}\x$, satisfies $\x - \tilde{P}\x \perp \e_j$ for every element $\e_j$ of the orthonormal basis for $S$.
It follows (by linearity of the inner product) that $\x - \tilde{P}\x \perp S$. Since $\tilde{P}\x$ is a linear combination of the $\e_i$, it lies in $S$, so the previous result implies $P\x = \tilde{P}\x$ for every $\x$, i.e. $P = \tilde{P}$.
\end{proof}
The fact that $P$ is a linear operator (and thus a proper projection, as earlier we showed $P^2 = P$) follows readily from this result.
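
As a quick numerical check of this formula (the vectors are chosen purely for illustration), let $V = \R^3$ with the usual dot product, and let $S$ be spanned by the orthonormal vectors $\e_1 = \frac{1}{\sqrt{2}}(1, 1, 0)$ and $\e_2 = (0, 0, 1)$. For $\x = (1, 3, 5)$,
\[P\x = \inner{\x}{\e_1}\e_1 + \inner{\x}{\e_2}\e_2 = \frac{4}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(1, 1, 0) + 5 \cdot (0, 0, 1) = (2, 2, 5)\]
and indeed $\x - P\x = (-1, 1, 0)$ is orthogonal to both $\e_1$ and $\e_2$, hence to all of $S$.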

%Another useful fact about the orthogonal projection operator is that the metric it induces is \term{non-expansive}, i.e. $1$-Lipschitz.
%\begin{proposition}
%For any $\x \in V$,
%\[\|P\x\| \leq \|\x\|\]
%Thus for any $\x, \xye \in V$,
%\[\|P\x - P\xye\| \leq \|\x-\xye\|\]
%\end{proposition}
%\begin{proof}
%Suppose $\x \in V$.
%Then
%\[\|P\x\|^2 = \inner{P\x}{P\x} = \inner{\x}{P^2\x} = \inner{\x}{P\x} \leq \|\x\|\|P\x\|\]
%using respectively the self-adjointness of $P$, the fact that $P^2 = P$, and the Cauchy-Schwarz inequality.
%If $\|P\x\| = 0$, the inequality holds vacuously; otherwise we can divide both sides by $\|P\x\|$ to obtain $\|P\x\| \leq \|\x\|$.
%
%The second statement follows immediately from the first by linearity of $P$.
%\end{proof}
cs189-convexity.tex (18 changes: 9 additions & 9 deletions)
@@ -93,15 +93,15 @@ \subsubsection{Consequences of convexity}
\end{proposition}
\begin{proof}
Suppose $f$ is convex, and let $\x^*$ be a local minimum of $f$ in $\calX$.
-Then for some neighborhood $\calN \subseteq \calX$ about $\x^*$, we have $f(\x) \geq f(\x^*)$ for all $\x \in \calN$.
+Then for some neighborhood $N \subseteq \calX$ about $\x^*$, we have $f(\x) \geq f(\x^*)$ for all $\x \in N$.
Suppose towards a contradiction that there exists $\xye \in \calX$ such that $f(\xye) < f(\x^*)$.

Consider the line segment $\x(t) = t\x^* + (1-t)\xye, ~ t \in [0,1]$, noting that $\x(t) \in \calX$ by the convexity of $\calX$.
Then by the convexity of $f$,
\[f(\x(t)) \leq tf(\x^*) + (1-t)f(\xye) < tf(\x^*) + (1-t)f(\x^*) = f(\x^*)\]
for all $t \in (0,1)$.

-We can pick $t$ to be sufficiently close to $1$ that $\x(t) \in \calN$; then $f(\x(t)) \geq f(\x^*)$ by the definition of $\calN$, but $f(\x(t)) < f(\x^*)$ by the above inequality, a contradiction.
+We can pick $t$ to be sufficiently close to $1$ that $\x(t) \in N$; then $f(\x(t)) \geq f(\x^*)$ by the definition of $N$, but $f(\x(t)) < f(\x^*)$ by the above inequality, a contradiction.

It follows that $f(\x^*) \leq f(\x)$ for all $\x \in \calX$, so $\x^*$ is a global minimum of $f$ in $\calX$.
\end{proof}
@@ -153,7 +153,7 @@ \subsubsection{Showing that a function is convex}
Norms are convex.
\end{proposition}
\begin{proof}
-Let $\|\cdot\|$ be a norm on $\R^d$. Then for all $\x, \y \in \R^d$ and $t \in [0,1]$,
+Let $\|\cdot\|$ be a norm on a vector space $V$. Then for all $\x, \y \in V$ and $t \in [0,1]$,
\[\|t\x + (1-t)\y\| \leq \|t\x\| + \|(1-t)\y\| = |t|\|\x\| + |1-t|\|\y\| = t\|\x\| + (1-t)\|\y\|\]
where we have used respectively the triangle inequality, the homogeneity of norms, and the fact that $t$ and $1-t$ are nonnegative.
Hence $\|\cdot\|$ is convex.
Expand Down Expand Up @@ -228,16 +228,16 @@ \subsubsection{Showing that a function is convex}
\end{proof}

\begin{proposition}
-If $f$ is convex, then $g(\vec{x}) \equiv f(A\x + \vec{b})$ is convex for any appropriately-sized $A$ and $\b$.
+If $f$ is convex, then $g(\vec{x}) \equiv f(\A\x + \vec{b})$ is convex for any appropriately-sized $\A$ and $\b$.
\end{proposition}
\begin{proof}
Suppose $f$ is convex and $g$ is defined like so. Then for all $\x, \y \in \dom g$,
\begin{align*}
-g(t\x + (1-t)\y) &= f(A(t\x + (1-t)\y) + \b) \\
-&= f(tA\x + (1-t)A\y + \b) \\
-&= f(tA\x + (1-t)A\y + t\b + (1-t)\b) \\
-&= f(t(A\x + \b) + (1-t)(A\y + \b)) \\
-&\leq tf(A\x + \b) + (1-t)f(A\y + \b) & \text{convexity of $f$} \\
+g(t\x + (1-t)\y) &= f(\A(t\x + (1-t)\y) + \b) \\
+&= f(t\A\x + (1-t)\A\y + \b) \\
+&= f(t\A\x + (1-t)\A\y + t\b + (1-t)\b) \\
+&= f(t(\A\x + \b) + (1-t)(\A\y + \b)) \\
+&\leq tf(\A\x + \b) + (1-t)f(\A\y + \b) & \text{convexity of $f$} \\
&= tg(\x) + (1-t)g(\y)
\end{align*}
Thus $g$ is convex.
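
As a simple application (stated here only as an illustration), combining this result with the earlier fact that norms are convex shows that $\x \mapsto \|\A\x - \b\|$ is convex for any appropriately-sized $\A$ and $\b$, since it is a norm composed with the affine map $\x \mapsto \A\x - \b$.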