<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<link type="text/css" rel="stylesheet" href="shared/css/style.css"/>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({ TeX: { equationNumbers: {autoNumber: "all"} } });
</script>
<script type="text/javascript" src="shared/js/MathJax/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
</head>
<body>
<div style="display:none">
<span class="math">\(\newcommand{\A}{\mat{A}}\)</span>
<span class="math">\(\newcommand{\B}{\mat{B}}\)</span>
<span class="math">\(\newcommand{\C}{\mat{C}}\)</span>
<span class="math">\(\newcommand{\D}{\mat{D}}\)</span>
<span class="math">\(\newcommand{\E}{\mat{E}}\)</span>
<span class="math">\(\newcommand{\F}{\mat{F}}\)</span>
<span class="math">\(\newcommand{\G}{\mat{G}}\)</span>
<span class="math">\(\newcommand{\H}{\mat{H}}\)</span>
<span class="math">\(\newcommand{\I}{\mat{I}}\)</span>
<span class="math">\(\newcommand{\K}{\mat{K}}\)</span>
<span class="math">\(\newcommand{\L}{\mat{L}}\)</span>
<span class="math">\(\newcommand{\M}{\mat{M}}\)</span>
<span class="math">\(\newcommand{\N}{\mat{N}}\)</span>
<span class="math">\(\newcommand{\One}{\mathbf{1}}\)</span>
<span class="math">\(\newcommand{\P}{\mat{P}}\)</span>
<span class="math">\(\newcommand{\Q}{\mat{Q}}\)</span>
<span class="math">\(\newcommand{\Rot}{\mat{R}}\)</span>
<span class="math">\(\newcommand{\R}{\mathbb{R}}\)</span>
<span class="math">\(\newcommand{\S}{\mathcal{S}}\)</span>
<span class="math">\(\newcommand{\T}{\mat{T}}\)</span>
<span class="math">\(\newcommand{\U}{\mat{U}}\)</span>
<span class="math">\(\newcommand{\V}{\mat{V}}\)</span>
<span class="math">\(\newcommand{\W}{\mat{W}}\)</span>
<span class="math">\(\newcommand{\X}{\mat{X}}\)</span>
<span class="math">\(\newcommand{\Y}{\mat{Y}}\)</span>
<span class="math">\(\newcommand{\argmax}{\mathop{\text{argmax}}}\)</span>
<span class="math">\(\newcommand{\argmin}{\mathop{\text{argmin}}}\)</span>
<span class="math">\(\newcommand{\c}{\vec{c}}\)</span>
<span class="math">\(\newcommand{\d}{\vec{d}}\)</span>
<span class="math">\(\newcommand{\e}{\vec{e}}\)</span>
<span class="math">\(\newcommand{\f}{\vec{f}}\)</span>
<span class="math">\(\newcommand{\g}{\vec{g}}\)</span>
<span class="math">\(\newcommand{\mat}[1]{\mathbf{#1}}\)</span>
<span class="math">\(\newcommand{\min}{\mathop{\text{min}}}\)</span>
<span class="math">\(\newcommand{\n}{\vec{n}}\)</span>
<span class="math">\(\newcommand{\p}{\vec{p}}\)</span>
<span class="math">\(\newcommand{\q}{\vec{q}}\)</span>
<span class="math">\(\newcommand{\r}{\vec{r}}\)</span>
<span class="math">\(\newcommand{\transpose}{{\mathsf T}}\)</span>
<span class="math">\(\newcommand{\tr}[1]{\mathop{\text{tr}}{\left(#1\right)}}\)</span>
<span class="math">\(\newcommand{\t}{\vec{t}}\)</span>
<span class="math">\(\newcommand{\u}{\vec{u}}\)</span>
<span class="math">\(\newcommand{\vec}[1]{\mathbf{#1}}\)</span>
<span class="math">\(\newcommand{\x}{\vec{x}}\)</span>
<span class="math">\(\newcommand{\y}{\vec{y}}\)</span>
<span class="math">\(\newcommand{\z}{\vec{z}}\)</span>
<span class="math">\(\renewcommand{\v}{\vec{v}}\)</span>
</div>
<h1 id="geometryprocessing–registration">Geometry Processing – Registration</h1>
<blockquote>
<p><strong>To get started:</strong> Fork this repository then issue</p>
<pre><code>git clone --recursive http://github.com/[username]/geometry-processing-registration.git
</code></pre>
</blockquote>
<h2 id="installationlayoutandcompilation">Installation, Layout, and Compilation</h2>
<p>See
<a href="http://github.com/alecjacobson/geometry-processing-introduction">introduction</a>.</p>
<h2 id="execution">Execution</h2>
<p>Once built, you can execute the assignment from inside the <code>build/</code> directory using</p>
<pre><code>./registration [path to mesh1.obj] [path to mesh2.obj]
</code></pre>
<h2 id="background">Background</h2>
<p>In this assignment, we will be implementing a version of the <a href="https://en.wikipedia.org/wiki/Iterative_closest_point">iterative closest
point (ICP)</a> algorithm, not to be
confused with <a href="https://en.wikipedia.org/wiki/Insane_Clown_Posse">Insane Clown Posse</a>.</p>
<p>Rather than <a href="https://en.wikipedia.org/wiki/Point_set_registration">registering multiple point
clouds</a>, we will register
multiple triangle mesh surfaces. </p>
<p>This <em>algorithm</em> and its many variants have been used for quite some time to
align discrete shapes. One of the first descriptions is given in “A Method for
Registration of 3-D Shapes” by Besl & McKay 1992. However, the award-winning
PhD thesis of Sofien Bouaziz (“Realtime Face Tracking and Animation” 2015,
section 3.2–3.3) contains a more modern view that unifies many of the variants
with respect to how they impact the same core optimization problem. </p>
<p>For our assignment, we will assume that we have a triangle mesh representing a
complete scan of the surface <span class="math">\(Y\)</span> of some <a href="https://en.wikipedia.org/wiki/Rigid_body">rigid
object</a> and a new partial scan of
that surface <span class="math">\(X\)</span>.</p>
<figure>
<img src="images/max-inputs.jpg" alt="Example input" />
<figcaption>Example input: a partial scan mesh surface <span class="math">\(X\)</span> is misaligned with the
mesh of the complete surface <span class="math">\(Y\)</span>.</figcaption>
</figure>
<p>These meshes will not have the same number of vertices or even the same
topology. We will first explore different ways to <em>measure</em> how well aligned
two surfaces are and then how to optimize the <em>rigid</em> alignment of the partial
surface <span class="math">\(X\)</span> to the complete surface <span class="math">\(Y\)</span>.</p>
<h2 id="hausdorffdistance">Hausdorff distance</h2>
<p>We would like to compute a single scalar number that measures how poorly two
surfaces are matched. In other words, we would like to measure the <em>distance</em>
between two surfaces. Let’s start by reviewing more familiar distances:</p>
<h4 id="point-to-pointdistance">Point-to-point distance</h4>
<p>The usual Euclidean distance between <em>two points</em> <span class="math">\(\x\)</span> and <span class="math">\(\y\)</span> is the <span class="math">\(L²\)</span>
norm of their difference:</p>
<p><span class="math">\[
d(\x,\y) = ‖\x - \y‖.
\]</span></p>
<h4 id="point-to-projectiondistance">Point-to-projection distance</h4>
<p>When we consider the distance between a point <span class="math">\(\x\)</span> and some <em>larger</em> object <span class="math">\(Y\)</span> (a line,
a circle, a surface), the natural extension is to take the distance to the
closest point <span class="math">\(\y\)</span> on <span class="math">\(Y\)</span>:</p>
<p><span class="math">\[
d(\x,Y) = \inf_{\y ∈ Y} d(\x,\y).
\]</span></p>
<p>Written in this way, the
<a href="https://en.wikipedia.org/wiki/Infimum_and_supremum">infimum</a> considers all
possible points <span class="math">\(\y\)</span> and keeps the minimum distance. We may equivalently write
this distance instead as simply the point-to-point distance between <span class="math">\(\x\)</span> and
the <em>closest-point projection</em> <span class="math">\(P_Y(\x)\)</span>:</p>
<p><span class="math">\[
d(\x,Y) = d(\x,P_Y(\x)) = ‖\x - P_Y(\x)‖.
\]</span></p>
<p>If <span class="math">\(Y\)</span> is a smooth surface, this projection will also be an <a href="https://en.wikipedia.org/wiki/Projection_(linear_algebra)#Orthogonal_projections">orthogonal
projection</a>.</p>
<figure>
<img src="images/max-point-mesh.gif" alt="The distance between a surface $Y$ (light blue) and a point $\x$ (orange) is
determined by the closest point $P_Y(\x)$ (blue)" />
<figcaption>The distance between a surface <span class="math">\(Y\)</span> (light blue) and a point <span class="math">\(\x\)</span> (orange) is
determined by the closest point <span class="math">\(P_Y(\x)\)</span> (blue)</figcaption>
</figure>
<h3 id="directedhausdorffdistance">Directed Hausdorff distance</h3>
<p>We might be tempted to define the distance from surface <span class="math">\(X\)</span> to <span class="math">\(Y\)</span> as the
<em>infimum</em> of <em>point-to-projection</em> distances over all points <span class="math">\(\x\)</span> on <span class="math">\(X\)</span>:</p>
<p><span class="math">\[
D_\text{inf}(X,Y) = \inf_{\x ∈ X} ‖\x - P_Y(\x)‖,
\]</span></p>
<p>but this will not be useful for registering two surfaces: it will measure zero
if even just a single point <span class="math">\(\x\)</span> of <span class="math">\(X\)</span> happens to lie on <span class="math">\(Y\)</span>. Imagine the noses of
two faces touching at their tips.</p>
<p>Instead, we should take the <em>supremum</em> of <em>point-to-projection</em> distances over
all points <span class="math">\(\x\)</span> on <span class="math">\(X\)</span>:</p>
<p><span class="math">\[
D_{\overrightarrow{H}}(X,Y) = \sup_{\x ∈ X} ‖\x - P_Y(\x)‖.
\]</span></p>
<p>This surface-to-surface distance measure is called the <em>directed</em> <a href="https://en.wikipedia.org/wiki/Hausdorff_distance">Hausdorff
distance</a>. We may interpret
this as taking the worst of the best: we
let each point <span class="math">\(\x\)</span> on <span class="math">\(X\)</span> declare its shortest distance to <span class="math">\(Y\)</span> and then keep
the longest of those.</p>
<figure>
<img src="images/max-point-mesh-farthest.jpg" alt="The directed Hausdorff distance between from surface $X$ (light orange) to
another surface $Y$ (light blue) is determined by the point on $X$ (orange)
whose closest point on $Y$ (blue) is the farthest
away." />
<figcaption>The directed Hausdorff distance between from surface <span class="math">\(X\)</span> (light orange) to
another surface <span class="math">\(Y\)</span> (light blue) is determined by the point on <span class="math">\(X\)</span> (orange)
whose closest point on <span class="math">\(Y\)</span> (blue) is the farthest
away.</figcaption>
</figure>
<p>It is easy to verify that <span class="math">\(D_{\overrightarrow{H}}\)</span> will only equal zero if all
points on <span class="math">\(X\)</span> also lie exactly on <span class="math">\(Y\)</span>. </p>
<p>The converse is not true: if <span class="math">\(D_{\overrightarrow{H}}=0\)</span> there may still be
points on <span class="math">\(Y\)</span> that do not lie on <span class="math">\(X\)</span>. In other words, <em>in general</em> the directed
Hausdorff distance from surface <span class="math">\(X\)</span> to surface <span class="math">\(Y\)</span> will not equal the Hausdorff
distance from surface <span class="math">\(Y\)</span> to surface <span class="math">\(X\)</span>:</p>
<p><span class="math">\[
D_{\overrightarrow{H}}(X,Y) ≠ D_{\overrightarrow{H}}(Y,X).
\]</span></p>
<h4 id="directedhausdorffdistancebetweentrianglemeshes">directed Hausdorff distance between triangle meshes</h4>
<p>We can approximate a <em>lower bound</em> on the Hausdorff distance between two meshes
by densely sampling surfaces <span class="math">\(X\)</span> and <span class="math">\(Y\)</span>. We will discuss sampling methods
later. For now, consider that we have chosen a set <span class="math">\(\P_X\)</span> of <span class="math">\(k\)</span> points on <span class="math">\(X\)</span>
(each point might lie at a vertex, along an edge, or inside a triangle). The
directed Hausdorff distance from <span class="math">\(X\)</span> to another triangle mesh <span class="math">\(Y\)</span> must be
<em>at least</em> the directed Hausdorff distance from this <a href="https://en.wikipedia.org/wiki/Point_cloud">point
cloud</a> <span class="math">\(\P_X\)</span> to <span class="math">\(Y\)</span>:</p>
<p><span class="math">\[
D_{\overrightarrow{H}}(X,Y) ≥
D_{\overrightarrow{H}}(\P_X,Y) = \max_{i=1}^k ‖\p_i - P_Y(\p_i)‖,
\]</span></p>
<p>where we should be careful to account for the fact that the projection <span class="math">\(P_Y(\p_i)\)</span> of the
point <span class="math">\(\p_i\)</span> onto the triangle mesh <span class="math">\(Y\)</span> might lie at a vertex, along an edge, or
inside a triangle.</p>
<p>As our sampling <span class="math">\(\P_X\)</span> becomes denser and denser on <span class="math">\(X\)</span> this lower bound will
approach the true directed Hausdorff distance. Unfortunately, an efficient
<em>upper bound</em> is significantly more difficult to design.</p>
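<p>For concreteness, here is a minimal C++ (Eigen) sketch of this sampled lower bound. It assumes helpers behaving like the <code>random_points_on_mesh</code> and <code>point_mesh_distance</code> routines described in the Tasks section below; the exact signatures are an assumption, not a prescription:</p>
<pre><code>// Sketch only: lower bound on the directed Hausdorff distance via sampling.
// Assumes random_points_on_mesh and point_mesh_distance behave as described
// in the Tasks section below (signatures assumed).
#include <Eigen/Core>

double hausdorff_lower_bound_sketch(
  const Eigen::MatrixXd & VX, const Eigen::MatrixXi & FX,
  const Eigen::MatrixXd & VY, const Eigen::MatrixXi & FY,
  const int n)
{
  Eigen::MatrixXd X;
  random_points_on_mesh(n, VX, FX, X);     // n random samples on X
  Eigen::VectorXd D;
  Eigen::MatrixXd P, N;
  point_mesh_distance(X, VY, FY, D, P, N); // distance to closest point on Y
  return D.maxCoeff();                     // worst sample: a lower bound
}
</code></pre>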
<h4 id="hausdorffdistanceforalignmentoptimization">Hausdorff distance for alignment optimization</h4>
<p>Even if it <em>were</em> cheap to compute, the Hausdorff distance is difficult to
<em>optimize</em> when aligning two surfaces. If we treat the Hausdorff distance
between surfaces <span class="math">\(X\)</span> and <span class="math">\(Y\)</span> as an energy to be minimized, then the only change to
the surfaces that will decrease the energy is moving the (in general)
isolated point on <span class="math">\(X\)</span> and isolated point on <span class="math">\(Y\)</span> generating the maximum-minimum
distance. In effect, the rest of the surface does not even affect the
Hausdorff distance. This, or any type of <span class="math">\(L^∞\)</span> norm, will be much more
difficult to optimize.</p>
<p>Hausdorff distance can serve as a validation measure, while we turn to <span class="math">\(L²\)</span>
norms for optimization.</p>
<h2 id="integratedclosest-pointdistance">Integrated closest-point distance</h2>
<p>We would like a distance measure between two surfaces that—like Hausdorff
distance—does not require a shared parameterization. Unlike Hausdorff
distance, we would like this distance to <em>diffuse</em> the measurement over the
entire surfaces rather than generate it from the sole <em>worst offender</em>. We can
accomplish this by replacing the <em>supremum</em> in the Hausdorff distance (<span class="math">\(L^∞\)</span>)
with an integral of squared distances (<span class="math">\(L²\)</span>). Let us first define a directed
<em>closest-point distance</em> from a surface <span class="math">\(X\)</span> to another surface <span class="math">\(Y\)</span>, as the
integral of the squared distance from every point <span class="math">\(\x\)</span> on <span class="math">\(X\)</span> to its
closest-point projection <span class="math">\(P_Y(\x)\)</span> on the surface <span class="math">\(Y\)</span>:</p>
<p><span class="math">\[
D_{\overrightarrow{C}}(X,Y) = \sqrt{\ ∫\limits_{\x∈X} ‖\x - P_Y(\x) ‖² \;dA }.
\]</span></p>
<p>This distance will only be zero if all points on <span class="math">\(X\)</span> also lie on <span class="math">\(Y\)</span>, but when
it is non-zero it is summing/averaging/diffusing the distance measures of all
of the points.</p>
<p>This distance is suitable to define a matching energy, but is not necessarily
welcoming for optimization: the function inside the square is non-linear. Let’s
dig into it a bit. We’ll define a directed <em>matching energy</em>
<span class="math">\(E_{\overrightarrow{C}}(Z,Y)\)</span> from <span class="math">\(Z\)</span> to <span class="math">\(Y\)</span> to be the squared directed
closest-point distance from <span class="math">\(Z\)</span> to <span class="math">\(Y\)</span>:</p>
<p><span class="math">\[
E_{\overrightarrow{C}}(Z,Y) = ∫\limits_{\z∈Z} ‖\z - P_Y(\z) ‖² \;dA =
∫\limits_{\z∈Z} ‖\f_Y(\z)‖² \;dA
\]</span></p>
<p>where we introduce the proximity function <span class="math">\(\f_Y:\R³→\R³\)</span> defined simply as the
vector from a point <span class="math">\(\z\)</span> to its closest-point projection onto <span class="math">\(Y\)</span>:</p>
<p><span class="math">\[
\f(\z) = \z - P_Y(\z).
\]</span></p>
<p>Suppose <span class="math">\(Y\)</span> was not a surface, but just a single point <span class="math">\(Y = \{\y\}\)</span>. In this
case, <span class="math">\(\f(\z) = \z - \y\)</span> is clearly linear in <span class="math">\(\z\)</span>.</p>
<p>Similarly, suppose <span class="math">\(Y\)</span> was an <a href="https://en.wikipedia.org/wiki/Plane_(geometry)">infinite
plane</a> <span class="math">\(Y = \{\y | (\y-\p)⋅\n =
0\}\)</span> defined by some point <span class="math">\(\p\)</span> on the plane and the plane’s unit normal vector
<span class="math">\(\n\)</span>. Then <span class="math">\(\f(\z) = ((\z-\p)⋅\n)\n)\)</span> is also linear in <span class="math">\(\z\)</span>.</p>
<p>But in general, if <span class="math">\(Y\)</span> is an interesting surface <span class="math">\(\f(\z)\)</span> will be non-linear; it
might not even be a continuous function.</p>
<figure>
<img src="images/closest-point-discontinuous.png" alt="" />
</figure>
<p>In optimization, a common successful strategy to minimize energies composed of
squaring a non-linear function <span class="math">\(\f\)</span> is to
<a href="https://en.wikipedia.org/wiki/Linearization">linearize</a> the function about a
current input value (i.e., a current guess <span class="math">\(\z₀\)</span>), minimize the energy built
from this linearization, then re-linearize around that solution, and then
repeat. </p>
<p>This is the core idea behind <a href="https://en.wikipedia.org/wiki/Gradient_descent">gradient
descent</a> and the
<a href="https://en.wikipedia.org/wiki/Gauss–Newton_algorithm">Gauss-Newton</a> methods:</p>
<pre><code>minimize f(z)²
z₀ ← initial guess
repeat until convergence
f₀ ← linearize f(z) around z₀
z₀ ← minimize f₀(z)²
</code></pre>
<p>Since our <span class="math">\(\f\)</span> is a geometric function, we can derive its linearizations
<em>geometrically</em>.</p>
<h3 id="constantfunctionapproximation">Constant function approximation</h3>
<p>If we make the convenient—however unrealistic—assumption that in the
neighborhood of the closest-point projection <span class="math">\(P_Y(\z₀)\)</span> of the current guess
<span class="math">\(\z₀\)</span> the surface <span class="math">\(Y\)</span> is simply the point <span class="math">\(P_Y(\z₀)\)</span> (perhaps imagine that <span class="math">\(Y\)</span>
is makes a sharp needle-like point at <span class="math">\(P_Y(\z₀)\)</span> or that <span class="math">\(Y\)</span> is very far away
from <span class="math">\(\x\)</span>), then we can approximate <span class="math">\(\f(\z)\)</span> in the proximity of our current
guess <span class="math">\(\z₀\)</span> as the vector between the input point <span class="math">\(\z\)</span> and <span class="math">\(P_Y(\z₀)\)</span>:</p>
<p><span class="math">\[
\f(\z) \approx \f_\text{point}(\z) = \z-P_Y(\z₀)
\]</span></p>
<p>In effect, we are assuming that the surface <span class="math">\(Y\)</span> is a <em>constant</em> function of its
parameterization: <span class="math">\(\y(u,v) = P_Y(\z₀)\)</span>.</p>
<p>Minimizing <span class="math">\(E_{\overrightarrow{C}}\)</span> iteratively using this linearization (or
rather <em>constantization</em>) of <span class="math">\(\f\)</span> is equivalent to <a href="https://en.wikipedia.org/wiki/Gradient_descent">gradient
descent</a>. We have simply
derived our gradients geometrically.</p>
<h3 id="linearfunctionapproximation">Linear function approximation</h3>
<p>If we make a slightly more appropriate assumption that in the neighborhood
of <span class="math">\(P_Y(\z₀)\)</span> the surface <span class="math">\(Y\)</span> is a plane, then we can improve this
approximation while keeping <span class="math">\(\f\)</span> linear in <span class="math">\(\z\)</span>:</p>
<p><span class="math">\[
\f(\z) \approx \f_\text{plane}(\z) = ((\z-P_Y(\z₀))⋅\n) \n,
\]</span></p>
<p>where the plane that <em>best</em> approximates <span class="math">\(Y\)</span> locally near <span class="math">\(P_Y(\z₀)\)</span> is the
<a href="https://en.wikipedia.org/wiki/Tangent_space">tangent plane</a> defined by the
<a href="https://en.wikipedia.org/wiki/Normal_(geometry)">normal vector</a> <span class="math">\(\n\)</span> at
<span class="math">\(P_Y(\z₀)\)</span>.</p>
<p>Minimizing <span class="math">\(E_{\overrightarrow{C}}\)</span> iteratively using this linearization of
<span class="math">\(\f\)</span> is equivalent to the
<a href="https://en.wikipedia.org/wiki/Gauss–Newton_algorithm">Gauss-Newton</a> method. We
have simply derived our linear approximation geometrically.</p>
<p>Equipped with these linearizations, we may now describe an <a href="https://en.wikipedia.org/wiki/Mathematical_optimization#Optimization_algorithms">optimization
algorithm</a>
for minimizing the matching energy between a surface <span class="math">\(Z\)</span> and another surface
<span class="math">\(Y\)</span>.</p>
<h2 id="iterativeclosestpointalgorithm">Iterative closest point algorithm</h2>
<p>So far we have derived distances between a surface <span class="math">\(Z\)</span> and another surface <span class="math">\(Y\)</span>.
In our <em>rigid</em> alignment and registration problem, we would like to
<a href="https://en.wikipedia.org/wiki/Transformation_(function)">transform</a> one
surface <span class="math">\(X\)</span> into a new surface <span class="math">\(T(X) = Z\)</span> so that it best aligns with/matches
the other surface <span class="math">\(Y\)</span>. Further we require that <span class="math">\(T\)</span> is a rigid transformation:
<span class="math">\(T(\x) = \Rot \x + \t\)</span> for some rotation matrix <span class="math">\(\Rot ∈ SO(3) ⊂ \R^{3×3}\)</span>
(i.e., an <a href="https://en.wikipedia.org/wiki/Rotation_group_SO(3)">orthogonal matrix with determinant
1</a>) and translation vector
<span class="math">\(\t∈\R³\)</span>.</p>
<p>Our matching problem can be written as an optimization problem to find the best
possible rotation <span class="math">\(\Rot\)</span> and translation <span class="math">\(\t\)</span> that match surface <span class="math">\(X\)</span> to surface
<span class="math">\(Y\)</span>:</p>
<p><span class="math">\[
\mathop{\text{minimize}}_{\t∈\R³,\ \Rot ∈ SO(3)}
∫\limits_{\x∈X} ‖\Rot \x + \t - P_Y(T(\x)) ‖² \;dA
\]</span></p>
<p>Even if <span class="math">\(X\)</span> is a triangle mesh, it is difficult to <em>integrate</em> over <em>all</em>
points on the surface of <span class="math">\(X\)</span>. Instead, we can approximate this energy by
<em>summing</em> over a point-sampling of <span class="math">\(X\)</span>:</p>
<p><span class="math">\[
\mathop{\text{minimize}}_{\t∈\R³,\ \Rot ∈ SO(3)}
∑_{i=1}^k ‖\Rot \x_i + \t - P_Y(T(\x_i)) ‖²,
\]</span></p>
<p>where <span class="math">\(\X ∈ \R^{k×3}\)</span> is a set of <span class="math">\(k\)</span> points on <span class="math">\(X\)</span> so that each point <span class="math">\(\x_i\)</span>
might lie at a vertex, along an edge, or inside a triangle. We defer discussion
of <em>how</em> to sample a triangle mesh surface.</p>
<h3 id="pseudocode">Pseudocode</h3>
<p>As the name implies, the method proceeds by iteratively finding the closest
point on <span class="math">\(Y\)</span> to the current rigid transformation <span class="math">\(\Rot \x + \t\)</span> of each sample
point <span class="math">\(\x\)</span> in <span class="math">\(\X\)</span> and then minimizing the <em>linearized</em> energy to update the
rotation <span class="math">\(\Rot\)</span> and translation <span class="math">\(\t\)</span>. </p>
<p>If <span class="math">\(V_X\)</span> and <span class="math">\(F_X\)</span> are the vertices and faces of a triangle mesh surface <span class="math">\(X\)</span>
(and correspondingly for <span class="math">\(Y\)</span>), then we can summarize a generic ICP algorithm in
pseudocode:</p>
<pre><code>icp V_X, F_X, V_Y, F_Y
R,t ← initialize (e.g., set to identity transformation)
repeat until convergence
X ← sample source mesh (V_X,F_X)
P0 ← project all X onto target mesh (V_Y,F_Y)
R,t ← update rigid transform to best match X and P0
V_X ← rigidly transform original source mesh by R and t
</code></pre>
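<p>As a concrete illustration, one iteration of this loop might look like the following C++ sketch. The helper <code>update_rigid_transform</code> is hypothetical, standing in for either of the two minimizers derived below:</p>
<pre><code>// Sketch only: a single ICP iteration. update_rigid_transform is a
// hypothetical stand-in for the point-to-point or point-to-plane update.
#include <Eigen/Core>

void icp_single_iteration_sketch(
  Eigen::MatrixXd & VX, const Eigen::MatrixXi & FX,
  const Eigen::MatrixXd & VY, const Eigen::MatrixXi & FY,
  const int num_samples)
{
  Eigen::MatrixXd X;
  random_points_on_mesh(num_samples, VX, FX, X); // sample source mesh
  Eigen::VectorXd D;
  Eigen::MatrixXd P, N;
  point_mesh_distance(X, VY, FY, D, P, N);       // project samples onto Y
  Eigen::Matrix3d R;
  Eigen::RowVector3d t;
  update_rigid_transform(X, P, N, R, t);         // best-fit rigid motion
  VX = (VX * R.transpose()).rowwise() + t;       // rigidly transform source
}
</code></pre>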
<h3 id="updatingtherigidtransformation">Updating the rigid transformation</h3>
<p>We would like to find the rotation matrix <span class="math">\(\Rot ∈ SO(3) ⊂ \R^{3×3}\)</span> and
translation vector <span class="math">\(\t∈\R³\)</span> that <em>best</em> aligns a given set of points <span class="math">\(\X ∈
\R^{k×3}\)</span> on the source mesh and their current closest points <span class="math">\(\P ∈ \R^{k×3}\)</span>
on the target mesh. We have two choices for <em>linearizing</em> our matching energy:
point-to-point (gradient descent) and point-to-plane (Gauss-Newton).</p>
<figure>
<img src="images/max-point-to-point.gif" alt="ICP using the point-to-point matching energy linearization is slow to
converge." />
<figcaption>ICP using the point-to-point matching energy linearization is slow to
converge.</figcaption>
</figure>
<figure>
<img src="images/max-point-to-plane.gif" alt="ICP using the point-to-plane matching energy linearization is
faster." />
<figcaption>ICP using the point-to-plane matching energy linearization is
faster.</figcaption>
</figure>
<p>In either case, this is still a non-linear optimization problem, this time due
to the <a href="https://en.wikipedia.org/wiki/Constrained_optimization">constraints</a>
rather than the energy term.</p>
<p>We require that <span class="math">\(\Rot\)</span> is not just any 3×3 matrix, but a rotation matrix. We
can <em>linearize</em> this constraint by assuming that the rotation in <span class="math">\(\Rot\)</span> will
be very small and thus well approximated by the identity matrix <span class="math">\(\I\)</span> plus a
skew-symmetric matrix:</p>
<p><span class="math">\[
\Rot \approx \I +
\left(\begin{array}{ccc}
0 & -γ & β \\
γ & 0 & -α \\
-β & α & 0 \\
\end{array}\right)
\]</span></p>
<p>where we can now work directly with the three scalar unknowns <span class="math">\(α\)</span>, <span class="math">\(β\)</span> and <span class="math">\(γ\)</span>.</p>
<h3 id="approximatepoint-to-pointminimizer">Approximate point-to-point minimizer</h3>
<p>If we apply our linearization of <span class="math">\(\Rot\)</span> to the <strong>point-to-point</strong> distance
linearization of the matching energy, our minimization becomes:</p>
<p><span class="math">\[
\mathop{\text{minimize}}_{\t∈\R³, α, β, γ}
∑_{i=1}^k \left\|
\left(\begin{array}{ccc}
0 & -γ & β \\
γ & 0 & -α \\
-β & α & 0 \\
\end{array}\right)
\x_i + \x_i + \t - \p_i \right\|^2.
\]</span></p>
<p>This energy is quadratic in the translation vector <span class="math">\(\t\)</span> and the linearized
rotation angles <span class="math">\(α\)</span>, <span class="math">\(β\)</span> and <span class="math">\(γ\)</span>. Let’s gather these degrees of freedom into a
vector of unknowns: <span class="math">\(\u = [α β γ \t^\transpose] ∈ \R⁶\)</span>. Then we can write our
problem in summation form as:</p>
<p><span class="math">\[
\mathop{\text{minimize}}_{\u∈\R⁶}
∑_{i=1}^k \left\|
\left(\begin{array}{cccccc}
0 & x_{i,3} & -x_{i,2} & 1 & 0 & 0 \\
-x_{i,3} & 0 & x_{i,1} & 0 & 1 & 0 \\
x_{i,2} & -x_{i,1} & 0 & 0 & 0 & 1
\end{array}\right) \u +
\x_i - \p_i \right\|^2.
\]</span></p>
<p>This can be written compactly in matrix form as:</p>
<p><span class="math">\[
\mathop{\text{minimize}}_{\u∈\R⁶}
\left\|
\underbrace{
\left(\begin{array}{cccccc}
0 & \X_3 & -\X_2 & \One & 0 & 0 \\
-\X_3 & 0 & \X_1 & 0 & \One & 0 \\
\X_2 & -\X_1 & 0 & 0 & 0 & \One
\end{array}\right)
}_{\A}
\u +
\left[\begin{array}{c}
\X_1-\P_1 \\
\X_2-\P_2 \\
\X_3-\P_3
\end{array}\right]
\right\|_F^2,
\]</span>
where we introduce the matrix <span class="math">\(\A ∈ \R^{3k × 6}\)</span> that gathers the columns
<span class="math">\(\X_i\)</span> of <span class="math">\(\X\)</span> and columns of ones <span class="math">\(\One ∈ \R^k\)</span>.</p>
<p>This quadratic energy is minimized when its partial derivatives with respect to
the entries in <span class="math">\(\u\)</span> are all zero:</p>
<p><span class="math">\[
\begin{align}
\A^\transpose \A \u & = -\A^\transpose
\left[\begin{array}{c}
\X_1-\P_1 \\
\X_2-\P_2 \\
\X_3-\P_3
\end{array}\right]
, \\
\u & = \left(\A^\transpose \A\right)^{-1} \left(-\A^\transpose
\left[\begin{array}{c}
\X_1-\P_1 \\
\X_2-\P_2 \\
\X_3-\P_3
\end{array}\right]
\right).
\end{align}
\]</span></p>
<p>Solving this small 6×6 system gives us our translation vector <span class="math">\(\t\)</span> and the
linearized rotation angles <span class="math">\(α\)</span>, <span class="math">\(β\)</span> and <span class="math">\(γ\)</span>. If we simply assign </p>
<p><span class="math">\[
\Rot ← \M := \I +
\left(\begin{array}{ccc}
0 & -γ & β \\
γ & 0 & -α \\
-β & α & 0 \\
\end{array}\right)
\]</span></p>
<p>then our transformation will <em>not</em> be rigid. Instead, we should project <span class="math">\(\M\)</span>
onto the space of rotation matrices.</p>
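<p>As a minimal Eigen sketch (function name and interface assumed, with <code>X</code> and <code>P</code> as <span class="math">\(k×3\)</span> matrices of samples and closest points), assembling <span class="math">\(\A\)</span> and solving the normal equations might look like:</p>
<pre><code>// Sketch only: solve the linearized point-to-point problem for
// u = [α β γ t]. X and P are k×3 matrices of samples and closest points.
#include <Eigen/Dense>

void point_to_point_update_sketch(
  const Eigen::MatrixXd & X, const Eigen::MatrixXd & P,
  Eigen::Matrix3d & M, Eigen::RowVector3d & t)
{
  const int k = X.rows();
  Eigen::MatrixXd A = Eigen::MatrixXd::Zero(3*k, 6);
  const Eigen::VectorXd ones = Eigen::VectorXd::Ones(k);
  // Fill the 3k×6 matrix A block-by-block, following the derivation above.
  A.block(0,   1, k, 1) =  X.col(2); A.block(0,   2, k, 1) = -X.col(1);
  A.block(k,   0, k, 1) = -X.col(2); A.block(k,   2, k, 1) =  X.col(0);
  A.block(2*k, 0, k, 1) =  X.col(1); A.block(2*k, 1, k, 1) = -X.col(0);
  A.block(0, 3, k, 1) = ones;
  A.block(k, 4, k, 1) = ones;
  A.block(2*k, 5, k, 1) = ones;
  Eigen::VectorXd b(3*k);
  b << X.col(0)-P.col(0), X.col(1)-P.col(1), X.col(2)-P.col(2);
  // Normal equations: AᵀA u = -Aᵀ b.
  Eigen::VectorXd u = (A.transpose()*A).ldlt().solve(-A.transpose()*b);
  M << 1, -u(2), u(1),
       u(2), 1, -u(0),
      -u(1), u(0), 1;               // M = I + skew(α,β,γ)
  t = u.tail<3>().transpose();
  // M is not yet a rotation; project it onto SO(3) as described next.
}
</code></pre>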
<h4 id="recoveringapurerotationfromitslinearization">Recovering a pure rotation from its linearization</h4>
<blockquote>
<p>In an effort to provide an alternative to “Least-Squares Rigid Motion Using
SVD” [Sorkine 2009], this derivation purposefully <em>avoids</em> the <a href="https://en.wikipedia.org/wiki/Trace_(linear_algebra)">trace
operator</a> and its
various nice properties.</p>
</blockquote>
<p>If <span class="math">\(α\)</span>, <span class="math">\(β\)</span> and <span class="math">\(γ\)</span> are all small, then it may be safe to <em>interpret</em> these
values as rotation angles about the <span class="math">\(x\)</span>, <span class="math">\(y\)</span>, and <span class="math">\(z\)</span> axes respectively.</p>
<p>In general, it is better to find the closest rotation matrix to <span class="math">\(\M\)</span>. In other
words, we’d like to solve the small optimization problem:</p>
<p><span class="math">\[
\Rot^* = \argmin_{\Rot ∈ SO(3)} \left\| \Rot - \M \right\|_F^2,
\]</span>
where <span class="math">\(\|\X\|_F^2\)</span> computes the squared <a href="https://en.wikipedia.org/wiki/Matrix_norm#Frobenius_norm">Frobenius
norm</a> of the matrix
<span class="math">\(\X\)</span> (i.e., the sum of all squared element values. In MATLAB syntax:
<code>sum(sum(A.^2))</code>). We can expand the norm by taking advantage of the
bilinearity of the underlying Frobenius inner
product:
<span class="math">\[
\Rot^* = \argmin_{\Rot ∈ SO(3)} \left\| \M \right\|_F^2 + \left\| \Rot
\right\|_F^2 - 2 \left<\Rot, \M \right>_F,
\]</span>
where <span class="math">\(\left<\A, \B \right>_F\)</span> is the
<a href="https://en.wikipedia.org/wiki/Frobenius_inner_product">Frobenius inner
product</a> of <span class="math">\(\A\)</span> and
<span class="math">\(\B\)</span> (i.e., the sum of all per-element products. In MATLAB syntax:
<code>sum(sum(A.*B))</code>). We can drop the Frobenius norm
of <span class="math">\(\M\)</span> term (<span class="math">\(\left\| \M
\right\|_F^2\)</span>) because it is constant with respect to the unknown rotation
matrix <span class="math">\(\Rot\)</span>. We can also drop the Frobenius norm of <span class="math">\(\Rot\)</span> term because it
is constant (<span class="math">\(\left\|
\Rot\right\|_F^2 = 3\)</span>, since each of the three rows of <span class="math">\(\Rot\)</span> has unit norm) given that <span class="math">\(\Rot\)</span> is required to be an orthonormal matrix
(<span class="math">\(\Rot ∈ SO(3)\)</span>). We can drop the factor of <span class="math">\(2\)</span> and flip the minus sign to
change our <em>minimization</em> problem into a <em>maximization</em> problem:
<span class="math">\[
\Rot^* = \argmax_{\Rot ∈ SO(3)} \left<\Rot, \M \right>_F
\]</span></p>
<p>We now take advantage of the <a href="https://en.wikipedia.org/wiki/Singular_value_decomposition">singular value
decomposition</a> of
<span class="math">\(\M = \U Σ \V^\transpose\)</span>, where <span class="math">\(\U, \V ∈ \R^{3×3}\)</span> are orthonormal matrices
and <span class="math">\(Σ∈\R^{3×3}\)</span> is a non-negative diagonal matrix:</p>
<p><span class="math">\[
\Rot^* = \argmax_{\Rot ∈ SO(3)} \left<\Rot,\U Σ \V^\transpose \right>_F.
\]</span></p>
<p>The Frobenius inner product will let us move the products by <span class="math">\(\V\)</span> and <span class="math">\(\U\)</span> from
the right argument to the left argument:</p>
<blockquote>
<p>Recall some linear algebra properties:</p>
<ol>
<li>Matrix multiplication (on the left) can be understood as <em>acting</em> on each
column: <span class="math">\(\A \B = \A [\B_1 \ \B_2 \ … \ \B_n] = [\A \B_1 \ \A \B_2 \ … \
\A \B_n]\)</span>,</li>
<li>The <a href="https://en.wikipedia.org/wiki/Kronecker_product">Kronecker product</a>
<span class="math">\(\I ꕕ \A\)</span> of the identity matrix <span class="math">\(\I\)</span> of size <span class="math">\(k\)</span> and a matrix <span class="math">\(\A\)</span> simply
repeats <span class="math">\(\A\)</span> along the diagonal k times. In MATLAB, <code>repdiag(A,k)</code>,</li>
<li>Properties 1. and 2. imply that the vectorization of a matrix product
<span class="math">\(\B\C\)</span> can be written as the Kronecker product of the #-columns-in-<span class="math">\(\C\)</span>
identity matrix and <span class="math">\(\B\)</span> times the vectorization of <span class="math">\(\C\)</span>:
<span class="math">\(\text{vec}(\B\C) = (\I ꕕ \B)\text{vec}(\C)\)</span>,</li>
<li>The transpose of a Kronecker product is the Kronecker product of
transposes: <span class="math">\((\A ⊗ \B)^\transpose = \A^\transpose ⊗ \B^\transpose\)</span>,</li>
<li>The Frobenius inner product can be written as a <a href="https://en.wikipedia.org/wiki/Dot_product">dot
product</a> of
<a href="https://en.wikipedia.org/wiki/Vectorization_(mathematics)">vectorized</a>
matrices: <span class="math">\(<\A,\B>_F = \text{vec}(\A) ⋅ \text{vec}(\B) =
\text{vec}(\A)^\transpose \text{vec}(\B)\)</span>,</li>
<li>Properties 3., 4., and 5. imply that Frobenius inner product of a matrix
<span class="math">\(\A\)</span> and the matrix product of matrix <span class="math">\(\B\)</span> and <span class="math">\(\C\)</span> is equal to the
Frobenius inner product of the matrix product of the transpose of <span class="math">\(\B\)</span> and
<span class="math">\(\A\)</span> and the matrix <span class="math">\(\C\)</span>:
<span class="math">\(<\A,\B\C>_F = \text{vec}(\A)^\transpose \text{vec}(\B\C) =
\text{vec}(\A)^\transpose (\I ⊗ \B)\text{vec}(\C) =
\text{vec}(\A)^\transpose (\I ⊗ \B^\transpose)^\transpose \text{vec}(\C) =
\text{vec}(\B^\transpose\A)^\transpose \text{vec}(\C) =
<\B^\transpose \A,\C>_F\)</span>.</li>
</ol>
</blockquote>
<p><span class="math">\[
\Rot^* = \argmax_{\Rot ∈ SO(3)} \left<\U^\transpose \Rot \V, Σ \right>_F.
\]</span></p>
<p>Now, <span class="math">\(\U\)</span> and <span class="math">\(\V\)</span> are both
<a href="https://en.wikipedia.org/wiki/Orthogonal_matrix">orthonormal</a>, so multiplying
them against a rotation matrix <span class="math">\(\Rot\)</span> does not change its orthonormality. We can
pull them out of the maximization if we account for the reflection they <em>might</em>
incur: introduce <span class="math">\(Ω = \U^\transpose\Rot\V ∈ O(3)\)</span> with <span class="math">\(\det{Ω} = \det{\U\V^\transpose}\)</span>.
This implies that the optimal rotation for the original problem is recovered
via <span class="math">\(\Rot^* = \U Ω^* \V^\transpose\)</span>. When we move the <span class="math">\(\argmax\)</span> inside, we now
look for an orthonormal matrix <span class="math">\(Ω ∈ O(3)\)</span> that is a reflection (if
<span class="math">\(\det{\U\V^\transpose} = -1\)</span>) or a rotation (if <span class="math">\(\det{\U\V^\transpose} = 1\)</span>):</p>
<p><span class="math">\[
\Rot^* = \U \left( \argmax_{Ω ∈ O(3),\ \det{Ω} = \det{\U\V^\transpose}} \left<Ω, Σ \right>_F \right) \V^\transpose.
\]</span></p>
<p>This ensures that as a result <span class="math">\(\Rot^*\)</span> will be a rotation: <span class="math">\(\det{\Rot^*} = 1\)</span>.</p>
<blockquote>
<p>Recall that <span class="math">\(Σ∈\R^{3×3}\)</span> is a non-negative diagonal matrix of singular values
sorted so that the smallest value is in the bottom right corner.</p>
</blockquote>
<p>Because <span class="math">\(Ω\)</span> is orthonormal, each column (or row) of <span class="math">\(Ω\)</span> must have unit norm.
Placing a non-zero on the off-diagonal will get “killed” when multiplied by the
corresponding zero in <span class="math">\(Σ\)</span>. So the optimal choice of <span class="math">\(Ω\)</span> is to set all values to
zero except on the diagonal. If <span class="math">\(\det{\U\V^\transpose} = -1\)</span>, then we should set
one (and only one) of these values to <span class="math">\(-1\)</span>. The best choice is the bottom right
corner since that will multiply against the smallest singular value in <span class="math">\(Σ\)</span> (and
negatively affect the maximization the least):</p>
<p><span class="math">\[
Ω^*_{ij} = \begin{cases}
1 & \text{ if $i=j\lt3$} \\\\
\det{\U\V^\transpose} & \text{ if $i=j=3$} \\\\
0 & \text{ otherwise.}
\end{cases}
\]</span></p>
<p>Finally, we have a formula for our optimal rotation:</p>
<p><span class="math">\[
\Rot = \U Ω^* \V^\transpose.
\]</span></p>
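<p>In code, this projection only takes a few lines with an off-the-shelf SVD. A minimal Eigen sketch (not using the blacklisted <code>igl::polar_svd</code> helpers; function name assumed):</p>
<pre><code>// Sketch only: project M onto SO(3). Eigen sorts singular values in
// decreasing order, so the smallest one sits in the bottom-right corner.
#include <Eigen/Dense>

Eigen::Matrix3d closest_rotation_sketch(const Eigen::Matrix3d & M)
{
  Eigen::JacobiSVD<Eigen::Matrix3d> svd(
    M, Eigen::ComputeFullU | Eigen::ComputeFullV);
  const Eigen::Matrix3d U = svd.matrixU();
  const Eigen::Matrix3d V = svd.matrixV();
  Eigen::Matrix3d Omega = Eigen::Matrix3d::Identity();
  Omega(2,2) = (U * V.transpose()).determinant(); // ±1
  return U * Omega * V.transpose();               // det = +1 by construction
}
</code></pre>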
<blockquote>
<h3 id="closed-formpoint-to-pointminimizer">Closed-form point-to-point minimizer</h3>
<p><em>Interestingly</em>, despite the non-linear constraint on <span class="math">\(\Rot\)</span> there is actually
a closed-form solution to the point-to-point matching problem:</p>
<p><span class="math">\[
\mathop{\text{minimize}}_{\t∈\R³,\ \Rot ∈ SO(3)} ∑_{i=1}^k ‖\Rot \x_i + \t - \p_i‖²,
\]</span></p>
<p>This is a variant of what’s known as a <a href="https://en.wikipedia.org/wiki/Orthogonal_Procrustes_problem">Procrustes
problem</a>, named
after a <a href="https://en.wikipedia.org/wiki/Procrustes">mythical psychopath</a> who
would kidnap people and force them to fit in his bed by stretching them or
cutting off their legs. In our case, we are forcing <span class="math">\(\Rot\)</span> to be perfectly
orthogonal (no “longer”, no “shorter”).</p>
<h4 id="substitutingoutthetranslationterms">Substituting out the translation terms</h4>
<p>This energy is <em>quadratic</em> in <span class="math">\(\t\)</span> and there are no other constraints on
<span class="math">\(\t\)</span>. We can immediately solve for the optimal <span class="math">\(\t^*\)</span>—leaving <span class="math">\(\Rot\)</span> as an unknown—by
setting all derivatives with respect to unknowns in <span class="math">\(\t\)</span> to zero:</p>
<p><span class="math">\[
\begin{align}
\t^*
&= \argmin_{\t} ∑_{i=1}^k ‖\Rot \x_i + \t - \p_i‖² \\\\
&= \argmin_\t \left\|\Rot \X^\transpose + \t \One^\transpose - \P^\transpose\right\|^2_F,
\end{align}
\]</span>
where <span class="math">\(\One ∈ \R^{k}\)</span> is a vector of ones. Setting the partial derivative with
respect to <span class="math">\(\t\)</span> of this
quadratic energy to zero finds the minimum:
<span class="math">\[
\begin{align}
0
&= \frac{∂}{∂\t} \left\|\Rot \X^\transpose + \t \One^\transpose - \P^\transpose\right\|^2_F \\\\
&= \One^\transpose \One \t + \Rot \X^\transpose \One - \P^\transpose \One,
\end{align}
\]</span></p>
<p>Rearranging terms above reveals that the optimal <span class="math">\(\t\)</span> is the vector aligning
the <a href="https://en.wikipedia.org/wiki/Centroid">centroids</a> of the points in <span class="math">\(\P\)</span>
and the points in <span class="math">\(\X\)</span> rotated by the—yet-unknown—<span class="math">\(\Rot\)</span>. Introducing
variables for the respective centroids <span class="math">\(\hat{\p} = \tfrac{1}{k} ∑_{i=1}^k
\p_i\)</span> and <span class="math">\(\hat{\x} = \tfrac{1}{k} ∑_{i=1}^k \x_i\)</span>, we can write the
formula for the optimal <span class="math">\(\t\)</span>:</p>
<p><span class="math">\[
\begin{align}
\t
&= \frac{\P^\transpose \One - \Rot \X^\transpose \One}{ \One^\transpose \One} \\\\
&= \hat{\p} - \Rot \hat{\x}.
\end{align}
\]</span></p>
<p>Now we have a formula for the optimal translation vector <span class="math">\(\t\)</span> in terms of the
unknown rotation <span class="math">\(\Rot\)</span>. Let us
<a href="https://en.wikipedia.org/wiki/Substitution_(algebra)">substitute</a> this formula
for all occurrences of <span class="math">\(\t\)</span> in our energy written in its original summation
form:</p>
<p><span class="math">\[
\mathop{\text{minimize}}_{\Rot ∈ SO(3)} ∑\limits_{i=1}^k \left\| \Rot \x_i + ( \hat{\p} - \Rot\hat{\x}) - \p_i \right\|^2 \\\\
\mathop{\text{minimize}}_{\Rot ∈ SO(3)} ∑\limits_{i=1}^k \left\| \Rot (\x_i - \hat{\x}) - (\p_i - \hat{\p}) \right\|^2 \\\\
\mathop{\text{minimize}}_{\Rot ∈ SO(3)} ∑\limits_{i=1}^k \left\| \Rot \overline{\x}_i - \overline{\p}_i \right\|^2 \\\\
\mathop{\text{minimize}}_{\Rot ∈ SO(3)} \left\| \Rot \overline{\X}^\transpose - \overline{\P}^\transpose \right\|_F^2,
\]</span></p>
<p>where we introduce <span class="math">\(\overline{\X} ∈ \R^{k × 3}\)</span> where the ith row contains the
<em>relative position</em> of the ith point to the centroid <span class="math">\(\hat{\x}\)</span>: i.e.,
<span class="math">\(\overline{\x}_i = (\x_i - \hat{\x})\)</span> (and analagously for <span class="math">\(\overline{\P}\)</span>).</p>
<p>Now we have the canonical form of the <a href="https://en.wikipedia.org/wiki/Orthogonal_Procrustes_problem">orthogonal Procrustes
problem</a>. To
find the optimal rotation matrix <span class="math">\(\Rot^*\)</span> we will massage the terms in the
<em>minimization</em> until we have a <em>maximization</em> problem involving the <a href="https://en.wikipedia.org/wiki/Frobenius_inner_product">Frobenius
inner-product</a> of the
unknown rotation <span class="math">\(\Rot\)</span> and <a href="https://en.wikipedia.org/wiki/Covariance_matrix">covariance
matrix</a> of <span class="math">\(\X\)</span> and <span class="math">\(\P\)</span>:</p>
<p><span class="math">\[
\begin{align}
\Rot^*
&= \argmin_{\Rot ∈ SO(3)} \left\| \Rot \overline{\X}^\transpose - \overline{\P}^\transpose \right\|_F^2 \\\\
&= \argmin_{\Rot ∈ SO(3)} \left<\Rot \overline{\X}^\transpose - \overline{\P}^\transpose , \Rot \overline{\X}^\transpose - \overline{\P}^\transpose \right>_F\\\\
&= \argmin_{\Rot ∈ SO(3)} \left\| \overline{\X} \right\|_F^2 + \left\| \overline{\P} \right\|_F^2 - 2 \left<\Rot \overline{\X}^\transpose , \overline{\P}^\transpose \right>_F\\\\
&= \argmax_{\Rot ∈ SO(3)} \left<\Rot,\overline{\P}^\transpose\,\overline{\X}\right>_F\\\\
&= \argmax_{\Rot ∈ SO(3)} \left<\Rot,\M\right>_F\\
\end{align}
\]</span></p>
<p>Letting <span class="math">\(\M = \overline{\P}^\transpose\,\overline{\X}\)</span> we can now follow the
steps above using <a href="https://en.wikipedia.org/wiki/Singular_value_decomposition">singular value
decomposition</a> to
find the optimal <span class="math">\(\Rot\)</span>.</p>
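<p>A minimal Eigen sketch of this closed-form recipe (names assumed; <code>closest_rotation_sketch</code> refers to the SVD projection sketched above):</p>
<pre><code>// Sketch only: closed-form point-to-point alignment (Procrustes).
#include <Eigen/Dense>

void closed_form_point_to_point_sketch(
  const Eigen::MatrixXd & X, const Eigen::MatrixXd & P,
  Eigen::Matrix3d & R, Eigen::RowVector3d & t)
{
  const Eigen::RowVector3d xhat = X.colwise().mean(); // centroid of X
  const Eigen::RowVector3d phat = P.colwise().mean(); // centroid of P
  const Eigen::MatrixXd Xbar = X.rowwise() - xhat;
  const Eigen::MatrixXd Pbar = P.rowwise() - phat;
  const Eigen::Matrix3d M = Pbar.transpose() * Xbar;  // covariance matrix
  R = closest_rotation_sketch(M);   // argmax of the Frobenius inner product
  t = phat - (R * xhat.transpose()).transpose();      // t = phat - R*xhat
}
</code></pre>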
</blockquote>
<h3 id="approximatepoint-to-planeminimizer">Approximate point-to-plane minimizer</h3>
<p>If we apply our linearization of <span class="math">\(\Rot\)</span> to the <strong>point-to-plane</strong> distance
linearization of the matching energy, our minimization is:</p>
<p><span class="math">\[
\mathop{\text{minimize}}_{\t∈\R³, α, β, γ}
∑_{i=1}^k
\left(
\left(
\left(\begin{array}{ccc}
0 & -γ & β \\
γ & 0 & -α \\
-β & α & 0 \\
\end{array}\right)\x_i +
\x_i + \t - \p_i
\right)⋅\n_i
\right)^2.
\]</span></p>
<p>We can follow similar steps as above. Let’s gather a vector of unknowns: <span class="math">\(\u =
[α β γ \t^\transpose] ∈ \R⁶\)</span>. Then we can write our problem in summation form
as:</p>
<p><span class="math">\[
\mathop{\text{minimize}}_{\u∈\R⁶}
∑_{i=1}^k \left(\n_i^\transpose
\left(\begin{array}{cccccc}
0 & x_{i,3} & -x_{i,2} & 1 & 0 & 0 \\
-x_{i,3} & 0 & x_{i,1} & 0 & 1 & 0 \\
x_{i,2} & -x_{i,1} & 0 & 0 & 0 & 1
\end{array}\right) \u +
\n_i^\transpose(\x_i - \p_i) \right)^2.
\]</span></p>
<p>This can be written compactly in matrix form as:</p>
<p><span class="math">\[
\mathop{\text{minimize}}_{\u∈\R⁶}
\left\|
\left[\begin{array}{ccc} \text{diag}(\N_1) & \text{diag}(\N_2) & \text{diag}(\N_3)\end{array}\right]
\left(
\A
\u +
\left[\begin{array}{c}
\X_1-\P_1 \\
\X_2-\P_2 \\
\X_3-\P_3
\end{array}\right]\right)
\right\|^2,
\]</span></p>
<p>where <span class="math">\(\N_i\)</span> is the ith column from the matrix of normals <span class="math">\(\N ∈ \R^{k × 3}\)</span>,
<span class="math">\(\text{diag}(\v)\)</span> <a href="https://en.wikipedia.org/wiki/Diagonal_matrix#Matrix_operations">creates a diagonal
matrix</a> from a
vector, and <span class="math">\(\A ∈ \R^{3k × 6}\)</span> is the same as above.</p>
<p>This energy is quadratic in <span class="math">\(\u\)</span> and can be solved by setting all partial
derivatives with respect to <span class="math">\(\u\)</span> to zero.</p>
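<p>A minimal Eigen sketch of this solve (names assumed; <code>N</code> stores the normal <span class="math">\(\n_i\)</span> at each closest point as its ith row):</p>
<pre><code>// Sketch only: linearized point-to-plane update. Each sample contributes
// one scalar equation: row i of the k×6 system is nᵢᵀ times the 3×6 block.
#include <Eigen/Dense>

void point_to_plane_update_sketch(
  const Eigen::MatrixXd & X, const Eigen::MatrixXd & P,
  const Eigen::MatrixXd & N,
  Eigen::Matrix3d & M, Eigen::RowVector3d & t)
{
  const int k = X.rows();
  Eigen::MatrixXd A(k, 6);
  Eigen::VectorXd b(k);
  for(int i = 0; i < k; i++)
  {
    const Eigen::RowVector3d x = X.row(i);
    const Eigen::RowVector3d n = N.row(i);
    Eigen::Matrix<double,3,6> Ai;
    Ai <<     0,  x(2), -x(1), 1, 0, 0,
          -x(2),     0,  x(0), 0, 1, 0,
           x(1), -x(0),     0, 0, 0, 1;
    A.row(i) = n * Ai;
    const Eigen::RowVector3d d = X.row(i) - P.row(i);
    b(i) = n.dot(d);
  }
  // Same normal equations as the point-to-point case: AᵀA u = -Aᵀ b.
  Eigen::VectorXd u = (A.transpose()*A).ldlt().solve(-A.transpose()*b);
  M << 1, -u(2), u(1),
       u(2), 1, -u(0),
      -u(1), u(0), 1;              // I + skew; project onto SO(3) as above
  t = u.tail<3>().transpose();
}
</code></pre>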
<blockquote>
<h3 id="closed-formpoint-to-pointminimizer">Closed-form point-to-point minimizer</h3>
<p>To the best of my knowledge, no known closed-form solution exists. I am not
sure whether it <strong><em>cannot</em></strong> exist or whether no one has figured it out
(or they did and I just do not know about it).</p>
</blockquote>
<h2 id="uniformrandomsamplingofatrianglemesh">Uniform random sampling of a triangle mesh</h2>
<p>Our last missing piece is to sample the surface of a triangle mesh <span class="math">\(X\)</span> with <span class="math">\(m\)</span>
faces uniformly randomly. This allows us to approximate <em>continuous</em> integrals
over the surface <span class="math">\(X\)</span> with a summation of the integrand evaluated at a finite
number of randomly selected points. This type of <a href="https://en.wikipedia.org/wiki/Numerical_integration">numerical
integration</a> is called the
<a href="https://en.wikipedia.org/wiki/Monte_Carlo_method">Monte Carlo method</a>.</p>
<p>We would like our <a href="https://en.wikipedia.org/wiki/Random_variable">random
variable</a> <span class="math">\(\x ∈ X\)</span> to have a
uniform <a href="https://en.wikipedia.org/wiki/Probability_density_function">probability density
function</a> <span class="math">\(f(\x) =
1/A_X\)</span>, where <span class="math">\(A_X\)</span> is the <a href="https://en.wikipedia.org/wiki/Surface_area">surface
area</a> of the triangle mesh <span class="math">\(X\)</span>. We
can achieve this by breaking the problem into two steps: uniformly sampling in
a single triangle and sampling triangles non-uniformly according to their
area.</p>
<p>Suppose we have a way to evaluate a continuous random point <span class="math">\(\x\)</span> in a triangle
<span class="math">\(T\)</span> with uniform probability density function <span class="math">\(g_T(\x) = 1/A_T\)</span> <em>and</em> we have a
way to evaluate a discrete random triangle index <span class="math">\(T ∈ \{1,2,…,m\}\)</span> with <a href="https://en.wikipedia.org/wiki/Probability_distribution#Discrete_probability_distribution">discrete
probability
distribution</a>
<span class="math">\(h(T) = A_T/A_X\)</span>, then the joint probability of evaluating a certain triangle
index <span class="math">\(T\)</span> and then a uniformly random point <span class="math">\(\x\)</span> in that triangle is indeed
uniform over the surface:</p>
<p><span class="math">\[
h(T) g_T(\x) = \frac{A_T}{A_X} \frac{1}{A_T} = \frac{1}{A_X} = f(\x).
\]</span></p>
<h3 id="uniformrandomsamplingofasingletriangle">Uniform random sampling of a single triangle</h3>
<p>In order to pick a point uniformly randomly in a triangle with corners <span class="math">\(\v_1,
\v_2, \v_3 ∈ \R^3\)</span> we will <em>first</em> pick a point uniformly randomly in the
<a href="https://en.wikipedia.org/wiki/Parallelogram">parallelogram</a> formed by
reflecting <span class="math">\(\v_1\)</span> across the line <span class="math">\(\overline{\v_2\v_3}\)</span>:</p>
<p><span class="math">\[
\x = \v_1 + α (\v_2-\v_1) + β (\v_3 - \v_1)
\]</span></p>
<p>where <span class="math">\(α,β\)</span> are uniformly sampled from the unit interval <span class="math">\([0,1]\)</span>. If <span class="math">\(α+β > 1\)</span>
then the point <span class="math">\(\x\)</span> above will lie in the reflected triangle rather than the
original one. In this case, preprocess <span class="math">\(α\)</span> and <span class="math">\(β\)</span> by setting <span class="math">\(α←1-α\)</span> and
<span class="math">\(β←1-β\)</span> to reflect the point <span class="math">\(\x\)</span> back into the original triangle.</p>
<h3 id="area-weightedrandomsamplingoftriangles">Area-weighted random sampling of triangles</h3>
<p>Assuming we know how to draw a <em>continuous</em> uniform random variable <span class="math">\(γ\)</span> from
the unit interval <span class="math">\([0,1]\)</span>, we would now like to draw a <em>discrete</em> random
triangle index <span class="math">\(T\)</span> from the sequence <span class="math">\(\{1,…,m\}\)</span> with likelihood proportional to
the relative area of each triangle in the mesh.</p>
<p>We can achieve this by first computing the <a href="https://en.wikipedia.org/wiki/Running_total">cumulative
sum</a> <span class="math">\(\C ∈ \R^{m}\)</span> of the relative
areas:</p>
<p><span class="math">\[
C_i = ∑_{j=1}^i \frac{A_j}{A_X}.
\]</span></p>
<p>Then our random index is found by identifying the first entry in <span class="math">\(\C\)</span> whose
value is greater than a uniform random variable <span class="math">\(γ\)</span>. Since <span class="math">\(\C\)</span> is sorted,
locating this entry can be done in <span class="math">\(O(\log m)\)</span>
<a href="https://en.wikipedia.org/wiki/Big_O_notation">time</a>.</p>
<h3 id="whyismycodesoslow">Why is my code so slow?</h3>
<p>Try profiling your code. Where is most of the computation time spent?</p>
<p>If you have done things right, the majority of time is spent computing
point-to-mesh distances. For each query point, the <a href="https://en.wikipedia.org/wiki/Computational_complexity_theory">computational
complexity</a> of
computing its distance to a mesh with <span class="math">\(m\)</span> faces is <span class="math">\(O(m)\)</span>.</p>
<p>This can be <em>dramatically</em> improved (e.g., to <span class="math">\(O(\log m)\)</span> on average) using a
<a href="https://en.wikipedia.org/wiki/Space_partitioning">space partitioning</a> data
structure such as a <a href="https://en.wikipedia.org/wiki/K-d_tree">kd tree</a>, a
<a href="https://en.wikipedia.org/wiki/Bounding_volume_hierarchy">bounding volume
hierarchy</a>, or a
<a href="https://en.wikipedia.org/wiki/Bin_(computational_geometry)">spatial hash</a>.</p>
<h2 id="tasks">Tasks</h2>
<h3 id="readbouaziz2015">Read [Bouaziz 2015]</h3>
<p>This reading task is not directly graded, but it’s expected that you read and
understand sections 3.2–3.3 of Sofien Bouaziz’s PhD thesis “Realtime Face
Tracking and Animation” 2015. <em>Understanding</em> this may require digging into
wikipedia, other online resources or other papers.</p>
<h3 id="blacklist">Blacklist</h3>
<p>You may not use the following libigl functions:</p>
<ul>
<li><code>igl::AABB</code></li>
<li><code>igl::fit_rotations</code></li>
<li><code>igl::hausdorff</code></li>
<li><code>igl::point_mesh_squared_distance</code></li>
<li><code>igl::point_simplex_squared_distance</code></li>
<li><code>igl::polar_dec</code></li>
<li><code>igl::polar_svd3x3</code></li>
<li><code>igl::polar_svd</code></li>
<li><code>igl::random_points_on_mesh</code></li>
</ul>
<h3 id="whitelist">Whitelist</h3>
<p>You are encouraged to use the following libigl functions:</p>
<ul>
<li><code>igl::cumsum</code> computes cumulative sum</li>
<li><code>igl::doublearea</code> computes triangle areas</li>
<li><code>igl::per_face_normals</code> computes normal vectors for each triangle face</li>
</ul>
<h3 id="srcrandom_points_on_mesh.cpp"><code>src/random_points_on_mesh.cpp</code></h3>
<p>Generate <code>n</code> random points uniformly sampled <em>on</em> a given triangle mesh with
vertex positions <code>VX</code> and face indices <code>FX</code>. </p>
<h3 id="srcpoint_triangle_distance.cpp"><code>src/point_triangle_distance.cpp</code></h3>
<p>Compute the distance <code>d</code> between a given point <code>x</code> and the closest point <code>p</code> on
a given triangle with corners <code>a</code>, <code>b</code>, and <code>c</code>.</p>
<h3 id="srcpoint_mesh_distance.cpp"><code>src/point_mesh_distance.cpp</code></h3>
<p>Compute the distances <code>D</code> between a set of given points <code>X</code> and their closest
points <code>P</code> on a given mesh with vertex positions <code>VY</code> and face indices <code>FY</code>.
For each point in <code>P</code> also output a corresponding normal in <code>N</code>.</p>
<blockquote>
<p>It is OK to assume that all points in <code>P</code> lie inside triangles (rather than
exactly at vertices or exactly along edges) for the purposes of normal
computation in <code>N</code>.</p>
</blockquote>
<h3 id="srchausdorff_lower_bound.cpp"><code>src/hausdorff_lower_bound.cpp</code></h3>