(ns morelinear.leastsquares
(:require [clojure.core.matrix :refer :all]
[clojure.core.matrix.linear :as linear]
[morelinear.gauss :as gauss]
[morelinear.householder :as householder]))
(set-current-implementation :vectorz)
Now that we have many ways to solve square linear systems we need to extend Ax=b to the overdefined case where we have more linear equations than parameters. This situation will come up constantly, generally in situations where you only have a few inputs you want to solve for, but you have a lot of redundant measurements.
\begin{equation}
\begin{bmatrix}
a_11 & a_12
a_21 & a_22\
a_31 & a_32\
a_41 & a_42\
…\
\end{bmatrix}
\begin{bmatrix}
x_1\
x_2\
\end{bmatrix}
=
\begin{bmatrix}
y_1\
y_2\
\
\end{bmatrix}
\end{equation}
If the measurements were ideal then the rows of A will not be independent and you will be able to find an explicit solution though Gaussian Elimination. But in the general case (with the addition of noise and rounding errors) no explicit solution will exist for Askinnyx=b . You could try to find a set of independent equations and throw away the extra equation to try to make a square matrix, but this is throwing information away.
Since there is no x for which Askinnyx=b holds true we instead solve for a “close” system Askinnyx=bclose which does have a solution. In other words we want to find an x that will give us a bclose which is as close as possible to b. When we say “close” what we mean mathematically is that we want to minimize the sum of the difference between bclose and b - ie. sum_of_all_values(bclose-b).
The problem is that sums don’t express very easily in matrix form. Fortunately we do have a mechanism which is really close - the inner product. The inner product of a vector x with itself - xTx - is the sum of the squares of the values of x. While this is not equivalent, it actually does not change the solution b/c the minimum stays the minimum. So instead of sum_of_all_values(bclose-b). we can work with (bclose-b)T(bclose-b) and guarantee the same solution
If we plug in our equation for bclose for we get (Askinnyx-b)T(Askinnyx-b).
\begin{equation}
(Askinnyx-b)T(Askinnyx-b)
((Askinnyx)T-bT)(Askinnyx-b) \
(xTAskinnyT-bT)(Askinnyx-b) \
xTAskinnyTAskinnyx
-xTAskinnyTb
-bTAskinnyx
+b^2
\end{equation}
As we learn in calculus, minimizing a function is done by take its derivative with respect to the parameter we are changing (here that’s x
), setting it equal to zero and then solving for that parameter (b/c the minimum point has zero slope). What page 226-227 shows is that the derivative of our difference equation give us the equation ATAx=ATb. The right hand side ATb is a vector, and in-fact the whole equation is in the form Ax=b which we know how to solve (again, solving for x
here). Also note that ATA is always square and singlular - and that b/c A was skinny it’s actually smaller than A.
We take the product ATA and the product ATb and solve ATAx=ATb just like we solve an Ax=b system
(defn lu-direct
"Solve ATA=ATb directly"
[linear-system output-vector]
(let [AT (transpose linear-system)
ATA (mmul AT linear-system)
ATb (mmul AT output-vector)]
(gauss/solve-with-lu ATA ATb)))
;; missile problem from page 232
(let [A [[1.0 0.0 0.0]
[1.0 0.25 0.0625]
[1.0 0.50 0.25]
[1.0 0.75 0.5625]
[1.0 1.00 1.0]]
b [0.000 0.008 0.015 0.019 0.02]
x (lu-direct A b)]
(pm x)
(pm (mmul A x)))
[-0.000 0.040 -0.019] [-0.000 0.009 0.015 0.019 0.020]
;; [-0.000 0.040 -0.019] ;; [-2.2857142857142797E-4 ;; 0.008514285714285717 ;; 0.01482857142857143 ;; 0.018714285714285718 ;; 0.020171428571428573]
Unfortunately while the previous solutions is very simple, it does involve a matrix multiplication and therefore has some numerical issues. A nice trick is presented in Exercise 4.6.9 with the equations
\begin{equation}
\begin{bmatrix}
Im*m & A
A^T & 0n*n\
\end{bmatrix}
\begin{bmatrix}
x_1\
x_2\
\end{bmatrix}
=
\begin{bmatrix}
b\
0\
\end{bmatrix}
\end{equation}
This special block has the property that it’ll always be square no matter what shape A is. If you multiply out the block matrices you will get two equations
\begin{equation}
\begin{bmatrix}
x_1+Ax2
ATx1\
\end{bmatrix}
=
\begin{bmatrix}
b\
0\
\end{bmatrix}
\end{equation}
\begin{equation} x_1+Ax2 = b \end{equation}
\begin{equation} ATx1 = 0 \end{equation}
Note how the second equation tell us x1 is in the N(AT)
Now we multiple both sides of our first equation by AT
\begin{equation} AT(x_1+Ax2) = ATb \end{equation}
\begin{equation} ATx_1+ATAx2 = ATb \end{equation}
\begin{equation} ATAx2 = ATb \end{equation}
We just used the fact that x1 was in the nullspace of AT to drop the first term and now we see how x2 is also a solution to the least squares equation
So we can use any method that works on solving square matrices, use it on our jumbo matrix and get a solution for the least squares problem. All without computing ATA.
(defn- jumbo-matrix
"Use the big block matrix method to compute the least squares solution"
[input-matrix]
(let [num-rows (row-count input-matrix)
num-columns (column-count input-matrix)
identity (identity-matrix num-rows)
AT (transpose input-matrix)
zeroes (zero-matrix num-columns
num-columns)]
(join-along 0
(join-along 1
identity
input-matrix)
(join-along 1
AT
zeroes))))
(let [A [[1.0 0.0 0.0]
[1.0 0.25 0.0625]
[1.0 0.50 0.25]
[1.0 0.75 0.5625]
[1.0 1.00 1.0]]]
(-> A
jumbo-matrix
pm))
[[1.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000] [0.000 1.000 0.000 0.000 0.000 1.000 0.250 0.063] [0.000 0.000 1.000 0.000 0.000 1.000 0.500 0.250] [0.000 0.000 0.000 1.000 0.000 1.000 0.750 0.563] [0.000 0.000 0.000 0.000 1.000 1.000 1.000 1.000] [1.000 1.000 1.000 1.000 1.000 0.000 0.000 0.000] [0.000 0.250 0.500 0.750 1.000 0.000 0.000 0.000] [0.000 0.063 0.250 0.563 1.000 0.000 0.000 0.000]]
(defn- pad-with-zeroes
"Takes the INPUT-VECTOR and make it longer with some zeroes padded on the end"
[input-vector number-of-zeroes]
(join-along 0
input-vector
(zero-vector number-of-zeroes)))
(defn lu-jumbo
""
[input-matrix output-vector]
(subvector (gauss/solve-with-lu (jumbo-matrix input-matrix)
(pad-with-zeroes output-vector
(column-count input-matrix)))
(row-count input-matrix)
(column-count input-matrix)))
;; missile problem from page 232 but redone with the big matrix method
(let [A [[1.0 0.0 0.0]
[1.0 0.25 0.0625]
[1.0 0.50 0.25]
[1.0 0.75 0.5625]
[1.0 1.00 1.0]]
b [0.000 0.008 0.015 0.019 0.02]
x (lu-jumbo A b)]
(pm x)
(pm (mmul A x)))
[-0.000 0.040 -0.019] [-0.000 0.009 0.015 0.019 0.020]
(let [A [[1.000 0.000 0.000]
[1.000 250.000 62500.000]
[1.000 500.000 250000.000]
[1.000 750.000 562500.000]
[1.000 1000.000 1000000.000]]
b [0.0 8.0 15.0 19.0 20.0]
x (lu-jumbo A b)]
(pm x)
(pm (mmul A x)))
[-0.229 0.040 -0.000] [-0.229 8.514 14.829 18.714 20.171]
TODO This should be numerically stable but it’s not. Need to double check everything again.. Maybe compare results to elinear
With the QR decomposition we similarly get to avoid computing the ATA product. Since A=QR we can write rewrite ATAx=ATb:
\begin{equation}
ATAx=ATb
(QR)TQRx = (QR)Tb \
RTQTQRx = RTQTb
\end{equation}
Now since QT is equivalent to Q-1 (see: Householder transform) this simplifies further to:
\begin{equation}
RTRx = RTQTb
Rx = QTb \
\end{equation}
The R matrix is upper triangular so the left side solves directly through backsubsitution, while the right side evaluates to some column vector.
(defn householder-qr-long
""
[A b]
(let [R (mutable A)
Q (householder/qr! R)
QT (transpose Q)
QTb (mmul QT b)]
(gauss/backward-substitution R QTb)))
However when we test the result…
;; missile problem from page 232
(let [A [[1.0 0.0 0.0]
[1.0 0.25 0.0625]
[1.0 0.50 0.25]
[1.0 0.75 0.5625]
[1.0 1.00 1.0]]
b [0.000 0.008 0.015 0.019 0.02]
x (householder-qr-long A b)]
(pm x)
(pm (mmul A x)))
[0.003 0.016 0.005] [0.003 0.007 0.012 0.017 0.023]
The values are .. in the right direction but clearly much more off the mark than in the previous solutions. The trick, as mentioned in the Householder transform section, is to reduce A and b simultaneously with reflectors and skip forming Q entirely.
(defn householder-qr
""
[A b]
(householder/reduction A b)
(gauss/backward-substitution A b))
;; missile problem from page 232
(let [A [[1.0 0.0 0.0]
[1.0 0.25 0.0625]
[1.0 0.50 0.25]
[1.0 0.75 0.5625]
[1.0 1.00 1.0]]
R (mutable A)
b (mutable [0.000 0.008 0.015 0.019 0.02])
x (householder-qr R b)]
(pm x)
(pm (mmul A x)))
[-0.000 0.040 -0.019] [-0.000 0.009 0.015 0.019 0.020]
And now the results match closely to what we got before