\documentclass{article}
\title{Mathematics for Machine Learning}
\author{Garrett Thomas\\
Department of Electrical Engineering and Computer Sciences\\
University of California, Berkeley}

\input{common.tex}

\begin{document}
\maketitle

\section{About}
Machine learning uses tools from a variety of mathematical fields.
This document summarizes the mathematical background needed for an introductory class in machine learning, which at UC Berkeley is known as CS 189/289A.

We assume that the reader is already familiar with the basic concepts of multivariable calculus and linear algebra (at the level of UCB Math 53/54).
We emphasize that this document is \textbf{not} a replacement for the prerequisite classes.
Most subjects presented here are covered rather minimally; we intend to give an overview and point the interested reader to more comprehensive treatments for further details.

Note that this document concerns math background for machine learning, not machine learning itself.
We will not discuss specific machine learning models or algorithms except possibly in passing to highlight the relevance of a mathematical concept.

Earlier versions of this document did not include proofs.
We have begun adding proofs where they are reasonably short and aid understanding.
These proofs are not necessary background for CS 189 but can be used to deepen the reader's understanding.

You are free to distribute this document as you wish.
The latest version can be found at \url{http://gwthomas.github.io/docs/math4ml.pdf}.
Please report any mistakes to \url{gwthomas@berkeley.edu}.

\newpage
\tableofcontents

\newpage
\section{Notation}
\begin{tabular}{|l|l|}
\hline
Notation & Meaning \\
\hline
$\R$ & set of real numbers \\
$\R^n$ & set (vector space) of $n$-tuples of real numbers, endowed with the usual inner product \\
$\R^{m \times n}$ & set (vector space) of $m$-by-$n$ matrices \\
$\delta_{ij}$ & Kronecker delta, i.e. $\delta_{ij} = 1$ if $i = j$, $0$ otherwise \\
$\nabla f(\x)$ & gradient of the function $f$ at $\x$ \\
$\nabla^2 f(\x)$ & Hessian of the function $f$ at $\x$ \\
$\A\tran$ & transpose of the matrix $\A$ \\
$\Omega$ & sample space \\
$\pr{A}$ & probability of event $A$ \\
$p(X)$ & distribution of random variable $X$ \\
$p(x)$ & probability density/mass function evaluated at $x$ \\
$A\comp$ & complement of event $A$ \\
$A \dotcup B$ & union of $A$ and $B$, with the extra requirement that $A \cap B = \varnothing$ \\
$\ev{X}$ & expected value of random variable $X$ \\
$\var{X}$ & variance of random variable $X$ \\
$\cov{X}{Y}$ & covariance of random variables $X$ and $Y$ \\
\hline
\end{tabular}

\vspace{0.5cm}
Other notes:
\begin{itemize}
\item Vectors and matrices are in bold (e.g. $\x, \A$).
This is true for vectors in $\R^n$ as well as for vectors in general vector spaces.
We generally use Greek letters for scalars and capital Roman letters for matrices and random variables.

\item To stay focused at an appropriate level of abstraction, we restrict ourselves to real values.
In many places in this document, it is entirely possible to generalize to the complex case, but we will simply state the version that applies to the reals.

\item We assume that vectors are column vectors, i.e. that a vector in $\R^n$ can be interpreted as an $n$-by-$1$ matrix.
As such, taking the transpose of a vector is well-defined (and produces a row vector, which is a $1$-by-$n$ matrix).
\end{itemize}
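
As a concrete illustration of the column-vector convention, consider the following sketch, in which the entry names $x_1, \dots, x_n$ are our own assumption for illustration and are not notation fixed elsewhere in this document:
\[
% x_1, ..., x_n are hypothetical entry names used only for this illustration
\x = \left[\begin{array}{c} x_1 \\ \vdots \\ x_n \end{array}\right] \in \R^{n \times 1},
\qquad
\x\tran = \left[\begin{array}{ccc} x_1 & \cdots & x_n \end{array}\right] \in \R^{1 \times n}.
\]
Under this convention the product $\x\tran\x$ is a $1$-by-$1$ matrix, which we may identify with the scalar $\sum_{i=1}^n x_i^2$.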

\newpage
\section{Linear Algebra}
\input{cs189-linalg.tex}

\newpage
\section{Calculus and Optimization}
\input{cs189-calculus-optimization.tex}

\newpage
\section{Probability}
\input{cs189-probability.tex}

\newpage
\section*{Acknowledgements}
The author would like to thank Michael Franco for suggested clarifications, and Chinmoy Saayujya for catching a typo.

\bibliography{math4ml}
\addcontentsline{toc}{section}{References}
\bibliographystyle{ieeetr}
\nocite{*}
\end{document}