\documentclass[12pt]{article}
\usepackage{e-jc}
\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{algorithmic}
\usepackage{fancyhdr}
\usepackage{hyperref}
\hypersetup {
colorlinks=true,
linkcolor=black,
citecolor=blue,
pdftitle={Math 230A Notes},
pdfauthor={Churchill},
pdfsubject={Notes from Math230A/Stat310A Probability Theory},
pdfkeywords={Probability, Notes}
}
\usepackage{graphicx}
\usepackage{wrapfig}
\usepackage{url}
\long\def\symbolfootnote[#1]#2{\begingroup%
\def\thefootnote{\fnsymbol{footnote}}\footnote[#1]{#2}\endgroup}
\newtheorem{lemma}{Lemma}
\newtheorem{theorem}{Theorem}
\newtheorem{defn}{Definition}
\newtheorem{corr}{Corollary}
\title{Math 230A / Stat 310A -- Probability Theory -- Notes}
\author{
Alex Churchill\\
\small \texttt{[email protected]}
}
\date{Autumn, 2011}
\begin{document}
\maketitle
\thispagestyle{empty} % ignore page number on first page
\tableofcontents
\newpage
\setcounter{page}{1} % set page number back to 1
\section{Course Information}
Prof. Persi Diaconis, Sequoia 131, 725-1965 (no email) \\
Office Hours: Wednesday 1:30 - 3:00
\\ \\
TA. Anirban, Sequoia 208 ([email protected]) \\
Office Hours: Friday 10-12
\\ \\
TA. Sumit, Sequoia 237 ([email protected]) \\
Office Hours: Monday 2-4
\\ \\
Text: P. Billingsley, \underline{Probability and Measure} 3rd Ed. (On reserve at Math Library).
\\ \\
Grading: Homework (30\%), Midterm (30\%), Final (40\%)
\\ \\
Midterm: Thursday Nov. 3, in class; 5:30-8:00 pm; One $3 \times 5$ notecard allowed.
\\ \\
Final: Thursday, Dec. 15, 7-10 pm. Room 380Y. Guesses: we will profit from knowing 1. The Four T's Proof, 2. The proof of the CLT under the Lindeberg condition, 3. How to do the general birthday problem, 4. Something from measure theory, 5. Something about weak convergence. Additional things to read up on: Stein's equation and the motivation for Stein's method, dependency graphs, Bernstein polynomials, Weierstrass approximation using the weak law. Take a look at characteristic functions and brush up on your complex analysis.
\\ \\
{\bf Oh, and for God's sake... READ THEOREM 25.10!!!!!}
\\ \\
Halloween Talk on Non-Measurable Sets: Monday (Oct 31) 5:30-6:30 in Pigott Hall (260-113).
\subsection{Homeworks}
{\bf HW WEEK 2: READ Sec 3, 4. Do problems 3.2(a,b); 3.3(a,b,c,d); 3.11; 3.13; 3.16; 4.11}
{\bf HW WEEK 3: READ Sec 10, 11, 14. Do problems 10.1, 10.2, 14.5, 14.8.}
Problem: Let $F_1, F_2$ be distribution functions on $\mathbb{R}$. Define $H_l(x,y) = (F_1(x) + F_2(y) - 1)_+$ where $(x)_+ = x$ if $x \ge 0$ and $0$ otherwise. Define $H_u(x,y) = \min(F_1(x), F_2(y))$. (a) Prove that $H_l$, $H_u$ are bivariate distribution functions with margins $F_1(x) = H_l(x, \infty) = H_u(x, \infty)$ and $F_2(y) = H_l(\infty, y) = H_u(\infty, y)$. (b) Prove that for every $H(x,y)$ with $F_1, F_2$ as margins, $H_l(x,y) \le H(x,y) \le H_u(x,y)$ for $-\infty < x,y < \infty$.
\\ \\
{\bf HW WEEK 4: READ Sec 15, 16. Do Problems 15.1, 15.2, 16.1, 16.7 + PROBLEM (see it down there in the notes)}
\\ \\
{\bf HW WEEK 5: Read Sec 18, Do: 2, 4, 10, 13, 14}
\\ \\
{\bf HW WEEK 6: Read Sec 20, 21, 22. Do 20.21, 20.24, 20.25(a, b, d), 21.11, 21.15, 22.2, 22.3}
\\ \\
{\bf HW WEEK 8: Read Sec. 27. Do 27.3, 27.4, 27.7, 27.10, 27.11}
\\ \\
{\bf HW WEEK 9: Read Sec. 25, 26. Do 25.1, 25.3, 25.16, 26.15, 26.16, 26.17}
\section{Week 1}
This week covers material from sections 1 and 2 in the book.
\subsection{Introduction}
We start by posing a simple probability problem: how many people must be in a room for even odds that two people will have the same birthday?
\\ \\
I wasn't in class for this derivation, so I'm not sure exactly how it went, but it is easy enough to estimate using the Poisson approximation to the binomial distribution (though I'm fairly sure this wasn't the approximation used in class).
\\ \\
Recall that the Binomial Distribution counts the number of successes in $n$ independent trials, each with success probability $p$. In this case, given $N$ people in the room, the number of total birthday pairs is ${N \choose 2}$. For each pair, the probability the two birthdays fall on the same day is $1/365$. The expected number of matching pairs is therefore $\lambda = {N \choose 2}/365$, and the Poisson approximation estimates the probability of $k$ matches as $\frac{\lambda^k e^{-\lambda}}{k!}$. In particular, the probability of no match is approximately $e^{-{N \choose 2}/365}$, which turns out to be slightly less than $0.5$ for $N = 23$.
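This estimate is easy to check numerically; here is a minimal sketch (function names are my own) comparing the exact product formula for distinct birthdays against the Poisson approximation:

```python
import math

def p_no_shared_birthday(n, days=365):
    """Exact probability that n people have pairwise distinct birthdays."""
    p = 1.0
    for i in range(n):
        p *= (days - i) / days
    return p

def poisson_no_match(n, days=365):
    """Poisson approximation: P(no matching pair) ~ exp(-C(n,2)/days)."""
    return math.exp(-math.comb(n, 2) / days)

# N = 23 is the first n where the exact probability of no match dips below 1/2
for n in (22, 23):
    print(n, round(p_no_shared_birthday(n), 4), round(poisson_no_match(n), 4))
```

Both formulas cross $1/2$ between $N = 22$ and $N = 23$, and the approximation is within about $0.01$ of the exact value there.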
\\ \\
{\it Variations:}\\ \\
Let the probability a person is born on day $i$ be $\theta_i$ (in the usual case, $\theta_i = 1/365$) where $\sum \theta_i = 1$.
\\ \\
How many people for even odds of $j$ matching birthdays in $N$ days? \\
Answer: In general, $k = \{ N^{j-1} \ln \frac{1}{1-p} \}^{1/j}$.\\
Cash \$10 to whoever proves this.
\\ \\
\subsection{Coin Tossing}
We introduce a model for fair coin tossing:
\\ \\
Let $\Omega$ be the interval $(0, 1]$. For $0 < a \le b \le 1$, define $P((a,b]) = b-a$.
\\ \\
If $I_1, ..., I_k$ are disjoint intervals, define $P(\cup_{i=1}^k I_i) = \sum_{i=1}^k |I_i|$.
\\ \\
Write $\omega \in (0,1]$ in binary: $\omega = \sum_{i=1}^\infty \frac{d_i(\omega)}{2^i}$ (taking the nonterminating expansion at dyadic rationals).
\\ \\
$d_i: (0,1] \rightarrow \{0, 1\}$, where $d_i$ is constructed by breaking $(0, 1]$ into $2^i$ equally-sized intervals and assigning $d_i$ the values 0, 1, 0, 1, ... across those intervals (not a formal definition!).
\\ \\
Then $P(\{ \omega: d_i(\omega) = 1\}) = 1/2$, so we say $P\{d_i = 1\} = 1/2$.
\\ \\
$P(d_1 = d_2 = 1) = 1/4$, and more generally, $P(d_1 = e_1, ..., d_k = e_k) = 1/2^k$ for $e_i \in \{0, 1\}$.
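The digit functions can be realized concretely. A sketch (my own formulation) using $d_i(\omega) = \lfloor 2^i \omega \rfloor \bmod 2$, which agrees with the intervals above except on the negligible set of dyadic rationals:

```python
import math
from fractions import Fraction

def d(i, omega):
    """i-th binary digit of omega, via floor(2^i * omega) mod 2."""
    return math.floor(2**i * omega) % 2

# omega = 0.101000..._2 = 5/8 has digits 1, 0, 1, 0, ...
print([d(i, Fraction(5, 8)) for i in range(1, 5)])

# P(d_1 = 1, d_2 = 0, d_3 = 1) = 1/8: sample 1024 equally spaced points
# (odd dyadics, so no digit is ambiguous) and count the pattern.
points = [Fraction(2 * j - 1, 2**11) for j in range(1, 2**10 + 1)]
frac = sum(1 for w in points
           if [d(i, w) for i in (1, 2, 3)] == [1, 0, 1]) / len(points)
print(frac)
```

Each of the $2^3$ digit patterns corresponds to one dyadic interval of length $1/8$, and the sampled fraction comes out exactly $0.125$.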
\subsection{Strong Law (Coin Tossing)}
\begin{lemma}
(Markov's Inequality): Given $f:(0, 1] \rightarrow [0, \infty)$ and $a > 0$, $P\{\omega: f(\omega) \ge a\} \le \frac{ \int_0^1 f(\omega) d \omega }{a}$.
\end{lemma}
\begin{proof}
Break the integral into two pieces: $\int_0^1 f(\omega) d\omega = \int_{\{ \omega : f(\omega) \ge a \}} f(\omega) d\omega + \int_{\{ \omega : f(\omega) < a \}} f(\omega) d\omega$. The second integral is nonnegative, and the first is at least $a P\{ \omega : f(\omega) \ge a \}$.
\end{proof}
\begin{theorem}
(Weak Law of Large Numbers for Coin Tossing): For all $\epsilon > 0$, $P \{ | \frac{1}{n} \sum d_i - 1/2 | > \epsilon \} \rightarrow 0$ as $n \rightarrow \infty$.
\end{theorem}
\begin{proof}
Note it is easier to work with $r_i(\omega) = 2 \cdot d_i(\omega) - 1$; it is enough to show $P\{ | \frac{1}{n} \sum_{i=1}^n r_i | > \epsilon \} \rightarrow 0$.
\\ \\
Evaluate $\int_0^1 r_i(\omega) d\omega = 0$ and $\int_0^1 r_i(\omega) r_j(\omega) d\omega = 1$ if $i = j$ and $0$ otherwise.
\\ \\
Hence, $\int_0^1 (\sum_{i=1}^n r_i(\omega) )^2 d\omega = n$.
\\ \\
Finally, note $P \{ | \frac{1}{n} \sum_{i=1}^n r_i | > \epsilon \} = P \{ ( \sum_{i=1}^n r_i )^2 > n^2 \epsilon^2 \} \le \frac{n}{n^2 \epsilon^2} = \frac{1}{n \epsilon^2} \rightarrow 0$ by Markov's inequality applied to $(\sum r_i)^2$.
\end{proof}
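The orthogonality computation $\int_0^1 (\sum_{i=1}^n r_i)^2 \, d\omega = n$ can be sanity-checked for small $n$ by brute force, since each sign pattern of $(r_1, \ldots, r_n)$ occupies measure $2^{-n}$ (a sketch of my own, not part of the course):

```python
from itertools import product

def mean_square_sum(n):
    """Average of (r_1 + ... + r_n)^2 over all 2^n sign patterns r_i = +/-1;
    this equals the integral over (0, 1], since each pattern corresponds to
    a set of measure 2^{-n}."""
    return sum(sum(s) ** 2 for s in product((-1, 1), repeat=n)) / 2 ** n

print([mean_square_sum(n) for n in (1, 2, 5, 8)])  # each equals n
```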
Now, we would like to say
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n d_i(\omega) = \frac{1}{2}$$
but we can't for every $\omega$: with ever-longer alternating blocks of 0's and 1's, as in
$$ \omega = 0.00111100000000111111111111111100...$$
the running average oscillates and never converges. So instead...
\begin{theorem}
(Strong Law of Large Numbers for Coin Tossing): $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n d_i(\omega) = \frac{1}{2}$ except for $\omega$ in a negligible set (later: true for $\omega$ almost everywhere).
\end{theorem}
\begin{proof}
Let $r_i = 2d_i - 1$. Show $\frac{1}{n} \sum_{i=1}^n r_i \rightarrow 0$.
\\ \\
As in the weak law, with $s_n = \sum_{i=1}^n r_i$, $P\{ | \frac{s_n}{n} | > \epsilon \} = P \{ s_n^4 \ge (\epsilon n)^4 \}$.
\\ \\
By Markov's inequality,
$$\le \frac{ \int_0^1 |s_n(\omega)|^4 d \omega }{\epsilon^4 n^4} \le \frac{3}{\epsilon^4 n^2}$$
We show $\int_0^1 |s_n|^4 \le 3n^2$: expanding $s_n^4$, only the $n$ terms $r_i^4$ and the $3n(n-1)$ pair terms $r_i^2 r_j^2$ have nonzero integral.
\\ \\
Choose $\epsilon_n \to 0$ so $\sum_{n=1}^\infty \frac{1}{n^2 \epsilon_n^4} < \infty$.
e.g.\ let $\epsilon_n = \frac{1}{n^{1/5}}$. Let $B = \{ \omega: \lim_{n \to \infty} \frac{s_n}{n} = 0 \}$ and $A_n = \{ \omega: | \frac{s_n}{n} | \ge \epsilon_n \}$.
Then if $\omega \in \cap_{n=m}^\infty A_n^C$ for some $m$, $\omega \in B$ (eventually $|s_n/n| \le \epsilon_n \to 0$):
$$\cap_{n=m}^\infty A_n^C \subset B, \textrm{ so } B^C \subset \cup_{n=m}^\infty A_n$$
Each $A_n$ is a finite union of intervals, $A_n = \cup_{k=1}^{k(n)} I_{nk}$, with $|A_n| \le \frac{3}{n^2 \epsilon_n^4}$ by the bound above.\\
So $B^C \subset \cup_{n=m}^\infty \cup_k I_{nk}$ and, taking $m$ large enough, $\sum_{n=m}^\infty \sum_k |I_{nk}| < \epsilon$; hence $B^C$ is negligible.
\\ \\
\end{proof}
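The fourth-moment bound used above can likewise be checked exhaustively for small $n$: integrating the expansion of $s_n^4$ term by term gives exactly $3n^2 - 2n \le 3n^2$. A quick brute-force check (my own, for illustration):

```python
from itertools import product

def mean_fourth_sum(n):
    """Average of (r_1 + ... + r_n)^4 over all 2^n sign patterns r_i = +/-1."""
    return sum(sum(s) ** 4 for s in product((-1, 1), repeat=n)) / 2 ** n

for n in (1, 2, 4, 6, 8):
    m4 = mean_fourth_sum(n)
    # n terms r_i^4 contribute n; the 3n(n-1) terms r_i^2 r_j^2 give the rest
    assert m4 == 3 * n * n - 2 * n <= 3 * n * n
print("integral of s_n^4 equals 3n^2 - 2n for small n")
```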
Remarks: Borel proved this first, and was studying the number theory problem:\\ \\
For $\omega \in (0, 1]$, the proportion of the first $n$ binary digits tends to $1/2$; same for all bases simultaneously.
\\ \\
Difference between weak and strong laws: the weak law says the average is close to $1/2$ with high probability at each large $n$; the strong law says that for almost every $\omega$, it gets close and stays close.
\\ \\
Notice that $B = \{ \omega : \lim \frac{s_n}{n} = 0 \} = \cap_{k=1}^\infty \cup_{m=1}^\infty \cap_{n=m}^\infty \{ \omega: |\frac{s_n}{n} | < \frac{1}{k} \}$ is a complicated set.
\\ \\
\subsection{Fields and $\sigma$-Algebras}
\begin{defn}
Let $\Omega$ be any set. A collection $\mathcal{F}_0$ of subsets of $\Omega$ is a {\bf Field} if:
\begin{enumerate}
\item $\emptyset, \Omega \in \mathcal{F}_0$
\item $A \in \mathcal{F}_0 \Rightarrow A^C \in \mathcal{F}_0$
\item $A_1, ..., A_n \in \mathcal{F}_0 \Rightarrow \cup_{i=1}^n A_i \in \mathcal{F}_0$
\end{enumerate}
\end{defn}
\begin{defn}
Let $\Omega$ be any set. A collection $\mathcal{F}_0$ of subsets of $\Omega$ is a {\bf $\sigma$-Algebra} if:
\begin{enumerate}
\item $\emptyset, \Omega \in \mathcal{F}_0$
\item $A \in \mathcal{F}_0 \Rightarrow A^C \in \mathcal{F}_0$
\item $A_1, ... \in \mathcal{F}_0 \Rightarrow \cup_{i=1}^\infty A_i \in \mathcal{F}_0$
\end{enumerate}
\end{defn}
That is, a $\sigma$-Field is a Field closed under countable unions.
\\ \\
Example: The Borel sets are the minimal $\sigma$-algebra containing all open intervals $I \subset \mathbb{R}$. Note that there is no easy description of the Borel sets, and constructing them requires transfinite operations. But it doesn't matter, because the Borel sets will just work the way we want them to anyway.
\\ \\
\begin{defn}
Let $(\Omega, \mathcal{F})$ be a set with a $\sigma$-algebra on it. A function $P : \mathcal{F} \rightarrow [0,1]$ is a {\bf probability} if
\begin{enumerate}
\item $P(\emptyset) = 0$, $P(\Omega) = 1$
\item $P(A) = 1 - P(A^C)$ for all $A \in \mathcal{F}$
\item $A_i \in \mathcal{F}$ for all $1 \le i < \infty$, $P(\cup_1^\infty A_i) = \sum_1^\infty P(A_i)$ for all $A_i$ disjoint.
\end{enumerate}
\end{defn}
In other words, $P$ is just a measure that takes on value $1$ for $\Omega$.
\\ \\
Task: Say $\mathcal{F}_0$ is a field of subsets and $P$ is a probability on $\mathcal{F}_0$; we want to extend $P$ to $\sigma(\mathcal{F}_0)$. (We follow the Greeks.)
\\ \\
Define: for all $A \subset \Omega$, $P^*(A) = \inf \sum_{i=1}^\infty P(A_i)$, where the infimum is over covers $A \subset \cup_{i=1}^\infty A_i$ with $A_i \in \mathcal{F}_0$.
\\ \\
In this case, $P^*$ is an outer measure (and for our interval definition, is in fact the Lebesgue outer measure).
\\ \\
Some obvious facts: (1) $P^*(\emptyset) = 0$. (2) Similarly, $P^*(\Omega) = 1$. (3) If $A \subset B$, then $P^*(A) \le P^*(B)$. (4) Given $\{A_i\}$ any sets in $\Omega$, $P^*(\cup A_i) \le \sum_{i=1}^\infty P^*(A_i)$ (subadditivity).
\\ \\
Proof of (4): \\
Fix $\epsilon > 0$; choose $B_{ik} \in \mathcal{F}_0$ with $\cup_k B_{ik} \supset A_i$ and $\sum_{k=1}^\infty P(B_{ik}) \le P^*(A_i) + \epsilon/2^i$.
\\ \\
Then $\cup_1^\infty A_i \subset \cup_{i,k} B_{ik}$, so $P^*(\cup A_i) \le \sum_1^\infty (P^*(A_i) + \epsilon / 2^i) = \sum_1^\infty P^*(A_i) + \epsilon$. QED.
\\ \\
What have we learned since the Greeks? We can't assign a length to all subsets of (0,1]. We need a clever way of approximating (or maybe just a way to do some rigorous math) so outer measures $(P^*)$ work there.
\\ \\
Idea: (Caratheodory): \\
Given $\Omega$, a field $\mathcal{F}_0$, and a probability $P$ on $\mathcal{F}_0$, define $P^*$ as above.
\\ \\
Let $M = \{ A : \forall E, P^*(E) = P^*(A \cap E) + P^*(A^C \cap E) \}$. (Collection of all measurable sets).
\\ \\
We show: (1) $M$ is a $\sigma$-algebra containing $\mathcal{F}_0$, (2) $P^*$ is countably additive on $M$ (in other words, $P^*$ is a measure on $M$), (3) $P^*$ agrees with $P$ for sets in $\mathcal{F}_0$, (4) $P^*$ is unique on $M$ given (1-3).
\\ \\
Note that $M = \{ A : \forall E, P^*(E) \ge P^*(A \cap E) + P^*(A^C \cap E) \}$ by subadditivity.
\\ \\
Proof that $M$ is a field: Clearly $M$ contains $\emptyset$ and $\Omega$. By the symmetry of the definition, if $A \in M$ then $A^C \in M$. Now, say $A$, $B \in M$. Then $P^*(E) = P^*(B \cap E) + P^*(B^C \cap E) = P^*(A \cap B \cap E) + P^*(A^C \cap B \cap E) + P^*(A \cap B^C \cap E) + P^*(A^C \cap B^C \cap E) \ge P^*(A \cap B \cap E) + P^*((A^C \cap B \cap E) \cup (A \cap B^C \cap E) \cup (A^C \cap B^C \cap E)) = P^*((A \cap B) \cap E) + P^*((A \cap B)^C \cap E)$.
\section{Week 2}
This week covers material from sections 3 and 4 in the book.
\subsection{$\sigma$-algebras}
Given $\Omega$ and $\mathcal{F}_0$ a field of subsets of $\Omega$. $P$ is a probability given on $\mathcal{F}_0$. We want to extend to $P^*$, the outer measure generated by $P$; for any set $A \subset \Omega$
$$P^*(A) = \inf \sum_{i=1}^\infty P(A_i)$$
where $A_i \in \mathcal{F}_0$ such that $A \subset \cup A_i$. (Note this is just a normalized Lebesgue outer measure).
\\ \\
Easy to show that
\begin{enumerate}
\item $P^*(\emptyset) = 0$
\item $P^*(\Omega) = 1$
\item $A \subset B \Rightarrow P^*(A) \le P^*(B)$
\item $P^*$ is countably subadditive.
\end{enumerate}
Let $\mathcal{M} = \{ A \subset \Omega : \forall E \subset \Omega, P^*(E) = P^*(E \cap A) + P^*(E \cap A^C) \}$. These are measurable sets under $P$.
\\ \\
Just as a heads-up (we'll prove this later), the Carath\'eodory theorem says:
\begin{enumerate}
\item $\mathcal{M}$ is a $\sigma$-algebra containing $\mathcal{F}_0$
\item $P^*$ is a probability on $\mathcal{M}$
\item $P^*(A) = P(A)$ for $A \in \mathcal{F}_0$
\item $P^*$ is the unique such extension
\end{enumerate}
Last time, we proved $\mathcal{M}$ is a field. We want to prove $\mathcal{M}$ is a $\sigma$-field.
\\ \\
{\bf Fact:} If $\{ A_i \}_{i=1}^\infty \subset \mathcal{M}$, $E \subset \Omega$, and the $A_i$ are disjoint,
$$P^*(E \cap (\cup_{i=1}^\infty A_i)) = \sum_{i=1}^\infty P^*(E \cap A_i)$$
{\bf Proof} by induction on finite unions first. For $n = 1$ there is nothing to prove. For $n = 2$, using the measurability of $A_1$ and the disjointness of $A_1, A_2$:
$$P^*(E \cap (A_1 \cup A_2)) = P^*(E \cap (A_1 \cup A_2) \cap A_1) + P^*(E \cap (A_1 \cup A_2) \cap A_1^C)$$
$$ = P^*(E \cap A_1) + P^*(E \cap A_2)$$
Same for all finite $n$. In general, $P^*(E \cap (\cup_{i=1}^\infty A_i)) \ge P^*(E \cap (\cup_{i=1}^n A_i)) = \sum_{i=1}^n P^*(E \cap A_i)$. Taking the limit as $n \to \infty$, we get $P^*(E \cap (\cup_{i=1}^\infty A_i)) \ge \sum_{i=1}^\infty P^*(E \cap A_i)$. The other direction follows by subadditivity.
\\ \\
{\bf Fact:} $\mathcal{M}$ is a $\sigma$-algebra and $P^*$ is a probability on $\mathcal{M}$.
\\ \\
{\bf Proof:}
Given $A_n \in \mathcal{M}$ for $1 \le n < \infty$, define $A_1' = A_1$, $A_2' = A_2 \cap A_1^C$, $A_3' = A_3 \cap A_1^C \cap A_2^C$, etc. Then the $A_i'$ are disjoint, each $A_i' \in \mathcal{M}$ (since $\mathcal{M}$ is a field), and $\cup_{i=1}^\infty A_i = \cup_{i=1}^\infty A_i'$.
\\ \\
So without loss of generality, we can say all $A_i$ are disjoint.
\\ \\
Want $P^*(E) \ge P^*(E \cap (\cup A_i)) + P^*(E \cap (\cup A_i)^C)$.
\\ \\
Set $F_n = \cup_{i=1}^n A_i$. Then
$$P^*(E) = P^*(E \cap F_n) + P^*(E \cap F_n^C)$$
$$ \ge \sum_{i=1}^n P^*(E \cap A_i) + P^*(E \cap (\cup_{i=1}^\infty A_i)^C)$$
so as $n \to \infty$
$$P^*(E) \ge \sum_{i=1}^\infty P^*(E \cap A_i)$$
$$ \ge P^*(E \cap (\cup_{i=1}^\infty A_i)) + P^*(E \cap (\cup_{i=1}^\infty A_i)^C)$$
The reverse inequality is countable subadditivity, so $\cup A_i \in \mathcal{M}$; taking $E = \cup_{i=1}^\infty A_i$ in the display shows $P^*$ is countably additive on $\mathcal{M}$.
\\ \\
Next we show $\mathcal{F}_0 \subset \mathcal{M}$: pick $A \in \mathcal{F}_0$ and $E \subset \Omega$.
\\ \\
From the definition of $P^*(E)$, for all $\epsilon > 0$, there exist $A_1, A_2, ...$ with $A_n \in \mathcal{F}_0$ where $E \subset \cup_{i=1}^\infty A_i$ and $\sum_{i=1}^\infty P(A_i) \le P^*(E) + \epsilon$.
\\ \\
Let $B_n = A_n \cap A$ and $C_n = A_n \cap A^C$. Then $E \cap A \subset \cup_{n=1}^\infty B_n$ and $E \cap A^C \subset \cup_n C_n$.
$$P^*(E \cap A) + P^*(E \cap A^C) \le \sum P(B_n) + \sum P(C_n) = \sum P(A_n) \le P^*(E) + \epsilon$$
Letting $\epsilon \to 0$ gives us what we want.
\\ \\
Further, we note $P^*(A) = P(A)$ if $A \in \mathcal{F}_0$. To show this, we know $P^*(A) \le P(A)$. If $A \subset \cup A_i$, where $A_i \in \mathcal{F}_0$, $P(A) \le \sum P(A \cap A_i) \le \sum P(A_i)$ so $P^*(A) \ge P(A)$.
\\ \\
For Uniqueness, we need to define a $\Pi$ system.
\begin{defn}
A class of subsets $\mathcal{P}$ is a $\Pi$-system if it is closed under finite intersection.
\end{defn}
\begin{defn}
A set $\mathcal{L}$ of subsets is called a $\lambda$-system if
\begin{enumerate}
\item $\Omega \in \mathcal{L}$
\item $A \in \mathcal{L} \Rightarrow A^C \in \mathcal{L}$
\item $A_1, A_2, ... \in \mathcal{L}$, with all $A_i$ disjoint guarantees $\cup_{i=1}^\infty A_i \in \mathcal{L}$.
\end{enumerate}
\end{defn}
\begin{theorem}
(Dynkin's $\Pi$-$\lambda$ theorem) If $\mathcal{P}$ is a $\Pi$-system, $\mathcal{L}$ is a $\lambda$-system, and $\mathcal{P} \subset \mathcal{L}$, then $\sigma(\mathcal{P}) \subset \mathcal{L}$.
\end{theorem}
\begin{proof}
First note that if $A, B \in \mathcal{L}$ with $A \subset B$, then $B \backslash A = B \cap A^C \in \mathcal{L}$: the sets $A$ and $B^C$ are disjoint, so $A \cup B^C \in \mathcal{L}$, and $B \cap A^C$ is its complement.
\\ \\
Because the intersection of $\lambda$-systems is a $\lambda$-system, there is a smallest $\lambda$-system containing $\mathcal{P}$; call it $\mathcal{L}_0$. We show that $\mathcal{L}_0$ is a $\Pi$-system. That finishes the proof: a $\lambda$-system closed under finite intersections is a $\sigma$-algebra, so $\sigma(\mathcal{P}) \subset \mathcal{L}_0 \subset \mathcal{L}$.
\\ \\
For $A \in \mathcal{L}_0$, let $\mathcal{L}_A = \{ E \subset \Omega : E \cap A \in \mathcal{L}_0 \}$. We claim $\mathcal{L}_A$ is a $\lambda$-system. Indeed, $\Omega \in \mathcal{L}_A$ since $A \in \mathcal{L}_0$. If $B \in \mathcal{L}_A$, then $A \cap B \in \mathcal{L}_0$ and $A \cap B^C = A \backslash (A \cap B) \in \mathcal{L}_0$ by the set-difference property above, so $B^C \in \mathcal{L}_A$. If $\{B_i\}$ are disjoint sets in $\mathcal{L}_A$, then $A \cap (\cup B_i) = \cup (A \cap B_i) \in \mathcal{L}_0$, so $\cup B_i \in \mathcal{L}_A$.
\\ \\
Now say $A, B \in \mathcal{P}$. Then $A \cap B \in \mathcal{P}$, so $A \in \mathcal{L}_B$. Thus each $\mathcal{L}_B$ with $B \in \mathcal{P}$ is a $\lambda$-system containing $\mathcal{P}$, so $\mathcal{L}_0 \subset \mathcal{L}_B$; equivalently, $B \in \mathcal{L}_A$ for every $A \in \mathcal{L}_0$ and $B \in \mathcal{P}$, so $\mathcal{P} \subset \mathcal{L}_A$ and hence $\mathcal{L}_0 \subset \mathcal{L}_A$ for every $A \in \mathcal{L}_0$.
\\ \\
So if $B, C \in \mathcal{L}_0$ then $C \in \mathcal{L}_B$; that is, $B \cap C \in \mathcal{L}_0$. Hence $\mathcal{L}_0$ is closed under finite intersections, i.e.\ a $\Pi$-system.
\end{proof}
\begin{corr}
If $\mu$ and $\nu$ are probabilities that agree on the $\Pi$-system $\mathcal{P}$, then they agree on $\sigma(\mathcal{P})$.
\end{corr}
\begin{proof}
Let $\mathcal{L} = \{ A : \mu(A) = \nu(A) \}$. Then $\Omega \in \mathcal{L}$; $\mathcal{L}$ is closed under complements, since $\mu(A^C) = 1 - \mu(A)$; and it is closed under countable disjoint unions, by countable additivity. So $\mathcal{L}$ is a $\lambda$-system containing $\mathcal{P}$, and by Dynkin's theorem $\sigma(\mathcal{P}) \subset \mathcal{L}$; that is, $\mu(A) = \nu(A)$ for all $A \in \sigma(\mathcal{P})$.
\end{proof}
Therefore, if $P$ is a probability on a field $\mathcal{F}_0 \subset 2^\Omega$, then the extension $P^*$ to $\sigma( \mathcal{F}_0)$ is unique.
\\ \\
Note we assumed $P$ was a probability on $\mathcal{F}_0$; that is, we assume that if $A, A_1, A_2, \ldots \in \mathcal{F}_0$ with $A = \cup A_i$ and the $A_i$ disjoint, then $P(A) = \sum P(A_i)$. This check is performed in the book.
\subsection{Extensions}
YEAH THERE NEEDS TO BE SOME STUFF FILLED IN HERE... (Wk 2., Day 2).
\section{Week 3}
This week covers material from sections 10 - 12 and 14 in the book.
\subsection{$\infty$ measures}
\begin{defn}
Let $\Omega$ be any set and $\mathcal{F}$ a field of subsets of $\Omega$ ($\emptyset \in \mathcal{F}$, $\mathcal{F}$ closed under finite intersections and complementation). A function $\mu: \mathcal{F} \rightarrow [0, \infty]$ with $\mu(\emptyset) = 0$ and $\mu(\cup_1^\infty A_i) = \sum \mu(A_i)$ for disjoint $A_i$ (whenever $\cup A_i \in \mathcal{F}$) is a {\bf measure} on $\mathcal{F}$. In other words, a measure is zero on the empty set, nonnegative, and countably additive.
\end{defn}
If $\mu(\Omega) < \infty$, then up to rescaling $\mu$ is the same as a probability. If $\exists A_n \in \mathcal{F}$ with $\Omega = \cup_{n=1}^\infty A_n$ and $\mu(A_n) < \infty$ for all $n$, then $\mu$ is {\bf $\sigma$-finite}.
\\ \\
Example: $\Omega = \mathbb{N}$ with counting measure $\mu(\{i\}) = 1$ is $\sigma$-finite. Similarly, Lebesgue measure $\lambda$ on $\mathbb{R}$ is $\sigma$-finite (cover by $[0,1]$, $[1,2]$, ...). However, if $\Omega = [0,1]$ and $\mu(A) = $ the number of points in $A$, $\mu$ is not $\sigma$-finite.
\\ \\
Why do we want to talk about this?
\begin{enumerate}
\item Probability densities on $\mathbb{R}$; for example, $\frac{e^{-x^2/2}}{\sqrt{2 \pi}}$ with respect to length on $\mathbb{R}$.
\item In the $\sigma$-finite case, it is easy.
\end{enumerate}
Most arguments are the ``same''. For example, if $A_n \uparrow A$ with $A_n, A \in \mathcal{F}$, then $\mu(A_n) \to \mu(A)$. Proof: let $B_1 = A_1$ and $B_n = A_n \backslash A_{n-1}$; the $B_n$ are disjoint with $\cup B_n = A$, so $\mu(A) = \sum \mu(B_i) = \lim_{n \to \infty} \sum_1^n \mu(B_i) = \lim_{n \to \infty} \mu(A_n)$.
\\ \\
But sometimes you need to watch it. If $\mu$ is a probability and $A \subset B$, then $\mu(B \backslash A) = \mu(B) - \mu(A)$, and if $A_n \downarrow A$ then $\mu(A_n) \to \mu(A)$. But take Lebesgue measure on $\Omega = (-\infty, \infty)$ with $A = (-\infty, 0]$, $B = (-\infty, 1]$: then $\mu(B \backslash A) = 1$, while $\mu(B) - \mu(A) = \infty - \infty$ is undefined.
\\ \\
Uniqueness of extensions. Given $\mathcal{P}$ a $\Pi$-system and $\mu_1, \mu_2$ measures on $\sigma(\mathcal{P})$:
\begin{theorem}
If there exist $\{B_i\}_{i=1}^\infty$ with $B_i \in \mathcal{P}$, $\Omega = \cup_i B_i$, and $\mu_j(B_i) < \infty$ for $j = 1, 2$, and if $\mu_1 = \mu_2$ on $\mathcal{P}$, then $\mu_1 = \mu_2$ on $\sigma(\mathcal{P})$.
\end{theorem}
\begin{proof}
Fix $B \in \mathcal{P}$ with $\mu_j(B) < \infty$ for $j=1,2$. Let $\nu_j(F) = \mu_j(F \cap B)$. These are finite measures agreeing on $\mathcal{P}$, so by the $\Pi$-$\lambda$ theorem, $\nu_1 = \nu_2$ on $\sigma(\mathcal{P})$. Now let $A_1 = B_1$, $A_2 = B_2 \backslash A_1$, and in general $A_n = B_n \backslash (\cup_{i=1}^{n-1} A_i)$; the $A_i$ are disjoint with $\cup_i A_i = \Omega$. Then for all $F \in \sigma(\mathcal{P})$, $\mu_1(F) = \mu_1(F \cap (\cup_i A_i)) = \sum_i \mu_1 (F \cap A_i) = \sum_i \mu_2 (F \cap A_i) = \mu_2(F \cap (\cup_i A_i)) = \mu_2(F)$.
\end{proof}
Note that if things are not $\sigma$-finite you can have two different extensions. See the book.
\\ \\
\subsection{Back to Outer Measures}
\begin{defn}
$\mu^*$ is an outer measure on $\Omega$ if $\mu^* : 2^\Omega \rightarrow [0, \infty]$, $\mu^*(\emptyset) = 0$, and $\mu^*(\cup A_i) \le \sum \mu^*(A_i)$. In other words, nonnegative, nontrivial, and countably subadditive.
\end{defn}
Example: $\Omega$ any set, $\mathcal{A}$ any collection of subsets with $\emptyset \in \mathcal{A}$, and $\rho : \mathcal{A} \rightarrow [0, \infty]$ any function with $\rho(\emptyset) = 0$. Define $\mu_\rho^* (A) = \inf \sum_{i=1}^\infty \rho(A_i)$ over covers $A \subset \cup_{i=1}^\infty A_i$ with $A_i \in \mathcal{A}$, and $\mu_\rho^*(A) = \infty$ if no such cover exists. Claim: $\mu_\rho^*$ is an outer measure. $\mu_\rho^*(\emptyset) = 0$ since $\emptyset$ covers itself; the function is nonnegative by definition; and it is countably subadditive by the $\epsilon/2^i$ argument used earlier.
\\ \\
Example: the Hausdorff $\gamma$-measure on $\mathbb{R}^n$. Let $\mathcal{A}$ be the collection of all closed balls $B_r(x)$, and for a fixed $\gamma > 0$ take $\rho(B_r(x)) = r^\gamma$ (for $\gamma = n$ this is comparable to the volume). Read more in the 2nd edition of Billingsley.
\\ \\
As usual, given an outer measure $\mu^*$, define $\mathcal{M}(\mu^*) = \{ A \subset \Omega: \forall E \subset \Omega, \mu^*(E) = \mu^*(E \cap A) + \mu^*(E \cap A^C) \}$.
\begin{theorem}
$\mathcal{M}(\mu^*)$ is a $\sigma$-algebra and $\mu^*$ is a measure on $\mathcal{M}$.
\end{theorem}
\begin{proof}
Sentence for sentence same proof as for probabilities.
\end{proof}
To work with infinite measures, it is useful to know about $\sigma$-rings. Let $\Omega$ be a set.
\begin{defn}
A collection of subsets $\mathcal{A}$ is a {\bf $\sigma$-ring} (elsewhere often called a semiring) if $\emptyset \in \mathcal{A}$; $A,B \in \mathcal{A} \Rightarrow A \cap B \in \mathcal{A}$; and for $A, B \in \mathcal{A}$ with $A \subset B$, there exist disjoint $C_i \in \mathcal{A}$, $1 \le i \le n$, such that $B \backslash A = \cup_{i=1}^n C_i$.
\end{defn}
Example: On $\mathbb{R}$, the half-open intervals $(a, b]$ with $-\infty \le a \le b \le \infty$ form a $\sigma$-ring.
\\ \\
\begin{theorem}
(Extension Theorem) Let $\mu$ be a function on a $\sigma$-ring of subsets $\mathcal{A}$ with $\mu(A) \in [0, \infty]$, $\mu(\emptyset) = 0$, $\mu$ finitely additive, and $\mu$ countably subadditive: for all $A_i$ with $\cup A_i \in \mathcal{A}$, $\mu(\cup A_i) \le \sum_1^\infty \mu(A_i)$. Then $\mu$ extends to a measure on $\sigma(\mathcal{A})$, and if there exist $A_i \in \mathcal{A}$ such that $\Omega = \cup_1^\infty A_i$ with $\mu(A_i) < \infty$, the extension is unique.
\end{theorem}
\begin{proof}
Define $\mu^*(A) = \inf \sum_{i=1}^\infty \mu(A_i)$ where $A \subset \cup_1^\infty A_i$ and $A_i \in \mathcal{A}$. This is an outer measure, and $\mu^*$ restricted to the $\sigma$-algebra $\mathcal{M}(\mu^*)$ does the job. We show (1) $\mathcal{A} \subset \mathcal{M}(\mu^*)$ and (2) $\mu^*(A) = \mu(A)$ for all $A \in \mathcal{A}$.
\\ \\
For (1), pick $A \in \mathcal{A}$. We must show for all $E$, $\mu^*(E) \ge \mu^*(E \cap A) + \mu^*(E \cap A^C)$. If $\mu^*(E) = \infty$ this is trivially true, so suppose $\mu^*(E) < \infty$. Then for every $\epsilon > 0$ there exist $A_i \in \mathcal{A}$ covering $E$ such that $\sum \mu(A_i) \le \mu^*(E) + \epsilon$; in particular each $\mu(A_i)$ is finite. Set $B_n = A \cap A_n \in \mathcal{A}$. Since $B_n \subset A_n$, we can write $A_n \backslash B_n = \cup_{i=1}^{m_n} C_{ni}$ with disjoint $C_{ni} \in \mathcal{A}$. Then $A_n = B_n \cup (\cup_{i=1}^{m_n} C_{ni})$, $A \cap E \subset \cup_n B_n$, and $A^C \cap E \subset \cup_n \cup_i C_{ni}$. Now
$$\mu^*(E \cap A) + \mu^*(E \cap A^C) \le \sum_{n=1}^\infty \mu(B_n) + \sum_{n=1}^\infty \sum_{i=1}^{m_n} \mu(C_{ni})$$
$$ = \sum_{n=1}^\infty \left[ \mu(B_n) + \mu(A_n \backslash B_n) \right]$$
$$ = \sum_n \mu(A_n) \le \mu^*(E) + \epsilon$$
For (2), if $A \subset \cup A_i$ where $A, A_i \in \mathcal{A}$, then $\mu(A) \le \sum_i \mu(A_i)$. The other direction is free.
\end{proof}
\subsection{Distribution functions on $\mathbb{R}$}
Given probability $\mu$ we describe a {\bf distribution function} by $F(x) = \mu(-\infty, x]$. We often define probability measures using distribution functions. For instance, the ``Gauss Measure'' $F(x) = \frac{1}{\sqrt{2 \pi}} \int_{-\infty}^x e^{-t^2/2} dt$.
\\ \\
Observe: (1) $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$ (normalization); and if $y < x$ then $F(x) - F(y) = \mu(-\infty, x] - \mu(-\infty, y] = \mu(y, x] \ge 0$, so (2) $F$ is monotone.
\\ \\
Further, (3) if $x_n \downarrow x$, then $(-\infty, x_n] \downarrow (-\infty, x]$, so $F(x_n) \downarrow F(x)$; that is, $F$ is right continuous. Note it need not be left continuous: let $\mu(A) = 1$ if $0 \in A$ and $0$ otherwise. Then $F(x) = 0$ for $x < 0$ and $F(x) = 1$ for $x \ge 0$.
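The point-mass example can be made concrete; a tiny sketch (names mine):

```python
def F(x):
    """Distribution function of the point mass at 0: F(x) = mu((-inf, x])."""
    return 1.0 if x >= 0 else 0.0

# Right continuity at 0: F(0 + h) stays at F(0) = 1 as h decreases to 0.
print([F(h) for h in (0.1, 1e-6, 1e-12, 0.0)])
# No left continuity: F(0 - h) stays at 0, yet F(0) = 1.
print([F(-h) for h in (0.1, 1e-6, 1e-12)])
```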
\begin{theorem}
Conversely, if $F(x)$ satisfies (1. normalization), (2. monotonicity), and (3. right-continuity), then $\exists !$ probability measure on $(-\infty, \infty)$ with $F(x) = \mu(-\infty, x]$.
\end{theorem}
Note $\{(-\infty, x]: x \in \mathbb{R}\}$ is a $\Pi$-system.
\\ \\
Want to do this in higher dimensions.
\\ \\
Let $A_{x_1, x_2} = \{ (\eta_1, \eta_2) : \eta_i \le x_i \}$. Given $\mu$ on the Borel sets of $\mathbb{R}^2$, define $H(x_1, x_2) = \mu(A_{x_1, x_2})$. $H$ is monotone and right continuous, but we need a bit more. For the rectangle $A$ pictured below (the upper-right cell), note $\mu(A) = \mu(A_x) - \mu(A_y) - \mu(A_w) + \mu(A_z)$:
\begin{verbatim}
-----------------
. | |
. A_w | A_x |
. | |
-------|--------|
. | |
. A_z | A_y |
. ... | ... |
\end{verbatim}
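For a concrete instance of the four-corner identity, take two independent Uniform$(0,1]$ coordinates, so $H(x, y)$ is the product of the clipped coordinates; the alternating sum over the corners of a rectangle recovers its area (a sketch under that assumption, names mine):

```python
def H(x, y):
    """Joint d.f. of two independent Uniform(0,1] coordinates."""
    clip = lambda t: min(max(t, 0.0), 1.0)
    return clip(x) * clip(y)

def rect_prob(a1, b1, a2, b2):
    """mu((a1, b1] x (a2, b2]) via the four-corner alternating sum."""
    return H(b1, b2) - H(a1, b2) - H(b1, a2) + H(a1, a2)

# mu((0.2, 0.5] x (0.1, 0.4]) should come out to the area 0.3 * 0.3 = 0.09
print(rect_prob(0.2, 0.5, 0.1, 0.4))
```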
In $\mathbb{R}^d$, let $A = \{(x_1, ..., x_d) : a_i < x_i \le b_i\}$. If $\underline{v}$ is a vertex of $A$, let $\mathrm{sign}(\underline{v}) = -1$ if $\underline{v}$ has an odd number of $a_i$'s, and $+1$ otherwise.
\\ \\
For $H(x_1, ..., x_d)$, define $\Delta_A H = \sum_{\underline{v}} \mathrm{sign}(\underline{v}) H(\underline{v})$, summing over the $2^d$ vertices of $A$. Then $H(x_1, ..., x_d) = \mu(A_{x_1, ..., x_d})$ for a unique probability $\mu$ $\Leftrightarrow$ $\lim H(\underline{x}) = 0$ as the minimum coordinate goes to $-\infty$, $\lim H(\underline{x}) = 1$ as all coordinates go to $\infty$, $H$ is right continuous, and for every rectangle $A = \prod_i (a_i, b_i]$, $\Delta_A H \ge 0$.
\\ \\
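The signed vertex sum $\delta_A H$ is mechanical enough to check in code. Below is a sketch that enumerates the $2^d$ vertices and, for the product distribution function of the uniform measure on the unit cube (an arbitrary choice for illustration), confirms $\delta_A H = \prod_i (b_i - a_i) = \mu(A) \ge 0$:

```python
from itertools import product

def delta_A_H(H, a, b):
    """delta_A H = sum over the 2^d vertices v of A = prod (a_i, b_i] of
    sign(v) * H(v), where sign(v) = -1 if v has an odd number of a_i's."""
    d = len(a)
    total = 0.0
    for choice in product([0, 1], repeat=d):          # 0 -> a_i, 1 -> b_i
        v = [b[i] if c else a[i] for i, c in enumerate(choice)]
        sign = -1 if (d - sum(choice)) % 2 else 1     # odd # of a_i's
        total += sign * H(v)
    return total

def F(t):                    # Uniform(0,1) distribution function
    return max(0.0, min(1.0, t))

def H(v):                    # product d.f. of the uniform cube measure
    p = 1.0
    for t in v:
        p *= F(t)
    return p

a, b = [0.2, 0.1, 0.3], [0.7, 0.9, 0.6]
vol = 1.0
for ai, bi in zip(a, b):
    vol *= F(bi) - F(ai)     # = mu(A), the volume of the box
assert abs(delta_A_H(H, a, b) - vol) < 1e-12
```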
So now we have a HW problem from this week: \\Let $F_1(x), F_2(x)$ be distribution functions on $\mathbb{R}$. $H(x,y)$ with $H(x, \infty) = F_1(x)$ and $H(\infty, y) = F_2(y)$ is called a bivariate distribution function with margins $F_1$ and $F_2$.
\\ \\
Problem a: Consider $H_L(x, y) = \max(F_1(x) + F_2(y) - 1, 0)$ ($H$-lower) and $H_U(x,y) = \min(F_1(x), F_2(y))$ ($H$-upper). Check that these are distribution functions with margins $F_1$ and $F_2$.
\\ \\
Problem b: For every $H$ with margins $F_1, F_2$, show $H_L(x,y) \le H(x,y) \le H_U(x,y)$.
\\ \\
Remark: Once we know what correlation is, $H_L$ is the most negatively correlated D.F. with margins $F_1$, $F_2$ and $H_U$ is the most positively correlated.
\\ \\
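The HW inequalities aren't proved by a computer, but it is a useful sanity check to test $H_L \le H \le H_U$ for one concrete $H$, say the independent coupling $H(x,y) = F_1(x) F_2(y)$, with arbitrarily chosen exponential margins:

```python
import math

F1 = lambda x: 1.0 - math.exp(-x) if x > 0 else 0.0        # Exp(1) margin
F2 = lambda y: 1.0 - math.exp(-2.0 * y) if y > 0 else 0.0  # Exp(2) margin

H_ind = lambda x, y: F1(x) * F2(y)                  # independent coupling
H_L = lambda x, y: max(F1(x) + F2(y) - 1.0, 0.0)    # lower bound
H_U = lambda x, y: min(F1(x), F2(y))                # upper bound

grid = [i / 4.0 for i in range(-2, 21)]
for x in grid:
    for y in grid:
        # H_L <= H <= H_U pointwise (up to floating-point slack)
        assert H_L(x, y) <= H_ind(x, y) + 1e-12
        assert H_ind(x, y) <= H_U(x, y) + 1e-12
```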
\subsection{Measurable Functions and Random Variables}
Let $(\Omega, \mathcal{F})$, $(\Omega', \mathcal{F}')$ be measurable spaces.
\begin{defn}
A function $T: \Omega \rightarrow \Omega'$ is {\bf measurable} if\\
For all $A' \in \mathcal{F}'$, $T^{-1}(A') \in \mathcal{F}$
\end{defn}
Proposition: Suppose $\mathcal{F}' = \sigma(\mathcal{A})$. (a) Then $T$ is measurable iff $T^{-1}(A') \in \mathcal{F}$ for all $A' \in \mathcal{A}$.
\\ \\
(b) If $T_1 : (\Sigma_1, \mathcal{F}_1) \rightarrow (\Sigma_2, \mathcal{F}_2)$ and $T_2 : (\Sigma_2, \mathcal{F}_2) \rightarrow (\Sigma_3, \mathcal{F}_3)$ are measurable, then $T_2 \circ T_1 : (\Sigma_1, \mathcal{F}_1) \rightarrow (\Sigma_3, \mathcal{F}_3)$ is measurable.
\\ \\
Both follow directly.
\\ \\
\begin{defn}
A {\bf random variable} is a measurable function $T : (\Omega, \mathcal{F}) \rightarrow (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ where $\mathcal{B}(\mathbb{R})$ is the class of the Borel sets.
\end{defn}
\begin{defn}
A {\bf random vector} is a measurable function $T : (\Omega, \mathcal{F}) \rightarrow (\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$, written $T(\omega) = (T_1(\omega), ..., T_n(\omega))$.
\end{defn}
\begin{lemma}
$T$ is a random vector $\Leftrightarrow$ all $T_i$ are random variables.
\end{lemma}
\begin{proof}
If each $T_i$ is measurable, then $T^{-1}(A_{\underline{x}}) = \cap_{i=1}^n T_i^{-1}(-\infty, x_i]$ is measurable. Note this is enough by proposition (a), taking $\mathcal{A} = \{A_{\underline{x}}\}$.
\\ \\
If $T$ is a random vector, then $T_i^{-1} (-\infty, x] = \cup_{n=1}^\infty T^{-1} \{ \underline{y} : y_i \le x \textrm{ and } y_j \le n,\ j \neq i\}$, so each $T_i$ is measurable.
\end{proof}
\begin{lemma}
If $T: \mathbb{R}^k \rightarrow \mathbb{R}$ is continuous, then $T$ is measurable.
\end{lemma}
\begin{proof}
If $T$ is continuous, the preimage of an open set is open (and of a closed set is closed). In particular, $T^{-1}(-\infty, x]$ is closed, and closed sets are Borel measurable, so $T$ is measurable by proposition (a).
\end{proof}
\begin{corr}
If $X$ and $Y$ are random variables on $\Omega$, then $X+Y$, $X \cdot Y$, $\min(X, Y)$, $\max(X,Y)$ are random variables.
\end{corr}
\begin{proof}
$\Omega \rightarrow \mathbb{R}^2 \rightarrow \mathbb{R}$: composite functions will be measurable, since $f(x,y) = x+y$ is measurable. Same for all others.
\end{proof}
\subsubsection{New measures from old:}
Say $\mu$ is a measure on $(\Omega, \mathcal{F})$ and $T: (\Omega, \mathcal{F}) \rightarrow (\Omega', \mathcal{F}')$ is measurable. Define the push forward $\mu T^{-1}$ of $\mu$ under $T$ by
$$\mu T^{-1} (A') = \mu(T^{-1}(A')) = \mu \{ \omega : T(\omega) \in A'\}$$
This is a measure.
\\ \\
We use this to construct measures. Example: let $O_n$ be the orthogonal group: that is, the set of all $n \times n$ matrices $M$ such that $MM^T = I$. Want to know what it means to ``pick a matrix at random''. Suppose we know how to pick from the normal distribution. Let $X_{ij}$ be independent picks from $\frac{e^{-x^2/2}}{\sqrt{2 \pi}}$ (good ol' bell-shaped curve). As math: $\Omega = \mathbb{R}^{n^2}$, define $P$ by $P(A_{x_{11}, ..., x_{nn}}) = \int_{-\infty}^{x_{11}} \cdots \int_{-\infty}^{x_{nn}} \frac{e^{-\sum x_{ij}^2/2}}{(2 \pi)^{n^2/2}} dx_{11} \cdots dx_{nn}$. Map $T: \mathbb{R}^{n^2} \rightarrow O_n$ (Gram-Schmidt). Then $P T^{-1}$ is Haar measure on $O_n$.
\\ \\
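In code, the Gram-Schmidt map $T$ is just the $Q$ factor of a QR factorization, so the construction can be sketched as follows (assuming numpy is available; the sign fix on the diagonal of $R$ makes the factorization, and hence $T$, well defined):

```python
import numpy as np

def random_orthogonal(n, rng):
    """Push the Gaussian measure on R^{n^2} forward through
    Gram-Schmidt (QR) to get a Haar-distributed matrix in O_n."""
    X = rng.standard_normal((n, n))   # n^2 independent N(0,1) picks
    Q, R = np.linalg.qr(X)
    Q = Q * np.sign(np.diag(R))       # fix column signs: diag(R) > 0
    return Q

rng = np.random.default_rng(0)
M = random_orthogonal(4, rng)
assert np.allclose(M @ M.T, np.eye(4))   # M M^T = I, so M is in O_4
```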
This stuff is all sort of near section 15 of the book.
\section{Week 4}
This week covers material from sections 14--15 in the book.
\subsection{Lebesgue Integral}
Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. For measureable $f$, define
$$\int f d \mu = \int_\Omega f(\omega) \mu(d \omega)$$
Strategy to define:
\begin{enumerate}
\item Define it for $f \in SF_+$ (simple functions)
\item Define for $f \in m \mathcal{F}_+$ (nonnegative measurable functions)
\item Extend to $f \in m \mathcal{F}$
\end{enumerate}
Read Wikipedia for all of this.
\begin{theorem}
(1) $f \in SF_+$, $f = \sum_{i=1}^m f_i I_{A_i} \Rightarrow \int f d \mu = \sum_{i=1}^m f_i \mu(A_i)$ \\ \\
(2) For all $\omega$, $0 \le f(\omega) \le g(\omega) \Rightarrow 0 \le \int f d \mu \le \int g d \mu$.
\\ \\
(3) For all $\omega$, $0 \le f_n(\omega) \le f (\omega)$, $f_n(\omega) \uparrow f(\omega) \Rightarrow \int f_n d \mu \uparrow \int f d \mu$.
\\ \\
(4) $\alpha, \beta \ge 0$, $\int(\alpha f + \beta g) d \mu = \alpha \int f d \mu + \beta \int g d \mu$
\end{theorem}
\begin{proof}
(1) Note that $\ge$ is obvious (sup over all simple functions below $f$...). Now, let $\{B_1, ..., B_n\}$ be any partition with $\beta_i = \inf_{\omega \in B_i} f (\omega)$, and let $\{C_1, ..., C_k\}$ be the common refinement of $\{A_i\}$ and $\{B_j\}$ (in other words, partition by both $A$ and $B$), with $\gamma_i = \inf_{\omega \in C_i} f(\omega)$. Then $\sum_{i=1}^n \beta_i \mu(B_i) \le \sum_{i=1}^k \gamma_i \mu(C_i) \le \sum_{i=1}^m f_i \mu(A_i)$.
\\ \\
(2) We know $0 \le \int f d \mu$. Now, let $f(\omega) \le g(\omega)$. Then $\int f d \mu = \sup \sum_{i=1}^m \inf_{\omega \in A_i} f (\omega) \mu(A_i) \le \sup \sum_{i=1}^m \inf_{\omega \in A_i} g(\omega) \mu(A_i) = \int g d \mu$.
\\ \\
(3) $\int f_n d \mu$ non-decreasing, $\int f_n d \mu \le \int f d \mu$. Hence, $\lim_{n \to \infty} \int f_n d \mu \le \int f d \mu$.
\\ \\
Note it is sufficient to prove that for every simple function $\sum_{i=1}^m f_i I_{A_i} \le f$, $\lim_{n \to \infty} \int f_n d \mu \ge \sum_{i=1}^m f_i \mu(A_i)$, i.e. for all $\epsilon > 0$, there is a large enough $n$ such that
$$\int f_n d \mu \ge \sum_{i=1}^m (f_i - \epsilon) \mu(A_i)$$
Let $A_{i, n} = \{ \omega \in A_i : f_n(\omega) \ge f_i - \epsilon \}$ and $A_{0, n} = \Omega \backslash \cup_{i=1}^m A_{i, n}$.
\\ \\
Then $\int f_n d \mu \ge \sum_{i=1}^m \inf_{\omega \in A_{i, n}} f_n(\omega) \mu(A_{i, n}) \ge \sum_{i=1}^m (f_i - \epsilon) \mu(A_{i,n})$, and $A_{i, n} \uparrow A_i$, so letting $n \to \infty$ gives the claim.
\\ \\
Note this assumes $\mu(A_i) < \infty$ for all $i$. If instead $\mu(A_1), ..., \mu(A_{m_0}) < \infty$ and $\mu(A_{m_0 + 1}) = \cdots = \mu(A_m) = \infty$, then either some $f_i > 0$ sits on a set of infinite measure (and both sides are infinite), or $f_i = \inf_{\omega \in A_i} f(\omega) = 0$ for all $i \in \{m_0 + 1, ..., m\}$ and the finite case applies.
\\ \\
(4) $\int \alpha f d \mu = \sup_{\{A_i\}} \sum_{i=1}^m \mu(A_i) \inf_{\omega \in A_i} [ \alpha f(\omega) ] = \alpha \sup_{\{A_i\}} \sum_{i=1}^m \mu(A_i) \inf_{\omega \in A_i} f(\omega) = \alpha \int f d \mu$.
\\ \\
For additivity, let $f, g$ be simple; then the identity is a direct computation. In general, all nonnegative measurable functions are increasing limits of simple functions, so the whole thing follows by monotone convergence.
\end{proof}
Review: Let $(\Omega, \mathcal{F},\mu)$ be a measure space, and let $f : \Omega \rightarrow [0, \infty]$ be a measurable function. Then we define $\int f d \mu = \int_\Omega f (\omega) \mu(d \omega) = \sup_{\{A_i\}} \sum_{i=1}^N \inf_{\omega \in A_i} f (\omega) \mu(A_i)$, where the sup runs over finite measurable partitions $\Omega = \cup_{i=1}^N A_i$.
\\ \\
Properties:
\begin{enumerate}
\item $0 \le f \le g \Rightarrow \int f d \mu \le \int g d \mu$
\item Integral is linear
\item (Monotone Convergence Theorem) If $f_n, f \ge 0$ and $f_n(\omega) \uparrow f(\omega)$, then $\lim \int f_n d \mu = \int \lim f_n d \mu = \int f d \mu$.
\item If $f(\omega) = \sum_{i=1}^N x_i \delta_{B_i}(\omega)$ (simple function), then $\int f d \mu = \sum_{i=1}^N x_i \mu(B_i)$.
\end{enumerate}
That is, monotonicity, linearity, MCT, and step functions.
\\ \\
If $f : \Omega \to \mathbb{R}$, write $f^+(\omega) = \max(f(\omega), 0)$ and $f^-(\omega) = \max(-f(\omega), 0)$. Say $\int f d \mu = \int f^+ d \mu - \int f^- d \mu$ (defined when at least one term is finite).
\\ \\
\begin{theorem}
(Fatou's Lemma). On $(\Omega, \mathcal{F}, \mu)$, let $f_n \ge 0$ be any measurable functions. Then $\int \lim \inf f_n d \mu \le \lim \inf \int f_n d \mu$.
\end{theorem}
\begin{proof}
Set $g_n = \inf_{h \ge n} f_h \uparrow g = \lim \inf f_n$. So $\lim \int g_n d \mu = \int \lim \inf f_n d \mu$ (since the $g_n$ are monotone, by MCT). But $g_n \le f_n$, so $\int g_n \le \int f_n$. Taking the liminf of both sides gives $\lim \int g_n \le \lim \inf \int f_n$. QED.
\end{proof}
Remarks: This works for any $f_n \ge 0$.
\\ \\
Example: Enumerate the rationals in $[0,1]$ as $\Omega_1, \Omega_2, ...$. Define $f(x) = \sum_{i=1}^\infty \frac{1}{i^2 \sqrt{|\Omega_i - x|}}$. Claim: $f(x) < \infty$ for (Lebesgue) almost every $x$.
\\ \\
Proof: Let $f_n(x) = \sum_{i=1}^n \frac{1}{i^2} \frac{1}{\sqrt{|\Omega_i - x|}} \uparrow f(x)$. Then by monotone convergence, $\int_0^1 f(x) dx = \lim \int_0^1 f_n(x) dx \le \sum_{i=1}^\infty \frac{1}{i^2} \int_0^1 \frac{dx}{\sqrt{|\Omega_i - x|} } \le \sum_{i=1}^\infty \frac{c}{i^2} < \infty$, so $f < \infty$ a.e.
\\ \\
\$10 problem:
Find a single $x$ such that $f(x) < \infty$.
\\ \\
\begin{theorem}
(Dominated Convergence Theorem) On $(\Omega, \mathcal{F}, \mu)$, let $f_n, f, g$ be measurable functions with $f_n(\omega) \rightarrow f(\omega)$ almost everywhere (i.e. almost surely), $|f_n| \le g$ and $\int g d \mu < \infty$. Then $f_n, f$ are integrable and $\lim_{n \to \infty} \int f_n d \mu = \int f d \mu$.
\end{theorem}
\begin{proof}
By hypothesis, $|f_n| = f_n^+ + f_n^- \le g$, so $f_* = \lim \inf f_n$ and $f^* = \lim \sup f_n$ are both bounded by $g$ in absolute value, and $g + f_n \ge 0$, $g - f_n \ge 0$. Therefore, $\int g d \mu + \int f_* d \mu = \int \lim \inf (g + f_n) d \mu \le \int g d \mu + \lim \inf \int f_n d \mu$ by Fatou.
\\ \\
For all $x_n$ check $\lim \inf -x_n = - \lim \sup x_n$. Then $\int g d \mu - \int f^* d \mu = \int \lim \inf (g - f_n) d \mu \le \int g d \mu - \lim \sup \int f_n d \mu$ by Fatou.
\\ \\
$\int \lim \inf f_n d \mu \le \lim \inf \int f_n d \mu \le \lim \sup \int f_n d \mu \le \int \lim \sup f_n d \mu$. But because $f_n$ converges almost surely, everything is equal.
\end{proof}
A little probability (and a Homework Problem):
\\ \\
In English: Let $X_n$, $1 \le n < \infty$, be independent exponential random variables ($P(X_i > x) = e^{-x}$), and $M_n = \max_{1 \le i \le n} X_i$. Find the limiting behavior of $M_n$.
\\ \\
In Math: Let $\Omega = \mathbb{R}^n$, $\mathcal{F}$ Borel. Let $G(x_1, ..., x_n) = \prod_{i=1}^n (1 - e^{-x_i})_+$. This is a distribution function, since for any box $\delta_A G = \prod_i \int_{a_i}^{b_i} e^{-x} dx \ge 0$. Let $P$ be the associated probability and $X_i(\omega_1, ..., \omega_n) = \omega_i$. Then $P(M_n \le x) = P(X_i \le x \textrm{ for all } i) = (1 - e^{-x})^n = e^{n \log(1 - e^{-x})}$. For $x$ large, $\log(1 - e^{-x}) \sim -e^{-x}$; set $x = \log n + c$. Then $P(M_n \le x) \to e^{-e^{-c}}$ (the extreme value distribution).
\\ \\
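The extreme value limit is easy to see by simulation: with $X_i$ exponential, $P(M_n - \log n \le c)$ should be near $e^{-e^{-c}}$. A Monte Carlo sketch (sample sizes, seed, and the choice $c = 0.5$ are arbitrary):

```python
import math, random

random.seed(1)
n, trials, c = 500, 4000, 0.5
hits = 0
for _ in range(trials):
    # M_n = max of n independent Exp(1) variables
    M_n = max(random.expovariate(1.0) for _ in range(n))
    if M_n - math.log(n) <= c:
        hits += 1
gumbel = math.exp(-math.exp(-c))   # limiting extreme value distribution
assert abs(hits / trials - gumbel) < 0.03
```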
HW Problem: \\
(a): For $x > 0$, $\frac{x}{1 + x^2} e^{-x^2/2} \le \int_x^\infty e^{-t^2/2} dt \le \frac{e^{-x^2/2}}{x}$. \\ \\
(b): Let $X_i$ be i.i.d. $\mathcal{N}(0,1)$ (standard normal). Let $Y_i = \lfloor X_i \rfloor$ and $M_n = \max_{1 \le i \le n} Y_i$. Show there exist integers $a_n$ and $p_n \in (0,1)$ with $P(M_n = a_n) \sim p_n$ and $P(M_n = a_n - 1) \sim (1 - p_n)$. \\ \\
(c): $\lim \inf p_n \neq \lim \sup p_n$.
\section{Week 5}
This week covers material from sections 18 - 20 in the book.
\subsection{Product Measures}
Suppose $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$ are measurable spaces. Then $\Omega_1 \times \Omega_2 = \{(\omega_1, \omega_2) : \omega_1 \in \Omega_1 \textrm{ and } \omega_2 \in \Omega_2\}$.
\\ \\
If $A_1 \in \mathcal{F}_1$ and $A_2 \in \mathcal{F}_2$, then $A_1 \times A_2$ is a measurable rectangle. Measurable rectangles form a semi-ring. Check: $\Omega_1 \times \Omega_2$ OK. $(A_1 \times A_2) \cap (B_1 \times B_2) = (A_1 \cap B_1) \times (A_2 \cap B_2)$. $(A_1 \times A_2)^C = (A_1^C \times \Omega_2) \cup (A_1 \times A_2^C)$, which is a disjoint union of rectangles.
\\ \\
\begin{defn} $\mathcal{F}_1 \times \mathcal{F}_2 = \sigma(\{A_1 \times A_2 : A_i \in \mathcal{F}_i\})$ \end{defn}
Let $\Pi_1 : \Omega_1 \times \Omega_2 \rightarrow \Omega_1$ and $\Pi_2 : \Omega_1 \times \Omega_2 \rightarrow \Omega_2$ be the coordinate projections onto $\Omega_1$ and $\Omega_2$ respectively. Then $\mathcal{F}_1 \times \mathcal{F}_2$ is the smallest $\sigma$-algebra making $\Pi_n$ measurable for $n = 1, 2$.
\\ \\
Sections: $A \subset \Omega_1 \times \Omega_2$, $\omega_1 \in \Omega_1$. Then $A_{\omega_1} = \{ \omega_2 : (\omega_1, \omega_2) \in A \}$. For $f : \Omega_1 \times \Omega_2 \rightarrow \Omega_3$, the section is $f_{\omega_1}(\omega_2) = f(\omega_1, \omega_2)$.
\\ \\
Fact: Taking sections commutes with $\cup$, $\cap$, and complementation. That is, $(\cup A^i)_{\omega_1} = \cup A^i_{\omega_1}$, and similarly for intersection and complementation.
\\ \\
\begin{lemma}
(Section Lemma) If $A \in \mathcal{F}_1 \times \mathcal{F}_2$, then $A_{\omega_1}$ is $\mathcal{F}_2$-measurable. Similarly, if $f : \Omega_1 \times \Omega_2 \rightarrow \mathbb{R}$ is $\mathcal{F}_1 \times \mathcal{F}_2$-measurable, then $f_{\omega_1} : \Omega_2 \rightarrow \mathbb{R}$ is $\mathcal{F}_2$-measurable.
\end{lemma}
\begin{proof}
Let $\rho = \{A \subset \Omega_1 \times \Omega_2 : A_{\omega_1} \textrm{ is measurable} \}$. Note: $(A_1 \times A_2)_{\omega_1} = \emptyset$ if $\omega_1 \not\in A_1$ and $= A_2$ if $\omega_1 \in A_1$; either way it is measurable. Therefore, $\rho$ contains the $\Pi$-system of measurable rectangles. Also, $\rho$ is closed under complements and countable disjoint unions. Therefore, $\rho$ is a $\lambda$-system and $\rho \supset \mathcal{F}_1 \times \mathcal{F}_2$. Further, $f_{\omega_1}^{-1}(B) = (f^{-1}(B))_{\omega_1}$ is measurable, so $f_{\omega_1}$ is measurable. (Warning: the converse is false. If $A \subset \Omega_1 \times \Omega_2$ and all sections $A_{\omega_1}$, $A_{\omega_2}$ are measurable, $A$ might not be measurable. Example: $\Omega_1 = \Omega_2 = (0, 1]$ and $\mathcal{F}_i$ is the countable/co-countable $\sigma$-algebra. The diagonal $\{(x, x)\}$ is not $\mathcal{F}_1 \times \mathcal{F}_2$-measurable, but every section is a single point, hence countable.)
\end{proof}
\subsection{Kernels}
$(\Omega_1, \mathcal{F}_1)$, $(\Omega_2, \mathcal{F}_2)$ are measurable spaces. A probability kernel is a map $K: \Omega_1 \times \mathcal{F}_2 \rightarrow [0,1]$, $(\omega_1, A_2) \mapsto K(\omega_1, A_2)$, such that:
\begin{enumerate}
\item $\forall A_2 \in \mathcal{F}_2$, $\omega_1 \mapsto K(\omega_1, A_2)$ is $\mathcal{F}_1$-measurable
\item $\forall \omega_1$, $K(\omega_1, A_2)$ is a probability measure in $A_2$.
\end{enumerate}
Examples:
\begin{enumerate}
\item $K(\omega_1, A_2) = \mu(A_2)$ where $\mu$ is some probability on $\mathcal{F}_2$ (no dependence on $\omega_1$).
\item Families of probabilities. Let $\{P_\theta(dx) \}_{\theta \in \Theta}$ be a family of probabilities, e.g. $\Theta = \mathbb{R} \times (0, \infty)$ with $P_{\mu, \sigma^2} = \mathcal{N}(\mu, \sigma^2)$. With a measurable structure $\mathcal{F}_\Theta$ on $\Theta$, $K(\theta, A) = P_\theta(A)$ is a kernel.
\item $\Omega_1 = \Omega_2$: $K(\omega_1, A_2)$ is called a Markov kernel (so you get standard CS Markov chains by taking $\Omega$ finite).
\end{enumerate}
\\ \\
Suppose $K(\omega_1, A_2)$ is a kernel. Consider $\mathcal{G} = \{ A \in \mathcal{F}_1 \times \mathcal{F}_2 : \omega_1 \mapsto K(\omega_1, A_{\omega_1}) \textrm{ is measurable} \}$. Claim: $\mathcal{G} = \mathcal{F}_1 \times \mathcal{F}_2$. Proof: $A_1 \times A_2 \in \mathcal{G}$, for $K(\omega_1, (A_1 \times A_2)_{\omega_1}) = I_{A_1}(\omega_1) K(\omega_1, A_2)$.
\\ \\
If $A \in \mathcal{G}$, then $K(\omega_1, (A^C)_{\omega_1}) = K(\omega_1, (A_{\omega_1})^C) = 1 - K(\omega_1, A_{\omega_1})$. If $A^i \in \mathcal{G}$ are disjoint, $K(\omega_1, (\cup_i A^i)_{\omega_1}) = \sum_i K(\omega_1, A^i_{\omega_1})$. So $\mathcal{G}$ is a $\lambda$-system containing the rectangles, proving the claim.
\\ \\
Let $\Pi$ be a probability on $\Omega_1$. Define $\Pi K(A) = \int_{\Omega_1} K(\omega_1, A_{\omega_1}) \Pi(d \omega_1)$ for $A \in \mathcal{F}_1 \times \mathcal{F}_2$. $\Pi K$ is a probability because $K(\omega_1, (\Omega_1 \times \Omega_2)_{\omega_1}) = K(\omega_1, \Omega_2) = 1$. By properties of the integral, it is countably additive.
\\ \\
Note: (a) $\Pi K (A_1 \times A_2) = \int_{A_1} K(\omega_1, A_2) \Pi(d \omega_1)$. This gives a recipe for sampling from $\Pi K$: first pick $\omega_1$ from $\Pi(\cdot)$, then pick $\omega_2$ from $K(\omega_1, \cdot)$. (b) $\Pi K (A_1 \times \Omega_2) = \Pi(A_1)$ (marginal distribution).
\subsection{Fubini's Theorem}
\begin{theorem}
(Fubini's Theorem for Kernels) $(\Omega_1, \mathcal{F}_1), (\Omega_2, \mathcal{F}_2), \Pi, K$ as above. Let $f: \Omega_1 \times \Omega_2 \rightarrow [0, \infty]$ be $\mathcal{F}_1 \times \mathcal{F}_2$ measurable. Then $\int_{\Omega_2} f_{\omega_1}(\omega_2) K(\omega_1, d \omega_2)$ is $\mathcal{F}_1$-measurable and
$$\int_{\Omega_1 \times \Omega_2} f(\omega_1, \omega_2) d \Pi K(\omega_1, \omega_2) = \int_{\Omega_1} [\int_{\Omega_2} f_{\omega_1}(\omega_2) K(\omega_1, d \omega_2) ] \Pi (d \omega_1)$$
\end{theorem}
\begin{proof}
Use a 1-2-3 argument. Let $\mathcal{G}$ be the class of all functions for which the theorem holds. $\mathcal{G}$ contains $I_{A_1 \times A_2}$ by the previous computation. Then $\mathcal{G}$ contains positive linear combinations: $\sum a_i f_i \in \mathcal{G}$ for nonnegative $a_i$, $f_i \in \mathcal{G}$. By monotone convergence, $f_n \uparrow f$ with $f_n \in \mathcal{G}$ implies $f \in \mathcal{G}$. Therefore, $\mathcal{G}$ contains all nonnegative $\mathcal{F}_1 \times \mathcal{F}_2$ measurable functions.
\end{proof}
\begin{theorem}
(Fubini for possibly negative functions) $\Pi$, $K$ as above. Let $f : \Omega_1 \times \Omega_2 \rightarrow [-\infty, \infty]$ be $\mathcal{F}_1 \times \mathcal{F}_2$-measurable and $\Pi K$-integrable. Let $H = \{\omega_1 : f_{\omega_1} \textrm{ is } K(\omega_1, \cdot)\textrm{-integrable} \}$. Define $K f(\omega_1) = \int f_{\omega_1}(\omega_2) K(\omega_1, d \omega_2)$ for $\omega_1 \in H$, and $0$ otherwise. Then $H \in \mathcal{F}_1$, $\Pi(H) = 1$, and Fubini's theorem holds:
$$\int f d \Pi K = \int_{\Omega_1} K f(\omega_1) \Pi(d \omega_1)$$
\end{theorem}
Warning: for $f : \Omega_1 \times \Omega_2 \rightarrow [-\infty, \infty]$ we must assume $f$ is $\Pi K$-integrable; from this we conclude that for $\Pi$-almost every $\omega_1$ both $\int f_{\omega_1}^{+} K(\omega_1, d \omega_2)$ and $\int f_{\omega_1}^{-} K(\omega_1, d \omega_2)$ are finite. Without integrability the iterated integral can exist while the joint integral does not:
Example: Let $\Omega_1 = (0, 1]$ with Borel sets and $\Pi$ the Lebesgue measure. Let $\Omega_2 = \{1, 2\}$ with all subsets, and $K(\omega_1, \{1\}) = K(\omega_1, \{2\}) = 1/2$. Take $f(\omega_1, \omega_2) = \frac{(-1)^{\omega_2}}{\omega_1}$, so $f_{\omega_1} (\omega_2) = -1/\omega_1$ if $\omega_2 = 1$ and $1/ \omega_1$ if $\omega_2 = 2$. Then $K f (\omega_1) = 0$ for every $\omega_1$, but $\int f^+ d \Pi K = \int f^- d \Pi K = \infty$.
\\ \\
Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and $f(\omega, t) : \Omega \times (a,b) \rightarrow \mathbb{R}$. Suppose $\int |f(\omega, t)| \mu (d \omega) < \infty$ for all $t$. Let $J(t) = \int f(\omega, t) \mu( d\omega)$.
\begin{theorem}
Suppose there exist $A \in \mathcal{F}$ with $\mu(A^C) = 0$, an integrable $g(\omega)$, and an open interval $I \subset (a,b)$ containing $t_0$ such that for $\omega \in A$, $f(\omega, t)$ is continuous at $t_0$ and $\sup_{t \in I} | f(\omega, t) | \le g(\omega)$. Then $J(t)$ is continuous at $t_0$: $\lim_{t \to t_0} \int f(\omega, t) \mu(d\omega) = \int f(\omega, t_0) \mu(d \omega)$.
\end{theorem}
\begin{theorem}
Assume the same setup for $\omega \in A$. If $f(\omega, t)$ is differentiable at $t_0$ and $\sup_{t \in I} | \frac{ f(\omega, t) - f(\omega, t_0) }{t - t_0} | \le g(\omega)$, then $J(t)$ is differentiable at $t_0$.
\end{theorem}
Of course, we can't always take limits inside. Example: $f_n(x) = n^2 \delta_{(n, n+1)}(x) \to 0$ pointwise on $(0, \infty)$, but $\int f_n = n^2 \to \infty$.
\\ \\
\subsection{Uniform Integrability}
Most useful for finite measures, so we will cover in terms of probability.
\begin{defn}
A family $f_n: \Omega \rightarrow \mathbb{R}$, $1 \le n < \infty$, is {\bf uniformly integrable} if $\forall \epsilon > 0$, $\exists A$ such that $$\int_{\{\omega : |f_n(\omega) | > A\}} |f_n(\omega) | P(d \omega) < \epsilon$$ for all $n$.
\end{defn}
Example 1 (single functions): If $f$ is integrable, then $\{f \}$ is uniformly integrable. Proof: $f_n(\omega) = f(\omega) \delta_{\{ \omega : |f(\omega)| < n \}}(\omega) \rightarrow f(\omega)$ as $n \to \infty$ for all $\omega$, and $|f_n| \le |f|$. By the dominated convergence theorem, $\int |f_n| dP \rightarrow \int |f| dP$, so $\int_{\{|f| \ge n\}} |f| dP \rightarrow 0$.
\\ \\
Example 2 (finite families): any finite family $f_1, ..., f_n$ of integrable functions is uniformly integrable.
\\ \\
Example 3: $f_n(x) = n^2 \delta_{(n, n+1)}$ is not uniformly integrable.
\\ \\
Example 4: Suppose $f_n$, $1 \le n < \infty$, are integrable and $\exists \epsilon > 0$, $B < \infty$ with $\int |f_n|^{1+\epsilon} dP \le B$ for all $n$ (usually, $\epsilon = 1$). Then $\{ f_n\}$ is uniformly integrable. Proof: $\int_{\{ \omega : |f_n| > A \}} | f_n | dP \le \frac{1}{A^{\epsilon}} \int_{\{ \omega : |f_n| > A \}} |f_n|^{1+\epsilon} dP \le \frac{B}{A^{\epsilon}}$. Choose $A$ large to make this small.
\begin{theorem}
Let $(\Omega, \mathcal{F}, P)$ a probability space, $f_n, f$ measurable with $f_n \to f$ almost surely and $f_n$ uniformly integrable. Then $f$ is integrable and
$$\lim_n \int f_n d P = \int f d P$$
\end{theorem}
\begin{proof}
Given $A$, let $f_n^A (\omega) = f_n(\omega)$ if $|f_n(\omega)| \le A$ and $0$ if $|f_n(\omega) | > A$; similarly for $f^A$. For $A$ fixed (with $P(|f| = A) = 0$), $f_n^A \to f^A$ almost surely and is bounded by $A$. So by the dominated (bounded) convergence theorem, $\int f_n^A d P \rightarrow \int f^A d P$.
\\ \\
Also, choosing $A$ by uniform integrability so each tail integral is $\le 1$, Fatou tells us $\int |f| dP \le \lim \inf \int |f_n| dP \le 1 + A < \infty$, so $f$ is integrable. Now write $\int |f_n - f| dP \le \int_{\{|f_n| > A\}} |f_n| dP + \int_{\{|f| > A\}} |f| dP + \int |f_n^A - f^A| dP$
and choose $A$ large, then $n$ large, to make each term less than $\epsilon/3$.
\end{proof}
\begin{theorem}
Conversely, assume that $f_n \to f$ almost surely, $f_n$ and $f$ are integrable and nonnegative, and $\lim \int f_n dP = \int f dP$. Then $\{f_n\}$ is uniformly integrable.
\end{theorem}
\begin{proof}
For all $A$, $f_n^A \to f^A$ almost surely, so by bounded convergence $\int f_n^A dP \to \int f^A dP$. Writing $\int f_n dP = \int_{\{f_n > A\}} f_n dP + \int f_n^A dP$ and using $\lim \int f_n dP = \int f dP$, we get $\int_{\{f_n > A\}} f_n dP \rightarrow \int_{\{f > A\}} f dP$.
\\ \\
Choose $A$ so that $\int_{\{f > A\}} f dP < \epsilon/3$; then there is some $n_0$ such that $\int_{\{f_n > A\}} f_n dP < 2\epsilon/3$ for all $n > n_0$. Finally choose $A_1 > A$ so that $\int_{\{|f_n| > A_1\}} |f_n| dP < \epsilon$ for the finitely many $n \le n_0$.
\end{proof}
\section{Week 6}
From now on, everything is probability. We'll see: Strong Law, Poisson Convergence, Central Limit Theorem, Weak Convergence. We'll do these all with Stein's method.
\subsection{Tail Fields and Kolmogorov's Zero/One Law}
Have $(\Omega, \mathcal{F}, P)$, $A_n \in \mathcal{F}$ for all $n$.
\begin{defn}
The {\bf tail field} generated by $\{ A_n \}$ is $\tau = \cap_{n=1}^\infty \sigma(A_n, A_{n+1}, ...)$
\end{defn}
If $A \in \tau$, then $A \in \sigma(A_n, A_{n+1}, ...)$ for every $n$, so $A$ doesn't depend on $A_1, ..., A_{n-1}$; that is, $A$ doesn't depend on any finite number of the $A_n$.
\\ \\
Example: Consider $(0, 1]$, Borel sets, $\lambda$. Let $\omega = \sum_{i=1}^\infty \frac{d_i(\omega)}{2^i}$, $A_i = \{ d_i(\omega) = 1\}$. Then $\{ \omega : \lim \frac{1}{n} \sum_{i=1}^n d_i(\omega) \textrm{ exists} \}$ is in $\tau$, but $\{ \omega : \frac{1}{n} \sum_{i=1}^n d_i(\omega) = 1/2 \textrm{ i.o.}\}$ is not $\tau$-measurable.
\begin{theorem}
(Kolmogorov's zero/one law) If $\{A_i\}_{i=1}^\infty$ are independent, then $A \in \tau$ has $P(A) = 0$ or $P(A) = 1$.
\end{theorem}
\begin{proof}
Take $A \in \tau$. Then $A \in \sigma(A_{n+1}, A_{n+2}, ...)$ for each $n$, so $A$ is independent of $A_1, ..., A_n$ for all $n$. Therefore, $A$ is independent of $\sigma(A_1, A_2, ...)$. But $A \in \sigma(A_1, A_2, ...)$, so $A$ is independent of itself. Therefore $P(A) = P(A) P(A)$, so $P(A) = 0$ or $P(A) = 1$.
\end{proof}
Same theorem for random variables: if $X_i$ are independent random variables and $\tau = \cap_{i=1}^\infty \sigma(X_i, X_{i+1}, ...)$, then $P$ is $0$-$1$ on $\tau$.
\\ \\
Might ask if there is a finite version of the 0-1 law. There are...
\\ \\
Usage Example: A different construction of a nonmeasurable set. Let $\mathcal{C}$ be the collection of all subsets of $\{1, 2, ... \}$ with finite complement. Clearly $\emptyset \not\in \mathcal{C}$; if $A, B \in \mathcal{C}$, then $A \cap B \in \mathcal{C}$; and if $A \in \mathcal{C}$, $A \subset B$, then $B \in \mathcal{C}$. Therefore, $\mathcal{C}$ is a filter. Since filters are partially ordered by inclusion, by Zorn's lemma there is a maximal filter containing $\mathcal{C}$; call it $\mathcal{M}$. Then for each $A \subset \mathbb{N}$, either $A \in \mathcal{M}$ or $A^C \in \mathcal{M}$ (i.e. $\mathcal{M}$ is an ultrafilter). Using $\mathcal{M}$, we build $D = \{ \omega \in (0, 1] : A_\omega \in \mathcal{M} \}$ where $A_{\omega} = \{ i : \omega_i = 1\}$ (the binary digits of $\omega$). Claim: $D$ is not Borel (in fact isn't Lebesgue measurable). Proof: First, note $D$ is a tail set: if $\omega \in D$ and $\omega'$ differs from $\omega$ in finitely many places, then $A_\omega \in \mathcal{M}$ forces $A_{\omega'} \in \mathcal{M}$ --- else $A^C_{\omega'} \in \mathcal{M}$, but $|A_\omega \cap A_{\omega'}^C| < \infty$ would put a finite set in $\mathcal{M}$, a contradiction. Now, let $T(\omega_1 \omega_2 \omega_3 ...) = \overline{\omega_1}\, \overline{\omega_2} ...$ (just flipping ones and zeroes). Note that $T$ preserves Lebesgue measure (check on dyadic intervals, or just think about swapping heads and tails on your fair coin) and $T(D) = D^C$. If $D$ were $\lambda$-measurable, then $1 = \lambda(D) + \lambda(D^C) = 2 \lambda(D)$, so $\lambda(D) = 1/2$. But the zero/one law says it can't be.
\subsection{Random Variables}
Recall: A random variable is a Borel measurable function $X: (\Omega, \mathcal{F}) \rightarrow ( \mathbb{R}, \mathcal{B}(\mathbb{R}))$.
\\ \\
Example: Let $X$, $Y$ be independent random variables with $X$ distributed as $\mu$ and $Y$ distributed as $\nu$. Find the law of $X + Y$. Translation: $(\Omega, \mathcal{F}) = \mathbb{R} \times \mathbb{R}$ with Borel sets. Let $P = \mu \times \nu$, $X(x,y) = x$, $Y(x,y) = y$, and $Z(x,y) = x+y$. Integrating sections, $\mu \times \nu(B) = \int_\mathbb{R} \nu(B_x) \mu(dx) = P((X,Y) \in B)$. For Borel $C \subset \mathbb{R}$, take $B = \{ (x,y) : x + y \in C \}$, so $P( Z \in C ) = \int \nu(C - x) \mu(dx) = \int \mu(C-y) \nu(dy)$.
\\ \\
This recipe is called convolution of $\mu$ and $\nu$ and is written $\mu * \nu$.
\\ \\
For instance, if $\mu \sim \eta( m_1, \sigma_1^2)$, $\nu \sim \eta(m_2, \sigma_2^2)$, $\mu * \nu \sim \eta(m_1 + m_2, \sigma_1^2 + \sigma_2^2)$.
\\ \\
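The normal convolution identity can be sanity-checked by simulation (means, variances, and seed below are arbitrary choices):

```python
import random, statistics

random.seed(2)
m1, s1 = 1.0, 2.0       # mu  ~ N(1, 4)
m2, s2 = -0.5, 1.5      # nu  ~ N(-0.5, 2.25)
# sample from mu * nu by adding independent picks
samples = [random.gauss(m1, s1) + random.gauss(m2, s2) for _ in range(200000)]
# mu * nu should be N(m1 + m2, s1^2 + s2^2)
assert abs(statistics.mean(samples) - (m1 + m2)) < 0.05
assert abs(statistics.pvariance(samples) - (s1 ** 2 + s2 ** 2)) < 0.2
```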
Best Reference: Hogg and Craig has lots of convolution examples.
\\ \\
\begin{defn}
Let $X$ be a random variable. $X$ has finite $k$th moment if
$$\int |X(\omega)|^k P(d \omega) < \infty$$
\end{defn}
The standard normal random variable has $k$th moment $0$ for $k$ odd and $(2j - 1)(2j - 3) \cdots 3 \cdot 1$ for $k = 2j$.
Proposition: Let $X, Y$ be independent random variables with $E|X|, E|Y| < \infty$. Then $E|XY| < \infty$ and $E(XY) = E(X) E(Y)$. Proof: Use 1-2-3. Say $X(\omega) = \delta_A(\omega)$, $Y(\omega) = \delta_B(\omega)$; then $XY = \delta_{A \cap B} = \delta_A \delta_B$ and independence gives $P(A \cap B) = P(A) P(B)$. Next treat positive linear combinations, then pass to limits by monotone convergence. In general, $X = X^+ - X^-$, $Y = Y^+ - Y^-$, so $XY = X^+ Y^+ - X^- Y^+ - X^+ Y^- + X^- Y^-$, and each term works.
\begin{theorem}
Kolmogorov's Strong Law: Let $X_1, X_2, ...$ be independent and identically distributed with $E|X_i| < \infty$ and $E(X_i) = \mu$. Then $\lim_{n \to \infty} \frac{s_n}{n} = \mu$ almost surely (where $s_n = X_1 + \cdots + X_n$).
\end{theorem}
\begin{proof}
Due to Etemadi. Four tricks: the Four-T's proof.
\begin{enumerate}
\item $X_1 = X_1^+ - X_1^-$, $\mu = \mu^+ - \mu^-$, so WLOG $X_1 \ge 0$.
\item (Truncation) $Y_i = X_i \delta_{ \{ X_i \le i \} }$, $T_n = \sum_{i=1}^n Y_i$. Fix $\alpha > 1$, let $u_n = \lfloor \alpha^n \rfloor$, $\epsilon > 0$. We show
$$\sum_{n=1}^\infty P \{ |\frac{T_{u_n} - E(T_{u_n})}{u_n}| > \epsilon \} < \infty$$
Use Tchebyshev:
$$Var(T_n) = \sum_{i=1}^n Var(Y_i) \le \sum_{i=1}^n E(Y_i^2)$$
$$ = \sum_{i=1}^n E(X_i^2 \delta_{\{ X_i \le i \}}) \le n E(X_1^2 \delta_{\{X_1 \le n\}})$$
so by Tchebyshev:
$$\sum_{n=1}^\infty P \{ \cdot \} \le \sum_{n=1}^\infty \frac{Var(T_{u_n})}{\epsilon^2 u_n^2} \le \frac{1}{\epsilon^2} \sum_{n=1}^\infty \frac{1}{u_n} E(X_1^2 \delta_{\{ X_1 \le u_n \}})$$
$$= \frac{1}{\epsilon^2} E \left( X_1^2 \sum_{n=1}^\infty \frac{ \delta_{\{X_1 \le u_n\}} }{u_n} \right) $$
For any $x > 0$, let $N_x$ be the smallest integer such that $u_{N_x} > x$. Then we can bound $\sum_{n : u_n \ge x} \frac{1}{u_n} \le 2 \sum_{n \ge N_x} \frac{1}{\alpha^n} = \frac{K}{\alpha^{N_x}} \le \frac{K}{x}$ where $K = \frac{2 \alpha}{\alpha - 1}$, so the expectation above is at most $K E(X_1)$.
So we can bound our original chain by:
$$\le \frac{K}{\epsilon^2} E(X_1) < \infty$$
Use this with $\epsilon = \frac{1}{m}$. Get
$$\lim_{n \to \infty} \frac{T_{u_n} - E(T_{u_n}) }{ u_n} = 0 \textrm{ a.s.}$$
by Borel-Cantelli. So we have the strong law for the truncated variables along the subsequence $u_n$ (the $\alpha$-exponential subsequence). Now we need to work back to $\frac{s_n}{n} \to \mu$.
\item (Remove Truncation) Easy fact (Ces\`aro): if $x_i \to x$ then $\frac{1}{n} \sum_{i=1}^n x_i \to x$. Then
$$E(Y_i) = E(X_i \delta_{\{X_i \le i\}}) \rightarrow \mu$$
so
$$\frac{1}{u_n} \sum_{i=1}^{u_n} E(Y_i) \rightarrow \mu$$
$$\lim \frac{T_{u_n}}{u_n} = \mu \textrm{ a.s.}$$
Now, consider $\sum_{i=1}^\infty P \{X_i \neq Y_i \} = \sum_{i=1}^\infty P \{ X_i > i \} \le \int_0^\infty P \{X_1 > t \} dt = E(X_1) < \infty$. So $P(X_i \neq Y_i \textrm{ i.o.}) = 0$ by Borel-Cantelli, and $\frac{s_{u_n}}{u_n} \to \mu$ almost surely.
\item (Interpolation) Go from theorem on subsequences to theorem everywhere. Given any integer $k$, choose $u_n$ so that $u_n \le k \le u_{n+1}$. Then
$$\frac{u_n}{u_{n+1}} \frac{s_{u_n}}{u_n} \le \frac{s_n}{n} \le \frac{u_{n+1}}{u_n} \frac{s_{u_{n+1}}}{u_{n+1}}$$
so
$$\frac{1}{\alpha} \mu \le \lim \inf \frac{s_n}{n} \le \lim \sup \frac{s_n}{n} \le \alpha \mu \textrm{ a.s.}$$
so take $\alpha = 1 + \frac{1}{m}$ and let $m \to \infty$. Then we have $\lim \frac{s_n}{n} = \mu$.
\end{enumerate}
\end{proof}
That was a pretty slick proof. There were four tricks. Call them the Four T's: Truncation, Tchebyshev, Tsubsequences, inTerpolation. Note: we used identically distributed to replace each $X_i$ by $X_1$, but barely used independence --- only to say the variance of a sum equals the sum of the variances, which needs only pairwise independence. Thus the strong law holds for identically distributed, pairwise independent random variables.
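A quick empirical illustration of the strong law, with $X_i$ i.i.d. Bernoulli$(1/2)$ (any integrable i.i.d. sequence would do; sample size and seed are arbitrary): the running averages $s_n/n$ settle down near $\mu = 1/2$ and stay there.

```python
import random

random.seed(3)
n, mu = 100000, 0.5
s, running = 0, []
for i in range(1, n + 1):
    s += random.random() < mu      # X_i = Bernoulli(1/2), E(X_i) = 1/2
    if i % 10000 == 0:
        running.append(s / i)      # record s_i / i along the way
# s_n / n gets close to mu and stays close
assert all(abs(r - mu) < 0.02 for r in running)
```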
\\ \\
There are stationary $\{X_i\}$ which are pairwise independent, $X_i = \pm 1$, $P(X_i = 1) = \frac{1}{2}$, so have strong law, but central limit theorem fails. (Svante Janson). SLLN is a special case of Ergodic Theorem or Martingale Convergence Theorems.
\\ \\
Difference from Weak Law: Weak Law says $P(|\frac{s_n}{n} - \mu | > \epsilon) \to 0$ as $n \to \infty$. Strong Law says $\lim \frac{s_n}{n} = \mu$ almost surely. The strong law says it gets close and stays close forever (almost surely). There are random variables (they don't have a mean, but might fluctuate symmetrically) such that the weak law holds, but the strong law does not. If you want to know about this, look up ``Unfavorable Fair Games''.
\\ \\
The strong law of large numbers has a very clean statement, but no content, because there's no rate of convergence (the way we do for the weak law). What we would like is given $N$ and $\epsilon > 0$, look at the probability $P(|\frac{s_n}{n} - \mu | < \epsilon \textrm{ for all } n \ge N) = f(N, \epsilon)$.
\\ \\
Fact: $E|X_1| < \infty$ is necessary and sufficient for the strong law of large numbers.
\\ \\
\begin{theorem}
Say $X_i$ are i.i.d. with $E(X_1^-) < \infty$ and $E(X_1^+) = \infty$, so $E(X_i) = \infty$. Then $\lim \frac{s_n}{n} = \infty$ almost surely.
\end{theorem}
\begin{proof}
Since $\frac{1}{n} \sum X_i = \frac{1}{n} \sum X_i^+ - \frac{1}{n} \sum X_i^-$ and $\frac{1}{n} \sum X_i^- \to E(X_1^-) < \infty$ by the strong law, it is enough to treat $X_i \ge 0$ with $E(X_i) = \infty$. Then $\frac{1}{n} \sum_{i=1}^n X_i \ge \frac{1}{n} \sum_{i=1}^n X_i \delta_{\{X_i \le u \}}$
so, by the strong law applied to the truncated (integrable) variables,
$$\lim \inf \frac{1}{n} \sum_{i=1}^n X_i \ge E(X_1 \delta_{\{X_1 \le u \}})$$
and let $u \to \infty$, and we win by monotone convergence.
\end{proof}
\begin{theorem}
For $X \ge 0$, $E(X) = \int_0^\infty P(X \ge t) dt = \int_0^\infty P(X > t) dt$
\end{theorem}
\begin{proof}
Let $X$ take values $0, 1, 2, ...$ with $P(X = i) = p_i$, $\sum_{i=0}^\infty p_i = 1$. Then $E(X) = \sum_{i=0}^\infty P(X > i)$ (check by writing $P(X > i) = \sum_{j > i} p_j$ and switching the order of summation). The same argument works for general discrete values, and by a 1-2-3 proof, we're basically done.
\end{proof}
Example of use: Guessing game. A deck of $n$ cards labeled $1, ..., n$ is shuffled; you try to guess the value of each card in turn and are told only whether you're right or wrong. How should you guess? If you use the optimal strategy, what's the expected number of correct guesses? Take the optimal strategy as given: keep guessing the same value until you're told you're right, then move on to the next value. Then the chance of getting $k$ or more correct is $\frac{1}{k!}$, so by the previous theorem (in its discrete form), $E(X) = \sum_{k=1}^n P(X \ge k) = \sum_{k=1}^n \frac{1}{k!}$, where $X$ is the number correct. As $n \to \infty$, this tends to $e - 1$.
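A quick Monte Carlo check of this example (a sketch, not from the notes; the deck size $n = 20$ and trial count are arbitrary choices):

```python
import math
import random

def play(n, rng):
    """One round: guess value 1 until told 'right', then value 2, and so on.
    The number correct is the largest k with cards 1..k in increasing relative order."""
    deck = list(range(1, n + 1))
    rng.shuffle(deck)
    target, correct = 1, 0
    for card in deck:
        if card == target:   # standing guess was right; advance to the next value
            correct += 1
            target += 1
    return correct

rng = random.Random(0)
n, trials = 20, 100_000
mean = sum(play(n, rng) for _ in range(trials)) / trials
print(mean, math.e - 1)      # the empirical mean should be close to e - 1
```

The chance of $k$ or more correct is the chance that cards $1, \ldots, k$ appear in increasing relative order, which is $1/k!$.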
\section{Week 7}
Poisson approximation and Stein's method. His lecture notes will be on the website for the course.
\\ \\
$X \sim Poi(\lambda), Y \sim Poi(\mu)$
\subsection{Problem 1}
Show $E[(X - \lambda)^3] = E[(X - \lambda)^4]$, $E[(X - \lambda)^5] = E[(X - \lambda)^6]$, ...
\\ \\
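A numerical sanity check on Problem 1 (not part of the notes): for $X \sim Poi(\lambda)$, the third and fourth central moments are $\lambda$ and $\lambda + 3\lambda^2$, so they are not equal, consistent with the remark later in these notes that the claim fails for 3 and 4.

```python
import math

def poisson_central_moment(lam, power, terms=100):
    """E[(X - lam)^power] for X ~ Poisson(lam), by summing the pmf directly."""
    pmf = math.exp(-lam)      # P(X = 0)
    total = 0.0
    for k in range(terms):
        total += (k - lam) ** power * pmf
        pmf *= lam / (k + 1)  # P(X = k+1) from P(X = k)
    return total

lam = 2.0
m3 = poisson_central_moment(lam, 3)   # known value: lam
m4 = poisson_central_moment(lam, 4)   # known value: lam + 3*lam**2
print(m3, m4)                         # approximately 2.0 and 14.0 -- not equal
```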
Poisson Heuristic: $\{X_i\}$ are 0/1-valued random variables and $W = \sum_{i \in I} X_i$. If each $P(X_i = 1)$ is small, $\lambda = E(W) = \sum P_i$ ``is a number'' (neither tiny nor huge), and the $\{X_i\}$ are not too dependent, then
$$P(W = j) \approx \frac{e^{-\lambda} \lambda^j}{j!}$$
\\ \\
\subsection{Problem 2}
We introduce a norm $|| \cdot ||$ on probabilities on $\mathbb{N}$. Let $||P - Q || = \max_{A \subset \mathbb{N}} | P(A) - Q(A) | = \frac{1}{2} \sum_{i=0}^\infty |P(i) - Q(i)| = \frac{1}{2} \max_{||b||_\infty \le 1} |P(b) - Q(b)|$, where $P(b) = \sum_i b(i) P(i)$. The homework problem is to prove these equalities.
\\ \\
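To make the norm concrete, here is a sketch (my own choice of example) computing $||Bin(n, \lambda/n) - Poi(\lambda)||$ via the middle expression $\frac{1}{2}\sum_i |P(i) - Q(i)|$:

```python
import math

def binom_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def tv_distance(n, lam):
    """||Bin(n, lam/n) - Poi(lam)|| computed as (1/2) * sum_k |P(k) - Q(k)|."""
    p = lam / n
    q = [math.exp(-lam)]            # Poisson pmf via q(k+1) = q(k) * lam/(k+1)
    for k in range(n):
        q.append(q[-1] * lam / (k + 1))
    dist = sum(abs(binom_pmf(n, p, k) - q[k]) for k in range(n + 1))
    dist += max(0.0, 1.0 - sum(q))  # Poisson mass above n, where Bin(n, p) has none
    return dist / 2

for n in (10, 100, 1000):
    print(n, tv_distance(n, 1.0))   # the distance shrinks as n grows
```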
Real Example:
\\ \\
(Multiple birthday problem): Drop $n$ balls (people) uniformly at random into $B$ boxes (birthdays). What is the chance that you have $k$ or more balls (people) in the same box (having the same birthday)?
\\ \\
Let $I$ be the set of ${n \choose k}$ $k$-subsets of $\{1, ..., n\}$. Let $X_S = 1$ if the balls in $S$ all fall into the same box, and let $X_S = 0$ otherwise, so that $P(X_S = 1) = \frac{1}{B^{k - 1}}$. Then $P(W = 0)$ is the chance that every $k$-tuple fails.
\\ \\
Just look at this in the notes.
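For the classical case $k = 2$, $n = 23$, $B = 365$, the heuristic gives $\lambda = \binom{23}{2}/365 \approx 0.693$, so $P(W = 0) \approx e^{-\lambda} \approx 1/2$. A simulation sketch (parameters are my choice):

```python
import math
import random

def no_shared_box(n, B, rng):
    """Drop n balls uniformly into B boxes; True if no box gets two or more balls."""
    return len({rng.randrange(B) for _ in range(n)}) == n

rng = random.Random(1)
n, B, trials = 23, 365, 40_000
est = sum(no_shared_box(n, B, rng) for _ in range(trials)) / trials

lam = math.comb(n, 2) / B    # expected number of coincident pairs
print(est, math.exp(-lam))   # both should be near 1/2
```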
\subsection{Problem 3}
Let $k = 2$, and drop $n$ balls into $B$ boxes, where box $i$ has probability $P_i$ with $0 < P_i < 1$ and $\sum_{i=1}^B P_i = 1$. Fix $B$ and the $P_i$. Determine $n$ as a function of $B$ and the $P_i$ such that $P(W = 0) = 1/2$ (with an error estimate).
\subsection{Problem 4}
Consider $n$ boys and $n$ girls. Color the boys and girls $B$ colors. What is the chance that some boy has the same color as some girl?
\\ \\
The 3 basic problems of elementary probability are:
\begin{enumerate}
\item Birthday Problem
\item Coupon Collector's Problem
\item Matching Problem
\end{enumerate}
Coupon Collector's Problem: Drop $n$ balls into $B$ boxes with probabilities $P_i$. What is the probability that you cover every box? Define $X_i = 1$ if box $i$ is empty and $0$ otherwise, and $W = \sum X_i$. We're interested in $P(W = 0)$, which the heuristic says is about $e^{-\lambda}$ with $\lambda = E(W) = \sum_{i=1}^B ( 1 - P_i)^n$.
\\ \\
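A simulation sketch for the uniform case $P_i = 1/B$ (my choice of parameters), where $\lambda = B(1 - 1/B)^n$:

```python
import math
import random

def covers_all(n, B, rng):
    """Drop n balls uniformly into B boxes; True if every box receives a ball."""
    return len({rng.randrange(B) for _ in range(n)}) == B

rng = random.Random(2)
B, n = 20, 100
lam = B * (1 - 1 / B) ** n         # expected number of empty boxes, ~ 0.118
trials = 40_000
est = sum(covers_all(n, B, rng) for _ in range(trials)) / trials
print(est, math.exp(-lam))         # P(W = 0) vs. the Poisson estimate
```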
Matching Problem: 2 decks of cards labeled $1, ..., n$ are shuffled, and one card from each deck is turned up at a time. What is the chance of a match? Let $X_i = 1$ if there is a match at time $i$ and $0$ otherwise. Then $P(W = 0)$ is the chance of no match. In this case, $P_i = 1/n$, so $\lambda = 1$ and $P(W = 0) \approx e^{-1}$.
\\ \\
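The matching problem is equivalent to counting fixed points of a random permutation, so $P(W = 0)$ is the derangement probability, which is essentially $1/e$. A simulation sketch (deck size 52 is my choice):

```python
import math
import random

def has_no_match(n, rng):
    """Shuffle one deck against the other's fixed order; a match at time i
    means the random permutation has a fixed point at i."""
    deck = list(range(n))
    rng.shuffle(deck)
    return all(deck[i] != i for i in range(n))

rng = random.Random(3)
n, trials = 52, 40_000
est = sum(has_no_match(n, rng) for _ in range(trials)) / trials
print(est, math.exp(-1))   # the no-match (derangement) probability is ~ 1/e
```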
\subsection{SIX (or Five) Problems}
1. Let $X \sim Poi(\lambda)$. Then $E(X - \lambda)^{2k-1} = E(X - \lambda)^{2k}$. Turns out not to be true for the third and fourth moments. (So it isn't really a problem.)
\\ \\
2. Birthday problem, probabilities of boxes are $P_i$.
\\ \\
3. Tot. Var. Equalities
\\ \\
4. Birthday problem boys and girls
\\ \\
5. (Stein) $\lambda \ge 3$, show $|f(k) - f(k+1)| \le \frac{1}{\lambda}$.
\\ \\
6. ``Test for clumping''
\\ \\
Last lecture: $|I| < \infty$, the $X_i$ are zero/one valued, $P_i = P(X_i = 1)$, $P_{ij} = P(X_i = X_j = 1)$, $W = \sum_{i \in I} X_i$, $\lambda = \sum_{i \in I} P_i$. Then $|| \mathcal{L}_W - Po_\lambda || \le \min(3, 1/\lambda) \{ \sum_{i \in I} \sum_{j \in N_i, j \ne i} P_{ij} + \sum_{i \in I} \sum_{j \in N_i} P_i P_j \}$. Here we have a dependency graph with vertex set $I$: we put an edge $i \sim j \Leftrightarrow X_i, X_j$ are dependent, and $N_i$ is the neighborhood of $i$ (including $i$ itself).
\\ \\
\begin{proof}
(Stein's Method). Ingredient: Say $Z$ is a nonnegative-integer-valued random variable. Then $Z$ has a Poisson($\lambda$) distribution iff for each bounded $f: \mathbb{N} \to \mathbb{R}$
$$E \{ \lambda f(Z+1)- Z f(Z) \} = 0$$
(This is actually obvious. Just write it all out).
\\ \\
Idea of proof: If $Z$ has $|E( \lambda f(Z+1) - Z f(Z)) |$ small for suitable $f$, then $\mathcal{L}_Z$ is close to $Po_\lambda$.
\\ \\
For all $A \subset \mathbb{N}$, there exists a unique bounded $f : \mathbb{N} \to \mathbb{R}$ with $f(0) = 0$ and $\lambda f(k+1) - k f(k) = \delta_A(k) - P_\lambda(A)$. In fact, $|f(k)| \le 1.25$ and $|f(k+1) - f(k)| \le \min(3, \frac{1}{\lambda})$.
\\ \\
Application: suppose $Z$ satisfies $E(\lambda f(Z+1) - Z f(Z)) = 0$ for all bounded $f$. Choose $f = f_A$. We get $E(\delta_A(Z) - P_\lambda(A)) = 0$, so $P(Z \in A) = P_\lambda(A)$.
\\ \\
Use previous statement to prove theorem.
\\ \\
$$P(W \in A) - P_\lambda(A) = E ( \lambda f(W + 1) - Wf(W))$$
$$ = \sum_{i \in I} E(P_i f(W + 1) - X_i f(W) ) =: \Delta$$
Set $W_i = W - X_i$ and $V_i = \sum_{j \notin N_i} X_j$, so that $V_i$ is independent of $X_i$.
Since $X_i$ is $0/1$-valued and $W = W_i + X_i$,
$$X_i f(W) = X_i f(W_i + 1)$$
$$-\Delta = \sum_i E((X_i - P_i) f(W_i + 1)) + P_i E(f(W_i+1) - f(W+1))$$
$$ = \sum_{i \in I} E \{ (X_i - P_i) ( f(W_i + 1) - f(V_i + 1) ) \} + P_i E(f(W_i + 1) - f(W + 1))$$
where we may subtract $f(V_i + 1)$ because $V_i$ is independent of $X_i$, so $E\{(X_i - P_i) f(V_i + 1)\} = 0$.
so by previous statement
$$|f(W_i + 1) - f(W + 1) | \le \min(3, \lambda^{-1}) X_i$$
Then for all $i$, $f(W_i + 1) - f(V_i + 1)$ is a telescoping sum of terms of the form $f(U + 1) - f(U + X_j + 1)$ with $j \in N_i$.
\\ \\
... See continuation of notes on coursework page.
\end{proof}
Note we still need to prove (**)
\begin{lemma}
For all $A$, there exists a unique $f$ such that $f(0) = 0$, $\lambda f(k+1) - k f(k) = \delta_A(k) - P_{\lambda}(A)$, where $|f(k) | \le 1.25$ and $|f(k+1) - f(k) | \le \min(3, \lambda^{-1})$.
\end{lemma}
\begin{proof}
First write down $f(k)$: $f(0) = 0$, $f(1) = \frac{ \delta_A(0) - P_\lambda(A) }{\lambda}$. A neat way to continue is to multiply the recurrence by $\frac{\lambda^k}{k!}$. Then we get
$$\frac{\lambda^{k+1} f(k+1)}{k!} - \frac{\lambda^k f(k)}{(k-1)!} = \frac{\lambda^k}{k!} (\delta_A(k) - P_\lambda(A))$$
and summing this identity for indices $0, 1, \ldots, k-1$ (the left side telescopes) gives
$$f(k) = \frac{(k-1)!}{\lambda^k} \sum_{j=0}^{k-1} \frac{\lambda^j}{j!} (\delta_A(j) - P_\lambda(A))$$
... See notes on coursework page
\end{proof}
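The closed form above can be checked numerically. The following sketch (an arbitrary choice of $\lambda$ and $A$) verifies the Stein equation and the two bounds from the lemma for small $k$:

```python
import math

lam = 3.0                 # arbitrary choice
A = {0, 2, 5, 7}          # arbitrary subset of N
K = 15

# P_lambda(A), using the pmf recurrence
pmf = [math.exp(-lam)]
for k in range(K + 1):
    pmf.append(pmf[-1] * lam / (k + 1))
PA = sum(pmf[j] for j in A)

def delta(j):
    return 1.0 if j in A else 0.0

def f(k):
    """Closed form f(k) = ((k-1)!/lam^k) * sum_{j<k} (lam^j/j!)(delta_A(j) - P_lam(A))."""
    if k == 0:
        return 0.0
    s = sum(lam**j / math.factorial(j) * (delta(j) - PA) for j in range(k))
    return math.factorial(k - 1) / lam**k * s

for k in range(K):
    # the Stein equation: lam f(k+1) - k f(k) = delta_A(k) - P_lam(A)
    assert abs(lam * f(k + 1) - k * f(k) - (delta(k) - PA)) < 1e-8
    # the bounds from the lemma
    assert abs(f(k)) <= 1.25
    assert abs(f(k + 1) - f(k)) <= min(3.0, 1 / lam)
print("Stein equation and bounds hold for k <", K)
```

(The range of $k$ is kept small because the closed form involves cancellation that loses floating-point precision for large $k$.)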
Remarks: 1. Everything here is finite and explicit; nothing needs to go to infinity. Usually we just write $\mathcal{L}_W \approx Po_\lambda$. Recall we said the strong law of large numbers is pretty empty, since it doesn't come with a rate of convergence. If we want to be quantitative here, we have to ``pay'' with a bound $|| \cdots || \le $ a bit of a mess.
\\ \\
2. The last page of the handout on the web reviews the literature. This treatment (the dependency graph approach) is often called the Chen-Stein method. There's an extremely useful introduction to Stein's method by Arratia-Goldstein-Gordon in Statistical Science.
\\ \\
3. The condition ``no edge if independent'' can be relaxed to ``no edge if almost independent''.
\\ \\
4. A very similar method works to prove the CLT: if $\{X_i\}_{i \in I}$ are real-valued, no $X_i$ dominates the rest, and they're not too dependent, then with $W = \sum_{i \in I} X_i$, $E(W) = \mu$, and $Var(W) = \sigma^2$, we get $\mathcal{L}_W \approx \mathcal{N}(\mu, \sigma^2)$.
\\ \\
5. There are two other approaches to Poisson approximation using Stein's method: size-biased coupling (Barbour's method) and the method of exchangeable pairs.
\section{Week 8}
\subsection{Central Limit Theorem}
Heuristic: if $X_1, ..., X_n$ are random variables, no one of which dominates, and they are not too dependent, then with $S_n = \sum X_i$, $M_n = E(S_n)$, and $\sigma_n^2 = Var(S_n)$,
$$P \left( \frac{S_n - M_n}{\sigma_n} \le x \right) \approx \Phi(x) $$
Notes will be posted online.
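A small illustration of the heuristic (sums of uniforms; entirely my choice of example), comparing the empirical CDF of the standardized sum with $\Phi$:

```python
import math
import random

rng = random.Random(4)

def standardized_sum(n, rng):
    """Sum of n iid Uniform(0,1): mean n/2, variance n/12; return the standardized value."""
    s = sum(rng.random() for _ in range(n))
    return (s - n / 2) / math.sqrt(n / 12)

n, reps = 30, 50_000
samples = [standardized_sum(n, rng) for _ in range(reps)]

fracs = {}
for x, phi in [(0.0, 0.5), (1.0, 0.8413)]:
    fracs[x] = sum(s <= x for s in samples) / reps
    print(x, fracs[x], phi)    # empirical CDF vs. Phi(x)
```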
\section{Week 9}
Example: random variables that are pairwise independent, but for which the CLT fails.
\\ \\
Let $\epsilon_1, ..., \epsilon_n$ be independent $\pm 1$ with probability $1/2$ each. Let $I = \{ (i, j) : 1 \le i < j \le n \}$. Let $X_{(i, j)} = \epsilon_i \epsilon_j$.
\\ \\
These are random variables with mean 0. Build a triangular array whose $n$-th row is the $X_{(i, j)}$, enumerated in some way (so there are ${n \choose 2}$ entries in the $n$-th row). Let $S_n = \sum X_{(i, j)}$ over the $n$-th row. Observe $E(X_{(i,j)}) = 0$, $Var(S_n) = {n \choose 2}$, and the $X_{(i, j)}$ are pairwise independent.
\\ \\
Claim: The CLT fails here. Look at
$$\frac{S_n}{\sqrt{{n \choose 2}}} = \frac{1}{\sqrt{{n \choose 2}}} \cdot \frac{ (\sum_i \epsilon_i)^2 - \sum_i \epsilon_i^2 }{2} \sim \frac{1}{\sqrt{2}} \left\{ \left( \sum_{i=1}^n \frac{\epsilon_i}{\sqrt{n}} \right)^2 - 1 \right\} \Rightarrow \frac{1}{\sqrt{2}} (Z^2 - 1)$$
where $Z^2 \sim \chi_1^2$, which is not normal.
\\ \\
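The identity $\sum_{i<j} \epsilon_i \epsilon_j = ((\sum_i \epsilon_i)^2 - n)/2$ used above, and the resulting lower bound $-1/\sqrt{2}$ on the standardized sum (impossible for a normal limit), can be checked by simulation (a sketch; $n = 50$ is an arbitrary choice):

```python
import random

rng = random.Random(5)

def s_n(n, rng):
    """S_n = sum_{i<j} eps_i eps_j, checking the identity
    sum_{i<j} eps_i eps_j = ((sum_i eps_i)^2 - n) / 2 along the way."""
    eps = [rng.choice((-1, 1)) for _ in range(n)]
    direct = sum(eps[i] * eps[j] for i in range(n) for j in range(i + 1, n))
    t = sum(eps)
    assert 2 * direct == t * t - n   # the algebraic identity from the notes
    return direct

n = 50
scale = (n * (n - 1) / 2) ** 0.5     # sqrt of Var(S_n) = C(n, 2)
vals = [s_n(n, rng) for _ in range(200)]
# the limit (Z^2 - 1)/sqrt(2) is bounded below by -1/sqrt(2) ~ -0.707,
# so the standardized S_n never falls much below that -- unlike a normal
print(min(v / scale for v in vals))
```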
\subsection{Weak Convergence}
\begin{defn}
On $\mathbb{R}$, suppose $F_n(x), F(x)$ are distribution functions. Say that $F_n \Rightarrow F$ (weak convergence) if $F_n(x) \to F(x)$ for every $x$ which is a continuity point of $F$.
\end{defn}
Example: Suppose that $F_n \leftrightarrow X_n$, where $X_n = 1 - 1/n$ with probability $1/2$ and $X_n = 1 + 1/n$ with probability $1/2$. Let $F \leftrightarrow X = 1$. Then $F_n(1) = 1/2 \not\to F(1) = 1$, but $F_n(x) \to F(x)$ everywhere else (and $x = 1$ is not a continuity point of $F$, so $F_n \Rightarrow F$).
\begin{defn}
We say $X_n \to_p X$ ($X_n$ converges in probability to $X$) if for all $\epsilon > 0$, $P \{ | X_n - X | > \epsilon \} \to 0$. Note that for this, we need $X_n$ and $X$ to be jointly defined.
\end{defn}
\begin{theorem}
Let $X_n$, $X$ be jointly defined. Then $X_n \to X$ almost surely $\Rightarrow X_n \to_p X \Rightarrow (X_n \Rightarrow X)$. All converses are false.
\end{theorem}
\begin{proof}
If $X_n \to X$ almost surely, then $\delta_{\{|X_n - X| > \epsilon\}} \to 0$ almost surely, so $P(|X_n - X| > \epsilon) \to 0$ by dominated convergence. Now say $X_n \to_p X$. Then for all $x \in \mathbb{R}$, $P \{ X \le x - \epsilon \} - P \{ |X - X_n| \ge \epsilon \} \le P \{X_n \le x \} \le P \{X \le x + \epsilon \} + P \{ |X - X_n | \ge \epsilon \}$, and letting $\epsilon \to 0$ through continuity points of $F_X$ finishes.
\end{proof}
\begin{theorem}
(Slutsky's Theorem) Let $(X_n, Y_n)$ be jointly defined. Let $X_n \Rightarrow X$ and $X_n - Y_n \to_p 0$. Then $Y_n \Rightarrow X$.
\end{theorem}
\begin{proof}
Take any $x \in \mathbb{R}$. Choose continuity points $y' < x < y''$ of $F_X(\cdot)$. Given this choice, pick $\epsilon > 0$ such that $y' < x - \epsilon < x < x + \epsilon < y''$. Then $P \{ X_n \le y' \} - P \{ |X_n - Y_n| \ge \epsilon \} \le P \{ Y_n \le x \} \le P \{ X_n \le y'' \} + P \{ |X_n - Y_n | \ge \epsilon \}$. Taking $n \to \infty$ gives $P \{X \le y' \} \le \liminf P \{ Y_n \le x \} \le \limsup P \{Y_n \le x \} \le P \{X \le y'' \}$.
\\ \\
Now, let $x$ be a point of continuity of $F_X$. Squeeze $y' \uparrow x$ and $y'' \downarrow x$ through continuity points, and we're done.
\end{proof}
\subsection{Fourier Transforms and Weak Convergence}
Recall: let $F$ be the distribution function of a random variable $X$, and suppose $F$ is continuous and strictly increasing. Then $F(X)$ is uniform on $[0,1]$: $P(F(X) \le x) = P(X \le F^{-1}(x)) = F(F^{-1}(x)) = x$.
\\ \\
Also, if $F$ is any distribution function and $U$ is uniform on $[0,1]$, then $X = F^{-1}(U)$ has distribution function $F$.
\\ \\
So what happens if $F$ is not strictly monotone or is not continuous? Define $F^{-1}(u) = \inf \{ x : u \le F(x) \}$. Then the previous statements hold generally.
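A sketch of the generalized inverse and inverse-transform sampling (the bisection bounds and the Exp(1) example are my choices, not from the notes):

```python
import math
import random

def quantile(F, u, lo=-1e6, hi=1e6, iters=60):
    """Generalized inverse F^{-1}(u) = inf{x : u <= F(x)}, located by bisection.
    Assumes F is a nondecreasing CDF with F(lo) < u <= F(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if u <= F(mid):     # F(mid) already >= u: the infimum is at or left of mid
            hi = mid
        else:
            lo = mid
    return hi

# inverse transform sampling: X = F^{-1}(U) has distribution function F
F_exp = lambda x: 1 - math.exp(-x) if x > 0 else 0.0   # Exp(1) CDF

rng = random.Random(6)
samples = [quantile(F_exp, rng.random()) for _ in range(10_000)]
mean = sum(samples) / len(samples)
print(mean)   # Exp(1) has mean 1
```

The bisection keeps the leftmost point where $F \ge u$, so it also handles flat stretches of $F$, matching the $\inf$ in the definition.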
\begin{theorem}
(Skorokhod) Suppose $F_n$, $F$ are distribution functions on $\mathbb{R}$ with $F_n \Rightarrow F$. Then there exist a probability space $(\Omega, \mathcal{F}, P)$ and random variables $Y_n$, $Y$ on it with $P\{Y_n \le x \} = F_n(x)$, $P(Y \le x) = F(x)$, and $Y_n(\omega) \to Y(\omega)$ for each $\omega$.
\end{theorem}
\begin{proof}
Let $\Omega = (0, 1]$ with Lebesgue measure. Set $Y_n(u) = F_n^{-1}(u)$ and $Y(u) = F^{-1}(u)$, where the inverses are defined as above.
\\ \\
Pick $u \in (0, 1)$. Choose $\epsilon > 0$ and $x$ with $Y(u) - \epsilon < x < Y(u)$ and $P_F\{x\} = 0$ (i.e., $x$ is a continuity point of $F$). We must have $u > F(x)$, so $u > F_n(x)$ for all $n$ sufficiently large, so $x < Y_n(u)$ for $n$ sufficiently large. Thus $Y(u) - \epsilon < \liminf Y_n(u)$, and letting $\epsilon \to 0$, $Y(u) \le \liminf Y_n(u)$. Similarly, for $u < u'$ we get
$$\limsup Y_n(u) \le Y(u')$$
and letting $u' \downarrow u$ gives $\limsup Y_n(u) \le Y(u)$ if $u$ is a point of continuity of $Y$.
\\ \\
So $Y_n(u) \to Y(u)$ can fail at only the countably many points of discontinuity of $Y$. Set $Y_n'(u) = Y'(u) = 0$ at these points and $Y_n' = Y_n$, $Y' = Y$ elsewhere. A countable set of $u$'s has measure zero, so still $P(Y' \le x) = F(x)$, and now $Y_n'(u) \to Y'(u)$ for all $u$.
\end{proof}
Comment: If $P_n$, $P$ are probabilities on a complete separable metric space, $P_n \Rightarrow P$ means $\int_\Omega f(\omega) P_n(d \omega) \to \int_\Omega f(\omega) P(d \omega)$ for all bounded, continuous $f$. Skorokhod's theorem says $P_n \Rightarrow P \Leftrightarrow \exists Y_n, Y$ on a common space with $P(Y_n \in B) = P_n(B)$, $P(Y \in B) = P(B)$, and $Y_n \to Y$ almost surely.
\\ \\
Corollary: on $\mathbb{R}$, if $F_n \Rightarrow F$ then for all bounded and continuous $f$, $\int_{-\infty}^\infty f(x) F_n(dx) \rightarrow \int_{-\infty}^\infty f(x) F(dx)$.
\\ \\
Corollary: if $F_n \Rightarrow F$ then $d_n(t) = \int_{-\infty}^\infty e^{itx} F_n(dx) \to d(t) = \int_{-\infty}^\infty e^{itx} F(dx)$.
\begin{theorem}
(Continuity Theorem) $F_n \Rightarrow F \Leftrightarrow d_n(t) \rightarrow d(t)$ for all $t \in (-\infty, \infty)$. That is, if the Fourier transforms converge, the distribution functions do as well.
\end{theorem}
\begin{proof}
$\Leftarrow$\\