\documentclass[12pt]{article}
\usepackage{e-jc}
\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{algorithmic}
\usepackage{fancyhdr}
\usepackage{hyperref}
\hypersetup {
colorlinks=true,
linkcolor=black,
citecolor=blue,
pdftitle={Math 230A Notes},
pdfauthor={Churchill},
pdfsubject={Notes from Math230A/Stat310A Probability Theory},
pdfkeywords={Probability, Notes}
}
\usepackage{graphicx}
\usepackage{wrapfig}
\usepackage{url}
\long\def\symbolfootnote[#1]#2{\begingroup%
\def\thefootnote{\fnsymbol{footnote}}\footnote[#1]{#2}\endgroup}
\newtheorem{lemma}{Lemma}
\newtheorem{theorem}{Theorem}
\newtheorem{defn}{Definition}
\newtheorem{corr}{Corollary}
\title{Math 230A / Stat 310A -- Probability Theory -- Notes}
\author{
Alex Churchill\\
\small \texttt{[email protected]}
}
\date{Autumn, 2011}
\begin{document}
\maketitle
\thispagestyle{empty} % ignore page number on first page
\tableofcontents
\newpage
\setcounter{page}{1} % set page number back to 1
\section{Course Information}
Prof. Persi Diaconis, Sequoia 131, 725-1965 (no email) \\
Office Hours: Wednesday 1:30 - 3:00
\\ \\
TA. Anirban, Sequoia 208 ([email protected]) \\
Office Hours: Friday 10-12
\\ \\
TA. Sumit, Sequoia 237 ([email protected]) \\
Office Hours: Monday 2-4
\\ \\
Text: P. Billingsley, \underline{Probability and Measure} 3rd Ed. (On reserve at Math Library).
\\ \\
Grading: Homework (30\%), Midterm (30\%), Final (40\%)
\\ \\
Midterm: Thursday Nov. 3, in class; 5:30-8:00 pm; One $3 \times 5$ notecard allowed.
\\ \\
Final: Thursday, Dec. 15, 7-10 pm. Room 380Y. Guesses: we will profit from knowing 1. The Four T's Proof, 2. The proof of the CLT under the Lindeberg condition, 3. How to do the general birthday problem, 4. Something from measure theory, 5. Something about weak convergence. Additional things to read up on: Stein's equation and the motivation for Stein's method, dependency graphs, Bernstein polynomials, Weierstrass approximation using the weak law. Take a look at characteristic functions and brush up on your complex analysis.
\\ \\
{\bf Oh, and for God's sake... READ THEOREM 25.10!!!!!}
\\ \\
Halloween Talk on Non-Measurable Sets: Monday (Oct 31) 5:30-6:30 in Pigott Hall (260-113).
\subsection{Homeworks}
{\bf HW WEEK 2: READ Sec 3, 4. Do problems 3.2(a,b); 3.3(a,b,c,d); 3.11; 3.13; 3.16; 4.11}
{\bf HW WEEK 3: READ Sec 10, 11, 14. Do problems 10.1, 10.2, 14.5, 14.8.}
Problem: Let $F_1, F_2$ be distribution functions on $\mathbb{R}$. Define $H_l(x,y) = (F_1(x) + F_2(y) - 1)_+$ where $(x)_+ = x$ if $x \ge 0$ and $0$ otherwise. Define $H_u(x,y) = \min(F_1(x), F_2(y))$. (a) Prove that $H_l$, $H_u$ are bivariate distribution functions with margins $F_1(x) = H_l(x, \infty) = H_u(x, \infty)$ and $F_2(y) = H_l(\infty, y) = H_u(\infty, y)$. (b) Prove that for every $H(x,y)$ with $F_1, F_2$ as margins, $H_l(x,y) \le H(x,y) \le H_u(x,y)$ for $-\infty < x,y < \infty$.
\\ \\
{\bf HW WEEK 4: READ Sec 15, 16. Do Problems 15.1, 15.2, 16.1, 16.7 + PROBLEM (see it down there in the notes)}
\\ \\
{\bf HW WEEK 5: Read Sec 18, Do: 2, 4, 10, 13, 14}
\\ \\
{\bf HW WEEK 6: Read Sec 20, 21, 22. Do 20.21, 20.24, 20.25(a, b, d), 21.11, 21.15, 22.2, 22.3}
\\ \\
{\bf HW WEEK 8: Read Sec. 27. Do 27.3, 27.4, 27.7, 27.10, 27.11}
\\ \\
{\bf HW WEEK 9: Read Sec. 25, 26. Do 25.1, 25.3, 25.16, 26.15, 26.16, 26.17}
\section{Week 1}
This week covers material from sections 1 and 2 in the book.
\subsection{Introduction}
We start by posing a simple probability problem: how many people must be in a room for even odds that two people will have the same birthday?
\\ \\
I wasn't in class for this derivation, so I'm not sure exactly how it went, but it is easy enough to estimate using the Poisson approximation to the binomial distribution (though I'm fairly sure this wasn't the approximation used in class).
\\ \\
Recall that the Binomial Distribution counts the number of successes in $n$ independent trials, each with success probability $p$. In this case, given $N$ people in the room, the number of total birthday pairs is ${N \choose 2}$. For each pair, the probability the two birthdays fall on the same day is $1/365$. The expected number of matching pairs is therefore $\lambda = {N \choose 2}/365$, and the Poisson approximation estimates the probability of $k$ matches as $\frac{\lambda^k e^{-\lambda}}{k!}$. In particular, the probability of no match is approximately $e^{-{N \choose 2}/365}$, which turns out to be slightly less than $0.5$ for $N = 23$.
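This estimate is easy to check numerically; here is a minimal sketch (function names are my own) comparing the exact product formula for distinct birthdays against the Poisson approximation:

```python
import math

def p_no_shared_birthday(n, days=365):
    """Exact probability that n people have pairwise distinct birthdays."""
    p = 1.0
    for i in range(n):
        p *= (days - i) / days
    return p

def poisson_no_match(n, days=365):
    """Poisson approximation: P(no matching pair) ~ exp(-C(n,2)/days)."""
    return math.exp(-math.comb(n, 2) / days)

# N = 23 is the first n where the exact probability of no match dips below 1/2
for n in (22, 23):
    print(n, round(p_no_shared_birthday(n), 4), round(poisson_no_match(n), 4))
```

Both formulas cross $1/2$ between $N = 22$ and $N = 23$, and the approximation is within about $0.01$ of the exact value there.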
\\ \\
{\it Variations:}\\ \\
Let the probability a person is born on day $i$ be $\theta_i$ (in the usual case, $\theta_i = 1/365$) where $\sum \theta_i = 1$.
\\ \\
How many people for even odds of $j$ matching birthdays in $N$ days? \\
Answer: In general, $k = \{ N^{j-1} \ln \frac{1}{1-p} \}^{1/j}$.\\
Cash \$10 to whoever proves this.
\\ \\
\subsection{Coin Tossing}
We introduce a model for fair coin tossing:
\\ \\
Let $\Omega$ be the interval $(0, 1]$. For $0 < a \le b \le 1$, define $P((a,b]) = b-a$.
\\ \\
If $I_1, ..., I_k$ are disjoint intervals, define $P(\cup_{i=1}^k I_i) = \sum_{i=1}^k |I_i|$.
\\ \\
Write $\omega \in (0,1]$ in binary: $\omega = \sum_{i=1}^\infty \frac{d_i(\omega)}{2^i}$ (taking the nonterminating expansion at dyadic rationals).
\\ \\
$d_i: (0,1] \rightarrow \{0, 1\}$, where $d_i$ is constructed by breaking $(0, 1]$ into $2^i$ equally-sized intervals and assigning $d_i$ the values 0, 1, 0, 1, ... across those intervals (not a formal definition!).
\\ \\
Then $P(\{ \omega: d_i(\omega) = 1\}) = 1/2$, so we say $P\{d_i = 1\} = 1/2$.
\\ \\
$P(d_1 = d_2 = 1) = 1/4$, and more generally, $P(d_1 = e_1, ..., d_k = e_k) = 1/2^k$ for $e_i \in \{0, 1\}$.
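The digit functions can be realized concretely. A sketch (my own formulation) using $d_i(\omega) = \lfloor 2^i \omega \rfloor \bmod 2$, which agrees with the intervals above except on the negligible set of dyadic rationals:

```python
import math
from fractions import Fraction

def d(i, omega):
    """i-th binary digit of omega, via floor(2^i * omega) mod 2."""
    return math.floor(2**i * omega) % 2

# omega = 0.101000..._2 = 5/8 has digits 1, 0, 1, 0, ...
print([d(i, Fraction(5, 8)) for i in range(1, 5)])

# P(d_1 = 1, d_2 = 0, d_3 = 1) = 1/8: sample 1024 equally spaced points
# (odd dyadics, so no digit is ambiguous) and count the pattern.
points = [Fraction(2 * j - 1, 2**11) for j in range(1, 2**10 + 1)]
frac = sum(1 for w in points
           if [d(i, w) for i in (1, 2, 3)] == [1, 0, 1]) / len(points)
print(frac)
```

Each of the $2^3$ digit patterns corresponds to one dyadic interval of length $1/8$, and the sampled fraction comes out exactly $0.125$.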
\subsection{Strong Law (Coin Tossing)}
\begin{lemma}
(Markov's Inequality): Given $f:(0, 1] \rightarrow [0, \infty)$ and $a > 0$, $P\{\omega: f(\omega) \ge a\} \le \frac{ \int_0^1 f(\omega) d \omega }{a}$.
\end{lemma}
\begin{proof}
Break the integral into two pieces: $\int_0^1 f(\omega) d\omega = \int_{\{ \omega : f(\omega) \ge a \}} f(\omega) d\omega + \int_{\{ \omega : f(\omega) < a \}} f(\omega) d\omega$. The second integral is nonnegative, and the first is at least $a P\{ \omega : f(\omega) \ge a \}$.
\end{proof}
\begin{theorem}
(Weak Law of Large Numbers for Coin Tossing): For all $\epsilon > 0$, $P \{ | \frac{1}{n} \sum d_i - 1/2 | > \epsilon \} \rightarrow 0$ as $n \rightarrow \infty$.
\end{theorem}
\begin{proof}
Note it is easier to work with $r_i(\omega) = 2 \cdot d_i(\omega) - 1$; it is enough to show $P\{ | \frac{1}{n} \sum_{i=1}^n r_i | > \epsilon \} \rightarrow 0$.
\\ \\
Evaluate $\int_0^1 r_i(\omega) d\omega = 0$ and $\int_0^1 r_i(\omega) r_j(\omega) d\omega = 1$ if $i = j$ and $0$ otherwise.
\\ \\
Hence, $\int_0^1 (\sum_{i=1}^n r_i(\omega) )^2 d\omega = n$.
\\ \\
Finally, note $P \{ | \frac{1}{n} \sum_{i=1}^n r_i | > \epsilon \} = P \{ ( \sum_{i=1}^n r_i )^2 > n^2 \epsilon^2 \} \le \frac{n}{n^2 \epsilon^2} = \frac{1}{n \epsilon^2} \rightarrow 0$ by Markov's inequality applied to $(\sum r_i)^2$.
\end{proof}
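The orthogonality computation $\int_0^1 (\sum_{i=1}^n r_i)^2 \, d\omega = n$ can be sanity-checked for small $n$ by brute force, since each sign pattern of $(r_1, \ldots, r_n)$ occupies measure $2^{-n}$ (a sketch of my own, not part of the course):

```python
from itertools import product

def mean_square_sum(n):
    """Average of (r_1 + ... + r_n)^2 over all 2^n sign patterns r_i = +/-1;
    this equals the integral over (0, 1], since each pattern corresponds to
    a set of measure 2^{-n}."""
    return sum(sum(s) ** 2 for s in product((-1, 1), repeat=n)) / 2 ** n

print([mean_square_sum(n) for n in (1, 2, 5, 8)])  # each equals n
```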
Now, we would like to say
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n d_i(\omega) = \frac{1}{2}$$
but we can't for every $\omega$: with ever-longer alternating blocks of 0's and 1's, as in
$$ \omega = 0.00111100000000111111111111111100...$$
the running average oscillates and never converges. So instead...
\begin{theorem}
(Strong Law of Large Numbers for Coin Tossing): $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n d_i(\omega) = \frac{1}{2}$ except for $\omega$ in a negligible set (later: true for $\omega$ almost everywhere).
\end{theorem}
\begin{proof}
Let $r_i = 2d_i - 1$. Show $\frac{1}{n} \sum_{i=1}^n r_i \rightarrow 0$.
\\ \\
As in the weak law, with $s_n = \sum_{i=1}^n r_i$, $P\{ | \frac{s_n}{n} | > \epsilon \} = P \{ s_n^4 \ge (\epsilon n)^4 \}$.
\\ \\
By Markov's inequality,
$$\le \frac{ \int_0^1 |s_n(\omega)|^4 d \omega }{\epsilon^4 n^4} \le \frac{3}{\epsilon^4 n^2}$$
We show $\int_0^1 |s_n|^4 \le 3n^2$: expanding $s_n^4$, only the $n$ terms $r_i^4$ and the $3n(n-1)$ pair terms $r_i^2 r_j^2$ have nonzero integral.
\\ \\
Choose $\epsilon_n \to 0$ so $\sum_{n=1}^\infty \frac{1}{n^2 \epsilon_n^4} < \infty$.
e.g.\ let $\epsilon_n = \frac{1}{n^{1/5}}$. Let $B = \{ \omega: \lim_{n \to \infty} \frac{s_n}{n} = 0 \}$ and $A_n = \{ \omega: | \frac{s_n}{n} | \ge \epsilon_n \}$.
Then if $\omega \in \cap_{n=m}^\infty A_n^C$ for some $m$, $\omega \in B$ (eventually $|s_n/n| \le \epsilon_n \to 0$):
$$\cap_{n=m}^\infty A_n^C \subset B, \textrm{ so } B^C \subset \cup_{n=m}^\infty A_n$$
Each $A_n$ is a finite union of intervals, $A_n = \cup_{k=1}^{k(n)} I_{nk}$, with $|A_n| \le \frac{3}{n^2 \epsilon_n^4}$ by the bound above.\\
So $B^C \subset \cup_{n=m}^\infty \cup_k I_{nk}$ and, taking $m$ large enough, $\sum_{n=m}^\infty \sum_k |I_{nk}| < \epsilon$; hence $B^C$ is negligible.
\\ \\
\end{proof}
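The fourth-moment bound used above can likewise be checked exhaustively for small $n$: integrating the expansion of $s_n^4$ term by term gives exactly $3n^2 - 2n \le 3n^2$. A quick brute-force check (my own, for illustration):

```python
from itertools import product

def mean_fourth_sum(n):
    """Average of (r_1 + ... + r_n)^4 over all 2^n sign patterns r_i = +/-1."""
    return sum(sum(s) ** 4 for s in product((-1, 1), repeat=n)) / 2 ** n

for n in (1, 2, 4, 6, 8):
    m4 = mean_fourth_sum(n)
    # n terms r_i^4 contribute n; the 3n(n-1) terms r_i^2 r_j^2 give the rest
    assert m4 == 3 * n * n - 2 * n <= 3 * n * n
print("integral of s_n^4 equals 3n^2 - 2n for small n")
```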
Remarks: Borel proved this first, and was studying the number theory problem:\\ \\
For $\omega \in (0, 1]$, the proportion of the first $n$ binary digits tends to $1/2$; same for all bases simultaneously.
\\ \\
Difference between weak and strong laws: the weak law says the average is close to $1/2$ with high probability at each large $n$; the strong law says that for almost every $\omega$, it gets close and stays close.
\\ \\
Notice that $B = \{ \omega : \lim \frac{s_n}{n} = 0 \} = \cap_{k=1}^\infty \cup_{m=1}^\infty \cap_{n=m}^\infty \{ \omega: |\frac{s_n}{n} | < \frac{1}{k} \}$ is a complicated set.
\\ \\
\subsection{Fields and $\sigma$-Algebras}
\begin{defn}
Let $\Omega$ be any set. A collection $\mathcal{F}_0$ of subsets of $\Omega$ is a {\bf Field} if:
\begin{enumerate}
\item $\emptyset, \Omega \in \mathcal{F}_0$
\item $A \in \mathcal{F}_0 \Rightarrow A^C \in \mathcal{F}_0$
\item $A_1, ..., A_n \in \mathcal{F}_0 \Rightarrow \cup_{i=1}^n A_i \in \mathcal{F}_0$
\end{enumerate}
\end{defn}
\begin{defn}
Let $\Omega$ be any set. A collection $\mathcal{F}_0$ of subsets of $\Omega$ is a {\bf $\sigma$-Algebra} if:
\begin{enumerate}
\item $\emptyset, \Omega \in \mathcal{F}_0$
\item $A \in \mathcal{F}_0 \Rightarrow A^C \in \mathcal{F}_0$
\item $A_1, ... \in \mathcal{F}_0 \Rightarrow \cup_{i=1}^\infty A_i \in \mathcal{F}_0$
\end{enumerate}
\end{defn}
That is, a $\sigma$-Field is a Field closed under countable unions.
\\ \\
Example: The Borel sets are the minimal $\sigma$-algebra containing all open intervals $I \subset \mathbb{R}$. Note that there is no easy description of the Borel sets, and constructing them requires transfinite operations. But it doesn't matter, because the Borel sets will just work the way we want them to anyway.
\\ \\
\begin{defn}
Let $(\Omega, \mathcal{F})$ be a set with a $\sigma$-algebra on it. A function $P : \mathcal{F} \rightarrow [0,1]$ is a {\bf probability} if
\begin{enumerate}
\item $P(\emptyset) = 0$, $P(\Omega) = 1$
\item $P(A) = 1 - P(A^C)$ for all $A \in \mathcal{F}$
\item $A_i \in \mathcal{F}$ for all $1 \le i < \infty$, $P(\cup_1^\infty A_i) = \sum_1^\infty P(A_i)$ for all $A_i$ disjoint.
\end{enumerate}
\end{defn}
In other words, $P$ is just a measure that takes on value $1$ for $\Omega$.
\\ \\
Task: Say $\mathcal{F}_0$ is a field of subsets and $P$ is a probability on $\mathcal{F}_0$; we want to extend $P$ to $\sigma(\mathcal{F}_0)$. (We follow the Greeks.)
\\ \\
Define: for all $A \subset \Omega$, $P^*(A) = \inf \sum_{i=1}^\infty P(A_i)$, where the infimum is over covers $A \subset \cup_{i=1}^\infty A_i$ with $A_i \in \mathcal{F}_0$.
\\ \\
In this case, $P^*$ is an outer measure (and for our interval definition, is in fact the Lebesgue outer measure).
\\ \\
Some obvious facts: (1) $P^*(\emptyset) = 0$. (2) Similarly, $P^*(\Omega) = 1$. (3) If $A \subset B$, then $P^*(A) \le P^*(B)$. (4) Given $\{A_i\}$ any sets in $\Omega$, $P^*(\cup A_i) \le \sum_{i=1}^\infty P^*(A_i)$ (subadditivity).
\\ \\
Proof of (4): \\
Fix $\epsilon > 0$; choose $B_{ik} \in \mathcal{F}_0$ with $\cup_k B_{ik} \supset A_i$ and $\sum_{k=1}^\infty P(B_{ik}) \le P^*(A_i) + \epsilon/2^i$.
\\ \\
Then $\cup_1^\infty A_i \subset \cup_{i,k} B_{ik}$, so $P^*(\cup A_i) \le \sum_1^\infty (P^*(A_i) + \epsilon / 2^i) = \sum_1^\infty P^*(A_i) + \epsilon$. QED.
\\ \\
What have we learned since the Greeks? We can't assign a length to all subsets of (0,1]. We need a clever way of approximating (or maybe just a way to do some rigorous math) so outer measures $(P^*)$ work there.
\\ \\
Idea: (Caratheodory): \\
Given $\Omega$, a field $\mathcal{F}_0$, and a probability $P$ on $\mathcal{F}_0$, define $P^*$ as above.
\\ \\
Let $M = \{ A : \forall E, P^*(E) = P^*(A \cap E) + P^*(A^C \cap E) \}$. (Collection of all measurable sets).
\\ \\
We show: (1) $M$ is a $\sigma$-algebra containing $\mathcal{F}_0$, (2) $P^*$ is countably additive on $M$ (in other words, $P^*$ is a measure on $M$), (3) $P^*$ agrees with $P$ for sets in $\mathcal{F}_0$, (4) $P^*$ is unique on $M$ given (1-3).
\\ \\
Note that $M = \{ A : \forall E, P^*(E) \ge P^*(A \cap E) + P^*(A^C \cap E) \}$ by subadditivity.
\\ \\
Proof that $M$ is a field: Clearly $M$ contains $\emptyset$ and $\Omega$. By the symmetry of the definition, if $A \in M$ then $A^C \in M$. Now, say $A$, $B \in M$. Then $P^*(E) = P^*(B \cap E) + P^*(B^C \cap E) = P^*(A \cap B \cap E) + P^*(A^C \cap B \cap E) + P^*(A \cap B^C \cap E) + P^*(A^C \cap B^C \cap E) \ge P^*(A \cap B \cap E) + P^*((A^C \cap B \cap E) \cup (A \cap B^C \cap E) \cup (A^C \cap B^C \cap E)) = P^*((A \cap B) \cap E) + P^*((A \cap B)^C \cap E)$.
\section{Week 2}
This week covers material from sections 3 and 4 in the book.
\subsection{$\sigma$-algebras}
Given $\Omega$ and $\mathcal{F}_0$ a field of subsets of $\Omega$. $P$ is a probability given on $\mathcal{F}_0$. We want to extend to $P^*$, the outer measure generated by $P$; for any set $A \subset \Omega$
$$P^*(A) = \inf \sum_{i=1}^\infty P(A_i)$$
where $A_i \in \mathcal{F}_0$ such that $A \subset \cup A_i$. (Note this is just a normalized Lebesgue outer measure).
\\ \\
Easy to show that
\begin{enumerate}
\item $P^*(\emptyset) = 0$
\item $P^*(\Omega) = 1$
\item $A \subset B \Rightarrow P^*(A) \le P^*(B)$
\item $P^*$ is countably subadditive.
\end{enumerate}
Let $\mathcal{M} = \{ A \subset \Omega : \forall E \subset \Omega, P^*(E) = P^*(E \cap A) + P^*(E \cap A^C) \}$. These are measurable sets under $P$.
\\ \\
Just as a heads-up (we'll prove this later), the Carath\'eodory theorem says:
\begin{enumerate}
\item $\mathcal{M}$ is a $\sigma$-algebra containing $\mathcal{F}_0$
\item $P^*$ is a probability on $\mathcal{M}$
\item $P^*(A) = P(A)$ for $A \in \mathcal{F}_0$
\item $P^*$ is the unique such extension
\end{enumerate}
Last time, we proved $\mathcal{M}$ is a field. We want to prove $\mathcal{M}$ is a $\sigma$-field.
\\ \\
{\bf Fact:} If $\{ A_i \}_{i=1}^\infty \subset \mathcal{M}$, $E \subset \Omega$, and the $A_i$ are disjoint,
$$P^*(E \cap (\cup_{i=1}^\infty A_i)) = \sum_{i=1}^\infty P^*(E \cap A_i)$$
{\bf Proof} by induction on finite unions first. For $n = 1$ there is nothing to prove. For $n = 2$, using the measurability of $A_1$ and the disjointness of $A_1, A_2$:
$$P^*(E \cap (A_1 \cup A_2)) = P^*(E \cap (A_1 \cup A_2) \cap A_1) + P^*(E \cap (A_1 \cup A_2) \cap A_1^C)$$
$$ = P^*(E \cap A_1) + P^*(E \cap A_2)$$
Same for all finite $n$. In general, $P^*(E \cap (\cup_{i=1}^\infty A_i)) \ge P^*(E \cap (\cup_{i=1}^n A_i)) = \sum_{i=1}^n P^*(E \cap A_i)$. Taking the limit as $n \to \infty$, we get $P^*(E \cap (\cup_{i=1}^\infty A_i)) \ge \sum_{i=1}^\infty P^*(E \cap A_i)$. The other direction follows by subadditivity.
\\ \\
{\bf Fact:} $\mathcal{M}$ is a $\sigma$-algebra and $P^*$ is a probability on $\mathcal{M}$.
\\ \\
{\bf Proof:}
Given $A_n \in \mathcal{M}$ for $1 \le n < \infty$, define $A_1' = A_1$, $A_2' = A_2 \cap A_1^C$, $A_3' = A_3 \cap A_1^C \cap A_2^C$, etc. Then the $A_i'$ are disjoint, each $A_i' \in \mathcal{M}$ (since $\mathcal{M}$ is a field), and $\cup_{i=1}^\infty A_i = \cup_{i=1}^\infty A_i'$.
\\ \\
So without loss of generality, we can say all $A_i$ are disjoint.
\\ \\
Want $P^*(E) \ge P^*(E \cap (\cup A_i)) + P^*(E \cap (\cup A_i)^C)$.
\\ \\
Set $F_n = \cup_{i=1}^n A_i$. Then
$$P^*(E) = P^*(E \cap F_n) + P^*(E \cap F_n^C)$$
$$ \ge \sum_{i=1}^n P^*(E \cap A_i) + P^*(E \cap (\cup_{i=1}^\infty A_i)^C)$$
so as $n \to \infty$
$$P^*(E) \ge \sum_{i=1}^\infty P^*(E \cap A_i)$$
$$ \ge P^*(E \cap (\cup_{i=1}^\infty A_i)) + P^*(E \cap (\cup_{i=1}^\infty A_i)^C)$$
The reverse inequality is countable subadditivity, so $\cup A_i \in \mathcal{M}$; taking $E = \cup_{i=1}^\infty A_i$ in the display shows $P^*$ is countably additive on $\mathcal{M}$.
\\ \\
Next we show $\mathcal{F}_0 \subset \mathcal{M}$: pick $A \in \mathcal{F}_0$ and $E \subset \Omega$.
\\ \\
From the definition of $P^*(E)$, for all $\epsilon > 0$, there exist $A_1, A_2, ...$ with $A_n \in \mathcal{F}_0$ where $E \subset \cup_{i=1}^\infty A_i$ and $\sum_{i=1}^\infty P(A_i) \le P^*(E) + \epsilon$.
\\ \\
Let $B_n = A_n \cap A$ and $C_n = A_n \cap A^C$. Then $E \cap A \subset \cup_{n=1}^\infty B_n$ and $E \cap A^C \subset \cup_n C_n$.
$$P^*(E \cap A) + P^*(E \cap A^C) \le \sum P(B_n) + \sum P(C_n) = \sum P(A_n) \le P^*(E) + \epsilon$$
Letting $\epsilon \to 0$ gives us what we want.
\\ \\
Further, we note $P^*(A) = P(A)$ if $A \in \mathcal{F}_0$. To show this, we know $P^*(A) \le P(A)$. If $A \subset \cup A_i$, where $A_i \in \mathcal{F}_0$, $P(A) \le \sum P(A \cap A_i) \le \sum P(A_i)$ so $P^*(A) \ge P(A)$.
\\ \\
For Uniqueness, we need to define a $\Pi$ system.
\begin{defn}
A class of subsets $\mathcal{P}$ is a $\Pi$-system if it is closed under finite intersection.
\end{defn}
\begin{defn}
A set $\mathcal{L}$ of subsets is called a $\lambda$-system if
\begin{enumerate}
\item $\Omega \in \mathcal{L}$
\item $A \in \mathcal{L} \Rightarrow A^C \in \mathcal{L}$
\item $A_1, A_2, ... \in \mathcal{L}$, with all $A_i$ disjoint guarantees $\cup_{i=1}^\infty A_i \in \mathcal{L}$.
\end{enumerate}
\end{defn}
\begin{theorem}
(Dynkin's $\Pi$-$\lambda$ theorem) If $\mathcal{P}$ is a $\Pi$-system, $\mathcal{L}$ is a $\lambda$-system, and $\mathcal{P} \subset \mathcal{L}$, then $\sigma(\mathcal{P}) \subset \mathcal{L}$.
\end{theorem}
\begin{proof}
First note that if $A, B \in \mathcal{L}$ with $A \subset B$, then $B \backslash A = B \cap A^C \in \mathcal{L}$: the sets $A$ and $B^C$ are disjoint, so $A \cup B^C \in \mathcal{L}$, and $B \cap A^C$ is its complement.
\\ \\
Because the intersection of $\lambda$-systems is a $\lambda$-system, there is a smallest $\lambda$-system containing $\mathcal{P}$; call it $\mathcal{L}_0$. We show that $\mathcal{L}_0$ is a $\Pi$-system. That finishes the proof: a $\lambda$-system closed under finite intersections is a $\sigma$-algebra, so $\sigma(\mathcal{P}) \subset \mathcal{L}_0 \subset \mathcal{L}$.
\\ \\
For $A \in \mathcal{L}_0$, let $\mathcal{L}_A = \{ E \subset \Omega : E \cap A \in \mathcal{L}_0 \}$. We claim $\mathcal{L}_A$ is a $\lambda$-system. Indeed, $\Omega \in \mathcal{L}_A$ since $A \in \mathcal{L}_0$. If $B \in \mathcal{L}_A$, then $A \cap B \in \mathcal{L}_0$ and $A \cap B^C = A \backslash (A \cap B) \in \mathcal{L}_0$ by the set-difference property above, so $B^C \in \mathcal{L}_A$. If $\{B_i\}$ are disjoint sets in $\mathcal{L}_A$, then $A \cap (\cup B_i) = \cup (A \cap B_i) \in \mathcal{L}_0$, so $\cup B_i \in \mathcal{L}_A$.
\\ \\
Now say $A, B \in \mathcal{P}$. Then $A \cap B \in \mathcal{P}$, so $A \in \mathcal{L}_B$. Thus each $\mathcal{L}_B$ with $B \in \mathcal{P}$ is a $\lambda$-system containing $\mathcal{P}$, so $\mathcal{L}_0 \subset \mathcal{L}_B$; equivalently, $B \in \mathcal{L}_A$ for every $A \in \mathcal{L}_0$ and $B \in \mathcal{P}$, so $\mathcal{P} \subset \mathcal{L}_A$ and hence $\mathcal{L}_0 \subset \mathcal{L}_A$ for every $A \in \mathcal{L}_0$.
\\ \\
So if $B, C \in \mathcal{L}_0$ then $C \in \mathcal{L}_B$; that is, $B \cap C \in \mathcal{L}_0$. Hence $\mathcal{L}_0$ is closed under finite intersections, i.e.\ a $\Pi$-system.
\end{proof}
\begin{corr}
If $\mu$ and $\nu$ are probabilities that agree on the $\Pi$-system $\mathcal{P}$, then they agree on $\sigma(\mathcal{P})$.
\end{corr}
\begin{proof}
Let $\mathcal{L} = \{ A : \mu(A) = \nu(A) \}$. Then $\Omega \in \mathcal{L}$; $\mathcal{L}$ is closed under complements, since $\mu(A^C) = 1 - \mu(A)$; and it is closed under countable disjoint unions, by countable additivity. So $\mathcal{L}$ is a $\lambda$-system containing $\mathcal{P}$, and by Dynkin's theorem $\sigma(\mathcal{P}) \subset \mathcal{L}$; that is, $\mu(A) = \nu(A)$ for all $A \in \sigma(\mathcal{P})$.
\end{proof}
Therefore, if $P$ is a probability on a field $\mathcal{F}_0 \subset 2^\Omega$, then the extension $P^*$ to $\sigma( \mathcal{F}_0)$ is unique.
\\ \\
Note we assumed $P$ was a probability on $\mathcal{F}_0$; that is, we assume that if $A, A_1, A_2, \ldots \in \mathcal{F}_0$ with $A = \cup A_i$ and the $A_i$ disjoint, then $P(A) = \sum P(A_i)$. This check is performed in the book.
\subsection{Extensions}
YEAH THERE NEEDS TO BE SOME STUFF FILLED IN HERE... (Wk 2., Day 2).
\section{Week 3}
This week covers material from sections 10 - 12 and 14 in the book.
\subsection{$\infty$ measures}
\begin{defn}
Let $\Omega$ be any set and $\mathcal{F}$ a field of subsets of $\Omega$ ($\emptyset \in \mathcal{F}$, $\mathcal{F}$ closed under finite intersections and complementation). A function $\mu: \mathcal{F} \rightarrow [0, \infty]$ with $\mu(\emptyset) = 0$ and $\mu(\cup_1^\infty A_i) = \sum \mu(A_i)$ for disjoint $A_i$ (whenever $\cup A_i \in \mathcal{F}$) is a {\bf measure} on $\mathcal{F}$. In other words, a measure is zero on the empty set, nonnegative, and countably additive.
\end{defn}
If $\mu(\Omega) < \infty$, then up to rescaling $\mu$ is the same as a probability. If $\exists A_n \in \mathcal{F}$ with $\Omega = \cup_{n=1}^\infty A_n$ and $\mu(A_n) < \infty$ for all $n$, then $\mu$ is {\bf $\sigma$-finite}.
\\ \\
Example: $\Omega = \mathbb{N}$ with counting measure $\mu(\{i\}) = 1$ is $\sigma$-finite. Similarly, Lebesgue measure $\lambda$ on $\mathbb{R}$ is $\sigma$-finite (cover by $[0,1]$, $[1,2]$, ...). However, if $\Omega = [0,1]$ and $\mu(A) = $ the number of points in $A$, $\mu$ is not $\sigma$-finite.
\\ \\
Why do we want to talk about this?
\begin{enumerate}
\item Probability densities on $\mathbb{R}$; for example, $\frac{e^{-x^2/2}}{\sqrt{2 \pi}}$ with respect to length on $\mathbb{R}$.
\item In the $\sigma$-finite case, it is easy.
\end{enumerate}
Most arguments are the ``same''. For example, if $A_n \uparrow A$ with $A_n, A \in \mathcal{F}$, then $\mu(A_n) \to \mu(A)$. Proof: let $B_1 = A_1$ and $B_n = A_n \backslash A_{n-1}$; the $B_n$ are disjoint with $\cup B_n = A$, so $\mu(A) = \sum \mu(B_i) = \lim_{n \to \infty} \sum_1^n \mu(B_i) = \lim_{n \to \infty} \mu(A_n)$.
\\ \\
But sometimes you need to watch it. If $\mu$ is a probability and $A \subset B$, then $\mu(B \backslash A) = \mu(B) - \mu(A)$, and if $A_n \downarrow A$ then $\mu(A_n) \to \mu(A)$. But take Lebesgue measure on $\Omega = (-\infty, \infty)$ with $A = (-\infty, 0]$, $B = (-\infty, 1]$: then $\mu(B \backslash A) = 1$, while $\mu(B) - \mu(A) = \infty - \infty$ is undefined.
\\ \\
Uniqueness of extensions. Given $\mathcal{P}$ a $\Pi$-system and $\mu_1, \mu_2$ measures on $\sigma(\mathcal{P})$:
\begin{theorem}
If there exist $\{B_i\}_{i=1}^\infty$ with $B_i \in \mathcal{P}$, $\Omega = \cup_i B_i$, and $\mu_j(B_i) < \infty$ for $j = 1, 2$, and if $\mu_1 = \mu_2$ on $\mathcal{P}$, then $\mu_1 = \mu_2$ on $\sigma(\mathcal{P})$.
\end{theorem}
\begin{proof}
Fix $B \in \mathcal{P}$ with $\mu_j(B) < \infty$ for $j=1,2$. Let $\nu_j(F) = \mu_j(F \cap B)$. These are finite measures agreeing on $\mathcal{P}$, so by the $\Pi$-$\lambda$ theorem, $\nu_1 = \nu_2$ on $\sigma(\mathcal{P})$. Now let $A_1 = B_1$, $A_2 = B_2 \backslash A_1$, and in general $A_n = B_n \backslash (\cup_{i=1}^{n-1} A_i)$; the $A_i$ are disjoint with $\cup_i A_i = \Omega$. Then for all $F \in \sigma(\mathcal{P})$, $\mu_1(F) = \mu_1(F \cap (\cup_i A_i)) = \sum_i \mu_1 (F \cap A_i) = \sum_i \mu_2 (F \cap A_i) = \mu_2(F \cap (\cup_i A_i)) = \mu_2(F)$.
\end{proof}
Note that if things are not $\sigma$-finite you can have two different extensions. See the book.
\\ \\
\subsection{Back to Outer Measures}
\begin{defn}
$\mu^*$ is an outer measure on $\Omega$ if $\mu^* : 2^\Omega \rightarrow [0, \infty]$, $\mu^*(\emptyset) = 0$, and $\mu^*(\cup A_i) \le \sum \mu^*(A_i)$. In other words, nonnegative, nontrivial, and countably subadditive.
\end{defn}
Example: $\Omega$ any set, $\mathcal{A}$ any collection of subsets with $\emptyset \in \mathcal{A}$, and $\rho : \mathcal{A} \rightarrow [0, \infty]$ any function with $\rho(\emptyset) = 0$. Define $\mu_\rho^* (A) = \inf \sum_{i=1}^\infty \rho(A_i)$ over covers $A \subset \cup_{i=1}^\infty A_i$ with $A_i \in \mathcal{A}$, and $\mu_\rho^*(A) = \infty$ if no such cover exists. Claim: $\mu_\rho^*$ is an outer measure. $\mu_\rho^*(\emptyset) = 0$ since $\emptyset$ covers itself; the function is nonnegative by definition; and it is countably subadditive by the $\epsilon/2^i$ argument used earlier.
\\ \\
Example: the Hausdorff $\gamma$-measure on $\mathbb{R}^n$. Let $\mathcal{A}$ be the collection of all closed balls $B_r(x)$, and for a fixed $\gamma > 0$ take $\rho(B_r(x)) = r^\gamma$ (for $\gamma = n$ this is comparable to the volume). Read more in the 2nd edition of Billingsley.
\\ \\
As usual, given an outer measure $\mu^*$, define $\mathcal{M}(\mu^*) = \{ A \subset \Omega: \forall E \subset \Omega, \mu^*(E) = \mu^*(E \cap A) + \mu^*(E \cap A^C) \}$.
\begin{theorem}
$\mathcal{M}(\mu^*)$ is a $\sigma$-algebra and $\mu^*$ is a measure on $\mathcal{M}$.
\end{theorem}
\begin{proof}
Sentence for sentence same proof as for probabilities.
\end{proof}
To work with infinite measures, it is useful to know about $\sigma$-rings. Let $\Omega$ be a set.
\begin{defn}
A collection of subsets $\mathcal{A}$ is a {\bf $\sigma$-ring} (elsewhere often called a semiring) if $\emptyset \in \mathcal{A}$; $A,B \in \mathcal{A} \Rightarrow A \cap B \in \mathcal{A}$; and for $A, B \in \mathcal{A}$ with $A \subset B$, there exist disjoint $C_i \in \mathcal{A}$, $1 \le i \le n$, such that $B \backslash A = \cup_{i=1}^n C_i$.
\end{defn}
Example: On $\mathbb{R}$, the half-open intervals $(a, b]$ with $-\infty \le a \le b \le \infty$ form a $\sigma$-ring.
\\ \\
\begin{theorem}
(Extension Theorem) Let $\mu$ be a function on a $\sigma$-ring of subsets $\mathcal{A}$ with $\mu(A) \in [0, \infty]$, $\mu(\emptyset) = 0$, $\mu$ finitely additive, and $\mu$ countably subadditive: for all $A_i$ with $\cup A_i \in \mathcal{A}$, $\mu(\cup A_i) \le \sum_1^\infty \mu(A_i)$. Then $\mu$ extends to a measure on $\sigma(\mathcal{A})$, and if there exist $A_i \in \mathcal{A}$ such that $\Omega = \cup_1^\infty A_i$ with $\mu(A_i) < \infty$, the extension is unique.
\end{theorem}
\begin{proof}
Define $\mu^*(A) = \inf \sum_{i=1}^\infty \mu(A_i)$ where $A \subset \cup_1^\infty A_i$ and $A_i \in \mathcal{A}$. This is an outer measure, and $\mu^*$ restricted to the $\sigma$-algebra $\mathcal{M}(\mu^*)$ does the job. We show (1) $\mathcal{A} \subset \mathcal{M}(\mu^*)$ and (2) $\mu^*(A) = \mu(A)$ for all $A \in \mathcal{A}$.
\\ \\
For (1), pick $A \in \mathcal{A}$. We must show for all $E$, $\mu^*(E) \ge \mu^*(E \cap A) + \mu^*(E \cap A^C)$. If $\mu^*(E) = \infty$ this is trivially true, so suppose $\mu^*(E) < \infty$. Then for every $\epsilon > 0$ there exist $A_i \in \mathcal{A}$ covering $E$ such that $\sum \mu(A_i) \le \mu^*(E) + \epsilon$; in particular each $\mu(A_i)$ is finite. Set $B_n = A \cap A_n \in \mathcal{A}$. Since $B_n \subset A_n$, we can write $A_n \backslash B_n = \cup_{i=1}^{m_n} C_{ni}$ with disjoint $C_{ni} \in \mathcal{A}$. Then $A_n = B_n \cup (\cup_{i=1}^{m_n} C_{ni})$, $A \cap E \subset \cup_n B_n$, and $A^C \cap E \subset \cup_n \cup_i C_{ni}$. Now
$$\mu^*(E \cap A) + \mu^*(E \cap A^C) \le \sum_{n=1}^\infty \mu(B_n) + \sum_{n=1}^\infty \sum_{i=1}^{m_n} \mu(C_{ni})$$
$$ = \sum_{n=1}^\infty \left[ \mu(B_n) + \mu(A_n \backslash B_n) \right]$$
$$ = \sum_n \mu(A_n) \le \mu^*(E) + \epsilon$$
For (2), if $A \subset \cup A_i$ where $A, A_i \in \mathcal{A}$, then $\mu(A) \le \sum_i \mu(A_i)$. The other direction is free.
\end{proof}
\subsection{Distribution functions on $\mathbb{R}$}
Given probability $\mu$ we describe a {\bf distribution function} by $F(x) = \mu(-\infty, x]$. We often define probability measures using distribution functions. For instance, the ``Gauss Measure'' $F(x) = \frac{1}{\sqrt{2 \pi}} \int_{-\infty}^x e^{-t^2/2} dt$.
\\ \\
Observe: (1) $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$ (normalization); and if $y < x$ then $F(x) - F(y) = \mu(-\infty, x] - \mu(-\infty, y] = \mu(y, x] \ge 0$, so (2) $F$ is monotone.
\\ \\
Further, (3) if $x_n \downarrow x$, then $(-\infty, x_n] \downarrow (-\infty, x]$, so $F(x_n) \downarrow F(x)$; that is, $F$ is right continuous. Note it need not be left continuous: let $\mu(A) = 1$ if $0 \in A$ and $0$ otherwise. Then $F(x) = 0$ for $x < 0$ and $F(x) = 1$ for $x \ge 0$.
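The point-mass example can be made concrete; a tiny sketch (names mine):

```python
def F(x):
    """Distribution function of the point mass at 0: F(x) = mu((-inf, x])."""
    return 1.0 if x >= 0 else 0.0

# Right continuity at 0: F(0 + h) stays at F(0) = 1 as h decreases to 0.
print([F(h) for h in (0.1, 1e-6, 1e-12, 0.0)])
# No left continuity: F(0 - h) stays at 0, yet F(0) = 1.
print([F(-h) for h in (0.1, 1e-6, 1e-12)])
```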
\begin{theorem}
Conversely, if $F(x)$ satisfies (1. normalization), (2. monotonicity), and (3. right-continuity), then $\exists !$ probability measure on $(-\infty, \infty)$ with $F(x) = \mu(-\infty, x]$.
\end{theorem}
Note $\{(-\infty, x]: x \in \mathbb{R}\}$ is a $\Pi$-system.
\\ \\
Want to do this in higher dimensions.
\\ \\
Let $A_{x_1, x_2} = \{ (\eta_1, \eta_2) : \eta_i \le x_i \}$. Given $\mu$ on the Borel sets of $\mathbb{R}^2$, define $H(x_1, x_2) = \mu(A_{x_1, x_2})$. $H$ is monotone and right continuous, but we need a bit more. For the rectangle $A$ pictured below (the upper-right cell), note $\mu(A) = \mu(A_x) - \mu(A_y) - \mu(A_w) + \mu(A_z)$:
\begin{verbatim}
-----------------
. | |
. A_w | A_x |
. | |
-------|--------|
. | |
. A_z | A_y |
. ... | ... |
\end{verbatim}
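For a concrete instance of the four-corner identity, take two independent Uniform$(0,1]$ coordinates, so $H(x, y)$ is the product of the clipped coordinates; the alternating sum over the corners of a rectangle recovers its area (a sketch under that assumption, names mine):

```python
def H(x, y):
    """Joint d.f. of two independent Uniform(0,1] coordinates."""
    clip = lambda t: min(max(t, 0.0), 1.0)
    return clip(x) * clip(y)

def rect_prob(a1, b1, a2, b2):
    """mu((a1, b1] x (a2, b2]) via the four-corner alternating sum."""
    return H(b1, b2) - H(a1, b2) - H(b1, a2) + H(a1, a2)

# mu((0.2, 0.5] x (0.1, 0.4]) should come out to the area 0.3 * 0.3 = 0.09
print(rect_prob(0.2, 0.5, 0.1, 0.4))
```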
In $\mathbb{R}^d$, let $A = \{(x_1, ..., x_d) : a_i < x_i \le b_i\}$. If $\underline{v}$ is a vertex of $A$, let $\mathrm{sign}(\underline{v}) = -1$ if $\underline{v}$ has an odd number of $a_i$'s, and $+1$ otherwise.
\\ \\
For $H(x_1, ..., x_d)$, define $\Delta_A H = \sum_{\underline{v}} \mathrm{sign}(\underline{v}) H(\underline{v})$, summing over the $2^d$ vertices of $A$. Then $H(x_1, ..., x_d) = \mu(A_{x_1, ..., x_d})$ for a unique probability $\mu$ $\Leftrightarrow$ $\lim H(\underline{x}) = 0$ as the minimum coordinate goes to $-\infty$, $\lim H(\underline{x}) = 1$ as all coordinates go to $\infty$, $H$ is right continuous, and for every rectangle $A = \prod_i (a_i, b_i]$, $\Delta_A H \ge 0$.
\\ \\
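The signed vertex sum $\delta_A H$ is mechanical enough to check in code. Below is a sketch that enumerates the $2^d$ vertices and, for the product distribution function of the uniform measure on the unit cube (an arbitrary choice for illustration), confirms $\delta_A H = \prod_i (b_i - a_i) = \mu(A) \ge 0$:

```python
from itertools import product

def delta_A_H(H, a, b):
    """delta_A H = sum over the 2^d vertices v of A = prod (a_i, b_i] of
    sign(v) * H(v), where sign(v) = -1 if v has an odd number of a_i's."""
    d = len(a)
    total = 0.0
    for choice in product([0, 1], repeat=d):          # 0 -> a_i, 1 -> b_i
        v = [b[i] if c else a[i] for i, c in enumerate(choice)]
        sign = -1 if (d - sum(choice)) % 2 else 1     # odd # of a_i's
        total += sign * H(v)
    return total

def F(t):                    # Uniform(0,1) distribution function
    return max(0.0, min(1.0, t))

def H(v):                    # product d.f. of the uniform cube measure
    p = 1.0
    for t in v:
        p *= F(t)
    return p

a, b = [0.2, 0.1, 0.3], [0.7, 0.9, 0.6]
vol = 1.0
for ai, bi in zip(a, b):
    vol *= F(bi) - F(ai)     # = mu(A), the volume of the box
assert abs(delta_A_H(H, a, b) - vol) < 1e-12
```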
So now we have a HW problem from this week: \\Let $F_1(x), F_2(x)$ be distribution functions on $\mathbb{R}$. $H(x,y)$ with $H(x, \infty) = F_1(x)$ and $H(\infty, y) = F_2(y)$ is called a bivariate distribution function with margins $F_1$ and $F_2$.
\\ \\
Problem a: Consider $H_L(x, y) = \max(F_1(x) + F_2(y) - 1, 0)$ ($H$-lower) and $H_U(x,y) = \min(F_1(x), F_2(y))$ ($H$-upper). Check that these are distribution functions with margins $F_1$ and $F_2$.
\\ \\
Problem b: For every $H$ with margins $F_1, F_2$, show $H_L(x,y) \le H(x,y) \le H_U(x,y)$.
\\ \\
Remark: Once we know what correlation is, $H_L$ is the most negatively correlated D.F. with margins $F_1$, $F_2$ and $H_U$ is the most positively correlated.
\\ \\
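The HW inequalities aren't proved by a computer, but it is a useful sanity check to test $H_L \le H \le H_U$ for one concrete $H$, say the independent coupling $H(x,y) = F_1(x) F_2(y)$, with arbitrarily chosen exponential margins:

```python
import math

F1 = lambda x: 1.0 - math.exp(-x) if x > 0 else 0.0        # Exp(1) margin
F2 = lambda y: 1.0 - math.exp(-2.0 * y) if y > 0 else 0.0  # Exp(2) margin

H_ind = lambda x, y: F1(x) * F2(y)                  # independent coupling
H_L = lambda x, y: max(F1(x) + F2(y) - 1.0, 0.0)    # lower bound
H_U = lambda x, y: min(F1(x), F2(y))                # upper bound

grid = [i / 4.0 for i in range(-2, 21)]
for x in grid:
    for y in grid:
        # H_L <= H <= H_U pointwise (up to floating-point slack)
        assert H_L(x, y) <= H_ind(x, y) + 1e-12
        assert H_ind(x, y) <= H_U(x, y) + 1e-12
```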
\subsection{Measurable Functions and Random Variables}
Let $(\Omega, \mathcal{F})$, $(\Omega', \mathcal{F}')$ be measurable spaces.
\begin{defn}
A function $T: \Omega \rightarrow \Omega'$ is {\bf measurable} if\\
For all $A' \in \mathcal{F}'$, $T^{-1}(A') \in \mathcal{F}$
\end{defn}
Proposition: Suppose $\mathcal{F}' = \sigma(\mathcal{A})$. (a) Then $T$ is measurable iff $T^{-1}(A') \in \mathcal{F}$ for all $A' \in \mathcal{A}$.
\\ \\
(b) If $T_1 : (\Sigma_1, \mathcal{F}_1) \rightarrow (\Sigma_2, \mathcal{F}_2)$ and $T_2 : (\Sigma_2, \mathcal{F}_2) \rightarrow (\Sigma_3, \mathcal{F}_3)$ are measurable, then $T_2 \circ T_1 : (\Sigma_1, \mathcal{F}_1) \rightarrow (\Sigma_3, \mathcal{F}_3)$ is measurable.
\\ \\
Both follow directly.
\\ \\
\begin{defn}
A {\bf random variable} is a measurable function $T : (\Omega, \mathcal{F}) \rightarrow (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ where $\mathcal{B}(\mathbb{R})$ is the class of the Borel sets.
\end{defn}
\begin{defn}
A {\bf random vector} is a measurable function $T : (\Omega, \mathcal{F}) \rightarrow (\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$, written $T(\omega) = (T_1(\omega), ..., T_n(\omega))$.
\end{defn}
\begin{lemma}
$T$ is a random vector $\Leftrightarrow$ all $T_i$ are random variables.
\end{lemma}
\begin{proof}
If each $T_i$ is measurable, then $T^{-1}(A_{\underline{x}}) = \cap_{i=1}^n T_i^{-1}(-\infty, x_i]$ is measurable. Note this is enough by proposition (a), taking $\mathcal{A} = \{A_{\underline{x}}\}$.
\\ \\
If $T$ is a random vector, then $T_i^{-1} (-\infty, x] = \cup_{n=1}^\infty T^{-1} \{ \underline{y} : y_i \le x \textrm{ and } y_j \le n,\ j \neq i\}$, so each $T_i$ is measurable.
\end{proof}
\begin{lemma}
If $T: \mathbb{R}^k \rightarrow \mathbb{R}$ is continuous, then $T$ is measurable.
\end{lemma}
\begin{proof}
If $T$ is continuous, the preimage of an open set is open (and of a closed set is closed). In particular, $T^{-1}(-\infty, x]$ is closed, and closed sets are Borel measurable, so $T$ is measurable by proposition (a).
\end{proof}
\begin{corr}
If $X$ and $Y$ are random variables on $\Omega$, then $X+Y$, $X \cdot Y$, $\min(X, Y)$, $\max(X,Y)$ are random variables.
\end{corr}
\begin{proof}
$\Omega \rightarrow \mathbb{R}^2 \rightarrow \mathbb{R}$: composite functions will be measurable, since $f(x,y) = x+y$ is measurable. Same for all others.
\end{proof}
\subsubsection{New measures from old:}
Say $\mu$ is a measure on $(\Omega, \mathcal{F})$ and $T: (\Omega, \mathcal{F}) \rightarrow (\Omega', \mathcal{F}')$ is measurable. Define the push forward $\mu T^{-1}$ of $\mu$ under $T$ by
$$\mu T^{-1} (A') = \mu(T^{-1}(A')) = \mu \{ \omega : T(\omega) \in A'\}$$
This is a measure.
\\ \\
We use this to construct measures. Example: let $O_n$ be the orthogonal group: that is, the set of all $n \times n$ matrices $M$ such that $MM^T = I$. Want to know what it means to ``pick a matrix at random''. Suppose we know how to pick from the normal distribution. Let $X_{ij}$ be independent picks from $\frac{e^{-x^2/2}}{\sqrt{2 \pi}}$ (good ol' bell-shaped curve). As math: $\Omega = \mathbb{R}^{n^2}$, define $P$ by $P(A_{x_{11}, ..., x_{nn}}) = \int_{-\infty}^{x_{11}} \cdots \int_{-\infty}^{x_{nn}} \frac{e^{-\sum x_{ij}^2/2}}{(2 \pi)^{n^2/2}} dx_{11} \cdots dx_{nn}$. Map $T: \mathbb{R}^{n^2} \rightarrow O_n$ (Gram-Schmidt). Then $P T^{-1}$ is Haar measure on $O_n$.
\\ \\
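In code, the Gram-Schmidt map $T$ is just the $Q$ factor of a QR factorization, so the construction can be sketched as follows (assuming numpy is available; the sign fix on the diagonal of $R$ makes the factorization, and hence $T$, well defined):

```python
import numpy as np

def random_orthogonal(n, rng):
    """Push the Gaussian measure on R^{n^2} forward through
    Gram-Schmidt (QR) to get a Haar-distributed matrix in O_n."""
    X = rng.standard_normal((n, n))   # n^2 independent N(0,1) picks
    Q, R = np.linalg.qr(X)
    Q = Q * np.sign(np.diag(R))       # fix column signs: diag(R) > 0
    return Q

rng = np.random.default_rng(0)
M = random_orthogonal(4, rng)
assert np.allclose(M @ M.T, np.eye(4))   # M M^T = I, so M is in O_4
```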
This stuff is all sort of near section 15 of the book.
\section{Week 4}
This week covers material from sections 14--15 in the book.
\subsection{Lebesgue Integral}
Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. For measureable $f$, define
$$\int f d \mu = \int_\Omega f(\omega) \mu(d \omega)$$
Strategy to define:
\begin{enumerate}
\item Define it for $f \in SF_+$ (simple functions)
\item Define for $f \in m \mathcal{F}_+$ (nonnegative measurable functions)
\item Extend to $f \in m \mathcal{F}$
\end{enumerate}
Read Wikipedia for all of this.
\begin{theorem}
(1) $f \in SF_+$, $f = \sum_{i=1}^m f_i I_{A_i} \Rightarrow \int f d \mu = \sum_{i=1}^m f_i \mu(A_i)$ \\ \\
(2) For all $\omega$, $0 \le f(\omega) \le g(\omega) \Rightarrow 0 \le \int f d \mu \le \int g d \mu$.
\\ \\
(3) For all $\omega$, $0 \le f_n(\omega) \le f (\omega)$, $f_n(\omega) \uparrow f(\omega) \Rightarrow \int f_n d \mu \uparrow \int f d \mu$.
\\ \\
(4) $\alpha, \beta \ge 0$, $\int(\alpha f + \beta g) d \mu = \alpha \int f d \mu + \beta \int g d \mu$
\end{theorem}
\begin{proof}
(1) Note that $\ge$ is obvious (sup over all simple functions below $f$...). Now, let $\{B_1, ..., B_n\}$ be any partition with $\beta_i = \inf_{\omega \in B_i} f (\omega)$, and let $\{C_1, ..., C_k\}$ be the common refinement of $\{A_i\}$ and $\{B_j\}$ (in other words, partition by both $A$ and $B$), with $\gamma_i = \inf_{\omega \in C_i} f(\omega)$. Then $\sum_{i=1}^n \beta_i \mu(B_i) \le \sum_{i=1}^k \gamma_i \mu(C_i) \le \sum_{i=1}^m f_i \mu(A_i)$.
\\ \\
(2) We know $0 \le \int f d \mu$. Now, let $f(\omega) \le g(\omega)$. Then $\int f d \mu = \sup \sum_{i=1}^m \inf_{\omega \in A_i} f (\omega) \mu(A_i) \le \sup \sum_{i=1}^m \inf_{\omega \in A_i} g(\omega) \mu(A_i) = \int g d \mu$.
\\ \\
(3) $\int f_n d \mu$ non-decreasing, $\int f_n d \mu \le \int f d \mu$. Hence, $\lim_{n \to \infty} \int f_n d \mu \le \int f d \mu$.
\\ \\
Note it is sufficient to prove that for every simple function $\sum_{i=1}^m f_i I_{A_i} \le f$, $\lim_{n \to \infty} \int f_n d \mu \ge \sum_{i=1}^m f_i \mu(A_i)$, i.e. for all $\epsilon > 0$, there is a large enough $n$ such that
$$\int f_n d \mu \ge \sum_{i=1}^m (f_i - \epsilon) \mu(A_i)$$
Let $A_{i, n} = \{ \omega \in A_i : f_n(\omega) \ge f_i - \epsilon \}$ and $A_{0, n} = \Omega \backslash \cup_{i=1}^m A_{i, n}$.
\\ \\
Then $\int f_n d \mu \ge \sum_{i=1}^m \inf_{\omega \in A_{i, n}} f_n(\omega) \mu(A_{i, n}) \ge \sum_{i=1}^m (f_i - \epsilon) \mu(A_{i,n})$, and $A_{i, n} \uparrow A_i$, so letting $n \to \infty$ gives the claim.
\\ \\
Note this assumes $\mu(A_i) < \infty$ for all $i$. If instead $\mu(A_1), ..., \mu(A_{m_0}) < \infty$ and $\mu(A_{m_0 + 1}) = \cdots = \mu(A_m) = \infty$, then either some $f_i > 0$ sits on a set of infinite measure (and both sides are infinite), or $f_i = \inf_{\omega \in A_i} f(\omega) = 0$ for all $i \in \{m_0 + 1, ..., m\}$ and the finite case applies.
\\ \\
(4) $\int \alpha f d \mu = \sup_{\{A_i\}} \sum_{i=1}^m \mu(A_i) \inf_{\omega \in A_i} [ \alpha f(\omega) ] = \alpha \sup_{\{A_i\}} \sum_{i=1}^m \mu(A_i) \inf_{\omega \in A_i} f(\omega) = \alpha \int f d \mu$.
\\ \\
For additivity, let $f, g$ be simple; then the identity is a direct computation. In general, all nonnegative measurable functions are increasing limits of simple functions, so the whole thing follows by monotone convergence.
\end{proof}
Review: Let $(\Omega, \mathcal{F},\mu)$ be a measure space, and let $f : \Omega \rightarrow [0, \infty]$ be a measurable function. Then we define $\int f d \mu = \int_\Omega f (\omega) \mu(d \omega) = \sup_{\{A_i\}} \sum_{i=1}^N \inf_{\omega \in A_i} f (\omega) \mu(A_i)$, where the sup runs over finite measurable partitions $\Omega = \cup_{i=1}^N A_i$.
\\ \\
Properties:
\begin{enumerate}
\item $0 \le f \le g \Rightarrow \int f d \mu \le \int g d \mu$
\item Integral is linear
\item (Monotone Convergence Theorem) If $f_n, f \ge 0$ and $f_n(\omega) \uparrow f(\omega)$, then $\lim \int f_n d \mu = \int \lim f_n d \mu = \int f d \mu$.
\item If $f(\omega) = \sum_{i=1}^N x_i \delta_{B_i}(\omega)$ (simple function), then $\int f d \mu = \sum_{i=1}^N x_i \mu(B_i)$.
\end{enumerate}
That is, monotonicity, linearity, MCT, and step functions.
\\ \\
If $f : \Omega \to \mathbb{R}$, write $f^+(\omega) = \max(f(\omega), 0)$ and $f^-(\omega) = \max(-f(\omega), 0)$. Say $\int f d \mu = \int f^+ d \mu - \int f^- d \mu$ (defined when at least one term is finite).
\\ \\
\begin{theorem}
(Fatou's Lemma). On $(\Omega, \mathcal{F}, \mu)$, let $f_n \ge 0$ be any measurable functions. Then $\int \lim \inf f_n d \mu \le \lim \inf \int f_n d \mu$.
\end{theorem}
\begin{proof}
Set $g_n = \inf_{h \ge n} f_h \uparrow g = \lim \inf f_n$. So $\lim \int g_n d \mu = \int \lim \inf f_n d \mu$ (since the $g_n$ are monotone, by MCT). But $g_n \le f_n$, so $\int g_n \le \int f_n$. Taking the liminf of both sides gives $\lim \int g_n \le \lim \inf \int f_n$. QED.
\end{proof}
Remarks: This works for any $f_n \ge 0$.
\\ \\
Example: Enumerate the rationals in $[0,1]$ as $\Omega_1, \Omega_2, ...$. Define $f(x) = \sum_{i=1}^\infty \frac{1}{i^2 \sqrt{|\Omega_i - x|}}$. Claim: $f(x) < \infty$ for (Lebesgue) almost every $x$.
\\ \\
Proof: Let $f_n(x) = \sum_{i=1}^n \frac{1}{i^2} \frac{1}{\sqrt{|\Omega_i - x|}} \uparrow f(x)$. Then by monotone convergence, $\int_0^1 f(x) dx = \lim \int_0^1 f_n(x) dx \le \sum_{i=1}^\infty \frac{1}{i^2} \int_0^1 \frac{dx}{\sqrt{|\Omega_i - x|} } \le \sum_{i=1}^\infty \frac{c}{i^2} < \infty$, so $f < \infty$ a.e.
\\ \\
\$10 problem:
Find a single $x$ such that $f(x) < \infty$.
\\ \\
\begin{theorem}
(Dominated Convergence Theorem) On $(\Omega, \mathcal{F}, \mu)$, let $f_n, f, g$ be measurable functions with $f_n(\omega) \rightarrow f(\omega)$ almost everywhere (i.e. almost surely), $|f_n| \le g$ and $\int g d \mu < \infty$. Then $f_n, f$ are integrable and $\lim_{n \to \infty} \int f_n d \mu = \int f d \mu$.
\end{theorem}
\begin{proof}
By hypothesis, $|f_n| = f_n^+ + f_n^- \le g$, so $f_* = \lim \inf f_n$ and $f^* = \lim \sup f_n$ are both bounded by $g$ in absolute value, and $g + f_n \ge 0$, $g - f_n \ge 0$. Therefore, $\int g d \mu + \int f_* d \mu = \int \lim \inf (g + f_n) d \mu \le \int g d \mu + \lim \inf \int f_n d \mu$ by Fatou.
\\ \\
For all $x_n$ check $\lim \inf -x_n = - \lim \sup x_n$. Then $\int g d \mu - \int f^* d \mu = \int \lim \inf (g - f_n) d \mu \le \int g d \mu - \lim \sup \int f_n d \mu$ by Fatou.
\\ \\
$\int \lim \inf f_n d \mu \le \lim \inf \int f_n d \mu \le \lim \sup \int f_n d \mu \le \int \lim \sup f_n d \mu$. But because $f_n$ converges almost surely, everything is equal.
\end{proof}
A little probability (and a Homework Problem):
\\ \\
In English: Let $X_n$, $1 \le n < \infty$, be independent exponential random variables ($P(X_i > x) = e^{-x}$), and $M_n = \max_{1 \le i \le n} X_i$. Find the limiting behavior of $M_n$.
\\ \\
In Math: Let $\Omega = \mathbb{R}^n$, $\mathcal{F}$ Borel. Let $G(x_1, ..., x_n) = \prod_{i=1}^n (1 - e^{-x_i})_+$. This is a distribution function, since for any box $\delta_A G = \prod_i \int_{a_i}^{b_i} e^{-x} dx \ge 0$. Let $P$ be the associated probability and $X_i(\omega_1, ..., \omega_n) = \omega_i$. Then $P(M_n \le x) = P(X_i \le x \textrm{ for all } i) = (1 - e^{-x})^n = e^{n \log(1 - e^{-x})}$. For $x$ large, $\log(1 - e^{-x}) \sim -e^{-x}$; set $x = \log n + c$. Then $P(M_n \le x) \to e^{-e^{-c}}$ (the extreme value distribution).
\\ \\
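The extreme value limit is easy to see by simulation: with $X_i$ exponential, $P(M_n - \log n \le c)$ should be near $e^{-e^{-c}}$. A Monte Carlo sketch (sample sizes, seed, and the choice $c = 0.5$ are arbitrary):

```python
import math, random

random.seed(1)
n, trials, c = 500, 4000, 0.5
hits = 0
for _ in range(trials):
    # M_n = max of n independent Exp(1) variables
    M_n = max(random.expovariate(1.0) for _ in range(n))
    if M_n - math.log(n) <= c:
        hits += 1
gumbel = math.exp(-math.exp(-c))   # limiting extreme value distribution
assert abs(hits / trials - gumbel) < 0.03
```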
HW Problem: \\
(a): For $x > 0$, $\frac{x}{1 + x^2} e^{-x^2/2} \le \int_x^\infty e^{-t^2/2} dt \le \frac{e^{-x^2/2}}{x}$. \\ \\
(b): Let $X_i$ be i.i.d. $\mathcal{N}(0,1)$ (standard normal). Let $Y_i = \lfloor X_i \rfloor$ and $M_n = \max_{1 \le i \le n} Y_i$. Show there exist integers $a_n$ and $p_n \in (0,1)$ with $P(M_n = a_n) \sim p_n$ and $P(M_n = a_n - 1) \sim (1 - p_n)$. \\ \\
(c): $\lim \inf p_n \neq \lim \sup p_n$.
\section{Week 5}
This week covers material from sections 18 - 20 in the book.
\subsection{Product Measures}
Suppose $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$ are measurable spaces. Then $\Omega_1 \times \Omega_2 = \{(\omega_1, \omega_2) : \omega_1 \in \Omega_1 \textrm{ and } \omega_2 \in \Omega_2\}$.
\\ \\
If $A_1 \in \mathcal{F}_1$ and $A_2 \in \mathcal{F}_2$, then $A_1 \times A_2$ is a measurable rectangle. Measurable rectangles form a semi-ring. Check: $\Omega_1 \times \Omega_2$ OK. $(A_1 \times A_2) \cap (B_1 \times B_2) = (A_1 \cap B_1) \times (A_2 \cap B_2)$. $(A_1 \times A_2)^C = (A_1^C \times \Omega_2) \cup (A_1 \times A_2^C)$, which is a disjoint union of rectangles.
\\ \\
\begin{defn} $\mathcal{F}_1 \times \mathcal{F}_2 = \sigma(\{A_1 \times A_2 : A_i \in \mathcal{F}_i\})$ \end{defn}
Let $\Pi_1 : \Omega_1 \times \Omega_2 \rightarrow \Omega_1$ and $\Pi_2 : \Omega_1 \times \Omega_2 \rightarrow \Omega_2$ be the coordinate projections onto $\Omega_1$ and $\Omega_2$ respectively. Then $\mathcal{F}_1 \times \mathcal{F}_2$ is the smallest $\sigma$-algebra making $\Pi_n$ measurable for $n = 1, 2$.
\\ \\
Sections: $A \subset \Omega_1 \times \Omega_2$, $\omega_1 \in \Omega_1$. Then $A_{\omega_1} = \{ \omega_2 : (\omega_1, \omega_2) \in A \}$. For $f : \Omega_1 \times \Omega_2 \rightarrow \Omega_3$, the section is $f_{\omega_1}(\omega_2) = f(\omega_1, \omega_2)$.
\\ \\
Fact: Taking sections commutes with $\cup$, $\cap$, and complementation. That is, $(\cup A^i)_{\omega_1} = \cup A^i_{\omega_1}$, and similarly for intersection and complementation.
\\ \\
\begin{lemma}
(Section Lemma) If $A \in \mathcal{F}_1 \times \mathcal{F}_2$, then $A_{\omega_1}$ is $\mathcal{F}_2$-measurable. Similarly, if $f : \Omega_1 \times \Omega_2 \rightarrow \mathbb{R}$ is $\mathcal{F}_1 \times \mathcal{F}_2$-measurable, then $f_{\omega_1} : \Omega_2 \rightarrow \mathbb{R}$ is $\mathcal{F}_2$-measurable.
\end{lemma}
\begin{proof}
Let $\rho = \{A \subset \Omega_1 \times \Omega_2 : A_{\omega_1} \textrm{ is measurable} \}$. Note: $(A_1 \times A_2)_{\omega_1} = \emptyset$ if $\omega_1 \not\in A_1$ and $= A_2$ if $\omega_1 \in A_1$; either way it is measurable. Therefore, $\rho$ contains the $\Pi$-system of measurable rectangles. Also, $\rho$ is closed under complements and countable disjoint unions. Therefore, $\rho$ is a $\lambda$-system and $\rho \supset \mathcal{F}_1 \times \mathcal{F}_2$. Further, $f_{\omega_1}^{-1}(B) = (f^{-1}(B))_{\omega_1}$ is measurable, so $f_{\omega_1}$ is measurable. (Warning: the converse is false. If $A \subset \Omega_1 \times \Omega_2$ and all sections $A_{\omega_1}$, $A_{\omega_2}$ are measurable, $A$ might not be measurable. Example: $\Omega_1 = \Omega_2 = (0, 1]$ and $\mathcal{F}_i$ is the countable/co-countable $\sigma$-algebra. The diagonal $\{(x, x)\}$ is not $\mathcal{F}_1 \times \mathcal{F}_2$-measurable, but every section is a single point, hence countable.)
\end{proof}
\subsection{Kernels}
$(\Omega_1, \mathcal{F}_1)$, $(\Omega_2, \mathcal{F}_2)$ are measurable spaces. A probability kernel is a map $K: \Omega_1 \times \mathcal{F}_2 \rightarrow [0,1]$, $(\omega_1, A_2) \mapsto K(\omega_1, A_2)$, such that:
\begin{enumerate}
\item $\forall A_2 \in \mathcal{F}_2$, $\omega_1 \mapsto K(\omega_1, A_2)$ is $\mathcal{F}_1$-measurable
\item $\forall \omega_1$, $K(\omega_1, A_2)$ is a probability measure in $A_2$.
\end{enumerate}
Examples:
\begin{enumerate}
\item $K(\omega_1, A_2) = \mu(A_2)$ where $\mu$ is some probability on $\mathcal{F}_2$ (no dependence on $\omega_1$).
\item Families of probabilities. Let $\{P_\theta(dx) \}_{\theta \in \Theta}$ be a family of probabilities, e.g. $\Theta = \mathbb{R} \times (0, \infty)$ with $P_{\mu, \sigma^2} = \mathcal{N}(\mu, \sigma^2)$. With a measurable structure $\mathcal{F}_\Theta$ on $\Theta$, $K(\theta, A) = P_\theta(A)$ is a kernel.
\item $\Omega_1 = \Omega_2$: $K(\omega_1, A_2)$ is called a Markov kernel (so you get standard CS Markov chains by taking $\Omega$ finite).
\end{enumerate}
\\ \\
Suppose $K(\omega_1, A_2)$ is a kernel. Consider $\mathcal{G} = \{ A \in \mathcal{F}_1 \times \mathcal{F}_2 : \omega_1 \mapsto K(\omega_1, A_{\omega_1}) \textrm{ is measurable} \}$. Claim: $\mathcal{G} = \mathcal{F}_1 \times \mathcal{F}_2$. Proof: $A_1 \times A_2 \in \mathcal{G}$, for $K(\omega_1, (A_1 \times A_2)_{\omega_1}) = I_{A_1}(\omega_1) K(\omega_1, A_2)$.
\\ \\
If $A \in \mathcal{G}$, then $K(\omega_1, (A^C)_{\omega_1}) = K(\omega_1, (A_{\omega_1})^C) = 1 - K(\omega_1, A_{\omega_1})$. If $A^i \in \mathcal{G}$ are disjoint, $K(\omega_1, (\cup_i A^i)_{\omega_1}) = \sum_i K(\omega_1, A^i_{\omega_1})$. So $\mathcal{G}$ is a $\lambda$-system containing the rectangles, proving the claim.
\\ \\
Let $\Pi$ be a probability on $\Omega_1$. Define $\Pi K(A) = \int_{\Omega_1} K(\omega_1, A_{\omega_1}) \Pi(d \omega_1)$ for $A \in \mathcal{F}_1 \times \mathcal{F}_2$. $\Pi K$ is a probability because $K(\omega_1, (\Omega_1 \times \Omega_2)_{\omega_1}) = K(\omega_1, \Omega_2) = 1$. By properties of the integral, it is countably additive.
\\ \\
Note: (a) $\Pi K (A_1 \times A_2) = \int_{A_1} K(\omega_1, A_2) \Pi(d \omega_1)$. This gives a recipe for sampling from $\Pi K$: first pick $\omega_1$ from $\Pi(\cdot)$, then pick $\omega_2$ from $K(\omega_1, \cdot)$. (b) $\Pi K (A_1 \times \Omega_2) = \Pi(A_1)$ (marginal distribution).
\subsection{Fubini's Theorem}
\begin{theorem}
(Fubini's Theorem for Kernels) $(\Omega_1, \mathcal{F}_1), (\Omega_2, \mathcal{F}_2), \Pi, K$ as above. Let $f: \Omega_1 \times \Omega_2 \rightarrow [0, \infty]$ be $\mathcal{F}_1 \times \mathcal{F}_2$ measurable. Then $\int_{\Omega_2} f_{\omega_1}(\omega_2) K(\omega_1, d \omega_2)$ is $\mathcal{F}_1$-measurable and
$$\int_{\Omega_1 \times \Omega_2} f(\omega_1, \omega_2) d \Pi K(\omega_1, \omega_2) = \int_{\Omega_1} [\int_{\Omega_2} f_{\omega_1}(\omega_2) K(\omega_1, d \omega_2) ] \Pi (d \omega_1)$$
\end{theorem}
\begin{proof}
Use a 1-2-3 argument. Let $\mathcal{G}$ be the class of all functions for which the theorem holds. $\mathcal{G}$ contains $I_{A_1 \times A_2}$ by the previous computation. Then $\mathcal{G}$ contains positive linear combinations: $\sum a_i f_i \in \mathcal{G}$ for nonnegative $a_i$, $f_i \in \mathcal{G}$. By monotone convergence, $f_n \uparrow f$ with $f_n \in \mathcal{G}$ implies $f \in \mathcal{G}$. Therefore, $\mathcal{G}$ contains all nonnegative $\mathcal{F}_1 \times \mathcal{F}_2$ measurable functions.
\end{proof}
\begin{theorem}
(Fubini for possibly negative functions) $\Pi$, $K$ as above. Let $f : \Omega_1 \times \Omega_2 \rightarrow [-\infty, \infty]$ be $\mathcal{F}_1 \times \mathcal{F}_2$-measurable and $\Pi K$-integrable. Let $H = \{\omega_1 : f_{\omega_1} \textrm{ is } K(\omega_1, \cdot)\textrm{-integrable} \}$. Define $K f(\omega_1) = \int f_{\omega_1}(\omega_2) K(\omega_1, d \omega_2)$ for $\omega_1 \in H$, and $0$ otherwise. Then $H \in \mathcal{F}_1$, $\Pi(H) = 1$, and Fubini's theorem holds:
$$\int f d \Pi K = \int_{\Omega_1} K f(\omega_1) \Pi(d \omega_1)$$
\end{theorem}
Warning: for $f : \Omega_1 \times \Omega_2 \rightarrow [-\infty, \infty]$ we must assume $f$ is $\Pi K$-integrable; from this we conclude that for $\Pi$-almost every $\omega_1$ both $\int f_{\omega_1}^{+} K(\omega_1, d \omega_2)$ and $\int f_{\omega_1}^{-} K(\omega_1, d \omega_2)$ are finite. Without integrability the iterated integral can exist while the joint integral does not:
Example: Let $\Omega_1 = (0, 1]$ with Borel sets and $\Pi$ the Lebesgue measure. Let $\Omega_2 = \{1, 2\}$ with all subsets, and $K(\omega_1, \{1\}) = K(\omega_1, \{2\}) = 1/2$. Take $f(\omega_1, \omega_2) = \frac{(-1)^{\omega_2}}{\omega_1}$, so $f_{\omega_1} (\omega_2) = -1/\omega_1$ if $\omega_2 = 1$ and $1/ \omega_1$ if $\omega_2 = 2$. Then $K f (\omega_1) = 0$ for every $\omega_1$, but $\int f^+ d \Pi K = \int f^- d \Pi K = \infty$.
\\ \\
Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and $f(\omega, t) : \Omega \times (a,b) \rightarrow \mathbb{R}$. Suppose $\int |f(\omega, t)| \mu (d \omega) < \infty$ for all $t$. Let $J(t) = \int f(\omega, t) \mu( d\omega)$.
\begin{theorem}
Suppose there exist $A \in \mathcal{F}$ with $\mu(A^C) = 0$, an integrable $g(\omega)$, and an open interval $I \subset (a,b)$ containing $t_0$ such that for $\omega \in A$, $f(\omega, t)$ is continuous at $t_0$ and $\sup_{t \in I} | f(\omega, t) | \le g(\omega)$. Then $J(t)$ is continuous at $t_0$: $\lim_{t \to t_0} \int f(\omega, t) \mu(d\omega) = \int f(\omega, t_0) \mu(d \omega)$.
\end{theorem}
\begin{theorem}
Assume the same setup for $\omega \in A$. If $f(\omega, t)$ is differentiable at $t_0$ and $\sup_{t \in I} | \frac{ f(\omega, t) - f(\omega, t_0) }{t - t_0} | \le g(\omega)$, then $J(t)$ is differentiable at $t_0$.
\end{theorem}
Of course, we can't always take limits inside. Example: $f_n(x) = n^2 \delta_{(n, n+1)}(x) \to 0$ pointwise on $(0, \infty)$, but $\int f_n = n^2 \to \infty$.
\\ \\
\subsection{Uniform Integrability}
Most useful for finite measures, so we will cover in terms of probability.
\begin{defn}
A family $f_n: \Omega \rightarrow \mathbb{R}$, $1 \le n < \infty$, is {\bf uniformly integrable} if $\forall \epsilon > 0$, $\exists A$ such that $$\int_{\{\omega : |f_n(\omega) | > A\}} |f_n(\omega) | P(d \omega) < \epsilon$$ for all $n$.
\end{defn}
Example 1 (single functions): If $f$ is integrable, then $\{f \}$ is uniformly integrable. Proof: $f_n(\omega) = f(\omega) \delta_{\{ \omega : |f(\omega)| < n \}}(\omega) \rightarrow f(\omega)$ as $n \to \infty$ for all $\omega$, and $|f_n| \le |f|$. By the dominated convergence theorem, $\int |f_n| dP \rightarrow \int |f| dP$, so $\int_{\{|f| \ge n\}} |f| dP \rightarrow 0$.
\\ \\
Example 2 (finite families): any finite family $f_1, ..., f_n$ of integrable functions is uniformly integrable.
\\ \\
Example 3: $f_n(x) = n^2 \delta_{(n, n+1)}$ is not uniformly integrable.
\\ \\
Example 4: Suppose $f_n$, $1 \le n < \infty$, are integrable and $\exists \epsilon > 0$, $B < \infty$ with $\int |f_n|^{1+\epsilon} dP \le B$ for all $n$ (usually, $\epsilon = 1$). Then $\{ f_n\}$ is uniformly integrable. Proof: $\int_{\{ \omega : |f_n| > A \}} | f_n | dP \le \frac{1}{A^{\epsilon}} \int_{\{ \omega : |f_n| > A \}} |f_n|^{1+\epsilon} dP \le \frac{B}{A^{\epsilon}}$. Choose $A$ large to make this small.
\begin{theorem}
Let $(\Omega, \mathcal{F}, P)$ a probability space, $f_n, f$ measurable with $f_n \to f$ almost surely and $f_n$ uniformly integrable. Then $f$ is integrable and
$$\lim_n \int f_n d P = \int f d P$$
\end{theorem}
\begin{proof}
Given $A$, let $f_n^A (\omega) = f_n(\omega)$ if $|f_n(\omega)| \le A$ and $0$ if $|f_n(\omega) | > A$; similarly for $f^A$. For $A$ fixed (with $P(|f| = A) = 0$), $f_n^A \to f^A$ almost surely and is bounded by $A$. So by the dominated (bounded) convergence theorem, $\int f_n^A d P \rightarrow \int f^A d P$.
\\ \\
Also, choosing $A$ by uniform integrability so each tail integral is $\le 1$, Fatou tells us $\int |f| dP \le \lim \inf \int |f_n| dP \le 1 + A < \infty$, so $f$ is integrable. Now write $\int |f_n - f| dP \le \int_{\{|f_n| > A\}} |f_n| dP + \int_{\{|f| > A\}} |f| dP + \int |f_n^A - f^A| dP$
and choose $A$ large, then $n$ large, to make each term less than $\epsilon/3$.
\end{proof}
\begin{theorem}
Conversely, assume that $f_n \to f$ almost surely, $f_n$ and $f$ are integrable and nonnegative, and $\lim \int f_n dP = \int f dP$. Then $\{f_n\}$ is uniformly integrable.
\end{theorem}
\begin{proof}
For all $A$, $f_n^A \to f^A$ almost surely, so by bounded convergence $\int f_n^A dP \to \int f^A dP$. Writing $\int f_n dP = \int_{\{f_n > A\}} f_n dP + \int f_n^A dP$ and using $\lim \int f_n dP = \int f dP$, we get $\int_{\{f_n > A\}} f_n dP \rightarrow \int_{\{f > A\}} f dP$.
\\ \\
Choose $A$ so that $\int_{\{f > A\}} f dP < \epsilon/3$; then there is some $n_0$ such that $\int_{\{f_n > A\}} f_n dP < 2\epsilon/3$ for all $n > n_0$. Finally choose $A_1 > A$ so that $\int_{\{|f_n| > A_1\}} |f_n| dP < \epsilon$ for the finitely many $n \le n_0$.
\end{proof}
\section{Week 6}
From now on, everything is probability. We'll see: Strong Law, Poisson Convergence, Central Limit Theorem, Weak Convergence. We'll do these all with Stein's method.
\subsection{Tail Fields and Kolmogorov's Zero/One Law}
Have $(\Omega, \mathcal{F}, P)$, $A_n \in \mathcal{F}$ for all $n$.
\begin{defn}
The {\bf tail field} generated by $\{ A_n \}$ is $\tau = \cap_{n=1}^\infty \sigma(A_n, A_{n+1}, ...)$
\end{defn}
If $A \in \tau$, then $A \in \sigma(A_n, A_{n+1}, ...)$ for every $n$, so $A$ doesn't depend on $A_1, ..., A_{n-1}$; that is, $A$ doesn't depend on any finite number of the $A_n$.
\\ \\
Example: Consider $(0, 1]$, Borel sets, $\lambda$. Let $\omega = \sum_{i=1}^\infty \frac{d_i(\omega)}{2^i}$, $A_i = \{ d_i(\omega) = 1\}$. Then $\{ \omega : \lim \frac{1}{n} \sum_{i=1}^n d_i(\omega) \textrm{ exists} \}$ is in $\tau$, but $\{ \omega : \frac{1}{n} \sum_{i=1}^n d_i(\omega) = 1/2 \textrm{ i.o.}\}$ is not $\tau$-measurable.
\begin{theorem}
(Kolmogorov's zero/one law) If $\{A_i\}_{i=1}^\infty$ are independent, then $A \in \tau$ has $P(A) = 0$ or $P(A) = 1$.
\end{theorem}
\begin{proof}
Take $A \in \tau$. Then $A \in \sigma(A_{n+1}, A_{n+2}, ...)$ for each $n$, so $A$ is independent of $A_1, ..., A_n$ for all $n$. Therefore, $A$ is independent of $\sigma(A_1, A_2, ...)$. But $A \in \sigma(A_1, A_2, ...)$, so $A$ is independent of itself. Therefore $P(A) = P(A) P(A)$, so $P(A) = 0$ or $P(A) = 1$.
\end{proof}
Same theorem for random variables: if $X_i$ are independent random variables and $\tau = \cap_{i=1}^\infty \sigma(X_i, X_{i+1}, ...)$, then $P$ is $0$-$1$ on $\tau$.
\\ \\
Might ask if there is a finite version of the 0-1 law. There are...
\\ \\
Usage Example: A different construction of a nonmeasurable set. Let $\mathcal{C}$ be the collection of all subsets of $\{1, 2, ... \}$ with finite complement. Clearly $\emptyset \not\in \mathcal{C}$; if $A, B \in \mathcal{C}$, then $A \cap B \in \mathcal{C}$; and if $A \in \mathcal{C}$, $A \subset B$, then $B \in \mathcal{C}$. Therefore, $\mathcal{C}$ is a filter. Since filters are partially ordered by inclusion, by Zorn's lemma there is a maximal filter containing $\mathcal{C}$; call it $\mathcal{M}$. Then for each $A \subset \mathbb{N}$, either $A \in \mathcal{M}$ or $A^C \in \mathcal{M}$ (i.e. $\mathcal{M}$ is an ultrafilter). Using $\mathcal{M}$, we build $D = \{ \omega \in (0, 1] : A_\omega \in \mathcal{M} \}$ where $A_{\omega} = \{ i : \omega_i = 1\}$ (the binary digits of $\omega$). Claim: $D$ is not Borel (in fact isn't Lebesgue measurable). Proof: First, note $D$ is a tail set: if $\omega \in D$ and $\omega'$ differs from $\omega$ in finitely many places, then $A_\omega \in \mathcal{M}$ forces $A_{\omega'} \in \mathcal{M}$ --- else $A^C_{\omega'} \in \mathcal{M}$, but $|A_\omega \cap A_{\omega'}^C| < \infty$ would put a finite set in $\mathcal{M}$, a contradiction. Now, let $T(\omega_1 \omega_2 \omega_3 ...) = \overline{\omega_1}\, \overline{\omega_2} ...$ (just flipping ones and zeroes). Note that $T$ preserves Lebesgue measure (check on dyadic intervals, or just think about swapping heads and tails on your fair coin) and $T(D) = D^C$. If $D$ were $\lambda$-measurable, then $1 = \lambda(D) + \lambda(D^C) = 2 \lambda(D)$, so $\lambda(D) = 1/2$. But the zero/one law says it can't be.
\subsection{Random Variables}
Recall: A random variable is a Borel measurable function $X: (\Omega, \mathcal{F}) \rightarrow ( \mathbb{R}, \mathcal{B}(\mathbb{R}))$.
\\ \\
Example: Let $X$, $Y$ be independent random variables with $X$ distributed as $\mu$ and $Y$ distributed as $\nu$. Find the law of $X + Y$. Translation: $(\Omega, \mathcal{F}) = \mathbb{R} \times \mathbb{R}$ with Borel sets. Let $P = \mu \times \nu$, $X(x,y) = x$, $Y(x,y) = y$, and $Z(x,y) = x+y$. Integrating sections, $\mu \times \nu(B) = \int_\mathbb{R} \nu(B_x) \mu(dx) = P((X,Y) \in B)$. For Borel $C \subset \mathbb{R}$, take $B = \{ (x,y) : x + y \in C \}$, so $P( Z \in C ) = \int \nu(C - x) \mu(dx) = \int \mu(C-y) \nu(dy)$.
\\ \\
This recipe is called convolution of $\mu$ and $\nu$ and is written $\mu * \nu$.
\\ \\
For instance, if $\mu \sim \eta( m_1, \sigma_1^2)$, $\nu \sim \eta(m_2, \sigma_2^2)$, $\mu * \nu \sim \eta(m_1 + m_2, \sigma_1^2 + \sigma_2^2)$.
\\ \\
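The normal convolution identity can be sanity-checked by simulation (means, variances, and seed below are arbitrary choices):

```python
import random, statistics

random.seed(2)
m1, s1 = 1.0, 2.0       # mu  ~ N(1, 4)
m2, s2 = -0.5, 1.5      # nu  ~ N(-0.5, 2.25)
# sample from mu * nu by adding independent picks
samples = [random.gauss(m1, s1) + random.gauss(m2, s2) for _ in range(200000)]
# mu * nu should be N(m1 + m2, s1^2 + s2^2)
assert abs(statistics.mean(samples) - (m1 + m2)) < 0.05
assert abs(statistics.pvariance(samples) - (s1 ** 2 + s2 ** 2)) < 0.2
```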
Best Reference: Hogg and Craig has lots of convolution examples.
\\ \\
\begin{defn}
Let $X$ be a random variable. $X$ has finite $k$th moment if
$$\int |X(\omega)|^k P(d \omega) < \infty$$
\end{defn}
The standard normal random variable has $k$th moment $0$ for $k$ odd and $(2j - 1)(2j - 3) \cdots 3 \cdot 1$ for $k = 2j$.
Proposition: Let $X, Y$ be independent random variables with $E|X|, E|Y| < \infty$. Then $E|XY| < \infty$ and $E(XY) = E(X) E(Y)$. Proof: Use 1-2-3. Say $X(\omega) = \delta_A(\omega)$, $Y(\omega) = \delta_B(\omega)$; then $XY = \delta_{A \cap B} = \delta_A \delta_B$ and independence gives $P(A \cap B) = P(A) P(B)$. Next treat positive linear combinations, then pass to limits by monotone convergence. In general, $X = X^+ - X^-$, $Y = Y^+ - Y^-$, so $XY = X^+ Y^+ - X^- Y^+ - X^+ Y^- + X^- Y^-$, and each term works.
\begin{theorem}
Kolmogorov's Strong Law: Let $X_1, X_2, ...$ be independent and identically distributed with $E|X_i| < \infty$ and $E(X_i) = \mu$. Then $\lim_{n \to \infty} \frac{s_n}{n} = \mu$ almost surely (where $s_n = X_1 + \cdots + X_n$).
\end{theorem}
\begin{proof}
Due to Etemadi. Four tricks: the Four-T's proof.
\begin{enumerate}
\item $X_1 = X_1^+ - X_1^-$, $\mu = \mu^+ - \mu^-$, so WLOG $X_1 \ge 0$.
\item (Truncation) $Y_i = X_i \delta_{ \{ X_i \le i \} }$, $T_n = \sum_{i=1}^n Y_i$. Fix $\alpha > 1$, let $u_n = \lfloor \alpha^n \rfloor$, $\epsilon > 0$. We show
$$\sum_{n=1}^\infty P \{ |\frac{T_{u_n} - E(T_{u_n})}{u_n}| > \epsilon \} < \infty$$
Use Tchebyshev:
$$Var(T_n) = \sum_{i=1}^n Var(Y_i) \le \sum_{i=1}^n E(Y_i^2)$$
$$ = \sum_{i=1}^n E(X_i^2 \delta_{\{ X_i \le i \}}) \le n E(X_1^2 \delta_{\{X_1 \le n\}})$$
so by Tchebyshev:
$$\sum_{n=1}^\infty P \{ \cdot \} \le \sum_{n=1}^\infty \frac{Var(T_{u_n})}{\epsilon^2 u_n^2} \le \frac{1}{\epsilon^2} \sum_{n=1}^\infty \frac{1}{u_n} E(X_1^2 \delta_{\{ X_1 \le u_n \}})$$
$$= \frac{1}{\epsilon^2} E \left( X_1^2 \sum_{n=1}^\infty \frac{ \delta_{\{X_1 \le u_n\}} }{u_n} \right) $$
For any $x > 0$, let $N_x$ be the smallest integer such that $u_{N_x} > x$. Then we can bound $\sum_{n : u_n \ge x} \frac{1}{u_n} \le 2 \sum_{n \ge N_x} \frac{1}{\alpha^n} = \frac{K}{\alpha^{N_x}} \le \frac{K}{x}$ where $K = \frac{2 \alpha}{\alpha - 1}$, so the expectation above is at most $K E(X_1)$.
So we can bound our original chain by:
$$\le \frac{K}{\epsilon^2} E(X_1) < \infty$$
Use this with $\epsilon = \frac{1}{m}$. Get
$$\lim_{n \to \infty} \frac{T_{u_n} - E(T_{u_n}) }{ u_n} = 0 \textrm{ a.s.}$$
by Borel-Cantelli. So we have the strong law for the truncated variables along the subsequence $u_n$ (the $\alpha$-exponential subsequence). Now we need to work back to $\frac{s_n}{n} \to \mu$.
\item (Remove Truncation) Easy fact (Ces\`aro): if $x_i \to x$ then $\frac{1}{n} \sum_{i=1}^n x_i \to x$. Then
$$E(Y_i) = E(X_i \delta_{\{X_i \le i\}}) \rightarrow \mu$$
so
$$\frac{1}{u_n} \sum_{i=1}^{u_n} E(Y_i) \rightarrow \mu$$
$$\lim \frac{T_{u_n}}{u_n} = \mu \textrm{ a.s.}$$
Now, consider $\sum_{i=1}^\infty P \{X_i \neq Y_i \} = \sum_{i=1}^\infty P \{ X_i > i \} \le \int_0^\infty P \{X_1 > t \} dt = E(X_1) < \infty$. So $P(X_i \neq Y_i \textrm{ i.o.}) = 0$ by Borel-Cantelli, and $\frac{s_{u_n}}{u_n} \to \mu$ almost surely.
\item (Interpolation) Go from theorem on subsequences to theorem everywhere. Given any integer $k$, choose $u_n$ so that $u_n \le k \le u_{n+1}$. Then
$$\frac{u_n}{u_{n+1}} \frac{s_{u_n}}{u_n} \le \frac{s_n}{n} \le \frac{u_{n+1}}{u_n} \frac{s_{u_{n+1}}}{u_{n+1}}$$
so
$$\frac{1}{\alpha} \mu \le \lim \inf \frac{s_n}{n} \le \lim \sup \frac{s_n}{n} \le \alpha \mu \textrm{ a.s.}$$
so take $\alpha = 1 + \frac{1}{m}$ and let $m \to \infty$. Then we have $\lim \frac{s_n}{n} = \mu$.
\end{enumerate}
\end{proof}
That was a pretty slick proof. There were four tricks. Call them the Four T's: Truncation, Tchebyshev, Tsubsequences, inTerpolation. Note: we used identically distributed to replace each $X_i$ by $X_1$, but barely used independence --- only to say the variance of a sum equals the sum of the variances, which needs only pairwise independence. Thus the strong law holds for identically distributed, pairwise independent random variables.
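A quick empirical illustration of the strong law, with $X_i$ i.i.d. Bernoulli$(1/2)$ (any integrable i.i.d. sequence would do; sample size and seed are arbitrary): the running averages $s_n/n$ settle down near $\mu = 1/2$ and stay there.

```python
import random

random.seed(3)
n, mu = 100000, 0.5
s, running = 0, []
for i in range(1, n + 1):
    s += random.random() < mu      # X_i = Bernoulli(1/2), E(X_i) = 1/2
    if i % 10000 == 0:
        running.append(s / i)      # record s_i / i along the way
# s_n / n gets close to mu and stays close
assert all(abs(r - mu) < 0.02 for r in running)
```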
\\ \\
There are stationary $\{X_i\}$ which are pairwise independent, $X_i = \pm 1$, $P(X_i = 1) = \frac{1}{2}$, so have strong law, but central limit theorem fails. (Svante Janson). SLLN is a special case of Ergodic Theorem or Martingale Convergence Theorems.
\\ \\
Difference from Weak Law: Weak Law says $P(|\frac{s_n}{n} - \mu | > \epsilon) \to 0$ as $n \to \infty$. Strong Law says $\lim \frac{s_n}{n} = \mu$ almost surely. The strong law says it gets close and stays close forever (almost surely). There are random variables (they don't have a mean, but might fluctuate symmetrically) such that the weak law holds, but the strong law does not. If you want to know about this, look up ``Unfavorable Fair Games''.
\\ \\
The strong law of large numbers has a very clean statement, but no content, because there's no rate of convergence (the way we do for the weak law). What we would like is given $N$ and $\epsilon > 0$, look at the probability $P(|\frac{s_n}{n} - \mu | < \epsilon \textrm{ for all } n \ge N) = f(N, \epsilon)$.
\\ \\
Fact: $E|X_1| < \infty$ is necessary and sufficient for the strong law of large numbers.
\\ \\
\begin{theorem}
Say $X_i$ are i.i.d. with $E(X_1^-) < \infty$ and $E(X_1^+) = \infty$, so $E(X_i) = \infty$. Then $\lim \frac{s_n}{n} = \infty$ almost surely.
\end{theorem}
\begin{proof}
Since $\frac{1}{n} \sum X_i = \frac{1}{n} \sum X_i^+ - \frac{1}{n} \sum X_i^-$ and $\frac{1}{n} \sum X_i^- \to E(X_1^-) < \infty$ by the strong law, it is enough to treat $X_i \ge 0$ with $E(X_i) = \infty$. Then $\frac{1}{n} \sum_{i=1}^n X_i \ge \frac{1}{n} \sum_{i=1}^n X_i \delta_{\{X_i \le u \}}$
so, by the strong law applied to the truncated (integrable) variables,
$$\lim \inf \frac{1}{n} \sum_{i=1}^n X_i \ge E(X_1 \delta_{\{X_1 \le u \}})$$
and let $u \to \infty$, and we win by monotone convergence.
\end{proof}
\begin{theorem}
For $X \ge 0$, $E(X) = \int_0^\infty P(X \ge t) dt = \int_0^\infty P(X > t) dt$
\end{theorem}
\begin{proof}
Let $X$ take values $0, 1, 2, ...$ with $P(X = i) = p_i$, $\sum_{i=0}^\infty p_i = 1$. Then $E(X) = \sum_{i=0}^\infty P(X > i)$ (check by writing $P(X > i) = \sum_{j > i} p_j$ and switching the order of summation). The same argument works for general discrete values, and by a 1-2-3 proof, we're basically done.
\end{proof}
Example of use: Guessing game. A deck of $n$ cards labeled $1, ..., n$ is shuffled; you try to guess the value of each card in turn and are told only whether you're right or wrong. How should you guess? If you use the optimal strategy, what's the expected number of correct guesses? Take the optimal strategy as given: keep guessing the same value until you're told you're right, then move on to the next value. Then the chance of getting $k$ or more correct is $\frac{1}{k!}$, so by the previous theorem (in its discrete form), $E(X) = \sum_{k=1}^n P(X \ge k) = \sum_{k=1}^n \frac{1}{k!}$, where $X$ is the number correct. As $n \to \infty$, this tends to $e - 1$.
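A quick Monte Carlo check of this example (a sketch, not from the notes; the deck size $n = 20$ and trial count are arbitrary choices):

```python
import math
import random

def play(n, rng):
    """One round: guess value 1 until told 'right', then value 2, and so on.
    The number correct is the largest k with cards 1..k in increasing relative order."""
    deck = list(range(1, n + 1))
    rng.shuffle(deck)
    target, correct = 1, 0
    for card in deck:
        if card == target:   # standing guess was right; advance to the next value
            correct += 1
            target += 1
    return correct

rng = random.Random(0)
n, trials = 20, 100_000
mean = sum(play(n, rng) for _ in range(trials)) / trials
print(mean, math.e - 1)      # the empirical mean should be close to e - 1
```

The chance of $k$ or more correct is the chance that cards $1, \ldots, k$ appear in increasing relative order, which is $1/k!$.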
\section{Week 7}
Poisson approximation and Stein's method. His lecture notes will be on the website for the course.
\\ \\
$X \sim Poi(\lambda), Y \sim Poi(\mu)$
\subsection{Problem 1}
Show $E[(X - \lambda)^3] = E[(X - \lambda)^4]$, $E[(X - \lambda)^5] = E[(X - \lambda)^6]$, ...
\\ \\
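A numerical sanity check on Problem 1 (not part of the notes): for $X \sim Poi(\lambda)$, the third and fourth central moments are $\lambda$ and $\lambda + 3\lambda^2$, so they are not equal, consistent with the remark later in these notes that the claim fails for 3 and 4.

```python
import math

def poisson_central_moment(lam, power, terms=100):
    """E[(X - lam)^power] for X ~ Poisson(lam), by summing the pmf directly."""
    pmf = math.exp(-lam)      # P(X = 0)
    total = 0.0
    for k in range(terms):
        total += (k - lam) ** power * pmf
        pmf *= lam / (k + 1)  # P(X = k+1) from P(X = k)
    return total

lam = 2.0
m3 = poisson_central_moment(lam, 3)   # known value: lam
m4 = poisson_central_moment(lam, 4)   # known value: lam + 3*lam**2
print(m3, m4)                         # approximately 2.0 and 14.0 -- not equal
```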
Poisson Heuristic: $\{X_i\}$ are 0/1-valued random variables and $W = \sum_{i \in I} X_i$. If each $P(X_i = 1)$ is small, $\lambda = E(W) = \sum P_i$ ``is a number'' (neither tiny nor huge), and the $\{X_i\}$ are not too dependent, then
$$P(W = j) \approx \frac{e^{-\lambda} \lambda^j}{j!}$$
\\ \\
\subsection{Problem 2}
We introduce a norm $|| \cdot ||$ on probabilities on $\mathbb{N}$. Let $||P - Q || = \max_{A \subset \mathbb{N}} | P(A) - Q(A) | = \frac{1}{2} \sum_{i=0}^\infty |P(i) - Q(i)| = \frac{1}{2} \max_{||b||_\infty \le 1} |P(b) - Q(b)|$, where $P(b) = \sum_i b(i) P(i)$. The homework problem is to prove these equalities.
\\ \\
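To make the norm concrete, here is a sketch (my own choice of example) computing $||Bin(n, \lambda/n) - Poi(\lambda)||$ via the middle expression $\frac{1}{2}\sum_i |P(i) - Q(i)|$:

```python
import math

def binom_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def tv_distance(n, lam):
    """||Bin(n, lam/n) - Poi(lam)|| computed as (1/2) * sum_k |P(k) - Q(k)|."""
    p = lam / n
    q = [math.exp(-lam)]            # Poisson pmf via q(k+1) = q(k) * lam/(k+1)
    for k in range(n):
        q.append(q[-1] * lam / (k + 1))
    dist = sum(abs(binom_pmf(n, p, k) - q[k]) for k in range(n + 1))
    dist += max(0.0, 1.0 - sum(q))  # Poisson mass above n, where Bin(n, p) has none
    return dist / 2

for n in (10, 100, 1000):
    print(n, tv_distance(n, 1.0))   # the distance shrinks as n grows
```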
Real Example:
\\ \\
(Multiple birthday problem): Drop $n$ balls (people) uniformly at random into $B$ boxes (birthdays). What is the chance that you have $k$ or more balls (people) in the same box (having the same birthday)?
\\ \\
Let $I$ be the set of ${n \choose k}$ $k$-subsets of $\{1, ..., n\}$. Let $X_S = 1$ if the balls in $S$ all fall into the same box, and let $X_S = 0$ otherwise, so that $P(X_S = 1) = \frac{1}{B^{k - 1}}$. Then $P(W = 0)$ is the chance that every $k$-tuple fails.
\\ \\
Just look at this in the notes.
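For the classical case $k = 2$, $n = 23$, $B = 365$, the heuristic gives $\lambda = \binom{23}{2}/365 \approx 0.693$, so $P(W = 0) \approx e^{-\lambda} \approx 1/2$. A simulation sketch (parameters are my choice):

```python
import math
import random

def no_shared_box(n, B, rng):
    """Drop n balls uniformly into B boxes; True if no box gets two or more balls."""
    return len({rng.randrange(B) for _ in range(n)}) == n

rng = random.Random(1)
n, B, trials = 23, 365, 40_000
est = sum(no_shared_box(n, B, rng) for _ in range(trials)) / trials

lam = math.comb(n, 2) / B    # expected number of coincident pairs
print(est, math.exp(-lam))   # both should be near 1/2
```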
\subsection{Problem 3}
Let $k = 2$, and drop $n$ balls into $B$ boxes, where box $i$ has probability $P_i$ with $0 < P_i < 1$ and $\sum_{i=1}^B P_i = 1$. Fix $B$ and the $P_i$. Determine $n$ as a function of $B$ and the $P_i$ such that $P(W = 0) = 1/2$ (with an error estimate).
\subsection{Problem 4}
Consider $n$ boys and $n$ girls. Color the boys and girls $B$ colors. What is the chance that some boy has the same color as some girl?
\\ \\
The 3 basic problems of elementary probability are:
\begin{enumerate}
\item Birthday Problem
\item Coupon Collector's Problem
\item Matching Problem
\end{enumerate}
Coupon Collector's Problem: Drop $n$ balls into $B$ boxes with probabilities $P_i$. What is the probability that you cover every box? Define $X_i = 1$ if box $i$ is empty and $0$ otherwise, and $W = \sum X_i$. We're interested in $P(W = 0)$, which the heuristic says is about $e^{-\lambda}$ with $\lambda = E(W) = \sum_{i=1}^B ( 1 - P_i)^n$.
\\ \\
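A simulation sketch for the uniform case $P_i = 1/B$ (my choice of parameters), where $\lambda = B(1 - 1/B)^n$:

```python
import math
import random

def covers_all(n, B, rng):
    """Drop n balls uniformly into B boxes; True if every box receives a ball."""
    return len({rng.randrange(B) for _ in range(n)}) == B

rng = random.Random(2)
B, n = 20, 100
lam = B * (1 - 1 / B) ** n         # expected number of empty boxes, ~ 0.118
trials = 40_000
est = sum(covers_all(n, B, rng) for _ in range(trials)) / trials
print(est, math.exp(-lam))         # P(W = 0) vs. the Poisson estimate
```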
Matching Problem: 2 decks of cards labeled $1, ..., n$ are shuffled, and one card from each deck is turned up at a time. What is the chance of a match? Let $X_i = 1$ if there is a match at time $i$ and $0$ otherwise. Then $P(W = 0)$ is the chance of no match. In this case, $P_i = 1/n$, so $\lambda = 1$ and $P(W = 0) \approx e^{-1}$.
\\ \\
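The matching problem is equivalent to counting fixed points of a random permutation, so $P(W = 0)$ is the derangement probability, which is essentially $1/e$. A simulation sketch (deck size 52 is my choice):

```python
import math
import random

def has_no_match(n, rng):
    """Shuffle one deck against the other's fixed order; a match at time i
    means the random permutation has a fixed point at i."""
    deck = list(range(n))
    rng.shuffle(deck)
    return all(deck[i] != i for i in range(n))

rng = random.Random(3)
n, trials = 52, 40_000
est = sum(has_no_match(n, rng) for _ in range(trials)) / trials
print(est, math.exp(-1))   # the no-match (derangement) probability is ~ 1/e
```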
\subsection{SIX (or Five) Problems}
1. Let $X \sim Poi(\lambda)$. Then $E(X - \lambda)^{2k-1} = E(X - \lambda)^{2k}$. Turns out not to be true for the third and fourth moments. (So it isn't really a problem.)
\\ \\
2. Birthday problem, probabilities of boxes are $P_i$.
\\ \\
3. Tot. Var. Equalities
\\ \\
4. Birthday problem boys and girls
\\ \\
5. (Stein) $\lambda \ge 3$, show $|f(k) - f(k+1)| \le \frac{1}{\lambda}$.
\\ \\
6. ``Test for clumping''
\\ \\
Last lecture: $|I| < \infty$, the $X_i$ are zero/one valued, $P_i = P(X_i = 1)$, $P_{ij} = P(X_i = X_j = 1)$, $W = \sum_{i \in I} X_i$, $\lambda = \sum_{i \in I} P_i$. Then $|| \mathcal{L}_W - Po_\lambda || \le \min(3, 1/\lambda) \{ \sum_{i \in I} \sum_{j \in N_i, j \ne i} P_{ij} + \sum_{i \in I} \sum_{j \in N_i} P_i P_j \}$. Here we have a dependency graph with vertex set $I$: we put an edge $i \sim j \Leftrightarrow X_i, X_j$ are dependent, and $N_i$ is the neighborhood of $i$ (including $i$ itself).
\\ \\
\begin{proof}
(Stein's Method). Ingredient: Say $Z$ is a nonnegative-integer-valued random variable. Then $Z$ has a Poisson($\lambda$) distribution iff for each bounded $f: \mathbb{N} \to \mathbb{R}$
$$E \{ \lambda f(Z+1)- Z f(Z) \} = 0$$
(This is actually obvious. Just write it all out).
\\ \\
Idea of proof: If $Z$ has $|E( \lambda f(Z+1) - Z f(Z)) |$ small for suitable $f$, then $\mathcal{L}_Z$ is close to $Po_\lambda$.
\\ \\
For all $A \subset \mathbb{N}$, there exists a unique bounded $f : \mathbb{N} \to \mathbb{R}$ with $f(0) = 0$ and $\lambda f(k+1) - k f(k) = \delta_A(k) - P_\lambda(A)$. In fact, $|f(k)| \le 1.25$ and $|f(k+1) - f(k)| \le \min(3, \frac{1}{\lambda})$.
\\ \\
Application: suppose $Z$ satisfies $E(\lambda f(Z+1) - Z f(Z)) = 0$ for all bounded $f$. Choose $f = f_A$. We get $E(\delta_A(Z) - P_\lambda(A)) = 0$, so $P(Z \in A) = P_\lambda(A)$.
\\ \\
Use previous statement to prove theorem.
\\ \\
$$P(W \in A) - P_\lambda(A) = E ( \lambda f(W + 1) - Wf(W))$$
$$ = \sum_{i \in I} E(P_i f(W + 1) - X_i f(W) ) =: \Delta$$
Set $W_i = W - X_i$ and $V_i = \sum_{j \notin N_i} X_j$, so that $V_i$ is independent of $X_i$.
Since $X_i$ is $0/1$-valued and $W = W_i + X_i$,
$$X_i f(W) = X_i f(W_i + 1)$$
$$-\Delta = \sum_i E((X_i - P_i) f(W_i + 1)) + P_i E(f(W_i+1) - f(W+1))$$
$$ = \sum_{i \in I} E \{ (X_i - P_i) ( f(W_i + 1) - f(V_i + 1) ) \} + P_i E(f(W_i + 1) - f(W + 1))$$
where we may subtract $f(V_i + 1)$ because $V_i$ is independent of $X_i$, so $E\{(X_i - P_i) f(V_i + 1)\} = 0$.
so by previous statement
$$|f(W_i + 1) - f(W + 1) | \le \min(3, \lambda^{-1}) X_i$$
Then for all $i$, $f(W_i + 1) - f(V_i + 1)$ is a telescoping sum of terms of the form $f(U + 1) - f(U + X_j + 1)$ with $j \in N_i$.
\\ \\
... See continuation of notes on coursework page.
\end{proof}
Note we still need to prove (**)
\begin{lemma}
For all $A$, there exists a unique $f$ such that $f(0) = 0$, $\lambda f(k+1) - k f(k) = \delta_A(k) - P_{\lambda}(A)$, where $|f(k) | \le 1.25$ and $|f(k+1) - f(k) | \le \min(3, \lambda^{-1})$.
\end{lemma}
\begin{proof}
First write down $f(k)$: $f(0) = 0$, $f(1) = \frac{ \delta_A(0) - P_\lambda(A) }{\lambda}$. A neat way to continue is to multiply the recurrence by $\frac{\lambda^k}{k!}$. Then we get
$$\frac{\lambda^{k+1} f(k+1)}{k!} - \frac{\lambda^k f(k)}{(k-1)!} = \frac{\lambda^k}{k!} (\delta_A(k) - P_\lambda(A))$$
and summing this identity for indices $0, 1, \ldots, k-1$ (the left side telescopes) gives
$$f(k) = \frac{(k-1)!}{\lambda^k} \sum_{j=0}^{k-1} \frac{\lambda^j}{j!} (\delta_A(j) - P_\lambda(A))$$
... See notes on coursework page
\end{proof}
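The closed form above can be checked numerically. The following sketch (an arbitrary choice of $\lambda$ and $A$) verifies the Stein equation and the two bounds from the lemma for small $k$:

```python
import math

lam = 3.0                 # arbitrary choice
A = {0, 2, 5, 7}          # arbitrary subset of N
K = 15

# P_lambda(A), using the pmf recurrence
pmf = [math.exp(-lam)]
for k in range(K + 1):
    pmf.append(pmf[-1] * lam / (k + 1))
PA = sum(pmf[j] for j in A)

def delta(j):
    return 1.0 if j in A else 0.0

def f(k):
    """Closed form f(k) = ((k-1)!/lam^k) * sum_{j<k} (lam^j/j!)(delta_A(j) - P_lam(A))."""
    if k == 0:
        return 0.0
    s = sum(lam**j / math.factorial(j) * (delta(j) - PA) for j in range(k))
    return math.factorial(k - 1) / lam**k * s

for k in range(K):
    # the Stein equation: lam f(k+1) - k f(k) = delta_A(k) - P_lam(A)
    assert abs(lam * f(k + 1) - k * f(k) - (delta(k) - PA)) < 1e-8
    # the bounds from the lemma
    assert abs(f(k)) <= 1.25
    assert abs(f(k + 1) - f(k)) <= min(3.0, 1 / lam)
print("Stein equation and bounds hold for k <", K)
```

(The range of $k$ is kept small because the closed form involves cancellation that loses floating-point precision for large $k$.)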
Remarks: 1. Everything here is finite and explicit; nothing needs to go to infinity. Usually we just write $\mathcal{L}_W \approx Po_\lambda$. Recall we said the strong law of large numbers is pretty empty, since it doesn't come with a rate of convergence. If we want to be quantitative here, we have to ``pay'' with a bound $|| \cdots || \le $ a bit of a mess.
\\ \\
2. The last page of the handout on the web reviews the literature. This treatment (the dependency graph approach) is often called the Chen-Stein method. There's an extremely useful introduction to Stein's method by Arratia-Goldstein-Gordon in Statistical Science.
\\ \\
3. The condition ``no edge if independent'' can be relaxed to ``no edge if almost independent''.
\\ \\
4. A very similar method works to prove the CLT: if $\{X_i\}_{i \in I}$ are real-valued, no $X_i$ dominates the rest, and they're not too dependent, then with $W = \sum_{i \in I} X_i$, $E(W) = \mu$, and $Var(W) = \sigma^2$, we get $\mathcal{L}_W \approx \mathcal{N}(\mu, \sigma^2)$.
\\ \\
5. There are two other approaches to Poisson approximation using Stein's method: size-biased coupling (Barbour's method) and the method of exchangeable pairs.
\section{Week 8}
\subsection{Central Limit Theorem}
Heuristic: if $X_1, ..., X_n$ are random variables, no one of which dominates, and they are not too dependent, then with $S_n = \sum X_i$, $M_n = E(S_n)$, and $\sigma_n^2 = Var(S_n)$,
$$P \left( \frac{S_n - M_n}{\sigma_n} \le x \right) \approx \Phi(x) $$
Notes will be posted online.
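A small illustration of the heuristic (sums of uniforms; entirely my choice of example), comparing the empirical CDF of the standardized sum with $\Phi$:

```python
import math
import random

rng = random.Random(4)

def standardized_sum(n, rng):
    """Sum of n iid Uniform(0,1): mean n/2, variance n/12; return the standardized value."""
    s = sum(rng.random() for _ in range(n))
    return (s - n / 2) / math.sqrt(n / 12)

n, reps = 30, 50_000
samples = [standardized_sum(n, rng) for _ in range(reps)]

fracs = {}
for x, phi in [(0.0, 0.5), (1.0, 0.8413)]:
    fracs[x] = sum(s <= x for s in samples) / reps
    print(x, fracs[x], phi)    # empirical CDF vs. Phi(x)
```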
\section{Week 9}
Example: random variables that are pairwise independent, but for which the CLT fails.
\\ \\
Let $\epsilon_1, ..., \epsilon_n$ be independent $\pm 1$ with probability $1/2$ each. Let $I = \{ (i, j) : 1 \le i < j \le n \}$. Let $X_{(i, j)} = \epsilon_i \epsilon_j$.
\\ \\
These are random variables with mean 0. Build a triangular array whose $n$-th row is the $X_{(i, j)}$, enumerated in some way (so there are ${n \choose 2}$ entries in the $n$-th row). Let $S_n = \sum X_{(i, j)}$ over the $n$-th row. Observe $E(X_{(i,j)}) = 0$, $Var(S_n) = {n \choose 2}$, and the $X_{(i, j)}$ are pairwise independent.
\\ \\
Claim: The CLT fails here. Look at
$$\frac{S_n}{\sqrt{{n \choose 2}}} = \frac{1}{\sqrt{{n \choose 2}}} \cdot \frac{ (\sum_i \epsilon_i)^2 - \sum_i \epsilon_i^2 }{2} \sim \frac{1}{\sqrt{2}} \left\{ \left( \sum_{i=1}^n \frac{\epsilon_i}{\sqrt{n}} \right)^2 - 1 \right\} \Rightarrow \frac{1}{\sqrt{2}} (Z^2 - 1)$$
where $Z^2 \sim \chi_1^2$, which is not normal.
\\ \\
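The identity $\sum_{i<j} \epsilon_i \epsilon_j = ((\sum_i \epsilon_i)^2 - n)/2$ used above, and the resulting lower bound $-1/\sqrt{2}$ on the standardized sum (impossible for a normal limit), can be checked by simulation (a sketch; $n = 50$ is an arbitrary choice):

```python
import random

rng = random.Random(5)

def s_n(n, rng):
    """S_n = sum_{i<j} eps_i eps_j, checking the identity
    sum_{i<j} eps_i eps_j = ((sum_i eps_i)^2 - n) / 2 along the way."""
    eps = [rng.choice((-1, 1)) for _ in range(n)]
    direct = sum(eps[i] * eps[j] for i in range(n) for j in range(i + 1, n))
    t = sum(eps)
    assert 2 * direct == t * t - n   # the algebraic identity from the notes
    return direct

n = 50
scale = (n * (n - 1) / 2) ** 0.5     # sqrt of Var(S_n) = C(n, 2)
vals = [s_n(n, rng) for _ in range(200)]
# the limit (Z^2 - 1)/sqrt(2) is bounded below by -1/sqrt(2) ~ -0.707,
# so the standardized S_n never falls much below that -- unlike a normal
print(min(v / scale for v in vals))
```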
\subsection{Weak Convergence}
\begin{defn}
On $\mathbb{R}$, suppose $F_n(x), F(x)$ are distribution functions. Say that $F_n \Rightarrow F$ (weak convergence) if $F_n(x) \to F(x)$ for every $x$ which is a continuity point of $F$.
\end{defn}
Example: Suppose that $F_n \leftrightarrow X_n$, where $X_n = 1 - 1/n$ with probability $1/2$ and $X_n = 1 + 1/n$ with probability $1/2$. Let $F \leftrightarrow X = 1$. Then $F_n(1) = 1/2 \not\to F(1) = 1$, but $F_n(x) \to F(x)$ everywhere else (and $x = 1$ is not a continuity point of $F$, so $F_n \Rightarrow F$).
\begin{defn}
We say $X_n \to_p X$ ($X_n$ converges in probability to $X$) if for all $\epsilon > 0$, $P \{ | X_n - X | > \epsilon \} \to 0$. Note that for this, we need $X_n$ and $X$ to be jointly defined.
\end{defn}
\begin{theorem}
Let $X_n$, $X$ be jointly defined. Then $X_n \to X$ almost surely $\Rightarrow X_n \to_p X \Rightarrow (X_n \Rightarrow X)$. All converses are false.
\end{theorem}
\begin{proof}
If $X_n \to X$ almost surely, then $\delta_{\{|X_n - X| > \epsilon\}} \to 0$ almost surely, so $P(|X_n - X| > \epsilon) \to 0$ by dominated convergence. Now say $X_n \to_p X$. Then for all $x \in \mathbb{R}$, $P \{ X \le x - \epsilon \} - P \{ |X - X_n| \ge \epsilon \} \le P \{X_n \le x \} \le P \{X \le x + \epsilon \} + P \{ |X - X_n | \ge \epsilon \}$, and letting $\epsilon \to 0$ through continuity points of $F_X$ finishes.
\end{proof}
\begin{theorem}
(Slutsky's Theorem) Let $(X_n, Y_n)$ be jointly defined. Let $X_n \Rightarrow X$ and $X_n - Y_n \to_p 0$. Then $Y_n \Rightarrow X$.
\end{theorem}
\begin{proof}
Take any $x \in \mathbb{R}$. Choose continuity points $y' < x < y''$ of $F_X(\cdot)$. Given this choice, pick $\epsilon > 0$ such that $y' < x - \epsilon < x < x + \epsilon < y''$. Then $P \{ X_n \le y' \} - P \{ |X_n - Y_n| \ge \epsilon \} \le P \{ Y_n \le x \} \le P \{ X_n \le y'' \} + P \{ |X_n - Y_n | \ge \epsilon \}$. Taking $n \to \infty$ gives $P \{X \le y' \} \le \liminf P \{ Y_n \le x \} \le \limsup P \{Y_n \le x \} \le P \{X \le y'' \}$.
\\ \\
Now, let $x$ be a point of continuity of $F_X$. Squeeze $y' \uparrow x$ and $y'' \downarrow x$ through continuity points, and we're done.
\end{proof}
\subsection{Fourier Transforms and Weak Convergence}
Recall: let $F$ be the distribution function of a random variable $X$, and suppose $F$ is continuous and strictly increasing. Then $F(X)$ is uniform on $[0,1]$: $P(F(X) \le x) = P(X \le F^{-1}(x)) = F(F^{-1}(x)) = x$.
\\ \\
Also, if $F$ is any distribution function and $U$ is uniform on $[0,1]$, then $X = F^{-1}(U)$ has distribution function $F$.
\\ \\
So what happens if $F$ is not strictly monotone or is not continuous? Define $F^{-1}(u) = \inf \{ x : u \le F(x) \}$. Then the previous statements hold generally.
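A sketch of the generalized inverse and inverse-transform sampling (the bisection bounds and the Exp(1) example are my choices, not from the notes):

```python
import math
import random

def quantile(F, u, lo=-1e6, hi=1e6, iters=60):
    """Generalized inverse F^{-1}(u) = inf{x : u <= F(x)}, located by bisection.
    Assumes F is a nondecreasing CDF with F(lo) < u <= F(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if u <= F(mid):     # F(mid) already >= u: the infimum is at or left of mid
            hi = mid
        else:
            lo = mid
    return hi

# inverse transform sampling: X = F^{-1}(U) has distribution function F
F_exp = lambda x: 1 - math.exp(-x) if x > 0 else 0.0   # Exp(1) CDF

rng = random.Random(6)
samples = [quantile(F_exp, rng.random()) for _ in range(10_000)]
mean = sum(samples) / len(samples)
print(mean)   # Exp(1) has mean 1
```

The bisection keeps the leftmost point where $F \ge u$, so it also handles flat stretches of $F$, matching the $\inf$ in the definition.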
\begin{theorem}
(Skorokhod) Suppose $F_n$, $F$ are distribution functions on $\mathbb{R}$ with $F_n \Rightarrow F$. Then there exist a probability space $(\Omega, \mathcal{F}, P)$ and random variables $Y_n$, $Y$ on it with $P\{Y_n \le x \} = F_n(x)$, $P(Y \le x) = F(x)$, and $Y_n(\omega) \to Y(\omega)$ for each $\omega$.
\end{theorem}
\begin{proof}
Let $\Omega = (0, 1]$ with Lebesgue measure. Set $Y_n(u) = F_n^{-1}(u)$ and $Y(u) = F^{-1}(u)$, where the inverses are defined as above.
\\ \\
Pick $u \in (0, 1)$. Choose $\epsilon > 0$ and $x$ with $Y(u) - \epsilon < x < Y(u)$ and $P_F\{x\} = 0$ (i.e., $x$ is a continuity point of $F$). We must have $u > F(x)$, so $u > F_n(x)$ for all $n$ sufficiently large, so $x < Y_n(u)$ for $n$ sufficiently large. Thus $Y(u) - \epsilon < \liminf Y_n(u)$, and letting $\epsilon \to 0$, $Y(u) \le \liminf Y_n(u)$. Similarly, for $u < u'$ we get
$$\limsup Y_n(u) \le Y(u')$$
and letting $u' \downarrow u$ gives $\limsup Y_n(u) \le Y(u)$ if $u$ is a point of continuity of $Y$.
\\ \\
So $Y_n(u) \to Y(u)$ can fail at only the countably many points of discontinuity of $Y$. Set $Y_n'(u) = Y'(u) = 0$ at these points and $Y_n' = Y_n$, $Y' = Y$ elsewhere. A countable set of $u$'s has measure zero, so still $P(Y' \le x) = F(x)$, and now $Y_n'(u) \to Y'(u)$ for all $u$.
\end{proof}
Comment: If $P_n$, $P$ are probabilities on a complete separable metric space, $P_n \Rightarrow P$ means $\int_\Omega f(\omega) P_n(d \omega) \to \int_\Omega f(\omega) P(d \omega)$ for all bounded, continuous $f$. Skorokhod's theorem says $P_n \Rightarrow P \Leftrightarrow \exists Y_n, Y$ on a common space with $P(Y_n \in B) = P_n(B)$, $P(Y \in B) = P(B)$, and $Y_n \to Y$ almost surely.
\\ \\
Corollary: on $\mathbb{R}$, if $F_n \Rightarrow F$ then for all bounded and continuous $f$, $\int_{-\infty}^\infty f(x) F_n(dx) \rightarrow \int_{-\infty}^\infty f(x) F(dx)$.
\\ \\
Corollary: if $F_n \Rightarrow F$ then $d_n(t) = \int_{-\infty}^\infty e^{itx} F_n(dx) \to d(t) = \int_{-\infty}^\infty e^{itx} F(dx)$.
\begin{theorem}
(Continuity Theorem) $F_n \Rightarrow F \Leftrightarrow d_n(t) \rightarrow d(t)$ for all $t \in (-\infty, \infty)$. That is, if the Fourier transforms converge, the distribution functions do as well.
\end{theorem}
\begin{proof}
$\Leftarrow$\\