\documentclass{article}
\usepackage{latexsym}
\usepackage{amsmath}
\usepackage{amssymb}
\begin{document}
\title{Measure Theory notes}
\author{Dave Neary}
\maketitle
\section{Motivation for Lebesgue integral}
\subsection{Overview of the Riemann integral}
The Riemann integral of a continuous function $f:\mathbb{R} \rightarrow \mathbb{R}$ is defined as:
\[ \int_{a}^{b}f(x) dx = \lim_{n \rightarrow \infty}\sum_{i=1}^{n}f(x_{i}) \Delta x \]
where $x_{i} = a + (i-1)\Delta x$ and $\Delta x=\frac{b-a}{n}$.
In other words, we partition the domain of the function into small slices,
and calculate the area under the curve by multiplying the width of the slices
by the value of the function at the beginning of the slice.
This works well for a certain class of functions, called Riemann-integrable
functions. These functions must be defined on an interval of $\mathbb{R}$,
and the limit above must exist.
More generally, we can calculate an upper Riemann sum $U(f)$ by summing the
areas using $\sup f(x)$ on each interval of the partition, and a lower sum $L(f)$ by using $\inf f(x)$
on each interval. The function $f$ is Riemann integrable when $\lim L(f) = \lim U(f)$.
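\textbf{Example:} As a small worked check of these definitions (the choice of $f(x)=x$ on $[0,1]$ and the names $L_n$, $U_n$ for the sums over $n$ equal slices are ours, for illustration only), split $[0,1]$ into $n$ slices of width $\Delta x = \frac{1}{n}$. Since $f$ is increasing, the infimum on each slice is attained at its left end and the supremum at its right end, so
\[ L_n(f) = \sum_{i=1}^{n} \frac{i-1}{n}\cdot\frac{1}{n} = \frac{n-1}{2n}, \qquad
   U_n(f) = \sum_{i=1}^{n} \frac{i}{n}\cdot\frac{1}{n} = \frac{n+1}{2n} \]
Both sums tend to $\frac{1}{2}$ as $n \rightarrow \infty$, so $f$ is Riemann integrable on $[0,1]$ with $\int_0^1 x\,dx = \frac{1}{2}$.
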
\subsection{Limitations of the Riemann integral}
It is possible to generalize the Riemann integral to two
or more dimensions, but finding an appropriate partition of the domain becomes
harder as the dimension grows, which limits the usefulness of the Riemann integral
on higher-dimensional spaces $\mathbb{R}^n$. In addition, we would like to consider other classes
of domains than the reals for functions - for example, probability spaces or generic
Hilbert spaces - where some alternative idea of the area under the curve (or more
generally, the volume of a set) may make sense. Another limitation of the Riemann
integral is that there are useful classes of functions for which it does not converge,
but for which a reasonable value for the integral exists.
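\textbf{Example:} A standard illustration of this last point (not taken from the notes above, and given here only as a sketch) is the indicator function of the rationals on $[0,1]$:
\[ f(x) = \left\{
\begin{array}{ll}
	1 & x \in \mathbb{Q}\\
	0 & x \notin \mathbb{Q}
\end{array} \right.
\]
Every slice of $[0,1]$, however narrow, contains both rational and irrational points, so on every slice $\sup f = 1$ and $\inf f = 0$. Every upper sum is therefore $1$ and every lower sum is $0$, and the Riemann integral does not exist. Since the rationals are only a countable set, it is nevertheless reasonable to expect the integral of $f$ to be $0$.
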
A further limitation of the Riemann integral is that there is only a very limited set of
functions for which it is possible to say
\[\int \sum_n f_n(x) dx = \sum_n \int f_n(x) dx \]
The standard sufficient condition is that the partial sums $\sum_{n=1}^{N} f_n(x)$ converge
uniformly to their limit $f(x)$, which is a very strong constraint.
As a result of these limitations, the idea of the Lebesgue integral is to partition
the function range instead of the domain. We then identify the subsets of the domain for
specific values of $f(x)$, and calculate their volume using a generic measure function
$\mu$. By taking finer and finer intervals of the range, we can get better and better
estimates of the volume under the function with respect to the domain and the measure.
The remainder of this document will describe the characteristics of a domain, the
constraints required for a measure, which types of functions we can integrate, and
a precise definition of the Lebesgue integral. We will also include a selection of
proofs and problems which we can use the Lebesgue integral to solve.
\section{Lebesgue Measure}
Working backwards, to define what we mean by an integrable function, we will need to
first define how to measure the volume of a subset of a domain (a measure), and to
define a measure, we must first define the types of sets which will be measurable.
\subsection{Measurable spaces}
Starting from a set $X$, a collection of subsets of $X$, $\mathcal{A}$, is called a
$\sigma$-algebra if it satisfies the following conditions:
\begin{enumerate}
\item $X \in \mathcal{A}$
\item For each $A \in \mathcal{A}$, $X \setminus A \in \mathcal{A}$
  \item For a countable sequence of sets $(A_n)_{n \in \mathbb{N}}$ with each $A_n \in \mathcal{A}$,
\[\bigcup_{n} A_n \in \mathcal{A} \]
\end{enumerate}
We will see when we define a measure why this is called a $\sigma$-algebra.
The pair $(X, \mathcal{A})$ is called a measurable space.
Given any collection $\mathcal{C}$ of subsets of $X$, we can generate a
smallest $\sigma$-algebra which contains $\mathcal{C}$. That is, there is a $\sigma$-algebra
$\mathcal{A}$ such that if $\mathcal{B}$ is a $\sigma$-algebra containing $\mathcal{C}$, then
$\mathcal{A} \subseteq \mathcal{B}$. We say that such a $\sigma$-algebra $\mathcal{A}$ is
generated by $\mathcal{C}$.
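\textbf{Example:} To make the idea of a generated $\sigma$-algebra concrete, here is a small case (the set $X=\{1,2,3\}$ and the collection $\mathcal{C}=\{\{1\}\}$ are chosen purely for illustration). Any $\sigma$-algebra containing $\{1\}$ must also contain $X$, $\emptyset$, and the complement $X \setminus \{1\} = \{2,3\}$. The collection
\[ \mathcal{A} = \{\emptyset, \{1\}, \{2,3\}, X\} \]
is already closed under complements and unions, so it is the $\sigma$-algebra generated by $\mathcal{C}$. Note that it is strictly smaller than the power set of $X$: the sets $\{2\}$ and $\{3\}$ are not in $\mathcal{A}$.
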
\subsubsection{Examples}
\begin{enumerate}
\item \textbf{Exercise:} If $X=\{1,2,3,4\}$, and the $\sigma$-algebra $\mathcal{A}$ is
generated by $\{\{1,2\},\{2,3\}\}$, what are the other members of
$\mathcal{A}$? \\
\textbf{Answer:} By condition 1 above, $X=\{1,2,3,4\} \in \mathcal{A}$,
and by condition 2, since $X \in \mathcal{A}$, $X \setminus X = \emptyset
\in \mathcal{A}$. Similarly, since $\{1,2\}$ and $\{2,3\} \in \mathcal{A}$,
$X \setminus \{1,2\} = \{3,4\}$ and $X \setminus \{2,3\} = \{1,4\} \in
\mathcal{A}$. By rule 3, $\{1,2\} \cup \{2,3\} = \{1,2,3\}$ and $\{2,3\} \cup
\{3,4\} = \{2,3,4\} \in \mathcal{A}$. And by rule 2 again, $X \setminus
\{1,2,3\} = \{4\}$ and $X \setminus \{2,3,4\} = \{1\} \in \mathcal{A}$. Back to
rule 3, $\{4\} \cup \{1,2\} = \{1,2,4\}$ and $\{1\} \cup \{3,4\} = \{1,3,4\} \in
\mathcal{A}$. Finally, $X \setminus \{1,2,4\} = \{3\}$ and $X \setminus
  \{1,3,4\} = \{2\} \in \mathcal{A}$. Since each of the individual elements of
  $X$ is in a subset on its own, we can now create all possible subsets
  of $X$ by taking unions. So $\mathcal{A} = \mathcal{P}(X)$, the power set of $X$.
  \item \textbf{Exercise:} Prove that if $\mathcal{A}$ is a $\sigma$-algebra on $X$ and
  $(A_n)_{n \in \mathbb{N}}$ is a countable sequence of sets in $\mathcal{A}$, then
  \[\bigcap_{n} A_n \in \mathcal{A} \]
\textbf{Answer:} Define a sequence of sets $B_n = X \setminus A_n$. Then,
by condition 2, $B_n \in \mathcal{A}$ for all $n$. By condition 3,
\[ B = \bigcup_n B_n \in \mathcal{A} \]
\[ X \setminus B \in \mathcal{A} \]
  By De Morgan's laws,
\[
X \setminus \bigcup_n B_n = \bigcap_n (X \setminus B_n) = \bigcap_n A_n
\]
So $\bigcap_n A_n \in \mathcal{A}$. QED.
  \item \textbf{Exercise:} $(X, \mathcal{A})$ is a measurable space, with $Y \subset X$.
  Prove that $(Y,\mathcal{A}^\prime)$ is a measurable space, where
  $\mathcal{A}^\prime = \{A \cap Y \mid A \in \mathcal{A}\}$.\\
  \textbf{Answer:} Since $\emptyset \in \mathcal{A}$, $\emptyset \cap Y =
  \emptyset \in \mathcal{A}^\prime$. Similarly, since $X \in \mathcal{A}$,
  $X \cap Y = Y \in \mathcal{A}^\prime$. \\
  For any $A \in \mathcal{A}$, $X \setminus A \in \mathcal{A}$, so
  $(X \setminus A) \cap Y \in \mathcal{A}^\prime$, and $(X \setminus A)
  \cap Y = (X \cap Y) \setminus (A \cap Y) = Y \setminus (A \cap Y)$.
  So if $A \cap Y \in \mathcal{A}^\prime$, then its complement in $Y$,
  $Y \setminus (A \cap Y)$, is also in $\mathcal{A}^\prime$.\\
Finally, let $(A_i)_{i \in \mathbb{N}}$ be a sequence of sets in $\mathcal{A}$. Then
\[\bigcup_{i \in \mathbb{N}} A_i \in \mathcal{A} \]
  Define a sequence $(B_i)_{i \in \mathbb{N}}$ with $B_i = Y \cap A_i$ for all $i$.
  Then
  \begin{equation}
  \begin{split}
    \bigcup_{i \in \mathbb{N}} B_i &= \bigcup_{i \in \mathbb{N}} (Y \cap A_i) \\
    &= Y \cap \left(\bigcup_{i \in \mathbb{N}} A_i\right) \in \mathcal{A}^\prime
  \end{split}
  \end{equation}
  Therefore, $(Y, \mathcal{A}^\prime)$ is a measurable space.
\end{enumerate}
\subsection{Measures}
The extended real numbers are the set $\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$.
\textbf{Definition:} A measure is a function $\mu:\mathcal{A} \rightarrow \overline{\mathbb{R}_{0}^{+}}$
(the non-negative extended real numbers, $[0,\infty]$) on a
measurable space $(X, \mathcal{A})$ which satisfies the conditions:
\begin{enumerate}
\item $\mu(\emptyset) = 0$
  \item if $(A_i)_{i \in \mathbb{N}}$ is a sequence of pairwise disjoint
        sets in $\mathcal{A}$ (that is, $A_i \cap A_j = \emptyset$ if $i \ne j$), then:
\[ \mu \left( \bigcup_{i =1}^{\infty} A_i \right) = \sum_{i=1}^{\infty} \mu
\left( A_i \right) \]
\end{enumerate}
This characteristic of being able to turn a countable union of disjoint sets into a sum is why
$\mathcal{A}$ is called a $\sigma$-algebra.
In general, we can think of the measure of a set as its volume, or (for real functions) as
the area under the curve.
A measurable space $(X, \mathcal{A})$ with a measure $\mu$ is called a measure space, and
is written $(X, \mathcal{A}, \mu)$.
We can deduce a number of lemmas from this definition:
\textbf{Lemma:} If $A \subseteq B$ and $A, B \in \mathcal{A}$, then $\mu(A) \le \mu(B)$
\textbf{Lemma:} For a sequence of sets $(A_i)_{i \in \mathbb{N}}$ in $\mathcal{A}$ (not necessarily disjoint),
        \[ \mu \left( \bigcup_{i =1}^{\infty} A_i \right) \le \sum_{i=1}^{\infty} \mu
           \left( A_i \right) \]
\textbf{Lemma:} For a sequence of sets $(A_i)_{i \in \mathbb{N}} \in \mathcal{A}$ with
$A_i \subseteq A_j$ if $i<j$ then
\[ \mu \left( \bigcup_{i =1}^{\infty} A_i \right)= \lim_{i \rightarrow \infty} \mu(A_i) \]
Similarly, for a sequence where $A_i \supseteq A_j$ for $i<j$ and $\mu(A_1) < \infty$,
$\mu \left( \bigcap_{i =1}^{\infty} A_i \right)= \lim_{i \rightarrow \infty} \mu(A_i)$.
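\textbf{Example:} The following sketch illustrates two of these lemmas. It assumes the measure $\mu(A)=|A|$ which counts the elements of $A$ (and is $\infty$ for infinite sets); the particular sets are chosen only for illustration. On $X=\{1,2,3\}$ with $A_1=\{1,2\}$ and $A_2=\{2,3\}$, the sets overlap, and
\[ \mu(A_1 \cup A_2) = 3 \le 4 = \mu(A_1) + \mu(A_2) \]
so the measure of a union can be strictly smaller than the sum when the sets are not disjoint. For decreasing sequences, some care is needed with infinite measures: on $X = \mathbb{N}$ with $A_i = \{i, i+1, i+2, \ldots\}$, every $\mu(A_i)=\infty$, but
\[ \mu\left(\bigcap_{i=1}^{\infty} A_i\right) = \mu(\emptyset) = 0 \]
so the limit formula for intersections can only be relied on when the measures involved are finite, for example when $\mu(A_1) < \infty$.
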
Some examples of measures are the trivial measure $\mu(A)=0$ for all $A \in \mathcal{A}$,
the counting measure $\mu(A) = |A|$ if $A$ is finite, or $\infty$ if it is infinite, and the
Dirac measure $\delta_a(S) = 1$ if $a \in S$ or $0$ otherwise.
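\textbf{Example:} As a quick numerical illustration (the sets here are hypothetical, chosen only to exercise the definitions), take $X=\{1,2,3,4\}$ with $\mathcal{A}=\mathcal{P}(X)$, and let $A=\{1,3\}$ and $B=\{2,4\}$. Writing $\mu$ for the counting measure,
\[ \mu(A) = 2, \qquad \delta_2(A) = 0, \qquad \delta_2(B) = 1 \]
Since $A$ and $B$ are disjoint with $A \cup B = X$, additivity can be checked directly:
\[ \mu(A \cup B) = 4 = \mu(A) + \mu(B), \qquad \delta_2(A \cup B) = 1 = \delta_2(A) + \delta_2(B) \]
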
\textbf{Exercise:} Prove that the trivial, counting, and Dirac functions are measures.
\begin{itemize}
	\item \textbf{Trivial measure:} if $\mu(A)=0$ for all $A$, then
	      $\mu(\emptyset)=0$, and for a pairwise disjoint collection of
	      sets $(A_i)_{i \in \mathbb{N}}$ in $\mathcal{A}$,
	      \[ \mu\left( \bigcup_{i =1}^{\infty} A_i \right) = 0 \]
	      and since $\mu \left( A_i \right) = 0$ for all $i$,
	      \[ \sum_{i=1}^{\infty} \mu \left( A_i \right) = 0 =
	         \mu\left( \bigcup_{i=1}^{\infty} A_i \right) \]
	      Therefore, $\mu$ is a measure.
	\item \textbf{Counting measure:} $\mu(\emptyset) = |\emptyset|=0$.
	      For a collection of pairwise disjoint sets
	      $(A_i)_{i \in \mathbb{N}}$ in $\mathcal{A}$, if
	      \[ \mu\left( \bigcup_{i =1}^{\infty} A_i \right) < \infty \]
	      then the union is a finite set, so each $\mu(A_i)$ is finite, and
	      only finitely many of the sets $A_i$ are non-empty. Each element of
	      $\bigcup_{i =1}^{\infty} A_i$ is an element of exactly one $A_i$,
	      because the sets are pairwise disjoint, and each element of each $A_i$
	      is also an element of $\bigcup_{i =1}^{\infty} A_i$ by definition, so
	      counting the elements of the union is the same as adding up the counts $|A_i|$.
	      If instead $\mu\left( \bigcup_{i =1}^{\infty} A_i \right) = \infty$, then
	      the union is an infinite set, so either some $A_i$ is infinite or
	      infinitely many of the $A_i$ are non-empty. In both cases
	      $\sum_{i=1}^{\infty} |A_i| = \infty$.
	      Therefore, in either case,
	      \[ \sum_{i=1}^{\infty} \mu \left( A_i \right) =
	         \sum_{i=1}^{\infty} |A_i| =
	         \mu\left( \bigcup_{i =1}^{\infty} A_i \right) \]
	      Therefore, the counting measure $\mu(A)=|A|$ is a measure.
	\item \textbf{Dirac measure:} For an element $a \in X$, $\delta_a(A)=0$
	      if $a \notin A$.
	      Since $a \notin \emptyset$, $\delta_a(\emptyset) = 0$.
	      Consider a pairwise disjoint collection of sets
	      $(A_i)_{i \in \mathbb{N}}$ in $\mathcal{A}$. If
	      $a \in \bigcup_{i =1}^{\infty} A_i $ then
	      \[\delta_a \left(\bigcup_{i=1}^{\infty} A_i \right) = 1 \]
	      and $a \in A_i$ for some $i \in \mathbb{N}$, and since
	      the sets $(A_i)$ are pairwise disjoint, $\delta_a(A_i)=1$ and
	      $\delta_a(A_j)=0$ for $j \ne i$. Then
	      \[ \sum_{i=1}^{\infty} \delta_a \left( A_i \right) = 1 =
	         \delta_a\left( \bigcup_{i=1}^{\infty} A_i \right) \]
	      If $a \notin \bigcup_{i =1}^{\infty} A_i $ then
	      $a \notin A_i$ for all $i$, and
	      \[ \sum_{i=1}^{\infty} \delta_a \left( A_i \right) = 0 =
	         \delta_a\left( \bigcup_{i=1}^{\infty} A_i \right) \]
	      Therefore, $\delta_a$ is a measure.
\end{itemize}
\textbf{Exercise:} Prove the lemmas above.
\subsection{Measurable functions}
Let $\left(\mathcal{X}, \mathcal{A}\right)$ and $\left(\mathcal{Y}, \mathcal{B}\right)$
be measurable spaces. A function $f:\mathcal{X} \rightarrow \mathcal{Y}$ is measurable with
respect to the $\sigma$-algebras $\mathcal{A}, \mathcal{B}$ if, for each subset
$B \in \mathcal{B}$, $f^{-1}(B) \in \mathcal{A}$ (where $f^{-1}(B)$ is the pre-image of
the set $B$ under the function $f$, $\{x \in \mathcal{X} : f(x) \in B\}$).
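\textbf{Example:} A small finite case may help here (all of the sets and functions below are illustrative assumptions, not part of the definition). Let $\mathcal{X}=\{1,2,3,4\}$ with $\mathcal{A} = \{\emptyset, \{1,2\}, \{3,4\}, \mathcal{X}\}$, and $\mathcal{Y}=\{0,1\}$ with $\mathcal{B}=\mathcal{P}(\mathcal{Y})$. The function $f$ with $f(1)=f(2)=0$ and $f(3)=f(4)=1$ is measurable, since
\[ f^{-1}(\emptyset)=\emptyset, \quad f^{-1}(\{0\}) = \{1,2\}, \quad f^{-1}(\{1\}) = \{3,4\}, \quad f^{-1}(\mathcal{Y}) = \mathcal{X} \]
are all in $\mathcal{A}$. The function $g$ with $g(1)=g(3)=1$ and $g(2)=g(4)=0$ is not measurable, because $g^{-1}(\{1\}) = \{1,3\} \notin \mathcal{A}$.
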
Thinking about useful collections of sets for a measurable space, for functions mapping into
$\overline{\mathbb{R}}$, we can generate a $\sigma$-algebra from the set of open intervals
in $\mathbb{R}$, together with the points at infinity.
For the open interval $A=(a,b)$, we define $\lambda (A) = b-a$. The measure which extends
this assignment to the $\sigma$-algebra generated by the open intervals
is called the Lebesgue measure.
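\textbf{Example:} Assuming that $\lambda$ is a measure on the $\sigma$-algebra generated by the open intervals, a few values follow directly from the properties of a measure (this is a sketch, not a construction of $\lambda$). For disjoint intervals the lengths add:
\[ \lambda\left((0,1) \cup (2,5)\right) = (1-0) + (5-2) = 4 \]
A single point $\{a\}$ is contained in $(a-\epsilon, a+\epsilon)$ for every $\epsilon > 0$, so $\lambda(\{a\}) \le 2\epsilon$ for every $\epsilon > 0$, and hence $\lambda(\{a\})=0$. Since the rationals are a countable disjoint union of single points,
\[ \lambda(\mathbb{Q}) = \sum_{q \in \mathbb{Q}} \lambda(\{q\}) = 0 \]
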
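A $\sigma$-algebra $\mathcal{A}$ generated from all of the open subsets of $\mathcal{X}$
is called the Borel $\sigma$-algebra. It is a useful concept, because it ties the measurable
sets to the topology of $\mathcal{X}$: every open set and every closed set is measurable,
and every continuous function is measurable with respect to the Borel $\sigma$-algebras, so we
can use many of the useful theorems from topology too. Note that the Borel $\sigma$-algebra is
not in general a topology itself, since it need not be closed under uncountable unions.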
\textbf{Reminder:} A topological space is a nonempty set $X$ plus a set of subsets $A$ possessing
the properties:
\begin{enumerate}
  \item $X, \emptyset \in A$
  \item If $O_1 \in A$ and $O_2 \in A$, then $O_1 \cap O_2 \in A$
  \item For any collection of sets $\left(O_i\right)_{i\in I}$ in $A$, countable or
        uncountable, the union $\bigcup_{i\in I} O_i \in A$
\end{enumerate}
\textbf{Exercise:}
\textbf{Exercise:}
\subsection{Lebesgue Integral}
We can now pull all of these ideas together to define the Lebesgue integral.
We define the characteristic function $\chi_E(x)$ of
the set $E$:
\[ \chi_E(x)=\left\{
\begin{array}{ll}
1 & x \in E\\
0 & x \notin E
\end{array} \right.
\]
A linear combination
\[ \phi(x) = \sum_{i=1}^{n}a_i\chi_{E_i}(x) \]
is called a simple function if each of the sets $E_i$ is measurable, so that $\phi$ is a
measurable function, and $\phi$ assumes only a finite number of values
$\{a_1,a_2,\ldots,a_n\}$.
The Lebesgue integral of a simple function $\phi$ is
\[\int_X \phi \, d\mu = \int_X \left( \sum a_i \chi_{A_i}(x)\right) d\mu = \sum a_i \mu(A_i) \]
where $A_i = \phi^{-1}(a_i)$ (that is,
$A_i = \{x:\phi(x)=a_i\}$) for each of the values $a_i$ that $\phi$ assumes. One consequence
of this definition is that the sets $\left(A_i\right)$ are pairwise disjoint.
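\textbf{Example:} To see the formula in action, consider a simple function on $\mathbb{R}$ with the Lebesgue measure $\lambda$ (the particular values and intervals are made up for illustration):
\[ \phi(x) = 2\chi_{[0,1)}(x) + 5\chi_{[1,3)}(x) \]
Here $\phi^{-1}(2) = [0,1)$ and $\phi^{-1}(5) = [1,3)$, so
\[ \int_{\mathbb{R}} \phi \, d\lambda = 2\lambda([0,1)) + 5\lambda([1,3)) = 2 \cdot 1 + 5 \cdot 2 = 12 \]
The value $0$, which $\phi$ takes outside $[0,3)$, contributes nothing to the sum, using the usual convention $0 \cdot \infty = 0$. Similarly, $\int_{\mathbb{R}} \chi_{\mathbb{Q}} \, d\lambda = 1 \cdot \lambda(\mathbb{Q}) = 0$, since the rationals are countable and so have Lebesgue measure zero.
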
We can define the Lebesgue integral for a measurable non-negative function $f:X \rightarrow
\overline{\mathbb{R}^{+}}$ with respect to a measure space $(X, \mathcal{A},\mu)$ as:
\[ \int_X f(x) d\mu = \sup\left\{\int_X \phi(x) d\mu: 0 \le \phi(x) \le f(x), \phi \textrm{ a
simple function} \right\} \]
In other words, we look over all of the simple functions that are less than $f$, and take
the supremum across all of them.
For functions which are not non-negative, we split $f(x)$ into
\[ g(x) = \max(f(x),0) \]
and
\[ h(x) = - \min(f(x),0) \]
Then, provided at least one of the two integrals on the right is finite,
\[\int_X f(x) d\mu = \int_X g(x) d\mu - \int_X h(x) d\mu \]
In other words, we split $f(x)$ into two non-negative functions, one representing the positive
part of $f$, and one representing the absolute value of the negative part of $f$, and we can
calculate the final integral by removing the negative area from the positive area.
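\textbf{Example:} As a sketch of this splitting, take $f(x)=x$ on $X=[-1,2]$ with the Lebesgue measure $\lambda$ (an illustrative choice; the values below rely on the fact that for a continuous function on a bounded interval the Lebesgue and Riemann integrals agree). Then $g(x) = \max(x,0)$ is zero on $[-1,0]$ and equal to $x$ on $[0,2]$, while $h(x) = -\min(x,0)$ is equal to $-x$ on $[-1,0]$ and zero on $[0,2]$, so
\[ \int_X g \, d\lambda = \int_0^2 x\,dx = 2, \qquad \int_X h \, d\lambda = \int_{-1}^{0} (-x)\,dx = \frac{1}{2} \]
and
\[ \int_X f \, d\lambda = 2 - \frac{1}{2} = \frac{3}{2} \]
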
For any continuous function $f: X \rightarrow \overline{\mathbb{R}^{+}}$, we can construct
a sequence of simple functions $\{f_n(x)\}$ which converges pointwise to $f(x)$ as follows.
For $f_n(x)$, partition the range into $2^{2n}+1$ disjoint intervals $\{I_{n,i}\}$ with
\[ I_{n,i} = \left\{
\begin{array}{ll}
\left[\frac{i-1}{2^n},\frac{i}{2^n}\right) & 1 \le i \le 2^{2n} \\[3pt]
\left[\frac{i-1}{2^n},\infty\right) & i=2^{2n} + 1
\end{array} \right. \]
Then define $A_{n,i} = f^{-1}(I_{n,i})$, the preimage of $I_{n,i}$, for $1 \le i \le 2^{2n} + 1$.
The collection $\{I_{n,i}\}$ covers $[0,\infty)$ for all $n$. The simple functions
\[ f_n(x) = \sum_{i=1}^{2^{2n}+1}\frac{(i-1)}{2^n}\chi_{A_{n,i}}(x) \]
form an increasing sequence of functions which converges pointwise to $f(x)$.
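\textbf{Example:} To get a feel for the construction, here are the first few values at a single point, assuming $f(x)=x^2$ and $x=1.3$, so that $f(x)=1.69$ (both chosen only for illustration). For each $n$, $f_n(x)$ is the left endpoint of the interval $I_{n,i}$ which contains $1.69$:
\[
\begin{array}{lll}
	n=1: & 1.69 \in \left[\frac{3}{2},\frac{4}{2}\right) & f_1(x) = 1.5 \\[3pt]
	n=2: & 1.69 \in \left[\frac{6}{4},\frac{7}{4}\right) & f_2(x) = 1.5 \\[3pt]
	n=3: & 1.69 \in \left[\frac{13}{8},\frac{14}{8}\right) & f_3(x) = 1.625 \\[3pt]
	n=4: & 1.69 \in \left[\frac{27}{16},\frac{28}{16}\right) & f_4(x) = 1.6875
\end{array}
\]
The values never decrease as $n$ grows, and they approach $f(x) = 1.69$ from below.
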
\textbf{Exercise:} Prove that the sequence $f_n(x)$ above converges pointwise to $f(x)=x^2$ for all
$x \in \overline{\mathbb{R}^{+}}$.
\section{Lebesgue Integrals and Probability Theory}
Probability distributions all share some common characteristics which make measure theory a
natural tool for studying them. Given a sample space $\Omega$ of possible outcomes, an event space
$\mathcal{A}$, which is a $\sigma$-algebra on $\Omega$, and a probability measure $P$, the measure space
$(\Omega, \mathcal{A}, P)$ is called a probability space if:
\begin{enumerate}
\item $P(\emptyset)=0$
\item $P(\Omega)=1$
\item if $\{A_{i}\}_{i=1}^{\infty } \subseteq {\mathcal{A}}$ is a countable collection of
pairwise disjoint sets, then:
        \[ P\left(\bigcup _{i=1}^{\infty }A_{i}\right) = \sum_{i=1}^{\infty} P(A_{i}) \]
\end{enumerate}
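\textbf{Example:} A fair six-sided die gives a simple instance of these conditions (the example and the name $R$ for the roll are ours, offered as a sketch). Take $\Omega=\{1,2,3,4,5,6\}$, $\mathcal{A}=\mathcal{P}(\Omega)$, and $P(A) = \frac{|A|}{6}$, which is the counting measure scaled so that $P(\Omega)=1$. The event ``the roll is even'' is $\{2,4,6\}$, the disjoint union of three single outcomes, and
\[ P(\{2,4,6\}) = P(\{2\}) + P(\{4\}) + P(\{6\}) = \frac{1}{6}+\frac{1}{6}+\frac{1}{6} = \frac{1}{2} \]
The function $R(\omega)=\omega$ is a simple function on $\Omega$, and its Lebesgue integral with respect to $P$ is
\[ \int_\Omega R \, dP = \sum_{k=1}^{6} k \, P(\{k\}) = \frac{21}{6} = \frac{7}{2} \]
which is the expected value of the roll.
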
\end{document}