
Notions from measure theory


$ \newcommand{\calF}{{\mathcal F}} \newcommand{\calP}{{\mathcal P}} \newcommand{\calB}{{\mathcal B}} \newcommand{\calD}{{\mathcal D}} \newcommand{\ind}{{\mathbb 1}} \newcommand{\area}{\mathrm {area}} $

This page will develop the concept of Lebesgue integration starting from Riemann integration.

Riemann's approach

Video for this section

We all know about Riemann integration. We shall illustrate the idea with a positive, bounded function $f:[a,b]\rightarrow{\mathbb R}$. The idea is to measure the area under its graph by approximating it with step functions with finitely many steps. We do this from both above and below. For this we partition the domain of the function into finitely many intervals and raise rectangles on them as follows.
The intuition is that if we take finer and finer partitions and raise the red rectangles as much as we can under the graph, we shall come arbitrarily close to the area under the graph. If we do the same from above the graph using the blue rectangles, then also we should come arbitrarily close to the same area. So our intuition dictates that
sup(red area) = inf(blue area),
and we plan to use this common value as the area under the curve. This brilliant intuition has just one loophole: for many functions the sup does not equal the inf! We call such functions non-Riemann integrable, and try to avoid them at all costs. However, these bad functions cannot be completely avoided, as they crop up naturally from time to time, usually as the limit of Riemann integrable functions.
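
Here is a small computational sketch of Riemann's recipe (ours, purely for illustration; the choice $f(x)=x^2$ on $[0,1]$ is arbitrary, and the inf/sup over each piece is approximated by sampling):

```python
# Lower and upper Riemann sums for f(x) = x^2 on [0, 1].
# The inf/sup of f over each piece is approximated by sampling,
# which is good enough to illustrate sup(red) vs inf(blue).

def riemann_sums(f, a, b, n, samples=100):
    h = (b - a) / n
    lower = upper = 0.0
    for i in range(n):
        ys = [f(a + i * h + j * h / samples) for j in range(samples + 1)]
        lower += min(ys) * h   # red: tallest rectangle fitting under the graph
        upper += max(ys) * h   # blue: shortest rectangle covering the graph
    return lower, upper

for n in (4, 16, 64, 256):
    lo, hi = riemann_sums(lambda x: x * x, 0.0, 1.0, n)
    print(f"n = {n:3d}   red area: {lo:.6f}   blue area: {hi:.6f}")
# Both columns squeeze towards 1/3, the area under the curve.
```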

Problem set 1

EXERCISE 1:  Draw the upper and lower set of rectangles for the following function using the given partition of the domain.

Lebesgue's approach

Video for this section

Lebesgue had a solution for this. Instead of partitioning the domain his plan was to partition the codomain. So he also got red rectangles below the graph and blue rectangles above as follows.
And like Riemann he also hoped that
sup(red area) = inf(blue area),
and he wanted to call this the area under the curve. Before exploring this idea further, let's get comfortable with splitting the codomain. Here is how you get the red approximation: draw horizontal lines through the given partition of the codomain. This gives you some storeys, as in a multi-storeyed building. For each point of the graph in a storey, bring it down to the floor of the storey to get the red approximation. Raise it to the ceiling of the storey to get the blue approximation.
The black point gives rise to the red and blue points.
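
The following sketch (ours, with an arbitrary choice of function and storeys) carries out exactly this snapping-to-floor and raising-to-ceiling numerically:

```python
# Codomain partition ("storeys") at heights 0, 0.25, 0.5, 0.75, 1 for
# f(x) = sin(pi x) on [0, 1]: each sampled graph point is brought down
# to the floor of its storey (red) or raised to the ceiling (blue).
import bisect
from math import sin, pi

def lebesgue_areas(f, a, b, levels, samples=100000):
    h = (b - a) / samples
    red = blue = 0.0
    for i in range(samples):
        y = f(a + (i + 0.5) * h)
        k = bisect.bisect_right(levels, y) - 1            # floor of the storey
        red += levels[k] * h
        blue += levels[min(k + 1, len(levels) - 1)] * h   # ceiling of the storey
    return red, blue

red, blue = lebesgue_areas(lambda x: sin(pi * x), 0.0, 1.0,
                           [0.0, 0.25, 0.5, 0.75, 1.0])
print(red, blue)   # red < 2/pi ~ 0.6366 < blue; refining the storeys
                   # pushes both towards 2/pi
```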

Problem set 2

EXERCISE 2: Consider the following graph of a bounded, nonnegative function. Finitely many values are marked on the $y$-axis.

Draw horizontal lines through them, and obtain the red and blue areas.

EXERCISE 3: Repeat for the following function.

Riemann vs Lebesgue

Video for this section

Just based on these diagrams, you may think that Lebesgue's idea is no different from Riemann's idea. But actually, Lebesgue's approximations are more flexible than Riemann's. To understand this, look at the graph below, where we have shown the lower Lebesgue approximation using just 4 points in the codomain.
Just three heights, but so many rectangles!
Each value in the codomain can give birth to many rectangles, depending on the ups and downs of the curve.

Indeed, a single height can give rise to infinitely many "rectangles"! For instance, the function $$f(x) =\left\{\begin{array}{ll}1&\text{if }x\in{\mathbb Q}\cap[0,1]\\ 0&\text{otherwise.}\end{array}\right. $$ (the so-called Dirichlet function) takes only two values, 0 and 1. Yet each value is taken infinitely often. So you can now feel why Lebesgue's approximations are more flexible than Riemann's:
Riemann's approximations are special cases of Lebesgue's approximations, but not vice versa.

As a result, the sup(red) and inf(blue) match for a more general class of functions. This also shows that if Riemann's sup(red) and inf(blue) areas meet, then so must Lebesgue's, and the meeting point would be the same.

Now we shall take a rigorous look at Lebesgue's idea. First we need a name for the functions that Lebesgue is using to approximate areas. We shall call them simple functions.

Definition: Simple function A function is called simple if it takes only finitely many values.

We can express a simple function mathematically using indicator functions. Let a simple function take only the values $c_1,...,c_k$ (all distinct). Let $A_i = \{\omega\in\Omega~:~f(\omega) = c_i\}.$ An example is shown below.
Clearly the $A_i$'s partition $\Omega$.
The $A_i$'s need not always be just finite unions of intervals. For example, in the case of the Dirichlet function, we have just two $A_i$'s, one being ${\mathbb Q}\cap [0,1]$ and the other ${\mathbb Q}^c\cap [0,1].$ However, we always have only finitely many $A_i$'s. We can now write the simple function as $$f(\omega) = \sum_{i=1}^k c_i\ind_{A_i}(\omega).$$ Lebesgue wanted to think that each $c_i$ contributes a "rectangle" with height $c_i$ on the base $A_i.$ Such a "rectangle" should have area $c_i\times$ length of $A_i$. But how do we measure the lengths of the $A_i$'s? It is this question that first led him to create measure theory.
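
Before answering it, here is a tiny sketch (ours) of the formula $\int f = \sum_i c_i\times\mbox{length}(A_i)$ under the optimistic assumption that each base is a finite union of disjoint intervals; the Dirichlet example shows exactly where this assumption breaks down:

```python
# A simple function stored as (value, base) pairs, each base a finite
# union of disjoint intervals; its integral is sum of value * length(base).
# (This storage assumption fails for bases like Q n [0,1].)

def length(intervals):
    """Total length of a finite union of disjoint intervals [(l, r), ...]."""
    return sum(r - l for l, r in intervals)

def integrate_simple(pairs):
    """Integral of sum_i c_i 1_{A_i}, each A_i given as a list of intervals."""
    return sum(c * length(base) for c, base in pairs)

# f = 2 on [0,1) u [3,4),  f = 5 on [1,3),  f = 0 elsewhere
f = [(2.0, [(0.0, 1.0), (3.0, 4.0)]), (5.0, [(1.0, 3.0)])]
print(integrate_simple(f))   # 2*2 + 5*2 = 14.0
```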

Problem set 3

EXERCISE 4: There are two countries $T$ and $S.$ Every inhabitant of $T$ is at least as tall as every inhabitant of $S.$ Consider the two statements:

  1. the height of the shortest inhabitant of $T$ equals that of the tallest inhabitant of $S.$
  2. the height of the shortest lady of $T$ equals that of the tallest lady of $S.$
Then which of the following is/are true?
  1. $1\Rightarrow 2$ but $2\not\Rightarrow 1$
  2. $2\Rightarrow 1$ but $1\not\Rightarrow 2$
  3. $1\Leftrightarrow 2$
  4. $1\not\Rightarrow 2$ and $2\not\Rightarrow 1$

EXERCISE 5: Let $A$ be the set of all step functions with domain $[0,1]$ with finitely many steps. Let $B$ be the set of all simple functions with the same domain. Then which of the following is true?

  1. $A\subsetneq B$
  2. $B\subsetneq A$

First taste of measure theory

Video for this section

Suppose that we are trying to define the Lebesgue integral over an interval $[a,b].$ Then we need to define 'length's of subsets of $[a,b].$ Instead of writing 'length' each time, we shall use the notation $length(A)$ to mean the 'length' of $A\subseteq[a,b].$ Proceeding in this way we can reasonably hope to extend the definition of $length$ to a class of subsets of ${\mathbb R}$ which contains all the intervals, and is closed wrt countable union, intersection and complementation. Can we really do this? The short answer is "Yes". We shall next see a more detailed answer.

Borel $\sigma$-algebra

Video for this section

First, let us make it clear what we mean by sets generated by a (possibly infinite) collection of subsets of $\Omega.$ For the finite case, we expanded the collection of sets step by step, and the procedure stopped after finitely many steps. If we start with infinitely many sets, the process won't end. So we need a careful definition of what we mean by the collection of sets generated by a given initial collection of sets:

Definition: Generated $ \sigma $-algebra Let $\calF$ be the initial (nonempty) collection of subsets of $\Omega.$ We shall call a collection (denoted by $\sigma(\calF)$) of subsets of $\Omega$ the $\sigma$-algebra generated by $\calF$ if

  1. it is closed under complementation and countable unions,
  2. it contains $\calF$, and
  3. it is the smallest collection satisfying the two conditions above.

By the way, any nonempty collection satisfying the first condition is called a $\sigma$-algebra.
Here the term "smallest" needs some attention. How do we know that there is indeed such a "smallest" collection?

[Because...]
The power set of $\Omega$ is a $\sigma$-algebra containing $\calF$. Also, the intersection of any arbitrary number of $\sigma$-algebras is again a $\sigma$-algebra, as is trivially obvious from the definition. So we can take $\sigma(\calF)$ to be the intersection of all $\sigma$-algebras containing $\calF.$
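
For a finite $\Omega$ the generated $\sigma$-algebra can even be computed by brute force: keep closing the collection under complementation and union until nothing new appears. A sketch (ours, with toy sets; note that two sets in general position generate 16 sets, exactly the count in the toy example further below):

```python
# Brute-force generation of sigma(F) for a finite Omega: close the
# collection under complementation and (here, finite) unions until
# nothing new appears.

def generated_sigma_algebra(omega, collection):
    omega = frozenset(omega)
    sigma = {frozenset(s) for s in collection} | {omega, frozenset()}
    changed = True
    while changed:
        changed = False
        current = list(sigma)
        for a in current:
            if omega - a not in sigma:       # complementation
                sigma.add(omega - a)
                changed = True
            for b in current:
                if a | b not in sigma:       # union
                    sigma.add(a | b)
                    changed = True
    return sigma

A, B = {1, 2, 3, 4, 5, 6}, {4, 5, 6, 7, 8, 9}
print(len(generated_sigma_algebra(range(1, 11), [A, B])))   # 16
```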

For our case, the initial collection consists of all the intervals.

Definition: Borel $ \sigma $-algebra The Borel $ \sigma $-algebra over $[a,b]$ is defined as the $\sigma$-algebra over $[a,b]$ generated by all the intervals in $[a,b].$

We shall denote the Borel $\sigma $-algebra over $[a,b]$ by $\calB([a,b])$ or just $\calB$ if $[a,b]$ is obvious from the context. The members of $\calB([a,b])$ are called Borel sets of $[a,b].$

Caratheodory extension

Video for this section

The Borel $\sigma$-algebra is generated by the intervals. Since we know how to define lengths of intervals, it seems a reasonable hope that we should be able to define length of each $B\in\calB$. Let us explore this hope with a little toy example.

A toy example

Let $\Omega=[0,10]$ which has length $10$. Also let $A,B$ be two intervals in $\Omega$ each having length $6$. Convince yourself that you can generate a collection of 16 sets starting with these. If you are confused, then look at the following diagram.
Now consider the question: since you know the lengths of $\Omega, A$ and $B,$ can you find out the lengths of all these 16 sets? The answer is "No", because you need to know how much $A$ and $B$ overlap. So you also need to know $length(A\cap B).$

It should not be difficult to see that this works for any finite initial collection, $\{A_1,...,A_n\}.$ If I tell you their lengths and the lengths of all the possible intersections, then you can find out the lengths of all the sets in the collection generated by these. This result has a direct proof by induction.
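
In code the bookkeeping for the toy example looks like this (a sketch; the value taken for $length(A\cap B)$ is made up, and the whole point of the discussion above is that it must be supplied as an extra input):

```python
# length(Omega) = 10 and length(A) = length(B) = 6 are given in the text;
# length(A n B) = 3 is a HYPOTHETICAL extra input.  Every one of the 16
# sets is a disjoint union of the four "atoms" below, so additivity gives
# its length as a sum of atom lengths.
len_Om, len_A, len_B = 10.0, 6.0, 6.0
len_AB = 3.0

atoms = {
    "A n B":     len_AB,
    "A minus B": len_A - len_AB,
    "B minus A": len_B - len_AB,
    "(A u B)^c": len_Om - (len_A + len_B - len_AB),
}
print(atoms)
print("length(A u B) =", len_A + len_B - len_AB)   # inclusion-exclusion: 9.0
```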

Now, in our case, we know the lengths of all intervals in $[a,b].$ So we are starting with an infinite collection of sets for which the lengths are known. Since intersections of intervals are again intervals (possibly empty), we indeed know the lengths of all possible intersections, as well. Unfortunately, our induction argument does not work in the infinite case. Of course, intuitive imagination leads us to hope that it would be so. But intuitive imagination sometimes leads to wrong results:
Sum of two natural numbers is a natural number. So by induction the sum of any finite number of natural numbers is also a natural number. So by the law of intuitive imagination, the sum of all natural numbers $1+2+3+\cdots$ should again be a natural number! Proof: Just keep on adding!
However, intuitive imagination is not always a liar. It does lead us to a plausible guess, which we need to prove/disprove using rigorous argument (in this case, using limits). And that's precisely what we shall do now.

As we have already commented it seems to be a reasonable hope that we should be able to define $length(B)$ for all $B\in\calB.$ This hope indeed comes true, thanks to the following theorem.

Caratheodory extension (one version) Let $\Omega$ be a nonempty set and $\calF$ be a collection of its subsets, closed under (finite) intersection, such that $\Omega\in\calF.$ Suppose we have a function $\mu:\calF\rightarrow[0,\infty)$ that is countably additive, with $\mu(\Omega) < \infty.$ Then $\mu$ extends uniquely to a measure on $\sigma(\calF).$

Proof: We shall not prove this in this course. [QED]

Our reasonable hope comes true once we take $\calF$ to be the collection of all intervals in $[a,b].$

Thus we are now able to define $length(B)$ for all Borel $B\subseteq[a,b].$ Usually, $length(B)$ is called the Lebesgue measure of $B,$ and is denoted by $\lambda(B).$ If $B$ is not Borel, then we take $\lambda(B)$ to be undefined.

Problem set 4

The following exercises show that what we did with "length" can be done with "probability", as well.

EXERCISE 6:  Consider a nonempty set $\Omega$ and two subsets $A,B$ in it. What is the smallest collection of subsets of $\Omega$ that contains $A,B$ and is closed under union, intersection and complementation? In other words, make a list of all subsets of $\Omega$ that you can make using union, intersection and complementation starting with $A$ and $B.$ Call this collection $\calB.$ Is this a $\sigma$-algebra?

EXERCISE 7: (Continuation of the last exercise) Take $\calF = \{A,B\}.$ Then clearly $\calB$ is the $\sigma$-algebra generated by $\calF.$ Does this mean that if I specify probabilities for all elements of $\calF,$ then that would uniquely determine probabilities of all elements of $\calB?$ Just to get you started, I am specifying $P(A) = 0.4$ and $P(B) = 0.5.$ Compute (if possible) the probabilities of all events in $\calB.$

Back to Lebesgue integral

As not all subsets are measurable, Lebesgue naturally restricted his attention to only those simple functions $\sum_i c_i\ind_{A_i}$ where each $A_i\in\calB.$

Historically, Lebesgue did not use Borel $\sigma$-algebra. He had constructed his own $\sigma$-algebra (which is now called the Lebesgue $\sigma$-algebra). It is a much larger superset of the Borel $\sigma$-algebra, and is also more difficult to work with. That is why most modern treatments of measure theory work with the Borel $\sigma$-algebra instead of the Lebesgue $\sigma$-algebra.

The next step in Lebesgue's intuition is to approximate the given function using simple functions, from below and from above.

We want to make sup(red) and inf(blue) equal. It turns out (a non-trivial theorem) that this will happen if and only if $$\forall B\in\calB~~f ^{-1} (B)\in \calB.$$ Such functions are called measurable functions. A word of warning here: don't forget that we are working with only nonnegative, bounded functions over $[a,b].$ So we are stating that such a function is Lebesgue integrable iff it is measurable.

You will recall that we had arrived at precisely the same condition while defining random variables.
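
On a finite space the measurability condition can be checked by brute force. Here is a sketch (ours; the space, the $\sigma$-algebras and the functions below are toy choices):

```python
# Check  f^{-1}(B) in F  for every B in the sigma-algebra of the codomain.
from itertools import chain, combinations

def power_set(s):
    s = list(s)
    return {frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))}

def is_measurable(f, omega, F, codomain_sigma):
    return all(frozenset(w for w in omega if f[w] in B) in F
               for B in codomain_sigma)

omega = {1, 2, 3, 4}
F = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(omega)}

f = {1: 0, 2: 0, 3: 1, 4: 1}   # constant on the atoms {1,2} and {3,4}
g = {1: 0, 2: 1, 3: 0, 4: 1}   # splits the atoms
print(is_measurable(f, omega, F, power_set({0, 1})))   # True
print(is_measurable(g, omega, F, power_set({0, 1})))   # False
```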

Problem set 5

EXERCISE 8: Find $\int f\, d \lambda,$ where $$f(x) = \left\{\begin{array}{ll}1&\text{if }x\in{\mathbb Q}^c\cap [0,1]\\ 0&\text{otherwise.}\end{array}\right. $$ What is $\int_0^1 f(x)\, dx$ using Riemann integration?

EXERCISE 9: Prove that $\sum_i c_i\ind_{A_i}$ is measurable if and only if $\forall i~~A_i\in\calB.$

Generalising the solution

Lebesgue's approach can be generalised in different ways. We shall discuss these now.

Allowing unbounded functions and unbounded domains

Since the Lebesgue integral exists for all bounded, non-negative measurable functions, it is enough to consider only the sup of the approximations from below. This immediately allows us to define the Lebesgue integral for unbounded, measurable functions as well. We just allow the sup to be $\infty.$

Similarly we may now carry out the procedure over an unbounded domain, like ${\mathbb R}$ or $(0,\infty)$ etc. There are just a few wrinkles, and they are easily sorted out. Now that we are allowing the Lebesgue integral to equal $\infty,$ we need a little shift in terminology: we shall say that a nonnegative function is Lebesgue integrable if its Lebesgue integral is finite.

With this shift in terminology it is possible to have measurable functions that are not Lebesgue integrable (just take some unbounded, measurable function with Lebesgue integral infinite).

Allowing negative values

Moving from non-negative functions to general functions is easy. For $f:{\mathbb R}\rightarrow{\mathbb R}$ we define $f_+ =\max\{f,0\}$ and $f_- =\max\{-f,0\}.$ Then $f = f_+-f_-.$ We define $\int f\, d \lambda = \int f_+\, d \lambda -\int f_-\, d \lambda,$ provided at least one of the two integrals on the right-hand side is finite.

We shall say that a function $f$ (possibly taking both positive and negative values) is Lebesgue integrable if both $f_+$ and $f_-$ are Lebesgue integrable. This is equivalent to requiring $|f|$ to be Lebesgue integrable.
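
A one-screen sketch (ours) of the decomposition and of the identity $|f| = f_+ + f_-$ that underlies this equivalence:

```python
# Positive and negative parts: f = f_plus - f_minus, |f| = f_plus + f_minus.

def f_plus(f):
    return lambda x: max(f(x), 0.0)

def f_minus(f):
    return lambda x: max(-f(x), 0.0)

f = lambda x: x ** 3 - x          # a function changing sign
for x in (-2.0, 0.5, 2.0):
    fp, fm = f_plus(f)(x), f_minus(f)(x)
    assert fp - fm == f(x) and fp + fm == abs(f(x))
    print(x, fp, fm)
```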

Problem set 6

EXERCISE 10: Find $\int f(x)\, d \lambda$ where $f:{\mathbb R}\rightarrow{\mathbb R}$ is defined as $f(x) =\left\{\begin{array}{ll}2&\text{if }x\in{\mathbb Q}^c\\-1&\text{otherwise.}\end{array}\right.$

EXERCISE 11: Find $\int f(x)\, d \lambda$ where $f(x)=x.$

EXERCISE 12: Show that if $\int f\, d \lambda$ exists, then so must $\int (-f)\, d \lambda$ and $\int (-f)\, d \lambda = -\int f\, d \lambda.$

General measures

For a simple function $\sum_{i=1}^n c_i 1_{A_i}$ we defined the Lebesgue integral as $$\sum_{i=1}^n c_i \lambda(A_i),$$ where $c_i$ is the height of the $i$-th rectangle, and $\lambda(A_i)$ is the "length" of its base. Then we took supremum and infimum etc. It turns out that the entire process of defining Lebesgue integrals needs only two properties of the Lebesgue measure: its non-negativity and its countable additivity over disjoint sets. The fact that the length of $(a,b)$ is $b-a$ is not important. This motivates the following generalisation of the concept of "length":

Let us write down the definition of a measure clearly.

Definition: Measure Let $\Omega$ be any non-empty set. Let $\calF$ be any $\sigma$-algebra on $\Omega.$ Then by a measure we understand a function $\mu:\calF\rightarrow[0,\infty]$ such that $$\forall \mbox{disjoint }A_1,A_2,...\in\calF~~\mu(\cup A_i) = \sum\mu(A_i)$$
Notice that we have allowed $\mu(A) = \infty.$ For instance, "length" of $(0,\infty)$ is $\infty,$ and "area" of ${\mathbb R}^2$ is also $\infty.$

Everything else now follows as in the case of "length": we have "red rectangles" from below and "blue rectangles" from above. For a very general class of functions we have sup(red) = inf(blue). For any such nice function $f$, we define this common value to be the Lebesgue integral of $f$ wrt $\mu$, and write it as $$\int f\, d\mu.$$ In particular, if $f$ is itself a simple function $$f(x) = \sum_{i=1}^n c_i 1_{A_i},$$ then we have $$\int f\, d\mu = \sum_{i=1}^n c_i \mu(A_i).$$ Of course, we need all the $A_i$'s to be in the $\sigma$-algebra we are using. Otherwise, $\mu(A_i)$ would not make any sense.
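
Here is the formula in action for a measure other than "length" (a sketch, ours; $\mu$ below is a weighted counting measure on a three-point set, with made-up weights):

```python
# Integral of a simple function w.r.t. a general measure mu:
# sum_i c_i * mu(A_i).  Here mu is a weighted counting measure.

weights = {"a": 1.0, "b": 2.0, "c": 0.5}

def mu(A):
    return sum(weights[w] for w in A)

def integrate_simple(pairs):
    """pairs = [(c_i, A_i), ...] with the A_i's disjoint subsets of Omega."""
    return sum(c * mu(A) for c, A in pairs)

# f = 3 on {a, b}, f = 10 on {c}
print(integrate_simple([(3.0, {"a", "b"}), (10.0, {"c"})]))  # 3*3 + 10*0.5 = 14.0
```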

Problem set 7

EXERCISE 13: Let $\Omega = \{1,...,9\}.$ We define, for any $A\subseteq\Omega$, its measure as $\mu(A) = |A|,$ the number of elements in $A.$ Consider $f:\Omega\rightarrow{\mathbb R}$ as $f(i) = i (\mbox{mod }2) + 1.$ Find $\int f\, d\mu.$

EXERCISE 14: Show that any $\sigma$-algebra on $\Omega$ must contain $\phi$ and $\Omega.$

EXERCISE 15: Is $\calP(\Omega)$ a $\sigma$-algebra on $\Omega?$

[Hint]

Yes. It is the largest $\sigma$-algebra on $\Omega.$

EXERCISE 16: Is $\{\phi,\Omega\}$ a $\sigma$-algebra on $\Omega?$

[Hint]

Yes, it is the smallest $\sigma$-algebra on $\Omega.$

EXERCISE 17: Show that for any measure $\mu$ we have $\mu(\phi)=0.$

EXERCISE 18: If $\mu(A) = 3$ and $\mu(B) = 5.5$ and $\mu(A\cap B) = 2$, then what is $\mu(A\cup B)?$

When we take some probability measure in place of $\mu,$ we get the familiar definition of expectation.

Expectation as Lebesgue integral

We have mentioned earlier that for Lebesgue integrals sup(red)=inf(blue) for a rather general class of functions. In particular, when we worked with $f$ defined on ${\mathbb R}$ equipped with $\calB$ and $\lambda$, we had given a characterisation of all functions for which the Lebesgue integral $\int f\, d \lambda$ exists: $$\forall B\in\calB~~f ^{-1}(B)\in\calB.$$ It is not unexpected that if $f$ is defined on $\Omega$ equipped with $\calF$ and $\mu,$ then the characterisation is $$\forall B\in\calB~~f ^{-1}(B)\in\calF.$$ Such functions are called measurable functions. These functions are very nicely behaved. In particular, limits of measurable functions are again measurable. This is the most important reason for preferring Lebesgue integration over Riemann integration. If $f_n\rightarrow f$ pointwise, and $\int f_n\, d \mu$ exist for each $n,$ then the existence of $\int f\, d \mu$ is immediately guaranteed.

Interestingly, the concept of a measurable function also arises in a different way in probability theory. Let $X$ be a random variable. We know that underlying every random variable there is a random experiment, or more precisely, a probability space $(\Omega,\calF, P)$ such that $X:\Omega\rightarrow{\mathbb R}.$ When we want to talk about things like $P(X\in B)$ we actually mean $P(\{\omega~:~X(\omega)\in B\})$ or $P(X ^{-1}(B)).$ In order for this to be defined we need $X ^{-1}(B)\in\calF.$ In other words, we need a random variable to be measurable. Indeed, this is part of the definition of a random variable.

Definition: Random variable Let $(\Omega,\calF, P)$ be a probability space. By a random variable we mean a measurable function $X:\Omega\rightarrow{\mathbb R},$ i.e., $$\forall B\in\calB~~X^{-1}(B)\in\calF.$$

Problem set 8

EXERCISE 25: Characterise measurable functions from $({\mathbb R},\calP({\mathbb R}))$ to $({\mathbb R},\{\phi,{\mathbb R}\}).$

EXERCISE 26: Characterise measurable functions from $({\mathbb R},\{\phi,{\mathbb R}\})$ to $({\mathbb R},\calP({\mathbb R}))$.

Two technical results

Theorem If $f:\Omega\rightarrow[0,\infty)$ is any function, then there is a non-decreasing sequence $(s_n)$ of simple functions such that $$\forall \omega\in\Omega~~s_n(\omega) \uparrow f(\omega).$$
In our course we shall call any such sequence $(s_n)$ a simplification of $f.$ (This is not a standard term.)

Proof: For $n\in{\mathbb N}$ and $\omega\in\Omega$ we define $s_n$ as follows. First partition the codomain $[0,\infty)$ into $2$ intervals $[0,n)$ and $[n,\infty)$ and then subdivide the first into equal subintervals of length $2^{-n}.$ So you get $N=n2^n+1$ subintervals in all. Call these $[a_1,a_2),[a_2,a_3),...,[a_N,a_{N+1}),$ where $a_N=n$ and $a_{N+1}=\infty$. These constitute a partition of the codomain.

Now set $s_n(\omega) = a_k$ if $f(\omega) \in[ a_k,a_{k+1})$ for $k\in\{1,...,N\}$.

The following picture shows this process for $n=1$ and $n=2.$
Notice how the subdivisions for $n=2$ fit into those for $n=1.$
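
In code the construction reads as follows (a sketch, ours; we track the values $s_n(\omega)$ for one fixed $\omega$ at which, hypothetically, $f(\omega)=\pi$):

```python
# s_n snaps the value y = f(w) down to the nearest multiple of 2^{-n}
# below it, except in the top storey [n, infinity), where s_n = n.
import math

def s(n, y):
    if y >= n:
        return float(n)
    return math.floor(y * 2 ** n) / 2 ** n

y = math.pi                                   # hypothetical value of f(w)
print([s(n, y) for n in range(1, 8)])
# [1.0, 2.0, 3.0, 3.125, 3.125, 3.140625, 3.140625]
# non-decreasing in n, and within 2^{-n} of pi once n > pi
```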

For each $\omega\in\Omega$ and for each $n\in{\mathbb N}$ we have $s_n(\omega)\leq s_{n+1}(\omega).$
A blue point cannot fall below the corresponding red point
[Because...]
If $s_n(\omega) = a$ and $s_{n+1}(\omega) = b,$ then $f(\omega)\in[a,a+2^{-n})$ and also $f(\omega)\in[b,b+2^{-n-1}).$

So, by the construction of the partitions, $[b,b+2^{-n-1})\subseteq[a,a+2^{-n}).$

Thus, $a\leq b,$ as required.

Again, for each $\omega\in\Omega$ we have $s_n(\omega)\rightarrow f(\omega).$
[Because...]
To show:

Target
$\forall \omega\in\Omega~~\forall \epsilon>0~~\exists M\in{\mathbb N}~~\forall n\geq M ~~|f(\omega)-s_n(\omega)| < \epsilon.$

$\forall \omega$
Take any $\omega\in\Omega.$

$\forall \epsilon$
Take any $\epsilon>0.$

$\exists M$
Choose $M\in{\mathbb N}$ such that $M> f(\omega)$ and $2^{-M} < \epsilon.$ (Possible since ${\mathbb N}$ is unbounded above and $2^{-n}\rightarrow 0$ as $n\rightarrow \infty.$)

$\forall n$
Take any $n\geq M.$

Check
Since $f(\omega) < M\leq n,$ the value $f(\omega)$ falls in one of the storeys of height $2^{-n},$ and $s_n(\omega)$ is the floor of that storey.

Thus, $f(\omega) \in [s_n(\omega),s_n(\omega)+2^{-n}),$ and so $|f(\omega)-s_n(\omega)| < 2^{-n}\leq 2^{-M} < \epsilon.$
This completes the proof. [QED]

The next step is to show that the red areas indeed converge to the supremum (since we are also allowing the supremum to be $\infty,$ we had better say "diverge to the supremum" in that case).
Theorem Let $P$ be a probability on $\Omega$. If $f:\Omega\rightarrow[0,\infty)$ is measurable, and $s_n:\Omega\rightarrow[0,\infty)$'s constitute a measurable simplification of $f$, then $\int s_n\, dP\uparrow \int f\, dP.$

Proof: We shall only deal with the case $\int f\, dP < \infty.$ The case $\int f\, dP = \infty$ is left as an exercise.

The proof proceeds in a somewhat counterintuitive way. So read carefully.

We shall start by noticing that $\lim\int s_n\, dP$ indeed exists.
[Because...]
$\left(\int s_n\, dP\right)$ is a non-decreasing sequence bounded above (by $\int f\, dP$).
We want to show that this limit equals $\int f\, dP$. For this it is enough to show it is $\geq \int z\, dP$ for any nonnegative, simple function $z\leq f$.
[Because...]
$\int f\, dP = \sup\{\int z\, dP~:~ z\leq f,~~z \mbox{ simple}\},$
For this it is enough to show that $\forall \delta>0~~\lim \int s_n\, dP \geq \int z\, dP- \delta.$
[Because...]
Then $\lim \int s_n\, dP - \int z\, dP\geq -\delta.$

Since $\delta>0$ is arbitrary, this means $\lim \int s_n\, dP - \int z\, dP \geq $ all negative numbers. Only non-negative numbers can have that property!

So far, we have been just rewording our target. Now we start the main argument. We are showing:
$\forall$ simple measurable $0\leq z\leq f~~\forall \delta>0~~\lim \int s_n\, dP \geq \int z\, dP- \delta.$

Take any simple, measurable function $0\leq z\leq f$.

Take any $\delta>0.$

Let $A_n =\{s_n\geq z-\delta\}.$

Then $A_n\uparrow\Omega.$
[Because...]
Since the $s_n$'s are non-decreasing, $A_1\subseteq A_2\subseteq A_3\subseteq\cdots.$

Also, for each $\omega\in\Omega$ we have $s_n(\omega)\uparrow f(\omega)\geq z(\omega) > z(\omega)-\delta,$ so $\omega\in A_n$ for all large $n.$ Hence $\cup_n A_n=\Omega.$

So $$\begin{eqnarray*} \int s_n\, dP & \geq & \int_{A_n} s_n\, dP ~~\left[\mbox{$\because s_n\geq 0$}\right] \\ & \geq & \int_{A_n}(z-\delta)\, dP~~\left[\mbox{$\because s_n \geq z-\delta$ over $A_n$}\right]\\ & = & \int z\, dP-\int_{A_n^c}z\, dP-\delta P(A_n)\\ & \geq & \int z\, dP-\int_{A_n^c}z\, dP-\delta~~\left[\mbox{$\because P(A_n)\leq 1$}\right]\\ & \geq & \int z\, dP-MP(A_n^c)-\delta, \end{eqnarray*}$$ where $M = \max z$ (which exists finitely, since $z$ is simple).

Letting $n\rightarrow\infty$ and noting that $P(A_n^c)\rightarrow 0$ (because $A_n\uparrow\Omega$), we get $\lim \int s_n\, dP \geq \int z\, dP-\delta,$ completing the proof. [QED]

Since this last result deals with a probability measure, the integral is just an expectation. So we can write the result as follows.
If $X$ is a non-negative random variable, and $X_n$'s are non-negative simple random variables such that $X_n\uparrow X$, then $E(X_n)\uparrow E(X)$.
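
As a quick illustration (ours, not part of the notes), take $X$ uniformly distributed on $[0,1]$, so that $E(X)=1/2,$ and let $X_n = s_n(X)$ with the $s_n$'s of the first theorem; then $E(X_n)$ can be computed exactly:

```python
# X ~ Uniform(0,1), X_n = s_n(X).  For n >= 1, X_n takes the value k/2^n
# with probability 2^{-n} (k = 0, ..., 2^n - 1), so E(X_n) is a finite sum.
for n in range(1, 11):
    e = sum(k / 2 ** n for k in range(2 ** n)) / 2 ** n
    print(n, e)        # equals (2^n - 1)/2^{n+1}, increasing to E(X) = 1/2
```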

Problem set 9

EXERCISE 27: Show that if $f$ in the first theorem is a measurable function, then the simplification constructed in the proof is also measurable.

EXERCISE 28:  Show that the convergence in the first theorem is uniform if $f$ is bounded.

EXERCISE 29:  Show that if, in the first theorem above, $f$ is measurable (w.r.t. any given $\sigma$-field $\calF$ over $\Omega$ and the Borel $\sigma$-field over ${\mathbb R}$), then so must be each $s_n.$

Additivity

We had stated last semester that if $X,Y$ are two jointly distributed random variables with expectations, and $a,b\in{\mathbb R}$ are any two numbers, then $aX+bY$ is also a random variable with expectation, and $E(aX+bY) = aE(X)+bE(Y).$

First we show that $E(X+Y) = E(X)+E(Y)$ in three steps.

Step 1: Show this when $X,Y$ are simple random variables. We have already done this last semester.

Step 2: Show this for non-negative $X,Y.$ Let $(S_n)$ and $(T_n)$ be simplifications for $X$ and $Y,$ respectively. Then $(S_n+T_n)$ is a simplification for $X+Y.$

Also $E(S_n+T_n) = E(S_n)+E(T_n).$ The result now follows on taking the limit of both sides.

Step 3: Show this for general $X,Y.$ Here we apply step 2 to $X_+, X_-, Y_+$ and $Y_-.$

Then we show that for $a>0$ we have $E(aX) = aE(X).$ This proof also proceeds in three steps (left as an exercise).

Finally, we show $E(-X)= -E(X).$ Let $Y = -X.$ Then $Y_+ = X_-$ and $Y_- = X_+.$ So $E(Y) = E(Y_+)-E(Y_-) = E(X_-)-E(X_+) = -E(X).$
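
A quick numerical sanity check of the linearity just proved, on a made-up finite probability space (a sketch, ours):

```python
# E(aX + bY) = aE(X) + bE(Y) on a six-point equiprobable space.
import random
random.seed(0)

omega = range(6)
p = [1.0 / 6] * 6
X = [random.uniform(-1, 1) for _ in omega]
Y = [random.uniform(-1, 1) for _ in omega]

def E(Z):
    return sum(pi * zi for pi, zi in zip(p, Z))

a, b = 2.0, -3.0
lhs = E([a * x + b * y for x, y in zip(X, Y)])
print(abs(lhs - (a * E(X) + b * E(Y))) < 1e-12)   # True
```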

Problem set 10

Monotone convergence theorem (MCT)

We would have been very happy, had there been a result saying: Whenever $X_n\rightarrow X$ we have $E(X_n)\rightarrow E(X)$. Unfortunately, this is not true in general (think of counterexamples). So we search for extra conditions under which it will be true.

The following theorem is just a restatement of the second technical result discussed earlier:
Theorem Let $X_n\rightarrow X$. Assume

  1. the $X_n$'s are non-negative,
  2. the sequence is non-decreasing, i.e., $X_n\uparrow X,$
  3. the $X_n$'s are simple random variables.

Then $E(X_n) \rightarrow E(X)$.
Interestingly, the last condition may be dropped (i.e., $X_n$'s need not be simple). This gives rise to the theorem below.

Monotone convergence theorem (MCT) Let $X_n\rightarrow X$. Assume

  1. the $X_n$'s are non-negative,
  2. the sequence is non-decreasing, i.e., $X_n\uparrow X.$

Then $E(X_n) \rightarrow E(X)$.

Proof: It is enough to produce simple random variables $Y_n$ such that $Y_n\uparrow X$ and $Y_n\leq X_n.$

[Because...]
We already know $E(Y_n)\uparrow E(X).$ But $E(X_n)$ is sandwiched between $E(Y_n)$ and $E(X).$
Let $(Z_{n,k})_k$ be a simplification of $X_n.$
Think of the $Z_{n,k}$'s like an infinite "matrix".
Each row of this "matrix" is a non-decreasing sequence.

Let $Y_n = \max\{Z_{1,n},...,Z_{n,n}\}.$

It is like extracting the upper triangular half of the "matrix" and taking maximum of each column.
How $Y_n$'s are computed

Then $Y_1\leq Y_2\leq\cdots$
[Because...]
$$\begin{eqnarray*} Y_{n+1} & = & \max\{Z_{1,n+1},...,Z_{n+1,n+1}\}\\ & \geq & \max\{Z_{1,n+1},...,Z_{n,n+1}\}~~\left[\mbox{$\because$ superset cannot have smaller max}\right]\\ & \geq & \max\{Z_{1,n},...,Z_{n,n}\}, \end{eqnarray*}$$ by non-decreasing property of $Z_{n,k}$ w.r.t. $k.$
Also $Y_n\leq X_n.$
[Because...]
For $k\leq n$ we have $Z_{k,n}\leq X_k\leq X_n.$
Finally, $Y_n\uparrow X.$
[Because...]
We have $Y_k\leq X_k\leq X$. So $(Y_k)$ is a non-decreasing sequence bounded from above. So $\lim_k Y_k$ exists and $\lim_k Y_k\leq X$.

We have $Z_{n,k} \leq Y_k$ for $k\geq n$.

Taking the limit as $k\rightarrow \infty$ (and remembering that $Z_{n,k}\uparrow X_n$ as $k\rightarrow\infty$), we have $X_n\leq \lim_k Y_k.$

Now taking limit as $n\rightarrow \infty,$ we have $X\leq \lim_k Y_k.$

Hence $\lim_k Y_k= X.$
This completes the proof. [QED]

Problem set 11

EXERCISE 30:  If $(X_n)$ is a nonincreasing sequence of nonnegative random variables converging to some random variable $X,$ and $E(X_1)<\infty,$ then show that $E(X_n)\downarrow E(X).$ What if the assumption $E(X_1)<\infty$ is dropped?

EXERCISE 31:  Suppose that $X_n$'s are nonnegative random variables. Show that $$E(\sum_1^\infty X_n) = \sum_1^\infty E(X_n).$$

Fatou and DCT

Fatou's lemma Let $(X_n)$ be a sequence of nonnegative random variables. Then $$E(\liminf X_n) \leq \liminf E(X_n).$$

Proof: Let $Y_n = \inf\{X_k~:~k\geq n\}.$

Then, by the definition of $\liminf$, we have $Y_n\uparrow \liminf X_n.$

So, by MCT, $E(Y_n)\rightarrow E(\liminf X_n).$

Now $Y_n \leq X_n,$ and hence $E(Y_n) \leq E(X_n).$

Hence $$E(\liminf X_n) \leq \liminf E(X_n),$$ as required. [QED]

Dominated Convergence Theorem (DCT) Let $X_n\rightarrow X.$ If $\forall n~~|X_n|\leq Y$ for some $Y$ with $E(|Y|)< \infty$, then $E|X_n-X|\rightarrow 0$ and so, in particular, $E(X_n)\rightarrow E(X).$

Proof: Clearly, $|X|\leq Y.$

So, by triangle inequality, $|X_n-X|\leq |X_n|+|X|\leq 2Y.$

Let $Z_n = 2Y-|X_n-X|.$ Then $Z_n$'s are all nonnegative random variables.

Applying Fatou's lemma to $(Z_n)$, we have $$E(\liminf Z_n)\leq \liminf E(Z_n).\hspace{1in} \mbox{(*)}$$ Now $$\liminf Z_n = 2Y-\limsup|X_n-X| = 2Y,$$ and $$\liminf E(Z_n) = 2E(Y)-\limsup E|X_n-X| .$$ So (*) becomes $$2E(Y)\leq 2E(Y)-\limsup E|X_n-X|,$$ or $\limsup E|X_n-X|\leq 0.$

Hence $E|X_n-X|\rightarrow 0,$ as required. [QED]

Problem set 12

EXERCISE 32: Show that $E(|X_n-X|)\rightarrow 0$ implies $E(X_n)\rightarrow E(X)$. Show that the converse is not true in general.

[Hint]

Toss an unbiased coin. For each $n\in{\mathbb N}$ we let $X_n = \left\{\begin{array}{ll}1&\text{if }\mbox{head}\\-1&\text{otherwise.}\end{array}\right.$ Thus all the $X_n$'s are exactly the same (based on the same toss). Also take $X\equiv 0$.

Then $\forall n\in{\mathbb N}~~E(X_n) =0$, and hence $E(X_n)\rightarrow E(X)$.

But $|X_n-X| \equiv 1$. So $E(|X_n-X|) \not\rightarrow 0$.

Radon-Nikodym theorem

Radon-Nikodym theorem Let $\mu$ be any $\sigma$-finite measure on $(\Omega,\calF).$ Let $\nu$ be another measure on $(\Omega,\calF)$ with the property that $$\forall B\in\calF~~(\mu(B)=0\Rightarrow\nu(B)=0).$$ Then there is a measurable function $f:\Omega\rightarrow{\mathbb R}$ such that for any measurable function $h:\Omega\rightarrow{\mathbb R}$ we have $$\int h\, d\nu = \int hf\, d\mu.$$ This $f$ is called a density of $\nu$ wrt $\mu.$

Proof: Omitted. [QED]

We have used a special case of this theorem, where $\nu$ is a probability measure and $\mu$ is the Lebesgue measure. Such probability measures are called absolutely continuous. We have worked with the special case where we had a density that was Riemann integrable as well.
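
On a finite space the theorem is transparent: the density is just the ratio of point masses. A sketch (ours, with made-up weights):

```python
# nu({w}) = f(w) * mu({w}), so  integral h dnu = integral h*f dmu.
mu = {"a": 1.0, "b": 2.0, "c": 1.0}        # point masses of mu (made up)
f  = {"a": 0.5, "b": 1.0, "c": 2.0}        # density d(nu)/d(mu)
nu = {w: f[w] * mu[w] for w in mu}

def integral(h, m):
    """Integral of h against the measure with point masses m."""
    return sum(h[w] * m[w] for w in m)

h = {"a": 3.0, "b": -1.0, "c": 4.0}
print(integral(h, nu))                                # 7.5
print(integral({w: h[w] * f[w] for w in h}, mu))      # 7.5 again
```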

Problem set 13