Expectation

Expectation of a random variable
Simple random variables
Random variables taking countably many values
A couple of warnings
Properties of expectation
Relation of $E(X)$ with values of $X$
Transformation properties
Expectation of a function
Indicator trick
Finite existence of $E(X)$
Problems for practice

Expectation of a random variable

For many random variables we see a striking example of statistical regularity. As an example, consider this gambling game:

A fair die is rolled. If it shows an odd number then I pay you Rs 20, else you pay me Rs 10.

A typical plot of my running average gain per game against number of games is as follows:

It is produced by the following code.

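# Roll a fair die 1000 times
w = sample(6, 1000, replace = TRUE)
# My gain for each face: I lose 20 on odd faces and win 10 on even faces
profit = c(-20, 10, -20, 10, -20, 10)
X = profit[w]
# Running average gain after each game
avgX = cumsum(X) / (1:1000)
#png('image/explotnow.png')
plot(avgX, type = 'l', xlab = "#games played", ylab = "My running avg gain")
#dev.off()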

In fact, it is this phenomenon that first led people to study probability. If you run a gambling game a large number of times, the running average profit per game becomes more and more stable. Gamblers wanted to guess this stable value beforehand. They argued as follows:

If I play this game a large number of times (say $n$ times), then approximately $\frac n2$ times I should get $10$ and the remaining $\frac n2$ times I should get $-20.$ So approximately my total gain would be $$ \frac n2\times 10 + \frac n2\times (-20). $$ So the average should be approximately this divided by $n,$ i.e., $$ \frac 12\times 10 + \frac 12\times (-20) = -5. $$

Indeed, this simple argument turns out to be remarkably accurate. Gamblers could not understand why it becomes so accurate as $n$ becomes large. But nevertheless they used this formula to find out what they could expect the random variable to do in the long run.
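
The gamblers' guess can be checked against a long simulated run. The sketch below is only an illustration: the seed, the number of games and the names `faces`, `gain` and `guess` are choices made here, and a different seed gives a slightly different simulated average.

```r
set.seed(2024)                            # arbitrary seed, only for reproducibility
n = 100000                                # number of games (an assumed value)
faces = sample(6, n, replace = TRUE)      # n rolls of a fair die
gain = ifelse(faces %% 2 == 1, -20, 10)   # my gain: -20 on odd faces, +10 on even faces
guess = 0.5 * 10 + 0.5 * (-20)            # the gamblers' guess from the argument above
c(simulated = mean(gain), guess = guess)  # the two numbers should be close
```

Increasing `n` should bring the simulated average closer to the guess, which is exactly the regularity the gamblers observed.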

Simple random variables

A random variable is called simple if it takes only finitely many values.

Definition: Expectation of simple random variables Let $X$ be a simple random variable taking only the values $x_1,x_2,...,x_k$ with probabilities $p_1,p_2,..., p_k$. Then we define the expectation of $X$ as $$ E(X) = \sum_{i=1}^k p_i x_i. $$
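
The definition translates directly into a weighted sum. The helper below, `expect_simple`, is a hypothetical name introduced here; it is a minimal sketch that checks the probabilities and then applies the formula, illustrated on my gain in the die game described earlier.

```r
# Expectation of a simple random variable with values x and probabilities p.
# (expect_simple is a name made up for this illustration.)
expect_simple = function(x, p) {
  stopifnot(length(x) == length(p), all(p >= 0), abs(sum(p) - 1) < 1e-12)
  sum(p * x)
}

# My gain in the die game: 10 with probability 1/2, -20 with probability 1/2.
expect_simple(c(10, -20), c(0.5, 0.5))
```

The result agrees with the value $-5$ that the gamblers' argument produced.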

EXERCISE 1: (Easy) A random variable $X$ takes the values $-2, -1, 0, 1$ and $2$ with probabilities $p,q,1-2p-2q, p$ and $q,$ respectively. Find $E(X).$

EXERCISE 2: (Easy) A random variable $X$ takes the values $1,2,...,10$ with probabilities $p_1,p_2,...,p_{10},$ respectively, where $\sum_i p_i = 1.$ Prove that $1\leq E(X)\leq 10.$ Also find the $p_i$'s if $E(X) = 10.$

Random variables taking countably many values

Definition: If $X$ takes only countably many nonnegative values $x_1, x_2, ...$ with probabilities $p_1,p_2,...$ where $\sum p_i = 1,$ then $E(X)$ is defined as $$E(X) = \sum p_i x_i.$$
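
For instance, suppose (purely as an illustration) that $X$ takes the values $1,2,3,...$ with $P(X=n)=2^{-n}.$ These probabilities add up to 1. Using the standard identity $\sum_{n\geq 1} n r^n = \frac{r}{(1-r)^2}$ for $|r|<1$ with $r=\frac 12,$ we get $$E(X) = \sum_{n=1}^\infty n\, 2^{-n} = \frac{1/2}{(1-1/2)^2} = 2.$$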

For any random variable $X$ we define $X_+ = \max\{X,0\}$ and $X_- = -\min\{X,0\}.$ Notice that both $X_+$ and $X_-$ are nonnegative.

EXERCISE 3: (Easy) Which of the following must be true?

$X = X_++X_-.$
$X = X_+-X_-.$
$X = X_--X_+.$
None of the above.

Definition: If $X$ takes only countably many values $x_1, x_2, ...$ with probabilities $p_1,p_2,...$ where $\sum p_i = 1,$ then $E(X)$ is defined as $$E(X) = \left\{\begin{array}{ll} E(X_+)-E(X_-)&\text{if }E(X_+), E(X_-)<\infty \\ \infty &\text{if }E(X_+)=\infty, E(X_-)<\infty \\ -\infty &\text{if }E(X_+)<\infty, E(X_-)=\infty \\ \end{array}\right..$$

Notice that we leave the case $E(X_+)=E(X_-)=\infty$ unmentioned in the definition. This means $E(X)$ is undefined in this case.

EXAMPLE 1: A random variable takes the values $\pm 2^n$ for $n\in{\mathbb N}$ with probabilities $P(X=2^n) = P(X=-2^n) = 2^{-n-1}$ for $n\in{\mathbb N}$. What is $E(X)$?

SOLUTION: Here $X_+$ takes the values 0, 2, $2^2, 2^3,...$ with probabilities $2^{-1}, 2^{-2}, 2^{-3},...$.

So $E(X_+) = 0 + \frac 12+\frac 12+\frac 12+\cdots = \infty$.

Similarly, $E(X_-) = \infty$.

Hence $E(X)$ is undefined. ■

Theorem If $X$ takes only countably many values $x_1, x_2, ...$ with probabilities $p_1,p_2,...$ where $\sum p_i = 1,$ and $\sum |p_i x_i|< \infty,$ then $$E(X) = \sum p_i x_i.$$

It is possible to define expectation of random variables that are more general (taking uncountably many values). We shall give the most general definition in the next semester. The following properties actually hold for the general definition. Unless noted otherwise, we shall only prove them for the case of simple random variables. These proofs are actually the first steps in the general proofs that will come next semester.

A couple of warnings

Many students somehow get the idea that if a (discrete) random variable $X$ takes the values $x_1,x_2,...$ with probabilities $p_1,p_2,...$ where $\sum p_i = 1$, then $E(X) = \sum_i x_i p_i$. But this is wrong in general! It is wrong even if the sum converges! This formula is true in the following special cases:

If $X$ is simple, i.e., takes only finitely many values.
If $X$ takes only nonnegative (or only nonpositive) values (and here the formula holds even if the sum diverges).
If $\sum_i |x_i| p_i < \infty$.
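
To see how the formula can fail even when the sum converges, here is one example constructed for this purpose. Let $X$ take the value $x_n = (-1)^{n+1}\frac{2^n}{n}$ with probability $p_n = 2^{-n}$ for $n\in{\mathbb N}.$ Then $$\sum_n x_n p_n = 1-\frac 12+\frac 13-\frac 14+\cdots = \log 2,$$ which is a convergent series. However, $$E(X_+) = \sum_{n\text{ odd}} \frac 1n = \infty\quad\text{and}\quad E(X_-) = \sum_{n\text{ even}} \frac 1n = \infty,$$ so $E(X)$ is undefined. The series converges only conditionally: listing the same values in a different order can change its sum, so it cannot serve as the value of $E(X).$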

Here is another point that students sometimes get wrong. When we say $E(X)$ is undefined we mean $E(X)$ is meaningless in that context, and not that $E(X)$ can be anything there.

Properties of expectation

Relation of $E(X)$ with values of $X$

Theorem If $X$ is a degenerate rv (i.e., takes only one value with probability 1), then $E(X)$ equals that value.

Proof: Easy. [QED]

Theorem If $X$ always takes values in $[a,b],$ then $E(X)$ must exist finitely, and lie in $[a,b].$

Proof: Easy. [QED]

The condition "$X$ always lies in $[a,b]$" may be written as $P(X\in[a,b])=1.$

Theorem Let $X$ and $Y$ be random variables taking only finitely many values, and $X\leq Y.$ Then $E(X)\leq E(Y).$

Proof: Here $X\leq Y$ means $\forall \omega\in\Omega~~X(\omega)\leq Y(\omega).$

As already mentioned, we shall restrict the proof to only the case where $X$ and $Y$ are both simple.

Let $X$ take values $x_1,...,x_m,$ and $Y$ take values $y_1,...,y_n.$

Let $p_{ij} = P(X=x_i, Y=y_j).$

Clearly, if $p_{ij}>0,$ then we must have $x_i\leq y_j.$

Now $$\begin{eqnarray*}E(X) & = & \sum_i x_i P(X=x_i) = \sum_i (x_i \sum_j p_{ij}) =\sum_i\sum_j (x_i p_{ij})\\ & \leq & \sum_i\sum_j (y_j p_{ij}) ~~[\because p_{ij}>0\Rightarrow x_i\leq y_j]\\ & = & \sum_j\sum_i (y_j p_{ij})[\because \mbox{addition is associative and commutative}]\\ & = & \sum_j (y_j \sum_i p_{ij}) = \sum_j y_j P(Y=y_j) = E(Y). \end{eqnarray*}$$ [QED]

Theorem Let $a\in{\mathbb R}$ be any number. If $P(X\leq a)=1,$ then $E(X)=a$ if and only if $X$ is degenerate at $a.$

EXERCISE 4: (Easy) Prove this for simple $X$.

However, if $a\in{\mathbb R}$ is replaced by $\infty,$ then the result fails, i.e., it is possible to have a random variable $X$ that is always finite (any real-valued random variable will do, since $\infty\not\in{\mathbb R}$) such that $E(X)=\infty.$ Of course, we cannot get a counterexample using simple random variables. However, such counterexamples exist for random variables taking countably many values, as shown below.

EXAMPLE 2: It is a standard fact that $\sum\frac{1}{n^2}<\infty.$ Let the sum be $c.$ (The exact value of $c,$ which is $\frac{\pi^2}{6},$ is of no importance here.)

Then consider a random variable $X$ that takes values in ${\mathbb N}$ and $P(X=n)=\frac{1}{cn^2}.$

Then $E(X) = \frac 1c\sum\frac 1n=\infty.$ ■
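
The divergence here is slow, which a short numerical sketch makes visible. The cut-offs in `N` are arbitrary choices made for illustration; `partial` holds the partial sums $\frac 1c\sum_{n\leq N}\frac 1n$ of the series for $E(X),$ which keep growing instead of settling down.

```r
c0 = pi^2 / 6                          # the constant c from the example
N = 10^(1:6)                           # arbitrary cut-offs, chosen for illustration
# Partial sums of sum_n n * P(X = n) = (1/c) * sum_n 1/n, truncated at each cut-off
partial = sapply(N, function(m) sum(1 / (1:m)) / c0)
round(partial, 3)                      # increases without bound as the cut-off grows
```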

By the way, if $X$ can take values $x_1,x_2,...$, there is no guarantee that $E(X)$ will equal any of the $x_i$'s. For example, if $X$ is the outcome of a roll of a fair die, then $E(X)=3.5,$ which is not a possible outcome.

Transformation properties

Theorem Let $X$ be a random variable and let $a,b$ be constants. Then $E(a+bX) = a+bE(X).$

Proof: Prove it for simple $X$. [QED]
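
As a quick check on a concrete case, let $X$ be the outcome of a fair die, so that $E(X)=3.5,$ and take $a=1,$ $b=2$ (values chosen only for illustration). Then $1+2X$ takes the values $3,5,7,9,11,13$ with probability $\frac 16$ each, so $$E(1+2X) = \frac{3+5+7+9+11+13}{6} = 8 = 1+2\times 3.5.$$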

EXERCISE 5: (Easy) If $E(X) = \mu\in{\mathbb R},$ then what is $E(X-\mu)?$

Theorem Let $X,Y$ be two random variables defined on the same probability space. We assume that $E(X)$ and $E(Y)$ both exist finitely.

Then $E(X+Y)$ also exists finitely and we have $$ E(X+Y) = E(X)+ E(Y). $$
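
Notice that the theorem does not ask for $X$ and $Y$ to be independent. As an illustrative case, roll a fair die, let $X$ be the number on top and let $Y = 7-X$ (the number on the opposite face of a standard die). Then $Y$ is completely determined by $X,$ both have expectation $3.5,$ and $X+Y$ is always $7,$ so $$E(X+Y) = 7 = 3.5+3.5 = E(X)+E(Y).$$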

Next we shall need a new concept, that of a convex function. Graphically, $f(x)$ is a convex function if its graph is like a bowl opening upwards (possibly slanted). Some examples are shown below.

Mathematically we may define a convex function as follows.

Definition: Convex function A function $f:{\mathbb R}\rightarrow{\mathbb R}$ is called convex if $\forall a\in{\mathbb R}$ there is a line $y = \ell_a(x)$ through $(a,f(a))$ that lies on or below the graph of $f(x),$ i.e., $$ \forall x\in{\mathbb R}~~ \ell_a(x) \leq f(x). $$

While this definition is graphically quite intuitive, you may have seen other definitions of convexity elsewhere; they are equivalent to this one.

In the following diagram the blue line is $\ell_a.$ Both the red lines are candidates for $\ell_b.$
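
For example, $f(x)=|x|$ is convex in this sense. For $a>0$ we may take $\ell_a(x) = x,$ and for $a<0$ we may take $\ell_a(x)=-x.$ At $a=0$ any line $\ell_0(x) = mx$ with $-1\leq m\leq 1$ works, since then $$\forall x\in{\mathbb R}~~ mx\leq |m|\,|x|\leq |x|.$$ So the line $\ell_a$ need not be unique.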

Jensen's inequality Let $X$ be a random variable and $f:{\mathbb R}\rightarrow{\mathbb R}$ be any convex function. We assume that $E(X)$ and $E(f(X))$ both exist finitely. Then $f(E(X))\leq E(f(X)).$

Proof: Let $\mu = E(X).$ Consider $\ell_\mu(x)$ as mentioned in the definition of convexity.

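Since the graph of $\ell_\mu(x)$ is a straight line passing through $(\mu,f(\mu)),$ it must be of the form $$\ell_\mu(x) = f(\mu)+m(x-\mu),~~x\in{\mathbb R},$$ for some slope $m.$ Since $\ell_\mu(x)\leq f(x)$ for all $x,$ we have $\ell_\mu(X)\leq f(X).$ So, using the fact that a smaller random variable has a smaller expectation, $$ E(f(X)) \geq E(\ell_\mu(X)) = E(f(\mu))+mE(X-\mu) = f(\mu)+0 = f(E(X)), $$ as required. [QED]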
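
As a small numerical illustration (the distribution and the function are chosen here only as an example), let $X$ take the values $-1$ and $1$ with probability $\frac 12$ each, and let $f(x) = e^x,$ which is convex. Then $E(X)=0,$ so $f(E(X)) = e^0 = 1,$ while $$E(f(X)) = \frac{e^{-1}+e}{2}\approx 1.543 \geq 1 = f(E(X)).$$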

EXERCISE 6: (Medium) Which is larger: $(E(X))^2$ or $E(X^2)?$ Assume that both exist finitely.

Expectation of a function

EXAMPLE 3: Suppose I have a random variable $X$ that takes values $-1,0$ and $1$ with probabilities $0.1, 0.5$ and $0.4,$ respectively. What is $E(X^2)?$

SOLUTION: Here $X^2$ is a new random variable. Call it $Y,$ say. Then $Y$ takes values $0$ and $1$ with probabilities $0.5$ each.

So $E(Y) = \frac 12.$ ■

Here is another technique to arrive at the same result. $$ E(X^2) = 0.1\times (-1)^2 + 0.5\times 0^2 + 0.4\times 1^2 = 0.5. $$ This technique is often easier because here we do not need to find the distribution of $Y=X^2$ first. Both these techniques will always give the same answer.
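
The two techniques can be placed side by side in a few lines of R. This is only a sketch for the example above; the names `pY`, `E1` and `E2` are introduced here for illustration.

```r
x = c(-1, 0, 1)          # values of X
p = c(0.1, 0.5, 0.4)     # their probabilities

# Technique 1: first find the distribution of Y = X^2, then use the definition.
pY = tapply(p, x^2, sum)             # P(Y = y) for each distinct value y of Y
y = as.numeric(names(pY))            # the distinct values of Y
E1 = sum(y * pY)

# Technique 2: weight h(x_i) = x_i^2 directly by p_i.
E2 = sum(p * x^2)

c(via_distribution = E1, direct = E2)   # both entries should agree
```

The second technique never needs the grouping step, which is the whole point of the shortcut.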

Law of the lazy statistician Let a (discrete) random variable $X$ take values $x_1,x_2,...$ with probabilities $p_1,p_2,...$. Let $h(\cdot)$ be any function defined on the set $\{x_1,x_2,...\}.$ If $\sum |p_i h(x_i)| <\infty,$ then we must have $$ E(h(X)) = \sum p_i h(x_i). $$ Also, if $\sum|p_i h(x_i)|=\infty$ and all but finitely many $h(x_i)$'s are $>0$ (resp., $<0$), then $E(h(X))=\infty$ (resp., $-\infty$).

Proof: If $X$ takes only finitely many values, then the result follows from distributivity of multiplication over addition.

If $X $ takes countably infinitely many values, and $h(X)$ is non-negative, then define $$ U_n =\left\{\begin{array}{ll}h(X)&\text{if }X=x_1,...,x_n\\ 0&\text{otherwise.}\end{array}\right. $$ and proceed as for the proof of $E(X)=\sum p_i x_i.$ [QED]

Indicator trick

Often we have to find $E(X)$ where $X$ is the count of something, e.g., the number of heads in 100 tosses of a coin, or the number of times something interesting happens. If you want to find $E(X)$ directly from the definition, then you need to find the distribution of $X$ first, which is often difficult. In such situations the indicator trick may provide a shortcut.

EXAMPLE 4: We have a ring of 20 lamps. A wind blows and a random subset of lamps go out. Find the expected number of singleton lights (i.e., lighted lamps with both neighbours off).


The singletons are shown with arrowheads

SOLUTION: Let $X$ be the number of singletons. Finding the distribution of $X$ is not very difficult, but still we shall demonstrate the use of the indicator trick.

We shall use the arrowheads as our random variables. Let the lamps be numbered from 1 to 20. Define $L_i=1$ if the $i$-th lamp is on and is a singleton, and $0$ otherwise. In other words, $L_i=1$ means we have put an arrowhead at position $i.$

Each $L_i$ is called an indicator variable.

Clearly $X = L_1+\cdots+L_{20}.$

So $E(X) = E(L_1)+\cdots+E(L_{20}) = 20 E(L_1),$ since by symmetry all the $L_i$'s have the same distribution.

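To find $E(L_1)$ we need to find just $P(L_1=1)$, since $E(L_1) = 1\times P(L_1=1)+0\times P(L_1=0).$ This involves only lamp 1 and its two neighbours. Taking "random subset" to mean that all subsets are equally likely, each lamp is independently on or off with probability $\frac 12,$ so $P(L_1=1) = \frac 12\times\frac 12\times\frac 12 = \frac{1}{8}.$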

Hence $E(X) = \frac{20}{8} = \frac 52.$ ■
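
The answer can be compared with a simulation. The sketch below assumes that each lamp is independently on or off with probability $\frac 12$ (one reading of "random subset"); the function name `count_singletons`, the seed and the number of repetitions are choices made here for illustration.

```r
set.seed(1)            # arbitrary seed, only for reproducibility
n_lamps = 20
n_sim = 20000          # number of simulated winds (an assumed value)

# Count lighted lamps whose two neighbours on the ring are both off.
# `on` is a logical vector: TRUE means the lamp is still lit.
count_singletons = function(on) {
  left = c(on[length(on)], on[-length(on)])   # left neighbour of each lamp
  right = c(on[-1], on[1])                    # right neighbour of each lamp
  sum(on & !left & !right)
}

counts = replicate(n_sim,
  count_singletons(sample(c(TRUE, FALSE), n_lamps, replace = TRUE)))
mean(counts)           # should be close to 20/8 = 2.5
```

Notice that the simulation has to generate all 20 lamps each time, while the indicator trick only needed three of them.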

Finite existence of $E(X)$

We know from the definition of expectation that it may come in four varieties: it may be finite, or $\infty$ or $-\infty$ or undefined. The finite case is the most useful, and it sometimes helps to know some sufficient conditions for this.

Theorem If $X$ is simple, then $E(X)$ must be finite.

Proof: Goes without saying! [QED]

Non-negative random variables have the advantage that their expectation is always defined (though it may be $\infty$). Now, from any random variable $X$ we can easily manufacture a non-negative random variable, viz., $|X|.$ It is good to be able to relate $E(X)$ with $E(|X|).$

Theorem $E(|X|)$ is finite if and only if $E(X)$ is finite.

Proof: We define $X_+, X_-$ as usual.

Then $X = X_+-X_-$ and $|X| = X_++X_-$.

Then finiteness of $E(|X|)$ is equivalent to finiteness of both $E(X_+), E(X_-).$

Again, finiteness of $E(X)$ is equivalent to finiteness of both $E(X_+), E(X_-).$

Hence the result. [QED]

EXERCISE 7: (Medium) If $E(|X|)=\infty,$ then what can you say about $E(X)?$