Finite existence of $E(X)$

We know from the definition of expectation that it may come in four varieties: it may be finite, or $\infty$ or $-\infty$ or undefined. The finite case is the most useful, and it sometimes helps to know some sufficient conditions for this.

Theorem If $X$ is simple, then $E(X)$ must be finite.

Proof: Goes without saying! [QED]

Non-negative random variables have the advantage that their expectation is always defined (though it may be $\infty$). Now, from any random variable $X$ we can easily manufacture a non-negative random variable, viz, $|X|.$ It is good to be able to relate $E(X)$ to $E(|X|).$

Theorem $E(|X|)$ is finite if and only if $E(X)$ is finite.

Proof: We define $X_+, X_-$ as usual.

Then $X = X_+-X_-$ and $|X| = X_++X_-$.

Then finiteness of $E(|X|)$ is equivalent to finiteness of both $E(X_+), E(X_-).$

Again, finiteness of $E(X)$ is equivalent to finiteness of both $E(X_+), E(X_-).$

Hence the result. [QED]

::

EXERCISE 1: If $E(|X|)=\infty,$ then what can you say about $E(X)?$

Hint:

May be $\infty$ or $-\infty$ or undefined.

Theorem Let $X,Y$ be random variables defined on the same probability space. Let $|X|\leq|Y|.$ If $E(Y)$ is finite, then $E(X)$ must also be finite.

Proof: Since $E(Y)$ is finite, $E(|Y|)$ is finite. Since $|X|\leq|Y|,$ we have $E(|X|)\leq E(|Y|),$ so $E(|X|)$ is also finite. Hence $E(X)$ is finite. [QED]

Theorem Let $X,Y$ be random variables defined on the same probability space. Let $E(X)$ and $E(Y)$ both be finite. Then $E(\max\{X,Y\})$ must also be finite.

Proof: Do it yourself. [QED]

Theorem Let $m<n$ be any two positive integers. If $E(X^n)$ exists finitely, then $E(X^m)$ must also exist finitely.

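Proof: Use the fact that $|X^m| \leq \max\{1,|X^n|\}.$ The right-hand side has finite expectation by the last theorem (since $E(|X^n|)$ is finite), and the result then follows from the theorem on $|X|\leq|Y|$ above. [QED]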

The following theorem often proves very useful.

Markov inequality Let $X$ be any non-negative random variable. Let $\epsilon>0.$ Then $$E(X)\geq \epsilon P(X\geq\epsilon).$$

Proof:

Look at this diagram and see if you can prove the theorem.

[QED]
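
As a quick numerical sanity check of the Markov inequality (not a proof), the sketch below takes a small non-negative random variable and compares $E(X)$ with $\epsilon P(X\geq\epsilon)$ for several values of $\epsilon.$ The PMF and the helper names `expectation` and `tail_prob` are made up for illustration; they are not part of these notes.

```python
from fractions import Fraction

# Hypothetical PMF of a non-negative random variable X (value -> probability).
pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 8), 10: Fraction(1, 8)}

def expectation(pmf):
    """E(X) for a finitely supported PMF."""
    return sum(x * p for x, p in pmf.items())

def tail_prob(pmf, eps):
    """P(X >= eps)."""
    return sum(p for x, p in pmf.items() if x >= eps)

mean = expectation(pmf)
for eps in (Fraction(1, 2), 1, 3, 4, 10, 20):
    bound = eps * tail_prob(pmf, eps)
    # Markov inequality: E(X) >= eps * P(X >= eps)
    print(f"eps={eps}: eps*P(X>=eps)={bound}, E(X)={mean}, holds={bound <= mean}")
```

Exact fractions are used so that the comparison is free of rounding error.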

Variance

A random variable is, well, random. So it may very well differ from its expectation. By how much? A lot or a little? We can use expectation to find that out.

Definition: Variance If $E|X|<\infty,$ then we define the variance of $X$ as $$ V(X) = E\big[ (X-E(X))^2 \big]. $$ It is either finite or $\infty.$

Theorem $V(X)\geq 0.$

Theorem If $E(X^2)<\infty,$ then $V(X)$ exists finitely, and $V(X) = E(X^2)-\big( E(X) \big)^2.$

Theorem $V(aX+b) = a^2 V(X).$

Theorem $V(X)=0$ if and only if $X$ is a degenerate random variable.
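
The variance identities above can be checked exactly on a small example. The following sketch assumes $X$ is the outcome of a fair die (our choice of example) and uses exact fractions; the function names `expect` and `variance` and the constants `a`, `b` are hypothetical.

```python
from fractions import Fraction

# Assumed example: X is the outcome of a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(g, pmf):
    """E(g(X)) for a finitely supported PMF."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(g, pmf):
    """V(g(X)) computed from the definition E[(g(X) - E g(X))^2]."""
    mu = expect(g, pmf)
    return expect(lambda x: (g(x) - mu) ** 2, pmf)

# V(X) from the definition, and from E(X^2) - (E(X))^2.
v_def = variance(lambda x: x, pmf)
v_short = expect(lambda x: x * x, pmf) - expect(lambda x: x, pmf) ** 2

# V(aX + b) for arbitrarily chosen constants a and b.
a, b = 3, -7
v_affine = variance(lambda x: a * x + b, pmf)

print(v_def, v_short, v_affine, a * a * v_def)
```

Here the first two printed values agree, and so do the last two, as the theorems predict.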

Chebyshev is also credited with designing a quadruped robot-like linkage.

Chebyshev inequality Let $V(X)<\infty.$ Then $$ \forall \epsilon>0~~P(|X-E(X)| \geq \epsilon) \leq \frac{V(X)}{\epsilon^2}. $$

Proof: Take any $\epsilon>0.$

Let $E(X)$ be denoted by $\mu.$

Define $$ f(x) = \left\{\begin{array}{ll} \epsilon^2 &\text{if }|x-\mu|\geq \epsilon\\ 0 &\text{otherwise.} \end{array}\right. $$

Then $\forall x~~f(x)\leq (x-\mu)^2.$

So $$\begin{eqnarray*} V(X) & = & E\big[(X-\mu)^2\big]\\ & \geq & E(f(X))\\ & = & \epsilon^2 P(|X-\mu| \geq\epsilon) + 0\times P(|X-\mu| <\epsilon). \end{eqnarray*}$$ Dividing by $\epsilon^2$ gives the result. [QED]
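
To see how loose or tight the Chebyshev bound can be, here is a small sketch comparing both sides of the inequality exactly. It again assumes a fair die as the example; `deviation_prob` is a hypothetical helper name.

```python
from fractions import Fraction

# Assumed example: X is the outcome of a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())                 # E(X)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())    # V(X)

def deviation_prob(eps):
    """P(|X - mu| >= eps)."""
    return sum(p for x, p in pmf.items() if abs(x - mu) >= eps)

rows = []
for eps in (Fraction(1, 2), 1, Fraction(3, 2), 2, Fraction(5, 2), 3):
    lhs = deviation_prob(eps)
    rhs = var / eps ** 2          # Chebyshev upper bound
    rows.append((eps, lhs, rhs))
    print(f"eps={eps}: P(|X-mu|>=eps)={lhs}, V(X)/eps^2={rhs}")
```

For small $\epsilon$ the bound exceeds 1 and tells us nothing; it becomes informative only when $\epsilon^2 > V(X).$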

Moments

Definition: Raw and central moments The $k$-th raw moment of $X$ is $$ E(X^k) $$ and the $k$-th central moment of $X$ is $$ E\big[ (X-E(X))^k \big]. $$

Moment generating function

Definition: Moment generating function (MGF) For any random variable $X$ we define its moment generating function as the function $$ m_X(t) = E(e^{tX}). $$ The domain of this function consists of all $t\in{\mathbb R}$ for which the expectation exists.

Clearly, $m_X(0)$ always exists and equals $1.$

Theorem If, for some $k\in{\mathbb N}$, the moment $E(X^k)$ exists finitely, then the $k$-th derivative of $m_X(t)$ exists at $t=0,$ and equals $E(X^k).$

Proof: We shall not do the proof here. But here is the main idea: $$ e^{tX} = 1 + \frac{tX}{1!} + \frac{t^2X^2}{2!} + \frac{t^3X^3}{3!} + \cdots. $$ From this we want to write $$ E(e^{tX}) = 1 + \frac{tE(X)}{1!} + \frac{t^2E(X^2)}{2!} + \frac{t^3E(X^3)}{3!} + \cdots. $$ This is not a precise statement, because we do not know if all raw moments of $X$ exist finitely. Also, even if they do, is it valid to "distribute" expectation over an infinite sum?

Answers to these questions require deeper real analysis results than we know at this point.

However, assuming that this is valid, we may try to differentiate both sides to get $$ \frac{d}{dt} E(e^{tX}) = E(X) + \frac{2tE(X^2)}{2!} + \frac{3t^2E(X^3)}{3!} + \cdots. $$ Again this step needs justification. Can we "distribute" differentiation over an infinite sum?

Assuming that we can, putting $t=0$ indeed gives us $E(X).$

Similarly, differentiating once again, and putting $t=0$ gives us $E(X^2),$ and so on. [QED]
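
The heuristic above can be tried out numerically in a case where no justification is needed: a random variable with finitely many values, whose MGF is a finite sum defined for all $t.$ The sketch below approximates the first two derivatives of $m_X$ at $0$ by finite differences and compares them with the raw moments. The PMF, the step size `h` and the function names are assumptions made for illustration.

```python
import math

# Assumed example: X takes the values 0, 1, 2 with these probabilities.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

def mgf(t):
    """m_X(t) = E(exp(tX)), a finite sum here."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

def raw_moment(k):
    """k-th raw moment E(X^k)."""
    return sum(x ** k * p for x, p in pmf.items())

h = 1e-3  # step size for the finite differences (an arbitrary small number)
d1 = (mgf(h) - mgf(-h)) / (2 * h)                # approximates m_X'(0)
d2 = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h ** 2    # approximates m_X''(0)

print("m'(0)  ~", d1, "  E(X)   =", raw_moment(1))
print("m''(0) ~", d2, "  E(X^2) =", raw_moment(2))
```

The agreement is only approximate, because finite differences are themselves approximations to derivatives.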

We shall not spend much time with MGFs, because there is a better alternative called the characteristic function (CF).

Definition: Characteristic function (CF) For any (real-valued) random variable $X$ we define its characteristic function as the function $$ \phi_X(t) = E(e^{itX}),~~t\in{\mathbb R}. $$
Don't be nervous about seeing the expectation of a complex random variable. It is simply $$ E(\cos tX) + i E(\sin tX). $$ CFs are better than MGFs for two reasons, which we give as theorems below.

Theorem For any (real-valued) random variable, the CF is defined over the entire ${\mathbb R}.$

Proof: This is obvious, since $\sin tX$ and $\cos tX$ are both bounded random variables, and hence have finite expectations. [QED]
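
For a random variable with finitely many values, the CF is easy to compute directly. The sketch below evaluates $E(e^{itX})$ using complex arithmetic, and also as $E(\cos tX) + iE(\sin tX),$ and checks that the modulus never exceeds 1. The PMF and the function names `cf` and `cf_parts` are illustrative assumptions.

```python
import cmath
import math

# Assumed example: X takes the values -1, 0, 3 with these probabilities.
pmf = {-1: 0.25, 0: 0.25, 3: 0.5}

def cf(t):
    """phi_X(t) = E(exp(itX)) computed with complex exponentials."""
    return sum(cmath.exp(1j * t * x) * p for x, p in pmf.items())

def cf_parts(t):
    """The same quantity as E(cos tX) + i E(sin tX)."""
    re = sum(math.cos(t * x) * p for x, p in pmf.items())
    im = sum(math.sin(t * x) * p for x, p in pmf.items())
    return complex(re, im)

for t in (-5.0, -1.0, 0.0, 0.5, 2.0, 10.0):
    print(f"t={t}: phi={cf(t):.4f}, |phi|={abs(cf(t)):.4f}")
```

Since $|e^{itX}|=1,$ the modulus of the CF is at most 1 for every $t,$ which is what the printed values show.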

Theorem If $X,Y$ are two random variables with the same CF, then they must have the same distribution.

Proof: Not in this course. [QED]

Indeed, this property has earned characteristic functions their name.

MGFs do not have this property. It is possible to get (rather ugly) counter-examples of random variables $X$ and $Y$ that both have the same MGF (in particular both have the same domain $D\subseteq{\mathbb R}$), but still $X$ and $Y$ have different distributions. However, if the domain includes a neighbourhood of $0,$ then $X,Y$ must have the same distribution. This is stated in the following theorem.

Theorem Let $m_X(t)$ be defined for $t\in (-a,a)$ for some $a>0.$ Let $Y$ be a random variable with the same MGF. Then $X$ and $Y$ must have the same distribution.

Proof: Too difficult for this course. [QED]

We shall not spend time proving any results on MGFs here. We shall learn the proofs for CFs in the next semester.

Problems for practice

::

EXERCISE 2: A box has 6 red balls and 4 black balls. An SRSWR of size $n$ is selected. If $X$ is the number of red balls selected, then find the PMF of $X$ and $E(X).$ Also solve the problem in the case of SRSWOR.

Hint:

For SRSWR: $P(X=x) = \binom{n}{x} \left(\frac{6}{10}\right)^x\left(\frac{4}{10}\right)^{n-x}$ for $x=0,1,...,n.$

For SRSWOR: $P(X=x) = \frac{\binom{6}{x} \binom{4}{n-x}}{\binom{10}{n}}$ for $x=0,1,...,n.$

By the way, this does not mean that $X$ can indeed take all the values from 0 to $n.$ For some of these values the probability is zero.
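
The two PMFs in the hint can be tabulated and their expectations computed exactly, which is a useful check on a hand calculation. The following is a sketch with hypothetical function names; the sample size `n = 5` is an arbitrary choice, and for SRSWOR we assume $n\leq 10.$

```python
from fractions import Fraction
from math import comb

def pmf_srswr(n):
    """PMF of the number of red balls in an SRSWR of size n (6 red, 4 black)."""
    p = Fraction(6, 10)
    return {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

def pmf_srswor(n):
    """PMF of the number of red balls in an SRSWOR of size n (needs n <= 10).

    math.comb(a, b) is 0 when b > a, so impossible values get probability 0.
    """
    return {x: Fraction(comb(6, x) * comb(4, n - x), comb(10, n)) for x in range(n + 1)}

def mean(pmf):
    """E(X) for a finitely supported PMF."""
    return sum(x * p for x, p in pmf.items())

n = 5
with_repl = pmf_srswr(n)
without_repl = pmf_srswor(n)
print("SRSWR :", mean(with_repl))
print("SRSWOR:", mean(without_repl))
print("P(X=0) under SRSWOR:", without_repl[0])
```

With $n=5$ the value $x=0$ has probability zero under SRSWOR, since there are only 4 black balls.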

::

EXERCISE 3: Let $N$ be a positive integer. Let $$ f(x) = \left\{\begin{array}{ll}c 2^x &\text{if }x=1,2,...,N\\0&\text{otherwise.}\end{array}\right. $$ be a PMF. Find $c.$ Find $E(X)$ and $V(X)$ if $X$ has this PMF.

Hint:

For $f(x)$ to be a PMF we need $$f(1)+\cdots+f(N)=1.$$ Hence $$c = \frac{1}{2^{N+1}-2}.$$ So $$E(X) = \sum_1^N x f(x) = c\sum_1^N x 2^x = ...$$ Similarly, you can find $V(X).$
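
If you want to check your closed-form answers, the sums can be evaluated exactly for any particular $N.$ The sketch below does this with exact fractions; the function name `moments` and the choice $N=3$ are ours.

```python
from fractions import Fraction

def moments(N):
    """Return (c, E(X), V(X)) for the PMF f(x) = c * 2^x on x = 1, ..., N."""
    c = Fraction(1, 2 ** (N + 1) - 2)          # normalising constant from the hint
    pmf = {x: c * 2 ** x for x in range(1, N + 1)}
    assert sum(pmf.values()) == 1              # f is indeed a PMF
    mean = sum(x * p for x, p in pmf.items())
    second = sum(x * x * p for x, p in pmf.items())
    return c, mean, second - mean ** 2         # V(X) = E(X^2) - (E(X))^2

c, mean, var = moments(3)
print(c, mean, var)
```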

::

EXERCISE 4: An SRSWR of size 2 is drawn from $\{1,2,...,12\}.$ Let $X$ be the maximum of the two numbers selected. Find $E(X).$

Hint:

Here $X$ can take only the values $1,2,...,12.$

Let $X_1, X_2$ denote the two numbers selected. For $k\in\{1,2,...,12\}$ we have $$P(X\leq k) = P(X_1\leq k, X_2 \leq k) = \left(\frac{k}{12}\right)^2.$$ So $P(X=k) = \frac{k^2-(k-1)^2}{144} = \frac{2k-1}{144}.$

Hence $E(X) = \sum_1^{12} \frac{2k^2-k}{144}=....$
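
The sum in the hint can be cross-checked against a brute-force enumeration of all $144$ equally likely ordered pairs. This is only a sketch; the variable names are arbitrary.

```python
from fractions import Fraction
from itertools import product

values = range(1, 13)

# Brute force: average of the maximum over all 144 equally likely ordered pairs.
brute = Fraction(sum(max(a, b) for a, b in product(values, repeat=2)), 144)

# Formula from the hint: E(X) = sum over k of k * (2k - 1) / 144.
formula = sum(Fraction(2 * k * k - k, 144) for k in values)

print(brute, formula, float(formula))
```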

::

EXERCISE 5: An SRSWR of size $n$ is selected from $\{1,2,...,12\}.$ Let $a_n$ be the expected value of the maximum of the sample. Show that $a_n \leq a_{n+1}$ without explicitly finding $a_n$ in terms of $n.$

Hint:

Let $X_1,...,X_{n+1}$ be an SRSWR of size $n+1$ from $\{1,...,12\}.$

Then $X_1,...,X_n$ is an SRSWR of size $n$ from $\{1,...,12\}.$

Let $U = \max\{X_1,...,X_{n+1}\}$ and $V = \max\{X_1,...,X_n\}.$

Then $U = \max\{V,X_{n+1}\} \geq V.$

So $E(U)\geq E(V).$

Hence $a_{n+1}\geq a_n,$ as required.
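
The argument above does not need the value of $a_n,$ but it is reassuring to see the monotonicity numerically. The sketch below computes $a_n$ for a few $n$ using $P(\max = k) = (k/12)^n - ((k-1)/12)^n,$ the same reasoning as in Exercise 4. The function name `expected_max` and the range of $n$ are our choices.

```python
from fractions import Fraction

def expected_max(n, N=12):
    """Expected maximum of an SRSWR of size n from {1, ..., N}."""
    # P(max = k) = P(max <= k) - P(max <= k - 1)
    return sum(k * (Fraction(k, N) ** n - Fraction(k - 1, N) ** n)
               for k in range(1, N + 1))

a = [expected_max(n) for n in range(1, 9)]   # a_1, ..., a_8
for n, value in enumerate(a, start=1):
    print(f"a_{n} = {float(value):.4f}")
print("non-decreasing:", all(a[i] <= a[i + 1] for i in range(len(a) - 1)))
```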

::

EXERCISE 6: 

Hint:

$P(0\leq X\leq 40) = 1-P(|X-\mu|>20)$ where $\mu=E(X)=20.$

By the Chebyshev inequality, $P(|X-\mu|> 20)\leq \frac{V(X)}{400} = \frac{1}{20}.$

Hence $P(0\leq X\leq 40)\geq 1-\frac{1}{20} = \frac{19}{20}.$

::

EXERCISE 7: 

Hint:

(a) By the Markov inequality, $E(X)\geq 85 P(X> 85).$

So $P(X> 85) \leq \frac{75}{85}.$

(b) $P(65\leq X \leq 85) = P(|X-75|\leq 10) = 1- P(|X-75|> 10)\geq 1-\frac{V(X)}{100} = \frac 34$ by Chebyshev.

(c) Let the answer be $n$, and class average be $\bar X.$

Then $E(\bar X) = 75$ and $V(\bar X) = \frac{25}{n}.$

So, by the Chebyshev inequality, $P(|\bar X-75|\geq 5) \leq \frac{25}{5^2n} = \frac 1n. $

So we need $1-\frac 1n \geq 0.9$ or $n\geq 10.$

::

EXERCISE 8: 

Hint:

Here $P(X\leq x) = F_X(x) = F_Y\left(\frac{x-a}{b}\right) = P\left(Y\leq \frac{x-a}{b}\right) = P(a+bY\leq x).$

Since this holds for all $x\in{\mathbb R},$ $X$ and $a+bY$ have the same CDF.

Since the CDF determines the distribution, $X$ and $a+bY$ have the same distribution.

(a) $E(X) = E(a+bY) = a+bE(Y).$

(b) $V(X) = V(a+bY) = b^2 V(Y).$