Variance

Variance
Moments
Moment generating function
Problems for practice

Variance

A random variable is, well, random. So it may very well differ from its expectation. By how much? A lot or a little? We can use expectation to find that out.

Definition: Variance If $E|X|<\infty,$ then we define the variance of $X$ as $$ V(X) = E\big[ (X-E(X))^2 \big]. $$ It is either finite or $\infty.$

Theorem $V(X)\geq 0.$

Theorem If $E(X^2)<\infty,$ then $V(X)$ exists finitely, and $V(X) = E(X^2)-\big( E(X) \big)^2.$ (Note that $E(X^2)<\infty$ already implies $E|X|<\infty,$ since $|X|\leq 1+X^2.$)
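To see the two formulas agree on a concrete case, here is a small Python sketch (an illustration, not part of the theorem). It assumes $X$ is the score of a fair six-sided die, stores the PMF as a dictionary, and computes $V(X)$ both from the definition and as $E(X^2)-(E(X))^2$ with exact fractions. The names `die` and `expectation` are our own.

```python
from fractions import Fraction

# Illustrative X: the score of a fair die, PMF stored as {value: probability}.
die = {x: Fraction(1, 6) for x in range(1, 7)}


def expectation(pmf, g=lambda x: x):
    """Return E(g(X)) for a discrete PMF given as a dict."""
    return sum(p * g(x) for x, p in pmf.items())


mean = expectation(die)
var_definition = expectation(die, lambda x: (x - mean) ** 2)  # E[(X - E X)^2]
var_shortcut = expectation(die, lambda x: x ** 2) - mean ** 2  # E(X^2) - (E X)^2

print(mean, var_definition, var_shortcut)  # 7/2 35/12 35/12
```

Exact fractions are used so that the two answers can be compared for equality rather than up to rounding error.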

Theorem For any constants $a,b\in{\mathbb R},$ $V(aX+b) = a^2 V(X).$

Theorem $V(X)=0$ if and only if $X$ is a degenerate random variable, i.e., $P(X=c)=1$ for some constant $c.$
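A quick numerical check of the rule for $aX+b$ and of the degenerate case, as a Python sketch under stated assumptions: $X$ is a fair die (our choice), $a=-3,$ $b=10$ are arbitrary constants, and the helper names `variance` and `affine` are hypothetical.

```python
from fractions import Fraction


def expectation(pmf, g=lambda x: x):
    """Return E(g(X)) for a discrete PMF given as {value: probability}."""
    return sum(p * g(x) for x, p in pmf.items())


def variance(pmf):
    """V(X) computed from the definition E[(X - E X)^2]."""
    m = expectation(pmf)
    return expectation(pmf, lambda x: (x - m) ** 2)


def affine(pmf, a, b):
    """PMF of aX + b; values that coincide (e.g. when a = 0) are merged."""
    out = {}
    for x, p in pmf.items():
        y = a * x + b
        out[y] = out.get(y, 0) + p
    return out


die = {x: Fraction(1, 6) for x in range(1, 7)}  # illustrative X: a fair die
a, b = -3, 10                                   # arbitrary constants

print(variance(affine(die, a, b)), a ** 2 * variance(die))  # 105/4 105/4
print(variance({5: Fraction(1)}))  # degenerate X with P(X = 5) = 1: prints 0
```

Changing `a` to `0` makes every value of $aX+b$ collapse to $b,$ which is again a degenerate random variable with variance $0=0^2\,V(X).$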

Chebyshev is also credited with designing a quadruped robot-like linkage.

Chebyshev inequality Let $V(X)<\infty.$ Then $$ \forall \epsilon>0~~P(|X-E(X)| \geq \epsilon) \leq \frac{V(X)}{\epsilon^2}. $$

Proof: Take any $\epsilon>0.$

Define $$ Y = \left\{\begin{array}{ll} \epsilon^2 &\text{if }|X-E(X)|\geq \epsilon\\ 0 &\text{otherwise.} \end{array}\right. $$

Then $(X-E(X))^2\geq Y.$ (If $|X-E(X)|\geq\epsilon,$ then $(X-E(X))^2\geq\epsilon^2=Y;$ otherwise $Y=0\leq (X-E(X))^2.$)

So $$\begin{eqnarray*} V(X) & = & E(X-E(X))^2\\ & \geq & E(Y)\\ & = & \epsilon^2 P(|X-E(X)| \geq\epsilon) + 0\times P(|X-E(X)| <\epsilon). \end{eqnarray*}$$ Dividing both sides by $\epsilon^2$ gives the result. [QED]

A point about the inequalities in the above theorem. There are two inequalities, one inside the probability and one outside. Both are non-strict. You may make the first one strict (thereby weakening the result, since $P(|X-E(X)|>\epsilon)\leq P(|X-E(X)|\geq\epsilon)$). However, you may not make the other one strict: for a degenerate $X$ both sides equal $0,$ so you would get $0 < 0.$
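To get a feel for how sharp (or loose) the Chebyshev bound is, the following Python sketch compares the exact value of $P(|X-E(X)|\geq\epsilon)$ with $V(X)/\epsilon^2$ for a few values of $\epsilon,$ assuming once more that $X$ is a fair die (an illustrative choice, not from the theorem).

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}           # illustrative X: a fair die
mean = sum(p * x for x, p in die.items())                # E(X) = 7/2
var = sum(p * (x - mean) ** 2 for x, p in die.items())   # V(X) = 35/12

for eps in (Fraction(1, 2), Fraction(3, 2), Fraction(5, 2)):
    exact = sum(p for x, p in die.items() if abs(x - mean) >= eps)  # P(|X - E X| >= eps)
    bound = var / eps ** 2                                          # Chebyshev bound
    print(eps, exact, bound, exact <= bound)
```

For $\epsilon=\frac12$ the bound is $\frac{35}{3}>1,$ so it says nothing; the inequality is only informative when $\epsilon^2>V(X).$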

Moments

Definition: Raw and central moments The $k$-th raw moment of $X$ is $$ E(X^k) $$ and the $k$-th central moment of $X$ is $$ E\big[ (X-E(X))^k \big], $$ provided these expectations exist.
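For a discrete random variable with finitely many values, both kinds of moments are finite sums, so they are easy to compute. A Python sketch, assuming (for illustration only) that $X$ is a fair die; the function names are our own:

```python
from fractions import Fraction


def raw_moment(pmf, k):
    """k-th raw moment E(X^k) of a discrete PMF {value: probability}."""
    return sum(p * x ** k for x, p in pmf.items())


def central_moment(pmf, k):
    """k-th central moment E[(X - E X)^k]."""
    m = raw_moment(pmf, 1)
    return sum(p * (x - m) ** k for x, p in pmf.items())


die = {x: Fraction(1, 6) for x in range(1, 7)}  # illustrative X: a fair die

for k in range(1, 5):
    print(k, raw_moment(die, k), central_moment(die, k))
```

The first central moment is always $0,$ and the second is $V(X);$ for the die the third central moment is also $0$ because the distribution is symmetric about $E(X)=\frac72.$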

Moment generating function

Definition: Moment generating function (MGF) For any random variable $X$ we define its moment generating function as the function $$ m_X(t) = E(e^{tX}). $$ The domain of this function consists of all $t\in{\mathbb R}$ for which the expectation exists finitely.

Clearly, $m_X(0)$ always exists and equals $1.$

Theorem If $m_X(t)$ is finite for all $t$ in some interval $(-a,a)$ with $a>0,$ then for every $k\in{\mathbb N}$ the moment $E(X^k)$ exists finitely, the $k$-th derivative of $m_X(t)$ exists at $t=0,$ and it equals $E(X^k).$

Proof: We shall not do the proof here. But here is the main idea: $$ e^{tX} = 1 + \frac{tX}{1!} + \frac{t^2X^2}{2!} + \frac{t^3X^3}{3!} + \cdots. $$ From this we want to write $$ E(e^{tX}) = 1 + \frac{tE(X)}{1!} + \frac{t^2E(X^2)}{2!} + \frac{t^3E(X^3)}{3!} + \cdots. $$ This is not a precise statement, because we do not know if all raw moments of $X$ exist finitely. Also, even if they do, is it valid to "distribute" expectation over an infinite sum?

Answers to these questions require deeper real analysis results than we know at this point.

However, assuming that this is valid, we may try to differentiate both sides to get $$ \frac{d}{dt} E(e^{tX}) = E(X) + \frac{2tE(X^2)}{2!} + \frac{3t^2E(X^3)}{3!} + \cdots. $$ Again this step needs justification. Can we "distribute" differentiation over an infinite sum?

Assuming that we can, putting $t=0$ indeed gives us $E(X).$

Similarly, differentiating once again and putting $t=0$ gives us $E(X^2),$ and so on. [QED]
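The theorem can be illustrated (not proved) numerically. A fair die has a finite MGF for every $t$ and finite moments of all orders, so the theorem applies, and central finite differences at $t=0$ should approximate $E(X)=\frac72$ and $E(X^2)=\frac{91}{6}.$ The die and the step size `h` are our own choices in this Python sketch.

```python
import math

VALUES = range(1, 7)  # illustrative X: a fair die, each value with probability 1/6


def mgf(t):
    """m_X(t) = E(e^{tX}) for the fair die."""
    return sum(math.exp(t * x) for x in VALUES) / 6


h = 1e-4  # step size for the finite differences (our choice)
first = (mgf(h) - mgf(-h)) / (2 * h)               # approximates m'(0)
second = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h ** 2  # approximates m''(0)

print(first, 7 / 2)    # close to E(X) = 3.5
print(second, 91 / 6)  # close to E(X^2) = 15.1666...
```

Making `h` much smaller does not help indefinitely: the second difference divides by $h^2,$ so floating-point rounding in `mgf` eventually dominates.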

We shall not spend much time with MGFs, because there is a better alternative called the characteristic function (CF).

Definition: Characteristic function (CF) For any (real-valued) random variable $X$ we define its characteristic function as the function $$ \phi_X(t) = E(e^{itX}),~~t\in{\mathbb R}. $$

Don't be nervous about the expectation of a complex-valued random variable. It is simply $$ E(\cos tX) + i E(\sin tX). $$ CFs are better than MGFs for two reasons, which we give as theorems below.

Theorem For any (real-valued) random variable, the CF is defined on all of ${\mathbb R}.$

Proof: This is obvious, since $\sin tX$ and $\cos tX$ are both bounded random variables, and hence have finite expectations. [QED]
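Following the description of $E(e^{itX})$ as $E(\cos tX)+iE(\sin tX),$ here is a Python sketch that computes a CF in exactly that way. It assumes, for illustration, that $X\sim$ Bernoulli$(p)$ with $p=0.3,$ for which $\phi_X(t)=(1-p)+pe^{it}$ can be written down directly, so the two can be compared.

```python
import cmath
import math

p = 0.3                   # illustrative X ~ Bernoulli(p)
pmf = {0: 1 - p, 1: p}    # P(X = 0) = 1 - p, P(X = 1) = p


def cf(t):
    """phi_X(t) computed as E(cos tX) + i E(sin tX)."""
    re = sum(q * math.cos(t * x) for x, q in pmf.items())
    im = sum(q * math.sin(t * x) for x, q in pmf.items())
    return complex(re, im)


for t in (0.0, 1.0, 2.5, 100.0):
    closed_form = (1 - p) + p * cmath.exp(1j * t)  # E(e^{itX}) written out directly
    print(t, cf(t), closed_form, abs(cf(t)))
```

Both sums are finite for every real `t`, as the theorem says, and (up to rounding) the printed moduli never exceed $1,$ as must hold for every CF.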

Theorem If $X,Y$ are two random variables with the same CF, then they must have the same distribution.

Proof: Not in this course. [QED]

Indeed, this property has earned characteristic functions their name.

MGFs do not have this property. It is possible to construct (rather ugly) counter-examples of random variables $X$ and $Y$ that have the same MGF (in particular, the same domain $D\subseteq{\mathbb R}$) but different distributions. However, if the domain includes a neighbourhood of $0,$ then $X$ and $Y$ must have the same distribution. This is stated in the following theorem.

Theorem Let $m_X(t)$ be defined for $t\in (-a,a)$ for some $a>0.$ Let $Y$ be a random variable with the same MGF. Then $X$ and $Y$ must have the same distribution.

Proof: Too difficult for this course. [QED]

We shall not spend time proving any result on MGFs here. You will learn the proofs for CFs in the third semester.

Problems for practice

EXERCISE 1: (Easy) A box has 6 red balls and 4 black balls. An SRSWR of size $n$ is selected. If $X$ is the number of red balls selected, find the PMF of $X$ and $E(X).$ Also solve the problem in the case of SRSWOR.

[Hint]

For SRSWR: $P(X=x) = \binom{n}{x} \left(\frac{6}{10}\right)^x\left(\frac{4}{10}\right)^{n-x}$ for $x=0,1,...,n.$

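For SRSWOR (here $n\leq 10$): $P(X=x) = \frac{\binom{6}{x} \binom{4}{n-x}}{\binom{10}{n}}$ for $x=0,1,...,n,$ with the convention $\binom{a}{b}=0$ when $b>a.$

This does not mean that $X$ can actually take every value from $0$ to $n:$ for some of these values the probability is zero. For example, if $n=8,$ then $X$ can only take the values $4,5,6.$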
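Both PMFs can be checked by computer for a particular sample size. This Python sketch takes $n=8$ (our choice), builds both PMFs with exact fractions, and compares the means. It relies on `math.comb(a, b)` returning $0$ when $b>a,$ which matches the convention $\binom{a}{b}=0;$ the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

RED, BLACK = 6, 4
TOTAL = RED + BLACK


def pmf_srswr(n):
    """Binomial(n, 6/10) PMF of the number of red balls (with replacement)."""
    p = Fraction(RED, TOTAL)
    return {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}


def pmf_srswor(n):
    """PMF without replacement (needs n <= 10); comb(a, b) is 0 when b > a."""
    return {x: Fraction(comb(RED, x) * comb(BLACK, n - x), comb(TOTAL, n))
            for x in range(n + 1)}


def mean(pmf):
    return sum(x * q for x, q in pmf.items())


n = 8  # illustrative sample size
print(mean(pmf_srswr(n)), mean(pmf_srswor(n)))           # 24/5 24/5
print([x for x, q in pmf_srswor(n).items() if q == 0])   # [0, 1, 2, 3, 7, 8]
```

Trying other values of `n` is a good way to guess the general formula for $E(X)$ before proving it.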

EXERCISE 2: (Easy) Let $N$ be a positive integer. Let $$ f(x) = \left\{\begin{array}{ll}c 2^x &\text{if }x=1,2,...,N\\0&\text{otherwise.}\end{array}\right. $$ be a PMF. Find $c.$ Find $E(X)$ and $V(X)$ if $X$ has this PMF.

[Hint]

For $f(x)$ to be a PMF we need $$f(1)+\cdots+f(N)=1.$$ Since $2+2^2+\cdots+2^N = 2^{N+1}-2,$ we get $$c = \frac{1}{2^{N+1}-2}.$$ So $$E(X) = \sum_1^N x f(x) = c\sum_1^N x 2^x = ...$$ Similarly, you can find $V(X).$
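To check your answer for a specific $N,$ here is a Python sketch (the function name `exercise2` is ours) that builds the PMF with $c=1/(2^{N+1}-2)$ and computes $E(X)$ and $V(X)$ exactly.

```python
from fractions import Fraction


def exercise2(N):
    """Return (c, E(X), V(X)) for f(x) = c * 2^x, x = 1, ..., N."""
    c = Fraction(1, 2 ** (N + 1) - 2)
    f = {x: c * 2 ** x for x in range(1, N + 1)}
    mean = sum(x * p for x, p in f.items())
    second_moment = sum(x ** 2 * p for x, p in f.items())
    return c, mean, second_moment - mean ** 2   # V(X) = E(X^2) - (E X)^2


c, m, v = exercise2(3)   # illustrative N = 3
print(c, m, v)           # 1/14 17/7 26/49
```

Evaluating your closed-form answers at a few small $N$ and comparing with this output is a cheap way to catch algebra slips.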

EXERCISE 3: (Medium) An SRSWR of size 2 is drawn from $\{1,2,...,12\}.$ Let $X$ be the maximum of the two numbers selected. Find $E(X).$

[Hint]

Here $X$ can take only the values $1,2,...,12.$

Let $X_1,X_2$ denote the two numbers drawn. For $k\in\{1,2,...,12\}$ we have $$P(X\leq k) = P(X_1\leq k,\ X_2 \leq k) = \left(\frac{k}{12}\right)^2.$$ So $P(X=k) = P(X\leq k)-P(X\leq k-1) = \frac{k^2-(k-1)^2}{144} = \frac{2k-1}{144}.$

Hence $E(X) = \sum_1^{12} \frac{2k^2-k}{144}=....$
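A Python sketch that evaluates the sum in the hint and, as an independent check, averages the maximum over all $144$ equally likely ordered pairs; the two results should coincide.

```python
from fractions import Fraction
from itertools import product

# Sum from the hint: E(X) = sum_{k=1}^{12} (2k^2 - k)/144
formula = sum(Fraction(2 * k * k - k, 144) for k in range(1, 13))

# Independent check: all 12 * 12 = 144 ordered pairs are equally likely under SRSWR
pairs = list(product(range(1, 13), repeat=2))
brute_force = Fraction(sum(max(pair) for pair in pairs), len(pairs))

print(formula, brute_force, formula == brute_force)
```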

EXERCISE 4: (Medium) An SRSWR of size $n$ is selected from $\{1,2,...,12\}.$ Let $a_n$ be the expected value of the maximum of the sample. Show that $a_n \leq a_{n+1}$ without explicitly finding $a_n$ in terms of $n.$

[Hint]

Let $X_1,...,X_{n+1}$ be an SRSWR of size $n+1$ from $\{1,...,12\}.$

Then $X_1,...,X_n$ is an SRSWR of size $n$ from $\{1,...,12\}.$

Let $U = \max\{X_1,...,X_{n+1}\}$ and $V = \max\{X_1,...,X_n\}.$

Then $U = \max\{V,X_{n+1}\} \geq V.$

So $E(U)\geq E(V).$

Hence $a_{n+1}\geq a_n,$ as required.
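The argument above is the proof; the following Python sketch is only a numerical sanity check for $n=1,\dots,10.$ It computes $a_n$ with the tail-sum identity $E(X)=\sum_{k\geq 0}P(X>k)$ for random variables taking values in $\{0,1,2,\dots\}$ (a standard fact not proved in these notes), together with $P(\max\leq k)=(k/12)^n.$ The function name `a` is our own.

```python
from fractions import Fraction


def a(n, m=12):
    """E(max of an SRSWR of size n from {1, ..., m}) via the tail-sum formula.

    E(max) = sum_{k=0}^{m-1} P(max > k), with P(max > k) = 1 - (k/m)^n.
    """
    return sum(1 - Fraction(k, m) ** n for k in range(m))


values = [a(n) for n in range(1, 11)]
print(all(x <= y for x, y in zip(values, values[1:])))  # True
```

For $n=2$ this gives the same value as the sum in Exercise 3, which is a useful cross-check of both computations.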

EXERCISE 5: (Medium)

[Hint]

$P(0\leq X\leq 40) = 1-P(|X-\mu|>20)$ where $\mu=E(X)=20.$

By the Chebyshev inequality, $P(|X-\mu|> 20)\leq P(|X-\mu|\geq 20)\leq \frac{V(X)}{400} = \frac{1}{20}.$

Hence $P(0\leq X\leq 40)\geq 1-\frac{1}{20} = \frac{19}{20}.$

EXERCISE 6: (Medium)

[Hint]

(a) By the Markov inequality, $E(X)\geq 85\, P(X\geq 85)\geq 85\, P(X> 85).$

So $P(X> 85) \leq \frac{75}{85} = \frac{15}{17}.$

(b) $P(65\leq X \leq 85) = P(|X-75|\leq 10) = 1- P(|X-75|> 10)\geq 1-\frac{V(X)}{100} = \frac 34$ by Chebyshev.

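(c) Let $\bar X$ be the mean of $n$ independent random variables, each having the same distribution as $X.$ Then $E(\bar X) = 75$ and $V(\bar X) = \frac{25}{n}.$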

So, by the Chebyshev inequality, $P(|\bar X-75|\geq 5) \leq \frac{25}{5^2n} = \frac 1n. $

So $P(|\bar X-75|<5)\geq 1-\frac 1n,$ and we need $1-\frac 1n \geq 0.9,$ i.e., $n\geq 10.$

EXERCISE 7: (Medium)

[Hint]

Here $P(X\leq x) = F_X(x) = F_Y\left(\frac{x-a}{b}\right) = P\left(Y\leq \frac{x-a}{b}\right) = P(a+bY\leq x),$ where the last step uses $b>0.$

Since this holds for all $x\in{\mathbb R},$ $X$ and $a+bY$ have the same CDF.

Since a CDF determines the distribution uniquely, $X$ and $a+bY$ have the same distribution.

(a) $E(X) = E(a+bY) = a+bE(Y).$

(b) $V(X) = V(a+bY) = b^2 V(Y).$