
Conditional distribution

Definition: Conditional distribution Let $X:\Omega\rightarrow S$ and $Y:\Omega\rightarrow T$ be jointly distributed discrete random variables. Let $x\in S$ be some constant such that $P(X=x)> 0.$ Then the conditional distribution of $Y$ given $X=x$ is the probability distribution on $T$ $$ A\mapsto P(Y\in A | X = x). $$

EXAMPLE 1:  A 7-segment display shows any number from 0 to 9 at random (equal probabilities).

Let $X$ be the indicator random variable of whether the blue segment is on. Similarly, $Y$ is the indicator for the red segment. Find the conditional distribution of $Y$ given $X.$

SOLUTION: Here $X,Y$ both take values in $\{0,1\}.$ We need to find $P(Y=y | X=x)$ for $x,y\in\{0,1\}.$

Now $P(Y=1|X=1) = P(X=1,Y=1)/P(X=1).$

Both the blue and the red segments are on in only the numbers 3,4,5,6,8,9. So $P(X=1,Y=1) = \frac{6}{10}.$

The blue segment is on in the numbers 2,3,4,5,6,8,9. So $P(X=1) =\frac{7}{10}.$

Hence $P(Y=1|X=1) = P(X=1,Y=1)/P(X=1) = \frac 67.$

You should now be able to work out the other three conditional probabilities similarly. ■
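If you want to double-check, the whole computation can be done by direct enumeration. Here is a minimal sketch in Python, taking the digit sets from the solution above as given (they are not re-derived from a segment layout):

```python
from fractions import Fraction

# Sanity check for Example 1 by direct enumeration.  The digit sets are
# taken from the solution above (blue segment lit for 2,3,4,5,6,8,9;
# both segments lit for 3,4,5,6,8,9).
blue_on = {2, 3, 4, 5, 6, 8, 9}
both_on = {3, 4, 5, 6, 8, 9}

p_x1 = Fraction(len(blue_on), 10)     # P(X = 1); each digit has prob 1/10
p_x1_y1 = Fraction(len(both_on), 10)  # P(X = 1, Y = 1)

p_y1_given_x1 = p_x1_y1 / p_x1        # P(Y = 1 | X = 1)
print(p_y1_given_x1)                  # 6/7
```

Changing `both_on` to the relevant digit sets gives the other three conditional probabilities the same way.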

We can define the conditional CDF or the conditional PMF in the obvious way.

Definition: Conditional expectation / variance Expectation (or variance) computed based on a conditional distribution is called conditional expectation (or conditional variance).

It is important to understand that the conditional expectation/variance is a random variable, which is a function of the conditioning random variable.

Unconditionals in terms of conditionals

Remember the theorem of total probability: $$ P(A) = P(B) P(A|B) + P(B^c)P(A|B^c), $$ which combines the two conditional probabilities of $A$ to arrive at the (unconditional) probability of $A$?

Well, we can do similar things with conditional expectation/variance also.

Tower property $E(Y) = E(E(Y|X)).$

Proof: Let $X$ take values $x_1,x_2,...$ and $Y$ take values $y_1,y_2,...$. Let the joint PMF of $(X,Y)$ be $$ P(X=x_i~\&~Y=y_j) = p_{ij}. $$ Then $P(Y=y_j | X=x_i) = \frac{p_{ij}}{p_{i\bullet}}.$

So $E(Y|X=x_i) = \sum_j y_j \frac{p_{ij}}{p_{i\bullet}}.$

Expectation of this is $$ \sum_i E(Y|X=x_i) p_{i\bullet} = \sum_i \sum_j y_j \frac{p_{ij}}{p_{i\bullet}}p_{i\bullet} = \sum_i \sum_j y_j p_{ij} = \sum_j y_j \sum_i p_{ij} = \sum_j y_j p_{\bullet j} = E(Y), $$ as required. [QED]
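Here is a quick numerical check of the tower property on a small joint PMF; the table below is made up purely for illustration:

```python
from fractions import Fraction

# Numerical check of E(Y) = E(E(Y|X)) on a small, made-up joint PMF.
pmf = {  # (x, y) -> P(X = x, Y = y)
    (0, 1): Fraction(1, 8), (0, 2): Fraction(1, 8),
    (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 2),
}

e_y = sum(p * y for (x, y), p in pmf.items())  # E(Y) directly

xs = {x for x, _ in pmf}
p_x = {x: sum(p for (xx, _), p in pmf.items() if xx == x) for x in xs}
e_y_given_x = {x: sum(p * y for (xx, y), p in pmf.items() if xx == x) / p_x[x]
               for x in xs}                     # x -> E(Y | X = x)
e_e = sum(p_x[x] * e_y_given_x[x] for x in xs)  # E(E(Y|X))

assert e_y == e_e
print(e_y)  # 13/8
```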

Many expectation problems can be handled step by step using this result. Here are some examples.

EXAMPLE 2:  A casino has two gambling games:

  1. Roll a fair die, and win Rs. $D$ if $D$ is the outcome.
  2. Roll two fair dice, and win Rs 5 if both show the same number, but lose Rs 5 otherwise.
You toss a coin with $P(Head)=\frac 13$ and decide to play game 1 if $Head,$ and game 2 if $Tail.$ What is your expected gain?

SOLUTION: Let $X$ be your gain (in Rs), and let $Y$ be the outcome of the toss.

Then $E(X|Y=Head) = 3.5$ and $E(X|Y=Tail) = 5\times\frac{6-30}{36}=-\frac{10}{3}.$

So, by the tower property, $E(X) = E(X|Y=Head)\times P(Y=Head)+E(X|Y=Tail)\times P(Y=Tail) = \cdots.$ ■
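Carrying out the final arithmetic with exact fractions:

```python
from fractions import Fraction

# Finishing Example 2 with exact arithmetic.
e_head = Fraction(7, 2)              # E(X | Y = Head): mean of a fair die
e_tail = Fraction(5 * (6 - 30), 36)  # E(X | Y = Tail) = -10/3
p_head = Fraction(1, 3)

e_x = e_head * p_head + e_tail * (1 - p_head)
print(e_x)  # -19/18, an expected loss of a little over Rs 1
```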

The tower property is very useful for computing expectations involving a random number of random variables. Here is an example.

EXAMPLE 3:  A random number $N$ of customers enter a shop in a day, where $N$ takes values in $\{1,...,100\}$ with equal probabilities. The $i$-th customer pays a random amount $X_i$, where $X_i$ takes values in $\{1,2,...,10+i\}$ with equal probabilities. Assuming that $N,X_1,...,X_N$ are all independent, find the total expected payments by the customers on that day.

SOLUTION: We have $E(X_i) = \frac{11+i}{2}.$

So $E\left(\sum_1^N X_i|N\right) = \sum_1^N E(X_i|N) = \sum_1^N E(X_i) = \sum_1^N \frac{11+i}{2} = 5.5N+\frac{N(N+1)}{4}.$

By the tower property, the required answer is $E\left(5.5N+\frac{N(N+1)}{4}\right)=\cdots.$ ■
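The remaining expectation is a finite sum, so it can be evaluated exactly:

```python
from fractions import Fraction

# Completing Example 3: E(5.5 N + N(N+1)/4) for N uniform on {1,...,100}.
answer = sum(Fraction(11, 2) * n + Fraction(n * (n + 1), 4)
             for n in range(1, 101)) / 100
print(answer)  # 4545/4 = 1136.25
```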

EXAMPLE 4:  There are 10 holes, numbered 1 to 10, in a row. 5 balls are dropped into them at random (a hole may contain any number of balls). Call a ball "lonely" if there is no other ball in its hole or the adjacent holes. Find the expected number of lonely balls.

SOLUTION: Define the indicators $I_1,...,I_5$ as $$ I_i = \left\{\begin{array}{ll}1&\text{if }i\mbox{-th ball is lonely}\\0&\text{otherwise.}\end{array}\right. $$ Then the total number of lonely balls is $X = \sum I_i.$

So we are to find $E(X) = \sum E(I_i).$

Let $Y_i = $ the hole where the $i$-th ball has fallen.

Then $E(I_i|Y_i=1)$ is the conditional probability that all the balls except the $i$-th one have landed in holes $3,...,10$ given that the $i$-th ball has landed in hole 1.

You should be able to compute this easily. Similarly, you can compute $E(I_i|Y_i=k)$ for $k=1,...,10.$

Notice that $Y_i$ can take values $1,...,10$ with equal probabilities.

So the tower property provides the answer as $$ E(X) = \sum E(E(I_i|Y_i)) = \cdots. $$ ■
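Since there are only $10^5$ equally likely placements of the 5 balls, the answer can also be checked by brute-force enumeration:

```python
from fractions import Fraction
from itertools import product

# Brute-force check for Example 4: average the number of lonely balls
# over all 10^5 equally likely placements of the 5 balls.
total = 0
for holes in product(range(1, 11), repeat=5):
    # ball i is lonely iff every other ball is at distance > 1
    total += sum(all(abs(holes[i] - holes[j]) > 1
                     for j in range(5) if j != i)
                 for i in range(5))
e_lonely = Fraction(total, 10 ** 5)
print(e_lonely)  # 137/100
```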

Theorem $V(Y) = E(V(Y|X)) + V(E(Y|X)).$

Proof: This follows directly from the tower property.

We know $$ V(Y|X) = E(Y^2|X) - E^2(Y|X), $$ and hence $$ E(V(Y|X)) = E(E(Y^2|X)) - E(E^2(Y|X)) = E(Y^2) - E(E^2(Y|X)). $$ Again, $$ V(E(Y|X)) = E(E^2(Y|X)) - E^2(E(Y|X)) = E(E^2(Y|X)) - E^2(Y). $$ So $$ E(V(Y|X)) + V(E(Y|X)) = E(Y^2)-E^2(Y) = V(Y), $$ as required. [QED]
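As with the tower property, the identity is easy to verify numerically on a small joint PMF (again made up purely for illustration):

```python
from fractions import Fraction

# Checking V(Y) = E(V(Y|X)) + V(E(Y|X)) on a small made-up joint PMF.
pmf = {(0, 0): Fraction(1, 6), (0, 3): Fraction(1, 3),
       (1, 0): Fraction(1, 4), (1, 3): Fraction(1, 4)}

v_y = (sum(p * y ** 2 for (x, y), p in pmf.items())
       - sum(p * y for (x, y), p in pmf.items()) ** 2)  # V(Y) directly

xs = {x for x, _ in pmf}
p_x = {x: sum(p for (xx, _), p in pmf.items() if xx == x) for x in xs}

def cond_moment(x, k):  # E(Y^k | X = x)
    return sum(p * y ** k for (xx, y), p in pmf.items() if xx == x) / p_x[x]

e_var = sum(p_x[x] * (cond_moment(x, 2) - cond_moment(x, 1) ** 2) for x in xs)
var_e = (sum(p_x[x] * cond_moment(x, 1) ** 2 for x in xs)
         - sum(p_x[x] * cond_moment(x, 1) for x in xs) ** 2)

assert v_y == e_var + var_e
print(v_y)  # 35/16
```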

More than 2 variables

If $X,Y,Z$ are jointly distributed random variables, then we can talk about conditional distribution of $Z$ given $(X,Y)$ or $X$ given $Z$ or $(X,Z)$ given $Y,$ etc. We can even condition step by step. For example, we can talk about $E(E(Z|X,Y)|X).$ This is a function of $X$ alone.

Substitution

Substitution property Conditional distribution of $f(X,Y)$ given $X=x$ is the same as the conditional distribution of $f(x,Y)$ given $X=x.$

Proof: This follows immediately from the definition of conditional probability. [QED]

Problems for practice

::

EXERCISE 1: 

Here the word "density" is used to mean "PMF".

Hint:

(a) Once you realise that $f_X(x) = P(X=x)$, $f_Y(y) = P(Y=y)$ and $f_{Y|X}(y|x) = P(Y=y|X=x),$ the given equality is just theorem of total probability.

(b) The RHS is $E(E(Y|X))$ and so the equality is just the tower property.


::

EXERCISE 2: 

Hint:

$E(S_N) = E(E(S_N|N)) = E(N\mu) = \mu E(N).$

$E(S_N^2) = E(E(S_N^2|N)) = E(N\sigma^2 + N^2\mu^2 ) = \sigma^2E(N)+\mu^2E(N^2).$

The third equality follows directly from these two.
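Both identities can be sanity-checked on a tiny concrete case, say $N$ uniform on $\{1,2,3\}$ and $X_i$ i.i.d. uniform on $\{0,1\}$ (the parameters are chosen arbitrarily for illustration):

```python
from fractions import Fraction
from itertools import product

# Checking E(S_N) = mu E(N) and E(S_N^2) = sigma^2 E(N) + mu^2 E(N^2)
# for N uniform on {1,2,3} and X_i i.i.d. uniform on {0,1},
# so mu = 1/2 and sigma^2 = 1/4.
mu, sigma2 = Fraction(1, 2), Fraction(1, 4)
e_n = Fraction(1 + 2 + 3, 3)
e_n2 = Fraction(1 + 4 + 9, 3)

e_s = e_s2 = Fraction(0)
for n in (1, 2, 3):
    for xs in product((0, 1), repeat=n):
        p = Fraction(1, 3) * Fraction(1, 2 ** n)  # P(N = n) * P(X_1,...,X_n)
        s = sum(xs)
        e_s += p * s
        e_s2 += p * s ** 2

assert e_s == mu * e_n
assert e_s2 == sigma2 * e_n + mu ** 2 * e_n2
```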


::

EXERCISE 3: 

Hint:

(a) $\frac 23.$

(b) $\frac 29.$

(c) $\frac{13}{27}.$


::

EXERCISE 4: 

You might like to solve (b) first.

Hint:

(b) $P(X=Y) = \frac 1N.$

(a) $P(X< Y) = P(Y < X) $ and $P(X< Y) + P(Y < X) + P(X=Y)=1.$

Hence $P(X> Y) = \frac 12\times\left(1-\frac 19\right) = \frac 49.$

So $P(X\geq Y) = \frac 49+\frac 19=\frac 59.$


::

EXERCISE 5: 

Here Exercise 14 means the last exercise (i.e., Exercise 4 according to our numbering).

Hint:

(a) Let $U = \min(X,Y).$ Then $U$ can take values $0,...,N.$

$P(U=k) = P(U\geq k)-P(U\geq k+1).$

Now $P(U\geq k) = P(X,Y\geq k) = P(X\geq k)P(Y\geq k) = \left(\frac{N-k+1}{N+1}\right)^2.$

Similarly, $P(U\geq k+1) = \left(\frac{N-k}{N+1}\right)^2.$

So $P(U=k) = \frac{(N-k+1)^2-(N-k)^2}{(N+1)^2} = \frac{2(N-k)+1}{(N+1)^2}.$

(b) Let $T = \max(X,Y).$ Then $T$ can take values $0,...,N.$

$P(T=k) = P(T\leq k)-P(T\leq k-1).$

Now $P(T\leq k) = P(X,Y\leq k) = P(X\leq k)P(Y\leq k) = \left(\frac{k+1}{N+1}\right)^2.$

Similarly, $P(T\leq k-1) = \left(\frac{k}{N+1}\right)^2.$

So $P(T=k) = \frac{(k+1)^2-k^2}{(N+1)^2} = \frac{2k+1}{(N+1)^2}.$

(c) $R=|Y-X|$ can take values $0,1,...,N.$

$P(R=0) = P(X=Y) = \frac{1}{N+1}.$

For $k=1,...,N,$ we have $P(R=k) = P(R=k \& X < Y) + P(R=k \& X=Y) + P(R=k \& X > Y).$

Now $P(R=k \& X=Y) =0.$

Also $P(R=k \& X < Y) =P(R=k \& X > Y).$

For $\{R=k~\&~X < Y\}$ to happen we must have $X = 0,...,N-k$ and correspondingly $Y = k,...,N.$

So $P(R=k~\&~X < Y) = \frac{N-k+1}{(N+1)^2}.$

Hence $P(R=k) = \frac{2(N-k+1)}{(N+1)^2}$ for $k=1,...,N.$
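All three PMFs can be verified by enumeration for a small $N$, assuming (as in the hint) that $X,Y$ are independent and uniform on $\{0,...,N\}$:

```python
from fractions import Fraction
from itertools import product

# Enumeration check of the PMFs of min(X,Y), max(X,Y) and |Y-X| for
# N = 5, with X, Y independent and uniform on {0,...,N}.
N = 5
pairs = list(product(range(N + 1), repeat=2))
p = Fraction(1, (N + 1) ** 2)  # probability of each (x, y) pair

for k in range(N + 1):
    p_min = sum(p for x, y in pairs if min(x, y) == k)
    p_max = sum(p for x, y in pairs if max(x, y) == k)
    p_abs = sum(p for x, y in pairs if abs(x - y) == k)
    assert p_min == Fraction(2 * (N - k) + 1, (N + 1) ** 2)
    assert p_max == Fraction(2 * k + 1, (N + 1) ** 2)
    expected = (Fraction(1, N + 1) if k == 0
                else Fraction(2 * (N - k + 1), (N + 1) ** 2))
    assert p_abs == expected
```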

::

EXERCISE 6: 

Hint:

(a) $P(X=x) = \sum_y P(X=x,Y=y) = \sum_y g(x)h(y) = g(x)\sum_y h(y).$

(b) $P(Y=y) = \sum_x P(X=x,Y=y) = \sum_x g(x)h(y) = h(y)\sum_x g(x).$

(c) We know that $\sum_x\sum_y P(X=x,Y=y) = 1.$ Hence $\sum_x\sum_y g(x)h(y) = 1,$ i.e., $\sum_xg(x)\sum_y h(y) = 1.$

(d) To show $\forall x, y~~P(X=x,Y=y) = P(X=x)P(Y=y).$

Take any $x,y.$

Then $P(X=x)P(Y=y) = \big[\sum_{y'} h(y') \big]g(x)\big[\sum_{x'} g(x') \big]h(y) = g(x)h(y) = P(X=x,Y=y),$ by (c).


::

EXERCISE 7:  Here "density" means "PMF".

Hint:

(a) $(X_1,...,X_r)$ can take values $(x_1,...,x_r)$ where each $x_i$ is a nonnegative integer and $\sum_1^r x_i = 2r.$

We consider the random experiment of dropping the balls one by one into the boxes. Each ball has $r$ possible destinations.

So $|\Omega| = r^{2r}.$

Now fix some $(x_1,...,x_r)$ as above. The event $A=\{(X_1,...,X_r) = (x_1,...,x_r)\}$ may be obtained as follows.

Choose which of the $2r$ balls fall into box $i,$ for each $i.$

So $|A| = \frac{(2r)!}{x_1!\times\cdots\times x_r!}.$

Hence $P\{(X_1,...,X_r) = (x_1,...,x_r)\}= \frac{ |A| }{ |\Omega| }.$

(b) $\frac{ (2r)!}{(4r)^r}. $


::

EXERCISE 8: 

Hint:

(a) $P(X_1+X_2=k) = \binom{n}{k} (p_1+p_2)^k p_3^{n-k}$ for $k=0,1,...,n.$

(b) $$\begin{eqnarray*} P(X_2=y|X_1+X_2 = z) & = & \frac{P(X_2=y \& X_1+X_2=z)}{P(X_1+X_2=z)} \\ & = & \frac{P(X_1=z-y\&X_2=y)}{P(X_1+X_2=z)} \\ & = & \frac{ \frac{n!}{(z-y)!y!(n-z)!} p_1^{z-y} p_2^y p_3^{n-z} }{ \binom{n}{z} (p_1+p_2)^z p_3^{n-z} } \\ & = & \cdots. \end{eqnarray*}$$
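The ratio simplifies to a Binomial$\left(z, \frac{p_2}{p_1+p_2}\right)$ PMF. Here is a check by enumeration for one arbitrary choice of parameters:

```python
from fractions import Fraction
from math import comb

# Checking that X_2 | X_1 + X_2 = z is Binomial(z, p2/(p1+p2)) for an
# arbitrary choice of parameters: n = 5, p1 = 1/2, p2 = 1/3, p3 = 1/6, z = 3.
n = 5
p1, p2, p3 = Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)

def trinomial(x1, x2):  # P(X1 = x1, X2 = x2); X3 = n - x1 - x2
    return (comb(n, x1) * comb(n - x1, x2)
            * p1 ** x1 * p2 ** x2 * p3 ** (n - x1 - x2))

z = 3
p_z = comb(n, z) * (p1 + p2) ** z * p3 ** (n - z)  # part (a)
q = p2 / (p1 + p2)
for y in range(z + 1):
    assert trinomial(z - y, y) / p_z == comb(z, y) * q ** y * (1 - q) ** (z - y)
```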


::

EXERCISE 9: 

Hint:

(a) $1-\left(\frac{5}{6}\right)^6.$

(b) For $n$ rolls $P($ at least one 6$)=1-\left(\frac 56\right)^n.$

We need $n$ such that $1-\left(\frac 56\right)^n\geq \frac 12.$

Direct computation shows $n\geq 4.$
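The direct computation is a one-liner:

```python
# Finding the smallest n with 1 - (5/6)^n >= 1/2 by direct search.
n = 1
while 1 - (5 / 6) ** n < 1 / 2:
    n += 1
print(n)  # 4
```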


::

EXERCISE 10: 

Imagine this setup: A coin with $P(H)=p$ is repeatedly tossed. Success means $H.$

Hint:

$(1-p)^{x_r-r} p^r.$

Thanks to Amit Prakash Jena for correcting a mistake here.


::

EXERCISE 11: 

This is a continuation of the last problem.

Hint:

$P(T_1=x|N_n=1) = \frac{P(T_1=x\& N_n=1)}{P(N_n=1)} = \frac{p(1-p)^{n-1}}{np(1-p)^{n-1}} = \frac 1n.$
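A check by enumerating all $2^n$ toss sequences for a small $n$ and an arbitrary $p$:

```python
from fractions import Fraction
from itertools import product

# Checking P(T_1 = x | N_n = 1) = 1/n by enumerating all 2^n toss
# sequences for n = 4 and p = 1/3 (1 = head, 0 = tail).
n, p = 4, Fraction(1, 3)

def prob(seq):  # probability of one particular sequence
    h = sum(seq)
    return p ** h * (1 - p) ** (n - h)

seqs = list(product((0, 1), repeat=n))
p_one = sum(prob(s) for s in seqs if sum(s) == 1)          # P(N_n = 1)
for x in range(1, n + 1):
    p_joint = sum(prob(s) for s in seqs
                  if sum(s) == 1 and s.index(1) == x - 1)  # T_1 = x
    assert p_joint / p_one == Fraction(1, n)
```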


::

EXERCISE 12: 

This is a continuation of the last problem.

Hint:

Same logic as in the last solution.


::

EXERCISE 13: 

Hint:

By symmetry, the answer is $\frac 1n$ if $k=1.$ So, for general $k$ the answer is $\frac kn.$


::

EXERCISE 14: 

Hint:

Let $I_j$ be the indicator variable for whether there is a record at position $j.$ Then $P(I_j=1)$ may be computed by total probability: $$ P(I_j=1) = \sum_{k=j}^n P(X_j=k)P(I_j=1|X_j=k). $$ Similarly for $P(I_jI_k=1).$


::

EXERCISE 15: 

Hint:

The problem is basically optimising $\sum P_i^2$ subject to $\sum P_i$ being fixed. Cauchy–Schwarz might help.


::

EXERCISE 16: 

Hint:

Let the black balls be labelled $b_1,...,b_m.$

Let $X_i=\left\{\begin{array}{ll}1&\text{if no white ball is drawn before }b_i\\ 0&\text{otherwise.}\end{array}\right.$

Then $X= 1+\sum_1^m X_i.$

Also, $E(X_i) = \frac{1}{n+1}$. To see this consider the $n$ white balls plus $b_i.$ Out of these $n+1$ balls $b_i$ has the chance $\frac{1}{n+1}$ to come first.

(a) $V(X_i) = \frac{n}{(n+1)^2}.$

Also for $i\neq j$ we have $E(X_iX_j) = \frac{2}{(n+2)(n+1)}$ (because out of the $n$ white balls plus $b_i$ and $b_j$ any of the $\binom{n+2}{2}$ pairs can come first with equal probability).
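Both moments can be checked by enumerating all orderings for small $m$ and $n$ (the balls are treated as distinguishable, so all orderings are equally likely):

```python
from fractions import Fraction
from itertools import permutations

# Checking E(X_i) = 1/(n+1) and E(X_i X_j) = 2/((n+2)(n+1)) by
# enumerating all orderings of m = 3 black and n = 3 white balls.
m, n = 3, 3
balls = [('b', i) for i in range(m)] + [('w', j) for j in range(n)]

def wins(order, i):  # X_i = 1 iff b_i comes before every white ball
    pos = order.index(('b', i))
    return all(order.index(('w', j)) > pos for j in range(n))

orders = list(permutations(balls))
e_x1 = Fraction(sum(wins(o, 0) for o in orders), len(orders))
e_x1x2 = Fraction(sum(wins(o, 0) and wins(o, 1) for o in orders), len(orders))

assert e_x1 == Fraction(1, n + 1)
assert e_x1x2 == Fraction(2, (n + 2) * (n + 1))
```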

(b) Let $Y_i$ be as given in the hint. Let's take an example to understand how the $Y_i$'s are defined. Suppose that we have $m=7$ black and $n=3$ white balls. Here is one way they may turn up:
B B W W B B B B B W
Then $Y_1 = 2$ (as there are two B's preceding the first W), $Y_2=0$ (since the second W comes immediately after the first), and $Y_3 = 5$ (as there are five B's between the second and third W).

We shall argue using bijection that $Y_i$'s are all identically distributed. Let's try to show that $P(Y_1=0) = P(Y_2=0).$ The outcome shown above is in the event $\{Y_2=0\}.$

Now just swap the first two W's (each along with the B's immediately preceding it) to get:
W B B W B B B B B W
Clearly, this is another possible outcome, and it lies inside $\{Y_1=0\}.$ It is not difficult to see (check!) that this swap is a bijection between the events $\{Y_1=0\}$ and $\{Y_2=0\}.$ If the bijection is denoted by $f,$ then $\forall \omega\in\Omega~~P(\omega) = P(f(\omega))$ (why?).

Hence $P\{Y_1=0\} = P\{Y_2=0\}.$

In general, we see that $Y_i$'s are all identically distributed. Now (b) follows immediately from (a) applied to each $Y_i$ separately.


::

EXERCISE 17: 

Hint:

Let $T = \lambda X_1+ (1-\lambda) X_2.$

Then $V(T) = \lambda^2 V(X_1) + (1-\lambda)^2 V(X_2),$ since $X_1,X_2$ are independent.

Thus, $V(T) = \lambda^2 \sigma_1^2 + (1-\lambda)^2 \sigma_2^2 = f(\lambda),$ say.

Then $f'(\lambda) = 2 \sigma^2_1 \lambda - 2 \sigma^2_2(1-\lambda).$

Solving $f'(\lambda) = 0$ we get $\lambda = \frac{\sigma^2_2}{\sigma^2_1+\sigma^2_2}.$
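A quick numerical check of the minimiser for one arbitrary pair of variances:

```python
from fractions import Fraction

# Checking the minimiser of f(lam) = lam^2 s1 + (1-lam)^2 s2 for
# one arbitrary choice: sigma1^2 = 4, sigma2^2 = 1.
s1, s2 = Fraction(4), Fraction(1)

def f(lam):  # V(lam X_1 + (1 - lam) X_2) under independence
    return lam ** 2 * s1 + (1 - lam) ** 2 * s2

lam_star = s2 / (s1 + s2)  # = 1/5
assert all(f(lam_star) <= f(Fraction(i, 100)) for i in range(101))
print(lam_star, f(lam_star))  # 1/5 4/5
```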

This is desirable because we are giving more weight to the $X_i$ that has less variance (i.e., is more stable).


::

EXERCISE 18: 

Hint:

Just like $(a+b)(a-b) = a^2-b^2.$


::

EXERCISE 19: 

Do this only for discrete $X.$

Hint:

$E(X|Y=y) = \sum_x x P(X=x|Y=y) = \sum_x x P(X=x),$ since $X,Y$ independent.

Hence the result.


::

EXERCISE 20: 

Do this for discrete $X, Y$ only. If $X$ can take values $x_1,x_2,x_3,...$ with positive probabilities, then you are to prove $$\forall i~~E(g(X)Y|X=x_i) = g(x_i)E(Y|X=x_i).$$

Hint:

Take any $i.$

Then $E(g(X)Y|X=x_i) = \sum_y g(x_i) y P(Y=y|X=x_i) = g(x_i) \sum_y y P(Y=y|X=x_i) = g(x_i) E(Y|X=x_i),$ as required.


::

EXERCISE 21: 


::

EXERCISE 22: 


::

EXERCISE 23: 

Will the result hold in general if the $X_i$'s are not independent?

Hint:

No, the result may not hold if the $X_i$'s have a dependence structure that is asymmetric. A counterexample is as follows.

$X_1 = $ outcome of a roll of a fair die. $X_2$ is obtained from $X_1$ by swapping 1 and 2. $X_3$ is obtained from $X_1$ by swapping 1 and 3. Then $E(X_1|X_1+X_2+X_3=6)=1\neq \frac 63.$
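The counterexample is easy to verify by enumerating the six equally likely outcomes:

```python
from fractions import Fraction

# Verifying the counterexample: only X1 = 1 gives X1 + X2 + X3 = 6.
def swap(v, a, b):  # exchange the values a and b, leave others alone
    return b if v == a else (a if v == b else v)

outcomes = [(x1, swap(x1, 1, 2), swap(x1, 1, 3)) for x1 in range(1, 7)]
given = [o for o in outcomes if sum(o) == 6]  # condition X1+X2+X3 = 6

e_x1 = Fraction(sum(x1 for x1, _, _ in given), len(given))
print(given, e_x1)  # [(1, 2, 3)] 1
```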


::

EXERCISE 24: 


::

EXERCISE 25: 


::

EXERCISE 26: 


::

EXERCISE 27: 


::

EXERCISE 28: 


::

EXERCISE 29: 


::

EXERCISE 30: 


::

EXERCISE 31: 


::

EXERCISE 32: 


::

EXERCISE 33: 

Hint:

Let $X_i$ be the indicator for $i$-th red ball being a win.

There are $\binom{2n}{n}$ sequences of $n$ R's and $n$ B's in all. Let us count how many of these lead to $\{X_i=1\}.$

Split each such sequence into two parts: the part before the $i$-th R (the red part) and the part after it (the blue part). For instance, for $n=4$ and $i=3$ the sequence RRBRBRBB is split as RRB and BRBB, around the third R.

For general $n$ and $i,$ the red part must consist of exactly $i-1$ R's and at most $i-1$ B's. The blue part consists of exactly $n-i$ R's and the remaining B's.

Let $N_{r,b} = $ number of sequences with exactly $r$ R's and $b$ B's. In other words, $N_{r,b} =\binom{r+b}{r} = \binom{r+b}{b}. $

Then, for any sequence in $\{X_i=1\}$ the red part may be selected in $$\sum_{j=0}^{i-1} \binom{i+j-1}{j}$$ ways. Here $j$ denotes the number of B's in the red part. Once we also count the matching number of blue parts for each value of $j$, we get the size of $\{X_i=1\}$ as $$\sum_{j=0}^{i-1} \binom{i+j-1}{j}\binom{2n-i-j}{n-j}.$$ Now you should be able to complete the rest.

[Thanks to Arnab Nayak for correcting a typo.]


::

EXERCISE 34: 

Hint:

(a) Let's take an example with $n=10$ and $k=3.$ We are showing the selected balls in red:
1 2 3 4 5 6 7 8 9 10
Here $X = 6$ and $R = 4.$

You should be able to see directly that in general $X+R=n.$