[Home]

Finite support discrete distributions


Finite support discrete distributions

Standard discrete distributions

Finite sample space

We shall now discuss four different PMF's: discrete uniform, Bernoulli, binomial and hypergeometric. In all these cases the sample space is a finite set.

Discrete uniform

Notation: DUnif($S$), where $S$ is any finite set.

Sample space: Any finite set $S.$

PMF: Let $S$ have $n$ outcomes. $$P(X=x) = \left\{\begin{array}{ll} \frac{1}{n} &\text{if }x\mbox{ is in }S\\ 0 &\text{if }x\mbox{ is not in }S.\\ \end{array}\right. $$ Where used: Often we have a random variable, $X,$ that can take finitely many values, and all the values are equally likely. Such a random variable has discrete uniform distribution.

Let us see how one arrives at the PMF given above. The sample space $S$ has $n$ outcomes. All these outcomes are equally likely. So $P(X=x)$ must be the same for all $x$ in $S.$ Now, the sum of the probabilities of all the outcomes in $S$ must be 1. So each $P(X=x)$ must be $1/n.$

Terminology: Such an $X$ is said to have (or follow) DUnif$(S)$ distribution. We also say that $X$ is a DUnif$(S)$ random variable, and write $X\sim$DUnif$(S).$

EXAMPLE 1:  Consider a fair die---a die that is equally likely to land on any of the six faces. If $X$ denotes the outcome of a roll of this die then $X$ has DUnif$\{1,2,3,4,5,6\}$ distribution. ■

EXAMPLE 2:  If you toss a fair coin (i.e., a coin that is equally heavy on either side) and $$ Y = \left\{\begin{array}{ll} -5 &\text{if }head\\ 10 &\text{if }tail\\ \end{array}\right., $$ then $Y\sim$DUnif$\{-5,10\}.$ ■

EXAMPLE 3:  $Y$ has DUnif$\{0,1,2,3\}$ distribution. Find $P(Y=2)$ and $P(Y\mbox{ is odd}).$

SOLUTION: The sample space has 4 outcomes in it. Each outcome is equally likely. Since the total of the probabilities must be 1, each outcome has probability $\frac14.$ In particular, $P(Y=2)=\frac14.$ Also, $P(Y\mbox{ is odd}) = P(Y=1\mbox{ or }Y=3) = P(Y=1)+P(Y=3)=\frac14+\frac14=\frac12.$ ■

::

EXERCISE 1: (Easy) $X$ follows DUnif$\{1,2,3,4,5\}$ distribution. Find the following probabilities.

  1. $P(X=2)$
  2. $P(X=3)$
  3. $P(X=6)$
  4. $P(X=2\mbox{ or }X=1)$
  5. $P(X\mbox{ is even})$
  6. $P(X\geq2)$

[Hint]

  1. $\frac15$
  2. $\frac15$
  3. $0$
  4. $\frac25$
  5. $\frac25$
  6. $\frac45$

::

EXERCISE 2: (Easy) $U$ is a DUnif$\{-2,-1,0,1,2\}$ random variable. Find the following probabilities.

  1. $P(U^2=1)$
  2. $P(2U-9=2)$
  3. $P(U^2-3U+2=0)$
  4. $P(U\neq0)$

[Hint]

  1. $\frac25$
  2. $0$
  3. $\frac25$
  4. $\frac45$

::

EXERCISE 3: (Easy) Express the distributions of the following random variables in terms of the discrete uniform distribution.

  1. The top card is drawn from a well-shuffled deck of cards. $$ Y = \left\{\begin{array}{ll} 1 &\text{if }heart\\ 2 &\text{if }diamond\\ 3 &\text{if }club\\ 4 &\text{if }spade\\ \end{array}\right.. $$
  2. A fair die is rolled. $$ Z = \left\{\begin{array}{ll} 0 &\text{if }even\\ 1 &\text{if }odd\\ \end{array}\right.. $$

[Hint]

A deck has 52 cards: 13 of them are hearts, 13 are diamonds, 13 clubs and 13 spades. When you pick the top card of a well-shuffled deck any of the 52 cards are equally likely to occur.

  1. DUnif$\{1,2,3,4\}.$
  2. DUnif$\{0,1\}.$

Expectation and variance: If $X\sim$DUnif$\{x_1,x_2,...,x_n\}$ then $$\begin{eqnarray*} E(X) &=& \frac{1}{n}\sum_{i=1}^nx_i\\ Var(X) &=& \frac1n\sum_{i=1}^n x_i^2 - \left(\frac{1}{n}\sum_{i=1}^nx_i\right)^2. \end{eqnarray*}$$

$$\begin{eqnarray*} E(X) &=& \sum_{i=1}^n x_iP(X=x_i)\\ &=& \sum_{i=1}^n x_i\frac1n\\ &=& \frac1n\sum_{i=1}^n x_i. \end{eqnarray*}$$

$$\begin{eqnarray*} E(X^2) &=& \sum_{i=1}^n x_i^2P(X=x_i)\\ &=& \sum_{i=1}^n x_i^2\frac1n\\ &=& \frac1n\sum_{i=1}^n x_i^2. \end{eqnarray*}$$

So $$ Var(X) = E(X^2) - (E(X))^2 = \frac1n\sum_{i=1}^n x_i^2-\left(\frac{1}{n}\sum_{i=1}^nx_i\right)^2. $$

EXAMPLE 4:  Compute mean and variance of a DUnif$\{1,...,n\}$ random variable, $X.$

SOLUTION:Here $x_1=1,x_2=2,...,x_n=n.$ In other words,, $x_i=i$ for $i=1,...,n.$ $E(X)=\frac1n\sum_{i=1}^n i = \frac{n+1}2.$ $E(X^2)=\frac1n\sum_{i=1}^n i^2 = \frac{(n+1)(2n+1)}{6}.$ Hence, $Var(X) = E(X^2)-(E(X))^2 = \frac{n^2-1}{12}.$ ■

::

EXERCISE 4: (Easy) If $Y$ has DUnif$\{0,1,...,n\}$ distribution, then compute $E(Y)$ and $Var(Y).$ But before you start algebraic manipulations, think if $E(X)$ and $Var(X)$ should be less, equal or more than what we obtained above.

[Hint]

\frac{n}2,\frac{n(n+2)}{12}.

Bernoulli

Notation: Bern($\theta$), where $0 < \theta < 1.$

Sample space: $\{0,1\}.$

PMF: $P(X=0)=1- \theta, P(X=1)=\theta$

Where used: As mentioned above Bern($\theta$) random variable can take only two values, 0 and 1. In fact `0' and `1' are used merely as names. The probability of any random experiment with only two possible outcomes is a Bernoulli distribution, if we call one of the outcomes as `0' and the other as `1'.

Terminology: Such an $X$ is said to have (or follow) Bern$(\theta)$ distribution. We also say that $X$ is a Bern$(\theta)$ random variable, and write $X\sim$Bern$(\theta).$

EXAMPLE 5:  Coin toss has two possible outcomes, head and tail. If we call head as `1' and tail as `0' then we have Bern($\theta$) random variable where $\theta=P(head).$ ■

Expectation and variance: If $X$ is a Bern($\theta$) random variable, then $$\begin{eqnarray*} E(X)&=&(1-\theta) \times 0 + \theta \times 1 = \theta\\ Var(X)&=&\theta(1-\theta). \end{eqnarray*}$$ This is because $E(X^2)=(1-\theta) \times 0^2 + \theta \times 1^2=\theta$

So $Var(X)=E(X^2)-(E(X))^2=\theta-\theta^2=\theta(1-\theta).$

EXAMPLE 6:  $X$ is a Bern(0.5) random variable. $Y$ has a Bern(0.2) distribution. $X$ and $Y$ are independent. Let $Z=XY.$ Show that $Z$ has Bern(0.1) distribution. Hence compute $Var(Z)$.

SOLUTION: Since X and Y can take only the values 0 and 1, hence their product, Z, can be only 0 or 1. So, Z must have a Bern$(\theta)$ distribution for some $\theta$. We have to show that $\theta=0.1.$

Now, $$\begin{eqnarray*} \theta &=& P(Z=1)\\ &=& P(XY=1)\\ &=& P(X=1 \mbox{ and } Y=1)\\ &=& P(X=1)P(Y=1) \mbox{ since $X,Y$ independent} \\ &=& 0.5 \times 0.2\\ &=& 0.1 \end{eqnarray*}$$

So by property of Bernoulli distribution we have $$ Var(Z)=0.01(1-0.01)=0.0099 $$ [Thanks to Tushti for corrected a typo.] ■

Binomial

Notation : Bin($n, \theta$), where n is some positive integer and $0 < \theta < 1.$

Sample Space : $\{0,1,\cdots,n\}.$

PMF: $$ P(X=x)=\left\{\begin{array}{ll} {n\choose x} \theta ^x (1-\theta)^{n-x}&\text{if }x=0,1,...,n\\ 0 &\text{otherwise.} \end{array}\right. $$

Terminology: Such an $X$ is said to have (or follow) Bin$(n,\theta)$ distribution. We also say that $X$ is a Bin$(n,\theta)$ random variable, and write $X\sim$Bin$(n,\theta).$

Where used : Suppose that we have some random experiment with only two possible outcomes. Let us call one of the outcomes `0', and the other `1'. If the probability of `1' is $\theta,$ then the outcome is a Bern($\theta$) random variable, that we have already seen. Now suppose that we perform the experiment $n$ times $independently$ and count the number of times we see `1'. Suppose that this number is $X$. Then $X$ is a random variable with Bin($n, \theta$) distribution.

Let us see how this description leads to the formula for the binomial PMF. For easy understanding, we shall work with 3 coin tosses. The coin has $P(head)=\theta.$ So here $n=4.$ Let us find out the probability of having exactly 2 heads among the 3 tosses. A typical outcome of the 4 tosses is $$ HHT.$$ Here `H' means `Head' and `T' means `Tail'. Thus, the above outcome means that the first two tosses have produced heads, the third toss has produced a tail. The probability of this is $$ \theta\times \theta\times (1-\theta) = \theta^2(1-\theta). $$ The $\theta$'s in the left hand side are for the heads, and the $(1-\theta)$ is for the tail. These are multiplied together because the coin tosses are independent.

The following table shows all the $2^3=8$ possible outcomes of the 3 tosses.

Outcome Probability
$HHH$ $\theta^3$
HHT $\theta^2(1-\theta)$
HTH $\theta^2(1-\theta)$
$HTT$ $\theta(1-\theta)^2$
THH $\theta^2(1-\theta)$
$THT$ $\theta(1-\theta)^2$
$TTH$ $ \theta(1-\theta)^2$
$TTT$ $(1-\theta)^3$

The outcomes for which we have exactly two heads have been shown in bold. Notice that there are exactly ${3 \choose 2} = 3$ such cases. Each of these has probability $\theta^2(1-\theta).$ Adding these we get the probability of having exactly two heads as $$ {3 \choose 2}\theta^2(1-\theta) = 3 \theta^2(1-\theta). $$

This method is general and can be used to find out the probability that there are exactly $x$ heads out of $n$ independent coin tosses:

$$ {n \choose x}\theta^x(1-\theta)^{n-x}, $$ which is the binomial PMF.

::

EXERCISE 5: (Easy) List all the possible outcomes of $n=4$ tosses having exactly $x=2$ heads. For example, one possible outcome is $HTHT.$ How many such outcomes are there?

[Hint]

HHTT,HTHT,HTTH,THHT,THTH,TTHH. There are 6.

EXAMPLE 7:  $U$ is distributed as Bin$(5,\frac14).$ Find $P(U=2).$

SOLUTION: $$\begin{eqnarray*} P(U=2)&=&{5\choose2}\left(\frac14\right)^2\left(1-\frac14\right)^{5-2}\\ &=&\frac{5!}{2!(5-2)!}\left(\frac14\right)^2\left(\frac34\right)^3\\ &=&\frac{5!}{2!3!}\cdot\frac{27}{1024}\\ &=&\frac{5\times4}{2}\cdot\frac{27}{1024}\\ &=&\frac{135}{512}. \end{eqnarray*}$$ ■

::

EXERCISE 6: (Easy) $Y$ is distributed as Bin$(4,\frac23).$ Find the following probabilities.

  1. $P(Y=0)$
  2. $P(Y=4)$
  3. $P(Y=1\mbox{ or }Y=3)$
  4. $P(Y\leq 3)$

[Hint]

  1. $\frac1{81}$
  2. $\frac{16}{81}$
  3. $\frac{40}{81}$
  4. $\frac{65}{81}$ [Thanks to Mayukh for correcting a typo here.]

::

EXERCISE 7: (Easy) $Z\sim$Bin$(5,\frac12).$ Find the following.

  1. $P(Z\mbox{ is odd})$
  2. $P(Z=-1)$
  3. $P(Z=2)-P(Z=3)$
  4. $P(Z=1)-P(Z=4)$

[Hint]

  1. $\frac12$
  2. $0$
  3. $0$
  4. $0$

::

EXERCISE 8: (Easy) If $X$ has Bin$(100,0.5)$ distribution. Then find $a\neq37$ such that $$ P(X=a) = P(X = 37). $$

[Hint]

$a=100-37=63.$

Notice that for any $n$ the Bin$(n,0.5)$ distribution is symmetric, i.e., $$\begin{eqnarray*}P(X=x) &=& {n\choose x}0.5^x(1-0.5)^{n-x}\\ & =&{n\choose n-x}0.5^{n-x}(1-0.5)^x\\ & =& P(X=n-x). \end{eqnarray*}$$

EXAMPLE 8:  A couple plans to have 6 babies (Good heavens!). The chance of a boy being born is 0.4. Let $X$ be the number of sons born to this couple. What is the distribution of $X$.

SOLUTION: Here, each birth is a random experiment with two possible outcomes: boy or girl. Let us call boy as `1' and girl as `0'. We assume that the outcome of one birth is independent of the others. So here a Bern(0.4) random experiment has been repeated independently 6 times, and $X$ denotes the number of `1' s. Hence $X$ has Bin(6, 0.4) distribution. ■

::

EXERCISE 9: (Easy) In the above example let $Y$ be the number of girls. What is the distribution of $Y$?

[Hint]

$Y\sim$Bin$(6,0.6).$

EXAMPLE 9:  A mobile tower is sending 10 signals to another mobile tower.Owing to mechanical problems a signal may become corrupted during transmission with probability 0.1. Corruption of one signal is independent of that of the others. Find the probability distribution of the number of corrupted signals. The communication is OK if 2 or less signals have been corrupted. Find the chance that communication is OK.

SOLUTION: Let $X =$ number of corrupted signals. Sending each signal is a random experiment with two possible outcomes: `corrupted' and `not corrupted'. Let us call `corrupted' as `1' and `not corrupted' as '0'. Then the random experiment has a Bern(0.1) outcome. We are repeating it 10 times independently and $X$ is the number of `1's. So $X$ has Bin(10, 0.1) distribution.

The probability that the communication is OK is $$\begin{eqnarray*} P(X \leq 2) &=& P(X=0) + P(X=1) + P(X=2)\\ &=& {10\choose 0}(0.1)^0(0.9)^{10-0} +\\ & & {10\choose 1}(0.1)^1(0.9)^{10-1} + {10\choose 2}(0.1)^2(0.9)^{10-2}\\ &=& 0.93 \end{eqnarray*}$$ ■

Expectation and variance: If $X$ has a Bin($n,\theta$) distribution, then $$\begin{eqnarray*} E(X) &=& n\theta\\ Var(X)& =& n\theta(1-\theta) \end{eqnarray*}$$

$$\begin{eqnarray*} E(X) &=& \sum_{x=0}^n x P(X=x)\\ &=& \sum_{x=0}^n x {n\choose x}\theta^x (1-\theta)^{n-x} \end{eqnarray*}$$ The first term in the sum (i.e., the term for $x=0$) is 0. So we can drop that term to get $$\begin{eqnarray*} \sum_{x=0}^n x {n\choose x}\theta^x (1-\theta)^{n-x} &=& \sum_{x=1}^n x {n\choose x}\theta^x (1-\theta)^{n-x}\\ &=& n \sum_{x=1}^n {n-1\choose x-1}\theta^x (1-\theta)^{n-x}\\ &=& n \theta \sum_{x=1}^n {n-1\choose x-1}\theta^{x-1}(1-\theta)^{n-x} \end{eqnarray*}$$ Now we shall put $y=x-1$ to get $$\begin{eqnarray*} n \theta \sum_{x=1}^n {n-1\choose x-1}\theta^{x-1}(1-\theta)^{n-x} &=& n \theta \sum_{y=0}^n {n-1\choose y}\theta^{y} (1-\theta)^{n-1-y}\\ &=& n \theta \left(\theta + (1-\theta)\right)^{n-1}\\ &=& n \theta. \end{eqnarray*}$$

$$\begin{eqnarray*} E(X(X-1)) &=& \sum_{x=0}^n x(x-1) P(X=x)\\ &=& \sum_{x=0}^n x(x-1) {n\choose x}\theta^x (1-\theta)^{n-x} \end{eqnarray*}$$ The first two terms (the terms corresponding to `$x=0$' and `$x=1$') are both zeros. Dropping them from the sum we get $$\begin{eqnarray*} \sum_{x=0}^n x(x-1) {n\choose x}\theta^x (1-\theta)^{n-x} &=& \sum_{x=2}^n x(x-1) {n\choose x}\theta^x (1-\theta)^{n-x}\\ &=& n(n-1) \sum_{x=2}^n {n-2\choose x-2}\theta^x (1-\theta)^{n-x}\\ &=& n(n-1) \theta^2 \sum_{x=2}^n {n-2\choose x-2}\theta^{x-2} (1-\theta)^{n-x}. \end{eqnarray*}$$ Putting $y=x-2$ this becomes $$\begin{eqnarray*} n(n-1) \theta^2 \sum_{y=0}^n {n-2\choose y}\theta^{y} (1-\theta)^{(n-2)-y}\\ = n(n-1) \theta^2 \left(\theta + (1-\theta)\right)^{n-2} &=& n(n-1) \theta^2 \end{eqnarray*}$$

$$\begin{eqnarray*} E(X^2) &=& E(X(X-1))+ E(X) \\ &=& n(n-1) \theta^2 + n \theta\\ &=& n^2\theta^2 -n \theta^2+ n \theta \end{eqnarray*}$$

$$\begin{eqnarray*} Var(X) &=& E(X^2)-(E(X))^2\\ &=& n^2\theta^2 -n \theta^2+ n \theta+ n^2 \theta^2\\ &=& n\theta(1-\theta) \end{eqnarray*}$$

EXAMPLE 10:  Assuming that $T\sim$Bin$(100,\frac15),$ find $E(T)$ and standard deviation of $T.$

SOLUTION: Here $n=100$ and $\theta=\frac15.$ So $E(T) = n \theta = 100\times \frac15 = 20.$ Similarly, $Var(T) = n \theta(1-\theta) = 100\times\frac15\times(1-\frac15)= 16.$ Hence standard deviation of $T$ is $\sqrt{16} = 4.$ ■

Theorem If $X$ is a Bin($m,\theta$) random variable, and $Y$ is a Bin($n,\theta$) random variable (same $\theta$ for both!) and $X,Y$ are independent, then $$ (X+Y)\sim Bin(m+n,\theta). $$

This result may be intuitively understood as follows. Suppose that we have a coin with $P(head) = \theta.$ Then $X$ is like the number of heads out of $m$ tosses of the coin, and $Y$ is like the number of heads out of $n$ other tosses. So $X+Y$ is like the total number of heads among $m+n$ tosses of the coin. Since the coin has $P(head) = \theta,$ hence we see that $X+Y$ has Bin$(m+n,\theta)$ distribution.

EXAMPLE 11:  What is the distribution of $U+V,$ if $U\sim$Bin$(10,0.1)$ and $V\sim$Bin$(6,0.1)?$

SOLUTION: $U+V\sim$Bin$(16,0.1).$ ■

::

EXERCISE 10: (Easy) If $X_1,X_2$ and $X_3$ are independent random variables with Bin$(5,\frac13),$ Bin$(10,\frac13)$ and Bin$(6,\frac13)$ distributions, respectively. Find the distribution of $(X_1+X_2+X_3).$

[Hint]

$(X_1+X_2+X_3)\sim$Bin$(21,\frac13).$

First add $X_1$ and $X_2.$ Now notice that $(X_1+X_2)$ is independent of $X_3.$

Hypergeometric

Notation: HypGeom($N,M,n$), where $N,M,n$ are all positive integers such that $M\leq N$ and $n\leq N.$

Sample Space: $\{\max\{0,n-(N-M)\},1,\cdots,\min\{n,M\}\}.$

PMF: $$ P(X=x)=\left\{\begin{array}{ll} \frac{{M\choose x} {N-M\choose n-x}}{{N\choose n}}&\text{if }x=0,...,n\\ 0 &\text{otherwise.} \end{array}\right. $$

Terminology: Such a random variable $X$ is said to have (or follow) HypGeom$(N,M,n)$ distribution. We also say that $X$ is a HypGeom$(N,M,n)$ random variable, and write $X\sim$HypGeom$(N,M,n).$

Where used: Suppose that in a box there are $N$ balls, among which $M$ are white and the remaining $N-M$ are black. You put your hand in the box and collect $n$ balls at random. Let $X$ be the number of white balls among those $n$ balls. Then $X$ is distributed as HypGeom($N,M,n$).

Let us derive the hypergeometric PMF using an example. Suppose we have $N=5$ balls out of which $M=3$ are white, the remaining $N-M=5-3=2$ being black. We pick $n=2$ balls at random. We want to find out the probability that exactly $x=1$ of these two picked balls is white. First, there are ${5\choose 2} = 10$ ways to pick 2 balls out of 5. These are:
Sampling 2 balls from 5
All these are equally likely. So each has probability $\frac{1}{10}.$ Out of these there are 6 cases where exactly one white ball has been picked. These cases are shown inside dashed boxes. This number 6 comes as follows. To get exactly 1 white ball you need to pick 1 ball from the $M=3$ white balls and the other $1$ ball from the $N-M=2$ black balls. This can be done in $$ {3\choose 1}{2\choose 1} = 3\times2 = 6 $$ ways. So the probability of having exactly one white ball in the sample of 2 balls is $$ \frac{{3\choose 1}{2\choose 1}}{{5\choose 2}} = \frac{6}{10} = \frac35. $$ The same argument gives you the hypergeometric PMF as shown below.
Components of HypGeom$(N,M,n)$ PMF

EXAMPLE 12:  Find $P(Y=2)$ where $Y\sim$HypGeom$(5,3,4).$

SOLUTION: Here $N=5, M=3$ and $n=4.$ So $$ P(Y=2) = \frac{{M\choose 2}{N-M\choose n-2}}{{N\choose n}} =\frac{{3\choose2}{2\choose 2}}{{5\choose4}}. $$ Now, ${3\choose2}=3,$ ${2\choose 2}=1$ and ${5\choose4}=5.$ So $P(Y=2) = \frac35.$ ■

::

EXERCISE 11: (Easy) $X$ is known to follow HypGeom$(10,4,8).$ Find the following probabilities.

  1. $P(X=2)$
  2. $P(X=3)$
  3. $P(X=5)$
  4. $P(X=1)$
  5. $P(X=4)$

[Hint]

  1. $\frac2{15}$
  2. $\frac8{15}$
  3. $0$
  4. $0$
  5. $\frac13$

::

EXERCISE 12: (Medium) Suppose that $Y$ has HypGeom$(100,30,40)$ distribution and $Z$ has HypGeom$(100,70,40)$ distribution. Then which of the following two probabilities is larger and why? $$ P(Y=10),\quad P(Z=30) $$

[Hint]

Equal. Think of the following descriptions of $Y$ and $Z.$ In a box there are 100 balls, 30 of which are white, the remaining 70 being black. You pick 40 balls at random. $Y$ is the number of white balls that you get, and $Z$ is the number of black balls that you get.

EXAMPLE 13:  In a class of 10 students there are 4 girls. If 3 students are selected at random from this class what is the chance that exactly 2 girls are selected?

SOLUTION:This is very much like the balls-in-a-box situation, where the 10 students are like 10 balls, the girls being the white balls, and the boys the black balls. Here $N=10,M=4,n=3.$ If $X$ is the number of girls selected then $$ X\sim HypGeom(10,4,3). $$ So $P($exactly 2 girls are selected$) = P(X=2),$ which is given by $$ P(X=2) = \frac{{4\choose 2} {10-4\choose 3-2}}{{10\choose 3}} = \frac{6\times 6}{120} = \frac3{10}. $$ ■

Expectation and variance: If $X$ is a HypGeom($N,M,n$) random variable then $$\begin{eqnarray*} E(X) &=& n\frac{M}{N},\\ Var(X) &=& \frac{(N-M)(N-n)Mn}{N^2(N-1)}. \end{eqnarray*}$$

$$\begin{eqnarray*} E(X) &=& \sum_{x=0}^n xP(X=x) =\sum_{x=0}^n x\frac{{M \choose x} {N-M \choose n-x}} {{N \choose n}}\\ &=& \frac1{{N \choose n}}\sum_{x=0}^n x {M \choose x} {N-M \choose n-x} \end{eqnarray*}$$ The `$x=0$' term is zero. We drop it to get $$\begin{eqnarray*} \frac1{{N \choose n}}\sum_{x=0}^n x {M \choose x} {N-M \choose n-x} &=& \frac1{{N \choose n}}\sum_{x=1}^n x {M \choose x} {N-M \choose n-x}\\ &=& \frac1{{N \choose n}}M \sum_{x=1}^n {M-1 \choose x-1} {N-M \choose n-x} \\ &=& \frac1{{N \choose n}} {N-1 \choose n-1} \\ &=& M\frac{n! (N-n)!}{N!}\times\frac{(N-1)!}{(n-1)!(N-n)!} \\ &=& \frac{M}{N}n \end{eqnarray*}$$

$$\begin{eqnarray*} E(X(X-1)) &=& \sum_{x=0}^n x(x-1)P(X=x) \\ &=& \sum_{x=0}^n x(x-1)\frac{{M \choose x}{N-M \choose n-x}}{{N \choose n}} \\ &=& \frac1{{N \choose n}}M(M-2) \sum_{x=2}^n {M-2 \choose x-2}{N-M \choose n-x} \\ &=& \frac1{{N \choose n}}M(M-2) {N-2 \choose n-2} \\ &=& \frac{M(M-1)}{N(N-1)}n(n-1) \end{eqnarray*}$$

$$\begin{eqnarray*} E(X^2) &=& E(X(X-1)) + E(X) \\ &=& \frac{M(M-1)}{N(N-1)}n(n-1) + \frac{M}{N}n \end{eqnarray*}$$

$$\begin{eqnarray*} Var(X) &=& E(X^2) - (E(X))^2 \\ &=& \frac{M(M-1)}{N(N-1)}n(n-1) + \frac{M}{N}n-\left(\frac{M}{N}n\right)^2 \\ &=& \frac{(N-M)(N-n)Mn}{N^2(N-1)}. \end{eqnarray*}$$

::

EXERCISE 13: (Easy) Compute $E(X)$ and $Var(X)$ where $X$ has HypGeom$(N,M,n)$ distribution with

  1. $N=10,M=5,n=6$
  2. $N=20,M=3,n=2$
  3. $N=12,M=4,n=4$
  4. $N=15,M=15,n=15$

[Hint]

  1. $3,\frac23.$
  2. $\frac{3}{10},\frac{459}{1900}.$
  3. $\frac43,\frac{64}{99}.$
  4. $15,0$

Capture-recapture method

Here is a simulation program for the capture-recapture program that ChatGPT wrote according to my specification: capture-recapture simulation. There are lots fish in a pond (total number $N$ is unknown). You want to estimate $N$. First you capture 200 random fish and tag them (colour them red in the simulation), then independently recapture 100 fish (circle them in blue). The program will tell you the number of tagged fish in the recapture. Your job is to estimate $N$ based on that. The true value of $N$ is, by the way, $1000$ (which you are not supposed to know as a statistician). See how close you get to this true value.

Relation between binomial and hypergeometric

Theorem Let $X_i\sim Binom(n_i,p)$ for $i=1,2,$ where $X_1,X_2$ are independent. Then the conditional distribution of $X_1$ given $X_1+X_2=k$ is $HyperGeom(n_1+n_2,n_1,k).$

Proof: $$\begin{eqnarray*} P(X_1=x_1|X_1+X_2=k) & = & \frac{P(X_1=x_1~\&~X_1+X_2=k)}{P(X_1+X_2=k)}\\ & = & \frac{P(X_1=x_1~\&~X_2=k-x_1)}{P(X_1+X_2=k)}\\ & = & \frac{P(X_1=x_1)P(X_2=k-x_1)}{P(X_1+X_2=k)}\\ & = & \frac{\binom{n_1}{x_1}p^{x_1}(1-p)^{n_1-x_1}\binom{n_2}{k-x_1}p^{k-x_1}(1-p)^{n_2-k+x_1}}{\binom{n_1+n_2}{k}p^k(1-p)^{n_1+n_2-k}}\\ & = & \frac{\binom{n_1}{x_1}\binom{n_2}{k-x_1}}{\binom{n_1+n_2}{k}}, \end{eqnarray*}$$ as required.

[Thanks to Avigyan for pointing out a minor typo.] [QED]

Theorem Let $p\in (0,1).$ Let $N,M\rightarrow \infty $ so that $\frac MN\rightarrow p.$ Let $n,k\in{\mathbb N}$ be fixed. Then $$ \frac{\binom{M}{k} \binom{N-M}{n-k} }{ \binom{N}{n} } \rightarrow \binom{n}{k} p^k (1-p)^{n-k}. $$

Proof: This should be intuitively obvious, because as the number of balls in the box becomes very large picking a ball hardly has any effect on its composition. So SRSWOR starts behaving like SRSWR.

More precisely, writing $R = N-M$ and $r = n-k,$ we have $$\begin{eqnarray*} \frac{\binom{M}{k} \binom{N-M}{n-k} }{ \binom{N}{n} } & = & \frac{ M(M-1)\cdots (M-k+1) }{N(N-1)\cdots (N-k+1) } \times \frac{R(R-1)\cdots(R-r+1) }{ (N-k)\cdots (N-n+1)} \times \frac{ n! }{ k! (n-k)! }. \end{eqnarray*}$$ Now $\frac M N\rightarrow p$ and so $\frac R N\rightarrow 1-p.$

So, since $n,k$ are fixed, we have $$ \frac{ M(M-1)\cdots (M-k+1) }{N(N-1)\cdots (N-k+1) } \rightarrow p^k $$ and similarly $$\frac{R(R-1)\cdots(R-r+1) }{ (N-k)\cdots (N-n+1)} \rightarrow (1-p)^{n-k}.$$ Hence the result. [QED]

Problems for practice

::

EXERCISE 14: (Easy)[dtwo1.png]

[Hint]

$\frac{P(X=i)}{P(X=i-1)} = \frac{7-i}{i}$ for $i=1,...,6$. This is $\gtreqless 1$ according as $3 \gtreqless i$.

::

EXERCISE 15: (Medium)[dtwo2.png]

[Hint]

See the hint of the last problem.

::

EXERCISE 16: (Medium)[dtwo3.png]

[Hint]

Model the situation as follows. Each passenger independently flips a coin with $P(head) = 0.95$. If tail, they cancel their booking.

Let $X = $ number of passengers showing up.

Then $X\sim Binom(52, 0.(5)$. We want to find $P(X\leq 50) = 1-P(X=51)-P(X=52) = \cdots$.

::

EXERCISE 17: (Medium)[dtwo4.png]

[Hint]

The number of ways to write a string of $x_1$ first outcomes, $x_2$ second outcomes and so on, is $$\frac{n!}{x_1!\cdots x_r!}$$. Each such outcome has probability $p_1^{x_1}\cdots p_r^{x_r}$.

::

EXERCISE 18: (Medium)[dtwo5.png]

Here $1\leq k < r$.

[Hint]

Consider the event $A$ defined as {one of the first $k$ outcomes has occured}.

Then $X_1+\cdots+X_k = $ number of times $A$ has occured out of $n$ trials. Also $P(A)= p_1+\cdots+p_r$.

Hence $X_1+\cdots+X_k \sim Binom(n,p_1+\cdots+p_r)$.

Now you can write down the required PMF esily.

::

EXERCISE 19: (Medium)[dtwo6.png]

[Hint]

Model the situation as follows. Each customer rolls a 3 faced die. The three faces are "Ordinary", "Colour" and "Browse". The probabilities are $0.5, 0.3, 0.2$, respectively. The rolls are independent.

Let $X_1, X_2, X_3$ be the numbers of customers (out of 5) in each category.

So the required probability is $\frac{5!}{1!2!2!} (0.5)^1(0.2)^2(0.3)^2$.

::

EXERCISE 20: (Medium)[dtwo7.png]

::

EXERCISE 21: (Medium)[dtwo8.png]

[Hint]

Toss a coin with $P(head)=p$ independently until you get $r$ heads. Let $X$ be the number of tails obtained.

::

EXERCISE 22: (Hard)[dtwo9.png]

[Hint]

First part: Let us start by considering one possible sequence of results that will require exactly 7 games: AAABBBB. But BBBBAAA is not possible, because here the decision is already clear after the fourth game. So we need a sequence of length 7 such that there are exactly 4 of the last symbol. Each such sequence must have exactly 3 of the other symbol. Each such sequence ending with A has probability $p^4(1-p)^3$ while each such sequence ending with B has probability $p^3(1-p)^4$.

All that remains is to count how many sequences of these two types there are. It is an easy calculus exercise to show that the maximum occurs at $p=\frac 12$.

Second part: If $i=2$, the decision must be made after playing at most 3 games (pigeon hole) and at least 2 games. Find the probability of each as in the first part. The case $i=3$ is similar.

::

EXERCISE 23: (Medium)[dtwo10.png]

[Hint]

Let $X,Y$ be, respectively, the number of heads obtained by $A,B$. Then $X\sim Binom(k,0.5)$ and $Y\sim Binom(n-k,0.5)$. Also they are independent. Also $X+Y\sim Binom(n,0.5)$.

$$\begin{eqnarray*} P(X = Y) & = & \sum_{i=0}^k P(X=i,\, Y=i)\\ & = & \sum_{i=0}^k P(X=i)P(Y=i)~~\left[\mbox{$\because X,Y$ independent}\right]\\ & = & 2^{-n}\sum_{i=0}^k \binom k i \binom {n-k} i~~\left[\mbox{$\because X,Y$ independent}\right]\\ \end{eqnarray*}$$ Also $P(X+Y=k) = \binom n k 2^{-n}$.

So we need to show $\sum_{i=0}^k \binom k i \binom {n-k} i = \binom n k$.

Now $\sum_{i=0}^k \binom k i \binom {n-k} i = \sum_{i=0}^k \binom k {k-i} \binom {n-k} i = \binom {k+(n-k)} {(k-i)+i} =\binom n k$.

::

EXERCISE 24: (Hard)[dtwo11.png]

[Hint]

(b) Let $A = \{\underbrace{TT...T}_kH~:~k\in{\mathbb N}\}$ and $B = \{\underbrace{HH...H}_kT~:~k\in{\mathbb N}\}$. Then we need to check if $P(A) = P(B)$. But $P(A) = p(1-p)\times\frac{1}{1-p} = p$ and $P(B) = p(1-p)\times\frac 1p = 1-p$.

::

EXERCISE 25: (Medium)[dtwo12.png]

[Hint]

Here $k$ changeovers mean there are $k+1$ "runs" of identical outcomes. Each possible sequence has the same probability $2^{-n}$. So enough to count the number of ways to split $n$ into $k+1$ positive parts.

::

EXERCISE 26: (Medium)[dtwo16.png]

[Hint]

Let $X_i$ be indicator for there being a changeover at position $i$ for $i=2,...,n.$ By definition, there cannot be a changeover at $i=1.$

Then the answer is $\sum_2^n E(X_i). $ Now $X_i = 1$ iff the $(i-1)$-th and $i$-th tosses result in either $HT$ or $TH.$

So $E(X_i) = \cdots.$

::

EXERCISE 27: (Medium)[dtwo18.png]

::

EXERCISE 28: (Hard)[dtwo19.png]

::

EXERCISE 29: [dtwo20.png]

::

EXERCISE 30: [dtwo21.png]

::

EXERCISE 31: [dtwo22.png]

[Hint]

Let $X_i=\left\{\begin{array}{ll}1&\text{if }$i$-th coupon was not seen before\\ 0&\text{otherwise.}\end{array}\right.. $

Let $X = \sum_1^k X_i$. To find $E(X_i)$ and $V(X_i)$.

Clearly, $X_1\equiv 1$. Now find $E(X_i)$ and $E(X_iX_j)$ for $i]\neq j$.

::

EXERCISE 32: [dtwo23.png]

::

EXERCISE 33: At a party $N$ men throw their hats into a corner, and each man picks up a hat randomly. Let $X$ be the number of men who get their own hats. Find $E(X)$ and $V(X).$

::

EXERCISE 34: [dtwo25.png]

::

EXERCISE 35: [dtwo26.png]

::

EXERCISE 36: [dtwo27.png]

::

EXERCISE 37: [dtwo28.png]

::

EXERCISE 38: [dtwo29.png]

::

EXERCISE 39: [dtwo30.png]

::

EXERCISE 40: [dtwo31.png]

::

EXERCISE 41: [dtwo32.png]

Here Exercise 16 refers to the last problem.

::

EXERCISE 42: [dtwo33.png]

::

EXERCISE 43: [dtwo34.png]

::

EXERCISE 44: [dtwo35.png]

::

EXERCISE 45: [dtwo36.png]