[Home]

Probability Generating Function


Probability Generating Function

What is a PGF?

Video for this section

Definition: Probability generating function (PGF)Let a random variable $X$ take only nonnegative integer values. Then its probability generating function (PGF) is defined as the power series $$p(t) = p_0 + p_1 t + p_2 t^2 +\cdots, $$ where $p_k = P(X=k)$ for $k=0,1,2,..$.

EXAMPLE 1: A random variable takes the values $1,2,5$ each with probability $\frac 13.$ Find its PGF.

SOLUTION: $p(t) = \frac 13(t+t^2+t^5)$. ■

::

EXERCISE 1: (Easy) Find the PGF of a random variable with $Binom(n,p)$ distribution.

Analytical properties

Video for this section

We have defined PGF as a power series, but we have not mentioned anything about its convergence yet.

Theorem Any PGF $p(t)$ must converge absolutely for $|t|\leq 1$.

Proof:If $|t|\leq 1$, then $\sum_k |p_k t^k| \leq \sum_k p_k = 1$.[QED]

::

EXERCISE 2: (Easy) Think of a case where the radius of convergence is $\infty$.

::

EXERCISE 3: (Easy) Think of a case where the radius of convergence is $1$.

TheoremA power series can be differentiated term by term any number of times in the interior of its interval of convergence.

Proof:Standard theorem from real analysis.[QED]

Probabilistic properties

Video for this section

Thanks to the above theorem, we can recover the probabilities from the PGF by repeated differentiation at $t=0$: $$p_n = \frac{p^{(n)}(0)}{n!}$$. Hence we get the following theorem.
Theorem If $X$ are $Y$ are two random variables both taking only nonnegative integer values, and they have PGFs matching over any open neighbourhood of $0$, then their distributions must match.

::

EXERCISE 4: (Easy) Show that for $n\in{\mathbb N}$ we have $$E(X(X-1)\cdots (X-n+1)) = p^{(n)}(1)$$.

::

EXERCISE 5: (Easy) Show that $E(t^X) = p(t)$.

::

EXERCISE 6: (Easy) Show that if $X,Y$ are independent random variables taking nonnegative integer values, with PGFs $\xi(t)$ and $\psi(t)$, then the PGF of $X+Y$ is $\xi(t)\psi(t)$.

[Hint]

Use the last exercise.

An application: Branching process

Imagine a cell that will split into two cells after exactly one minute. Then, after one more minute, each of these two cells will again split into two. If it goes on like this, then we shall have$2^n$ cells in the $n$-th generation. Clearly understand how the generations are numbered. The initial cell belonged to generation 0. We shall call the number of cells in the $n$-th generation $X_n$. Thus, $X_n = 2^n$. Also, notice that when a cell splits into children, the original cell ceases to exist.

This branching process is a deterministic one. Now let us consider a random branching process. Here again we start with a single cell in generation 0. Thus $X_0 = 1$. After a minute this cell "splits into" a random number of cells. The number may be any nonnegative integer. In particular, we allow the number to be 0 or 1 with the following interpretations: This number is the size of generation 1. We shall call it $X_1$. Let the distribution of $X_1$ be given by $P(X_1=k) = p_k$ for $k=0,1,2,...$.

After one more minute each cell in generation 1 will independently split into children following the same distribution. And the process will continue.

As before, $X_n$ will denote the number of cells in the $n$-th generation.

Animation

Cells will be shown as dots. The red dots denote cells that have died without leaving any child. Try the animation a number of times to explore various possibilities. svg

Math

We can do various interesting math with this process. But our aim is to show case a use of PGFs. So the problem we shall address is "What is the extinction probability for this process?"

By extinction we mean the event that $X_n=0$ for some $n\in{\mathbb N}$. Notice that if some $X_n=0$, then we must have $X_{n+1}= X_{n+2}=\cdots = 0$ also. So the extinction event is $$\bigcup_{n\in{\mathbb N}}\{X_n=0\}$$. Since $\{X_1=0\}\subseteq \{X_2=0\}\subseteq\cdots, $ hence the extinction probability is $\lim_{n\rightarrow\infty} P(X_n=0)=\theta$, say.

How to find it in terms of $p_0,p_1,p_2,..$.?

Simple cases

If $p_0 = 1$ (which implies $p_1=p_2=\cdots=0$), then the extinction probability is surely 1.

If $p_0>0$, but $p_0+p_1 =1$ (which implies $p_2=p_3=\cdots=0$), then also the extinction probability is 1 (why?)

In these cases, we had no births to counter the deaths. But if $p_n>0 $ for some $n\geq 2$, then we have births, and the interaction between births and deaths becomes rather complicated. That is where PGFs come to our help.

Using PGF

Let $\xi(t)$ be the PGF of $X_1$. In other words, $$\xi(t) = p_0+p_1t+p_2t^2+\cdots.$$

The next exercise is of central importance. ::

EXERCISE 7: (Medium) Show that $X_2 $ has PGF $\xi_2(t)=\xi(\xi(t))$ for $|t| < 1$.

[Hint]

$\xi_2(t) = E(t^{X_2}) = E\big( E(t^{X_2}|X_1) \big)$ by the tower property.

Let us compute $E(t^{X_2}|X_1=k)$ for $k=0,1,2,... $. In particular we shall focus on $k=3$ (the other cases being similar). Thus we are in a situation where the first generation consists of the 3 individuals, and the second generation consists of their children.
The second generation colour coded by parents
We have colour-coded the second generation individuals by the their parents. Let $Y_i$ be the contribution of the $i$-th individual of the first generation. We have assumed that each individual splits identically and independently of the rest. So $Y_1, Y_2, Y_3$ are IID random variables.

So $E(t^{X_2}|X_1=3) = E(t^{Y_1+Y_2+Y_3}|X_1=3) = E(t^{Y_1+Y_2+Y_3})$, since $Y_i$'s are independent of $X_1$ (number of my children has nothing to do with the number of my siblings!).

Now $E(t^{Y_1+Y_2+Y_3}) =E(t^{Y_1}t^{Y_2}t^{Y_3}) = E(t^{Y_1})E(t^{Y_2})E(t^{Y_3})$, since $Y_i$'s are independent (number of my children has nothing to do with how many children my siblings have!).

Finally, $E(t^{Y_1+Y_2+Y_3}) =E(t^{Y_1})E(t^{Y_2})E(t^{Y_3}) = \xi(t)\xi(t)\xi(t) = \big(\xi(t)\big)^3$, since $Y_i$'s all have the same distribution.

Clearly, the $3$ in the exponent came from our choice of $k$. So, in general, $E(t^{X_2}|X_1=k) = \xi(t)^k$ for $k=0,1,2,..$.

Thus, $E(t^{X_2}) = E\big(E(t^{X_2}|X_1) \big) = \sum_{k=0}^\infty \xi(t)^k p_k =\xi(\xi(t))$.

Notice that convergence is not a problem, because we have assumed $|t| < 1$, and so $|\xi(t)| < 1$ as well.

::

EXERCISE 8: (Medium) In general show that for $n\in{\mathbb N}$ the PGF of $X_n$ is $\xi_n(t) = \xi_{n-1}(\xi(t))=(\underbrace{\xi\circ\cdots\circ\xi}_{n})(t)$.

Now $P(X_n=0) = \xi_n(0)$.

So the extinction probability is $\theta = \lim_n \xi_n(0)$.

Clearly, since $\xi(t)$ is a continuous function, $\theta = \xi(\theta)$. In other words, $\theta$ must be a fixed point of the PGF.

Exploring fixed points

How many fixed points can $\xi(t) $ have? Surely $1$ is a fixed point, since $\xi(1) = 1$. If it is the only one, then $\theta$ must be $1$.

::

EXERCISE 9: (Easy) Show that in the simple cases discussed earlier $1$ is the only fixed point.

Notice that $\xi'(t)$ is always nonnegative. In fact, except in the trivial case of $p_0=1$, it is positive. So $\xi(t)$ is a strictly increasing function. Again, except in the case where $p_0+p_1=1$, the second derivative $\xi''(t)$ is positive, and so $\xi(t)$ is a convex function. Such a function can intersect the $y=x $ line at most twice.

EXAMPLE 2: If $p_0=0.1, p_1=0.5$ and $p_2=0.4$ (so the other $p_k$'s are all zeroes), then the graph of $\xi(t)$ looks as shown below.

You can notice two fixed points (i.e., points where the black curve cuts the red diagonal). One of the fixed points is at $t=1$. The other is in $(0,1)$. Which of the two fixed points will $\theta$ equal?

The answer is $\theta$ will always be the smallest fixed point. This follows from the exercises below.

::

EXERCISE 10: (Medium) Let $\mu$ be any fixed point of $\xi(t)$. Then show that $\forall n\in{\mathbb N}~~\xi_n(0)\leq \mu$.

::

EXERCISE 11: (Easy) Use the last exercise to show that $\theta$ must be $\leq$ all fixed points of $\xi(t)$.

Wrapping up

We have seen that (excepting the trivial case $p_0=0$) there are only two possibilities: exactly one fixed point (which must be $1$) or exactly two. In the first case, $\theta=1$ and in the second it is the smaller fixed point.

It will be nice if we have a quick way to know (based on the $p_n$'s) which case we are in.

The two cases are shown below graphically:
One simple way to distinguish them is by $\xi'(1)$. In the first case $\xi'(1) \leq 1$ and in the other $\xi'(1) > 1$. Just a little point here: we know that $\xi(t)$ converges over $[-1,1]$, but may not converge beyond $1$. So when we talk about the derivative at $1$, we mean the left hand derivative. But fortunately, the term by term differentiation rule works for finding this one-sided derivative as well. So $\xi'(1) = p_1+2p_2+3p_3+\cdots$ (may be $\infty$).

So the final answer is: Could you arrive at this impressive answer without using PGF?

::

EXERCISE 12: (Medium)

A brief note about probability generating functions: If $X$ takes non-negative integer values with $p_i = P(X=i)$ for $i=0,1,2,...$ then its probability genrating function is $$\Phi_X(t) = p_0 + p_1t + p_2 t^2 +\cdots.$$ Clearly this converges absolutely for $|t|\leq 1.$ In this problem we are assuming that it converges for all $t\in{\mathbb R}.$

[Hint]

(a) Let $Y =\left\{\begin{array}{ll}t^{x_0}&\text{if }X\leq x_0\\ 0&\text{otherwise.}\end{array}\right.. $

Then, for $t\in[0,1],$ we have $Y\leq t^X.$ (Remember that $x\mapsto t^x$ is a non-increasing function for $t\in[0,1]$).

So $E(Y)\leq E(t^X).$ Now $E(Y) = t^{x_0}P(X\leq x_0).$

Hence the result.

(b) Let $Z =\left\{\begin{array}{ll}t^{x_0}&\text{if }X\geq x_0\\ 0&\text{otherwise.}\end{array}\right.. $

Then, for $t\geq 1,$ we have $Z \leq t^X.$

Hence the result follows as in (a).