Multinomial distribution (part 1)

The multinomial distribution is a direct generalisation of the binomial distribution. We often think of the $Binom(n,p)$ distribution as the distribution of the number, $X$, of heads obtained in $n$ independent tosses of a coin with $P(head)=p.$ If we replace the coin with a die with probabilities $p_1,...,p_6$ for the different faces, and let $X_i$ denote the frequency of the $i$-th face in $n$ independent rolls of the die, then the joint distribution of $(X_1,...,X_6)$ is called multinomial with parameters $n,\v p$, where $\v p=(p_1,...,p_6)'.$

In general, we have the following definition.

Definition: Multinomial distribution Let $n\in{\mathbb N}$ and let $\v p=(p_1,...,p_k)'$ be a probability vector. Then $Multinom(n,\v p)$ is the discrete distribution with PMF $$f(x_1,...,x_k) =\left\{\begin{array}{ll}\frac{n!}{x_1!\cdots x_k!} p_1^{x_1}\cdots p_k^{x_k}&\text{if }\sum x_i=n \mbox{ and } \forall i~~x_i\in\{0,1,2,...,n\}\\ 0&\text{otherwise.}\end{array}\right. $$
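To get a feel for the PMF, here is a minimal Python sketch that evaluates it directly from the definition. The function name `multinomial_pmf` and the three-faced "die" with $\v p=(0.5,0.3,0.2)'$ are our own illustrative choices, not part of the definition.

```python
from itertools import product
from math import factorial, prod


def multinomial_pmf(x, n, p):
    """PMF of Multinom(n, p) evaluated at the integer count vector x."""
    if len(x) != len(p):
        raise ValueError("x and p must have the same length")
    # Outside the support (wrong total or a negative count) the PMF is 0.
    if sum(x) != n or any(xi < 0 for xi in x):
        return 0.0
    # Multinomial coefficient n! / (x_1! ... x_k!), kept as an exact integer.
    coef = factorial(n)
    for xi in x:
        coef //= factorial(xi)
    return coef * prod(pi ** xi for pi, xi in zip(p, x))


# Hypothetical three-faced "die" rolled n = 3 times.
p = (0.5, 0.3, 0.2)
one_of_each = multinomial_pmf((1, 1, 1), 3, p)  # 3!/(1!1!1!) * 0.5 * 0.3 * 0.2

# Summing over every count vector with total 3 should give 1.
total = sum(
    multinomial_pmf(x, 3, p)
    for x in product(range(4), repeat=3)
    if sum(x) == 3
)
```

Here `one_of_each` is $6\times 0.03 = 0.18$, and `total` should equal 1 up to rounding error.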

Just as $Bernoulli(p)\equiv Binom(1,p)$ is an important special case of the binomial distribution that describes a single toss of the coin, $Multinom(1,\v p)$ describes a single roll of the die.

The following facts are simple generalisations of corresponding facts from the binomial distribution.

Theorem If $\v X\sim Multinom(m,\v p)$ and $\v Y\sim Multinom(n,\v p)$ are independent, then $$\v X + \v Y\sim Multinom(m+n,\v p).$$

In particular, we can think of $Multinom(n,\v p)$ as the distribution of the sum of $n$ independent $Multinom(1,\v p)$ random vectors.
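The additivity theorem can be checked numerically in a small case. The sketch below is only a sanity check under assumed values ($m=2$, $n=3$, $\v p=(0.5,0.3,0.2)'$), not a proof. It convolves the two PMFs exactly and compares the result with the $Multinom(m+n,\v p)$ PMF. All function names are hypothetical.

```python
from itertools import product
from math import factorial, prod


def multinomial_pmf(x, n, p):
    """PMF of Multinom(n, p) at the integer count vector x."""
    if sum(x) != n or any(xi < 0 for xi in x):
        return 0.0
    coef = factorial(n)
    for xi in x:
        coef //= factorial(xi)
    return coef * prod(pi ** xi for pi, xi in zip(p, x))


def support(n, k):
    """All count vectors of length k with non-negative entries summing to n."""
    return [x for x in product(range(n + 1), repeat=k) if sum(x) == n]


def convolve(m, n, p):
    """Exact PMF of X + Y for independent X ~ Multinom(m, p), Y ~ Multinom(n, p)."""
    k = len(p)
    out = {}
    for x in support(m, k):
        for y in support(n, k):
            z = tuple(xi + yi for xi, yi in zip(x, y))
            weight = multinomial_pmf(x, m, p) * multinomial_pmf(y, n, p)
            out[z] = out.get(z, 0.0) + weight
    return out


p = (0.5, 0.3, 0.2)  # assumed probability vector
conv = convolve(2, 3, p)
# Largest discrepancy between the convolution and the Multinom(5, p) PMF.
max_gap = max(abs(conv[z] - multinomial_pmf(z, 5, p)) for z in support(5, 3))
```

If the theorem holds, `max_gap` should be zero up to floating-point rounding.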

Multinomial distribution (part 2)

Suppose we roll the same die twice independently. This gives two independent random vectors $\v X,\v Y$, each having the $Multinom(1,\v p)$ distribution. Writing $\v X=(X_1,...,X_k)'$, $\v Y=(Y_1,...,Y_k)'$ and $\v p=(p_1,...,p_k)'$, we have

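$\forall i~~E(X_i) = E(X_i^2)=p_i$ (each $X_i$ takes only the values $0$ and $1$, so $X_i^2=X_i$),
$\forall i\neq j~~E(X_iX_j) = 0$ (a single roll cannot show two different faces, so $X_iX_j=0$),
$\forall i,j~~E(X_iY_j) = p_ip_j$ (the two rolls are independent).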

An immediate consequence of this is the following theorem.

Theorem If $\v X\sim Multinom(n,\v p)$ where $\v X=(X_1,...,X_k)'$ and $\v p=(p_1,...,p_k)'$, then

$\forall i~~E(X_i) = np_i$
$\forall i~~V(X_i) = np_i(1-p_i)$
$\forall i\neq j~~cov(X_i,X_j) = -np_ip_j.$
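One way to fill in the step from the single-roll facts to this theorem is sketched below. We assume $\v X$ is written as $\v Z^{(1)}+\cdots+\v Z^{(n)}$ with independent $\v Z^{(r)}\sim Multinom(1,\v p)$; the superscript notation is introduced here only for illustration. For a single roll and for two different rolls $r\neq s$,

$$\begin{array}{rcl} V(Z^{(r)}_i) &=& E\left((Z^{(r)}_i)^2\right)-\left(E(Z^{(r)}_i)\right)^2 = p_i-p_i^2 = p_i(1-p_i),\\ cov(Z^{(r)}_i,Z^{(r)}_j) &=& E(Z^{(r)}_iZ^{(r)}_j)-p_ip_j = -p_ip_j\quad (i\neq j),\\ cov(Z^{(r)}_i,Z^{(s)}_j) &=& p_ip_j-p_ip_j = 0\quad (r\neq s). \end{array}$$

Summing over the rolls then gives

$$E(X_i)=\sum_{r=1}^n E(Z^{(r)}_i)=np_i,\qquad V(X_i)=\sum_{r=1}^n V(Z^{(r)}_i)=np_i(1-p_i),\qquad cov(X_i,X_j)=\sum_{r=1}^n\sum_{s=1}^n cov(Z^{(r)}_i,Z^{(s)}_j)=-np_ip_j.$$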

Problem set 1

EXERCISE 1: If $(X_1,...,X_k)\sim Multinom(n,(p_1,...,p_k))$ for $k\geq 3,$ then find the distribution of $X_1+X_3.$

Dirichlet distribution

Now we are going to work with a multivariate distribution called the Dirichlet distribution, which is a multivariate generalisation of the Beta distribution.

Definition: Dirichlet distribution We say that $(X_1,...,X_p)$ has Dirichlet distribution with parameters $a_1,...,a_p,a_{p+1}>0$ and write $(X_1,...,X_p)\sim Dir(a_1,...,a_p,a_{p+1}),$ if the joint density is $$f(x_1,...,x_p) = \left\{\begin{array}{ll}c x_1^{a_1-1}x_2^{a_2-1}\cdots x_p^{a_p-1}(1-x_1-\cdots-x_p)^{a_{p+1}-1}&\text{if }(x_1,...,x_p)\in D_p\\ 0&\text{otherwise.}\end{array}\right. $$ where $$D_p = \{(x_1,...,x_p)~:~\forall i~~x_i\geq 0,~~\sum_1^p x_i\leq1\},$$ and $$c = \frac{\Gamma(a_1+\cdots+a_{p+1})}{\Gamma(a_1)\cdots \Gamma(a_{p+1})}.$$

Look at this density carefully and get comfortable with the fact that there are only $p$ of the $x_i$'s, while we have $p+1$ of the $a_i$'s.

When $p=1$ we have $X_1\sim Beta(a_1,a_2).$ This is supported on $D_1 = [0,1].$
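The density can be evaluated straight from the definition. The Python sketch below does this and checks the $p=1$ case against the Beta density. The names `dirichlet_pdf` and `beta_pdf` are our own, and for simplicity the sketch returns $0$ on the boundary of $D_p$ (a set of probability zero) rather than handling $0^{a_i-1}$.

```python
from math import gamma, prod


def dirichlet_pdf(x, a):
    """Density of Dir(a_1, ..., a_{p+1}) at x = (x_1, ..., x_p)."""
    if len(a) != len(x) + 1:
        raise ValueError("a must have exactly one more entry than x")
    # Append the implied last coordinate 1 - x_1 - ... - x_p.
    coords = list(x) + [1.0 - sum(x)]
    # Simplification of this sketch: treat the boundary of D_p like the outside,
    # which avoids evaluating 0 ** (a_i - 1).
    if any(c <= 0 for c in coords):
        return 0.0
    # Normalising constant c = Gamma(a_1 + ... + a_{p+1}) / prod Gamma(a_i).
    c = gamma(sum(a)) / prod(gamma(ai) for ai in a)
    return c * prod(ci ** (ai - 1) for ci, ai in zip(coords, a))


def beta_pdf(x, a1, a2):
    """Beta(a1, a2) density on (0, 1), written out directly for comparison."""
    c = gamma(a1 + a2) / (gamma(a1) * gamma(a2))
    return c * x ** (a1 - 1) * (1 - x) ** (a2 - 1)


dir_value = dirichlet_pdf((0.3,), (2, 3))            # p = 1 case
beta_value = beta_pdf(0.3, 2, 3)                     # should agree with dir_value
flat_value = dirichlet_pdf((0.2, 0.3), (1, 1, 1))    # constant density on D_2
outside_value = dirichlet_pdf((0.7, 0.6), (1, 1, 1)) # 0.7 + 0.6 > 1, so outside D_2
```

With all parameters equal to $1$ the density is constant on $D_2$, with value $\Gamma(3)=2$, the reciprocal of the area of the triangle.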

For $p=2$ and $p=3$ the supports are shown below.


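Supports of Dirichlet distribution: for $p=2$ the support $D_2$ is the triangle with vertices $(0,0),(1,0),(0,1)$, and for $p=3$ the support $D_3$ is the tetrahedron with vertices $(0,0,0),(1,0,0),(0,1,0),(0,0,1)$.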

In general, shapes like $D_p$ are called simplices (singular simplex) in ${\mathbb R}^p.$

It is not immediately obvious that the total integral of this function is indeed 1. However, it is easy for $p=1,$ because if $X_1\sim Dir(a_1,a_2)$ then $X_1\sim Beta(a_1,a_2).$ Starting with this as the basis, we can use induction over $p$ to establish the general case (easy, try it!).

The following properties are all obvious from the definition.

Theorem If $(X_1,X_2,...,X_p)\sim Dir(a_1,a_2,...,a_p,a_{p+1})$, then

for any $k\geq 2$ and distinct $i_1,...,i_k\in\{1,...,p\}$ we have $(X_{i_1},...,X_{i_k})\sim Dir(a_{i_1},...,a_{i_k},a-(a_{i_1}+\cdots+a_{i_k})),$ where $a = a_1+\cdots+a_{p+1}$.
each $X_i\sim Beta\left(a_i,\sum_{j\neq i} a_j\right).$

We can immediately write down the mean (i.e., expectation) and variance of each $X_i$ from results of Beta distribution.
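For reference, here is a sketch of what this gives. It uses the standard facts $E(B)=\frac{\alpha}{\alpha+\beta}$ and $V(B)=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ for $B\sim Beta(\alpha,\beta)$, with $\alpha=a_i$ and $\beta=a-a_i$, where $a=a_1+\cdots+a_{p+1}$ as in the theorem:

$$E(X_i)=\frac{a_i}{a},\qquad V(X_i)=\frac{a_i(a-a_i)}{a^2(a+1)}.$$

For example, if $(X_1,X_2)\sim Dir(1,1,1)$ then $E(X_i)=\frac 13$ and $V(X_i)=\frac{1\cdot 2}{9\cdot 4}=\frac 1{18}.$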

Problem set 2

EXERCISE 2: If $(X_1,...,X_p)\sim Dir(a_1,...,a_p,a_{p+1}),$ then find the joint distribution of $(X_1+X_2,X_3,...,X_p).$

EXERCISE 3: Let $\v \Pi$ be a random vector $(\Pi_1,...,\Pi_k,1-\Pi_1-\cdots-\Pi_k),$ where $(\Pi_1,...,\Pi_k)\sim Dir(a_1,...,a_{k+1}).$ Let the conditional distribution of $\v X=(X_1,...,X_{k+1})'$ given $(\Pi_1,...,\Pi_k)$ be $Multinom(n,\v \Pi).$ Then show that the conditional distribution of $(\Pi_1,...,\Pi_k)$ given $\v X$ is $Dir(a_1+X_1,...,a_{k+1}+X_{k+1}).$

This is the multivariate analogue of the beta-binomial distribution, used in Bayesian machine learning, that we discussed earlier.
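Before attempting the proof, the claim in Exercise 3 can be checked numerically. The sketch below is only a sanity check, with hypothetical prior parameters and counts. It relies on Bayes' rule: the posterior density is proportional to prior times likelihood, so the ratio of that product to the claimed posterior density should not depend on $\v\Pi$. The function names are our own.

```python
from math import factorial, gamma, prod


def dirichlet_pdf_full(pi, a):
    """Dirichlet density written in terms of the full vector pi (entries sum to 1)."""
    c = gamma(sum(a)) / prod(gamma(ai) for ai in a)
    return c * prod(p ** (ai - 1) for p, ai in zip(pi, a))


def multinomial_pmf(x, pi):
    """Multinom(n, pi) PMF at the count vector x, with n = sum(x)."""
    coef = factorial(sum(x))
    for xi in x:
        coef //= factorial(xi)
    return coef * prod(p ** xi for p, xi in zip(pi, x))


def ratio(pi, a, x):
    """(prior * likelihood) / (claimed posterior density), evaluated at pi."""
    claimed = [ai + xi for ai, xi in zip(a, x)]  # parameters a_i + X_i
    return dirichlet_pdf_full(pi, a) * multinomial_pmf(x, pi) / dirichlet_pdf_full(pi, claimed)


a = (1.5, 2.0, 0.5)  # hypothetical prior parameters (k = 2)
x = (3, 0, 2)        # hypothetical observed counts (n = 5)
r1 = ratio((0.2, 0.3, 0.5), a, x)
r2 = ratio((0.6, 0.1, 0.3), a, x)
```

If the claimed posterior is right, `r1` and `r2` agree up to rounding; their common value is the marginal probability of observing `x`.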

EXERCISE 4: Let $X_1,...,X_{p+1}$ be independent with $X_i\sim Gamma(1,\alpha_i).$ Let $S=X_1+\cdots+X_{p+1}.$ Then show that $$\left( \frac{X_1}{S},...,\frac{X_p}{S} \right)\sim Dir(\alpha_1,...,\alpha_{p+1}).$$
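The construction in Exercise 4 is also a convenient way to simulate from a Dirichlet distribution. The Python sketch below assumes that $Gamma(1,\alpha_i)$ means shape $\alpha_i$ with a common scale of $1$; this reading of the notation is our assumption. The function name `dirichlet_via_gammas` and the parameter values are hypothetical. The sketch illustrates the statement but does not prove it.

```python
import random


def dirichlet_via_gammas(alphas, rng):
    """Return (X_1/S, ..., X_p/S) built from independent gamma variables."""
    # random.gammavariate(shape, scale): shape alpha_i, common scale 1 (assumed reading).
    xs = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    s = sum(xs)
    # Keep only the first p ratios; the last one is determined by the others.
    return tuple(x / s for x in xs[:-1])


rng = random.Random(0)           # fixed seed so the run is reproducible
alphas = (2.0, 3.0, 5.0)         # hypothetical parameters, p = 2
draws = [dirichlet_via_gammas(alphas, rng) for _ in range(20000)]

# Empirical mean of the first coordinate; theory suggests alpha_1 / sum(alphas) = 0.2.
mean_first = sum(d[0] for d in draws) / len(draws)
```

Every draw lies in $D_2$, and `mean_first` should be close to $0.2$, the mean of $Beta(2,8)$.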

Miscellaneous problems

EXERCISE 5: [wilks1.png]

The abbreviation "p.f." means PMF. Equation (6.3.3) referred to in the problem is as follows.

Equation 6.3.3

EXERCISE 6: [wilks3.png]

This exercise is from a very old book that uses a definition of the multinomial distribution notationally different from the modern definition that we are using. If we roll a $6$-faced die with probabilities $p_1,...,p_6$ independently $n$ times, and $X_i = $ number of times face $i$ shows up, then we shall write $(X_1,...,X_6)\sim Multinom(n,p_1,...,p_6).$ But this book will write $(X_1,...,X_5)\sim m(n,p_1,...,p_5).$ In this notation, $X_1+\cdots+X_5\leq n$ and $p_1+\cdots+p_5\leq 1.$

EXERCISE 7: [wilks4.png]

EXERCISE 8: [wilks5.png]

EXERCISE 9: [wilks6.png]

This problem is wrong. The distribution should be $Dir\left(\frac 12,...,\frac 12\right)$.

EXERCISE 10: [wilks8.png]