Video for this section
The multinomial distribution is a direct generalisation of the binomial distribution. We often think of the $Binom(n,p)$
distribution as the distribution of the number, $X$, of heads obtained in $n$
independent tosses of a coin with
$P(head)=p.$ If we replace the coin with a die with probabilities $p_1,...,p_6$ for the
different faces, and let
$X_i$ denote the frequency of the $i$-th face in $n$ independent rolls of the die, then
the joint distribution of
$(X_1,...,X_6)$ is called multinomial with parameters $n,\v p$, where $\v p=(p_1,...,p_6)'.$
In general, we have the following definition.
Just as $Bernoulli(p)\equiv Binom(1,p)$ is an important special case, and deals with a single toss of the coin, similarly
$Multinom(1,\v p)$ deals with a single roll of the die.
The following facts are simple generalisations of corresponding facts from the binomial distribution.
In particular, we can think of $Multinom(n,\v p)$ as the distribution of the sum of $n$ independent $Multinom(1,\v p)$
random vectors.
Video for this section
Suppose we roll the same die twice independently. Accordingly we get
$\v X,\v Y$ both having $ Multinom(1,\v p)$ distribution. Let $\v
X=(X_1,...,X_k)'$, $\v Y=(Y_1,...,Y_k)'$ and $\v
p=(p_1,...,p_k)'$, then
$\forall i~~E(X_i) = E(X_i^2)=p_i.$
$\forall i\neq j~~E(X_iX_j) = 0$
$\forall i,j~~E(X_iY_j) = p_ip_j$
An immediate consequence of this is the following theorem.
Video for this section
Now we are going to work with a multivariate distribution called the Dirichlet distribution,
which is a multivariate generalisation of the Beta distribution.
Look at this density carefully and get comfortable with the fact that there are only $p$ of the $x_i$'s, while
we have $p+1$ of the $a_i$'s.
When $p=1$ we have $X_1\sim Beta(a_1,a_2).$ This is supported on $D_1 = [0,1].$
For $p=2$ and $p=3$ the supports are shown below.
Supports of Dirichlet distribution
In general, shapes like $D_p$ are called simplices (singular simplex) in ${\mathbb R}^p.$
It is not immediately obvious that the total integral of this function is indeed 1. However, it is easy for $p=1,$ because
if $X_1\sim Dir(a_1,a_2)$ then $X_1\sim Beta(a_1,a_2).$
Starting with this as the basis, we can use induction over $p$ to establish the general case (easy, try it!).
The following properties are all obvious from the definition.
We can immediately write down the mean (i.e., expectation) and variance of each $X_i$ from results of Beta distribution.
EXERCISE 2: If $(X_1,...,X_p)\sim Dir(a_1,...,a_p,a_{p+1}),$ then find the joint distribution of $(X_1+X_2,X_3,...,X_p).$
EXERCISE 3: Let $\v \Pi$ be a random vector $(\Pi_1,...,\Pi_k,1-\Pi_1-\cdots\Pi_k),$ where
$(\Pi_1,...,\Pi_k)\sim Dir(a_1,...,a_{k+1}).$ Let the conditional distribution of $\v
X$ given $(\Pi_1,...,\Pi_k)$ be $Multinom(n,\v \Pi).$
Then show that the conditional distribution
of $\v \Pi$ given $\v X$ is $Dir(a_1+X_1,...,a_{k+1}+X_{k+1}).$
This is the multivariate analogue of beta-binomial distribution used in Bayesian machine learning that we had discussed earlier.
::
EXERCISE 4: Let $X_1,...,X_{p+1}$ be independent with $X_i\sim Gamma(1,\alpha_i).$ Let
$S=X_1+\cdots+X_{p+1}.$ Then show that
$$\left( \frac{X_1}{S},...,\frac{X_p}{S} \right)\sim Dir(\alpha_1,...,a_{p+1}).$$
The abbreviation "p.f." means PMF. Equation (6.3.3) referred to in the problem is as follows.
::
EXERCISE 6: [wilks3.png]
This exercise is from a very old book, that uses a definition of multinomial notationally different from the modern defintion
that we are using. If we roll a $6$-faced die with probabilities $p_1,...,p_6$
independently $n$ times, and $X_i = $ number of times face $i$ shows up, then
we shall write $(X_1,...,X_6)\sim Multinom(n,p_1,...,p_6).$ But this book will write
$(X_1,...,X_5)\sim m(n,p_1,...,p_5).$ In this notation, $X_1+\cdots+X_5\leq n$ and $p_1+\cdots+p_5\leq 1.$
::
EXERCISE 7: [wilks4.png]
::
EXERCISE 8: [wilks5.png]
::
EXERCISE 9: [wilks6.png]
This problem is wrong. The distribution should be $Dir\left(\frac 12,...,\frac 12\right)$.
::