

The multinomial distribution

Before defining the multinomial distribution, we discuss the binomial distribution. These comments are based on the book [180] and on the web page http://www.stat.yale.edu/Courses/1997-98/101/binom.htm. The use of the distribution is explained with examples. Briefly, the binomial distribution describes the number of ``success'' outcomes when a Bernoulli experiment is repeated $ n$ times, independently.

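More precisely: the binomial distribution describes the behavior of a count variable $ X$, which counts the number of successes in $ n$ observations, if the following conditions are satisfied: the number of observations $ n$ is fixed; the observations are independent; each observation results in one of two outcomes (success or failure); and the probability of success $ p$ is the same for every observation.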

The binomial distribution is denoted as $ B(n,p)$, with $ n$ the number of observations and $ p$ the probability of success. Two examples illustrate its use.

Example 23   Suppose we have a group of individuals with a certain gene. These people have a 0.60 probability of getting a certain disease. If a study is performed on 300 individuals with this specific gene, then the number of people who will get the disease has the binomial distribution $ B(300,0.6)$.

Example 24   The number of sixes rolled by a single die in $ 30$ rolls has a binomial distribution $ B(30,1/6)$.

The probability function of the binomial distribution $ B(n,p)$ is given by

$\displaystyle \D(X=k) = C(n,k) p^k (1-p)^{n-k},$    

where

$\displaystyle C(n,k) = \frac{n!}{k! (n-k)!}.$    

The mean $ \mu_X$ and the variance $ \sigma_X^2$ of this distribution can be calculated from the formulas above:
$\displaystyle \mu_X = np,$   $\displaystyle \sigma_X^2=np(1-p).$  
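The formulas above can be checked numerically. The following Python sketch is an illustration only: the function names are hypothetical, and exact rational arithmetic is assumed so that the comparison with $ np$ and $ np(1-p)$ is free of rounding error. It uses the numbers of Example 24.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p, k):
    """P(X = k) for X ~ B(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def binomial_mean_variance(n, p):
    """Mean and variance computed directly from the probabilities."""
    probs = [binomial_pmf(n, p, k) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(probs))
    variance = sum((k - mean)**2 * q for k, q in enumerate(probs))
    return mean, variance


# Example 24: number of sixes in 30 rolls of a die, B(30, 1/6).
n, p = 30, Fraction(1, 6)
mean, variance = binomial_mean_variance(n, p)

# Compare with the closed forms np and np(1 - p).
print(mean == n * p, variance == n * p * (1 - p))
```

For this example the mean is 5 and the variance is 25/6.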

Before giving the probability function we explain what is meant by a multinomial distribution. A multinomial trials process is a sequence of independent, identically distributed random variables $ U_1, U_2, \ldots $, where each random variable can now take $ k$ values. The Bernoulli trials process corresponds to $ k = 2$ (success and failure), so the multinomial trials process is a generalization of it. For simplicity, we denote the outcomes by the integers $ 1, 2,\ldots, k$. The distribution of a trial variable $ U_j$ can then be written in the following way.

$\displaystyle p_i = \D(U_j = i)$    for $\displaystyle i = 1, 2,\ldots, k$    (and for any $ j$).

Of course $ p_i > 0$ for each $ i$ and $ p_1 + p_2 + \cdots + p_k = 1$.

As with the binomial distribution, we are interested in the variables counting the number of times each outcome has occurred. In the binomial case one counting variable was enough; here we need $ k-1$. Thus, let (with $ \#$ we denote the cardinality of a set)

$\displaystyle Z_i = \# \{j \in \{1, 2,\ldots, n\}$    for which $\displaystyle U_j = i\}$    for $\displaystyle i = 1, 2,\ldots, k,$    

where $ n$ is the number of observations.

Note that

$\displaystyle Z_1 + Z_2 + \cdots + Z_k = n.$    

So if we know the values of $ k-1$ of the counting variables, we can find the value of the remaining one, as mentioned before. Generalizing the binomial distribution we get the following probability function (more information about these distributions can be found at http://www.math.uah.edu/statold/bernoulli/):

$\displaystyle \D(Z_1=j_1,Z_2=j_2, \ldots , Z_k=j_k) = \frac{n!}{j_1! j_2! \cdots j_k!} p_1 ^{j_1} p_2^{j_2} \cdots p_k^{j_k} $

with $ j_1+j_2+\cdots+ j_k=n$ and $ p_1 + p_2 + \cdots + p_k = 1$. Before we start calculating the covariance matrix, we will give an example:

Example 25   The dice experiment also fits in here. For example, if one rolls 10 dice, one can calculate the probability that 1 and 2 each occur once and each of the other four outcomes occurs twice. To calculate this, one needs the multinomial distribution.
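The probability in Example 25 can be computed as follows. This Python sketch is illustrative only: the function name is hypothetical, a fair die is assumed, and the probability is taken to be the multinomial coefficient $ n!/(j_1! \cdots j_k!)$ times $ p_1^{j_1} \cdots p_k^{j_k}$, evaluated in exact rational arithmetic.

```python
from fractions import Fraction
from math import factorial


def multinomial_pmf(counts, probs):
    """P(Z_1 = j_1, ..., Z_k = j_k), where n is the sum of the counts."""
    # Multinomial coefficient n! / (j_1! ... j_k!); every partial quotient
    # is an integer, so integer division is exact here.
    coefficient = factorial(sum(counts))
    for j in counts:
        coefficient //= factorial(j)
    probability = Fraction(coefficient)
    for j, p in zip(counts, probs):
        probability *= p**j
    return probability


# 10 dice: faces 1 and 2 occur once, faces 3 to 6 occur twice each.
counts = [1, 1, 2, 2, 2, 2]
probs = [Fraction(1, 6)] * 6  # assumption: a fair die
dice_probability = multinomial_pmf(counts, probs)
print(dice_probability, float(dice_probability))
```

Under these assumptions the probability equals 175/46656, roughly 0.00375.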

For this distribution: $ \E(Z_i) = n p_i$, $ \Var(Z_i) = n p_i(1 - p_i)$ and $ \Cov(Z_i, Z_j) = -n p_i p_j$ for $ i \neq j$. As an example we calculate the mean using the following binomial and multinomial formulas:

$\displaystyle C(n,j)$ $\displaystyle =$ $\displaystyle \frac{n!}{(n-j)! j!}$  
$\displaystyle (x+y)^n$ $\displaystyle =$ $\displaystyle \sum_{j=0}^n C(n,j) x^{n-j} y^j$  
$\displaystyle C(n;j_1,\ldots,j_k)$ $\displaystyle =$ $\displaystyle \frac{n!}{j_1!\ldots j_k!}$  
$\displaystyle (x_1+x_2+ \cdots + x_k)^n$ $\displaystyle =$ $\displaystyle \sum_{j_1+j_2+ \cdots+ j_k=n} C(n;j_1,\ldots,j_k) x_1^{j_1}\ldots x_k^{j_k}.$  

For this distribution we calculate the mean of the variable $ Z_1$; the variances and covariances can be calculated in a completely analogous way. All sums run over the indices $ j_1,\ldots,j_k \geq 0$ with $ j_1+\cdots+j_k=n$.
$\displaystyle \E(Z_1)$ $\displaystyle =$ $\displaystyle \sum_{j_1} \ldots \sum_{j_k} j_1 \D(Z_1=j_1,\ldots,Z_k=j_k)$  
  $\displaystyle =$ $\displaystyle \sum_{j_1} \ldots \sum_{j_k} j_1 C(n;j_1,\ldots,j_k) p_1^{j_1} \ldots p_k^{j_k}$  
  $\displaystyle =$ $\displaystyle \sum_{j_1} j_1 p_1^{j_1}\left( \sum_{j_2}\ldots\sum_{j_k} \frac{n!}{j_1! \ldots j_k!} p_2^{j_2} \ldots p_k^{j_k} \right)$  
  $\displaystyle =$ $\displaystyle \sum_{j_1} \frac{j_1}{j_1!} p_1^{j_1}\left( \sum_{j_2}\ldots\sum_{j_k} \frac{n! (n-j_1)!}{(n-j_1)! j_2! \ldots j_k!} p_2^{j_2} \ldots p_k^{j_k} \right)$  
  $\displaystyle =$ $\displaystyle \sum_{j_1} \frac{j_1 n!}{j_1! (n-j_1)!} p_1^{j_1} (p_2+\cdots+ p_k)^{j_2+\cdots+j_k}$  
  $\displaystyle =$ $\displaystyle \sum_{j_1} j_1 \frac{n!}{j_1! (n-j_1)!} p_1^{j_1} (1-p_1)^{n-j_1}$  
  $\displaystyle =$ $\displaystyle \E(X)$  
  $\displaystyle =$ $\displaystyle n p_1$  

where $ X$ has a binomial distribution $ B(n,p_1)$, which gives us the last equality. The covariance matrix of this distribution looks like

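$\displaystyle V= n \left( \begin{array}{ccccc} p_1(1-p_1) & -p_1 p_2 & -p_1 p_3 & \ldots & -p_1 p_k \\  -p_1 p_2 & p_2(1-p_2) & -p_2 p_3 & \ldots & -p_2 p_k \\  \vdots & \vdots & \vdots & \ddots & \vdots \\  -p_1 p_k & -p_2 p_k & -p_3 p_k & \ldots & p_k(1-p_k) \end{array} \right).$    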
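The entries of this matrix can be verified by brute force for a small case. The following Python sketch is an illustration under stated assumptions: the function names and the values $ n=4$ and $ p=(1/2,1/3,1/6)$ are chosen only for the example. It enumerates all outcomes $ (j_1,\ldots,j_k)$ with $ j_1+\cdots+j_k=n$ and compares the exact means and covariances with $ n p_i$, $ n p_i(1-p_i)$ and $ -n p_i p_j$.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def multinomial_pmf(counts, probs):
    """P(Z_1 = j_1, ..., Z_k = j_k), where n is the sum of the counts."""
    coefficient = factorial(sum(counts))
    for j in counts:
        coefficient //= factorial(j)  # exact: partial quotients are integers
    probability = Fraction(coefficient)
    for j, p in zip(counts, probs):
        probability *= p**j
    return probability


def moments(n, probs):
    """Exact mean vector and covariance matrix by full enumeration."""
    k = len(probs)
    outcomes = [c for c in product(range(n + 1), repeat=k) if sum(c) == n]
    weights = [multinomial_pmf(c, probs) for c in outcomes]
    mean = [sum(w * c[i] for c, w in zip(outcomes, weights))
            for i in range(k)]
    cov = [[sum(w * (c[i] - mean[i]) * (c[j] - mean[j])
                for c, w in zip(outcomes, weights))
            for j in range(k)] for i in range(k)]
    return mean, cov


n = 4                                             # illustrative value
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # illustrative values
mean, cov = moments(n, probs)

# Closed form: n p_i (1 - p_i) on the diagonal, -n p_i p_j elsewhere.
expected = [[n * (probs[i] * (1 - probs[i]) if i == j else -probs[i] * probs[j])
             for j in range(3)] for i in range(3)]
print(mean == [n * p for p in probs], cov == expected)
```

Because the arithmetic is exact, the comparison is an equality test rather than a tolerance check.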

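This matrix can be rewritten as $ n$ times a semiseparable plus diagonal matrix in the following way. Denote:
$\displaystyle u$ $\displaystyle =$ $\displaystyle [p_1,p_2,p_3,\ldots,p_k]^T$  
$\displaystyle v$ $\displaystyle =$ $\displaystyle [-p_1,-p_2,-p_3,\ldots,-p_k]^T$  
$\displaystyle D$ $\displaystyle =$ $\displaystyle \diag(p_1,p_2,\ldots,p_k),$  

then the matrix can be written in a more compact form as

$\displaystyle V= n \left( D + u v^T \right).$    

The matrix $ D + u v^T$ is a semiseparable plus diagonal matrix.

The main reason for writing covariance matrices in this form is the simple expression for the inverses of matrices of this type. The book [95] states several theorems about the inverses of tridiagonal and semiseparable matrices, resulting in an inversion formula for matrices of this type, namely:

$\displaystyle \left( D + u v^T \right)^{-1} = D^{-1} + \gamma \hat{u} \hat{v}^T$    

with $ \hat{u}_i = u_i/d_{i}$, $ \hat{v}_i=v_i/d_{i}$ and $ \gamma=-\left(1+ \sum_{i} u_i v_i d_{i}^{-1}\right)^{-1},$ with $ d_i$ the diagonal elements of $ D$. The formula requires $ 1+\sum_{i} u_i v_i d_{i}^{-1} \neq 0$. For the full $ k \times k$ matrix this quantity equals $ 1-(p_1+\cdots+p_k)=0$: since $ Z_1+\cdots+Z_k=n$, the matrix $ V$ is singular. We therefore apply the formula to the covariance matrix of $ k-1$ of the counting variables, say $ Z_1,\ldots,Z_{k-1}$, for which $ u$, $ v$ and $ D$ are built from $ p_1,\ldots,p_{k-1}$ only. In that case $ \hat{u}_i=1$, $ \hat{v}_i=-1$ and $ \gamma=-1/p_k$, which leads to the following explicit structure:

$\displaystyle \left( D + u v^T \right)^{-1}=\left( \begin{array}{ccccc} \frac{1}{p_1}-\gamma & -\gamma & \dots & -\gamma & -\gamma \\  -\gamma & \frac{1}{p_2}-\gamma & \dots & -\gamma & -\gamma \\  \vdots & \vdots & \ddots & \vdots & \vdots \\  -\gamma & -\gamma &\dots & -\gamma & \frac{1}{p_{k-1}}-\gamma \end{array} \right).$    

The inverse of the covariance matrix itself is $ 1/n$ times this matrix. This matrix is clearly again semiseparable plus diagonal, as we expected according to the theorems of Chapter 1.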
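The inversion formula can be checked in exact arithmetic for a small case. The Python sketch below is an illustration under explicit assumptions: the function names and the probabilities $ (1/2,1/3,1/6)$ are hypothetical, the factor $ n$ is left out, and the matrix $ D+uv^T$ is built from the first $ k-1$ probabilities only, because the counts sum to $ n$ and the full $ k \times k$ matrix is therefore singular. In that setting the formula gives $ D^{-1}$ plus the constant $ 1/p_k$ in every entry.

```python
from fractions import Fraction


def reduced_matrix(probs):
    """D + u v^T for the first k-1 probabilities: diag(p_i) - p_i p_j."""
    p = probs[:-1]
    m = len(p)
    return [[(p[i] if i == j else 0) - p[i] * p[j] for j in range(m)]
            for i in range(m)]


def claimed_inverse(probs):
    """D^{-1} + (1/p_k) in every entry, as given by the inversion formula."""
    p, p_last = probs[:-1], probs[-1]
    m = len(p)
    return [[(1 / p[i] if i == j else 0) + 1 / p_last for j in range(m)]
            for i in range(m)]


def matmul(a, b):
    """Plain matrix product of two square lists of lists."""
    m = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(m)) for j in range(m)]
            for i in range(m)]


probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # illustrative values
product_matrix = matmul(reduced_matrix(probs), claimed_inverse(probs))
identity = [[1 if i == j else 0 for j in range(2)] for i in range(2)]
print(product_matrix == identity)
```

The inverse has constant off-diagonal entries, so it is again a diagonal matrix plus a rank one matrix.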


Raf Vandebril 2004-05-03