[Home]

Matrix algorithms
Gauss-Jordan elimination
Pivoting
Inversion using Gauss-Jordan elimination
Determinant using Gaussian elimination
$QR$ decomposition
Householder transformation
Using Householder for $QR$
Efficient implementation
Application to least squares
What if $A$ is not full column rank?
Eigenanalysis: power method
When does it work?

$\newcommand{\bx}{{\bf x}} \newcommand{\by}{{\bf y}} \newcommand{\bu}{{\bf u}} \newcommand{\bb}{{\bf b}} \newcommand{\bv}{{\bf v}} \renewcommand{\bc}{{\bf c}} \newcommand{\bz}{{\bf 0}} \newcommand{\ba}{{\bf a}} \newcommand{\bq}{{\bf q}} \newcommand{\bp}{{\bf p}} $ Matrix algorithms

Last updated on: TUE MAY 04 IST 2021

Matrix algorithms

Gauss-Jordan elimination

We shall start with a few concepts already familiar to you.

EXAMPLE: Consider the following three equations: $$\begin{eqnarray*} 2x-3y+z& =& 5\\ 3x+y+2z& =& 15\\ 2x + 3z& =& -2 \end{eqnarray*}$$ This is an example of a system of linear equations.

We can write a system of $m$ linear equations in $n$ unknowns using matrix notation: $$ A \bx = \bb, $$ where $A_{m\times n}$ is called the coefficient matrix, $\bx_{n\times 1}$ is called the vector of unknowns, and $\bb_{m\times 1}$ is called the rhs vector.

EXAMPLE: The system in the last example can be written as $$ \left[\begin{array}{ccccccccccc}2& -3& 1\\3& 1& 2\\2& 0& 3 \end{array}\right] \left[\begin{array}{ccccccccccc}x\\y\\z \end{array}\right] = \left[\begin{array}{ccccccccccc}5\\15\\-2 \end{array}\right]. $$ Thus, here $$ A = \left[\begin{array}{ccccccccccc}2& -3& 1\\3& 1& 2\\2& 0& 3 \end{array}\right],~~~\bx = \left[\begin{array}{ccccccccccc}x\\y\\z \end{array}\right],~~~ \bb = \left[\begin{array}{ccccccccccc}5\\15\\-2 \end{array}\right]. $$

If the rhs vector $\bb$ is $\bz$, then we call the system homogeneous, else we call it non-homogeneous.

Any value of $\bx$ for which $A\bx = \bb,$ is called a solution of the system. A system of linear equations has either exactly one solution, or infinitely many solutions or no solution at all. In the last case we call the system inconsistent. Otherwise it is called consistent. The next example shows the method for solving a system of linear equations that we learn at high school.

EXAMPLE: Consider the following system of three linear equations, which we call $\alpha_1,\beta_1$ and $\gamma_1.$ $$\begin{array}{lrrrrrrrrrrrr} \alpha_1 :~~~& x& -y& +z& =& 2 \\ \beta_1 :~~~& 2x& +5y& -z& =& 9 \\ \gamma_1 :~~~& x& +2y& -3z& =& -4 \end{array}$$ In high school we used to solve this by eliminating the unknowns one by one until only one remained. Here we shall do this for all the unknowns simultaneously. Let us first eliminate $x$ from the last two equations by subtracting multiples of the first equation from them. Here are the resulting 3 equations, which we call $\alpha_2,\beta_2$ and $\gamma_2.$ $$\begin{array}{lrrrrrrrrrrrr} \alpha_2=& \alpha_1 :~~~& x& -y& +z& =& 2 \\ \beta_2=& \beta_1-2\alpha_1 :~~~& & 7y& -3z& =& 5 \\ \gamma_2=& \gamma_1-\alpha_1 :~~~& & 3y& -4z& =& -6 \end{array}$$ We want the coefficient of $y$ in the second equation to be $1:$ $$\begin{array}{lrrrrrrrrrrrr} \alpha_3=& \alpha_2 :~~~& x& -y & +z& =& 2 \\ \beta_3=& \frac 17\beta_2 :~~~& & y& -\frac 37z& =& \frac 57 \\ \gamma_3=& \gamma_2 :~~~& & 3y & -4z& =& -6 \end{array}$$ Now let us eliminate $y$ from the all the equations except the second one: $$\begin{array}{lrrrrrrrrrrrr} \alpha_4=& \alpha_3+\beta_2 :~~~& x& & +\frac 47z& =& \frac{19}{7} \\ \beta_4=& \beta_3 :~~~& & y& -\frac 37z& =& \frac 57 \\ \gamma_4=& \gamma_3-3\beta_2 :~~~& & & -\frac{19}{7}z& =& -\frac{57}{7} \end{array}$$ Next, we want the coefficient of $z$ in the third equation to be $1:$ $$\begin{array}{lrrrrrrrrrrrr} \alpha_5=& \alpha_4 :~~~& x& & +\frac 47z& =& \frac{19}{7} \\ \beta_5=& \beta_4 :~~~& & y& -\frac 37z& =& \frac 57 \\ \gamma_5=& -\frac{7}{19}\gamma_4 :~~~& & & z& =& 3 \end{array}$$ Finally, eliminate $z$ from all but the last equation: $$\begin{array}{lrrrrrrrrrrrr} \alpha_6=& \alpha_5-\frac 74\gamma_5 :~~~& x& & & =& 1 \\ \beta_6=& \beta_5+\frac 73 \gamma_5 :~~~& & y& & =& 2 \\ \gamma_6=& \gamma_5 :~~~& & & z& =& 3 \end{array}$$ This gives us the final solution.

This is what is called Gauss-Jordan elimination in computational matrix theory. While doing Gauss-Jordan elimination it is customary to write the system at each step in the augmented matrix form. This is done in the example below.

EXAMPLE: The augmented matrix form of the given system is as follows. $$\left[\begin{array}{rrr|r} 1& -1& 1& 2 \\ 2& 5& -1& 9 \\ 1& 2& -3& -4 \end{array}\right]$$ It is obtained by appending the rhs after the matrix. We draw a vertical line to keep the rhs separate.

Here is a sequence of augmented matrices that we encountered during the process: $$ \left[\begin{array}{rrr|r} \fbox1 & -1 & 1 & 2\\ 2 & 5 & -1 & 7\\ 1 & 2 & -3 & -4 \end{array}\right] \stackrel{SP}{\longrightarrow} \left[\begin{array}{rrr|r} 1 & -1 & 1 & 2\\ 0 & \fbox7 & -3 & 5\\ 0 & 3 & -4 & -6 \end{array}\right] \stackrel{M}{\longrightarrow} \left[\begin{array}{rrr|r} 1 & -1 & 1 & 2\\ 0 & \fbox1 & -\frac 37 & \frac 57\\ 0 & 3 & -4 & -6 \end{array}\right] \stackrel{SP}{\longrightarrow} \left[\begin{array}{rrr|r} 1 & 0 & \frac 47 & 2\\ 0 & 1 & -\frac 37 & \frac 57\\ 0 & 0 & \fbox{$-\frac{19}{7}$} & -\frac{57}{7} \end{array}\right] \stackrel{M}{\longrightarrow} \left[\begin{array}{rrr|r} 1 & 0 & \frac 47 & 2\\ 0 & 1 & -\frac 37 & \frac 57\\ 0 & 0 & \fbox1 & 3 \end{array}\right] \stackrel{SP}{\longrightarrow} \left[\begin{array}{rrr|r} 1 & 0 & 0 & 1\\ 0 & 1 & 0 & 2\\ 0 & 0 & 1 & 3 \end{array}\right] $$ Here we have used 3 symbols $S$, $M$ and $P$. Let us understand them. Before each step we choose an entry in the lhs of the augmented matrix (framed inside rectangles above). This is called the pivot, and its row and column are called the pivotal row and pivotal column. Initially the top left hand entry is chosen as the pivot.

An $M$-step divides the pivotal row by the pivot (so that the pivot becomes $1$).
An $S$-step subtracts suitable multiples of the pivotal row from the other rows to make the all the entries in the pivotal column zero (except the pivot itself).
The $P$ step moves the pivot one step downwards and to the right.

Gauss-Jordan elimination (without pivoting) In Gauss-Jordan elimination of a $n\times n$ system we start with the pivot at the $(1,1)$-th position. Then we perform the following operations. $$ \underbrace{(M,S,P),\cdots,(M,S,P)} _{n-1\mbox{ times}},M,S. $$

If the current position of the pivot is $(p,p)$, then the following R code achieves the steps:

M = function() {mat[p,] <<- mat[p,]/mat[p,p]; print(mat)}
S = function(i) { mat[i,] <<- mat[i,] - mat[p,]*mat[i,p]; print(mat)}
P = function() { p <<- p + 1; print(p) }

Here we have used the less known <<- operator. If x is a variable defined outside a function, and you write x = 5 inside the function, then this assignment has no effect outside the function:

x = 3
f = function() {x = 5}
f()
x

Try these to see that x is still 3. However, if you use the <<- operator, then the assignment is visible outside the function:

x = 3
f = function() {x <<- 5}
f()
x

Now x is 5.

To try these out let us create a random system:

A = matrix(sample(100, 16, rep=TRUE),4,4)
b = sample(100, 4, rep=T)

The sample function draws an SRSWOR or SRSWR. The first argument specifies the population, the second is the sample size. The rep argument specifies whether replacement is allowed.

Let's make the augmented matrix:

mat = cbind(A,b)
p = 1

Now let's apply our steps:

M()
S(2)
S(3)
S(4)

That finishes the first pass. The second pass starts by updating the pivot:

P()
M()
S(1)
S(3)
S(4)

Pivoting

The $M$- and $S$-steps require the pivot to be nonzero. However, in the above algorithm the pivot may become zero.

EXAMPLE: If the system is $$\begin{array}{lrrrrrrrrrrrr} & 3 y & - 5z & = & -1\\ 4x & + y & + z & = & 6\\ -x & - 8y & - z & = & -4 \end{array}$$ then the $(1,1)$-th entry is 0.

Yet it is hardly a problem, because we just have to use some other equation to eliminate $x$ with. For this we may choose just any equation that has $x$ in it (i.e., the coefficient of $x$ is nonzero). For example, we may choose the $3$rd equation: equations to rewrite the system as $$\begin{array}{lrrrrrrrrrrrr} & 3 y & - 5z & = & -1\\ 4x & + y & + z & = & 6\\ \fbox{$-x$} & - 8y & - z & = & -4 \end{array}$$ Or we could have eliminated some other variable instead of $x$ using the first equation: $$\begin{array}{lrrrrrrrrrrrr} & \fbox{$3 y$} & - 5z & = & -1\\ 4x & + y & + z & = & 6\\ -x & - 8y & - z & = & -4 \end{array}$$

These are called pivoting. It is useful even when the pivot is nonzero, but is very small in absolute value. This is because division by numbers near zero in a computer introduces large errors in the output.

In terms of augmented matrices, pivoting means choosing the pivot position appropriately at the start of each step. There are just two rules to keep in mind:

the pivot must not be zero (i.e., if you plan to eliminate a variable using some equation, then that variable should be present in that equation!).
if you have already used the $(i,j)$-th entry as the pivot in some step, then no future pivot should be from the $i$-th row or the $j$-th column. In other words, never eliminate the same variable twice, and never use the same equation to eliminate multiple variables (else, the first variable may come back!).

Also we follow another desirable property:

Among all available pivot positions (i.e., all $(i,j)$'s where the $i$-th row and the $j$-th column are still unused), choose the pivot as the entry farthest from $0.$

Most textbooks present pivoting in terms of swapping rows and columns of the augmented matrix. For example, if at the very first step you'd like to use the $(2,3)$-th entry as the pivot, then they would say: swap rows 1 and 2, and swap columns 1 and 3. Thus, for them the pivot position at the $k$-th step is always $(k,k).$ If you want anything else at that position, you manually swap the rows and columns to bring it at the $(k,k)$-th position.

In our example we swap row 2 with row 3, and column 2 with column 3 to get $$ \left[\begin{array}{rrr|r} \fbox0 & 3 & -5 & -1\\ 4 & 1 & 1 & 6\\ -1 & -8 & -1 & -4 \end{array}\right] \stackrel{P}{\longrightarrow} \left[\begin{array}{rrr|r} \fbox{-8} & -1 & -1 & -4\\ 1 & 4 & 1 & 6\\ 3 & 0 & -5 & -1 \end{array}\right] $$ Here $P$ is our symbol for pivoting.

Recall that the columns of the matrix correspond to the variables of the equations. So swapping the columns also involves reordering the variables. A simple way to keep track of the order of the variables is to write the variables above the columns. If we call the variables as $x,y,z$ in the last example then we shall write: $$ \left[\begin{array}{rrr|r} x & y & z\\ \hline \fbox0 & 3 & -5 & -1\\ 4 & 1 & 1 & 6\\ -1 & -8 & -1 & -4 \end{array}\right] \stackrel{P}{\longrightarrow} \left[\begin{array}{rrr|r} y & x & z\\ \hline \fbox{-8} & -1 & -1 & -4\\ 1 & 4 & 1 & 6\\ 3 & 0 & -5 & -1 \end{array}\right] $$ To do this in R we just need to change our P function.

P = function() {
  p <<- p + 1
  n = nrow(mat)
  candidates = mat[p:n,p:n]
  argmax = which.max(abs(candidates))
  #cat('argmax=',argmax)
  newRow = p + (argmax-1) %% nrow(candidates)
  newCol = p + floor((argmax-1) / nrow(candidates))
  cat('Best candidate at', newRow, newCol,'\n')

# Now we shall swap the rows p and newRow.
  tmp = mat[newRow,]
  mat[newRow,] <<- mat[p,]
  mat[p,] <<- tmp

# We also need to swap the row heads accordingly.
# First we extract the current row heads.
  rheads = rownames(mat)
# Next we perform the swap.
  tmp = rheads[newRow]
  rheads[newRow] = rheads[p]
  rheads[p] = tmp
# Finally we assign the swapped array of row heads to the matrix.
  rownames(mat) <<- rheads

# Similarly, for the column swap.
  tmp = mat[,newCol]
  mat[,newCol] <<- mat[,p]
  mat[,p] <<- tmp
  cheads = colnames(mat)
  tmp = cheads[newCol]
  cheads[newCol] = cheads[p]
  cheads[p] = tmp
  colnames(mat) <<- cheads

  print(mat)
}

Of course, we need to keep track of our swaps. For this we shall employ the ability of R to attach names to each row and column:

rownames(A) = 1:4
colnames(A) = paste('x',1:4,sep='')
A

Now proceed as before:

( mat = cbind(A, b) )
p = 0

Notice we are starting with pivot position 0. The only differecen is that we start with a pivoting step.

P()
M()
S(2)
S(3)
S(4)

The next pass is similar:

P()
M()
S(1)
S(3)
S(4)

Here you end up with

0 0 1 | 0.371951
1 0 0 | 1.33537
0 1 0 | 0.286585

The left hand side is a permuation matrix, which tells us how to interpret the right hand side: $z = 0.371951$, $x = 1.33537$ and $y = 0.286585$.

Gauss-Jordan elimination (with pivoting) We perform the following steps: $$ \underbrace{P,M,S,\cdots,P,M,S} _{n\mbox{ times}}. $$

Inversion using Gauss-Jordan elimination

Gauss-Jordan elimination (with pivoting) may be used to find inverse of a given nonsingular square matrix, since finding inverse is the same as solving the system $$ AX = I. $$

EXAMPLE: Suppose that we want to find the inverse of the matrix $\left[\begin{array}{ccccccccccc}1& -1& 1 \\ 2& 5& -1 \\ 1& 2& -3 \end{array}\right].$ Then we append an identity matrix of the same size to the right of this matrix to get the augmented matrix $$\left[\begin{array}{rrr|rrr} 1& -1& 1& 1 & 0 & 0\\ 2& 5& -1& 0 & 1& 0\\ 1& 2& -3& 0& 0& 1 \end{array}\right].$$ Now go on applying Gauss-Jordan elimination until the left hand matrix is reduced to a permuation matrix. The right hand part at the final step will give the required inverse (after the permutation). $$ \left[\begin{array}{rrr|rrr} \fbox1& -1& 1& 1 & 0 & 0\\ 2& 5& -1& 0 & 1& 0\\ 1& 2& -3& 0& 0& 1 \end{array}\right] \stackrel{M,S,P}{\longrightarrow} \left[\begin{array}{rrr|rrr} 1& -1& 1& 1& 0& 0 \\ 0& \fbox7 & -3& -2& 1& 0 \\ 0& 3 & -4& -1& 0& 1 \end{array}\right] \stackrel{M,S,P}{\longrightarrow} \left[\begin{array}{rrr|rrr} 1& 0& \frac{4}{7}& \frac{5}{7}& \frac{1}{7}& 0 \\ 0& 1& \frac{-3}{7}& \frac{-2}{7}& \frac{1}{7}& 0 \\ 0& 0 & \fbox{$\frac{-19}{7}$}& \frac{-1}{7}& \frac{-3}{7}& 1 \end{array}\right] \stackrel{M,S,P}{\longrightarrow} \left[\begin{array}{rrr|rrr} 1& 0& 0& \frac{91}{133}& \frac{7}{133}& \frac{4}{19} \\ 0& 1& 0& \frac{-35}{133}& \frac{28}{133}& \frac{-3}{19} \\ 0& 0& 1& \frac{1}{19}& \frac{3}{19}& \frac{-7}{19} \end{array}\right]$$ Thus, the required inverse is $$\left[\begin{array}{ccccccccccc} \frac{91}{133}& \frac{7}{133}& \frac{4}{19} \\\frac{-35}{133}& \frac{28}{133}& \frac{-3}{19} \\\frac{1}{19}& \frac{3}{19}& \frac{-7}{19} \end{array}\right].$$

Why this works: It is based on the simple fact from linear algebra that if we form the matrix product $AB,$ then the $j$-th column of $AB$ is actually $A\bb_j,$ where $\bb_j$ is the $j$-th column of $B.$

From this it follows that the row operations are effectively multiplication by a matrix from the left. For example,

multiplying a row by a scalar is like this: $$ \left[\begin{array}{ccccccccccc}1 & 0 & 0\\0 & 2 & 0\\0& 0 & 1 \end{array}\right] \left[\begin{array}{ccccccccccc} b_{11} & b_{12} & b_{13} & b{14}\\ b_{21} & b_{22} & b_{23} & b{24}\\ b_{31} & b_{32} & b_{33} & b{34} \end{array}\right] = \left[\begin{array}{ccccccccccc} b_{11} & b_{12} & b_{13} & b{14}\\ 2b_{21} & 2b_{22} & 2b_{23} & 2b{24}\\ b_{31} & b_{32} & b_{33} & b{34} \end{array}\right]. $$
Adding a multiple of a row to another is like: $$ \left[\begin{array}{ccccccccccc}1 & 0 & 3\\0 & 1 & 0\\0& 0 & 1 \end{array}\right] \left[\begin{array}{ccccccccccc} b_{11} & b_{12} & b_{13} & b{14}\\ b_{21} & b_{22} & b_{23} & b{24}\\ b_{31} & b_{32} & b_{33} & b{34} \end{array}\right] = \left[\begin{array}{ccccccccccc} b_{11}+3b_{31} & b_{12}+3b_{32} & b_{13}+2b_{33} & b{14}+2b_{34}\\ 2b_{21} & 2b_{22} & 2b_{23} & 2b{24}\\ b_{31} & b_{32} & b_{33} & b{34} \end{array}\right]. $$

If a row operation is invertible (i.e., it is possible for you to recover the original matrix from the transformed matrix), then the corresponding premultiplir matrix must be nonsingular. Both the above operations are invertible (assuming the scalar multiplier in the first is nonzero), and so the corresponding premultiplier matrices are nonsingular. These premultipliers are called elementary matrices. They may be obtained by applying the row operations to the identity matrix. For example, to obtain the premultiplier for "subtracting 5 times the 4th row to the 2nd in a $5\times n$ matrix" start with the $5\times5$ identity matrix $$ \left[\begin{array}{ccccccccccc} 1&0&0&0&0\\ 0&1&0&0&0\\ 0&0&1&0&0\\ 0&0&0&1&0\\ 0&0&0&0&1 \end{array}\right]. $$ Then apply the operation on this to get $$ \left[\begin{array}{ccccccccccc} 1&0&0&0&0\\ 0&1&0&-5&0\\ 0&0&1&0&0\\ 0&0&0&1&0\\ 0&0&0&0&1 \end{array}\right]. $$ This is the reqired premultiplier (why?).

Thus, each step in the GJ algorithm is a left multiplication by a nonsingular matrix. Suppose that we start with $(A~|~I)$ and end with $(P~|~ B),$ where $P$ is a permutation matrix. This means $$ (P~|~B) = N(A~|~I), $$ where $N$ is nonsingular matrix representing the combined effect of all the GJ steps. Then $$ P = NA \mbox{ and } B = N. $$ Since $P$ is a permutation matrix, hence $P'P=I.$ So $P'NA = I.$ Thus, $A ^{-1} = P'N = P'B.$ In other words, you permute the rows of $B$ according to $P'$ to get $A ^{-1}.$

Determinant using Gaussian elimination

EXERCISE: Find out how one may use the Gauss-Jordan elimination algorithm to compute the determinant.

PROJECT:

Write a program that will apply Gauss-Jordan elimination to an arbitrary matrix of order $m\times n$ over ${\mathbb R}.$ It should perform the row swaps only when the pivot is zero. The resulting form is called the reduced row echelon form (RREF) of the matrix. This form has the interesting property that any two matrices with the same size and same row space must have the same RREF. The program should also find bases of the column space, row space and null space of the matrix using the computed RREF.

$QR$ decomposition

Theorem For any $n$ by $p$ matrix $A$ with $n\geq p$ we have an $n$ by $n$ orthogonal matrix $Q$ and an $n$ by $p$ upper triangular matrix $R$ such that $$A = QR.$$

This decomposition is the matrix version of the Gram-Schmidt Orthogonalization (GSO). [If you do not know GSO, then skip to the next heading: Householder transformation.]To see this we first consider the case where $A$ is full column rank ( i.e., the columns are independent.) Call the columns $$ \ba_1,...,\ba_p. $$ Apply GSO to get an orthonormal set of vectors $$ \bq_1,...,\bq_p $$ given by $$ \bq_i = unit(\ba_i-\sum_{j=1}^{i-1}(\bq_j'\ba_i) \bq_j) $$ computed in the order $i=1,...,p.$ Here $unit(\bv)$ is defined as $$ unit(\bv) = \bv/\|\bv\|,\quad \bv\neq \bz. $$ Notice that $span\{\ba_1,...,\ba_i\}=span\{\bq_1,...,\bq_i\}.$ So we can write $$\ba_j = r_{1j} \bq_1 + r_{2j} \bq_2 + \cdots + r_{jj} \bq_i.$$ Indeed, we have $$r_{ij} = \bq_i' \ba_j \mbox{ for } i \leq j.$$ Define the matrix $R$ using the $r_{ij}$'s, and form $Q$ with $\bq_i$'s as its columns.

If $A$ is not full column rank, then some $(\ba_i-\sum_{j=1}^{i-1}(\bq_j'\ba_i)\bq_j)$ will be zero, hence we cannot apply $unit$ to it. But then we can take $\bq_i$ equal to any unit norm vector orthogonal to $\bq_1,...,\bq_{i-1}$ and set $r_{ii}=0.$

However, GSO is not the best way to compute $QR$ decomposition of a matrix. This is because in the $unit$ steps you have to divide by the norms which may be too small. Division by small numbers in a computer may lead to numerical instability. The standard way to implement it is using Householder's transformation, that we discuss next.

Householder transformation

If $A$ is any orthogonal matrix, then we now that $\|Ax\| = \|x\|.$ In other words an orthogonal matrix does not change shape or size of an object. It can only rotate and reflect it. We want to ask if the reverse is true:

If $x\neq y$ are two vectors of same length, then does there exist an orthogonal $A$ that takes $x$ to $y$ and vice versa? That is, we are looking for an orthogonal $A$ (possibly depending on $x,y$) such that $$ Ax = y \mbox{ and } Ay=x? $$

The answer is ''Yes.'' In fact, there may be many. Householder's transform is one such: $$ A = I-2uu', \mbox{ where } u = unit(x-y). $$

EXERCISE: Show that this $A$ is orthogonal and it sends $x$ to $y$ and vice versa.

EXAMPLE: In general you need $n^2$ scalar multiplications to multiply an $n\times n$ matrix with an $n\times 1$ vector. However, show that if the matrix is a Householder matrix then only $2n+1$ scalar multiplications are needed.

SOLUTION: $$ A\bv = (I-2\bu\bu')\bv = \bv-2\bu\bu'\bv. $$ Now $\bu'\bv$ is a scalar that requires $n$ multiplications to compute. Multiply that with $2$ (one more multiplication) to get the scalar $2\bu'\bv = \lambda,$ say. Finally, multiply each entry of $\bu$ with $\lambda $ (requiring $n$ more multiplications).

Let's play with the idea using R:

unit = function(v) {v/sqrt(sum(v*v))}
hmult = function(u, x) {x - 2* sum(u*x)*u}
( x = unit(c(1,2,3)) )
( y = unit(c(4,2,5)) )
( u = unit(x-y) )
hmult(u,x)

EXERCISE: Show that in 2 dimensions Householder's transform is the only such transform. Show that this uniqueness does not hold for higher dimensions.

Using Householder for $QR$

The idea is to shave the columns of $X$ one by one by multiplying with Householder matrices. For any non zero vector $u$ define $H_u$ as the Householder matrix that reduces $u$ to $$ v = \left[\begin{array}{ccccccccccc}\|u\|^2\\0\\\vdots\\0 \end{array}\right]. $$

EXERCISE: Let us partition a vector $\bu$ as $$ \bu_{n\times 1} = \left[\begin{array}{ccccccccccc}\bu_1\\\bu_2 \end{array}\right], $$ where both $\bu_1,\bu_2$ are vectors ($\bu_2$ being $k\times 1).$ Consider the $n\times n$ matrix $$ H = \left[\begin{array}{ccccccccccc}I & 0\\0 & H_{\bu_2} \end{array}\right]. $$ Show that $H$ is orthogonal and compute $Hu.$ We shall say that $H$ shaves the last $k-1$ elements of $u.$

Let the first column of $X$ be $a.$ Let $H_1$ shave its lower $n-1$ entries. Consider the second column $b$ of $H_1A.$ Let $H_2$ shave off its lower $n-2$ entries. Next let $c$ denote the third column of $H_2H_1X,$ and so on. Proceeding in this way, we get $H_1,H_2,...,H_p$ all of which are orthogonal Householder matrices. Define $$ Q = (H_1H_2\cdots H_p)' $$ and $R = Q'X$ to get a $QR$ decomposition.

EXERCISE: Carry this out for the following $5\times 4$ case. $$ \left[\begin{array}{ccccccccccc} 1& 2 & 3 & 4\\ 1& 3 & 2 & 4\\ 1& -2 & 5 & 0\\ 7& 2 & 1 & 3\\ -2& 8 & 5 & 4 \end{array}\right] $$

We shall now apply it to a $4\times 3$ matrix. The steps are shown in the diagram below (red is the entry computed in that step, grey means already computed, black means 0):


Steps in the computation of $R$ and the $\bu$'s

shaver = function(x) {
   x[1] = x[1] - sqrt(sum(x*x))
   unit(x)
}

Next, let's create a matrix:

H1 = sample(1000, 6)
H2 = sample(1000, 6)
H3 = sample(1000, 6)
H4 = sample(1000, 6)
H5 = sample(1000, 6)

OK, ready for first pass:

(u=shaver(H1))
(H1 = hmult(u,H1)) #Just for checking!
(H2 = hmult(u,H2))
(H3 = hmult(u,H3))
(H4 = hmult(u,H4))
(H5 = hmult(u,H5))

The second pass is similar:

(u=shaver(H2[2:6]))
(H2[2:6] = hmult(u,H2[2:6]))
(H3[2:6] = hmult(u,H3[2:6]))
(H4[2:6] = hmult(u,H4[2:6]))
(H5[2:6] = hmult(u,H5[2:6]))

and so on.

Efficient implementation

Notice that though the Householder matrix $$ I - 2\bu\bu' $$ is an $n\times n$ matrix, it is actually determined by only $n$ numbers. Thus, we can effectively store the matrix in linear space. In particular, the matrix $H_1$ needs only $n$ spaces, $H_2$ needs only $n-1$ spaces and so on. So we shall try to store these in the ``shaved'' parts of $X.$ Let $H_1 = 1-2\bu_1\bu_1'$ and $H_1X$ be partitioned as $$ \left[\begin{array}{ccccccccccc}\alpha & v'\\0 & X_1 \end{array}\right]. $$ Then we shall try to store $\bu_1$ in place of the 0's. But $\bu_1$ is an $n\times 1$ vector, while we have only $n-1$ zeroes. So the standard practice is to store $\alpha$ (which is the squared norm of the first column) in a separate array, and store $\bu_1$ in place of the first column of $A.$ The final output will be a $n\times p$ matrix and a $p$-dimensional vector (which is like an extra row in $A).$ The matrix is packed with the $u$'s and the strictly upper triangular part of $R:$


Output of efficient $QR$ decomposition

The ``extra row'' stores the diagonal entries of $R.$ It is possible to ``unpack'' $Q$ from the $u$'s. However, if we need $Q$ only to multiply some $x$ to get $Qx,$ then even this unpacking is not necessary.

EXERCISE: Write a program that performs the above multiplication without explicitly computing $Q.$

Application to least squares

An important use of the $QR$ decomposition is in solving least squares problems. Here we are given a (possibly inconsistent) system $$ A\bx = \bb, $$ where $A$ is full column rank (need not be square.) Then we have the following theorem.

Theorem The above system has unique least square solution $\bx$ given by $$ \bx = (A'A)^{-1}A'\bb. $$ Note that the full column rankness of $A$ guarantees the existence of the inverse.

However, computing the inverse directly and then performing matrix multiplication is not an efficient algorithm. A better way (which is used in standard software packages) is to first form a $QR$ decomposition of $A$ as $A = QR.$ The given system now looks like $$ QR\bx = \bb. $$ The lower part of $R$ is made of zeros: $$ Q\left[\begin{array}{ccccccccccc}R_1\\O \end{array}\right]\bx = \bb, $$ or $$ \left[\begin{array}{ccccccccccc}R_1\\O \end{array}\right]\bx = Q'\bb, $$ Partition $Q'\bb$ appropriately to get $$ \left[\begin{array}{ccccccccccc}R_1\\O \end{array}\right]\bx = \left[\begin{array}{ccccccccccc}\bc_1\\\bc_2 \end{array}\right], $$ where $\bc_1$ is $p\times 1.$ This system is made of two systems: $$ R_1 \bx = \bc_1 $$ and $$ O \bx = \bc_2. $$ The first system is always consistent and can be solved by back-substitution. The second system is trivial and inconsistent unless $\bc_2=\bz.$

EXERCISE: Show that $x=R_1^{-1}\bc_1$ is the least square solution of the original system. Notice that you use back-substituting to find this $\bx$ and not direct inversion of $R_1.$

EXERCISE: Find a use of $\bc_2$ to compute the residual sum of squares: $$ \|\bb-A\bx\|^2. $$

PROJECT:

Write a program to find a least squares solution to the system $A\bx=\bb$. Your program should first implement the above efficient version of $QR$ decomposition of $A.$ Your program should be able to detect if $A$ is not full column rank, in which case it should stop with an error message. If $A$ is full column rank, then the program should output the unique least squares solution. Your program must never compute any Householder matrix explicitly.

What if $A$ is not full column rank?

You'll detect this during computation of the $QR$ decomposition:

If $A$ is not full column rank, then for some $k$, the $k$-th column of $A$ will be in the span of the preceding columns. At that point, the norm of $\bu$ will be zero.

There are two things you can do if you detect such a situation.

One is to take $I$ in place of the Householder matrix for that step. This is natural, because if the entries are already zero, then there is no need to "shave" them further!
However, if you aim is to compute an ONB of column space of $A$, or to compute least squares solution, then you should proceed differently. You should "throw away" those columns of $A$ that are in the spans of the preceding columns.

The following example illustrates both these approaches.

EXAMPLE: Let's take $$ A=\left[\begin{array}{ccccccccccc}3 & 6 & 1\\4 & 8 & 3\\0&0&4 \end{array}\right]. $$ Clearly, $r(A) = 2.$ The first step of the $QR$ algorithm will go smoothly, converting $A$ to $$ \left[\begin{array}{ccccccccccc} 5 & 10 & *\\ 0 & 0 & *\\ 0 & 0 & * \end{array}\right]. $$ In the second step we run into trouble. We are supposed to "shave" the two entries under the 10. But they are already zeroes. So the first approach will simply move on to the third step, and finally produce a $3\times 3$ upper triangular $R.$ This $R$ will not help you much to get any least squares solution of a system $A\bx = \bb.$

In the second approach, we shall throw away the second column to get: $$ \left[\begin{array}{ccccccccccc} 5 & *\\ 0 & *\\ 0 & * \end{array}\right]. $$ Then it will proceed to perform the second step again on this new matrix, i.e., the lowest $*$ will get shaved. The output of this version of the $QR$ algorithm is an $R$ matrix of size $3\times 2$. The top $2\times 2$ portion is a nonsingular upper triangular matrix, which will allow you to compute a least squares solution.

Eigenanalysis: power method

Let us start with the definition of eigenvalues and eigenvectors:

Definition: Eigenvalue and eigenvector Let $A_{n\times n}$ be any real/complex matrix. Then $\lambda\in{\mathbb C}$ is called an eigenvalue of $A$ with corresponding eigenvector $\bv\in{\mathbb C}^n$ if $\bv\neq\bz$ and $$ A \bv = \lambda \bv. $$

The importance of these concepts is by no means obvious from the definition. However, finding the eigenvalues and eigenvectors of matrices is a fundamental problem of numerical linear algebra. The subject in its entirety is well beyond the scope of the present course. Here We shall try to find only a single eigenvector and a corresponding eigenvalue of a given square matrix.

Our algorithm is very simple: We start with a vector $\bv$ and constructs the sequence $$ unit(A\bv), unit(A^2\bv), unit(A^3\bv), unit(A^4\bv),... $$ Here $unit(\bv)$ is the unit vector along a nonzero vector $\bv$, i.e., $$unit(\bv) = \bv/\|\bv\|.$$ Under some condition on $\bv,$ the sequence "converges" to an eigenvector.

We shall not explore the most general condition under which this algorithm works. But here is the most commonly used sufficient condition.

TheoremLet $A_{n\times n}$ have all real eigenvalues, $\lambda_1,...,\lambda_n$, with $$ \lambda_1 > |\lambda_2|\geq \cdot \geq |\lambda_n|. $$ Let $\bv = \sum_i c_i \bv_i$, where $\bv_i$'s are eigenvectors corresponding to $\lambda_i$'s, and $c_1\neq 0.$

Then the sequence $$ unit(A\bv), unit(A^2\bv), unit(A^3\bv), unit(A^4\bv),... $$ converges to $\bv_1$ if $ c_1>0$ and to $-\bv_1$ if $ c_1 < 0.$

Proof: Here $$ A^k \bv = \sum_i c_i \lambda_i^k \bv_i = \lambda_1^k \left(c_1 \bv_1 + \sum_{i=2}^n c_i \left(\tfrac{\lambda_i}{\lambda_1}\right)^k \bv_i\right). $$ So $$ unit(A^k \bv) = unit\left(c_1 \bv_1 + \sum_{i=2}^n c_i \left(\tfrac{\lambda_i}{\lambda_1}\right)^k \bv_i\right). $$ Now, if $(\bx_n)$ is a sequence of vectors with $\bx_n\rightarrow \bx\neq \bz,$ then $unit(\bx_n)\rightarrow unit(\bx).$

Hence $unit(A^k\bv) \rightarrow unit(c_1 \bv_1) = unit(\bv_1)$ if $c_1 >0$ and $-unit(\bv)$ is $ c_1 <0,$ as required. [QED]

Let's take a computational example:

A  = matrix(runif(9),3,3)
(u = unit(1:3))
(u= unit(A%*% u))

Just keep on running the last line repeatedly.

Comments

To post an anonymous comment, click on the "Name" field. This will bring up an option saying "I'd rather post as a guest."

Table of contents

Comments