Chapter 6

Gaussian Elimination, LU-Factorization, Cholesky Factorization, Reduced Row Echelon Form
6.1 Motivating Example: Curve Interpolation
Curve interpolation is a problem that arises frequently in computer graphics and in robotics
(path planning). There are many ways of tackling this problem and in this section we will
describe a solution using cubic splines. Such splines consist of cubic Bézier curves. They
are often used because they are cheap to implement and give more flexibility than quadratic
Bézier curves.
A cubic Bézier curve C(t) (in R^2 or R^3) is specified by a list of four control points
(b0, b1, b2, b3) and is given parametrically by the equation

C(t) = (1 − t)^3 b0 + 3(1 − t)^2 t b1 + 3(1 − t)t^2 b2 + t^3 b3.
Clearly, C(0) = b0, C(1) = b3, and for t ∈ [0, 1], the point C(t) belongs to the convex hull of
the control points b0, b1, b2, b3. The polynomials
(1 − t)^3,  3(1 − t)^2 t,  3(1 − t)t^2,  t^3
are the Bernstein polynomials of degree 3.
Typically, we are only interested in the curve segment corresponding to the values of t in
the interval [0, 1]. Still, the placement of the control points drastically affects the shape of the
curve segment, which can even have a self-intersection; see Figures 6.1, 6.2, 6.3 illustrating
various configurations.
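To make the parametrization concrete, here is a minimal Python sketch (assuming NumPy; the function name bezier3 is ours) that evaluates a cubic Bézier curve from its four control points using the Bernstein form above:

```python
import numpy as np

def bezier3(b0, b1, b2, b3, t):
    """Evaluate the cubic Bezier curve C(t) in Bernstein form.

    b0..b3 are control points (arrays in R^2 or R^3); t is a scalar in [0, 1].
    """
    s = 1.0 - t
    return (s**3) * b0 + 3 * (s**2) * t * b1 + 3 * s * (t**2) * b2 + (t**3) * b3

# C(0) = b0 and C(1) = b3, as expected:
b = [np.array(p, dtype=float) for p in [(0, 0), (1, 2), (3, 2), (4, 0)]]
print(bezier3(*b, 0.0), bezier3(*b, 1.0))
```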
Figure 6.1: A “standard” Bézier curve
Figure 6.2: A Bézier curve with an inflexion point
Figure 6.3: A self-intersecting Bézier curve
Interpolation problems require finding curves passing through some given data points and
possibly satisfying some extra constraints.
A Bézier spline curve F is a curve which is made up of curve segments which are Bézier
curves, say C1, . . . , Cm (m ≥ 2). We will assume that F is defined on [0, m], so that for
i = 1, . . . , m,

F(t) = Ci(t − i + 1),  i − 1 ≤ t ≤ i.
Typically, some smoothness is required between any two junction points, that is, between
any two points Ci(1) and Ci+1(0), for i = 1, . . . , m − 1. We require that Ci(1) = Ci+1(0)
(C0-continuity), and typically that the derivatives of Ci at 1 and of Ci+1 at 0 agree up to
second order derivatives. This is called C2-continuity, and it ensures that the tangents agree
as well as the curvatures.
There are a number of interpolation problems, and we consider one of the most common
problems which can be stated as follows:
Problem: Given N + 1 data points x0, . . . , xN , find a C2 cubic spline curve F , such that
F (i) = xi, for all i, 0 ≤ i ≤ N (N ≥ 2).
A way to solve this problem is to find N + 3 auxiliary points d−1, . . . , dN+1 called de Boor
control points from which N Bézier curves can be found. Actually,
d−1 = x0 and dN+1 = xN
so we only need to find N + 1 points d0, . . . , dN .
It turns out that the C2-continuity constraints on the N Bézier curves yield only N − 1
equations, so d0 and dN can be chosen arbitrarily. In practice, d0 and dN are chosen according
to various end conditions, such as prescribed velocities at x0 and xN . For the time being, we
will assume that d0 and dN are given.
Figure 6.4 illustrates an interpolation problem involving N + 1 = 7 + 1 = 8 data points.
The control points d0 and d7 were chosen arbitrarily.
Figure 6.4: A C2 cubic interpolation spline curve passing through the points x0, x1, x2, x3, x4, x5, x6, x7
It can be shown that d1, . . . , dN−1 are given by the linear system

\[
\begin{pmatrix}
\frac{7}{2} & 1 & & & \\
1 & 4 & 1 & & \\
& \ddots & \ddots & \ddots & \\
& & 1 & 4 & 1 \\
& & & 1 & \frac{7}{2}
\end{pmatrix}
\begin{pmatrix}
d_1 \\ d_2 \\ \vdots \\ d_{N-2} \\ d_{N-1}
\end{pmatrix}
=
\begin{pmatrix}
6x_1 - \frac{3}{2} d_0 \\ 6x_2 \\ \vdots \\ 6x_{N-2} \\ 6x_{N-1} - \frac{3}{2} d_N
\end{pmatrix}.
\]
It can be shown that the above matrix is invertible because it is strictly diagonally
dominant.
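Because the system is tridiagonal and strictly diagonally dominant, it can be solved in O(N) time without pivoting. The following is a minimal Python sketch (assuming NumPy; the function names and the indexing conventions are ours) that assembles the system above and solves it with the standard tridiagonal (Thomas) algorithm:

```python
import numpy as np

def solve_tridiagonal(sub, diag, sup, rhs):
    """Solve a tridiagonal system in O(n) time (Thomas algorithm).

    diag and rhs have length n; sub and sup (the sub- and super-diagonals)
    have length n - 1.  No pivoting is performed, which is safe here because
    the matrix is strictly diagonally dominant.
    """
    n = len(diag)
    d = np.array(diag, dtype=float)
    r = np.array(rhs, dtype=float)
    for i in range(1, n):                     # forward elimination
        m = sub[i - 1] / d[i - 1]
        d[i] -= m * sup[i - 1]
        r[i] -= m * r[i - 1]
    x = np.empty(n)
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):            # back-substitution
        x[i] = (r[i] - sup[i] * x[i + 1]) / d[i]
    return x

def de_boor_points(x, d0, dN):
    """Solve the system above for d_1, ..., d_{N-1}.

    x holds one coordinate of the data points x_0, ..., x_N; for points in
    R^2 or R^3, apply the function once per coordinate (the matrix is the
    same for each coordinate).
    """
    N = len(x) - 1
    n = N - 1                                  # unknowns d_1, ..., d_{N-1}
    diag = np.full(n, 4.0)
    diag[0] = diag[-1] = 7.0 / 2.0
    rhs = 6.0 * np.asarray(x[1:N], dtype=float)
    rhs[0] -= 1.5 * d0                         # fold d_0 into the right-hand side
    rhs[-1] -= 1.5 * dN                        # fold d_N into the right-hand side
    return solve_tridiagonal(np.ones(n - 1), diag, np.ones(n - 1), rhs)
```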
Once the above system is solved, the Bézier cubics C1, . . ., CN are determined as follows
(we assume N ≥ 2): For 2 ≤ i ≤ N − 1, the control points (b^i_0, b^i_1, b^i_2, b^i_3) of Ci are given by

b^i_0 = x_{i−1}
b^i_1 = (2/3) d_{i−1} + (1/3) d_i
b^i_2 = (1/3) d_{i−1} + (2/3) d_i
b^i_3 = x_i.
The control points (b^1_0, b^1_1, b^1_2, b^1_3) of C1 are given by

b^1_0 = x_0
b^1_1 = d_0
b^1_2 = (1/2) d_0 + (1/2) d_1
b^1_3 = x_1,
and the control points (b^N_0, b^N_1, b^N_2, b^N_3) of CN are given by

b^N_0 = x_{N−1}
b^N_1 = (1/2) d_{N−1} + (1/2) d_N
b^N_2 = d_N
b^N_3 = x_N.
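As a sketch of how the formulas above fit together (Python; the function name and indexing conventions are ours, and the data points and de Boor points are assumed to be NumPy arrays so the convex combinations work componentwise):

```python
def bezier_segments(x, d):
    """Return the control points (b0, b1, b2, b3) of C_1, ..., C_N.

    x = [x_0, ..., x_N] are the data points; d is the list
    [d_{-1}, d_0, ..., d_N, d_{N+1}], so that d[i + 1] holds d_i.
    """
    N = len(x) - 1
    D = lambda i: d[i + 1]                    # d_i with the shifted indexing
    segments = []
    for i in range(1, N + 1):
        if i == 1:                            # first segment: b1 = d_0
            b1, b2 = D(0), 0.5 * D(0) + 0.5 * D(1)
        elif i == N:                          # last segment: b2 = d_N
            b1, b2 = 0.5 * D(N - 1) + 0.5 * D(N), D(N)
        else:                                 # interior segments
            b1 = (2.0 / 3.0) * D(i - 1) + (1.0 / 3.0) * D(i)
            b2 = (1.0 / 3.0) * D(i - 1) + (2.0 / 3.0) * D(i)
        segments.append((x[i - 1], b1, b2, x[i]))
    return segments
```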
We will now describe various methods for solving linear systems. Since the matrix of the
above system is tridiagonal, there are specialized methods which are more efficient than the
general methods. We will discuss a few of these methods.
6.2 Gaussian Elimination and LU-Factorization
Let A be an n × n matrix, let b ∈ R^n be an n-dimensional vector and assume that A is
invertible. Our goal is to solve the system Ax = b. Since A is assumed to be invertible,
we know that this system has a unique solution, x = A^{−1}b. Experience reveals two
counter-intuitive facts:
(1) One should avoid computing the inverse, A^{−1}, of A explicitly. This is because this
would amount to solving the n linear systems, Au^(j) = e_j, for j = 1, . . . , n, where
e_j = (0, . . . , 1, . . . , 0) is the jth canonical basis vector of R^n (with a 1 in the jth slot).
By doing so, we would replace the resolution of a single system by the resolution of n
systems, and we would still have to multiply A^{−1} by b.
(2) One does not solve (large) linear systems by computing determinants (using Cramer’s
formulae). This is because this method requires a number of additions (resp. multipli-
cations) proportional to (n + 1)! (resp. (n + 2)!).
The key idea on which most direct methods (as opposed to iterative methods, that look
for an approximation of the solution) are based is that if A is an upper-triangular matrix,
which means that aij = 0 for 1 ≤ j < i ≤ n (resp. lower-triangular, which means that
aij = 0 for 1 ≤ i < j ≤ n), then computing the solution, x, is trivial. Indeed, say A is an
upper-triangular matrix
\[
A = \begin{pmatrix}
a_{1\,1} & a_{1\,2} & \cdots & a_{1\,n-2} & a_{1\,n-1} & a_{1\,n} \\
0 & a_{2\,2} & \cdots & a_{2\,n-2} & a_{2\,n-1} & a_{2\,n} \\
0 & 0 & \ddots & \vdots & \vdots & \vdots \\
\vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 0 & a_{n-1\,n-1} & a_{n-1\,n} \\
0 & 0 & \cdots & 0 & 0 & a_{n\,n}
\end{pmatrix}.
\]
Then, det(A) = a_{11} a_{22} · · · a_{nn} ≠ 0, which implies that a_{ii} ≠ 0 for i = 1, . . . , n, and we can
solve the system Ax = b from bottom-up by back-substitution. That is, first we compute
x_n from the last equation, next plug this value of x_n into the next to the last equation and
compute x_{n−1} from it, etc. This yields

x_n = a_{nn}^{−1} b_n
x_{n−1} = a_{n−1,n−1}^{−1} (b_{n−1} − a_{n−1,n} x_n)
...
x_1 = a_{11}^{−1} (b_1 − a_{12} x_2 − · · · − a_{1n} x_n).
Note that the use of determinants can be avoided to prove that if A is invertible then
a_{ii} ≠ 0 for i = 1, . . . , n. Indeed, it can be shown directly (by induction) that an upper (or
lower) triangular matrix is invertible iff all its diagonal entries are nonzero.
If A is lower-triangular, we solve the system from top-down by forward-substitution.
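Both substitutions are straightforward to implement. A minimal Python/NumPy sketch (function names are ours):

```python
import numpy as np

def back_substitution(U, b):
    """Solve Ux = b for upper-triangular U with nonzero diagonal, bottom-up."""
    n = len(b)
    x = np.empty(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

def forward_substitution(L, b):
    """Solve Lx = b for lower-triangular L with nonzero diagonal, top-down."""
    n = len(b)
    x = np.empty(n)
    for i in range(n):
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x
```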
Thus, what we need is a method for transforming a matrix to an equivalent one in upper-
triangular form. This can be done by elimination. Let us illustrate this method on the
following example:
 2x +  y +  z =  5
 4x − 6y      = −2
−2x + 7y + 2z =  9.
We can eliminate the variable x from the second and the third equation as follows: Subtract
twice the first equation from the second and add the first equation to the third. We get the
new system
2x +  y +  z =   5
   − 8y − 2z = −12
     8y + 3z =  14.
This time, we can eliminate the variable y from the third equation by adding the second
equation to the third:
2x +  y +  z =   5
   − 8y − 2z = −12
           z =   2.
This last system is upper-triangular. Using back-substitution, we find the solution: z = 2,
y = 1, x = 1.
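As a quick sanity check (a sketch assuming NumPy), solving the 3 × 3 example above numerically confirms this solution:

```python
import numpy as np

# The example system: 2x + y + z = 5, 4x - 6y = -2, -2x + 7y + 2z = 9.
A = np.array([[ 2.0,  1.0, 1.0],
              [ 4.0, -6.0, 0.0],
              [-2.0,  7.0, 2.0]])
b = np.array([5.0, -2.0, 9.0])

print(np.linalg.solve(A, b))  # expected: [1. 1. 2.], i.e. x = 1, y = 1, z = 2
```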
Observe that we have performed only row operations. The general method is to iteratively
eliminate variables using simple row operations (namely, adding or subtracting a multiple of
a row to another row of the matrix) while simultaneously applying these operations to the
vector b, to obtain a system, M Ax = M b, where M A is upper-triangular. Such a method is
called Gaussian elimination. However, one extra twist is needed for the method to work in
all cases: It may be necessary to permute rows, as illustrated by the following example:
 x +  y +  z = 1
 x +  y + 3z = 1
2x + 5y + 8z = 1.
In order to eliminate x from the second and third row, we subtract the first row from the
second and we subtract twice the first row from the third:
x +  y +  z =  1
         2z =  0
    3y + 6z = −1.
Now, the trouble is that y does not occur in the second row; so, we can’t eliminate y from
the third row by adding or subtracting a multiple of the second row to it. The remedy is
simple: Permute the second and the third row! We get the system:
x +  y +  z =  1
    3y + 6z = −1
         2z =  0,
which is already in triangular form. Another example where some permutations are needed
is:
             z =  1
−2x + 7y + 2z =  1
 4x − 6y      = −1.
First, we permute the first and the second row, obtaining
−2x + 7y + 2z =  1
            z =  1
 4x − 6y      = −1,
and then, we add twice the first row to the third, obtaining:
−2x + 7y + 2z = 1
            z = 1
      8y + 4z = 1.
Again, we permute the second and the third row, getting
−2x + 7y + 2z = 1
      8y + 4z = 1
            z = 1,
an upper-triangular system. Of course, in this example, z is already solved and we could
have eliminated it first, but for the general method, we need to proceed in a systematic
fashion.
We now describe the method of Gaussian Elimination applied to a linear system, Ax = b,
where A is assumed to be invertible. We use the variable k to keep track of the stages of
elimination. Initially, k = 1.
(1) The first step is to pick some nonzero entry, ai 1, in the first column of A. Such an
entry must exist, since A is invertible (otherwise, the first column of A would be the
zero vector, and the columns of A would not be linearly independent. Equivalently, we
would have det(A) = 0). The actual choice of such an element has some impact on the
numerical stability of the method, but this will be examined later. For the time being,
we assume that some arbitrary choice is made. This chosen element is called the pivot
of the elimination step and is denoted π1 (so, in this first step, π1 = ai 1).
(2) Next, we permute the row (i) corresponding to the pivot with the first row. Such a
step is called pivoting. So, after this permutation, the first element of the first row is
nonzero.
(3) We now eliminate the variable x1 from all rows except the first by adding suitable
multiples of the first row to these rows. More precisely we add −ai 1/π1 times the first
row to the ith row, for i = 2, . . . , n. At the end of this step, all entries in the first
column are zero except the first.
(4) Increment k by 1. If k = n, stop. Otherwise, k < n, and then iteratively repeat steps
(1), (2), (3) on the (n − k + 1) × (n − k + 1) subsystem obtained by deleting the first
k − 1 rows and k − 1 columns from the current system.
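The four steps translate directly into code. Here is a minimal Python/NumPy sketch (ours, not a fixed reference implementation); in step (1) it chooses the pivot of largest absolute value in the column, one particular choice whose impact on numerical stability is examined later:

```python
import numpy as np

def gaussian_elimination(A, b):
    """Solve Ax = b for invertible A by elimination with row pivoting."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):
        # Steps (1)-(2): pick a nonzero pivot in column k and swap it up.
        i = k + np.argmax(np.abs(A[k:, k]))
        if A[i, k] == 0.0:
            raise ValueError("matrix is singular")
        if i != k:
            A[[k, i]] = A[[i, k]]
            b[[k, i]] = b[[i, k]]
        # Step (3): zero the entries below the pivot.
        for j in range(k + 1, n):
            m = A[j, k] / A[k, k]
            A[j, k:] -= m * A[k, k:]
            b[j] -= m * b[k]
    # The system is now upper-triangular; back-substitute.
    x = np.empty(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x
```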
If we let A_1 = A and A_k = (a^k_{ij}) be the matrix obtained after k − 1 elimination steps
(2 ≤ k ≤ n), then the kth elimination step is applied to the matrix A_k of the form

\[
A_k = \begin{pmatrix}
a^k_{1\,1} & a^k_{1\,2} & \cdots & \cdots & \cdots & a^k_{1\,n} \\
 & a^k_{2\,2} & \cdots & \cdots & \cdots & a^k_{2\,n} \\
 & & \ddots & \vdots & & \vdots \\
 & & & a^k_{k\,k} & \cdots & a^k_{k\,n} \\
 & & & \vdots & & \vdots \\
 & & & a^k_{n\,k} & \cdots & a^k_{n\,n}
\end{pmatrix}.
\]
Actually, note that

a^k_{ij} = a^i_{ij}

for all i, j with 1 ≤ i ≤ k − 2 and i ≤ j ≤ n, since the first k − 1 rows remain unchanged
after the (k − 1)th step.
We will prove later that det(Ak) = ± det(A). Consequently, Ak is invertible. The fact
that Ak is invertible iff A is invertible can also be shown without determinants from the fact
that there is some invertible matrix Mk such that Ak = MkA, as we will see shortly.
Since A_k is invertible, some entry a^k_{ik} with k ≤ i ≤ n is nonzero. Otherwise, the last
n − k + 1 entries in the first k columns of A_k would be zero, and the first k columns of
A_k would yield k vectors in R^{k−1}. But then, the first k columns of A_k would be linearly
dependent and A_k would not be invertible, a contradiction.
So, one of the entries a^k_{ik} with k ≤ i ≤ n can be chosen as pivot, and we permute the kth
row with the ith row, obtaining the matrix α^k = (α^k_{jl}). The new pivot is π_k = α^k_{kk}, and we
zero the entries i = k + 1, . . . , n in column k by adding −α^k_{ik}/π_k times row k to row i. At
the end of this step, we have A_{k+1}. Observe that the first k − 1 rows of A_k are identical to
the first k − 1 rows of A_{k+1}.
It is easy to figure out what kind of matrices perform the elementary row operations
used during Gaussian elimination. The key point is that if A = PB, where A, B are m × n
matrices and P is a square matrix of dimension m, if (as usual) we denote the rows of A and
B by A_1, . . . , A_m and B_1, . . . , B_m, then the formula

a_{ij} = \sum_{k=1}^{m} p_{ik} b_{kj}

giving the (i, j)th entry in A shows that the ith row of A is a linear combination of the rows
of B:

A_i = p_{i1} B_1 + · · · + p_{im} B_m.

Therefore, multiplication of a matrix on the left by a square matrix performs row opera-
tions. Similarly, multiplication of a matrix on the right by a square matrix performs column
operations.
The permutation of the kth row with the ith row is achieved by multiplying A on the left
by the transposition matrix P(i, k), which is the matrix obtained from the identity matrix
by permuting rows i and k, i.e.,

\[
P(i, k) = \begin{pmatrix}
1 & & & & & & \\
 & 1 & & & & & \\
 & & 0 & & 1 & & \\
 & & & \ddots & & & \\
 & & 1 & & 0 & & \\
 & & & & & 1 & \\
 & & & & & & 1
\end{pmatrix}.
\]

Observe that det(P(i, k)) = −1. Furthermore, P(i, k) is symmetric (P(i, k)^T = P(i, k)), and
P(i, k)^{−1} = P(i, k).
During the permutation step (2), if row k and row i need to be permuted, the matrix A
is multiplied on the left by the matrix Pk such that Pk = P (i, k), else we set Pk = I.
Adding β times row j to row i is achieved by multiplying A on the left by the elementary
matrix

E_{i,j;β} = I + β e_{ij},

where

\[
(e_{ij})_{kl} =
\begin{cases}
1 & \text{if } k = i \text{ and } l = j \\
0 & \text{if } k \neq i \text{ or } l \neq j,
\end{cases}
\]
i.e.,

\[
E_{i,j;\beta} = \begin{pmatrix}
1 & & & & & \\
 & \ddots & & & & \\
 & & 1 & & & \\
 & & \beta & \ddots & & \\
 & & & & \ddots & \\
 & & & & & 1
\end{pmatrix}
\quad \text{or} \quad
E_{i,j;\beta} = \begin{pmatrix}
1 & & & & & \\
 & \ddots & & & & \\
 & & 1 & & \beta & \\
 & & & \ddots & & \\
 & & & & \ddots & \\
 & & & & & 1
\end{pmatrix}.
\]
On the left, i > j, and on the right, i < j. Observe that the inverse of E_{i,j;β} = I + βe_{ij} is
E_{i,j;−β} = I − βe_{ij} and that det(E_{i,j;β}) = 1. Therefore, during step 3 (the elimination step),
the matrix A is multiplied on the left by a product, E_k, of matrices of the form E_{i,k;β_{i,k}}, with
i > k.
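A short Python/NumPy sketch (ours) constructing P(i, k) and E_{i,j;β} explicitly and checking that left-multiplication performs the corresponding row operations on the matrix of the earlier worked example:

```python
import numpy as np

def P(n, i, k):
    """Transposition matrix: the identity with rows i and k swapped (0-indexed)."""
    M = np.eye(n)
    M[[i, k]] = M[[k, i]]
    return M

def E(n, i, j, beta):
    """Elementary matrix I + beta * e_ij: adds beta times row j to row i."""
    M = np.eye(n)
    M[i, j] += beta
    return M

A = np.array([[ 2.0,  1.0, 1.0],
              [ 4.0, -6.0, 0.0],
              [-2.0,  7.0, 2.0]])
print(E(3, 1, 0, -2.0) @ A)  # subtracts twice row 0 from row 1
print(P(3, 0, 1) @ A)        # swaps rows 0 and 1
```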
Consequently, we see that
Ak+1 = EkPkAk,
and then
Ak = Ek−1Pk−1 · · · E1P1A.
This justifies the claim made earlier, that Ak = MkA for some invertible matrix Mk; we can
pick
Mk = Ek−1Pk−1 · · · E1P1,
a product of invertible matrices.
The fact that det(P (i, k)) = −1 and that det(Ei,j;β) = 1 implies immediately the fact
claimed above: We always have
det(Ak) = ± det(A).
Furthermore, since
Ak = Ek−1Pk−1 · · · E1P1A
and since Gaussian elimination stops for k = n, the matrix
An = En−1Pn−1 · · · E2P2E1P1A
is upper-triangular. Also note that if we let M = En−1Pn−1 · · · E2P2E1P1, then det(M) = ±1,
and
det(A) = ± det(An).
The matrices P (i, k) and Ei,j;β are called elementary matrices. We can summarize the
above discussion in the following theorem:
Theorem 6.1. (Gaussian Elimination) Let A be an n × n matrix (invertible or not). Then