Chapter 6

Gaussian Elimination, LU-Factorization, Cholesky Factorization, Reduced Row Echelon Form
6.1 Motivating Example: Curve Interpolation
Curve interpolation is a problem that arises frequently in computer graphics and in robotics
(path planning). There are many ways of tackling this problem and in this section we will
describe a solution using cubic splines. Such splines consist of cubic Bézier curves. They
are often used because they are cheap to implement and give more flexibility than quadratic
Bézier curves.
A cubic Bézier curve C(t) (in R^2 or R^3) is specified by a list of four control points
(b0, b1, b2, b3) and is given parametrically by the equation

C(t) = (1 − t)^3 b0 + 3(1 − t)^2 t b1 + 3(1 − t)t^2 b2 + t^3 b3.
Clearly, C(0) = b0, C(1) = b3, and for t ∈ [0, 1], the point C(t) belongs to the convex hull of
the control points b0, b1, b2, b3. The polynomials
(1 − t)^3,  3(1 − t)^2 t,  3(1 − t)t^2,  t^3
are the Bernstein polynomials of degree 3.
Typically, we are only interested in the curve segment corresponding to the values of t in
the interval [0, 1]. Still, the placement of the control points drastically affects the shape of the
curve segment, which can even have a self-intersection; see Figures 6.1, 6.2, 6.3 illustrating
various configurations.
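To make the parametrization concrete, here is a minimal Python sketch (assuming NumPy; the function name bezier3 is ours) that evaluates a cubic Bézier curve from its four control points using the Bernstein form above:

```python
import numpy as np

def bezier3(b0, b1, b2, b3, t):
    """Evaluate the cubic Bezier curve C(t) in Bernstein form.

    b0..b3 are control points (arrays in R^2 or R^3); t is a scalar in [0, 1].
    """
    s = 1.0 - t
    return (s**3) * b0 + 3 * (s**2) * t * b1 + 3 * s * (t**2) * b2 + (t**3) * b3

# C(0) = b0 and C(1) = b3, as expected:
b = [np.array(p, dtype=float) for p in [(0, 0), (1, 2), (3, 2), (4, 0)]]
print(bezier3(*b, 0.0), bezier3(*b, 1.0))
```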
Figure 6.1: A “standard” Bézier curve
Figure 6.2: A Bézier curve with an inflexion point
Figure 6.3: A self-intersecting Bézier curve
Interpolation problems require finding curves passing through some given data points and
possibly satisfying some extra constraints.
A Bézier spline curve F is a curve which is made up of curve segments which are Bézier
curves, say C1, . . . , Cm (m ≥ 2). We will assume that F is defined on [0, m], so that for
i = 1, . . . , m,

F(t) = Ci(t − i + 1),  i − 1 ≤ t ≤ i.
Typically, some smoothness is required between any two junction points, that is, between
any two points Ci(1) and Ci+1(0), for i = 1, . . . , m − 1. We require that Ci(1) = Ci+1(0)
(C0-continuity), and typically that the derivatives of Ci at 1 and of Ci+1 at 0 agree up to
second order derivatives. This is called C2-continuity, and it ensures that the tangents agree
as well as the curvatures.
There are a number of interpolation problems, and we consider one of the most common
problems which can be stated as follows:
Problem: Given N + 1 data points x0, . . . , xN , find a C2 cubic spline curve F , such that
F (i) = xi, for all i, 0 ≤ i ≤ N (N ≥ 2).
A way to solve this problem is to find N + 3 auxiliary points d−1, . . . , dN+1 called de Boor
control points from which N Bézier curves can be found. Actually,
d−1 = x0 and dN+1 = xN
so we only need to find N + 1 points d0, . . . , dN .
It turns out that the C2-continuity constraints on the N Bézier curves yield only N − 1
equations, so d0 and dN can be chosen arbitrarily. In practice, d0 and dN are chosen according
to various end conditions, such as prescribed velocities at x0 and xN . For the time being, we
will assume that d0 and dN are given.
Figure 6.4 illustrates an interpolation problem involving N + 1 = 7 + 1 = 8 data points.
The control points d0 and d7 were chosen arbitrarily.
Figure 6.4: A C2 cubic interpolation spline curve passing through the points x0, x1, x2, x3, x4, x5, x6, x7
It can be shown that d1, . . . , dN−1 are given by the linear system

\[
\begin{pmatrix}
\frac{7}{2} & 1 & & & \\
1 & 4 & 1 & & \\
& \ddots & \ddots & \ddots & \\
& & 1 & 4 & 1 \\
& & & 1 & \frac{7}{2}
\end{pmatrix}
\begin{pmatrix}
d_1 \\ d_2 \\ \vdots \\ d_{N-2} \\ d_{N-1}
\end{pmatrix}
=
\begin{pmatrix}
6x_1 - \frac{3}{2} d_0 \\ 6x_2 \\ \vdots \\ 6x_{N-2} \\ 6x_{N-1} - \frac{3}{2} d_N
\end{pmatrix}.
\]
It can be shown that the above matrix is invertible because it is strictly diagonally
dominant.
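Because the system is tridiagonal and strictly diagonally dominant, it can be solved in O(N) time without pivoting. The following is a minimal Python sketch (assuming NumPy; the function names and the indexing conventions are ours) that assembles the system above and solves it with the standard tridiagonal (Thomas) algorithm:

```python
import numpy as np

def solve_tridiagonal(sub, diag, sup, rhs):
    """Solve a tridiagonal system in O(n) time (Thomas algorithm).

    diag and rhs have length n; sub and sup (the sub- and super-diagonals)
    have length n - 1.  No pivoting is performed, which is safe here because
    the matrix is strictly diagonally dominant.
    """
    n = len(diag)
    d = np.array(diag, dtype=float)
    r = np.array(rhs, dtype=float)
    for i in range(1, n):                     # forward elimination
        m = sub[i - 1] / d[i - 1]
        d[i] -= m * sup[i - 1]
        r[i] -= m * r[i - 1]
    x = np.empty(n)
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):            # back-substitution
        x[i] = (r[i] - sup[i] * x[i + 1]) / d[i]
    return x

def de_boor_points(x, d0, dN):
    """Solve the system above for d_1, ..., d_{N-1}.

    x holds one coordinate of the data points x_0, ..., x_N; for points in
    R^2 or R^3, apply the function once per coordinate (the matrix is the
    same for each coordinate).
    """
    N = len(x) - 1
    n = N - 1                                  # unknowns d_1, ..., d_{N-1}
    diag = np.full(n, 4.0)
    diag[0] = diag[-1] = 7.0 / 2.0
    rhs = 6.0 * np.asarray(x[1:N], dtype=float)
    rhs[0] -= 1.5 * d0                         # fold d_0 into the right-hand side
    rhs[-1] -= 1.5 * dN                        # fold d_N into the right-hand side
    return solve_tridiagonal(np.ones(n - 1), diag, np.ones(n - 1), rhs)
```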
Once the above system is solved, the Bézier cubics C1, . . ., CN are determined as follows
(we assume N ≥ 2): For 2 ≤ i ≤ N − 1, the control points (b^i_0, b^i_1, b^i_2, b^i_3) of Ci are given by

b^i_0 = x_{i−1}
b^i_1 = (2/3) d_{i−1} + (1/3) d_i
b^i_2 = (1/3) d_{i−1} + (2/3) d_i
b^i_3 = x_i.
The control points (b^1_0, b^1_1, b^1_2, b^1_3) of C1 are given by

b^1_0 = x_0
b^1_1 = d_0
b^1_2 = (1/2) d_0 + (1/2) d_1
b^1_3 = x_1,
and the control points (b^N_0, b^N_1, b^N_2, b^N_3) of CN are given by

b^N_0 = x_{N−1}
b^N_1 = (1/2) d_{N−1} + (1/2) d_N
b^N_2 = d_N
b^N_3 = x_N.
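As a sketch of how the formulas above fit together (Python; the function name and indexing conventions are ours, and the data points and de Boor points are assumed to be NumPy arrays so the convex combinations work componentwise):

```python
def bezier_segments(x, d):
    """Return the control points (b0, b1, b2, b3) of C_1, ..., C_N.

    x = [x_0, ..., x_N] are the data points; d is the list
    [d_{-1}, d_0, ..., d_N, d_{N+1}], so that d[i + 1] holds d_i.
    """
    N = len(x) - 1
    D = lambda i: d[i + 1]                    # d_i with the shifted indexing
    segments = []
    for i in range(1, N + 1):
        if i == 1:                            # first segment: b1 = d_0
            b1, b2 = D(0), 0.5 * D(0) + 0.5 * D(1)
        elif i == N:                          # last segment: b2 = d_N
            b1, b2 = 0.5 * D(N - 1) + 0.5 * D(N), D(N)
        else:                                 # interior segments
            b1 = (2.0 / 3.0) * D(i - 1) + (1.0 / 3.0) * D(i)
            b2 = (1.0 / 3.0) * D(i - 1) + (2.0 / 3.0) * D(i)
        segments.append((x[i - 1], b1, b2, x[i]))
    return segments
```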
We will now describe various methods for solving linear systems. Since the matrix of the
above system is tridiagonal, there are specialized methods which are more efficient than the
general methods. We will discuss a few of these methods.
6.2 Gaussian Elimination and LU-Factorization
Let A be an n × n matrix, let b ∈ R^n be an n-dimensional vector and assume that A is
invertible. Our goal is to solve the system Ax = b. Since A is assumed to be invertible,
we know that this system has a unique solution, x = A^{−1}b. Experience reveals two
counter-intuitive facts:
(1) One should avoid computing the inverse, A^{−1}, of A explicitly. This is because this
would amount to solving the n linear systems, Au^(j) = e_j, for j = 1, . . . , n, where
e_j = (0, . . . , 1, . . . , 0) is the jth canonical basis vector of R^n (with a 1 in the jth slot).
By doing so, we would replace the resolution of a single system by the resolution of n
systems, and we would still have to multiply A^{−1} by b.
(2) One does not solve (large) linear systems by computing determinants (using Cramer’s
formulae). This is because this method requires a number of additions (resp. multipli-
cations) proportional to (n + 1)! (resp. (n + 2)!).
The key idea on which most direct methods (as opposed to iterative methods, that look
for an approximation of the solution) are based is that if A is an upper-triangular matrix,
which means that aij = 0 for 1 ≤ j < i ≤ n (resp. lower-triangular, which means that
aij = 0 for 1 ≤ i < j ≤ n), then computing the solution, x, is trivial. Indeed, say A is an
upper-triangular matrix
\[
A = \begin{pmatrix}
a_{1\,1} & a_{1\,2} & \cdots & a_{1\,n-2} & a_{1\,n-1} & a_{1\,n} \\
0 & a_{2\,2} & \cdots & a_{2\,n-2} & a_{2\,n-1} & a_{2\,n} \\
0 & 0 & \ddots & \vdots & \vdots & \vdots \\
\vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 0 & a_{n-1\,n-1} & a_{n-1\,n} \\
0 & 0 & \cdots & 0 & 0 & a_{n\,n}
\end{pmatrix}.
\]
Then, det(A) = a_{11} a_{22} · · · a_{nn} ≠ 0, which implies that a_{ii} ≠ 0 for i = 1, . . . , n, and we can
solve the system Ax = b from bottom-up by back-substitution. That is, first we compute
x_n from the last equation, next plug this value of x_n into the next to the last equation and
compute x_{n−1} from it, etc. This yields

x_n = a_{nn}^{−1} b_n
x_{n−1} = a_{n−1,n−1}^{−1} (b_{n−1} − a_{n−1,n} x_n)
...
x_1 = a_{11}^{−1} (b_1 − a_{12} x_2 − · · · − a_{1n} x_n).
Note that the use of determinants can be avoided to prove that if A is invertible then
a_{ii} ≠ 0 for i = 1, . . . , n. Indeed, it can be shown directly (by induction) that an upper (or
lower) triangular matrix is invertible iff all its diagonal entries are nonzero.
If A is lower-triangular, we solve the system from top-down by forward-substitution.
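Both substitutions are straightforward to implement. A minimal Python/NumPy sketch (function names are ours):

```python
import numpy as np

def back_substitution(U, b):
    """Solve Ux = b for upper-triangular U with nonzero diagonal, bottom-up."""
    n = len(b)
    x = np.empty(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

def forward_substitution(L, b):
    """Solve Lx = b for lower-triangular L with nonzero diagonal, top-down."""
    n = len(b)
    x = np.empty(n)
    for i in range(n):
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x
```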
Thus, what we need is a method for transforming a matrix to an equivalent one in upper-
triangular form. This can be done by elimination. Let us illustrate this method on the
following example:
 2x +  y +  z =  5
 4x − 6y      = −2
−2x + 7y + 2z =  9.
We can eliminate the variable x from the second and the third equation as follows: Subtract
twice the first equation from the second and add the first equation to the third. We get the
new system
2x +  y +  z =   5
   − 8y − 2z = −12
     8y + 3z =  14.
This time, we can eliminate the variable y from the third equation by adding the second
equation to the third:
2x +  y +  z =   5
   − 8y − 2z = −12
           z =   2.
This last system is upper-triangular. Using back-substitution, we find the solution: z = 2,
y = 1, x = 1.
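As a quick sanity check (a sketch assuming NumPy), solving the 3 × 3 example above numerically confirms this solution:

```python
import numpy as np

# The example system: 2x + y + z = 5, 4x - 6y = -2, -2x + 7y + 2z = 9.
A = np.array([[ 2.0,  1.0, 1.0],
              [ 4.0, -6.0, 0.0],
              [-2.0,  7.0, 2.0]])
b = np.array([5.0, -2.0, 9.0])

print(np.linalg.solve(A, b))  # expected: [1. 1. 2.], i.e. x = 1, y = 1, z = 2
```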
Observe that we have performed only row operations. The general method is to iteratively
eliminate variables using simple row operations (namely, adding or subtracting a multiple of
a row to another row of the matrix) while simultaneously applying these operations to the
vector b, to obtain a system, M Ax = M b, where M A is upper-triangular. Such a method is
called Gaussian elimination. However, one extra twist is needed for the method to work in
all cases: It may be necessary to permute rows, as illustrated by the following example:
 x +  y +  z = 1
 x +  y + 3z = 1
2x + 5y + 8z = 1.
In order to eliminate x from the second and third row, we subtract the first row from the
second and we subtract twice the first row from the third:
x +  y +  z =  1
         2z =  0
    3y + 6z = −1.
Now, the trouble is that y does not occur in the second row; so, we can’t eliminate y from
the third row by adding or subtracting a multiple of the second row to it. The remedy is
simple: Permute the second and the third row! We get the system:
x +  y +  z =  1
    3y + 6z = −1
         2z =  0,
which is already in triangular form. Another example where some permutations are needed
is:
             z =  1
−2x + 7y + 2z =  1
 4x − 6y      = −1.
First, we permute the first and the second row, obtaining
−2x + 7y + 2z =  1
            z =  1
 4x − 6y      = −1,
and then, we add twice the first row to the third, obtaining:
−2x + 7y + 2z = 1
            z = 1
      8y + 4z = 1.
Again, we permute the second and the third row, getting
−2x + 7y + 2z = 1
      8y + 4z = 1
            z = 1,
an upper-triangular system. Of course, in this example, z is already solved and we could
have eliminated it first, but for the general method, we need to proceed in a systematic
fashion.
We now describe the method of Gaussian Elimination applied to a linear system, Ax = b,
where A is assumed to be invertible. We use the variable k to keep track of the stages of
elimination. Initially, k = 1.
(1) The first step is to pick some nonzero entry, ai 1, in the first column of A. Such an
entry must exist, since A is invertible (otherwise, the first column of A would be the
zero vector, and the columns of A would not be linearly independent. Equivalently, we
would have det(A) = 0). The actual choice of such an element has some impact on the
numerical stability of the method, but this will be examined later. For the time being,
we assume that some arbitrary choice is made. This chosen element is called the pivot
of the elimination step and is denoted π1 (so, in this first step, π1 = ai 1).
(2) Next, we permute the row (i) corresponding to the pivot with the first row. Such a
step is called pivoting. So, after this permutation, the first element of the first row is
nonzero.
(3) We now eliminate the variable x1 from all rows except the first by adding suitable
multiples of the first row to these rows. More precisely we add −ai 1/π1 times the first
row to the ith row, for i = 2, . . . , n. At the end of this step, all entries in the first
column are zero except the first.
(4) Increment k by 1. If k = n, stop. Otherwise, k < n, and then iteratively repeat steps
(1), (2), (3) on the (n − k + 1) × (n − k + 1) subsystem obtained by deleting the first
k − 1 rows and k − 1 columns from the current system.
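The four steps translate directly into code. Here is a minimal Python/NumPy sketch (ours, not a fixed reference implementation); in step (1) it chooses the pivot of largest absolute value in the column, one particular choice whose impact on numerical stability is examined later:

```python
import numpy as np

def gaussian_elimination(A, b):
    """Solve Ax = b for invertible A by elimination with row pivoting."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):
        # Steps (1)-(2): pick a nonzero pivot in column k and swap it up.
        i = k + np.argmax(np.abs(A[k:, k]))
        if A[i, k] == 0.0:
            raise ValueError("matrix is singular")
        if i != k:
            A[[k, i]] = A[[i, k]]
            b[[k, i]] = b[[i, k]]
        # Step (3): zero the entries below the pivot.
        for j in range(k + 1, n):
            m = A[j, k] / A[k, k]
            A[j, k:] -= m * A[k, k:]
            b[j] -= m * b[k]
    # The system is now upper-triangular; back-substitute.
    x = np.empty(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x
```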
If we let A_1 = A and A_k = (a^k_{ij}) be the matrix obtained after k − 1 elimination steps
(2 ≤ k ≤ n), then the kth elimination step is applied to the matrix A_k of the form

\[
A_k = \begin{pmatrix}
a^k_{1\,1} & a^k_{1\,2} & \cdots & \cdots & \cdots & a^k_{1\,n} \\
 & a^k_{2\,2} & \cdots & \cdots & \cdots & a^k_{2\,n} \\
 & & \ddots & \vdots & & \vdots \\
 & & & a^k_{k\,k} & \cdots & a^k_{k\,n} \\
 & & & \vdots & & \vdots \\
 & & & a^k_{n\,k} & \cdots & a^k_{n\,n}
\end{pmatrix}.
\]
Actually, note that

a^k_{ij} = a^i_{ij}

for all i, j with 1 ≤ i ≤ k − 2 and i ≤ j ≤ n, since the first k − 1 rows remain unchanged
after the (k − 1)th step.
We will prove later that det(Ak) = ± det(A). Consequently, Ak is invertible. The fact
that Ak is invertible iff A is invertible can also be shown without determinants from the fact
that there is some invertible matrix Mk such that Ak = MkA, as we will see shortly.
Since A_k is invertible, some entry a^k_{ik} with k ≤ i ≤ n is nonzero. Otherwise, the last
n − k + 1 entries in the first k columns of A_k would be zero, and the first k columns of
A_k would yield k vectors in R^{k−1}. But then, the first k columns of A_k would be linearly
dependent and A_k would not be invertible, a contradiction.
So, one of the entries a^k_{ik} with k ≤ i ≤ n can be chosen as pivot, and we permute the kth
row with the ith row, obtaining the matrix α^k = (α^k_{jl}). The new pivot is π_k = α^k_{kk}, and we
zero the entries i = k + 1, . . . , n in column k by adding −α^k_{ik}/π_k times row k to row i. At
the end of this step, we have A_{k+1}. Observe that the first k − 1 rows of A_k are identical to
the first k − 1 rows of A_{k+1}.
It is easy to figure out what kind of matrices perform the elementary row operations
used during Gaussian elimination. The key point is that if A = PB, where A, B are m × n
matrices and P is a square matrix of dimension m, if (as usual) we denote the rows of A and
B by A_1, . . . , A_m and B_1, . . . , B_m, then the formula

a_{ij} = \sum_{k=1}^{m} p_{ik} b_{kj}

giving the (i, j)th entry in A shows that the ith row of A is a linear combination of the rows
of B:

A_i = p_{i1} B_1 + · · · + p_{im} B_m.

Therefore, multiplication of a matrix on the left by a square matrix performs row opera-
tions. Similarly, multiplication of a matrix on the right by a square matrix performs column
operations.
The permutation of the kth row with the ith row is achieved by multiplying A on the left
by the transposition matrix P(i, k), which is the matrix obtained from the identity matrix
by permuting rows i and k, i.e.,

\[
P(i, k) = \begin{pmatrix}
1 & & & & & & \\
 & 1 & & & & & \\
 & & 0 & & 1 & & \\
 & & & \ddots & & & \\
 & & 1 & & 0 & & \\
 & & & & & 1 & \\
 & & & & & & 1
\end{pmatrix}.
\]

Observe that det(P(i, k)) = −1. Furthermore, P(i, k) is symmetric (P(i, k)^T = P(i, k)), and
P(i, k)^{−1} = P(i, k).
During the permutation step (2), if row k and row i need to be permuted, the matrix A
is multiplied on the left by the matrix Pk such that Pk = P (i, k), else we set Pk = I.
Adding β times row j to row i is achieved by multiplying A on the left by the elementary
matrix

E_{i,j;β} = I + β e_{ij},

where

\[
(e_{ij})_{kl} =
\begin{cases}
1 & \text{if } k = i \text{ and } l = j \\
0 & \text{if } k \neq i \text{ or } l \neq j,
\end{cases}
\]
i.e.,

\[
E_{i,j;\beta} = \begin{pmatrix}
1 & & & & & \\
 & \ddots & & & & \\
 & & 1 & & & \\
 & & \beta & \ddots & & \\
 & & & & \ddots & \\
 & & & & & 1
\end{pmatrix}
\quad \text{or} \quad
E_{i,j;\beta} = \begin{pmatrix}
1 & & & & & \\
 & \ddots & & & & \\
 & & 1 & & \beta & \\
 & & & \ddots & & \\
 & & & & \ddots & \\
 & & & & & 1
\end{pmatrix}.
\]
On the left, i > j, and on the right, i < j. Observe that the inverse of E_{i,j;β} = I + βe_{ij} is
E_{i,j;−β} = I − βe_{ij} and that det(E_{i,j;β}) = 1. Therefore, during step 3 (the elimination step),
the matrix A is multiplied on the left by a product, E_k, of matrices of the form E_{i,k;β_{i,k}}, with
i > k.
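A short Python/NumPy sketch (ours) constructing P(i, k) and E_{i,j;β} explicitly and checking that left-multiplication performs the corresponding row operations on the matrix of the earlier worked example:

```python
import numpy as np

def P(n, i, k):
    """Transposition matrix: the identity with rows i and k swapped (0-indexed)."""
    M = np.eye(n)
    M[[i, k]] = M[[k, i]]
    return M

def E(n, i, j, beta):
    """Elementary matrix I + beta * e_ij: adds beta times row j to row i."""
    M = np.eye(n)
    M[i, j] += beta
    return M

A = np.array([[ 2.0,  1.0, 1.0],
              [ 4.0, -6.0, 0.0],
              [-2.0,  7.0, 2.0]])
print(E(3, 1, 0, -2.0) @ A)  # subtracts twice row 0 from row 1
print(P(3, 0, 1) @ A)        # swaps rows 0 and 1
```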
Consequently, we see that
Ak+1 = EkPkAk,
and then
Ak = Ek−1Pk−1 · · · E1P1A.
This justifies the claim made earlier, that Ak = MkA for some invertible matrix Mk; we can
pick
Mk = Ek−1Pk−1 · · · E1P1,
a product of invertible matrices.
The fact that det(P (i, k)) = −1 and that det(Ei,j;β) = 1 implies immediately the fact
claimed above: We always have
det(Ak) = ± det(A).
Furthermore, since
Ak = Ek−1Pk−1 · · · E1P1A
and since Gaussian elimination stops for k = n, the matrix
An = En−1Pn−1 · · · E2P2E1P1A
is upper-triangular. Also note that if we let M = En−1Pn−1 · · · E2P2E1P1, then det(M) = ±1,
and
det(A) = ± det(An).
The matrices P (i, k) and Ei,j;β are called elementary matrices. We can summarize the
above discussion in the following theorem:
Theorem 6.1. (Gaussian Elimination) Let A be an n × n matrix (invertible or not). Then