
Chapter 16

Singular Value Decomposition and Polar Form

16.1 Singular Value Decomposition for Square Matrices

In this section we assume that we are dealing with a real Euclidean space E. Let f : E → E

be any linear map. In general, it may not be possible to diagonalize f . We show that every

linear map can be diagonalized if we are willing to use two orthonormal bases. This is the

celebrated singular value decomposition (SVD). A close cousin of the SVD is the polar form

of a linear map, which shows how a linear map can be decomposed into its purely rotational

component (perhaps with a flip) and its purely stretching part.

The key observation is that f∗ ◦ f is self-adjoint, since

⟨(f∗ ◦ f)(u), v⟩ = ⟨f(u), f(v)⟩ = ⟨u, (f∗ ◦ f)(v)⟩.

Similarly, f ◦ f∗ is self-adjoint.

The fact that f∗ ◦ f and f ◦ f∗ are self-adjoint is very important, because it implies that f∗ ◦ f and f ◦ f∗ can be diagonalized and that they have real eigenvalues. In fact, these eigenvalues are all nonnegative. Indeed, if u is an eigenvector of f∗ ◦ f for the eigenvalue λ, then

⟨(f∗ ◦ f)(u), u⟩ = ⟨f(u), f(u)⟩

and

⟨(f∗ ◦ f)(u), u⟩ = λ⟨u, u⟩,

and thus

λ⟨u, u⟩ = ⟨f(u), f(u)⟩,

which implies that λ ≥ 0, since ⟨−, −⟩ is positive definite. A similar proof applies to f ◦ f∗. Thus, the eigenvalues of f∗ ◦ f are of the form σ1², . . . , σr² or 0, where σi > 0, and similarly


for f ◦ f∗. The situation is even better, since we will show shortly that f∗ ◦ f and f ◦ f∗

have the same eigenvalues.
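This argument is easy to check numerically. The following sketch (an illustration we add here, using NumPy and a randomly chosen matrix standing in for the matrix of f) verifies that AᵀA is symmetric and has only nonnegative eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))        # matrix of f in some orthonormal basis

M = A.T @ A                            # matrix of f* o f
print(np.allclose(M, M.T))             # True: f* o f is self-adjoint
print((np.linalg.eigvalsh(M) >= -1e-12).all())   # True: all eigenvalues >= 0 (up to roundoff)
```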

Remark: Given any two linear maps f : E → F and g : F → E, where dim(E) = n and

dim(F ) = m, it can be shown that

(−λ)ᵐ det(g ◦ f − λ Iₙ) = (−λ)ⁿ det(f ◦ g − λ Iₘ),

and thus g ◦ f and f ◦ g always have the same nonzero eigenvalues!
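This identity can be spot-checked numerically; the sketch below (our illustration, not part of the text) verifies that g ◦ f and f ◦ g have the same nonzero eigenvalues for random rectangular matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 6
A = rng.standard_normal((m, n))    # matrix of f : E -> F  (dim E = n, dim F = m)
B = rng.standard_normal((n, m))    # matrix of g : F -> E

ev_gf = np.linalg.eigvals(B @ A)   # the n eigenvalues of g o f
ev_fg = np.linalg.eigvals(A @ B)   # the m eigenvalues of f o g

# f o g has the same nonzero eigenvalues as g o f, plus m - n extra zeros.
nonzero = lambda ev: np.sort_complex(ev[np.abs(ev) > 1e-10])
print(np.allclose(nonzero(ev_gf), nonzero(ev_fg)))   # True
```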

Definition 16.1. The square roots σi > 0 of the positive eigenvalues of f ∗ ◦ f (and f ◦ f∗)

are called the singular values of f .

Definition 16.2. A self-adjoint linear map f : E → E whose eigenvalues are nonnegative is

called positive semidefinite (or positive), and if f is also invertible, f is said to be positive

definite. In the latter case, every eigenvalue of f is strictly positive.

We just showed that f ∗ ◦ f and f ◦ f∗ are positive semidefinite self-adjoint linear maps.

This fact has the remarkable consequence that every linear map has two important decom-

positions:

1. The polar form.

2. The singular value decomposition (SVD).

The wonderful thing about the singular value decomposition is that there exist two or-

thonormal bases (u1, . . . , un) and (v1, . . . , vn) such that, with respect to these bases, f is

a diagonal matrix consisting of the singular values of f , or 0. Thus, in some sense, f can

always be diagonalized with respect to two orthonormal bases. The SVD is also a useful tool

for solving overdetermined linear systems in the least squares sense and for data analysis, as

we show later on.

First, we show some useful relationships between the kernels and the images of f , f ∗,

f ∗ ◦ f, and f ◦ f∗. Recall that if f : E → F is a linear map, the image Im f of f is the

subspace f (E) of F , and the rank of f is the dimension dim(Im f ) of its image. Also recall

that (Theorem 4.11)

dim (Ker f ) + dim (Im f ) = dim (E),

and that (Propositions 9.9 and 11.10) for every subspace W of E,

dim (W ) + dim (W ⊥) = dim (E).


Proposition 16.1. Given any two Euclidean spaces E and F , where E has dimension n

and F has dimension m, for any linear map f : E → F , we have

Ker f = Ker (f ∗ ◦ f),

Ker f ∗ = Ker (f ◦ f∗),

Ker f = (Im f ∗)⊥,

Ker f ∗ = (Im f )⊥,

dim(Im f ) = dim(Im f ∗),

and f , f ∗, f ∗ ◦ f, and f ◦ f∗ have the same rank.

Proof. To simplify the notation, we will denote the inner products on E and F by the same symbol ⟨−, −⟩ (to avoid subscripts). If f(u) = 0, then (f∗ ◦ f)(u) = f∗(f(u)) = f∗(0) = 0, and so Ker f ⊆ Ker (f∗ ◦ f). By definition of f∗, we have

⟨f(u), f(u)⟩ = ⟨(f∗ ◦ f)(u), u⟩

for all u ∈ E. If (f∗ ◦ f)(u) = 0, since ⟨−, −⟩ is positive definite, we must have f(u) = 0, and so Ker (f∗ ◦ f) ⊆ Ker f. Therefore,

Ker f = Ker (f∗ ◦ f).

The proof that Ker f∗ = Ker (f ◦ f∗) is similar.

By definition of f∗, we have

⟨f(u), v⟩ = ⟨u, f∗(v)⟩ for all u ∈ E and all v ∈ F.   (∗)

This immediately implies that

Ker f = (Im f ∗)⊥

and Ker f ∗ = (Im f )⊥.

Let us explain why Ker f = (Im f∗)⊥, the proof of the other equation being similar.

Because the inner product is positive definite, for every u ∈ E, we have

u ∈ Ker f
iff f(u) = 0
iff ⟨f(u), v⟩ = 0 for all v,
iff ⟨u, f∗(v)⟩ = 0 for all v, by (∗),
iff u ∈ (Im f∗)⊥.

Since

dim(Im f ) = n − dim(Ker f)

and

dim(Im f ∗) = n − dim((Im f∗)⊥),


from

Ker f = (Im f ∗)⊥

we also have

dim(Ker f ) = dim((Im f ∗)⊥),

from which we obtain

dim(Im f ) = dim(Im f ∗).

Since

dim(Ker (f ∗ ◦ f)) + dim(Im (f∗ ◦ f)) = dim(E),

Ker (f ∗ ◦ f) = Ker f and Ker f = (Im f∗)⊥, we get

dim((Im f ∗)⊥) + dim(Im (f ∗ ◦ f)) = dim(E).

Since

dim((Im f ∗)⊥) + dim(Im f ∗) = dim(E),

we deduce that

dim(Im f ) = dim(Im (f ∗ ◦ f)).

A similar proof shows that

dim(Im f ∗) = dim(Im (f ◦ f∗)).

Consequently, f , f ∗, f ∗ ◦ f, and f ◦ f∗ have the same rank.
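These relations are easy to confirm numerically. As a sketch (our own illustration, on a deliberately rank-deficient random matrix), one can check that A, Aᵀ, AᵀA, and AAᵀ all have the same rank and that Ker A is the orthogonal complement of Im Aᵀ:

```python
import numpy as np

rng = np.random.default_rng(2)
# A 6 x 5 matrix of rank 3 (product of a 6 x 3 factor and a 3 x 5 factor).
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))

print([np.linalg.matrix_rank(M) for M in (A, A.T, A.T @ A, A @ A.T)])   # [3, 3, 3, 3]

# Ker A = (Im A^T)^perp: an orthonormal basis of Ker A (read off here from the SVD)
# is annihilated by A, i.e. orthogonal to every row of A, and the dimensions add up.
_, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
kernel = Vt[r:].T                         # orthonormal basis of Ker A
print(np.allclose(A @ kernel, 0))         # True
print(kernel.shape[1] + r == A.shape[1])  # dim(Ker A) + dim(Im A^T) = dim(E) = 5
```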

We will now prove that every square matrix has an SVD. Stronger results can be obtained

if we first consider the polar form and then derive the SVD from it (there are uniqueness

properties of the polar decomposition). For our purposes, uniqueness results are not as

important so we content ourselves with existence results, whose proofs are simpler. Readers

interested in a more general treatment are referred to [42].

The early history of the singular value decomposition is described in a fascinating paper

by Stewart [97]. The SVD is due to Beltrami and Camille Jordan independently (1873,

1874). Gauss is the grandfather of all this, for his work on least squares (1809, 1823)

(but Legendre also published a paper on least squares!). Then come Sylvester, Schmidt, and

Hermann Weyl. Sylvester’s work was apparently “opaque.” He gave a computational method

to find an SVD. Schmidt’s work really has to do with integral equations and symmetric and

asymmetric kernels (1907). Weyl’s work has to do with perturbation theory (1912). Autonne

came up with the polar decomposition (1902, 1915). Eckart and Young extended SVD to

rectangular matrices (1936, 1939).


Theorem 16.2. (Singular value decomposition) For every real n × n matrix A there are two orthogonal matrices U and V and a diagonal matrix D such that A = V D Uᵀ, where D is of the form

D = diag(σ1, . . . , σn),

where σ1, . . . , σr are the singular values of f, i.e., the (positive) square roots of the nonzero eigenvalues of AᵀA and AAᵀ, and σr+1 = · · · = σn = 0. The columns of U are eigenvectors of AᵀA, and the columns of V are eigenvectors of AAᵀ.

Proof. Since AᵀA is a symmetric matrix, in fact, a positive semidefinite matrix, there exists an orthogonal matrix U such that

AᵀA = U D² Uᵀ,

with D = diag(σ1, . . . , σr, 0, . . . , 0), where σ1², . . . , σr² are the nonzero eigenvalues of AᵀA, and where r is the rank of A; that is, σ1, . . . , σr are the singular values of A. It follows that

Uᵀ AᵀA U = (AU)ᵀ AU = D²,

and if we let fj be the jth column of AU for j = 1, . . . , n, then we have

⟨fi, fj⟩ = σi² δij,   1 ≤ i, j ≤ r

and

fj = 0,   r + 1 ≤ j ≤ n.

If we define (v1, . . . , vr) by

vj = σj⁻¹ fj,   1 ≤ j ≤ r,

then we have

⟨vi, vj⟩ = δij,   1 ≤ i, j ≤ r,

so complete (v1, . . . , vr) into an orthonormal basis (v1, . . . , vr, vr+1, . . . , vn) (for example, using Gram–Schmidt). Now, since fj = σj vj for j = 1, . . . , r, we have

⟨vi, fj⟩ = σj ⟨vi, vj⟩ = σj δij,   1 ≤ i ≤ n, 1 ≤ j ≤ r,

and since fj = 0 for j = r + 1, . . . , n,

⟨vi, fj⟩ = 0,   1 ≤ i ≤ n, r + 1 ≤ j ≤ n.

If V is the matrix whose columns are v1, . . . , vn, then V is orthogonal and the above equations prove that

Vᵀ A U = D,


which yields A = V D Uᵀ, as required.

The equation A = V D Uᵀ implies that

AᵀA = U D² Uᵀ,
AAᵀ = V D² Vᵀ,

which shows that AᵀA and AAᵀ have the same eigenvalues, that the columns of U are eigenvectors of AᵀA, and that the columns of V are eigenvectors of AAᵀ.
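The proof is constructive, and the construction can be carried out directly in code. The sketch below (our illustration; it uses NumPy's symmetric eigensolver for AᵀA and is not meant as a production SVD routine) follows the proof step by step: diagonalize AᵀA, normalize the nonzero columns of AU to get v1, . . . , vr, complete them to an orthonormal basis, and check that VᵀAU = D.

```python
import numpy as np

def svd_from_proof(A, tol=1e-12):
    """Build an SVD A = V D U^T of a square matrix A, following the proof of Theorem 16.2."""
    n = A.shape[0]
    # Step 1: diagonalize A^T A = U D^2 U^T, eigenvalues sorted in decreasing order.
    w, U = np.linalg.eigh(A.T @ A)
    order = np.argsort(w)[::-1]
    w, U = w[order], U[:, order]
    sigma = np.sqrt(np.clip(w, 0.0, None))        # singular values (clip tiny negative roundoff)
    r = int(np.sum(sigma > tol))                  # rank of A
    # Step 2: the columns f_j of AU satisfy <f_i, f_j> = sigma_i^2 delta_ij; set v_j = f_j / sigma_j.
    F = A @ U
    V = np.zeros((n, n))
    V[:, :r] = F[:, :r] / sigma[:r]
    # Step 3: complete (v_1, ..., v_r) to an orthonormal basis (Gram-Schmidt, done here via QR).
    if r < n:
        Q, _ = np.linalg.qr(np.hstack([V[:, :r], np.eye(n)]))
        V[:, r:] = Q[:, r:n]
    return U, np.diag(sigma), V

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 4))   # a singular 4 x 4 matrix
U, D, V = svd_from_proof(A)
print(np.allclose(V.T @ A @ U, D))    # V^T A U = D
print(np.allclose(A, V @ D @ U.T))    # A = V D U^T
```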

Theorem 16.2 suggests the following definition.

Definition 16.3. A triple (U, D, V) such that A = V D Uᵀ, where U and V are orthogonal and D is a diagonal matrix whose entries are nonnegative (it is positive semidefinite) is called a singular value decomposition (SVD) of A.

The proof of Theorem 16.2 shows that there are two orthonormal bases (u1, . . . , un) and (v1, . . . , vn), where (u1, . . . , un) are eigenvectors of AᵀA and (v1, . . . , vn) are eigenvectors of AAᵀ. Furthermore, (u1, . . . , ur) is an orthonormal basis of Im Aᵀ, (ur+1, . . . , un) is an orthonormal basis of Ker A, (v1, . . . , vr) is an orthonormal basis of Im A, and (vr+1, . . . , vn) is an orthonormal basis of Ker Aᵀ.

Using a remark made in Chapter 3, if we denote the columns of U by u1, . . . , un and the columns of V by v1, . . . , vn, then we can write

A = V D Uᵀ = σ1 v1 u1ᵀ + · · · + σr vr urᵀ.

As a consequence, if r is a lot smaller than n (we write r ≪ n), we see that A can be reconstructed from U and V using a much smaller number of elements. This idea will be used to provide “low-rank” approximations of a matrix. The idea is to keep only the k top singular values for some suitable k ≪ r for which σk+1, . . . , σr are very small.
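A minimal sketch of this idea (our illustration; the size and the rank k chosen below are arbitrary): keep the k largest singular values together with the corresponding columns of V and U.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((100, 100))

# NumPy returns A = V @ diag(sigma) @ Ut, i.e. Ut plays the role of U^T in the text.
V, sigma, Ut = np.linalg.svd(A)

k = 10
A_k = V[:, :k] @ np.diag(sigma[:k]) @ Ut[:k, :]   # sum of the k top rank-one terms sigma_i v_i u_i^T

# A_k is stored with k*(100 + 100 + 1) numbers instead of 100*100, and by the
# Eckart-Young theorem it is a best rank-k approximation of A; the spectral error is sigma_{k+1}.
print(np.linalg.norm(A - A_k, 2), sigma[k])
```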

Remarks:

(1) In Strang [101] the matrices U, V, D are denoted by U = Q2, V = Q1, and D = Σ, and an SVD is written as A = Q1ΣQ2ᵀ. This has the advantage that Q1 comes before Q2 in A = Q1ΣQ2ᵀ. This has the disadvantage that A maps the columns of Q2 (eigenvectors of AᵀA) to multiples of the columns of Q1 (eigenvectors of AAᵀ).

(2) Algorithms for actually computing the SVD of a matrix are presented in Golub and

Van Loan [47], Demmel [25], and Trefethen and Bau [106], where the SVD and its

applications are also discussed quite extensively.

(3) The SVD also applies to complex matrices. In this case, for every complex n×n matrix

A, there are two unitary matrices U and V and a diagonal matrix D such that

A = V D U ∗,

where D is a diagonal matrix consisting of real entries σ1, . . . , σn, where σ1, . . . , σr are

the singular values of A, i.e., the positive square roots of the nonzero eigenvalues of

A∗A and A A∗, and σr+1 = . . . = σn = 0.
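For instance, NumPy's SVD routine handles the complex case directly (a small illustrative example of our own):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

V, sigma, U_star = np.linalg.svd(A)      # A = V @ diag(sigma) @ U^*, with U^* the conjugate transpose of U
print(sigma)                             # real, nonnegative singular values
print(np.allclose(A, V @ np.diag(sigma) @ U_star))   # True
```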


A notion closely related to the SVD is the polar form of a matrix.

Definition 16.4. A pair (R, S) such that A = RS with R orthogonal and S symmetric

positive semidefinite is called a polar decomposition of A.

Theorem 16.2 implies that for every real n × n matrix A, there is some orthogonal matrix

R and some positive semidefinite symmetric matrix S such that

A = RS.

This is easy to show and we will prove it below. Furthermore, R, S are unique if A is

invertible, but this is harder to prove.

For example, the matrix

A = 1/2 ⎡1  1  1  1⎤
        ⎢1  1 −1 −1⎥
        ⎢1 −1  1 −1⎥
        ⎣1 −1 −1  1⎦

is both orthogonal and symmetric, and A = RS with R = A and S = I, which implies that some of the eigenvalues of A are negative.

Remark: In the complex case, the polar decomposition states that for every complex n × n

matrix A, there is some unitary matrix U and some positive semidefinite Hermitian matrix

H such that

A = U H.

It is easy to go from the polar form to the SVD, and conversely.

Given an SVD decomposition A = V D Uᵀ, let R = V Uᵀ and S = U D Uᵀ. It is clear that R is orthogonal and that S is positive semidefinite symmetric, and

RS = V Uᵀ U D Uᵀ = V D Uᵀ = A.

Going the other way, given a polar decomposition A = R1S, where R1 is orthogonal and S is positive semidefinite symmetric, there is an orthogonal matrix R2 and a positive semidefinite diagonal matrix D such that S = R2 D R2ᵀ, and thus

A = R1 R2 D R2ᵀ = V D Uᵀ,

where V = R1R2 and U = R2 are orthogonal.
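Both directions are short enough to check numerically. In the sketch below (our illustration on a random matrix), the SVD gives R = V Uᵀ and S = U D Uᵀ, and conversely diagonalizing S recovers an SVD from the polar form:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))

# SVD -> polar form: A = R S with R orthogonal and S symmetric positive semidefinite.
V, sigma, Ut = np.linalg.svd(A)              # A = V @ diag(sigma) @ Ut, with Ut = U^T
U = Ut.T
R = V @ Ut                                   # R = V U^T
S = U @ np.diag(sigma) @ Ut                  # S = U D U^T
print(np.allclose(R @ R.T, np.eye(4)))       # R is orthogonal
print(np.allclose(S, S.T), (np.linalg.eigvalsh(S) >= -1e-12).all())   # S is symmetric PSD
print(np.allclose(A, R @ S))                 # A = R S

# Polar form -> SVD: diagonalize S = R2 diag(d) R2^T, so that A = (R R2) diag(d) R2^T.
d, R2 = np.linalg.eigh(S)
print(np.allclose(A, (R @ R2) @ np.diag(d) @ R2.T))   # True
```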


The eigenvalues and the singular values of a matrix are typically not related in any obvious way. For example, the n × n matrix

A = ⎡1 2 0 0 . . . 0 0⎤
    ⎢0 1 2 0 . . . 0 0⎥
    ⎢0 0 1 2 . . . 0 0⎥
    ⎢. . . . . . . . .⎥
    ⎢0 0 . . . 0 1 2 0⎥
    ⎢0 0 . . . 0 0 1 2⎥
    ⎣0 0 . . . 0 0 0 1⎦

has the eigenvalue 1 with multiplicity n, but its singular values, σ1 ≥ · · · ≥ σn, which are the positive square roots of the eigenvalues of the matrix B = AᵀA with

B = ⎡1 2 0 0 . . . 0 0⎤
    ⎢2 5 2 0 . . . 0 0⎥
    ⎢0 2 5 2 . . . 0 0⎥
    ⎢. . . . . . . . .⎥
    ⎢0 0 . . . 2 5 2 0⎥
    ⎢0 0 . . . 0 2 5 2⎥
    ⎣0 0 . . . 0 0 2 5⎦

have a wide spread, since

σ1/σn = cond₂(A) ≥ 2ⁿ⁻¹.
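This spread is easy to observe numerically (a sketch for n = 10, using the matrix above):

```python
import numpy as np

n = 10
A = np.eye(n) + 2 * np.eye(n, k=1)          # 1's on the diagonal, 2's just above it

print(np.linalg.eigvals(A))                 # all n eigenvalues equal 1
sigma = np.linalg.svd(A, compute_uv=False)  # singular values sigma_1 >= ... >= sigma_n
print(sigma[0] / sigma[-1])                 # cond_2(A)
print(sigma[0] / sigma[-1] >= 2 ** (n - 1)) # True: cond_2(A) >= 2^(n-1)
```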

If A is a complex n × n matrix, the eigenvalues λ1, . . . , λn and the singular values

σ1 ≥ σ2 ≥ · · · ≥ σn of A are not unrelated, since

σ1² · · · σn² = det(A∗A) = |det(A)|²

and

|λ1| · · · |λn| = | det(A)|,

so we have

|λ1| · · · |λn| = σ1 · · · σn.

More generally, Hermann Weyl proved the following remarkable theorem:

Theorem 16.3. (Weyl’s inequalities, 1949 ) For any complex n×n matrix, A, if λ1, . . . , λn ∈

C are the eigenvalues of A and σ1, . . . , σn ∈ R+ are the singular values of A, listed so that

|λ1| ≥ · · · ≥ |λn| and σ1 ≥ · · · ≥ σn ≥ 0, then

|λ1| · · · |λn| = σ1 · · · σn and

|λ1| · · · |λk| ≤ σ1 · · · σk, for k = 1, . . . , n − 1.
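Weyl's inequalities can be spot-checked numerically (an illustrative sketch on a random complex matrix):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

lam = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]   # |lambda_1| >= ... >= |lambda_n|
sigma = np.linalg.svd(A, compute_uv=False)          # sigma_1 >= ... >= sigma_n

print(np.isclose(np.prod(lam), np.prod(sigma)))     # equality of the full products
for k in range(1, n):
    print(np.prod(lam[:k]) <= np.prod(sigma[:k]) * (1 + 1e-12))   # |lambda_1|...|lambda_k| <= sigma_1...sigma_k
```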


A proof of Theorem 16.3 can be found in Horn and Johnson [56], Chapter 3, Section

3.3, where more inequalities relating the eigenvalues and the singular values of a matrix are

given.

Theorem 16.2 can be easily extended to rectangular m × n matrices, as we show in the

next section (for various versions of the SVD for rectangular matrices, see Strang [101], Golub

and Van Loan [47], Demmel [25], and Trefethen and Bau [106]).

16.2 Singular Value Decomposition for Rectangular Matrices

Here is the generalization of Theorem 16.2 to rectangular matrices.

Theorem 16.4. (Singular value decomposition) For every real m × n matrix A, there are two orthogonal matrices U (n × n) and V (m × m) and a diagonal m × n matrix D such that A = V D Uᵀ, where D is of the form

D = ⎡diag(σ1, . . . , σn)⎤            or            D = ( diag(σ1, . . . , σm)   0 ),
    ⎣          0         ⎦

the zero block being (m − n) × n in the first case (which applies when m ≥ n) and m × (n − m) in the second (when n ≥ m), and where σ1, . . . , σr are the singular values of f, i.e. the (positive) square roots of the nonzero eigenvalues of AᵀA and AAᵀ, and σr+1 = · · · = σp = 0, where p = min(m, n). The columns of U are eigenvectors of AᵀA, and the columns of V are eigenvectors of AAᵀ.

Proof. As in the proof of Theorem 16.2, since AᵀA is symmetric positive semidefinite, there exists an n × n orthogonal matrix U such that

AᵀA = U Σ² Uᵀ,

with Σ = diag(σ1, . . . , σr, 0, . . . , 0), where σ1², . . . , σr² are the nonzero eigenvalues of AᵀA, and where r is the rank of A. Observe that r ≤ min{m, n}, and AU is an m × n matrix. It follows that

Uᵀ AᵀA U = (AU)ᵀ AU = Σ²,

and if we let fj ∈ Rᵐ be the jth column of AU for j = 1, . . . , n, then we have

⟨fi, fj⟩ = σi² δij,   1 ≤ i, j ≤ r


and

fj = 0,   r + 1 ≤ j ≤ n.

If we define (v1, . . . , vr) by

vj = σj⁻¹ fj,   1 ≤ j ≤ r,

then we have

⟨vi, vj⟩ = δij,   1 ≤ i, j ≤ r,

so complete (v1, . . . , vr) into an orthonormal basis (v1, . . . , vr, vr+1, . . . , vm) (for example, using Gram–Schmidt).

Now, since fj = σj vj for j = 1, . . . , r, we have

⟨vi, fj⟩ = σj ⟨vi, vj⟩ = σj δij,   1 ≤ i ≤ m, 1 ≤ j ≤ r,

and since fj = 0 for j = r + 1, . . . , n, we have

⟨vi, fj⟩ = 0,   1 ≤ i ≤ m, r + 1 ≤ j ≤ n.

If V is the matrix whose columns are v1, . . . , vm, then V is an m × m orthogonal matrix and if m ≥ n, we let

D = ⎡Σ⎤
    ⎣0⎦ ,

where the zero block has m − n rows; else if n ≥ m, then we let

D = ( diag(σ1, . . . , σm)   0 ),

where the zero block has n − m columns. In either case, the above equations prove that

Vᵀ A U = D,

which yields A = V D Uᵀ, as required.

The equation A = V D Uᵀ implies that

AᵀA = U DᵀD Uᵀ = U diag(σ1², . . . , σr², 0, . . . , 0) Uᵀ

(with n − r zeros on the diagonal)


and

AAᵀ = V D Dᵀ Vᵀ = V diag(σ1², . . . , σr², 0, . . . , 0) Vᵀ

(with m − r zeros on the diagonal), which shows that AᵀA and AAᵀ have the same nonzero eigenvalues, that the columns of U are eigenvectors of AᵀA, and that the columns of V are eigenvectors of AAᵀ.
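A quick numerical check of the rectangular case (our illustration with a 5 × 3 matrix): with full_matrices=True, NumPy returns the m × m and n × n orthogonal factors of Theorem 16.4, and D is the m × n matrix obtained by padding diag(σ1, . . . , σn) with zero rows.

```python
import numpy as np

rng = np.random.default_rng(8)
m, n = 5, 3
A = rng.standard_normal((m, n))

V, sigma, Ut = np.linalg.svd(A, full_matrices=True)   # V is m x m, Ut = U^T is n x n
D = np.zeros((m, n))
D[:n, :n] = np.diag(sigma)                             # m >= n: Sigma stacked on top of zero rows
print(np.allclose(A, V @ D @ Ut))                      # A = V D U^T

# Columns of U are eigenvectors of A^T A; columns of V are eigenvectors of A A^T.
U = Ut.T
print(np.allclose(A.T @ A @ U, U @ np.diag(sigma**2)))
print(np.allclose(A @ A.T @ V, V @ (D @ D.T)))
```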

A triple (U, D, V) such that A = V D Uᵀ is called a singular value decomposition (SVD) of A.

Even though the matrix D is an m × n rectangular matrix, since its only nonzero entries

are on the descending diagonal, we still say that D is a diagonal matrix.

If we view A as the representation of a linear map f : E → F , where dim(E) = n and

dim(F ) = m, the proof of Theorem 16.4 shows that there are two orthonormal bases (u1, . . .,

un) and (v1, . . . , vm) for E and F , respectively, where (u1, . . . , un) are eigenvectors of f ∗ ◦ f

and (v1, . . . , vm) are eigenvectors of f ◦ f∗. Furthermore, (u1, . . . , ur) is an orthonormal basis

of Im f ∗, (ur+1, . . . , un) is an orthonormal basis of