16.1 Singular Value Decomposition for Square Matrices
In this section we assume that we are dealing with a real Euclidean space E. Let f : E → E
be any linear map. In general, it may not be possible to diagonalize f . We show that every
linear map can be diagonalized if we are willing to use two orthonormal bases. This is the
celebrated singular value decomposition (SVD). A close cousin of the SVD is the polar form
of a linear map, which shows how a linear map can be decomposed into its purely rotational
component (perhaps with a flip) and its purely stretching part.
The key observation is that f ∗ ◦ f is self-adjoint, since
⟨(f∗ ◦ f)(u), v⟩ = ⟨f(u), f(v)⟩ = ⟨u, (f∗ ◦ f)(v)⟩.
Similarly, f ◦ f∗ is self-adjoint.
The fact that f ∗ ◦ f and f ◦ f∗ are self-adjoint is very important, because it implies that
f ∗ ◦ f and f ◦ f∗ can be diagonalized and that they have real eigenvalues. In fact, these
eigenvalues are all nonnegative. Indeed, if u is an eigenvector of f ∗ ◦ f for the eigenvalue λ,
then
⟨(f∗ ◦ f)(u), u⟩ = ⟨f(u), f(u)⟩
and
⟨(f∗ ◦ f)(u), u⟩ = λ⟨u, u⟩,
and thus
λ⟨u, u⟩ = ⟨f(u), f(u)⟩,
which implies that λ ≥ 0, since ⟨−, −⟩ is positive definite. A similar proof applies to f ◦ f∗.
Thus, the eigenvalues of f∗ ◦ f are of the form σ1², . . . , σr² or 0, where σi > 0, and similarly
for f ◦ f∗. The situation is even better, since we will show shortly that f∗ ◦ f and f ◦ f∗
have the same eigenvalues.
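These facts are easy to check numerically. Here is a minimal NumPy sketch (with a hypothetical random matrix A playing the role of the matrix of f, so that f∗ corresponds to A^⊤): the eigenvalues of A^⊤A are nonnegative, and for a square A the matrices A^⊤A and AA^⊤ have the same eigenvalues.

import numpy as np

# Hypothetical random square matrix; not taken from the text.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

eig_AtA = np.linalg.eigvalsh(A.T @ A)     # eigenvalues of A^T A, in ascending order
eig_AAt = np.linalg.eigvalsh(A @ A.T)     # eigenvalues of A A^T

assert np.all(eig_AtA >= -1e-12)          # nonnegative, up to roundoff
assert np.allclose(eig_AtA, eig_AAt)      # same spectrum in the square case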
Remark: Given any two linear maps f : E → F and g : F → E, where dim(E) = n and
dim(F ) = m, it can be shown that
(−λ)^m det(g ◦ f − λ I_n) = (−λ)^n det(f ◦ g − λ I_m),
and thus g ◦ f and f ◦ g always have the same nonzero eigenvalues!
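A quick numerical illustration of the remark, with hypothetical rectangular matrices standing in for f and g:

import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 5
Mf = rng.standard_normal((m, n))    # matrix of f : E -> F  (dim E = n, dim F = m)
Mg = rng.standard_normal((n, m))    # matrix of g : F -> E

ev_gf = np.linalg.eigvals(Mg @ Mf)  # n eigenvalues of g∘f
ev_fg = np.linalg.eigvals(Mf @ Mg)  # m eigenvalues of f∘g (m - n extra zeros)

keep = lambda ev: np.sort_complex(ev[np.abs(ev) > 1e-9])
assert np.allclose(keep(ev_gf), keep(ev_fg))   # same nonzero eigenvalues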
Definition 16.1. The square roots σi > 0 of the positive eigenvalues of f ∗ ◦ f (and f ◦ f∗)
are called the singular values of f .
Definition 16.2. A self-adjoint linear map f : E → E whose eigenvalues are nonnegative is
called positive semidefinite (or positive), and if f is also invertible, f is said to be positive
definite. In the latter case, every eigenvalue of f is strictly positive.
We just showed that f ∗ ◦ f and f ◦ f∗ are positive semidefinite self-adjoint linear maps.
This fact has the remarkable consequence that every linear map has two important decom-
positions:
1. The polar form.
2. The singular value decomposition (SVD).
The wonderful thing about the singular value decomposition is that there exist two or-
thonormal bases (u1, . . . , un) and (v1, . . . , vn) such that, with respect to these bases, the matrix of f is diagonal, and its diagonal entries are the singular values of f or 0. Thus, in some sense, f can
always be diagonalized with respect to two orthonormal bases. The SVD is also a useful tool
for solving overdetermined linear systems in the least squares sense and for data analysis, as
we show later on.
First, we show some useful relationships between the kernels and the images of f , f ∗,
f ∗ ◦ f, and f ◦ f∗. Recall that if f : E → F is a linear map, the image Im f of f is the
subspace f (E) of F , and the rank of f is the dimension dim(Im f ) of its image. Also recall
that (Theorem 4.11)
dim (Ker f ) + dim (Im f ) = dim (E),
and that (Propositions 9.9 and 11.10) for every subspace W of E,
dim (W ) + dim (W ⊥) = dim (E).
Proposition 16.1. Given any two Euclidean spaces E and F , where E has dimension n
and F has dimension m, for any linear map f : E → F , we have
Ker f = Ker (f ∗ ◦ f),
Ker f ∗ = Ker (f ◦ f∗),
Ker f = (Im f ∗)⊥,
Ker f ∗ = (Im f )⊥,
dim(Im f ) = dim(Im f ∗),
and f , f ∗, f ∗ ◦ f, and f ◦ f∗ have the same rank.
Proof. To simplify the notation, we will denote the inner products on E and F by the same
symbol ⟨−, −⟩ (to avoid subscripts). If f(u) = 0, then (f∗ ◦ f)(u) = f∗(f(u)) = f∗(0) = 0,
and so Ker f ⊆ Ker (f∗ ◦ f). By definition of f∗, we have
⟨f(u), f(u)⟩ = ⟨(f∗ ◦ f)(u), u⟩
for all u ∈ E. If (f∗ ◦ f)(u) = 0, since ⟨−, −⟩ is positive definite, we must have f(u) = 0,
and so Ker (f ∗ ◦ f) ⊆ Ker f. Therefore,
Ker f = Ker (f ∗ ◦ f).
The proof that Ker f ∗ = Ker (f ◦ f∗) is similar.
By definition of f ∗, we have
⟨f(u), v⟩ = ⟨u, f∗(v)⟩ for all u ∈ E and all v ∈ F. (∗)
This immediately implies that
Ker f = (Im f ∗)⊥
and Ker f ∗ = (Im f )⊥.
Let us explain why Ker f = (Im f ∗)⊥, the proof of the other equation being similar.
Because the inner product is positive definite, for every u ∈ E, we have
u ∈ Ker f
iff f(u) = 0
iff ⟨f(u), v⟩ = 0 for all v,
iff ⟨u, f∗(v)⟩ = 0 for all v (by (∗)),
iff u ∈ (Im f∗)⊥.
Since
dim(Im f ) = n − dim(Ker f)
and
dim(Im f ∗) = n − dim((Im f∗)⊥),
from
Ker f = (Im f ∗)⊥
we also have
dim(Ker f ) = dim((Im f ∗)⊥),
from which we obtain
dim(Im f ) = dim(Im f ∗).
Since
dim(Ker (f ∗ ◦ f)) + dim(Im (f∗ ◦ f)) = dim(E),
Ker (f ∗ ◦ f) = Ker f and Ker f = (Im f∗)⊥, we get
dim((Im f ∗)⊥) + dim(Im (f ∗ ◦ f)) = dim(E).
Since
dim((Im f ∗)⊥) + dim(Im f ∗) = dim(E),
we deduce that
dim(Im f ) = dim(Im (f ∗ ◦ f)).
A similar proof shows that
dim(Im f ∗) = dim(Im (f ◦ f∗)).
Consequently, f , f ∗, f ∗ ◦ f, and f ◦ f∗ have the same rank.
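A small numerical illustration of Proposition 16.1, using a hypothetical rank-deficient matrix (the SVD is used only as a convenient tool to extract a kernel basis):

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))   # 5 x 4 matrix of rank 3

ranks = {np.linalg.matrix_rank(M) for M in (A, A.T, A.T @ A, A @ A.T)}
assert ranks == {3}                          # f, f*, f*∘f and f∘f* all have rank 3

_, s, Vt = np.linalg.svd(A)
kernel = Vt[(s > 1e-10).sum():].T            # orthonormal basis of Ker A
# A @ kernel = 0 says each kernel vector is orthogonal to every row of A,
# i.e. Ker A is contained in (Im A^T)^⊥; the dimensions then force equality.
assert np.allclose(A @ kernel, 0)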
We will now prove that every square matrix has an SVD. Stronger results can be obtained
if we first consider the polar form and then derive the SVD from it (there are uniqueness
properties of the polar decomposition). For our purposes, uniqueness results are not as
important so we content ourselves with existence results, whose proofs are simpler. Readers
interested in a more general treatment are referred to [42].
The early history of the singular value decomposition is described in a fascinating paper
by Stewart [97]. The SVD is due to Beltrami and Camille Jordan independently (1873,
1874). Gauss is the grandfather of all this, for his work on least squares (1809, 1823)
(but Legendre also published a paper on least squares!). Then come Sylvester, Schmidt, and
Hermann Weyl. Sylvester’s work was apparently “opaque.” He gave a computational method
to find an SVD. Schmidt’s work really has to do with integral equations and symmetric and
asymmetric kernels (1907). Weyl’s work has to do with perturbation theory (1912). Autonne
came up with the polar decomposition (1902, 1915). Eckart and Young extended SVD to
rectangular matrices (1936, 1939).
Theorem 16.2. (Singular value decomposition) For every real n × n matrix A there are two orthogonal matrices U and V and a diagonal matrix D such that A = V D U^⊤, where D is of the form

D = diag(σ1, σ2, . . . , σn),

where σ1, . . . , σr are the singular values of A, i.e., the (positive) square roots of the nonzero eigenvalues of A^⊤A and A A^⊤, and σr+1 = · · · = σn = 0. The columns of U are eigenvectors of A^⊤A, and the columns of V are eigenvectors of A A^⊤.
Proof. Since A^⊤A is a symmetric matrix, in fact, a positive semidefinite matrix, there exists an orthogonal matrix U such that

A^⊤A = U D² U^⊤,

with D = diag(σ1, . . . , σr, 0, . . . , 0), where σ1², . . . , σr² are the nonzero eigenvalues of A^⊤A, and where r is the rank of A; that is, σ1, . . . , σr are the singular values of A. It follows that

U^⊤A^⊤A U = (AU)^⊤ AU = D²,

and if we let fj be the jth column of AU for j = 1, . . . , n, then we have

⟨fi, fj⟩ = σi² δij,  1 ≤ i, j ≤ r

and

fj = 0,  r + 1 ≤ j ≤ n.

If we define (v1, . . . , vr) by

vj = σj⁻¹ fj,  1 ≤ j ≤ r,

then we have

⟨vi, vj⟩ = δij,  1 ≤ i, j ≤ r,

so complete (v1, . . . , vr) into an orthonormal basis (v1, . . . , vr, vr+1, . . . , vn) (for example, using Gram–Schmidt). Now, since fj = σj vj for j = 1, . . . , r, we have

⟨vi, fj⟩ = σj ⟨vi, vj⟩ = σj δij,  1 ≤ i ≤ n, 1 ≤ j ≤ r,

and since fj = 0 for j = r + 1, . . . , n,

⟨vi, fj⟩ = 0,  1 ≤ i ≤ n, r + 1 ≤ j ≤ n.

If V is the matrix whose columns are v1, . . . , vn, then V is orthogonal and the above equations prove that

V^⊤AU = D,
which yields A = V D U^⊤, as required.
The equation A = V D U^⊤ implies that

A^⊤A = U D² U^⊤,   AA^⊤ = V D² V^⊤,

which shows that A^⊤A and AA^⊤ have the same eigenvalues, that the columns of U are eigenvectors of A^⊤A, and that the columns of V are eigenvectors of AA^⊤.
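To make the construction explicit, here is a minimal NumPy sketch that follows the proof step by step (the helper name build_svd and the tolerance tol are ours; this is an illustration of the argument, not a numerically robust SVD routine):

import numpy as np

def build_svd(A, tol=1e-12):
    """Constructive SVD of a square matrix A, following the proof of Theorem 16.2."""
    n = A.shape[0]
    # 1. Diagonalize the symmetric positive semidefinite matrix A^T A.
    eigvals, U = np.linalg.eigh(A.T @ A)          # columns of U: eigenvectors of A^T A
    order = np.argsort(eigvals)[::-1]             # sort eigenvalues in decreasing order
    eigvals, U = eigvals[order], U[:, order]
    sigma = np.sqrt(np.clip(eigvals, 0.0, None))  # singular values (zeros kept)
    r = int((sigma > tol).sum())                  # rank of A
    # 2. Set v_j = sigma_j^{-1} (A u_j) for j <= r, then complete to an orthonormal basis.
    V = np.zeros((n, n))
    V[:, :r] = (A @ U[:, :r]) / sigma[:r]
    if r < n:                                     # complete with orthonormal vectors via QR
        Q, _ = np.linalg.qr(np.hstack([V[:, :r], np.eye(n)]))
        V[:, r:] = Q[:, r:n]
    return U, np.diag(sigma), V

A = np.array([[3.0, 1.0], [1.0, 3.0]])            # hypothetical test matrix
U, D, V = build_svd(A)
assert np.allclose(V @ D @ U.T, A)                # A = V D U^T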
Theorem 16.2 suggests the following definition.
Definition 16.3. A triple (U, D, V ) such that A = V D U^⊤, where U and V are orthogonal and D is a diagonal matrix whose entries are nonnegative (it is positive semidefinite), is called a singular value decomposition (SVD) of A.
The proof of Theorem 16.2 shows that there are two orthonormal bases (u1, . . . , un) and (v1, . . . , vn), where (u1, . . . , un) are eigenvectors of A^⊤A and (v1, . . . , vn) are eigenvectors of AA^⊤. Furthermore, (u1, . . . , ur) is an orthonormal basis of Im A^⊤, (ur+1, . . . , un) is an orthonormal basis of Ker A, (v1, . . . , vr) is an orthonormal basis of Im A, and (vr+1, . . . , vn) is an orthonormal basis of Ker A^⊤.
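These subspace descriptions are easy to verify numerically; here is a sketch on a hypothetical rank-2 matrix (we unpack NumPy's factors so that they match the text's convention A = V D U^⊤):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [0.0, 0.0, 1.0]])               # hypothetical matrix of rank 2
V, s, Ut = np.linalg.svd(A)                   # A = V @ diag(s) @ Ut, so Ut plays the role of U^T
U = Ut.T
r = int((s > 1e-10).sum())                    # r = 2

assert np.allclose(A @ U[:, r:], 0)           # u_{r+1}, ..., u_n lie in Ker A
P = V[:, :r] @ V[:, :r].T                     # orthogonal projector onto span(v_1, ..., v_r)
assert np.allclose(P @ A, A)                  # every column of A lies in Im A = span(v_1, ..., v_r)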
Using a remark made in Chapter 3, if we denote the columns of U by u1, . . . , un and the columns of V by v1, . . . , vn, then we can write

A = V D U^⊤ = σ1 v1 u1^⊤ + · · · + σr vr ur^⊤.

As a consequence, if r is a lot smaller than n (we write r ≪ n), we see that A can be reconstructed from U and V using a much smaller number of elements. This idea will be used to provide “low-rank” approximations of a matrix. The idea is to keep only the k top singular values for some suitable k ≪ r for which σk+1, . . . , σr are very small.
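Here is a short NumPy sketch of this truncation (the helper name rank_k_approx and the test matrix are ours). Keeping the top k singular triples stores only k(m + n + 1) numbers instead of mn:

import numpy as np

def rank_k_approx(A, k):
    # Keep only the top k singular values/vectors; a sketch, not a tuned routine.
    V, s, Ut = np.linalg.svd(A, full_matrices=False)   # A = V @ diag(s) @ Ut
    return sum(s[i] * np.outer(V[:, i], Ut[i, :]) for i in range(k))

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 8)) @ rng.standard_normal((8, 50))   # hypothetical rank-8 matrix
A5 = rank_k_approx(A, 5)
# The spectral-norm error equals the first discarded singular value sigma_6.
print(np.linalg.norm(A - A5, 2), np.linalg.svd(A, compute_uv=False)[5])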
Remarks:
(1) In Strang [101] the matrices U, V, D are denoted by U = Q2, V = Q1, and D = Σ, and an SVD is written as A = Q1 Σ Q2^⊤. This has the advantage that Q1 comes before Q2 in A = Q1 Σ Q2^⊤. This has the disadvantage that A maps the columns of Q2 (eigenvectors of A^⊤A) to multiples of the columns of Q1 (eigenvectors of A A^⊤).
(2) Algorithms for actually computing the SVD of a matrix are presented in Golub and
Van Loan [47], Demmel [25], and Trefethen and Bau [106], where the SVD and its
applications are also discussed quite extensively.
(3) The SVD also applies to complex matrices. In this case, for every complex n×n matrix
A, there are two unitary matrices U and V and a diagonal matrix D such that
A = V D U ∗,
where D is a diagonal matrix consisting of real entries σ1, . . . , σn, where σ1, . . . , σr are
the singular values of A, i.e., the positive square roots of the nonzero eigenvalues of
A∗A and A A∗, and σr+1 = . . . = σn = 0.
A notion closely related to the SVD is the polar form of a matrix.
Definition 16.4. A pair (R, S) such that A = RS with R orthogonal and S symmetric
positive semidefinite is called a polar decomposition of A.
Theorem 16.2 implies that for every real n × n matrix A, there is some orthogonal matrix
R and some positive semidefinite symmetric matrix S such that
A = RS.
This is easy to show and we will prove it below. Furthermore, R, S are unique if A is
invertible, but this is harder to prove.
For example, the matrix
A = \frac{1}{2}\begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 \\
1 & -1 & 1 & -1 \\
1 & -1 & -1 & 1
\end{pmatrix}
is both orthogonal and symmetric, and A = RS with R = A and S = I, which implies that
some of the eigenvalues of A are negative.
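A quick numerical check of this example (plain NumPy, nothing beyond what the text states):

import numpy as np

A = 0.5 * np.array([[1,  1,  1,  1],
                    [1,  1, -1, -1],
                    [1, -1,  1, -1],
                    [1, -1, -1,  1]], dtype=float)
assert np.allclose(A @ A.T, np.eye(4))      # A is orthogonal
assert np.allclose(A, A.T)                  # A is symmetric
print(np.linalg.eigvalsh(A))                # the eigenvalues include -1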
Remark: In the complex case, the polar decomposition states that for every complex n × n
matrix A, there is some unitary matrix U and some positive semidefinite Hermitian matrix
H such that
A = U H.
It is easy to go from the polar form to the SVD, and conversely.
Given an SVD decomposition A = V D U^⊤, let R = V U^⊤ and S = U D U^⊤. It is clear that R is orthogonal and that S is positive semidefinite symmetric, and

RS = V U^⊤ U D U^⊤ = V D U^⊤ = A.

Going the other way, given a polar decomposition A = R1 S, where R1 is orthogonal and S is positive semidefinite symmetric, there is an orthogonal matrix R2 and a positive semidefinite diagonal matrix D such that S = R2 D R2^⊤, and thus

A = R1 R2 D R2^⊤ = V D U^⊤,

where V = R1 R2 and U = R2 are orthogonal.
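A minimal NumPy sketch of the first conversion (the function name polar is ours; NumPy's factors are unpacked so that they match the text's A = V D U^⊤):

import numpy as np

def polar(A):
    # Polar form from an SVD: with A = V D U^T, take R = V U^T and S = U D U^T.
    V, s, Ut = np.linalg.svd(A)        # A = V @ diag(s) @ Ut
    R = V @ Ut                         # orthogonal
    S = Ut.T @ np.diag(s) @ Ut         # symmetric positive semidefinite
    return R, S

A = np.array([[0.0, -2.0],
              [1.0,  0.0]])            # hypothetical test matrix
R, S = polar(A)
assert np.allclose(R @ R.T, np.eye(2))                              # R orthogonal
assert np.allclose(S, S.T) and np.all(np.linalg.eigvalsh(S) >= -1e-12)
assert np.allclose(R @ S, A)                                        # A = R S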
The eigenvalues and the singular values of a matrix are typically not related in any
obvious way. For example, the n × n matrix

A = \begin{pmatrix}
1 & 2 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & 2 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & 2 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \ddots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 0 & 1 & 2 & 0 \\
0 & 0 & \cdots & 0 & 0 & 1 & 2 \\
0 & 0 & \cdots & 0 & 0 & 0 & 1
\end{pmatrix}
has the eigenvalue 1 with multiplicity n, but its singular values, σ1 ≥ · · · ≥ σn, which are
the positive square roots of the eigenvalues of the matrix B = A^⊤A with

B = \begin{pmatrix}
1 & 2 & 0 & 0 & \cdots & 0 & 0 \\
2 & 5 & 2 & 0 & \cdots & 0 & 0 \\
0 & 2 & 5 & 2 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \ddots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 2 & 5 & 2 & 0 \\
0 & 0 & \cdots & 0 & 2 & 5 & 2 \\
0 & 0 & \cdots & 0 & 0 & 2 & 5
\end{pmatrix},

have a wide spread, since

σ1/σn = cond₂(A) ≥ 2^(n−1).
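A small numerical check of this example, with a hypothetical n = 8:

import numpy as np

n = 8
A = np.eye(n) + 2 * np.eye(n, k=1)         # 1's on the diagonal, 2's just above it
print(np.linalg.eigvals(A))                # all eigenvalues equal 1
s = np.linalg.svd(A, compute_uv=False)
print(s[0] / s[-1], 2 ** (n - 1))          # the condition number dwarfs 2^(n-1) = 128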
If A is a complex n × n matrix, the eigenvalues λ1, . . . , λn and the singular values
σ1 ≥ σ2 ≥ · · · ≥ σn of A are not unrelated, since
σ1² · · · σn² = det(A∗A) = |det(A)|²
and
|λ1| · · · |λn| = | det(A)|,
so we have
|λ1| · · · |λn| = σ1 · · · σn.
More generally, Hermann Weyl proved the following remarkable theorem:
Theorem 16.3. (Weyl’s inequalities, 1949) For any complex n × n matrix A, if λ1, . . . , λn ∈
C are the eigenvalues of A and σ1, . . . , σn ∈ R+ are the singular values of A, listed so that
|λ1| ≥ · · · ≥ |λn| and σ1 ≥ · · · ≥ σn ≥ 0, then
|λ1| · · · |λn| = σ1 · · · σn and
|λ1| · · · |λk| ≤ σ1 · · · σk, for k = 1, . . . , n − 1.
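These relations are easy to test numerically; here is a hedged check on a hypothetical random complex matrix:

import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

lam = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]     # |λ_1| >= ... >= |λ_n|
sig = np.linalg.svd(A, compute_uv=False)              # σ_1 >= ... >= σ_n

assert np.isclose(np.prod(lam), np.prod(sig))         # equality for k = n
for k in range(1, 5):
    assert np.prod(lam[:k]) <= np.prod(sig[:k]) + 1e-9   # Weyl's inequalities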
A proof of Theorem 16.3 can be found in Horn and Johnson [56], Chapter 3, Section
3.3, where more inequalities relating the eigenvalues and the singular values of a matrix are
given.
Theorem 16.2 can be easily extended to rectangular m × n matrices, as we show in the
next section (for various versions of the SVD for rectangular matrices, see Strang [101], Golub
and Van Loan [47], Demmel [25], and Trefethen and Bau [106]).
16.2 Singular Value Decomposition for Rectangular Matrices
Here is the generalization of Theorem 16.2 to rectangular matrices.
Theorem 16.4. (Singular value decomposition) For every real m × n matrix A, there are two orthogonal matrices U (n × n) and V (m × m) and a diagonal m × n matrix D such that A = V D U^⊤, where D is of the form

D = \begin{pmatrix}
\sigma_1 & & & \\
& \sigma_2 & & \\
& & \ddots & \\
& & & \sigma_n \\
0 & \cdots & \cdots & 0 \\
\vdots & & & \vdots \\
0 & \cdots & \cdots & 0
\end{pmatrix}
\quad\text{or}\quad
D = \begin{pmatrix}
\sigma_1 & & & & 0 & \cdots & 0 \\
& \sigma_2 & & & 0 & \cdots & 0 \\
& & \ddots & & \vdots & & \vdots \\
& & & \sigma_m & 0 & \cdots & 0
\end{pmatrix},

(the first form if m ≥ n, the second if n ≥ m), where σ1, . . . , σr are the singular values of A, i.e. the (positive) square roots of the nonzero eigenvalues of A^⊤A and A A^⊤, and σr+1 = · · · = σp = 0, where p = min(m, n). The columns of U are eigenvectors of A^⊤A, and the columns of V are eigenvectors of A A^⊤.
Proof. As in the proof of Theorem 16.2, since A^⊤A is symmetric positive semidefinite, there exists an n × n orthogonal matrix U such that

A^⊤A = U Σ² U^⊤,

with Σ = diag(σ1, . . . , σr, 0, . . . , 0), where σ1², . . . , σr² are the nonzero eigenvalues of A^⊤A, and where r is the rank of A. Observe that r ≤ min{m, n}, and AU is an m × n matrix. It follows that

U^⊤A^⊤AU = (AU)^⊤ AU = Σ²,

and if we let fj ∈ R^m be the jth column of AU for j = 1, . . . , n, then we have

⟨fi, fj⟩ = σi² δij,  1 ≤ i, j ≤ r
and
fj = 0,  r + 1 ≤ j ≤ n.

If we define (v1, . . . , vr) by

vj = σj⁻¹ fj,  1 ≤ j ≤ r,

then we have

⟨vi, vj⟩ = δij,  1 ≤ i, j ≤ r,

so complete (v1, . . . , vr) into an orthonormal basis (v1, . . . , vr, vr+1, . . . , vm) (for example, using Gram–Schmidt).
Now, since fj = σj vj for j = 1, . . . , r, we have

⟨vi, fj⟩ = σj ⟨vi, vj⟩ = σj δij,  1 ≤ i ≤ m, 1 ≤ j ≤ r,

and since fj = 0 for j = r + 1, . . . , n, we have

⟨vi, fj⟩ = 0,  1 ≤ i ≤ m, r + 1 ≤ j ≤ n.
If V is the matrix whose columns are v1, . . . , vm, then V is an m × m orthogonal matrix and
if m ≥ n, we let

D = \begin{pmatrix} \Sigma \\ 0_{m-n} \end{pmatrix} =
\begin{pmatrix}
\sigma_1 & & & \\
& \sigma_2 & & \\
& & \ddots & \\
& & & \sigma_n \\
0 & \cdots & \cdots & 0 \\
\vdots & & & \vdots \\
0 & \cdots & \cdots & 0
\end{pmatrix}

(where 0_{m−n} denotes the (m − n) × n zero block), else if n ≥ m, then we let

D = \begin{pmatrix}
\sigma_1 & & & & 0 & \cdots & 0 \\
& \sigma_2 & & & 0 & \cdots & 0 \\
& & \ddots & & \vdots & & \vdots \\
& & & \sigma_m & 0 & \cdots & 0
\end{pmatrix}.

In either case, the above equations prove that

V^⊤AU = D,

which yields A = V D U^⊤, as required.
The equation A = V D U^⊤ implies that

A^⊤A = U D^⊤D U^⊤ = U diag(σ1², . . . , σr², 0, . . . , 0) U^⊤   (with n − r zeros)
and

AA^⊤ = V DD^⊤ V^⊤ = V diag(σ1², . . . , σr², 0, . . . , 0) V^⊤   (with m − r zeros),

which shows that A^⊤A and AA^⊤ have the same nonzero eigenvalues, that the columns of U are eigenvectors of A^⊤A, and that the columns of V are eigenvectors of AA^⊤.
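As a quick sanity check of the bookkeeping in Theorem 16.4, here is a sketch on a hypothetical 5 × 3 matrix (NumPy's factors are again mapped onto the text's A = V D U^⊤):

import numpy as np

m, n = 5, 3
rng = np.random.default_rng(5)
A = rng.standard_normal((m, n))

V, s, Ut = np.linalg.svd(A, full_matrices=True)   # V: m x m, Ut: n x n, s: min(m, n) values
D = np.zeros((m, n))
D[:n, :n] = np.diag(s)                            # Σ stacked over an (m - n) x n zero block
assert np.allclose(V @ D @ Ut, A)                 # A = V D U^T
# A^T A and A A^T share the nonzero eigenvalues σ_i².
assert np.allclose(np.linalg.eigvalsh(A.T @ A)[::-1][:n], s**2)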
A triple (U, D, V ) such that A = V D U^⊤ is called a singular value decomposition (SVD) of A.
Even though the matrix D is an m × n rectangular matrix, since its only nonzero entries
are on the descending diagonal, we still say that D is a diagonal matrix.
If we view A as the representation of a linear map f : E → F , where dim(E) = n and
dim(F ) = m, the proof of Theorem 16.4 shows that there are two orthonormal bases (u1, . . .,
un) and (v1, . . . , vm) for E and F , respectively, where (u1, . . . , un) are eigenvectors of f ∗ ◦ f
and (v1, . . . , vm) are eigenvectors of f ◦ f∗. Furthermore, (u1, . . . , ur) is an orthonormal basis
of Im f ∗, (ur+1, . . . , un) is an orthonormal basis of