12.1 Eigenvectors and Eigenvalues of a Linear Map
Given a finite-dimensional vector space E, let f : E → E be any linear map. If, by luck,
there is a basis (e1, . . . , en) of E with respect to which f is represented by a diagonal matrix
$$D = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix},$$
then the action of f on E is very simple; in every “direction” ei, we have
f (ei) = λiei.
We can think of f as a transformation that stretches or shrinks space along the directions
e1, . . . , en (at least if E is a real vector space). In terms of matrices, the above property
translates into the fact that there is an invertible matrix P and a diagonal matrix D such
that any matrix A representing f can be factored as
$$A = PDP^{-1}.$$
When this happens, we say that f (or A) is diagonalizable, the λis are called the eigenvalues
of f , and the eis are eigenvectors of f . For example, we will see that every symmetric matrix
can be diagonalized. Unfortunately, not every matrix can be diagonalized. For example, the
matrix
$$A_1 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$
can't be diagonalized. Sometimes, a matrix fails to be diagonalizable because its eigenvalues
do not belong to the field of coefficients, such as
$$A_2 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},$$
whose eigenvalues are ±i. This is not a serious problem because A2 can be diagonalized over
the complex numbers. However, A1 is a “fatal” case! Indeed, its eigenvalues are both 1 and
the problem is that A1 does not have enough eigenvectors to span E.
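To make this concrete, here is a small numerical check (an added illustration, not part of the original text) using NumPy; it confirms that A1 is defective while A2 diagonalizes over C.

```python
import numpy as np

A1 = np.array([[1.0, 1.0],
               [0.0, 1.0]])
A2 = np.array([[0.0, -1.0],
               [1.0,  0.0]])

# A1: eigenvalue 1 with algebraic multiplicity 2, but the computed
# eigenvectors are (numerically) parallel, so they do not span R^2.
w1, V1 = np.linalg.eig(A1)
print(w1)                             # [1. 1.]
print(np.linalg.matrix_rank(V1))      # 1: not enough independent eigenvectors

# A2: eigenvalues +/- i, so it is diagonalizable over C.
w2, V2 = np.linalg.eig(A2)
print(w2)                             # [0.+1.j 0.-1.j]
print(np.allclose(A2, V2 @ np.diag(w2) @ np.linalg.inv(V2)))  # True
```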
The next best thing is that there is a basis with respect to which f is represented by an
upper triangular matrix. In this case we say that f can be triangularized . As we will see in
Section 12.2, if all the eigenvalues of f belong to the field of coefficients K, then f can be
triangularized. In particular, this is the case if K = C.
Now, an alternative to triangularization is to consider the representation of f with respect
to two bases (e1, . . . , en) and (f1, . . . , fn), rather than a single basis. In this case, if K = R
or K = C, it turns out that we can even pick these bases to be orthonormal, and we get a
diagonal matrix Σ with nonnegative entries, such that
f (ei) = σifi,
1 ≤ i ≤ n.
The nonzero σis are the singular values of f , and the corresponding representation is the
singular value decomposition, or SVD . The SVD plays a very important role in applications,
and will be considered in detail later.
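For readers who want a quick computational picture (added here as an illustrative sketch, not part of the original text), NumPy's SVD exhibits exactly the relation f(ei) = σi fi, with the columns of V and of U playing the roles of the two orthonormal bases.

```python
import numpy as np

# A small, arbitrary example matrix (hypothetical data).
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

U, s, Vt = np.linalg.svd(A)
# Columns of V (rows of Vt) act as (e1, ..., en) and columns of U as (f1, ..., fn):
# A @ v_i = s_i * u_i, with nonnegative singular values s_i.
for i in range(len(s)):
    print(np.allclose(A @ Vt[i], s[i] * U[:, i]))   # True, True
```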
In this section, we focus on the possibility of diagonalizing a linear map, and we introduce
the relevant concepts to do so. Given a vector space E over a field K, let I denote the identity
map on E.
Definition 12.1. Given any vector space E and any linear map f : E → E, a scalar λ ∈ K
is called an eigenvalue, or proper value, or characteristic value of f if there is some nonzero
vector u ∈ E such that
f (u) = λu.
Equivalently, λ is an eigenvalue of f if Ker (λI − f) is nontrivial (i.e., Ker (λI − f) ≠ {0}).
A vector u ∈ E is called an eigenvector, or proper vector, or characteristic vector of f if
u ≠ 0 and if there is some λ ∈ K such that
f (u) = λu;
the scalar λ is then an eigenvalue, and we say that u is an eigenvector associated with
λ. Given any eigenvalue λ ∈ K, the nontrivial subspace Ker (λI − f) consists of all the
eigenvectors associated with λ together with the zero vector; this subspace is denoted by
Eλ(f ), or E(λ, f), or even by Eλ, and is called the eigenspace associated with λ, or proper
subspace associated with λ.
Note that distinct eigenvectors may correspond to the same eigenvalue, but distinct
eigenvalues correspond to disjoint sets of eigenvectors.
Remark: As we emphasized in the remark following Definition 7.4, we require an eigenvector
to be nonzero. This requirement seems to have more benefits than inconveniences, even though
it may be considered somewhat inelegant because the set of all eigenvectors associated with an
eigenvalue is not a subspace since the zero vector is excluded.
Let us now assume that E is of finite dimension n. The next proposition shows that the
eigenvalues of a linear map f : E → E are the roots of a polynomial associated with f.
Proposition 12.1. Let E be any vector space of finite dimension n and let f be any linear
map f : E → E. The eigenvalues of f are the roots (in K) of the polynomial
det(λI − f).
Proof. A scalar λ ∈ K is an eigenvalue of f iff there is some nonzero vector u ∈ E such
that
f (u) = λu
iff
(λI − f)(u) = 0
iff (λI − f) is not invertible iff, by Proposition 5.14,
det(λI − f) = 0.
In view of the importance of the polynomial det(λI −f), we have the following definition.
Definition 12.2. Given any vector space E of dimension n, for any linear map f : E → E,
the polynomial Pf (X) = χf (X) = det(XI − f) is called the characteristic polynomial of
f . For any square matrix A, the polynomial PA(X) = χA(X) = det(XI − A) is called the
characteristic polynomial of A.
Note that we already encountered the characteristic polynomial in Section 5.7; see Defi-
nition 5.8.
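As a small numerical aside (not part of the original text), NumPy's np.poly returns the coefficients of det(XI − A), and its roots agree with the eigenvalues; the 2 × 2 matrix below is a hypothetical example.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Coefficients of det(XI - A) = X^2 - 5X + 5, highest degree first.
coeffs = np.poly(A)
print(coeffs)                               # approximately [ 1. -5.  5.]

# The eigenvalues of A are exactly the roots of its characteristic polynomial.
print(np.sort(np.roots(coeffs)))
print(np.sort(np.linalg.eigvals(A)))
```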
Given any basis (e1, . . . , en), if A = M(f ) is the matrix of f w.r.t. (e1, . . . , en), we can
compute the characteristic polynomial χf (X) = det(XI −f) of f by expanding the following
determinant:
$$\det(XI - A) = \begin{vmatrix}
X - a_{1\,1} & -a_{1\,2} & \cdots & -a_{1\,n} \\
-a_{2\,1} & X - a_{2\,2} & \cdots & -a_{2\,n} \\
\vdots & \vdots & \ddots & \vdots \\
-a_{n\,1} & -a_{n\,2} & \cdots & X - a_{n\,n}
\end{vmatrix}.$$
If we expand this determinant, we find that
$$\chi_A(X) = \det(XI - A) = X^n - (a_{1\,1} + \cdots + a_{n\,n})X^{n-1} + \cdots + (-1)^n \det(A).$$
The sum tr(A) = a1 1 + · · · + an n of the diagonal elements of A is called the trace of A. Since
we proved in Section 5.7 that the characteristic polynomial only depends on the linear map
f , the above shows that tr(A) has the same value for all matrices A representing f . Thus,
the trace of a linear map is well-defined; we have tr(f ) = tr(A) for any matrix A representing
f .
Remark: The characteristic polynomial of a linear map is sometimes defined as det(f −XI).
Since
$$\det(f - XI) = (-1)^n \det(XI - f),$$
this makes essentially no difference but the version det(XI − f) has the small advantage
that the coefficient of Xn is +1.
If we write
$$\chi_A(X) = \det(XI - A) = X^n - \tau_1(A)X^{n-1} + \cdots + (-1)^k\tau_k(A)X^{n-k} + \cdots + (-1)^n\tau_n(A),$$
then we just proved that
τ1(A) = tr(A) and τn(A) = det(A).
It is also possible to express τk(A) in terms of determinants of certain submatrices of A. For
any nonempty subset I ⊆ {1, . . . , n}, say I = {i1, . . . , ik}, let AI,I be the k × k submatrix
of A whose jth column consists of the elements a_{i_h i_j}, where h = 1, . . . , k. Then it can be
shown that
$$\tau_k(A) = \sum_{\substack{I \subseteq \{1,\dots,n\} \\ |I| = k}} \det(A_{I,I}).$$
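The following short NumPy check (an added sketch, not from the text) computes τk(A) as a sum of k × k principal minors and compares it with the coefficients of det(XI − A), for a hypothetical 3 × 3 matrix.

```python
import numpy as np
from itertools import combinations

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
n = A.shape[0]

def tau(A, k):
    # Sum of the determinants of all k x k principal submatrices A_{I,I}.
    return sum(np.linalg.det(A[np.ix_(I, I)]) for I in combinations(range(n), k))

# det(XI - A) = X^3 - tau_1 X^2 + tau_2 X - tau_3; np.poly gives [1, -tau_1, tau_2, -tau_3].
coeffs = np.poly(A)
print([round(tau(A, k), 6) for k in (1, 2, 3)])                   # [9.0, 24.0, 18.0]
print([round(c, 6) for c in (-coeffs[1], coeffs[2], -coeffs[3])])
print(np.isclose(tau(A, 1), np.trace(A)), np.isclose(tau(A, 3), np.linalg.det(A)))
```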
If all the roots, λ1, . . . , λn, of the polynomial det(XI − A) belong to the field K, then we
can write
χA(X) = det(XI − A) = (X − λ1) · · · (X − λn),
where some of the λis may appear more than once. Consequently,
$$\chi_A(X) = \det(XI - A) = X^n - \sigma_1(\lambda)X^{n-1} + \cdots + (-1)^k\sigma_k(\lambda)X^{n-k} + \cdots + (-1)^n\sigma_n(\lambda),$$
where
$$\sigma_k(\lambda) = \sum_{\substack{I \subseteq \{1,\dots,n\} \\ |I| = k}} \ \prod_{i \in I} \lambda_i,$$
the kth elementary symmetric function of the λi’s. From this, it is clear that
σk(λ) = τk(A)
and, in particular, the product of the eigenvalues of f is equal to det(A) = det(f ), and the
sum of the eigenvalues of f is equal to the trace tr(A) = tr(f) of f; for the record,
$$\operatorname{tr}(f) = \lambda_1 + \cdots + \lambda_n,$$
$$\det(f) = \lambda_1 \cdots \lambda_n,$$
where λ1, . . . , λn are the eigenvalues of f (and A), where some of the λis may appear more
than once. In particular, f is not invertible iff it admits 0 as an eigenvalue.
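Here is a tiny numerical confirmation (an added illustration, not in the original) that the trace and determinant equal the sum and product of the eigenvalues, counted with multiplicity; the matrix is a hypothetical example with a repeated eigenvalue.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])      # eigenvalues 2, 2, 5

w = np.linalg.eigvals(A)
print(np.isclose(w.sum(),  np.trace(A)))         # True: sum of eigenvalues = trace
print(np.isclose(w.prod(), np.linalg.det(A)))    # True: product of eigenvalues = det
```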
Remark: Depending on the field K, the characteristic polynomial χA(X) = det(XI − A)
may or may not have roots in K. This motivates considering algebraically closed fields,
which are fields K such that every polynomial with coefficients in K has all its roots in K.
For example, over K = R, not every polynomial has real roots. If we consider the matrix
$$A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},$$
then the characteristic polynomial det(XI − A) has no real roots unless θ = kπ. However,
over the field C of complex numbers, every polynomial has roots. For example, the matrix
above has the roots cos θ ± i sin θ = e^{±iθ}.
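A quick numerical check of this example (an added sketch, not from the text), with an arbitrary angle θ:

```python
import numpy as np

theta = 0.7    # hypothetical angle, not a multiple of pi
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

w = np.linalg.eigvals(A)                        # complex eigenvalues e^{+/- i theta}
expected = np.array([np.exp(1j * theta), np.exp(-1j * theta)])
print(np.allclose(np.sort_complex(w), np.sort_complex(expected)))   # True
```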
It is possible to show that every linear map f over a complex vector space E must have
some (complex) eigenvalue without having recourse to determinants (and the characteristic
polynomial). Let n = dim(E), pick any nonzero vector u ∈ E, and consider the sequence
$$u,\ f(u),\ f^2(u),\ \dots,\ f^n(u).$$
Since the above sequence has n + 1 vectors and E has dimension n, these vectors must be
linearly dependent, so there are some complex numbers c0, . . . , cm, not all zero, such that
$$c_0 f^m(u) + c_1 f^{m-1}(u) + \cdots + c_m u = 0,$$
where m ≤ n is the largest integer such that the coefficient of f^m(u) is nonzero (m must
exist since we have a nontrivial linear dependency). Now, because the field C is algebraically
closed, the polynomial
$$c_0 X^m + c_1 X^{m-1} + \cdots + c_m$$
can be written as a product of linear factors as
$$c_0 X^m + c_1 X^{m-1} + \cdots + c_m = c_0(X - \lambda_1) \cdots (X - \lambda_m)$$
for some complex numbers λ1, . . . , λm ∈ C, not necessarily distinct. But then, since c0 ≠ 0,
$$c_0 f^m(u) + c_1 f^{m-1}(u) + \cdots + c_m u = 0$$
is equivalent to
(f − λ1I) ◦ · · · ◦ (f − λmI)(u) = 0.
If all the linear maps f −λiI were injective, then (f −λ1I)◦· · ·◦(f −λmI) would be injective,
contradicting the fact that u ≠ 0. Therefore, some linear map f − λiI must have a nontrivial
kernel, which means that there is some v ≠ 0 so that
f (v) = λiv;
that is, λi is some eigenvalue of f and v is some eigenvector of f .
As nice as the above argument is, it does not provide a method for finding the eigenvalues
of f , and even if we prefer avoiding determinants as much as possible, we are forced to
deal with the characteristic polynomial det(XI − f).
Definition 12.3. Let A be an n × n matrix over a field, K. Assume that all the roots of
the characteristic polynomial χA(X) = det(XI − A) of A belong to K, which means that
we can write
$$\det(XI - A) = (X - \lambda_1)^{k_1} \cdots (X - \lambda_m)^{k_m},$$
where λ1, . . . , λm ∈ K are the distinct roots of det(XI − A) and k1 + · · · + km = n. The
integer ki is called the algebraic multiplicity of the eigenvalue λi, and the dimension of the
eigenspace Eλi = Ker(λiI − A) is called the geometric multiplicity of λi. We denote the
algebraic multiplicity of λi by alg(λi) and its geometric multiplicity by geo(λi).
By definition, the sum of the algebraic multiplicities is equal to n but the sum of the
geometric multiplicities can be strictly smaller.
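To illustrate Definition 12.3 numerically (an added sketch, not from the text), the geometric multiplicity can be computed as n − rank(λI − A); the matrix below is a hypothetical example in which one eigenvalue is defective.

```python
import numpy as np

# Eigenvalue 3 has algebraic multiplicity 2 but geometric multiplicity 1;
# eigenvalue 1 has both multiplicities equal to 1.
A = np.array([[3.0, 1.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 1.0]])
n = A.shape[0]

for lam, alg in [(3.0, 2), (1.0, 1)]:
    geo = n - np.linalg.matrix_rank(lam * np.eye(n) - A)   # dim Ker(lam*I - A)
    print(f"lambda = {lam}: alg = {alg}, geo = {geo}, geo <= alg: {geo <= alg}")
```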
Proposition 12.2. Let A be an n × n matrix over a field K and assume that all the roots of
the characteristic polynomial χA(X) = det(XI−A) of A belong to K. For every eigenvalue λi
of A, the geometric multiplicity of λi is always less than or equal to its algebraic multiplicity,
that is,
geo(λi) ≤ alg(λi).
Proof. To see this, if ni is the dimension of the eigenspace Eλi associated with the eigenvalue
λi, we can form a basis of E by picking a basis of Eλi and completing it to a basis of E.
With respect to this new basis, our matrix is of the form
$$A' = \begin{pmatrix} \lambda_i I_{n_i} & B \\ 0 & D \end{pmatrix},$$
and a simple determinant calculation shows that
$$\det(XI - A) = \det(XI - A') = (X - \lambda_i)^{n_i} \det(XI_{n-n_i} - D).$$
Therefore, (X − λi)^{n_i} divides the characteristic polynomial of A', and thus the characteristic
polynomial of A. It follows that ni is less than or equal to the algebraic multiplicity of λi.
The following proposition shows an interesting property of eigenspaces.
Proposition 12.3. Let E be any vector space of finite dimension n and let f be any linear
map. If u1, . . . , um are eigenvectors associated with pairwise distinct eigenvalues λ1, . . . , λm,
then the family (u1, . . . , um) is linearly independent.
Proof. Assume that (u1, . . . , um) is linearly dependent. Then there exist µ1, . . . , µk ∈ K
such that
$$\mu_1 u_{i_1} + \cdots + \mu_k u_{i_k} = 0,$$
where 1 ≤ k ≤ m, µi ≠ 0 for all i, 1 ≤ i ≤ k, {i1, . . . , ik} ⊆ {1, . . . , m}, and no proper
subfamily of (u_{i_1}, . . . , u_{i_k}) is linearly dependent (in other words, we consider a dependency
relation with k minimal). Applying f to this dependency relation, we get
$$\mu_1 \lambda_{i_1} u_{i_1} + \cdots + \mu_k \lambda_{i_k} u_{i_k} = 0,$$
and if we multiply the original dependency relation by λ_{i_1} and subtract it from the above,
we get
$$\mu_2(\lambda_{i_2} - \lambda_{i_1}) u_{i_2} + \cdots + \mu_k(\lambda_{i_k} - \lambda_{i_1}) u_{i_k} = 0,$$
which is a linear dependency among a proper subfamily of (u_{i_1}, . . . , u_{i_k}), a contradiction.
Thus, from Proposition 12.3, if λ1, . . . , λm are all the pairwise distinct eigenvalues of f
(where m ≤ n), we have a direct sum
$$E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_m}$$
of the eigenspaces Eλi. Unfortunately, it is not always the case that
$$E = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_m}.$$
When
$$E = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_m},$$
we say that f is diagonalizable (and similarly for any matrix associated with f ). Indeed,
picking a basis in each Eλi, we obtain a diagonal matrix consisting of the eigenvalues, each λi
occurring a number of times equal to the dimension of Eλi. This happens if the algebraic
multiplicity and the geometric multiplicity of every eigenvalue are equal. In particular, when
the characteristic polynomial has n distinct roots, then f is diagonalizable.
It can also be shown that symmetric matrices have real eigenvalues and can be diagonalized.
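As an added numerical illustration (not part of the text), a matrix whose characteristic polynomial has n distinct roots is diagonalizable: its eigenvector matrix P is invertible and A = PDP^{-1}.

```python
import numpy as np

# Hypothetical example with three distinct eigenvalues.
A = np.array([[4.0, 1.0, 0.0],
              [2.0, 1.0, 3.0],
              [0.0, 0.0, -2.0]])

w, P = np.linalg.eig(A)
print(np.linalg.matrix_rank(P) == A.shape[0])               # True: eigenvectors form a basis
print(np.allclose(A, P @ np.diag(w) @ np.linalg.inv(P)))    # True: A = P D P^{-1}
```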
For a negative example, we leave it as an exercise to show that the matrix
$$M = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$
cannot be diagonalized, even though 1 is an eigenvalue. The problem is that the eigenspace
of 1 only has dimension 1. The matrix
$$A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
cannot be diagonalized either, because it has no real eigenvalues, unless θ = kπ. However,
over the field of complex numbers, it can be diagonalized.
12.2 Reduction to Upper Triangular Form
Unfortunately, not every linear map on a complex vector space can be diagonalized. The
next best thing is to “triangularize,” which means to find a basis over which the matrix has
zero entries below the main diagonal. Fortunately, such a basis always exists.
We say that a square matrix A is an upper triangular matrix if it has the following shape,
$$\begin{pmatrix}
a_{1\,1} & a_{1\,2} & a_{1\,3} & \cdots & a_{1\,n-1} & a_{1\,n} \\
0 & a_{2\,2} & a_{2\,3} & \cdots & a_{2\,n-1} & a_{2\,n} \\
0 & 0 & a_{3\,3} & \cdots & a_{3\,n-1} & a_{3\,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & a_{n-1\,n-1} & a_{n-1\,n} \\
0 & 0 & 0 & \cdots & 0 & a_{n\,n}
\end{pmatrix},$$
i.e., ai j = 0 whenever j < i, 1 ≤ i, j ≤ n.
Theorem 12.4. Given any finite-dimensional vector space E over a field K, for any linear
map f : E → E, there is a basis (u1, . . . , un) with respect to which f is represented by an
upper triangular matrix (in Mn(K)) iff all the eigenvalues of f belong to K. Equivalently,
for every n × n matrix A ∈ Mn(K), there is an invertible matrix P and an upper triangular
matrix T (both in Mn(K)) such that
$$A = P T P^{-1}$$
iff all the eigenvalues of A belong to K.
Proof. If there is a basis (u1, . . . , un) with respect to which f is represented by an upper
triangular matrix T in Mn(K), then since the eigenvalues of f are the diagonal entries of T ,
all the eigenvalues of f belong to K.
For the converse, we proceed by induction on the dimension n of E. For n = 1 the result
is obvious. If n > 1, since by assumption f has all its eigenvalues in K, pick some eigenvalue
λ1 ∈ K of f, and let u1 be some corresponding (nonzero) eigenvector. We can find n − 1
vectors (v2, . . . , vn) such that (u1, v2, . . . , vn) is a basis of E, and let F be the subspace of
dimension n − 1 spanned by (v2, . . . , vn). In the basis (u1, v2, . . . , vn), the matrix of f is of
the form
$$U = \begin{pmatrix}
\lambda_1 & a_{1\,2} & \cdots & a_{1\,n} \\
0 & a_{2\,2} & \cdots & a_{2\,n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & a_{n\,2} & \cdots & a_{n\,n}
\end{pmatrix},$$
since its first column contains the coordinates of λ1u1 over the basis (u1, v2, . . . , vn). If we
let p : E → F be the projection defined such that p(u1) = 0 and p(vi) = vi when 2 ≤ i ≤ n,
the linear map g : F → F defined as the restriction of p ◦ f to F is represented by the
(n − 1) × (n − 1) matrix V = (ai j)2≤i,j≤n over the basis (v2, . . . , vn). We need to prove
that all the eigenvalues of g belong to K. However, since the first column of U has a single
nonzero entry, we get
χU (X) = det(XI − U) = (X − λ1) det(XI − V ) = (X − λ1)χV (X),
where χU (X) is the characteristic polynomial of U and χV (X) is the characteristic polynomial
of V . It follows that χV (X) divides χU (X), and since all the roots of χU (X) are in K, all
the roots of χV (X) are also in K. Consequently, we can apply the induction hypothesis, and
there is a basis (u2, . . . , un) of F such that g is represented by an upper triangular matrix
(bi j)1≤i,j≤n−1. However,
E = Ku1 ⊕ F,
and thus (u1, . . . , un) is a basis for E. Since p is the projection from E = Ku1 ⊕ F onto F
and g : F → F is the restriction of p ◦ f to F , we have
f (u1) = λ1u1
and
$$f(u_{i+1}) = a_{1\,i}\,u_1 + \sum_{j=1}^{i} b_{i\,j}\,u_{j+1}$$
for some a1 i ∈ K, when 1 ≤ i ≤ n − 1. But then the matrix of f with respect to (u1, . . . , un)
is upper triangular.
For the matrix version, we assume that A is the matrix of f with respect to some basis. Then
we just proved that there is a change of basis matrix P such that A = P T P^{-1}, where
T is upper triangular.
If A = P T P −1 where T is upper triangular, note that the diagonal entries of T are the
eigenvalues λ1, . . . , λn of A. Indeed, A and T have the same characteristic polynomial. Also,
if A is a real matrix whose eigenvalues are all real, then P can be chosen to be real, and if A
is a rational matrix whose eigenvalues are all rational, then P can be chosen rational. Since
any polynomial over C has all its roots in C, Theorem 12.4 implies that every complex n × n
matrix can be triangularized.
If E is a Hermitian space, the proof of Theorem 12.4 can be easily adapted to prove that
there is an orthonormal basis (u1, . . . , un) with respect to which the matrix of f is upper
triangular. This is usually known as Schur’s lemma.
Theorem 12.5. (Schur decomposition) Given any linear map f : E → E over a complex
Hermitian space E, there is an orthonormal basis (u1, . . . , un) with respect to which f is
represented by an upper triangular matrix. Equivalently, for every n × n matrix A ∈ Mn(C),
there is a unitary matrix U and an upper triangular matrix T such that
A = U T U ∗.
If A is real and if all its eigenvalues are real, then there is an orthogonal matrix Q and a
real upper triangular matrix T such that
$$A = Q T Q^{\top}.$$
Proof. During the induction, we choose F to be the orthogonal complement of Cu1 and we
pick orthonormal bases. If E is a real Euclidean space and if the eigenvalues of f are all
real, the proof also goes through with real matrices.
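As a numerical illustration of Theorem 12.5 (an added sketch, not part of the original text), scipy.linalg.schur computes such a factorization; with output='complex' it returns T upper triangular and U unitary with A = UTU*.

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

T, U = schur(A, output='complex')     # A = U T U*, U unitary, T upper triangular
print(np.allclose(A, U @ T @ U.conj().T))           # True
print(np.allclose(T, np.triu(T)))                   # True: T is upper triangular
print(np.allclose(U.conj().T @ U, np.eye(4)))       # True: U is unitary
# The diagonal entries of T are the eigenvalues of A.
print(np.allclose(np.sort_complex(np.diag(T)),
                  np.sort_complex(np.linalg.eigvals(A))))
```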
Using Theorem 12.5, we can derive the fact that if A is a Hermitian matrix, then there
is a unitary matrix U and a real diagonal matrix D such that A = U DU ∗. Indeed, since
A∗ = A, we get
U T U ∗ = U T ∗U ∗,
which implies that T = T ∗. Since T is an upper triangular matrix, T ∗ is a lower triangular
matrix, which implies that T is a real diagonal matrix. In fact, applying this result to a
(real) symmetric matrix A, we obtain the fact that all the eigenvalues of a symmetric matrix
are real, and by applying Theorem 12.5 again, we conclude that A = Q D Q^⊤, where Q is
orthogonal and D is a real diagonal matrix. We will also prove this in Chapter 13.
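A short NumPy check of this consequence (an added illustration, not from the text): np.linalg.eigh diagonalizes a Hermitian matrix with real eigenvalues and a unitary matrix of eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = B + B.conj().T                        # Hermitian: A* = A

w, U = np.linalg.eigh(A)                  # eigenvalues w are real, U is unitary
print(np.isrealobj(w))                                  # True: real eigenvalues
print(np.allclose(U.conj().T @ U, np.eye(3)))           # True: U unitary
print(np.allclose(A, U @ np.diag(w) @ U.conj().T))      # True: A = U D U*
```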
When A has complex eigenvalues, there is a version of Theorem 12.5 involving only real
matrices provided that we allow T to be block upper-triangular (the diagonal entries may
be 2 × 2 matrices or real entries).
Theorem 12.5 is not a very practical result, but it is a useful theoretical tool to cope
with matrices that cannot be diagonalized. For example, it can be used to prove that
every complex matrix is the limit of a sequence of diagonalizable matrices that have distinct
eigenvalues!
12.3 Location of Eigenvalues
If A is an n × n complex (or real) matrix, it would be useful to know, even roughly, where
the eigenvalues of A are located in the complex plane C. The Gershgorin discs provide some