Basics of Algebra, Topology, and Differential Calculus by Jean Gallier

Chapter 12

Eigenvectors and Eigenvalues

12.1 Eigenvectors and Eigenvalues of a Linear Map

Given a finite-dimensional vector space E, let f : E → E be any linear map. If, by luck,

there is a basis (e1, . . . , en) of E with respect to which f is represented by a diagonal matrix

D = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix},

then the action of f on E is very simple; in every “direction” ei, we have

f (ei) = λiei.

We can think of f as a transformation that stretches or shrinks space along the directions e1, . . . , en (at least if E is a real vector space). In terms of matrices, the above property translates into the fact that any matrix A representing f can be factored as

A = P DP −1

for some invertible matrix P and some diagonal matrix D.

When this happens, we say that f (or A) is diagonalizable, the λis are called the eigenvalues

of f , and the eis are eigenvectors of f . For example, we will see that every symmetric matrix

can be diagonalized. Unfortunately, not every matrix can be diagonalized. For example, the

matrix

A_1 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}

can’t be diagonalized. Sometimes, a matrix fails to be diagonalizable because its eigenvalues

do not belong to the field of coefficients, such as

A_2 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},


whose eigenvalues are ±i. This is not a serious problem because A2 can be diagonalized over

the complex numbers. However, A1 is a “fatal” case! Indeed, its eigenvalues are both 1 and

the problem is that A1 does not have enough eigenvectors to span E.
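The two examples above are easy to check numerically. The following is a sketch in Python, assuming numpy is available; the variable names are illustrative and not part of the text.

```python
# A quick numerical check of the two examples above; a sketch assuming numpy is
# available (A1 and A2 are the matrices defined in the text).
import numpy as np

A1 = np.array([[1.0, 1.0],
               [0.0, 1.0]])    # repeated eigenvalue 1, not diagonalizable
A2 = np.array([[0.0, -1.0],
               [1.0,  0.0]])   # real matrix with eigenvalues +/- i

w1, V1 = np.linalg.eig(A1)
print(w1)                          # [1. 1.]
print(np.linalg.matrix_rank(V1))   # typically 1: the computed eigenvectors are
                                   # (numerically) parallel, so they do not span R^2

w2, V2 = np.linalg.eig(A2)
print(w2)                          # eigenvalues +/- i, a complex conjugate pair
# Over C, A2 is diagonalizable: A2 = V2 diag(w2) V2^{-1}
print(np.allclose(A2, V2 @ np.diag(w2) @ np.linalg.inv(V2)))   # True
```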

The next best thing is that there is a basis with respect to which f is represented by an

upper triangular matrix. In this case we say that f can be triangularized . As we will see in

Section 12.2, if all the eigenvalues of f belong to the field of coefficients K, then f can be

triangularized. In particular, this is the case if K = C.

Now, an alternative to triangularization is to consider the representation of f with respect

to two bases (e1, . . . , en) and (f1, . . . , fn), rather than a single basis. In this case, if K = R

or K = C, it turns out that we can even pick these bases to be orthonormal, and we get a

diagonal matrix Σ with nonnegative entries, such that

f (ei) = σifi,

1 ≤ i ≤ n.

The nonzero σis are the singular values of f , and the corresponding representation is the

singular value decomposition, or SVD . The SVD plays a very important role in applications,

and will be considered in detail later.
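As a preview of the SVD, the following sketch (assuming numpy; the square matrix A is an arbitrary illustrative choice, not an example from the text) exhibits the two orthonormal bases and the nonnegative σi numerically.

```python
# A small preview of the SVD mentioned above; a sketch assuming numpy is available.
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])       # an arbitrary (non-symmetric) square matrix

U, s, Vt = np.linalg.svd(A)
# The rows of Vt form the orthonormal basis (e_1, ..., e_n) and the columns of U the
# orthonormal basis (f_1, ..., f_n): A e_i = sigma_i f_i with sigma_i >= 0.
for i in range(len(s)):
    print(np.allclose(A @ Vt[i], s[i] * U[:, i]))   # True for each i
print(s)                                            # the singular values, nonnegative
```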

In this section, we focus on the possibility of diagonalizing a linear map, and we introduce

the relevant concepts to do so. Given a vector space E over a field K, let I denote the identity

map on E.

Definition 12.1. Given any vector space E and any linear map f : E → E, a scalar λ ∈ K

is called an eigenvalue, or proper value, or characteristic value of f if there is some nonzero

vector u ∈ E such that

f (u) = λu.

Equivalently, λ is an eigenvalue of f if Ker (λI − f) is nontrivial (i.e., Ker (λI − f) ≠ {0}).

A vector u ∈ E is called an eigenvector, or proper vector, or characteristic vector of f if u ≠ 0 and if there is some λ ∈ K such that

f (u) = λu;

the scalar λ is then an eigenvalue, and we say that u is an eigenvector associated with

λ. Given any eigenvalue λ ∈ K, the nontrivial subspace Ker (λI − f) consists of all the

eigenvectors associated with λ together with the zero vector; this subspace is denoted by

Eλ(f ), or E(λ, f), or even by Eλ, and is called the eigenspace associated with λ, or proper

subspace associated with λ.

Note that distinct eigenvectors may correspond to the same eigenvalue, but distinct

eigenvalues correspond to disjoint sets of eigenvectors.

Remark: As we emphasized in the remark following Definition 7.4, we require an eigenvector to be nonzero. This requirement seems to have more benefits than drawbacks, even though


it may be considered somewhat inelegant because the set of all eigenvectors associated with an eigenvalue is not a subspace, since the zero vector is excluded.

Let us now assume that E is of finite dimension n. The next proposition shows that the

eigenvalues of a linear map f : E → E are the roots of a polynomial associated with f.

Proposition 12.1. Let E be any vector space of finite dimension n and let f be any linear

map f : E → E. The eigenvalues of f are the roots (in K) of the polynomial

det(λI − f).

Proof. A scalar λ ∈ K is an eigenvalue of f iff there is some nonzero vector u ∈ E such

that

f (u) = λu

iff

(λI − f)(u) = 0

iff (λI − f) is not invertible iff, by Proposition 5.14,

det(λI − f) = 0.
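Proposition 12.1 is easy to check numerically; here is a sketch in Python (assuming numpy; the matrix A below is an arbitrary illustrative choice).

```python
# Sketch of Proposition 12.1 in floating point, assuming numpy is available:
# the eigenvalues of A are the roots of det(XI - A); the matrix is an arbitrary choice.
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [2.0, 3.0, 0.0],
              [0.0, 0.0, 7.0]])

coeffs = np.poly(A)            # coefficients of det(XI - A), highest power first
roots = np.roots(coeffs)       # roots of the characteristic polynomial
eigs = np.linalg.eigvals(A)    # eigenvalues computed directly
print(np.allclose(np.sort(roots), np.sort(eigs)))   # True (up to rounding)
```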

In view of the importance of the polynomial det(λI −f), we have the following definition.

Definition 12.2. Given any vector space E of dimension n, for any linear map f : E → E,

the polynomial Pf (X) = χf (X) = det(XI − f) is called the characteristic polynomial of

f . For any square matrix A, the polynomial PA(X) = χA(X) = det(XI − A) is called the

characteristic polynomial of A.

Note that we already encountered the characteristic polynomial in Section 5.7; see Defi-

nition 5.8.

Given any basis (e1, . . . , en), if A = M(f ) is the matrix of f w.r.t. (e1, . . . , en), we can

compute the characteristic polynomial χf (X) = det(XI −f) of f by expanding the following

determinant:

\det(XI - A) = \begin{vmatrix}
X - a_{1 1} & -a_{1 2} & \cdots & -a_{1 n} \\
-a_{2 1} & X - a_{2 2} & \cdots & -a_{2 n} \\
\vdots & \vdots & \ddots & \vdots \\
-a_{n 1} & -a_{n 2} & \cdots & X - a_{n n}
\end{vmatrix}

If we expand this determinant, we find that

χA(X) = det(XI − A) = X^n − (a1 1 + · · · + an n)X^{n−1} + · · · + (−1)^n det(A).

The sum tr(A) = a1 1 + · · · + an n of the diagonal elements of A is called the trace of A. Since

we proved in Section 5.7 that the characteristic polynomial only depends on the linear map

f , the above shows that tr(A) has the same value for all matrices A representing f . Thus,


the trace of a linear map is well-defined; we have tr(f ) = tr(A) for any matrix A representing

f .

Remark: The characteristic polynomial of a linear map is sometimes defined as det(f −XI).

Since

det(f − XI) = (−1)^n det(XI − f),

this makes essentially no difference, but the version det(XI − f) has the small advantage that the coefficient of X^n is +1.

If we write

χA(X) = det(XI − A) = X^n − τ1(A)X^{n−1} + · · · + (−1)^k τk(A)X^{n−k} + · · · + (−1)^n τn(A),

then we just proved that

τ1(A) = tr(A) and τn(A) = det(A).

It is also possible to express τk(A) in terms of determinants of certain submatrices of A. For any nonempty subset I ⊆ {1, . . . , n}, say I = {i1, . . . , ik}, let AI,I be the k × k submatrix of A whose jth column consists of the elements a_{i_h i_j}, where h = 1, . . . , k. Then it can be shown that

\tau_k(A) = \sum_{\substack{I \subseteq \{1, \dots, n\} \\ |I| = k}} \det(A_{I,I}).
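This identity is easy to verify numerically; here is a sketch in Python (assuming numpy; the 3 × 3 matrix and the helper tau are illustrative, not from the text).

```python
# Numerical check that tau_k(A) is the sum of the k x k principal minors of A;
# a sketch assuming numpy, with an arbitrary illustrative 3 x 3 matrix.
import numpy as np
from itertools import combinations

A = np.array([[1.0, 2.0, 0.0],
              [3.0, 1.0, 4.0],
              [0.0, 1.0, 2.0]])
n = A.shape[0]

def tau(A, k):
    # sum of det(A_{I,I}) over all index sets I of size k
    return sum(np.linalg.det(A[np.ix_(I, I)]) for I in combinations(range(len(A)), k))

coeffs = np.poly(A)   # [1, -tau_1(A), +tau_2(A), -tau_3(A)]
for k in range(1, n + 1):
    print(np.isclose((-1) ** k * coeffs[k], tau(A, k)))        # True for each k
print(np.isclose(tau(A, 1), np.trace(A)),                      # tau_1 = trace
      np.isclose(tau(A, n), np.linalg.det(A)))                 # tau_n = determinant
```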

If all the roots, λ1, . . . , λn, of the polynomial det(XI − A) belong to the field K, then we

can write

χA(X) = det(XI − A) = (X − λ1) · · · (X − λn),

where some of the λis may appear more than once. Consequently,

χA(X) = det(XI − A) = X^n − σ1(λ)X^{n−1} + · · · + (−1)^k σk(λ)X^{n−k} + · · · + (−1)^n σn(λ),

where

\sigma_k(\lambda) = \sum_{\substack{I \subseteq \{1, \dots, n\} \\ |I| = k}} \prod_{i \in I} \lambda_i,

the kth symmetric function of the λi’s. From this, it is clear that

σk(λ) = τk(A)

and, in particular, the product of the eigenvalues of f is equal to det(A) = det(f ), and the

sum of the eigenvalues of f is equal to the trace tr(A) = tr(f ) of f ; for the record,

tr(f ) = λ1 + · · · + λn

det(f ) = λ1 · · · λn,


where λ1, . . . , λn are the eigenvalues of f (and A), where some of the λis may appear more

than once. In particular, f is not invertible iff it admits 0 as an eigenvalue.

Remark: Depending on the field K, the characteristic polynomial χA(X) = det(XI − A)

may or may not have roots in K. This motivates considering algebraically closed fields,

which are fields K such that every polynomial with coefficients in K has all its roots in K.

For example, over K = R, not every polynomial has real roots. If we consider the matrix

A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},

then the characteristic polynomial det(XI − A) has no real roots unless θ = kπ. However,

over the field C of complex numbers, every polynomial has roots. For example, the characteristic polynomial of the matrix above has the roots cos θ ± i sin θ = e^{±iθ}.

It is possible to show that every linear map f over a complex vector space E must have

some (complex) eigenvalue without having recourse to determinants (and the characteristic

polynomial). Let n = dim(E), pick any nonzero vector u ∈ E, and consider the sequence

u, f (u), f^2(u), . . . , f^n(u).

Since the above sequence has n + 1 vectors and E has dimension n, these vectors must be

linearly dependent, so there are some complex numbers c0, . . . , cm, not all zero, such that

c0 f^m(u) + c1 f^{m−1}(u) + · · · + cm u = 0,

where m ≤ n is the largest integer such that the coefficient of f^m(u) is nonzero (such an m must exist since we have a nontrivial linear dependency). Now, because the field C is algebraically

closed, the polynomial

c0 X^m + c1 X^{m−1} + · · · + cm

can be written as a product of linear factors as

c0 X^m + c1 X^{m−1} + · · · + cm = c0(X − λ1) · · · (X − λm)

for some complex numbers λ1, . . . , λm ∈ C, not necessarily distinct. But then, since c0 ≠ 0,

c0 f^m(u) + c1 f^{m−1}(u) + · · · + cm u = 0

is equivalent to

(f − λ1I) ◦ · · · ◦ (f − λmI)(u) = 0.

If all the linear maps f − λiI were injective, then (f − λ1I) ◦ · · · ◦ (f − λmI) would be injective, contradicting the fact that u ≠ 0. Therefore, some linear map f − λiI must have a nontrivial kernel, which means that there is some v ≠ 0 so that

f (v) = λiv;


that is, λi is some eigenvalue of f and v is some eigenvector of f .

As nice as the above argument is, it does not provide a method for finding the eigenvalues

of f , and even if we prefer avoiding determinants as much as possible, we are forced to

deal with the characteristic polynomial det(XI − f).

Definition 12.3. Let A be an n × n matrix over a field, K. Assume that all the roots of

the characteristic polynomial χA(X) = det(XI − A) of A belong to K, which means that

we can write

det(XI − A) = (X − λ1)^{k1} · · · (X − λm)^{km},

where λ1, . . . , λm ∈ K are the distinct roots of det(XI − A) and k1 + · · · + km = n. The

integer ki is called the algebraic multiplicity of the eigenvalue λi, and the dimension of the eigenspace Eλi = Ker(λiI − A) is called the geometric multiplicity of λi. We denote the algebraic multiplicity of λi by alg(λi) and its geometric multiplicity by geo(λi).

By definition, the sum of the algebraic multiplicities is equal to n but the sum of the

geometric multiplicities can be strictly smaller.
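Computationally, the geometric multiplicity is just the dimension of a null space, so the gap between the two multiplicities is easy to observe; here is a sketch in Python (assuming numpy; the matrix is an illustrative choice).

```python
# Illustration of algebraic vs. geometric multiplicity; a sketch assuming numpy,
# with an arbitrary matrix whose characteristic polynomial is (X - 2)^2 (X - 3).
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])   # alg(2) = 2, alg(3) = 1
n = A.shape[0]

lam = 2.0
# geo(lambda) = dim Ker(lambda I - A) = n - rank(lambda I - A)
geo = n - np.linalg.matrix_rank(lam * np.eye(n) - A)
print(geo)   # 1: strictly smaller than the algebraic multiplicity 2
```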

Proposition 12.2. Let A be an n × n matrix over a field K and assume that all the roots of

the characteristic polynomial χA(X) = det(XI−A) of A belong to K. For every eigenvalue λi

of A, the geometric multiplicity of λi is always less than or equal to its algebraic multiplicity,

that is,

geo(λi) ≤ alg(λi).

Proof. To see this, if ni is the dimension of the eigenspace Eλi associated with the eigenvalue λi, we can form a basis obtained by picking a basis of Eλi and completing this basis. With respect to this new basis, our matrix is of the form

A' = \begin{pmatrix} \lambda_i I_{n_i} & B \\ 0 & D \end{pmatrix}

and a simple determinant calculation shows that

det(XI − A) = det(XI − A') = (X − λi)^{ni} det(XI_{n−ni} − D).

Therefore, (X − λi)^{ni} divides the characteristic polynomial of A', and thus the characteristic polynomial of A. It follows that ni is less than or equal to the algebraic multiplicity of λi.

The following proposition shows an interesting property of eigenspaces.

Proposition 12.3. Let E be any vector space of finite dimension n and let f be any linear

map. If u1, . . . , um are eigenvectors associated with pairwise distinct eigenvalues λ1, . . . , λm,

then the family (u1, . . . , um) is linearly independent.


Proof. Assume that (u1, . . . , um) is linearly dependent. Then there exist µ1, . . . , µk ∈ K such that

µ_1 u_{i_1} + · · · + µ_k u_{i_k} = 0,

where 1 ≤ k ≤ m, µi ≠ 0 for all i, 1 ≤ i ≤ k, {i1, . . . , ik} ⊆ {1, . . . , m}, and no proper subfamily of (u_{i_1}, . . . , u_{i_k}) is linearly dependent (in other words, we consider a dependency relation with k minimal). Applying f to this dependency relation, we get

µ_1 λ_{i_1} u_{i_1} + · · · + µ_k λ_{i_k} u_{i_k} = 0,

and if we multiply the original dependency relation by λ_{i_1} and subtract it from the above, we get

µ_2(λ_{i_2} − λ_{i_1}) u_{i_2} + · · · + µ_k(λ_{i_k} − λ_{i_1}) u_{i_k} = 0,

which is a linear dependency among a proper subfamily of (u_{i_1}, . . . , u_{i_k}), a contradiction.

Thus, from Proposition 12.3, if λ1, . . . , λm are all the pairwise distinct eigenvalues of f

(where m ≤ n), we have a direct sum

Eλ1 ⊕ · · · ⊕ Eλm

of the eigenspaces Eλi. Unfortunately, it is not always the case that

E = Eλ1 ⊕ · · · ⊕ Eλm.

When

E = Eλ1 ⊕ · · · ⊕ Eλm,

we say that f is diagonalizable (and similarly for any matrix associated with f ). Indeed, picking a basis in each Eλi, we obtain a matrix which is a diagonal matrix consisting of the eigenvalues, each λi occurring a number of times equal to the dimension of Eλi. This happens if the algebraic multiplicity and the geometric multiplicity of every eigenvalue are equal. In

particular, when the characteristic polynomial has n distinct roots, then f is diagonalizable.

It can also be shown that symmetric matrices have real eigenvalues and can be diagonalized.

For a negative example, we leave it as an exercise to show that the matrix

M = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}

cannot be diagonalized, even though 1 is an eigenvalue. The problem is that the eigenspace

of 1 only has dimension 1. The matrix

A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}

cannot be diagonalized either, because it has no real eigenvalues, unless θ = kπ. However,

over the field of complex numbers, it can be diagonalized.
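In contrast, the symmetric case behaves as well as possible; the following sketch (assuming numpy, with an arbitrary symmetric matrix) checks the factorization A = P DP −1 numerically, with P orthogonal so that P −1 = P^T.

```python
# Sketch of the diagonalizable (symmetric) case, assuming numpy is available;
# the symmetric matrix A is an arbitrary illustrative choice.
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])          # real symmetric

w, P = np.linalg.eigh(A)                 # real eigenvalues, orthonormal eigenvectors
D = np.diag(w)
print(np.allclose(A, P @ D @ P.T))       # True: A = P D P^{-1} with P^{-1} = P^T
print(np.allclose(P.T @ P, np.eye(3)))   # True: the eigenvectors are orthonormal
```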


12.2 Reduction to Upper Triangular Form

Unfortunately, not every linear map on a complex vector space can be diagonalized. The

next best thing is to “triangularize,” which means to find a basis over which the matrix has

zero entries below the main diagonal. Fortunately, such a basis always exists.

We say that a square matrix A is an upper triangular matrix if it has the following shape,

\begin{pmatrix}
a_{1 1} & a_{1 2} & a_{1 3} & \cdots & a_{1 n-1} & a_{1 n} \\
0 & a_{2 2} & a_{2 3} & \cdots & a_{2 n-1} & a_{2 n} \\
0 & 0 & a_{3 3} & \cdots & a_{3 n-1} & a_{3 n} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & a_{n-1 n-1} & a_{n-1 n} \\
0 & 0 & 0 & \cdots & 0 & a_{n n}
\end{pmatrix},

i.e., ai j = 0 whenever j < i, 1 ≤ i, j ≤ n.

Theorem 12.4. Given any finite dimensional vector space E over a field K, for any linear

map f : E → E, there is a basis (u1, . . . , un) with respect to which f is represented by an

upper triangular matrix (in Mn(K)) iff all the eigenvalues of f belong to K. Equivalently,

for every n × n matrix A ∈ Mn(K), there is an invertible matrix P and an upper triangular

matrix T (both in Mn(K)) such that

A = P T P −1

iff all the eigenvalues of A belong to K.

Proof. If there is a basis (u1, . . . , un) with respect to which f is represented by an upper

triangular matrix T in Mn(K), then since the eigenvalues of f are the diagonal entries of T ,

all the eigenvalues of f belong to K.

For the converse, we proceed by induction on the dimension n of E. For n = 1 the result

is obvious. If n > 1, since by assumption f has all its eigenvalues in K, pick some eigenvalue

λ1 ∈ K of f, and let u1 be some corresponding (nonzero) eigenvector. We can find n − 1

vectors (v2, . . . , vn) such that (u1, v2, . . . , vn) is a basis of E, and let F be the subspace of

dimension n − 1 spanned by (v2, . . . , vn). In the basis (u1, v2, . . . , vn), the matrix of f is of

the form

U = \begin{pmatrix}
\lambda_1 & a_{1 2} & \cdots & a_{1 n} \\
0 & a_{2 2} & \cdots & a_{2 n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & a_{n 2} & \cdots & a_{n n}
\end{pmatrix},

since its first column contains the coordinates of λ1u1 over the basis (u1, v2, . . . , vn). If we

let p : E → F be the projection defined such that p(u1) = 0 and p(vi) = vi when 2 ≤ i ≤ n,

the linear map g : F → F defined as the restriction of p ◦ f to F is represented by the


(n − 1) × (n − 1) matrix V = (ai j)2≤i,j≤n over the basis (v2, . . . , vn). We need to prove

that all the eigenvalues of g belong to K. However, since the first column of U has a single

nonzero entry, we get

χU (X) = det(XI − U) = (X − λ1) det(XI − V ) = (X − λ1)χV (X),

where χU (X) is the characteristic polynomial of U and χV (X) is the characteristic polynomial

of V . It follows that χV (X) divides χU (X), and since all the roots of χU (X) are in K, all

the roots of χV (X) are also in K. Consequently, we can apply the induction hypothesis, and

there is a basis (u2, . . . , un) of F such that g is represented by an upper triangular matrix

(bi j)1≤i,j≤n−1. However,

E = Ku1 ⊕ F,

and thus (u1, . . . , un) is a basis for E. Since p is the projection from E = Ku1 ⊕ F onto F

and g : F → F is the restriction of p ◦ f to F , we have

f (u1) = λ1u1

and

f (u_{i+1}) = a_{1 i} u_1 + \sum_{j=1}^{i} b_{i j} u_{j+1}

for some a_{1 i} ∈ K, when 1 ≤ i ≤ n − 1. But then the matrix of f with respect to (u1, . . . , un)

is upper triangular.

For the matrix version, we assume that A is the matrix of f with respect to some basis. Then we just proved that there is a change of basis matrix P such that A = P T P −1, where T is upper triangular.

If A = P T P −1 where T is upper triangular, note that the diagonal entries of T are the

eigenvalues λ1, . . . , λn of A. Indeed, A and T have the same characteristic polynomial. Also,

if A is a real matrix whose eigenvalues are all real, then P can be chosen to be real, and if A

is a rational matrix whose eigenvalues are all rational, then P can be chosen rational. Since

any polynomial over C has all its roots in C, Theorem 12.4 implies that every complex n × n

matrix can be triangularized.

If E is a Hermitian space, the proof of Theorem 12.4 can be easily adapted to prove that

there is an orthonormal basis (u1, . . . , un) with respect to which the matrix of f is upper

triangular. This is usually known as Schur’s lemma.

Theorem 12.5. (Schur decomposition) Given any linear map f : E → E over a complex

Hermitian space E, there is an orthonormal basis (u1, . . . , un) with respect to which f is

represented by an upper triangular matrix. Equivalently, for every n × n matrix A ∈ Mn(C),

there is a unitary matrix U and an upper triangular matrix T such that

A = U T U ∗.


If A is real and if all its eigenvalues are real, then there is an orthogonal matrix Q and a

real upper triangular matrix T such that

A = Q T Q^T.

Proof. During the induction, we choose F to be the orthogonal complement of Cu1 and we

pick orthonormal bases. If E is a real Euclidean space and if the eigenvalues of f are all

real, the proof also goes through with real matrices.

Using Theorem 12.5, we can derive the fact that if A is a Hermitian matrix, then there

is a unitary matrix U and a real diagonal matrix D such that A = U DU ∗. Indeed, since

A∗ = A, we get

U T U ∗ = U T ∗U ∗,

which implies that T = T ∗. Since T is an upper triangular matrix, T ∗ is a lower triangular

matrix, which implies that T is a real diagonal matrix. In fact, applying this result to a

(real) symmetric matrix A, we obtain the fact that all the eigenvalues of a symmetric matrix

are real, and by applying Theorem 12.5 again, we conclude that A = QDQ^T, where Q is

orthogonal and D is a real diagonal matrix. We will also prove this in Chapter 13.
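For a numerical sanity check of Theorem 12.5 and of the Hermitian special case just described, scipy's Schur routine can be used (a sketch assuming scipy and numpy are available; the matrices are arbitrary illustrative choices).

```python
# Numerical sketch of the (complex) Schur decomposition A = U T U*, assuming scipy
# and numpy are available; the matrices below are arbitrary illustrative choices.
import numpy as np
from scipy.linalg import schur

A = np.array([[1.0, 2.0],
              [3.0, 4.0]]) + 1j * np.array([[0.0, 1.0],
                                            [1.0, 0.0]])

T, U = schur(A, output='complex')           # U unitary, T upper triangular
print(np.allclose(A, U @ T @ U.conj().T))   # True: A = U T U*
print(np.allclose(np.tril(T, -1), 0))       # True: T is upper triangular

# For a Hermitian matrix, T comes out (numerically) real and diagonal, as argued above.
H = (A + A.conj().T) / 2
Th, Uh = schur(H, output='complex')
print(np.allclose(Th, np.diag(np.diag(Th))), np.allclose(Th.imag, 0))   # True True
```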

When A has complex eigenvalues, there is a version of Theorem 12.5 involving only real

matrices provided that we allow T to be block upper-triangular (the diagonal entries may

be 2 × 2 matrices or real entries).

Theorem 12.5 is not a very practical result but it is a useful theoretical result to cope

with matrices that cannot be diagonalized. For example, it can be used to prove that

every complex matrix is the limit of a sequence of diagonalizable matrices that have distinct

eigenvalues!
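For instance, the defective matrix A1 from Section 12.1 is the limit of the diagonalizable matrices obtained by perturbing its second diagonal entry; here is a sketch (assuming numpy, with illustrative perturbation sizes).

```python
# The defective matrix A1 = [[1, 1], [0, 1]] as a limit of diagonalizable matrices
# with distinct eigenvalues; a sketch assuming numpy is available.
import numpy as np

A1 = np.array([[1.0, 1.0],
               [0.0, 1.0]])

for eps in [1e-1, 1e-3, 1e-6]:
    A_eps = A1 + np.diag([0.0, eps])       # eigenvalues 1 and 1 + eps, now distinct
    w, V = np.linalg.eig(A_eps)
    # A_eps is diagonalizable: V is invertible and A_eps = V diag(w) V^{-1}
    print(eps, np.allclose(A_eps, V @ np.diag(w) @ np.linalg.inv(V)))   # True each time
```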

12.3 Location of Eigenvalues

If A is an n × n complex (or real) matrix, it would be useful to know, even roughly, where

the eigenvalues of A are located in the complex plane C. The Gershgorin discs provide some