Basics of Algebra, Topology, and Differential Calculus by Jean Gallier - HTML preview

PLEASE NOTE: This is an HTML preview only and some elements such as links or page numbers may be incorrect.
Download the book in PDF, ePub, Kindle for a complete version.

Chapter 22

Annihilating Polynomials and the

Primary Decomposition

22.1

Annihilating Polynomials and the Minimal Poly-

nomial

In Section 5.7, we explained that if f : E → E is a linear map on a K-vector space E, then

for any polynomial p(X) = a0Xd + a1Xd−1 + · · · + ad with coefficients in the field K, we can

define the linear map p(f ) : E → E by

p(f ) = a0f d + a1f d−1 + · · · + adid,

where f k = f ◦ · · · ◦ f, the k-fold composition of f with itself. Note that

p(f )(u) = a0f d(u) + a1f d−1(u) + · · · + adu,

for every vector u ∈ E. Then, we showed that if E is finite-dimensional and if χf (X) =

det(XI − f) is the characteristic polynomial of f, by the Cayley–Hamilton Theorem, we

have

χf (f ) = 0.

This fact suggests looking at the set of all polynomials p(X) such that

p(f ) = 0.

We say that the polynomial p(X) annihilates f . It is easy to check that the set Ann(f )

of polynomials that annihilate f is an ideal. Furthermore, when E is finite-dimensional,

the Cayley–Hamilton Theorem implies that Ann(f ) is not the zero ideal. Therefore, by

Proposition 20.9, there is a unique monic polynomial mf that generates Ann(f ). Results

from Chapter 20, especially about gcd’s of polynomials, will come handy.

Definition 22.1. If f : E → E, is linear map on a finite-dimensional vector space E, the

unique monic polynomial mf that generates the ideal Ann(f ) of polynomials which annihilate

f (the annihilator of f ) is called the minimal polynomial of f .

589

590 CHAPTER 22. ANNIHILATING POLYNOMIALS; PRIMARY DECOMPOSITION

The minimal polynomial mf of f is the monic polynomial of smallest degree that an-

nihilates f . Thus, the minimal polynomial divides the characteristic polynomial χf , and

deg(mf ) ≥ 1. For simplicity of notation, we often write m instead of mf .

If A is any n × n matrix, the set Ann(A) of polynomials that annihilate A is the set of

polynomials

p(X) = a0Xd + a1Xd−1 + · · · + ad−1X + ad

such that

a0Ad + a1Ad−1 + · · · + ad−1A + adI = 0.

It is clear that Ann(A) is a nonzero ideal and its unique monic generator is called the minimal

polynomial of A. We check immediately that if Q is an invertible matrix, then A and Q−1AQ

have the same minimal polynomial. Also, if A is the matrix of f with respect to some basis,

then f and A have the same minimal polynomial.

The zeros (in K) of the minimal polynomial of f and the eigenvalues of f (in K) are

intimately related.

Proposition 22.1. Let f : E → E be a linear map on some finite-dimensional vector space

E. Then, λ ∈ K is a zero of the minimal polynomial mf of f iff λ is an eigenvalue of f iff

λ is a zero of χf . Therefore, the minimal and the characteristic polynomials have the same

zeros (in K), except for multiplicities.

Proof. First, assume that m(λ) = 0 (with λ ∈ K, and writing m instead of mf ). If so, using

polynomial division, m can be factored as

m = (X − λ)q,

with deg(q) < deg(m). Since m is the minimal polynomial, q(f ) = 0, so there is some

nonzero vector v ∈ E such that u = q(f)(v) = 0. But then, because m is the minimal

polynomial,

0 = m(f )(v)

= (f − λid)(q(f)(v))

= (f − λid)(u),

which shows that λ is an eigenvalue of f .

Conversely, assume that λ ∈ K is an eigenvalue of f. This means that for some u = 0,

we have f (u) = λu. Now, it is easy to show that

m(f )(u) = m(λ)u,

and since m is the minimal polynomial of f , we have m(f )(u) = 0, so m(λ)u = 0, and since

u = 0, we must have m(λ) = 0.

22.2. MINIMAL POLYNOMIALS OF DIAGONALIZABLE LINEAR MAPS

591

If we assume that f is diagonalizable, then its eigenvalues are all in K, and if λ1, . . . , λk

are the distinct eigenvalues of f , then by Proposition 22.1, the minimal polynomial m of f

must be a product of powers of the polynomials (X − λi). Actually, we claim that

m = (X − λ1) · · · (X − λk).

For this this, we just have to show that m annihilates f . However, for any eigenvector u of

f , one of the linear maps f − λiid sends u to 0, so

m(f )(u) = (f − λ1id) ◦ · · · ◦ (f − λkid)(u) = 0.

Since E is spanned by the eigenvectors of f , we conclude that

m(f ) = 0.

Therefore, if a linear map is diagonalizable, then its minimal polynomial is a product of

distinct factors of degree 1. It turns out that the converse is true, but this will take a little

work to establish it.

22.2

Minimal Polynomials of Diagonalizable

Linear Maps

In this section, we prove that if the minimal polynomial mf of a linear map f is of the form

mf = (X − λ1) · · · (X − λk)

for disctinct scalars λ1, . . . , λk ∈ K, then f is diagonalizable. This is a powerful result that

has a number of implications. We need of few properties of invariant subspaces.

Given a linear map f : E → E, recall that a subspace W of E is invariant under f if

f (u) ∈ W for all u ∈ W .

Proposition 22.2. Let W be a subspace of E invariant under the linear map f : E → E

(where E is finite-dimensional). Then, the minimal polynomial of the restriction f | W of

f to W divides the minimal polynomial of f , and the characteristic polynomial of f | W

divides the characteristic polynomial of f .

Sketch of proof. The key ingredient is that we can pick a basis (e1, . . . , en) of E in which

(e1, . . . , ek) is a basis of W . Then, the matrix of f over this basis is a block matrix of the

form

B C

A =

,

0 D

where B is a k × k matrix, D is a (n − k) × (n − k) matrix, and C is a k × (n − k) matrix.

Then

det(XI − A) = det(XI − B) det(XI − D),

592 CHAPTER 22. ANNIHILATING POLYNOMIALS; PRIMARY DECOMPOSITION

which implies the statement about the characteristic polynomials. Furthermore,

Bi C

Ai =

i

0

Di ,

for some k × (n − k) matrix Ci. It follows that any polynomial which annihilates A also

annihilates B and D. So, the minimal polynomial of B divides the minimal polynomial of

A.

For the next step, there are at least two ways to proceed. We can use an old-fashion argu-

ment using Lagrange interpolants, or use a slight generalization of the notion of annihilator.

We pick the second method because it illustrates nicely the power of principal ideals.

What we need is the notion of conductor (also called transporter).

Definition 22.2. Let f : E → E be a linear map on a finite-dimensional vector space E, let

W be an invariant subspace of f , and let u be any vector in E. The set Sf (u, W ) consisting

of all polynomials q ∈ K[X] such that q(f)(u) ∈ W is called the f-conductor of u into W .

Observe that the minimal polynomial mf of f always belongs to Sf (u, W ), so this is a

nontrivial set. Also, if W = (0), then Sf (u, (0)) is just the annihilator of f . The crucial

property of Sf (u, W ) is that it is an ideal.

Proposition 22.3. If W is an invariant subspace for f , then for each u ∈ E, the f-conductor

Sf (u, W ) is an ideal in K[X].

We leave the proof as a simple exercise, using the fact that if W invariant under f , then

W is invariant under every polynomial q(f ) in f .

Since Sf (u, W ) is an ideal, it is generated by a unique monic polynomial q of smallest

degree, and because the minimal polynomial mf of f is in Sf (u, W ), the polynomial q divides

m.

Proposition 22.4. Let f : E → E be a linear map on a finite-dimensional space E, and

assume that the minimal polynomial m of f is of the form

m = (X − λ1)r1 · · · (X − λk)rk,

where the eigenvalues λ1, . . . , λk of f belong to K. If W is a proper subspace of E which is

invariant under f , then there is a vector u ∈ E with the following properties:

(a) u /

∈ W ;

(b) (f − λid)(u) ∈ W , for some eigenvalue λ of f.

22.2. MINIMAL POLYNOMIALS OF DIAGONALIZABLE LINEAR MAPS

593

Proof. Observe that (a) and (b) together assert that the f -conductor of u into W is a

polynomial of the form X − λi. Pick any vector v ∈ E not in W , and let g be the conductor

of v into W . Since g divides m and v /

∈ W , the polynomial g is not a constant, and thus it

is of the form

g = (X − λ1)s1 · · · (X − λk)sk,

with at least some si > 0. Choose some index j such that sj > 0. Then X − λj is a factor

of g, so we can write

g = (X − λj)q.

By definition of g, the vector u = q(f )(v) cannot be in W , since otherwise g would not be

of minimal degree. However,

(f − λjid)(u) = (f − λjid)(q(f)(v))

= g(f )(v)

is in W , which concludes the proof.

We can now prove the main result of this section.

Theorem 22.5. Let f : E → E be a linear map on a finite-dimensional space E. Then f is

diagonalizable iff its minimal polynomial m is of the form

m = (X − λ1) · · · (X − λk),

where λ1, . . . , λk are distinct elements of K.

Proof. We already showed in Section 22.2 that if f is diagonalizable, then its minimal poly-

nomial is of the above form (where λ1, . . . , λk are the distinct eigenvalues of f ).

For the converse, let W be the subspace spanned by all the eigenvectors of f . If W = E,

since W is invariant under f , by Proposition 22.4, there is some vector u /

∈ W such that for

some λj, we have

(f − λjid)(u) ∈ W.

Let v = (f − λjid)(u) ∈ W . Since v ∈ W , we can write

v = w1 + · · · + wk

where f (wi) = λiwi (either wi = 0 or wi is an eigenvector for λi), and so, for every polynomial

h, we have

h(f )(v) = h(λ1)w1 + · · · + h(λk)wk,

which shows that h(f )(v) ∈ W for every polynomial h. We can write

m = (X − λj)q

594 CHAPTER 22. ANNIHILATING POLYNOMIALS; PRIMARY DECOMPOSITION

for some polynomial q, and also

q − q(λj) = p(X − λj)

for some polynomial p. We know that p(f )(v) ∈ W , and since m is the minimal polynomial

of f , we have

0 = m(f )(u) = (f − λjid)(q(f)(u)),

which implies that q(f )(u) ∈ W (either q(f)(u) = 0, or it is an eigenvector associated with

λj). However,

q(f )(u) − q(λj)u = p(f)((f − λjid)(u)) = p(f)(v),

and since p(f )(v) ∈ W and q(f)(u) ∈ W , we conclude that q(λj)u ∈ W . But, u /

∈ W , which

implies that q(λj) = 0, so λj is a double root of m, a contradiction. Therefore, we must have

W = E.

Remark: Proposition 22.4 can be used to give a quick proof of Theorem 12.4.

Using Theorem 22.5, we can give a short proof about commuting diagonalizable linear

maps. If F is a family of linear maps on a vector space E, we say that F is a commuting

family iff f ◦ g = g ◦ f for all f, g ∈ F.

Proposition 22.6. Let F be a finite commuting family of diagonalizable linear maps on a

vector space E. There exists a basis of E such that every linear map in F is represented in

that basis by a diagonal matrix.

Proof. We proceed by induction on n = dim(E). If n = 1, there is nothing to prove. If

n > 1, there are two cases. If all linear maps in F are of the form λid for some λ ∈

K, then the proposition holds trivially. In the second case, let f ∈ F be some linear

map in F which is not a scalar multiple of the identity. In this case, f has at least two

distinct eigenvalues λ1, . . . , λk, and because f is diagonalizable, E is the direct sum of the

corresponding eigenspaces Eλ , . . . , E . For every index i, the eigenspace E

is invariant

1

λk

λi

under f and under every other linear map g in F, since for any g ∈ F and any u ∈ Eλ ,

i

because f and g commute, we have

f (g(u)) = g(f (u)) = g(λiu) = λig(u)

so g(u) ∈ Eλ . Let F

. By

i

i be the family obtained by restricting each f ∈ F to Eλi

proposition 22.2, the minimal polynomial of every linear map f | Eλ in F

i

i divides the

minimal polynomial mf of f , and since f is diagonalizable, mf is a product of distinct

linear factors, so the minimal polynomial of f | Eλ is also a product of distinct linear

i

factors. By Theorem 22.5, the linear map f | Eλ is diagonalizable. Since k > 1, we have

i

dim(Eλ ) < dim(E) for i = 1, . . . , k, and by the induction hypothesis, for each i there is

i

a basis of Eλ over which f | E

is represented by a diagonal matrix. Since the above

i

λi

argument holds for all i, by combining the bases of the Eλ , we obtain a basis of E such that

i

the matrix of every linear map f ∈ F is represented by a diagonal matrix.

22.3. THE PRIMARY DECOMPOSITION THEOREM

595

Remark: Proposition 22.6 also holds for infinite commuting familes F of diagonalizable

linear maps, because E being finite dimensional, there is a finite subfamily of linearly in-

dependent linear maps in F spanning F. There is also an analogous result for commuting

families of linear maps represented by upper triangular matrices.

22.3

The Primary Decomposition Theorem

If f : E → E is a linear map and λ ∈ K is an eigenvalue of f, recall that the eigenspace Eλ

associated with λ is the kernel of the linear map λid − f. If all the eigenvalues λ1 . . . , λk of

f are in K, it may happen that

E = Eλ ⊕ · · · ⊕ E ,

1

λk

but in general there are not enough eigenvectors to span E. What if we generalize the notion

of eigenvector and look for (nonzero) vectors u such that

(λid − f)r(u) = 0, for some r ≥ 1?

Then, it turns out that if the minimal polynomial of f is of the form

m = (X − λ1)r1 · · · (X − λk)rk,

then r = ri does the job for λi; that is, if we let

Wi = Ker (λiid − f)ri,

then

E = W1 ⊕ · · · ⊕ Wk.

This result is very nice but seems to require that the eigenvalues of f all belong to K.

Actually, it is a special case of a more general result involving the factorization of the

minimal polynomial m into its irreducible monic factors (See Theorem 20.16),

m = pr1

1 · · · prk ,

k

where the pi are distinct irreducible monic polynomials over K.

Theorem 22.7. (Primary Decomposition Theorem) Let f : E → E be a linear map on the

finite-dimensional vector space E over the field K. Write the minimal polynomial m of f as

m = pr1

1 · · · prk ,

k

where the pi are distinct irreducible monic polynomials over K, and the ri are positive inte-

gers. Let

Wi = Ker (pri(f )),

i = 1, . . . , k.

i

Then

596 CHAPTER 22. ANNIHILATING POLYNOMIALS; PRIMARY DECOMPOSITION

(a) E = W1 ⊕ · · · ⊕ Wk.

(b) Each Wi is invariant under f .

(c) The minimal polynomial of the restriction f | Wi of f to Wi is pri.

i

Proof. The trick is to construct projections πi using the polynomials prj so that the range

j

of πi is equal to Wi. Let

gi = m/pri =

prj .

i

j

j=i

Note that

prig

i

i = m.

Since p1, . . . , pk are irreducible and distinct, they are relatively prime. Then, using Proposi-

tion 20.13, it is easy to show that g1, . . . , gk are relatively prime. Otherwise, some irreducible

polynomial p would divide all of g1, . . . , gk, so by Proposition 20.13 it would be equal to one

of the irreducible factors pi. But, that pi is missing from gi, a contradiction. Therefore, by

Proposition 20.14, there exist some polynomials h1, . . . , hk such that

g1h1 + · · · + gkhk = 1.

Let qi = gihi and let πi = qi(f ) = gi(f )hi(f ). We have

q1 + · · · + qk = 1,

and since m divides qiqj for i = j, we get

π1 + · · · + πk = id

πiπj = 0,

i = j.

(We implicitly used the fact that if p, q are two polynomials, the linear maps p(f ) ◦ q(f)

and q(f ) ◦ p(f) are the same since p(f) and q(f) are polynomials in the powers of f, which

commute.) Composing the first equation with πi and using the second equation, we get

π2i = πi.

Therefore, the πi are projections, and E is the direct sum of the images of the πi. Indeed,

every u ∈ E can be expressed as

u = π1(u) + · · · + πk(u).

Also, if

π1(u) + · · · + πk(u) = 0,

then by applying πi we get

0 = π2i(u) = πi(u), i = 1, . . . k.

22.3. THE PRIMARY DECOMPOSITION THEOREM

597

To finish proving (a), we need to show that

Wi = Ker (pri(f )) = π

i

i(E).

If v ∈ πi(E), then v = πi(u) for some u ∈ E, so

pri(f )(v) = pri(f )(π

i

i

i(u))

= pri(f )g

i

i(f )hi(f )(u)

= hi(f )pri(f )g

i

i(f )(u)

= hi(f )m(f )(u) = 0,

because m is the minimal polynomial of f . Therefore, v ∈ Wi.

Conversely, assume that v ∈ Wi = Ker (pri(f)). If j = i, then g

, so

i

j hj is divisible by pri

i

gj(f )hj(f )(v) = πj(v) = 0,

j = i.

Then, since π1 + · · · + πk = id, we have v = πiv, which shows that v is in the range of πi.

Therefore, Wi = Im(πi), and this finishes the proof of (a).

If pri(f )(u) = 0, then pri(f )(f (u)) = f (pri(f )(u)) = 0, so (b) holds.

i

i

i

If we write fi = f | Wi, then pri(f

(f ) = 0 on W

i

i) = 0, because pri

i

i (its kernel). Therefore,

the minimal polynomial of fi divides pri. Conversely, let q be any polynomial such that

i

q(fi) = 0 (on Wi). Since m = prig

i

i, the fact that m(f )(u) = 0 for all u ∈ E shows that

pri(f )(g

i

i(f )(u)) = 0,

u ∈ E,

and thus Im(gi(f )) ⊆ Ker (pri(f)) = W

i

i. Consequently, since q(f ) is zero on Wi,

q(f )gi(f ) = 0 for all u ∈ E.

But then, qgi is divisible by the minimal polynomial m = prig

and g

i

i of f , and since pri

i

i are

relatively prime, by Euclid’s Proposition, pri must divide q. This finishes the proof that the

i

minimal polynomial of fi is pri, which is (c).

i

If all the eigenvalues of f belong to the field K, we obtain the following result.

Theorem 22.8. (Primary Decomposition Theorem, Version 2) Let f : E → E be a lin-

ear map on the finite-dimensional vector space E over the field K. If all the eigenvalues

λ1, . . . , λk of f belong to K, write

m = (X − λ1)r1 · · · (X − λk)rk

for the minimal polynomial of f ,

χf = (X − λ1)n1 · · · (X − λk)nk

for the characteristic polynomial of f , with 1 ≤ ri ≤ ni, and let

Wi = Ker (λiid − f)ri,

i = 1, . . . , k.

Then

598 CHAPTER 22. ANNIHILATING POLYNOMIALS; PRIMARY DECOMPOSITION

(a) E = W1 ⊕ · · · ⊕ Wk.

(b) Each Wi is invariant under f .

(c) dim(Wi) = ni.

(d) The minimal polynomial of the restriction f | Wi of f to Wi is (X − λi)ri.

Proof. Parts (a), (b) and (d) have already been proved in Theorem 22.8, so it remains to

prove (c). Since Wi is invariant under f , let fi be the restriction of f to Wi. The characteristic

polynomial χf of f

i

i divides χ(f ), and since χ(f ) has all its roots in K , so does χi(f ). By

Theorem 12.4, there is a basis of Wi in which fi is represented by an upper triangular matrix,

and since (λiid − f)ri = 0, the diagonal entries of this matrix are equal to λi. Consequently,

χf = (X − λ

i

i)dim(Wi),

and since χf divides χ(f ), we conclude hat

i

dim(Wi) ≤ ni, i = 1, . . . , k.

Because E is the direct sum of the Wi, we have dim(W1) + · · · + dim(Wk) = n, and since

n1 + · · · + nk = n, we must have

dim(Wi) = ni,

i = 1, . . . , k,

proving (c).

If λ ∈ K is an eigenvalue of f, it is customary to define a generalized eigenvector of f as

a nonzero vector u ∈ E such that

(λid − f)r(u) = 0, for some r ≥ 1.

It is clear that Ker (λid − f)i ⊆ Ker (λid − f)i+1 for all i ≥ 1, and the index of λ is defined

as the smallest r ≥ 1 such that

Ker (λid − f)r = Ker (λid − f)r+1.

By Theorem 22.8(d), if λ = λi, the index of λi is equal to ri.

Another important consequence of Theorem 22.8 is that f can be written as the sum of

a diagonalizable and a nilpotent linear map (which commute). If we write

D = λ1π1 + · · · + λkπk,

where the πk are the projections defined in the proof of Theorem 22.7, since

πk + · · · + πk = id,

22.3. THE PRIMARY DECOMPOSITION THEOREM

599

we have

f = f πk + · · · + fπk,

and so we get

f − D = (f − λ1id)π1 + · · · + (f − λkid)πk.

Since the πi are polynomials in f , they commute with f , and if we write N = f − D, using

the properties of the πi, we get

N r = (f − λ1id)rπ1 + · · · + (f − λkid)rπk.

Therefore, if r = max{ri}, we have (f − λkid)r = 0 for i = 1, . . . , k, which implies that

N r = 0.

A linear map g : E → E is said to be nilpotent if there is some positive integer r such

that gr = 0.

Since N is a polynomial in f , it commutes with f , and thus with D. Since the λi are

distinct, by Theorem 22.5, the linear map D is diagonalizable, so we have shown that when

all the eigenvalues of f belong to K, there exist a diagonalizable linear map D and a nilpotent

linear map N , such that

f = D + N

DN = N D,

and N and D are polynomials in f .

A decomposition of f as above is called a Jordan decomposition. In fact, we can prove

more: The maps D and N are uniquely determined by f .

Theorem 22.9. (Jordan Decomposition) Let f : E → E be a linear map on the finite-

dimensional vector space E over the field K. If all the eigenvalues λ1, . . . , λk of f belong to

K, then there exist a diagonalizable linear map D and a nilpotent linear map N such that

f = D + N

DN = N D.

Furthermore, D and N are uniquely determined by the above equations and they are polyno-

mials in f .

Proof. We already proved the existence part. Suppose we also have f = D + N , with

D N = N D , where D is diagonalizable, N is nilpotent, and both are polynomials in f .

We need to prove that D = D and N = N .

Since D and N commute with one another and f = D + N , we see that D and N

commute with f . Then, D and N commute with any polynomial in f ; hence they commute

with D and N . From

D + N = D + N ,

600 CHAPTER 22. ANNIHILATING POLYNOMIALS; PRIMARY DECOMPOSITION

we get

D − D = N − N,

and D, D , N, N commute with one another. Since D and D are both diagonalizable and

commute, by Proposition 22.6, they are simultaneousy diagonalizable, so D − D is diago-

nalizable. Since N and N commute, by the binomial formula, for any r ≥ 1,

r

r

(N − N)r =

(−1)j

(N )r−jN j.

j

j=0

Since both N and N are nilpotent, we have N r1 = 0 and (N )r2 = 0, for some r1, r2 > 0, so

for r ≥ r1 + r2, the right-han