Chapter 22

Annihilating Polynomials; Primary Decomposition

22.1 Annihilating Polynomials and the Minimal Polynomial
In Section 5.7, we explained that if f : E → E is a linear map on a K-vector space E, then for any polynomial p(X) = a0X^d + a1X^{d−1} + · · · + ad with coefficients in the field K, we can define the linear map p(f) : E → E by
p(f) = a0f^d + a1f^{d−1} + · · · + ad·id,
where f^k = f ◦ · · · ◦ f is the k-fold composition of f with itself. Note that
p(f)(u) = a0f^d(u) + a1f^{d−1}(u) + · · · + ad·u,
for every vector u ∈ E. Then, we showed that if E is finite-dimensional and if χf (X) =
det(XI − f) is the characteristic polynomial of f, by the Cayley–Hamilton Theorem, we
have
χf (f ) = 0.
This fact suggests looking at the set of all polynomials p(X) such that
p(f ) = 0.
We say that the polynomial p(X) annihilates f . It is easy to check that the set Ann(f )
of polynomials that annihilate f is an ideal. Furthermore, when E is finite-dimensional,
the Cayley–Hamilton Theorem implies that Ann(f ) is not the zero ideal. Therefore, by
Proposition 20.9, there is a unique monic polynomial mf that generates Ann(f ). Results from Chapter 20, especially about gcd's of polynomials, will come in handy.
Definition 22.1. If f : E → E is a linear map on a finite-dimensional vector space E, the
unique monic polynomial mf that generates the ideal Ann(f ) of polynomials which annihilate
f (the annihilator of f ) is called the minimal polynomial of f .
The minimal polynomial mf of f is the monic polynomial of smallest degree that an-
nihilates f . Thus, the minimal polynomial divides the characteristic polynomial χf , and
deg(mf ) ≥ 1. For simplicity of notation, we often write m instead of mf .
If A is any n × n matrix, the set Ann(A) of polynomials that annihilate A is the set of
polynomials
p(X) = a0X^d + a1X^{d−1} + · · · + ad−1X + ad
such that
a0A^d + a1A^{d−1} + · · · + ad−1A + adI = 0.
It is clear that Ann(A) is a nonzero ideal and its unique monic generator is called the minimal
polynomial of A. We check immediately that if Q is an invertible matrix, then A and Q−1AQ
have the same minimal polynomial. Also, if A is the matrix of f with respect to some basis,
then f and A have the same minimal polynomial.
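The invariance under similarity can be checked numerically; here is a minimal sketch with assumed example matrices (not from the text), using the identity p(Q^{-1}AQ) = Q^{-1} p(A) Q:

```python
import numpy as np

# Sketch (assumed example matrices): since p(Q^{-1} A Q) = Q^{-1} p(A) Q
# for every polynomial p, the matrices A and Q^{-1} A Q are annihilated
# by exactly the same polynomials.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
Q = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.linalg.inv(Q) @ A @ Q

def p(M):
    # p(X) = X^2 - 5X - 2, the characteristic polynomial of A
    return M @ M - 5*M - 2*np.eye(2)

print(np.allclose(p(A), 0))  # True, by Cayley-Hamilton
print(np.allclose(p(B), 0))  # True: B is annihilated by the same polynomial
```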
The zeros (in K) of the minimal polynomial of f and the eigenvalues of f (in K) are
intimately related.
Proposition 22.1. Let f : E → E be a linear map on some finite-dimensional vector space
E. Then, λ ∈ K is a zero of the minimal polynomial mf of f iff λ is an eigenvalue of f iff
λ is a zero of χf . Therefore, the minimal and the characteristic polynomials have the same
zeros (in K), except for multiplicities.
Proof. First, assume that m(λ) = 0 (with λ ∈ K, and writing m instead of mf ). If so, using
polynomial division, m can be factored as
m = (X − λ)q,
with deg(q) < deg(m). Since m is the minimal polynomial, q(f ) ≠ 0, so there is some nonzero vector v ∈ E such that u = q(f )(v) ≠ 0. But then, because m is the minimal
polynomial,
0 = m(f )(v)
= (f − λid)(q(f)(v))
= (f − λid)(u),
which shows that λ is an eigenvalue of f .
Conversely, assume that λ ∈ K is an eigenvalue of f . This means that for some u ≠ 0,
we have f (u) = λu. Now, it is easy to show that
m(f )(u) = m(λ)u,
and since m is the minimal polynomial of f , we have m(f )(u) = 0, so m(λ)u = 0, and since u ≠ 0, we must have m(λ) = 0.
If we assume that f is diagonalizable, then its eigenvalues are all in K, and if λ1, . . . , λk
are the distinct eigenvalues of f , then by Proposition 22.1, the minimal polynomial m of f
must be a product of powers of the polynomials (X − λi). Actually, we claim that
m = (X − λ1) · · · (X − λk).
For this, we just have to show that m annihilates f . However, for any eigenvector u of
f , one of the linear maps f − λiid sends u to 0, so
m(f )(u) = (f − λ1id) ◦ · · · ◦ (f − λkid)(u) = 0.
Since E is spanned by the eigenvectors of f , we conclude that
m(f ) = 0.
Therefore, if a linear map is diagonalizable, then its minimal polynomial is a product of
distinct factors of degree 1. It turns out that the converse is true, but this will take a little work to establish.
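The dichotomy just described can be illustrated numerically; a short sketch with assumed example matrices (not from the text):

```python
import numpy as np

# Sketch with assumed example matrices: the diagonalizable map D is
# annihilated by the squarefree product (X - 2)(X - 3); the Jordan-type
# matrix J is not, so its minimal polynomial keeps the factor (X - 2)^2.
I = np.eye(3)
D = np.diag([2.0, 2.0, 3.0])
J = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])

print(np.allclose((D - 2*I) @ (D - 3*I), 0))  # True:  m_D = (X - 2)(X - 3)
print(np.allclose((J - 2*I) @ (J - 3*I), 0))  # False: (X - 2)(X - 3) misses J
```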
22.2 Minimal Polynomials of Diagonalizable Linear Maps
In this section, we prove that if the minimal polynomial mf of a linear map f is of the form
mf = (X − λ1) · · · (X − λk)
for distinct scalars λ1, . . . , λk ∈ K, then f is diagonalizable. This is a powerful result that has a number of implications. We need a few properties of invariant subspaces.
Given a linear map f : E → E, recall that a subspace W of E is invariant under f if
f (u) ∈ W for all u ∈ W .
Proposition 22.2. Let W be a subspace of E invariant under the linear map f : E → E
(where E is finite-dimensional). Then, the minimal polynomial of the restriction f | W of
f to W divides the minimal polynomial of f , and the characteristic polynomial of f | W
divides the characteristic polynomial of f .
Sketch of proof. The key ingredient is that we can pick a basis (e1, . . . , en) of E in which
(e1, . . . , ek) is a basis of W . Then, the matrix of f over this basis is a block matrix of the form
A = ( B  C
      0  D ),
where B is a k × k matrix, D is an (n − k) × (n − k) matrix, and C is a k × (n − k) matrix.
Then
det(XI − A) = det(XI − B) det(XI − D),
which implies the statement about the characteristic polynomials. Furthermore,
A^i = ( B^i  Ci
        0    D^i ),
for some k × (n − k) matrix Ci. It follows that any polynomial which annihilates A also annihilates B and D. So, the minimal polynomial of B divides the minimal polynomial of A.
For the next step, there are at least two ways to proceed. We can use an old-fashioned argument using Lagrange interpolants, or use a slight generalization of the notion of annihilator. We pick the second method because it illustrates nicely the power of principal ideals.
What we need is the notion of conductor (also called transporter).
Definition 22.2. Let f : E → E be a linear map on a finite-dimensional vector space E, let
W be an invariant subspace of f , and let u be any vector in E. The set Sf (u, W ) consisting
of all polynomials q ∈ K[X] such that q(f)(u) ∈ W is called the f-conductor of u into W .
Observe that the minimal polynomial mf of f always belongs to Sf (u, W ), so this is a nontrivial set. Also, if W = (0), then Sf (u, (0)) is just the ideal of polynomials that annihilate u. The crucial property of Sf (u, W ) is that it is an ideal.
Proposition 22.3. If W is an invariant subspace for f , then for each u ∈ E, the f-conductor
Sf (u, W ) is an ideal in K[X].
We leave the proof as a simple exercise, using the fact that if W is invariant under f , then
W is invariant under every polynomial q(f ) in f .
Since Sf (u, W ) is an ideal, it is generated by a unique monic polynomial q of smallest
degree, and because the minimal polynomial mf of f is in Sf (u, W ), the polynomial q divides
m.
Proposition 22.4. Let f : E → E be a linear map on a finite-dimensional space E, and
assume that the minimal polynomial m of f is of the form
m = (X − λ1)^r1 · · · (X − λk)^rk,
where the eigenvalues λ1, . . . , λk of f belong to K. If W is a proper subspace of E which is
invariant under f , then there is a vector u ∈ E with the following properties:
(a) u ∉ W ;
(b) (f − λid)(u) ∈ W , for some eigenvalue λ of f .
Proof. Observe that (a) and (b) together assert that the f -conductor of u into W is a
polynomial of the form X − λi. Pick any vector v ∈ E not in W , and let g be the conductor of v into W . Since g divides m and v ∉ W , the polynomial g is not a constant, and thus it is of the form
g = (X − λ1)^s1 · · · (X − λk)^sk,
with at least some si > 0. Choose some index j such that sj > 0. Then X − λj is a factor
of g, so we can write
g = (X − λj)q.
By definition of g, the vector u = q(f )(v) cannot be in W , since otherwise g would not be
of minimal degree. However,
(f − λjid)(u) = (f − λjid)(q(f)(v))
= g(f )(v)
is in W , which concludes the proof.
We can now prove the main result of this section.
Theorem 22.5. Let f : E → E be a linear map on a finite-dimensional space E. Then f is
diagonalizable iff its minimal polynomial m is of the form
m = (X − λ1) · · · (X − λk),
where λ1, . . . , λk are distinct elements of K.
Proof. We already showed in Section 22.1 that if f is diagonalizable, then its minimal polynomial is of the above form (where λ1, . . . , λk are the distinct eigenvalues of f ).
For the converse, let W be the subspace spanned by all the eigenvectors of f . If W ≠ E, since W is invariant under f , by Proposition 22.4, there is some vector u ∉ W such that for some λj, we have
(f − λjid)(u) ∈ W.
Let v = (f − λjid)(u) ∈ W . Since v ∈ W , we can write
v = w1 + · · · + wk
where f (wi) = λiwi (either wi = 0 or wi is an eigenvector for λi), and so, for every polynomial
h, we have
h(f )(v) = h(λ1)w1 + · · · + h(λk)wk,
which shows that h(f )(v) ∈ W for every polynomial h. We can write
m = (X − λj)q
for some polynomial q, and also
q − q(λj) = p(X − λj)
for some polynomial p. We know that p(f )(v) ∈ W , and since m is the minimal polynomial
of f , we have
0 = m(f )(u) = (f − λjid)(q(f)(u)),
which implies that q(f )(u) ∈ W (either q(f)(u) = 0, or it is an eigenvector associated with
λj). However,
q(f )(u) − q(λj)u = p(f)((f − λjid)(u)) = p(f)(v),
and since p(f )(v) ∈ W and q(f )(u) ∈ W , we conclude that q(λj)u ∈ W . But u ∉ W , which implies that q(λj) = 0, so λj is a double root of m, a contradiction. Therefore, we must have W = E.
Remark: Proposition 22.4 can be used to give a quick proof of Theorem 12.4.
Using Theorem 22.5, we can give a short proof about commuting diagonalizable linear
maps. If F is a family of linear maps on a vector space E, we say that F is a commuting
family iff f ◦ g = g ◦ f for all f, g ∈ F.
Proposition 22.6. Let F be a finite commuting family of diagonalizable linear maps on a
vector space E. There exists a basis of E such that every linear map in F is represented in
that basis by a diagonal matrix.
Proof. We proceed by induction on n = dim(E). If n = 1, there is nothing to prove. If
n > 1, there are two cases. If all linear maps in F are of the form λid for some λ ∈
K, then the proposition holds trivially. In the second case, let f ∈ F be some linear
map in F which is not a scalar multiple of the identity. In this case, f has at least two
distinct eigenvalues λ1, . . . , λk, and because f is diagonalizable, E is the direct sum of the
corresponding eigenspaces Eλ1, . . . , Eλk. For every index i, the eigenspace Eλi is invariant under f and under every other linear map g in F, since for any g ∈ F and any u ∈ Eλi, because f and g commute, we have
f (g(u)) = g(f (u)) = g(λiu) = λig(u),
so g(u) ∈ Eλi. Let Fi be the family obtained by restricting each f ∈ F to Eλi. By Proposition 22.2, the minimal polynomial of every linear map f | Eλi in Fi divides the minimal polynomial mf of f , and since f is diagonalizable, mf is a product of distinct linear factors, so the minimal polynomial of f | Eλi is also a product of distinct linear factors. By Theorem 22.5, the linear map f | Eλi is diagonalizable. Since k > 1, we have dim(Eλi) < dim(E) for i = 1, . . . , k, and by the induction hypothesis, for each i there is a basis of Eλi over which f | Eλi is represented by a diagonal matrix. Since the above argument holds for all i, by combining the bases of the Eλi, we obtain a basis of E such that the matrix of every linear map f ∈ F is represented by a diagonal matrix.
Remark: Proposition 22.6 also holds for infinite commuting families F of diagonalizable linear maps, because E being finite-dimensional, there is a finite subfamily of linearly independent linear maps in F spanning F. There is also an analogous result for commuting families of linear maps represented by upper triangular matrices.
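Proposition 22.6 can be checked on a small assumed example (two commuting symmetric, hence diagonalizable, 2 × 2 matrices); one basis of common eigenvectors diagonalizes both:

```python
import numpy as np

# Two commuting diagonalizable maps (assumed example matrices).
A = np.array([[1.0, 1.0], [1.0, 1.0]])
B = np.array([[2.0, 1.0], [1.0, 2.0]])
assert np.allclose(A @ B, B @ A)   # the family {A, B} commutes

# Columns of P are common eigenvectors (1, 1) and (1, -1):
P = np.array([[1.0, 1.0], [1.0, -1.0]])
Pinv = np.linalg.inv(P)
DA = Pinv @ A @ P
DB = Pinv @ B @ P
print(np.allclose(DA, np.diag(np.diag(DA))))  # True: A is diagonalized
print(np.allclose(DB, np.diag(np.diag(DB))))  # True: B is diagonalized too
```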
22.3 The Primary Decomposition Theorem
If f : E → E is a linear map and λ ∈ K is an eigenvalue of f , recall that the eigenspace Eλ associated with λ is the kernel of the linear map λid − f . If all the eigenvalues λ1, . . . , λk of f are in K, it may happen that
E = Eλ1 ⊕ · · · ⊕ Eλk,
but in general there are not enough eigenvectors to span E. What if we generalize the notion
of eigenvector and look for (nonzero) vectors u such that
(λid − f)^r(u) = 0, for some r ≥ 1?
Then, it turns out that if the minimal polynomial of f is of the form
m = (X − λ1)^r1 · · · (X − λk)^rk,
then r = ri does the job for λi; that is, if we let
Wi = Ker (λiid − f)^ri,
then
E = W1 ⊕ · · · ⊕ Wk.
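This decomposition can be seen in coordinates on a small assumed example (a 3 × 3 matrix with minimal polynomial (X − 2)^2(X − 3)), computing the kernels directly:

```python
from sympy import Matrix, eye

# Assumed example (not from the text): A has minimal polynomial
# m = (X - 2)^2 (X - 3), so W1 = Ker (2id - A)^2 and W2 = Ker (3id - A).
A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 3]])
I3 = eye(3)

E2 = (2*I3 - A).nullspace()        # ordinary eigenspace for lambda = 2
W1 = ((2*I3 - A)**2).nullspace()   # generalized eigenspace for lambda = 2
W2 = (3*I3 - A).nullspace()        # eigenspace for lambda = 3

# dim E2 + dim W2 = 2 < 3: eigenvectors alone do not span E,
# but dim W1 + dim W2 = 3, matching E = W1 (+) W2.
print(len(E2), len(W1), len(W2))   # 1 2 1
```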
This result is very nice but seems to require that the eigenvalues of f all belong to K.
Actually, it is a special case of a more general result involving the factorization of the
minimal polynomial m into its irreducible monic factors (see Theorem 20.16),
m = p1^r1 · · · pk^rk,
where the pi are distinct irreducible monic polynomials over K.
Theorem 22.7. (Primary Decomposition Theorem) Let f : E → E be a linear map on the
finite-dimensional vector space E over the field K. Write the minimal polynomial m of f as
m = p1^r1 · · · pk^rk,
where the pi are distinct irreducible monic polynomials over K, and the ri are positive inte-
gers. Let
Wi = Ker (pi^ri(f )), i = 1, . . . , k.
Then
(a) E = W1 ⊕ · · · ⊕ Wk.
(b) Each Wi is invariant under f .
(c) The minimal polynomial of the restriction f | Wi of f to Wi is pi^ri.
Proof. The trick is to construct projections πi using the polynomials pj^rj so that the range of πi is equal to Wi. Let
gi = m/pi^ri = ∏_{j≠i} pj^rj.
Note that
pi^ri · gi = m.
Since p1, . . . , pk are irreducible and distinct, they are relatively prime. Then, using Proposi-
tion 20.13, it is easy to show that g1, . . . , gk are relatively prime. Otherwise, some irreducible
polynomial p would divide all of g1, . . . , gk, so by Proposition 20.13 it would be equal to one
of the irreducible factors pi. But, that pi is missing from gi, a contradiction. Therefore, by
Proposition 20.14, there exist some polynomials h1, . . . , hk such that
g1h1 + · · · + gkhk = 1.
Let qi = gihi and let πi = qi(f ) = gi(f )hi(f ). We have
q1 + · · · + qk = 1,
and since m divides qiqj for i ≠ j, we get
π1 + · · · + πk = id
πiπj = 0, i ≠ j.
(We implicitly used the fact that if p, q are two polynomials, the linear maps p(f ) ◦ q(f)
and q(f ) ◦ p(f) are the same since p(f) and q(f) are polynomials in the powers of f, which
commute.) Composing the first equation with πi and using the second equation, we get
πi^2 = πi.
Therefore, the πi are projections, and E is the direct sum of the images of the πi. Indeed,
every u ∈ E can be expressed as
u = π1(u) + · · · + πk(u).
Also, if
π1(u1) + · · · + πk(uk) = 0,
then by applying πi we get
0 = πi^2(ui) = πi(ui), i = 1, . . . , k.
To finish proving (a), we need to show that
Wi = Ker (pi^ri(f )) = πi(E).
If v ∈ πi(E), then v = πi(u) for some u ∈ E, so
pi^ri(f )(v) = pi^ri(f )(πi(u))
= pi^ri(f )gi(f )hi(f )(u)
= hi(f )pi^ri(f )gi(f )(u)
= hi(f )m(f )(u) = 0,
because m is the minimal polynomial of f . Therefore, v ∈ Wi.
Conversely, assume that v ∈ Wi = Ker (pi^ri(f )). If j ≠ i, then gjhj is divisible by pi^ri, so
gj(f )hj(f )(v) = πj(v) = 0, j ≠ i.
Then, since π1 + · · · + πk = id, we have v = πi(v), which shows that v is in the range of πi. Therefore, Wi = Im(πi), and this finishes the proof of (a).
If pi^ri(f )(u) = 0, then pi^ri(f )(f (u)) = f (pi^ri(f )(u)) = 0, so (b) holds.
If we write fi = f | Wi, then pi^ri(fi) = 0, because pi^ri(f ) = 0 on Wi (its kernel). Therefore, the minimal polynomial of fi divides pi^ri. Conversely, let q be any polynomial such that q(fi) = 0 (on Wi). Since m = pi^ri · gi, the fact that m(f )(u) = 0 for all u ∈ E shows that
pi^ri(f )(gi(f )(u)) = 0, u ∈ E,
and thus Im(gi(f )) ⊆ Ker (pi^ri(f )) = Wi. Consequently, since q(f ) is zero on Wi,
q(f )(gi(f )(u)) = 0 for all u ∈ E.
But then, qgi is divisible by the minimal polynomial m = pi^ri · gi of f , and since pi^ri and gi are relatively prime, by Euclid's Proposition, pi^ri must divide q. This finishes the proof that the minimal polynomial of fi is pi^ri, which is (c).
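The projections of this proof can be computed explicitly; here is a sketch on an assumed example with minimal polynomial m = (X − 2)^2(X − 3), obtaining the Bezout coefficients h1, h2 with sympy and evaluating the polynomials qi = gi·hi at the matrix by Horner's method:

```python
from sympy import symbols, gcdex, expand, Poly, Matrix, eye, zeros

X = symbols('X')

# Assumed example: m = (X - 2)^2 (X - 3), so g1 = X - 3, g2 = (X - 2)^2.
g1, g2 = X - 3, (X - 2)**2
h1, h2, g = gcdex(g1, g2)          # Bezout: g1*h1 + g2*h2 = gcd = 1
assert g == 1 and expand(g1*h1 + g2*h2) == 1

def at(p, M):
    # Horner evaluation of the polynomial p(X) at the square matrix M
    res = zeros(*M.shape)
    for c in Poly(p, X).all_coeffs():
        res = res * M + c * eye(M.shape[0])
    return res

A = Matrix([[2, 1, 0], [0, 2, 0], [0, 0, 3]])  # m_A = (X - 2)^2 (X - 3)
pi1 = at(expand(g1*h1), A)   # projection onto W1 = Ker (A - 2I)^2
pi2 = at(expand(g2*h2), A)   # projection onto W2 = Ker (A - 3I)

print(pi1 + pi2 == eye(3))              # True
print(pi1*pi1 == pi1, pi2*pi2 == pi2)   # True True
print(pi1*pi2 == zeros(3, 3))           # True
```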
If all the eigenvalues of f belong to the field K, we obtain the following result.
Theorem 22.8. (Primary Decomposition Theorem, Version 2) Let f : E → E be a lin-
ear map on the finite-dimensional vector space E over the field K. If all the eigenvalues
λ1, . . . , λk of f belong to K, write
m = (X − λ1)^r1 · · · (X − λk)^rk
for the minimal polynomial of f ,
χf = (X − λ1)^n1 · · · (X − λk)^nk
for the characteristic polynomial of f , with 1 ≤ ri ≤ ni, and let
Wi = Ker (λiid − f)^ri, i = 1, . . . , k.
Then
(a) E = W1 ⊕ · · · ⊕ Wk.
(b) Each Wi is invariant under f .
(c) dim(Wi) = ni.
(d) The minimal polynomial of the restriction f | Wi of f to Wi is (X − λi)^ri.
Proof. Parts (a), (b) and (d) have already been proved in Theorem 22.7, so it remains to prove (c). Since Wi is invariant under f , let fi be the restriction of f to Wi. The characteristic polynomial χfi of fi divides χf , and since χf has all its roots in K, so does χfi. By Theorem 12.4, there is a basis of Wi in which fi is represented by an upper triangular matrix, and since (λiid − fi)^ri = 0, the diagonal entries of this matrix are all equal to λi. Consequently,
χfi = (X − λi)^dim(Wi),
and since χfi divides χf , we conclude that
dim(Wi) ≤ ni, i = 1, . . . , k.
Because E is the direct sum of the Wi, we have dim(W1) + · · · + dim(Wk) = n, and since
n1 + · · · + nk = n, we must have
dim(Wi) = ni,
i = 1, . . . , k,
proving (c).
If λ ∈ K is an eigenvalue of f, it is customary to define a generalized eigenvector of f as
a nonzero vector u ∈ E such that
(λid − f)^r(u) = 0, for some r ≥ 1.
It is clear that Ker (λid − f)^i ⊆ Ker (λid − f)^(i+1) for all i ≥ 1, and the index of λ is defined as the smallest r ≥ 1 such that
Ker (λid − f)^r = Ker (λid − f)^(r+1).
By Theorem 22.8(d), if λ = λi, the index of λi is equal to ri.
Another important consequence of Theorem 22.8 is that f can be written as the sum of
a diagonalizable and a nilpotent linear map (which commute). If we write
D = λ1π1 + · · · + λkπk,
where the πi are the projections defined in the proof of Theorem 22.7, since
π1 + · · · + πk = id,
we have
f = f π1 + · · · + f πk,
and so we get
f − D = (f − λ1id)π1 + · · · + (f − λkid)πk.
Since the πi are polynomials in f , they commute with f , and if we write N = f − D, using
the properties of the πi, we get
N^r = (f − λ1id)^r π1 + · · · + (f − λkid)^r πk.
Therefore, if r = max{ri}, we have (f − λiid)^r πi = 0 for i = 1, . . . , k, which implies that
N^r = 0.
A linear map g : E → E is said to be nilpotent if there is some positive integer r such that g^r = 0.
Since N is a polynomial in f , it commutes with f , and thus with D. Since the λi are
distinct, by Theorem 22.5, the linear map D is diagonalizable, so we have shown that when
all the eigenvalues of f belong to K, there exist a diagonalizable linear map D and a nilpotent
linear map N , such that
f = D + N
DN = N D,
and N and D are polynomials in f .
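Concretely, for an assumed 3 × 3 example with eigenvalues 2, 2, 3 and minimal polynomial (X − 2)^2(X − 3), the decomposition can be written down from the projections; here π2 = (A − 2I)^2 comes from the Bezout identity (X − 2)^2 − (X − 1)(X − 3) = 1, and π1 = id − π2:

```python
import numpy as np

# Assumed example: eigenvalues 2 (twice) and 3, m = (X - 2)^2 (X - 3).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
I = np.eye(3)

# Projections from the Bezout identity (X - 2)^2 - (X - 1)(X - 3) = 1:
pi2 = (A - 2*I) @ (A - 2*I)   # projects onto W2 = Ker (A - 3I)
pi1 = I - pi2                 # projects onto W1 = Ker (A - 2I)^2

D = 2*pi1 + 3*pi2             # diagonalizable part
N = A - D                     # nilpotent part

print(np.allclose(D, np.diag([2.0, 2.0, 3.0])))  # True
print(np.allclose(D @ N, N @ D))                 # True: D and N commute
print(np.allclose(N @ N, 0))                     # True: N^2 = 0
```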
A decomposition of f as above is called a Jordan decomposition. In fact, we can prove
more: The maps D and N are uniquely determined by f .
Theorem 22.9. (Jordan Decomposition) Let f : E → E be a linear map on the finite-
dimensional vector space E over the field K. If all the eigenvalues λ1, . . . , λk of f belong to
K, then there exist a diagonalizable linear map D and a nilpotent linear map N such that
f = D + N
DN = N D.
Furthermore, D and N are uniquely determined by the above equations and they are polyno-
mials in f .
Proof. We already proved the existence part. Suppose we also have f = D′ + N′, with D′N′ = N′D′, where D′ is diagonalizable and N′ is nilpotent. We need to prove that D = D′ and N = N′.
Since D′ and N′ commute with one another and f = D′ + N′, we see that D′ and N′ commute with f . Then, D′ and N′ commute with any polynomial in f ; hence they commute with D and N . From
D + N = D′ + N′,
we get
D′ − D = N − N′,
and D, D′, N, N′ commute with one another. Since D and D′ are both diagonalizable and commute, by Proposition 22.6, they are simultaneously diagonalizable, so D′ − D is diagonalizable. Since N and N′ commute, by the binomial formula, for any r ≥ 1,
(N′ − N)^r = Σ_{j=0}^{r} (−1)^j (r choose j) (N′)^{r−j} N^j.
Since both N and N′ are nilpotent, we have N^{r1} = 0 and (N′)^{r2} = 0, for some r1, r2 > 0, so for r ≥ r1 + r2, the right-hand