Basics of Algebra, Topology, and Differential Calculus, by Jean Gallier

Chapter 5

Determinants

5.1 Permutations, Signature of a Permutation

This chapter contains a review of determinants and their use in linear algebra. We begin

with permutations and the signature of a permutation. Next, we define multilinear maps

and alternating multilinear maps. Determinants are introduced as alternating multilinear

maps taking the value 1 on the unit matrix (following Emil Artin). It is then shown how

to compute a determinant using the Laplace expansion formula, and the connection with

the usual definition is made. It is shown how determinants can be used to invert matrices

and to solve (at least in theory!) systems of linear equations (the Cramer formulae). The

determinant of a linear map is defined. We conclude by defining the characteristic polynomial

of a matrix (and of a linear map) and by proving the celebrated Cayley-Hamilton theorem

which states that every matrix is a “zero” of its characteristic polynomial (we give two proofs;

one computational, the other one more conceptual).

Determinants can be defined in several ways. For example, determinants can be defined

in a fancy way in terms of the exterior algebra (or alternating algebra) of a vector space.

We will follow a more algorithmic approach due to Emil Artin. No matter which approach

is followed, we need a few preliminaries about permutations on a finite set. We need to

show that every permutation on n elements is a product of transpositions, and that the

parity of the number of transpositions involved is an invariant of the permutation. Let

[n] = {1, 2, . . . , n}, where n ∈ N, and n > 0.

Definition 5.1. A permutation on n elements is a bijection π : [n] → [n]. When n = 1, the

only function from [1] to [1] is the constant map: 1 → 1. Thus, we will assume that n ≥ 2.

A transposition is a permutation τ : [n] → [n] such that, for some i < j (with 1 ≤ i < j ≤ n),

τ (i) = j, τ (j) = i, and τ (k) = k, for all k ∈ [n] − {i, j}. In other words, a transposition

exchanges two distinct elements i, j ∈ [n]. A cyclic permutation of order k (or k-cycle) is a

permutation σ : [n] → [n] such that, for some i1, i2, . . . , ik, with 1 ≤ i1 < i2 < . . . < ik ≤ n,

and k ≥ 2,

σ(i1) = i2, . . . , σ(ik−1) = ik, σ(ik) = i1,


and σ(j) = j, for j ∈ [n] − {i1, . . . , ik}. The set {i1, . . . , ik} is called the domain of the cyclic

permutation, and the cyclic permutation is sometimes denoted by (i1, i2, . . . , ik).

If τ is a transposition, clearly, τ ◦ τ = id. Also, a cyclic permutation of order 2 is a

transposition, and for a cyclic permutation σ of order k, we have σ^k = id. Clearly, the

composition of two permutations is a permutation and every permutation has an inverse

which is also a permutation. Therefore, the set of permutations on [n] is a group often

denoted Sn. It is easy to show by induction that the group Sn has n! elements. We will

also use the terminology product of permutations (or transpositions), as a synonym for

composition of permutations.

The following proposition shows the importance of cyclic permutations and transposi-

tions.

Proposition 5.1. For every n ≥ 2, for every permutation π : [n] → [n], there is a partition

of [n] into r subsets, with 1 ≤ r ≤ n, where each set J in this partition is either a singleton

{i}, or it is of the form

J = {i, π(i), π^2(i), . . . , π^(ri−1)(i)},

where ri is the smallest integer such that π^(ri)(i) = i and 2 ≤ ri ≤ n. If π is not the

identity, then it can be written in a unique way as a composition π = σ1 ◦ . . . ◦ σs of cyclic

permutations (where 1 ≤ s ≤ r). Every permutation π : [n] → [n] can be written as a

nonempty composition of transpositions.

Proof. Consider the relation Rπ defined on [n] as follows: iRπj iff there is some k ≥ 1 such that j = π^k(i). We claim that Rπ is an equivalence relation. Transitivity is obvious. We claim that for every i ∈ [n], there is some least r (1 ≤ r ≤ n) such that π^r(i) = i. Indeed, consider the following sequence of n + 1 elements:

i, π(i), π^2(i), . . . , π^n(i).

Since [n] only has n distinct elements, there are some h, k with 0 ≤ h < k ≤ n such that

π^h(i) = π^k(i),

and since π is a bijection, this implies π^(k−h)(i) = i, where 0 ≤ k − h ≤ n. Thus, Rπ is reflexive. It is symmetric, since if j = π^k(i), letting r be the least r ≥ 1 such that π^r(i) = i, then

i = π^(kr)(i) = π^(k(r−1))(π^k(i)) = π^(k(r−1))(j).

Now, for every i ∈ [n], the equivalence class of i is a subset of [n], either the singleton {i} or a set of the form

J = {i, π(i), π^2(i), . . . , π^(ri−1)(i)},

where ri is the smallest integer such that π^(ri)(i) = i and 2 ≤ ri ≤ n, and in the second case, the restriction of π to J induces a cyclic permutation σi, and π = σ1 ◦ . . . ◦ σs, where s is the number of equivalence classes having at least two elements.


For the second part of the proposition, we proceed by induction on n. If n = 2, there are

exactly two permutations on [2], the transposition τ exchanging 1 and 2, and the identity.

However, id2 = τ^2. Now, let n ≥ 3. If π(n) = n, since by the induction hypothesis, the

restriction of π to [n − 1] can be written as a product of transpositions, π itself can be

written as a product of transpositions. If π(n) = k ≠ n, letting τ be the transposition such

that τ (n) = k and τ (k) = n, it is clear that τ ◦ π leaves n invariant, and by the induction

hypothesis, we have τ ◦ π = τm ◦ . . . ◦ τ1 for some transpositions, and thus

π = τ ◦ τm ◦ . . . ◦ τ1,

a product of transpositions (since τ ◦ τ = idn).

Remark: When π = idn is the identity permutation, we can agree that the composition of

0 transpositions is the identity. The second part of Proposition 5.1 shows that the transpo-

sitions generate the group of permutations Sn.
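The proof of Proposition 5.1 is effectively an algorithm: iterate π to find the orbits of Rπ, then factor each nontrivial cycle into transpositions. Here is a small Python sketch of that procedure (0-indexed, permutations represented as lists; the function names are mine, not the book's):

```python
def cycles(perm):
    """Orbits of the relation R_pi from Proposition 5.1 (0-indexed)."""
    seen, orbits = set(), []
    for i in range(len(perm)):
        if i in seen:
            continue
        orbit, j = [i], perm[i]
        while j != i:
            orbit.append(j)
            j = perm[j]
        seen.update(orbit)
        orbits.append(orbit)
    return orbits

def transpositions(perm):
    """Factor perm into transpositions: each cycle (i1, ..., ik)
    equals (i1 ik) o ... o (i1 i2), with (i1 i2) applied first."""
    return [(orbit[0], j) for orbit in cycles(perm) if len(orbit) > 1
            for j in orbit[1:]]

def apply_transpositions(ts, n):
    """Recompose the factorization, applying ts[0] first."""
    out = []
    for x in range(n):
        for a, b in ts:
            x = b if x == a else (a if x == b else x)
        out.append(x)
    return out
```

Recomposing the factorization recovers the original permutation, which is exactly the second statement of the proposition.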

In writing a permutation π as a composition π = σ1 ◦ . . . ◦ σs of cyclic permutations, it

is clear that the order of the σi does not matter, since their domains are disjoint. Given

a permutation written as a product of transpositions, we now show that the parity of the

number of transpositions is an invariant.

Definition 5.2. For every n ≥ 2, since every permutation π : [n] → [n] defines a partition of [n] into r subsets over which π acts either as the identity or as a cyclic permutation, let ε(π), called the signature of π, be defined by ε(π) = (−1)^(n−r), where r is the number of sets in the partition.

If τ is a transposition exchanging i and j, it is clear that the partition associated with

τ consists of n − 1 equivalence classes, the set {i, j}, and the n − 2 singleton sets {k}, for

k ∈ [n] − {i, j}, and thus, ε(τ) = (−1)^(n−(n−1)) = (−1)^1 = −1.
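Definition 5.2 translates directly into code: count the orbits r of Rπ and return (−1)^(n−r). A sketch under the same conventions as before (0-indexed lists; the name is mine):

```python
def signature(perm):
    """eps(pi) = (-1)**(n - r), where r is the number of orbits of R_pi
    (Definition 5.2); each orbit is walked exactly once."""
    n, seen, r = len(perm), set(), 0
    for i in range(n):
        if i not in seen:
            r += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return (-1) ** (n - r)
```

For a transposition there are n − 1 orbits, so the function returns −1, matching the computation above.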

Proposition 5.2. For every n ≥ 2, for every permutation π : [n] → [n], for every transpo-

sition τ , we have

ε(τ ◦ π) = −ε(π).

Consequently, for every product of transpositions such that π = τm ◦ . . . ◦ τ1, we have

ε(π) = (−1)^m,

which shows that the parity of the number of transpositions is an invariant.

Proof. Assume that τ (i) = j and τ (j) = i, where i < j. There are two cases, depending

whether i and j are in the same equivalence class Jl of Rπ, or if they are in distinct equivalence

classes. If i and j are in the same class Jl, then if

Jl = {i1, . . . , ip, . . . , iq, . . . , ik},


where ip = i and iq = j, since

τ (π(π^(−1)(ip))) = τ (ip) = τ (i) = j = iq

and

τ (π(iq−1)) = τ(iq) = τ(j) = i = ip,

it is clear that Jl splits into two subsets, one of which is {ip, . . . , iq−1}, and thus, the number

of classes associated with τ ◦ π is r + 1, and ε(τ ◦ π) = (−1)^(n−r−1) = −(−1)^(n−r) = −ε(π). If i

and j are in distinct equivalence classes Jl and Jm, say

{i1, . . . , ip, . . . , ih}

and

{j1, . . . , jq, . . . , jk},

where ip = i and jq = j, since

τ (π(π^(−1)(ip))) = τ (ip) = τ (i) = j = jq

and

τ (π(π^(−1)(jq))) = τ (jq) = τ (j) = i = ip,

we see that the classes Jl and Jm merge into a single class, and thus, the number of classes

associated with τ ◦ π is r − 1, and ε(τ ◦ π) = (−1)^(n−r+1) = −(−1)^(n−r) = −ε(π).

Now, let π = τm ◦ . . . ◦ τ1 be any product of transpositions. By the first part of the

proposition, we have

ε(π) = (−1)^(m−1) ε(τ1) = (−1)^(m−1)(−1) = (−1)^m,

since ε(τ1) = −1 for a transposition.

Remark: When π = idn is the identity permutation, since we agreed that the composition of 0 transpositions is the identity, it is still correct that ε(id) = (−1)^0 = +1. From the proposition, it is immediate that ε(π′ ◦ π) = ε(π′)ε(π). In particular, since π^(−1) ◦ π = idn, we get ε(π^(−1)) = ε(π).
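The multiplicativity ε(π′ ◦ π) = ε(π′)ε(π) can be checked exhaustively on a small symmetric group; a brute-force sketch (0-indexed, reusing the orbit-counting signature of Definition 5.2; names mine):

```python
from itertools import permutations

def signature(perm):
    """eps(pi) = (-1)**(n - r), r = number of orbits (Definition 5.2)."""
    n, seen, r = len(perm), set(), 0
    for i in range(n):
        if i not in seen:
            r += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return (-1) ** (n - r)

def compose(p, q):
    """Function composition: (p o q)(i) = p(q(i))."""
    return [p[q[i]] for i in range(len(q))]

# check eps(p o q) = eps(p) * eps(q) over all of S_4 (24 * 24 pairs)
multiplicative = all(
    signature(compose(list(p), list(q))) == signature(list(p)) * signature(list(q))
    for p in permutations(range(4))
    for q in permutations(range(4))
)
```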

We can now proceed with the definition of determinants.

5.2 Alternating Multilinear Maps

First, we define multilinear maps, symmetric multilinear maps, and alternating multilinear

maps.

Remark: Most of the definitions and results presented in this section also hold when K is

a commutative ring, and when we consider modules over K (free modules, when bases are

needed).

Let E1, . . . , En, and F , be vector spaces over a field K, where n ≥ 1.


Definition 5.3. A function f : E1 × . . . × En → F is a multilinear map (or an n-linear

map) if it is linear in each argument, holding the others fixed. More explicitly, for every i,

1 ≤ i ≤ n, for all x1 ∈ E1 . . ., xi−1 ∈ Ei−1, xi+1 ∈ Ei+1, . . ., xn ∈ En, for all x, y ∈ Ei, for all

λ ∈ K,

f (x1, . . . , xi−1, x + y, xi+1, . . . , xn) = f(x1, . . . , xi−1, x, xi+1, . . . , xn)

+ f (x1, . . . , xi−1, y, xi+1, . . . , xn),

f (x1, . . . , xi−1, λx, xi+1, . . . , xn) = λf(x1, . . . , xi−1, x, xi+1, . . . , xn).

When F = K, we call f an n-linear form (or multilinear form). If n ≥ 2 and E1 =

E2 = . . . = En, an n-linear map f : E × . . . × E → F is called symmetric, if f(x1, . . . , xn) =

f (xπ(1), . . . , xπ(n)), for every permutation π on {1, . . . , n}. An n-linear map f : E ×. . .×E →

F is called alternating, if f (x1, . . . , xn) = 0 whenever xi = xi+1, for some i, 1 ≤ i ≤ n − 1 (in

other words, when two adjacent arguments are equal). It does no harm to agree that when

n = 1, a linear map is considered to be both symmetric and alternating, and we will do so.

When n = 2, a 2-linear map f : E1 × E2 → F is called a bilinear map. We have already

seen several examples of bilinear maps. Multiplication ·: K × K → K is a bilinear map,

treating K as a vector space over itself. More generally, multiplication ·: A × A → A in a

ring A is a bilinear map, viewing A as a module over itself.

The operation ⟨−, −⟩ : E∗ × E → K applying a linear form to a vector is a bilinear map.

Symmetric bilinear maps (and multilinear maps) play an important role in geometry

(inner products, quadratic forms), and in differential calculus (partial derivatives).

A bilinear map is symmetric if f (u, v) = f (v, u), for all u, v ∈ E.

Alternating multilinear maps satisfy the following simple but crucial properties.

Proposition 5.3. Let f : E × . . . × E → F be an n-linear alternating map, with n ≥ 2. The

following properties hold:

(1)

f (. . . , xi, xi+1, . . .) = −f(. . . , xi+1, xi, . . .)

(2)

f (. . . , xi, . . . , xj, . . .) = 0,

where xi = xj, and 1 ≤ i < j ≤ n.

(3)

f (. . . , xi, . . . , xj, . . .) = −f(. . . , xj, . . . , xi, . . .),

where 1 ≤ i < j ≤ n.


(4)

f (. . . , xi, . . .) = f (. . . , xi + λxj, . . .),

for any λ ∈ K, and where i ≠ j.

Proof. (1) By multilinearity applied twice, we have

f (. . . , xi + xi+1, xi + xi+1, . . .) = f (. . . , xi, xi, . . .) + f (. . . , xi, xi+1, . . .)

+ f (. . . , xi+1, xi, . . .) + f (. . . , xi+1, xi+1, . . .),

and since f is alternating, this yields

0 = f (. . . , xi, xi+1, . . .) + f (. . . , xi+1, xi, . . .),

that is, f (. . . , xi, xi+1, . . .) = −f(. . . , xi+1, xi, . . .).

(2) If xi = xj and i and j are not adjacent, we can interchange xi and xi+1, and then xi

and xi+2, etc, until xi and xj become adjacent. By (1),

f (. . . , xi, . . . , xj, . . .) = ε f (. . . , xi, xj, . . .),

where ε = +1 or −1, but f (. . . , xi, xj, . . .) = 0, since xi = xj, and (2) holds.

(3) follows from (2) as in (1). (4) is an immediate consequence of (2).
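For n = 2, all four properties can be seen concretely on the alternating bilinear form f(x, y) = x1y2 − x2y1 on K² (which is the 2 × 2 determinant of Section 5.3); a sketch with sample vectors of my own choosing:

```python
def f(x, y):
    """An alternating bilinear form on K^2: f(x, y) = x1*y2 - x2*y1."""
    return x[0] * y[1] - x[1] * y[0]

u, v, lam = (3, 1), (2, 5), 7
swap_flips_sign = f(u, v) == -f(v, u)        # properties (1) and (3)
equal_args_vanish = f(u, u) == 0             # property (2)
shear_invariant = f((u[0] + lam * v[0], u[1] + lam * v[1]), v) == f(u, v)  # property (4)
```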

Proposition 5.3 will now be used to show a fundamental property of alternating multilin-

ear maps. First, we need to extend the matrix notation a little bit. Let E be a vector space

over K. Given an n × n matrix A = (ai j) over K, we can define a map L(A): En → En as

follows:

L(A)1(u) = a1 1u1 + · · · + a1 nun,

. . .

L(A)n(u) = an 1u1 + · · · + an nun,

for all u1, . . . , un ∈ E, with u = (u1, . . . , un). It is immediately verified that L(A) is linear.

Then, given two n × n matrices A = (ai j) and B = (bi j), by repeating the calculations

establishing the product of matrices (just before Definition 3.1), we can show that

L(AB) = L(A) ◦ L(B).

It is then convenient to use the matrix notation to describe the effect of the linear map L(A),

as

⎡L(A)1(u)⎤   ⎡a1 1  a1 2  . . .  a1 n⎤ ⎡u1⎤
⎢L(A)2(u)⎥   ⎢a2 1  a2 2  . . .  a2 n⎥ ⎢u2⎥
⎢    ⋮   ⎥ = ⎢  ⋮     ⋮     ⋱     ⋮  ⎥ ⎢ ⋮⎥
⎣L(A)n(u)⎦   ⎣an 1  an 2  . . .  an n⎦ ⎣un⎦


Lemma 5.4. Let f : E × . . . × E → F be an n-linear alternating map. Let (u1, . . . , un) and

(v1, . . . , vn) be two families of n vectors, such that,

v1 = a1 1u1 + · · · + an 1un,

. . .

vn = a1 nu1 + · · · + an nun.

Equivalently, letting

A = ⎡a1 1  a1 2  . . .  a1 n⎤
    ⎢a2 1  a2 2  . . .  a2 n⎥
    ⎢  ⋮     ⋮     ⋱     ⋮  ⎥
    ⎣an 1  an 2  . . .  an n⎦

assume that we have

⎡v1⎤      ⎡u1⎤
⎢v2⎥      ⎢u2⎥
⎢ ⋮⎥ = A⊤ ⎢ ⋮⎥
⎣vn⎦      ⎣un⎦

Then,

f (v1, . . . , vn) = ∑π∈Sn ε(π) aπ(1) 1 · · · aπ(n) n f (u1, . . . , un),

where the sum ranges over all permutations π on {1, . . . , n}.

Proof. Expanding f (v1, . . . , vn) by multilinearity, we get a sum of terms of the form

aπ(1) 1 · · · aπ(n) nf(uπ(1), . . . , uπ(n)),

for all possible functions π : {1, . . . , n} → {1, . . . , n}. However, because f is alternating, only

the terms for which π is a permutation are nonzero. By Proposition 5.1, every permutation

π is a product of transpositions, and by Proposition 5.2, the parity ε(π) of the number of

transpositions only depends on π. Then, applying Proposition 5.3 (3) to each transposition

in π, we get

aπ(1) 1 · · · aπ(n) n f(uπ(1), . . . , uπ(n)) = ε(π) aπ(1) 1 · · · aπ(n) n f(u1, . . . , un).

Thus, we get the expression of the lemma.

The quantity

det(A) = ∑π∈Sn ε(π) aπ(1) 1 · · · aπ(n) n

is in fact the value of the determinant of A (which, as we shall see shortly, is also equal to the determinant of A⊤). However, working directly with the above definition is quite awkward, and we will proceed via a slightly indirect route.
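Awkward or not, the signed sum can be evaluated directly, and doing so makes the n! growth of the formula tangible. A sketch (0-indexed, with the orbit-counting signature of Definition 5.2; names mine):

```python
from itertools import permutations

def signature(perm):
    """eps(pi) = (-1)**(n - r), r = number of orbits (Definition 5.2)."""
    n, seen, r = len(perm), set(), 0
    for i in range(n):
        if i not in seen:
            r += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return (-1) ** (n - r)

def det_leibniz(A):
    """det(A) = sum over pi in S_n of eps(pi) * a_{pi(1),1} * ... * a_{pi(n),n}."""
    n, total = len(A), 0
    for pi in permutations(range(n)):
        term = signature(list(pi))
        for j in range(n):
            term *= A[pi[j]][j]  # entry in row pi(j), column j
        total += term
    return total
```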


5.3 Definition of a Determinant

Recall that the set of all square n × n-matrices with coefficients in a field K is denoted by

Mn(K).

Definition 5.4. A determinant is defined as any map

D : Mn(K) → K,

which, when viewed as a map on (Kn)n, i.e., a map of the n columns of a matrix, is n-linear

alternating and such that D(In) = 1 for the identity matrix In. Equivalently, we can consider

a vector space E of dimension n, some fixed basis (e1, . . . , en), and define

D : En → K

as an n-linear alternating map such that D(e1, . . . , en) = 1.

First, we will show that such maps D exist, using an inductive definition that also gives

a recursive method for computing determinants. Actually, we will define a family (Dn)n≥1

of (finite) sets of maps D : Mn(K) → K. Second, we will show that determinants are in fact

uniquely defined, that is, we will show that each Dn consists of a single map. This will show

the equivalence of the direct definition det(A) of Lemma 5.4 with the inductive definition

D(A). Finally, we will prove some basic properties of determinants, using the uniqueness

theorem.

Given a matrix A ∈ Mn(K), we denote its n columns by A1, . . . , An.

Definition 5.5. For every n ≥ 1, we define a finite set Dn of maps D : Mn(K) → K

inductively as follows:

When n = 1, D1 consists of the single map D such that, D(A) = a, where A = (a), with

a ∈ K.

Assume that Dn−1 has been defined, where n ≥ 2. We define the set Dn as follows. For

every matrix A ∈ Mn(K), let Ai j be the (n − 1) × (n − 1)-matrix obtained from A = (ai j)

by deleting row i and column j. Then, Dn consists of all the maps D such that, for some i,

1 ≤ i ≤ n,

D(A) = (−1)^(i+1) ai 1 D(Ai 1) + · · · + (−1)^(i+n) ai n D(Ai n),

where for every j, 1 ≤ j ≤ n, D(Ai j) is the result of applying any D in Dn−1 to Ai j.

We confess that the use of the same letter D for the member of Dn being defined, and

for members of Dn−1, may be slightly confusing. We considered using subscripts to

distinguish, but this seems to complicate things unnecessarily. One should not worry too

much anyway, since it will turn out that each Dn contains just one map.


Each (−1)^(i+j) D(Ai j) is called the cofactor of ai j, and the inductive expression for D(A) is called a Laplace expansion of D according to the i-th row. Given a matrix A ∈ Mn(K),

each D(A) is called a determinant of A.

We can think of each member of Dn as an algorithm to evaluate “the” determinant of A.

The main point is that these algorithms, which recursively evaluate a determinant using all

possible Laplace row expansions, all yield the same result, det(A).
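Definition 5.5 is literally a recursive algorithm, and the claim that every row choice yields the same value can be tested numerically. A sketch (0-indexed, so the sign of the j-th term becomes (−1)^(i+j), which has the same parity as the book's 1-indexed version; names mine):

```python
def det_laplace(A, i=0):
    """Laplace expansion of det(A) along row i (Definition 5.5, 0-indexed)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # the minor A_ij: delete row i and column j
        minor = [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]
        total += (-1) ** (i + j) * A[i][j] * det_laplace(minor)
    return total

A = [[2, 1, 0], [1, 3, 4], [0, 5, 6]]
expansions = [det_laplace(A, i) for i in range(3)]  # one value per expansion row
```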

We will prove shortly that D(A) is uniquely defined (at the moment, it is not clear that

Dn consists of a single map). Assuming this fact, given an n × n-matrix A = (ai j),

A = ⎡a1 1  a1 2  . . .  a1 n⎤
    ⎢a2 1  a2 2  . . .  a2 n⎥
    ⎢  ⋮     ⋮     ⋱     ⋮  ⎥
    ⎣an 1  an 2  . . .  an n⎦

its determinant is denoted by D(A) or det(A), or more explicitly by

         ⎪a1 1  a1 2  . . .  a1 n⎪
         ⎪a2 1  a2 2  . . .  a2 n⎪
det(A) = ⎪  ⋮     ⋮     ⋱     ⋮  ⎪
         ⎪an 1  an 2  . . .  an n⎪

First, let us consider some examples.

Example 5.1.

1. When n = 2, if

A = ⎡a  b⎤
    ⎣c  d⎦

expanding according to any row, we have

D(A) = ad − bc.

2. When n = 3, if

A = ⎡a1 1  a1 2  a1 3⎤
    ⎢a2 1  a2 2  a2 3⎥
    ⎣a3 1  a3 2  a3 3⎦

expanding according to the first row, we have

            ⎪a2 2  a2 3⎪        ⎪a2 1  a2 3⎪        ⎪a2 1  a2 2⎪
D(A) = a1 1 ⎪a3 2  a3 3⎪ − a1 2 ⎪a3 1  a3 3⎪ + a1 3 ⎪a3 1  a3 2⎪


that is,

D(A) = a1 1(a2 2a3 3 − a3 2a2 3) − a1 2(a2 1a3 3 − a3 1a2 3) + a1 3(a2 1a3 2 − a3 1a2 2),

which gives the explicit formula

D(A) = a1 1a2 2a3 3 + a2 1a3 2a1 3 + a3 1a1 2a2 3 − a1 1a3 2a2 3 − a2 1a1 2a3 3 − a3 1a2 2a1 3.
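The six terms are exactly ε(π) aπ(1) 1 aπ(2) 2 aπ(3) 3 for the six permutations in S3, three even and three odd. A quick numerical sketch of the formula (0-indexed, so a_{i j} = A[i−1][j−1]; the name is mine):

```python
def det3(A):
    """The explicit six-term formula for a 3 x 3 determinant."""
    return (A[0][0] * A[1][1] * A[2][2] + A[1][0] * A[2][1] * A[0][2]
            + A[2][0] * A[0][1] * A[1][2] - A[0][0] * A[2][1] * A[1][2]
            - A[1][0] * A[0][1] * A[2][2] - A[2][0] * A[1][1] * A[0][2])
```

Swapping two columns flips the sign, as an alternating map must.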

We now show that each D ∈ Dn is a determinant (map).

Lemma 5.5. For every n ≥ 1, for every D ∈ Dn as defined in Definition 5.5, D is an

alternating multilinear map such that D(In) = 1.

Proof. By induction on n, it is obvious that D(In) = 1. Let us now prove that D is

multilinear. Let us show that D is linear in each column. Consider any column k. Since

D(A) = (−1)^(i+1) ai 1 D(Ai 1) + · · · + (−1)^(i+j) ai j D(Ai j) + · · · + (−1)^(i+n) ai n D(Ai n),

if j ≠ k, then by induction, D(Ai j) is linear in column k, and ai j does not belong to column k, so (−1)^(i+j) ai j D(Ai j) is linear in column k. If j = k, then D(Ai j) does not depend on column k = j, since Ai j is obtained from A by deleting row i and column j = k, and ai j belongs to column j = k. Thus, (−1)^(i+j) ai j D(Ai j) is linear in column k. Consequently, in all cases, (−1)^(i+j) ai j D(Ai j) is linear in column k, and thus, D(A) is linear in column k.

Let us now prove that D is alternating. Assume that two adjacent columns of A are equal, say Ak = Ak+1. First, let j ≠ k and j ≠ k + 1. Then, the matrix Ai j has two identical

adjacent columns, and by the induction hypothesis, D(Ai j) = 0. The remaining terms of

D(A) are

(−1)^(i+k) ai k D(Ai k) + (−1)^(i+k+1) ai k+1 D(Ai k+1).

However, the two matrices Ai k and Ai k+1 are equal, since we are assuming that columns k

and k + 1 of A are identical, and since Ai k is obtained from A by deleting row i and column

k, and Ai k+1 is obtained from A by deleting row i and column k + 1. Similarly, ai k = ai k+1,

since columns k and k + 1 of A are equal. But then,

(−1)^(i+k) ai k D(Ai k) + (−1)^(i+k+1) ai k+1 D(Ai k+1) = (−1)^(i+k) ai k D(Ai k) − (−1)^(i+k) ai k D(Ai k) = 0.

This shows that D is alternating, and completes the proof.

Lemma 5.5 shows the existence of determinants. We now prove their uniqueness.

Theorem 5.6. For every n ≥ 1, for every D ∈ Dn, for every matrix A ∈ Mn(K), we have

D(A) = ∑π∈Sn ε(π) aπ(1) 1 · · · aπ(n) n,

where the sum ranges over all permutations π on {1, . . . , n}. As a consequence, Dn consists

of a single map for every n ≥ 1, and this map is given by the above explicit formula.
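Theorem 5.6 can be spot-checked by comparing a recursive Laplace evaluation (Definition 5.5) against the explicit signed sum; a sketch combining the two (0-indexed; all helper names are mine):

```python
from itertools import permutations
from math import prod

def signature(perm):
    """eps(pi) = (-1)**(n - r), r = number of orbits (Definition 5.2)."""
    n, seen, r = len(perm), set(), 0
    for i in range(n):
        if i not in seen:
            r += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return (-1) ** (n - r)

def det_leibniz(A):
    """The explicit formula of Theorem 5.6."""
    n = len(A)
    return sum(signature(list(pi)) * prod(A[pi[j]][j] for j in range(n))
               for pi in permutations(range(n)))

def det_laplace(A, i=0):
    """Laplace expansion along row i (Definition 5.5)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (i + j) * A[i][j]
               * det_laplace([r[:j] + r[j + 1:] for k, r in enumerate(A) if k != i])
               for j in range(n))

M = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
agree = det_laplace(M) == det_leibniz(M) == -3
```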


Proof. Consider the standard basis (e1, . . . , en) of Kn, where (ei)i = 1 and (ei)j = 0, for

j ≠ i. Then, each column Aj of A corresponds to a vector vj whose coordinates over the

basis (e1, . . . , en) are the components of Aj, that is, we can write

v1 = a1 1e1 + · · · + an 1en,

. . .

vn = a1 ne1 + · · · + an nen.

Since by Lemma 5.5, each D is a multilinear alternating map, by applying Lemma 5.4, we

get

D(A) = D(v1, . . . , vn) =