5.1
Permutations, Signature of a Permutation
This chapter contains a review of determinants and their use in linear algebra. We begin
with permutations and the signature of a permutation. Next, we define multilinear maps
and alternating multilinear maps. Determinants are introduced as alternating multilinear
maps taking the value 1 on the unit matrix (following Emil Artin). It is then shown how
to compute a determinant using the Laplace expansion formula, and the connection with
the usual definition is made. It is shown how determinants can be used to invert matrices
and to solve (at least in theory!) systems of linear equations (the Cramer formulae). The
determinant of a linear map is defined. We conclude by defining the characteristic polynomial
of a matrix (and of a linear map) and by proving the celebrated Cayley-Hamilton theorem
which states that every matrix is a “zero” of its characteristic polynomial (we give two proofs;
one computational, the other one more conceptual).
Determinants can be defined in several ways. For example, determinants can be defined
in a fancy way in terms of the exterior algebra (or alternating algebra) of a vector space.
We will follow a more algorithmic approach due to Emil Artin. No matter which approach
is followed, we need a few preliminaries about permutations on a finite set. We need to
show that every permutation on n elements is a product of transpositions, and that the
parity of the number of transpositions involved is an invariant of the permutation. Let
[n] = {1, 2, . . . , n}, where n ∈ N, and n > 0.
Definition 5.1. A permutation on n elements is a bijection π : [n] → [n]. When n = 1, the
only function from [1] to [1] is the constant map: 1 → 1. Thus, we will assume that n ≥ 2.
A transposition is a permutation τ : [n] → [n] such that, for some i < j (with 1 ≤ i < j ≤ n),
τ (i) = j, τ (j) = i, and τ (k) = k, for all k ∈ [n] − {i, j}. In other words, a transposition
exchanges two distinct elements i, j ∈ [n]. A cyclic permutation of order k (or k-cycle) is a
permutation σ : [n] → [n] such that, for some sequence i1, i2, . . . , ik of distinct elements of [n],
with k ≥ 2,
σ(i1) = i2, . . . , σ(ik−1) = ik, σ(ik) = i1,
and σ(j) = j, for j ∈ [n] − {i1, . . . , ik}. The set {i1, . . . , ik} is called the domain of the cyclic
permutation, and the cyclic permutation is sometimes denoted by (i1, i2, . . . , ik).
If τ is a transposition, clearly, τ ◦ τ = id. Also, a cyclic permutation of order 2 is a
transposition, and for a cyclic permutation σ of order k, we have σ^k = id. Clearly, the
composition of two permutations is a permutation and every permutation has an inverse
which is also a permutation. Therefore, the set of permutations on [n] is a group often
denoted Sn. It is easy to show by induction that the group Sn has n! elements. We will
also use the terminology product of permutations (or transpositions), as a synonym for
composition of permutations.
The following proposition shows the importance of cyclic permutations and transposi-
tions.
Proposition 5.1. For every n ≥ 2, for every permutation π : [n] → [n], there is a partition
of [n] into r subsets, with 1 ≤ r ≤ n, where each set J in this partition is either a singleton
{i}, or it is of the form
J = {i, π(i), π^2(i), . . . , π^{r_i−1}(i)},
where r_i is the smallest integer such that π^{r_i}(i) = i and 2 ≤ r_i ≤ n. If π is not the
identity, then it can be written in a unique way as a composition π = σ1 ◦ . . . ◦ σs of cyclic
permutations (where 1 ≤ s ≤ r). Every permutation π : [n] → [n] can be written as a
nonempty composition of transpositions.
Proof. Consider the relation Rπ defined on [n] as follows: iRπj iff there is some k ≥ 1 such
that j = πk(i). We claim that Rπ is an equivalence relation. Transitivity is obvious. We
claim that for every i ∈ [n], there is some least r (1 ≤ r ≤ n) such that πr(i) = i. Indeed,
consider the following sequence of n + 1 elements:
i, π(i), π^2(i), . . . , π^n(i).
Since [n] only has n distinct elements, there are some h, k with 0 ≤ h < k ≤ n such that
π^h(i) = π^k(i),
and since π is a bijection, this implies π^{k−h}(i) = i, where 1 ≤ k − h ≤ n. Thus, Rπ is
reflexive. It is symmetric, since if j = π^k(i), letting r be the least r ≥ 1 such that π^r(i) = i,
then
i = π^{kr}(i) = π^{k(r−1)}(π^k(i)) = π^{k(r−1)}(j).
Now, for every i ∈ [n], the equivalence class of i is a subset of [n], either the singleton {i} or
a set of the form
J = {i, π(i), π^2(i), . . . , π^{r_i−1}(i)},
where r_i is the smallest integer such that π^{r_i}(i) = i and 2 ≤ r_i ≤ n, and in the second case,
the restriction of π to J induces a cyclic permutation σi, and π = σ1 ◦ . . . ◦ σs, where s is the
number of equivalence classes having at least two elements.
For the second part of the proposition, we proceed by induction on n. If n = 2, there are
exactly two permutations on [2], the transposition τ exchanging 1 and 2, and the identity.
However, id2 = τ^2. Now, let n ≥ 3. If π(n) = n, since by the induction hypothesis, the
restriction of π to [n − 1] can be written as a product of transpositions, π itself can be
written as a product of transpositions. If π(n) = k ≠ n, letting τ be the transposition such
that τ (n) = k and τ (k) = n, it is clear that τ ◦ π leaves n invariant, and by the induction
hypothesis, we have τ ◦ π = τm ◦ . . . ◦ τ1 for some transpositions, and thus
π = τ ◦ τm ◦ . . . ◦ τ1,
a product of transpositions (since τ ◦ τ = idn).
Remark: When π = idn is the identity permutation, we can agree that the composition of
0 transpositions is the identity. The second part of Proposition 5.1 shows that the transpo-
sitions generate the group of permutations Sn.
In writing a permutation π as a composition π = σ1 ◦ . . . ◦ σs of cyclic permutations, it
is clear that the order of the σi does not matter, since their domains are disjoint. Given
a permutation written as a product of transpositions, we now show that the parity of the
number of transpositions is an invariant.
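The cycle decomposition of Proposition 5.1 is easy to carry out mechanically. The following Python sketch (our illustration, not part of the text) represents a permutation of [n] as a dict and recovers its cycles and fixed points; the factorization of each k-cycle as (i1, . . . , ik) = (i1, ik) ◦ · · · ◦ (i1, i2), applied right to left, is one standard choice among many ways of writing a permutation as a product of transpositions.

```python
# Illustrative sketch: decompose a permutation of [n] = {1, ..., n}
# into cycles, as in Proposition 5.1. Permutations are represented
# as dicts mapping i to pi(i), with 1-based indices.

def cycle_decomposition(pi):
    """Return the cycles of pi of length >= 2, plus its fixed points."""
    n = len(pi)
    seen, cycles, fixed = set(), [], []
    for i in range(1, n + 1):
        if i in seen:
            continue
        # Follow i, pi(i), pi^2(i), ... until the orbit closes up.
        cycle, j = [], i
        while j not in seen:
            seen.add(j)
            cycle.append(j)
            j = pi[j]
        if len(cycle) >= 2:
            cycles.append(tuple(cycle))
        else:
            fixed.append(i)
    return cycles, fixed

def as_transpositions(pi):
    """Write pi as a product of transpositions: each k-cycle
    (i1, ..., ik) equals (i1, ik)(i1, ik-1)...(i1, i2),
    composed right to left."""
    cycles, _ = cycle_decomposition(pi)
    return [(c[0], c[j]) for c in cycles for j in range(len(c) - 1, 0, -1)]

# pi = (1 2 4)(3 5) on [5]
pi = {1: 2, 2: 4, 3: 5, 4: 1, 5: 3}
print(cycle_decomposition(pi))   # ([(1, 2, 4), (3, 5)], [])
print(as_transpositions(pi))     # [(1, 4), (1, 2), (3, 5)]
```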
Definition 5.2. For every n ≥ 2, since every permutation π : [n] → [n] defines a partition
of [n] into r subsets over which π acts either as the identity or as a cyclic permutation, let
ε(π), called the signature of π, be defined by ε(π) = (−1)^{n−r}, where r is the number of sets
in the partition.
If τ is a transposition exchanging i and j, it is clear that the partition associated with
τ consists of n − 1 equivalence classes, the set {i, j}, and the n − 2 singleton sets {k}, for
k ∈ [n] − {i, j}, and thus, ε(τ) = (−1)^{n−(n−1)} = (−1)^1 = −1.
Proposition 5.2. For every n ≥ 2, for every permutation π : [n] → [n], for every transpo-
sition τ , we have
ε(τ ◦ π) = −ε(π).
Consequently, for every product of transpositions such that π = τm ◦ . . . ◦ τ1, we have
ε(π) = (−1)^m,
which shows that the parity of the number of transpositions is an invariant.
Proof. Assume that τ(i) = j and τ(j) = i, where i < j. There are two cases, depending on
whether i and j are in the same equivalence class Jl of Rπ, or in distinct equivalence
classes. If i and j are in the same class Jl, then if
Jl = {i1, . . . , ip, . . . iq, . . . ik},
where ip = i and iq = j, since
τ(π(π^{−1}(i_p))) = τ(i_p) = τ(i) = j = i_q
and
τ(π(i_{q−1})) = τ(i_q) = τ(j) = i = i_p,
it is clear that Jl splits into two subsets, one of which is {ip, . . . , iq−1}, and thus, the number
of classes associated with τ ◦ π is r + 1, and ε(τ ◦ π) = (−1)^{n−r−1} = −(−1)^{n−r} = −ε(π). If i
and j are in distinct equivalence classes Jl and Jm, say
{i1, . . . , ip, . . . ih}
and
{j1, . . . , jq, . . . jk},
where ip = i and jq = j, since
τ(π(π^{−1}(i_p))) = τ(i_p) = τ(i) = j = j_q
and
τ(π(π^{−1}(j_q))) = τ(j_q) = τ(j) = i = i_p,
we see that the classes Jl and Jm merge into a single class, and thus, the number of classes
associated with τ ◦ π is r − 1, and ε(τ ◦ π) = (−1)^{n−r+1} = −(−1)^{n−r} = −ε(π).
Now, let π = τm ◦ . . . ◦ τ1 be any product of transpositions. By the first part of the
proposition, we have
ε(π) = (−1)^{m−1} ε(τ1) = (−1)^{m−1}(−1) = (−1)^m,
since ε(τ1) = −1 for a transposition.
Remark: When π = idn is the identity permutation, since we agreed that the composition
of 0 transpositions is the identity, it is still correct that ε(id) = (−1)^0 = +1. From the
proposition, it is immediate that ε(π′ ◦ π) = ε(π′) ε(π). In particular, since π^{−1} ◦ π = idn, we
get ε(π^{−1}) = ε(π).
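As a quick illustration (ours, not from the text), the signature ε(π) = (−1)^{n−r} of Definition 5.2 can be computed by counting equivalence classes, and the multiplicativity noted in the remark checked on small examples; the dict-based representation of permutations below is our own convention.

```python
# Illustrative sketch: compute eps(pi) = (-1)^(n-r) of Definition 5.2,
# where r is the number of equivalence classes (cycles and fixed points)
# of pi, and check that eps(tau o pi) = -eps(pi) for a transposition tau.

def signature(pi):
    n = len(pi)
    seen, r = set(), 0
    for i in range(1, n + 1):
        if i not in seen:
            r += 1                     # one new equivalence class
            j = i
            while j not in seen:
                seen.add(j)
                j = pi[j]
    return (-1) ** (n - r)

def compose(p2, p1):
    """(p2 o p1)(i) = p2(p1(i))."""
    return {i: p2[p1[i]] for i in p1}

tau = {1: 2, 2: 1, 3: 3}              # a transposition: eps = -1
sigma = {1: 2, 2: 3, 3: 1}            # a 3-cycle: eps = +1
print(signature(tau), signature(sigma))                     # -1 1
print(signature(compose(tau, sigma)) == -signature(sigma))  # True
```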
We can now proceed with the definition of determinants.
5.2
Alternating Multilinear Maps
First, we define multilinear maps, symmetric multilinear maps, and alternating multilinear
maps.
Remark: Most of the definitions and results presented in this section also hold when K is
a commutative ring, and when we consider modules over K (free modules, when bases are
needed).
Let E1, . . . , En, and F , be vector spaces over a field K, where n ≥ 1.
Definition 5.3. A function f : E1 × . . . × En → F is a multilinear map (or an n-linear
map) if it is linear in each argument, holding the others fixed. More explicitly, for every i,
1 ≤ i ≤ n, for all x1 ∈ E1 . . ., xi−1 ∈ Ei−1, xi+1 ∈ Ei+1, . . ., xn ∈ En, for all x, y ∈ Ei, for all
λ ∈ K,
f (x1, . . . , xi−1, x + y, xi+1, . . . , xn) = f(x1, . . . , xi−1, x, xi+1, . . . , xn)
+ f (x1, . . . , xi−1, y, xi+1, . . . , xn),
f (x1, . . . , xi−1, λx, xi+1, . . . , xn) = λf(x1, . . . , xi−1, x, xi+1, . . . , xn).
When F = K, we call f an n-linear form (or multilinear form). If n ≥ 2 and E1 =
E2 = . . . = En, an n-linear map f : E × . . . × E → F is called symmetric, if f(x1, . . . , xn) =
f (xπ(1), . . . , xπ(n)), for every permutation π on {1, . . . , n}. An n-linear map f : E ×. . .×E →
F is called alternating, if f (x1, . . . , xn) = 0 whenever xi = xi+1, for some i, 1 ≤ i ≤ n − 1 (in
other words, when two adjacent arguments are equal). It does not harm to agree that when
n = 1, a linear map is considered to be both symmetric and alternating, and we will do so.
When n = 2, a 2-linear map f : E1 × E2 → F is called a bilinear map. We have already
seen several examples of bilinear maps. Multiplication ·: K × K → K is a bilinear map,
treating K as a vector space over itself. More generally, multiplication ·: A × A → A in a
ring A is a bilinear map, viewing A as a module over itself.
The operation ⟨−, −⟩ : E^∗ × E → K applying a linear form to a vector is a bilinear map.
Symmetric bilinear maps (and multilinear maps) play an important role in geometry
(inner products, quadratic forms), and in differential calculus (partial derivatives).
A bilinear map is symmetric if f (u, v) = f (v, u), for all u, v ∈ E.
Alternating multilinear maps satisfy the following simple but crucial properties.
Proposition 5.3. Let f : E × . . . × E → F be an n-linear alternating map, with n ≥ 2. The
following properties hold:
(1)
f (. . . , xi, xi+1, . . .) = −f(. . . , xi+1, xi, . . .)
(2)
f (. . . , xi, . . . , xj, . . .) = 0,
where xi = xj, and 1 ≤ i < j ≤ n.
(3)
f (. . . , xi, . . . , xj, . . .) = −f(. . . , xj, . . . , xi, . . .),
where 1 ≤ i < j ≤ n.
(4)
f (. . . , xi, . . .) = f (. . . , xi + λxj, . . .),
for any λ ∈ K, and where i ≠ j.
Proof. (1) By multilinearity applied twice, we have
f (. . . , xi + xi+1, xi + xi+1, . . .) = f (. . . , xi, xi, . . .) + f (. . . , xi, xi+1, . . .)
+ f (. . . , xi+1, xi, . . .) + f (. . . , xi+1, xi+1, . . .),
and since f is alternating, this yields
0 = f (. . . , xi, xi+1, . . .) + f (. . . , xi+1, xi, . . .),
that is, f (. . . , xi, xi+1, . . .) = −f(. . . , xi+1, xi, . . .).
(2) If xi = xj and i and j are not adjacent, we can interchange xi and xi+1, and then xi
and xi+2, etc, until xi and xj become adjacent. By (1),
f(. . . , xi, . . . , xj, . . .) = ε f(. . . , xi, xj, . . .),
where ε = +1 or −1, but f(. . . , xi, xj, . . .) = 0, since xi = xj, and (2) holds.
(3) follows from (2) as in (1). (4) is an immediate consequence of (2).
Proposition 5.3 will now be used to show a fundamental property of alternating multilin-
ear maps. First, we need to extend the matrix notation a little bit. Let E be a vector space
over K. Given an n × n matrix A = (ai j) over K, we can define a map L(A): En → En as
follows:
L(A)1(u) = a1 1u1 + · · · + a1 nun,
. . .
L(A)n(u) = an 1u1 + · · · + an nun,
for all u1, . . . , un ∈ E, with u = (u1, . . . , un). It is immediately verified that L(A) is linear.
Then, given two n × n matrices A = (ai j) and B = (bi j), by repeating the calculations
establishing the product of matrices (just before Definition 3.1), we can show that
L(AB) = L(A) ◦ L(B).
It is then convenient to use the matrix notation to describe the effect of the linear map L(A),
as
\[
\begin{pmatrix} L(A)_1(u) \\ L(A)_2(u) \\ \vdots \\ L(A)_n(u) \end{pmatrix}
=
\begin{pmatrix}
a_{1\,1} & a_{1\,2} & \dots & a_{1\,n} \\
a_{2\,1} & a_{2\,2} & \dots & a_{2\,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n\,1} & a_{n\,2} & \dots & a_{n\,n}
\end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}.
\]
Lemma 5.4. Let f : E × . . . × E → F be an n-linear alternating map. Let (u1, . . . , un) and
(v1, . . . , vn) be two families of n vectors, such that,
v1 = a1 1u1 + · · · + an 1un,
. . .
vn = a1 nu1 + · · · + an nun.
Equivalently, letting
\[
A = \begin{pmatrix}
a_{1\,1} & a_{1\,2} & \dots & a_{1\,n} \\
a_{2\,1} & a_{2\,2} & \dots & a_{2\,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n\,1} & a_{n\,2} & \dots & a_{n\,n}
\end{pmatrix},
\]
assume that we have
\[
\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}
= A^{\top} \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}.
\]
Then,
\[
f(v_1, \dots, v_n) = \Bigl(\sum_{\pi \in S_n} \varepsilon(\pi)\, a_{\pi(1)\,1} \cdots a_{\pi(n)\,n}\Bigr) f(u_1, \dots, u_n),
\]
where the sum ranges over all permutations π on {1, . . . , n}.
Proof. Expanding f (v1, . . . , vn) by multilinearity, we get a sum of terms of the form
aπ(1) 1 · · · aπ(n) nf(uπ(1), . . . , uπ(n)),
for all possible functions π : {1, . . . , n} → {1, . . . , n}. However, because f is alternating, only
the terms for which π is a permutation are nonzero. By Proposition 5.1, every permutation
π is a product of transpositions, and by Proposition 5.2, the parity (π) of the number of
transpositions only depends on π. Then, applying Proposition 5.3 (3) to each transposition
in π, we get
aπ(1) 1 · · · aπ(n) n f(uπ(1), . . . , uπ(n)) = ε(π) aπ(1) 1 · · · aπ(n) n f(u1, . . . , un).
Thus, we get the expression of the lemma.
The quantity
\[
\det(A) = \sum_{\pi \in S_n} \varepsilon(\pi)\, a_{\pi(1)\,1} \cdots a_{\pi(n)\,n}
\]
is in fact the value of the determinant of A (which, as we shall see shortly, is also equal to the
determinant of A⊤). However, working directly with the above definition is quite awkward,
and we will proceed via a slightly indirect route.
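Although we will not take the explicit formula as our definition, it can be evaluated directly for small n. The sketch below (our illustration, not from the text) sums ε(π) aπ(1) 1 · · · aπ(n) n over all n! permutations; since the number of terms grows as n!, this is hopeless for large n, which is one reason to develop other means of computing determinants.

```python
# Illustrative sketch: the explicit formula
#   det(A) = sum over pi in S_n of eps(pi) a_{pi(1) 1} ... a_{pi(n) n},
# implemented directly. A is a list of rows; indices are 0-based in
# code, so the factor a_{pi(j) j} becomes A[perm[j]][j].

from itertools import permutations

def sign(perm):
    """Signature via (-1)^(n - number of cycles), 0-based."""
    n, seen, r = len(perm), set(), 0
    for i in range(n):
        if i not in seen:
            r += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return (-1) ** (n - r)

def det_permutation_sum(A):
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        prod = 1
        for j in range(n):
            prod *= A[perm[j]][j]     # a_{pi(j+1) j+1}
        total += sign(perm) * prod
    return total

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]
print(det_permutation_sum(A))   # -3
```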
5.3
Definition of a Determinant
Recall that the set of all square n × n-matrices with coefficients in a field K is denoted by
Mn(K).
Definition 5.4. A determinant is defined as any map
D : Mn(K) → K,
which, when viewed as a map on (Kn)n, i.e., a map of the n columns of a matrix, is n-linear
alternating and such that D(In) = 1 for the identity matrix In. Equivalently, we can consider
a vector space E of dimension n, some fixed basis (e1, . . . , en), and define
D : En → K
as an n-linear alternating map such that D(e1, . . . , en) = 1.
First, we will show that such maps D exist, using an inductive definition that also gives
a recursive method for computing determinants. Actually, we will define a family (Dn)n≥1
of (finite) sets of maps D : Mn(K) → K. Second, we will show that determinants are in fact
uniquely defined, that is, we will show that each Dn consists of a single map. This will show
the equivalence of the direct definition det(A) of Lemma 5.4 with the inductive definition
D(A). Finally, we will prove some basic properties of determinants, using the uniqueness
theorem.
Given a matrix A ∈ Mn(K), we denote its n columns by A1, . . . , An.
Definition 5.5. For every n ≥ 1, we define a finite set Dn of maps D : Mn(K) → K
inductively as follows:
When n = 1, D1 consists of the single map D such that, D(A) = a, where A = (a), with
a ∈ K.
Assume that Dn−1 has been defined, where n ≥ 2. We define the set Dn as follows. For
every matrix A ∈ Mn(K), let Ai j be the (n − 1) × (n − 1)-matrix obtained from A = (ai j)
by deleting row i and column j. Then, Dn consists of all the maps D such that, for some i,
1 ≤ i ≤ n,
D(A) = (−1)i+1ai 1D(Ai 1) + · · · + (−1)i+nai nD(Ai n),
where for every j, 1 ≤ j ≤ n, D(Ai j) is the result of applying any D in Dn−1 to Ai j.
We confess that the use of the same letter D for the member of Dn being defined, and
for members of Dn−1, may be slightly confusing. We considered using subscripts to
distinguish, but this seems to complicate things unnecessarily. One should not worry too
much anyway, since it will turn out that each Dn contains just one map.
Each (−1)i+jD(Ai j) is called the cofactor of ai j, and the inductive expression for D(A)
is called a Laplace expansion of D according to the i-th row . Given a matrix A ∈ Mn(K),
each D(A) is called a determinant of A.
We can think of each member of Dn as an algorithm to evaluate “the” determinant of A.
The main point is that these algorithms, which recursively evaluate a determinant using all
possible Laplace row expansions, all yield the same result, det(A).
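The recursive algorithm implicit in Definition 5.5 can be sketched as follows (our illustration, not from the text). Expanding along different rows indeed yields the same value, as the uniqueness theorem proved below guarantees.

```python
# Illustrative sketch: the recursive Laplace expansion of
# Definition 5.5 along an arbitrary row i,
#   D(A) = sum_j (-1)^(i+j) a_{i j} D(A_{i j}).
# Indices are 0-based in code.

def minor(A, i, j):
    """A_{i j}: the matrix obtained by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def laplace_det(A, row=0):
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (row + j) * A[row][j] * laplace_det(minor(A, row, j))
               for j in range(n))

A = [[2, 0, 1],
     [1, 3, 5],
     [0, 4, 2]]
# All three row expansions agree, as the text asserts:
print([laplace_det(A, row=i) for i in range(3)])   # [-24, -24, -24]
```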
We will prove shortly that D(A) is uniquely defined (at the moment, it is not clear that
Dn consists of a single map). Assuming this fact, given an n × n matrix A = (ai j),
\[
A = \begin{pmatrix}
a_{1\,1} & a_{1\,2} & \dots & a_{1\,n} \\
a_{2\,1} & a_{2\,2} & \dots & a_{2\,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n\,1} & a_{n\,2} & \dots & a_{n\,n}
\end{pmatrix},
\]
its determinant is denoted by D(A) or det(A), or more explicitly by
\[
\det(A) = \begin{vmatrix}
a_{1\,1} & a_{1\,2} & \dots & a_{1\,n} \\
a_{2\,1} & a_{2\,2} & \dots & a_{2\,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n\,1} & a_{n\,2} & \dots & a_{n\,n}
\end{vmatrix}.
\]
Let us first consider some examples.
Example 5.1.
1. When n = 2, if
\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},
\]
expanding according to any row, we have
D(A) = ad − bc.
2. When n = 3, if
\[
A = \begin{pmatrix}
a_{1\,1} & a_{1\,2} & a_{1\,3} \\
a_{2\,1} & a_{2\,2} & a_{2\,3} \\
a_{3\,1} & a_{3\,2} & a_{3\,3}
\end{pmatrix},
\]
expanding according to the first row, we have
\[
D(A) = a_{1\,1} \begin{vmatrix} a_{2\,2} & a_{2\,3} \\ a_{3\,2} & a_{3\,3} \end{vmatrix}
- a_{1\,2} \begin{vmatrix} a_{2\,1} & a_{2\,3} \\ a_{3\,1} & a_{3\,3} \end{vmatrix}
+ a_{1\,3} \begin{vmatrix} a_{2\,1} & a_{2\,2} \\ a_{3\,1} & a_{3\,2} \end{vmatrix},
\]
that is,
D(A) = a1 1(a2 2a3 3 − a3 2a2 3) − a1 2(a2 1a3 3 − a3 1a2 3) + a1 3(a2 1a3 2 − a3 1a2 2),
which gives the explicit formula
D(A) = a1 1a2 2a3 3 + a2 1a3 2a1 3 + a3 1a1 2a2 3 − a1 1a3 2a2 3 − a2 1a1 2a3 3 − a3 1a2 2a1 3.
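As a numerical sanity check (ours, not from the text), the explicit 3 × 3 formula above agrees with the first-row Laplace expansion on any concrete matrix:

```python
# Illustrative check of the explicit 3x3 formula against the
# first-row cofactor expansion. a[i][j] holds a_{i+1, j+1}.

def det3_explicit(a):
    return (a[0][0]*a[1][1]*a[2][2] + a[1][0]*a[2][1]*a[0][2]
            + a[2][0]*a[0][1]*a[1][2] - a[0][0]*a[2][1]*a[1][2]
            - a[1][0]*a[0][1]*a[2][2] - a[2][0]*a[1][1]*a[0][2])

def det3_laplace(a):
    return (a[0][0]*(a[1][1]*a[2][2] - a[2][1]*a[1][2])
            - a[0][1]*(a[1][0]*a[2][2] - a[2][0]*a[1][2])
            + a[0][2]*(a[1][0]*a[2][1] - a[2][0]*a[1][1]))

A = [[3, 1, 4],
     [1, 5, 9],
     [2, 6, 5]]
print(det3_explicit(A), det3_laplace(A))   # -90 -90
```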
We now show that each D ∈ Dn is a determinant (map).
Lemma 5.5. For every n ≥ 1, for every D ∈ Dn as defined in Definition 5.5, D is an
alternating multilinear map such that D(In) = 1.
Proof. By induction on n, it is obvious that D(In) = 1. Let us now prove that D is
multilinear. Let us show that D is linear in each column. Consider any column k. Since
D(A) = (−1)i+1ai 1D(Ai 1) + · · · + (−1)i+jai jD(Ai j) + · · · + (−1)i+nai nD(Ai n),
if j ≠ k, then by induction, D(Ai j) is linear in column k, and ai j does not belong to column
k, so (−1)^{i+j} ai j D(Ai j) is linear in column k. If j = k, then D(Ai j) does not depend on
column k = j, since Ai j is obtained from A by deleting row i and column j = k, and ai j
belongs to column j = k. Thus, (−1)^{i+j} ai j D(Ai j) is linear in column k. Consequently, in
all cases, (−1)i+jai jD(Ai j) is linear in column k, and thus, D(A) is linear in column k.
Let us now prove that D is alternating. Assume that two adjacent columns of A are equal,
say Ak = Ak+1. First, let j ≠ k and j ≠ k + 1. Then, the matrix Ai j has two identical
adjacent columns, and by the induction hypothesis, D(Ai j) = 0. The remaining terms of
D(A) are
(−1)i+kai kD(Ai k) + (−1)i+k+1ai k+1D(Ai k+1).
However, the two matrices Ai k and Ai k+1 are equal, since we are assuming that columns k
and k + 1 of A are identical, and since Ai k is obtained from A by deleting row i and column
k, and Ai k+1 is obtained from A by deleting row i and column k + 1. Similarly, ai k = ai k+1,
since columns k and k + 1 of A are equal. But then,
(−1)i+kai kD(Ai k) + (−1)i+k+1ai k+1D(Ai k+1) = (−1)i+kai kD(Ai k) − (−1)i+kai kD(Ai k) = 0.
This shows that D is alternating, and completes the proof.
Lemma 5.5 shows the existence of determinants. We now prove their uniqueness.
Theorem 5.6. For every n ≥ 1, for every D ∈ Dn, for every matrix A ∈ Mn(K), we have
\[
D(A) = \sum_{\pi \in S_n} \varepsilon(\pi)\, a_{\pi(1)\,1} \cdots a_{\pi(n)\,n},
\]
where the sum ranges over all permutations π on {1, . . . , n}. As a consequence, Dn consists
of a single map for every n ≥ 1, and this map is given by the above explicit formula.
Proof. Consider the standard basis (e1, . . . , en) of K^n, where (ei)i = 1 and (ei)j = 0, for
j ≠ i. Then, each column Aj of A corresponds to a vector vj whose coordinates over the
basis (e1, . . . , en) are the components of Aj, that is, we can write
v1 = a1 1e1 + · · · + an 1en,
. . .
vn = a1 ne1 + · · · + an nen.
Since by Lemma 5.5, each D is a multilinear alternating map, by applying Lemma 5.4, we
get
D(A) = D(v1, . . . , vn) =