3.1 Matrices
Proposition 2.13 shows that given two vector spaces E and F and a basis (uj)j∈J of E,
every linear map f : E → F is uniquely determined by the family (f(uj))j∈J of the images
under f of the vectors in the basis (uj)j∈J. Thus, in particular, taking F = K^{(J)}, we get an
isomorphism between any vector space E of dimension |J| and K^{(J)}. If J = {1, . . . , n}, a
vector space E of dimension n is isomorphic to the vector space K^n. If we also have a basis
(vi)i∈I of F , then every vector f(uj) can be written in a unique way as
$$f(u_j) = \sum_{i \in I} a_{i\,j}\, v_i,$$
where j ∈ J, for a family of scalars (ai j)i∈I. Thus, with respect to the two bases (uj)j∈J
of E and (vi)i∈I of F , the linear map f is completely determined by a possibly infinite
“I × J-matrix” M(f) = (ai j)i∈I, j∈J.
Remark: Note that we intentionally assigned the index set J to the basis (uj)j∈J of E,
and the index I to the basis (vi)i∈I of F , so that the rows of the matrix M(f) associated
with f : E → F are indexed by I, and the columns of the matrix M(f) are indexed by J.
Obviously, this causes a mildly unpleasant reversal. If we had considered the bases (ui)i∈I of
E and (vj)j∈J of F , we would obtain a J × I-matrix M(f) = (aj i)j∈J, i∈I. No matter what
we do, there will be a reversal! We decided to stick to the bases (uj)j∈J of E and (vi)i∈I of
F , so that we get an I × J-matrix M(f), knowing that we may occasionally suffer from this
decision!
When I and J are finite, and say, when |I| = m and |J| = n, the linear map f is
determined by the matrix M (f ) whose entries in the j-th column are the components of the
CHAPTER 3. MATRICES AND LINEAR MAPS
vector f (uj) over the basis (v1, . . . , vm), that is, the matrix
$$M(f) = \begin{pmatrix}
a_{1\,1} & a_{1\,2} & \cdots & a_{1\,n} \\
a_{2\,1} & a_{2\,2} & \cdots & a_{2\,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m\,1} & a_{m\,2} & \cdots & a_{m\,n}
\end{pmatrix}$$
whose entry on row i and column j is ai j (1 ≤ i ≤ m, 1 ≤ j ≤ n).
We will now show that when E and F have finite dimension, linear maps can be very
conveniently represented by matrices, and that composition of linear maps corresponds to
matrix multiplication. We will follow rather closely an elegant presentation method due to
Emil Artin.
Let E and F be two vector spaces, and assume that E has a finite basis (u1, . . . , un) and
that F has a finite basis (v1, . . . , vm). Recall that we have shown that every vector x ∈ E
can be written in a unique way as
x = x1u1 + · · · + xnun,
and similarly every vector y ∈ F can be written in a unique way as
y = y1v1 + · · · + ymvm.
Let f : E → F be a linear map between E and F . Then, for every x = x1u1 + · · · + xnun in
E, by linearity, we have
f (x) = x1f (u1) + · · · + xnf(un).
Let
f (uj) = a1 jv1 + · · · + am jvm,
or more concisely,
$$f(u_j) = \sum_{i=1}^{m} a_{i\,j}\, v_i,$$
for every j, 1 ≤ j ≤ n. This can be expressed by writing the coefficients a1j, a2j, . . . , amj of
f (uj) over the basis (v1, . . . , vm), as the jth column of a matrix, as shown below:
$$\begin{array}{c|cccc}
 & f(u_1) & f(u_2) & \cdots & f(u_n) \\
\hline
v_1 & a_{11} & a_{12} & \cdots & a_{1n} \\
v_2 & a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
v_m & a_{m1} & a_{m2} & \cdots & a_{mn}
\end{array}$$
Then, substituting the right-hand side of each f (uj) into the expression for f (x), we get
$$f(x) = x_1\left(\sum_{i=1}^{m} a_{i\,1} v_i\right) + \cdots + x_n\left(\sum_{i=1}^{m} a_{i\,n} v_i\right),$$
which, by regrouping terms to obtain a linear combination of the vi, yields
$$f(x) = \left(\sum_{j=1}^{n} a_{1\,j} x_j\right) v_1 + \cdots + \left(\sum_{j=1}^{n} a_{m\,j} x_j\right) v_m.$$
Thus, letting f (x) = y = y1v1 + · · · + ymvm, we have
$$y_i = \sum_{j=1}^{n} a_{i\,j} x_j \qquad (1)$$
for all i, 1 ≤ i ≤ m.
To make things more concrete, let us treat the case where n = 3 and m = 2. In this case,
f (u1) = a11v1 + a21v2
f (u2) = a12v1 + a22v2
f (u3) = a13v1 + a23v2,
which in matrix form is expressed by
$$\begin{array}{c|ccc}
 & f(u_1) & f(u_2) & f(u_3) \\
\hline
v_1 & a_{11} & a_{12} & a_{13} \\
v_2 & a_{21} & a_{22} & a_{23}
\end{array}$$
and for any x = x1u1 + x2u2 + x3u3, we have
f (x) = f (x1u1 + x2u2 + x3u3)
= x1f (u1) + x2f (u2) + x3f (u3)
= x1(a11v1 + a21v2) + x2(a12v1 + a22v2) + x3(a13v1 + a23v2)
= (a11x1 + a12x2 + a13x3)v1 + (a21x1 + a22x2 + a23x3)v2.
Consequently, since
y = y1v1 + y2v2,
we have
y1 = a11x1 + a12x2 + a13x3
y2 = a21x1 + a22x2 + a23x3.
This agrees with the matrix equation
$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
= \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.$$
Let us now consider how the composition of linear maps is expressed in terms of bases.
Let E, F , and G be three vector spaces with respective bases (u1, . . . , up) for E,
(v1, . . . , vn) for F , and (w1, . . . , wm) for G. Let g : E → F and f : F → G be linear maps.
As explained earlier, g : E → F is determined by the images of the basis vectors uj, and
f : F → G is determined by the images of the basis vectors vk. We would like to understand
how f ◦ g : E → G is determined by the images of the basis vectors uj.
Remark: Note that we are considering linear maps g : E → F and f : F → G, instead
of f : E → F and g : F → G, which yields the composition f ◦ g : E → G instead of
g ◦ f : E → G. Our perhaps unusual choice is motivated by the fact that if f is represented
by a matrix M (f ) = (ai k) and g is represented by a matrix M(g) = (bk j), then f ◦g : E → G
is represented by the product AB of the matrices A and B. If we had adopted the other
choice where f : E → F and g : F → G, then g ◦ f : E → G would be represented by the
product BA. Personally, we find it easier to remember the formula for the entry in row i and
column of j of the product of two matrices when this product is written by AB, rather than
BA. Obviously, this is a matter of taste! We will have to live with our perhaps unorthodox
choice.
Thus, let
$$f(v_k) = \sum_{i=1}^{m} a_{i\,k}\, w_i,$$

for every k, 1 ≤ k ≤ n, and let

$$g(u_j) = \sum_{k=1}^{n} b_{k\,j}\, v_k,$$
for every j, 1 ≤ j ≤ p; in matrix form, we have
$$\begin{array}{c|cccc}
 & f(v_1) & f(v_2) & \cdots & f(v_n) \\
\hline
w_1 & a_{11} & a_{12} & \cdots & a_{1n} \\
w_2 & a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
w_m & a_{m1} & a_{m2} & \cdots & a_{mn}
\end{array}$$

and

$$\begin{array}{c|cccc}
 & g(u_1) & g(u_2) & \cdots & g(u_p) \\
\hline
v_1 & b_{11} & b_{12} & \cdots & b_{1p} \\
v_2 & b_{21} & b_{22} & \cdots & b_{2p} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
v_n & b_{n1} & b_{n2} & \cdots & b_{np}
\end{array}$$
By previous considerations, for every
x = x1u1 + · · · + xpup,
letting g(x) = y = y1v1 + · · · + ynvn, we have
$$y_k = \sum_{j=1}^{p} b_{k\,j} x_j \qquad (2)$$
for all k, 1 ≤ k ≤ n, and for every
y = y1v1 + · · · + ynvn,
letting f (y) = z = z1w1 + · · · + zmwm, we have
$$z_i = \sum_{k=1}^{n} a_{i\,k} y_k \qquad (3)$$
for all i, 1 ≤ i ≤ m. Then, if y = g(x) and z = f(y), we have z = f(g(x)), and in view of
(2) and (3), we have
$$\begin{aligned}
z_i &= \sum_{k=1}^{n} a_{i\,k}\left(\sum_{j=1}^{p} b_{k\,j} x_j\right) \\
    &= \sum_{k=1}^{n}\sum_{j=1}^{p} a_{i\,k} b_{k\,j} x_j \\
    &= \sum_{j=1}^{p}\sum_{k=1}^{n} a_{i\,k} b_{k\,j} x_j \\
    &= \sum_{j=1}^{p}\left(\sum_{k=1}^{n} a_{i\,k} b_{k\,j}\right) x_j.
\end{aligned}$$
Thus, defining ci j such that
$$c_{i\,j} = \sum_{k=1}^{n} a_{i\,k} b_{k\,j},$$
for 1 ≤ i ≤ m, and 1 ≤ j ≤ p, we have
$$z_i = \sum_{j=1}^{p} c_{i\,j} x_j. \qquad (4)$$
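The computation above can also be checked numerically. The following Python sketch, with arbitrary made-up entries for the matrices of f and g, and m = 2, n = 3, p = 2, verifies that applying g and then f to the coordinates of x gives the same result as applying the single matrix C = (c_ij) of equation (4):

```python
# Composing the maps via equations (2) and (3) vs. using c_ij from (4).
# Sizes and entries are arbitrary illustrative values (m=2, n=3, p=2).
m, n, p = 2, 3, 2
a = [[1, 2, 3], [4, 5, 6]]       # a_ik, the matrix of f
b = [[7, 8], [9, 10], [11, 12]]  # b_kj, the matrix of g
x = [1, -1]                      # coordinates of x over (u1, ..., up)

# First g, then f: y_k = sum_j b_kj x_j, then z_i = sum_k a_ik y_k
y = [sum(b[k][j] * x[j] for j in range(p)) for k in range(n)]
z_composed = [sum(a[i][k] * y[k] for k in range(n)) for i in range(m)]

# Directly through c_ij = sum_k a_ik b_kj (equation (4))
c = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(p)]
     for i in range(m)]
z_direct = [sum(c[i][j] * x[j] for j in range(p)) for i in range(m)]

assert z_composed == z_direct
```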
Identity (4) suggests defining a multiplication operation on matrices, and we proceed to
do so. We have the following definitions.
Definition 3.1. Given a field K, an m × n-matrix is a family (ai j)1≤i≤m, 1≤j≤n of scalars in
K, represented as an array
$$\begin{pmatrix}
a_{1\,1} & a_{1\,2} & \cdots & a_{1\,n} \\
a_{2\,1} & a_{2\,2} & \cdots & a_{2\,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m\,1} & a_{m\,2} & \cdots & a_{m\,n}
\end{pmatrix}$$
In the special case where m = 1, we have a row vector, represented as

$$(a_{1\,1} \;\cdots\; a_{1\,n})$$

and in the special case where n = 1, we have a column vector, represented as

$$\begin{pmatrix} a_{1\,1} \\ \vdots \\ a_{m\,1} \end{pmatrix}$$
In these last two cases, we usually omit the constant index 1 (first index in case of a row,
second index in case of a column). The set of all m × n-matrices is denoted by Mm,n(K)
or Mm,n. An n × n-matrix is called a square matrix of dimension n. The set of all square
matrices of dimension n is denoted by Mn(K), or Mn.
Remark: As defined, a matrix A = (ai j)1≤i≤m, 1≤j≤n is a family, that is, a function from
{1, 2, . . . , m} × {1, 2, . . . , n} to K. As such, there is no reason to assume an ordering on
the indices. Thus, the matrix A can be represented in many different ways as an array, by
adopting different orders for the rows or the columns. However, it is customary (and usually
convenient) to assume the natural ordering on the sets {1, 2, . . . , m} and {1, 2, . . . , n}, and
to represent A as an array according to this ordering of the rows and columns.
We also define some operations on matrices as follows.
Definition 3.2. Given two m × n matrices A = (ai j) and B = (bi j), we define their sum
A + B as the matrix C = (ci j) such that ci j = ai j + bi j; that is,
$$\begin{pmatrix}
a_{1\,1} & a_{1\,2} & \cdots & a_{1\,n} \\
a_{2\,1} & a_{2\,2} & \cdots & a_{2\,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m\,1} & a_{m\,2} & \cdots & a_{m\,n}
\end{pmatrix}
+
\begin{pmatrix}
b_{1\,1} & b_{1\,2} & \cdots & b_{1\,n} \\
b_{2\,1} & b_{2\,2} & \cdots & b_{2\,n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{m\,1} & b_{m\,2} & \cdots & b_{m\,n}
\end{pmatrix}
=
\begin{pmatrix}
a_{1\,1}+b_{1\,1} & a_{1\,2}+b_{1\,2} & \cdots & a_{1\,n}+b_{1\,n} \\
a_{2\,1}+b_{2\,1} & a_{2\,2}+b_{2\,2} & \cdots & a_{2\,n}+b_{2\,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m\,1}+b_{m\,1} & a_{m\,2}+b_{m\,2} & \cdots & a_{m\,n}+b_{m\,n}
\end{pmatrix}$$
For any matrix A = (ai j), we let −A be the matrix (−ai j). Given a scalar λ ∈ K, we define
the matrix λA as the matrix C = (ci j) such that ci j = λai j; that is,

$$\lambda
\begin{pmatrix}
a_{1\,1} & a_{1\,2} & \cdots & a_{1\,n} \\
a_{2\,1} & a_{2\,2} & \cdots & a_{2\,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m\,1} & a_{m\,2} & \cdots & a_{m\,n}
\end{pmatrix}
=
\begin{pmatrix}
\lambda a_{1\,1} & \lambda a_{1\,2} & \cdots & \lambda a_{1\,n} \\
\lambda a_{2\,1} & \lambda a_{2\,2} & \cdots & \lambda a_{2\,n} \\
\vdots & \vdots & \ddots & \vdots \\
\lambda a_{m\,1} & \lambda a_{m\,2} & \cdots & \lambda a_{m\,n}
\end{pmatrix}$$
Given an m × n matrix A = (ai k) and an n × p matrix B = (bk j), we define their product
AB as the m × p matrix C = (ci j) such that
$$c_{i\,j} = \sum_{k=1}^{n} a_{i\,k} b_{k\,j},$$
for 1 ≤ i ≤ m, and 1 ≤ j ≤ p. In the product AB = C shown below
$$\begin{pmatrix}
a_{1\,1} & a_{1\,2} & \cdots & a_{1\,n} \\
a_{2\,1} & a_{2\,2} & \cdots & a_{2\,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m\,1} & a_{m\,2} & \cdots & a_{m\,n}
\end{pmatrix}
\begin{pmatrix}
b_{1\,1} & b_{1\,2} & \cdots & b_{1\,p} \\
b_{2\,1} & b_{2\,2} & \cdots & b_{2\,p} \\
\vdots & \vdots & \ddots & \vdots \\
b_{n\,1} & b_{n\,2} & \cdots & b_{n\,p}
\end{pmatrix}
=
\begin{pmatrix}
c_{1\,1} & c_{1\,2} & \cdots & c_{1\,p} \\
c_{2\,1} & c_{2\,2} & \cdots & c_{2\,p} \\
\vdots & \vdots & \ddots & \vdots \\
c_{m\,1} & c_{m\,2} & \cdots & c_{m\,p}
\end{pmatrix}$$
note that the entry of index i and j of the matrix AB obtained by multiplying the matrices
A and B can be identified with the product of the row matrix corresponding to the i-th row
of A with the column matrix corresponding to the j-th column of B:

$$(a_{i\,1} \;\cdots\; a_{i\,n})
\begin{pmatrix} b_{1\,j} \\ \vdots \\ b_{n\,j} \end{pmatrix}
= \sum_{k=1}^{n} a_{i\,k} b_{k\,j}.$$
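The operations of Definition 3.2 are easy to express directly. The following Python sketch implements the entrywise sum, the scalar multiple, and the product c_ij = Σ_k a_ik b_kj on small made-up matrices:

```python
# Sum, scalar multiple, and product from Definition 3.2, on small
# made-up matrices (entries are arbitrary illustrative values).
def mat_add(A, B):
    # (A + B)_ij = a_ij + b_ij
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(lam, A):
    # (lam * A)_ij = lam * a_ij
    return [[lam * a for a in row] for row in A]

def mat_mul(A, B):
    # c_ij = sum_k a_ik b_kj; A is m x n, B is n x p
    n, p = len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(mat_add(A, B))    # [[1, 3], [4, 4]]
print(mat_scale(2, A))  # [[2, 4], [6, 8]]
print(mat_mul(A, B))    # [[2, 1], [4, 3]]
```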
The square matrix In of dimension n containing 1 on the diagonal and 0 everywhere else
is called the identity matrix. It is denoted by

$$I_n = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}$$
Given an m × n matrix A = (ai j), its transpose A^⊤ = (a^⊤_{j i}) is the n × m-matrix such
that a^⊤_{j i} = ai j, for all i, 1 ≤ i ≤ m, and all j, 1 ≤ j ≤ n.
The transpose of a matrix A is sometimes denoted by A^t, or even by ^tA. Note that the
transpose A^⊤ of a matrix A has the property that the j-th row of A^⊤ is the j-th column of
A. In other words, transposition exchanges the rows and the columns of a matrix.
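In code, transposition is just an exchange of indices. A minimal Python sketch on a made-up 2 × 3 matrix:

```python
# Transpose as row/column exchange: the (j, i) entry of A^T is the
# (i, j) entry of A. The matrix here is a made-up 2x3 example.
A = [[1, 2, 3],
     [4, 5, 6]]

A_t = [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

print(A_t)  # [[1, 4], [2, 5], [3, 6]]
# The j-th row of A^T is the j-th column of A:
assert A_t[0] == [A[0][0], A[1][0]]
```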
The following observation will be useful later on when we discuss the SVD. Given any
m × n matrix A and any n × p matrix B, if we denote the columns of A by A^1, . . . , A^n and
the rows of B by B_1, . . . , B_n, then we have

$$AB = A^1 B_1 + \cdots + A^n B_n.$$
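This column-times-row expansion can be verified numerically: each term is the product of an m × 1 column of A with a 1 × p row of B, hence an m × p matrix, and the sum of these n rank-one pieces equals AB. A Python sketch with made-up entries:

```python
# Column-times-row expansion of AB, on small made-up matrices.
A = [[1, 2], [3, 4], [5, 6]]   # 3 x 2
B = [[1, 0, 2], [0, 1, 3]]     # 2 x 3
m, n, p = 3, 2, 3

# Ordinary product: c_ij = sum_k a_ik b_kj
AB = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
      for i in range(m)]

# Sum over k of (k-th column of A) times (k-th row of B)
outer_sum = [[0] * p for _ in range(m)]
for k in range(n):
    col_k = [A[i][k] for i in range(m)]  # k-th column of A, an m x 1 matrix
    row_k = B[k]                         # k-th row of B, a 1 x p matrix
    for i in range(m):
        for j in range(p):
            outer_sum[i][j] += col_k[i] * row_k[j]

assert AB == outer_sum
```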
For every square matrix A of dimension n, it is immediately verified that AIn = InA = A.
If a matrix B such that AB = BA = In exists, then it is unique, and it is called the inverse
of A. The matrix B is also denoted by A−1. An invertible matrix is also called a nonsingular
matrix, and a matrix that is not invertible is called a singular matrix.
Proposition 2.16 shows that if a square matrix A has a left inverse, that is, a matrix B
such that BA = I, or a right inverse, that is, a matrix C such that AC = I, then A is actually
invertible; so B = A−1 and C = A−1. These facts also follow from Proposition 4.14.
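A small numerical illustration of these facts: AI_n = I_nA = A, and a matrix B with AB = BA = I_n is the inverse of A. The 2 × 2 matrix and its inverse below are hand-picked illustrative values:

```python
# Identity and inverse on a small made-up invertible 2x2 matrix.
def mat_mul(A, B):
    # c_ij = sum_k a_ik b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def identity(n):
    # I_n: 1 on the diagonal, 0 elsewhere
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[2, 1], [1, 1]]
A_inv = [[1, -1], [-1, 2]]  # hand-computed inverse of A

I2 = identity(2)
assert mat_mul(A, I2) == A and mat_mul(I2, A) == A
# A matrix B with AB = BA = I is the (unique) inverse of A:
assert mat_mul(A, A_inv) == I2 and mat_mul(A_inv, A) == I2
```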
It is immediately verified that the set Mm,n(K) of m × n matrices is a vector space under
addition of matrices and multiplication of a matrix by a scalar. Consider the m × n-matrices
Ei,j = (eh k), defined such that ei j = 1, and eh k = 0 if h ≠ i or k ≠ j. It is clear that every
matrix A = (ai j) ∈ Mm,n(K) can be written in a unique way as
$$A = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{i\,j}\, E_{i,j}.$$
Thus, the family (Ei,j)1≤i≤m,1≤j≤n is a basis of the vector space Mm,n(K), which has dimension mn.
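The decomposition A = Σ_ij a_ij E_i,j can be checked mechanically. A Python sketch on a made-up 2 × 3 matrix:

```python
# Expanding a matrix over the basis (E_ij), where E_ij has a 1 in
# position (i, j) and 0 elsewhere. A is a made-up 2x3 example.
m, n = 2, 3
A = [[1, 2, 3], [4, 5, 6]]

def E(i, j):
    # The basis matrix E_ij
    return [[1 if (h, k) == (i, j) else 0 for k in range(n)]
            for h in range(m)]

# Accumulate sum_ij a_ij * E_ij entry by entry
S = [[0] * n for _ in range(m)]
for i in range(m):
    for j in range(n):
        Eij = E(i, j)
        for h in range(m):
            for k in range(n):
                S[h][k] += A[i][j] * Eij[h][k]

assert S == A  # the family (E_ij) spans M_{m,n}; here the dimension is mn = 6
```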
Remark: Definition 3.1 and Definition 3.2 also make perfect sense when K is a (commuta-
tive) ring rather than a field. In this more general setting, the framework of vector spaces
is too narrow, but we can consider structures over a commutative ring A satisfying all the
axioms of Definition 2.9. Such structures are called modules. The theory of modules is
(much) more complicated than that of vector spaces. For example, modules do not always
have a basis, and other properties holding for vector spaces usually fail for modules. When
a module has a basis, it is called a free module. For example, when A is a commutative
ring, the structure An is a module such that the vectors ei, with (ei)i = 1 and (ei)j = 0 for
j ≠ i, form a basis of An. Many properties of vector spaces still hold for An. Thus, An is a
free module. As another example, when A is a commutative ring, Mm,n(A) is a free module
with basis (Ei,j)1≤i≤m,1≤j≤n. Polynomials over a commutative ring also form a free module
of infinite dimension.
Square matrices provide a natural example of a noncommutative ring with zero divisors.
Example 3.1. For example, letting A, B be the 2 × 2-matrices
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad
B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},$$
then
$$AB = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$

and

$$BA = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.$$
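Example 3.1 is easy to replay in code: A and B are both nonzero, yet AB is the zero matrix (so A and B are zero divisors), and AB ≠ BA:

```python
# Example 3.1 in code: nonzero A, B with AB = 0 and AB != BA.
def mat_mul(A, B):
    # c_ij = sum_k a_ik b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 0], [0, 0]]
B = [[0, 0], [1, 0]]

AB = mat_mul(A, B)
BA = mat_mul(B, A)
print(AB)  # [[0, 0], [0, 0]] -- the zero matrix, although A, B are nonzero
print(BA)  # [[0, 0], [1, 0]]
assert AB != BA
```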
We now formalize the representation of linear maps by matrices.
Definition 3.3. Let E and F be two vector spaces, and let (u1, . . . , un) be a basis for E,
and (v1, . . . , vm) be a basis for F . Each vector x ∈ E expressed in the basis (u1, . . . , un) as
x = x1u1 + · · · + xnun is represented by the column matrix
$$M(x) = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$$
and similarly for each vector y ∈ F expressed in the basis (v1, . . . , vm).
Every linear map f : E → F is represented by the matrix M(f) = (ai j), where ai j is the
i-th component of the vector f (uj) over the basis (v1, . . . , vm), i.e., where
$$f(u_j) = \sum_{i=1}^{m} a_{i\,j}\, v_i, \qquad \text{for every } j,\ 1 \le j \le n.$$
The coefficients a1j, a2j, . . . , amj of f (uj) over the basis (v1, . . . , vm) form the jth column of
the matrix M (f ) shown below:
$$\begin{array}{c|cccc}
 & f(u_1) & f(u_2) & \cdots & f(u_n) \\
\hline
v_1 & a_{11} & a_{12} & \cdots & a_{1n} \\
v_2 & a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
v_m & a_{m1} & a_{m2} & \cdots & a_{mn}
\end{array}$$
The matrix M (f ) associated with the linear map f : E → F is called the matrix of f with
respect to the bases (u1, . . . , un) and (v1, . . . , vm). When E = F and the basis (v1, . . . , vm)
is identical to the basis (u1, . . . , un) of E, the matrix M(f ) associated with f : E → E (as
above) is called the matrix of f with respect to the basis (u1, . . . , un).
Remark: As in the remark after Definition 3.1, there is no reason to assume that the vectors
in the bases (u1, . . . , un) and (v1, . . . , vm) are ordered in any particular way. However, it is
often convenient to assume the natural ordering. When this is so, authors sometimes refer
to the matrix M (f ) as the matrix of f with respect to the ordered bases (u1, . . . , un) and
(v1, . . . , vm).
Then, given a linear map f : E → F represented by the matrix M(f) = (ai j) w.r.t. the
bases (u1, . . . , un) and (v1, . . . , vm), by equations (1) and the definition of matrix multiplication,
the equation y = f(x) corresponds to the matrix equation M(y) = M(f)M(x), that is,

$$\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}
= \begin{pmatrix}
a_{1\,1} & \cdots & a_{1\,n} \\
\vdots & \ddots & \vdots \\
a_{m\,1} & \cdots & a_{m\,n}
\end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$