7.1 Normed Vector Spaces
In order to define how close two vectors or two matrices are, and in order to define the
convergence of sequences of vectors or matrices, we can use the notion of a norm. Recall
that R+ = {x ∈ R | x ≥ 0}. Also recall that if z = a + ib ∈ C is a complex number, with
a, b ∈ R, then z̄ = a − ib and |z| = √(a^2 + b^2) (|z| is the modulus of z).
Definition 7.1. Let E be a vector space over a field K, where K is either the field R of
reals, or the field C of complex numbers. A norm on E is a function ‖·‖ : E → R+, assigning
a nonnegative real number ‖u‖ to any vector u ∈ E, and satisfying the following conditions
for all x, y ∈ E and all λ ∈ K:
(N1) ‖x‖ ≥ 0, and ‖x‖ = 0 iff x = 0.   (positivity)
(N2) ‖λx‖ = |λ| ‖x‖.   (scaling)
(N3) ‖x + y‖ ≤ ‖x‖ + ‖y‖.   (triangle inequality)
A vector space E together with a norm ‖·‖ is called a normed vector space.
From (N3), we easily get
    | ‖x‖ − ‖y‖ | ≤ ‖x − y‖.
Let us give some examples of normed vector spaces.
Example 7.1.
1. Let E = R, and ‖x‖ = |x|, the absolute value of x.
2. Let E = C, and ‖z‖ = |z|, the modulus of z.
3. Let E = Rⁿ (or E = Cⁿ). There are three standard norms. For every (x_1, . . . , x_n) ∈ E,
we have the norm ‖x‖_1, defined such that
    ‖x‖_1 = |x_1| + · · · + |x_n|,
we have the Euclidean norm ‖x‖_2, defined such that
    ‖x‖_2 = (|x_1|^2 + · · · + |x_n|^2)^{1/2},
and the sup-norm ‖x‖_∞, defined such that
    ‖x‖_∞ = max{ |x_i| | 1 ≤ i ≤ n }.
More generally, we define the p-norm (for p ≥ 1) by
    ‖x‖_p = (|x_1|^p + · · · + |x_n|^p)^{1/p}.
There are other norms besides the p-norms; we urge the reader to find such norms.
Some work is required to show the triangle inequality for the p-norm.
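Before turning to the proof, these definitions translate directly into code. The following is a minimal numerical sketch using NumPy (the helper name p_norm and the sample vector are ours, for illustration only); it computes the 1-, 2-, ∞-, and general p-norms and cross-checks them against numpy.linalg.norm.

```python
import numpy as np

def p_norm(x, p):
    """Compute the p-norm (|x_1|^p + ... + |x_n|^p)^(1/p) for p >= 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0, 12.0])

norm_1 = np.sum(np.abs(x))                  # ||x||_1 = |x_1| + ... + |x_n|
norm_2 = np.sqrt(np.sum(np.abs(x) ** 2))    # Euclidean norm ||x||_2
norm_inf = np.max(np.abs(x))                # sup-norm ||x||_inf

# Cross-check against NumPy's built-in implementations.
assert np.isclose(norm_1, np.linalg.norm(x, 1))
assert np.isclose(norm_2, np.linalg.norm(x, 2))
assert np.isclose(norm_inf, np.linalg.norm(x, np.inf))
assert np.isclose(p_norm(x, 3), np.linalg.norm(x, 3))
print(norm_1, norm_2, norm_inf)  # 19.0 13.0 12.0
```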
Proposition 7.1. If E is a finite-dimensional vector space over R or C, for every real
number p ≥ 1, the p-norm is indeed a norm.
Proof. The cases p = 1 and p = ∞ are easy and left to the reader. If p > 1, then let q > 1
be such that
    1/p + 1/q = 1.
We will make use of the following fact: for all α, β ∈ R, if α, β ≥ 0, then
    αβ ≤ α^p/p + β^q/q.   (∗)
To prove the above inequality, we use the fact that the exponential function t ↦ e^t satisfies
the following convexity inequality:
    e^{θx + (1−θ)y} ≤ θe^x + (1 − θ)e^y,
for all x, y ∈ R and all θ with 0 ≤ θ ≤ 1.
Since the case αβ = 0 is trivial, let us assume that α > 0 and β > 0. If we replace θ by
1/p, x by p log α and y by q log β, then we get
    e^{(1/p) p log α + (1/q) q log β} ≤ (1/p) e^{p log α} + (1/q) e^{q log β},
which simplifies to
    αβ ≤ α^p/p + β^q/q,
as claimed.
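As an aside, the inequality (∗) is known as Young's inequality, and it is easy to spot-check numerically. A minimal sketch (the choice of p, the sample range, and the tolerance are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3.0
q = p / (p - 1.0)  # conjugate exponent: 1/p + 1/q = 1

# Spot-check Young's inequality (*) on random nonnegative pairs.
for _ in range(1000):
    a, b = rng.uniform(0, 10, size=2)
    assert a * b <= a**p / p + b**q / q + 1e-12
```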
We will now prove that for any two vectors u, v ∈ E, we have
    Σ_{i=1}^n |u_i v_i| ≤ ‖u‖_p ‖v‖_q.   (∗∗)
Since the above is trivial if u = 0 or v = 0, let us assume that u ≠ 0 and v ≠ 0. Then, the
inequality (∗) with α = |u_i|/‖u‖_p and β = |v_i|/‖v‖_q yields
    |u_i v_i| / (‖u‖_p ‖v‖_q) ≤ |u_i|^p / (p ‖u‖_p^p) + |v_i|^q / (q ‖v‖_q^q),
for i = 1, . . . , n, and by summing up these inequalities, we get
    Σ_{i=1}^n |u_i v_i| ≤ ‖u‖_p ‖v‖_q,
as claimed. To finish the proof, we simply have to prove that property (N3) holds, since
(N1) and (N2) are clear. Now, for i = 1, . . . , n, we can write
    (|u_i| + |v_i|)^p = |u_i|(|u_i| + |v_i|)^{p−1} + |v_i|(|u_i| + |v_i|)^{p−1},
so that by summing up these equations we get
    Σ_{i=1}^n (|u_i| + |v_i|)^p = Σ_{i=1}^n |u_i|(|u_i| + |v_i|)^{p−1} + Σ_{i=1}^n |v_i|(|u_i| + |v_i|)^{p−1},
and using the inequality (∗∗), we get
    Σ_{i=1}^n (|u_i| + |v_i|)^p ≤ (‖u‖_p + ‖v‖_p) ( Σ_{i=1}^n (|u_i| + |v_i|)^{(p−1)q} )^{1/q}.
However, 1/p + 1/q = 1 implies pq = p + q, that is, (p − 1)q = p, so we have
    Σ_{i=1}^n (|u_i| + |v_i|)^p ≤ (‖u‖_p + ‖v‖_p) ( Σ_{i=1}^n (|u_i| + |v_i|)^p )^{1/q},
which yields
    ( Σ_{i=1}^n (|u_i| + |v_i|)^p )^{1/p} ≤ ‖u‖_p + ‖v‖_p.
Since |u_i + v_i| ≤ |u_i| + |v_i|, the above implies the triangle inequality ‖u + v‖_p ≤ ‖u‖_p + ‖v‖_p,
as claimed.
For p > 1 and 1/p + 1/q = 1, the inequality
    Σ_{i=1}^n |u_i v_i| ≤ ( Σ_{i=1}^n |u_i|^p )^{1/p} ( Σ_{i=1}^n |v_i|^q )^{1/q}
is known as Hölder’s inequality. For p = 2, it is the Cauchy–Schwarz inequality.
Actually, if we define the Hermitian inner product ⟨−, −⟩ on Cⁿ by
    ⟨u, v⟩ = Σ_{i=1}^n u_i v̄_i,
where u = (u_1, . . . , u_n) and v = (v_1, . . . , v_n), then
    |⟨u, v⟩| ≤ Σ_{i=1}^n |u_i v̄_i| = Σ_{i=1}^n |u_i v_i|,
so Hölder’s inequality implies the inequality
    |⟨u, v⟩| ≤ ‖u‖_p ‖v‖_q,
also called Hölder’s inequality, which, for p = 2, is the standard Cauchy–Schwarz inequality.
The triangle inequality for the p-norm,
    ( Σ_{i=1}^n |u_i + v_i|^p )^{1/p} ≤ ( Σ_{i=1}^n |u_i|^p )^{1/p} + ( Σ_{i=1}^n |v_i|^p )^{1/p},
is known as Minkowski’s inequality.
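Both inequalities are easy to sanity-check numerically. A small sketch (random test vectors, NumPy; the dimension, exponent, and tolerance are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3.0
q = p / (p - 1.0)  # conjugate exponent

for _ in range(1000):
    u = rng.standard_normal(5)
    v = rng.standard_normal(5)
    # Hölder: sum |u_i v_i| <= ||u||_p ||v||_q
    assert np.sum(np.abs(u * v)) <= np.linalg.norm(u, p) * np.linalg.norm(v, q) + 1e-9
    # Minkowski: ||u + v||_p <= ||u||_p + ||v||_p
    assert np.linalg.norm(u + v, p) <= np.linalg.norm(u, p) + np.linalg.norm(v, p) + 1e-9
```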
When we restrict the Hermitian inner product to real vectors, u, v ∈ Rⁿ, we get the
Euclidean inner product
    ⟨u, v⟩ = Σ_{i=1}^n u_i v_i.
It is very useful to observe that if we represent (as usual) u = (u_1, . . . , u_n) and v = (v_1, . . . , v_n)
(in Rⁿ) by column vectors, then their Euclidean inner product is given by
    ⟨u, v⟩ = u⊤v = v⊤u,
and when u, v ∈ Cⁿ, their Hermitian inner product is given by
    ⟨u, v⟩ = v∗u (the complex conjugate of u∗v).
In particular, when u = v, in the complex case we get
    ‖u‖_2^2 = u∗u,
and in the real case, this becomes
    ‖u‖_2^2 = u⊤u.
As convenient as these notations are, we still recommend that you do not abuse them; the
notation ⟨u, v⟩ is more intrinsic and still “works” when our vector space is infinite dimen-
sional.
The following proposition is easy to show.
Proposition 7.2. The following inequalities hold for all x ∈ Rⁿ (or x ∈ Cⁿ):
    ‖x‖_∞ ≤ ‖x‖_1 ≤ n ‖x‖_∞,
    ‖x‖_∞ ≤ ‖x‖_2 ≤ √n ‖x‖_∞,
    ‖x‖_2 ≤ ‖x‖_1 ≤ √n ‖x‖_2.
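A quick numerical sanity check of these bounds on random vectors (a sketch; the dimension and tolerance are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 7
for _ in range(1000):
    x = rng.standard_normal(n)
    n1, n2, ninf = (np.linalg.norm(x, p) for p in (1, 2, np.inf))
    assert ninf <= n1 <= n * ninf + 1e-9
    assert ninf <= n2 <= np.sqrt(n) * ninf + 1e-9
    assert n2 <= n1 <= np.sqrt(n) * n2 + 1e-9
```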
Proposition 7.2 is actually a special case of a very important result: in a finite-dimensional
vector space, any two norms are equivalent.
Definition 7.2. Given any (real or complex) vector space E, two norms ‖·‖_a and ‖·‖_b are
equivalent iff there exist some positive reals C_1, C_2 > 0, such that
    ‖u‖_a ≤ C_1 ‖u‖_b and ‖u‖_b ≤ C_2 ‖u‖_a, for all u ∈ E.
Given any norm ‖·‖ on a vector space E of dimension n, for any basis (e_1, . . . , e_n) of E,
observe that for any vector x = x_1 e_1 + · · · + x_n e_n, we have
    ‖x‖ = ‖x_1 e_1 + · · · + x_n e_n‖ ≤ |x_1| ‖e_1‖ + · · · + |x_n| ‖e_n‖ ≤ C(|x_1| + · · · + |x_n|) = C ‖x‖_1,
with C = max_{1≤i≤n} ‖e_i‖ and
    ‖x‖_1 = ‖x_1 e_1 + · · · + x_n e_n‖_1 = |x_1| + · · · + |x_n|.
The above implies that
    | ‖u‖ − ‖v‖ | ≤ ‖u − v‖ ≤ C ‖u − v‖_1,
which means that the map u ↦ ‖u‖ is continuous with respect to the norm ‖·‖_1.
Let S_1^{n−1} be the unit sphere with respect to the norm ‖·‖_1, namely
    S_1^{n−1} = {x ∈ E | ‖x‖_1 = 1}.
Now, S_1^{n−1} is a closed and bounded subset of a finite-dimensional vector space, so by Heine–
Borel (or equivalently, by Bolzano–Weierstrass), S_1^{n−1} is compact. On the other hand, it
is a well-known result of analysis that any continuous real-valued function on a nonempty
compact set achieves its minimum and its maximum. Using these facts,
we can prove the following important theorem:
Theorem 7.3. If E is any real or complex vector space of finite dimension, then any two
norms on E are equivalent.
Proof. It is enough to prove that any norm ‖·‖ is equivalent to the 1-norm. We already proved
that the function x ↦ ‖x‖ is continuous with respect to the norm ‖·‖_1, and we observed that
the unit sphere S_1^{n−1} is compact. Now, we just recalled that because the function f : x ↦ ‖x‖ is
continuous and because S_1^{n−1} is compact, the function f has a minimum m and a maximum
M, and because ‖x‖ is never zero on S_1^{n−1}, we must have m > 0. Consequently, we just
proved that if ‖x‖_1 = 1, then
    0 < m ≤ ‖x‖ ≤ M,
so for any x ∈ E with x ≠ 0, we get
    m ≤ ‖x/‖x‖_1‖ ≤ M,
which implies
    m ‖x‖_1 ≤ ‖x‖ ≤ M ‖x‖_1.
Since the above inequality holds trivially if x = 0, we just proved that ‖·‖ and ‖·‖_1 are
equivalent, as claimed.
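The proof suggests a simple experiment: sample points on the ‖·‖_1 unit sphere and estimate the constants m and M for another norm. A rough Monte Carlo sketch (not a proof; the sampling scheme is ours, and the estimates only bracket the true constants):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

# Sample points on the 1-norm unit sphere, evaluate the 2-norm there.
samples = rng.standard_normal((100_000, n))
samples /= np.abs(samples).sum(axis=1, keepdims=True)  # now ||x||_1 = 1
values = np.linalg.norm(samples, 2, axis=1)

m_est, M_est = values.min(), values.max()
# Theory (Proposition 7.2): 1/sqrt(n) <= ||x||_2 <= 1 when ||x||_1 = 1.
print(m_est, M_est)  # roughly between 1/sqrt(n) = 0.5 and 1.0
```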
Next, we will consider norms on matrices.
7.2 Matrix Norms
For simplicity of exposition, we will consider the vector spaces Mn(R) and Mn(C) of square
n × n matrices. Most results also hold for the spaces Mm,n(R) and Mm,n(C) of rectangular
m × n matrices. Since n × n matrices can be multiplied, the idea behind matrix norms is
that they should behave “well” with respect to matrix multiplication.
Definition 7.3. A matrix norm ‖·‖ on the space of square n × n matrices in Mn(K), with
K = R or K = C, is a norm on the vector space Mn(K) with the additional property that
    ‖AB‖ ≤ ‖A‖ ‖B‖,
for all A, B ∈ Mn(K).
Since I^2 = I, from ‖I‖ = ‖I^2‖ ≤ ‖I‖^2, we get ‖I‖ ≥ 1, for every matrix norm.
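For instance, with the Frobenius norm defined below (Definition 7.6), ‖I‖_F = √n ≥ 1. A short sketch checking this and the submultiplicative property on random matrices (names and sizes ours):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
print(np.linalg.norm(np.eye(n), 'fro'))  # sqrt(n) >= 1

fro = lambda M: np.linalg.norm(M, 'fro')
for _ in range(100):
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))
    assert fro(A @ B) <= fro(A) * fro(B) + 1e-9  # ||AB|| <= ||A|| ||B||
```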
Before giving examples of matrix norms, we need to review some basic definitions about
matrices. Given any matrix A = (a_ij) ∈ Mm,n(C), the conjugate Ā of A is the matrix such
that
    Ā_ij = ā_ij, 1 ≤ i ≤ m, 1 ≤ j ≤ n.
The transpose of A is the n × m matrix A⊤ such that
    A⊤_ij = a_ji, 1 ≤ i ≤ n, 1 ≤ j ≤ m.
The adjoint of A is the n × m matrix A∗ such that
    A∗ = (Ā)⊤,
i.e., the transpose of the conjugate of A (equivalently, the conjugate of A⊤).
When A is a real matrix, A∗ = A⊤. A matrix A ∈ Mn(C) is Hermitian if
    A∗ = A.
If A is a real matrix (A ∈ Mn(R)), we say that A is symmetric if
    A⊤ = A.
A matrix A ∈ Mn(C) is normal if
    AA∗ = A∗A,
and if A is a real matrix, it is normal if
    AA⊤ = A⊤A.
A matrix U ∈ Mn(C) is unitary if
    UU∗ = U∗U = I.
A real matrix Q ∈ Mn(R) is orthogonal if
    QQ⊤ = Q⊤Q = I.
Given any matrix A = (a_ij) ∈ Mn(C), the trace tr(A) of A is the sum of its diagonal
elements
    tr(A) = a_11 + · · · + a_nn.
It is easy to show that the trace is a linear map, so that
tr(λA) = λtr(A)
and
tr(A + B) = tr(A) + tr(B).
Moreover, if A is an m × n matrix and B is an n × m matrix, it is not hard to show that
tr(AB) = tr(BA).
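A two-line numerical check of the last identity with random rectangular matrices (a sketch; shapes are ours):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 5))  # m x n
B = rng.standard_normal((5, 3))  # n x m
assert np.isclose(np.trace(A @ B), np.trace(B @ A))  # tr(AB) = tr(BA)
```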
We also review eigenvalues and eigenvectors. We content ourselves with a definition in-
volving matrices. A more general treatment will be given later on (see Chapter 12).
Definition 7.4. Given any square matrix A ∈ Mn(C), a complex number λ ∈ C is an
eigenvalue of A if there is some nonzero vector u ∈ Cⁿ, such that
    Au = λu.
If λ is an eigenvalue of A, then the nonzero vectors u ∈ Cⁿ such that Au = λu are called
eigenvectors of A associated with λ; together with the zero vector, these eigenvectors form a
subspace of Cⁿ denoted by E_λ(A), and called the eigenspace associated with λ.
Remark: Note that Definition 7.4 requires an eigenvector to be nonzero. A somewhat
unfortunate consequence of this requirement is that the set of eigenvectors is not a subspace,
since the zero vector is missing! On the positive side, whenever eigenvectors are involved,
there is no need to say that they are nonzero. The fact that eigenvectors are nonzero is
implicitly used in all the arguments involving them, so it seems safer (but perhaps not as
elegant) to stipulate that eigenvectors should be nonzero.
If A is a square real matrix A ∈ Mn(R), then we restrict Definition 7.4 to real eigenvalues
λ ∈ R and real eigenvectors. However, it should be noted that although every complex
matrix always has at least some complex eigenvalue, a real matrix may not have any real
eigenvalues. For example, the matrix
    A = [ 0  −1 ]
        [ 1   0 ]
has the complex eigenvalues i and −i, but no real eigenvalues. Thus, typically, even for real
matrices, we consider complex eigenvalues.
Observe that λ ∈ C is an eigenvalue of A
iff Au = λu for some nonzero vector u ∈ Cⁿ
iff (λI − A)u = 0
iff the matrix λI − A defines a linear map which has a nonzero kernel, that is,
iff λI − A is not invertible.
However, from Proposition 5.10, λI − A is not invertible iff
det(λI − A) = 0.
Now, det(λI − A) is a polynomial of degree n in the indeterminate λ, in fact, of the form
    λ^n − tr(A)λ^{n−1} + · · · + (−1)^n det(A).
Thus, we see that the eigenvalues of A are the zeros (also called roots) of the above polyno-
mial. Since every complex polynomial of degree n has exactly n roots, counted with their
multiplicity, we have the following definition:
Definition 7.5. Given any square n × n matrix A ∈ Mn(C), the polynomial
    det(λI − A) = λ^n − tr(A)λ^{n−1} + · · · + (−1)^n det(A)
is called the characteristic polynomial of A. The n (not necessarily distinct) roots λ_1, . . . , λ_n
of the characteristic polynomial are all the eigenvalues of A and constitute the spectrum of
A. We let
    ρ(A) = max_{1≤i≤n} |λ_i|
be the largest modulus of the eigenvalues of A, called the spectral radius of A.
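Computationally, the spectral radius comes straight from the eigenvalues; a minimal sketch (the helper name is ours), using the rotation matrix from the example above:

```python
import numpy as np

def spectral_radius(A):
    """rho(A) = max |lambda_i| over the eigenvalues of A."""
    return np.max(np.abs(np.linalg.eigvals(A)))

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])  # eigenvalues i and -i
print(spectral_radius(A))    # 1.0
```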
Proposition 7.4. For any matrix norm ‖·‖ on Mn(C) or Mn(R), and for any square n × n
matrix A, we have
    ρ(A) ≤ ‖A‖.
Proof. First, let us consider the case where A is a complex matrix, since it is simpler. Let λ
be some eigenvalue of A for which |λ| is maximum, that is, such that |λ| = ρ(A). If u (≠ 0)
is any eigenvector associated with λ and if U is the n × n matrix whose columns are all u,
then Au = λu implies
    AU = λU,
and since
    |λ| ‖U‖ = ‖λU‖ = ‖AU‖ ≤ ‖A‖ ‖U‖
and U ≠ 0, we have ‖U‖ ≠ 0, and we get
    ρ(A) = |λ| ≤ ‖A‖,
as claimed.
If A is a real matrix, the problem is that even if there is a real eigenvalue λ such that
ρ(A) = |λ|, corresponding eigenvectors may be complex. We use a trick based on the fact
that for every matrix A (real or complex),
    ρ(A^k) = (ρ(A))^k,
which is left as a simple exercise.
Pick any complex norm ‖·‖_c on Cⁿ and let ‖·‖_c also denote the corresponding induced norm
on matrices. The restriction of ‖·‖_c to real matrices is a real norm that we also denote by
‖·‖_c. Now, by Theorem 7.3, since Mn(R) has finite dimension n², there is some constant
C > 0 so that
    ‖A‖_c ≤ C ‖A‖, for all A ∈ Mn(R).
Furthermore, for every k ≥ 1 and for every real n × n matrix A, by the previous part,
ρ(A^k) ≤ ‖A^k‖_c, and because ‖·‖ is a matrix norm, ‖A^k‖ ≤ ‖A‖^k, so we have
    (ρ(A))^k = ρ(A^k) ≤ ‖A^k‖_c ≤ C ‖A^k‖ ≤ C ‖A‖^k,
for all k ≥ 1. It follows that
    ρ(A) ≤ C^{1/k} ‖A‖, for all k ≥ 1.
However, because C > 0, we have lim_{k→∞} C^{1/k} = 1 (since lim_{k→∞} (1/k) log(C) = 0). There-
fore, we conclude that
    ρ(A) ≤ ‖A‖,
as desired.
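Proposition 7.4 is easy to observe numerically. A sketch comparing the spectral radius against several matrix norms (Frobenius, and the induced 1- and ∞-norms, which NumPy exposes through numpy.linalg.norm):

```python
import numpy as np

rng = np.random.default_rng(6)
for _ in range(100):
    A = rng.standard_normal((6, 6))
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    for norm in ('fro', 1, np.inf):  # all three are matrix norms
        assert rho <= np.linalg.norm(A, norm) + 1e-9  # rho(A) <= ||A||
```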
Now, it turns out that if A is a real n × n symmetric matrix, then the eigenvalues of A
are all real and there is some orthogonal matrix Q such that
    A = Q⊤ diag(λ_1, . . . , λ_n) Q,
where diag(λ_1, . . . , λ_n) denotes the matrix whose only nonzero entries (if any) are its diagonal
entries, which are the (real) eigenvalues of A. Similarly, if A is a complex n × n Hermitian
matrix, then the eigenvalues of A are all real and there is some unitary matrix U such that
    A = U∗ diag(λ_1, . . . , λ_n) U,
where diag(λ_1, . . . , λ_n) denotes the matrix whose only nonzero entries (if any) are its diagonal
entries, which are the (real) eigenvalues of A.
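This spectral decomposition is exactly what numpy.linalg.eigh computes (note that eigh returns the factorization in the form A = Q diag(λ) Q⊤, which is the same statement with Q replaced by Q⊤). A quick sketch for the real symmetric case:

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2  # a real symmetric matrix

lam, Q = np.linalg.eigh(A)  # eigh's convention: A = Q diag(lam) Q^T
assert np.allclose(Q @ np.diag(lam) @ Q.T, A)
assert np.allclose(Q.T @ Q, np.eye(4))  # Q is orthogonal
# eigh returns a real array lam: the eigenvalues are real, as claimed.
```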
We now return to matrix norms. We begin with the so-called Frobenius norm, which is
just the norm ‖·‖_2 on Cⁿ², where the n × n matrix A is viewed as the vector obtained by
concatenating together the rows (or the columns) of A. The reader should check that for
any n × n complex matrix A = (a_ij),
    ( Σ_{i,j=1}^n |a_ij|^2 )^{1/2} = √(tr(A∗A)) = √(tr(AA∗)).
Definition 7.6. The Frobenius norm ‖·‖_F is defined so that for every square n × n matrix
A ∈ Mn(C),
    ‖A‖_F = ( Σ_{i,j=1}^n |a_ij|^2 )^{1/2} = √(tr(AA∗)) = √(tr(A∗A)).
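A sketch computing ‖A‖_F from the entrywise definition and from both trace formulas (random complex test matrix; names ours):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

entrywise = np.sqrt(np.sum(np.abs(A) ** 2))
via_trace1 = np.sqrt(np.trace(A @ A.conj().T).real)  # sqrt(tr(A A*))
via_trace2 = np.sqrt(np.trace(A.conj().T @ A).real)  # sqrt(tr(A* A))

assert np.isclose(entrywise, via_trace1)
assert np.isclose(entrywise, via_trace2)
assert np.isclose(entrywise, np.linalg.norm(A, 'fro'))
```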
The following proposition shows that the Frobenius norm is a matrix norm satisfying other
nice properties.
Proposition 7.5. The Frobenius norm ‖·‖_F on Mn(C) satisfies the following properties:
(1) It is a matrix norm; that is, ‖AB‖_F ≤ ‖A‖_F ‖B‖_F, for all A, B ∈ Mn(C).
(2) It is unitarily invariant, which means that for all unitary matrices U, V, we have
    ‖A‖_F = ‖UA‖_F = ‖AV‖_F = ‖UAV‖_F.
(3) √(ρ(A∗A)) ≤ ‖A‖_F ≤ √n √(ρ(A∗A)), for all A ∈ Mn(C).
Proof. (1) The only property that requires a proof is the fact that ‖AB‖_F ≤ ‖A‖_F ‖B‖_F. This
follows from the Cauchy–Schwarz inequality:
    ‖AB‖_F^2 = Σ_{i,j=1}^n | Σ_{k=1}^n a_ik b_kj |^2
             ≤ Σ_{i,j=1}^n ( Σ_{h=1}^n |a_ih|^2 ) ( Σ_{k=1}^n |b_kj|^2 )
             = ( Σ_{i,h=1}^n |a_ih|^2 ) ( Σ_{k,j=1}^n |b_kj|^2 ) = ‖A‖_F^2 ‖B‖_F^2.
(2) We have
    ‖A‖_F^2 = tr(A∗A) = tr(V V∗A∗A) = tr(V∗A∗AV) = ‖AV‖_F^2,
and
    ‖A‖_F^2 = tr(A∗A) = tr(A∗U∗UA) = ‖UA‖_F^2.
The identity
    ‖A‖_F = ‖UAV‖_F
follows from the previous two.
(3) It is well known that the trace of a matrix is equal to the sum of its eigenvalues.
Furthermore, A∗A is Hermitian positive semidefinite (which means that its eigenvalues are
nonnegative), so ρ(A∗A) is the largest eigenvalue of A∗A and
    ρ(A∗A) ≤ tr(A∗A) ≤ nρ(A∗A),
which yields (3) by taking square roots.
Remark: The Frobenius norm is also known as the Hilbert-Schmidt norm or the Schur
norm. So many famous names associated with such a simple thing!
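A sketch checking properties (2) and (3) on random matrices, with a random unitary U obtained from a QR factorization (all names and tolerances ours):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

fro = lambda M: np.linalg.norm(M, 'fro')
assert np.isclose(fro(A), fro(U @ A))  # property (2): ||UA||_F = ||A||_F
assert np.isclose(fro(A), fro(A @ U))  # property (2): ||AV||_F = ||A||_F

rho = np.max(np.abs(np.linalg.eigvals(A.conj().T @ A)))
assert np.sqrt(rho) <= fro(A) + 1e-9        # property (3), lower bound
assert fro(A) <= np.sqrt(n * rho) + 1e-9    # property (3), upper bound
```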
We now give another method for obtaining matrix norms using subordinate norms. First,
we need a proposition that shows that in a finite-dimensional space, the linear map induced
by a matrix is bounded, and thus continuous.
Proposition 7.6. For every norm ‖·‖ on Cⁿ (or Rⁿ), for every matrix A ∈ Mn(C) (or
A ∈ Mn(R)), there is a real constant C_A > 0, such that
    ‖Au‖ ≤ C_A ‖u‖,
for every vector u ∈ Cⁿ (or u ∈ Rⁿ if A is real).