7.1 Normed Vector Spaces
In order to define how close two vectors or two matrices are, and in order to define the
convergence of sequences of vectors or matrices, we can use the notion of a norm. Recall
that R+ = {x ∈ R | x ≥ 0}. Also recall that if z = a + ib ∈ C is a complex number, with
a, b ∈ R, then z̄ = a − ib and |z| = √(a^2 + b^2) (|z| is the modulus of z).
Definition 7.1. Let E be a vector space over a field K, where K is either the field R of
reals, or the field C of complex numbers. A norm on E is a function ‖·‖ : E → R+, assigning
a nonnegative real number ‖u‖ to any vector u ∈ E, and satisfying the following conditions
for all x, y ∈ E and all λ ∈ K:
(N1) ‖x‖ ≥ 0, and ‖x‖ = 0 iff x = 0.   (positivity)
(N2) ‖λx‖ = |λ| ‖x‖.   (scaling)
(N3) ‖x + y‖ ≤ ‖x‖ + ‖y‖.   (triangle inequality)
A vector space E together with a norm ‖·‖ is called a normed vector space.
From (N3), we easily get
    | ‖x‖ − ‖y‖ | ≤ ‖x − y‖.
Let us give some examples of normed vector spaces.
Example 7.1.
1. Let E = R, and ‖x‖ = |x|, the absolute value of x.
2. Let E = C, and ‖z‖ = |z|, the modulus of z.
3. Let E = Rⁿ (or E = Cⁿ). There are three standard norms. For every (x_1, . . . , x_n) ∈ E,
we have the norm ‖x‖_1, defined such that
    ‖x‖_1 = |x_1| + · · · + |x_n|,
we have the Euclidean norm ‖x‖_2, defined such that
    ‖x‖_2 = (|x_1|^2 + · · · + |x_n|^2)^{1/2},
and the sup-norm ‖x‖_∞, defined such that
    ‖x‖_∞ = max{ |x_i| | 1 ≤ i ≤ n }.
More generally, we define the p-norm (for p ≥ 1) by
    ‖x‖_p = (|x_1|^p + · · · + |x_n|^p)^{1/p}.
There are other norms besides the p-norms; we urge the reader to find such norms.
Some work is required to show the triangle inequality for the p-norm.
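Before turning to the proof, these definitions translate directly into code. The following is a minimal numerical sketch using NumPy (the helper name p_norm and the sample vector are ours, for illustration only); it computes the 1-, 2-, ∞-, and general p-norms and cross-checks them against numpy.linalg.norm.

```python
import numpy as np

def p_norm(x, p):
    """Compute the p-norm (|x_1|^p + ... + |x_n|^p)^(1/p) for p >= 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0, 12.0])

norm_1 = np.sum(np.abs(x))                  # ||x||_1 = |x_1| + ... + |x_n|
norm_2 = np.sqrt(np.sum(np.abs(x) ** 2))    # Euclidean norm ||x||_2
norm_inf = np.max(np.abs(x))                # sup-norm ||x||_inf

# Cross-check against NumPy's built-in implementations.
assert np.isclose(norm_1, np.linalg.norm(x, 1))
assert np.isclose(norm_2, np.linalg.norm(x, 2))
assert np.isclose(norm_inf, np.linalg.norm(x, np.inf))
assert np.isclose(p_norm(x, 3), np.linalg.norm(x, 3))
print(norm_1, norm_2, norm_inf)  # 19.0 13.0 12.0
```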
Proposition 7.1. If E is a finite-dimensional vector space over R or C, for every real
number p ≥ 1, the p-norm is indeed a norm.
Proof. The cases p = 1 and p = ∞ are easy and left to the reader. If p > 1, then let q > 1
be such that
    1/p + 1/q = 1.
We will make use of the following fact: for all α, β ∈ R, if α, β ≥ 0, then
    αβ ≤ α^p/p + β^q/q.   (∗)
To prove the above inequality, we use the fact that the exponential function t ↦ e^t satisfies
the following convexity inequality:
    e^{θx + (1−θ)y} ≤ θe^x + (1 − θ)e^y,
for all x, y ∈ R and all θ with 0 ≤ θ ≤ 1.
Since the case αβ = 0 is trivial, let us assume that α > 0 and β > 0. If we replace θ by
1/p, x by p log α and y by q log β, then we get
    e^{(1/p) p log α + (1/q) q log β} ≤ (1/p) e^{p log α} + (1/q) e^{q log β},
which simplifies to
    αβ ≤ α^p/p + β^q/q,
as claimed.
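As an aside, the inequality (∗) is known as Young's inequality, and it is easy to spot-check numerically. A minimal sketch (the choice of p, the sample range, and the tolerance are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3.0
q = p / (p - 1.0)  # conjugate exponent: 1/p + 1/q = 1

# Spot-check Young's inequality (*) on random nonnegative pairs.
for _ in range(1000):
    a, b = rng.uniform(0, 10, size=2)
    assert a * b <= a**p / p + b**q / q + 1e-12
```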
We will now prove that for any two vectors u, v ∈ E, we have
    Σ_{i=1}^n |u_i v_i| ≤ ‖u‖_p ‖v‖_q.   (∗∗)
Since the above is trivial if u = 0 or v = 0, let us assume that u ≠ 0 and v ≠ 0. Then, the
inequality (∗) with α = |u_i|/‖u‖_p and β = |v_i|/‖v‖_q yields
    |u_i v_i| / (‖u‖_p ‖v‖_q) ≤ |u_i|^p / (p ‖u‖_p^p) + |v_i|^q / (q ‖v‖_q^q),
for i = 1, . . . , n, and by summing up these inequalities, we get
    Σ_{i=1}^n |u_i v_i| ≤ ‖u‖_p ‖v‖_q,
as claimed. To finish the proof, we simply have to prove that property (N3) holds, since
(N1) and (N2) are clear. Now, for i = 1, . . . , n, we can write
    (|u_i| + |v_i|)^p = |u_i|(|u_i| + |v_i|)^{p−1} + |v_i|(|u_i| + |v_i|)^{p−1},
so that by summing up these equations we get
    Σ_{i=1}^n (|u_i| + |v_i|)^p = Σ_{i=1}^n |u_i|(|u_i| + |v_i|)^{p−1} + Σ_{i=1}^n |v_i|(|u_i| + |v_i|)^{p−1},
and using the inequality (∗∗), we get
    Σ_{i=1}^n (|u_i| + |v_i|)^p ≤ (‖u‖_p + ‖v‖_p) ( Σ_{i=1}^n (|u_i| + |v_i|)^{(p−1)q} )^{1/q}.
However, 1/p + 1/q = 1 implies pq = p + q, that is, (p − 1)q = p, so we have
    Σ_{i=1}^n (|u_i| + |v_i|)^p ≤ (‖u‖_p + ‖v‖_p) ( Σ_{i=1}^n (|u_i| + |v_i|)^p )^{1/q},
which yields
    ( Σ_{i=1}^n (|u_i| + |v_i|)^p )^{1/p} ≤ ‖u‖_p + ‖v‖_p.
Since |u_i + v_i| ≤ |u_i| + |v_i|, the above implies the triangle inequality ‖u + v‖_p ≤ ‖u‖_p + ‖v‖_p,
as claimed.
For p > 1 and 1/p + 1/q = 1, the inequality
    Σ_{i=1}^n |u_i v_i| ≤ ( Σ_{i=1}^n |u_i|^p )^{1/p} ( Σ_{i=1}^n |v_i|^q )^{1/q}
is known as Hölder’s inequality. For p = 2, it is the Cauchy–Schwarz inequality.
Actually, if we define the Hermitian inner product ⟨−, −⟩ on Cⁿ by
    ⟨u, v⟩ = Σ_{i=1}^n u_i v̄_i,
where u = (u_1, . . . , u_n) and v = (v_1, . . . , v_n), then
    |⟨u, v⟩| ≤ Σ_{i=1}^n |u_i v̄_i| = Σ_{i=1}^n |u_i v_i|,
so Hölder’s inequality implies the inequality
    |⟨u, v⟩| ≤ ‖u‖_p ‖v‖_q,
also called Hölder’s inequality, which, for p = 2, is the standard Cauchy–Schwarz inequality.
The triangle inequality for the p-norm,
    ( Σ_{i=1}^n |u_i + v_i|^p )^{1/p} ≤ ( Σ_{i=1}^n |u_i|^p )^{1/p} + ( Σ_{i=1}^n |v_i|^p )^{1/p},
is known as Minkowski’s inequality.
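Both inequalities are easy to sanity-check numerically. A small sketch (random test vectors, NumPy; the dimension, exponent, and tolerance are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3.0
q = p / (p - 1.0)  # conjugate exponent

for _ in range(1000):
    u = rng.standard_normal(5)
    v = rng.standard_normal(5)
    # Hölder: sum |u_i v_i| <= ||u||_p ||v||_q
    assert np.sum(np.abs(u * v)) <= np.linalg.norm(u, p) * np.linalg.norm(v, q) + 1e-9
    # Minkowski: ||u + v||_p <= ||u||_p + ||v||_p
    assert np.linalg.norm(u + v, p) <= np.linalg.norm(u, p) + np.linalg.norm(v, p) + 1e-9
```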
When we restrict the Hermitian inner product to real vectors, u, v ∈ Rⁿ, we get the
Euclidean inner product
    ⟨u, v⟩ = Σ_{i=1}^n u_i v_i.
It is very useful to observe that if we represent (as usual) u = (u_1, . . . , u_n) and v = (v_1, . . . , v_n)
(in Rⁿ) by column vectors, then their Euclidean inner product is given by
    ⟨u, v⟩ = u⊤v = v⊤u,
and when u, v ∈ Cⁿ, their Hermitian inner product is given by
    ⟨u, v⟩ = v∗u (the complex conjugate of u∗v).
In particular, when u = v, in the complex case we get
    ‖u‖_2^2 = u∗u,
and in the real case, this becomes
    ‖u‖_2^2 = u⊤u.
As convenient as these notations are, we still recommend that you do not abuse them; the
notation ⟨u, v⟩ is more intrinsic and still “works” when our vector space is infinite dimen-
sional.
The following proposition is easy to show.
Proposition 7.2. The following inequalities hold for all x ∈ Rⁿ (or x ∈ Cⁿ):
    ‖x‖_∞ ≤ ‖x‖_1 ≤ n ‖x‖_∞,
    ‖x‖_∞ ≤ ‖x‖_2 ≤ √n ‖x‖_∞,
    ‖x‖_2 ≤ ‖x‖_1 ≤ √n ‖x‖_2.
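A quick numerical sanity check of these bounds on random vectors (a sketch; the dimension and tolerance are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 7
for _ in range(1000):
    x = rng.standard_normal(n)
    n1, n2, ninf = (np.linalg.norm(x, p) for p in (1, 2, np.inf))
    assert ninf <= n1 <= n * ninf + 1e-9
    assert ninf <= n2 <= np.sqrt(n) * ninf + 1e-9
    assert n2 <= n1 <= np.sqrt(n) * n2 + 1e-9
```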
Proposition 7.2 is actually a special case of a very important result: in a finite-dimensional
vector space, any two norms are equivalent.
Definition 7.2. Given any (real or complex) vector space E, two norms ‖·‖_a and ‖·‖_b are
equivalent iff there exist some positive reals C_1, C_2 > 0, such that
    ‖u‖_a ≤ C_1 ‖u‖_b and ‖u‖_b ≤ C_2 ‖u‖_a, for all u ∈ E.
Given any norm ‖·‖ on a vector space E of dimension n, for any basis (e_1, . . . , e_n) of E,
observe that for any vector x = x_1 e_1 + · · · + x_n e_n, we have
    ‖x‖ = ‖x_1 e_1 + · · · + x_n e_n‖ ≤ |x_1| ‖e_1‖ + · · · + |x_n| ‖e_n‖ ≤ C(|x_1| + · · · + |x_n|) = C ‖x‖_1,
with C = max_{1≤i≤n} ‖e_i‖ and
    ‖x‖_1 = ‖x_1 e_1 + · · · + x_n e_n‖_1 = |x_1| + · · · + |x_n|.
The above implies that
    | ‖u‖ − ‖v‖ | ≤ ‖u − v‖ ≤ C ‖u − v‖_1,
which means that the map u ↦ ‖u‖ is continuous with respect to the norm ‖·‖_1.
Let S_1^{n−1} be the unit sphere with respect to the norm ‖·‖_1, namely
    S_1^{n−1} = {x ∈ E | ‖x‖_1 = 1}.
Now, S_1^{n−1} is a closed and bounded subset of a finite-dimensional vector space, so by Heine–
Borel (or equivalently, by Bolzano–Weierstrass), S_1^{n−1} is compact. On the other hand, it
is a well-known result of analysis that any continuous real-valued function on a nonempty
compact set achieves its minimum and its maximum. Using these facts,
we can prove the following important theorem:
Theorem 7.3. If E is any real or complex vector space of finite dimension, then any two
norms on E are equivalent.
Proof. It is enough to prove that any norm ‖·‖ is equivalent to the 1-norm. We already proved
that the function x ↦ ‖x‖ is continuous with respect to the norm ‖·‖_1, and we observed that
the unit sphere S_1^{n−1} is compact. Now, we just recalled that because the function f : x ↦ ‖x‖ is
continuous and because S_1^{n−1} is compact, the function f has a minimum m and a maximum
M, and because ‖x‖ is never zero on S_1^{n−1}, we must have m > 0. Consequently, we just
proved that if ‖x‖_1 = 1, then
    0 < m ≤ ‖x‖ ≤ M,
so for any x ∈ E with x ≠ 0, we get
    m ≤ ‖x/‖x‖_1‖ ≤ M,
which implies
    m ‖x‖_1 ≤ ‖x‖ ≤ M ‖x‖_1.
Since the above inequality holds trivially if x = 0, we just proved that ‖·‖ and ‖·‖_1 are
equivalent, as claimed.
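The proof suggests a simple experiment: sample points on the ‖·‖_1 unit sphere and estimate the constants m and M for another norm. A rough Monte Carlo sketch (not a proof; the sampling scheme is ours, and the estimates only bracket the true constants):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

# Sample points on the 1-norm unit sphere, evaluate the 2-norm there.
samples = rng.standard_normal((100_000, n))
samples /= np.abs(samples).sum(axis=1, keepdims=True)  # now ||x||_1 = 1
values = np.linalg.norm(samples, 2, axis=1)

m_est, M_est = values.min(), values.max()
# Theory (Proposition 7.2): 1/sqrt(n) <= ||x||_2 <= 1 when ||x||_1 = 1.
print(m_est, M_est)  # roughly between 1/sqrt(n) = 0.5 and 1.0
```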
Next, we will consider norms on matrices.
7.2 Matrix Norms
For simplicity of exposition, we will consider the vector spaces Mn(R) and Mn(C) of square
n × n matrices. Most results also hold for the spaces Mm,n(R) and Mm,n(C) of rectangular
m × n matrices. Since n × n matrices can be multiplied, the idea behind matrix norms is
that they should behave “well” with respect to matrix multiplication.
Definition 7.3. A matrix norm ‖·‖ on the space of square n × n matrices in Mn(K), with
K = R or K = C, is a norm on the vector space Mn(K) with the additional property that
    ‖AB‖ ≤ ‖A‖ ‖B‖,
for all A, B ∈ Mn(K).
Since I^2 = I, from ‖I‖ = ‖I^2‖ ≤ ‖I‖^2, we get ‖I‖ ≥ 1, for every matrix norm.
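For instance, with the Frobenius norm defined below (Definition 7.6), ‖I‖_F = √n ≥ 1. A short sketch checking this and the submultiplicative property on random matrices (names and sizes ours):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
print(np.linalg.norm(np.eye(n), 'fro'))  # sqrt(n) >= 1

fro = lambda M: np.linalg.norm(M, 'fro')
for _ in range(100):
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))
    assert fro(A @ B) <= fro(A) * fro(B) + 1e-9  # ||AB|| <= ||A|| ||B||
```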
Before giving examples of matrix norms, we need to review some basic definitions about
matrices. Given any matrix A = (a_ij) ∈ Mm,n(C), the conjugate Ā of A is the matrix such
that
    Ā_ij = ā_ij, 1 ≤ i ≤ m, 1 ≤ j ≤ n.
The transpose of A is the n × m matrix A⊤ such that
    A⊤_ij = a_ji, 1 ≤ i ≤ n, 1 ≤ j ≤ m.
The adjoint of A is the n × m matrix A∗ such that
    A∗ = (Ā)⊤,
i.e., the transpose of the conjugate of A (equivalently, the conjugate of A⊤).
When A is a real matrix, A∗ = A⊤. A matrix A ∈ Mn(C) is Hermitian if
    A∗ = A.
If A is a real matrix (A ∈ Mn(R)), we say that A is symmetric if
    A⊤ = A.
A matrix A ∈ Mn(C) is normal if
    AA∗ = A∗A,
and if A is a real matrix, it is normal if
    AA⊤ = A⊤A.
A matrix U ∈ Mn(C) is unitary if
    UU∗ = U∗U = I.
A real matrix Q ∈ Mn(R) is orthogonal if
    QQ⊤ = Q⊤Q = I.
Given any matrix A = (a_ij) ∈ Mn(C), the trace tr(A) of A is the sum of its diagonal
elements
    tr(A) = a_11 + · · · + a_nn.
It is easy to show that the trace is a linear map, so that
tr(λA) = λtr(A)
and
tr(A + B) = tr(A) + tr(B).
Moreover, if A is an m × n matrix and B is an n × m matrix, it is not hard to show that
tr(AB) = tr(BA).
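A two-line numerical check of the last identity with random rectangular matrices (a sketch; shapes are ours):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 5))  # m x n
B = rng.standard_normal((5, 3))  # n x m
assert np.isclose(np.trace(A @ B), np.trace(B @ A))  # tr(AB) = tr(BA)
```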
We also review eigenvalues and eigenvectors. We content ourselves with a definition in-
volving matrices. A more general treatment will be given later on (see Chapter 12).
Definition 7.4. Given any square matrix A ∈ Mn(C), a complex number λ ∈ C is an
eigenvalue of A if there is some nonzero vector u ∈ Cⁿ, such that
    Au = λu.
If λ is an eigenvalue of A, then the nonzero vectors u ∈ Cⁿ such that Au = λu are called
eigenvectors of A associated with λ; together with the zero vector, these eigenvectors form a
subspace of Cⁿ denoted by E_λ(A), and called the eigenspace associated with λ.
Remark: Note that Definition 7.4 requires an eigenvector to be nonzero. A somewhat
unfortunate consequence of this requirement is that the set of eigenvectors is not a subspace,
since the zero vector is missing! On the positive side, whenever eigenvectors are involved,
there is no need to say that they are nonzero. The fact that eigenvectors are nonzero is
implicitly used in all the arguments involving them, so it seems safer (but perhaps not as
elegant) to stipulate that eigenvectors should be nonzero.
If A is a square real matrix A ∈ Mn(R), then we restrict Definition 7.4 to real eigenvalues
λ ∈ R and real eigenvectors. However, it should be noted that although every complex
matrix always has at least some complex eigenvalue, a real matrix may not have any real
eigenvalues. For example, the matrix
    A = [ 0  −1 ]
        [ 1   0 ]
has the complex eigenvalues i and −i, but no real eigenvalues. Thus, typically, even for real
matrices, we consider complex eigenvalues.
Observe that λ ∈ C is an eigenvalue of A
iff Au = λu for some nonzero vector u ∈ Cⁿ
iff (λI − A)u = 0
iff the matrix λI − A defines a linear map which has a nonzero kernel, that is,
iff λI − A is not invertible.
However, from Proposition 5.10, λI − A is not invertible iff
det(λI − A) = 0.
Now, det(λI − A) is a polynomial of degree n in the indeterminate λ, in fact, of the form
    λ^n − tr(A)λ^{n−1} + · · · + (−1)^n det(A).
Thus, we see that the eigenvalues of A are the zeros (also called roots) of the above polyno-
mial. Since every complex polynomial of degree n has exactly n roots, counted with their
multiplicity, we have the following definition:
Definition 7.5. Given any square n × n matrix A ∈ Mn(C), the polynomial
    det(λI − A) = λ^n − tr(A)λ^{n−1} + · · · + (−1)^n det(A)
is called the characteristic polynomial of A. The n (not necessarily distinct) roots λ_1, . . . , λ_n
of the characteristic polynomial are all the eigenvalues of A and constitute the spectrum of
A. We let
    ρ(A) = max_{1≤i≤n} |λ_i|
be the largest modulus of the eigenvalues of A, called the spectral radius of A.
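Computationally, the spectral radius comes straight from the eigenvalues; a minimal sketch (the helper name is ours), using the rotation matrix from the example above:

```python
import numpy as np

def spectral_radius(A):
    """rho(A) = max |lambda_i| over the eigenvalues of A."""
    return np.max(np.abs(np.linalg.eigvals(A)))

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])  # eigenvalues i and -i
print(spectral_radius(A))    # 1.0
```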
Proposition 7.4. For any matrix norm ‖·‖ on Mn(C) or Mn(R), and for any square n × n
matrix A, we have
    ρ(A) ≤ ‖A‖.
Proof. First, let us consider the case where A is a complex matrix, since it is simpler. Let λ
be some eigenvalue of A for which |λ| is maximum, that is, such that |λ| = ρ(A). If u (≠ 0)
is any eigenvector associated with λ and if U is the n × n matrix whose columns are all u,
then Au = λu implies
    AU = λU,
and since
    |λ| ‖U‖ = ‖λU‖ = ‖AU‖ ≤ ‖A‖ ‖U‖
and U ≠ 0, we have ‖U‖ ≠ 0, and we get
    ρ(A) = |λ| ≤ ‖A‖,
as claimed.
If A is a real matrix, the problem is that even if there is a real eigenvalue λ such that
ρ(A) = |λ|, corresponding eigenvectors may be complex. We use a trick based on the fact
that for every matrix A (real or complex),
    ρ(A^k) = (ρ(A))^k,
which is left as a simple exercise.
Pick any complex norm ‖·‖_c on Cⁿ and let ‖·‖_c also denote the corresponding induced norm
on matrices. The restriction of ‖·‖_c to real matrices is a real norm that we also denote by
‖·‖_c. Now, by Theorem 7.3, since Mn(R) has finite dimension n², there is some constant
C > 0 so that
    ‖A‖_c ≤ C ‖A‖, for all A ∈ Mn(R).
Furthermore, for every k ≥ 1 and for every real n × n matrix A, by the previous part,
ρ(A^k) ≤ ‖A^k‖_c, and because ‖·‖ is a matrix norm, ‖A^k‖ ≤ ‖A‖^k, so we have
    (ρ(A))^k = ρ(A^k) ≤ ‖A^k‖_c ≤ C ‖A^k‖ ≤ C ‖A‖^k,
for all k ≥ 1. It follows that
    ρ(A) ≤ C^{1/k} ‖A‖, for all k ≥ 1.
However, because C > 0, we have lim_{k→∞} C^{1/k} = 1 (since lim_{k→∞} (1/k) log(C) = 0). There-
fore, we conclude that
    ρ(A) ≤ ‖A‖,
as desired.
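Proposition 7.4 is easy to observe numerically. A sketch comparing the spectral radius against several matrix norms (Frobenius, and the induced 1- and ∞-norms, which NumPy exposes through numpy.linalg.norm):

```python
import numpy as np

rng = np.random.default_rng(6)
for _ in range(100):
    A = rng.standard_normal((6, 6))
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    for norm in ('fro', 1, np.inf):  # all three are matrix norms
        assert rho <= np.linalg.norm(A, norm) + 1e-9  # rho(A) <= ||A||
```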
Now, it turns out that if A is a real n × n symmetric matrix, then the eigenvalues of A
are all real and there is some orthogonal matrix Q such that
    A = Q⊤ diag(λ_1, . . . , λ_n) Q,
where diag(λ_1, . . . , λ_n) denotes the matrix whose only nonzero entries (if any) are its diagonal
entries, which are the (real) eigenvalues of A. Similarly, if A is a complex n × n Hermitian
matrix, then the eigenvalues of A are all real and there is some unitary matrix U such that
    A = U∗ diag(λ_1, . . . , λ_n) U,
where diag(λ_1, . . . , λ_n) denotes the matrix whose only nonzero entries (if any) are its diagonal
entries, which are the (real) eigenvalues of A.
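This spectral decomposition is exactly what numpy.linalg.eigh computes (note that eigh returns the factorization in the form A = Q diag(λ) Q⊤, which is the same statement with Q replaced by Q⊤). A quick sketch for the real symmetric case:

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2  # a real symmetric matrix

lam, Q = np.linalg.eigh(A)  # eigh's convention: A = Q diag(lam) Q^T
assert np.allclose(Q @ np.diag(lam) @ Q.T, A)
assert np.allclose(Q.T @ Q, np.eye(4))  # Q is orthogonal
# eigh returns a real array lam: the eigenvalues are real, as claimed.
```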
We now return to matrix norms. We begin with the so-called Frobenius norm, which is
just the norm ‖·‖_2 on Cⁿ², where the n × n matrix A is viewed as the vector obtained by
concatenating together the rows (or the columns) of A. The reader should check that for
any n × n complex matrix A = (a_ij),
    ( Σ_{i,j=1}^n |a_ij|^2 )^{1/2} = √(tr(A∗A)) = √(tr(AA∗)).
Definition 7.6. The Frobenius norm ‖·‖_F is defined so that for every square n × n matrix
A ∈ Mn(C),
    ‖A‖_F = ( Σ_{i,j=1}^n |a_ij|^2 )^{1/2} = √(tr(AA∗)) = √(tr(A∗A)).
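A sketch computing ‖A‖_F from the entrywise definition and from both trace formulas (random complex test matrix; names ours):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

entrywise = np.sqrt(np.sum(np.abs(A) ** 2))
via_trace1 = np.sqrt(np.trace(A @ A.conj().T).real)  # sqrt(tr(A A*))
via_trace2 = np.sqrt(np.trace(A.conj().T @ A).real)  # sqrt(tr(A* A))

assert np.isclose(entrywise, via_trace1)
assert np.isclose(entrywise, via_trace2)
assert np.isclose(entrywise, np.linalg.norm(A, 'fro'))
```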
The following proposition shows that the Frobenius norm is a matrix norm satisfying other
nice properties.
Proposition 7.5. The Frobenius norm ‖·‖_F on Mn(C) satisfies the following properties:
(1) It is a matrix norm; that is, ‖AB‖_F ≤ ‖A‖_F ‖B‖_F, for all A, B ∈ Mn(C).
(2) It is unitarily invariant, which means that for all unitary matrices U, V, we have
    ‖A‖_F = ‖UA‖_F = ‖AV‖_F = ‖UAV‖_F.
(3) √(ρ(A∗A)) ≤ ‖A‖_F ≤ √n √(ρ(A∗A)), for all A ∈ Mn(C).
Proof. (1) The only property that requires a proof is the fact that ‖AB‖_F ≤ ‖A‖_F ‖B‖_F. This
follows from the Cauchy–Schwarz inequality:
    ‖AB‖_F^2 = Σ_{i,j=1}^n | Σ_{k=1}^n a_ik b_kj |^2
             ≤ Σ_{i,j=1}^n ( Σ_{h=1}^n |a_ih|^2 ) ( Σ_{k=1}^n |b_kj|^2 )
             = ( Σ_{i,h=1}^n |a_ih|^2 ) ( Σ_{k,j=1}^n |b_kj|^2 ) = ‖A‖_F^2 ‖B‖_F^2.
(2) We have
    ‖A‖_F^2 = tr(A∗A) = tr(V V∗A∗A) = tr(V∗A∗AV) = ‖AV‖_F^2,
and
    ‖A‖_F^2 = tr(A∗A) = tr(A∗U∗UA) = ‖UA‖_F^2.
The identity
    ‖A‖_F = ‖UAV‖_F
follows from the previous two.
(3) It is well known that the trace of a matrix is equal to the sum of its eigenvalues.
Furthermore, A∗A is Hermitian positive semidefinite (which means that its eigenvalues are
nonnegative), so ρ(A∗A) is the largest eigenvalue of A∗A and
    ρ(A∗A) ≤ tr(A∗A) ≤ nρ(A∗A),
which yields (3) by taking square roots.
Remark: The Frobenius norm is also known as the Hilbert-Schmidt norm or the Schur
norm. So many famous names associated with such a simple thing!
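A sketch checking properties (2) and (3) on random matrices, with a random unitary U obtained from a QR factorization (all names and tolerances ours):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

fro = lambda M: np.linalg.norm(M, 'fro')
assert np.isclose(fro(A), fro(U @ A))  # property (2): ||UA||_F = ||A||_F
assert np.isclose(fro(A), fro(A @ U))  # property (2): ||AV||_F = ||A||_F

rho = np.max(np.abs(np.linalg.eigvals(A.conj().T @ A)))
assert np.sqrt(rho) <= fro(A) + 1e-9        # property (3), lower bound
assert fro(A) <= np.sqrt(n * rho) + 1e-9    # property (3), upper bound
```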
We now give another method for obtaining matrix norms using subordinate norms. First,
we need a proposition that shows that in a finite-dimensional space, the linear map induced
by a matrix is bounded, and thus continuous.
Proposition 7.6. For every norm ‖·‖ on Cⁿ (or Rⁿ), for every matrix A ∈ Mn(C) (or
A ∈ Mn(R)), there is a real constant C_A > 0, such that
    ‖Au‖ ≤ C_A ‖u‖,
for every vector u ∈ Cⁿ (or u ∈ Rⁿ if A is real).