Basics of Algebra, Topology, and Differential Calculus by Jean Gallier - HTML preview


Chapter 28

Differential Calculus

28.1 Directional Derivatives, Total Derivatives

This chapter contains a review of basic notions of differential calculus. First, we review the definition of the derivative of a function f : R → R. Next, we define directional derivatives and the total derivative of a function f : E → F between normed affine spaces. Basic properties of derivatives are shown, including the chain rule. We show how derivatives are represented by Jacobian matrices. The mean value theorem is stated, as well as the implicit function theorem and the inverse function theorem. Diffeomorphisms and local diffeomorphisms are defined. Tangent spaces are defined. Higher-order derivatives are defined, as well as the Hessian. Schwarz’s lemma (about the commutativity of partials) is stated. Several versions of Taylor’s formula are stated, and a famous formula due to Faà di Bruno is given.

We first review the notion of the derivative of a real-valued function whose domain is an

open subset of R.

Let f : A → R, where A is a nonempty open subset of R, and consider any a ∈ A. The main idea behind the concept of the derivative of f at a, denoted by f′(a), is that locally around a (that is, in some small open set U ⊆ A containing a), the function f is approximated linearly by the map

\[ x \mapsto f(a) + f'(a)(x - a). \]

Part of the difficulty in extending this idea to more complex spaces is to give an adequate

notion of linear approximation. Of course, we will use linear maps! Let us now review the

formal definition of the derivative of a real-valued function.

Definition 28.1. Let A be any nonempty open subset of R, and let a ∈ A. For any function f : A → R, the derivative of f at a ∈ A is the limit (if it exists)

\[ \lim_{h \to 0,\ h \in U} \frac{f(a + h) - f(a)}{h}, \]

where U = {h ∈ R | a + h ∈ A, h ≠ 0}. This limit is denoted by f′(a), or Df(a), or $\frac{df}{dx}(a)$. If f′(a) exists for every a ∈ A, we say that f is differentiable on A. In this case, the map a ↦ f′(a) is denoted by f′, or Df, or $\frac{df}{dx}$.

Note that since A is assumed to be open, A − {a} is also open, and since the function

h → a + h is continuous and U is the inverse image of A − {a} under this function, U is

indeed open and the definition makes sense.

We can also define f′(a) as follows: there is some function ε such that

\[ f(a + h) = f(a) + f'(a) \cdot h + \epsilon(h) h, \]

whenever a + h ∈ A, where ε(h) is defined for all h such that a + h ∈ A, and

\[ \lim_{h \to 0,\ h \in U} \epsilon(h) = 0. \]
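As a quick numerical illustration of Definition 28.1 and of the ε formulation, the following sketch evaluates the quotient (f(a + h) − f(a))/h for shrinking h ≠ 0. The helper name `difference_quotient` and the test function f(x) = x³ (for which f′(a) = 3a²) are choices made for this illustration, not part of the text.

```python
def difference_quotient(f, a, h):
    """Return (f(a + h) - f(a)) / h for h != 0."""
    if h == 0:
        raise ValueError("h must be nonzero")
    return (f(a + h) - f(a)) / h


def cube(x):
    """Illustrative test function f(x) = x^3, with f'(x) = 3 x^2."""
    return x ** 3


a = 2.0
exact = 3 * a ** 2  # f'(a) for f(x) = x^3
for h in (1e-1, 1e-2, 1e-3, -1e-3):
    q = difference_quotient(cube, a, h)
    # epsilon(h) in f(a + h) = f(a) + f'(a) h + epsilon(h) h
    eps = q - exact
    print(f"h={h:+.0e}  quotient={q:.6f}  epsilon(h)={eps:+.6f}")
```

The printed values of ε(h) shrink with |h|, on both sides of 0, which is what the condition lim ε(h) = 0 requires.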

Remark: We can also define the notion of derivative of f at a on the left, and derivative of f at a on the right. For example, we say that the derivative of f at a on the left is the limit f′(a⁻) (if it exists)

\[ \lim_{h \to 0,\ h \in U} \frac{f(a + h) - f(a)}{h}, \]

where U = {h ∈ R | a + h ∈ A, h < 0}.

If a function f as in Definition 28.1 has a derivative f′(a) at a, then it is continuous at a. If f is differentiable on A, then f is continuous on A. The composition of differentiable functions is differentiable.

Remark: A function f has a derivative f′(a) at a iff the derivative of f on the left at a and the derivative of f on the right at a both exist and are equal. Also, if the derivative of f on the left at a exists, then f is continuous on the left at a (and similarly on the right).
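The role of the one-sided derivatives can be seen on the absolute value function at a = 0, where both one-sided derivatives exist but differ. The sketch below is only an illustration; the helper name `one_sided_quotients` and the step sizes are assumptions made here.

```python
def one_sided_quotients(f, a, steps=(1e-2, 1e-4, 1e-6)):
    """Return difference quotients of f at a for h < 0 and for h > 0."""
    left = [(f(a - h) - f(a)) / (-h) for h in steps]   # increments -h < 0
    right = [(f(a + h) - f(a)) / h for h in steps]     # increments +h > 0
    return left, right


left, right = one_sided_quotients(abs, 0.0)
print(left)   # quotients taken with h < 0
print(right)  # quotients taken with h > 0
```

For |x| the left quotients are all −1 and the right quotients are all +1, so the two one-sided derivatives exist but are not equal, and |x| has no derivative at 0.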

We would like to extend the notion of derivative to functions f : A → F, where E and F are normed affine spaces, and A is some nonempty open subset of E. The first difficulty is to make sense of the quotient

\[ \frac{f(a + h) - f(a)}{h}. \]

If E and F are normed affine spaces, it will be notationally convenient to assume that

the vector space associated with E is denoted by E, and that the vector space associated

with F is denoted as F .

Since F is a normed affine space, making sense of f(a + h) − f(a) is easy: we can define this as $\overrightarrow{f(a)f(a + h)}$, the unique vector translating f(a) to f(a + h). We should note, however, that this quantity is a vector and not a point. Nevertheless, in defining derivatives, it is notationally more pleasant to denote $\overrightarrow{f(a)f(a + h)}$ by f(a + h) − f(a). Thus, in the rest of this chapter, the vector $\overrightarrow{ab}$ will be denoted by b − a. But now, how do we define the quotient by a vector? Well, we don’t!

A first possibility is to consider the directional derivative with respect to a vector u ≠ 0 in E. We can consider the vector f(a + tu) − f(a), where t ∈ R (or t ∈ C). Now,

\[ \frac{f(a + tu) - f(a)}{t} \]

makes sense. The idea is that in E, the points of the form a + tu for t in some small interval [−ε, +ε] in R (or C) form a line segment [r, s] in A containing a, and that the image of this line segment defines a small curve segment on f(A). This curve segment is defined by the map t ↦ f(a + tu), from [r, s] to F, and the directional derivative $D_u f(a)$ defines the direction of the tangent line at a to this curve. This leads us to the following definition.

Definition 28.2. Let E and F be two normed affine spaces, let A be a nonempty open subset of E, and let f : A → F be any function. For any a ∈ A, for any u ≠ 0 in E, the directional derivative of f at a w.r.t. the vector u, denoted by $D_u f(a)$, is the limit (if it exists)

\[ \lim_{t \to 0,\ t \in U} \frac{f(a + tu) - f(a)}{t}, \]

where U = {t ∈ R | a + tu ∈ A, t ≠ 0} (or U = {t ∈ C | a + tu ∈ A, t ≠ 0}).

Since the map t → a + tu is continuous, and since A − {a} is open, the inverse image U

of A − {a} under the above map is open, and the definition of the limit in Definition 28.2

makes sense.

Remark: Since the notion of limit is purely topological, the existence and value of a directional derivative are independent of the choice of norms in E and F, as long as they are equivalent norms.

The directional derivative is sometimes called the Gâteaux derivative.

In the special case where E = R and F = R, and we let u = 1 (i.e., the real number 1, viewed as a vector), it is immediately verified that $D_1 f(a)$ = f′(a), in the sense of Definition 28.1. When E = R (or E = C) and F is any normed vector space, the derivative $D_1 f(a)$, also denoted by f′(a), provides a suitable generalization of the notion of derivative.

However, when E has dimension ≥ 2, directional derivatives present a serious problem,

which is that their definition is not sufficiently uniform. Indeed, there is no reason to believe

that the directional derivatives w.r.t. all nonnull vectors u share something in common. As

a consequence, a function can have all directional derivatives at a, and yet not be continuous

at a. Two functions may have all directional derivatives in some open sets, and yet their

composition may not. Thus, we introduce a more uniform notion.
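A standard counterexample of this kind (chosen here for illustration; it is not taken from the text) is the function on R² given by f(x, y) = x²y/(x⁴ + y²) for (x, y) ≠ (0, 0) and f(0, 0) = 0. Every directional derivative exists at the origin, yet f equals 1/2 along the parabola y = x², so f is not continuous at the origin. The sketch below evaluates both behaviours numerically; the helper name `directional_quotient` is an assumption.

```python
def f(x, y):
    """x^2 y / (x^4 + y^2) away from the origin, and 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * x * y / (x ** 4 + y * y)


def directional_quotient(u, t):
    """(f(0 + t u) - f(0)) / t at the point a = (0, 0), for t != 0."""
    return (f(t * u[0], t * u[1]) - f(0.0, 0.0)) / t


u = (1.0, 1.0)
for t in (1e-1, 1e-2, 1e-3):
    # For u = (u1, u2) with u2 != 0 the quotient is u1^2 u2 / (t^2 u1^4 + u2^2).
    print("t =", t, " quotient =", directional_quotient(u, t))

for x in (1e-1, 1e-2, 1e-3):
    # Along y = x^2 the value is x^4 / (2 x^4) = 1/2, however small x is.
    print("x =", x, " f(x, x^2) =", f(x, x * x))
```

The quotients settle near 1 for u = (1, 1), while the values along the parabola stay near 1/2 instead of approaching f(0, 0) = 0.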


Definition 28.3. Let E and F be two normed affine spaces, let A be a nonempty open subset of E, and let f : A → F be any function. For any a ∈ A, we say that f is differentiable at a ∈ A if there is a linear continuous map L: E → F and a function ε, such that

\[ f(a + h) = f(a) + L(h) + \epsilon(h)\|h\| \]

for every a + h ∈ A, where ε(h) is defined for every h such that a + h ∈ A and

\[ \lim_{h \to 0,\ h \in U} \epsilon(h) = 0, \]

where U = {h ∈ E | a + h ∈ A, h ≠ 0}. The linear map L is denoted by Df(a), or $Df_a$, or df(a), or $df_a$, or f′(a), and it is called the Fréchet derivative, or derivative, or total derivative, or total differential, or differential, of f at a.

Since the map h ↦ a + h from E to E is continuous, and since A is open in E, the inverse image U of A − {a} under the above map is open in E, and it makes sense to say that

\[ \lim_{h \to 0,\ h \in U} \epsilon(h) = 0. \]

Note that for every h ∈ U, since h ≠ 0, ε(h) is uniquely determined since

\[ \epsilon(h) = \frac{f(a + h) - f(a) - L(h)}{\|h\|}, \]

and that the value ε(0) plays absolutely no role in this definition. The condition for f to be differentiable at a amounts to the fact that

\[ \lim_{h \to 0} \frac{\|f(a + h) - f(a) - L(h)\|}{\|h\|} = 0 \]

as h ≠ 0 approaches 0, when a + h ∈ A. However, it does no harm to assume that ε(0) = 0, and we will assume this from now on.

Again, we note that the derivative Df (a) of f at a provides an affine approximation of

f , locally around a.

Remark: Since the notion of limit is purely topological, the existence and value of a derivative are independent of the choice of norms in E and F, as long as they are equivalent norms.

Note that the continuous linear map L is unique, if it exists. In fact, the next proposition implies this as a corollary. The following proposition shows that our new definition is consistent with the definition of the directional derivative.

Proposition 28.1. Let E and F be two normed affine spaces, let A be a nonempty open subset of E, and let f : A → F be any function. For any a ∈ A, if Df(a) is defined, then f is continuous at a and f has a directional derivative $D_u f(a)$ for every u ≠ 0 in E, and furthermore,

\[ D_u f(a) = Df(a)(u). \]

Proof. If h ≠ 0 approaches 0, since L is continuous, L(h) and ε(h)‖h‖ both approach 0, and thus, f is continuous at a. For any u ≠ 0 in E, for |t| ∈ R small enough (where t ∈ R or t ∈ C), we have a + tu ∈ A, and letting h = tu, we have

\[ f(a + tu) = f(a) + tL(u) + \epsilon(tu)\,|t|\,\|u\|, \]

and for t ≠ 0,

\[ \frac{f(a + tu) - f(a)}{t} = L(u) + \frac{|t|}{t}\,\epsilon(tu)\,\|u\|. \]

Since |t|/t has modulus 1 and ε(tu) approaches 0, the limit when t ≠ 0 approaches 0 is L(u), which is therefore $D_u f(a)$.

The uniqueness of L follows from Proposition 28.1. Also, when E is of finite dimension, it

is easily shown that every linear map is continuous, and this assumption is then redundant.

It is important to note that the derivative Df (a) of f at a is a continuous linear map

from the vector space E to the vector space F , and not a function from the affine space E

to the affine space F .

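As an example, consider the map f : Mn(R) → Mn(R) given by

\[ f(A) = A^\top A - I, \]

where Mn(R) is equipped with any matrix norm, since they are all equivalent; for example, pick the Frobenius norm, $\|A\|_F = \sqrt{\mathrm{tr}(A^\top A)}$. We claim that

\[ Df(A)(H) = A^\top H + H^\top A, \]

for all A and H in Mn(R). We have

\begin{align*}
f(A + H) - f(A) - (A^\top H + H^\top A) &= (A + H)^\top (A + H) - I - (A^\top A - I) - A^\top H - H^\top A \\
&= A^\top A + A^\top H + H^\top A + H^\top H - A^\top A - A^\top H - H^\top A \\
&= H^\top H.
\end{align*}

It follows that

\[ \epsilon(H) = \frac{f(A + H) - f(A) - (A^\top H + H^\top A)}{\|H\|} = \frac{H^\top H}{\|H\|}, \]

and since our norm is the Frobenius norm,

\[ \|\epsilon(H)\| = \frac{\|H^\top H\|}{\|H\|} \le \frac{\|H^\top\|\,\|H\|}{\|H\|} = \|H^\top\| = \|H\|, \]

so

\[ \lim_{H \to 0} \epsilon(H) = 0, \]

and we conclude that

\[ Df(A)(H) = A^\top H + H^\top A. \]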
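The claim about f(A) = AᵀA − I can also be checked numerically. The sketch below uses plain nested lists for matrices; all helper names (`transpose`, `matmul`, `add`, `scale`, `frobenius`, `remainder_ratio`) and the sample matrices are assumptions made for this illustration. It computes the Frobenius norm of f(A + H) − f(A) − (AᵀH + HᵀA) divided by the Frobenius norm of H, which should be bounded by the Frobenius norm of H and so tend to 0 with H.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X, Y, sign=1.0):
    """Entrywise X + sign * Y."""
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def scale(c, X):
    return [[c * x for x in row] for row in X]


def frobenius(X):
    return sum(x * x for row in X for x in row) ** 0.5


def f(A):
    """f(A) = A^T A - I."""
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return add(matmul(transpose(A), A), identity, sign=-1.0)


def Df(A, H):
    """Candidate derivative Df(A)(H) = A^T H + H^T A."""
    return add(matmul(transpose(A), H), matmul(transpose(H), A))


def remainder_ratio(A, H):
    """||f(A + H) - f(A) - Df(A)(H)||_F / ||H||_F."""
    R = add(add(f(add(A, H)), f(A), sign=-1.0), Df(A, H), sign=-1.0)
    return frobenius(R) / frobenius(H)


A = [[1.0, 2.0], [3.0, 4.0]]
H0 = [[0.5, -1.0], [2.0, 0.25]]
for t in (1e-1, 1e-2, 1e-3):
    H = scale(t, H0)
    print(t, remainder_ratio(A, H), frobenius(H))
```

In each printed row the ratio should stay below the norm of H, in line with the estimate derived in the example, and shrink linearly with t.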


If Df (a) exists for every a ∈ A, we get a map

Df : A → L(E; F ),

called the derivative of f on A, and also denoted by df. Recall that L(E; F) denotes the vector space of all continuous linear maps from E to F.

When E is of finite dimension n, for any frame (a0, (u1, . . . , un)) of E, where (u1, . . . , un)

is a basis of E, we can define the directional derivatives with respect to the vectors in the

basis (u1, . . . , un) (actually, we can also do it for an infinite frame). This way, we obtain the

definition of partial derivatives, as follows.

Definition 28.4. For any two normed affine spaces E and F, if E is of finite dimension n, for every frame (a0, (u1, . . . , un)) for E, for every a ∈ E, for every function f : E → F, the directional derivatives $D_{u_j} f(a)$ (if they exist) are called the partial derivatives of f with respect to the frame (a0, (u1, . . . , un)). The partial derivative $D_{u_j} f(a)$ is also denoted by $\partial_j f(a)$, or $\frac{\partial f}{\partial x_j}(a)$.

The notation $\frac{\partial f}{\partial x_j}(a)$ for a partial derivative, although customary and going back to Leibniz, is a “logical obscenity.” Indeed, the variable xj really has nothing to do with the formal definition. This is just another of these situations where tradition is just too hard to overthrow!

We now consider a number of standard results about derivatives.

Proposition 28.2. Given two normed affine spaces E and F, if f : E → F is a constant function, then Df(a) = 0, for every a ∈ E. If f : E → F is a continuous affine map, then Df(a) = $\overrightarrow{f}$, for every a ∈ E, where $\overrightarrow{f}$ is the linear map associated with f.

Proof. Straightforward.

Proposition 28.3. Given a normed affine space E and a normed vector space F , for any

two functions f, g : E → F , for every a ∈ E, if Df(a) and Dg(a) exist, then D(f + g)(a) and

D(λf )(a) exist, and

D(f + g)(a) = Df (a) + Dg(a),

D(λf )(a) = λDf (a).

Proof. Straightforward.

Proposition 28.4. Given three normed vector spaces E1, E2, and F , for any continuous

bilinear map

f : E1 × E2 → F , for every (a, b) ∈ E1 × E2, Df(a, b) exists, and for every u ∈ E1 and

v ∈ E2,

Df (a, b)(u, v) = f (u, b) + f (a, v).


Proof. Straightforward.
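The computation behind Proposition 28.4 can be sketched as follows, assuming that E1 × E2 carries the norm ‖(u, v)‖ = max(‖u‖, ‖v‖) and writing C for a constant with ‖f(u, v)‖ ≤ C‖u‖‖v‖ (such a constant exists because f is continuous and bilinear; the name C and the choice of product norm are assumptions of this sketch, not notation from the text):

```latex
\begin{align*}
f(a + u, b + v) &= f(a, b) + f(u, b) + f(a, v) + f(u, v), \\
\|f(u, v)\| &\le C\,\|u\|\,\|v\| \le C\,\|(u, v)\|^2 .
\end{align*}
```

The remainder f(u, v) is thus of the form ε(u, v)‖(u, v)‖ with ‖ε(u, v)‖ ≤ C‖(u, v)‖, which tends to 0, while (u, v) ↦ f(u, b) + f(a, v) is linear and continuous. For instance, with E1 = E2 = F = R and f(x, y) = xy, this gives Df(a, b)(u, v) = ub + av, the usual product rule.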

We now state the very useful chain rule.

Theorem 28.5. Given three normed affine spaces E, F, and G, let A be an open set in E, and let B be an open set in F. For any functions f : A → F and g : B → G, such that

f (A) ⊆ B, for any a ∈ A, if Df(a) exists and Dg(f(a)) exists, then D(g ◦ f)(a) exists, and

D(g ◦ f)(a) = Dg(f(a)) ◦ Df(a).

Proof. It is not difficult, but more involved than the previous two.
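In finite dimension, the chain rule says that the matrix of D(g ◦ f)(a) is the product of the matrices of Dg(f(a)) and Df(a). The sketch below checks this on a hypothetical pair of maps, f(x, y) = (xy, x + y) and g(u, v) = u² + sin(v), chosen only for illustration. The hand-written Jacobians `jac_f` and `jac_g` and the central-difference helper `numeric_jacobian` are likewise assumptions of this example.

```python
import math


def f(p):
    x, y = p
    return (x * y, x + y)


def g(q):
    u, v = q
    return (u * u + math.sin(v),)


def jac_f(p):
    """Matrix of Df(p), computed by hand for f(x, y) = (x y, x + y)."""
    x, y = p
    return [[y, x], [1.0, 1.0]]


def jac_g(q):
    """Matrix of Dg(q), computed by hand for g(u, v) = u^2 + sin(v)."""
    u, v = q
    return [[2.0 * u, math.cos(v)]]


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def numeric_jacobian(func, p, h=1e-6):
    """Central-difference approximation of the Jacobian matrix of func at p."""
    cols = []
    for j in range(len(p)):
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        cols.append([(a - b) / (2.0 * h) for a, b in zip(func(plus), func(minus))])
    # cols[j][i] approximates d func_i / d x_j; transpose so rows are indexed by i.
    return [list(row) for row in zip(*cols)]


a = (0.7, -1.3)
chain = matmul(jac_g(f(a)), jac_f(a))             # Dg(f(a)) composed with Df(a)
direct = numeric_jacobian(lambda p: g(f(p)), a)   # D(g o f)(a), approximated
print(chain)
print(direct)
```

The two printed 1 × 2 matrices should agree up to the finite-difference error.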

Theorem 28.5 has many interesting consequences. We mention two corollaries.

Proposition 28.6. Given three normed affine spaces E, F, and G, for any open subset A in E, for any a ∈ A, let f : A → F such that Df(a) exists, and let g : F → G be a continuous affine map. Then, D(g ◦ f)(a) exists, and

\[ D(g \circ f)(a) = \overrightarrow{g} \circ Df(a), \]

where $\overrightarrow{g}$ is the linear map associated with the affine map g.

Proposition 28.7. Given two normed affine spaces E and F , let A be some open subset in

E, let B be some open subset in F , let f : A → B be a bijection from A to B, and assume

that Df exists on A and that $Df^{-1}$ exists on B. Then, for every a ∈ A,

\[ Df^{-1}(f(a)) = (Df(a))^{-1}. \]
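One way to see where this formula comes from (a sketch using only Theorem 28.5 and the fact, from Proposition 28.2, that the derivative of an identity map is the identity linear map) is to differentiate the two compositions f⁻¹ ◦ f = id on A and f ◦ f⁻¹ = id on B:

```latex
\begin{align*}
Df^{-1}(f(a)) \circ Df(a) &= D(f^{-1} \circ f)(a) = \mathrm{id}_{E}, \\
Df(a) \circ Df^{-1}(f(a)) &= D(f \circ f^{-1})(f(a)) = \mathrm{id}_{F}.
\end{align*}
```

Thus Df(a) is a linear isomorphism from E to F, with inverse Df⁻¹(f(a)).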

Proposition 28.7 has the remarkable consequence that the two vector spaces E and F

have the same dimension. In other words, a local property, the existence of a bijection f

between an open set A of E and an open set B of F , such that f is differentiable on A and

f⁻¹ is differentiable on B, implies a global property, that the two vector spaces E and F

have the same dimension.

We now consider the situation where the normed affine space F is a finite direct sum

F = (F1, b1) ⊕ · · · ⊕ (Fm, bm).

Proposition 28.8. Given normed affine spaces E and F = (F1, b1) ⊕ · · · ⊕ (Fm, bm), given

any open subset A of E, for any a ∈ A, for any function f : A → F , letting f = (f1, . . . , fm),

Df(a) exists iff every $Df_i(a)$ exists, and

\[ Df(a) = \mathrm{in}_1 \circ Df_1(a) + \cdots + \mathrm{in}_m \circ Df_m(a). \]

Proof. Observe that f(a + h) − f(a) = (f(a + h) − b) − (f(a) − b), where b = (b1, . . . , bm), and thus, as far as derivatives are concerned, Df(a) is equal to $Df_b(a)$, where $f_b$ : E → F is defined such that $f_b(x)$ = f(x) − b, for every x ∈ E. Thus, we can work with the vector space F instead of the affine space F. The proposition is then a simple application of Theorem 28.5.


In the special case where F is a normed affine space of finite dimension m, for any frame

(b0, (v1, . . . , vm)) of F , where (v1, . . . , vm) is a basis of F , every point x ∈ F can be expressed

uniquely as

x = b0 + x1v1 + · · · + xmvm,

where (x1, . . . , xm) ∈ $K^m$, the coordinates of x in the frame (b0, (v1, . . . , vm)) (where K = R

or K = C). Thus, letting Fi be the standard normed affine space K with its natural

structure, we note that F is isomorphic to the direct sum F = (K, 0) ⊕ · · · ⊕ (K, 0). Then,

every function f : E → F is represented by m functions (f1, . . . , fm), where fi : E → K

(where K = R or K = C), and

f (x) = b0 + f1(x)v1 + · · · + fm(x)vm,

for every x ∈ E. The following proposition is an immediate corollary of Proposition 28.8.

Proposition 28.9. For any two normed affine spaces E and F , if F is of finite dimension

m, for any frame (b0, (v1, . . . , vm)) of F , where (v1, . . . , vm) is a basis of F , for every a ∈ E,

a function f : E → F is differentiable at a iff each fi is differentiable at a, and

Df (a)(u) = Df1(a)(u)v1 + · · · + Dfm(a)(u)vm,

for every u ∈ E.

We now consider the situation where E is a finite direct sum. Given a normed affine

space E = (E1, a1) ⊕ · · · ⊕ (En, an) and a normed affine space F , given any open subset A

of E, for any c = (c1, . . . , cn) ∈ A, we define the continuous functions $i^c_j$ : Ej → E, such that

\[ i^c_j(x) = (c_1, \ldots, c_{j-1}, x, c_{j+1}, \ldots, c_n). \]

For any function f : A → F, we have functions f ◦ $i^c_j$ : Ej → F, defined on $(i^c_j)^{-1}(A)$, which contains cj. If D(f ◦ $i^c_j$)(cj) exists, we call it the partial derivative of f w.r.t. its jth argument, at c. We also denote this derivative by Djf(c). Note that Djf(c) ∈ L(Ej; F).

This notion is a generalization of the notion defined in Definition 28.4. In fact, when

E is of dimension n, and a frame (a0, (u1, . . . , un)) has been chosen, we can write E =

(E1, a1) ⊕ · · · ⊕ (En, an), for some obvious (Ej, aj) (as explained just after Proposition 28.8),

and then

Djf (c)(λuj) = λ∂jf (c),

and the two notions are consistent.

The definition of $i^c_j$ and of Djf(c) also makes sense for a finite product E1 × · · · × En of

affine spaces Ei. We will use freely the notation ∂jf (c) instead of Djf (c).

The notion ∂jf (c) introduced in Definition 28.4 is really that of the vector derivative,

whereas Djf (c) is the corresponding linear map. Although perhaps confusing, we identify

the two notions. The following proposition holds.


Proposition 28.10. Given a normed affine space E = (E1, a1) ⊕ · · · ⊕ (En, an), and a

normed affine space F , given any open subset A of E, for any function f : A → F , for every

c ∈ A, if Df(c) exists, then each Djf(c) exists, and

Df (c)(u1, . . . , un) = D1f (c)(u1) + · · · + Dnf(c)(un),

for every ui ∈ Ei, 1 ≤ i ≤ n. The same result holds for the finite product E1 × · · · × En.

Proof. Since every c ∈ E can be written as c = a + c − a, where a = (a1, . . . , an), defining

fa : E → F such that, fa(u) = f(a + u), for every u ∈ E, clearly, Df(c) = Dfa(c − a), and

thus, we can work with the function fa whose domain is the vector space E. The proposition

is then a simple application of Theorem 28.5.

28.2 Jacobian Matrices

If both E and F are of finite dimension, for any frame (a0, (u1, . . . , un)) of E and any frame

(b0, (v1, . . . , vm)) of F , every function f : E → F is determined by m functions fi : E → R

(or fi : E → C), where

f (x) = b0 + f1(x)v1 + · · · + fm(x)vm,

for every x ∈ E. From Proposition 28.1, we have

\[ Df(a)(u_j) = D_{u_j} f(a) = \partial_j f(a), \]

and from Proposition 28.9, we have

Df (a)(uj) = Df1(a)(uj)v1 + · · · + Dfi(a)(uj)vi + · · · + Dfm(a)(uj)vm,

that is,

Df (a)(uj) = ∂jf1(a)v1 + · · · + ∂jfi(a)vi + · · · + ∂jfm(a)vm.

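Since the j-th column of the m × n-matrix representing Df(a) w.r.t. the bases (u1, . . . , un) and (v1, . . . , vm) is equal to the components of the vector Df(a)(uj) over the basis (v1, . . . , vm), the linear map Df(a) is determined by the m × n-matrix $J(f)(a) = (\partial_j f_i(a))$ (or $J(f)(a) = \left(\frac{\partial f_i}{\partial x_j}(a)\right)$):

\[
J(f)(a) =
\begin{pmatrix}
\partial_1 f_1(a) & \partial_2 f_1(a) & \cdots & \partial_n f_1(a) \\
\partial_1 f_2(a) & \partial_2 f_2(a) & \cdots & \partial_n f_2(a) \\
\vdots & \vdots & \ddots & \vdots \\
\partial_1 f_m(a) & \partial_2 f_m(a) & \cdots & \partial_n f_m(a)
\end{pmatrix}
\]

or

\[
J(f)(a) =
\begin{pmatrix}
\frac{\partial f_1}{\partial x_1}(a) & \frac{\partial f_1}{\partial x_2}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a) \\
\frac{\partial f_2}{\partial x_1}(a) & \frac{\partial f_2}{\partial x_2}(a) & \cdots & \frac{\partial f_2}{\partial x_n}(a) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1}(a) & \frac{\partial f_m}{\partial x_2}(a) & \cdots & \frac{\partial f_m}{\partial x_n}(a)
\end{pmatrix}.
\]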

This matrix is called the Jacobian matrix of Df at a. When m = n, the determinant,

det(J(f )(a)), of J(f )(a) is called the Jacobian of Df (a). From a previous remark, we know

that this determinant in fact only depends on Df (a), and not on specific bases. However,

partial derivatives give a means for computing it.

When E = $R^n$ and F = $R^m$, for any function f : $R^n$ → $R^m$, it is easy to compute the partial derivatives $\frac{\partial f_i}{\partial x_j}(a)$. We simply treat the function $f_i$ : $R^n$ → R as a function of its j-th argument, leaving the others fixed, and compute the derivative as in Definition 28.1, that is, the usual derivative.

Example 28.1. Consider the function f : $R^2$ → $R^2$, defined such that

\[ f(r, \theta) = (r\cos(\theta),\ r\sin(\theta)). \]

Then, we have

\[
J(f)(r, \theta) =
\begin{pmatrix}
\cos(\theta) & -r\sin(\theta) \\
\sin(\theta) & r\cos(\theta)
\end{pmatrix}
\]

and the Jacobian (determinant) has value det(J(f)(r, θ)) = r.
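Example 28.1 can be checked numerically by treating each component as a function of one argument at a time, as described before the example. In the sketch below, the function names (`polar_to_cartesian`, `jacobian`, `numeric_jacobian`, `det2`), the step size and the sample point (r, θ) = (2, 0.6) are assumptions made for this illustration.

```python
import math


def polar_to_cartesian(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))


def jacobian(r, theta):
    """Jacobian matrix of (r, theta) -> (r cos(theta), r sin(theta))."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]


def det2(M):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def numeric_jacobian(r, theta, h=1e-6):
    """Central differences in r (first column) and in theta (second column)."""
    xr_p, yr_p = polar_to_cartesian(r + h, theta)
    xr_m, yr_m = polar_to_cartesian(r - h, theta)
    xt_p, yt_p = polar_to_cartesian(r, theta + h)
    xt_m, yt_m = polar_to_cartesian(r, theta - h)
    return [[(xr_p - xr_m) / (2 * h), (xt_p - xt_m) / (2 * h)],
            [(yr_p - yr_m) / (2 * h), (yt_p - yt_m) / (2 * h)]]


r, theta = 2.0, 0.6
print(det2(jacobian(r, theta)))          # r cos^2(theta) + r sin^2(theta)
print(det2(numeric_jacobian(r, theta)))  # finite-difference approximation
```

Both determinants should be close to r = 2, the second one only up to the finite-difference error.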

In the case where E = R (or E = C), for any function f : R → F (or f : C → F ), the

Jacobian matrix of Df (a) is a column vector. In fact, this column vector is just D1f (a).

Then, for every λ ∈ R (or λ ∈ C),

Df (a)(λ) = λD1f (a).

This case is sufficiently important to warrant a definition.

Definition 28.5. Given a function f : R → F (or f : C → F ), where F is a normed affine

space, the vector

Df (a)(1) = D1f (a)

is called the vector d