28.1 Directional Derivatives, Total Derivatives
This chapter contains a review of basic notions of differential calculus. First, we review
the definition of the derivative of a function f : R → R. Next, we define directional deriva-
tives and the total derivative of a function f : E → F between normed affine spaces. Basic
properties of derivatives are shown, including the chain rule. We show how derivatives are
represented by Jacobian matrices. The mean value theorem is stated, as well as the implicit
function theorem and the inverse function theorem. Diffeomorphisms and local diffeomor-
phisms are defined. Tangent spaces are defined. Higher-order derivatives are defined, as well
as the Hessian. Schwarz’s lemma (about the commutativity of partials) is stated. Several
versions of Taylor’s formula are stated, and a famous formula due to Faà di Bruno is given.
We first review the notion of the derivative of a real-valued function whose domain is an
open subset of R.
Let f : A → R, where A is a nonempty open subset of R, and consider any a ∈ A.
The main idea behind the concept of the derivative of f at a, denoted by f′(a), is that
locally around a (that is, in some small open set U ⊆ A containing a), the function f is
approximated linearly by the map
x → f(a) + f′(a)(x − a).
Part of the difficulty in extending this idea to more complex spaces is to give an adequate
notion of linear approximation. Of course, we will use linear maps! Let us now review the
formal definition of the derivative of a real-valued function.
Definition 28.1. Let A be any nonempty open subset of R, and let a ∈ A. For any function
f : A → R, the derivative of f at a ∈ A is the limit (if it exists)
\[
\lim_{h \to 0,\ h \in U} \frac{f(a + h) - f(a)}{h},
\]
CHAPTER 28. DIFFERENTIAL CALCULUS
where U = {h ∈ R | a + h ∈ A, h ≠ 0}. This limit is denoted by f′(a), or Df(a), or (df/dx)(a).
If f′(a) exists for every a ∈ A, we say that f is differentiable on A. In this case, the map
a → f′(a) is denoted by f′, or Df, or df/dx.
Note that since A is assumed to be open, A − {a} is also open, and since the function
h → a + h is continuous and U is the inverse image of A − {a} under this function, U is
indeed open and the definition makes sense.
We can also define f′(a) as follows: there is some function ε, such that
f(a + h) = f(a) + f′(a) · h + ε(h)h,
whenever a + h ∈ A, where ε(h) is defined for all h such that a + h ∈ A, and
\[
\lim_{h \to 0,\ h \in U} \epsilon(h) = 0.
\]
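As a small numerical illustration of this characterization, the following Python sketch evaluates ε(h) = (f(a + h) − f(a) − f′(a)h)/h for decreasing values of h. The sample function f(x) = x³ and the point a = 2 are choices made here for illustration only; they do not come from the text. For this particular f, one has ε(h) = 3ah + h² exactly, so the printed values should shrink with h.

```python
def f(x):
    # Sample function chosen for illustration (an assumption, not from the text).
    return x ** 3

def f_prime(x):
    # Derivative of the sample function, computed by hand.
    return 3 * x ** 2

def epsilon(a, h):
    """Remainder term in f(a + h) = f(a) + f'(a) h + eps(h) h, for h != 0."""
    return (f(a + h) - f(a) - f_prime(a) * h) / h

a = 2.0
steps = [10.0 ** (-k) for k in range(1, 6)]
eps_values = [epsilon(a, h) for h in steps]

for h, e in zip(steps, eps_values):
    print(f"h = {h:.0e}   eps(h) = {e:.6e}")
```

The same quotient with a wrong candidate for f′(a) would not tend to 0, which is one way to see that the linear term is uniquely determined.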
Remark: We can also define the notion of derivative of f at a on the left, and derivative
of f at a on the right. For example, we say that the derivative of f at a on the left is the
limit f′(a−) (if it exists)
\[
\lim_{h \to 0,\ h \in U} \frac{f(a + h) - f(a)}{h},
\]
where U = {h ∈ R | a + h ∈ A, h < 0}.
If a function f as in Definition 28.1 has a derivative f′(a) at a, then it is continuous at
a. If f is differentiable on A, then f is continuous on A. The composition of differentiable
functions is differentiable.
Remark: A function f has a derivative f′(a) at a iff the derivative of f on the left at a and
the derivative of f on the right at a both exist and are equal. Also, if the derivative of f
on the left at a exists, then f is continuous on the left at a (and similarly on the right).
We would like to extend the notion of derivative to functions f : A → F , where E and F
are normed affine spaces, and A is some nonempty open subset of E. The first difficulty is
to make sense of the quotient
\[
\frac{f(a + h) - f(a)}{h}.
\]
If E and F are normed affine spaces, it will be notationally convenient to assume that
the vector space associated with E is denoted by $\overrightarrow{E}$, and that the vector space associated
with F is denoted by $\overrightarrow{F}$.
Since F is a normed affine space, making sense of f(a + h) − f(a) is easy: we can define this
as $\overrightarrow{f(a)f(a + h)}$, the unique vector translating f(a) to f(a + h). We should note, however,
that this quantity is a vector and not a point. Nevertheless, in defining derivatives, it is
notationally more pleasant to denote $\overrightarrow{f(a)f(a + h)}$ by f(a + h) − f(a). Thus, in the rest of
this chapter, the vector $\overrightarrow{ab}$ will be denoted by b − a. But now, how do we define the quotient
by a vector? Well, we don’t!
A first possibility is to consider the directional derivative with respect to a vector u ≠ 0
in $\overrightarrow{E}$. We can consider the vector f(a + tu) − f(a), where t ∈ R (or t ∈ C). Now,
\[
\frac{f(a + tu) - f(a)}{t}
\]
makes sense. The idea is that in E, the points of the form a + tu for t in some small interval
[−ε, +ε] in R (or C) form a line segment [r, s] in A containing a, and that the image of
this line segment defines a small curve segment on f(A). This curve segment is defined by
the map t → f(a + tu), from [r, s] to F, and the directional derivative $D_u f(a)$ defines the
direction of the tangent line at a to this curve. This leads us to the following definition.
Definition 28.2. Let E and F be two normed affine spaces, let A be a nonempty open
subset of E, and let f : A → F be any function. For any a ∈ A, for any u ≠ 0 in $\overrightarrow{E}$, the
directional derivative of f at a w.r.t. the vector u, denoted by $D_u f(a)$, is the limit (if it
exists)
\[
\lim_{t \to 0,\ t \in U} \frac{f(a + tu) - f(a)}{t},
\]
where U = {t ∈ R | a + tu ∈ A, t ≠ 0} (or U = {t ∈ C | a + tu ∈ A, t ≠ 0}).
Since the map t → a + tu is continuous, and since A − {a} is open, the inverse image U
of A − {a} under the above map is open, and the definition of the limit in Definition 28.2
makes sense.
Remark: Since the notion of limit is purely topological, the existence and value of a di-
rectional derivative is independent of the choice of norms in E and F , as long as they are
equivalent norms.
The directional derivative is sometimes called the Gâteaux derivative.
In the special case where E = R and F = R, and we let u = 1 (i.e., the real number 1,
viewed as a vector), it is immediately verified that $D_1 f(a)$ = f′(a), in the sense of Definition
28.1. When E = R (or E = C) and F is any normed vector space, the derivative $D_1 f(a)$,
also denoted by f′(a), provides a suitable generalization of the notion of derivative.
However, when E has dimension ≥ 2, directional derivatives present a serious problem,
which is that their definition is not sufficiently uniform. Indeed, there is no reason to believe
that the directional derivatives w.r.t. all nonnull vectors u share something in common. As
a consequence, a function can have all directional derivatives at a, and yet not be continuous
at a. Two functions may have all directional derivatives in some open sets, and yet their
composition may not. Thus, we introduce a more uniform notion.
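A standard illustration of the first claim (not taken from the text) is the function f(x, y) = x²y/(x⁴ + y²) with f(0, 0) = 0. The Python sketch below estimates one directional derivative at the origin and then evaluates f along the parabola y = x², where f is constantly 1/2, so f is not continuous at the origin. The function, the direction u = (1, 2), and the sample points are all assumptions chosen here for illustration.

```python
def f(x, y):
    """f(x, y) = x^2 y / (x^4 + y^2) away from the origin, and f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * x * y / (x ** 4 + y * y)

def directional_quotient(u, t):
    """Difference quotient (f(0 + t u) - f(0)) / t at the origin, for t != 0."""
    return (f(t * u[0], t * u[1]) - f(0.0, 0.0)) / t

# For u = (u1, u2) with u2 != 0 the quotient simplifies to u1^2 u2 / (t^2 u1^4 + u2^2),
# which tends to u1^2 / u2 as t tends to 0; for u = (1, 2) this limit is 0.5.
u = (1.0, 2.0)
quotients = [directional_quotient(u, 10.0 ** (-k)) for k in range(1, 7)]

# Along y = x^2 the function equals 1/2, so f(x, x^2) does not approach f(0, 0) = 0.
parabola_values = [f(x, x * x) for x in (1e-1, 1e-2, 1e-3)]

print("quotients along u:", quotients)
print("values on the parabola:", parabola_values)
```

When u2 = 0 the quotient is identically 0, so every directional derivative at the origin exists, even though f has no limit there.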
Definition 28.3. Let E and F be two normed affine spaces, let A be a nonempty open subset
of E, and let f : A → F be any function. For any a ∈ A, we say that f is differentiable at
a ∈ A if there is a linear continuous map L : $\overrightarrow{E}$ → $\overrightarrow{F}$ and a function ε, such that
f(a + h) = f(a) + L(h) + ε(h)‖h‖
for every a + h ∈ A, where ε(h) is defined for every h such that a + h ∈ A and
\[
\lim_{h \to 0,\ h \in U} \epsilon(h) = 0,
\]
where U = {h ∈ $\overrightarrow{E}$ | a + h ∈ A, h ≠ 0}. The linear map L is denoted by Df(a), or $Df_a$, or
df(a), or $df_a$, or f′(a), and it is called the Fréchet derivative, or derivative, or total derivative,
or total differential, or differential, of f at a.
Since the map h → a + h from $\overrightarrow{E}$ to E is continuous, and since A is open in E, the inverse
image U of A − {a} under the above map is open in $\overrightarrow{E}$, and it makes sense to say that
\[
\lim_{h \to 0,\ h \in U} \epsilon(h) = 0.
\]
Note that for every h ∈ U, since h ≠ 0, ε(h) is uniquely determined since
\[
\epsilon(h) = \frac{f(a + h) - f(a) - L(h)}{\|h\|},
\]
and that the value ε(0) plays absolutely no role in this definition. The condition for f to be
differentiable at a amounts to the fact that
\[
\lim_{h \to 0} \frac{\|f(a + h) - f(a) - L(h)\|}{\|h\|} = 0
\]
as h ≠ 0 approaches 0, when a + h ∈ A. However, it does no harm to assume that ε(0) = 0,
and we will assume this from now on.
Again, we note that the derivative Df (a) of f at a provides an affine approximation of
f , locally around a.
Remark: Since the notion of limit is purely topological, the existence and value of a deriva-
tive is independent of the choice of norms in E and F , as long as they are equivalent norms.
Note that the continuous linear map L is unique, if it exists. In fact, the next proposi-
tion implies this as a corollary. The following proposition shows that our new definition is
consistent with the definition of the directional derivative.
Proposition 28.1. Let E and F be two normed affine spaces, let A be a nonempty open
subset of E, and let f : A → F be any function. For any a ∈ A, if Df(a) is defined, then
f is continuous at a and f has a directional derivative $D_u f(a)$ for every u ≠ 0 in $\overrightarrow{E}$, and
furthermore,
$D_u f(a)$ = Df(a)(u).
Proof. If h ≠ 0 approaches 0, then L(h) approaches 0 since L is continuous, and ε(h)‖h‖ approaches 0, and thus, f is
continuous at a. For any u ≠ 0 in $\overrightarrow{E}$, for |t| ∈ R small enough (where t ∈ R or t ∈ C), we
have a + tu ∈ A, and letting h = tu, we have
\[
f(a + tu) = f(a) + tL(u) + \epsilon(tu)\,|t|\,\|u\|,
\]
and for t ≠ 0,
\[
\frac{f(a + tu) - f(a)}{t} = L(u) + \frac{|t|}{t}\,\epsilon(tu)\,\|u\|.
\]
Since |t|/t has modulus 1 and ε(tu) approaches 0, the limit when t ≠ 0 approaches 0 is L(u), which is therefore $D_u f(a)$.
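The identity $D_u f(a)$ = Df(a)(u) can be checked numerically. The sketch below uses a sample map f(x, y) = (x²y, sin x + y) from R² to R², with Df written out by hand from its partial derivatives; the map, the point a, and the direction u are assumptions introduced here for illustration.

```python
import math

def f(x, y):
    # Sample differentiable map R^2 -> R^2 (chosen for illustration).
    return (x * x * y, math.sin(x) + y)

def Df(x, y, u):
    """The linear map Df(x, y) applied to the vector u, computed by hand."""
    return (2 * x * y * u[0] + x * x * u[1], math.cos(x) * u[0] + u[1])

def directional_quotient(a, u, t):
    """Difference quotient (f(a + t u) - f(a)) / t, for t != 0."""
    fa = f(*a)
    ft = f(a[0] + t * u[0], a[1] + t * u[1])
    return tuple((p - q) / t for p, q in zip(ft, fa))

a = (1.0, 2.0)
u = (3.0, -1.0)

exact = Df(a[0], a[1], u)
approx = directional_quotient(a, u, 1e-6)
error = max(abs(p - q) for p, q in zip(approx, exact))

print("Df(a)(u)          =", exact)
print("difference quotient =", approx)
print("max abs error       =", error)
```

The discrepancy is of the order of t, as expected from the remainder term ε(tu)|t|‖u‖ in the proof.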
The uniqueness of L follows from Proposition 28.1. Also, when E is of finite dimension, it
is easily shown that every linear map is continuous, and this assumption is then redundant.
It is important to note that the derivative Df(a) of f at a is a continuous linear map
from the vector space $\overrightarrow{E}$ to the vector space $\overrightarrow{F}$, and not a function from the affine space E
to the affine space F.
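As an example, consider the map f : $M_n(\mathbb{R})$ → $M_n(\mathbb{R})$, given by
\[
f(A) = A^\top A - I,
\]
where $M_n(\mathbb{R})$ is equipped with any matrix norm, since they are all equivalent; for example,
pick the Frobenius norm, $\|A\|_F = \sqrt{\mathrm{tr}(A^\top A)}$. We claim that
\[
Df(A)(H) = A^\top H + H^\top A,
\]
for all A and H in $M_n(\mathbb{R})$.
We have
\begin{align*}
f(A + H) - f(A) - (A^\top H + H^\top A) &= (A + H)^\top (A + H) - I - (A^\top A - I) - A^\top H - H^\top A \\
&= A^\top A + A^\top H + H^\top A + H^\top H - A^\top A - A^\top H - H^\top A \\
&= H^\top H.
\end{align*}
It follows that
\[
\epsilon(H) = \frac{f(A + H) - f(A) - (A^\top H + H^\top A)}{\|H\|} = \frac{H^\top H}{\|H\|},
\]
and since our norm is the Frobenius norm,
\[
\|\epsilon(H)\| = \left\| \frac{H^\top H}{\|H\|} \right\| \le \frac{\|H^\top\|\,\|H\|}{\|H\|} = \|H^\top\| = \|H\|,
\]
so
\[
\lim_{H \to 0} \epsilon(H) = 0,
\]
and we conclude that
\[
Df(A)(H) = A^\top H + H^\top A.
\]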
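The bound ‖ε(H)‖ ≤ ‖H‖ obtained above can be observed numerically. The following sketch, in plain Python with small helper functions written here for illustration (they are not part of the text), builds random 3 × 3 matrices A and H and compares ‖f(A + H) − f(A) − Df(A)(H)‖/‖H‖ with ‖H‖ in the Frobenius norm as H shrinks.

```python
import random

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B, scale=1.0):
    """Return A + scale * B, entrywise."""
    return [[a + scale * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def frobenius(A):
    return sum(a * a for row in A for a in row) ** 0.5

def f(A):
    """f(A) = A^T A - I."""
    n = len(A)
    AtA = matmul(transpose(A), A)
    return [[AtA[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]

def Df(A, H):
    """Candidate derivative Df(A)(H) = A^T H + H^T A."""
    return add(matmul(transpose(A), H), matmul(transpose(H), A))

random.seed(0)
n = 3
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
H0 = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

ratios = []  # pairs (||eps(H)||, ||H||)
for k in range(1, 5):
    H = [[10.0 ** (-k) * h for h in row] for row in H0]
    remainder = add(add(f(add(A, H)), f(A), -1.0), Df(A, H), -1.0)
    ratios.append((frobenius(remainder) / frobenius(H), frobenius(H)))

for eps_norm, h_norm in ratios:
    print(f"||eps(H)|| = {eps_norm:.3e}   ||H|| = {h_norm:.3e}")
```

Each printed ‖ε(H)‖ should stay below the corresponding ‖H‖, up to rounding, and both columns shrink by a factor of about ten per step.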
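If Df(a) exists for every a ∈ A, we get a map
Df : A → $\mathcal{L}(\overrightarrow{E}; \overrightarrow{F})$,
called the derivative of f on A, and also denoted by df. Recall that $\mathcal{L}(\overrightarrow{E}; \overrightarrow{F})$ denotes the
vector space of all continuous linear maps from $\overrightarrow{E}$ to $\overrightarrow{F}$.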
When E is of finite dimension n, for any frame (a0, (u1, . . . , un)) of E, where (u1, . . . , un)
is a basis of E, we can define the directional derivatives with respect to the vectors in the
basis (u1, . . . , un) (actually, we can also do it for an infinite frame). This way, we obtain the
definition of partial derivatives, as follows.
Definition 28.4. For any two normed affine spaces E and F , if E is of finite dimension
n, for every frame (a0, (u1, . . . , un)) for E, for every a ∈ E, for every function f : E → F ,
the directional derivatives $D_{u_j} f(a)$ (if they exist) are called the partial derivatives of f with
respect to the frame (a0, (u1, . . . , un)). The partial derivative $D_{u_j} f(a)$ is also denoted by
$\partial_j f(a)$, or $\frac{\partial f}{\partial x_j}(a)$.
The notation $\frac{\partial f}{\partial x_j}(a)$ for a partial derivative, although customary and going back to
Leibniz, is a “logical obscenity.” Indeed, the variable xj really has nothing to do with the
formal definition. This is just another of these situations where tradition is just too hard to
overthrow!
We now consider a number of standard results about derivatives.
Proposition 28.2. Given two normed affine spaces E and F , if f : E → F is a constant
function, then Df(a) = 0, for every a ∈ E. If f : E → F is a continuous affine map, then
Df(a) = $\overrightarrow{f}$, the linear map associated with f, for every a ∈ E.
Proof. Straightforward.
Proposition 28.3. Given a normed affine space E and a normed vector space F , for any
two functions f, g : E → F, for every a ∈ E and every scalar λ, if Df(a) and Dg(a) exist, then D(f + g)(a) and
D(λf)(a) exist, and
D(f + g)(a) = Df (a) + Dg(a),
D(λf )(a) = λDf (a).
Proof. Straightforward.
Proposition 28.4. Given three normed vector spaces E1, E2, and F , for any continuous
bilinear map
f : E1 × E2 → F , for every (a, b) ∈ E1 × E2, Df(a, b) exists, and for every u ∈ E1 and
v ∈ E2,
Df (a, b)(u, v) = f (u, b) + f (a, v).
Proof. Straightforward.
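To see why Proposition 28.4 holds, note that bilinearity gives f(a + u, b + v) − f(a, b) − (f(u, b) + f(a, v)) = f(u, v), and continuity of f bounds ‖f(u, v)‖ by a constant times ‖u‖‖v‖, which is negligible compared with the norm of (u, v). The sketch below checks the first identity numerically for one sample bilinear map, the dot product on R³; the map and the vectors are assumptions chosen here for illustration.

```python
def f(x, y):
    """A sample continuous bilinear map: the dot product on R^3."""
    return sum(p * q for p, q in zip(x, y))

def Df(a, b, u, v):
    """Derivative from Proposition 28.4: Df(a, b)(u, v) = f(u, b) + f(a, v)."""
    return f(u, b) + f(a, v)

def shift(x, u, t):
    """Return x + t u, componentwise."""
    return [p + t * q for p, q in zip(x, u)]

a, b = [1.0, -2.0, 0.5], [3.0, 1.0, 4.0]
u, v = [0.2, 0.7, -1.0], [-0.3, 0.4, 0.9]

checks = []  # triples (t, observed remainder, predicted remainder)
for k in range(1, 6):
    t = 10.0 ** (-k)
    remainder = f(shift(a, u, t), shift(b, v, t)) - f(a, b) - t * Df(a, b, u, v)
    # By bilinearity the remainder equals f(t u, t v) = t^2 f(u, v).
    checks.append((t, remainder, t * t * f(u, v)))

for t, observed, predicted in checks:
    print(f"t = {t:.0e}   remainder = {observed:.3e}   t^2 f(u, v) = {predicted:.3e}")
```

The remainder scales like t², so dividing it by the norm of the increment, which scales like t, gives a quantity that tends to 0.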
We now state the very useful chain rule.
Theorem 28.5. Given three normed affine spaces E, F , and G, let A be an open set in
E, and let B be an open set in F . For any functions f : A → F and g : B → G, such that
f (A) ⊆ B, for any a ∈ A, if Df(a) exists and Dg(f(a)) exists, then D(g ◦ f)(a) exists, and
D(g ◦ f)(a) = Dg(f(a)) ◦ Df(a).
Proof. It is not difficult, but more involved than the previous two.
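The chain rule can be checked numerically on a small example. In the sketch below, f : R² → R² and g : R² → R are sample maps whose derivatives are written out by hand, and D(g ◦ f)(a)(u) computed as Dg(f(a))(Df(a)(u)) is compared with a central difference quotient of g ◦ f. All functions, points, and step sizes are assumptions introduced here for illustration.

```python
import math

def f(x, y):
    return (x * y, x + y * y)

def Df(x, y, u):
    """Df(x, y) applied to u; the Jacobian of f is [[y, x], [1, 2y]]."""
    return (y * u[0] + x * u[1], u[0] + 2 * y * u[1])

def g(p, q):
    return p * p + math.sin(q)

def Dg(p, q, w):
    """Dg(p, q) applied to w; the gradient of g is (2p, cos q)."""
    return 2 * p * w[0] + math.cos(q) * w[1]

def chain_rule(a, u):
    """D(g o f)(a)(u) computed as Dg(f(a))(Df(a)(u))."""
    p, q = f(*a)
    return Dg(p, q, Df(a[0], a[1], u))

def central_difference(a, u, t=1e-5):
    """Symmetric difference quotient of g o f at a in the direction u."""
    plus = g(*f(a[0] + t * u[0], a[1] + t * u[1]))
    minus = g(*f(a[0] - t * u[0], a[1] - t * u[1]))
    return (plus - minus) / (2 * t)

a, u = (0.5, -1.5), (1.0, 2.0)
exact = chain_rule(a, u)
approx = central_difference(a, u)
print(exact, approx, abs(exact - approx))
```

In matrix terms, the composition Dg(f(a)) ◦ Df(a) corresponds to the product of the two Jacobian matrices.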
Theorem 28.5 has many interesting consequences. We mention two corollaries.
Proposition 28.6. Given three normed affine spaces E, F , and G, for any open subset A in
E, for any a ∈ A, let f : A → F such that Df(a) exists, and let g : F → G be a continuous
affine map. Then, D(g ◦ f)(a) exists, and
D(g ◦ f)(a) = $\overrightarrow{g}$ ◦ Df(a),
where $\overrightarrow{g}$ is the linear map associated with the affine map g.
Proposition 28.7. Given two normed affine spaces E and F , let A be some open subset in
E, let B be some open subset in F , let f : A → B be a bijection from A to B, and assume
that Df exists on A and that $Df^{-1}$ exists on B. Then, for every a ∈ A,
\[
Df^{-1}(f(a)) = (Df(a))^{-1}.
\]
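As a sanity check of Proposition 28.7, the sketch below takes a sample bijection of R², the shear f(x, y) = (x + y³, y) with inverse (p, q) → (p − q³, q), estimates the Jacobian matrices of f at a and of its inverse at f(a) by central differences, and multiplies them. The map, the point, and the helper functions are assumptions made here for illustration.

```python
def f(x, y):
    # Sample differentiable bijection of R^2 (chosen for illustration).
    return (x + y ** 3, y)

def f_inv(p, q):
    # Its inverse, also differentiable on all of R^2.
    return (p - q ** 3, q)

def jacobian(func, point, t=1e-6):
    """2x2 Jacobian matrix of func at point, estimated by central differences."""
    cols = []
    for j in range(2):
        plus, minus = list(point), list(point)
        plus[j] += t
        minus[j] -= t
        fp, fm = func(*plus), func(*minus)
        cols.append([(p - m) / (2 * t) for p, m in zip(fp, fm)])
    # cols[j][i] approximates the partial derivative of component i w.r.t. variable j.
    return [[cols[j][i] for j in range(2)] for i in range(2)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

a = (0.7, 1.3)
J = jacobian(f, a)                # approximates Df(a)
J_inv = jacobian(f_inv, f(*a))    # approximates D(f^{-1})(f(a))
product = matmul(J_inv, J)        # should be close to the identity matrix
print(product)
```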
Proposition 28.7 has the remarkable consequence that the two vector spaces $\overrightarrow{E}$ and $\overrightarrow{F}$
have the same dimension. In other words, a local property, the existence of a bijection f
between an open set A of E and an open set B of F, such that f is differentiable on A and
$f^{-1}$ is differentiable on B, implies a global property, that the two vector spaces $\overrightarrow{E}$ and $\overrightarrow{F}$
have the same dimension.
We now consider the situation where the normed affine space F is a finite direct sum
F = (F1, b1) ⊕ · · · ⊕ (Fm, bm).
Proposition 28.8. Given normed affine spaces E and F = (F1, b1) ⊕ · · · ⊕ (Fm, bm), given
any open subset A of E, for any a ∈ A, for any function f : A → F , letting f = (f1, . . . , fm),
Df (a) exists iff every Dfi(a) exists, and
Df (a) = in1 ◦ Df1(a) + · · · + inm ◦ Dfm(a).
Proof. Observe that f(a + h) − f(a) = (f(a + h) − b) − (f(a) − b), where b = (b1, . . . , bm),
and thus, as far as dealing with derivatives, Df(a) is equal to $Df_b(a)$, where $f_b$ : E → $\overrightarrow{F}$ is
defined such that $f_b(x)$ = f(x) − b, for every x ∈ E. Thus, we can work with the vector space
$\overrightarrow{F}$ instead of the affine space F. The proposition is then a simple application of Theorem
28.5.
In the special case where F is a normed affine space of finite dimension m, for any frame
(b0, (v1, . . . , vm)) of F , where (v1, . . . , vm) is a basis of $\overrightarrow{F}$, every point x ∈ F can be expressed
uniquely as
x = b0 + x1v1 + · · · + xmvm,
where (x1, . . . , xm) ∈ $K^m$, the coordinates of x in the frame (b0, (v1, . . . , vm)) (where K = R
or K = C). Thus, letting Fi be the standard normed affine space K with its natural
structure, we note that F is isomorphic to the direct sum F = (K, 0) ⊕ · · · ⊕ (K, 0). Then,
every function f : E → F is represented by m functions (f1, . . . , fm), where fi : E → K
(where K = R or K = C), and
f (x) = b0 + f1(x)v1 + · · · + fm(x)vm,
for every x ∈ E. The following proposition is an immediate corollary of Proposition 28.8.
Proposition 28.9. For any two normed affine spaces E and F , if F is of finite dimension
m, for any frame (b0, (v1, . . . , vm)) of F , where (v1, . . . , vm) is a basis of $\overrightarrow{F}$, for every a ∈ E,
a function f : E → F is differentiable at a iff each fi is differentiable at a, and
Df(a)(u) = Df1(a)(u)v1 + · · · + Dfm(a)(u)vm,
for every u ∈ $\overrightarrow{E}$.
We now consider the situation where E is a finite direct sum. Given a normed affine
space E = (E1, a1) ⊕ · · · ⊕ (En, an) and a normed affine space F , given any open subset A
of E, for any c = (c1, . . . , cn) ∈ A, we define the continuous functions $i^c_j : E_j \to E$, such that
\[
i^c_j(x) = (c_1, \ldots, c_{j-1}, x, c_{j+1}, \ldots, c_n).
\]
For any function f : A → F , we have functions $f \circ i^c_j : E_j \to F$, defined on $(i^c_j)^{-1}(A)$, which
contains $c_j$. If $D(f \circ i^c_j)(c_j)$ exists, we call it the partial derivative of f w.r.t. its jth argument,
at c. We also denote this derivative by $D_j f(c)$. Note that $D_j f(c) \in \mathcal{L}(\overrightarrow{E_j}; \overrightarrow{F})$.
This notion is a generalization of the notion defined in Definition 28.4. In fact, when
E is of dimension n, and a frame (a0, (u1, . . . , un)) has been chosen, we can write E =
(E1, a1) ⊕ · · · ⊕ (En, an), for some obvious (Ej, aj) (as explained just after Proposition 28.8),
and then
Djf (c)(λuj) = λ∂jf (c),
and the two notions are consistent.
The definition of $i^c_j$ and of $D_j f(c)$ also makes sense for a finite product E1 × · · · × En of
affine spaces Ei. We will use freely the notation $\partial_j f(c)$ instead of $D_j f(c)$.
The notion ∂jf (c) introduced in Definition 28.4 is really that of the vector derivative,
whereas Djf (c) is the corresponding linear map. Although perhaps confusing, we identify
the two notions. The following proposition holds.
Proposition 28.10. Given a normed affine space E = (E1, a1) ⊕ · · · ⊕ (En, an), and a
normed affine space F , given any open subset A of E, for any function f : A → F , for every
c ∈ A, if Df(c) exists, then each Djf(c) exists, and
Df (c)(u1, . . . , un) = D1f (c)(u1) + · · · + Dnf(c)(un),
for every $u_i \in \overrightarrow{E_i}$, 1 ≤ i ≤ n. The same result holds for the finite product E1 × · · · × En.
Proof. Since every c ∈ E can be written as c = a + c − a, where a = (a1, . . . , an), defining
$f_a : \overrightarrow{E} \to F$ such that $f_a(u) = f(a + u)$, for every u ∈ $\overrightarrow{E}$, clearly, Df(c) = $Df_a(c - a)$, and
thus, we can work with the function $f_a$ whose domain is the vector space $\overrightarrow{E}$. The proposition
is then a simple application of Theorem 28.5.
28.2 Jacobian Matrices
If both E and F are of finite dimension, for any frame (a0, (u1, . . . , un)) of E and any frame
(b0, (v1, . . . , vm)) of F , every function f : E → F is determined by m functions fi : E → R
(or fi : E → C), where
f (x) = b0 + f1(x)v1 + · · · + fm(x)vm,
for every x ∈ E. From Proposition 28.1, we have
\[
Df(a)(u_j) = D_{u_j} f(a) = \partial_j f(a),
\]
and from Proposition 28.9, we have
Df (a)(uj) = Df1(a)(uj)v1 + · · · + Dfi(a)(uj)vi + · · · + Dfm(a)(uj)vm,
that is,
Df (a)(uj) = ∂jf1(a)v1 + · · · + ∂jfi(a)vi + · · · + ∂jfm(a)vm.
Since the j-th column of the m×n-matrix representing Df(a) w.r.t. the bases (u1, . . . , un)
and (v1, . . . , vm) is equal to the components of the vector Df (a)(uj) over the basis (v1, . . . ,vm),
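the linear map Df(a) is determined by the m × n-matrix $J(f)(a) = (\partial_j f_i(a))$ (or $J(f)(a) = \left(\frac{\partial f_i}{\partial x_j}(a)\right)$):
\[
J(f)(a) =
\begin{pmatrix}
\partial_1 f_1(a) & \partial_2 f_1(a) & \cdots & \partial_n f_1(a) \\
\partial_1 f_2(a) & \partial_2 f_2(a) & \cdots & \partial_n f_2(a) \\
\vdots & \vdots & \ddots & \vdots \\
\partial_1 f_m(a) & \partial_2 f_m(a) & \cdots & \partial_n f_m(a)
\end{pmatrix}
\]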
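or
\[
J(f)(a) =
\begin{pmatrix}
\frac{\partial f_1}{\partial x_1}(a) & \frac{\partial f_1}{\partial x_2}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a) \\
\frac{\partial f_2}{\partial x_1}(a) & \frac{\partial f_2}{\partial x_2}(a) & \cdots & \frac{\partial f_2}{\partial x_n}(a) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1}(a) & \frac{\partial f_m}{\partial x_2}(a) & \cdots & \frac{\partial f_m}{\partial x_n}(a)
\end{pmatrix}
\]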
This matrix is called the Jacobian matrix of Df at a. When m = n, the determinant,
det(J(f )(a)), of J(f )(a) is called the Jacobian of Df (a). From a previous remark, we know
that this determinant in fact only depends on Df (a), and not on specific bases. However,
partial derivatives give a means for computing it.
When E = $\mathbb{R}^n$ and F = $\mathbb{R}^m$, for any function f : $\mathbb{R}^n \to \mathbb{R}^m$, it is easy to compute the
partial derivatives $\frac{\partial f_i}{\partial x_j}(a)$. We simply treat the function $f_i : \mathbb{R}^n \to \mathbb{R}$ as a function of its j-th
argument, leaving the others fixed, and compute the derivative as in Definition 28.1, that is,
the usual derivative.
Example 28.1. For example, consider the function f : $\mathbb{R}^2 \to \mathbb{R}^2$, defined such that
f(r, θ) = (r cos(θ), r sin(θ)).
Then, we have
\[
J(f)(r, \theta) =
\begin{pmatrix}
\cos(\theta) & -r \sin(\theta) \\
\sin(\theta) & r \cos(\theta)
\end{pmatrix}
\]
and the Jacobian (determinant) has value det(J(f)(r, θ)) = r.
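The computation in Example 28.1 can be reproduced numerically. The sketch below builds the Jacobian matrix of the polar map from the closed-form partial derivatives, compares it with a central-difference estimate obtained by varying one argument at a time (as described before the example), and evaluates the determinant. The sample point (r, θ) = (2, 0.75) and the helper names are assumptions chosen here for illustration.

```python
import math

def polar(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian(r, theta):
    """Jacobian matrix of the polar map, from the closed-form partial derivatives."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]

def numerical_jacobian(r, theta, t=1e-6):
    """Central-difference estimate: vary one argument, keep the other fixed."""
    d_r = [(p - m) / (2 * t)
           for p, m in zip(polar(r + t, theta), polar(r - t, theta))]
    d_theta = [(p - m) / (2 * t)
               for p, m in zip(polar(r, theta + t), polar(r, theta - t))]
    return [[d_r[0], d_theta[0]], [d_r[1], d_theta[1]]]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

r, theta = 2.0, 0.75
J = jacobian(r, theta)
J_num = numerical_jacobian(r, theta)
print("closed form:", J)
print("numerical:  ", J_num)
print("determinant:", det2(J), "(expected r =", r, ")")
```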
In the case where E = R (or E = C), for any function f : R → F (or f : C → F ), the
Jacobian matrix of Df(a) is a column vector. In fact, this column vector is just $D_1 f(a)$.
Then, for every λ ∈ R (or λ ∈ C),
Df(a)(λ) = λ$D_1 f(a)$.
This case is sufficiently important to warrant a definition.
Definition 28.5. Given a function f : R → F (or f : C → F ), where F is a normed affine
space, the vector
Df(a)(1) = $D_1 f(a)$
is called the vector d