The following results can be shown.
Proposition 28.15. Let $A$ be an open subset of $\mathbb{R}^n$, and let $f : A \to \mathbb{R}^m$ be a function. For every $a \in A$, $f : A \to \mathbb{R}^m$ is a submersion at $a$ iff there exists an open subset $U$ of $A$ containing $a$, an open subset $W \subseteq \mathbb{R}^{n-m}$, and a diffeomorphism $\varphi : U \to f(U) \times W$, such that
$$f = \pi_1 \circ \varphi,$$
where $\pi_1 : f(U) \times W \to f(U)$ is the first projection. Equivalently,
$$(f \circ \varphi^{-1})(y_1, \ldots, y_m, \ldots, y_n) = (y_1, \ldots, y_m).$$
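[Commutative diagram: $\varphi : U \to f(U) \times W$, $\pi_1 : f(U) \times W \to f(U)$, and $f : U \to f(U) \subseteq \mathbb{R}^m$, where $U \subseteq A$ and $f = \pi_1 \circ \varphi$.]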
Furthermore, the image of every open subset of $A$ under $f$ is an open subset of $\mathbb{R}^m$. (The same result holds for $\mathbb{C}^n$ and $\mathbb{C}^m$.)
Proposition 28.16. Let $A$ be an open subset of $\mathbb{R}^n$, and let $f : A \to \mathbb{R}^m$ be a function. For every $a \in A$, $f : A \to \mathbb{R}^m$ is an immersion at $a$ iff there exists an open subset $U$ of $A$ containing $a$, an open subset $V$ containing $f(a)$ such that $f(U) \subseteq V$, an open subset $W$ containing $0$ such that $W \subseteq \mathbb{R}^{m-n}$, and a diffeomorphism $\varphi : V \to U \times W$, such that
$$\varphi \circ f = \mathrm{in}_1,$$
CHAPTER 28. DIFFERENTIAL CALCULUS
where in1 : U → U × W is the injection map such that in1(u) = (u, 0), or equivalently,
(ϕ ◦ f)(x1, . . . , xn) = (x1, . . . , xn, 0, . . . , 0).
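[Commutative diagram: $f : U \to f(U) \subseteq V$, $\varphi : V \to U \times W$, and $\mathrm{in}_1 : U \to U \times W$, where $U \subseteq A$ and $\varphi \circ f = \mathrm{in}_1$.]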
(The same result holds for $\mathbb{C}^n$ and $\mathbb{C}^m$.)
28.4 Tangent Spaces and Differentials
In this section, we discuss briefly a geometric interpretation of the notion of derivative. We
consider sets of points defined by a differentiable function. This is a special case of the notion
of a (differential) manifold.
Given two normed affine spaces E and F , let A be an open subset of E, and let f : A → F
be a function.
Definition 28.9. Given f : A → F as above, its graph Γ(f) is the set of all points
Γ(f ) = {(x, y) ∈ E × F | x ∈ A, y = f(x)}.
If Df is defined on A, we say that Γ(f ) is a differential submanifold of E × F of equation
y = f (x).
It should be noted that this is a very particular kind of differential manifold.
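Example 28.3. If $E = \mathbb{R}$ and $F = \mathbb{R}^2$, letting $f = (g, h)$, where $g : \mathbb{R} \to \mathbb{R}$ and $h : \mathbb{R} \to \mathbb{R}$, $\Gamma(f)$ is a curve in $\mathbb{R}^3$, of equations $y = g(x)$, $z = h(x)$. When $E = \mathbb{R}^2$ and $F = \mathbb{R}$, $\Gamma(f)$ is a surface in $\mathbb{R}^3$, of equation $z = f(x, y)$.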
We now define the notion of affine tangent space in a very general way. Next, we will see
what it means for manifolds Γ(f ), as in Definition 28.9.
Definition 28.10. Given a normed affine space $E$, given any nonempty subset $M$ of $E$, given any point $a \in M$, we say that a vector $u \in E$ is tangent at $a$ to $M$ if there exist a sequence $(a_n)_{n \in \mathbb{N}}$ of points in $M$ converging to $a$, and a sequence $(\lambda_n)_{n \in \mathbb{N}}$, with $\lambda_n \in \mathbb{R}$ and $\lambda_n \geq 0$, such that the sequence $(\lambda_n(a_n - a))_{n \in \mathbb{N}}$ converges to $u$.
The set of all vectors tangent at a to M is called the family of tangent vectors at a to
M and the set of all points of E of the form a + u where u belongs to the family of tangent
vectors at a to M is called the affine tangent family at a to M .
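Clearly, $0$ is always tangent, and if $u$ is tangent, then so is every $\lambda u$, for $\lambda \in \mathbb{R}$, $\lambda \geq 0$. If $u \neq 0$, then the sequence $(\lambda_n)_{n \in \mathbb{N}}$ must tend towards $+\infty$, since $a_n - a$ tends to $0$. We have the following proposition.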
Proposition 28.17. Let $E$ and $F$ be two normed affine spaces, let $A$ be an open subset of $E$, let $a \in A$, and let $f : A \to F$ be a function. If $Df(a)$ exists, then the family of tangent vectors at $(a, f(a))$ to $\Gamma$ is a subspace $\overrightarrow{T_a}(\Gamma)$ of $E \times F$, defined by the condition (equation)
$$(u, v) \in \overrightarrow{T_a}(\Gamma) \quad \text{iff} \quad v = Df(a)(u),$$
and the affine tangent family at $(a, f(a))$ to $\Gamma$ is an affine variety $T_a(\Gamma)$ of $E \times F$, defined by the condition (equation)
$$(x, y) \in T_a(\Gamma) \quad \text{iff} \quad y = f(a) + Df(a)(x - a),$$
where $\Gamma$ is the graph of $f$.
The proof is actually rather simple. We have $T_a(\Gamma) = (a, f(a)) + \overrightarrow{T_a}(\Gamma)$, and since $\overrightarrow{T_a}(\Gamma)$ is a subspace of $E \times F$, the set $T_a(\Gamma)$ is an affine variety. Thus, the affine tangent space at a point $(a, f(a))$ is a familiar object, a line, a plane, etc.
As an illustration, when $E = \mathbb{R}^2$ and $F = \mathbb{R}$, the affine tangent plane at the point $(a, b, c)$ to the surface of equation $z = f(x, y)$, is defined by the equation
$$z = c + \frac{\partial f}{\partial x}(a, b)(x - a) + \frac{\partial f}{\partial y}(a, b)(y - b).$$
If $E = \mathbb{R}$ and $F = \mathbb{R}^2$, the tangent line at $(a, b, c)$, to the curve of equations $y = g(x)$, $z = h(x)$, is defined by the equations
$$y = b + Dg(a)(x - a),$$
$$z = c + Dh(a)(x - a).$$
Thus, derivatives and partial derivatives have the desired intended geometric interpreta-
tion as tangent spaces. Of course, in order to deal with this topic properly, we really would
have to go deeper into the study of (differential) manifolds.
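As a quick check of the tangent plane equation, here is a small worked example. The function $f(x, y) = x^2 + y^2$ and the point $(1, 2)$ are our own choices for illustration and do not come from the text.

```latex
% Illustrative example (assumed function and point, not from the text):
% f(x, y) = x^2 + y^2 at (a, b) = (1, 2), so c = f(1, 2).
\begin{align*}
c &= f(1, 2) = 5, \\
\frac{\partial f}{\partial x}(1, 2) &= 2, \qquad
\frac{\partial f}{\partial y}(1, 2) = 4, \\
z &= 5 + 2(x - 1) + 4(y - 2).
\end{align*}
```

In the notation of Proposition 28.17, this is the condition $y = f(a) + Df(a)(x - a)$ with $Df(1, 2)(u_1, u_2) = 2u_1 + 4u_2$.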
We now briefly consider second-order and higher-order derivatives.
28.5 Second-Order and Higher-Order Derivatives
Given two normed affine spaces E and F , and some open subset A of E, if Df (a) is defined
for every a ∈ A, then we have a mapping Df : A → L(E; F ). Since L(E; F ) is a normed
vector space, if Df exists on an open subset U of A containing a, we can consider taking
the derivative of Df at some a ∈ A. If D(Df)(a) exists for every a ∈ A, we get a mapping
$D^2 f : A \to \mathcal{L}(E; \mathcal{L}(E; F))$, where $D^2 f(a) = D(Df)(a)$, for every $a \in A$. If $D^2 f(a)$ exists, then for every $u \in E$,
$$D^2 f(a)(u) = D(Df)(a)(u) = D_u(Df)(a) \in \mathcal{L}(E; F).$$
Recall from Proposition 26.46 that the map $\mathrm{app}$ from $\mathcal{L}(E; F) \times E$ to $F$, defined such that for every $L \in \mathcal{L}(E; F)$ and every $v \in E$,
$$\mathrm{app}(L, v) = L(v),$$
is a continuous bilinear map. Thus, in particular, given a fixed $v \in E$, the linear map $\mathrm{app}_v : \mathcal{L}(E; F) \to F$, defined such that $\mathrm{app}_v(L) = L(v)$, is a continuous map.
Also recall from Proposition 28.6 that if $h : A \to G$ is a function such that $Dh(a)$ exists, and $k : G \to H$ is a continuous linear map, then $D(k \circ h)(a)$ exists, and
$$k(Dh(a)(u)) = D(k \circ h)(a)(u),$$
that is,
$$k(D_u h(a)) = D_u(k \circ h)(a).$$
Applying these two facts to $h = Df$ and to $k = \mathrm{app}_v$, we have
$$D_u(Df)(a)(v) = D_u(\mathrm{app}_v \circ Df)(a).$$
But $(\mathrm{app}_v \circ Df)(x) = Df(x)(v) = D_v f(x)$, for every $x \in A$, that is, $\mathrm{app}_v \circ Df = D_v f$ on $A$. So, we have
$$D_u(Df)(a)(v) = D_u(D_v f)(a),$$
and since $D^2 f(a)(u) = D_u(Df)(a)$, we get
$$D^2 f(a)(u)(v) = D_u(D_v f)(a).$$
Thus, when $D^2 f(a)$ exists, $D_u(D_v f)(a)$ exists, and
$$D^2 f(a)(u)(v) = D_u(D_v f)(a),$$
for all $u, v \in E$. We also denote $D_u(D_v f)(a)$ by $D^2_{u,v} f(a)$, or $D_u D_v f(a)$.
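To see the identity $D^2 f(a)(u)(v) = D_u(D_v f)(a)$ at work, the following sketch computes the iterated directional derivative for one concrete function. The choice $E = \mathbb{R}^2$, $F = \mathbb{R}$, $f(x, y) = x^2 y$ is an assumption made only for illustration.

```latex
% Illustrative example (assumed, not from the text):
% E = R^2, F = R, f(x, y) = x^2 y, u = (u_1, u_2), v = (v_1, v_2).
\begin{align*}
D_v f(x, y) &= 2xy\, v_1 + x^2\, v_2, \\
D_u(D_v f)(x, y) &= (2y\, v_1 + 2x\, v_2)\, u_1 + 2x\, v_1\, u_2 \\
                 &= 2y\, u_1 v_1 + 2x\, (u_1 v_2 + u_2 v_1).
\end{align*}
```

The result is linear in $u$ and in $v$ separately, as expected of $D^2 f(a)(u)(v)$; in this example it is also unchanged when $u$ and $v$ are swapped.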
Recall from Proposition 26.45 that the map from $\mathcal{L}_2(E, E; F)$ to $\mathcal{L}(E; \mathcal{L}(E; F))$ defined such that, for every $g \in \mathcal{L}_2(E, E; F)$, $g \mapsto \varphi$ iff
$$\varphi(u)(v) = g(u, v) \quad \text{for all } u, v \in E,$$
is an isomorphism of vector spaces. Thus, we will consider $D^2 f(a) \in \mathcal{L}(E; \mathcal{L}(E; F))$ as a continuous bilinear map in $\mathcal{L}_2(E, E; F)$, and we will write $D^2 f(a)(u, v)$, instead of $D^2 f(a)(u)(v)$.
Then, the above discussion can be summarized by saying that when $D^2 f(a)$ is defined, we have
$$D^2 f(a)(u, v) = D_u D_v f(a).$$
When $E$ has finite dimension and $(a_0, (e_1, \ldots, e_n))$ is a frame for $E$, we denote $D_{e_j} D_{e_i} f(a)$ by $\frac{\partial^2 f}{\partial x_i \partial x_j}(a)$, when $i \neq j$, and we denote $D_{e_i} D_{e_i} f(a)$ by $\frac{\partial^2 f}{\partial x_i^2}(a)$.
The following important lemma attributed to Schwarz can be shown, using Lemma 28.11.
Given a bilinear map f : E × E → F , recall that f is symmetric, if
f (u, v) = f (v, u),
for all u, v ∈ E.
Lemma 28.18. (Schwarz’s lemma) Given two normed affine spaces $E$ and $F$, given any open subset $A$ of $E$, given any $f : A \to F$, for every $a \in A$, if $D^2 f(a)$ exists, then $D^2 f(a) \in \mathcal{L}_2(E, E; F)$ is a continuous symmetric bilinear map. As a corollary, if $E$ is of finite dimension $n$, and $(a_0, (e_1, \ldots, e_n))$ is a frame for $E$, we have
$$\frac{\partial^2 f}{\partial x_i \partial x_j}(a) = \frac{\partial^2 f}{\partial x_j \partial x_i}(a).$$
Remark: There is a variation of the above lemma which does not assume the existence of
D2f (a), but instead assumes that DuDvf and DvDuf exist on an open subset containing a
and are continuous at a, and concludes that DuDvf (a) = DvDuf (a). This is just a different
result which does not imply Lemma 28.18, and is not a consequence of Lemma 28.18.
When $E = \mathbb{R}^2$, the mere existence of $\frac{\partial^2 f}{\partial x \partial y}(a)$ and $\frac{\partial^2 f}{\partial y \partial x}(a)$ is not sufficient to ensure the existence of $D^2 f(a)$.
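A classical example, which is not taken from the text and is offered only as an illustration, shows how both mixed second-order partial derivatives can exist at a point while $D^2 f$ fails to exist there. Below, $e_1$ and $e_2$ denote the canonical basis vectors of $\mathbb{R}^2$.

```latex
% Classical illustration (assumed example, not from the text).
\[
f(x, y) = \frac{xy\,(x^2 - y^2)}{x^2 + y^2} \quad \text{for } (x, y) \neq (0, 0),
\qquad f(0, 0) = 0.
\]
% First-order partials along the axes, then one more derivative at the origin:
\begin{align*}
\frac{\partial f}{\partial x}(0, y) &= -y, &
\frac{\partial f}{\partial y}(x, 0) &= x, \\
D_{e_2}\Bigl(\frac{\partial f}{\partial x}\Bigr)(0, 0) &= -1, &
D_{e_1}\Bigl(\frac{\partial f}{\partial y}\Bigr)(0, 0) &= 1.
\end{align*}
```

Both mixed partial derivatives exist at the origin but take different values, so by Lemma 28.18 the second derivative $D^2 f(0, 0)$ cannot exist.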
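When $E$ is of finite dimension $n$ and $(a_0, (e_1, \ldots, e_n))$ is a frame for $E$, if $D^2 f(a)$ exists, for every $u = u_1 e_1 + \cdots + u_n e_n$ and $v = v_1 e_1 + \cdots + v_n e_n$ in $E$, since $D^2 f(a)$ is a symmetric bilinear form, we have
$$D^2 f(a)(u, v) = \sum_{i=1, j=1}^{n} u_i v_j \frac{\partial^2 f}{\partial x_i \partial x_j}(a),$$
which can be written in matrix form as:
$$D^2 f(a)(u, v) = U^\top \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2}(a) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(a) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(a) \\
\frac{\partial^2 f}{\partial x_1 \partial x_2}(a) & \frac{\partial^2 f}{\partial x_2^2}(a) & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(a) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_1 \partial x_n}(a) & \frac{\partial^2 f}{\partial x_2 \partial x_n}(a) & \cdots & \frac{\partial^2 f}{\partial x_n^2}(a)
\end{pmatrix} V,$$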
where U is the column matrix representing u, and V is the column matrix representing v,
over the frame (a0, (e1, . . . , en)).
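The above symmetric matrix is called the Hessian of $f$ at $a$. If $F$ itself is of finite dimension, and $(b_0, (v_1, \ldots, v_m))$ is a frame for $F$, then $f = (f_1, \ldots, f_m)$, and each component $D^2 f(a)_i(u, v)$ of $D^2 f(a)(u, v)$ ($1 \leq i \leq m$), can be written as
$$D^2 f(a)_i(u, v) = U^\top \begin{pmatrix}
\frac{\partial^2 f_i}{\partial x_1^2}(a) & \frac{\partial^2 f_i}{\partial x_1 \partial x_2}(a) & \cdots & \frac{\partial^2 f_i}{\partial x_1 \partial x_n}(a) \\
\frac{\partial^2 f_i}{\partial x_1 \partial x_2}(a) & \frac{\partial^2 f_i}{\partial x_2^2}(a) & \cdots & \frac{\partial^2 f_i}{\partial x_2 \partial x_n}(a) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f_i}{\partial x_1 \partial x_n}(a) & \frac{\partial^2 f_i}{\partial x_2 \partial x_n}(a) & \cdots & \frac{\partial^2 f_i}{\partial x_n^2}(a)
\end{pmatrix} V.$$
Thus, we could describe the vector $D^2 f(a)(u, v)$ in terms of an $mn \times mn$-matrix consisting of $m$ diagonal blocks, which are the above Hessians, and the row matrix $(U^\top, \ldots, U^\top)$ ($m$ times) and the column matrix consisting of $m$ copies of $V$.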
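To make the Hessian concrete, here is a small worked example in the case $n = 2$, $F = \mathbb{R}$. The function $f(x, y) = x^3 + x y^2$ and the point $a = (1, 2)$ are assumptions chosen for illustration only.

```latex
% Illustrative example (assumed function and point, not from the text):
% f(x, y) = x^3 + x y^2, a = (1, 2), U = (u_1, u_2)^T, V = (v_1, v_2)^T.
\begin{align*}
\frac{\partial^2 f}{\partial x^2} &= 6x, \qquad
\frac{\partial^2 f}{\partial x \partial y} = 2y, \qquad
\frac{\partial^2 f}{\partial y^2} = 2x, \\
D^2 f(1, 2)(u, v) &= U^\top \begin{pmatrix} 6 & 4 \\ 4 & 2 \end{pmatrix} V
  = 6 u_1 v_1 + 4 u_1 v_2 + 4 u_2 v_1 + 2 u_2 v_2.
\end{align*}
```

The matrix is symmetric, so exchanging $u$ and $v$ leaves the value unchanged.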
We now indicate briefly how higher-order derivatives are defined. Let $m \geq 2$. Given a function $f : A \to F$ as before, for any $a \in A$, if the derivatives $D^i f$ exist on $A$ for all $i$, $1 \leq i \leq m - 1$, by induction, $D^{m-1} f$ can be considered to be a continuous function $D^{m-1} f : A \to \mathcal{L}_{m-1}(E^{m-1}; F)$, and we define
$$D^m f(a) = D(D^{m-1} f)(a).$$
Then, $D^m f(a)$ can be identified with a continuous $m$-multilinear map in $\mathcal{L}_m(E^m; F)$. We can then show (as we did before), that if $D^m f(a)$ is defined, then
$$D^m f(a)(u_1, \ldots, u_m) = D_{u_1} \cdots D_{u_m} f(a).$$
When $E$ is of finite dimension $n$ and $(a_0, (e_1, \ldots, e_n))$ is a frame for $E$, if $D^m f(a)$ exists, for every $j_1, \ldots, j_m \in \{1, \ldots, n\}$, we denote $D_{e_{j_m}} \cdots D_{e_{j_1}} f(a)$ by
$$\frac{\partial^m f}{\partial x_{j_1} \cdots \partial x_{j_m}}(a).$$
Given an $m$-multilinear map $f \in \mathcal{L}_m(E^m; F)$, recall that $f$ is symmetric if
$$f(u_{\pi(1)}, \ldots, u_{\pi(m)}) = f(u_1, \ldots, u_m),$$
for all $u_1, \ldots, u_m \in E$, and all permutations $\pi$ on $\{1, \ldots, m\}$. Then, the following generalization of Schwarz’s lemma holds.
Lemma 28.19. Given two normed affine spaces $E$ and $F$, given any open subset $A$ of $E$, given any $f : A \to F$, for every $a \in A$, for every $m \geq 1$, if $D^m f(a)$ exists, then $D^m f(a) \in \mathcal{L}_m(E^m; F)$ is a continuous symmetric $m$-multilinear map. As a corollary, if $E$ is of finite dimension $n$, and $(a_0, (e_1, \ldots, e_n))$ is a frame for $E$, we have
$$\frac{\partial^m f}{\partial x_{j_1} \cdots \partial x_{j_m}}(a) = \frac{\partial^m f}{\partial x_{j_{\pi(1)}} \cdots \partial x_{j_{\pi(m)}}}(a),$$
for every $j_1, \ldots, j_m \in \{1, \ldots, n\}$, and for every permutation $\pi$ on $\{1, \ldots, m\}$.
If $E$ is of finite dimension $n$, and $(a_0, (e_1, \ldots, e_n))$ is a frame for $E$, $D^m f(a)$ is a symmetric $m$-multilinear map, and we have
$$D^m f(a)(u_1, \ldots, u_m) = \sum_{j} u_{1,j_1} \cdots u_{m,j_m} \frac{\partial^m f}{\partial x_{j_1} \cdots \partial x_{j_m}}(a),$$
where $j$ ranges over all functions $j : \{1, \ldots, m\} \to \{1, \ldots, n\}$, for any $m$ vectors
$$u_j = u_{j,1} e_1 + \cdots + u_{j,n} e_n.$$
The concept of $C^1$-function is generalized to the concept of $C^m$-function, and Theorem 28.12 can also be generalized.
Definition 28.11. Given two normed affine spaces $E$ and $F$, and an open subset $A$ of $E$, for any $m \geq 1$, we say that a function $f : A \to F$ is of class $C^m$ on $A$ or a $C^m$-function on $A$ if $D^k f$ exists and is continuous on $A$ for every $k$, $1 \leq k \leq m$. We say that $f : A \to F$ is of class $C^\infty$ on $A$ or a $C^\infty$-function on $A$ if $D^k f$ exists and is continuous on $A$ for every $k \geq 1$. A $C^\infty$-function (on $A$) is also called a smooth function (on $A$). A $C^m$-diffeomorphism $f : A \to B$ between $A$ and $B$ (where $A$ is an open subset of $E$ and $B$ is an open subset of $F$) is a bijection between $A$ and $B = f(A)$, such that both $f : A \to B$ and its inverse $f^{-1} : B \to A$ are $C^m$-functions.
Equivalently, $f$ is a $C^m$-function on $A$ if $f$ is a $C^1$-function on $A$ and $Df$ is a $C^{m-1}$-function on $A$.
We have the following theorem giving a necessary and sufficient condition for $f$ to be a $C^m$-function on $A$. A generalization to the case where $E = (E_1, a_1) \oplus \cdots \oplus (E_n, a_n)$ also holds.
Theorem 28.20. Given two normed affine spaces $E$ and $F$, where $E$ is of finite dimension $n$, and where $(a_0, (u_1, \ldots, u_n))$ is a frame of $E$, given any open subset $A$ of $E$, given any function $f : A \to F$, for any $m \geq 1$, the function $f$ is a $C^m$-function on $A$ iff every partial derivative $D_{u_{j_k}} \cdots D_{u_{j_1}} f$ (or $\frac{\partial^k f}{\partial x_{j_1} \cdots \partial x_{j_k}}(a)$) is defined and continuous on $A$, for all
$k$, $1 \leq k \leq m$, and all $j_1, \ldots, j_k \in \{1, \ldots, n\}$. As a corollary, if $F$ is of finite dimension $p$, and $(b_0, (v_1, \ldots, v_p))$ is a frame of $F$, the derivative $D^m f$ is defined and continuous on $A$ iff every partial derivative $D_{u_{j_k}} \cdots D_{u_{j_1}} f_i$ (or $\frac{\partial^k f_i}{\partial x_{j_1} \cdots \partial x_{j_k}}(a)$) is defined and continuous on $A$, for all $k$, $1 \leq k \leq m$, for all $i$, $1 \leq i \leq p$, and all $j_1, \ldots, j_k \in \{1, \ldots, n\}$.
When $E = \mathbb{R}$ (or $E = \mathbb{C}$), for any $a \in E$, $D^m f(a)(1, \ldots, 1)$ is a vector in $F$, called the $m$th-order vector derivative. As in the case $m = 1$, we will usually identify the multilinear map $D^m f(a)$ with the vector $D^m f(a)(1, \ldots, 1)$. Some notational conventions can also be introduced to simplify the notation of higher-order derivatives, and we discuss such conventions very briefly.
Recall that when $E$ is of finite dimension $n$, and $(a_0, (e_1, \ldots, e_n))$ is a frame for $E$, $D^m f(a)$ is a symmetric $m$-multilinear map, and we have
$$D^m f(a)(u_1, \ldots, u_m) = \sum_{j} u_{1,j_1} \cdots u_{m,j_m} \frac{\partial^m f}{\partial x_{j_1} \cdots \partial x_{j_m}}(a),$$
where $j$ ranges over all functions $j : \{1, \ldots, m\} \to \{1, \ldots, n\}$, for any $m$ vectors
$$u_j = u_{j,1} e_1 + \cdots + u_{j,n} e_n.$$
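We can then group the various occurrences of $\partial x_{j_k}$ corresponding to the same variable $x_{j_k}$, and this leads to the notation
$$\left(\frac{\partial}{\partial x_1}\right)^{\alpha_1} \left(\frac{\partial}{\partial x_2}\right)^{\alpha_2} \cdots \left(\frac{\partial}{\partial x_n}\right)^{\alpha_n} f(a),$$
where $\alpha_1 + \alpha_2 + \cdots + \alpha_n = m$.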
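If we denote $(\alpha_1, \ldots, \alpha_n)$ simply by $\alpha$, then we denote
$$\left(\frac{\partial}{\partial x_1}\right)^{\alpha_1} \left(\frac{\partial}{\partial x_2}\right)^{\alpha_2} \cdots \left(\frac{\partial}{\partial x_n}\right)^{\alpha_n} f$$
by
$$\partial^\alpha f, \quad \text{or} \quad \left(\frac{\partial}{\partial x}\right)^{\alpha} f.$$
If $\alpha = (\alpha_1, \ldots, \alpha_n)$, we let $|\alpha| = \alpha_1 + \alpha_2 + \cdots + \alpha_n$, $\alpha! = \alpha_1! \cdots \alpha_n!$, and if $h = (h_1, \ldots, h_n)$, we denote $h_1^{\alpha_1} \cdots h_n^{\alpha_n}$ by $h^\alpha$.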
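The multi-index conventions are easiest to absorb on a small case. The following sketch takes $n = 2$ and $\alpha = (2, 1)$; the test function $f(x_1, x_2) = x_1^3 x_2^2$ is our own assumption for illustration.

```latex
% Illustrative example (assumed, not from the text):
% n = 2, alpha = (2, 1), h = (h_1, h_2), f(x_1, x_2) = x_1^3 x_2^2.
\begin{align*}
|\alpha| &= 2 + 1 = 3, \qquad
\alpha! = 2!\, 1! = 2, \qquad
h^\alpha = h_1^2 h_2, \\
\partial^\alpha f &= \left(\frac{\partial}{\partial x_1}\right)^{2}
                     \left(\frac{\partial}{\partial x_2}\right) f
                   = \left(\frac{\partial}{\partial x_1}\right)^{2} \bigl(2 x_1^3 x_2\bigr)
                   = 12\, x_1 x_2.
\end{align*}
```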
In the next section, we survey various versions of Taylor’s formula.
28.6 Taylor’s formula, Faà di Bruno’s formula
We discuss, without proofs, several versions of Taylor’s formula. The hypotheses required in each version become progressively stronger. The first version can be viewed as a generalization of the notion of derivative. Given an $m$-linear map $f : E^m \to F$, for any vector $h \in E$, we abbreviate
$$f(\underbrace{h, \ldots, h}_{m})$$
by $f(h^m)$. The version of Taylor’s formula given next is sometimes referred to as the formula of Taylor–Young.
Theorem 28.21. (Taylor–Young) Given two normed affine spaces $E$ and $F$, for any open subset $A \subseteq E$, for any function $f : A \to F$, for any $a \in A$, if $D^k f$ exists in $A$ for all $k$, $1 \leq k \leq m - 1$, and if $D^m f(a)$ exists, then we have:
$$f(a + h) = f(a) + \frac{1}{1!} D^1 f(a)(h) + \cdots + \frac{1}{m!} D^m f(a)(h^m) + \|h\|^m \epsilon(h),$$
for any $h$ such that $a + h \in A$, and where $\lim_{h \to 0,\, h \neq 0} \epsilon(h) = 0$.
The above version of Taylor’s formula has applications to the study of relative maxima
(or minima) of real-valued functions. It is also used to study the local properties of curves
and surfaces.
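As an illustration of the Taylor–Young formula with $m = 2$, consider the following worked expansion. The function $f(x, y) = e^x \cos y$ and the base point $a = (0, 0)$ are assumptions chosen for this sketch.

```latex
% Illustrative example (assumed, not from the text):
% E = R^2, F = R, f(x, y) = e^x cos y, a = (0, 0), h = (h_1, h_2), m = 2.
\begin{align*}
f(0, 0) &= 1, \qquad D^1 f(0, 0)(h) = h_1, \\
D^2 f(0, 0)(h^2) &= \frac{\partial^2 f}{\partial x^2}(0,0)\, h_1^2
   + 2\, \frac{\partial^2 f}{\partial x \partial y}(0,0)\, h_1 h_2
   + \frac{\partial^2 f}{\partial y^2}(0,0)\, h_2^2
   = h_1^2 - h_2^2, \\
f(h_1, h_2) &= 1 + h_1 + \frac{1}{2}\bigl(h_1^2 - h_2^2\bigr) + \|h\|^2 \epsilon(h),
\qquad \lim_{h \to 0,\, h \neq 0} \epsilon(h) = 0.
\end{align*}
```

The quadratic term $h_1^2 - h_2^2$ takes both signs near $0$, which is the kind of information used when studying relative maxima and minima.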
The next version of Taylor’s formula can be viewed as a generalization of Lemma 28.11.
It is sometimes called the Taylor formula with Lagrange remainder or generalized mean value
theorem.
Theorem 28.22. (Generalized mean value theorem) Let $E$ and $F$ be two normed affine spaces, let $A$ be an open subset of $E$, and let $f : A \to F$ be a function on $A$. Given any $a \in A$ and any $h \neq 0$ in $E$, if the closed segment $[a, a + h]$ is contained in $A$, $D^k f$ exists in $A$ for all $k$, $1 \leq k \leq m$, $D^{m+1} f(x)$ exists at every point $x$ of the open segment $]a, a + h[$, and
$$\max_{x \in ]a, a+h[} \left\| D^{m+1} f(x) \right\| \leq M,$$
for some $M \geq 0$, then
$$\left\| f(a + h) - f(a) - \left( \frac{1}{1!} D^1 f(a)(h) + \cdots + \frac{1}{m!} D^m f(a)(h^m) \right) \right\| \leq M \frac{\|h\|^{m+1}}{(m + 1)!}.$$
As a corollary, if $L : E^{m+1} \to F$ is a continuous $(m + 1)$-linear map, then
$$\left\| f(a + h) - f(a) - \left( \frac{1}{1!} D^1 f(a)(h) + \cdots + \frac{1}{m!} D^m f(a)(h^m) + \frac{L(h^{m+1})}{(m + 1)!} \right) \right\| \leq M \frac{\|h\|^{m+1}}{(m + 1)!},$$
where $M = \max_{x \in ]a, a+h[} \left\| D^{m+1} f(x) - L \right\|$.
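A one-dimensional instance may help in reading the bound of Theorem 28.22. The choice $E = F = \mathbb{R}$, $f = \sin$, $a = 0$, $m = 2$ below is an assumption made only for illustration.

```latex
% Illustrative example (assumed, not from the text):
% E = F = R, f = sin, a = 0, m = 2, so D^3 f(x) = -cos x and M = 1 works.
\begin{align*}
f(0) &= 0, \qquad D^1 f(0)(h) = h, \qquad D^2 f(0)(h^2) = 0, \\
\bigl| D^3 f(x) \bigr| &= |\cos x| \leq 1 = M, \\
\bigl| \sin h - h \bigr| &\leq \frac{|h|^3}{3!} = \frac{|h|^3}{6}.
\end{align*}
```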
The above theorem is sometimes stated under the slightly stronger assumption that $f$ is a $C^m$-function on $A$. If $f : A \to \mathbb{R}$ is a real-valued function, Theorem 28.22 can be refined a little bit. This version is often called the formula of Taylor–MacLaurin.
Theorem 28.23. (Taylor–MacLaurin) Let $E$ be a normed affine space, let $A$ be an open subset of $E$, and let $f : A \to \mathbb{R}$ be a real-valued function on $A$. Given any $a \in A$ and any $h \neq 0$ in $E$, if the closed segment $[a, a + h]$ is contained in $A$, if $D^k f$ exists in $A$ for all $k$, $1 \leq k \leq m$, and $D^{m+1} f(x)$ exists at every point $x$ of the open segment $]a, a + h[$, then there