29.1 Local Extrema, Constrained Local Extrema, and Lagrange Multipliers
Let J : E → R be a real-valued function defined on a normed vector space E (or more
generally, any topological space). Ideally we would like to find where the function J reaches
a minimum or a maximum value, at least locally. In this chapter, we will usually use the
notations dJ(u) or J′(u) (or dJu or J′u) for the derivative of J at u, instead of DJ(u). Our
presentation follows very closely that of Ciarlet [22] (Chapter 7), which we find to be one of
the clearest.
Definition 29.1. If J : E → R is a real-valued function defined on a normed vector space
E, we say that J has a local minimum (or relative minimum) at the point u ∈ E if there is
some open subset W ⊆ E containing u such that
J(u) ≤ J(w) for all w ∈ W .
Similarly, we say that J has a local maximum (or relative maximum) at the point u ∈ E if
there is some open subset W ⊆ E containing u such that
J(u) ≥ J(w) for all w ∈ W .
In either case, we say that J has a local extremum (or relative extremum) at u. We say that
J has a strict local minimum (resp. strict local maximum) at the point u ∈ E if there is
some open subset W ⊆ E containing u such that
J(u) < J(w) for all w ∈ W − {u}
(resp.
J(u) > J(w) for all w ∈ W − {u}).
CHAPTER 29. EXTREMA OF REAL-VALUED FUNCTIONS
By abuse of language, we often say that the point u itself “is a local minimum” or a
“local maximum,” even though, strictly speaking, this does not make sense.
We begin with a well-known necessary condition for a local extremum.
Proposition 29.1. Let E be a normed vector space and let J : Ω → R be a function, with
Ω some open subset of E. If the function J has a local extremum at some point u ∈ Ω and
if J is differentiable at u, then
dJ(u) = J′(u) = 0.
Proof. Pick any v ∈ E. Since Ω is open, for t small enough we have u + tv ∈ Ω, so there is
an open interval I ⊆ R such that the function ϕ given by
ϕ(t) = J(u + tv)
for all t ∈ I is well-defined. By applying the chain rule, we see that ϕ is differentiable at
t = 0, and we get
ϕ′(0) = dJu(v).
Without loss of generality, assume that u is a local minimum. Then we have
ϕ′(0) = lim_{t→0⁻} (ϕ(t) − ϕ(0))/t ≤ 0
and
ϕ′(0) = lim_{t→0⁺} (ϕ(t) − ϕ(0))/t ≥ 0,
which shows that ϕ′(0) = dJu(v) = 0. As v ∈ E is arbitrary, we conclude that dJu = 0.
A point u ∈ Ω such that dJ(u) = 0 is called a critical point of J.
It is important to note that the fact that Ω is open is crucial. For example, if J is the
identity function on [0, 1], then dJ(x) = 1 for all x ∈ [0, 1], even though J has a minimum at
x = 0 and a maximum at x = 1. Also, if E = Rⁿ, then the condition dJ(u) = 0 is equivalent
to the system
∂J/∂x1(u1, . . . , un) = 0
...
∂J/∂xn(u1, . . . , un) = 0.
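This system of vanishing partial derivatives is easy to check numerically. As a minimal sketch (the function J and the point (1, −2) below are invented for the illustration, not taken from the text), central finite differences confirm that the partial derivatives vanish at an interior minimum:

```python
# Sketch: verify dJ(u) = 0 at an interior local minimum, approximating
# each partial derivative of J by a central finite difference.
def J(x, y):
    return (x - 1.0) ** 2 + (y + 2.0) ** 2  # minimum at u = (1, -2)

def grad_fd(f, u, h=1e-6):
    """Central-difference approximation of the gradient of f at the point u."""
    g = []
    for i in range(len(u)):
        up, um = list(u), list(u)
        up[i] += h
        um[i] -= h
        g.append((f(*up) - f(*um)) / (2 * h))
    return g

print(grad_fd(J, (1.0, -2.0)))  # both partial derivatives are numerically 0
```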
In many practical situations, we need to look for local extrema of a function J under
additional constraints. This situation can be formalized conveniently as follows: We have a
function J : Ω → R defined on some open subset Ω of a normed vector space, but we also
have some subset U of Ω and we are looking for the local extrema of J with respect to the
set U . Note that in most cases, U is not open. In fact, U is usually closed.
Definition 29.2. If J : Ω → R is a real-valued function defined on some open subset Ω of a
normed vector space E and if U is some subset of Ω, we say that J has a local minimum (or
relative minimum) at the point u ∈ U with respect to U if there is some open subset W ⊆ Ω
containing u such that
J(u) ≤ J(w) for all w ∈ U ∩ W .
Similarly, we say that J has a local maximum (or relative maximum) at the point u ∈ U
with respect to U if there is some open subset W ⊆ Ω containing u such that
J(u) ≥ J(w) for all w ∈ U ∩ W .
In either case, we say that J has a local extremum at u with respect to U .
We will be particularly interested in the case where Ω ⊆ E1 × E2 is an open subset of a
product of normed vector spaces and where U is the zero locus of some continuous function
ϕ : Ω → E2, which means that
U = {(u1, u2) ∈ Ω | ϕ(u1, u2) = 0}.
For the sake of brevity, we say that J has a constrained local extremum at u instead of saying
that J has a local extremum at the point u ∈ U with respect to U. Fortunately, there is a
necessary condition for constrained local extrema in terms of Lagrange multipliers.
Theorem 29.2. (Necessary condition for a constrained extremum) Let Ω ⊆ E1 × E2 be an
open subset of a product of normed vector spaces, with E1 a Banach space (E1 is complete),
let ϕ : Ω → E2 be a C1-function (which means that dϕ(ω) exists and is continuous for all
ω ∈ Ω), and let
U = {(u1, u2) ∈ Ω | ϕ(u1, u2) = 0}.
Moreover, let u = (u1, u2) ∈ U be a point such that
∂ϕ/∂x2(u1, u2) ∈ L(E2; E2)   and   (∂ϕ/∂x2(u1, u2))⁻¹ ∈ L(E2; E2),
and let J : Ω → R be a function which is differentiable at u. If J has a constrained local
extremum at u, then there is a continuous linear form Λ(u) ∈ L(E2; R) such that
dJ(u) + Λ(u) ◦ dϕ(u) = 0.
Proof. The plan of attack is to use the implicit function theorem (Theorem 28.13). Observe
that the assumptions of Theorem 28.13 are indeed met. Therefore, there exist some open
subsets U1 ⊆ E1, U2 ⊆ E2, and a continuous function g : U1 → U2 with (u1, u2) ∈ U1×U2 ⊆ Ω
and such that
ϕ(v1, g(v1)) = 0
for all v1 ∈ U1. Moreover, g is differentiable at u1 ∈ U1 and
dg(u1) = −(∂ϕ/∂x2(u))⁻¹ ◦ ∂ϕ/∂x1(u).
It follows that the restriction of J to (U1 × U2) ∩ U yields a function G of a single variable,
with
G(v1) = J(v1, g(v1))
for all v1 ∈ U1. Now, the function G is differentiable at u1 and it has a local extremum at
u1 on U1, so Proposition 29.1 implies that
dG(u1) = 0.
By the chain rule,
dG(u1) = ∂J/∂x1(u) + ∂J/∂x2(u) ◦ dg(u1)
       = ∂J/∂x1(u) − ∂J/∂x2(u) ◦ (∂ϕ/∂x2(u))⁻¹ ◦ ∂ϕ/∂x1(u).
From dG(u1) = 0, we deduce
∂J/∂x1(u) = ∂J/∂x2(u) ◦ (∂ϕ/∂x2(u))⁻¹ ◦ ∂ϕ/∂x1(u),
and since we also have
∂J/∂x2(u) = ∂J/∂x2(u) ◦ (∂ϕ/∂x2(u))⁻¹ ◦ ∂ϕ/∂x2(u),
if we let
Λ(u) = −∂J/∂x2(u) ◦ (∂ϕ/∂x2(u))⁻¹,
then we get
dJ(u) = ∂J/∂x1(u) + ∂J/∂x2(u)
      = ∂J/∂x2(u) ◦ (∂ϕ/∂x2(u))⁻¹ ◦ (∂ϕ/∂x1(u) + ∂ϕ/∂x2(u))
      = −Λ(u) ◦ dϕ(u),
which yields dJ(u) + Λ(u) ◦ dϕ(u) = 0, as claimed.
In most applications, we have E1 = Rⁿ⁻ᵐ and E2 = Rᵐ for some integers m, n such that
1 ≤ m < n, Ω is an open subset of Rⁿ, J : Ω → R, and we have m functions ϕi : Ω → R
defining the subset
U = {v ∈ Ω | ϕi(v) = 0, 1 ≤ i ≤ m}.
Theorem 29.2 yields the following necessary condition:
Theorem 29.3. (Necessary condition for a constrained extremum in terms of Lagrange
multipliers) Let Ω be an open subset of Rⁿ, consider m C¹-functions ϕi : Ω → R (with
1 ≤ m < n), let
U = {v ∈ Ω | ϕi(v) = 0, 1 ≤ i ≤ m},
and let u ∈ U be a point such that the derivatives dϕi(u) ∈ L(Rⁿ; R) are linearly independent;
equivalently, assume that the m × n matrix (∂ϕi/∂xj)(u) has rank m. If J : Ω → R is a
function which is differentiable at u ∈ U and if J has a local constrained extremum at u,
then there exist m numbers λi(u) ∈ R, uniquely defined, such that
dJ(u) + λ1(u)dϕ1(u) + · · · + λm(u)dϕm(u) = 0;
equivalently,
∇J(u) + λ1(u)∇ϕ1(u) + · · · + λm(u)∇ϕm(u) = 0.
Proof. The linear independence of the m linear forms dϕi(u) is equivalent to the fact that
the m × n matrix A = (∂ϕi/∂xj)(u) has rank m. By reordering the columns, we may
assume that the last m columns are linearly independent. If we let ϕ : Ω → Rᵐ be the
function defined by
ϕ(v) = (ϕ1(v), . . . , ϕm(v))
for all v ∈ Ω, then we see that ∂ϕ/∂x2(u) is invertible and both ∂ϕ/∂x2(u) and its inverse
are continuous, so that Theorem 29.2 applies, and there is some (continuous) linear form
Λ(u) ∈ L(Rᵐ; R) such that
dJ(u) + Λ(u) ◦ dϕ(u) = 0.
However, Λ(u) is defined by some m-tuple (λ1(u), . . . , λm(u)) ∈ Rᵐ, and in view of the
definition of ϕ, the above equation is equivalent to
dJ(u) + λ1(u)dϕ1(u) + · · · + λm(u)dϕm(u) = 0.
The uniqueness of the λi(u) is a consequence of the linear independence of the dϕi(u).
The numbers λi(u) involved in Theorem 29.3 are called the Lagrange multipliers asso-
ciated with the constrained extremum u (again, with some minor abuse of language). The
linear independence of the linear forms dϕi(u) is equivalent to the fact that the Jacobian ma-
trix (∂ϕi/∂xj)(u) of ϕ = (ϕ1, . . . , ϕm) at u has rank m. If m = 1, the linear independence
of the dϕi(u) reduces to the condition ∇ϕ1(u) ≠ 0.
A fruitful way to reformulate the use of Lagrange multipliers is to introduce the notion
of the Lagrangian associated with our constrained extremum problem. This is the function
L : Ω × Rᵐ → R given by
L(v, λ) = J(v) + λ1ϕ1(v) + · · · + λmϕm(v),
with λ = (λ1, . . . , λm). Then, observe that there exists some µ = (µ1, . . . , µm) and some
u ∈ U such that
dJ(u) + µ1dϕ1(u) + · · · + µmdϕm(u) = 0
if and only if
dL(u, µ) = 0,
or equivalently
∇L(u, µ) = 0;
that is, iff (u, µ) is a critical point of the Lagrangian L.
Indeed dL(u, µ) = 0 is equivalent to
∂L/∂v(u, µ) = 0
∂L/∂λ1(u, µ) = 0
...
∂L/∂λm(u, µ) = 0,
and since
∂L/∂v(u, µ) = dJ(u) + µ1dϕ1(u) + · · · + µmdϕm(u)
and
∂L/∂λi(u, µ) = ϕi(u),
we get
dJ(u) + µ1dϕ1(u) + · · · + µmdϕm(u) = 0
and
ϕ1(u) = · · · = ϕm(u) = 0,
that is, u ∈ U.
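As a concrete sketch of this equivalence (the toy problem below, minimizing x² + y² subject to x + y = 1, is invented for the illustration, not from the text), the critical-point equations of the Lagrangian bundle together the multiplier condition and the constraint:

```python
# Toy problem: J(x, y) = x^2 + y^2 with one constraint phi(x, y) = x + y - 1.
# Lagrangian: L(x, y, mu) = x**2 + y**2 + mu * (x + y - 1).
def grad_L(x, y, mu):
    dLdx = 2 * x + mu    # dJ/dx + mu * dphi/dx
    dLdy = 2 * y + mu    # dJ/dy + mu * dphi/dy
    dLdmu = x + y - 1    # the constraint phi(x, y) = 0 itself
    return (dLdx, dLdy, dLdmu)

# The constrained minimum is u = (1/2, 1/2) with multiplier mu = -1,
# and grad L vanishes exactly there:
print(grad_L(0.5, 0.5, -1.0))  # (0.0, 0.0, 0.0)
```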
If we write out explicitly the condition
dJ(u) + µ1dϕ1(u) + · · · + µmdϕm(u) = 0,
we get the system of n equations
∂J/∂x1(u) + λ1 ∂ϕ1/∂x1(u) + · · · + λm ∂ϕm/∂x1(u) = 0
...
∂J/∂xn(u) + λ1 ∂ϕ1/∂xn(u) + · · · + λm ∂ϕm/∂xn(u) = 0,
and it is important to note that the matrix of this system is the transpose of the Jacobian
matrix of ϕ at u. If we write Jac(ϕ)(u) = (∂ϕi/∂xj)(u) for the Jacobian matrix of ϕ (at
u), then the above system is written in matrix form as
∇J(u) + (Jac(ϕ)(u))ᵀλ = 0,
where λ is viewed as a column vector, and the Lagrangian is equal to
L(u, λ) = J(u) + (ϕ1(u), . . . , ϕm(u))λ.
Remark: If the Jacobian matrix Jac(ϕ)(v) = (∂ϕi/∂xj)(v) has rank m for all v ∈ U
(which is equivalent to the linear independence of the linear forms dϕi(v)), then we say that
0 ∈ Rᵐ is a regular value of ϕ. In this case, it is known that
U = {v ∈ Ω | ϕ(v) = 0}
is a smooth submanifold of dimension n − m of Rⁿ. Furthermore, the set
TvU = {w ∈ Rⁿ | dϕi(v)(w) = 0, 1 ≤ i ≤ m} = Ker dϕ1(v) ∩ · · · ∩ Ker dϕm(v)
is the tangent space to U at v (a vector space of dimension n − m). Then, the condition
dJ(v) + µ1dϕ1(v) + · · · + µmdϕm(v) = 0
implies that dJ(v) vanishes on the tangent space TvU. Conversely, if dJ(v)(w) = 0 for
all w ∈ TvU, this means that dJ(v) is orthogonal (in the sense of Definition 4.7) to TvU.
Since (by Theorem 4.17 (b)) the orthogonal of TvU is the space of linear forms spanned
by dϕ1(v), . . . , dϕm(v), it follows that dJ(v) must be a linear combination of the dϕi(v).
Therefore, when 0 is a regular value of ϕ, Theorem 29.3 asserts that if u ∈ U is a local
extremum of J, then dJ(u) must vanish on the tangent space TuU. We can say even more.
The subset Z(J) of Ω given by
Z(J) = {v ∈ Ω | J(v) = J(u)}
(the level set of level J(u)) is a hypersurface in Ω, and if dJ(u) ≠ 0, the zero locus of dJ(u)
is the tangent space TuZ(J) to Z(J) at u (a vector space of dimension n − 1), where
TuZ(J) = {w ∈ Rⁿ | dJ(u)(w) = 0}.
Consequently, Theorem 29.3 asserts that
TuU ⊆ TuZ(J);
this is a geometric condition.
The beauty of the Lagrangian is that the constraints {ϕi(v) = 0} have been incorporated
into the function L(v, λ), and that the necessary condition for the existence of a constrained
local extremum of J is reduced to the necessary condition for the existence of a local ex-
tremum of the unconstrained L.
However, one should be careful to check that the assumptions of Theorem 29.3 are sat-
isfied (in particular, the linear independence of the linear forms dϕi). For example, let
J : R³ → R be given by
J(x, y, z) = x + y + z²
and g : R³ → R by
g(x, y, z) = x² + y².
Since g(x, y, z) = 0 iff x = y = 0, we have U = {(0, 0, z) | z ∈ R} and the restriction of J to
U is given by
J(0, 0, z) = z²,
which has a minimum for z = 0. However, a “blind” use of Lagrange multipliers would
require that there is some λ so that
∂J/∂x(0, 0, z) = λ ∂g/∂x(0, 0, z),   ∂J/∂y(0, 0, z) = λ ∂g/∂y(0, 0, z),   ∂J/∂z(0, 0, z) = λ ∂g/∂z(0, 0, z),
and since
∂g/∂x(x, y, z) = 2x,   ∂g/∂y(x, y, z) = 2y,   ∂g/∂z(x, y, z) = 0,
the partial derivatives above all vanish for x = y = 0, so at a local extremum we should also
have
∂J/∂x(0, 0, z) = 0,   ∂J/∂y(0, 0, z) = 0,   ∂J/∂z(0, 0, z) = 0,
but this is absurd since
∂J/∂x(x, y, z) = 1,   ∂J/∂y(x, y, z) = 1,   ∂J/∂z(x, y, z) = 2z.
The reader should enjoy finding the reason for the flaw in the argument.
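For readers who want a hint, here is a short numerical sketch (a spoiler for the puzzle above) that checks the hypothesis of Theorem 29.3 along U:

```python
# Hint (a sketch): check the linear-independence hypothesis of Theorem 29.3
# for J(x, y, z) = x + y + z^2 and g(x, y, z) = x^2 + y^2.
def grad_g(x, y, z):
    return (2 * x, 2 * y, 0.0)   # gradient of g

def grad_J(x, y, z):
    return (1.0, 1.0, 2 * z)     # gradient of J

# On U = {(0, 0, z)} the single gradient grad g is the zero vector, so the
# family {dg(u)} is NOT linearly independent there, and no lambda can solve
# grad J + lambda * grad g = 0 (the first component would require 1 = 0).
print(grad_g(0.0, 0.0, 0.3))  # (0.0, 0.0, 0.0)
```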
One should also keep in mind that Theorem 29.3 gives only a necessary condition. The
(u, λ) may not correspond to local extrema! Thus, it is always necessary to analyze the local
behavior of J near a critical point u. This is generally difficult, but in the case where J is
affine or quadratic and the constraints are affine or quadratic, this is possible (although not
always easy).
Let us apply the above method to the following example, in which E1 = R, E2 = R,
Ω = R², and
J(x1, x2) = −x2
ϕ(x1, x2) = x1² + x2² − 1.
Observe that
U = {(x1, x2) ∈ R² | x1² + x2² = 1}
is the unit circle, and since
∇ϕ(x1, x2) = (2x1, 2x2)ᵀ,
it is clear that ∇ϕ(x1, x2) ≠ 0 for every point (x1, x2) on the unit circle. If we form the
Lagrangian
L(x1, x2, λ) = −x2 + λ(x1² + x2² − 1),
Theorem 29.3 says that a necessary condition for J to have a constrained local extremum is
that ∇L(x1, x2, λ) = 0, so the following equations must hold:
2λx1 = 0
−1 + 2λx2 = 0
x1² + x2² = 1.
The second equation implies that λ ≠ 0, and then the first yields x1 = 0, so the third yields
x2 = ±1, and we get two solutions:
λ = 1/2,   (x1, x2) = (0, 1)
λ = −1/2,   (x1, x2) = (0, −1).
We can check immediately that the first solution is a minimum and the second is a maximum.
The reader should look for a geometric interpretation of this problem.
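The computation above is easy to verify numerically; here is a minimal sketch checking that ∇L vanishes at exactly the two solutions found:

```python
# Gradient of the Lagrangian L(x1, x2, lam) = -x2 + lam*(x1**2 + x2**2 - 1).
def grad_L(x1, x2, lam):
    return (2 * lam * x1,             # dL/dx1
            -1 + 2 * lam * x2,        # dL/dx2
            x1 ** 2 + x2 ** 2 - 1)    # dL/dlam (the constraint)

for sol in [(0.0, 1.0, 0.5), (0.0, -1.0, -0.5)]:
    print(sol, grad_L(*sol))  # the gradient is numerically zero at both solutions
```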
Let us now consider the case in which J is a quadratic function of the form
J(v) = (1/2) vᵀAv − vᵀb,
where A is an n × n symmetric matrix, b ∈ Rⁿ, and the constraints are given by a linear
system of the form
Cv = d,
where C is an m × n matrix with m < n and d ∈ Rᵐ. We also assume that C has rank m.
In this case, the function ϕ is given by
ϕ(v) = (Cv − d)ᵀ,
because we view ϕ(v) as a row vector (and v as a column vector), and since
dϕ(v)(w) = Cw,
the condition that the Jacobian matrix of ϕ at u have rank m is satisfied. The Lagrangian
of this problem is
L(v, λ) = (1/2) vᵀAv − vᵀb + (Cv − d)ᵀλ = (1/2) vᵀAv − vᵀb + λᵀ(Cv − d),
where λ is viewed as a column vector. Now, because A is a symmetric matrix, it is easy to
show that
∇L(v, λ) = (Av − b + Cᵀλ, Cv − d).
Therefore, the necessary condition for constrained local extrema is
Av + Cᵀλ = b
Cv = d,
which can be expressed in matrix form as
[ A   Cᵀ ] [ v ]   [ b ]
[ C   0  ] [ λ ] = [ d ],
where the matrix of the system is a symmetric matrix. We should not be surprised to find
the system of Section 18, except for some renaming of the matrices and vectors involved.
As we know from Section 18.2, the function J has a minimum iff A is positive definite, so
in general, if A is only a symmetric matrix, the critical points of the Lagrangian do not
correspond to extrema of J.
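As a sketch (the data A, b, C, d below are invented for the illustration), the symmetric system above can be assembled and solved directly with NumPy:

```python
import numpy as np

# Invented data: J(v) = (1/2) v^T A v - v^T b subject to Cv = d.
A = np.array([[2.0, 0.0],
              [0.0, 2.0]])      # symmetric (here positive definite)
b = np.array([0.0, 0.0])
C = np.array([[1.0, 1.0]])      # one affine constraint: v1 + v2 = 1
d = np.array([1.0])

n, m = A.shape[0], C.shape[0]
K = np.block([[A, C.T],
              [C, np.zeros((m, m))]])   # the symmetric system matrix
rhs = np.concatenate([b, d])
sol = np.linalg.solve(K, rhs)
v, lam = sol[:n], sol[n:]
print(v, lam)  # v ≈ [0.5 0.5], lam ≈ [-1.]
```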
We now investigate conditions for the existence of extrema involving the second derivative
of J.
29.2 Using Second Derivatives to Find Extrema
For the sake of brevity, we consider only the case of local minima; analogous results are
obtained for local maxima (replace J by −J, since maxu J(u) = − minu −J(u)). We begin
with a necessary condition for an unconstrained local minimum.
Proposition 29.4. Let E be a normed vector space and let J : Ω → R be a function, with Ω
some open subset of E. If the function J is differentiable in Ω, if J has a second derivative
D²J(u) at some point u ∈ Ω, and if J has a local minimum at u, then
D²J(u)(w, w) ≥ 0 for all w ∈ E.
Proof. Pick any nonzero vector w ∈ E. Since Ω is open, for t small enough, u + tw ∈ Ω and
J(u + tw) ≥ J(u), so there is some open interval I ⊆ R such that
u + tw ∈ Ω and J(u + tw) ≥ J(u)
for all t ∈ I. Using the Taylor–Young formula and the fact that we must have dJ(u) = 0
since J has a local minimum at u, we get
0 ≤ J(u + tw) − J(u) = (t²/2) D²J(u)(w, w) + t²‖w‖² ε(tw),
with lim_{t→0} ε(tw) = 0, which implies that
D²J(u)(w, w) ≥ 0.
Since the argument holds for all w ∈ E (trivially if w = 0), the proposition is proved.
One should be cautioned that there is no converse to the previous proposition. For example, the function f : x → x³ has no local minimum at 0, yet df(0) = 0 and D²f(0)(u, v) = 0.
Similarly, the reader should check that the function f : R² → R given by
f(x, y) = x² − 3y³
has no local minimum at (0, 0); yet df(0, 0) = 0 and D²f(0, 0)(u, v) = 2u² ≥ 0.
When E = Rⁿ, Proposition 29.4 says that a necessary condition for having a local
minimum is that the Hessian ∇²J(u) be positive semidefinite (it is always symmetric).
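Concretely (a sketch reusing the counterexample above), one can confirm that the Hessian of f(x, y) = x² − 3y³ at the origin is positive semidefinite even though the origin is not a local minimum, so semidefiniteness is necessary but not sufficient:

```python
import numpy as np

# Hessian of f(x, y) = x**2 - 3*y**3 at the origin:
# d2f/dx2 = 2, d2f/dy2 = -18*y = 0 at y = 0, mixed partials = 0.
H = np.array([[2.0, 0.0],
              [0.0, 0.0]])
eigs = np.linalg.eigvalsh(H)
print(eigs)  # [0. 2.]: all eigenvalues nonnegative, so H is PSD

# Yet f(0, t) = -3*t**3 < 0 = f(0, 0) for every small t > 0,
# so (0, 0) is not a local minimum.
f = lambda x, y: x ** 2 - 3 * y ** 3
print(f(0.0, 0.1))  # strictly below f(0, 0) = 0
```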
We now give sufficient conditions for the existence of a local minimum.
Theorem 29.5. Let E be a normed vector space, let J : Ω → R be a function with Ω some
open subset of E, and assume that J is differentiable in Ω and that dJ(u) = 0 at some point
u ∈ Ω. The following properties hold:
(1) If D²J(u) exists and if