29.1 Local Extrema, Constrained Local Extrema, and Lagrange Multipliers
Let J : E → R be a real-valued function defined on a normed vector space E (or more
generally, any topological space). Ideally we would like to find where the function J reaches
a minimum or a maximum value, at least locally. In this chapter, we will usually use the
notations dJ(u) or J′(u) (or dJu or J′u) for the derivative of J at u, instead of DJ(u). Our
presentation follows very closely that of Ciarlet [22] (Chapter 7), which we find to be one of
the clearest.
Definition 29.1. If J : E → R is a real-valued function defined on a normed vector space
E, we say that J has a local minimum (or relative minimum) at the point u ∈ E if there is
some open subset W ⊆ E containing u such that
J(u) ≤ J(w) for all w ∈ W .
Similarly, we say that J has a local maximum (or relative maximum) at the point u ∈ E if
there is some open subset W ⊆ E containing u such that
J(u) ≥ J(w) for all w ∈ W .
In either case, we say that J has a local extremum (or relative extremum) at u. We say that
J has a strict local minimum (resp. strict local maximum) at the point u ∈ E if there is
some open subset W ⊆ E containing u such that
J(u) < J(w) for all w ∈ W − {u}
(resp.
J(u) > J(w) for all w ∈ W − {u}).
CHAPTER 29. EXTREMA OF REAL-VALUED FUNCTIONS
By abuse of language, we often say that the point u itself “is a local minimum” or a
“local maximum,” even though, strictly speaking, this does not make sense.
We begin with a well-known necessary condition for a local extremum.
Proposition 29.1. Let E be a normed vector space and let J : Ω → R be a function, with
Ω some open subset of E. If the function J has a local extremum at some point u ∈ Ω and
if J is differentiable at u, then
dJ(u) = J′(u) = 0.
Proof. Pick any v ∈ E. Since Ω is open, for t small enough we have u + tv ∈ Ω, so there is
an open interval I ⊆ R such that the function ϕ given by
ϕ(t) = J(u + tv)
for all t ∈ I is well-defined. By applying the chain rule, we see that ϕ is differentiable at
t = 0, and we get
ϕ′(0) = dJu(v).
Without loss of generality, assume that u is a local minimum. Then we have
ϕ′(0) = lim_{t→0⁻} (ϕ(t) − ϕ(0))/t ≤ 0
and
ϕ′(0) = lim_{t→0⁺} (ϕ(t) − ϕ(0))/t ≥ 0,
which shows that ϕ′(0) = dJu(v) = 0. As v ∈ E is arbitrary, we conclude that dJu = 0.
A point u ∈ Ω such that dJ(u) = 0 is called a critical point of J.
It is important to note that the fact that Ω is open is crucial. For example, if J is the
identity function on [0, 1], then dJ(x) = 1 for all x ∈ [0, 1], even though J has a minimum at
x = 0 and a maximum at x = 1. Also, if E = Rⁿ, then the condition dJ(u) = 0 is equivalent
to the system
∂J/∂x1(u1, . . . , un) = 0
...
∂J/∂xn(u1, . . . , un) = 0.
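This system of vanishing partial derivatives is easy to check numerically. As a minimal sketch (the function J and the point (1, −2) below are invented for the illustration, not taken from the text), central finite differences confirm that the partial derivatives vanish at an interior minimum:

```python
# Sketch: verify dJ(u) = 0 at an interior local minimum, approximating
# each partial derivative of J by a central finite difference.
def J(x, y):
    return (x - 1.0) ** 2 + (y + 2.0) ** 2  # minimum at u = (1, -2)

def grad_fd(f, u, h=1e-6):
    """Central-difference approximation of the gradient of f at the point u."""
    g = []
    for i in range(len(u)):
        up, um = list(u), list(u)
        up[i] += h
        um[i] -= h
        g.append((f(*up) - f(*um)) / (2 * h))
    return g

print(grad_fd(J, (1.0, -2.0)))  # both partial derivatives are numerically 0
```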
In many practical situations, we need to look for local extrema of a function J under
additional constraints. This situation can be formalized conveniently as follows: We have a
function J : Ω → R defined on some open subset Ω of a normed vector space, but we also
have some subset U of Ω and we are looking for the local extrema of J with respect to the
set U . Note that in most cases, U is not open. In fact, U is usually closed.
Definition 29.2. If J : Ω → R is a real-valued function defined on some open subset Ω of a
normed vector space E and if U is some subset of Ω, we say that J has a local minimum (or
relative minimum) at the point u ∈ U with respect to U if there is some open subset W ⊆ Ω
containing u such that
J(u) ≤ J(w) for all w ∈ U ∩ W .
Similarly, we say that J has a local maximum (or relative maximum) at the point u ∈ U
with respect to U if there is some open subset W ⊆ Ω containing u such that
J(u) ≥ J(w) for all w ∈ U ∩ W .
In either case, we say that J has a local extremum at u with respect to U .
We will be particularly interested in the case where Ω ⊆ E1 × E2 is an open subset of a
product of normed vector spaces and where U is the zero locus of some continuous function
ϕ : Ω → E2, which means that
U = {(u1, u2) ∈ Ω | ϕ(u1, u2) = 0}.
For the sake of brevity, we say that J has a constrained local extremum at u instead of saying
that J has a local extremum at the point u ∈ U with respect to U. Fortunately, there is a
necessary condition for constrained local extrema in terms of Lagrange multipliers.
Theorem 29.2. (Necessary condition for a constrained extremum) Let Ω ⊆ E1 × E2 be an
open subset of a product of normed vector spaces, with E1 a Banach space (E1 is complete),
let ϕ : Ω → E2 be a C1-function (which means that dϕ(ω) exists and is continuous for all
ω ∈ Ω), and let
U = {(u1, u2) ∈ Ω | ϕ(u1, u2) = 0}.
Moreover, let u = (u1, u2) ∈ U be a point such that
∂ϕ/∂x2(u1, u2) ∈ L(E2; E2)   and   (∂ϕ/∂x2(u1, u2))⁻¹ ∈ L(E2; E2),
and let J : Ω → R be a function which is differentiable at u. If J has a constrained local
extremum at u, then there is a continuous linear form Λ(u) ∈ L(E2; R) such that
dJ(u) + Λ(u) ◦ dϕ(u) = 0.
Proof. The plan of attack is to use the implicit function theorem (Theorem 28.13). Observe
that the assumptions of Theorem 28.13 are indeed met. Therefore, there exist some open
subsets U1 ⊆ E1, U2 ⊆ E2, and a continuous function g : U1 → U2 with (u1, u2) ∈ U1×U2 ⊆ Ω
and such that
ϕ(v1, g(v1)) = 0
for all v1 ∈ U1. Moreover, g is differentiable at u1 ∈ U1 and
dg(u1) = −(∂ϕ/∂x2(u))⁻¹ ◦ ∂ϕ/∂x1(u).
It follows that the restriction of J to (U1 × U2) ∩ U yields a function G of a single variable,
with
G(v1) = J(v1, g(v1))
for all v1 ∈ U1. Now, the function G is differentiable at u1 and it has a local extremum at
u1 on U1, so Proposition 29.1 implies that
dG(u1) = 0.
By the chain rule,
dG(u1) = ∂J/∂x1(u) + ∂J/∂x2(u) ◦ dg(u1)
       = ∂J/∂x1(u) − ∂J/∂x2(u) ◦ (∂ϕ/∂x2(u))⁻¹ ◦ ∂ϕ/∂x1(u).
From dG(u1) = 0, we deduce
∂J/∂x1(u) = ∂J/∂x2(u) ◦ (∂ϕ/∂x2(u))⁻¹ ◦ ∂ϕ/∂x1(u),
and since we also have
∂J/∂x2(u) = ∂J/∂x2(u) ◦ (∂ϕ/∂x2(u))⁻¹ ◦ ∂ϕ/∂x2(u),
if we let
Λ(u) = −∂J/∂x2(u) ◦ (∂ϕ/∂x2(u))⁻¹,
then we get
dJ(u) = ∂J/∂x1(u) + ∂J/∂x2(u)
      = ∂J/∂x2(u) ◦ (∂ϕ/∂x2(u))⁻¹ ◦ (∂ϕ/∂x1(u) + ∂ϕ/∂x2(u))
      = −Λ(u) ◦ dϕ(u),
which yields dJ(u) + Λ(u) ◦ dϕ(u) = 0, as claimed.
In most applications, we have E1 = Rⁿ⁻ᵐ and E2 = Rᵐ for some integers m, n such that
1 ≤ m < n, Ω is an open subset of Rⁿ, J : Ω → R, and we have m functions ϕi : Ω → R
defining the subset
U = {v ∈ Ω | ϕi(v) = 0, 1 ≤ i ≤ m}.
Theorem 29.2 yields the following necessary condition:
Theorem 29.3. (Necessary condition for a constrained extremum in terms of Lagrange
multipliers) Let Ω be an open subset of Rⁿ, consider m C¹-functions ϕi : Ω → R (with
1 ≤ m < n), let
U = {v ∈ Ω | ϕi(v) = 0, 1 ≤ i ≤ m},
and let u ∈ U be a point such that the derivatives dϕi(u) ∈ L(Rⁿ; R) are linearly independent;
equivalently, assume that the m × n matrix (∂ϕi/∂xj)(u) has rank m. If J : Ω → R is a
function which is differentiable at u ∈ U and if J has a local constrained extremum at u,
then there exist m numbers λi(u) ∈ R, uniquely defined, such that
dJ(u) + λ1(u)dϕ1(u) + · · · + λm(u)dϕm(u) = 0;
equivalently,
∇J(u) + λ1(u)∇ϕ1(u) + · · · + λm(u)∇ϕm(u) = 0.
Proof. The linear independence of the m linear forms dϕi(u) is equivalent to the fact that
the m × n matrix A = (∂ϕi/∂xj)(u) has rank m. By reordering the columns, we may
assume that the last m columns are linearly independent. If we let ϕ : Ω → Rᵐ be the
function defined by
ϕ(v) = (ϕ1(v), . . . , ϕm(v))
for all v ∈ Ω, then we see that ∂ϕ/∂x2(u) is invertible and both ∂ϕ/∂x2(u) and its inverse
are continuous, so that Theorem 29.2 applies, and there is some (continuous) linear form
Λ(u) ∈ L(Rᵐ; R) such that
dJ(u) + Λ(u) ◦ dϕ(u) = 0.
However, Λ(u) is defined by some m-tuple (λ1(u), . . . , λm(u)) ∈ Rᵐ, and in view of the
definition of ϕ, the above equation is equivalent to
dJ(u) + λ1(u)dϕ1(u) + · · · + λm(u)dϕm(u) = 0.
The uniqueness of the λi(u) is a consequence of the linear independence of the dϕi(u).
The numbers λi(u) involved in Theorem 29.3 are called the Lagrange multipliers asso-
ciated with the constrained extremum u (again, with some minor abuse of language). The
linear independence of the linear forms dϕi(u) is equivalent to the fact that the Jacobian ma-
trix (∂ϕi/∂xj)(u) of ϕ = (ϕ1, . . . , ϕm) at u has rank m. If m = 1, the linear independence
of the dϕi(u) reduces to the condition ∇ϕ1(u) ≠ 0.
A fruitful way to reformulate the use of Lagrange multipliers is to introduce the notion
of the Lagrangian associated with our constrained extremum problem. This is the function
L : Ω × Rᵐ → R given by
L(v, λ) = J(v) + λ1ϕ1(v) + · · · + λmϕm(v),
with λ = (λ1, . . . , λm). Then, observe that there exists some µ = (µ1, . . . , µm) and some
u ∈ U such that
dJ(u) + µ1dϕ1(u) + · · · + µmdϕm(u) = 0
if and only if
dL(u, µ) = 0,
or equivalently
∇L(u, µ) = 0;
that is, iff (u, µ) is a critical point of the Lagrangian L.
Indeed dL(u, µ) = 0 is equivalent to
∂L/∂v(u, µ) = 0
∂L/∂λ1(u, µ) = 0
...
∂L/∂λm(u, µ) = 0,
and since
∂L/∂v(u, µ) = dJ(u) + µ1dϕ1(u) + · · · + µmdϕm(u)
and
∂L/∂λi(u, µ) = ϕi(u),
we get
dJ(u) + µ1dϕ1(u) + · · · + µmdϕm(u) = 0
and
ϕ1(u) = · · · = ϕm(u) = 0,
that is, u ∈ U.
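As a concrete sketch of this equivalence (the toy problem below, minimizing x² + y² subject to x + y = 1, is invented for the illustration, not from the text), the critical-point equations of the Lagrangian bundle together the multiplier condition and the constraint:

```python
# Toy problem: J(x, y) = x^2 + y^2 with one constraint phi(x, y) = x + y - 1.
# Lagrangian: L(x, y, mu) = x**2 + y**2 + mu * (x + y - 1).
def grad_L(x, y, mu):
    dLdx = 2 * x + mu    # dJ/dx + mu * dphi/dx
    dLdy = 2 * y + mu    # dJ/dy + mu * dphi/dy
    dLdmu = x + y - 1    # the constraint phi(x, y) = 0 itself
    return (dLdx, dLdy, dLdmu)

# The constrained minimum is u = (1/2, 1/2) with multiplier mu = -1,
# and grad L vanishes exactly there:
print(grad_L(0.5, 0.5, -1.0))  # (0.0, 0.0, 0.0)
```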
If we write out explicitly the condition
dJ(u) + µ1dϕ1(u) + · · · + µmdϕm(u) = 0,
we get the system of n equations
∂J/∂x1(u) + λ1 ∂ϕ1/∂x1(u) + · · · + λm ∂ϕm/∂x1(u) = 0
...
∂J/∂xn(u) + λ1 ∂ϕ1/∂xn(u) + · · · + λm ∂ϕm/∂xn(u) = 0,
and it is important to note that the matrix of this system is the transpose of the Jacobian
matrix of ϕ at u. If we write Jac(ϕ)(u) = (∂ϕi/∂xj)(u) for the Jacobian matrix of ϕ (at
u), then the above system is written in matrix form as
∇J(u) + (Jac(ϕ)(u))ᵀλ = 0,
where λ is viewed as a column vector, and the Lagrangian is equal to
L(u, λ) = J(u) + (ϕ1(u), . . . , ϕm(u))λ.
Remark: If the Jacobian matrix Jac(ϕ)(v) = (∂ϕi/∂xj)(v) has rank m for all v ∈ U
(which is equivalent to the linear independence of the linear forms dϕi(v)), then we say that
0 ∈ Rᵐ is a regular value of ϕ. In this case, it is known that
U = {v ∈ Ω | ϕ(v) = 0}
is a smooth submanifold of dimension n − m of Rⁿ. Furthermore, the set
TvU = {w ∈ Rⁿ | dϕi(v)(w) = 0, 1 ≤ i ≤ m} = Ker dϕ1(v) ∩ · · · ∩ Ker dϕm(v)
is the tangent space to U at v (a vector space of dimension n − m). Then, the condition
dJ(v) + µ1dϕ1(v) + · · · + µmdϕm(v) = 0
implies that dJ(v) vanishes on the tangent space TvU. Conversely, if dJ(v)(w) = 0 for
all w ∈ TvU, this means that dJ(v) is orthogonal (in the sense of Definition 4.7) to TvU.
Since (by Theorem 4.17 (b)) the orthogonal of TvU is the space of linear forms spanned
by dϕ1(v), . . . , dϕm(v), it follows that dJ(v) must be a linear combination of the dϕi(v).
Therefore, when 0 is a regular value of ϕ, Theorem 29.3 asserts that if u ∈ U is a local
extremum of J, then dJ(u) must vanish on the tangent space TuU. We can say even more.
The subset Z(J) of Ω given by
Z(J) = {v ∈ Ω | J(v) = J(u)}
(the level set of level J(u)) is a hypersurface in Ω, and if dJ(u) ≠ 0, the zero locus of dJ(u)
is the tangent space TuZ(J) to Z(J) at u (a vector space of dimension n − 1), where
TuZ(J) = {w ∈ Rⁿ | dJ(u)(w) = 0}.
Consequently, Theorem 29.3 asserts that
TuU ⊆ TuZ(J);
this is a geometric condition.
The beauty of the Lagrangian is that the constraints {ϕi(v) = 0} have been incorporated
into the function L(v, λ), and that the necessary condition for the existence of a constrained
local extremum of J is reduced to the necessary condition for the existence of a local ex-
tremum of the unconstrained L.
However, one should be careful to check that the assumptions of Theorem 29.3 are sat-
isfied (in particular, the linear independence of the linear forms dϕi). For example, let
J : R³ → R be given by
J(x, y, z) = x + y + z²
and g : R³ → R by
g(x, y, z) = x² + y².
Since g(x, y, z) = 0 iff x = y = 0, we have U = {(0, 0, z) | z ∈ R} and the restriction of J to
U is given by
J(0, 0, z) = z²,
which has a minimum for z = 0. However, a “blind” use of Lagrange multipliers would
require that there is some λ so that
∂J/∂x(0, 0, z) = λ ∂g/∂x(0, 0, z),   ∂J/∂y(0, 0, z) = λ ∂g/∂y(0, 0, z),   ∂J/∂z(0, 0, z) = λ ∂g/∂z(0, 0, z),
and since
∂g/∂x(x, y, z) = 2x,   ∂g/∂y(x, y, z) = 2y,   ∂g/∂z(x, y, z) = 0,
the partial derivatives above all vanish for x = y = 0, so at a local extremum we should also
have
∂J/∂x(0, 0, z) = 0,   ∂J/∂y(0, 0, z) = 0,   ∂J/∂z(0, 0, z) = 0,
but this is absurd since
∂J/∂x(x, y, z) = 1,   ∂J/∂y(x, y, z) = 1,   ∂J/∂z(x, y, z) = 2z.
The reader should enjoy finding the reason for the flaw in the argument.
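For readers who want a hint, here is a short numerical sketch (a spoiler for the puzzle above) that checks the hypothesis of Theorem 29.3 along U:

```python
# Hint (a sketch): check the linear-independence hypothesis of Theorem 29.3
# for J(x, y, z) = x + y + z^2 and g(x, y, z) = x^2 + y^2.
def grad_g(x, y, z):
    return (2 * x, 2 * y, 0.0)   # gradient of g

def grad_J(x, y, z):
    return (1.0, 1.0, 2 * z)     # gradient of J

# On U = {(0, 0, z)} the single gradient grad g is the zero vector, so the
# family {dg(u)} is NOT linearly independent there, and no lambda can solve
# grad J + lambda * grad g = 0 (the first component would require 1 = 0).
print(grad_g(0.0, 0.0, 0.3))  # (0.0, 0.0, 0.0)
```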
One should also keep in mind that Theorem 29.3 gives only a necessary condition. The
(u, λ) may not correspond to local extrema! Thus, it is always necessary to analyze the local
behavior of J near a critical point u. This is generally difficult, but in the case where J is
affine or quadratic and the constraints are affine or quadratic, this is possible (although not
always easy).
Let us apply the above method to the following example, in which E1 = R, E2 = R,
Ω = R², and
J(x1, x2) = −x2
ϕ(x1, x2) = x1² + x2² − 1.
Observe that
U = {(x1, x2) ∈ R² | x1² + x2² = 1}
is the unit circle, and since
∇ϕ(x1, x2) = (2x1, 2x2)ᵀ,
it is clear that ∇ϕ(x1, x2) ≠ 0 for every point (x1, x2) on the unit circle. If we form the
Lagrangian
L(x1, x2, λ) = −x2 + λ(x1² + x2² − 1),
Theorem 29.3 says that a necessary condition for J to have a constrained local extremum is
that ∇L(x1, x2, λ) = 0, so the following equations must hold:
2λx1 = 0
−1 + 2λx2 = 0
x1² + x2² = 1.
The second equation implies that λ ≠ 0, and then the first yields x1 = 0, so the third yields
x2 = ±1, and we get two solutions:
λ = 1/2,   (x1, x2) = (0, 1)
λ = −1/2,   (x1, x2) = (0, −1).
We can check immediately that the first solution is a minimum and the second is a maximum.
The reader should look for a geometric interpretation of this problem.
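The computation above is easy to verify numerically; here is a minimal sketch checking that ∇L vanishes at exactly the two solutions found:

```python
# Gradient of the Lagrangian L(x1, x2, lam) = -x2 + lam*(x1**2 + x2**2 - 1).
def grad_L(x1, x2, lam):
    return (2 * lam * x1,             # dL/dx1
            -1 + 2 * lam * x2,        # dL/dx2
            x1 ** 2 + x2 ** 2 - 1)    # dL/dlam (the constraint)

for sol in [(0.0, 1.0, 0.5), (0.0, -1.0, -0.5)]:
    print(sol, grad_L(*sol))  # the gradient is numerically zero at both solutions
```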
Let us now consider the case in which J is a quadratic function of the form
J(v) = (1/2) vᵀAv − vᵀb,
where A is an n × n symmetric matrix, b ∈ Rⁿ, and the constraints are given by a linear
system of the form
Cv = d,
where C is an m × n matrix with m < n and d ∈ Rᵐ. We also assume that C has rank m.
In this case, the function ϕ is given by
ϕ(v) = (Cv − d)ᵀ,
because we view ϕ(v) as a row vector (and v as a column vector), and since
dϕ(v)(w) = Cw,
the condition that the Jacobian matrix of ϕ at u have rank m is satisfied. The Lagrangian
of this problem is
L(v, λ) = (1/2) vᵀAv − vᵀb + (Cv − d)ᵀλ = (1/2) vᵀAv − vᵀb + λᵀ(Cv − d),
where λ is viewed as a column vector. Now, because A is a symmetric matrix, it is easy to
show that
∇L(v, λ) = (Av − b + Cᵀλ, Cv − d).
Therefore, the necessary condition for constrained local extrema is
Av + Cᵀλ = b
Cv = d,
which can be expressed in matrix form as
[ A   Cᵀ ] [ v ]   [ b ]
[ C   0  ] [ λ ] = [ d ],
where the matrix of the system is a symmetric matrix. We should not be surprised to find
the system of Section 18, except for some renaming of the matrices and vectors involved.
As we know from Section 18.2, the function J has a minimum iff A is positive definite, so
in general, if A is only a symmetric matrix, the critical points of the Lagrangian do not
correspond to extrema of J.
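As a sketch (the data A, b, C, d below are invented for the illustration), the symmetric system above can be assembled and solved directly with NumPy:

```python
import numpy as np

# Invented data: J(v) = (1/2) v^T A v - v^T b subject to Cv = d.
A = np.array([[2.0, 0.0],
              [0.0, 2.0]])      # symmetric (here positive definite)
b = np.array([0.0, 0.0])
C = np.array([[1.0, 1.0]])      # one affine constraint: v1 + v2 = 1
d = np.array([1.0])

n, m = A.shape[0], C.shape[0]
K = np.block([[A, C.T],
              [C, np.zeros((m, m))]])   # the symmetric system matrix
rhs = np.concatenate([b, d])
sol = np.linalg.solve(K, rhs)
v, lam = sol[:n], sol[n:]
print(v, lam)  # v ≈ [0.5 0.5], lam ≈ [-1.]
```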
We now investigate conditions for the existence of extrema involving the second derivative
of J.
29.2 Using Second Derivatives to Find Extrema
For the sake of brevity, we consider only the case of local minima; analogous results are
obtained for local maxima (replace J by −J, since maxu J(u) = − minu −J(u)). We begin
with a necessary condition for an unconstrained local minimum.
Proposition 29.4. Let E be a normed vector space and let J : Ω → R be a function, with Ω
some open subset of E. If the function J is differentiable in Ω, if J has a second derivative
D²J(u) at some point u ∈ Ω, and if J has a local minimum at u, then
D²J(u)(w, w) ≥ 0 for all w ∈ E.
Proof. Pick any nonzero vector w ∈ E. Since Ω is open, for t small enough, u + tw ∈ Ω and
J(u + tw) ≥ J(u), so there is some open interval I ⊆ R such that
u + tw ∈ Ω and J(u + tw) ≥ J(u)
for all t ∈ I. Using the Taylor–Young formula and the fact that we must have dJ(u) = 0
since J has a local minimum at u, we get
0 ≤ J(u + tw) − J(u) = (t²/2) D²J(u)(w, w) + t²‖w‖² ε(tw),
with lim_{t→0} ε(tw) = 0, which implies that
D²J(u)(w, w) ≥ 0.
Since the argument holds for all w ∈ E (trivially if w = 0), the proposition is proved.
One should be cautioned that there is no converse to the previous proposition. For example, the function f : x → x³ has no local minimum at 0, yet df(0) = 0 and D²f(0)(u, v) = 0.
Similarly, the reader should check that the function f : R² → R given by
f(x, y) = x² − 3y³
has no local minimum at (0, 0); yet df(0, 0) = 0 and D²f(0, 0)(u, v) = 2u² ≥ 0.
When E = Rⁿ, Proposition 29.4 says that a necessary condition for having a local
minimum is that the Hessian ∇²J(u) be positive semidefinite (it is always symmetric).
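Concretely (a sketch reusing the counterexample above), one can confirm that the Hessian of f(x, y) = x² − 3y³ at the origin is positive semidefinite even though the origin is not a local minimum, so semidefiniteness is necessary but not sufficient:

```python
import numpy as np

# Hessian of f(x, y) = x**2 - 3*y**3 at the origin:
# d2f/dx2 = 2, d2f/dy2 = -18*y = 0 at y = 0, mixed partials = 0.
H = np.array([[2.0, 0.0],
              [0.0, 0.0]])
eigs = np.linalg.eigvalsh(H)
print(eigs)  # [0. 2.]: all eigenvalues nonnegative, so H is PSD

# Yet f(0, t) = -3*t**3 < 0 = f(0, 0) for every small t > 0,
# so (0, 0) is not a local minimum.
f = lambda x, y: x ** 2 - 3 * y ** 3
print(f(0.0, 0.1))  # strictly below f(0, 0) = 0
```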
We now give sufficient conditions for the existence of a local minimum.
Theorem 29.5. Let E be a normed vector space, let J : Ω → R be a function with Ω some
open subset of E, and assume that J is differentiable in Ω and that dJ(u) = 0 at some point
u ∈ Ω. The following properties hold:
(1) If D²J(u) exists and if