Basics of Algebra, Topology, and Differential Calculus, by Jean Gallier

Chapter 30

Newton’s Method and its Generalizations

30.1 Newton’s Method for Real Functions of a Real Argument

In Chapter 29 we investigated the problem of determining when a function J : Ω → R defined on some open subset Ω of a normed vector space E has a local extremum. Proposition 29.1 gives a necessary condition when J is differentiable: if J has a local extremum at u ∈ Ω, then we must have

J′(u) = 0.

Thus, we are led to the problem of finding the zeros of the derivative

J′ : Ω → E′,

where E′ = L(E; R) is the set of continuous linear functions from E to R; that is, the dual of E, as defined in the remark after Proposition 29.7.

This leads us to consider the problem in a more general form, namely: Given a function f : Ω → Y from an open subset Ω of a normed vector space X to a normed vector space Y, find

(i) Sufficient conditions which guarantee the existence of a zero of the function f; that is, an element a ∈ Ω such that f(a) = 0.

(ii) An algorithm for approximating such an a, that is, a sequence (x_k) of points of Ω whose limit is a.

When X = Y = R, we can use Newton’s method. We pick some initial element x_0 ∈ R “close enough” to a zero a of f, and we define the sequence (x_k) by

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)},$$


for all k ≥ 0, provided that f′(x_k) ≠ 0. The idea is to define x_{k+1} as the intersection of the x-axis with the tangent line to the graph of the function x ↦ f(x) at the point (x_k, f(x_k)). Indeed, the equation of this tangent line is

$$y - f(x_k) = f'(x_k)(x - x_k),$$

and its intersection with the x-axis is obtained for y = 0, which yields

$$x = x_k - \frac{f(x_k)}{f'(x_k)},$$

as claimed.

For example, if α > 0 and f(x) = x² − α, Newton’s method yields the sequence

$$x_{k+1} = \frac{1}{2}\left(x_k + \frac{\alpha}{x_k}\right)$$

to compute the square root √α of α. It can be shown that the method converges to √α for any x_0 > 0. Actually, the method also converges when x_0 < 0! Find out what the limit is.
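As a quick illustration (not part of the text), the recurrence above can be coded in a few lines of Python; the function name `newton_sqrt` and its tolerance parameters are our own choices:

```python
def newton_sqrt(alpha, x0=1.0, tol=1e-12, max_iter=100):
    """Approximate a square root of alpha > 0 by Newton's method on f(x) = x**2 - alpha."""
    x = x0
    for _ in range(max_iter):
        x_next = 0.5 * (x + alpha / x)   # x_{k+1} = (x_k + alpha/x_k) / 2
        if abs(x_next - x) <= tol * abs(x_next):
            return x_next
        x = x_next
    return x
```

Running it with a negative x_0 answers the question posed above.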

The case of a real function suggests the following method for finding the zeros of a function f : Ω → Y, with Ω ⊆ X: given a starting point x_0 ∈ Ω, the sequence (x_k) is defined by

$$x_{k+1} = x_k - (f'(x_k))^{-1}(f(x_k))$$

for all k ≥ 0.

For the above to make sense, it must be ensured that

(1) All the points x_k remain within Ω.

(2) The function f is differentiable within Ω.

(3) The derivative f′(x) is a bijection from X to Y for all x ∈ Ω.

These are rather demanding conditions, but there are sufficient conditions that guarantee that they are met. Another practical issue is that it may be very costly to compute (f′(x_k))⁻¹ at every iteration step. In the next section, we investigate generalizations of Newton’s method which address the issues that we just discussed.

30.2 Generalizations of Newton’s Method

Suppose that f : Ω → Rⁿ is given by n functions f_i : Ω → R, where Ω ⊆ Rⁿ. In this case, finding a zero a of f is equivalent to solving the system

$$\begin{aligned}
f_1(a_1, \ldots, a_n) &= 0 \\
f_2(a_1, \ldots, a_n) &= 0 \\
&\ \vdots \\
f_n(a_1, \ldots, a_n) &= 0.
\end{aligned}$$


A single iteration of Newton’s method consists in solving the linear system

$$J(f)(x_k)\,\delta_k = -f(x_k),$$

and then setting

$$x_{k+1} = x_k + \delta_k,$$

where J(f)(x_k) = (∂f_i/∂x_j (x_k)) is the Jacobian matrix of f at x_k.
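To make the iteration concrete, here is a minimal sketch in Python using NumPy; the names `newton_system`, `f`, and `jac` are our own, and the 2 × 2 test system is purely illustrative:

```python
import numpy as np

def newton_system(f, jac, x0, tol=1e-10, max_iter=50):
    """Newton's method for f : R^n -> R^n.
    Each step solves J(f)(x_k) d_k = -f(x_k) and sets x_{k+1} = x_k + d_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = np.linalg.solve(jac(x), -f(x))
        x = x + d
        if np.linalg.norm(d) < tol:
            break
    return x

# Illustrative system: x^2 + y^2 = 4 and x*y = 1.
f = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0] * v[1] - 1.0])
jac = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]],
                          [v[1], v[0]]])
root = newton_system(f, jac, [2.0, 0.5])
```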

In general, it is very costly to compute J(f)(x_k) at each iteration and then to solve the corresponding linear system. If the method converges, the consecutive vectors x_k should differ only a little, and so should the corresponding matrices J(f)(x_k). Thus, we are led to a variant of Newton’s method which consists in keeping the same matrix for p consecutive steps (where p is some fixed integer ≥ 2):

$$\begin{aligned}
x_{k+1} &= x_k - (f'(x_0))^{-1}(f(x_k)), & 0 &\leq k \leq p - 1 \\
x_{k+1} &= x_k - (f'(x_p))^{-1}(f(x_k)), & p &\leq k \leq 2p - 1 \\
&\ \vdots \\
x_{k+1} &= x_k - (f'(x_{rp}))^{-1}(f(x_k)), & rp &\leq k \leq (r+1)p - 1 \\
&\ \vdots
\end{aligned}$$
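A sketch of this p-step variant in Python (our own naming; NumPy’s dense solver stands in for the linear-system solve):

```python
import numpy as np

def newton_p_step(f, jac, x0, p=3, tol=1e-10, max_iter=200):
    """Newton variant that recomputes the Jacobian only at steps k = 0, p, 2p, ...
    For rp <= k <= (r+1)p - 1 the matrix f'(x_{rp}) is reused."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        if k % p == 0:
            J = jac(x)               # refresh f'(x_{rp}) at the start of each block
        d = np.linalg.solve(J, -f(x))
        x = x + d
        if np.linalg.norm(d) < tol:
            break
    return x

# Illustrative system: x^2 + y^2 = 4 and x*y = 1.
f = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0] * v[1] - 1.0])
jac = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]], [v[1], v[0]]])
root = newton_p_step(f, jac, [2.0, 0.5], p=3)
```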

It is also possible to set p = ∞, that is, to use the same matrix f′(x_0) for all iterations, which leads to iterations of the form

$$x_{k+1} = x_k - (f'(x_0))^{-1}(f(x_k)), \quad k \geq 0,$$

or even to replace f′(x_0) by a particular matrix A_0 which is easy to invert:

$$x_{k+1} = x_k - A_0^{-1} f(x_k), \quad k \geq 0.$$

In the last two cases, if possible, we use an LU factorization of f′(x_0) or A_0 to speed up the method. In some cases, it may even be possible to set A_0 = I.
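The "same matrix at every step" case can be sketched as follows (our own names; for simplicity we form A_0⁻¹ once with NumPy, where production code would instead reuse an LU factorization of A_0):

```python
import numpy as np

def newton_fixed_matrix(f, A0, x0, tol=1e-10, max_iter=500):
    """'p = infinity' variant: the same matrix A0 is used at every step,
    x_{k+1} = x_k - A0^{-1} f(x_k)."""
    A0_inv = np.linalg.inv(np.asarray(A0, dtype=float))  # computed once, reused below
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = A0_inv @ f(x)
        x = x - d
        if np.linalg.norm(d) < tol:
            break
    return x

# Illustrative system x^2 + y^2 = 4, x*y = 1, with A0 = f'(x0).
f = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0] * v[1] - 1.0])
jac = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]], [v[1], v[0]]])
x0 = [2.0, 0.5]
root = newton_fixed_matrix(f, jac(np.asarray(x0)), x0)
```

Convergence is only linear here, but each step costs a single matrix-vector product.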

The above considerations lead us to the definition of a generalized Newton method, as in Ciarlet [22] (Chapter 7). Recall that a linear map f ∈ L(E; F) is called an isomorphism iff f is continuous, bijective, and f⁻¹ is also continuous.

Definition 30.1. If X and Y are two normed vector spaces and if f : Ω → Y is a function from some open subset Ω of X, a generalized Newton method for finding zeros of f consists of

(1) A sequence of families (A_k(x)) of linear isomorphisms from X to Y, for all x ∈ Ω and all integers k ≥ 0;

(2) Some starting point x_0 ∈ Ω;

(3) A sequence (x_k) of points of Ω defined by

$$x_{k+1} = x_k - (A_k(x_\ell))^{-1}(f(x_k)), \quad k \geq 0,$$

where for every integer k ≥ 0, the integer ℓ satisfies the condition

$$0 \leq \ell \leq k.$$

The function A_k(x) usually depends on f.

Definition 30.1 gives us enough flexibility to capture all the situations that we have previously discussed:

$$\begin{aligned}
A_k(x) &= f'(x), & \ell &= k \\
A_k(x) &= f'(x), & \ell &= \min\{rp, k\}, \ \text{if } rp \leq k \leq (r+1)p - 1,\ r \geq 0 \\
A_k(x) &= f'(x), & \ell &= 0 \\
A_k(x) &= A_0,
\end{aligned}$$

where A_0 is a linear isomorphism from X to Y. The first case corresponds to Newton’s original method and the others to the variants that we just discussed. We could also have A_k(x) = A_k, a fixed linear isomorphism independent of x ∈ Ω.

The following theorem, inspired by the Newton–Kantorovich theorem, gives sufficient conditions that guarantee that the sequence (x_k) constructed by a generalized Newton method converges to a zero of f close to x_0. Although quite technical, these conditions are not very surprising.

Theorem 30.1. Let X be a Banach space, let f : Ω → Y be differentiable on the open subset Ω ⊆ X, and assume that there are constants r, M, β > 0 such that if we let

$$B = \{x \in X \mid \|x - x_0\| \leq r\} \subseteq \Omega,$$

then

(1) $$\sup_{k \geq 0}\, \sup_{x \in B}\, \left\| A_k^{-1}(x) \right\|_{L(Y;X)} \leq M,$$

(2) β < 1 and

$$\sup_{k \geq 0}\, \sup_{x, x' \in B}\, \left\| f'(x) - A_k(x') \right\|_{L(X;Y)} \leq \frac{\beta}{M},$$

(3) $$\|f(x_0)\| \leq \frac{r}{M}(1 - \beta).$$


Then, the sequence (x_k) defined by

$$x_{k+1} = x_k - A_k^{-1}(x_\ell)(f(x_k)), \quad 0 \leq \ell \leq k,$$

is entirely contained within B and converges to a zero a of f, which is the only zero of f in B. Furthermore, the convergence is geometric, which means that

$$\|x_k - a\| \leq \frac{\|x_1 - x_0\|}{1 - \beta}\, \beta^k.$$

A proof of Theorem 30.1 can be found in Ciarlet [22] (Section 7.5). It is not really difficult

but quite technical.

If we assume that we already know that some element a ∈ Ω is a zero of f, the next

theorem gives sufficient conditions for a special version of a generalized Newton method to

converge. For this special method, the linear isomorphisms Ak(x) are independent of x ∈ Ω.

Theorem 30.2. Let X be a Banach space, and let f : Ω → Y be differentiable on the open subset Ω ⊆ X. If a ∈ Ω is a point such that f(a) = 0, if f′(a) is a linear isomorphism, and if there is some λ with 0 < λ < 1/2 such that

$$\sup_{k \geq 0}\, \left\| A_k - f'(a) \right\|_{L(X;Y)} \leq \frac{\lambda}{\left\| (f'(a))^{-1} \right\|_{L(Y;X)}},$$

then there is a closed ball B of center a such that for every x_0 ∈ B, the sequence (x_k) defined by

$$x_{k+1} = x_k - A_k^{-1}(f(x_k)), \quad k \geq 0,$$

is entirely contained within B and converges to a, which is the only zero of f in B. Furthermore, the convergence is geometric, which means that

$$\|x_k - a\| \leq \beta^k \|x_0 - a\|,$$

for some β < 1.

A proof of Theorem 30.2 can also be found in Ciarlet [22] (Section 7.5).

For the sake of completeness, we state a version of the Newton–Kantorovich theorem, which corresponds to the case where A_k(x) = f′(x). In this instance, a stronger result can be obtained, especially regarding upper bounds, and we state a version due to Gragg and Tapia which appears in Problem 7.5-4 of Ciarlet [22].

Theorem 30.3. (Newton–Kantorovich) Let X be a Banach space, and let f : Ω → Y be differentiable on the open subset Ω ⊆ X. Assume that there exist three positive constants λ, µ, ν and a point x_0 ∈ Ω such that

$$0 < \lambda\mu\nu \leq \frac{1}{2},$$

and if we let

$$\begin{aligned}
\rho_- &= \frac{1 - \sqrt{1 - 2\lambda\mu\nu}}{\mu\nu} \\
\rho_+ &= \frac{1 + \sqrt{1 - 2\lambda\mu\nu}}{\mu\nu} \\
B &= \{x \in X \mid \|x - x_0\| < \rho_-\} \\
\Omega_+ &= \{x \in \Omega \mid \|x - x_0\| < \rho_+\},
\end{aligned}$$

then B ⊆ Ω, f′(x_0) is an isomorphism of L(X; Y), and

$$\begin{aligned}
\left\| (f'(x_0))^{-1} \right\| &\leq \mu, \\
\left\| (f'(x_0))^{-1} f(x_0) \right\| &\leq \lambda, \\
\sup_{x, y \in \Omega_+} \|f'(x) - f'(y)\| &\leq \nu \|x - y\|.
\end{aligned}$$

Then, f′(x) is an isomorphism of L(X; Y) for all x ∈ B, and the sequence defined by

$$x_{k+1} = x_k - (f'(x_k))^{-1}(f(x_k)), \quad k \geq 0,$$

is entirely contained within the ball B and converges to a zero a of f which is the only zero of f in Ω_+. Finally, if we write θ = ρ_−/ρ_+, then we have the following bounds:

$$\|x_k - a\| \leq \frac{2\sqrt{1 - 2\lambda\mu\nu}}{\lambda\mu\nu}\, \frac{\theta^{2^k}}{1 - \theta^{2^k}}\, \|x_1 - x_0\| \quad \text{if } \lambda\mu\nu < \frac{1}{2},$$

$$\|x_k - a\| \leq \frac{\|x_1 - x_0\|}{2^{k-1}} \quad \text{if } \lambda\mu\nu = \frac{1}{2},$$

and

$$\frac{2\,\|x_{k+1} - x_k\|}{1 + \sqrt{1 + 4\theta^{2^k}\left(1 + \theta^{2^k}\right)^{-2}}} \leq \|x_k - a\| \leq \theta^{2^{k-1}}\, \|x_k - x_{k-1}\|.$$

We can now specialize Theorems 30.1 and 30.2 to the search of zeros of the derivative J′ : Ω → E′ of a function J : Ω → R, with Ω ⊆ E. The second derivative J′′ of J is a continuous bilinear form J′′ : E × E → R, but it is convenient to view it as a linear map in L(E, E′); the continuous linear form J′′(u) is given by J′′(u)(v) = J′′(u, v). In our next theorem, we assume that the A_k(x) are isomorphisms in L(E, E′).

Theorem 30.4. Let E be a Banach space, let J : Ω → R be twice differentiable on the open subset Ω ⊆ E, and assume that there are constants r, M, β > 0 such that if we let

$$B = \{x \in E \mid \|x - x_0\| \leq r\} \subseteq \Omega,$$

then

(1) $$\sup_{k \geq 0}\, \sup_{x \in B}\, \left\| A_k^{-1}(x) \right\|_{L(E';E)} \leq M,$$

(2) β < 1 and

$$\sup_{k \geq 0}\, \sup_{x, x' \in B}\, \left\| J''(x) - A_k(x') \right\|_{L(E;E')} \leq \frac{\beta}{M},$$

(3) $$\|J'(x_0)\| \leq \frac{r}{M}(1 - \beta).$$

Then, the sequence (x_k) defined by

$$x_{k+1} = x_k - A_k^{-1}(x_\ell)(J'(x_k)), \quad 0 \leq \ell \leq k,$$

is entirely contained within B and converges to a zero a of J′, which is the only zero of J′ in B. Furthermore, the convergence is geometric, which means that

$$\|x_k - a\| \leq \frac{\|x_1 - x_0\|}{1 - \beta}\, \beta^k.$$

In the next theorem, we assume that the A_k(x) are isomorphisms in L(E, E′) that are independent of x ∈ Ω.

Theorem 30.5. Let E be a Banach space, and let J : Ω → R be twice differentiable on the open subset Ω ⊆ E. If a ∈ Ω is a point such that J′(a) = 0, if J′′(a) is a linear isomorphism, and if there is some λ with 0 < λ < 1/2 such that

$$\sup_{k \geq 0}\, \left\| A_k - J''(a) \right\|_{L(E;E')} \leq \frac{\lambda}{\left\| (J''(a))^{-1} \right\|_{L(E';E)}},$$

then there is a closed ball B of center a such that for every x_0 ∈ B, the sequence (x_k) defined by

$$x_{k+1} = x_k - A_k^{-1}(J'(x_k)), \quad k \geq 0,$$

is entirely contained within B and converges to a, which is the only zero of J′ in B. Furthermore, the convergence is geometric, which means that

$$\|x_k - a\| \leq \beta^k \|x_0 - a\|,$$

for some β < 1.

When E = Rⁿ, the Newton method given by Theorem 30.4 yields an iteration step of the form

$$x_{k+1} = x_k - A_k^{-1}(x_\ell)\,\nabla J(x_k), \quad 0 \leq \ell \leq k,$$

where ∇J(x_k) is the gradient of J at x_k (here, we identify E′ with Rⁿ). In particular, Newton’s original method picks A_k = J′′, and the iteration step is of the form

$$x_{k+1} = x_k - (\nabla^2 J(x_k))^{-1}\,\nabla J(x_k), \quad k \geq 0,$$

where ∇²J(x_k) is the Hessian of J at x_k.
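As a sketch of this Hessian-based iteration (the objective, its gradient, and its Hessian below are a hypothetical example of our own choosing):

```python
import numpy as np

def newton_minimize(grad_J, hess_J, x0, tol=1e-10, max_iter=50):
    """Newton's method for minimization:
    x_{k+1} = x_k - (Hessian of J at x_k)^{-1} (gradient of J at x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = np.linalg.solve(hess_J(x), grad_J(x))
        x = x - d
        if np.linalg.norm(d) < tol:
            break
    return x

# Illustrative quadratic J(x, y) = (x - 1)^2 + 10*(y + 2)^2,
# whose unique minimizer is (1, -2); on a quadratic, Newton lands on it in one step.
grad_J = lambda v: np.array([2.0 * (v[0] - 1.0), 20.0 * (v[1] + 2.0)])
hess_J = lambda v: np.array([[2.0, 0.0], [0.0, 20.0]])
x_min = newton_minimize(grad_J, hess_J, [0.0, 0.0])
```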

As remarked in [22] (Section 7.5), generalized Newton methods have a very wide range

of applicability. For example, various versions of gradient descent methods can be viewed as

instances of Newton methods.

Newton’s method also plays an important role in convex optimization, in particular,

interior-point methods. A variant of Newton’s method dealing with equality constraints has

been developed. We refer the reader to Boyd and Vandenberghe [15], Chapters 10 and 11,

for a comprehensive exposition of these topics.

30.3 Summary

The main concepts and results of this chapter are listed below:

• Newton’s method for functions f : R → R.

• Generalized Newton methods.

• The Newton–Kantorovich theorem.