Chapter 30

Newton’s Method and Its Generalizations

30.1 Newton’s Method for Real Functions of a Real Argument
In Chapter 29 we investigated the problem of determining when a function J : Ω → R defined on some open subset Ω of a normed vector space E has a local extremum. Proposition 29.1 gives a necessary condition when J is differentiable: if J has a local extremum at u ∈ Ω, then we must have
$$J'(u) = 0.$$
Thus, we are led to the problem of finding the zeros of the derivative
$$J' \colon \Omega \to E',$$
where E′ = L(E; R) is the set of continuous linear functions from E to R; that is, the dual of E, as defined in the remark after Proposition 29.7.
This leads us to consider the problem in a more general form, namely: Given a function
f : Ω → Y from an open subset Ω of a normed vector space X to a normed vector space Y ,
find
(i) Sufficient conditions which guarantee the existence of a zero of the function f ; that is,
an element a ∈ Ω such that f(a) = 0.
(ii) An algorithm for approximating such an a, that is, a sequence (xk) of points of Ω whose
limit is a.
When X = Y = R, we can use Newton’s method. We pick some initial element x0 ∈ R “close enough” to a zero a of f, and we define the sequence (xk) by
$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)},$$
for all k ≥ 0, provided that f′(xk) ≠ 0. The idea is to define xk+1 as the intersection of the x-axis with the tangent line to the graph of the function x → f(x) at the point (xk, f(xk)). Indeed, the equation of this tangent line is
$$y - f(x_k) = f'(x_k)(x - x_k),$$
and its intersection with the x-axis is obtained for y = 0, which yields
$$x = x_k - \frac{f(x_k)}{f'(x_k)},$$
as claimed.
For example, if α > 0 and f(x) = x² − α, Newton’s method yields the sequence
$$x_{k+1} = \frac{1}{2}\left(x_k + \frac{\alpha}{x_k}\right)$$
to compute the square root √α of α. It can be shown that the method converges to √α for any x0 > 0. Actually, the method also converges when x0 < 0! Find out what the limit is.
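The scalar iteration above can be sketched in a few lines of code; the function name and tolerances below are our own choices, not part of the text.

```python
# Minimal sketch of the scalar Newton iteration x_{k+1} = x_k - f(x_k)/f'(x_k),
# applied to f(x) = x^2 - alpha, whose positive zero is sqrt(alpha).

def newton_sqrt(alpha, x0, tol=1e-12, max_iter=100):
    """Approximate a square root of alpha by Newton's method starting from x0."""
    x = x0
    for _ in range(max_iter):
        fx = x * x - alpha          # f(x) = x^2 - alpha
        dfx = 2.0 * x               # f'(x) = 2x
        if dfx == 0.0:              # the method breaks down when f'(x_k) = 0
            raise ZeroDivisionError("f'(x_k) = 0")
        x_next = x - fx / dfx       # equivalently (x_k + alpha/x_k) / 2
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

print(newton_sqrt(2.0, 1.0))    # converges to sqrt(2)
print(newton_sqrt(2.0, -1.0))   # with x0 < 0 the iterates stay negative
```

Running the second call shows the answer to the exercise posed above: starting from a negative x0, the iterates remain negative.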
The case of a real function suggests the following method for finding the zeros of a function f : Ω → Y , with Ω ⊆ X: given a starting point x0 ∈ Ω, the sequence (xk) is defined by
$$x_{k+1} = x_k - (f'(x_k))^{-1}(f(x_k))$$
for all k ≥ 0.
For the above to make sense, it must be ensured that
(1) All the points xk remain within Ω.
(2) The function f is differentiable within Ω.
(3) The derivative f′(x) is a bijection from X to Y for all x ∈ Ω.
These are rather demanding conditions, but there are sufficient conditions that guarantee that they are met. Another practical issue is that it may be very costly to compute (f′(xk))⁻¹ at every iteration step. In the next section, we investigate generalizations of Newton’s method which address the issues that we just discussed.
30.2 Generalizations of Newton’s Method
Suppose that f : Ω → Rⁿ is given by n functions fi : Ω → R, where Ω ⊆ Rⁿ. In this case, finding a zero a of f is equivalent to solving the system
$$\begin{aligned}
f_1(a_1, \ldots, a_n) &= 0 \\
f_2(a_1, \ldots, a_n) &= 0 \\
&\;\;\vdots \\
f_n(a_1, \ldots, a_n) &= 0.
\end{aligned}$$
A single iteration of Newton’s method consists in solving the linear system
$$(J(f)(x_k))\,\delta_k = -f(x_k),$$
and then setting
$$x_{k+1} = x_k + \delta_k,$$
where $J(f)(x_k) = \left(\frac{\partial f_i}{\partial x_j}(x_k)\right)$ is the Jacobian matrix of f at xk.
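One Newton step in Rⁿ thus amounts to one Jacobian evaluation and one linear solve. A minimal sketch for a hypothetical 2 × 2 system (the system and the function names are our own illustration, not from the text):

```python
import numpy as np

# Newton's method for f : R^2 -> R^2: solve J(f)(x_k) d_k = -f(x_k),
# then set x_{k+1} = x_k + d_k, at every iteration.

def f(x):
    # Example system: x0^2 + x1^2 - 4 = 0 and x0 - x1 = 0; one zero is (sqrt(2), sqrt(2)).
    return np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])

def jacobian(x):
    # J(f)(x) = (d f_i / d x_j)(x)
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [1.0, -1.0]])

def newton(x0, tol=1e-12, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = np.linalg.solve(jacobian(x), -f(x))  # one linear solve per iteration
        x = x + d
        if np.linalg.norm(d) < tol:
            break
    return x

print(newton([1.0, 2.0]))  # approaches (sqrt(2), sqrt(2))
```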
In general, it is very costly to compute J(f)(xk) at each iteration and then to solve the corresponding linear system. If the method converges, the consecutive vectors xk should differ only a little, and so should the corresponding matrices J(f)(xk). Thus, we are led to a variant of Newton’s method which consists in keeping the same matrix for p consecutive steps (where p is some fixed integer ≥ 2):
$$\begin{aligned}
x_{k+1} &= x_k - (f'(x_0))^{-1}(f(x_k)), & 0 &\le k \le p - 1 \\
x_{k+1} &= x_k - (f'(x_p))^{-1}(f(x_k)), & p &\le k \le 2p - 1 \\
&\;\;\vdots \\
x_{k+1} &= x_k - (f'(x_{rp}))^{-1}(f(x_k)), & rp &\le k \le (r + 1)p - 1 \\
&\;\;\vdots
\end{aligned}$$
It is also possible to set p = ∞, that is, to use the same matrix f′(x0) for all iterations, which leads to iterations of the form
$$x_{k+1} = x_k - (f'(x_0))^{-1}(f(x_k)), \quad k \ge 0,$$
or even to replace f′(x0) by a particular matrix A0 which is easy to invert:
$$x_{k+1} = x_k - A_0^{-1} f(x_k), \quad k \ge 0.$$
In the last two cases, if possible, we use an LU factorization of f′(x0) or A0 to speed up the method. In some cases, it may even be possible to set A0 = I.
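The p = ∞ variant (often called the chord method) can be sketched as follows; the example system and function names are assumptions of ours. In practice one would LU-factor f′(x0) once (e.g. with `scipy.linalg.lu_factor`) and reuse the factors, but reusing the frozen matrix already conveys the idea:

```python
import numpy as np

# "Frozen Jacobian" variant: x_{k+1} = x_k - (f'(x_0))^{-1} f(x_k),
# reusing the Jacobian at x_0 for every iteration.

def f(x):
    # Example system (assumed): x0^2 + x1^2 - 4 = 0, x0 - x1 = 0.
    return np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])

def jacobian(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [1.0, -1.0]])

def frozen_newton(x0, tol=1e-10, max_iter=200):
    x = np.asarray(x0, dtype=float)
    A0 = jacobian(x)                      # computed once, kept for all k
    for _ in range(max_iter):
        d = np.linalg.solve(A0, -f(x))    # same matrix at every iteration
        x = x + d
        if np.linalg.norm(d) < tol:
            break
    return x

print(frozen_newton([1.5, 1.5]))  # converges (only linearly) to (sqrt(2), sqrt(2))
```

The trade-off is the expected one: each step is cheaper, but the quadratic convergence of the full Newton step is lost.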
The above considerations lead us to the definition of a generalized Newton method, as in Ciarlet [22] (Chapter 7). Recall that a linear map f ∈ L(E; F) is called an isomorphism iff f is continuous, bijective, and f⁻¹ is also continuous.
Definition 30.1. If X and Y are two normed vector spaces and if f : Ω → Y is a function from some open subset Ω of X, a generalized Newton method for finding zeros of f consists of
(1) A sequence of families (Ak(x)) of linear isomorphisms from X to Y, for all x ∈ Ω and all integers k ≥ 0;
(2) Some starting point x0 ∈ Ω;
(3) A sequence (xk) of points of Ω defined by
$$x_{k+1} = x_k - (A_k(x_\ell))^{-1}(f(x_k)), \quad k \ge 0,$$
where for every integer k ≥ 0, the integer ℓ satisfies the condition
$$0 \le \ell \le k.$$
The function Ak(x) usually depends on f.
Definition 30.1 gives us enough flexibility to capture all the situations that we have previously discussed:
$$\begin{aligned}
A_k(x) &= f'(x), & \ell &= k \\
A_k(x) &= f'(x), & \ell &= \min\{rp, k\}, \text{ if } rp \le k \le (r + 1)p - 1,\ r \ge 0 \\
A_k(x) &= f'(x), & \ell &= 0 \\
A_k(x) &= A_0,
\end{aligned}$$
where A0 is a linear isomorphism from X to Y. The first case corresponds to Newton’s original method and the others to the variants that we just discussed. We could also have Ak(x) = Ak, a fixed linear isomorphism independent of x ∈ Ω.
The following theorem, inspired by the Newton–Kantorovich theorem, gives sufficient conditions that guarantee that the sequence (xk) constructed by a generalized Newton method converges to a zero of f close to x0. Although quite technical, these conditions are not very surprising.
Theorem 30.1. Let X be a Banach space, let f : Ω → Y be differentiable on the open subset Ω ⊆ X, and assume that there are constants r, M, β > 0 such that if we let
$$B = \{x \in X \mid \|x - x_0\| \le r\} \subseteq \Omega,$$
then
(1) $$\sup_{k \ge 0}\, \sup_{x \in B} \|A_k^{-1}(x)\|_{L(Y;X)} \le M,$$
(2) β < 1 and
$$\sup_{k \ge 0}\, \sup_{x, x' \in B} \|f'(x) - A_k(x')\|_{L(X;Y)} \le \frac{\beta}{M},$$
(3) $$\|f(x_0)\| \le \frac{r}{M}(1 - \beta).$$
Then, the sequence (xk) defined by
$$x_{k+1} = x_k - A_k^{-1}(x_\ell)(f(x_k)), \quad 0 \le \ell \le k,$$
is entirely contained within B and converges to a zero a of f, which is the only zero of f in B. Furthermore, the convergence is geometric, which means that
$$\|x_k - a\| \le \frac{\|x_1 - x_0\|}{1 - \beta}\, \beta^k.$$
A proof of Theorem 30.1 can be found in Ciarlet [22] (Section 7.5). It is not really difficult
but quite technical.
If we assume that we already know that some element a ∈ Ω is a zero of f, the next
theorem gives sufficient conditions for a special version of a generalized Newton method to
converge. For this special method, the linear isomorphisms Ak(x) are independent of x ∈ Ω.
Theorem 30.2. Let X be a Banach space, and let f : Ω → Y be differentiable on the open subset Ω ⊆ X. If a ∈ Ω is a point such that f(a) = 0, if f′(a) is a linear isomorphism, and if there is some λ with 0 < λ < 1/2 such that
$$\sup_{k \ge 0} \|A_k - f'(a)\|_{L(X;Y)} \le \frac{\lambda}{\|(f'(a))^{-1}\|_{L(Y;X)}},$$
then there is a closed ball B of center a such that for every x0 ∈ B, the sequence (xk) defined by
$$x_{k+1} = x_k - A_k^{-1}(f(x_k)), \quad k \ge 0,$$
is entirely contained within B and converges to a, which is the only zero of f in B. Furthermore, the convergence is geometric, which means that
$$\|x_k - a\| \le \beta^k \|x_0 - a\|,$$
for some β < 1.
A proof of Theorem 30.2 can also be found in Ciarlet [22] (Section 7.5).
For the sake of completeness, we state a version of the Newton–Kantorovich theorem, which corresponds to the case where Ak(x) = f′(x). In this instance, a stronger result can be obtained, especially regarding upper bounds, and we state a version due to Gragg and Tapia which appears in Problem 7.5-4 of Ciarlet [22].
Theorem 30.3. (Newton–Kantorovich) Let X be a Banach space, and let f : Ω → Y be differentiable on the open subset Ω ⊆ X. Assume that there exist three positive constants λ, µ, ν and a point x0 ∈ Ω such that
$$0 < \lambda\mu\nu \le \frac{1}{2},$$
and if we let
$$\begin{aligned}
\rho_- &= \frac{1 - \sqrt{1 - 2\lambda\mu\nu}}{\mu\nu} \\
\rho_+ &= \frac{1 + \sqrt{1 - 2\lambda\mu\nu}}{\mu\nu} \\
B &= \{x \in X \mid \|x - x_0\| < \rho_-\} \\
\Omega_+ &= \{x \in \Omega \mid \|x - x_0\| < \rho_+\},
\end{aligned}$$
then B ⊆ Ω, f′(x0) is an isomorphism of L(X; Y), and
$$\begin{aligned}
\|(f'(x_0))^{-1}\| &\le \mu, \\
\|(f'(x_0))^{-1} f(x_0)\| &\le \lambda, \\
\sup_{x, y \in \Omega_+} \|f'(x) - f'(y)\| &\le \nu\, \|x - y\|.
\end{aligned}$$
Then, f′(x) is an isomorphism of L(X; Y) for all x ∈ B, and the sequence defined by
$$x_{k+1} = x_k - (f'(x_k))^{-1}(f(x_k)), \quad k \ge 0,$$
is entirely contained within the ball B and converges to a zero a of f which is the only zero of f in Ω+. Finally, if we write θ = ρ−/ρ+, then we have the following bounds:
$$\|x_k - a\| \le \frac{2\sqrt{1 - 2\lambda\mu\nu}}{\lambda\mu\nu}\, \frac{\theta^{2^k}}{1 - \theta^{2^k}}\, \|x_1 - x_0\| \quad \text{if } \lambda\mu\nu < \frac{1}{2},$$
$$\|x_k - a\| \le \frac{\|x_1 - x_0\|}{2^{k-1}} \quad \text{if } \lambda\mu\nu = \frac{1}{2},$$
and
$$\frac{2\, \|x_{k+1} - x_k\|}{1 + \sqrt{1 + 4\theta^{2^k}(1 + \theta^{2^k})^{-2}}} \le \|x_k - a\| \le \theta^{2^{k-1}} \|x_k - x_{k-1}\|.$$
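The hypotheses of the Newton–Kantorovich theorem can be checked concretely. The following sketch (a worked example of our own choosing, not from the text) verifies them for f(x) = x² − 2 with starting point x0 = 1.5:

```python
import math

# Numeric check of the Newton-Kantorovich hypotheses for f(x) = x^2 - 2, x0 = 1.5.
x0 = 1.5
f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x        # f'(x) = 2x

mu = 1.0 / abs(df(x0))        # ||(f'(x0))^{-1}|| <= mu, here mu = 1/3
lam = abs(f(x0) / df(x0))     # ||(f'(x0))^{-1} f(x0)|| <= lambda, here 1/12
nu = 2.0                      # |f'(x) - f'(y)| = 2|x - y|, so nu = 2

assert 0 < lam * mu * nu <= 0.5             # lambda*mu*nu = 1/18 <= 1/2
disc = math.sqrt(1.0 - 2.0 * lam * mu * nu)
rho_minus = (1.0 - disc) / (mu * nu)
rho_plus = (1.0 + disc) / (mu * nu)
print(lam * mu * nu, rho_minus, rho_plus)   # the zero sqrt(2) lies in Omega_+
```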
We can now specialize Theorems 30.1 and 30.2 to the search of zeros of the derivative J′ : Ω → E′ of a function J : Ω → R, with Ω ⊆ E. The second derivative J′′ of J is a continuous bilinear form J′′ : E × E → R, but it is convenient to view it as a linear map in L(E, E′); the continuous linear form J′′(u) is given by J′′(u)(v) = J′′(u, v). In our next theorem, we assume that the Ak(x) are isomorphisms in L(E, E′).
Theorem 30.4. Let E be a Banach space, let J : Ω → R be twice differentiable on the open subset Ω ⊆ E, and assume that there are constants r, M, β > 0 such that if we let
$$B = \{x \in E \mid \|x - x_0\| \le r\} \subseteq \Omega,$$
then
(1) $$\sup_{k \ge 0}\, \sup_{x \in B} \|A_k^{-1}(x)\|_{L(E';E)} \le M,$$
(2) β < 1 and
$$\sup_{k \ge 0}\, \sup_{x, x' \in B} \|J''(x) - A_k(x')\|_{L(E;E')} \le \frac{\beta}{M},$$
(3) $$\|J'(x_0)\| \le \frac{r}{M}(1 - \beta).$$
Then, the sequence (xk) defined by
$$x_{k+1} = x_k - A_k^{-1}(x_\ell)(J'(x_k)), \quad 0 \le \ell \le k,$$
is entirely contained within B and converges to a zero a of J′, which is the only zero of J′ in B. Furthermore, the convergence is geometric, which means that
$$\|x_k - a\| \le \frac{\|x_1 - x_0\|}{1 - \beta}\, \beta^k.$$
In the next theorem, we assume that the Ak(x) are isomorphisms in L(E, E′) that are independent of x ∈ Ω.
Theorem 30.5. Let E be a Banach space, and let J : Ω → R be twice differentiable on the open subset Ω ⊆ E. If a ∈ Ω is a point such that J′(a) = 0, if J′′(a) is a linear isomorphism, and if there is some λ with 0 < λ < 1/2 such that
$$\sup_{k \ge 0} \|A_k - J''(a)\|_{L(E;E')} \le \frac{\lambda}{\|(J''(a))^{-1}\|_{L(E';E)}},$$
then there is a closed ball B of center a such that for every x0 ∈ B, the sequence (xk) defined by
$$x_{k+1} = x_k - A_k^{-1}(J'(x_k)), \quad k \ge 0,$$
is entirely contained within B and converges to a, which is the only zero of J′ in B. Furthermore, the convergence is geometric, which means that
$$\|x_k - a\| \le \beta^k \|x_0 - a\|,$$
for some β < 1.
When E = Rⁿ, the Newton method given by Theorem 30.4 yields an iteration step of the form
$$x_{k+1} = x_k - A_k^{-1}(x_\ell)\, \nabla J(x_k), \quad 0 \le \ell \le k,$$
where ∇J(xk) is the gradient of J at xk (here, we identify E′ with Rⁿ). In particular, Newton’s original method picks Ak = J′′, and the iteration step is of the form
$$x_{k+1} = x_k - (\nabla^2 J(x_k))^{-1} \nabla J(x_k), \quad k \ge 0,$$
where ∇²J(xk) is the Hessian of J at xk.
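A minimal sketch of this gradient/Hessian form of Newton’s method, for an example function J of our own choosing (the names `grad_J`, `hess_J`, and `newton_minimize` are assumptions, not from the text):

```python
import numpy as np

# Newton's method for minimization: x_{k+1} = x_k - (hess J(x_k))^{-1} grad J(x_k),
# applied to J(x) = x1^4 + x1^2 + x2^2, whose unique minimizer (the zero of grad J)
# is the origin.

def grad_J(x):
    return np.array([4.0 * x[0]**3 + 2.0 * x[0], 2.0 * x[1]])

def hess_J(x):
    return np.array([[12.0 * x[0]**2 + 2.0, 0.0],
                     [0.0, 2.0]])

def newton_minimize(x0, tol=1e-12, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess_J(x), grad_J(x))  # Newton step on grad J
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

print(newton_minimize([2.0, -3.0]))  # converges to the minimizer (0, 0)
```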
As remarked in [22] (Section 7.5), generalized Newton methods have a very wide range
of applicability. For example, various versions of gradient descent methods can be viewed as
instances of Newton methods.
Newton’s method also plays an important role in convex optimization, in particular,
interior-point methods. A variant of Newton’s method dealing with equality constraints has
been developed. We refer the reader to Boyd and Vandenberghe [15], Chapters 10 and 11,
for a comprehensive exposition of these topics.
30.3 Summary
The main concepts and results of this chapter are listed below:
• Newton’s method for functions f : R → R.
• Generalized Newton methods.
• The Newton–Kantorovich theorem.