steepest descent technique, which is based on an idea that we discussed in Section 2.4. Recall
that the negative gradient −∇ f gives the direction of the fastest rate of decrease of a function
f . The crux of the steepest descent idea, then, is that starting from some initial point, you
move a certain amount in the direction of −∇ f at that point. Wherever that takes you
becomes your new point, and you then just keep repeating that procedure until eventually
(hopefully) you reach the point where f has its smallest value. There is a “pure” steepest
descent method, and a multitude of variations on it that improve the rate of convergence,
ease of calculation, etc. In fact, Newton’s algorithm can be interpreted as a modified steepest
descent method. For more discussion of this, and of nonlinear programming in general, see
Exercises
C
1. Recall Example 2.21 from the previous section, where we showed that the point (2, 1) was
a global minimum for the function f(x, y) = (x − 2)⁴ + (x − 2y)². Notice that our computer
program can be modified fairly easily to use this function (just change the return values
in the fx, fy, fxx, fyy and fxy function definitions to use the appropriate partial derivative).
Either modify that program or write one of your own in a programming language of your
choice to show that Newton’s algorithm does lead to the point (2, 1). First use the initial
point (0, 3), then use the initial point (3, 2), and compare the results. Make sure that your
program attempts to do 100 iterations of the algorithm. Did anything strange happen
when your program ran? If so, how do you explain it? (Hint: Something strange should
happen.)
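One possible version of such a program is sketched below in Python (an illustrative sketch, not the book's program; the function names fx, fy, fxx, fyy and fxy follow the exercise's wording, and the partial derivatives are computed from f(x, y) = (x − 2)⁴ + (x − 2y)²):

```python
# Illustrative sketch of the program in Exercise 1 (not the book's code).
# Partial derivatives of f(x, y) = (x - 2)**4 + (x - 2*y)**2:

def fx(x, y):  return 4*(x - 2)**3 + 2*(x - 2*y)   # df/dx
def fy(x, y):  return -4*(x - 2*y)                 # df/dy
def fxx(x, y): return 12*(x - 2)**2 + 2            # d2f/dx2
def fyy(x, y): return 8.0                          # d2f/dy2
def fxy(x, y): return -4.0                         # d2f/dxdy

def newton(x, y, iterations=100):
    """Newton's algorithm for minimization: (x, y) <- (x, y) - H^(-1) grad f."""
    for _ in range(iterations):
        D = fxx(x, y)*fyy(x, y) - fxy(x, y)**2     # determinant of the Hessian
        if D == 0:                                 # guard against dividing by zero
            break
        x, y = (x - (fyy(x, y)*fx(x, y) - fxy(x, y)*fy(x, y))/D,
                y - (fxx(x, y)*fy(x, y) - fxy(x, y)*fx(x, y))/D)
    return x, y

print(newton(0, 3))   # should approach (2, 1)
print(newton(3, 2))   # should approach (2, 1)
```

Watching the quantity D over the 100 iterations is a good way to notice the behavior the hint alludes to.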
2. There is a version of Newton's algorithm for solving a system of two equations

f₁(x, y) = 0 and f₂(x, y) = 0 ,

where f₁(x, y) and f₂(x, y) are smooth real-valued functions:

Pick an initial point (x₀, y₀). For n = 0, 1, 2, 3, ..., define:

$$x_{n+1} = x_n - \frac{\begin{vmatrix} f_1(x_n, y_n) & \frac{\partial f_1}{\partial y}(x_n, y_n) \\ f_2(x_n, y_n) & \frac{\partial f_2}{\partial y}(x_n, y_n) \end{vmatrix}}{D(x_n, y_n)} \;, \qquad y_{n+1} = y_n + \frac{\begin{vmatrix} f_1(x_n, y_n) & \frac{\partial f_1}{\partial x}(x_n, y_n) \\ f_2(x_n, y_n) & \frac{\partial f_2}{\partial x}(x_n, y_n) \end{vmatrix}}{D(x_n, y_n)} \;,$$

where

$$D(x_n, y_n) = \frac{\partial f_1}{\partial x}(x_n, y_n)\,\frac{\partial f_2}{\partial y}(x_n, y_n) - \frac{\partial f_1}{\partial y}(x_n, y_n)\,\frac{\partial f_2}{\partial x}(x_n, y_n) \;.$$

Then the sequence of points (xₙ, yₙ), n = 1, 2, 3, ..., converges to a solution. Write a computer program that uses this algorithm to find approximate solutions to the system of equations

f₁(x, y) = sin(xy) − x − y = 0 and f₂(x, y) = e^(2x) − 2x + 3y = 0 .

Show that you get two different solutions when using (0, 0) and (1, 1) for the initial point (x₀, y₀).
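A minimal Python sketch of such a program (illustrative names; the update rule is the two-equation Newton's algorithm stated in this exercise):

```python
import math

# Sketch of a program for Exercise 2: Newton's algorithm for the system
# f1(x, y) = sin(x*y) - x - y = 0,  f2(x, y) = exp(2*x) - 2*x + 3*y = 0.

def newton_system(x, y, iterations=100):
    for _ in range(iterations):
        f1 = math.sin(x*y) - x - y
        f2 = math.exp(2*x) - 2*x + 3*y
        f1x = y*math.cos(x*y) - 1          # partial derivatives
        f1y = x*math.cos(x*y) - 1
        f2x = 2*math.exp(2*x) - 2
        f2y = 3.0
        D = f1x*f2y - f1y*f2x              # Jacobian determinant
        if D == 0:                         # guard against dividing by zero
            break
        x, y = (x - (f1*f2y - f1y*f2)/D,   # the update rule from the exercise
                y + (f1*f2x - f1x*f2)/D)
    return x, y

print(newton_system(0, 0))
print(newton_system(1, 1))   # a different solution from the one above
```

Running it should produce two different approximate solutions of the system, as the exercise asks.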
CHAPTER 2. FUNCTIONS OF SEVERAL VARIABLES
2.7 Constrained Optimization: Lagrange Multipliers
In Sections 2.5 and 2.6 we were concerned with finding maxima and minima of functions
without any constraints on the variables (other than being in the domain of the function).
What would we do if there were constraints on the variables? The following example illus-
trates a simple case of this type of problem.
Example 2.24. For a rectangle whose perimeter is 20 m, find the dimensions that will max-
imize the area.
Solution: The area A of a rectangle with width x and height y is A = xy. The perimeter P of the rectangle is then given by the formula P = 2 x+2 y. Since we are given that the perimeter
P = 20, this problem can be stated as:
Maximize : f ( x, y) = xy
given : 2 x + 2 y = 20
The reader is probably familiar with a simple method, using single-variable calculus, for
solving this problem. Since we must have 2 x + 2 y = 20, then we can solve for, say, y in
terms of x using that equation. This gives y = 10 − x, which we then substitute into f to get
f(x, y) = xy = x(10 − x) = 10x − x². This is now a function of x alone, so we now just have to maximize the function f(x) = 10x − x² on the interval [0,10]. Since f′(x) = 10 − 2x = 0 ⇒ x = 5
and f ′′(5) = −2 < 0, then the Second Derivative Test tells us that x = 5 is a local maximum
for f , and hence x = 5 must be the global maximum on the interval [0,10] (since f = 0 at
the endpoints of the interval). So since y = 10 − x = 5, then the maximum area occurs for a
rectangle whose width and height both are 5 m.
Notice in the above example that the ease of the solution depended on being able to solve
for one variable in terms of the other in the equation 2 x + 2 y = 20. But what if that were not
possible (which is often the case)? In this section we will use a general method, called the
Lagrange multiplier method¹⁰, for solving constrained optimization problems:
Maximize (or minimize) : f ( x, y) (or f ( x, y, z))
given : g( x, y) = c (or g( x, y, z) = c) for some constant c
The equation g( x, y) = c is called the constraint equation, and we say that x and y are constrained by g( x, y) = c. Points ( x, y) which are maxima or minima of f ( x, y) with the condition that they satisfy the constraint equation g( x, y) = c are called constrained maximum
or constrained minimum points, respectively. Similar definitions hold for functions of three
variables.
The Lagrange multiplier method for solving such problems can now be stated:
¹⁰Named after the French mathematician Joseph Louis Lagrange (1736-1813).
Theorem 2.7. Let f(x, y) and g(x, y) be smooth functions, and suppose that c is a scalar constant such that ∇g(x, y) ≠ 0 for all (x, y) that satisfy the equation g(x, y) = c. Then to solve the constrained optimization problem
Maximize (or minimize) : f ( x, y)
given : g( x, y) = c ,
find the points ( x, y) that solve the equation ∇ f ( x, y) = λ∇ g( x, y) for some constant λ (the number λ is called the Lagrange multiplier). If there is a constrained maximum or minimum, then it must be such a point.
A rigorous proof of the above theorem requires use of the Implicit Function Theorem,
which is beyond the scope of this text.¹¹ Note that the theorem only gives a necessary condition for a point to be a constrained maximum or minimum. Whether a point (x, y) that
satisfies ∇ f ( x, y) = λ∇ g( x, y) for some λ actually is a constrained maximum or minimum can sometimes be determined by the nature of the problem itself. For instance, in Example 2.24
it was clear that there had to be a global maximum.
So how can you tell when a point that satisfies the condition in Theorem 2.7 really is a
constrained maximum or minimum? The answer is that it depends on the constraint function g(x, y), together with any implicit constraints. It can be shown¹² that if the constraint
equation g( x, y) = c (plus any hidden constraints) describes a bounded set B in R2, then the
constrained maximum or minimum of f ( x, y) will occur either at a point ( x, y) satisfying
∇ f ( x, y) = λ∇ g( x, y) or at a “boundary” point of the set B.
In Example 2.24 the constraint equation 2 x+2 y = 20 describes a line in R2, which by itself
is not bounded. However, there are “hidden” constraints, due to the nature of the problem,
namely 0 ≤ x, y ≤ 10, which cause that line to be restricted to a line segment in R2 (including
the endpoints of that line segment), which is bounded.
Example 2.25. For a rectangle whose perimeter is 20 m, use the Lagrange multiplier method
to find the dimensions that will maximize the area.
Solution: As we saw in Example 2.24, with x and y representing the width and height,
respectively, of the rectangle, this problem can be stated as:
Maximize : f ( x, y) = xy
given : g( x, y) = 2 x + 2 y = 20
Then solving the equation ∇ f ( x, y) = λ∇ g( x, y) for some λ means solving the equations
¹¹See TAYLOR and MANN, § 6.8 for more detail.
¹²Again, see TAYLOR and MANN.
∂f/∂x = λ ∂g/∂x and ∂f/∂y = λ ∂g/∂y , namely:

y = 2λ ,  x = 2λ
The general idea is to solve for λ in both equations, then set those expressions equal (since
they both equal λ) to solve for x and y. Doing this we get
y/2 = λ = x/2 ⇒ x = y ,
so now substitute either of the expressions for x or y into the constraint equation to solve for
x and y:
20 = g(x, y) = 2x + 2y = 2x + 2x = 4x ⇒ x = 5 ⇒ y = 5
There must be a maximum area, since the minimum area is 0 and f (5, 5) = 25 > 0, so the
point (5, 5) that we found (called a constrained critical point) must be the constrained maxi-
mum.
∴ The maximum area occurs for a rectangle whose width and height both are 5 m.
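As a quick sanity check (not part of the text), one can sample the constraint line y = 10 − x and confirm that the area peaks at x = 5:

```python
# Illustrative check of Example 2.25: along the constraint 2x + 2y = 20,
# i.e. y = 10 - x, the area f(x, y) = x*y is largest at x = 5.
areas = {x/10: (x/10)*(10 - x/10) for x in range(0, 101)}   # x from 0.0 to 10.0
best = max(areas, key=areas.get)
print(best, areas[best])   # 5.0 25.0
```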
Example 2.26. Find the points on the circle x 2 + y 2 = 80 which are closest to and farthest
from the point (1, 2).
Solution: The distance d from any point ( x, y) to the point (1, 2) is
d = √( (x − 1)² + (y − 2)² ) ,
and minimizing the distance is equivalent to minimizing the square of the distance. Thus
the problem can be stated as:
Maximize (and minimize) : f(x, y) = (x − 1)² + (y − 2)²
given : g(x, y) = x² + y² = 80
Solving ∇ f ( x, y) = λ∇ g( x, y) means solving the following equations:
2(x − 1) = 2λx ,  2(y − 2) = 2λy
Note that x ≠ 0 since otherwise we would get −2 = 0 in the first equation. Similarly, y ≠ 0.
So we can solve both equations for λ as follows:
(x − 1)/x = λ = (y − 2)/y ⇒ xy − y = xy − 2x ⇒ y = 2x
Substituting this into g(x, y) = x² + y² = 80 yields 5x² = 80, so x = ±4. So the two constrained critical points are (4, 8) and (−4, −8). Since f(4, 8) = 45 and f(−4, −8) = 125, and since there must be points on the circle closest to and farthest from (1, 2), then it must be the case that (4, 8) is the point on the circle closest to (1, 2) and (−4, −8) is the farthest from (1, 2) (see Figure 2.7.1).

[Figure 2.7.1: the circle x² + y² = 80 with the point (1, 2) and the constrained critical points (4, 8) and (−4, −8) marked.]

Notice that since the constraint equation x² + y² = 80 describes a circle, which is a bounded set in R2, then we were guaranteed that the constrained critical points we found were indeed the constrained maximum and minimum.
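A similar numeric sanity check (illustrative, not from the text) samples points on the circle and compares their distances to (1, 2):

```python
import math

# Illustrative check of Example 2.26: sample many points on the circle
# x^2 + y^2 = 80 and find the samples closest to and farthest from (1, 2).
r = math.sqrt(80)
n = 100_000
points = [(r*math.cos(2*math.pi*k/n), r*math.sin(2*math.pi*k/n))
          for k in range(n)]
dist2 = lambda p: (p[0] - 1)**2 + (p[1] - 2)**2   # squared distance to (1, 2)
closest = min(points, key=dist2)
farthest = max(points, key=dist2)
print(closest)    # approximately (4, 8)
print(farthest)   # approximately (-4, -8)
```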
The Lagrange multiplier method can be extended to functions of three variables.
Example 2.27.
Maximize (and minimize) : f ( x, y, z) = x + z
given : g(x, y, z) = x² + y² + z² = 1
Solution: Solve the equation ∇ f ( x, y, z) = λ∇ g( x, y, z):
1 = 2 λx
0 = 2 λy
1 = 2 λz
The first equation implies λ ≠ 0 (otherwise we would have 1 = 0), so we can divide by λ in the second equation to get y = 0, and we can divide by λ in the first and third equations to get x = 1/(2λ) = z. Substituting these expressions into the constraint equation g(x, y, z) = x² + y² + z² = 1 yields the constrained critical points (1/√2, 0, 1/√2) and (−1/√2, 0, −1/√2). Since f(1/√2, 0, 1/√2) > f(−1/√2, 0, −1/√2), and since the constraint equation x² + y² + z² = 1 describes a sphere (which is bounded) in R3, then (1/√2, 0, 1/√2) is the constrained maximum point and (−1/√2, 0, −1/√2) is the constrained minimum point.
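As a quick check (illustrative, not from the text), the Lagrange equations and the constraint can be verified numerically at the claimed maximum point:

```python
import math

# Illustrative check of Example 2.27: with lam = 1/sqrt(2), the point
# x = z = 1/(2*lam), y = 0 satisfies the Lagrange equations and the constraint.
lam = 1/math.sqrt(2)
x = z = 1/(2*lam)
y = 0.0
assert abs(1 - 2*lam*x) < 1e-12              # first equation:  1 = 2*lam*x
assert abs(0 - 2*lam*y) < 1e-12              # second equation: 0 = 2*lam*y
assert abs(1 - 2*lam*z) < 1e-12              # third equation:  1 = 2*lam*z
assert abs(x**2 + y**2 + z**2 - 1) < 1e-12   # constraint
print(x + z)   # the constrained maximum value of f, about 1.41421
```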
So far we have not attached any significance to the value of the Lagrange multiplier λ. We
needed λ only to find the constrained critical points, but made no use of its value. It turns
out that λ gives an approximation of the change in the value of the function f ( x, y) that we
wish to maximize or minimize, when the constant c in the constraint equation g( x, y) = c is
changed by 1.
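This interpretation can be checked numerically (a sketch, not part of the text): by the argument of Example 2.25, the maximum area for a fixed perimeter c occurs at width = height = c/4, so the maximum area is (c/4)².

```python
# Illustrative check of the interpretation of lambda, using the rectangle
# problem of Example 2.25: for perimeter c the maximum area is (c/4)**2.
def max_area(c):
    return (c/4)**2

# Changing c from 20 to 21 changes the maximum by roughly lambda = 2.5:
print(max_area(21) - max_area(20))   # 2.5625
```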
For example, in Example 2.25 we showed that the constrained optimization problem
Maximize : f ( x, y) = xy
given : g( x, y) = 2 x + 2 y = 20
had the solution ( x, y) = (5,5), and that λ = x/2 = y/2. Thus, λ = 2.5. In a similar fashion we
could show that the constrained optimization problem
Maximize : f (