
Chapter 14

Special Algorithms for Convex Optimization

We describe several simple but powerful algorithms that are specifically designed for convex optimization. These algorithms require only the ability to compute the function value and any subgradient for each relevant function. A key feature of these algorithms is that they maintain converging upper and lower bounds on the quantity being computed, and thus can compute the quantity to a guaranteed accuracy. We demonstrate each on the two-parameter example of chapter 11; in chapter 15 they are applied to more substantial problems.

In this chapter we concentrate on the finite-dimensional case; these methods are extended to the infinite-dimensional case in the next chapter.

14.1 Notation and Problem Definitions

We will consider several specific forms of optimization problems.

The unconstrained problem is to compute the minimum value of $\phi$,

$$\phi^* = \min_z \phi(z), \qquad (14.1)$$

and in addition to compute a minimizer $x^*$, which satisfies $\phi(x^*) = \phi^*$. We will use the notation

$$x^* = \mathop{\rm argmin}_z \phi(z)$$

to mean that $x^*$ is some minimizer of $\phi$.

The constrained optimization problem is to compute the minimum value of $\phi$, subject to the constraints $\phi_1(z) \le 0, \ldots, \phi_m(z) \le 0$, i.e.,

$$\phi^* = \min_{\phi_1(z) \le 0,\; \ldots,\; \phi_m(z) \le 0} \phi(z),$$


and in addition to compute a minimizer $x^*$ that satisfies $\phi(x^*) = \phi^*$, $\phi_1(x^*) \le 0, \ldots, \phi_m(x^*) \le 0$. To simplify notation we define the constraint function

$$\psi(z) = \max_{1 \le i \le m} \phi_i(z),$$

so we can express the constraints as $\psi(z) \le 0$:

$$\phi^* = \min_{\psi(z) \le 0} \phi(z). \qquad (14.2)$$

We will say that $z$ is feasible if $\psi(z) \le 0$.

The feasibility problem is:

$$\text{find } x \text{ such that } \psi(x) \le 0, \text{ or determine that there is no such } x. \qquad (14.3)$$

Algorithms that are designed for one form of the convex optimization problem can often be modified or adapted for others; we will see several examples of this. It is also possible to modify the algorithms we present to directly solve other problems that we do not consider, e.g., to find Pareto optimal points or to do goal programming.

Throughout this chapter, unless otherwise stated, $\phi$ and $\phi_1, \ldots, \phi_m$ are convex functions from $\mathbf{R}^n$ into $\mathbf{R}$. The constraint function $\psi$ is then also convex.

14.2 On Algorithms for Convex Optimization

Many algorithms for convex optimization have been devised, and we could not hope to survey them here. Instead, we give a more detailed description of two types of algorithms that are specifically designed for convex problems: cutting-plane and ellipsoid algorithms.

Let us briefly mention another large family of algorithms, the descent methods, contrasting them with the algorithms that we will describe. In these methods, successive iterations produce points that have decreasing objective values. General-purpose descent methods have been successfully applied to, and adapted for, nondifferentiable convex optimization; see the Notes and References at the end of this chapter. The cutting-plane and ellipsoid algorithms are not descent methods: objective values often increase after an iteration.

Possible advantages and disadvantages of some of the descent methods over cutting-plane and ellipsoid methods are:

• The cutting-plane and ellipsoid algorithms require only the evaluation of function values and any one (of possibly many) subgradients of functions. Most (but not all) descent methods require the computation of a descent direction or even steepest descent direction for the function at a point; this can be a difficult task in itself, and is always at least as difficult as computing a subgradient.


• Many (but not all) descent methods for convex optimization retain the heuristic stopping criteria used when they are applied to general (nonconvex) optimization problems. In contrast, we will see that the cutting-plane and ellipsoid methods have simple stopping criteria that guarantee the optimum has been found to a known accuracy.

• Most descent methods for nondifferentiable optimization are substantially more complicated than the cutting-plane and ellipsoid algorithms. These methods often include parameters that need to be adjusted for the particular problem.

• For smooth problems, many of the descent methods offer substantially faster convergence, e.g., quadratic.

Let us immediately qualify the foregoing remarks. Many descent methods have worked very well in practice: one famous example is the simplex method for linear programming. In addition, it is neither possible nor profitable to draw a sharp line between descent methods and non-descent methods. For example, the ellipsoid algorithm can be interpreted as a type of variable-metric descent algorithm. We refer the reader to the references at the end of this chapter.

14.3 Cutting-Plane Algorithms

14.3.1 Computing a Lower Bound on $\phi^*$

We consider the unconstrained problem (14.1). Suppose we have computed function values and at least one subgradient at $x_1, \ldots, x_k$:

$$\phi(x_1), \ldots, \phi(x_k), \qquad g_1 \in \partial\phi(x_1), \ldots, g_k \in \partial\phi(x_k). \qquad (14.4)$$

Each of these function values and subgradients yields an affine lower bound on $\phi$:

$$\phi(z) \ge \phi(x_i) + g_i^T (z - x_i) \quad \text{for all } z, \quad 1 \le i \le k,$$

and hence

$$\phi(z) \ge \phi_{\mathrm{lb},k}(z) = \max_{1 \le i \le k} \left( \phi(x_i) + g_i^T (z - x_i) \right). \qquad (14.5)$$

$\phi_{\mathrm{lb},k}$ is a piecewise-linear convex function that is everywhere less than or equal to $\phi$. Moreover, this global lower bound function is tight at the points $x_1, \ldots, x_k$, since $\phi(x_i) = \phi_{\mathrm{lb},k}(x_i)$ for $1 \le i \le k$. In fact, $\phi_{\mathrm{lb},k}$ is the smallest convex function that has the function values and subgradients given in (14.4). An example is shown in figure 14.1.

It follows that

$$\phi^* \ge L_k = \min_z \phi_{\mathrm{lb},k}(z). \qquad (14.6)$$
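As a small illustration of our own (not from the book), take $\phi(z) = z^2$ on $\mathbf{R}$, evaluated at $x_1 = -1$ and $x_2 = 2$ with the (unique) subgradients $g_1 = -2$ and $g_2 = 4$. Then

$$\phi_{\mathrm{lb},2}(z) = \max\{\,1 - 2(z+1),\; 4 + 4(z-2)\,\} = \max\{-2z - 1,\; 4z - 4\},$$

whose minimum, attained where the two affine pieces cross at $z = 1/2$, is $L_2 = -2$. Indeed $L_2 = -2 \le \phi^* = 0$, and the bound tightens as further points are added.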


Figure 14.1: The maximum of the affine support functionals at $x_1$, $x_2$, and $x_3$ gives the global lower bound function $\phi_{\mathrm{lb},3}$, which is tight at the points $x_1$, $x_2$, and $x_3$. $L_3$, the minimum of $\phi_{\mathrm{lb},3}$, which occurs at $\tilde{x}_3$, is found by solving (14.7). $L_3$ is a lower bound on $\phi^*$.

L

The minimization problem on the right-hand side is readily solved via linear programming. We can express (14.6) as

$$L_k = \min_{z,\,L} \left\{ L \;\middle|\; \phi(x_i) + g_i^T (z - x_i) \le L, \;\; 1 \le i \le k \right\},$$

which has the form of a linear program in the variable $w$:

$$L_k = \min \left\{ c^T w \;\middle|\; A w \le b \right\}, \qquad (14.7)$$

where

$$w = \begin{bmatrix} z \\ L \end{bmatrix}, \qquad c = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \qquad A = \begin{bmatrix} g_1^T & -1 \\ \vdots & \vdots \\ g_k^T & -1 \end{bmatrix}, \qquad b = \begin{bmatrix} g_1^T x_1 - \phi(x_1) \\ \vdots \\ g_k^T x_k - \phi(x_k) \end{bmatrix}. \qquad (14.8)$$

In fact, this linear program determines not only $L_k$, but also a minimizer of $\phi_{\mathrm{lb},k}$, which we denote $\tilde{x}_k$ (shown in figure 14.1 as well).

The idea behind (14.6) is elementary, but it is very important. It shows that by knowing only a finite number of function values and subgradients of a convex function, we can deduce a lower bound on the minimum of the function. No such


property holds for general nonconvex functions. This property will be useful in designing stopping criteria for optimization algorithms that can guarantee a maximum error.

The function $\phi_{\mathrm{lb},k}$ can be unbounded below, so that $L_k = -\infty$ (e.g., when $k = 1$ and $g_1 \ne 0$), which is not a useful bound. This can be avoided by explicitly specifying bounds on the variables, so that we consider

$$\phi^* = \min_{z_{\min} \le z \le z_{\max}} \phi(z).$$

In this case we have the lower bound

$$L_k = \min_{z_{\min} \le z \le z_{\max}} \phi_{\mathrm{lb},k}(z),$$

which can be computed by adding the bound inequalities $z_{\min} \le z \le z_{\max}$ to the linear program (14.7).
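As a concrete illustration (not taken from the book), the linear program (14.7)–(14.8) can be assembled mechanically from the stored points, function values, and subgradients and handed to any LP solver. The sketch below assumes SciPy's `linprog` and includes the box bounds $z_{\min} \le z \le z_{\max}$ just discussed, so that $L_k$ is always finite; the names `xs`, `fs`, `gs` for the stored data are our own.

```python
import numpy as np
from scipy.optimize import linprog

def lower_bound_lp(xs, fs, gs, zmin, zmax):
    """Solve the LP (14.7)-(14.8) over the box [zmin, zmax].

    xs, fs, gs hold x_i, phi(x_i), and g_i in dphi(x_i) for i = 1, ..., k.
    Returns (L_k, x_tilde_k): the minimum of the piecewise-linear lower
    bound phi_lb,k over the box, and a minimizer of phi_lb,k.
    """
    k, n = len(xs), len(zmin)
    # Variable w = (z, L); minimize c^T w = L subject to A w <= b.
    c = np.zeros(n + 1)
    c[-1] = 1.0
    A = np.hstack([np.vstack(gs), -np.ones((k, 1))])           # rows [g_i^T, -1]
    b = np.array([g @ x - f for x, f, g in zip(xs, fs, gs)])   # g_i^T x_i - phi(x_i)
    bounds = [(lo, hi) for lo, hi in zip(zmin, zmax)] + [(None, None)]
    res = linprog(c, A_ub=A, b_ub=b, bounds=bounds)
    return res.fun, res.x[:-1]
```

With `xs`, `fs`, `gs` accumulated as the algorithm visits points, the value returned here and the running minimum of the stored function values (the upper bound defined next) bracket $\phi^*$.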

Finally we mention that having computed $\phi$ at the points $x_1, \ldots, x_k$ we have the simple upper bound on $\phi^*$:

$$U_k = \min_{1 \le i \le k} \phi(x_i), \qquad (14.9)$$

which is the lowest objective value so far encountered. If an objective value and a subgradient are evaluated at another point $x_{k+1}$, the new lower and upper bounds $L_{k+1}$ and $U_{k+1}$ are improved:

$$L_k \le L_{k+1} \le \phi^* \le U_{k+1} \le U_k.$$

14.3.2 Kelley's Cutting-Plane Algorithm

Kelley's cutting-plane algorithm is a natural extension of the lower bound computation of the previous section. It is simply:

$x_1 \in [z_{\min}, z_{\max}]$, any initial box that contains minimizers;
$k := 0$;
repeat {
    $k := k + 1$;
    compute $\phi(x_k)$ and any $g_k \in \partial\phi(x_k)$;
    solve (14.7) to find $\tilde{x}_k$ and $L_k$;
    compute $U_k$ using (14.9);
    $x_{k+1} := \tilde{x}_k$;
} until ($U_k - L_k \le \epsilon$)
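The loop above translates almost line for line into code. The following self-contained sketch (our own construction, not the book's) runs Kelley's algorithm on a small hypothetical nondifferentiable objective in $\mathbf{R}^2$, again using SciPy's `linprog` for (14.7) with box bounds:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical test objective: phi(z) = |z1 - 1| + 2|z2 + 0.5|, minimized at (1, -0.5).
def phi(z):
    return abs(z[0] - 1.0) + 2.0 * abs(z[1] + 0.5)

def subgrad_phi(z):
    # Any subgradient will do; np.sign gives 0 at a kink, which is still valid here.
    return np.array([np.sign(z[0] - 1.0), 2.0 * np.sign(z[1] + 0.5)])

zmin, zmax = np.array([-5.0, -5.0]), np.array([5.0, 5.0])   # initial box
eps = 1e-3
x = np.array([-3.0, 4.0])                                   # x_1: any point in the box
xs, fs, gs = [], [], []

for k in range(1, 200):
    xs.append(x)
    fs.append(phi(x))
    gs.append(subgrad_phi(x))
    U = min(fs)                                              # upper bound (14.9)
    # LP (14.7): minimize L over w = (z, L) s.t. phi(x_i) + g_i^T (z - x_i) <= L,
    # plus the box zmin <= z <= zmax so the LP stays bounded.
    A = np.hstack([np.vstack(gs), -np.ones((k, 1))])
    b = np.array([g @ xi - fi for xi, fi, g in zip(xs, fs, gs)])
    c = np.array([0.0, 0.0, 1.0])
    bounds = [(zmin[0], zmax[0]), (zmin[1], zmax[1]), (None, None)]
    res = linprog(c, A_ub=A, b_ub=b, bounds=bounds)
    L, x = res.fun, res.x[:2]                                # L_k and x_{k+1} = x~_k
    if U - L <= eps:                                         # guaranteed eps-accuracy
        print(f"k = {k}:  {L:.4f} <= phi* <= {U:.4f}")
        break
```

On exit the printout certifies that $\phi^*$ lies between $L_k$ and $U_k$ with $U_k - L_k \le \epsilon$; switching to the relative-error criterion discussed below requires only changing the final comparison.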

An example of the second iteration of the cutting-plane algorithm is shown in figure 14.2. The idea behind Kelley's algorithm is that at each iteration the lower bound function $\phi_{\mathrm{lb},k}$ is refined or improved (i.e., made larger, so that $\phi_{\mathrm{lb},k+1} \ge \phi_{\mathrm{lb},k}$),


Figure 14.2: In the second iteration of the cutting-plane algorithm a subgradient $g_2$ at $x_2$ is found, giving the global lower bound function $\phi_{\mathrm{lb},2}$. (14.7) is solved to give $L_2$ and $x_3 = \tilde{x}_2$. (The next iteration of the cutting-plane algorithm is shown in figure 14.1.)

since one more term is added to the maximum in (14.5). Moreover, since $\phi_{\mathrm{lb},k}$ is tight at $x_1, \ldots, x_k$, it is a good approximation to $\phi$ near $x_1, \ldots, x_k$. In the next section, we will use this idea to show that the cutting-plane algorithm always terminates.

The cutting-plane algorithm maintains upper and lower bounds on the quantity being computed:

$$L_k \le \phi^* \le U_k,$$

which moreover converge as the algorithm proceeds:

$$U_k - L_k \to 0 \quad \text{as } k \to \infty.$$

Thus we compute $\phi^*$ to a guaranteed accuracy of $\epsilon$: on exit, we have a point with a low function value, and in addition we have a proof that there are no points with function value more than $\epsilon$ better than that of our point. Stopping criteria for general optimization algorithms (e.g., descent methods) are often more heuristic: they cannot guarantee that on exit, $\phi^*$ has been computed to a given accuracy.

Provided $\phi^* \ne 0$, it is also possible to specify a maximum relative error (as opposed to absolute error), with the modified stopping criterion

until ($U_k - L_k \le \epsilon \min\{ |L_k|, |U_k| \}$)

which guarantees a relative accuracy of at least $\epsilon$ on exit. These stopping criteria can offer a great advantage when the accuracy required is relatively low, for example,


10%. This relative accuracy can be achieved long before the objective values or iterates $x_k$ appear to be converging; still, we can confidently halt the algorithm.

A valid criticism of the cutting-plane algorithm is that the number of constraints in the linear program (14.7) that must be solved at each iteration grows with the total number of elapsed iterations. In practice, if these linear programs are initialized at the previous point, they can be solved very rapidly, e.g., in a few simplex iterations. Some cutting-plane algorithms developed since Kelley's drop constraints, so the size of the linear programs to be solved does not grow as the algorithm proceeds; see the Notes and References at the end of this chapter.

While the cutting-plane algorithm makes use of all of the information ($\phi(x_i)$, $g_i$, $i = 1, \ldots, k$) that we have obtained about $\phi$ in previous iterations, the ellipsoid methods that we will describe later in this chapter maintain a data structure of constant size (at the cost of an approximation) that describes what we have learned about the function in past iterations.

14.3.3 Proof of Convergence

We noted above that when the cutting-plane algorithm terminates, we know that the optimum objective $\phi^*$ lies between the bounds $L$ and $U$, which differ by less than $\epsilon$. In this section we show that the cutting-plane algorithm does in fact always terminate.

Let $B_{\mathrm{init}}$ denote the initial box, and

$$G = \sup \left\{ \|g\| \;\middle|\; g \in \partial\phi(z), \; z \in B_{\mathrm{init}} \right\}.$$

($G$ can be shown to be finite.) Suppose that for $k = 1, \ldots, K$ the algorithm has not terminated, i.e., $U_k - L_k > \epsilon$ for $k = 1, \ldots, K$. Since

$$L_k = \phi_{\mathrm{lb},k}(x_{k+1}) = \max_{j \le k} \left( \phi(x_j) + g_j^T (x_{k+1} - x_j) \right),$$

we have

$$L_k \ge \phi(x_j) + g_j^T (x_{k+1} - x_j), \qquad 1 \le j \le k \le K,$$

and hence, using $\phi(x_j) \ge U_k$ and the Cauchy-Schwarz inequality,

$$L_k \ge U_k - G \|x_{k+1} - x_j\|, \qquad 1 \le j \le k \le K.$$

From this and $U_k - L_k > \epsilon$ for $k \le K$ we conclude that

$$\|x_i - x_j\| > \frac{\epsilon}{G}, \qquad i, j \le K, \quad i \ne j; \qquad (14.10)$$

in other words, the minimum distance between any two of the points $x_1, \ldots, x_K$ exceeds $\epsilon/G$.


A volume argument can now be used to show that $K$ cannot be too large. Around each $x_i$ we place a ball $B_i$ of diameter $\epsilon/G$. By (14.10), these balls do not intersect, so their total volume is $K$ times the volume of one ball. These balls are all contained in a box $B$, which is the original box $B_{\mathrm{init}}$ enlarged in every dimension by $\epsilon/2G$. Hence the total volume of the balls must be less than the volume of $B$. We conclude that $K$ is no larger than the volume of $B$ divided by the volume of one of the balls.

If we take $K_{\max} = \mathbf{vol}(B)/\mathbf{vol}(B_1)$, then within $K_{\max}$ iterations, the cutting-plane algorithm terminates. (We comment that this upper bound is a very poor bound, vastly greater than the typical number of iterations required.)
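To get a rough sense of just how weak this bound is, consider a hypothetical instance (our own numbers, not from the book) with $n = 2$, an initial square box of side length $a$, $G = 1$, and $\epsilon = 0.01a$. The enlarged box $B$ then has side $a + \epsilon/G = 1.01a$, and each ball $B_i$ has radius $\epsilon/2G = 0.005a$, so

$$K_{\max} = \frac{\mathbf{vol}(B)}{\mathbf{vol}(B_1)} = \frac{(1.01a)^2}{\pi (0.005a)^2} \approx 1.3 \times 10^4,$$

even though in practice the algorithm typically reaches this accuracy in far fewer iterations.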

14.3.4 Cutting-Plane Algorithm with Constraints

The cutting-plane algorithm of the previous section can be modified in many ways to handle the constrained optimization problem (14.2). We will show one simple method, which uses the same basic idea of forming a piecewise-linear lower bound approximation of a convex function based on the function values and subgradients already evaluated.

Suppose we have computed function values and at least one subgradient at $x_1, \ldots, x_k$ for both the objective and the constraint function:

$$\phi(x_1), \ldots, \phi(x_k), \qquad g_1 \in \partial\phi(x_1), \ldots, g_k \in \partial\phi(x_k),$$
$$\psi(x_1), \ldots, \psi(x_k), \qquad h_1 \in \partial\psi(x_1), \ldots, h_k \in \partial\psi(x_k).$$

These points $x_i$ need not be feasible.

We form piecewise-linear lower bound functions for both the objective and the constraint: $\phi_{\mathrm{lb},k}$ in (14.5) and

$$\psi_{\mathrm{lb},k}(z) = \max_{1 \le i \le k} \left( \psi(x_i) + h_i^T (z - x_i) \right), \qquad (14.11)$$

which satisfies $\psi_{\mathrm{lb},k}(z) \le \psi(z)$ for all $z \in \mathbf{R}^n$.

The lower bound function $\psi_{\mathrm{lb},k}$ yields a polyhedral outer approximation to the feasible set:

$$\{ z \mid \psi(z) \le 0 \} \subseteq \{ z \mid \psi_{\mathrm{lb},k}(z) \le 0 \}. \qquad (14.12)$$

Thus we have the following lower bound on $\phi^*$:

$$L_k = \min \left\{ \phi_{\mathrm{lb},k}(z) \;\middle|\; \psi_{\mathrm{lb},k}(z) \le 0 \right\}. \qquad (14.13)$$

As in section 14.3.1, the optimization problem (14.13) is equivalent to a linear program:

$$L_k = \min \left\{ c^T w \;\middle|\; A w \le b \right\}, \qquad (14.14)$$


where

$$w = \begin{bmatrix} z \\ L \end{bmatrix}, \qquad c = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \qquad A = \begin{bmatrix} g_1^T & -1 \\ \vdots & \vdots \\ g_k^T & -1 \\ h_1^T & 0 \\ \vdots & \vdots \\ h_k^T & 0 \end{bmatrix}, \qquad b = \begin{bmatrix} g_1^T x_1 - \phi(x_1) \\ \vdots \\ g_k^T x_k - \phi(x_k) \\ h_1^T x_1 - \psi(x_1) \\ \vdots \\ h_k^T x_k - \psi(x_k) \end{bmatrix}.$$

We note again that the lower bound $L_k$ in (14.13) can be computed no matter how the points and subgradients were chosen.
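Continuing the earlier sketches (again our own illustration with hypothetical variable names, not the book's code), the linear program (14.13)–(14.14) can be assembled in the same way, with `psis` and `hs` holding $\psi(x_i)$ and $h_i \in \partial\psi(x_i)$. An infeasible LP certifies that the outer approximation (14.12), and hence the feasible set itself, is empty.

```python
import numpy as np
from scipy.optimize import linprog

def constrained_lower_bound_lp(xs, fs, gs, psis, hs, zmin, zmax):
    """Solve (14.13)-(14.14) over the box [zmin, zmax].

    Returns (L_k, x_tilde_k), or (None, None) if the polyhedral outer
    approximation of the feasible set is empty (and hence the problem
    itself is infeasible).
    """
    k, n = len(xs), len(zmin)
    c = np.zeros(n + 1)
    c[-1] = 1.0                                    # variable w = (z, L); minimize L
    # Objective cuts:  g_i^T z - L <= g_i^T x_i - phi(x_i)
    A_obj = np.hstack([np.vstack(gs), -np.ones((k, 1))])
    b_obj = np.array([g @ x - f for x, f, g in zip(xs, fs, gs)])
    # Constraint cuts: h_i^T z      <= h_i^T x_i - psi(x_i)
    A_con = np.hstack([np.vstack(hs), np.zeros((k, 1))])
    b_con = np.array([h @ x - p for x, p, h in zip(xs, psis, hs)])
    A = np.vstack([A_obj, A_con])
    b = np.concatenate([b_obj, b_con])
    bounds = [(lo, hi) for lo, hi in zip(zmin, zmax)] + [(None, None)]
    res = linprog(c, A_ub=A, b_ub=b, bounds=bounds)
    if res.status == 2:                            # LP reported infeasible
        return None, None
    return res.fun, res.x[:-1]
```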

The modified cutting-plane algorithm is exactly the same as the cutting-plane algorithm, except that the linear program (14.14) is solved instead of (14.8), and the stopping criterion must be modified for reasons that we now consider.

If the feasible set is empty then eventually the linear program (14.14) will become infeasible. When this occurs, the cutting-plane algorithm terminates with the conclusion that the feasible set is empty. On the other hand, if the feasible set is not empty, and the optimum occurs on the boundary of the feasible set, which is often the case, then the iterates $x_k$ are generally infeasible, since they lie in the outer approximation