Linear Controller Design: Limits of Performance, by Stephen Boyd and Craig Barratt


CHAPTER 13 ELEMENTS OF CONVEX ANALYSIS

If $\phi$ is a convex functional on a (possibly infinite-dimensional) vector space $V$, then we say $\phi_{\rm sg}$ is a subgradient for $\phi$ at $v \in V$ if $\phi_{\rm sg}$ is a linear functional on $V$, and we have

$$\phi(z) \geq \phi(v) + \phi_{\rm sg}(z - v) \quad \mbox{for all } z \in V. \eqno(13.3)$$

The subdifferential $\partial\phi(v)$ consists of all subgradients of $\phi$ at $v$; note that it is a set of linear functionals on $V$.

If $V = \mathbf{R}^n$, then every linear functional on $V$ has the form $g^Tz$ for some vector $g \in \mathbf{R}^n$, and our two definitions of subgradient are therefore the same, provided we ignore the distinction between the vector $g \in \mathbf{R}^n$ and the linear functional on $\mathbf{R}^n$ given by the inner product with $g$.
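To make the finite-dimensional case concrete, here is a minimal numerical sketch in Python (the piecewise-linear function, random data, and tolerance are illustrative assumptions, not taken from the book). For $\phi(x) = \max_i(a_i^Tx + b_i)$, the coefficient vector $a_i$ of any affine term that is active at $x$ is a subgradient, and the sketch checks the subgradient inequality at random points $z$.

```python
import numpy as np

# Illustrative data (assumed, not from the book):
# phi(x) = max_i (a_i^T x + b_i), a piecewise-linear convex function on R^n.
rng = np.random.default_rng(0)
n, m = 4, 6
A = rng.standard_normal((m, n))   # rows are the vectors a_i
b = rng.standard_normal(m)

def phi(x):
    return np.max(A @ x + b)

def a_subgradient(x):
    # The coefficient vector a_i of any affine term active at x is a
    # subgradient; the subdifferential is the convex hull of all such a_i.
    return A[np.argmax(A @ x + b)]

x = rng.standard_normal(n)
g = a_subgradient(x)

# Check phi(z) >= phi(x) + g^T (z - x) at random points z.
for _ in range(1000):
    z = rng.standard_normal(n)
    assert phi(z) >= phi(x) + g @ (z - x) - 1e-9
print("subgradient inequality verified at 1000 random points")
```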

13.1.2 Quasigradients

For quasiconvex functions, there is a concept analogous to the subgradient. Suppose $\phi: \mathbf{R}^n \rightarrow \mathbf{R}$ is quasiconvex, which we recall from section 6.2.2 means that

$$\phi(\lambda x + (1-\lambda)\tilde x) \leq \max\{\phi(x),\,\phi(\tilde x)\} \quad \mbox{for all } 0 \leq \lambda \leq 1,\ x,\tilde x \in \mathbf{R}^n.$$
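As a quick check of this inequality, here is a small sketch (again an illustration with assumed data, not an example from the book) using $\phi(x) = \|x\|^{1/2}$, which is quasiconvex, since its sublevel sets are balls, but is not convex.

```python
import numpy as np

rng = np.random.default_rng(1)

def phi(x):
    # Square root of the Euclidean norm: quasiconvex (its sublevel sets are
    # balls) but not convex.
    return np.sqrt(np.linalg.norm(x))

# Check phi(lam*x + (1-lam)*xt) <= max{phi(x), phi(xt)} on random samples.
for _ in range(1000):
    x, xt = rng.standard_normal(3), rng.standard_normal(3)
    lam = rng.uniform()
    assert phi(lam * x + (1 - lam) * xt) <= max(phi(x), phi(xt)) + 1e-12
print("quasiconvexity inequality holds at 1000 random samples")
```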

We say that $g$ is a quasigradient for $\phi$ at $x$ if

$$\phi(z) \geq \phi(x) \quad \mbox{whenever } g^T(z-x) \geq 0. \eqno(13.4)$$

This simply means that the hyperplane $g^T(z-x) = 0$ forms a simple cut for $\phi$, exactly as in figure 13.2: if we are searching for a minimizer of $\phi$, we can rule out the half-space $g^T(z-x) > 0$.

If $\phi$ is differentiable and $\nabla\phi(x) \neq 0$, then $\nabla\phi(x)$ is a quasigradient; if $\phi$ is convex, then (13.2) shows that any subgradient is also a quasigradient. It can be shown that every quasiconvex function has at least one quasigradient at every point. Note that the length of a quasigradient is irrelevant (for our purposes): all that matters is its direction, or equivalently, the cutting-plane for $\phi$ that it determines.

Any algorithm for convex optimization that uses only the cutting-planes that are determined by subgradients will also work for quasiconvex functions, if we substitute quasigradients for subgradients. It is not possible to form any deep-cut for a quasiconvex function.
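As a concrete sketch of such a cut (an assumed example, not one from the book): for the linear-fractional function $\phi(x) = (a^Tx + b)/(c^Tx + d)$, which is quasiconvex on the half-space where $c^Tx + d > 0$, the vector $g = a - \phi(x)c$ satisfies (13.4) and so is a quasigradient at $x$. The data below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
a, c = rng.standard_normal(n), rng.standard_normal(n)
b, d = 0.5, 5.0   # d chosen large enough that c^T x + d > 0 near the origin

def phi(x):
    # Linear-fractional function; quasiconvex where c^T x + d > 0.
    return (a @ x + b) / (c @ x + d)

def quasigradient(x):
    # g = a - phi(x)*c satisfies: g^T (z - x) >= 0  implies  phi(z) >= phi(x).
    return a - phi(x) * c

x = 0.1 * rng.standard_normal(n)
g = quasigradient(x)

# Sample points z in the half-space g^T (z - x) >= 0 (and in the domain) and
# verify that none of them is better than x, i.e. that half-space is cut away.
count = 0
while count < 1000:
    z = x + 0.5 * rng.standard_normal(n)
    if c @ z + d > 0 and g @ (z - x) >= 0:
        assert phi(z) >= phi(x) - 1e-12
        count += 1
print("simple cut verified: no sampled z with g^T(z - x) >= 0 improves on x")
```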

In the infinite-dimensional case, we will say that a linear functional $\phi_{\rm qg}$ on $V$ is a quasigradient for the quasiconvex functional $\phi$ at $v \in V$ if

$$\phi(z) \geq \phi(v) \quad \mbox{whenever } \phi_{\rm qg}(z-v) \geq 0.$$

As discussed above, this agrees with our definition above for $V = \mathbf{R}^n$, provided we do not distinguish between vectors and the associated inner-product linear functionals.


13.1.3 Subgradients and Directional Derivatives

In this section we briefly discuss the directional derivative, a concept of differential calculus that is more familiar than the subgradient. We will not use this concept in the optimization algorithms we present in the next chapter; we mention it because it is used in descent methods, the most common algorithms for optimization.

We define the directional derivative of $\phi$ at $x$ in the direction $\delta x$ as

$$\phi'(x;\delta x) = \lim_{h \searrow 0} \frac{\phi(x + h\,\delta x) - \phi(x)}{h}$$

(the notation $h \searrow 0$ means that $h$ converges to 0 from above). It can be shown that for convex $\phi$ this limit always exists. Of course, if $\phi$ is differentiable at $x$, then

$$\phi'(x;\delta x) = \nabla\phi(x)^T \delta x.$$

We say that $\delta x$ is a descent direction for $\phi$ at $x$ if $\phi'(x;\delta x) < 0$.
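The following small sketch (an assumed smooth example, not from the book) estimates $\phi'(x;\delta x)$ by a one-sided difference quotient with a small $h$, compares it with $\nabla\phi(x)^T\delta x$, and reports whether the chosen $\delta x$ is a descent direction; the quadratic $\phi$ and the step size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
P = rng.standard_normal((n, n))
P = P.T @ P + np.eye(n)             # positive definite
q = rng.standard_normal(n)

def phi(x):
    return 0.5 * x @ P @ x + q @ x  # smooth convex quadratic (assumed example)

def grad(x):
    return P @ x + q

x, dx = rng.standard_normal(n), rng.standard_normal(n)
h = 1e-6

diff_quotient = (phi(x + h * dx) - phi(x)) / h   # one-sided difference quotient
print("difference quotient:", diff_quotient)
print("grad(x)^T dx       :", grad(x) @ dx)      # should agree to about O(h)
print("descent direction? :", grad(x) @ dx < 0)
```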

The directional derivative tells us how $\phi$ changes if $x$ is moved slightly in the direction $\delta x$, since for small $h$,

$$\phi\!\left(x + h\,\frac{\delta x}{\|\delta x\|}\right) \approx \phi(x) + h\,\frac{\phi'(x;\delta x)}{\|\delta x\|}.$$

The steepest descent direction of $\phi$ at $x$ is defined as

$$\delta x_{\rm sd} = \mathop{\rm argmin}_{\|\delta x\| = 1} \phi'(x;\delta x).$$
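For a differentiable $\phi$ and the Euclidean norm, $\phi'(x;\delta x) = \nabla\phi(x)^T\delta x$, so the minimizer over unit directions is $\delta x_{\rm sd} = -\nabla\phi(x)/\|\nabla\phi(x)\|$. The sketch below (reusing the assumed quadratic from the previous example) checks this against random unit directions; it illustrates the definition only, not any algorithm from the book.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
P = rng.standard_normal((n, n))
P = P.T @ P + np.eye(n)
q = rng.standard_normal(n)
grad = lambda x: P @ x + q          # gradient of phi(x) = 0.5 x^T P x + q^T x

x = rng.standard_normal(n)
g = grad(x)
dx_sd = -g / np.linalg.norm(g)      # steepest descent direction (Euclidean norm)

# No random unit direction should have a smaller directional derivative g^T dx.
for _ in range(10000):
    dx = rng.standard_normal(n)
    dx /= np.linalg.norm(dx)
    assert g @ dx >= g @ dx_sd - 1e-12
print("steepest descent direction confirmed against 10000 random unit directions")
```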

In general the directional derivatives, descent directions, and the steepest descent direction of $\phi$ at $x$ can be described in terms of the subdifferential at $x$ (see the Notes and References at the end of the chapter). In many cases it is considerably more difficult to find a descent direction or the steepest descent direction of $\phi$ at $x$ than a single subgradient of $\phi$ at $x$.

If $\phi$ is differentiable at $x$, and $\nabla\phi(x) \neq 0$, then $-\nabla\phi(x)$ is a descent direction for $\phi$ at $x$. It is not true, however, that the negative of any nonzero subgradient provides a descent direction: we can have $g \in \partial\phi(x)$, $g \neq 0$, but $-g$ not a descent direction for $\phi$ at $x$. As an example, the level curves of a convex function $\phi$ are shown in figure 13.4(a), together with a point $x$ and a nonzero subgradient $g$. Note that $\phi$ increases for any movement along the directions $\pm g$, so, in particular, $-g$ is not a descent direction. Negatives of the subgradients at non-optimal points are, however, descent directions for the distance to a (or any) minimizer, i.e., if $\phi(x^\star) = \min_z \phi(z)$ and $\psi(z) = \|z - x^\star\|$, then $\psi'(x;-g) < 0$ for any $g \in \partial\phi(x)$. Thus, moving slightly in the direction $-g$ will decrease the distance to (any) minimizer $x^\star$, as shown in figure 13.4(b).
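A standard numerical instance of this phenomenon (an assumed example standing in for figure 13.4, which is not reproduced here): for $\phi(x) = |x_1| + 2|x_2|$ at $x = (1,0)$, the vector $g = (1,2)$ is a subgradient, yet moving along $-g$ increases $\phi$ while decreasing the distance to the minimizer $0$.

```python
import numpy as np

def phi(x):
    # phi(x) = |x1| + 2|x2|, minimized (with value 0) at the origin.
    return abs(x[0]) + 2 * abs(x[1])

x = np.array([1.0, 0.0])
g = np.array([1.0, 2.0])   # a subgradient: the subdifferential at x is {(1, s): |s| <= 2}

for t in [0.1, 0.01, 0.001]:
    z = x - t * g          # small step in the direction -g
    print(f"t={t:6.3f}  phi: {phi(x):.4f} -> {phi(z):.4f}   "
          f"dist to minimizer: {np.linalg.norm(x):.4f} -> {np.linalg.norm(z):.4f}")
# phi increases along -g (so -g is not a descent direction for phi),
# while the distance to the minimizer decreases.
```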

A consequence of these properties is that the optimization algorithms described in the next chapter do not necessarily generate sequences of decreasing functional values (as would a descent method).
