Basics of Algebra, Topology, and Differential Calculus by Jean Gallier

Chapter 18

Quadratic Optimization Problems

18.1 Quadratic Optimization: The Positive Definite Case

In this chapter, we consider two classes of quadratic optimization problems that appear frequently in engineering and in computer science (especially in computer vision):

1. Minimizing
$$f(x) = \frac{1}{2}\, x^\top A x + x^\top b$$
over all $x \in \mathbb{R}^n$, or subject to linear or affine constraints.

2. Minimizing
$$f(x) = \frac{1}{2}\, x^\top A x + x^\top b$$
over the unit sphere.

In both cases, $A$ is a symmetric matrix. We also seek necessary and sufficient conditions for $f$ to have a global minimum.

Many problems in physics and engineering can be stated as the minimization of some energy function, with or without constraints. Indeed, it is a fundamental principle of mechanics that nature acts so as to minimize energy. Furthermore, if a physical system is in a stable state of equilibrium, then the energy in that state should be minimal. For example, a small ball placed on top of a sphere is in an unstable equilibrium position. A small motion causes the ball to roll down. On the other hand, a ball placed inside and at the bottom of a sphere is in a stable equilibrium position, because the potential energy is minimal.

The simplest kind of energy function is a quadratic function. Such functions can be conveniently defined in the form
$$P(x) = x^\top A x - x^\top b,$$
where $A$ is a symmetric $n \times n$ matrix, and $x, b$ are vectors in $\mathbb{R}^n$, viewed as column vectors. Actually, for reasons that will be clear shortly, it is preferable to put a factor $\frac{1}{2}$ in front of the quadratic term, so that
$$P(x) = \frac{1}{2}\, x^\top A x - x^\top b.$$
The question is, under what conditions (on $A$) does $P(x)$ have a global minimum, preferably unique?

We give a complete answer to the above question in two stages:

1. In this section, we show that if $A$ is symmetric positive definite, then $P(x)$ has a unique global minimum precisely when $Ax = b$.

2. In Section 18.2, we give necessary and sufficient conditions in the general case, in terms of the pseudo-inverse of $A$.

We begin with the matrix version of Definition 16.2.

Definition 18.1. A symmetric positive definite matrix is a matrix whose eigenvalues are strictly positive, and a symmetric positive semidefinite matrix is a matrix whose eigenvalues are nonnegative.

Equivalent criteria are given in the following proposition.

Proposition 18.1. Given any Euclidean space $E$ of dimension $n$, the following properties hold:

(1) Every self-adjoint linear map $f \colon E \to E$ is positive definite iff
$$\langle x, f(x) \rangle > 0$$
for all $x \in E$ with $x \neq 0$.

(2) Every self-adjoint linear map $f \colon E \to E$ is positive semidefinite iff
$$\langle x, f(x) \rangle \geq 0$$
for all $x \in E$.

Proof. (1) First, assume that $f$ is positive definite. Recall that every self-adjoint linear map has an orthonormal basis $(e_1, \ldots, e_n)$ of eigenvectors, and let $\lambda_1, \ldots, \lambda_n$ be the corresponding eigenvalues. With respect to this basis, for every $x = x_1 e_1 + \cdots + x_n e_n \neq 0$, we have
$$\langle x, f(x) \rangle = \Big\langle \sum_{i=1}^n x_i e_i,\; f\Big(\sum_{i=1}^n x_i e_i\Big) \Big\rangle = \Big\langle \sum_{i=1}^n x_i e_i,\; \sum_{i=1}^n \lambda_i x_i e_i \Big\rangle = \sum_{i=1}^n \lambda_i x_i^2,$$
which is strictly positive, since $\lambda_i > 0$ for $i = 1, \ldots, n$, and $x_i^2 > 0$ for some $i$, since $x \neq 0$.

Conversely, assume that
$$\langle x, f(x) \rangle > 0$$
for all $x \neq 0$. Then for $x = e_i$, we get
$$\langle e_i, f(e_i) \rangle = \langle e_i, \lambda_i e_i \rangle = \lambda_i,$$
and thus $\lambda_i > 0$ for all $i = 1, \ldots, n$.

(2) As in (1), we have
$$\langle x, f(x) \rangle = \sum_{i=1}^n \lambda_i x_i^2,$$
and since $\lambda_i \geq 0$ for $i = 1, \ldots, n$ because $f$ is positive semidefinite, we have $\langle x, f(x) \rangle \geq 0$, as claimed. The converse is as in (1), except that we get only $\lambda_i \geq 0$ since $\langle e_i, f(e_i) \rangle \geq 0$.

Some special notation is customary (especially in the field of convex optimization) to express that a symmetric matrix is positive definite or positive semidefinite.

Definition 18.2. Given any $n \times n$ symmetric matrix $A$, we write $A \succeq 0$ if $A$ is positive semidefinite and we write $A \succ 0$ if $A$ is positive definite.

It should be noted that we can define the relation
$$A \succeq B$$
between any two $n \times n$ matrices (symmetric or not) iff $A - B$ is symmetric positive semidefinite. It is easy to check that this relation is actually a partial order on matrices, called the positive semidefinite cone ordering; for details, see Boyd and Vandenberghe [15], Section 2.4.

If $A$ is symmetric positive definite, it is easily checked that $A^{-1}$ is also symmetric positive definite. Also, if $C$ is a symmetric positive definite $m \times m$ matrix and $A$ is an $m \times n$ matrix of rank $n$ (and so $m \geq n$), then $A^\top C A$ is symmetric positive definite.
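
These closure properties are easy to confirm numerically. The following is a minimal NumPy sketch (the matrices are arbitrary illustrative choices, not taken from the text):

```python
import numpy as np

def is_spd(M, tol=1e-10):
    """Return True when M is symmetric with all eigenvalues > tol."""
    return np.allclose(M, M.T) and np.linalg.eigvalsh(M).min() > tol

rng = np.random.default_rng(0)

# Build an arbitrary symmetric positive definite 3x3 matrix C.
G = rng.standard_normal((3, 3))
C = G @ G.T + 3 * np.eye(3)

# An arbitrary 3x2 matrix A of full column rank (m = 3 >= n = 2).
A = rng.standard_normal((3, 2))

print(is_spd(np.linalg.inv(C)))   # C^{-1} is symmetric positive definite
print(is_spd(A.T @ C @ A))        # so is A^T C A when A has rank n
```

Testing the smallest eigenvalue is a simple check for these small examples; in practice one often attempts a Cholesky factorization instead.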

We can now prove that
$$P(x) = \frac{1}{2}\, x^\top A x - x^\top b$$
has a global minimum when $A$ is symmetric positive definite.

Proposition 18.2. Given a quadratic function
$$P(x) = \frac{1}{2}\, x^\top A x - x^\top b,$$
if $A$ is symmetric positive definite, then $P(x)$ has a unique global minimum for the solution of the linear system $Ax = b$. The minimum value of $P(x)$ is
$$P(A^{-1}b) = -\frac{1}{2}\, b^\top A^{-1} b.$$


Proof. Since $A$ is positive definite, it is invertible, since its eigenvalues are all strictly positive. Let $x = A^{-1}b$, and compute $P(y) - P(x)$ for any $y \in \mathbb{R}^n$. Since $Ax = b$, we get
$$\begin{aligned}
P(y) - P(x) &= \frac{1}{2}\, y^\top A y - y^\top b - \frac{1}{2}\, x^\top A x + x^\top b \\
&= \frac{1}{2}\, y^\top A y - y^\top A x + \frac{1}{2}\, x^\top A x \\
&= \frac{1}{2}\, (y - x)^\top A (y - x).
\end{aligned}$$
Since $A$ is positive definite, the last expression is nonnegative, and thus
$$P(y) \geq P(x)$$
for all $y \in \mathbb{R}^n$, which proves that $x = A^{-1}b$ is a global minimum of $P(x)$. In fact, since $A$ is positive definite, the last expression is strictly positive whenever $y \neq x$, so this global minimum is unique. A simple computation yields
$$P(A^{-1}b) = -\frac{1}{2}\, b^\top A^{-1} b.$$
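
Proposition 18.2 can also be checked numerically. The sketch below (with a randomly generated positive definite matrix, an arbitrary choice for illustration) solves $Ax = b$ and verifies both the minimum value and the inequality $P(y) \geq P(A^{-1}b)$ at random trial points:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary symmetric positive definite A and right-hand side b.
G = rng.standard_normal((4, 4))
A = G @ G.T + 4 * np.eye(4)
b = rng.standard_normal(4)

def P(x):
    return 0.5 * x @ A @ x - x @ b

x_star = np.linalg.solve(A, b)                   # the unique minimizer A^{-1} b
print(np.isclose(P(x_star), -0.5 * b @ x_star))  # equals -1/2 b^T A^{-1} b

# P(y) >= P(x_star) at random trial points y.
print(all(P(rng.standard_normal(4)) >= P(x_star) for _ in range(1000)))
```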

Remarks:

(1) The quadratic function $P(x)$ is also given by
$$P(x) = \frac{1}{2}\, x^\top A x - b^\top x,$$
but the definition using $x^\top b$ is more convenient for the proof of Proposition 18.2.

(2) If $P(x)$ contains a constant term $c \in \mathbb{R}$, so that
$$P(x) = \frac{1}{2}\, x^\top A x - x^\top b + c,$$
the proof of Proposition 18.2 still shows that $P(x)$ has a unique global minimum for $x = A^{-1}b$, but the minimal value is
$$P(A^{-1}b) = -\frac{1}{2}\, b^\top A^{-1} b + c.$$

Thus, when the energy function $P(x)$ of a system is given by a quadratic function
$$P(x) = \frac{1}{2}\, x^\top A x - x^\top b,$$
where $A$ is symmetric positive definite, finding the global minimum of $P(x)$ is equivalent to solving the linear system $Ax = b$. Sometimes, it is useful to recast a linear problem $Ax = b$ as a variational problem (finding the minimum of some energy function). However, very often, a minimization problem comes with extra constraints that must be satisfied for all admissible solutions. For instance, we may want to minimize the quadratic function
$$Q(y_1, y_2) = \frac{1}{2}\left(y_1^2 + y_2^2\right)$$
subject to the constraint
$$2y_1 - y_2 = 5.$$
The solution for which $Q(y_1, y_2)$ is minimum is no longer $(y_1, y_2) = (0, 0)$, but instead $(y_1, y_2) = (2, -1)$, as will be shown later.

Geometrically, the graph of the function defined by $z = Q(y_1, y_2)$ in $\mathbb{R}^3$ is a paraboloid of revolution $P$ with axis of revolution $Oz$. The constraint
$$2y_1 - y_2 = 5$$
corresponds to the vertical plane $H$ parallel to the $z$-axis and containing the line of equation $2y_1 - y_2 = 5$ in the $xy$-plane. Thus, the constrained minimum of $Q$ is located on the parabola that is the intersection of the paraboloid $P$ with the plane $H$.

A nice way to solve constrained minimization problems of the above kind is to use the method of Lagrange multipliers. But first, let us define precisely what kind of minimization problems we intend to solve.

Definition 18.3. The quadratic constrained minimization problem consists in minimizing a quadratic function
$$Q(y) = \frac{1}{2}\, y^\top C^{-1} y - b^\top y$$
subject to the linear constraints
$$A^\top y = f,$$
where $C^{-1}$ is an $m \times m$ symmetric positive definite matrix, $A$ is an $m \times n$ matrix of rank $n$ (so that $m \geq n$), and where $b, y \in \mathbb{R}^m$ (viewed as column vectors) and $f \in \mathbb{R}^n$ (viewed as a column vector).

The reason for using $C^{-1}$ instead of $C$ is that the constrained minimization problem has an interpretation as a set of equilibrium equations in which the matrix that arises naturally is $C$ (see Strang [100]). Since $C$ and $C^{-1}$ are both symmetric positive definite, this doesn't make any difference, but it seems preferable to stick to Strang's notation.

The method of Lagrange consists in incorporating the $n$ constraints $A^\top y = f$ into the quadratic function $Q(y)$, by introducing extra variables $\lambda = (\lambda_1, \ldots, \lambda_n)$ called Lagrange multipliers, one for each constraint. We form the Lagrangian
$$L(y, \lambda) = Q(y) + \lambda^\top (A^\top y - f) = \frac{1}{2}\, y^\top C^{-1} y - (b - A\lambda)^\top y - \lambda^\top f.$$


We shall prove that our constrained minimization problem has a unique solution given by the system of linear equations
$$\begin{aligned}
C^{-1} y + A\lambda &= b, \\
A^\top y &= f,
\end{aligned}$$
which can be written in matrix form as
$$\begin{pmatrix} C^{-1} & A \\ A^\top & 0 \end{pmatrix} \begin{pmatrix} y \\ \lambda \end{pmatrix} = \begin{pmatrix} b \\ f \end{pmatrix}.$$
Note that the matrix of this system is symmetric. Eliminating $y$ from the first equation
$$C^{-1} y + A\lambda = b,$$
we get
$$y = C(b - A\lambda),$$
and substituting into the second equation, we get
$$A^\top C (b - A\lambda) = f,$$
that is,
$$A^\top C A \lambda = A^\top C b - f.$$
However, by a previous remark, since $C$ is symmetric positive definite and the columns of $A$ are linearly independent, $A^\top C A$ is symmetric positive definite, and thus invertible. Note that this way of solving the system requires solving for the Lagrange multipliers first.
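
This elimination argument translates directly into code. The following sketch (dimensions and data are arbitrary illustrative assumptions) solves the symmetric block system directly and also by computing the Lagrange multipliers first, and checks that the two routes agree:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 2

# Arbitrary SPD C (so C^{-1} is SPD too), full-column-rank A, and data b, f.
G = rng.standard_normal((m, m))
C = G @ G.T + m * np.eye(m)
Cinv = np.linalg.inv(C)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
f = rng.standard_normal(n)

# Route 1: solve the block system [[C^{-1}, A], [A^T, 0]] (y, lam) = (b, f).
K = np.block([[Cinv, A], [A.T, np.zeros((n, n))]])
y1, lam1 = np.split(np.linalg.solve(K, np.concatenate([b, f])), [m])

# Route 2: multipliers first, A^T C A lam = A^T C b - f, then y = C (b - A lam).
lam2 = np.linalg.solve(A.T @ C @ A, A.T @ C @ b - f)
y2 = C @ (b - A @ lam2)

print(np.allclose(y1, y2), np.allclose(lam1, lam2))
print(np.allclose(A.T @ y2, f))  # the constraint A^T y = f holds
```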

Letting $e = b - A\lambda$, we also note that the system
$$\begin{pmatrix} C^{-1} & A \\ A^\top & 0 \end{pmatrix} \begin{pmatrix} y \\ \lambda \end{pmatrix} = \begin{pmatrix} b \\ f \end{pmatrix}$$
is equivalent to the system
$$\begin{aligned}
e &= b - A\lambda, \\
y &= Ce, \\
A^\top y &= f.
\end{aligned}$$
The latter system is called the equilibrium equations by Strang [100]. Indeed, Strang shows that the equilibrium equations of many physical systems can be put in the above form. This includes spring-mass systems, electrical networks, and trusses, which are structures built from elastic bars. In each case, $y$, $e$, $b$, $C$, $\lambda$, $f$, and $K = A^\top C A$ have a physical interpretation. The matrix $K = A^\top C A$ is usually called the stiffness matrix. Again, the reader is referred to Strang [100].

In order to prove that our constrained minimization problem has a unique solution, we proceed to prove that the constrained minimization of $Q(y)$ subject to $A^\top y = f$ is equivalent to the unconstrained maximization of another function $-P(\lambda)$. We get $P(\lambda)$ by minimizing the Lagrangian $L(y, \lambda)$ treated as a function of $y$ alone. Since $C^{-1}$ is symmetric positive definite and
$$L(y, \lambda) = \frac{1}{2}\, y^\top C^{-1} y - (b - A\lambda)^\top y - \lambda^\top f,$$
by Proposition 18.2 the global minimum (with respect to $y$) of $L(y, \lambda)$ is obtained for the solution $y$ of
$$C^{-1} y = b - A\lambda,$$
that is, when
$$y = C(b - A\lambda),$$
and the minimum of $L(y, \lambda)$ is
$$\min_y L(y, \lambda) = -\frac{1}{2}\, (A\lambda - b)^\top C (A\lambda - b) - \lambda^\top f.$$
Letting
$$P(\lambda) = \frac{1}{2}\, (A\lambda - b)^\top C (A\lambda - b) + \lambda^\top f,$$
we claim that the solution of the constrained minimization of $Q(y)$ subject to $A^\top y = f$ is equivalent to the unconstrained maximization of $-P(\lambda)$. Of course, since we minimized $L(y, \lambda)$ with respect to $y$, we have
$$L(y, \lambda) \geq -P(\lambda)$$
for all $y$ and all $\lambda$. However, when the constraint $A^\top y = f$ holds, $L(y, \lambda) = Q(y)$, and thus for any admissible $y$, which means that $A^\top y = f$, we have
$$\min_y Q(y) \geq \max_\lambda\, -P(\lambda).$$
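
The next proposition shows that this inequality is in fact an equality. As a quick numerical preview (again with arbitrary random data, an assumption for illustration), one can compute the optimal $\lambda$ and $y$ from the formulas above and check that $Q(y) = -P(\lambda)$:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 4, 2
G = rng.standard_normal((m, m))
C = G @ G.T + m * np.eye(m)
Cinv = np.linalg.inv(C)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
f = rng.standard_normal(n)

Q = lambda y: 0.5 * y @ Cinv @ y - b @ y
P = lambda lam: 0.5 * (A @ lam - b) @ C @ (A @ lam - b) + lam @ f

# Optimal multipliers and primal solution, as derived above.
lam = np.linalg.solve(A.T @ C @ A, A.T @ C @ b - f)
y = C @ (b - A @ lam)

print(np.isclose(Q(y), -P(lam)))  # min Q(y) = max -P(lam)
```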

In order to prove that the unique minimum of the constrained problem $Q(y)$ subject to $A^\top y = f$ is the unique maximum of $-P(\lambda)$, we compute $Q(y) + P(\lambda)$.

Proposition 18.3. The quadratic constrained minimization problem of Definition 18.3 has a unique solution $(y, \lambda)$ given by the system
$$\begin{pmatrix} C^{-1} & A \\ A^\top & 0 \end{pmatrix} \begin{pmatrix} y \\ \lambda \end{pmatrix} = \begin{pmatrix} b \\ f \end{pmatrix}.$$
Furthermore, the component $\lambda$ of the above solution is the unique value for which $-P(\lambda)$ is maximum.


Proof. As we suggested earlier, let us compute $Q(y) + P(\lambda)$, assuming that the constraint $A^\top y = f$ holds. Eliminating $f$, since $b^\top y = y^\top b$ and $\lambda^\top A^\top y = y^\top A \lambda$, we get
$$\begin{aligned}
Q(y) + P(\lambda) &= \frac{1}{2}\, y^\top C^{-1} y - b^\top y + \frac{1}{2}\, (A\lambda - b)^\top C (A\lambda - b) + \lambda^\top f \\
&= \frac{1}{2}\, (C^{-1} y + A\lambda - b)^\top C (C^{-1} y + A\lambda - b).
\end{aligned}$$
Since $C$ is positive definite, the last expression is nonnegative. In fact, it is null iff
$$C^{-1} y + A\lambda - b = 0,$$
that is,
$$C^{-1} y + A\lambda = b.$$
But then the unique constrained minimum of $Q(y)$ subject to $A^\top y = f$ is equal to the unique maximum of $-P(\lambda)$ exactly when $A^\top y = f$ and $C^{-1} y + A\lambda = b$, which proves the proposition.

Remarks:

(1) There is a form of duality going on in this situation. The constrained minimization of $Q(y)$ subject to $A^\top y = f$ is called the primal problem, and the unconstrained maximization of $-P(\lambda)$ is called the dual problem. Duality is the fact, stated slightly loosely, that
$$\min_y Q(y) = \max_\lambda\, -P(\lambda).$$
Recalling that $e = b - A\lambda$, since
$$P(\lambda) = \frac{1}{2}\, (A\lambda - b)^\top C (A\lambda - b) + \lambda^\top f,$$
we can also write
$$P(\lambda) = \frac{1}{2}\, e^\top C e + \lambda^\top f.$$
This expression often represents the total potential energy of a system. Again, the optimal solution is the one that minimizes the potential energy (and thus maximizes $-P(\lambda)$).

(2) It is immediately verified that the equations of Proposition 18.3 are equivalent to the equations stating that the partial derivatives of the Lagrangian $L(y, \lambda)$ are null:
$$\frac{\partial L}{\partial y_i} = 0, \quad i = 1, \ldots, m, \qquad\qquad \frac{\partial L}{\partial \lambda_j} = 0, \quad j = 1, \ldots, n.$$
Thus, the constrained minimum of $Q(y)$ subject to $A^\top y = f$ is an extremum of the Lagrangian $L(y, \lambda)$. As we showed in Proposition 18.3, this extremum corresponds to simultaneously minimizing $L(y, \lambda)$ with respect to $y$ and maximizing $L(y, \lambda)$ with respect to $\lambda$. Geometrically, such a point is a saddle point for $L(y, \lambda)$.

(3) The Lagrange multipliers sometimes have a natural physical meaning. For example, in the spring-mass system they correspond to node displacements. In some general sense, Lagrange multipliers are correction terms needed to satisfy equilibrium equations and the price paid for the constraints. For more details, see Strang [100].

Going back to the constrained minimization of $Q(y_1, y_2) = \frac{1}{2}(y_1^2 + y_2^2)$ subject to
$$2y_1 - y_2 = 5,$$
the Lagrangian is
$$L(y_1, y_2, \lambda) = \frac{1}{2}\,(y_1^2 + y_2^2) + \lambda (2y_1 - y_2 - 5),$$
and the equations stating that the Lagrangian has a saddle point are
$$\begin{aligned}
y_1 + 2\lambda &= 0, \\
y_2 - \lambda &= 0, \\
2y_1 - y_2 - 5 &= 0.
\end{aligned}$$
We obtain the solution $(y_1, y_2, \lambda) = (2, -1, -1)$.
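
The solution can be confirmed with a few lines of code (a trivial sketch of the $3 \times 3$ linear system above):

```python
import numpy as np

# Saddle-point equations for L(y1, y2, lam) = (y1^2 + y2^2)/2 + lam(2 y1 - y2 - 5):
#   y1 + 2 lam = 0,  y2 - lam = 0,  2 y1 - y2 = 5.
M = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0],
              [2.0, -1.0, 0.0]])
rhs = np.array([0.0, 0.0, 5.0])
print(np.linalg.solve(M, rhs))  # [ 2. -1. -1.]
```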

Much more should be said about the use of Lagrange multipliers in optimization or variational problems. This is a vast topic. Least squares methods and Lagrange multipliers are used to tackle many problems in computer graphics and computer vision; see Trucco and Verri [107], Metaxas [76], Jain, Kasturi, and Schunck [58], Faugeras [35], and Foley, van Dam, Feiner, and Hughes [36]. For a lucid introduction to optimization methods, see Ciarlet [22].

18.2 Quadratic Optimization: The General Case

In this section, we complete the study initiated in Section 18.1 and give necessary and sufficient conditions for the quadratic function $\frac{1}{2}\, x^\top A x + x^\top b$ to have a global minimum. We begin with the following simple fact:

Proposition 18.4. If $A$ is an invertible symmetric matrix, then the function
$$f(x) = \frac{1}{2}\, x^\top A x + x^\top b$$
has a minimum value iff $A \succ 0$, in which case this optimal value is obtained for a unique value of $x$, namely $x^* = -A^{-1}b$, and with
$$f(-A^{-1}b) = -\frac{1}{2}\, b^\top A^{-1} b.$$


Proof. Observe that
$$\frac{1}{2}\, (x + A^{-1}b)^\top A (x + A^{-1}b) = \frac{1}{2}\, x^\top A x + x^\top b + \frac{1}{2}\, b^\top A^{-1} b.$$
Thus,
$$f(x) = \frac{1}{2}\, x^\top A x + x^\top b = \frac{1}{2}\, (x + A^{-1}b)^\top A (x + A^{-1}b) - \frac{1}{2}\, b^\top A^{-1} b.$$
If $A$ has some negative eigenvalue, say $-\lambda$ (with $\lambda > 0$), if we pick any eigenvector $u$ of $A$ associated with $-\lambda$, then for any $\alpha \in \mathbb{R}$ with $\alpha \neq 0$, if we let $x = \alpha u - A^{-1}b$, then since $Au = -\lambda u$, we get
$$\begin{aligned}
f(x) &= \frac{1}{2}\, (x + A^{-1}b)^\top A (x + A^{-1}b) - \frac{1}{2}\, b^\top A^{-1} b \\
&= \frac{1}{2}\, \alpha u^\top A\, \alpha u - \frac{1}{2}\, b^\top A^{-1} b \\
&= -\frac{1}{2}\, \alpha^2 \lambda\, \|u\|^2 - \frac{1}{2}\, b^\top A^{-1} b,
\end{aligned}$$
and since $\alpha$ can be made as large as we want and $\lambda > 0$, we see that $f$ has no minimum. Consequently, in order for $f$ to have a minimum, we must have $A \succ 0$. In this case, since $(x + A^{-1}b)^\top A (x + A^{-1}b) \geq 0$, it is clear that the minimum value of $f$ is achieved when $x + A^{-1}b = 0$, that is, $x = -A^{-1}b$.
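
The unboundedness argument in this proof can be observed concretely. In the sketch below (an illustrative $2 \times 2$ example of my choosing), $A$ has the negative eigenvalue $-1$, and $f$ decreases without bound along the corresponding eigenvector direction:

```python
import numpy as np

# An invertible symmetric matrix with eigenvalues 2 and -1.
A = np.array([[2.0, 0.0],
              [0.0, -1.0]])
b = np.array([1.0, 1.0])

def f(x):
    return 0.5 * x @ A @ x + x @ b

u = np.array([0.0, 1.0])        # eigenvector for the eigenvalue -1
x0 = -np.linalg.solve(A, b)     # the critical point -A^{-1} b
for alpha in (1.0, 10.0, 100.0):
    print(f(alpha * u + x0))    # -0.25, -49.75, -4999.75: unbounded below
```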

Let us now consider the case of an arbitrary symmetric matrix $A$.

Proposition 18.5. If $A$ is a symmetric matrix, then the function
$$f(x) = \frac{1}{2}\, x^\top A x + x^\top b$$
has a minimum value iff $A \succeq 0$ and $(I - AA^+)b = 0$, in which case this minimum value is
$$p^* = -\frac{1}{2}\, b^\top A^+ b.$$
Furthermore, if $A = U^\top \Sigma U$ is an SVD of $A$, then the optimal value is achieved by all $x \in \mathbb{R}^n$ of the form
$$x = -A^+ b + U^\top \begin{pmatrix} 0 \\ z \end{pmatrix},$$
for any $z \in \mathbb{R}^{n-r}$, where $r$ is the rank of $A$.

Proof. The case that $A$ is invertible is taken care of by Proposition 18.4, so we may assume that $A$ is singular. If $A$ has rank $r < n$, then we can diagonalize $A$ as
$$A = U^\top \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix} U,$$
where $U$ is an orthogonal matrix and where $\Sigma_r$ is an $r \times r$ diagonal invertible matrix. Then we have
$$f(x) = \frac{1}{2}\, x^\top U^\top \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix} U x + x^\top U^\top U b
= \frac{1}{2}\, (Ux)^\top \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix} Ux + (Ux)^\top U b.$$
If we write
$$Ux = \begin{pmatrix} y \\ z \end{pmatrix} \quad \text{and} \quad Ub = \begin{pmatrix} c \\ d \end{pmatrix},$$
with $y, c \in \mathbb{R}^r$ and $z, d \in \mathbb{R}^{n-r}$, we get
$$\begin{aligned}
f(x) &= \frac{1}{2}\, (Ux)^\top \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix} Ux + (Ux)^\top U b \\
&= \frac{1}{2}\, (y^\top, z^\top) \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} y \\ z \end{pmatrix} + (y^\top, z^\top) \begin{pmatrix} c \\ d \end{pmatrix} \\
&= \frac{1}{2}\, y^\top \Sigma_r y + y^\top c + z^\top d.
\end{aligned}$$
For $y = 0$, we get
$$f(x) = z^\top d,$$
so if $d \neq 0$, the function $f$ has no minimum. Therefore, if $f$ has a minimum, then $d = 0$. However, $d = 0$ means that
$$Ub = \begin{pmatrix} c \\ 0 \end{pmatrix},$$
and we know from Section 17.1 that $b$ is in the range of $A$ (here, $U$ is $U^\top$), which is equivalent to $(I - AA^+)b = 0$. If $d = 0$, then
$$f(x) = \frac{1}{2}\, y^\top \Sigma_r y + y^\top c,$$
and since $\Sigma_r$ is invertible, by Proposition 18.4, the function $f$ has a minimum iff $\Sigma_r \succ 0$, which is equivalent to $A \succeq 0$.

Therefore, we have proved that if $f$ has a minimum, then $(I - AA^+)b = 0$ and $A \succeq 0$. Conversely, if $(I - AA^+)b = 0$ and $A \succeq 0$, what we just did proves that $f$ does have a minimum.

When the above conditions hold, the minimum is achieved if $y = -\Sigma_r^{-1} c$, $z = 0$, and $d = 0$, that is, for $x^*$ given by
$$Ux^* = \begin{pmatrix} -\Sigma_r^{-1} c \\ 0 \end{pmatrix} \quad \text{and} \quad Ub = \begin{pmatrix} c \\ 0 \end{pmatrix},$$
from which we deduce that
$$x^* = -U^\top \begin{pmatrix} \Sigma_r^{-1} c \\ 0 \end{pmatrix}
= -U^\top \begin{pmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} c \\ 0 \end{pmatrix}
= -U^\top \begin{pmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{pmatrix} Ub
= -A^+ b,$$
and the minimum value of $f$ is
$$f(x^*) = -\frac{1}{2}\, b^\top A^+ b.$$
For any $x \in \mathbb{R}^n$ of the form
$$x = -A^+ b + U^\top \begin{pmatrix} 0 \\ z \end{pmatrix},$$
for any $z \in \mathbb{R}^{n-r}$, our previous calculations show that $f(x) = -\frac{1}{2}\, b^\top A^+ b$.
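
Proposition 18.5 can be illustrated with a singular positive semidefinite matrix. The sketch below uses a hand-picked diagonal example (so that the SVD data are transparent) to check the condition $(I - AA^+)b = 0$, the minimum value $-\frac{1}{2}\, b^\top A^+ b$, the flat directions along the kernel of $A$, and the absence of a minimum when $b$ has a component outside the range of $A$:

```python
import numpy as np

# Singular positive semidefinite A of rank 2 in R^{3x3}.
A = np.diag([3.0, 1.0, 0.0])
Aplus = np.linalg.pinv(A)

def f(x, b):
    return 0.5 * x @ A @ x + x @ b

# b in the range of A: (I - A A^+) b = 0, so f has a minimum.
b = np.array([3.0, -2.0, 0.0])
print(np.allclose((np.eye(3) - A @ Aplus) @ b, 0))

x_star = -Aplus @ b
p_star = -0.5 * b @ Aplus @ b
print(np.isclose(f(x_star, b), p_star))

# Every x* + (0, 0, z) along the kernel of A attains the same value.
for z in (-1.0, 0.5, 7.0):
    print(np.isclose(f(x_star + np.array([0.0, 0.0, z]), b), p_star))

# b with a component outside the range of A: f decreases without bound.
b_bad = np.array([3.0, -2.0, 1.0])
print([f(np.array([0.0, 0.0, -t]), b_bad) for t in (1.0, 10.0, 100.0)])
```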