18.1 Quadratic Optimization: The Positive Definite Case
In this chapter, we consider two classes of quadratic optimization problems that appear
frequently in engineering and in computer science (especially in computer vision):
1. Minimizing
$$f(x) = \frac{1}{2} x^\top A x + x^\top b$$
over all $x \in \mathbb{R}^n$, or subject to linear or affine constraints.
2. Minimizing
$$f(x) = \frac{1}{2} x^\top A x + x^\top b$$
over the unit sphere.
In both cases, A is a symmetric matrix. We also seek necessary and sufficient conditions for
f to have a global minimum.
Many problems in physics and engineering can be stated as the minimization of some
energy function, with or without constraints. Indeed, it is a fundamental principle of me-
chanics that nature acts so as to minimize energy. Furthermore, if a physical system is in a
stable state of equilibrium, then the energy in that state should be minimal. For example, a
small ball placed on top of a sphere is in an unstable equilibrium position. A small motion
causes the ball to roll down. On the other hand, a ball placed inside and at the bottom of a
sphere is in a stable equilibrium position, because the potential energy is minimal.
The simplest kind of energy function is a quadratic function. Such functions can be conveniently defined in the form
$$P(x) = x^\top A x - x^\top b,$$
where $A$ is a symmetric $n \times n$ matrix, and $x, b$ are vectors in $\mathbb{R}^n$, viewed as column vectors. Actually, for reasons that will be clear shortly, it is preferable to put a factor $\frac{1}{2}$ in front of the quadratic term, so that
$$P(x) = \frac{1}{2} x^\top A x - x^\top b.$$
The question is, under what conditions (on $A$) does $P(x)$ have a global minimum, preferably unique?
We give a complete answer to the above question in two stages:
1. In this section, we show that if $A$ is symmetric positive definite, then $P(x)$ has a unique global minimum, attained precisely at the solution of $Ax = b$.
2. In Section 18.2, we give necessary and sufficient conditions in the general case, in terms
of the pseudo-inverse of A.
We begin with the matrix version of Definition 16.2.
Definition 18.1. A symmetric positive definite matrix is a matrix whose eigenvalues are
strictly positive, and a symmetric positive semidefinite matrix is a matrix whose eigenvalues
are nonnegative.
Equivalent criteria are given in the following proposition.
Proposition 18.1. Given any Euclidean space E of dimension n, the following properties
hold:
(1) Every self-adjoint linear map $f \colon E \to E$ is positive definite iff
$$\langle x, f(x)\rangle > 0$$
for all $x \in E$ with $x \neq 0$.
(2) Every self-adjoint linear map $f \colon E \to E$ is positive semidefinite iff
$$\langle x, f(x)\rangle \geq 0$$
for all $x \in E$.
Proof. (1) First, assume that $f$ is positive definite. Recall that every self-adjoint linear map has an orthonormal basis $(e_1, \ldots, e_n)$ of eigenvectors, and let $\lambda_1, \ldots, \lambda_n$ be the corresponding eigenvalues. With respect to this basis, for every $x = x_1 e_1 + \cdots + x_n e_n \neq 0$, we have
$$\langle x, f(x)\rangle = \Big\langle \sum_{i=1}^n x_i e_i,\, f\Big(\sum_{i=1}^n x_i e_i\Big)\Big\rangle = \Big\langle \sum_{i=1}^n x_i e_i,\, \sum_{i=1}^n \lambda_i x_i e_i\Big\rangle = \sum_{i=1}^n \lambda_i x_i^2,$$
which is strictly positive, since $\lambda_i > 0$ for $i = 1, \ldots, n$, and $x_i^2 > 0$ for some $i$, since $x \neq 0$.
Conversely, assume that
$$\langle x, f(x)\rangle > 0$$
for all $x \neq 0$. Then for $x = e_i$, we get
$$\langle e_i, f(e_i)\rangle = \langle e_i, \lambda_i e_i\rangle = \lambda_i,$$
and thus $\lambda_i > 0$ for all $i = 1, \ldots, n$.
(2) As in (1), we have
$$\langle x, f(x)\rangle = \sum_{i=1}^n \lambda_i x_i^2,$$
and since $\lambda_i \geq 0$ for $i = 1, \ldots, n$ because $f$ is positive semidefinite, we have $\langle x, f(x)\rangle \geq 0$, as claimed. The converse is as in (1), except that we get only $\lambda_i \geq 0$, since $\langle e_i, f(e_i)\rangle \geq 0$.
Some special notation is customary (especially in the field of convex optimization) to express that a symmetric matrix is positive definite or positive semidefinite.
Definition 18.2. Given any $n \times n$ symmetric matrix $A$, we write $A \succeq 0$ if $A$ is positive semidefinite and we write $A \succ 0$ if $A$ is positive definite.
It should be noted that we can define the relation
$$A \succeq B$$
between any two $n \times n$ matrices (symmetric or not) iff $A - B$ is symmetric positive semidefinite. It is easy to check that this relation is actually a partial order on matrices, called the positive semidefinite cone ordering; for details, see Boyd and Vandenberghe [15], Section 2.4.
If $A$ is symmetric positive definite, it is easily checked that $A^{-1}$ is also symmetric positive definite. Also, if $C$ is a symmetric positive definite $m \times m$ matrix and $A$ is an $m \times n$ matrix of rank $n$ (and so $m \geq n$), then $A^\top C A$ is symmetric positive definite.
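These criteria are easy to exercise numerically. The following is a minimal sketch (in Python with NumPy; the helper name and the matrices are our own illustrative choices, not part of the text) that tests definiteness via the eigenvalue criterion of Definition 18.1 and checks the claim about $A^\top C A$:

```python
import numpy as np

def is_positive_definite(M, tol=1e-12):
    """Test M ≻ 0 via the eigenvalue criterion of Definition 18.1;
    M is assumed symmetric."""
    return bool(np.all(np.linalg.eigvalsh(M) > tol))

# C is symmetric positive definite (3 x 3); A has rank 2 (m = 3 >= n = 2).
C = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

print(is_positive_definite(C))            # True
print(is_positive_definite(A.T @ C @ A))  # True: A^T C A is positive definite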
We can now prove that
$$P(x) = \frac{1}{2} x^\top A x - x^\top b$$
has a global minimum when $A$ is symmetric positive definite.
Proposition 18.2. Given a quadratic function
$$P(x) = \frac{1}{2} x^\top A x - x^\top b,$$
if $A$ is symmetric positive definite, then $P(x)$ has a unique global minimum for the solution of the linear system $Ax = b$. The minimum value of $P(x)$ is
$$P(A^{-1}b) = -\frac{1}{2} b^\top A^{-1} b.$$
Proof. Since $A$ is positive definite, it is invertible, since its eigenvalues are all strictly positive. Let $x = A^{-1}b$, and compute $P(y) - P(x)$ for any $y \in \mathbb{R}^n$. Since $Ax = b$, we get
$$\begin{aligned} P(y) - P(x) &= \frac{1}{2} y^\top A y - y^\top b - \frac{1}{2} x^\top A x + x^\top b \\ &= \frac{1}{2} y^\top A y - y^\top A x + \frac{1}{2} x^\top A x \\ &= \frac{1}{2} (y - x)^\top A (y - x). \end{aligned}$$
Since $A$ is positive definite, the last expression is nonnegative, and it is zero iff $y = x$. Thus
$$P(y) \geq P(x)$$
for all $y \in \mathbb{R}^n$, with equality iff $y = x$, which proves that $x = A^{-1}b$ is the unique global minimum of $P(x)$. A simple computation yields
$$P(A^{-1}b) = -\frac{1}{2} b^\top A^{-1} b.$$
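Numerically, Proposition 18.2 says that minimizing $P$ reduces to one linear solve. A small sketch (Python/NumPy; the data are arbitrary illustrative choices):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])          # symmetric positive definite (illustrative)
b = np.array([1.0, -1.0])
P = lambda x: 0.5 * x @ A @ x - x @ b

x_star = np.linalg.solve(A, b)      # the unique minimizer satisfies A x = b
print(np.isclose(P(x_star), -0.5 * b @ x_star))   # True: P(A^{-1}b) = -b^T A^{-1} b / 2

# P(y) >= P(x*) at random trial points, as Proposition 18.2 guarantees.
rng = np.random.default_rng(0)
print(all(P(y) >= P(x_star) for y in rng.normal(size=(1000, 2))))   # True
```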
Remarks:
(1) The quadratic function $P(x)$ is also given by
$$P(x) = \frac{1}{2} x^\top A x - b^\top x,$$
but the definition using $x^\top b$ is more convenient for the proof of Proposition 18.2.
(2) If $P(x)$ contains a constant term $c \in \mathbb{R}$, so that
$$P(x) = \frac{1}{2} x^\top A x - x^\top b + c,$$
the proof of Proposition 18.2 still shows that $P(x)$ has a unique global minimum for $x = A^{-1}b$, but the minimal value is
$$P(A^{-1}b) = -\frac{1}{2} b^\top A^{-1} b + c.$$
Thus, when the energy function $P(x)$ of a system is given by a quadratic function
$$P(x) = \frac{1}{2} x^\top A x - x^\top b,$$
where $A$ is symmetric positive definite, finding the global minimum of $P(x)$ is equivalent to solving the linear system $Ax = b$. Sometimes, it is useful to recast a linear problem $Ax = b$
as a variational problem (finding the minimum of some energy function). However, very
often, a minimization problem comes with extra constraints that must be satisfied for all
admissible solutions. For instance, we may want to minimize the quadratic function
$$Q(y_1, y_2) = \frac{1}{2}\left(y_1^2 + y_2^2\right)$$
subject to the constraint
$$2y_1 - y_2 = 5.$$
The solution for which $Q(y_1, y_2)$ is minimum is no longer $(y_1, y_2) = (0, 0)$, but instead $(y_1, y_2) = (2, -1)$, as will be shown later.
Geometrically, the graph of the function defined by $z = Q(y_1, y_2)$ in $\mathbb{R}^3$ is a paraboloid of revolution $P$ with axis of revolution $Oz$. The constraint
$$2y_1 - y_2 = 5$$
corresponds to the vertical plane $H$ parallel to the $z$-axis and containing the line of equation $2y_1 - y_2 = 5$ in the $xy$-plane. Thus, the constrained minimum of $Q$ is located on the parabola that is the intersection of the paraboloid $P$ with the plane $H$.
A nice way to solve constrained minimization problems of the above kind is to use the
method of Lagrange multipliers. But first, let us define precisely what kind of minimization
problems we intend to solve.
Definition 18.3. The quadratic constrained minimization problem consists in minimizing a quadratic function
$$Q(y) = \frac{1}{2} y^\top C^{-1} y - b^\top y$$
subject to the linear constraints
$$A^\top y = f,$$
where $C^{-1}$ is an $m \times m$ symmetric positive definite matrix, $A$ is an $m \times n$ matrix of rank $n$ (so that $m \geq n$), and where $b, y \in \mathbb{R}^m$ (viewed as column vectors) and $f \in \mathbb{R}^n$ (viewed as a column vector).
The reason for using C−1 instead of C is that the constrained minimization problem has
an interpretation as a set of equilibrium equations in which the matrix that arises naturally
is C (see Strang [100]). Since C and C−1 are both symmetric positive definite, this doesn’t
make any difference, but it seems preferable to stick to Strang’s notation.
The method of Lagrange consists in incorporating the $n$ constraints $A^\top y = f$ into the quadratic function $Q(y)$ by introducing extra variables $\lambda = (\lambda_1, \ldots, \lambda_n)$ called Lagrange multipliers, one for each constraint. We form the Lagrangian
$$L(y, \lambda) = Q(y) + \lambda^\top (A^\top y - f) = \frac{1}{2} y^\top C^{-1} y - (b - A\lambda)^\top y - \lambda^\top f.$$
We shall prove that our constrained minimization problem has a unique solution given by the system of linear equations
$$\begin{aligned} C^{-1} y + A\lambda &= b, \\ A^\top y &= f, \end{aligned}$$
which can be written in matrix form as
$$\begin{pmatrix} C^{-1} & A \\ A^\top & 0 \end{pmatrix} \begin{pmatrix} y \\ \lambda \end{pmatrix} = \begin{pmatrix} b \\ f \end{pmatrix}.$$
Note that the matrix of this system is symmetric. Eliminating $y$ from the first equation
$$C^{-1} y + A\lambda = b,$$
we get
$$y = C(b - A\lambda),$$
and substituting into the second equation, we get
$$A^\top C (b - A\lambda) = f,$$
that is,
$$A^\top C A \lambda = A^\top C b - f.$$
However, by a previous remark, since $C$ is symmetric positive definite and the columns of $A$ are linearly independent, $A^\top C A$ is symmetric positive definite, and thus invertible. Note that this way of solving the system requires solving for the Lagrange multipliers first.
Letting $e = b - A\lambda$, we also note that the system
$$\begin{pmatrix} C^{-1} & A \\ A^\top & 0 \end{pmatrix} \begin{pmatrix} y \\ \lambda \end{pmatrix} = \begin{pmatrix} b \\ f \end{pmatrix}$$
is equivalent to the system
$$\begin{aligned} e &= b - A\lambda, \\ y &= Ce, \\ A^\top y &= f. \end{aligned}$$
The latter system is called the equilibrium equations by Strang [100]. Indeed, Strang shows that the equilibrium equations of many physical systems can be put in the above form. This includes spring-mass systems, electrical networks, and trusses, which are structures built from elastic bars. In each case, $y, e, b, C, \lambda, f$, and $K = A^\top C A$ have a physical interpretation. The matrix $K = A^\top C A$ is usually called the stiffness matrix. Again, the reader is referred to Strang [100].
In order to prove that our constrained minimization problem has a unique solution, we proceed to prove that the constrained minimization of $Q(y)$ subject to $A^\top y = f$ is equivalent to the unconstrained maximization of another function $-P(\lambda)$. We get $P(\lambda)$ by minimizing the Lagrangian $L(y, \lambda)$ treated as a function of $y$ alone. Since $C^{-1}$ is symmetric positive definite and
$$L(y, \lambda) = \frac{1}{2} y^\top C^{-1} y - (b - A\lambda)^\top y - \lambda^\top f,$$
by Proposition 18.2 the global minimum (with respect to $y$) of $L(y, \lambda)$ is obtained for the solution $y$ of
$$C^{-1} y = b - A\lambda,$$
that is, when
$$y = C(b - A\lambda),$$
and the minimum of $L(y, \lambda)$ is
$$\min_y L(y, \lambda) = -\frac{1}{2} (A\lambda - b)^\top C (A\lambda - b) - \lambda^\top f.$$
Letting
$$P(\lambda) = \frac{1}{2} (A\lambda - b)^\top C (A\lambda - b) + \lambda^\top f,$$
we claim that the solution of the constrained minimization of $Q(y)$ subject to $A^\top y = f$ is equivalent to the unconstrained maximization of $-P(\lambda)$. Of course, since we minimized $L(y, \lambda)$ with respect to $y$, we have
$$L(y, \lambda) \geq -P(\lambda)$$
for all $y$ and all $\lambda$. However, when the constraint $A^\top y = f$ holds, $L(y, \lambda) = Q(y)$, and thus for any admissible $y$, which means that $A^\top y = f$, we have
$$\min_y Q(y) \geq \max_\lambda -P(\lambda).$$
In order to prove that the unique minimum of the constrained problem $Q(y)$ subject to $A^\top y = f$ is the unique maximum of $-P(\lambda)$, we compute $Q(y) + P(\lambda)$.
Proposition 18.3. The quadratic constrained minimization problem of Definition 18.3 has a unique solution $(y, \lambda)$ given by the system
$$\begin{pmatrix} C^{-1} & A \\ A^\top & 0 \end{pmatrix} \begin{pmatrix} y \\ \lambda \end{pmatrix} = \begin{pmatrix} b \\ f \end{pmatrix}.$$
Furthermore, the component $\lambda$ of the above solution is the unique value for which $-P(\lambda)$ is maximum.
Proof. As we suggested earlier, let us compute $Q(y) + P(\lambda)$, assuming that the constraint $A^\top y = f$ holds. Eliminating $f$, since $b^\top y = y^\top b$ and $\lambda^\top A^\top y = y^\top A \lambda$, we get
$$\begin{aligned} Q(y) + P(\lambda) &= \frac{1}{2} y^\top C^{-1} y - b^\top y + \frac{1}{2} (A\lambda - b)^\top C (A\lambda - b) + \lambda^\top f \\ &= \frac{1}{2} (C^{-1} y + A\lambda - b)^\top C (C^{-1} y + A\lambda - b). \end{aligned}$$
Since $C$ is positive definite, the last expression is nonnegative. In fact, it is null iff
$$C^{-1} y + A\lambda - b = 0,$$
that is,
$$C^{-1} y + A\lambda = b.$$
But then the unique constrained minimum of $Q(y)$ subject to $A^\top y = f$ is equal to the unique maximum of $-P(\lambda)$ exactly when $A^\top y = f$ and $C^{-1} y + A\lambda = b$, which proves the proposition.
Remarks:
(1) There is a form of duality going on in this situation. The constrained minimization of $Q(y)$ subject to $A^\top y = f$ is called the primal problem, and the unconstrained maximization of $-P(\lambda)$ is called the dual problem. Duality is the fact, stated slightly loosely, that
$$\min_y Q(y) = \max_\lambda -P(\lambda).$$
Recalling that $e = b - A\lambda$, since
$$P(\lambda) = \frac{1}{2} (A\lambda - b)^\top C (A\lambda - b) + \lambda^\top f,$$
we can also write
$$P(\lambda) = \frac{1}{2} e^\top C e + \lambda^\top f.$$
This expression often represents the total potential energy of a system. Again, the optimal solution is the one that minimizes the potential energy (and thus maximizes $-P(\lambda)$).
(2) It is immediately verified that the equations of Proposition 18.3 are equivalent to the equations stating that the partial derivatives of the Lagrangian $L(y, \lambda)$ are null:
$$\begin{aligned} \frac{\partial L}{\partial y_i} &= 0, \quad i = 1, \ldots, m, \\ \frac{\partial L}{\partial \lambda_j} &= 0, \quad j = 1, \ldots, n. \end{aligned}$$
Thus, the constrained minimum of $Q(y)$ subject to $A^\top y = f$ is an extremum of the Lagrangian $L(y, \lambda)$. As we showed in Proposition 18.3, this extremum corresponds to simultaneously minimizing $L(y, \lambda)$ with respect to $y$ and maximizing $L(y, \lambda)$ with respect to $\lambda$. Geometrically, such a point is a saddle point for $L(y, \lambda)$.
(3) The Lagrange multipliers sometimes have a natural physical meaning. For example, in
the spring-mass system they correspond to node displacements. In some general sense,
Lagrange multipliers are correction terms needed to satisfy equilibrium equations and
the price paid for the constraints. For more details, see Strang [100].
Going back to the constrained minimization of $Q(y_1, y_2) = \frac{1}{2}(y_1^2 + y_2^2)$ subject to
$$2y_1 - y_2 = 5,$$
the Lagrangian is
$$L(y_1, y_2, \lambda) = \frac{1}{2}\left(y_1^2 + y_2^2\right) + \lambda(2y_1 - y_2 - 5),$$
and the equations stating that the Lagrangian has a saddle point are
$$\begin{aligned} y_1 + 2\lambda &= 0, \\ y_2 - \lambda &= 0, \\ 2y_1 - y_2 - 5 &= 0. \end{aligned}$$
We obtain the solution $(y_1, y_2, \lambda) = (2, -1, -1)$.
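On this example, the duality of Remark (1) and the identity $Q(y^*) + P(\lambda^*) = 0$ from the proof of Proposition 18.3 are easy to verify numerically. A minimal sketch (Python/NumPy, same data as the example; the grid scan is our own illustrative check):

```python
import numpy as np

# Same data as the example: C = I, b = 0, A = [[2], [-1]], f = [5].
A = np.array([[2.0], [-1.0]])
b = np.zeros(2)
f = np.array([5.0])
C = np.eye(2)

Q = lambda y: 0.5 * y @ y                                  # C^{-1} = I and b = 0
P = lambda lam: 0.5 * (A @ lam - b) @ C @ (A @ lam - b) + lam @ f

y_star, lam_star = np.array([2.0, -1.0]), np.array([-1.0])
print(Q(y_star), -P(lam_star))      # 2.5 2.5: min Q(y) = max -P(λ)
print(Q(y_star) + P(lam_star))      # 0.0: the identity from Proposition 18.3

# A coarse scan over λ confirms that -P peaks at λ* = -1.
grid = np.linspace(-3.0, 1.0, 9)
print(max(-P(np.array([t])) for t in grid))                # 2.5
```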
Much more should be said about the use of Lagrange multipliers in optimization or
variational problems. This is a vast topic. Least squares methods and Lagrange multipliers
are used to tackle many problems in computer graphics and computer vision; see Trucco
and Verri [107], Metaxas [76], Jain, Kasturi, and Schunck [58], Faugeras [35], and Foley, van
Dam, Feiner, and Hughes [36]. For a lucid introduction to optimization methods, see Ciarlet
[22].
18.2 Quadratic Optimization: The General Case
In this section, we complete the study initiated in Section 18.1 and give necessary and sufficient conditions for the quadratic function $\frac{1}{2} x^\top A x + x^\top b$ to have a global minimum. We begin with the following simple fact:
Proposition 18.4. If $A$ is an invertible symmetric matrix, then the function
$$f(x) = \frac{1}{2} x^\top A x + x^\top b$$
has a minimum value iff $A \succ 0$, in which case this optimal value is obtained for a unique value of $x$, namely $x^* = -A^{-1}b$, and with
$$f(-A^{-1}b) = -\frac{1}{2} b^\top A^{-1} b.$$
Proof. Observe that
$$\frac{1}{2}(x + A^{-1}b)^\top A (x + A^{-1}b) = \frac{1}{2} x^\top A x + x^\top b + \frac{1}{2} b^\top A^{-1} b.$$
Thus,
$$f(x) = \frac{1}{2} x^\top A x + x^\top b = \frac{1}{2}(x + A^{-1}b)^\top A (x + A^{-1}b) - \frac{1}{2} b^\top A^{-1} b.$$
If $A$ has some negative eigenvalue, say $-\lambda$ (with $\lambda > 0$), if we pick any eigenvector $u$ of $A$ associated with $-\lambda$, then for any $\alpha \in \mathbb{R}$ with $\alpha \neq 0$, if we let $x = \alpha u - A^{-1}b$, then since $Au = -\lambda u$, we get
$$\begin{aligned} f(x) &= \frac{1}{2}(x + A^{-1}b)^\top A (x + A^{-1}b) - \frac{1}{2} b^\top A^{-1} b \\ &= \frac{1}{2} \alpha u^\top A \alpha u - \frac{1}{2} b^\top A^{-1} b \\ &= -\frac{1}{2} \alpha^2 \lambda \|u\|^2 - \frac{1}{2} b^\top A^{-1} b, \end{aligned}$$
and since $\alpha$ can be made as large as we want and $\lambda > 0$, we see that $f$ has no minimum. Consequently, in order for $f$ to have a minimum, we must have $A \succeq 0$, and since $A$ is assumed invertible, in fact $A \succ 0$. In this case, since $(x + A^{-1}b)^\top A (x + A^{-1}b) \geq 0$, it is clear that the minimum value of $f$ is achieved when $x + A^{-1}b = 0$, that is, $x = -A^{-1}b$.
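The unboundedness argument in the proof is easy to watch numerically. In the sketch below (Python/NumPy, with an indefinite $A$ chosen arbitrarily for illustration), moving along $x = \alpha u - A^{-1}b$, where $u$ is an eigenvector for a negative eigenvalue, drives $f$ down without bound:

```python
import numpy as np

# An invertible symmetric A with a negative eigenvalue (illustrative choice).
A = np.array([[1.0, 0.0],
              [0.0, -2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x + x @ b

u = np.array([0.0, 1.0])            # eigenvector for the eigenvalue -2
shift = -np.linalg.solve(A, b)      # the -A^{-1} b term from the proof
for alpha in [1.0, 10.0, 100.0]:
    print(f(alpha * u + shift))     # -1.25, -100.25, -10000.25: no minimum
```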
Let us now consider the case of an arbitrary symmetric matrix A.
Proposition 18.5. If $A$ is a symmetric matrix, then the function
$$f(x) = \frac{1}{2} x^\top A x + x^\top b$$
has a minimum value iff $A \succeq 0$ and $(I - AA^+)b = 0$, in which case this minimum value is
$$p^* = -\frac{1}{2} b^\top A^+ b.$$
Furthermore, if $A = U^\top \Sigma U$ is an SVD of $A$, then the optimal value is achieved by all $x \in \mathbb{R}^n$ of the form
$$x = -A^+ b + U^\top \begin{pmatrix} 0 \\ z \end{pmatrix},$$
for any $z \in \mathbb{R}^{n-r}$, where $r$ is the rank of $A$.
Proof. The case that $A$ is invertible is taken care of by Proposition 18.4, so we may assume that $A$ is singular. If $A$ has rank $r < n$, then we can diagonalize $A$ as
$$A = U^\top \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix} U,$$
where $U$ is an orthogonal matrix and where $\Sigma_r$ is an $r \times r$ diagonal invertible matrix. Then we have
$$f(x) = \frac{1}{2} x^\top U^\top \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix} U x + x^\top U^\top U b = \frac{1}{2} (Ux)^\top \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix} U x + (Ux)^\top U b.$$
If we write
$$Ux = \begin{pmatrix} y \\ z \end{pmatrix} \quad \text{and} \quad Ub = \begin{pmatrix} c \\ d \end{pmatrix},$$
with $y, c \in \mathbb{R}^r$ and $z, d \in \mathbb{R}^{n-r}$, we get
$$\begin{aligned} f(x) &= \frac{1}{2} (Ux)^\top \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix} U x + (Ux)^\top U b \\ &= \frac{1}{2} (y^\top, z^\top) \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} y \\ z \end{pmatrix} + (y^\top, z^\top) \begin{pmatrix} c \\ d \end{pmatrix} \\ &= \frac{1}{2} y^\top \Sigma_r y + y^\top c + z^\top d. \end{aligned}$$
For $y = 0$, we get
$$f(x) = z^\top d,$$
so if $d \neq 0$, the function $f$ has no minimum, since $z$ can be chosen so that $z^\top d$ is arbitrarily negative. Therefore, if $f$ has a minimum, then $d = 0$. However, $d = 0$ means that
$$Ub = \begin{pmatrix} c \\ 0 \end{pmatrix},$$
which means that $b$ is in the range of $A$ (see Section 17.1), and this is equivalent to $(I - AA^+)b = 0$. If $d = 0$, then
$$f(x) = \frac{1}{2} y^\top \Sigma_r y + y^\top c,$$
and since $\Sigma_r$ is invertible, by Proposition 18.4, the function $f$ has a minimum iff $\Sigma_r \succ 0$, which is equivalent to $A \succeq 0$.
Therefore, we have proved that if $f$ has a minimum, then $(I - AA^+)b = 0$ and $A \succeq 0$. Conversely, if $(I - AA^+)b = 0$ and $A \succeq 0$, what we just did proves that $f$ does have a minimum.
When the above conditions hold, the minimum is achieved if $y = -\Sigma_r^{-1} c$, $z = 0$, and $d = 0$, that is, for $x^*$ given by
$$Ux^* = \begin{pmatrix} -\Sigma_r^{-1} c \\ 0 \end{pmatrix} \quad \text{and} \quad Ub = \begin{pmatrix} c \\ 0 \end{pmatrix},$$
from which we deduce that
$$x^* = -U^\top \begin{pmatrix} \Sigma_r^{-1} c \\ 0 \end{pmatrix} = -U^\top \begin{pmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{pmatrix} U b = -A^+ b,$$
and the minimum value of $f$ is
$$f(x^*) = -\frac{1}{2} b^\top A^+ b.$$
For any $x \in \mathbb{R}^n$ of the form
$$x = -A^+ b + U^\top \begin{pmatrix} 0 \\ z \end{pmatrix},$$
for any $z \in \mathbb{R}^{n-r}$, our previous calculations show that $f(x) = -\frac{1}{2} b^\top A^+ b$.
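Proposition 18.5 can be exercised on a small singular example. The sketch below (Python/NumPy, with a rank-1 matrix chosen for illustration) checks the range condition $(I - AA^+)b = 0$, recovers the optimal value $-\frac{1}{2} b^\top A^+ b$, and confirms that adding a null-space vector of $A$ to the minimizer leaves $f$ unchanged:

```python
import numpy as np

# Rank-1 PSD example: eigenvalues of A are 2 and 0, so A ⪰ 0 but A is singular.
A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
A_pinv = np.linalg.pinv(A)
f = lambda x, b: 0.5 * x @ A @ x + x @ b

# b in the range of A: (I - A A^+) b = 0, so a minimum exists.
b = np.array([1.0, 1.0])
print(np.allclose((np.eye(2) - A @ A_pinv) @ b, 0))       # True
x_star = -A_pinv @ b
print(f(x_star, b), -0.5 * b @ A_pinv @ b)                # both print -0.5

# The minimizer is not unique: adding a null-space vector of A (here any
# multiple of (1, -1)) leaves the value of f unchanged.
z = np.array([1.0, -1.0])
print(np.isclose(f(x_star + 3.0 * z, b), f(x_star, b)))   # True

# b not in the range of A: the condition fails and no minimum exists.
b_bad = np.array([1.0, 0.0])
print(np.allclose((np.eye(2) - A @ A_pinv) @ b_bad, 0))   # False
```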