$\bar A(s,t)(B(s,t)f)$ and $\bar B(s,t)(A(s,t)f)$ are defined, we have

(i)
$$\bar A(s,t)(B(s,t)f) = \bar B(s,t)(A(s,t)f) + \alpha_{\bar n}^\top(s)\,F(a,b)\,\alpha_{\bar n}(s)f \qquad (A.13)$$
where $\bar n = \max\{n,m\} - 1$ and $F(a,b)$ satisfies $\|F(a,b)\| \le c_1|\dot a| + c_2|\dot b|$ for some constants $c_1, c_2$

(ii)
$$\bar A(s,t)\,B(s,t)\,\frac{1}{\Lambda_0(s)}\,f = A(s,t)\cdot B(s,t)\,\frac{1}{\Lambda_0(s)}\,f + \alpha_n^\top(s)\,G(s,f,a,b) \qquad (A.14)$$
for any Hurwitz polynomial $\Lambda_0(s)$ of order greater than or equal to $m$, where $G(s,f,a,b)$ is defined as
$$G(s,f,a,b) = [g_n, \ldots, g_1, g_0]^\top, \qquad g_i = -\sum_{j=0}^{m} W_{jc}(s)\big[(W_{jb}(s)f)(\dot a_ib_j + a_i\dot b_j)\big]$$
and $W_{jc}$, $W_{jb}$ are strictly proper transfer functions that have the same poles as $\dfrac{1}{\Lambda_0(s)}$.
Proof (i) From the definition of $\bar A(s,t)$, $A(s,t)$, $\bar B(s,t)$, $B(s,t)$, we have
$$\bar A(s,t)(B(s,t)f) = \sum_{i=0}^{n}\sum_{j=0}^{m} s^i(a_ib_js^jf) \qquad (A.15)$$
$$\bar B(s,t)(A(s,t)f) = \sum_{i=0}^{n}\sum_{j=0}^{m} s^j(a_ib_js^if) \qquad (A.16)$$
Now we consider $s^i(a_ib_js^jf)$ and treat the three cases $i > j$, $i < j$, and $i = j$ separately. First, we suppose $i > j$ and write
$$s^i(a_ib_js^jf) = s^j\,s^{i-j}(a_ib_js^jf)$$
Using the identity $s(a_ib_js^lf) = a_ib_js^{l+1}f + (\dot a_ib_j + a_i\dot b_j)s^lf$, where $l$ is any integer, $i-j$ times, we can swap the operator $s^{i-j}$ with $a_ib_j$ and obtain
$$s^{i-j}(a_ib_js^jf) = a_ib_js^if + \sum_{k=1}^{i-j} s^{i-j-k}(\dot a_ib_j + a_i\dot b_j)s^{j-1+k}f$$
and, therefore,
$$s^i(a_ib_js^jf) = s^j(a_ib_js^if) + \sum_{k=1}^{i-j} s^{i-k}(\dot a_ib_j + a_i\dot b_j)s^{j-1+k}f, \quad i > j \qquad (A.17)$$
Similarly, for $i < j$, we write
$$s^i(a_ib_js^jf) = s^i((a_ib_js^{j-i})s^if)$$
Now using $a_ib_js^lf = s(a_ib_js^{l-1}f) - (\dot a_ib_j + a_i\dot b_j)s^{l-1}f$, where $l$ is any integer, $j-i$ times, we have
$$(a_ib_js^{j-i})s^if = s^{j-i}(a_ib_js^if) - \sum_{k=1}^{j-i} s^{j-i-k}(\dot a_ib_j + a_i\dot b_j)s^{i-1+k}f$$
and, therefore,
$$s^i(a_ib_js^jf) = s^j(a_ib_js^if) - \sum_{k=1}^{j-i} s^{j-k}(\dot a_ib_j + a_i\dot b_j)s^{i-1+k}f, \quad i < j \qquad (A.18)$$
For i = j, it is obvious that
$$s^i(a_ib_js^jf) = s^j(a_ib_js^if), \quad i = j \qquad (A.19)$$
Combining (A.17), (A.18) and (A.19), we have
$$\bar A(s,t)(B(s,t)f) = \sum_{i=0}^{n}\sum_{j=0}^{m} s^j(a_ib_js^if) + r_1 = \bar B(s,t)(A(s,t)f) + r_1 \qquad (A.20)$$
where
$$r_1 = \sum_{\substack{i=0,\;j=0\\ j<i}}^{n,\;m}\;\sum_{k=1}^{i-j} s^{i-k}(\dot a_ib_j + a_i\dot b_j)s^{j-1+k}f \;-\; \sum_{\substack{i=0,\;j=0\\ j>i}}^{n,\;m}\;\sum_{k=1}^{j-i} s^{j-k}(\dot a_ib_j + a_i\dot b_j)s^{i-1+k}f$$
Note that for $0 \le i \le n$, $0 \le j \le m$, and $1 \le k \le |i-j|$, we have
$$\begin{aligned}
&j \le i-k, \quad j+k-1 \le i-1, && \text{if } i \ge j\\
&i \le j-k, \quad i+k-1 \le j-1, && \text{if } i < j
\end{aligned}$$
Therefore, all the $s$-terms in $r_1$ (before and after the term $\dot a_ib_j + a_i\dot b_j$) are of order at most $\max\{n,m\} - 1$, and $r_1$ can be expressed as
$$r_1 = \alpha_{\bar n}^\top(s)\,F(a,b)\,\alpha_{\bar n}(s)f$$
where $\bar n = \max\{n,m\} - 1$ and $F(a,b) \in \mathbb{R}^{\bar n \times \bar n}$ is a time-varying matrix whose elements are linear combinations of $\dot a_ib_j + a_i\dot b_j$. Because $a, b \in \mathcal{L}_\infty$, it follows from the definition of $F(a,b)$ that
$$\|F(a,b)\| \le c_1|\dot a| + c_2|\dot b|$$
(ii) Applying Lemma A.1 with $W(s) = \dfrac{s^j}{\Lambda_0(s)}$, we have
$$a_ib_j\,\frac{s^j}{\Lambda_0(s)}\,f = \frac{s^j}{\Lambda_0(s)}(a_ib_jf) - W_{jc}(s)\big[(W_{jb}(s)f)(\dot a_ib_j + a_i\dot b_j)\big]$$
where $W_{jc}(s)$, $W_{jb}(s)$ are strictly proper transfer functions, which have the same poles as $\dfrac{1}{\Lambda_0(s)}$. Therefore,
$$\begin{aligned}
\bar A(s,t)\,B(s,t)\,\frac{1}{\Lambda_0(s)}\,f
&= \sum_{i=0}^{n}\sum_{j=0}^{m} s^i\Big(a_ib_j\,\frac{s^j}{\Lambda_0(s)}\,f\Big)\\
&= \sum_{i=0}^{n}\sum_{j=0}^{m} \frac{s^{i+j}}{\Lambda_0(s)}(a_ib_jf) - \sum_{i=0}^{n}\sum_{j=0}^{m} s^iW_{jc}\big[(W_{jb}f)(\dot a_ib_j + a_i\dot b_j)\big]\\
&= A(s,t)\cdot B(s,t)\,\frac{1}{\Lambda_0(s)}\,f + r_2
\end{aligned} \qquad (A.21)$$
where
$$r_2 = -\sum_{i=0}^{n}\sum_{j=0}^{m} s^iW_{jc}\big[(W_{jb}f)(\dot a_ib_j + a_i\dot b_j)\big]$$
From the definition of $r_2$, we can write
$$r_2 = \alpha_n^\top(s)\,G(s,f,a,b)$$
by defining
$$G(s,f,a,b) = [g_n, \ldots, g_1, g_0]^\top, \qquad g_i = -\sum_{j=0}^{m} W_{jc}(s)\big[(W_{jb}(s)f)(\dot a_ib_j + a_i\dot b_j)\big]$$
✷
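As a quick sanity check of identity (A.13), the following sympy sketch (not part of the original text; the case $n = m = 1$ and the symbol names are chosen only for illustration) verifies part (i) in the smallest nontrivial case, where $\bar n = 0$, $\alpha_{\bar n}(s) = 1$, and the correction term reduces to a scalar multiple of $f$:

```python
import sympy as sp

t = sp.symbols('t')
a0, a1, b0, b1, f = [sp.Function(name)(t) for name in ('a0', 'a1', 'b0', 'b1', 'f')]

# A(s,t) = a1*s + a0 and B(s,t) = b1*s + b0 acting on f, with s interpreted as d/dt
Af = a1 * f.diff(t) + a0 * f            # A(s,t) f
Bf = b1 * f.diff(t) + b0 * f            # B(s,t) f

# Bar operators: the coefficient multiplies the signal before differentiation
AbarBf = (a1 * Bf).diff(t) + a0 * Bf    # Abar(s,t)(B(s,t) f)
BbarAf = (b1 * Af).diff(t) + b0 * Af    # Bbar(s,t)(A(s,t) f)

# Scalar F(a,b) predicted by (A.13) for n = m = 1:
# a linear combination of (da_i/dt * b_j + a_i * db_j/dt) terms
F = (a1.diff(t) * b0 + a1 * b0.diff(t)) - (a0.diff(t) * b1 + a0 * b1.diff(t))

print(sp.simplify(AbarBf - BbarAf - F * f))   # prints 0
```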
B Optimization Techniques
An important part of every adaptive control scheme is the on-line estimator
or adaptive law used to provide an estimate of the plant or controller param-
eters at each time t. Most of these adaptive laws are derived by minimizing
certain cost functions with respect to the estimated parameters. The type of
the cost function and method of minimization determines the properties of
the resulting adaptive law as well as the overall performance of the adaptive
scheme.
In this section we introduce some simple optimization techniques that
include the method of steepest descent, referred to as the gradient method,
Newton’s method and the gradient projection method for constrained mini-
mization problems.
B.1 Notation and Mathematical Background
A real-valued function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be continuously differentiable if the partial derivatives $\frac{\partial f(x)}{\partial x_1}, \cdots, \frac{\partial f(x)}{\partial x_n}$ exist for each $x \in \mathbb{R}^n$ and are continuous functions of $x$. In this case, we write $f \in C^1$. More generally, we write $f \in C^m$ if all partial derivatives of order $m$ exist and are continuous functions of $x$.
If $f \in C^1$, the gradient of $f$ at a point $x \in \mathbb{R}^n$ is defined to be the column vector
$$\nabla f(x) = \begin{bmatrix} \dfrac{\partial f(x)}{\partial x_1}\\ \vdots\\ \dfrac{\partial f(x)}{\partial x_n} \end{bmatrix}$$
If $f \in C^2$, the Hessian of $f$ at $x$ is defined to be the symmetric $n \times n$ matrix having $\partial^2 f(x)/\partial x_i\partial x_j$ as the $ij$th element, i.e.,
$$\nabla^2 f(x) = \left[\frac{\partial^2 f(x)}{\partial x_i\partial x_j}\right]_{n \times n}$$
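To make these definitions concrete, here is a small sympy sketch (the function $f(x) = x_1^2 + x_1x_2 + 2x_2^2$ is our own illustrative choice, not from the text) that computes the gradient vector and Hessian matrix symbolically:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 + x1*x2 + 2*x2**2                            # example f : R^2 -> R

grad = sp.Matrix([sp.diff(f, v) for v in (x1, x2)])    # column vector of first partials
hess = sp.hessian(f, (x1, x2))                         # matrix of second partials

print(grad)   # Matrix([[2*x1 + x2], [x1 + 4*x2]])
print(hess)   # Matrix([[2, 1], [1, 4]])
```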
A subset S of Rn is said to be convex if for every x, y ∈ S and α ∈ [0 , 1],
we have αx + (1 − α) y ∈ S.
A function f : S → R is said to be convex over the convex set S if for
every x, y ∈ S and α ∈ [0 , 1] we have
f ( αx + (1 − α) y) ≤ αf ( x) + (1 − α) f ( y)
Let $f \in C^1$ over an open convex set $S$; then $f$ is convex over $S$ if and only if
$$f(y) \ge f(x) + (\nabla f(x))^\top(y - x), \qquad \forall x, y \in S \qquad (B.1)$$
If f ∈ C 2 over S and ∇ 2 f ( x) ≥ 0 ∀x ∈ S, then f is convex over S.
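For the example function used above, the Hessian is the constant matrix $\left[\begin{smallmatrix}2 & 1\\ 1 & 4\end{smallmatrix}\right]$ with eigenvalues $3 \pm \sqrt{2} > 0$, so the second-order condition $\nabla^2 f(x) \ge 0$ holds for every $x$ and $f$ is convex over $\mathbb{R}^2$. A minimal numerical check (illustrative only):

```python
import numpy as np

H = np.array([[2.0, 1.0],
              [1.0, 4.0]])          # Hessian of the example f above
print(np.linalg.eigvalsh(H))        # [1.585..., 4.414...]; all positive, so f is convex
```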
Let us now consider the following unconstrained minimization problem
minimize J( θ)
subject to θ ∈ Rn
(B.2)
where J : Rn → R is a given function. We say that the vector θ∗ is a global
minimum for (B.2) if
J( θ∗) ≤ J( θ)
∀θ ∈ Rn
A necessary and sufficient condition satisfied by the global minimum θ∗ is
given by the following lemma.
Lemma B.1 Assume that J ∈ C 1 and is convex over Rn. Then θ∗ is a
global minimum for (B.2) iff
∇J( θ∗) = 0
The proof of Lemma B.1 can be found in [132, 196].
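To illustrate Lemma B.1 with a convex quadratic cost (our own example, not from the text), take $J(\theta) = \frac{1}{2}\theta^\top Q\theta - b^\top\theta$ with $Q = Q^\top > 0$; then $\nabla J(\theta) = Q\theta - b$, and the condition $\nabla J(\theta^*) = 0$ gives the unique global minimum $\theta^* = Q^{-1}b$:

```python
import numpy as np

Q = np.array([[2.0, 1.0],
              [1.0, 4.0]])                  # Q = Q^T > 0, so J is convex
b = np.array([1.0, -1.0])

def J(theta):
    return 0.5 * theta @ Q @ theta - b @ theta

def grad_J(theta):
    return Q @ theta - b

theta_star = np.linalg.solve(Q, b)          # solves grad_J(theta*) = 0
print(grad_J(theta_star))                   # ~[0, 0]

# J is no smaller at randomly perturbed points, as Lemma B.1 predicts
rng = np.random.default_rng(0)
print(all(J(theta_star) <= J(theta_star + rng.standard_normal(2)) for _ in range(5)))  # True
```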
A vector $\bar\theta$ is called a regular point of the surface $S_\theta = \{\theta \in \mathbb{R}^n \mid g(\theta) = 0\}$ if $\nabla g(\bar\theta) \ne 0$. At a regular point $\bar\theta$, the set
$$M(\bar\theta) = \{\theta \in \mathbb{R}^n \mid \theta^\top\nabla g(\bar\theta) = 0\}$$
is called the tangent plane of $g$ at $\bar\theta$.
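As a simple illustration (our own example, not from the text): for $g(\theta) = \theta_1^2 + \theta_2^2 - 1$, the surface $S_\theta$ is the unit circle and $\nabla g(\theta) = 2\theta \ne 0$ at every $\theta \in S_\theta$, so every point of $S_\theta$ is regular; at $\bar\theta = [1, 0]^\top$ we have $\nabla g(\bar\theta) = [2, 0]^\top$, and the tangent plane is $M(\bar\theta) = \{\theta \in \mathbb{R}^2 \mid \theta_1 = 0\}$.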
B.2 The Method of Steepest Descent (Gradient Method)
This is one of the oldest and most widely known methods for solving the
unconstrained minimization problem (B.2). It is also one of the simplest for
which a satisfactory analysis exists. More sophisticated methods are often
motivated by an attempt to modify the basic steepest descent technique for
better convergence properties [21, 132, 196]. The method of steepest descent
proceeds from an initial approximation θ 0 for the minimum θ∗ to successive
points θ 1 , θ 2 , · · · in Rn in an iterative manner until some stopping condition
is satisfied. Given the current point θk, the point θk+1 is obtained by a linear
search in the direction dk where
dk = −∇J( θk)
It can be shown [196] that dk is the direction from θk in which the initial rate
of decrease of J( θ) is the greatest. Therefore, the sequence {θk} is defined
by
θk+1 = θk + λkdk = θk − λk∇J( θk) ,
( k = 0 , 1 , 2 , · · ·)
(B.3)
where θ 0 is given and λk, known as the step size or step length, is determined
by the linear search method, so that θk+1 minimizes J( θ) in the direction dk
from θk. A simpler expression for θk+1 can be obtained by setting λk = λ ∀k,
i.e.,
θk+1 = θk − λ∇J( θk)
(B.4)
In this case, the linear search for λk is not required, though the choice of the
step length λ is a compromise between accuracy and efficiency.
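A minimal implementation of the fixed-step iteration (B.4), applied to the convex quadratic example introduced above (step size, starting point, and iteration count are illustrative choices, not prescriptions from the text):

```python
import numpy as np

Q = np.array([[2.0, 1.0],
              [1.0, 4.0]])
b = np.array([1.0, -1.0])

def grad_J(theta):
    return Q @ theta - b                  # gradient of J(theta) = 0.5*theta'Q*theta - b'theta

theta = np.zeros(2)                       # theta_0
lam = 0.1                                 # fixed step length lambda
for k in range(200):
    theta = theta - lam * grad_J(theta)   # iteration (B.4)

print(theta)                              # close to the minimizer
print(np.linalg.solve(Q, b))              # exact minimizer Q^{-1} b for comparison
```

With the line-search form (B.3), each $\lambda_k$ would instead be chosen to minimize $J$ along $d_k$; the fixed-step version trades that accuracy for simplicity, as noted above.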
Considering infinitesimally small step lengths, (B.4) can be converted to
the continuous-time differential equation
˙ θ = −∇J( θ( t)) , θ( t 0) = θ 0
(B.5)
whose solution θ( t) is the descent path in the time domain starting from
t = t 0.
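The gradient flow (B.5) can also be simulated directly. A short sketch using scipy's ODE solver (solver, horizon, and initial condition are illustrative) shows the descent path $\theta(t)$ approaching the same minimizer:

```python
import numpy as np
from scipy.integrate import solve_ivp

Q = np.array([[2.0, 1.0],
              [1.0, 4.0]])
b = np.array([1.0, -1.0])

def theta_dot(t, theta):
    return -(Q @ theta - b)               # d(theta)/dt = -grad J(theta), equation (B.5)

sol = solve_ivp(theta_dot, (0.0, 10.0), np.zeros(2))
print(sol.y[:, -1])                       # approaches Q^{-1} b as t grows
print(np.linalg.solve(Q, b))
```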
The direction of steepest descent $d = -\nabla J$ can be scaled by a constant positive definite matrix $\Gamma = \Gamma^\top$ as follows: We let $\Gamma = \Gamma_1\Gamma_1^\top$, where $\Gamma_1$ is an $n \times n$ nonsingular matrix, and consider the vector $\bar\theta \in \mathbb{R}^n$ given by
$$\Gamma_1\bar\theta = \theta$$
Then the minimization problem (B.2) is equivalent to
$$\begin{aligned}
&\text{minimize } \bar J(\bar\theta) = J(\Gamma_1\bar\theta)\\
&\text{subject to } \bar\theta \in \mathbb{R}^n
\end{aligned} \qquad (B.6)$$
If $\bar\theta^*$ is a minimum of $\bar J$, the vector $\theta^* = \Gamma_1\bar\theta^*$ is a minimum of $J$. The steepest descent for (B.6) is given by
$$\bar\theta_{k+1} = \bar\theta_k - \lambda\nabla\bar J(\bar\theta_k) \qquad (B.7)$$
Because $\nabla\bar J(\bar\theta) = \dfrac{\partial J(\Gamma_1\bar\theta)}{\partial\bar\theta} = \Gamma_1^\top\nabla J(\theta)$, where $\theta = \Gamma_1\bar\theta$, it follows from (B.7) that