CHAPTER 7

MINIMIZATION OF SUM OF SQUARES FUNCTIONS FOR MODELS NONLINEAR IN PARAMETERS
7.1 INTRODUCTION
This chapter is concerned with methods for minimizing a general sum of squares function when the dependent variable is nonlinear in terms of the parameters. The function to be extremized is assumed to be known, although selection of such a function is not a trivial matter and comprises the first optimization problem in parameter estimation. (Extremize means either maximize or minimize.) The extremization of the chosen function, the topic of this chapter, is the second optimization problem of parameter estimation. Other optimization problems relate to optimal design of experiments and optimal designs for discrimination between competing models.
In engineering and science most phenomena are modeled using differential equations. The solution of these equations may be available in closed form or may be obtained through the use of finite difference or finite element computer solutions. Regardless of the method of solution of a differential equation, the model is more often than not a nonlinear function of the parameters. The differential equations and boundary and initial conditions may be linear in the usual mathematical sense and still have a solution that is nonlinear in terms of the parameters. See, for examples, the models in Section 7.5.
A problem can be either linear or nonlinear. The nonlinearity in one case can pose more difficulties in obtaining a solution than in another nonlinear case, however.
The extremum found for the sum of squares function when the model is linear in the parameters can be proved to be the correct one for OLS estimation as well as for ML estimation provided the conditions 111011 (see Section 6.1.5.2) are satisfied and a unique minimum point exists. Complete assurance cannot readily be given for nonlinear cases since there may be more than one extremum. If one has reason to doubt that the extremum found is the desired one, it is recommended that contours of constant S be plotted in the region in which the solution is expected. This involves an extensive search. Another possibility is to start the iteration procedure with different sets of initial parameter values.
It should be noted that the same problems associated with ill-conditioning in linear problems also arise in nonlinear estimation problems. In linear problems the theoretical existence of a minimum may be a mirage for ill-conditioned cases, that is, those associated with relatively small |X^T X| values; see Section 6.7.6. In such cases slight changes in the measurements can cause large movement in the location of the minimum, resulting in large perturbations in the estimated parameter values. Because of this sensitivity for ill-conditioned cases, convergence proofs for nonlinear cases may also be more academic than practical.*
Several simple optimum seeking methods are given in this section. In Section 7.4 the Gauss method is given, and in Section 7.6 several modifications of that method are described. Also in Section 7.6 a comparison of several methods is given. Later sections discuss sequential methods and correlated errors.
7.1.1 Trial and Error Search
One of the simplest procedures for extremizing a function is trial and error. It is quite inefficient and is not recommended. It is easily described and understood, however.

As for most nonlinear search procedures, the trial and error procedure starts with a set of estimated values of all the parameters. Let the initial parameter estimates be designated b^(0) and for simplicity let there be an associated OLS sum of squares

S^(0) = [Y - eta(b^(0))]^T [Y - eta(b^(0))]   (7.1.1)
*For some cases convergence proofs are available but they are usually not practical to implement for a number of different reasons. See Bard [5].
(A more general function is given in Section 7.3.) Next, another set of parameters b^(1) is chosen more or less arbitrarily and S^(1) is calculated. If

S^(1) < S^(0)   (7.1.2)

then the combination of parameters given by b^(1) must be "better" than b^(0); one might wish to select a b^(2) to be a modification of b^(1) in the same manner that b^(1) was of b^(0). If the inequality in (7.1.2) is reversed, then we would not proceed in the same manner. There are many possible strategies that could be employed to choose different b's. One is to fix all the b_i's except one, which is varied until S reaches a relative minimum, at which time this parameter is fixed and another one is varied. This procedure usually requires considering further changes in each parameter after the other parameters have been reestimated.
In the procedure outlined above, there is no rule that must be followed in selection of new sets of parameters. Instead one can try any set of parameters that seems reasonable. This can be done interactively using a teletype computer terminal. In so doing one usually finds that this method of solution is inefficient, time-consuming, and not practical. One can, however, obtain a "feel" for the difficulty of extremizing a function, particularly when there is more than one parameter.
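The acceptance test (7.1.2) can be sketched in a few lines; the model, the data, and the trial parameter sets below are hypothetical and serve only to illustrate the comparison of S values.

```python
import numpy as np

# Hypothetical model and data for illustration only; eta() and the y values
# are assumptions, not from the text.
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([2.0, 1.6, 1.3, 1.1, 0.9])

def eta(b, t):
    # assumed model: eta_i = b1 * exp(-b2 * t_i)
    return b[0] * np.exp(-b[1] * t)

def S(b):
    # OLS sum of squares, as in (7.1.1)
    r = y - eta(b, t)
    return float(r @ r)

b_old = np.array([1.0, 1.0])   # b^(0)
b_new = np.array([2.0, 0.5])   # a trial b^(1), chosen "more or less arbitrarily"

# Keep the trial set only if it lowers S, as in (7.1.2)
if S(b_new) < S(b_old):
    b_old = b_new
print(b_old, S(b_old))
```

In an interactive session one would simply repeat the last step with new trial values, which makes the inefficiency of the method apparent.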
7.1.2 Exhaustive Search

Another simple (and also inefficient) procedure is termed an exhaustive search. To illustrate this procedure consider the case of only one unknown parameter, beta. The "best" value of this parameter is to be associated with the minimum value of S. Instead of selecting only an initial estimate, a region of beta is chosen in which the minimum value of S is expected to be. Suppose that the parameter beta is known to be between 0.5 and 2.0. In an exhaustive search S is calculated at equally spaced values of b in this region (beta is the true value and b is an estimated value). Fig. 7.1a shows S for delta-b intervals of 0.25. The best b^(i) value as indicated by Fig. 7.1a is b^(3) = 1. A more accurate value of b could be found by conducting an exhaustive search with a smaller delta-b in the reduced region between b^(2) and b^(4).

Figure 7.1 Sum of squares in an exhaustive search procedure. (a) S at equally spaced b values between 0.5 and 2.0; (b) S with a local and a global minimum.

The exhaustive search procedure is undoubtedly expensive but it does have the potential of revealing local minima in addition to the global minimum. This is illustrated by Fig. 7.1b. The exhaustive search procedure is more likely to produce the global minimum than some other schemes. Irrelevant local minima are encountered in parameter estimation but are not common.

It should also be noted that the global minimum may not necessarily be the desired one. For example, a certain mechanical device may be known to have a natural frequency about 1 Hz. This frequency is to be more precisely estimated utilizing several measurements of deflection versus time of the device. A local minimum of S would be expected near 1 Hz but the global minimum might occur at a considerably higher frequency. Another example occurs when the functional form of S incorporated in a computer program may have a global minimum at negative values of parameters which are not physically possible.
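An exhaustive search over the interval described above can be sketched as follows; the one-parameter model and data are assumptions chosen so that the grid minimum falls at b = 1, as in Fig. 7.1a.

```python
import numpy as np

# Hypothetical one-parameter model and data, assumed only for illustration.
t = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 0.35, 0.12])

def S(b):
    # OLS sum of squares for the assumed model eta_i = exp(-b * t_i)
    return float(np.sum((y - np.exp(-b * t)) ** 2))

# Evaluate S at equally spaced b values between 0.5 and 2.0 with delta-b = 0.25
grid = np.arange(0.5, 2.0 + 1e-9, 0.25)
S_vals = [S(b) for b in grid]
best = grid[int(np.argmin(S_vals))]
print(best)
```

A refined estimate would repeat the search with a smaller spacing in the reduced region around the grid minimum, at the cost of additional function evaluations.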
7.1.3 Other Methods

There are many other methods for locating extrema of arbitrary functions. These include direct search, Fibonacci search, gradient methods, random
search, Hooke-Jeeves search, simplex exploration, and dynamic programming methods. See a text on optimization such as Beveridge and Schechter [2]. Most of these methods are much more efficient than the two methods just mentioned.
Rather than giving many methods we shall emphasize one basic method and then suggest some modifications of it. This method is called the Gauss method. It has proved to be very effective for a large class of different parameter estimation problems.
7.2 MATRIX FORM OF TAYLOR SERIES EXPANSION

In the Gauss linearization method of minimizing a sum of squares function a Taylor series expansion of a vector is needed. Let eta be an n vector and a function of the p parameters in the beta vector. Let eta have continuous derivatives in the neighborhood of beta = b. Then the Taylor series for a point beta near b begins with the terms

eta(beta) ~ eta(b) + [grad_beta eta^T(b)]^T (beta - b) + ...   (7.2.1)

where grad_beta is the matrix derivative operator defined by (6.1.21).

7.3 SUM OF SQUARES FUNCTION

In ordinary least squares, weighted least squares, maximum likelihood, and maximum a posteriori estimation, the sum of squares functions to be minimized are generally different. In some cases, however, the MAP sum of squares function reduces to that for ML, which in turn reduces to that for OLS estimation. For that reason and for economy in presentation a sum of squares function is given that is appropriate for OLS, WLS, ML, and MAP estimation when appropriately specialized. The function that we consider in this chapter is

S = [Y - eta(beta)]^T W [Y - eta(beta)] + (mu - beta)^T U (mu - beta)   (7.3.1)

Both W and U are weighting matrices which are symmetric; W is positive definite and U is positive semidefinite. In many cases W and U will be assumed to be completely known.

The cases of single and multiresponse for discrete measurements are included in (7.3.1). When a single sensor is used to obtain many observations, we have a single response case. In this situation the observation vector Y and the corresponding vector found from the model, eta, are n vectors. The square matrix W is n x n. The parameter vector beta contains p components as does mu; U is a square matrix of dimensions p x p. Much of this chapter explicitly considers the discrete single response case.

Extensions to multiresponse cases of the algorithms given in Sections 7.4 and 7.6 are not difficult. Section 7.8 provides a sequential method for this case. The multiresponse case occurs when m (> 1) measurements are taken at n different times. The observation vector can be written as

Y = [Y(1) Y(2) ... Y(n)]^T  where  Y(i) = [Y_1(i) Y_2(i) ... Y_m(i)]^T   (7.3.2)

Hence the Y vector contains mn components. The eta vector can be similarly defined and W is mn x mn. The beta, mu, and U matrices remain as given above.

In some situations it is natural to consider continuous rather than discrete measurements in time. This may be either because the measurements are actually continuous or because it is more convenient to analyze them as if they were. Then we would replace (7.3.1) by

S = int_0^{t_f} [Y(t) - eta(t,beta)]^T W(t) [Y(t) - eta(t,beta)] dt + (mu - beta)^T U (mu - beta)   (7.3.3)

where the time limits are 0 and t_f. If the case being considered involves a single response, Y(t) becomes a scalar. For the multiresponse case the Y(t) vector is

Y(t) = [Y_1(t) Y_2(t) ... Y_m(t)]^T   (7.3.4)

and eta(t,beta) is similarly defined. In many algorithms to be given, the summations on a time index can be replaced by integrations over time.
One further modification (or interpretation, depending on one's viewpoint) of (7.3.1) is for situations in which the dependent variables in the system model are not measured directly. This is a case discussed frequently in the systems literature. To illustrate this case assume that the system model is the set of nonlinear, first-order ordinary differential equations,

x' = f(t, x, beta, u)   (7.3.5)

The dimension of the dependent variable x is r; f is some known function of x, t, beta, and u, which could be related to some forcing function or control. It may be that all the components of x are not measured directly. Instead some linear or nonlinear function of the x may be measured,

Y = hx + e  or  Y = g(t, x) + e   (7.3.6)

Here the dimension m of the Y(i) component of the Y vector would be equal to or less than r. In such cases the dependent variable eta in (7.3.1) could be replaced by hx or g(t, x).
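As a sketch of the setup in (7.3.5) and (7.3.6), the fragment below integrates an assumed scalar state equation numerically and forms the observed quantity from the state; the model x' = -beta*x and the measurement gain are illustrative assumptions, not from the text.

```python
import numpy as np

# Assumed scalar state equation x' = -beta * x for illustration.
def f(t, x, beta):
    return -beta * x

def rk4(f, x0, beta, ts):
    # classical fourth-order Runge-Kutta integration of x' = f(t, x, beta)
    xs = [x0]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h = t1 - t0
        x = xs[-1]
        k1 = f(t0, x, beta)
        k2 = f(t0 + h / 2, x + h * k1 / 2, beta)
        k3 = f(t0 + h / 2, x + h * k2 / 2, beta)
        k4 = f(t1, x + h * k3, beta)
        xs.append(x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6)
    return np.array(xs)

ts = np.linspace(0.0, 2.0, 41)
x = rk4(f, 1.0, 0.5, ts)   # the state itself is not observed directly
Y = 2.0 * x                # measured quantity Y = h*x with an assumed gain h = 2
print(np.max(np.abs(x - np.exp(-0.5 * ts))))   # close to the exact solution
```

In an estimation run the integration would be repeated for each trial beta, and hx (or g(t, x)) would play the role of eta in (7.3.1).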
7.4 GAUSS METHOD OF MINIMIZATION

7.4.1 Derivation

One of the simplest and most effective methods of minimizing the function S is variously called the Gauss, Gauss-Newton, Newton-Gauss, or linearization method; we call it the Gauss method. It is attractive because it is relatively simple and because it specifies direction and size of the corrections to the parameter vector. The method is effective in seeking minima that are reasonably well-defined provided the initial estimates are in the general region of the minimum. For difficult cases (i.e., those with indistinct minima) modifications to the Gauss method discussed in Section 7.6 are recommended.

A necessary condition at the minimum of S is that the matrix derivative of S with respect to beta be equal to zero. For this reason operate upon S using (6.1.30) to get

grad_beta S = 2[-grad_beta eta^T(beta)] W [Y - eta(beta)] + 2[-I] U (mu - beta)   (7.4.1)

Let us use the notation X(beta) for the sensitivity matrix,

X(beta) = [grad_beta eta^T(beta)]^T   (7.4.2)

so that (7.4.1) set equal to zero at beta = beta-hat becomes

X^T(beta-hat) W [Y - eta(beta-hat)] + U (mu - beta-hat) = 0   (7.4.3)

Unfortunately, we cannot easily solve for the estimator since beta-hat appears implicitly in eta and X as well as appearing explicitly. Suppose that we have an estimate of beta-hat denoted b and that eta has continuous first derivatives in beta and bounded higher derivatives near b. Two approximations are now used in (7.4.3). First, replace X(beta-hat) by X(b) and, second, use the first two terms of a Taylor series for eta(beta-hat) about b. Then (7.4.3) becomes

X^T(b) W [Y - eta(b) - X(b)(beta-hat - b)] + U(mu - b) - U(beta-hat - b) ~ 0   (7.4.4)

Note that this equation is linear in beta-hat. If (1) eta is not too far from being linear in beta in a region about the solution to (7.4.3) and if (2) this region includes b, the value of beta-hat satisfying (7.4.4) will be a better approximation to the solution of (7.4.3) than that provided by b. Assuming these two conditions to be true, (7.4.4) is set equal to zero. In the interest of compactness of notation and to indicate an iterative procedure let

b^(k) = b,  b^(k+1) = beta-hat,  eta^(k) = eta(b),  X^(k) = X(b)   (7.4.5)

Using this notation in (7.4.4) set equal to 0 yields p equations in matrix form for b^(k+1),

b^(k+1) = b^(k) + P^(k) [X^(k)T W (Y - eta^(k)) + U (mu - b^(k))]   (7.4.6a)

P^(k) = [X^(k)T W X^(k) + U]^(-1)   (7.4.6b)

which is the Gauss linearization equation. Iteration on k is required for nonlinear models. For linear-in-the-parameters models no iterations are required. Note that for eta = X beta, (7.4.6) reduces to the MAP equation (6.6.6a) by setting b^(k) equal to zero. No constraints are included in (7.4.6).

In using (7.4.6) in nonlinear cases an initial estimate of beta, designated b^(0), is needed. With this vector eta^(0) and X^(0) can be calculated, which, in turn, are used in (7.4.6a) to obtain the improved estimate vector b^(1). This completes the first iteration. Then eta^(1) and X^(1) are evaluated so that b^(2) can be found. The iterative procedure continues until there is negligible change in any component of b; one criterion to indicate this is

|delta-b_i^(k)| / (|b_i^(k)| + delta_1) < delta   for i = 1, 2, ..., p   (7.4.7)
where delta is a small number such as 10^-4. In order to avoid embarrassment if b_i^(k) goes to zero, the quantity delta_1 is set equal to another small number such as 10^-10. When good initial estimates of the parameters are available and the experiment is well-designed, (7.4.7) is frequently satisfied by the seventh iteration. See Table 7.3 for an example of this. (The fact that (7.4.7) is satisfied does not guarantee that the last b^(k+1) minimizes S, particularly when the minimum is ill-defined.)
As a minimum is being sought, the function S should logically decrease from iteration to iteration. One might then include a check in a computer program to see if S^(k+1) is less than S^(k). If it is not, the procedure could either terminate or the correction of the parameters, b^(k+1) - b^(k), could be decreased as discussed in Section 7.6. In some cases, however, a temporary increase in S could permit larger parameter changes to the region of the minimum and actually lead to more rapid convergence.
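The iteration (7.4.6a)-(7.4.6b) with the stopping rule (7.4.7) can be sketched as below. The exponential model, its data, and the finite-difference sensitivities are assumptions made only for illustration; with exact data the iteration should recover the generating parameters.

```python
import numpy as np

def sensitivity(eta, b, h=1e-6):
    # forward-difference approximation of X_ij = d eta_i / d beta_j;
    # an analytic X would normally be preferred
    cols = []
    for j in range(len(b)):
        bp = b.copy(); bp[j] += h
        cols.append((eta(bp) - eta(b)) / h)
    return np.column_stack(cols)

def gauss_method(eta, b0, Y, W, U, mu, delta=1e-4, delta1=1e-10, kmax=50):
    b = np.array(b0, dtype=float)
    for _ in range(kmax):
        Xk = sensitivity(eta, b)                              # X^(k), n x p
        P = np.linalg.inv(Xk.T @ W @ Xk + U)                  # (7.4.6b)
        db = P @ (Xk.T @ W @ (Y - eta(b)) + U @ (mu - b))     # (7.4.6a)
        b = b + db
        if np.all(np.abs(db) / (np.abs(b) + delta1) < delta): # (7.4.7)
            break
    return b

# Assumed test problem: exact data from beta = (2, 0.5), OLS weighting.
t = np.linspace(0.0, 2.0, 5)
Y = 2.0 * np.exp(-0.5 * t)
model = lambda b: b[0] * np.exp(-b[1] * t)
p = 2
b = gauss_method(model, [1.5, 0.8], Y, np.eye(len(t)), np.zeros((p, p)),
                 np.zeros(p))
print(b)   # close to [2, 0.5]
```

A production code would also monitor S^(k+1) against S^(k) and reduce the step when S increases, as discussed above and in Section 7.6.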
7.4.2 Components of Gauss Linearization Equation

Consider the sensitivity matrix as defined by (7.4.2); without showing the dependence on b^(k), it can be written for a single response case as

X = [d eta_1/d beta_1 ... d eta_1/d beta_p; d eta_2/d beta_1 ... d eta_2/d beta_p; ...; d eta_n/d beta_1 ... d eta_n/d beta_p]   (7.4.8)

Hence the ij element of X^(k) is

X_ij = d eta_i / d beta_j   (7.4.9)

This definition of X is consistent with the linear model. A simple example is eta_1 = beta_1 X_11 + beta_2 X_12 where X_ij has the same meaning as in (7.4.9). A model which is nonlinear in a parameter is

eta_i = beta_1 exp(-beta_2 t_i)   (7.4.10)

Its sensitivity coefficients are

X_i1 = d eta_i/d beta_1 = exp(-beta_2 t_i),  X_i2 = d eta_i/d beta_2 = -beta_1 t_i exp(-beta_2 t_i)   (7.4.11)

The matrix X^T W X is a symmetric matrix of dimensions p x p. Let

C = X^T W X   (7.4.12)

where the C_ij element of C is

C_ij = sum_l sum_q w_lq X_li X_qj   (7.4.13)

If the weighting matrix is diagonal, (7.4.13) simplifies to

C_ij = sum_l w_ll X_li X_lj   (7.4.14)

Let the matrix product X^T W (Y - eta) in (7.4.6a) be designated H,

H = X^T W (Y - eta)   (7.4.15)

which is a p x 1 vector and has a typical component of

H_j = sum_k sum_l X_kj w_kl (Y_l - eta_l)   (7.4.16)

For C^(k) and H^(k), the quantities X_ij and eta_i are evaluated with beta = b^(k).

For the simple case of one parameter (p = 1) the iterative equation is

b^(k+1) = b^(k) + [sum_i w_ii X_i^(k) (Y_i - eta_i^(k))] / [sum_i w_ii (X_i^(k))^2]   (7.4.17)

An initial estimate of beta, designated b^(0), must be provided.
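The quantities C and H of (7.4.12) and (7.4.15) can be formed directly; in this sketch the model is the exponential of (7.4.10) with made-up times, data, and parameter values, and the diagonal-W sum (7.4.14) is checked against the matrix product.

```python
import numpy as np

# Builds C = X^T W X of (7.4.12) and H = X^T W (Y - eta) of (7.4.15) for the
# exponential model (7.4.10); t, Y, and the parameter values are made up.
t = np.array([0.0, 1.0, 2.0, 3.0])
Y = np.array([2.1, 1.2, 0.7, 0.5])
b1, b2 = 2.0, 0.5

eta = b1 * np.exp(-b2 * t)
X = np.column_stack([np.exp(-b2 * t),             # (7.4.11), d eta/d beta_1
                     -b1 * t * np.exp(-b2 * t)])  # d eta/d beta_2
W = np.diag([1.0, 1.0, 1.0, 1.0])

C = X.T @ W @ X           # (7.4.12); symmetric p x p
H = X.T @ W @ (Y - eta)   # (7.4.15); p x 1

# With diagonal W, C_ij also equals the sum in (7.4.14):
C14 = np.array([[sum(W[l, l] * X[l, i] * X[l, j] for l in range(len(t)))
                 for j in range(2)] for i in range(2)])
print(np.allclose(C, C14), np.allclose(C, C.T))
```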
Example 7.4.1

Estimate beta_1 and beta_2 in the model eta_i = beta_1 + (1 + beta_2 t_i)... using the given data. Use the sum of squares function given by (7.3.1) in which psi^-1 is to be used for W. (This psi function is associated with cumulative errors.)

Solution. This problem is solved using (7.4.6) in an iterative manner, starting from the initial estimates b_1^(0) = 2 and b_2^(0) = 1. Consider the first iteration. The sensitivity coefficients and the model values are evaluated at b^(0) to give X^(0) and eta^(0), from which

P^-1(0) = X^T(0) psi^-1 X^(0) + U = [10 0; 0 20]

which has the inverse

P^(0) = [0.1 0; 0 0.05]

Another expression needed in (7.4.6) is

X^T(0) psi^-1 (Y - eta^(0)) + U(mu - b^(0)) = [-10; 10]

Then we can use (7.4.6a) to complete the first iteration,

b^(1) = [2; 1] + [0.1 0; 0 0.05] [-10; 10] = [1; 1.5]

The initial sum of squares as defined by (7.3.1) has a value of S^(0) = 8.25. After the first iteration we calculate S^(1) to be 2.6875, which is lower than S^(0). (If it were not, then a method given in Section 7.6 might be used.)

For the second iteration the calculations proceed in a similar manner, which results in

b^(2) = [0.816667; 1.433333]

The associated S value is S^(2) = 2.497059, which is, as it should be, less than S^(1). The above results along with those of the third and fourth iterations are summarized in Table 7.1. The delta-b_i^(k) expression means delta-b_i^(k) = b_i^(k+1) - b_i^(k). Notice that the results converge quite rapidly since the relative changes in both parameters by the fourth iteration are less in absolute value than 10^-4 (thus satisfying (7.4.7) if delta is 10^-4). There are some changes in sign in the corrections, delta-b_i, but no instability is noted.

Table 7.1 Summary of Calculations for Example 7.4.1

k+1   b_1^(k+1)   delta-b_1^(k)/b_1^(k)   delta-b_2^(k)/b_2^(k)   b_2^(k+1)   S^(k+1)
0     2           -0.5000                 0.500                   1           8.250000
1     1           -0.1833                 -0.0444                 1.500000    2.687500
2     0.816667    -0.748 x 10^-2          0.134 x 10^-2           1.433333    2.497059
3     0.810561    8.56 x 10^-5            -5.85 x 10^-5           1.435251    2.496940
4     0.810630                                                    1.435166    2.496939
7.4.3 Comments on Gauss Linearization Equation

Several comments and observations regarding the Gauss linearization equation are given in this section.

(a) By letting W = I and U = 0, (7.4.6) provides ordinary least squares estimates.

(b) If the observation errors satisfy the standard conditions of being additive and zero mean, if psi is known within a multiplicative constant, and if the independent variable(s) and beta are nonstochastic (i.e., 11011), then nonlinear Gauss-Markov estimation can be used by setting W = Omega^-1 and U = 0. (We are using psi = sigma^2 Omega where Omega is completely known.)

(c) If in addition to the conditions given in (b) the errors are normal, the assumptions are designated 111011. Then (7.4.6) provides a nonlinear ML estimator if W = psi^-1 and U = 0.

(d) If the conditions in (c) are valid except there is prior information and sigma^2 is known, then an MAP estimator is provided by (7.4.6). Suppose that beta is a random parameter vector with a mean mu_beta, covariance V_beta, and normal probability density. The assumptions are then designated 111112. By letting W = psi^-1 and U = V_beta^-1, the corresponding MAP estimator is given by (7.4.6).

(e) In order to utilize (7.4.6) to estimate the parameters it is necessary that P^-1 have an inverse, that is, P^-1 be nonsingular. Then its determinant must not be zero, or

|X^T W X + U| != 0   (7.4.18)

in the region of the minimum. We call this the identifiability condition. If this determinant is identically equal to zero, there is, in general, no unique point at which the minimum occurs. A method does not give the complete location of the minimum when it specifies a single parameter point while there is more than one point at which the minimum occurs. Since the Gauss method will not yield any point in this case, one is alerted to the nonexistence of such a point.

For least squares estimation W = I and U = 0. For this case it is necessary that

Delta = |X^T X| != 0   (7.4.19)

in the neighborhood of the minimum of S. This determinant is shown to be equal to zero in Appendix A if any column of X can be expressed as a linear combination of other columns. This condition, linear dependence, can be written as

sum_j C_j X_ij = 0  for i = 1, 2, ..., n,  for at least one C_j != 0   (7.4.20)

If (7.4.20) is true, then Delta given by (7.4.19) is equal to zero.

This condition of linear dependence is almost satisfied in many more cases than would be expected. In such cases Delta is almost zero; this is what is meant by ill-conditioning. See Section 6.7.6. If (7.4.20) is almost satisfied, the sum of squares function S will have a unique minimum point and thus a unique set of parameters. The minimum point will not be very pronounced, however. As an example consider the sum of squares function

S = (2 - beta_1 e^{beta_2})^2 + (3.2 - beta_1 e^{2 beta_2})^2

which is plotted in Fig. 7.2 for S = 0.1, 1, and 3. The sensitivity matrix for the two points indicated in S exhibits linear dependence for either beta_2 = 0 or beta_2 -> infinity. Since the minimum S value is at beta_1 = 1 and beta_2 = 0, the condition of linear dependence is almost satisfied in the region of the minimum point. Along the long axis of the S = 0.1 contour, the change in S is much more gradual than in other directions. Contours that are long, narrow, and curving such as S = 0.1 are frequently associated with near linear dependence or, equivalently, with Delta being relatively small. Moreover, such contours are typically associated with slow convergence of the Gauss method. For this reason it is important to examine the sensitivity coefficients over the region of interest.

Figure 7.2 Contours of sum of squares function S = (2 - beta_1 e^{beta_2})^2 + (3.2 - beta_1 e^{2 beta_2})^2 for several S values.

For ML estimation, it is necessary that

|X^T psi^-1 X| != 0   (7.4.21)

but ML will not lead to a choice of W = psi^-1 which would cause this determinant to be equal to zero if X^T X is not equal to zero. Also if X^T X is equal to zero, there is no psi^-1 which will make (7.4.21) be true. (See Appendix A.) Hence the condition given by (7.4.19) is again the important one.

For MAP estimation with W = psi^-1 and U = V_beta^-1, it is possible that (7.4.18) may be true even if |X^T X| = 0. Thus if there is prior information, the sensitivity coefficients may not have to be independent to permit estimation using (7.4.6).

(f) When convergence to the estimates is attained, the matrix derivative of S goes to zero as indicated by (7.4.4). Since the same terms appear in (7.4.6), the p x 1 vector given next must also go to zero at convergence,

X^(k)T W [Y - eta^(k)] + U (mu - b^(k)) -> 0   (7.4.22)

This means that every component of this vector must be zero. For cases when U = 0 this results in each H_j given by (7.4.16) being equal to zero. Knowledge of this fact can sometimes aid in checking computer codes that are not yielding converging solutions.

(g) Though (7.4.6) has been obtained by using the linear approximation given by (7.2.1), the Gauss equation is not a rigorous first order approximation because a first order series was not used for X^(k+1).

7.4.4 Linear Dependence of Sensitivity Coefficients

As stated in the preceding subsection under point (e), the function S for LS or ML estimation has no unique minimum point if the sensitivity coefficients are linearly dependent. It has been found from experience that difficulty encountered in convergence is frequently due to approximate linear dependence. In most of these cases the sensitivity coefficients were not plotted and examined beforehand. Indeed, in many cases the user of a nonlinear least squares program may not realize their importance and thus not even examine them after lack of rapid convergence is apparent. For effective nonlinear estimation, the careful examination of these sensitivity coefficients is imperative. In order to demonstrate what should be inspected, the following discussion is given.

For single response cases with approximately constant standard deviations of the measurements, it is convenient to examine

X~_ij = beta_j (d eta_i / d beta_j)   (7.4.23)

Note that X~_ij has the units of eta. Then the magnitude of each sensitivity can be compared with the others as well as with eta itself.

For multiresponse cases it is often more meaningful to plot

X~_jk(i) = [beta_j / sigma_k(i)] (d eta_k(i) / d beta_j)   (7.4.24)

which is dimensionless. (Note: i refers to "time," j to the parameter, and k to the response.)

In Figs. 7.3 and 7.4 some sensitivities are plotted versus the variable t_i. Those in Fig. 7.3 are linearly dependent but those in Fig. 7.4 are not. The first nine graphs in Fig. 7.3 are for two parameters; it is not difficult to see the linear dependence in the sensitivities in each case. Note that the location of the zero value on the X axes is not arbitrary in most cases. The last three cases are for the three parameters being estimated simultaneously; the linear dependence is less obvious than for two-parameter cases.

The importance of the zero value of X is also shown by Figs. 7.4a, b, c, d, and f, which are not linearly dependent cases as drawn, but each becomes dependent if the zero is moved. What zero location would do this in each case?
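The identifiability condition (7.4.19) and the dependence condition (7.4.20) can be checked numerically, as in this sketch with made-up sensitivity columns.

```python
import numpy as np

# Numerical illustration of (7.4.19)-(7.4.20): if one column of X is a linear
# combination of the others, |X^T X| = 0 and the Gauss step is undefined.
# The sensitivity values here are made up.
t = np.linspace(0.0, 1.0, 6)
X_ok  = np.column_stack([np.ones_like(t), t])                  # independent columns
X_dep = np.column_stack([np.ones_like(t), t, 1.0 + 2.0 * t])   # col3 = col1 + 2*col2

print(np.linalg.det(X_ok.T @ X_ok))         # nonzero: identifiable
print(abs(np.linalg.det(X_dep.T @ X_dep)))  # essentially zero: (7.4.19) violated
```

In practice near-zero rather than exactly zero determinants are the common symptom, which is why plotting and comparing scaled sensitivities as in (7.4.23) is recommended.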
Figure 7.3 Examples of some linearly dependent sensitivity coefficients.

Figure 7.4 Examples of some linearly independent sensitivity coefficients.

7.5 EXAMPLES TO ILLUSTRATE GAUSS MINIMIZATION METHOD INVOLVING ORDINARY DIFFERENTIAL EQUATIONS

7.5.1 Estimation of a Parameter for a Long Fin

Consider a long fin which has a temperature at its base, z = 0, of 200 C and which is exposed to a fluid at 100 C; see Fig. 7.5. The differential equation for this case can be written as

d^2 T/dz^2 - M^2 (T - T_inf) = 0   (7.5.1)

where M^2 is given by

M^2 = hP/(kA)   (7.5.2)
Figure 7.5 Geometry for fin in a fluid.

h is the heat transfer coefficient, k is the thermal conductivity of the fin, A is the fin cross-sectional area normal to z, and P is the perimeter of A. The boundary conditions are

T(0) = T_0,  T(inf) = T_inf   (7.5.3)

Equations 7.5.1 and 7.5.3 give a complete mathematical statement of the classical boundary value problem. The solution for T assuming constant M is

T = T_inf + (T_0 - T_inf) exp(-Mz)   (7.5.4)

which contains the three parameters T_0, T_inf, and M. In a sense only M is a parameter since T_0 and T_inf can be considered "states"; that is, they are the temperatures at z = 0 and infinity, respectively. From examining (7.5.4), we see that T is linear in T_0 and T_inf but nonlinear in terms of M.

Each of the three parameters (T_0, T_inf, and M) enters the problem in such a manner that all three could be found simultaneously, rather than only in certain combinations. On the other hand, h, P, k, and A cannot be found independently but only in the combination M. If other boundary conditions were known, a different set of parameters might be found. For example, if the heat flux q at z = 0 has a known value, the boundary condition is

-k dT(0)/dz = q   (7.5.5)

and the parameters M, k, and T_inf can be simultaneously estimated.

Simulated data for this problem are given in Table 7.2. The assumptions are

Y_i = T_i + e_i
V(Y_i) = sigma^2,  i = 2, 3, 4, 5;  V(Y_1) = V(Y_6) = 0
E(e_i e_j) = 0,  i != j and i, j = 2, 3, 4, 5

The parameter M is nonrandom and there is no prior information. Using our notation, these assumptions are designated 1111111; the value of sigma^2 need not be known to estimate parameters.

Table 7.2 Simulated Data for Fin Example

i   Position z (m)   Temperature Y (C)
1   0                200
2   0.125            166
3   0.250            144
4   0.375            128
5   0.5              120
6   inf              100

The assumptions V(Y_1) = V(Y_6) = 0 mean that T_0 and T_inf are the known values of 200 and 100 C, respectively. The parameter is M. For the standard assumptions given above, OLS and ML estimation provide the same parameter estimates. The weighting matrix W in (7.3.1) is W = psi^-1 = sigma^-2 I since

psi = sigma^2 I   (7.5.6)
The iterative relation for finding M can be found using (7.4.17), which can be written as

M^(k+1) = M^(k) + [sum_{i=2}^{5} X_i^(k) (Y_i - T_i^(k))] / [sum_{i=2}^{5} (X_i^(k))^2]   (7.5.7)

where the sensitivity coefficient is

X_i = dT_i/dM = -(T_0 - T_inf) z_i exp(-M z_i)   (7.5.8)

From a knowledge of heat transfer for this particular problem, an initial estimate of M could be given. Many methods are available, however, that use only the given data rather than relying on experience. One of these is used in this example. Since only one parameter is unknown, let us pick a single Y_i. Let us choose the value of Y_3 = 144 C which is nearest the average of T_0 and T_inf. For T = 144 and z = 0.25, (7.5.4) yields

144 = 100 + (200 - 100) exp[-M^(0)(0.25)]
354
CHAPTER 7 MINIMIZAnON OF SUM OF SQUARES FUNCTIONS
M
which can be approximately solved for
(7.5.4) gives the residual vector o f
( 0)
= 3.28m  I.
Using this value,
7.5 GAUSS M ElHOD INVOLVING DWRRENTIAL EQUAnONS
A third iteration yields
e(l).,.
166166.36502501] [ 0.36502501]
( 0) = Y _ TO) =
144144.04316545 =  0.04316545
e
1.22925771
[ 128129.22925717
120119.39800423
0.60199571
The sum of squares associated with these residuals is S ( 0) = 2.008580086
using 0 2 = I .The sensitivity matrix X(O) a nd X T(O)X(O) a re
X (O).
aTO)
aM
=
j
 8.29562813
 11.01079136
[  10.96097166'
 9.69900211
~'o!~286~581~4
... 3.28 + .0275522  3.3075522m\
T he associated sum o f squares is S (I)1.700958954. N ote that
M (I)  M fO )
M (O)
=
0.0275522
3.28
=0.0084
which is small compared with unity and thus the initial estimate was very
good. This can also be verified from an examination of the small residuals
in e (a). O ne might desire to minimize S more precisely. Then using the
above M (I) value we can find
e (l).
[ 0.13685534
0.25916365
 0.92881366
0.86739235
j
M m c o 3.3077428 +
826710692]
10.93520909
10.84830512
9.56630382
= 3.3075522 + 0.000 1906314 . . 3.3077428m  I
M(l)M(I)  5.763x IO~,
.
M (I)
S (2)
= 1.10094504
8.26690996]
10.93468804
[  10.84752977
 9.56539220
3.3077433m\
Several observations c an b e m ade from the above results. First, M seems
t o b e converging rapidly. T he relative corrections are decreasing by a
factor smaller than O.QI. T he s um o f t he components in e (2) is n ot zero,
unlike a model containing a parameter which has a constant sensitivity
vector. N ote t hat the residual values changed much more between iterations than the sensitivity vector (X) values. Another observation is t hat S is
decreasing as the iterations proceed.
In the example given above the initial i t ( 0) value is relatively close t o the
converged value, resulting in only three iterations being required. N ot
m any iterations are required, however, for a range o f initial values a s large
as 0 to 10 ( or even  3 to 10, as indicated by the i t ( 0)  10 case) a s shown
in Table 7.3. Eight o r fewer iterations were required t o converge t o within
Iteration
number
0
.I
M (l)=3 3 075522+ 0.07570426
.
397.123747


S(3)1.700945oo
Table 7.3
X(l) _ [ 
X U)...
0·:.~7:6 
X T(O)X(O) = I X/(O) =404.268514
Using these values a nd I X/0)ej(0)11.I3849884 in (7.5.7) yields
M(I) =3.28 +
 0.13527965 ]
0.26124785
[  0.92674605 '
0.86921561
2
3
4
5
6
7
8
P anmeter Values as a Function o f I tentlon
for Various Initial Estimates for Fin Example
M(O)O
M(O)6
M(O)8
M(O)IO
0.00
1.8186667
2.9666083
3.2883466
3.3076357
3.3077430
3.3077433
6.0
2.1484134
3.0970338
3.3001212
3.3077155
3.3077432
3.3077433
8.0
0.0457249
1.782S492
2.9506108
3.2865352
3.3076195
3.3077430
3.3077433
10.0
3.1824944
1.1017732
0.8839079
2.45S4228
3.1918187
3.3053045
3.3077364
3.3077433
CHAPTER 7
356
MINIMIZATION O F S UM O F S QUARES FUNCTIONS
seven significant figures . T he sum of squares for this example has the same
shape as given in Fig. 1.7.
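The one-parameter iteration (7.5.7)-(7.5.8) is easy to reproduce by machine. The following Python sketch (ours, not part of the original text) uses the Table 7.2 data and recovers the converged value M ≈ 3.3077433 m⁻¹ from both a good and a poor initial estimate:

```python
import math

# Data from Table 7.2 (interior points i = 2..5 only; T0 and Tinf are known).
z = [0.125, 0.250, 0.375, 0.5]      # positions, m
Y = [166.0, 144.0, 128.0, 120.0]    # measured temperatures, deg C
T0, Tinf = 200.0, 100.0

def model(M, zi):
    """Fin model (7.5.4): T = Tinf + (T0 - Tinf) exp(-M z)."""
    return Tinf + (T0 - Tinf) * math.exp(-M * zi)

def sensitivity(M, zi):
    """Sensitivity coefficient (7.5.8): X = dT/dM = -(T0 - Tinf) z exp(-M z)."""
    return -(T0 - Tinf) * zi * math.exp(-M * zi)

def gauss_iterate(M, max_iter=20, tol=1e-9):
    """One-parameter Gauss iteration (7.5.7)."""
    for _ in range(max_iter):
        e = [Yi - model(M, zi) for Yi, zi in zip(Y, z)]
        X = [sensitivity(M, zi) for zi in z]
        dM = sum(Xi * ei for Xi, ei in zip(X, e)) / sum(Xi * Xi for Xi in X)
        M += dM
        if abs(dM) < tol:
            break
    return M

print(gauss_iterate(3.28))   # converges to about 3.3077433
print(gauss_iterate(10.0))   # also converges, cf. Table 7.3
```

The first call reproduces the three-iteration convergence shown above; the second reproduces the M(0) = 10 column of Table 7.3, including the early negative iterate.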
In order to design an experiment to obtain the greatest accuracy (minimum
variance of the parameters if there is no bias), the sensitivity
coefficients should be plotted and examined before the experiment is
performed. As a result, one can more intelligently design the experiment in
terms of placement of sensors and duration of the experiment. It is
suggested in Chapter 8 that a reasonable optimal experiment criterion is to
maximize Δ = |X^T X| for independent errors and subject to constraints of a
maximum duration of the experiment and maximum range of the dependent
variable.

Figure 7.6 depicts the dimensionless temperature and dimensionless
sensitivity coefficient for the example of this section. Note that the
sensitivity coefficient starts at zero at z = 0, increases in magnitude
until Mz = 1, and gradually decreases in magnitude. The Δ criterion for the
single parameter M is maximized by selecting the maximum magnitude values
of the sensitivity coefficient. If a single measurement is to be utilized
in estimation, it should be chosen corresponding to about Mz = 1, which
corresponds to the dimensionless temperature ratio of 0.368. Owing to the
flatness of the sensitivity curve shown in Fig. 7.6, little decrease in
accuracy would result if the dimensionless ratio were chosen to be as large
as 0.5 or as small as 0.25. A Y_i value corresponding to the dimensionless
T ratio of 0.5 was chosen in the above example for obtaining the initial
estimate of M = 3.28. For the more common case of many observations, see
Chapter 8.
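The flatness claim can be checked numerically. In dimensionless form the magnitude of (7.5.8) is proportional to f(w) = w e^(-w) with w = Mz, which peaks at w = 1 (temperature ratio e^(-1) ≈ 0.368). A brief sketch (ours, not the book's):

```python
import math

def f(w):
    """Dimensionless sensitivity magnitude |X|/(T0 - Tinf): f(w) = w e^{-w}, w = M z."""
    return w * math.exp(-w)

peak = f(1.0)                      # maximum value, e^{-1} = 0.3679...
# A temperature ratio r = e^{-w} corresponds to w = -ln r.
for r in (0.5, 0.368, 0.25):
    w = -math.log(r)
    print(r, f(w) / peak)          # fraction of the peak sensitivity retained
```

Ratios of 0.5 and 0.25 both retain about 94% of the peak sensitivity, which is the "little decrease in accuracy" noted above.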
7.5.2  Example of Estimation of Parameters in Cooling Billet Problem
A similar problem to the preceding one in terms of the differential
equation is that of cooling a billet (or any object) that has a negligible
temperature variation through it. The temperature of the billet changes
after it is placed in a fluid at a different temperature. An analysis of a
cooling billet was also given in Section 6.2.

Let T be the temperature of the billet at any time t and let T∞ be the
fluid temperature. A differential equation describing the temperature in
this case is

    ρcV dT/dt = -hA(T - T∞)    (7.5.9)

where ρ is density of the billet, c is specific heat, V is volume, h is
heat transfer coefficient, and A is billet heated area. Various terms could
be parameters, but the most common one would be the heat transfer
coefficient. For convenience, however, the factor hA/ρcV is considered as
the parameter; several parameters may be formed from it also.

Three cases are investigated below. For each one the initial temperature
is T₀, or T(0) = T₀. Also, T∞ is considered to be a constant. In the first
case, let hA/ρcV be the constant parameter β₁. The solution for T is then
    (T - T∞)/(T₀ - T∞) = e^(-β₁t)    (7.5.10)

Another case is for hA/ρcV being a function of time. A possible function
is

    hA/ρcV = β₁ + β₂t + β₃t²    (7.5.11)

and the solution of the differential equation is

    (T - T∞)/(T₀ - T∞) = exp[-(β₁t + β₂t²/2 + β₃t³/3)]    (7.5.12)

A third possible model for hA/ρcV is

    hA/ρcV = β₁ + β₂(T - T∞)^n    for n ≠ 0    (7.5.13)

[Figure 7.6  Dimensionless temperature and sensitivity coefficient for the
example of Section 7.5.1.]
where n could also be a parameter. The solution in this case is

    [T(t) - T∞]/(T₀ - T∞)
        = e^(-β₁t) [ 1 + (β₂/β₁)(T₀ - T∞)^n (1 - e^(-nβ₁t)) ]^(-1/n)
          for n ≠ 0    (7.5.14)

In each of the models given above, (7.5.10), (7.5.12), and (7.5.14), T is
nonlinear in terms of the parameters. Only for the last model of h,
(7.5.13), was the differential equation nonlinear.
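The closed form (7.5.14) can be checked against a direct numerical integration of the nonlinear equation dT/dt = -[β₁ + β₂(T - T∞)^n](T - T∞). The sketch below (ours; the parameter values are merely plausible illustrations) integrates with a classical fourth-order Runge-Kutta scheme and compares the two:

```python
import math

beta1, beta2, n = 2.0, 0.004, 1.0      # illustrative values only
T0, Tinf = 280.0, 81.5
u0 = T0 - Tinf

def closed_form(t):
    """Solution (7.5.14) for n != 0."""
    C = 1.0 + (beta2 / beta1) * u0**n * (1.0 - math.exp(-n * beta1 * t))
    return Tinf + u0 * math.exp(-beta1 * t) * C**(-1.0 / n)

def rhs(u):
    """du/dt for u = T - Tinf, from (7.5.9) with hA/(rho c V) = beta1 + beta2 u^n."""
    return -(beta1 + beta2 * u**n) * u

def rk4(t_end, steps=2000):
    """Classical 4th-order Runge-Kutta integration of du/dt = rhs(u)."""
    u, dt = u0, t_end / steps
    for _ in range(steps):
        k1 = rhs(u)
        k2 = rhs(u + 0.5 * dt * k1)
        k3 = rhs(u + 0.5 * dt * k2)
        k4 = rhs(u + dt * k3)
        u += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return Tinf + u

for t in (0.1, 0.5, 1.0):
    print(t, closed_form(t), rk4(t))   # the two columns agree closely
```

Agreement to many digits confirms that (7.5.14) does solve the nonlinear differential equation.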
If the factor hA/ρcV (or more specifically h) varies during an experiment,
(7.5.12) and (7.5.14) provide a number of competing models. For example,
β₂ and/or β₃ might be set equal to zero in (7.5.12). In (7.5.14), β₂ might
be zero or n might be unity, etc.

The sensitivity coefficients for the second model, (7.5.12), are found to
be

    X_i(t) = ∂T(t)/∂β_i = -(T₀ - T∞)(t^i/i) exp[-(β₁t + β₂t²/2 + β₃t³/3)]    (7.5.15)

where i = 1 for β₁, etc. For the third model, (7.5.14), the sensitivities
are
    X₁(t) = ∂T(t)/∂β₁ = -(T₀ - T∞) e^(-β₁t) C^(-1/n)
            x { t + [β₂(T₀ - T∞)^n / (nβ₁C)] [ nt e^(-nβ₁t) - (1 - e^(-nβ₁t))/β₁ ] }    (7.5.16)

    X₂(t) = ∂T(t)/∂β₂ = -[(T₀ - T∞)^(n+1) / (nβ₁)] e^(-β₁t) (1 - e^(-nβ₁t)) C^(-(1/n)-1)    (7.5.17)

and where C is the expression in the brackets of (7.5.14).

For estimating h the above models are superior to the power series model
for T given in Section 6.2 because the above models utilize basic physical
laws while the power series for T does not. Whenever possible, models based
on the physical mechanisms (called mechanistic models by G. E. P. Box)
should be employed.

A further choice based on physical arguments can be made between the power
series in t given by (7.5.11) and the temperature-dependent model for
hA/ρcV given by (7.5.13). In many situations the heat transfer coefficient
does change with time because some related quantity is changing with time
rather than the passage of time per se. In the present model, h might
change because a billet's temperature changes with time. Physically, the
heat transfer coefficient h might account for heat transfer by both natural
convection and radiation, which could both cause h to be a function of
T - T∞. This suggests that the model (7.5.13) would be superior to
(7.5.11).

To illustrate parameter estimation involving the above models, consider
again the measurements given in Table 6.2. Results of calculations are
summarized in Table 7.4. Ordinary least squares was used with the η values
being the T values of (7.5.12) or (7.5.14) and with T₀ = 279.59°F, the T at
t = 0, and T∞ = 81.5°F. Models 1, 2, and 3 are for hA/ρcV given by (7.5.11)
with Model 1 being β₁, Model 2 being β₁ + β₂t, and so on. Model 4 is for
hA/ρcV given by (7.5.13) with n = 1.

Table 7.4  Estimation of Parameters in Models for Cooling Billet Data(a)

    Model   No. of
    No.     Parameters   b₁        b₂          b₃         R           [R/(n-p)]^(1/2)
    1       1            2.70882   --          --         38.73731    1.6070
    2       2            2.90679   -1.39433    --         0.7896890   0.23750
    3       3            2.90824   -1.41968    0.011344   0.7892203   0.24639
    4       2            2.13656   0.0041327   --         1.162777    0.28819

    (a) Units consistent with time in hours.

The h values in units of Btu/hr·ft²·°F can be found using the appropriate
model [(7.5.11) or (7.5.13)] and multiplying by the ρcV/A value of 0.83432.
(This means that h₁ of Fig. 7.7 is 0.83432 times b₁ of Table 7.4.)
Resulting curves for Models 1, 2, and 3 are shown in Fig. 7.7. Also
depicted are the Fig. 6.2 results for the temperature power series
analysis.

[Figure 7.7  Parameter estimates for Models 1, 2, and 3 of Table 7.4,
plotted versus time step index (t is in hours; b'_i = 0.83432 b_i). Also
shown is the result from the polynomial model for T of Example 6.2.4.]

Notice that the Models 2 and 3 results are almost identical, so that Model
3 is not needed. The results of Model 2 and the power series model are very
similar. The constant h given by Model 1 does not appear to be adequate
from inspection of Fig. 7.7 because the other results are quite different
and appear to be consistent with each other. Another argument that suggests
that Model 1 is not adequate is the large reduction in the sum of squares
function (38.7 to 0.789). This sum R also suggests that Model 3 is not
needed because the decrease is very slight between Models 2 and 3; further
note that the estimated standard error* of the temperatures actually
increases for Model 3. For further related discussion, see Example 7.7.4.

Model 4 results given in Table 7.4 show a slightly increased value of R
compared with Model 2, which also has two parameters. Even though R for
Model 4 is larger, one might prefer Model 4 because it represents a more
reasonable physical model as mentioned above.

[Figure 7.8  Sensitivity coefficients for Model 4 of Table 7.4 for n = 1.]
An attempt to obtain simultaneous estimates of β₁, β₂, and n in (7.5.14)
with n = 1 initially was unsuccessful. A further calculation was performed
for Model 4 to estimate just n with the converged values of β₁ and β₂ given
in Table 7.4. The value obtained was 1.000006. Since this value is nearly
unity, the value previously used, linear dependence in the sensitivity
coefficients is suggested. To investigate this previously unsuspected
dependence, the sensitivity coefficients were plotted as shown in Fig. 7.8.
Notice that the sensitivity coefficients for n and β₂ are very nearly
proportional, which tends to make X^TX singular and thus the parameters
very difficult to estimate simultaneously.

*In Table 7.4 the value of n = 16 was used rather than 17, the number of
the observations, because the first value was used to determine T₀.
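The near-proportionality can be demonstrated by differencing the closed-form solution (7.5.14) with respect to β₂ and n and comparing the two sensitivity vectors. The sketch below (ours) uses the Model 4 estimates from Table 7.4 and an assumed time grid (the actual Table 6.2 times are not reproduced here); the cosine of the angle between the two vectors comes out very close to one:

```python
import math

T0, Tinf = 279.59, 81.5
u0 = T0 - Tinf
b1, b2 = 2.13656, 0.0041327        # Model 4 estimates, Table 7.4

def T(t, beta2, n):
    """Solution (7.5.14) for n != 0, with beta1 fixed at b1."""
    C = 1.0 + (beta2 / b1) * u0**n * (1.0 - math.exp(-n * b1 * t))
    return Tinf + u0 * math.exp(-b1 * t) * C**(-1.0 / n)

times = [0.05 * k for k in range(1, 17)]   # assumed grid of 16 times, hours

def sens(which, d=1e-6):
    """Central-difference sensitivity dT/d(beta2) or dT/dn at n = 1."""
    out = []
    for t in times:
        if which == "beta2":
            out.append((T(t, b2 + d, 1.0) - T(t, b2 - d, 1.0)) / (2 * d))
        else:
            out.append((T(t, b2, 1.0 + d) - T(t, b2, 1.0 - d)) / (2 * d))
    return out

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

print(cosine(sens("beta2"), sens("n")))    # nearly 1: the columns are almost proportional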
Let us now compare from another point of view these results based on the
solution of the describing differential equation with those from the power
series method of Section 6.2. The results of this section required the
evaluation of two nonlinear parameters whereas the Section 6.2 method
requires the estimation of five linear parameters. This is an illustration
of the principle of parsimony, which states that we employ the smallest
possible number of parameters for adequate representation [22]. Also, note
that the estimated standard deviation of the measured temperatures was
0.243°F for the T series, 0.238°F for Model 2, and 0.288°F for Model 4; the
small differences between these values indicate that the mechanistic models
are good in this case.

After obtaining parameters using OLS or ML, say, it is advisable to
examine the residuals to see what assumptions seem to be valid regarding
the measurement errors. The residuals for Model 2 are very similar to those
given in the upper part of Fig. 6.1. Visual inspection of these residuals
does not lead to a contradiction of the assumptions of additive, zero mean,
constant variance, independent, and normal errors. Hence accurate parameter
estimates would be expected with the least squares method used for this
problem.
7.6  MODIFICATIONS OF GAUSS METHOD
The Gauss method has the feature of giving both the direction and the
magnitude of the change in the estimate of the parameters at each step in
the iteration procedure. Small changes in the parameters in the direction
indicated by the Gauss method decrease the sum of squares. Occasionally,
however, the size of the change indicated by the method is so large that
the successive estimates oscillate and, even worse, the procedure may be
unstable. This can result from near-linear dependence of the sensitivity
coefficients and/or very poor initial parameter estimates. When the
sensitivity coefficients are nearly dependent (which might be termed
"overparameterization"), one should consider alternatives in addition to
other minimization procedures. One obvious procedure is to decrease the
number of parameters being estimated. Another is to redesign the experiment
so that the correlation between parameters is reduced. See Chapter 8 for
optimal experiment design.

A great many algorithms have been proposed to improve the convergence of
the Gauss method. Some of these may be termed modifications of the Gauss
method whereas other methods some would call distinctly different methods;
in the latter category are the Levenberg [3] and Marquardt [4] methods. We
choose to treat these methods as modifications of the Gauss method,
however.

This section considers just a few of the possible methods. In this as in
other iterative problems there appears to be no end to the possibilities;
the ingenuity of various researchers evidenced by the numerous algorithms
for this problem is impressive. For a survey, see Bard [1,5].
7.6.1  Box-Kanemasu Interpolation Method

Since the Gauss method depends on a linear approximation to η, in some
nonlinear estimation cases the corrections can oscillate with increasing
amplitudes and thus lead to nonconvergence. In this section we give the
Box-Kanemasu modification of the Gauss method, which may converge when the
Gauss method does not. The Box-Kanemasu method does not, however, include a
check that the sum of squares function S decreases from iteration to
iteration. Bard [5] has made the point that all acceptable methods should
ensure that S does monotonically decrease to a minimum. This is a
reasonable requirement, but in some cases it may lead to more calculations
than without it; this is illustrated by Table 7.6, which is discussed
later. On the other hand, this requirement might improve convergence in
other cases. In order to ensure S continually decreases, a modification to
the Box-Kanemasu method that has been used by Bard [5] and others is
included.

Since the linear approximation is valid over some region, a sufficiently
small correction in the direction given by the Gauss method should improve
the estimate (i.e., reduce S). Many methods have been proposed which use
the direction provided by the Gauss method but modify the step size. We
generalize (7.4.6) to

    b(k+1) = b(k) + h(k+1) Δb(k)    (7.6.1)

    Δb(k) = [X^T(k)WX(k) + U]⁻¹ [X^T(k)We(k) + U(μ - b(k))]    (7.6.2)

where h(k+1) is a scalar interpolation factor. Note that this factor may be
iteration-dependent. If h(k+1) is set equal to 1, we have the Gauss method.
In one class of methods, a search on h is performed to precisely determine
the minimum S along the Gauss direction [5].

Interpolation methods attempt to find good, acceptable values of h(k+1)
without bothering to locate precisely the value associated with the minimum
S value. Of the many methods possible, one of these is the halving and
doubling method [6-8]. The modification that we describe utilizes an
equation given by Box and Kanemasu [9]. The modification is more general,
however, since (a) the sum of squares function given by (7.3.1) is used
rather than the OLS S function and (b) a check for decreasing S is
included.

In the Box-Kanemasu method, S is approximated at each iteration by

    S ≈ a₀ + a₁h + a₂h²    (7.6.3)

where a₀, a₁, and a₂ are constants characteristic of each iteration. The
h(k+1) value is taken where S given by (7.6.3) is minimized.

A second approximation in this method is that β is given by

    β = b(k) + h Δb(k)    (7.6.4)

A minimum of three conditions are needed to find the parameters a₀, a₁,
and a₂. One condition is to use the S value at h = 0, that is, at β = b(k);
this S value is designated S₀(k). A second S value, denoted S_a(k), is
found at h = a. Initially a is set equal to 1.

The third condition for finding the a_i's uses (7.6.4) to find the
derivative of S at h = 0 and in the Δb(k) direction. This derivative is

    dS/dh = (∇_β S)^T (dβ/dh)    (7.6.5)
The matrix derivative ∇_β S is found from (7.4.1) to be

    ∇_β S = -2[ X^T W(Y - η) + U(μ - β) ]    (7.6.6a)

and the derivative of β with respect to h is found from (7.6.4) to be

    dβ/dh = Δb(k)    (7.6.6b)

Then using (7.6.6a,b) in (7.6.5) yields

    (dS/dh) at h = 0 = -2G(k)    (7.6.7)

where

    G(k) ≡ [Δb(k)]^T [ X^T(k)We(k) + U(μ - b(k)) ]    (7.6.8a)

    G(k) = [Δb(k)]^T [ X^T(k)WX(k) + U ] Δb(k)    (7.6.8b)

Note that G(k) is a scalar so that it is also equal to its transpose. From
the definition of G it can be proved that G ≥ 0.
Using the three conditions for S yields

    a₀ = S₀(k),    a₁ = -2G(k),    a₂ = [S_a(k) - S₀(k) + 2aG(k)]/a²

The minimum S is located where the derivative of S [given by (7.6.3)] with
respect to h is equal to zero; it occurs at the h value of -a₁/2a₂, or

    h(k+1) = a²G(k) [ S_a(k) - S₀(k) + 2aG(k) ]⁻¹    (7.6.10)

This h value is used in (7.6.1) to find the (k+1)st iterate for the b
vector. The equation given by Box and Kanemasu [9] is obtained from
(7.6.10) by setting a = 1. An equation similar to (7.6.10) is given by
Hartley [8] and is attributed to Dr. K. Ruedenburg.
There are some restrictions on the use of (7.6.10). These relate to the
possible values of a₂. See Fig. 7.9. The three different cases are a₂ = 0,
a₂ < 0, and a₂ > 0, which are discussed individually below.

In each case a condition suggested by Bard [5] is that

    S_a(k) < S₀(k)    (7.6.11)

The parameter a is made sufficiently small for this condition to be
satisfied. If this inequality is not true for a = 1, a is halved and the
inequality is checked again. Should the inequality require the
investigation of a values less than 0.01, say, the calculations are
terminated. It may be that the problem has been incorrectly programmed; for
example, the sensitivity coefficients may be incorrect. It is also possible
that the sensitivity coefficients are nearly linearly dependent. In the
Box-Kanemasu method, the inequality given by (7.6.11) is not considered and
a is always one.

[Figure 7.9  Sum of squares versus the h parameter for the Box-Kanemasu
method using (7.6.3) for approximating S.]

The first case that we consider is for a₂ = 0. If this occurs the S
expression as given by (7.6.3) is a straight line which has no minimum at
any finite value of h; see Fig. 7.9. Hence for this case we set
h(k+1) = Aa, where A is some constant equal to or slightly larger than
unity; one possible value is 1.1.

The second case is for a₂ < 0, which would cause h given by (7.6.10) to be
negative. Again we set h(k+1) = Aa.

The third and most interesting case is for a₂ > 0 as shown in Fig. 7.9.
For this case h is calculated using (7.6.10) provided h(k+1) ≤ Aa; if the
inequality is not satisfied, again we set h(k+1) = Aa.

All three cases can be included by requiring that the inequality

    S_a(k) ≥ S₀(k) - (2 - A⁻¹)aG(k)    (7.6.12)

be satisfied in order to use (7.6.10). If it is not satisfied, we suggest
that h(k+1) be set equal to Aa and the calculation proceed to the next
iteration.
A computer program flow chart incorporating the above constraints is given
by Fig. 7.10. In addition to the two inequalities given by (7.6.11) and
(7.6.12), there is a check on the sign of G. From the definition of G, it
must be positive; thus if it is negative something is incorrect. It is
important in such a program to calculate S₀(k) and S_a(k) correctly; the
same weighting W must be used as is used to evaluate Δb(k), and the term
involving U in (7.3.1) must be included if it is also implied in the Δb(k)
calculation.

[Figure 7.10  Flow chart of a procedure using the Box-Kanemasu equation.
The main steps are: calculate S(k) at b(k); calculate Δb(k) and G(k)
(terminating with the message G IS NEGATIVE if G < 0); set a = 1.0 and
A = 1.1; calculate S_a(k) at b(k) + aΔb(k); test the inequalities (7.6.11)
and (7.6.12), halving a as required (terminating with the message ALPHA
LESS THAN .01 if a becomes too small); print a, S(k), S_a(k), and b(k); and
proceed to the next iteration on k.]

For the Box-Kanemasu method, the section of the flow chart in Fig. 7.10
that is enclosed in a box with dashed lines would be bypassed and a would
be always unity.

Example 7.6.1

For the model η = β₁t + exp(-β₂t) and the observations of 2 at t = 1 and 3
at t = 2, use the Box-Kanemasu method for one step starting at β₁ = 1 and
β₂ = 2. Use OLS. Set a = 1.0 and A = 1.1.

Solution

The S function for this case is shown in Fig. 7.2. For OLS the matrix
correction is

    Δb(k) = [X^T(k)X(k)]⁻¹ X^T(k)[Y - η(k)]

For the first iteration, k = 0, and X^T(0), [X^T(0)X(0)]⁻¹, and
X^T(0)[Y - η(0)] are

    X^T(0) = |  1              2               | = |  1        2       |
             | -exp(-b₂(0))   -2exp(-2b₂(0))   |   | -0.1353  -0.03663 |

    [X^T(0)X(0)]⁻¹ = (1/0.05477) | 0.01966  0.2086 |
                                 | 0.2086   5      |

    X^T(0)[Y - η(0)] = |  1        2       | | 0.8647 | = |  2.828  |
                       | -0.1353  -0.03663 | | 0.9817 |   | -0.1529 |

which give [Δb(0)]^T = [0.4323  -3.195]. Hence the Gauss parameter
estimates are 1.4323 and -1.195, which results in an S value of 123.42.
Such a large relative change in b₂ (-3.195 compared with 2.0) suggests that
"overshooting" and nonconvergence might occur using the Gauss method. The
method does converge for this example, however, as shown below.

For the Box-Kanemasu method we check the inequality (7.6.12) for a = 1. If
the number of observations n equals the number of parameters p, it can be
shown that G(k) = S₀(k). In this example n = p = 2. Then using A = 1.1,
(7.6.12) gives

    S_a(0) = 123.42 ≥ S₀(0) - (2 - A⁻¹)G(0)
           = 1.7113 - (2 - 1.1⁻¹)(1.7113) = -0.1556

so that (7.6.10) can be used to get the small value of

    h(1) = G(0)[S_a(0) - S₀(0) + 2G(0)]⁻¹ = 0.0137

With this value we use (7.6.1) to get

    b(1) = b(0) + h(1)Δb(0) = | 1 | + 0.0137 |  0.4323 | = | 1.0059 |
                              | 2 |          | -3.195  |   | 1.9563 |

which has an associated S value of 1.6645, a value lower than S(0).
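The arithmetic of this example is easily checked by machine. The Python sketch below (ours, not from the original text) forms the OLS Gauss correction for the two-observation model and applies the Box-Kanemasu value of h from (7.6.10) with a = 1:

```python
import math

t_obs = [1.0, 2.0]
Y = [2.0, 3.0]

def eta(b, t):
    """Model of Example 7.6.1: eta = b1*t + exp(-b2*t)."""
    return b[0] * t + math.exp(-b[1] * t)

def S(b):
    """OLS sum of squares."""
    return sum((Yi - eta(b, ti)) ** 2 for Yi, ti in zip(Y, t_obs))

def gauss_step(b):
    """OLS Gauss correction db = (X^T X)^{-1} X^T (Y - eta)."""
    X = [[ti, -ti * math.exp(-b[1] * ti)] for ti in t_obs]  # rows = observations
    e = [Yi - eta(b, ti) for Yi, ti in zip(Y, t_obs)]
    a11 = sum(r[0] * r[0] for r in X)
    a12 = sum(r[0] * r[1] for r in X)
    a22 = sum(r[1] * r[1] for r in X)
    det = a11 * a22 - a12 * a12
    g1 = sum(r[0] * ei for r, ei in zip(X, e))
    g2 = sum(r[1] * ei for r, ei in zip(X, e))
    return [(a22 * g1 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det]

b0 = [1.0, 2.0]
db = gauss_step(b0)                       # about [0.4323, -3.195]
S0 = S(b0)                                # about 1.7113
Sa = S([b0[0] + db[0], b0[1] + db[1]])    # about 123.4: the full Gauss step overshoots
G = S0                                    # G = S0 here because n = p (see text)
h = G / (Sa - S0 + 2.0 * G)               # (7.6.10) with a = 1
b1 = [b0[0] + h * db[0], b0[1] + h * db[1]]
print(h, S(b1))                           # h about 0.0137; S drops to about 1.664
```

Running this reproduces the values quoted above: the interpolated step reduces S where the full Gauss step would have increased it enormously.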
CHAPTER 7 MINIMIZA n ON O F SUM O F SQUARES FUNCIlONS
F or the modified BoxKanemasu method we use (7.6.11) to check if sf 0)=
123.42 is less than SJO)= 1.7113 . Since it is not, a is m ade equal to t a nd sl~ is
calculated at blO) + t .:1,rb(O) to get S 11~2 "" 0 .0279 ; now the inequality given by (7 .6. 11)
is satisfied. Hence set a = t. The right side of (7.6.12), which is
Sdo,  (2  A 
l)aG(O)=
1.7113  (2  1.1  1)(
t
(7.6.13)
h (l)=Aa= I iI = 0.55
which gives S(l)=0.00884. I f we seek the location of the minimum S for this
iteration, we get h = 0.5336 a nd S = 0.000862. I t is partially fortuitous that the
modified BoxKanemasu method happened to yield an h value so near this latter
value. As a result, the modified B oxKanemasu method a t the e nd o f the first
iteration is much nearer the minimum than the other two methods.
A summary of the h and S values for the three methods for several iterations is
given below. The modified Box Kanemasu method converging most rapidly of the
three methods was a direct result of the excellent choice of h ... 0 .55 in the first
iteration . As is shown in Section 7.6.4.3, the same method is n ot most efficient for
all cases.
    Iteration   Gauss             Box-Kanemasu        Modified Box-Kanemasu
    Number      h     S           h       S           h       S
    0           --    1.7113      --      1.7113      --      1.7113
    1           1.0   1.234x10^2  0.0137  1.664       0.550   8.7x10^-3
    2           1.0   4.559       0.0208  1.595       0.924   8.8x10^-4
    3           1.0   5.520x10^-1 0.036   1.48        0.955   5.2x10^-5
    4           1.0   6.19x10^-2  0.077   1.25        0.951   3x10^-6
    6           1.0   5.5x10^-4   0.148   0.638       0.944   1.74x10^-8
    8           1.0   3.1x10^-6   0.372   2.8x10^-4   0.942   1x10^-10
    10          1.0   1.37x10^-8  0.444   6.6x10^-5   0.941   6x10^-13
    15          1.0   1.4x10^-14  0.933   3.9x10^-7   0.941   2x10^-18
7.6.2  Levenberg Damped Least Squares Method

Levenberg [3] tried to overcome the instability of "overshooting" in the
Gauss method by introducing constraints into the minimization of S. The
function that Levenberg considered was the OLS sum of squares function plus
an additional term. By using a WLS function we can generalize Levenberg's
function by using (7.3.1) with μ being b(k), β being b(k+1), and U being
replaced by λΩ, where Ω is a diagonal matrix. Using these definitions in
(7.4.6) gives

    b(k+1) = b(k) + [X^T(k)WX(k) + λΩ]⁻¹ X^T(k)We(k)    (7.6.13)

The effects of the Ω matrix are to reduce the size and to change the
direction of the step. Provided there is a unique minimum which is the only
stationary point and the iteration procedure given by (7.6.13) converges,
the estimates found would be those sought. The presence of the λΩ term
tends to reduce oscillations or instabilities, particularly as the diagonal
components of λΩ are made relatively large compared to the diagonal terms
in X^TWX.

Box and Kanemasu [9], in describing the changes in the estimates of the
parameters in progressing to the minimum of S, state that the term
λ[b(k) - b(k+1)]^T Ω(k) [b(k) - b(k+1)] introduces a spherical constraint
that causes a spiral path.

Levenberg proved that S decreases in the initial iterations if λ is first
large and then allowed to decrease (provided S does not have a stationary
point at b(k)). One recommendation of Levenberg was to make

    Ω = I    (7.6.14)

Incidentally, if Ω = I and λ is very large, (7.6.13) can be written as

    b(k+1) ≈ b(k) + λ⁻¹ X^T(k)We(k)    (7.6.15)

which is called the method of steepest descent. This method gives a
direction for the step but not a step size. Since the step size is
arbitrary, this method can be very inefficient, particularly as the minimum
is approached. Hence the method of steepest descent is not recommended.
Note, however, that it does not require the inverse of a matrix as do the
Gauss and Levenberg methods.
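The damping in (7.6.13) can be seen directly on the model of Example 7.6.1 (η = β₁t + exp(-β₂t), observations 2 at t = 1 and 3 at t = 2), where the undamped Gauss step overshoots badly. This sketch (ours; λ = 1 is an arbitrary illustrative value) takes one step with W = I and Ω = I:

```python
import math

t_obs = [1.0, 2.0]
Y = [2.0, 3.0]

def eta(b, t):
    return b[0] * t + math.exp(-b[1] * t)

def S(b):
    return sum((Yi - eta(b, ti)) ** 2 for Yi, ti in zip(Y, t_obs))

def levenberg_step(b, lam):
    """One step of (7.6.13) with W = I and Omega = I:
    b_new = b + (X^T X + lam I)^{-1} X^T e."""
    X = [[ti, -ti * math.exp(-b[1] * ti)] for ti in t_obs]
    e = [Yi - eta(b, ti) for Yi, ti in zip(Y, t_obs)]
    a11 = sum(r[0] * r[0] for r in X) + lam
    a22 = sum(r[1] * r[1] for r in X) + lam
    a12 = sum(r[0] * r[1] for r in X)
    det = a11 * a22 - a12 * a12
    g1 = sum(r[0] * ei for r, ei in zip(X, e))
    g2 = sum(r[1] * ei for r, ei in zip(X, e))
    return [b[0] + (a22 * g1 - a12 * g2) / det,
            b[1] + (a11 * g2 - a12 * g1) / det]

b0 = [1.0, 2.0]
print(S(b0))                        # about 1.711
print(S(levenberg_step(b0, 0.0)))   # about 123: the undamped Gauss step overshoots
print(S(levenberg_step(b0, 1.0)))   # the damped step reduces S substantially
```

With λ = 0 the step is the pure Gauss step and S jumps to about 123; with λ = 1 the shortened, rotated step reduces S well below its starting value.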
Concomitant with Ω = I, Levenberg suggested the two possibilities of (1)
letting λ be a constant value and (2) varying λ as the minimum is
approached. One possibility is that the kth value of λ is

    (7.6.16)

We shall call this the unscaled Levenberg procedure. Note that λ(k) given
by (7.6.16) goes to zero as the minimum S is approached because each
component of X^TWe goes to zero at the minimum of S. (See point (f) in
Section 7.4.3.) As the minimum is approached S also decreases, but the
numerator decreases more rapidly.

Box and Kanemasu [9] have made the point that the use of Ω = I in (7.6.15)
results in the method not being invariant under linear transformations of
the parameters, unlike the Gauss method, which is invariant.

Another recommendation of Levenberg was to set Ω equal to the diagonal
terms of X^TWX, or

    Ω_D = diag[ C₁₁  C₂₂  ...  C_pp ]    (7.6.17)

where C_jj is given by (7.4.13). This choice for Ω has the effect of making
the iteration problem invariant under scale changes in the parameters [9].
For this choice of Ω, the following expression for λ has been suggested by
Davies and Whitting [10],

    (7.6.18)

which also goes to zero as the minimum is approached. Using this expression
and (7.6.17) gives what we call the scaled Levenberg method.

When Ω is set equal to I and when λ is given [10] by

    λ(k) = 3 e^T(k)WX(k) Ω_D X^T(k)We(k) / [ e^T(k)WX(k) X^T(k)We(k) ]    (7.6.19)

we term the associated procedure the modified Levenberg method.

Though the Levenberg method and its modifications can remove instability
and reduce oscillations, it also can increase considerably the number of
iterations in a given case.
7.6.3  Marquardt's Method

A method similar to Levenberg's is the well-known method due to Marquardt
[4]. This method uses (7.6.13) with Ω given by Ω_D, which is defined by
(7.6.17), but Marquardt uses a different choice for λ than does Levenberg.
Again, if λΩ_D is large compared to X^TWX, the parameter correction is in
the same direction as given by steepest descent, which does not require
that |X^TWX| ≠ 0. It is for this reason that the Levenberg and Marquardt
methods are helpful when X^TWX is poorly conditioned at the starting
parameter vector but is better conditioned in the neighborhood of the least
squares solution (Gallant [11]). Both methods also provide a compromise
between the steepest descent and Gauss methods, with the initial iterations
close to the steepest descent method and the final iterations close to the
Gauss method.

Marquardt proposed what Box and Kanemasu call the (λ,v) algorithm, in
which λ(k) is calculated from

    λ(k) = λ(k-1)/v    (7.6.20)

The initial value λ⁰, corresponding to the first iteration, then gives
λ(1) = λ⁰/v, where v is some constant greater than unity. This method
supposedly possesses the virtues of the steepest descent and Gauss methods
where each is most effective. Marquardt's recommendations have been
followed by many and have been incorporated in numerous computer programs.

Though the Marquardt method has been widely used since the publication of
Marquardt's paper in 1963, there are still some unresolved questions
regarding the effectiveness of this method compared to others. This is
discussed further in the next section. Moreover, Box and Kanemasu [9] have
presented an analysis that indicates there is no need to compromise the
direction of a step to be between those given by the steepest descent and
Gauss methods. They demonstrate this by showing that the steepest descent
and Gauss vectors have the same direction if the parameters are transformed
into a linearly invariant metric. They also showed that the constrained
minimization in this metric is merely equivalent to using a modification
employing (7.6.1,2). Nevertheless, Gallant's observation given above
regarding the value of the Marquardt method when the initial X^TWX matrix
is poorly conditioned is valid.
7.6.4  Comparison of Methods
I n a ddition to the modifications o f t he Gauss method given above,
doubtless m any m ore will be suggested in the future. Moreover, many
quite different approaches are presently available. These include derivativefree methods (5], quasilinearization (12], stochastic approximation
(13], a nd i nvariant embedding (13). F or o ur purposes, however, some
interpolation scheme such a s t hat o f Section 7.6.1 c ombined with a
sequential procedure (such a s given in Section 7.8) is usually adequate,
particularly if the experiment is carefully designed a s discussed in Chapter
8.
T he p urpose o f this section is t o provide comparisons o f several
m ethods. Assuming t hat all the methods converge t o the same parameter
vector, o ne o f the most important considerations in selection o f a m ethod
is the relative computer time needed. In m any parameter estimation
problems, the model is a set o f o rdinary o r p artial differential equations
which may require timeconsuming finitedifference methods for solution.
I n such cases the time used in repeatedly solving the model is much greater
372
CHAPTER 7
MINIMIZATION O F SUM O F SQUARES FUNCTIONS
I han t hat used in p erforming the other requisite c alculations in p arameter
e stimation. Hence the n umber o f e valuations of the model gives a n
a pproximate relative measure o f the c omputer time required for a p articular m ethod .
Another criterion for comparison of methods is the power to solve difficult cases. Unfortunately many of the more powerful procedures can take considerably more computer time than a relatively simple method such as the Gauss method. The relative amount of time depends on the problem. Indeed there are cases for which the Gauss method does not converge. Usually, at the expense of more computer time and greater programming complexity, computer methods can be evolved to treat more difficult cases. Though there is an advantage in having more powerful computer programs available, such programs can never supplant careful design of experiments. Thus our tendency is to recommend more efficient though less powerful methods, not only to save computer time but also to encourage careful experiment design. In a case that is poorly designed, not only is the minimum S difficult to locate, but the associated parameter values are probably very sensitive to the measurement errors. Good designs yield more accurate parameter estimates and require less computer time.
One way to compare methods is to apply them to a large variety of problems and also to vary the initial parameter estimates. Some authors have done this. Usually the cases are not randomly selected. Rather there is a tendency to use as test cases those that have S functions which are difficult to minimize. One such case is given next.
7.6.4.1 Box-Kanemasu Example

The model and data investigated by Box and Kanemasu [9] are

    η = β1β2ξ1 (1 + β1ξ1 + 5000ξ2)^(-1)    (7.6.21)

    ξ1    ξ2    Y
    1     1     0.1165
    2     1     0.2114
    1     2     0.0684
    2     2     0.1159

The independent variables are ξ1 and ξ2. The errors in Y are assumed to be additive, to have zero mean, to have constant variance, and to be independent and normal. Ordinary least squares and maximum likelihood give the same parameter estimates in this case.

The sensitivity coefficients can be given by

    X1 = ∂η/∂β1 = (η/β1)(1 - η/β2),    X2 = ∂η/∂β2 = η/β2    (7.6.22)

From (7.6.21) we can observe that η is linear in β2 but nonlinear in β1. Also the sensitivity coefficients as expressed by (7.6.22) can be plotted or tabulated versus β1ξ1 for ξ2 equal to 1 and 2; see Table 7.5, where the scaled sensitivity X̄1 = β1X1/β2 is used rather than X1 for its greater convenience. The initial β1 and β2 values chosen were 300 and 6, respectively, and the converged values are 716.955 and 0.944469. Then in the vicinity of the minimum, the maximum value of β1ξ1 would be about 1500. Notice that in Table 7.5 the X̄1 and X2 sensitivities for β1ξ1 less than 1000 are approximately proportional at both ξ2 = 1 and 2. This means that the minimum of S is probably ill-defined. This is demonstrated by the long narrow valley shown in Fig. 7.11a; such cases might pose difficulty in convergence for the Gauss method.

Table 7.5  Sensitivity Coefficients for the Model Given by (7.6.21)

                 ξ2 = 1              ξ2 = 2
    β1ξ1      X̄1       X2        X̄1       X2
    0         0         0         0         0
    250       .0453     .0476     .0238     .0244
    500       .0826     .0909     .0453     .0476
    1000      .1389     .1667     .0826     .0909
    2000      .2041     .2857     .1389     .1667
    3000      .2344     .3750     .1775     .2308
    100,000   .0453     .9524     .0827     .9091
    ∞         0         1         0         1

Another indication that the minimum S is poorly defined can be obtained by examining the (X^TX)^(-1) matrix for the final b's; its diagonal terms are C11 and C22 = 2.79758 × 10^3 and its off-diagonal term is C12 = -2.57548 × 10^6. The correlation coefficient of the estimators is approximated by C12/(C11C22)^(1/2) = -0.997909. Since the absolute value of this number is very near unity, the two parameters are shown to be highly correlated.

Figure 7.11b shows a comparison of results given by Box and Kanemasu [9] for the Marquardt method described in Section 7.6.3, the modified Gauss method involving halving and doubling mentioned in Section 7.6.1, and the Box-Kanemasu method of Section 7.6.1. The number of times, n_f, that the function η had to be evaluated is plotted versus λ0, the initial value of λ chosen in the Marquardt method. (Note that the number of iterations is not equivalent to the number of function evaluations.)

Figure 7.11a  Sum of squares contours for the example of Section 7.6.4.1. (Reprinted by permission of Prof. G. E. P. Box.)

Figure 7.11b  Comparison of the Marquardt, halving and doubling, and Box-Kanemasu modifications of the Gauss method. (Reprinted by permission of Prof. G. E. P. Box.)

The following conclusions can be drawn from Fig. 7.11b:

1. The Box-Kanemasu method is superior to any of those investigated.
2. The next best method is usually the halving-doubling method.
3. The Marquardt method is usually the slowest of those considered.

These conclusions should not be overgeneralized because only one particular example was considered. From theoretical considerations and from this example, however, Box and Kanemasu found that Marquardt's method was not superior to the Gauss method with the modifications using (7.6.1,2), with h replaced by 1/(1 + λ), λ being the value that Marquardt would recommend.

7.6.4.2 Bard Comparisons

Y. Bard [1] gave an excellent survey of 13 best-known gradient methods including several modifications of the Gauss method. Bard found that several modifications of the Gauss method were better than the Marquardt method. In his book, Bard [5, p. 111] appears to favor a modification of the Box-Kanemasu method which he calls the interpolation-extrapolation method.
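The ill-conditioning of this example can be explored numerically. The sketch below assumes the model, data, and starting values as reconstructed above in (7.6.21) and its table, and fits the two parameters with a damped Gauss (Levenberg-Marquardt-style) iteration; a step is accepted only if it reduces S, which is the kind of guard the long narrow valley makes necessary.

```python
# Damped Gauss iteration for the Box-Kanemasu example of Section 7.6.4.1.
# Model (7.6.21): eta = b1*b2*x1 / (1 + b1*x1 + 5000*x2); starting values (300, 6).

xi = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
Y = [0.1165, 0.2114, 0.0684, 0.1159]

def eta(b1, b2, x1, x2):
    return b1 * b2 * x1 / (1.0 + b1 * x1 + 5000.0 * x2)

def sum_sq(b1, b2):
    return sum((y - eta(b1, b2, x1, x2)) ** 2 for (x1, x2), y in zip(xi, Y))

def damped_gauss(b1, b2, lam=1e-3, iters=200):
    for _ in range(iters):
        # Sensitivities from (7.6.22): X1 = (eta/b1)(1 - eta/b2), X2 = eta/b2.
        A11 = A12 = A22 = g1 = g2 = 0.0
        for (x1, x2), y in zip(xi, Y):
            h = eta(b1, b2, x1, x2)
            X1 = (h / b1) * (1.0 - h / b2)
            X2 = h / b2
            r = y - h
            A11 += X1 * X1; A12 += X1 * X2; A22 += X2 * X2
            g1 += X1 * r;  g2 += X2 * r
        # Solve the damped normal equations for the step (2x2 system).
        B11, B22 = A11 * (1.0 + lam), A22 * (1.0 + lam)
        det = B11 * B22 - A12 * A12
        d1 = (B22 * g1 - A12 * g2) / det
        d2 = (B11 * g2 - A12 * g1) / det
        if sum_sq(b1 + d1, b2 + d2) < sum_sq(b1, b2):   # accept only descending steps
            b1, b2, lam = b1 + d1, b2 + d2, lam * 0.5
        else:
            lam *= 10.0
    return b1, b2

b1, b2 = damped_gauss(300.0, 6.0)
print(b1, b2, sum_sq(b1, b2))
```

Because β2 enters the model linearly, the iteration drops into the valley almost immediately; moving along the valley floor toward the quoted minimum is the slow part, which is exactly the behavior Fig. 7.11a suggests.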
7.6.4.3 Davies and Whitting Comparison
In the paper by Davies and Whitting [10] a comparison is given of five methods for seven problems considered by Jones [14]. The methods include the Levenberg procedures using λ given by (7.6.16) and (7.6.18). There was negligible difference in the rate of convergence between these two methods. They also used the modified Levenberg method, which used (7.6.19) for h = 1 and another equation for λ when h = hm; again negligible differences were observed in the rate of convergence. Another method (SPIRAL) is due to Jones [14], who estimates the Levenberg parameter λ in another manner. The first six columns of Table 7.6 contain the results given by Davies and Whitting [10]. We have added the last two columns, which were obtained using the Box-Kanemasu method (h = 1, always) and the modified version which reduces h if the condition given by (7.6.11) is not satisfied. For both methods sequential estimation was used with the initial P matrix being a very large multiple of the identity matrix (or equivalently a very small U); this reduces difficulties when X^TX is singular.

For all problems except Problem 5, the results of Table 7.6 indicate that the Gauss method is greatly superior to the Marquardt, SPIRAL, Levenberg, and modified Levenberg methods. A weakness of the Gauss method is indicated by its inability to treat Problem 5; this was caused by the X^TX matrix being initially singular. The Gauss method works for Problem 4, which has the same S function but different initial parameter estimates.

The unmodified and modified Box-Kanemasu methods (with a very small U) compare very well with the Gauss method (based on the number of iterations in Table 7.6) for Problems 3, 4, 6, and 7 and are superior for Problem 5. Only one iteration is needed for Problems 6 and 7, which have the linear model ηi = β1Xi1 + β2Xi2 + β3Xi3. Hence these methods have the very desirable characteristic of not requiring extensive iterations for linear problems; notice, in contrast, that the other techniques necessitated from 9 to 78 iterations.

For Problem 4 the methods using the Box-Kanemasu equation are about equal to the Gauss method but much better for Problem 5; all the other methods are much less effective for Problems 4 and 5.
Table 7.6  Comparison of the Number of Iterations Required by Several Methods for the Seven Test Problems of Jones [14]

It is surprising that for Problems 1 and 2 the Gauss method converges in two iterations whereas all the others take many more. For these two problems the S function is the Rosenbrock function

    S = 100(β2 - β1²)² + (1 - β1)²

The two problems have different initial parameter estimates, Problem 1 starting with β1 = -1.2 and β2 = 1. Regardless of the initial parameter
values, the Gauss method results in the correct β1 and β2 values in exactly two iterations. The first iteration results in the correct value of β1, but usually with S(1) being much larger than S(0). In the second iteration of the Gauss method S is reduced to zero. From this example (as in others) we see that S(1) being larger than S(0) does not necessarily mean that the Gauss method will encounter difficulty in convergence. In fact, in this problem the other methods were much less efficient.

In conclusion, although the Gauss method is very competitive in many problems, it may not work because X^TX is temporarily singular. For that reason we suggest that U be made equal to cI, where c is a very small value compared to the diagonal terms of X^TX (or that P(0) = KI be used with large K in the sequential method as discussed in Section 6.7). In addition, the Box-Kanemasu method is recommended in order to be more certain of finding the parameter values minimizing S, even though in some cases the Gauss method would be more efficient. In cases when the Box-Kanemasu method does not converge, then the modification of it that we have suggested is recommended, provided one is convinced that a unique minimum point exists.
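The two-iteration behavior just described is easy to verify numerically. The sketch below assumes that the S function for Problems 1 and 2 is the Rosenbrock function S = 100(β2 - β1²)² + (1 - β1)², written as the sum of the squares of the residuals f1 = 10(β2 - β1²) and f2 = 1 - β1, and applies undamped Gauss steps from β1 = -1.2, β2 = 1.

```python
# Gauss iterations on the Rosenbrock sum of squares.
# Residuals: f1 = 10*(b2 - b1**2), f2 = 1 - b1, so S = f1**2 + f2**2.

def residuals(b1, b2):
    return [10.0 * (b2 - b1 ** 2), 1.0 - b1]

def S(b1, b2):
    return sum(f * f for f in residuals(b1, b2))

def gauss_step(b1, b2):
    # Jacobian of the residuals; 2 residuals and 2 parameters, so the
    # Gauss step solves J d = -f exactly (Cramer's rule for the 2x2 system).
    J = [[-20.0 * b1, 10.0], [-1.0, 0.0]]
    f = residuals(b1, b2)
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    d1 = (-f[0] * J[1][1] + f[1] * J[0][1]) / det
    d2 = (-f[1] * J[0][0] + f[0] * J[1][0]) / det
    return b1 + d1, b2 + d2

b = (-1.2, 1.0)        # Problem 1 starting values
b = gauss_step(*b)     # first iteration: b1 is already correct,
print(b, S(*b))        # but S(1) is much larger than S(0)
b = gauss_step(*b)     # second iteration: S drops to zero
print(b, S(*b))
```

Because f2 is linear in β1, the first step lands exactly on β1 = 1 (while overshooting badly in β2), and the second step then solves the remaining linear problem in β2 exactly, reproducing the behavior noted in the text.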
7.7 MODEL BUILDING AND CONFIDENCE REGIONS

Most of the developments discussed in this section utilize an assumption of a multivariate normal distribution of errors. In particular this assumption is used in determining confidence intervals and regions for parameters, locating confidence intervals for Ŷ, and applying the F test for model building. The assumptions used in this section include additive, zero mean, normal errors in Y. The independent variables are assumed to be errorless. The assumptions are designated 11111.

Unlike those for linear estimation, the expressions below are all approximate, with the approximation being better for cases which are less nonlinear than others. For a measure of the nonlinearity, see Beale [15] and Guttman and Meeter [16]. The expressions are approximations due to (1) the sensitivity coefficients being functions of estimated parameters, (2) linear approximations used to derive equations such as (7.7.1), and (3) the usual estimates for σ².

7.7.1 Approximate Covariance Matrix of Parameters

The approximate covariance matrix of the parameters has different forms depending upon the method of estimation. In general, the expressions are similar to comparable linear estimation cases.
For ordinary least squares estimation with the assumptions denoted 1111, the approximate covariance matrix of bLS is analogous to (6.2.11),

    cov(bLS) ≈ (X^TX)^(-1) X^T ψ X (X^TX)^(-1)    (7.7.1)

where X is the sensitivity matrix, which is a function of bLS, and where ψ = E(εε^T). (Recall that in OLS estimation using (7.4.6) we set W = I and U = 0.) With the additional standard assumptions of uncorrelated and constant variance measurement errors and σ² being unknown (1111011), the estimated covariance of bLS simplifies to

    est. cov(bLS) ≈ (X^TX)^(-1) s²,    s² = (Y - Ŷ)^T (Y - Ŷ) / (n - p)    (7.7.2)

When maximum likelihood estimation is used and the measurement errors are normal, the approximate covariance matrix of bML is

    cov(bML) ≈ (X^T ψ^(-1) X)^(-1)    (7.7.3)

The assumptions are 11111 (see Section 6.1.5). If ψ is known to within the multiplicative constant σ², that is, ψ = σ²Ω (111011), then

    cov(bML) ≈ (X^T Ω^(-1) X)^(-1) s²    (7.7.4)

where s² can be estimated using

    s² ≈ (Y - Ŷ)^T Ω^(-1) (Y - Ŷ) / (n - p)    (7.7.5)
7.7.2 Approximate Correlation Matrix

The approximate OLS and ML correlation matrices can be obtained using the above equations. The ij element of the correlation matrix is given by

    r_ij = P_ij (P_ii P_jj)^(-1/2)    (7.7.6)

where P_ij is a term of the covariance matrix of (7.7.1), (7.7.2), or (7.7.3), depending on the case. The diagonal elements of r are all unity and the off-diagonal elements must be in the interval [-1, 1]. If ψ is known only to within a multiplicative constant σ², the σ² value need not be known for r since it cancels in (7.7.6).

Whenever all the off-diagonal elements exceed 0.9 in magnitude, the estimates are highly correlated and tend to be inaccurate. Bacon [17] has suggested that when this is true, a simpler model form than the one
originally proposed may be appropriate. A poor experimental design may also be responsible for the high correlations. In that case it is recommended that the sensitivity coefficients be examined and that the experimental design strategies discussed in Chapter 8 be employed. Box [18,19] has shown, however, that high correlations among the parameters can be due to a large extent to the nature of the model itself, and thus no experimental design could be expected to yield uncorrelated parameter estimates. See Problem 7.18.
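As a minimal illustration of (7.7.6), the correlation matrix can be computed directly from any estimated covariance matrix; the 2 x 2 covariance values below are hypothetical numbers chosen only for the illustration.

```python
# Correlation matrix r from a covariance matrix P, per (7.7.6):
# r_ij = P_ij / (P_ii * P_jj)**0.5.
def correlation(P):
    p = len(P)
    return [[P[i][j] / (P[i][i] * P[j][j]) ** 0.5 for j in range(p)]
            for i in range(p)]

# Hypothetical covariance matrix for two parameters.
P = [[0.25, -1.0 / 6.0],
     [-1.0 / 6.0, 1.0 / 6.0]]
r = correlation(P)
print(r[0][1])   # off-diagonal correlation, about -0.816
```

Note that any common multiplicative factor in P (such as an unknown σ²) cancels, which is the point made in the text.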
A rule of thumb for anticipating the inaccuracy in calculation is given by Gallant [11]; he suggested that difficulty in computation may be encountered when the common logarithm of the ratio of the largest to smallest eigenvalues of r exceeds one-half the number of significant decimal digits used by the computer. See also Appendix A.4.
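For a 2 x 2 correlation matrix with off-diagonal element r12, the eigenvalues are 1 + r12 and 1 - r12, so the rule of thumb is easy to evaluate; here it is checked for the value r12 = -0.997909 found for the Box-Kanemasu example.

```python
import math

# Gallant's rule of thumb for a 2x2 correlation matrix.
r12 = -0.997909
lam_big, lam_small = 1.0 - r12, 1.0 + r12    # eigenvalues 1 -+ r12
criterion = math.log10(lam_big / lam_small)  # common log of the eigenvalue ratio
digits_needed = math.ceil(2.0 * criterion)   # criterion <= digits/2  =>  digits >= 2*criterion
print(criterion, digits_needed)              # about 2.98, so 6 significant figures
```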
Example 7.7.1

Use the Gallant criterion for the Box-Kanemasu example of Section 7.6.4.1.

Solution

For that example r can be written

    r = [ 1     r12 ]
        [ r12   1   ]

where r12 = -0.997909. Using (d) and (e) of Example 6.8.1 gives λ1 = 1 - r12 and λ2 = 1 + r12. The Gallant criterion then is log[(1 - r12)/(1 + r12)] ≈ 2.98. Hence in order to solve the Box-Kanemasu problem, the Gallant criterion indicates that at least six significant figures be used in the calculations.

7.7.3 Approximate Variance of Ŷ

Approximate variances for the predicted values Ŷ of the model for OLS and ML are given by expressions similar to (6.2.12) and (6.5.6).

7.7.4 Approximate Confidence Intervals and Regions

As frequently happens for nonlinear problems, there are several approaches for providing approximate confidence intervals and regions. Some of these are discussed below.

Consider first calculation of approximate confidence intervals. Use the linear approximation given by the Taylor series of (7.2.1) (which is used in the Gauss equation); then, analogous to (6.8.4), we have the approximate 100(1 - α)% confidence interval

    bk ± est. s.e.(bk) t_(1-α/2)(ν);    k = 1,2,...,p    (7.7.7)

for any one of the parameters, where

    est. s.e.(bk) = (Pkk)^(1/2)    (7.7.8)

and ν is the number of degrees of freedom, frequently n - p, associated with the estimate of σ². The term Pkk is the kth diagonal term of P and represents the variance of bk. For the assumptions indicated, (7.7.2) is used to find Pkk for OLS and (7.7.4) for ML estimation. The expression given by (7.7.8) is approximate not only because s² is used for σ² in (7.7.2) and (7.7.4), but also because the sensitivity coefficients are approximate since they are functions of b.

Example 7.7.2

The 90% confidence interval is to be found for each of the parameters of the model

    η = β1 exp(β2 ξ)

and for the data

    ξ    Y
    0    1
    0    2
    1    3

Assume that the standard assumptions designated by 11111011 are valid. The minimum sum of squares is 0.5 and occurs at b1 = 1.5 and b2 = ln 2 = 0.693.

Solution

The sensitivities, sensitivity matrix, and X^TX are

    X1 = ∂η/∂β1 = exp(β2 ξ),    X2 = ∂η/∂β2 = β1 ξ exp(β2 ξ)

    X^T(b) = [ 1  1  2 ]        X^TX = [ 6  6 ]
             [ 0  0  3 ]               [ 6  9 ]

Owing to the standard assumptions, OLS and ML give the same estimates and approximate covariance matrix of the parameters. Using (7.7.2) gives

    P ≈ (X^TX)^(-1) s² = (0.5/18) [ 9   -6 ] = [ 1/4   -1/6 ]
                                  [ -6   6 ]   [ -1/6   1/6 ]

From tables we get t.95(1) = 6.31 for 90% confidence (see Table 2.15). Then, for 90% confidence intervals, (7.7.7) and (7.7.8) yield

    b1 ± (1/4)^(1/2)(6.31) = 1.5 ± 3.16

and

    b2 ± (1/6)^(1/2)(6.31) = 0.693 ± 2.58

Generally confidence intervals provide poor approximations to the confidence region even for linear cases. This is even more true for this nonlinear case, as is shown below. In addition to the approximations that are present in this example, the reader should be aware that this example was constructed for pedagogical simplicity and not because it is a realistic example. We can expect accurate results from the equations given above for real cases only when the assumptions are nearly valid and when the number of observations n is large.

Gallant has given a Monte Carlo study [11] for a certain nonlinear model with four parameters that showed small differences using the above method for determining the confidence intervals compared to a more exact method (which involves the likelihood ratio). He indicated, however, that when P is an ill-conditioned matrix the likelihood ratio method can yield considerably shorter confidence intervals than those using the above method, which is based on the asymptotic normality of the least squares estimator. We will describe this more powerful (but in application more time-consuming) procedure in relation to the confidence regions.

Let us now consider two methods of finding confidence regions. Again a number of standard assumptions are needed, including knowledge of the probability density, which we shall assume to be normal. Assume that the conditions denoted 111011 are valid. Then, using a Taylor series approximation for η, the 100(1 - α)% confidence region for both OLS and ML estimation is given by

    (β - b)^T (P/s²)^(-1) (β - b) ≤ p s² F_(1-α)(p, n-p)    (7.7.9)

where for OLS and ML estimation (P/s²)^(-1) is

    X^T(bLS) X(bLS)    (7.7.10a)

or

    X^T(bML) Ω^(-1) X(bML)    (7.7.10b)

In (7.7.10a) the sensitivity matrices are functions of bLS whereas those in (7.7.10b) depend upon bML. In giving (7.7.9,10), results of Section 6.8.3 are used along with (6.2.11) and (6.5.8). The approximate confidence contours given by (7.7.9) are ellipsoids.

A more natural confidence region is found using a likelihood ratio [20]. It is the nonlinear analogue of the F statistic given by (6.2.24),

    { [S(β) - R]/p } / { R/(n - p) } ≤ F_(1-α)(p, n-p)    (7.7.11)

where p is present instead of q because a confidence region is needed for all the parameters. (R is the minimum S.) Usually (7.7.11) does not produce ellipsoids in parameter space. The contours are along the lines of constant likelihood ratio, but the confidence level 1 - α is approximate [15,16]. The approximation enters in (7.7.11) because the numerator and denominator no longer contain independent χ² distributions. If we are dealing with a model which is nonlinear in the parameters, computation of confidence contours using (7.7.11) is more difficult and time-consuming than that using (7.7.9), in which a linear approximation is incorporated. For example, if there are two parameters, a separate numerical search problem is involved to obtain β1 for each choice of β2.

Example 7.7.3

The 50, 75, and 90% confidence regions are to be found for β1 and β2 for the model and data of Example 7.7.2. Use both methods given above.

Solution

Method Based on (7.7.9)

For this method (P/s²)^(-1) is X^TX, and the value of s² is R/(n - p) = 0.5/(3 - 2) = 0.5. Then using the X^TX matrix of Example 7.7.2 in (7.7.9) gives

    6(β1 - b1)² + 12(β1 - b1)(β2 - b2) + 9(β2 - b2)² ≤ F_(1-α)(2,1)    (a)

Finding the confidence region using this expression is readily accomplished using the results of Example 6.8.1. Note that C of that example is the square matrix in (a) given above. From (d) and (e) of Example 6.8.1 the eigenvalues are found to be λ1 = [15 - (153)^(1/2)]/2 = 1.31534 and λ2 = [15 + (153)^(1/2)]/2 = 13.68466. Then using (g)-(j) of Example 6.8.1 gives

    e = [ 0.788205    0.615412 ]
        [ -0.615412   0.788205 ]

Since e is orthonormal and since ps² = 1, we can write, analogous to (l) and (m) of that example,

    (b1 - β1)maj = ± [F_(1-α)(2,1)]^(1/2) e11 λ1^(-1/2) = ± 0.68726 [F_(1-α)(2,1)]^(1/2)
    (b2 - β2)maj = ∓ 0.53660 [F_(1-α)(2,1)]^(1/2)

and

    (b1 - β1)min = ± 0.16636 [F_(1-α)(2,1)]^(1/2)
    (b2 - β2)min = ± 0.21307 [F_(1-α)(2,1)]^(1/2)

The three confidence regions can be found by using these equations with the F values, which are F.50(2,1) = 1.5, F.75(2,1) = 7.5, and F.90(2,1) = 49.5. They are shown in Fig. 7.12 as dashed ellipses that are centered at β1 = 1.5 and β2 = 0.693. Notice that the 90% confidence region extends as far as (b1 - β1)maj = ±0.68726(49.5)^(1/2) = ±4.84 and (b2 - β2)maj = ∓3.78. Since 4.84/1.5 = 3.22 and 3.78/0.693 = 5.45, the confidence region is relatively large and the parameters must be considered inaccurate or ill-determined. If the confidence region were sufficiently small, we would say that the parameters are well-determined.

From the above equations the ratio of the major to minor axes of the ellipses is found to be (λ2/λ1)^(1/2) = 3.23, which indicates that the ellipses are neither extremely narrow nor approach a circle. The major axis forms an angle of tan^(-1)(e21/e11) = -37.98° with the β1 axis. Along this direction the S function varies most slowly. Another way of describing the major axis is to note that along it

    δβ2 + 0.7808 δβ1 = 0

this means that we can estimate the sum β2 + 0.7808β1 more accurately than β2 or β1 separately.

One should distinguish between the ill-determined condition indicated by the confidence region and difficulty in convergence indicated by high parameter correlations, or equivalently by a large value of the Gallant criterion. Difficult convergence is related to the character of X^TX (and thus near linearity of the sensitivity coefficients), but not to the accuracy of the measurements (the s² value). The confidence region depends on both. For related discussion, see reference [21].

Method Based on (7.7.11)

For this equation the S function is needed; it is

    S = (1 - β1)² + (2 - β1)² + (3 - β1 exp(β2))²    (b)

Because of the relative simplicity of this function, we can solve for exp(β2) in terms of β1 and S; then taking the natural logarithm of exp(β2) gives for β2

    β2 = ln{ [3 ± (S - (1 - β1)² - (2 - β1)²)^(1/2)] / β1 }    (c)

In more realistic examples a search for β2 would be needed for fixed β1 and S. Since R = 0.5, n = 3, and p = 2, (7.7.11) becomes

    S = 0.5 + F_(1-α)(2,1)    (d)

By introducing this value of S into (c) and solving for β2 for a range of β1 values, we obtain the confidence regions shown as solid lines in Fig. 7.12.

Figure 7.12  Likelihood ratio and approximate confidence contours for 50, 75, and 90%. S is given in Example 7.7.3 as equation (b); the 50% likelihood ratio contour corresponds to S = 2 and the 90% contour to S = 50.

Unlike the dashed confidence regions, which are all ellipses with the same ratio of major to minor diameters, these regions are quite different in shape, particularly as S becomes large or, equivalently, as the percent confidence approaches 100. It is significant, however, that at the smallest S value (corresponding to 50%) the true shape of the confidence region does approach the ellipse given by (7.7.9). From Fig. 7.12 note that a likelihood ratio confidence region tends to be larger than the corresponding approximate region. If a likelihood ratio contour had been inside the associated approximate contour, then the latter would be suggested as a conservative estimate of the likelihood ratio confidence region. Though this condition is not satisfied in this case, it may be in some cases. See Bard [5, p. 209].

For better-designed experiments and for more observations (usually n > 3), the two methods show much better agreement. For an example, see the Box and Hunter paper [22].

It is instructive to compare a confidence region constructed using the confidence intervals given in Example 7.7.2 with the confidence regions shown in Fig. 7.12. Consider the 90% case, which covers the region -1.66 < β1 < 4.66 and -1.89 < β2 < 3.27. This region contains some of the 90% confidence regions and additional areas, but there are also areas in the more correct confidence regions that are not covered. This indicates that a confidence region constructed from confidence intervals may not be a good approximation of the approximate regions; a particularly poor approximation to the likelihood ratio confidence region is provided as the percent confidence approaches 100.
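The numbers in Examples 7.7.2 and 7.7.3 can be reproduced with a few lines of code, assuming X^TX = [[6, 6], [6, 9]], s² = 0.5, and t.95(1) = 6.31 as given above.

```python
import math

XtX = [[6.0, 6.0], [6.0, 9.0]]
s2 = 0.5                       # s^2 = R/(n - p) = 0.5/1
t95 = 6.31                     # t_{.95}(1)

# Covariance matrix P = (X^T X)^{-1} s^2, via the 2x2 inverse formula.
det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
P = [[XtX[1][1] * s2 / det, -XtX[0][1] * s2 / det],
     [-XtX[1][0] * s2 / det, XtX[0][0] * s2 / det]]

# 90% confidence half-widths from (7.7.7) and (7.7.8).
half1 = math.sqrt(P[0][0]) * t95
half2 = math.sqrt(P[1][1]) * t95
print(half1, half2)            # about 3.16 and 2.58

# Eigenvalues of X^T X, which set the axes of the elliptical region (7.7.9).
tr = XtX[0][0] + XtX[1][1]
lam1 = (tr - math.sqrt(tr * tr - 4.0 * det)) / 2.0
lam2 = (tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0
print(lam1, lam2)              # about 1.31534 and 13.68466
```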
7.7.5 Model Building Using the F Test

The F statistic given by (6.2.24),

    F = { [R(b1, W2) - R(b1, b2)] / q } / { R(b1, b2) / (n - p) }    (7.7.12)

and used for confidence regions can also be used for nonlinear model building. As pointed out below (7.7.11), this expression is approximate for nonlinear models. The assumptions denoted 111011 are used in connection with (7.7.12). Recall that (7.7.12) implies two groups of parameters; β1 is assumed needed and β2 may or may not be needed. There are p parameters in all, with β1 having p - q parameters and β2 having q. The objective is to see if the β2 parameters should be included in the model. W2 may contain values suggested by a competing model. (See the discussion following Theorem 6.2.4.) When all p parameters are simultaneously estimated, the minimum sum of squares is designated R(b1, b2); if the standard assumptions of 11111011 are valid, OLS can be used, but if the errors neither have constant variance nor are independent (11001011), then maximum likelihood estimation should be used.

The statistic F, (7.7.12), is utilized by comparing its value with F_(1-α)(q, n-p). If F > F_(1-α)(q, n-p), then we have an indication that the parameters β2 are needed in the model and should not be the values implied by W2.

Example 7.7.4

Using results of the estimates of parameters for the cooling billet problem, determine the 95% confidence model, selecting from those given by (7.5.11,12). Assume that the standard assumptions indicated by 11111011 are valid.

Solution

The F test can be used in this example in a manner that is very similar to that used for linear problems. Using the sums of squares given in Table 7.4 we can construct Table 7.7. The F values are compared with F.95(1, n-p) = 4.54, 4.60, and 4.67 for n - p equal to 15, 14, and 13, respectively. Since 0.0077 is less than 4.67 but 672.8 is not less than 4.60, we have an indication that the two parameters β1 and β2 in (7.5.11) are needed but the β3 parameter is not.

Table 7.7  Sum of Squares and F Statistic for Example 7.7.4

    Model    No. of        Degrees of    Sum of        Mean Square,      ΔR         F
    No.      Parameters    Freedom       Squares, R    s² = R/(n-p)
    1        1             15            38.73731      2.582             -          -
    2        2             14            0.7896890     0.05641           37.9476    672.8
    3        3             13            0.7892203     0.06071           0.000469   0.0077
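The F values in Table 7.7 follow from (7.7.12) with q = 1; a quick check using the R values from the table:

```python
# Partial F test between nested models, as in Table 7.7.
# R1, R2, R3 are the minimum sums of squares for the 1-, 2-, and
# 3-parameter models of the cooling billet example.
R1, R2, R3 = 38.73731, 0.7896890, 0.7892203

F_2_vs_1 = ((R1 - R2) / 1) / (R2 / 14)   # is beta2 needed?  n - p = 14
F_3_vs_2 = ((R2 - R3) / 1) / (R3 / 13)   # is beta3 needed?  n - p = 13
print(round(F_2_vs_1, 1), round(F_3_vs_2, 4))   # 672.8 and 0.0077
```

Comparing 672.8 with F.95(1,14) = 4.60 and 0.0077 with F.95(1,13) = 4.67 reproduces the conclusion that β2 is needed but β3 is not.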
7.8 SEQUENTIAL ESTIMATION FOR MULTIRESPONSE DATA

There are a number of advantages of sequential estimation, including the development of simpler, yet more general, computer programs and the possibility of providing more insight into model building. For other advantages, see Section 6.7.7.

The process is considered to be sequential with time because dynamic experiments are so common. However, there may be cases when one might wish to have some other independent variable, such as position, be the sequential variable. At each time the experiment is considered to involve m (> 1) different responses. The m different responses could include different dependent variables such as temperature, concentration, velocity, or voltage; they could also represent measurements from m similar sensors.

Two sequential methods are given in this section. Both utilize the Gauss method and can incorporate the Box-Kanemasu modification.

The first sequential method is simply a direct adaptation of (7.4.6). It would be recommended when a sequential method is being used to reduce the memory requirements in a computer program and the sequential values of the parameters are not of interest.
The second method is similar to the linear sequential method of Section 6.7. The measurements must either be actually independent or be treatable as if they were. The latter situation is the case, for example, when OLS is used. Moreover, autoregressive and moving average correlated errors can be analyzed by using certain differences which are uncorrelated. Provided the errors are or can be treated as being uncorrelated, this method is recommended when m ≤ p because the order of the matrix inversion is reduced. Only a scalar needs to be inverted if m = 1. Advantage can be taken of this fact by renumbering the observations as if they were all at different times even though physically m > 1.

Prior information can be included through the use of μ and U.
7.B.l
7.1 SEQUEN11AL ESTIMATION FOR MULTI RESPONSE DATA
diagonal matrix
1f,
~=diag[ ...( l) . .....( n)]
where . ..( i)=diag[ tJ~(i) . .. t J!(i)]
Consistent with the above assumptions is estimation using the assumptions 11011 which are the assumptions used in G aussMarkov estimation for linear models. With the additional assumption of normal errors,
the results are equivalent to those obtained using maximum likelihood. The
sequential procedure given below could also be used with OLS estimation
by simply replacing ... by I .
7.B.2 Direct Method
Equations (7.4.6) apply to the multiresponse case. In order to reduce the
plethora of subscripts and superscripts we write (7.4.6) as
Assumptions
The standard assumptions of additive, zero mean, and noncorrelated errors
are used. The errors have a known covariance matrix ~. T he independent
variables are considered to be nonstochastic. The parameters are assumed
to be constant, that is, not time variable o r random. These assumptions are
designated 11111. The assumptions regarding the measurement errors
can be written as
Y = E (YIP>+ e = ' I + e
(7.8.1)
E (e)=O
(7.8.2)
The measurement vector Y is composed of the n vectors Y(i),

    Y = [Yᵀ(1), Yᵀ(2), ..., Yᵀ(n)]ᵀ    (7.8.3)

where Y(i) is the m-dimensional vector of observations at the ith time,

    Y(i) = [Y_1(i), Y_2(i), ..., Y_m(i)]ᵀ    (7.8.4)

The associated error vector e and dependent variable vector η are similarly
composed of the m-dimensional vectors e(i) and η(i), where i = 1, ..., n.
The covariance matrix of the errors for the given assumptions is block
diagonal, with the elements

    E[e_j(u) e_k(v)] = σ_j²(u)   for u = v, j = k
                     = 0         otherwise    (7.8.5)

The Gauss linearization equations for this case are

    b* = b + P[XᵀΦ⁻¹(Y − η) + U(μ − b)]    (7.8.6)

    P = [XᵀΦ⁻¹X + U]⁻¹    (7.8.7)

where X and η are evaluated at b, which stands for b(k). The vector b* is the
same as b(k+1) in (7.4.6a). Recall that k is an iteration index which changes
only after all the data (m × n observations) have been considered.

The sensitivity matrix in (7.8.6) and (7.8.7) can be partitioned so that

    X = [Xᵀ(1), Xᵀ(2), ..., Xᵀ(n)]ᵀ    (7.8.8)

where X(i) is the m × p matrix

            | X_11(i)  X_12(i)  ...  X_1p(i) |
    X(i) =  | X_21(i)  X_22(i)  ...  X_2p(i) |    (7.8.9)
            |   ...      ...           ...   |
            | X_m1(i)  X_m2(i)  ...  X_mp(i) |

Notice that X_jk(i) is the sensitivity coefficient for the jth dependent
variable and the kth parameter at the ith time.

A typical term of the triple product XᵀΦ⁻¹X of (7.8.7) can be indicated
by

    [XᵀΦ⁻¹X]_kl = Σ_{t=1}^{n} Σ_{j=1}^{m} X_jk(t) X_jl(t) σ_j⁻²(t),    k, l = 1, 2, ..., p    (7.8.10)

Rather than waiting to evaluate the sums in XᵀΦ⁻¹X after a complete
iteration, a running summation can be performed to reduce the computer
memory requirements. Let us define C_sl(i) as

    C_sl(i) = V_sl + Σ_{t=1}^{i} Σ_{j=1}^{m} X_js(t) X_jl(t) σ_j⁻²(t)    (7.8.11)

Then the sl term of the P⁻¹ matrix can be constructed from

    C_sl(i+1) = C_sl(i) + Σ_{j=1}^{m} X_js(i+1) X_jl(i+1) σ_j⁻²(i+1)    (7.8.12)

which forms a recursion relation with the starting value C_sl(0) = V_sl.
Both s and l go from 1 to p.

A similar recursion relation can be given for the components of the
vector XᵀΦ⁻¹e; one component is

    d_s(i+1) = d_s(i) + Σ_{j=1}^{m} X_js(i+1) e_j(i+1) σ_j⁻²(i+1)    (7.8.13)

which has the starting condition

    d_s(0) = Σ_{l=1}^{p} V_sl (μ_l − b_l)    (7.8.14)

The range of s is 1 to p.

The procedure to find the new parameter vector is first to use (7.8.12)
and (7.8.13). Then the (i+1)th P matrix is found by taking the inverse of
the matrix of C_sl(i+1) values,

    P(i+1) = [C_sl(i+1)]⁻¹,    s, l = 1, ..., p    (7.8.15)

Finally the parameter vector is found using

    b*(i+1) = b + P(i+1) d(i+1)    (7.8.16)

Note that b*(i+1) is the parameter vector at the (i+1)th time in the kth
iteration, where k has the same meaning as in (7.4.6). The vector b in (7.8.16)
is μ for the first iteration and b*(n) of the preceding iteration for the
subsequent iterations on k.

If only the final converged parameter values are of interest, it is
necessary to invert the P matrix just once for each complete iteration; in
other words, it would be necessary to find just P(n) and then use (7.8.16)
at i = n. At the end of an iteration the b vector is replaced by b*(n) and
another iteration on k is begun with X being evaluated at the new b. The
process is continued until convergence is attained.

7.8.3 Sequential Method Using the Matrix Inversion Lemma

The usually preferred sequential method involves the matrix inversion
lemma. Using expressions in the preceding section we can write

    b*(i+1) = b + P(i+1)[d(i) + Xᵀ(i+1)Φ⁻¹(i+1)e(i+1)]    (7.8.17)

    P(i+1) = [P⁻¹(i) + Xᵀ(i+1)Φ⁻¹(i+1)X(i+1)]⁻¹    (7.8.18)

where typical terms are

    e(i+1) = Y(i+1) − η(i+1)    (7.8.19)

    Xᵀ(i+1)Φ⁻¹(i+1)e(i+1) = [Σ_{j=1}^{m} X_js(i+1) e_j(i+1) σ_j⁻²(i+1)]    (7.8.20)

    Xᵀ(i+1)Φ⁻¹(i+1)X(i+1) = [Σ_{j=1}^{m} X_js(i+1) X_jl(i+1) σ_j⁻²(i+1)]    (7.8.21)

The range of s and l in these last two equations is 1 to p.

Using (6.7.5a) and (6.7.5b) we can write, analogous to (6.7.6a, b, c, f),

    A.  A(i+1) = P(i)Xᵀ(i+1)    (7.8.22a)

    B.  Δ(i+1) = Φ(i+1) + X(i+1)A(i+1)    (7.8.22b)

    C.  K(i+1) = P(i)Xᵀ(i+1)Δ⁻¹(i+1)    (7.8.22c)

    D.  P(i+1) = P(i) − K(i+1)X(i+1)P(i)    (7.8.22d)

Utilizing (6.7.5b) and (7.8.22c) in (7.8.18) yields, after some rearrangement,

    E.  b*(i+1) = b*(i) + K(i+1){e(i+1) − X(i+1)[b*(i) − b]}    (7.8.22e)
At each time i, the above five equations are to be used in the order A, B,
C, D, and E.
It is important to understand that three different parameter vectors
appear in (7.8.22e). First, b*(i+1) is the estimated vector for the (i+1)th
time and at the (k+1)th iteration. Second, b*(i) is for the ith time and
(k+1)th iteration. Finally, b is the vector found at the nth time of the kth
iteration; the vector b does not change during an iteration over the time
index i. Note that P(i+1), X(i+1), Φ(i+1), and e(i+1) = Y(i+1) − η(i+1)
are functions of b and not b*(i).

When there is prior information, each iteration starts with b = μ and
P(0) = V_β = U⁻¹. A detailed example is given in Section 7.9.1.

When there is no prior information, a special starting procedure is
needed at i = 1 for each of the k iterations. As suggested in Section 6.7, we
can select P(0) to be a diagonal matrix with relatively large terms. Since P
represents the approximate variance-covariance matrix of the parameter
vector β, a P(0) with large diagonal components indicates poor prior
information. In general, P(0) is fixed from iteration to iteration (i.e., over
the k index). Provided no problems are encountered with roundoff, P(0)
could have any diagonal values that are large (by a factor of 10⁶, say)
compared to the corresponding values of P(n).
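As a concrete sketch, the five relations (7.8.22a-e) translate almost line for line into code. The function below is an illustrative NumPy version (the names are invented, not the text's program); it assumes X(i+1), Φ(i+1), and e(i+1) have already been evaluated at the base vector b:

```python
import numpy as np

def sequential_update(b_star, b, P, X_i, Phi_i, e_i):
    """One time step of the matrix-inversion-lemma recursion (7.8.22a-e).

    b_star : current estimate b*(i), shape (p,)
    b      : base vector for this iteration, shape (p,)  (fixed over i)
    P      : current P(i), shape (p, p)
    X_i    : sensitivity matrix X(i+1), shape (m, p)
    Phi_i  : error covariance Phi(i+1), shape (m, m)
    e_i    : residual e(i+1) = Y(i+1) - eta(i+1), eta evaluated at b, shape (m,)
    """
    A = P @ X_i.T                                      # (7.8.22a)
    Delta = Phi_i + X_i @ A                            # (7.8.22b)
    K = A @ np.linalg.inv(Delta)                       # (7.8.22c)
    P_new = P - K @ X_i @ P                            # (7.8.22d)
    b_new = b_star + K @ (e_i - X_i @ (b_star - b))    # (7.8.22e)
    return b_new, P_new
```

Only the m × m matrix Δ(i+1) is inverted at each time; no p × p inverse is required.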
7.8.3.1 Sequential Method for m = 1, p Arbitrary

An important application of the previous analysis is for m = 1 because the
only inverse that must be found, that of Δ, becomes the inverse of a
scalar. As shown in Section 6.7.5, uncorrelated multiresponse measurements
can be renumbered so that the m = 1 analysis can be employed.
From (7.8.22) evaluated for m = 1 and p arbitrary we find

    A_u(i+1) = Σ_{j=1}^{p} X_j(i+1) P_uj(i),    u = 1, 2, ..., p    (7.8.23a)

    Δ(i+1) = σ²(i+1) + Σ_{j=1}^{p} X_j(i+1) A_j(i+1)    (7.8.23b)

    K_u(i+1) = A_u(i+1)/Δ(i+1),    u = 1, 2, ..., p    (7.8.23c)

    H(i+1) = Y(i+1) − η(i+1) − Σ_{j=1}^{p} X_j(i+1)[b*_j(i) − b_j]    (7.8.23d)

    b*_u(i+1) = b*_u(i) + K_u(i+1) H(i+1),    u = 1, ..., p    (7.8.23e)

    P_uv(i+1) = P_uv(i) − K_u(i+1) A_v(i+1),    u, v = 1, ..., p    (7.8.23f)

The above scalar equations should be used in the order given. These
equations can be programmed in a straightforward manner, but one must
remember that three b vectors are involved and that the P(0) matrix does
not vary from iteration to iteration. Only two sets of b storage locations are
needed, however; one is for b = b*(n) and the other is for b*(i) and b*(i+1),
since the latter replaces the former as calculations proceed. Since m = 1,
only one subscript is needed for X_j(i+1) and none is needed for H(i+1).
When programming, the i+1 index need not be carried.

The sequential procedure given by (7.8.23) reduces to that given by the
linear analysis of (6.7.8) by setting b = 0.

For MAP estimation the starting value of b is μ and the starting P(0)
matrix is U⁻¹ = V_β for each iteration. All the sensitivity coefficients for the
first iteration (which includes i = 1, 2, ..., n) and each η(i+1) are evaluated
using b = μ. For subsequent iterations X and η are evaluated at b(n), the
final vector of the previous iteration. See the Section 7.9.1 example.

When there is no prior information, follow the procedure described
below (7.8.22).
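Because m = 1, everything above is scalar arithmetic. A minimal sketch of one observation of (7.8.23a-f), with invented names, is:

```python
def scalar_sequential_step(b_star, b, P, X, sigma2, Y, eta):
    """One observation of the m = 1 recursion (7.8.23a-f).

    b_star : list, current b*(i)  (p values)
    b      : list, base vector for this iteration (fixed over i)
    P      : list of lists, P(i)  (p x p)
    X      : list, sensitivities X_j(i+1), j = 1..p, evaluated at b
    sigma2 : error variance of this observation
    Y, eta : measurement and model value (eta evaluated at b)
    """
    p = len(b)
    A = [sum(X[j] * P[u][j] for j in range(p)) for u in range(p)]       # (7.8.23a)
    Delta = sigma2 + sum(X[j] * A[j] for j in range(p))                 # (7.8.23b)
    K = [A[u] / Delta for u in range(p)]                                # (7.8.23c)
    H = Y - eta - sum(X[j] * (b_star[j] - b[j]) for j in range(p))      # (7.8.23d)
    b_new = [b_star[u] + K[u] * H for u in range(p)]                    # (7.8.23e)
    P_new = [[P[u][v] - K[u] * A[v] for v in range(p)] for u in range(p)]  # (7.8.23f)
    return b_new, P_new
```

No matrix inversion appears at all, since Δ(i+1) is a scalar.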
7.8.4 Correlated Errors with Known Correlation Parameters

If the measurement errors in Y are additive and correlated in time, the
above analyses can be used with slight modifications for the autoregressive
and moving average cases discussed in Section 6.9 and Appendix 6A,
provided the correlation parameters are known. The analyses can be
performed by replacing X by Z = D⁻¹X and e by f = D⁻¹e. For example,
for first-order autoregressive errors, the terms in the matrices would be

    Z_jk(i) = X_jk(i) − ρ_j(i) X_jk(i−1)    (7.8.24a)

    f_j(i) = e_j(i) − ρ_j(i) e_j(i−1)    (7.8.24b)

for i = 1, 2, ..., n, where X_jk(0) and e_j(0) are defined to be equal to zero.
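The replacement of X by Z and e by f in (7.8.24) is just a differencing of successive rows. A minimal sketch for a single response with a constant ρ (an assumption made here for brevity):

```python
def ar1_transform(X, e, rho):
    """First-order autoregressive transformation of (7.8.24):
    Z(i) = X(i) - rho*X(i-1) and f(i) = e(i) - rho*e(i-1), with X(0) and
    e(0) defined as zero. X is a list of rows over time (each row holds the
    p sensitivities at time i); e is the residual series for one response."""
    p = len(X[0])
    Z, f = [], []
    prev_row, prev_e = [0.0] * p, 0.0
    for row, ei in zip(X, e):
        Z.append([row[k] - rho * prev_row[k] for k in range(p)])
        f.append(ei - rho * prev_e)
        prev_row, prev_e = row, ei
    return Z, f
```

The whitened quantities Z and f then enter the sums of the preceding sections exactly where X and e did.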
7.9 EXAMPLES UTILIZING SEQUENTIAL ESTIMATION
The purpose of this section is to present four different examples in which a
sequential method is used. In the first example, detailed results are given to
illustrate the sequential MAP method based on the matrix inversion
lemma. The physical problem involves heat conduction in a finite plate.
The second problem, involving an ordinary differential equation, is the
cooling billet problem previously considered. The third problem uses
simulated data for heat conduction in a semi-infinite body; the errors are
correlated. The final example is a realistic example involving about 1000
actual measurements; it is again for heat conduction in a finite plate.
Except for the second example, each of the problems in this section
involves multiresponse data.

Table 7.9  Known Conditions for Section 7.9.1 Problem

    q = 9612 Btu/ft²·hr              μ₁ (= K) = 0.022222 hr·ft·°F/Btu
    L = 1/12 ft                      μ₂ (= α) = 0.8 ft²/hr
    T₀ = 81.69°F                     V_β,11 = 4 × 10⁻⁶ (hr·ft·°F/Btu)²
    V_β,12 = V_β,21 = 0              V_β,22 = 0.01 ft⁴/hr²
7.9.1 Simple MAP Example Involving Multiresponse Data

This example provides a detailed analysis of a maximum a posteriori
sequential estimation procedure for some multiresponse data. The data are
abstracted from the extensive values contained in Table 7.14. The physical
problem is that of a flat plate heated on one side and insulated on the
other. More extensive use of these data is discussed in Section 7.9.4, where
a more complete description of the problem is given. In that section the
results of estimation for the thermal conductivity, k, and the specific
heat-density product, c, are given; a finite difference procedure is used to
calculate the dependent variable (temperature). In this section a limited
amount of data is used; K = k⁻¹ and α (= k/c) are estimated; an analytical
solution is used for the dependent variable; and subjective prior
information is used.
The measurements for this case are given in Table 7.8 and other known
information is given in Table 7.9. The temperature measurements in Table
7.8 can be considered to be additive, zero mean, constant variance,
uncorrelated, and normal. Since at each time there are two measurements
at each location, an estimate of pure error can be obtained. Using the fact
that V(e₁ − e₂) = 2σ² (for assumptions 1111), an estimate of σ² from the
data of Table 7.8 can be found to be about 0.0625. The errors in the time
measurements are negligible compared to those in the temperature rise,
that is, those above the initial temperature, which is about 81.69°F. The
assumptions given above and that of subjective prior information are
denoted 11111113.
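The pure-error estimate quoted above can be checked directly from the replicate pairs of Table 7.8 (measurements at the same time and location); the helper below is an illustrative sketch:

```python
def pure_error_variance(pairs):
    """Estimate sigma^2 from duplicate measurements using V(e1 - e2) = 2*sigma^2:
    each squared difference over 2 estimates sigma^2; average over the pairs."""
    return sum((a - b) ** 2 / 2.0 for a, b in pairs) / len(pairs)

# Replicate pairs from Table 7.8 (same time and location):
pairs = [(85.83, 85.37), (94.73, 94.48), (87.46, 87.16), (96.44, 96.03)]
s2 = pure_error_variance(pairs)
```

The average works out to about 0.067, close to the value of about 0.0625 quoted above.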
The temperatures corresponding to η are calculated using the solution
given by (8.5.25), which can also be differentiated with respect to K and α
to get the sensitivity coefficients. For the times given in Table 7.8, at most
the first three terms are needed in the summation. Values for the sensitivity
coefficients of K and α, designated X₁ and X₂, respectively, are given in
Table 7.10. The corresponding calculated values of temperature, denoted
η, are also given. All the values of X₁, X₂, and η in Table 7.10 are based on
the parameter values K = μ₁ and α = μ₂. The information in Table 7.10 is
given in an order appropriate for sequential estimation with a single
observation being added at one "time." Other orders are possible, however.

Table 7.8  Data for Section 7.9.1 Problem

    Time    Time Measured from       T1 (°F)   T4 (°F)   T5 (°F)   T6 (°F)
    (sec)   Start of Heating (hr)    (x = L)   (x = L)   (x = 0)   (x = 0)
    15.3    12/3600                  85.83     85.37     94.73     94.48
    18.3    15/3600                  87.46     87.16     96.44     96.03
Table 7.10  Sensitivity Coefficients and η for the First Iteration of the
Section 7.9.1 Problem

    i   Time (hr)   x   T    X₁(i)   X₂(i)   η(i)
    1   12/3600     L   T1   177.8   8.158   85.640
    2   12/3600     L   T4   177.8   8.158   85.640
    3   12/3600     0   T5   570.9   8.930   94.377
    4   12/3600     0   T6   570.9   8.930   94.377
    5   15/3600     L   T1   252.4   10.49   87.299
    6   15/3600     L   T4   252.4   10.49   87.299
    7   15/3600     0   T5   650.0   10.87   96.136
    8   15/3600     0   T6   650.0   10.87   96.136
In Table 7.11 are the sequential values for the first iteration. These
values use the sensitivity coefficients given in Table 7.10 (except to more
significant figures). Notice that all the X_j(i)'s and η(i) are evaluated for the
same values of K = μ₁ and α = μ₂; the sensitivity coefficients and η in Table
7.11 are not evaluated using the updated values of b₁(i) and b₂(i). The
sequential procedure used is given by (7.8.23), which is identical to that
given by (6.7.8) for the linear case. For the nonlinear case, however,
iteration is required.

The starting values of b₁, b₂, P₁₁, P₁₂, and P₂₂ for this MAP analysis are
μ₁, μ₂, V_β,11, V_β,12, and V_β,22, respectively. These starting values are used
for every iteration, as can be observed from the starting conditions in Table
7.12, which is for the second iteration.
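The first rows of Table 7.11 can be reproduced with a few lines of code implementing (7.8.23). The sketch below (function name invented) uses the Table 7.8 measurements, the Table 7.10 sensitivity and η values, σ² = 0.0625, and the prior information of Table 7.9:

```python
def map_sequential(mu, P0, obs, sigma2):
    """MAP sequential estimation, the m = 1 form of (7.8.23).
    obs: list of (Y, eta, X) with eta and X = [X1, X2] evaluated at b = mu."""
    p = len(mu)
    b = list(mu)                      # base vector, fixed during an iteration
    b_star = list(mu)
    P = [row[:] for row in P0]
    history = []
    for Y, eta, X in obs:
        A = [sum(X[j] * P[u][j] for j in range(p)) for u in range(p)]
        Delta = sigma2 + sum(X[j] * A[j] for j in range(p))
        K = [A[u] / Delta for u in range(p)]
        H = Y - eta - sum(X[j] * (b_star[j] - b[j]) for j in range(p))
        b_star = [b_star[u] + K[u] * H for u in range(p)]
        P = [[P[u][v] - K[u] * A[v] for v in range(p)] for u in range(p)]
        history.append(list(b_star))
    return history

# First two observations (T1 and T4 at 12 sec) from Tables 7.8 and 7.10:
obs = [(85.83, 85.640, [177.8, 8.158]),
       (85.37, 85.640, [177.8, 8.158])]
hist = map_sequential([0.022222, 0.8], [[4e-6, 0.0], [0.0, 0.01]], obs, 0.0625)
# hist[0] is close to [0.022380, 0.81814] and hist[1] to [0.022188, 0.79603],
# matching rows i = 1 and i = 2 of Table 7.11.
```

The agreement with the tabulated values confirms the recursion; the remaining rows follow in the same way.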
Table 7.11  Sequential Estimation of K and α in Problem of Section 7.9.1,
First Iteration, Based on Parameter Values K = μ₁ = 0.022222 and α = μ₂ = 0.8

    i   Time (sec)   x   T    b₁ × 10²   b₂        P₁₁ × 10⁶   P₁₂ × 10⁶   P₂₂ × 10⁶
    0   -            -   -    2.2222     0.8       4            0          10000
    1   12           L   T1   2.2380     0.81814   3.408       -67.89      2211
    2   12           L   T4   2.2188     0.79603   3.386       -70.47      1915
    3   12           0   T5   2.3074     0.78031   0.5543      -20.23      1024
    4   12           0   T6   2.2836     0.78452   0.3988      -17.48      975.0
    5   15           L   T1   2.2701     0.79407   0.3321      -12.79      645.0
    6   15           L   T4   2.2815     0.78603   0.3030      -10.74      500.9
    7   15           0   T5   2.2873     0.78492   0.2372      -9.48       476.8
    8   15           0   T6   2.2654     0.78912   0.2064      -8.89       465.5
Table 7.12  Sequential Estimation of K and α in Problem of Section 7.9.1,
Second Iteration, Based on Parameter Values K = b₁⁽¹⁾(n) = 0.022654 and
α = b₂⁽¹⁾(n) = 0.78912

    i   Time (sec)   x   T    b₁ × 10²   b₂        P₁₁ × 10⁶   P₁₂ × 10⁶   P₂₂ × 10⁶
    0   -            -   -    2.2222     0.8       4            0          10000
    1   12           L   T1   2.2372     0.81793   3.446       -66.17      2103
    2   12           L   T4   2.2188     0.79589   3.425       -68.63      1809
    3   12           0   T5   2.3077     0.78075   0.5606      -19.87      979.6
    4   12           0   T6   2.2838     0.78481   0.4039      -17.20      934.2
    5   15           L   T1   2.2701     0.79421   0.3354      -12.53      615.6
    6   15           L   T4   2.2816     0.78632   0.3056      -10.50      476.6
    7   15           0   T5   2.2874     0.78525   0.2393      -9.28       454.2
    8   15           0   T6   2.2654     0.78930   0.2083      -8.71       443.8

In this iteration all the values of X₁(i), X₂(i), and η(i) are found based on
the values of the previous iteration, b₁⁽¹⁾(n) = 0.022654 and b₂⁽¹⁾(n) = 0.78912.

If this case had not been one involving prior information, then the initial
values of b₁⁽²⁾(0) and b₂⁽²⁾(0) for the second iteration would be simply the
final values of the first iteration, at which values the sensitivity coefficients
and η are also evaluated. The P_kl⁽²⁾(0) values would not be different,
however (unless one wishes to use P_kl⁽²⁾(0) to introduce the Levenberg or
Marquardt modifications).

The iterations in general continue until negligible changes in b⁽ᵏ⁾(n)
occur. In this case only two iterations (k = 2) apparently are needed.

7.9.2 Cooling Billet Problem

The data for the cooling billet problem are given in Table 6.2; results of
analyses of these data are given in Sections 6.2, 7.5.2, and 7.7.5. Though
converged parameter values for a differential equation model are given in
the last two sections, sequential results are not given. The purpose of this
section is to demonstrate the power of a sequential procedure to provide
insight for model building for the cooling billet case.

Consider first results of the power series models for h given by (7.5.11).
The values of b₁ for Model 1 are depicted in Fig. 7.13. (Recall that Model
1 is hA/ρcV = β₁.) The value of b₁(i) is the approximation to β₁ using the
data for times through tᵢ; the estimate is approximate in that the sensitivity
coefficients were evaluated at the converged parameter values using all the
data. Because the b₁ values of Model 1 continually decrease in Fig. 7.13
and because b₁(i) is the average until tᵢ, the actual time variation of h is
probably larger than that shown.

Parameters for Model 2 (hA/ρcV = β₁ + β₂t) are shown in Figs. 7.13 and
7.14. The b₁ values shown in Fig. 7.13 are relatively constant with time;
this is particularly true for the last five time steps. For the last half of the
time steps shown in Fig. 7.14, b₂ is relatively constant. This constancy of
parameters suggests adequacy of Model 2.
Figure 7.13  Estimates of β₁ for Models 1, 2, and 3 for the cooling billet
data. (Ordinate: b₁, hr⁻¹; abscissa: time step index, i.)
Figure 7.14  Estimates of β₂ for Models 2 and 3 for the cooling billet data.
(Ordinate: b₂; abscissa: time step index, i.)
For Model 3 (hA/ρcV = β₁ + β₂t + β₃t²), the b₁ values are relatively
stable in Fig. 7.13, those of b₂ in Fig. 7.14 are less so, and those in Fig. 7.15
showing b₃ are quite oscillatory. The oscillatory nature of b₃, passing
through zero several times, suggests that Model 3 is not appropriate.

Sometimes one wonders about the effects on the parameters of additional
data. Without obtaining the data one cannot be certain. However,
the sequential method applied to existing data should give insight into the
effects. Additional data would be expected to continue trends already
observed in the parameters. For example, Fig. 7.15 suggests that b₃ might
continue to oscillate about zero with decreasing amplitude.

Figure 7.15  Estimates of β₃ for Model 3 for the cooling billet data.

The sequential results for β₁ and β₂ in Model 4, hA/ρcV = β₁ + β₂(T − T∞),
are shown in Fig. 7.16. The b₁ values are relatively constant but the b₂
values are not. The 50% variation of b₂ in the last five steps suggests that
the model should be further examined by using additional data or that
related models should be tried. Physically the model is quite attractive, but
some improvement is needed, such as adding an exponent n on T − T∞;
this was tried but was unsuccessful, as mentioned in Section 7.5.2, although
additional data might also be needed to estimate n.

Figure 7.16  Sequential parameter estimates for Model 4 for the cooling
billet data.

7.9.2.1 Other Possible Models

In addition to the models mentioned above, many more could be suggested.
For example, one extension of Models 1, 2, and 3 is to investigate
other terms such as β₄t³. Another approach is to always use the linear-in-t
model (Model 2) but to use different β pairs (such as β₁ᵢ, β₂ᵢ for the ith
time interval) for successive time intervals [23-25].
7.9.3 Semi-Infinite Body Heat Conduction Example

Consider a semi-infinite, heat-conducting body initially at the temperature
T₀ of 100°F (311 K) and whose surface temperature takes a step increase
to T∞ = 200°F (366 K). The temperature is measured every 9 sec until 252
sec at the two locations x₁ = 0.125 in. (0.00318 m) and x₂ = 0.25 in. (0.00635
m), where x is measured from the heated surface. This example is of the
Monte Carlo type because the data are simulated using the known value of
thermal diffusivity α, equal to 0.01 ft²/hr (2.58 × 10⁻⁷ m²/sec), and because
15 different sets of random errors are considered. The first five cases use
independent, normal errors with unit variance. The measurement errors for
the first case are taken from the first two columns of Table XXIII of [26];
the second case uses those from the next two columns, etc.

Correlated errors are obtained for the first-order autoregressive process
described in Section 6.9.2; the error εᵢ at time tᵢ is given by

    εᵢ = ρεᵢ₋₁ + uᵢ    (7.9.1)

where ρ = 0.5 for cases 6-10 and ρ = 0.9 for the remaining cases. The uᵢ
random variable values are the same values taken from [26] as mentioned
above.

A mathematical statement of the physics of the problem is

    ∂T/∂t = α ∂²T/∂x²,    T(0,t) = T∞,   T(x,0) = T₀,   T(∞,t) = T₀    (7.9.2)

which has the exact solution

    [T(x,t) − T₀]/(T∞ − T₀) = erfc{x/[2(αt)¹ᐟ²]}    (7.9.3)

where

    erfc(y) = (2/π¹ᐟ²) ∫_y^∞ e^(−u²) du    (7.9.4)

Table 7.13  Results for Estimating Thermal Diffusivity for Cases with
Noncorrelated and Correlated Errors

    Case   ρ     α̂ × 10⁴ (ft²/hr)   R       ZᵀZ × 10⁻⁶   s²      est. s.e.(α̂) × 10⁴
    1      0     99.71              59.39   197          1.080   0.740
    2      0     100.64             54.15   192          0.985   0.716
    3      0     100.17             58.71   195          1.067   0.740
    4      0     99.29              52.55   199          0.955   0.693
    5      0     98.56              56.49   202          1.191   0.768
    6      0.5   99.49              60.85   52.4         1.106   1.45
    7      0.5   101.88             53.76   49.7         0.977   1.40
    8      0.5   100.39             59.2    51.3         1.076   1.45
    9      0.5   98.69              52.7    53.4         0.958   1.34
    10     0.5   97.22              56.3    55.3         1.024   1.36
    11     0.9   98.6               64.67   4.275        1.176   5.25
    12     0.9   115.3              53.40   3.09         0.971   5.61
    13     0.9   101.7              59.37   4.00         1.079   5.19
    14     0.9   97.2               55.15   4.40         1.003   4.97
    15     0.9   91.0               61.90   5.06         1.125   4.72
Note that T(x,t) is a nonlinear function of α. Because there is no
characteristic length or time in this problem, the dimensionless temperature
given by (7.9.3) and the α sensitivity (given by (8.5.4)) can be plotted
versus the single variable τ = αt/x²; see Fig. 8.12.

The parameter α was estimated using nonlinear ML sequential estimation
for the multiresponse data mentioned above (two locations at each of
which 28 measurements were simulated). In most cases only three iterations
were needed to converge to α̂ after starting with the initial estimate of
0.006 ft²/hr, which is low compared to the true value of 0.01 ft²/hr.
Estimated α̂ values for the 15 cases are given in the third column of Table
7.13. The first five cases tabulated, which are for ρ = 0, are more accurate
on the average than the next five, which are for ρ = 0.5; these are in turn
more accurate (on the average) than the last five cases (ρ = 0.9). Hence
we can conclude that correlated errors can substantially reduce the accuracy
of the parameter estimates.

The minimum sum of squares R also has a tendency to increase in
variability with increasing ρ. The values of R were calculated using the
converged values of α̂ in (7.3.1) with U = 0 and W = Φ⁻¹; rather than using
Φ⁻¹ directly, however, it was introduced by replacing X by Z and e by f as
mentioned in Section 7.8.4.

The mean sum of squares was formed using s² = R/(n − p), where n = 2 ×
28 = 56 and p = 1 in this example. The estimated standard error of α̂ was
found using (6.8.3), which can be written for this one-parameter case as

    est. s.e.(α̂) = [ZᵀZ]⁻¹ᐟ² s    (7.9.5)

These values increase by a factor of about 7 from ρ = 0 to 0.9. In each set
of five cases for a given ρ value, the true α value of 100 × 10⁻⁴ ft²/hr is in
the interval given by α̂ ± est. s.e.(α̂) in three of the five cases. This is
consistent with the 67% confidence interval that could be calculated using
the t distribution.
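The Monte Carlo experiment can be sketched in outline: generate the exact solution (7.9.3) at the two depths, add first-order autoregressive errors as in (7.9.1), and recover α by Gauss-Newton with finite-difference sensitivities. This simplified version uses ordinary least squares (the ML analysis of the text additionally whitens the errors as in Section 7.8.4); the seed, step size, and iteration count are invented:

```python
import math, random

def eta(alpha, x, t, T0=100.0, Tinf=200.0):
    """Exact solution (7.9.3): step surface temperature, semi-infinite body."""
    return T0 + (Tinf - T0) * math.erfc(x / (2.0 * math.sqrt(alpha * t)))

alpha_true = 0.01                              # ft^2/hr
xs = [0.125 / 12.0, 0.25 / 12.0]               # 0.125 in. and 0.25 in., in feet
ts = [9.0 * k / 3600.0 for k in range(1, 29)]  # 28 times per location, in hours

random.seed(1)
rho = 0.5
data = []
for x in xs:
    e_prev = 0.0
    for t in ts:
        e = rho * e_prev + random.gauss(0.0, 1.0)   # AR(1) errors, (7.9.1)
        data.append((x, t, eta(alpha_true, x, t) + e))
        e_prev = e

# Gauss-Newton for the single parameter alpha, finite-difference sensitivity:
a = 0.006                                      # low initial estimate, as in the text
for _ in range(10):
    num = den = 0.0
    for x, t, Y in data:
        h = 1e-6
        X = (eta(a + h, x, t) - eta(a, x, t)) / h   # d(eta)/d(alpha)
        num += X * (Y - eta(a, x, t))
        den += X * X
    a += num / den
```

With ρ = 0.5 the recovered value lands within a few percent of the true 0.01 ft²/hr, mirroring the middle rows of Table 7.13.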
7.9.4 Analysis of Finite Heat-Conducting Body with Multiresponse
Experimental Data

7.9.4.1 Description of Experiment
In this experiment two adjacent, identical Armco iron cylindrical specimens
(1 in. (0.0254 m) thick and 3 in. (0.0762 m) in diameter) were heated
by a single flat heater placed between them. The heat flow was in the axial
direction. All surfaces except those where the heater was located were
insulated. Four thermocouples (numbered 5, 6, 7, and 8) were carefully
attached to the heated surface of each specimen and four (numbered 1-4)
were attached at the opposite flat insulated surfaces. The sensors were
located at angles of 0, 90, 180, and 270°. The sensors at 0 and 180° were
electrically averaged, as were those at 90 and 270°. This then provided
eight temperature histories, with four being at a heated surface and four
being at an insulated surface.

Before the start of an experiment the specimens were allowed to come to
a uniform temperature. The heating period lasted 15.3 sec, after which the
specimens attained a higher equilibrium temperature. For further discussion
of the experiments, see reference 27.

The data used to find parameters in this case are given in Table 7.14. An
IBM 1800 computer was used to digitize the analogue signals produced by
the thermocouples. Typical results for the heated and insulated surfaces
are shown in Fig. 7.17; the heated surface temperature increases only
during heating, whereas the insulated surface temperature increases after
heating and finally approaches the same equilibrium temperature as the
heated surface.
Table 7.14  Measured Temperatures for Finite Heat-Conducting Specimen

    [Temperature histories (°F) for the eight numbered thermocouples
    TC1-TC8, recorded every 0.3 sec from 0.3 to 30.0 sec; the tabulated
    values are not reproduced here.]
Owing to the statistical design of the experiment there are two types of
replicated measurements. First, until the heating starts at 3.3 sec, the
temperatures are the same (within experimental error) for all times and
thermocouples. Second, there are four sensors at both the heated and
insulated surfaces. The standard deviation of the temperatures until 3.0
sec for any given sensor is about 0.1 F° (0.06 K); the standard deviation of
the average temperatures of the eight sensors is also about 0.1 F° (0.06 K).
During heating and at the heated surface, the standard deviations of
temperature at fixed times are almost 0.4 F° (0.22 K). These values of the
standard deviation provided estimates of the pure error.

Even though the temperature calibration for the data shown in Table
7.14 was carefully performed, there are slight differences in the initial
temperature readings given by the various sensors. In an attempt to
reduce bias, corrections of -0.06, +0.18, +0.04, +0.03, +0.09, -0.09,
-0.06, and -0.17 F° were respectively added to measurements given by
sensors 1 through 8.

Another consideration in the design of the experiment was to achieve a
relatively large signal-to-noise ratio while still not having such a large
variation in temperature that the parameters would change. A maximum
temperature rise of about 15 F° (8.33 K) satisfies both conditions; using the
pure error standard deviation value of 0.1 F° (0.06 K), the maximum
signal-to-noise ratio is 150.
Figure 7.17  Typical temperature histories for the finite heat-conducting
specimens. (Ordinates: temperature in °F and K; abscissa: time in sec.
TC 1 is at the insulated surface.)
7.9.4.2 Physical Model of Heat-Conducting Body

The temperature in each specimen can be mathematically described by

    c ∂T/∂t = k ∂²T/∂x²    (7.9.6)

    −k ∂T(0,t)/∂x = q(t),    ∂T(L,t)/∂x = 0    (7.9.7)

    T(x,0) = 81.54°F    (7.9.8)

where k is the thermal conductivity and c is the density-specific heat
product. In both specimens the coordinate x is measured from the heated
surface. The specimen thickness L was 1/12 ft (0.0254 m). The parameters
are k and c. The heat flux q(t) was zero until t = 3.3 sec and was zero after
t = 18.6 sec; the value of q(t) in the interval 3.3 ≤ t ≤ 18.6 sec was 2.67
Btu/ft²·sec (30,300 W/m²). One can derive the exact solution of this
problem in terms of infinite series. However, PROGRAM PROPERTY
(developed at MSU), which utilizes finite differences in the solution of
(7.9.6)-(7.9.8), was used to obtain the parameters.

The sensitivity coefficients shown in Fig. 7.18 were also obtained by
using finite differences, as indicated by (7.10.1). Note that those for c are
always negative, whereas those for k are both positive and negative. Hence
the sensitivities are not linearly dependent and no difficulty would be
anticipated in estimating the parameters.
7.9.4.1
Parameter EJtinuJtes
F or the data given, the ordinary least squares parameter estimates are
k . . 43.343 B tu/hrftF (75.014 W /mK) a nd c 55.596 B tu/ftlF (3728.6
kJ/mJC). I f the data for only part of the interval had been used, the k a nd
c values would have been modified by the additive changes Ak a nd Ac
given in Fig. 7.19. Note that the changes in k in the last third o f the time
interval is less than 1% o f the final k value; the c value changes about 2%
in the same interval. These small values suggest that the model is correct.
F or times less than 7 sec the corrections become very large which would be
expected since the sensitivity coefficients shown in Fig. 7.18 are approximately linearly dependent for such small times.
The residuals for all eight temperature histories are graphed in Figs. 7.20 and 7.21. Evidently, the measurement errors cannot be considered to be independent. From reference 28, p. 97, the mean number of runs (changes of sign plus 1) of the residuals is about n/2 for independent observations.
Figure 7.18 Sensitivity coefficients for the finite heat-conducting specimens.

Figure 7.19 Sequential corrections to the parameters for the finite heat-conducting example.
Figure 7.20 Residuals for thermocouples 1-4, which are located at the insulated surface of the specimen.

Figure 7.21 Residuals for thermocouples 5-8, which are located at the heated surface of the specimen.

There are 89 observations for each thermocouple and the number of runs varies from 4 to 23, with the average being 12.75. This value is considerably less than 89/2 = 44.5. For first-order cumulative errors, the expected number of runs is about n^(1/2), which is 9.4 in this case. Hence the measurement errors would be more appropriately modeled as being cumulative rather than as being independent. (An analysis for ρ similar to that in Section 6.9.5 modified for nonlinear parameters gives ρ ≈ 0.87.)

If the usual equation for estimating s² is used, we have

s² = S_min/(nm - p)

where n = 89 and m = 8. The resulting s value is 0.339, which is different from the pure error estimates, which were near 0.1, partly because the residuals are highly correlated.
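The run count used in this comparison (number of sign changes plus 1) is easy to compute mechanically. A minimal sketch follows; the residual sequence below is invented for illustration and is not taken from Table 7.14:

```python
def count_runs(residuals):
    """Number of runs of like sign: sign changes plus 1 (exact zeros dropped)."""
    signs = [r > 0 for r in residuals if r != 0]
    changes = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return changes + 1

# Hypothetical residual sequence: three long runs and one short one.
res = [0.2, 0.3, 0.1, -0.1, -0.4, -0.2, 0.1, 0.3, -0.2]
n = len(res)
print(count_runs(res))  # observed number of runs (4 for this sequence)
print(n / 2)            # rough expectation for independent errors
print(n ** 0.5)         # rough expectation for cumulative errors
```

An observed run count far below n/2 and near n^(1/2), as in the text, points toward cumulative rather than independent errors.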
In finishing this analysis we should calculate the confidence region for k
a nd c. But since the confidence region is discussed above for other cases, it
is not given. Note, however, that various basic assumptions should be
checked first. The standard assumptions o f additive, zero mean, constant
variance, normal measurement errors seem reasonable. Also, the time and
location measurement errors are probably negligible a nd no prior information is used. The assumption of independent errors is not appropriate.
These assumptions are designated 11101011. If a first-order autoregressive model is used, ρ is found to be about 0.95. Using this ρ, the covariance matrix of the errors ψ is approximated by DDᵀ as indicated in Appendix 6A. For this case in which OLS is used, an approximate confidence interval could be constructed using (7.7.9) and (7.7.10a) and as discussed following these equations.
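The first-order autoregressive structure mentioned here has a covariance matrix that is easy to write down directly: correlations decay as ρ^|i-j|. The sketch below uses the standard stationary AR(1) form with unit noise variance; it is meant only to illustrate that structure, not to reproduce the exact factored form of Appendix 6A:

```python
import numpy as np

def ar1_covariance(n, rho, sigma2=1.0):
    """Stationary AR(1) error covariance:
    cov(e_i, e_j) = sigma2 * rho**|i - j| / (1 - rho**2)."""
    idx = np.arange(n)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho ** 2)

psi = ar1_covariance(6, 0.95)   # rho of about 0.95, as estimated in the text
```

With ρ = 0.95, adjacent errors are almost perfectly correlated, which is why confidence intervals based on the independence assumption must be corrected.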
A final comment is made regarding the estimation procedure. Data from all eight thermocouples were used and eight sets of residuals were found. Because the measurements at x = 0 and x = L are each repeated four times, the same parameter values would be found using the average values at each time at these two locations. In other words, instead of having m = 8, we would have m = 2. Justification for this simplification is given in Section 5.5.
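That averaging the repeated records leaves the OLS estimates unchanged can be checked numerically: the full sum of squares differs from (number of repeats) times the averaged sum of squares only by a constant that does not depend on the parameters. All numbers below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.1, 1.0, 6)
# Four hypothetical repeated records at one location:
Y = 100.0 * np.exp(-3.0 * t) + rng.normal(0.0, 0.5, size=(4, t.size))
Ybar = Y.mean(axis=0)

def S_full(beta):
    """OLS sum of squares over all four records."""
    eta = 100.0 * np.exp(-beta * t)
    return np.sum((Y - eta) ** 2)

def S_avg(beta):
    """OLS sum of squares using only the averaged record."""
    eta = 100.0 * np.exp(-beta * t)
    return np.sum((Ybar - eta) ** 2)

# S_full(beta) - 4*S_avg(beta) is the same for every beta, so both
# functions are minimized at the same parameter value:
print(S_full(2.5) - 4 * S_avg(2.5) - (S_full(3.5) - 4 * S_avg(3.5)))
```

The leftover constant is the within-record scatter about the averages, which carries no information about the parameters.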
7.10 SENSITIVITY COEFFICIENTS
There are several methods of determining the sensitivity coefficients. One can evaluate them by generating sensitivity equations by differentiation of the model equations. One can also use finite differences, which are simpler in terms of computer programming and are usually adequate, although there are cases when direct solution of the sensitivity equations is needed. This section discusses both methods. Even though one may evaluate the sensitivity coefficients utilizing finite differences, the study of the sensitivity equations can yield insights into the sensitivity coefficients themselves.

7.10.1 Finite Difference Method

One finite difference approximation for the sensitivity for the ith observation, jth parameter, and lth dependent variable is the forward difference approximation,

X_lj(i) ≡ ∂η_l(i)/∂β_j ≈ [η_l(b₁,...,b_j + δb_j,...,b_p) - η_l(b₁,...,b_j,...,b_p)]/δb_j    (7.10.1)

where δb_j is some relatively small quantity. One possible value of δb_j is given by

δb_j = 0.0001b_j    (7.10.2)

This simple procedure is frequently quite satisfactory. Note, however, that if the initial value of b_j is chosen to be zero or if b_j approaches zero, (7.10.2) can lead to difficulty. Bard [5, p. 118] gives a brief discussion leading him to recommend the value of

δb_j = (εC_jj)^(1/2)    (7.10.3)

where ε is the relative error in the computed values of S and where C_jj is defined by (7.4.13). He also recommends that lower and upper limits be placed on the values given by (7.10.3), such as 10⁻⁵|b_j| < δb_j < 10⁻²|b_j|. Two sources of error contribute to inaccuracies in approximating X_lj(i): (1) the rounding error when two closely spaced values of η are subtracted, and (2) the truncation error due to the inexact nature of (7.10.1). In order to reduce the truncation error the constant 0.0001 in (7.10.2) is made small. If, however, it is too small then the rounding error becomes important. A more accurate finite difference approximation is the central difference scheme,

X_lj(i) ≈ [η_l(b₁,...,b_j + δb_j,...,b_p) - η_l(b₁,...,b_j - δb_j,...,b_p)]/(2δb_j)    (7.10.4)

Unfortunately this approximation requires almost twice as many values of η_l to be calculated as does (7.10.1). For this reason the use of (7.10.4) is not recommended.

If there is uncertainty due to the use of (7.10.1), one could repeat the final iteration by replacing (7.10.1) by a backward difference approximation (replacing δb_j by -δb_j) and then averaging the parameter estimates from the forward and backward difference approximations.

7.10.2 Sensitivity Equation Method

When the model is a set of first-order ordinary differential equations or a partial differential equation, the solution frequently cannot be written explicitly. In such cases explicit expressions for the sensitivities cannot be given. One can, however, derive sensitivity equations which can be solved separately from the model.

Consider, for example, the model

k ∂²T/∂x² = c ∂T/∂t    (7.10.5)
where k and c are parameters. The boundary and initial conditions are

-k ∂T/∂x|_(x=0) = q,    ∂T/∂x|_(x=L) = 0,    T(x,0) = T_i    (7.10.6)
In this case T represents the dependent variable η. Differentiating (7.10.5) and (7.10.6) with respect to k yields
k ∂²X₁/∂x² + ∂²T/∂x² = c ∂X₁/∂t    (7.10.7a)

0 = k ∂X₁/∂x|_(x=0) + ∂T/∂x|_(x=0),    ∂X₁/∂x|_(x=L) = 0,    X₁(x,0) = 0    (7.10.7b)
where X₁ ≡ ∂T/∂k. Notice that (7.10.7a) contains a nonhomogeneous term which is contained in (7.10.5). Solution of (7.10.7) provides a solution for the sensitivity coefficient for k. Unlike the use of differences such as in (7.10.1), which utilizes the program calculating η, a special computer program must be written for calculating X₁ using (7.10.7); in many cases this would involve a finite difference computer program. The sensitivity equation for c is found by differentiating (7.10.5) and (7.10.6) with respect to c,
k ∂²X₂/∂x² - ∂T/∂t = c ∂X₂/∂t    (7.10.8a)

∂X₂/∂x|_(x=0) = 0,    ∂X₂/∂x|_(x=L) = 0,    X₂(x,0) = 0    (7.10.8b)

where X₂ ≡ ∂T/∂c.

From a knowledge of the solutions of the above equations one can predict several results. First, the basic problem (7.10.5) and (7.10.6) is linear and so are the sensitivity problems for X₁ and X₂. Second, X₁ is a function of k and X₂ is a function of c. Hence T is a nonlinear function of k and c. Third, in general X₁ and X₂ will not be zero at the boundaries x = 0 and L. Next, in (7.10.8), the only nonhomogeneous term is ∂T/∂t which, for monotonically increasing time values of T, will be positive; such a term acts like a heat sink which will cause X₂ to be always negative (provided T is monotonically increasing). Fifth, the ∂²T/∂x² term in (7.10.7a), which is equal to (c/k) ∂T/∂t, simulates a heat source, but the ∂T/∂x at x = 0 term in (7.10.7b) simulates a heat sink at x = 0. Hence X₁ could be either positive or negative and would in general be smaller in magnitude than X₂.

Another point relates to the relationships between T, T_i, X₁, and X₂. By multiplying (7.10.7) by k and by multiplying (7.10.8) by c and then adding the corresponding equations together, one can show that

T(x,t) - T_i = -kX₁(x,t) - cX₂(x,t)    (7.10.9)

Insight regarding the sensitivity coefficients can be obtained from this equation. For example, the larger T(x,t) - T_i, the larger on the average are the magnitudes of X₁ and X₂. Equation (7.10.9) also suggests that both sensitivity coefficients need not be calculated because T must always be calculated.

A final point is that in this case the parameters can be transformed so that T is linear in one of them. This is true for the parameters α = k/c and K = k⁻¹; with these parameters the model (7.10.5) and (7.10.6) becomes

α ∂²T/∂x² = ∂T/∂t    (7.10.10a)

qK = -∂T/∂x|_(x=0),    ∂T/∂x|_(x=L) = 0,    T(x,0) = T_i    (7.10.10b)

The sensitivity coefficient for K, denoted X₃, is found from a solution of

α ∂²X₃/∂x² = ∂X₃/∂t,    q = -∂X₃/∂x|_(x=0)    (7.10.11a)

∂X₃/∂x|_(x=L) = 0,    X₃(x,0) = 0    (7.10.11b)

Notice that X₃ is independent of K and hence T is linear in K. By multiplying (7.10.11) by K and then comparing with (7.10.10), one can find

T(x,t) - T_i = KX₃(x,t)    (7.10.12)

and thus X₃(x,t) can be very simply found from T(x,t) using this equation.
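The relation (7.10.9) can also be verified numerically: solve the conduction problem by explicit finite differences, build X₁ and X₂ with the forward difference (7.10.1), and compare. All values below (properties, flux, grid) are illustrative assumptions rather than data from this chapter, and T_i is taken as zero:

```python
import numpy as np

def solve_T(k, c, q=1.0, L=1.0, nx=21, nt=2000, t_end=0.5):
    """Explicit finite differences for k T_xx = c T_t with
    -k T_x(0,t) = q, T_x(L,t) = 0, and T(x,0) = 0."""
    dx = L / (nx - 1)
    dt = t_end / nt
    r = (k / c) * dt / dx**2
    assert r <= 0.5, "explicit scheme unstable"
    T = np.zeros(nx)
    for _ in range(nt):
        Tn = T.copy()
        T[1:-1] = Tn[1:-1] + r * (Tn[2:] - 2 * Tn[1:-1] + Tn[:-2])
        T[0] = T[1] + q * dx / k    # heated surface: -k dT/dx = q
        T[-1] = T[-2]               # insulated surface: dT/dx = 0
    return T

k, c = 2.0, 1.0
T = solve_T(k, c)

# Forward difference sensitivities (7.10.1) with respect to k and c:
dk, dc = 1e-4 * k, 1e-4 * c
X1 = (solve_T(k + dk, c) - T) / dk
X2 = (solve_T(k, c + dc) - T) / dc

# Check (7.10.9) with T_i = 0:  T = -k X1 - c X2
print(np.max(np.abs(T + k * X1 + c * X2)))   # near zero
```

The identity holds exactly for the discrete solution as well, since it too is homogeneous of degree -1 in (k, c); the small residual comes only from the forward difference step.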
In summary, we have given two basic methods for evaluating sensitivity coefficients when the model is not an algebraic form but rather involves solution of differential equations. It is assumed that these equations cannot be readily solved to obtain closed forms since otherwise algebraic forms can be written. We imagine that these equations are solved numerically using finite differences or some other method. The first method of evaluating the sensitivities involves only a computer program to approximate the differential equations for the model; (7.10.1), which requires only dependent variable values, is used. In general this is the simplest and recommended method. The second method involves numerically approximating the sensitivity equations which are derived from the model. Examples of sensitivity equations are given by (7.10.7) and (7.10.8). Even though the sensitivity equations are not solved, inspection of these equations can sometimes provide considerable insight and/or relations between the sensitivity coefficients.
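The two difference formulas (7.10.1) and (7.10.4) can be sketched directly. The model below, η = β₁e^(-β₂t), is an illustrative choice with closed-form sensitivities to check against; the step δb_j = 0.0001 b_j follows (7.10.2):

```python
import math

def forward_sensitivity(eta, b, j, t, rel=1e-4):
    """Forward difference approximation (7.10.1) to X_j = d(eta)/d(b_j)."""
    db = rel * b[j]                  # delta b_j from (7.10.2)
    bp = list(b)
    bp[j] += db
    return (eta(bp, t) - eta(b, t)) / db

def central_sensitivity(eta, b, j, t, rel=1e-4):
    """Central difference (7.10.4): more accurate, but needs almost twice
    as many model evaluations."""
    db = rel * b[j]
    bp, bm = list(b), list(b)
    bp[j] += db
    bm[j] -= db
    return (eta(bp, t) - eta(bm, t)) / (2.0 * db)

# Illustrative model with known sensitivities:
eta = lambda b, t: b[0] * math.exp(-b[1] * t)
b, t = [100.0, 3.0], 0.25
x1 = math.exp(-b[1] * t)              # exact d(eta)/d(b1)
x2 = -b[0] * t * math.exp(-b[1] * t)  # exact d(eta)/d(b2)
print(forward_sensitivity(eta, b, 1, t) - x2)  # truncation error, O(db)
print(central_sensitivity(eta, b, 1, t) - x2)  # smaller, O(db**2)
```

Shrinking `rel` reduces the truncation error but eventually lets the rounding error from subtracting nearly equal η values dominate, which is exactly the trade-off discussed for (7.10.2) and (7.10.3).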
REFERENCES

1. Bard, Y., "Comparison of Gradient Methods for the Solution of Nonlinear Parameter Estimation Problems," SIAM J. Numer. Anal. 7 (1970), 157-186.
2. Beveridge, G. S. G. and Schechter, R. S., Optimization: Theory and Practice, McGraw-Hill Book Company, New York, 1970.
3. Levenberg, K., "A Method for the Solution of Certain Nonlinear Problems in Least Squares," Quart. Appl. Math. 2 (1944), 164-168.
4. Marquardt, D. W., "An Algorithm for Least Squares Estimation of Nonlinear Parameters," J. Soc. Ind. Appl. Math. 11 (1963), 431-441.
5. Bard, Y., Nonlinear Parameter Estimation, Academic Press, Inc., New York, 1974.
6. Box, G. E. P., "Use of Statistical Methods in the Elucidation of Physical Mechanisms," Bull. Inst. Intern. Stat. 36 (1958), 215-255.
7. Booth, G. W. and Peterson, T. I., Nonlinear Estimation, IBM Share Program Pa. No. 687 WL NL1, 1958.
8. Hartley, H. O., "The Modified Gauss-Newton Method for the Fitting of Nonlinear Regression Functions by Least Squares," Technometrics 3 (1961), 269-280.
9. Box, G. E. P. and Kanemasu, H., "Topics in Model Building, Part II. On Nonlinear Least Squares," Tech. Rep. No. 321, University of Wisconsin, Dept. of Statistics, Madison, Wis., Nov. 1972.
10. Davies, M. and Whitting, I. J., "A Modified Form of Levenberg's Correction," in Numerical Methods for Nonlinear Optimization, edited by F. A. Lootsma, Academic Press, London, 1972, pp. 191-201.
11. Gallant, A. R., "Nonlinear Regression," Am. Stat. 29 (1975), 73-81.
12. Seinfeld, J. H. and Lapidus, L., Mathematical Methods in Chemical Engineering, Vol. 3, Process Modeling, Estimation and Identification, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1974.
13. Graupe, D., Identification of Systems, Van Nostrand-Reinhold Company, New York, 1972.
14. Jones, A., "SPIRAL, a New Algorithm for Nonlinear Parameter Estimation Using Least Squares," Comput. J. 13 (1970), 301-308.
15. Beale, E. M. L., "Confidence Regions in Nonlinear Estimation," J. Roy. Stat. Soc. B 22 (1960), 41-76.
16. Guttman, I. and Meeter, D. A., "On Beale's Measures of Nonlinearity," Technometrics 7 (1965), 623-637.
17. Bacon, D. W. and Henson, T. L., "Statistical Design and Model Building," Course notes, Dept. of Chemical Engineering, Queen's University, Kingston, Ontario.
18. Box, G. E. P., "Fitting Empirical Data," Ann. N.Y. Acad. Sci. 86 (1960), 792-816.
19. Box, G. E. P., "Some Notes on Nonlinear Estimation," Tech. Rep. No. 25, University of Wisconsin, Dept. of Statistics, Madison, Wis., 1964.
20. Gallant, A. R., "The Power of the Likelihood Ratio Test of Location in Nonlinear Regression Models," J. Am. Stat. Assoc. 70 (1975), 198-203.
21. Box, G. E. P. and Hunter, J. S., "A Confidence Region for the Solution of a Set of Simultaneous Equations with an Application to Experimental Design," Biometrika 41 (1954), 190-199.
22. Box, G. E. P. and Hunter, W. G., "A Useful Method for Model-Building," Technometrics 4 (1962), 301-318.
23. Beck, J. V., "Determination of Optimum, Transient Experiments for Thermal Contact Conductance," Int. J. Heat Mass Transfer 12 (1969), 621-633.
24. Beck, J. V., "Transient Sensitivity Coefficients for the Thermal Contact Conductance," Int. J. Heat Mass Transfer 10 (1967), 1615-1617.
25. Van Fossen, G. J., Jr., "Design of Experiments for Measuring Heat-Transfer Coefficients with a Lumped-Parameter Calorimeter," NASA Tech. Note NASA TN D-7857, Jan. 1975.
26. Burington, R. S. and May, D. C., Handbook of Probability and Statistics with Tables, 2nd ed., McGraw-Hill Book Company, New York, 1970.
27. Farnia, K., "Computer-Assisted Experimental and Analytical Study of Time/Temperature-Dependent Thermal Properties of the Aluminum Alloy 2024-T351," Ph.D. Thesis, Dept. of Mechanical Engineering, Michigan State University, 1976.
28. Draper, N. R. and Smith, H., Applied Regression Analysis, John Wiley & Sons, Inc., New York, 1966.

PROBLEMS

7.1 Using OLS with the Gauss linearization method, estimate β in the model η = 377/(1 + βt) for the data

t_i   0.25   0.5   0.75   1    2    3
Y_i   150    90    70     55   30   20

(a) Start with β = 6 being the initial estimate. Answer: 6.0648.
(b) Start with β = 1 being the initial estimate.
(c) Start with β = 12 being the initial estimate.

7.2 Using OLS with the Gauss linearization method, estimate β₁ and β₂ in the model η = β₁/(1 + β₂t) for the data given in Problem 7.1.
(a) Start with β₁ = 377 and β₂ = 5.8 being the initial estimates.
(b) Start with β₁ = 300 and β₂ = 4 being the initial estimates.
(c) Start with β₁ = 600 and β₂ = 8 being the initial estimates.
(d) Start with β₁ = 600 and β₂ = 4 being the initial estimates.

7.3 Using OLS with the Gauss linearization method, estimate a₀ and a₁ for the model and data in Example 5.2.1.

7.4 For the model T = 81.5 + 198.1e^(-βt), estimate using OLS and the Gauss method the parameter β for the data given in Table 6.2. Use as the data the Y_i values for observations 5, 9, 13, and 17. Answer: β = 7.49611 × 10⁻⁴.
7.5 Repeat Problem 7.1 using the Box-Kanemasu modification.

7.6 Repeat Problem 7.4 using the Box-Kanemasu modification.

7.7 Repeat Problem 7.1a using the sequential method of Section 7.8.3.1.

7.8 For the model T = β₁ + β₂e^(-β₃t), estimate the parameters β₁, β₂, and β₃ for the Y_i data given in Table 6.2. Use a computer program incorporating the sequential method. Use OLS. Answer: 100.60, 178.83, 0.000882.

7.9 For the model given by

Nu = β₁Re^(β₂ + β₃ log Re)

estimate the parameters using a sequential computer program with OLS for the data given in Example 6.2.2. (Do not take logarithms of both sides.)

7.10 Repeat Problem 7.9 using as the sum of squares function

S = Σ(1 - Z_i)²,    where Z_i = η_i/Y_i

7.11 Using the Box-Kanemasu approximation and the initial estimate of M⁽⁰⁾ = 10, estimate M for the problem of Section 7.5.1. Use a programmable calculator or a computer.

7.12 Sometimes two functions are known and one or more parameters in one function is to be adjusted to cause "agreement." An example is u tanh u, which is to be approximated by u²/(1 + βu²) for "small" values of u.
(a) By using Taylor series, show that "agreement" is obtained by letting β = 1/3.
(b) Suggest a mathematical function which would be minimized to obtain agreement in this case.

7.13 Another set of functions as in Problem 7.12 is u²/[2(1 + βu²)] and uI₁(u)/I₀(u), where I_n(u) is a modified Bessel function. By using a Taylor series show that "agreement" is obtained by letting β = 1/8.

7.14 The following measurements and associated standard deviations are for the model η_i = β₁ exp(-β₂t_i):

t_i   0.125   0.25   0.375   0.5
Y_i   66      44     28      20
σ_i   3       6      2       2

Use maximum likelihood estimation to estimate β₁ and β₂ assuming the standard assumptions indicated by 11011111 are valid. Let the initial values of β₁ and β₂ be 100 and 3, respectively. Use a sequential method of solution. Also find the covariance matrix of the estimates and the minimum sum of squares. Answer: b₁ = 100.922, b₂ = 3.2877, P₁₁ = 0.9744, P₁₂ = 0.0245, P₂₂ = 0.007933, S_min = 0.8502820.

7.15 When there are large numbers of equally spaced measurements in the region 0 ≤ t ≤ 1, the OLS sum of squares function is proportional to the integral

S = ∫₀¹ (Y - η)² dt

For the special case of Y = t⁶ + t and η = β₁t⁶ + β₂t, show that for a fixed value of S the value of β₁ is related to β₂ and S by

S = Z₁²/13 + Z₁Z₂/4 + Z₂²/3

where Z_i = 1 - β_i. Use the method given in Section 6.8 to find the major and minor axes of this curve. Plot for S = 1 and 10.

7.16 For the model η = β₁t + exp(-β₂t) and the data

t_i   0   1   2
Y_i   1   2   3

(a) Plot the contour of S = 0.1 in the β₁, β₂ plane.
(b) From the discriminant in the expression for β₁ show that the β₂ value can be as small as -ln[1 + (5S)^(1/4)] and can be as large as -ln[1 - (5S)^(1/4)] provided S < 0.2. If S > 0.2, what is the maximum value of β₂? What condition(s) must be given with respect to β₁ to insure that the contour for fixed S is closed?
(c) Derive an expression for the β₁ value at the end point(s) of an S contour. (At the end of the contour, the discriminant is equal to zero.) For S = 1, show that the end point is β₁ = -1.3898 and β₂ = -0.9144.
(d) Expand S for small values of β₂ to obtain S ≈ 5(1 - β₁ + β₂)² and plot for S = 0.1. Compare with the result of part (a).
(e) Show that |XᵀX| = [2e^(-β₂)(1 - e^(-β₂))]². Find the location of the β₂ values corresponding to the minimum and maximum values of |XᵀX|. Plot |XᵀX| versus β₂ for -1.5 < β₂ < 4.

7.17 Using the data of the preceding problem and the initial values of β₁ = 1.5 and β₂ = 1.5, estimate β₁ and β₂ using the Box-Kanemasu modification of the Gauss method.

7.18 Show that the sensitivity coefficients for the model

η = (1 + β₁t)/(β₂ + β₃t)

are linearly dependent for a certain combination of the parameter values. (Hint: write the sensitivities in terms of z = β₃t/β₂ and R = β₁β₂/β₃.) What can you conclude from this linear dependence? What new parameters could be selected to eliminate the linear dependence mentioned above?

7.19 Repeat Problem 6.26 for the model of Problem 7.18.

7.20 Using the data in Table 7.14 between 3.6 and 18.0 sec for temperature histories from thermocouple 1 (which is at x = L) and thermocouple 5 (which is at x = 0), estimate k and α in the model given by (8.5.25) with T₀ = 81.66°F, L = 1/12 ft, and q = 2.67 Btu/ft²·sec. Let t in (8.5.25) be the times in Table 7.14 minus 3.3. In other words, 3.6 sec in Table 7.14 corresponds to time 0.3 sec in (8.5.25). Use as initial estimates k = 40 Btu/hr·ft·°F and α = 1 ft²/hr. (Be careful with units.) Use OLS with the Gauss or other method.

7.21 Repeat Problem 7.20 but use the average of temperatures 1, 2, 3, and 4 instead of 1 and the average of 5, 6, 7, and 8 instead of 5.

7.22 Derive (7.5.16) and (7.5.17).

7.23 Derive (7.8.18).

7.24 Verify the sensitivity coefficient values given in Fig. 7.8 using the approximate equation (7.10.1). A programmable calculator or computer should be used. Investigate using values of δb_j = εb_j where ε is equal to (a) 0.01, (b) 0.001, and (c) 0.0001.

CHAPTER 8

DESIGN OF OPTIMAL EXPERIMENTS
8.1 INTRODUCTION
Carefully designed experiments can result in greatly increased accuracy of the estimates. This has been demonstrated by various authors, but special mention should be made of the work of G. E. P. Box and collaborators. See, for example, Box and Lucas (1) and Box and Hunter (2). An important work on optimal experiments is the book by Fedorov (3).

In many areas of research, great flexibility is permitted in the proposed experiments. This is particularly true with the present ready accessibility of large-scale digital computers for analysis of the data and automatic digital data acquisition equipment for obtaining the data. This means that transient, complex experiments can be performed that involve numerous measurements for many sensors. With this great flexibility comes the opportunity of designing experiments to obtain the greatest precision of the required parameters. A common measure of the precision of an estimator is its variance; the smaller the variance, the greater the precision. Information regarding the variances and covariances is included in determination of confidence regions. We shall utilize the minimum confidence region to provide a basis for the design of experiments having minimum variance estimators.

The design of optimal experiments is complicated by the necessity of adding practical considerations and constraints. The best design for a