Back to Parameter Estimation in Engineering and Science Home
Table of Contents
CHAPTER ~ jr_______________________________________ .. -'1 ' a 'J' M INIMIZATION O F S UM O F S QUARES FUNCTIONS F OR M ODELS NONLINEAR IN PARAMETERS 7.1 I NTRODUcnON T his chapter is concerned with methods for minimizing a general sum o f s quares function when the dependent variable is nonlinear in terms o f the parameters. The function to be extremized is assumed to be known although selection of such a function is n ot a trivial matter and comprises the first optimization problem in parameter estimation. (Extremize means either maximize o r minimize.) The extremization o f the chosen function, the topic of this chapter, is the second optimization problem of parameter estimation. O ther optimization problems relate to optimal design o f experiments a nd optimal designs for discrimination between competing models. In engineering a nd science most phenomena are modeled using differential equations. The solution of these equations may be available in closed form o r may be obtained through the use o f finite difference o r finite element computer solutions. Regardless of the method of solution of a differential equation, the model is more often than not a nonlinear function of the parameters. The differential equations and boundary a nd initial conditions may be linear in the usual mathematical sense a nd still have a solution that is nonlinear in terms o f the parameters. See, for examples, the models in Section 7.5. l J4 1.1 INTRODUcnON 335 A p roblem can be either linear o r nonlinear. T he nonlinearity in one case can pose more difficulties in obtaining a solution than in another nonlinear case, however. T he extremum found for the sum o f squares function when the model is linear in the parameters can be proved to be the correct one for OLS estimation as well as for M L estimation provided the conditions 11-1011 (see Section 6.1.5.2) are satisfied a nd a unique minimu~ p oint exists. Complete a ssurance- cannot readily be given for nonlinear cases since there may be more than one extremum. I f o ne has reason to d oubt t hat the extremum found is the desired one, it is recommended that contours o f c onstant S b e plotted in· the region in which the solution is expected. This involves a n extensive search. Another possibility is to start the iteration procedure with different sets o f initial parameter values. I t should be noted that the same problems associated with iII-conditioning in linear problems also arise in nonlinear estimation problems. In linear problems the theoretical existence o f a minimum may be a mirage for iII-conditional cases, that is, those associated with relatively small IXTXI values; see Section 6.7.6. In such cases slight changes in the measurements c an cause large movement in the location o f t he minimum, resulting in large perturbations in the estimated parameter values. Because o f this sensitivity for ill-conditioned cases, convergence proofs for nonlinear cases m ay also be more academic than practical. Several simple optimum seeking methods are given in this section. In Section 7.4 the Gauss method is given, a nd i n Section 7.6 several modifications o f t hat method are described. Also in Section 1.6 a comparison o f several methods is given. Later sections discuss sequential methods a nd c orrelated errors. 7.1.1 Trtal and Error S earch One o f the simplest procedures for extremizing a function is trial and error. I t is quite inefficient a nd is n ot recommended. I t is easily described and understood, however. 
As for most nonlinear search procedures, the trial a nd e rror procedure starts with a set o f estimated values o f all the parameters. Let the initial parameter estimates be designated .,CO) a nd for simplicity let there be a n associated O LS s um o f squares (7.1.1) - For lOme c:ues converJence proof. are available but they are _ ny not practical t o implement for a number of different reaaonl. See Bard (S, p. '7~ 336 CHAPTER 7 MINIMIZATION OF SUM OF SQUARES F UNcnONS (A more general function is given in Section 7.3.) Ne1lt, a nother set o f p arameters b(l) is c hosen more o r less arbitrarily a nd S ( I) is c alculated. I f (7.1.2) then the combination o f p arameters given by b(l) m ust be " better" t han b(O); o ne might wish to select a b(2) to be a modification o f b(l) in the same m anner t hat b(l) was of b(O) . I f the inequality in (7.1.2) is reversed, then we would not proceed in the same manner. There are many possible strategies t hat c ould be employed to choose different b's. O ne is to fi1l all the b;'s e1lcept o ne which is varied until S reaches a relative minimum, a t which time this parameter is fixed a nd a nother o ne is v aried . This p rocedure usually requires considering further changes in each parameter a fter t he o ther p arameters have been reestimated. I n t he procedure outlined above, there is n o rule t hat m ust b e followed i n selection of new sets o f p arameters. Instead o ne c an try any set o f p arameters that seems reasonable. This c an b e d one interactively using a teletype computer terminal. In so doing o ne usually finds t hat this m ethod o f s olution is inefficient, time-consuming, a nd n ot p ractical. O ne c an, however, obtain a " feel" for the difficulty o f e1ltremizing a function, particularly when there is m ore than o ne p arameter. I I , 7.1 I NTRODUcnON • 7 S • 1 o ••4• 23•5 b (4). ..... - .j - . ....1 T he exhaustive search procedure is u ndoubtedly expensive b ut it does have the potential o f revealing local minima in addition to the global m inimum. This is i llustrated b y Fig. 7.1 b. T he e xhaustive search procedure is m ore likely to produce the global minimum t han s ome o ther schemes. Irrelevant local minima are encountered in p arameter e stimation b ut a re n ot c ommon. • 6 0.5 2.0 b F Ipre 7.1. Sum o f squares in an exhaustive search proc:edure s ' -global mfnfllKlm 7.1.2 Exhaustive Search A nother simple (and also inefficient) procedure is t ermed a n exhaustive search. T o illustrate this procedure consider the case o f only o ne u nknown parameter, fJ . T he " best" value of this p arameter is t o be associated with the m inimuql value of S. Instead of selecting only a n initial estimate, a region o f fJ is chosen in which region the minimum value o f S is e xpected to be. Suppose that the parameter fJ is k nown to be between 0.5 a nd 2.0. In a n e xhaustive search S is c alculated a t e qually spaced values o f b in this region (fJ is the true value a nd b is a n e stimated value.) Fig. 7 .la shows S for ~b intervals of 0.25. T he best be;) v alue as indicated by Fig. 7. 1a is b (3)= l . A m ore accurate value o f b c ould be found by conducting a n e xhaustive search with a smaller ~b in the reduced region between b (2) a nd 337 o b (bJ ' !pre 7. lb S with a local a nd global minimum. I t s h?uld also b e n oted t hat t he global m inimum m ay n ot necessarily be the deSIred o ne. F or e xample, a certain mechanical device m ay b e k nown to h~ve a ~atural fr~~u~ncy a bout I H z. 
This frequency is t o b e m ore preCIsely estImated u tdwng several measurements o f d eflection versus time o f the device. A local minimum o f S w ould be e xpected n ear I H z b ut the global minimum might o ccur a t a c onsiderably higher frequency. A nother e xample occurs when the functional form o f S i ncorporated in a c omputer pr~gram m ay hav~ a g lobal m inimum a t negative values o f p arameters whIch a re n ot phYSIcally possible. 7.1.3 O ther Methods T here a re m any o ther m ethods for locating extremes o f a rbitrary f unctions. These include direct search, Fibonacci search, gradient methods, r andom l JI C HAPlD 1 MlNlMIZAnON o r SUM o r SQUAIlES t tlNCIlONS search, Hooke-Jeev~s search, simplex exploration, a nd d ynamic program· ming methods. See a text o n optimization such as Beveridge a nd Schechter (2). Most o f these methods are much more efficient than the two methods just mentioned. Rather than giving many methods we shall emphasize one basic method and then suggest some modifications of it. This method is called the Gauss method. It has proved to be very effective for a large class o f different parameter estimation problems. 7.1 M ATRIX F ORM OF T AYWR SERIES EXPANSION 7.l SUM 0 ' SQUADS f tJNCIlON i ?cluded in (7.3.1). .W hen a single sensor is used t o obtain many observa. bons, we have a Stogie response case. I n this situation the observation vector Y a nd corresponding vector found from the model, " , a re n vectors. T he s quare m atrix'! is n X n. T he p arameter vector fJ c ontains p components as does p.; U IS a square matrix o f dimensions p x p. Much o f this chapter explicitly considers the discrete single response Extensions to multiresponse cases o f t he algorithms given in Sections 7.4 a nd 7.6 a re n ot difficult. Section 7.8 provides a sequential method for this case. The multi response case occurs when m ( > I ) measurements are taken a t n different times. The observation vector can be written as case. Y (I) , In the Gauss linearization method of minimizing a sum of squares function a Taylor series expansion o f a vector is needed. Let 'I b e an n vector a nd a function of the p parameters in the fJ vector. Let ' I have continuous derivatives in the neighborhood o f fJ - b. T hen the Taylor series for a point fJ near b begins with the terms 'Ie f J)- ,,(b) + [V ~'1 T (b) r (fJ-b) + . .. Y- Y 2(i) where Y (i)Yen) _ (7.3.2) Y ",(i) (7.2.1) where V~ is the matrix derivative operator defined by (6.1.21). 7.3 S UM OF SQUARES F UNcnON I n o rdinary least squares, weighted least squares, maximum likelihood, a nd m aximum a posteriori estimation, the sum of squares functions to be minimized are generally different. In some cases, however, the M AP sum o f s quares function reduces to that for M L which in t um reduces to that for O LS estimation. F or t hat reason a nd for economy in presentation a sum o f squares function is given that is appropriate for OLS, WLS, ML, a nd M AP estimation when appropriately specialized. The function that we consider in this chapter is s - [Y - : _ , ( i) Y Y(2) H ence the Y vector contains mn components. T he ' I vector c an be similarly defined a nd W is m il X mn. T he fJ, 1', a nd U matrices remain as given above. In some situations it is natural to consider continuous rather than discrete measurements in time. This may be either because the measurements are actually continuous o r because it is more convenient to analyze t hem as if they were. T hen we would replace (7.3. 
1) b y s - folj(y( t) - 'Ie l ,fJ) ( W( I )[Y(/) - ,,(/,fJ)] d l + ( I' - fJ) TU( I ' - fJ) (7.3.3) where the time limits a re 0 a nd I f' I f the case being considered involves a single response, Y(/) becomes the scalar Y (/). F or the multi response case the Y(/) vector is 'Ie fJ) ( w[Y - 'Ie fJ)] + ( 1 '- fJ )TU( 1 '- fJ) - [V - ,,( fJ) ( w[Y - 'I( fJ)] + ( fJ- ,,)TU( fJ- 1') (7.3.1 ) Both W and U are weighting matrices which are symmetric; W is positive definite a nd U is positive semidefinite. I n many cases W and U will b e assumed to be completely known. The cases o f single a nd multiresponse for discrete measurements a re (7.3.4) a nd , ,(/.fJ) is similarly defined. In many algorithms to be given, the summations on a time index can be replaced by integrations over time. CHAPTER 7 MINIMIZATION OF SUM OF SQUARES FUNCTIONS O ne further modification (or interpretation, depending o~ on~'s viewpoint) of (7.3.1) is for situations in which the d ependent , vanables m the system model are not measured ~irectly. Th!s is a case discussed frequently in the systems literature. T o Illu~trate this cas~ assum~ t hat t~e system model is the set of nonlinear, fust-order o rdmary differential equations, i =f(t,x,l),u) (7.3.5) The dimension of the dependent variable x is r; f is some known function o f x, t, I) a nd u which could b e r elated to some forcing function o r c ontrol. It may be that all the components o f x a re n ot m easured directly. Instead some linear o r n onlinear function of the x may b e m easured, V =hx+£ or V =g(t,x)+ f (7.3.6) Here the dimension m o f the Vi c omponent o f the V v ector would be equal to or less than r. I n such cases the d ependent v ariable 1) in (7.3.\) could b e replaced by hx o r g (t,x). so t hat (7.4.1) set equal to zero a t I) = 7.4.1 Ii becomes (7.4.3) p Ii ~nf~~un~tely, we c annot easily solve for the estimator s ince a ppears Imphc.ltly m 1) ~nd X a s well as appearing explicitly. Suppose t hat w e have a n e stImate o f ~ denote~ b ~nd t hat 1) h as c ontinuous first derivatives in IJ a nd b ounded hIgher denvatlVe5 n ear b. T wo a pproximations a re n ow used i n (7.4.3). First, replace ~(Ii) b y X(b) a nd second, use the first two terms o f a T aylor series for 1)( I) a bout b. T hen (7.4.3) becomes XT(b)W[V - 1)(b)-X(b)(P-b)] + U ( p -b)- U(Ii-b)~O ~ote t~at t~is (7.4.4) Ii. equa.tion is linear i n I f ( I) 1) is 'not t oo f ar from being In IJ In a regIOn a pout t he solution t o (7.4.3) a nd if (2) this region Includes b, t~e v alue o f IJ s atisfying (7.4.4) will b e a b etter approximation to t~e. s olution (7.4.3) t han t hat p rovided b y b. Assuming these two conditIons to b e true, (7.4.4) is set e qual t o zero. In the interest o f c ompactness o f n otation a nd t o indicate a n i terative p rocedure let ~lnear bCk) = b, 7.4 341 7.4 GAUSS M EllIOD O F MINIMIZATION bCH I ) . .. 11, 1)Ck) =1) (b), XCk) =X(b) (7.4.5) G AUSS M ETHOD O F M INIMIZATION Using this n otation i n (7.4.4) set equal to 0 yields p e quations in matrix form for bCH I), Derivation O ne o f the simplest a nd most effective methods of minimizing the function S is variously called the Gauss, G auss-Newton, N ewton-Gauss, o r linearization method; we call it the G auss m ethod. It is a ttractive because it is relatively simple a nd b ecause it specifies direction a nd size o f the corrections to the parameter vector. T he m ethod is effective in seeking minima that are reasonably well-defined provided the initial estimates are in the general region o f the minimum. 
F or difficult cases (i.e., those with indistinct minima) modifications to the G auss m ethod discussed in Section 7.6 are recommended. A necessary condition a t the minimum o f S is t hat the matrix derivative of S with respect to I) b e equal to zero. F or this reason operate upon S using (6.1.30) to get V I/S=2( - VI/1)T(I)]W[V-1)(I)]+2[ - I]U(p-l) (7.4.1) (1.4.6a) (7.4.6b) whic~ is t he G auss l inearization equation. Iteration o n k is r equired for n onhnear models. F or l inear-in-the-parameters model n o i terations are required. N ote t hat f or." = XIJ, (1.4.6) reduces to the M AP e quation (6.6.6a) ~y s etting ~Ck) equ~1 t o zero. N o c onstraints a re i ncluded in (7.4.6). In uSing (1.4.6) I D n onhnear cases a n initial estimate o f I), d esignated b CO), is n eeded. With this vector 1) C a nd XCO) c an b e c alculated, which, in O) turn, are used in (7.4.6a) to o btain t he improved estimate vector b(l). T his c ompletes the first iteration. T hen 1)(1) a nd X(I) a re evaluated s o t hat b(2) c an b e f ound. T he i terative procedure continues until there is negligible change in a ny c omponent o f b; o ne c riterion to indicate this is Let us use the notation X( fJ) for the sensitivity matrix, (7.4.2) f or i = 1,2, . .. , p (7.4.7) CHAPttR 1 M INIMIZATION OF SUM OF SQUAR£S FUNCIlONS 1A GAUSS Mt:11IOD o r M lNlMlZAnoN '- ·'i. where 3 is a small number such as 1 0-·. In order to avoid embarrassment if hP') goes to zero, the quantity 8, is set equal to another small number such as 1 0-'0. When good initial estimates of the parameters are available a nd the experiment is well-designed, (1.4.1) is frequently satisfied by the seventh iteration. See Table 1.3 fO.r an example o f this. (The fact that (7.4.7) is satisfied does not guarantee that the last b(A:+ ') minimizes S, particularly when the minimum is ill-defined.) As a minimum is being sought, the function S should logically decrease from iteration to iteration. One might then include a check in a computer program to see if S(A:+ ') is less than S (k). I f it is not, the procedure could either terminate or the correction o f the parameters, h r + I ) - hr), could be decreased as discussed in Section 7.6. In some cases, however, a temporary increase in S could permit larger parameter changes to the region of the minimum and actually lead to more rapid convergence. Its sensitivity coefficients are X II - il'J1 ap i - exp( /J2'/)' X12 - a,,1 CJP 2 (7.4.11) The matrix XTWX is a symmetric matrix of dimensions p x p. Let C EXTWX (7.4.12) where the CI} element o f C is (7.4.13) I f the weighting matrix is diagonal, (7.4.13) simplifies to 7.4.1 Components of G....... Unearlzatlon Equation ~ Consider the sensitivity matrix as defined by (7.4.2); without showing the dependence on b (k), it can be written for a single response case as PI'I exp( P2'/)' XI) - I a", a", aD C I}- ~ wUXIIXq-l:wlI aD I -I . PI (7.4.14) PJ Let the matrix product X TW(y - IJ) in (7.4.6a) be designated H , T x- f~". x,,, a a/J, x~ ]_ ['1, . .. " .. ] a a/J, X. - a'll a/J, ch" a/J, (7 . 4.1~) which is a p X I vector and has a typical component o f HI - a".. aft, a"" aft, "" l": l": W,.X'I (Y. - 1 -1.-1 (7.4.16) ".) F or CJk) a nd H r), the quantities Xu a nd " . are evaluated with lJ-b(A:). F or the simple case of one parameter ( p - I) the iterative equation is (1.4.8) Hence the i j element of X (k) (7.4.17) is An initial estimate o f (1.4.9) p, designated h(f1), must be provided. E x.mple 7.4•• This definition o f X is consistent with the linear model. 
A simple example is " 1- p,XI , + P XI2 where XI} has the same meaning as in (7.4.9). A model 2 which is nonlinear in a parameter is (1.4. 10) Estimate /1, and /11 in the model , ,- /1, +(1 + /11ty ulinS +- [ II 1 J +-1- _I2 [ I 2 1 2, 2 3 o -I 2 -I -n' 1',- hr"-2, 1'1- hr-I, CHAPTER 7 MINIMIZATION OF SUM OF SQUARES FUNCTIONS ( fbis tf function is associated with cumulative errors in o -I r l .) 7.4 GAUSS ME11IOD OF MlNlMIZAnON the first iteration we calculate S(I) to be The data are I 8.5 3 Use the sum of squares function given by (7.3.1) in which tf - 1 is to b e used for W. Solution + [: This problem is solved using (7.4.6) in an iterative manner. Consider the first iteration. The sensitivity coefficients a nd values are r[ -I 2 -I _][ -~.251 ? I 1.25 =:'5 g ~][ - ~.5 ] - 2.6875 which is lower than S(O). ( If it were not, then a method given in Section 1 .6 might be used.) For the second iteration the calculations proceed in a similar manner. We have X TCD) = [ I I] ~o4 p (l)- .!. [ 31 30 - I Using this vector we rind XT(D)tf-IX(D) .. [~ a nd thus p -I ~ !][-1 -I 2 -I -n: ~ 1-[~ I~ r 0][ -.5 ]_[-2.25 1 -.25] 4 ] which results in is p - I(O)=XT(D)~-IX(D)+U=[~ I~]+[g ~] .. [~ P (O) = b(l)_ 2g] which has the inverse 0 [01 0.05 ] Another expression needed in (7 .4.6) is - II ] [I1.5 ]+.!.0-11 [3 3 - I ] [ - 0.25] _ [ 0.81667] I - 2.25 1.4333 The associated S value is S (l)-2.497059 which is, as it should be, less than S(I). The above results along with those of the third and fourth iterations are summarized in Table 1. 1. T he 4b;(k) expression means 4 b/k)_blk+I)-b/k). Notice that the results converge quite rapidly since the relative changes in both parameters by the fourth iteration are less in absolute value than 1 0- 4 (thus satisfying (7.4.7) if 8 is 1 0- 4). There are some changes in sign in the corrections, 4b/, but no instability is noted. XT(D)tf - I (Y _ ,.(0) + U( " _ b(D) .. III [ o 0 4 ][_~ 0 -I 2 -I 1 -2] . -:0]( 8.~=~ +[g]=[I~] Then we can use (7.4.6a) to complete the first iteration, b(\)-[n+[~ g . 05][1~]-[::~] ~ C t) ...... The initial sum of squares as defined by (7 .3.1) has a value of S (0) - 8.25 . Arter T able 7.1 k +1 0 1 2 3 4 SWIlIIIII}' o f CaIcuIadoas f or Example 7.4.1 bik+') 4bfk)/bfk) 4 W)/bik) bfk+') 2 I 0.816667 0.810561 0.810630 I 1.50000 1.433333 1.435251 1.435166 - 0.5000 - 0.1833 - 0.748 x 1 0- 2 8.56 x 1 0- 5 0.500 - 0.0444 O I34x 1 0- 2 . - 5 .85X 1 0- 5 S(k+I) 8.250000 2.681500 2.497059 2.496940 2.496939 t- . ~ 30M r 104 GAUSS M l:THOD 0 ' M lNlMlZA110N ! CHAI'1'£R 1 M lNlMIZAnON OF SUM OF SQUARES FlJNCI10NS c an b e w ritten a s I ~.:~ .. 7....3 CommeDt! o n G auss U neartzatlon Equation Several c omments a nd o bservations regarding the G auss l inearization e quation a re given in this section. ( a) By letting W "" I a nd V = 0, (7.4.6) provides o rdinary l east s quares e stimates. ( b) U the observation errors 2 s atisfy the s tandard c onditions o f b eing a dditive. zero mean. it is k nown w ithin a multiplicative c onstant a nd t he i ndependent v ariable(s) a nd IJ a re n on s tochastic (i.e., 11---011), t hen n onlinear G auss-Markov e stimation c an b e used by setting W = ( J - 1 a nd V -O. ( We a re using i t· a 2(J w here ( J is completely known.) ( c) U in addition t o t he conditions given in (b). t he errors a re n ormal. the assumptions a re d esignated 11--1011. T hen (7.4.6) provides a n onlinear M L e stimator if W =(J-I a nd U =O. ( d) U t he conditions in ( c) a re valid except there is p rior i nformation a nd a 2 is known. 
then a n M AP e stimator is provided by (7.4.6). S uppose t hat IJ is a r andom p arameter v ector with a m ean " p' c ovariance V. a nd n ormal p robability density. T he a ssumptions a re t hen designated 11--1112. By letting W -it- I • a nd V "'V- I • t he corresponding M AP e stimator is given b y (7.4.6). ( e) I n o rder to utilize (7.4.6) to estimate the parameters it is necessary t hat p -I have a n inverse. t hat is, P - I b e nonsingular. T hen its d eterminant m ust n ot b e zero o r "-"11' II f t I • I C jXjj-O j-' f or ; -1.2• ...• " f or a t l east o ne Cj~O (7.4.20) I f (7.4.20) is true. then & given b y (7.4.19) is e qual 10 z ero. T his c ondition o f l inear d ependence is a lmost s atisfied in m any m ore cases t han w ould b e e xpected. I n s uch c ases & is a lmost z ero; this is w hat is m eant b y iII-conditioning. See Section 6.7.6. I f (7.4.20) is almost satisfied, the s um o f s quares f unction S will h ave a u nique m inimum p oint a nd t hus a u nique s et o f p arameters. T he m inimum p oint will n ot b e v ery pronounced, however. As a n e xample c onsider t he s um o f s quares f unction S -(2- /11-e-flJ)2 + (3-2/1,- e- 2JlJ )2 which is p lotted i n Fig. 7.2 f or S - O,.I, I a nd 3 . T he s ensitivity matrix f or 3r---------------~----~~~----~~~--------- ( 7.4.18) i n t he region o f t he minimum. W e call this the identifiability c ondition. U t his d eterminant is identically equal t o zero, there is, in general. no u nique p oint a t which the m inimum o ccurs. A m ethod d oes n ot give the complete l ocation o f t he minimum when it specifies a single p arameter p oint while there is more than o ne p oint a t which the minimum occurs. S ince t he G auss m ethod will n ot yield a ny p oint in this case, o ne is alerted t o t he nonexistence o f s uch a point. F or least squares estimation W = I a nd V ... O. F or this case it is necessary t hat (7.4.19) i n the neighborhood o f t he m inimum o f S. T his d eterminant is shown t o b e e qual t o z ero in Appendix A if a ny c olumn o f X c an b e expressed a s a l inear combination o f o ther c olumns. This condition, linear dependence, -2~----~~----~-------L-------L------~--~ -2 -1 0 2 3 81 Fta-e 1.2 C ontoun of l um of Iquafel func:tiOD S -(2-Jl,-,"f+(J-2J1.-,·2IJf for several S "alues. CHAPTER 7 MINIMIZATION OF SUM OF SQUARES F UNcnONS 7.4.4 the two points indicated in S is X= 7.4 GAUSS ME11IOD OF MINIMIZATION [~ which exhibits linear dependence for either /32=0 o r /32--+00. Since t he. minimum S value is a t /3 I= I a nd /32 = O. the condition of linear dependence is almost satisfied in the region of the minimum point. Along the long axis of the S =0.1 contour, the change in S is much more . gradual than in other directions. Contours that are long, narrow. a nd curvIDg such as S = 0.1 are frequently associated with near linear dependence or, equivalently, with !:J. being relatively small. Moreover. such contours are typ!cally associated with slow convergence of the Gauss method. F or thIS reason it is important to examine the sensitivity coefficients over the region of interest. F or M L estimation, it is necessary that Linear Dependence o f Sensitivity Coefficients As stated in the preceding subsection u nder p oint (e), t he function S f or LS o r M L estimation has n o u nique minimum point if the sensitivity coefficients are linearly dependent. I t h as been found from experience t hat difficulty encountered in convergence is frequently d ue t o a pproximate linear dependence. 
In most of these cases the sensitivity coefficients were n ot p lotted a nd e xamined beforehand. Indeed, in m any c ases·the user of a nonlinear least squares program may n ot realize their importance a nd thus n ot even examine them after lack of rapid convergence is apparent. F or effective nonlinear estimation, the careful examination o f these sensitivity coefficients is imperative. In o rder to demonstrate what should be inspected, the following discussion is given. F or single response cases with approximately constant standard deviations o f the measurements, it is convenient to examine (7.4.21 ) b ut ML wiJI not lead to a choice of W = ' " - I which would cause this determinant to be equal to zero if XTX is n ot equal to zero. Also if XTX is equal to zero, there is n o '" - I which will make (7.4.21) be true. (See Appendix A.) Hence the condition given by (7.4.19) is again the important one. For MAP estimation with W =", - I a nd U =V ii ' , it is possible that (7.4.18) may be true even if IXTXI =0. T hus if there is p rior information. the sensitivity coefficients may not have to be independent to permit estimation using (7 .4.6). ( J) When convergence to the estimates is a ttained. the matrix derivative of S goes to zero as indicated by (7.4.4). Since the same terms appear in (7.4.6), the p X I vector given next must also go to zero at convergence, (7.4.22) -,;. T his means that every component of this vector must be zero. F or cases when U . . O this results in each H, given by (7.4.16) being equal to zero. . Knowledge of this fact can sometimes aid in checking computer codes that a re n ot yielding converging solutions. ( g) Though (7 .4.6) has been obtained by using the linear approximation given by (7 .2. 1). the Gauss equation is not a rigorous first order approximation because a first order series was not used for X lk + I ). (7.4.23) N ote that Xij h as the units o f 1J. T hen the magnitude o f e ach sensitivity c an b e compared with the others as well as with 1J itself. F or multiresponse cases it is o ften more meaningful to plot /3 . a1J,,(;) X / ( ;)=_1_ ___ " O',,(i) a~ (7.4.24) which is dimensionless. ( Note; refers to " time," j to the parameter, a nd k to the response.) In Figs. 7.3 a nd 7.4 s ome sensitivities are plotted-versus the variable I ;. T hose in Fig. 7.3 a re linearly dependent b ut those in Fig. 7.4 a re not. The first nine graphs in Fig. 7.3 a re for two parameters; it is n ot difficult to see the linear dependence in the sensitivities in each case. N ote t hat the location of the Zero value on the X axes is n ot a rbitrary in most cases. The last three cases are for the three parameters being estimated simultaneously; the linear dependence is less obvious than for two-parameter cases. The importance of the zero value o f X is also shown b y Figs. 7.4a,h,c.d, a nd f which are n ot linearly dependent cases as drawn, b ut each becomes dependent if the zero is moved. W hat z ero location would d o this in each case? 350 CHAPTER 7 MINIMIZATION O F S UM O F S QUARES FUNCTIONS X Ik X'k , 00 ~ , , ~ 0 Xu 0 t, ( a) Xu , Xu t, ( b) L -----__ ( f) o ~------:-, L -___ ( g) o ~. , Xu X;j ( a) X Ik 0 t, ( b) t, ( e) , Xu t, o o~-~ x L ------..t, Ce) ( d) Cf) , . ........ • Xu , X'k t, ( e) 35. , 0 o t----'-- .. 7.5 GAUSS ME11fOD INVOLVING DIFFEREN11AL E QUAll0NS , XIk , t, , •• X'k ... ...... o .-.--,~--, X, L ----::..L.-_t, o ( h) o . 
•• , ~-----XIt 1-----~t, L -----_t, ( j) f Ipre 7.3 E umples ( k) o L-..!..!.._ _ ---.;~ t, 7.5 EXAMPLES T O ILLUSTRATE GAUSS MINIMIZATION ME11fOD INVOLVING ORDINARY DIFFERENTIAL EQUATIONS •• ... , . Xfj oL - - - - -__ ( 1) ( 1) or some linearly dependent sensitivity eoerricients. • ••••••• • ••• L_----XIt , X fk f Ipre 7.4 t, Examples o f some linearly independent sensitivity coefficients. equation for this case can be written as (7.5.1) 7.5.1 Fstlmatlon o f. P. ...meter f or. Long Fin Consider a long fin which has a temperature at its base, Z = 0, of 200°C and which is exposed to a nuid a t 100°C, see Fig. 7.5. The differential where M 2 is given by (7.5.2) J !l CHAPTER 7 MINIMIZATION OF SUM O F SQUARES FUNCTIONS T he p arameter M is n onrandom a nd there is no prior information. Using o ur notation, these assumptions a re designated 11111-11; t he value o f (12 n eed not be known to estimate parameters. {}:_:C_F_i"_ _L_h_ Fluid at T_~ 7.5 GAUSS M ElHOD INVOLVING DIFFEREN'IlAL EQUATIONS Table 7.2 Simulated Data for Fin Example l 00·C Figure 7.5 Geometry for fin in a nuid. h is the heat transfer coefficient, k is the thermal conductivity of the fin, A is the fin cross-sectional area normal to z, a nd P is the perimeter of A . T he boundary conditions are T(O) = To , T (oo)=T"" (7.5.3) Equations 7.5 .1 and 7.5.3 give a complete mathematical statement of the classical boundary value problem. The solution for T assuming constant M is (7.5.4) which contains the three parameters To, Too' a nd M. In a sense only M is a parameter since To a nd T"" can be considered " states"; that is, they are the temperatures at z = 0 a nd infinity, respectively. From examining (7.5.4), we see that T is linear in To a nd Too b ut nonlinear in terms of M . Each of the three parameters (To, T "", a nd M ) enters the problem in such a manner that all three could be found simultaneously, rather than only in certain combinations. On the other hand, h, P , k , a nd A can not be found independently but only in combination M . I f other boundary conditions were known, a different set of parameters might be found. For example, if the heat flux q a t z = 0 has a known value, the boundary condition is - kdT(O) =q (7.5.5) dz and the parameters M, k . a nd Too can be simultaneously estimated. Simulated data for this problem are given in T able 7.2 . The assumptions are Y;= 1";+f; V (Y.)=O , ; =2,3,4,5 E(f;£.i) = 0. ; -:j;) a nd ; .)=2,3,4,5 Position z (m) I 2 3 4 5 6 Temperature Y 0 0.125 0.250 0 .375 0.5 200 166 144 128 120 100 00 ( 0C) T he assumptions V (Y,)- V(Y6 )=0 m ean that To a nd T co .are the known values of 200 a nd 100°C, respectively. T he p arameter is M. F or the standard assumptions given above, O LS a nd M L estimation provide the same parameter estimates. The weighting matrix W in (7.3.1) is W = , ,,-' = ( 1-21 since (7.5.6) The iterative relation for finding M c an be found using (7.4.17) which can be written as M(k+ t) = M(k) + [ . ± , -2 ,X?)( Y 11 k») j[ ~ (X}k)tj -, (7.5.7) ; -2 where the sensitivity coefficient is (7.5.8) From a knowledge of heat transfer for this particular problem, a n initial estimate o f M c ould be given. M any m ethods are available, however, that use only the given d ata r ather than relying o n experience. O ne o f these is used in this example. 
Since only o ne p arameter is unknown, let us pick a single Y" Let us choose the value o f Y )= 144°C which is nearest the average o f To a nd T ",,' F or T = 144 a nd z = 0.25, (7.5.4) yields 1 44= 100+ ( 2oo-100)exp[ - M(O)(0.25)] 354 CHAPTER 7 MINIMIZAnON OF SUM OF SQUARES FUNCTIONS M which can be approximately solved for (7.5.4) gives the residual vector o f ( 0) = 3.28m - I. Using this value, 7.5 GAUSS M ElHOD INVOLVING DWRRENTIAL EQUAnONS A third iteration yields e(l).,. 166-166.36502501] [ -0.36502501] ( 0) = Y _ TO) = 144-144.04316545 = - 0.04316545 e -1.22925771 [ 128-129.22925717 120-119.39800423 0.60199571 The sum of squares associated with these residuals is S ( 0) = 2.008580086 using 0 2 = I .The sensitivity matrix X(O) a nd X T(O)X(O) a re X (O). aTO) aM = j - 8.29562813 - 11.01079136 [ - 10.96097166' - 9.69900211 ~'o!~286~581~4 ... 3.28 + .0275522 - 3.3075522m-\ T he associated sum o f squares is S (I)-1.700958954. N ote that M (I) - M fO ) M (O) = 0.0275522 3.28 =0.0084 which is small compared with unity and thus the initial estimate was very good. This can also be verified from an examination of the small residuals in e (a). O ne might desire to minimize S more precisely. Then using the above M (I) value we can find e (l). [ -0.13685534 0.25916365 - 0.92881366 0.86739235 j M m c o 3.3077428 + 826710692] 10.93520909 10.84830512 9.56630382 = 3.3075522 + 0.000 1906314 . . 3.3077428m - I M(l)-M(I) - 5.763x IO-~, . M (I) S (2) = 1.10094504 8.26690996] 10.93468804 [ - 10.84752977 - 9.56539220 3.3077433m-\ Several observations c an b e m ade from the above results. First, M seems t o b e converging rapidly. T he relative corrections are decreasing by a factor smaller than O.QI. T he s um o f t he components in e (2) is n ot zero, unlike a model containing a parameter which has a constant sensitivity vector. N ote t hat the residual values changed much more between iterations than the sensitivity vector (X) values. Another observation is t hat S is decreasing as the iterations proceed. In the example given above the initial i t ( 0) value is relatively close t o the converged value, resulting in only three iterations being required. N ot m any iterations are required, however, for a range o f initial values a s large as 0 to 10 ( or even - 3 to 10, as indicated by the i t ( 0) - 10 case) a s shown in Table 7.3. Eight o r fewer iterations were required t o converge t o within Iteration number 0 .I M (l)=3 3 075522+ 0.07570426 . 397.123747 - - S(3)-1.700945oo Table 7.3 X(l) _ [ - X U)... 0·:.~7:6 - X T(O)X(O) = I X/(O) =404.268514 Using these values a nd I X/0)ej(0)-11.I3849884 in (7.5.7) yields M(I) =3.28 + - 0.13527965 ] 0.26124785 [ - 0.92674605 ' 0.86921561 2 3 4 5 6 7 8 P anmeter Values as a Function o f I tentlon for Various Initial Estimates for Fin Example M(O)-O M(O)-6 M(O)-8 M(O)-IO 0.00 1.8186667 2.9666083 3.2883466 3.3076357 3.3077430 3.3077433 6.0 2.1484134 3.0970338 3.3001212 3.3077155 3.3077432 3.3077433 8.0 -0.0457249 1.782S492 2.9506108 3.2865352 3.3076195 3.3077430 3.3077433 10.0 -3.1824944 -1.1017732 0.8839079 2.45S4228 3.1918187 3.3053045 3.3077364 3.3077433 CHAPTER 7 356 MINIMIZATION O F S UM O F S QUARES FUNCTIONS seven significant figures . T he sum of squares for this example has the same shape as given in Fig. 1.7. In order to design a n e xperiment to obtain the greatest accuracy (minimum v ariance o f the parameters if there is n o bias), the sensitivity coefficients should b e p lotted a nd examined before the experiment is p erformed. 
As a result, one can more intelligently design the experim~nt in terms o f p lacement o f sensors a nd d uration o f t he experiment. I t is suggested in C hapter 8 t hat a reasonable optimal experiment criterion is t o maximize ~ = IX TX I for independent errors a nd subject to constraints o f a m aximum duration of the experiment a nd m aximum range o f the dependent variable. Figure 7.6 depicts the dimensionless temperature a nd dimensionless sensitivity coefficient for the example of this section. N ote t hat the sensitivity coefficient starts at zero a t z = 0, increases in magnitude until Mz = I, a nd gradually decreases in magnitude. T he ~ c riterion for the single parameter M is maximized by selecting the maximum magnitude values o f t he sensitivity coefficient. I f a single measurement is to be utilized in estimation, it should be chosen corresponding to a bout Mz = I which corresponds to the dimensionless temperature ratio of 0.368. Owing t o the flatness of the sensitivity curve shown in Fig. 7.6, little decrease in accuracy would result if the dimensionless ratio were chosen to be as large as 0.5 o r as small as 0.25. A Y, value corresponding to the dimensionless T r atio o f 0 .5 was chosen in the above example for obtaining the initial estimate of M = 3 .28. F or the more common case of many observations, see C hapter 8. 7.5 G AUSS M E11IOD INVOLVING DIFFERENTIAL EQUATIONS 7.5.2 357 Example o f Estimation of P arameters In Cooling Billet Problem A si~ilar. p roblem to the preceding o ne in terms o f the differential equation IS t hat. o~ c ooling a billet ( or · a ny object) that has a negligible temp~ra!ure vana!lOn t hrough it. T he t emperature o f the billet changes afte~ It IS. p laced 1 0 a fluid a t a d ifferent temperature. A n analysis o f a c oohng billet was also given in Section 6.2. ~et T be the temper~ture o~ t he billet a t a ny time t a nd let Too b e the fl~ld tem~erature. A differential equation describing the temperature in thiS c ase IS dT p cV- .ahA(T - T) dl 00 (7.5.9) where p is density o f t he billet, c is specific heat, V is volume, h is h eat transfer coefficient, a nd A is billet heated area. Various terms could be parameters, b ut t he most c ommon o ne would be t he h eat t ransfer coefficient. F or convenience, however, the factor hA / pc V is c onsidered as the parameter; several parameters may be f ormed from it also. . Three cases a re investigated below. F or e ach o ne the initial temperature IS To o r T (O)= To. Also, Too is considered to b e a c onstant. In the first case, let h A/pcV be t he c onstant p arameter p. T he s olution for T is then (7.5.10) : "nother c ase is for hA / pc V b eing a function o f time. A possible function IS T - T.. - ,::r " e -Hz (7.5.11) o .. H -~ 0.5 a(T - T..) 1 0-1.. aH " Hze -Hz a nd t he solution o f t he differential equation is ( (7.5.12) 2 3 4 A third possible model for h A/pcV is Hz FIIIUf"e 7.6 Dimensionless lemperalure a nd sensitivity coefficient for example o f Section 7.S.1. ~V - PI + Pz ( T- Too)" for n~O (7.5.13) .... . 00 CO 351 CHAPTER 7 MINIMIZAnON OF SUM o r SQUARES FUNCTIONS where n could also be a parameter. The solution in this case is - II" T (/)- Too T o-Too = e - fJ ,,[ 1 + ! 2(To- Too ) "(1- e- fll "')] PI for (7.5.14) n~O In each of the models given above. (7.5.10.12.14). T is nonlinear in terms of the parameters. Only for the last model of h. (7.5.13). was the differential equation nonlinear. I f the factor hA / p cV (or more specifically h ) varies during an experiment. 
(7.5.12) and (7.5.14) provide a number of competing models. For example. 112 a nd/or PJ might be set equal to zero in (7.5.12). In (7.5.14). P2 might be zero or n might be unity. etc. The sensitivity coefficients for the second model. (7.5.12). are found to be ('i) exp [- (P t+P2 T a T(/) X /(/)=--ap;-=-(To-T oo ) I 12 /2+P J I J / 3)] (7.5.15) where i -I for PI' etc. For the third model. (7.5.14). the sensitivities are 1.5 GAUSS ME'IlIOD INVOLVING D IFRREN'IlAL EQUAnONS F or estimating h the above models are superior to the power series model for T given in Section 6.2 because the above models utilize basic physical laws while the power series for T does not. Whenever possible, models based o n the physical mechanisms (called mechanistic models by O . E. P. Box) should be employed. A further choice based on physical arguments can be made between the power series in I given by (7.5.11) a nd' the temperature-dependent model for h A/pcV given by (7.5.13). In many situations the heat transfer coefficient does change with time because some relaled quantity is changing with time rather than the passage o f time per se. In the present model, h might change because a billet's temperature changes with time. Physically, the heat transfer coefficient h might account for heat transfer by both natural convection and radiation which could both cause h to be a function of T - Too' This suggests that the model (7.5.13) would be superior to (7.5.11). T o illustrate parameter estimation involving the above models, consider again the measurements given in Table 6.2. Results o f calculations are summarized in Table 7.4. Ordinary least squares was used with the 7J values being the T values o f (7.5.12) o r (7.5.14) a nd with T o· 279.59°F, the T at 1- 0, a nd Too -81.S°F. Models 1 ,2, a nd 3 are for h A/pcV given by (7.S.II) with Model I being fll' Model 2 being fll + fll/, a nd so on. Model 4 is for h A/pcV given by (7.5.13) with n~ I . Table 7.4 Estimation of P anmeter In Models for Cooling Billet Data" Model $No. of Parameten )] } (7.5.16) No. (7.5.17) - ( I - e - fJ,'" 1 2 3 4 Parameten bl 1 2 3 2 2.70882 2.90679 2.90824 2.13656 . b2 b) -1.39433 -1.41968 0.011344 0.0041327 R ( R/(n-p)JI / 2 38.73731 0.7896890 0.7892203 1.162777 1.6070 0.23750 0.24639 0.28819 ·Units consistent with time in houn. and where C is the expression in the brackets of (7.5.14). The h values in units of Btu/hr-ftl-OF can be found using the appropriate model (7.5.11) o r (7.5.13)] and multiplying by the p cV/ A value of 0.83432. (This means that hi o f Fig. 7.7 is 0.83432 times hi of Table 7.4). Resulting curves for Models 1 ,2. a nd 3 a re shown in 7.7. Also depicted are the Fig. 6.2 results for the temperature power series ·analysis. Notice that Models 2 and 3 results are almost identical so that Model 3 is not needed. The results of Model 2 a nd the power series model are very CHAPTER 7 M INIMIZATION O F S UM O F S QUARES F UNcnONS 2 .5 r - - - - r -......- ,---,r--r--:--,:--,--,--, models 2.4 ( t i s hours) •• ~ - .,- 2.3 ~. 2.2 ..., N .. ..... , 2.1 L. ~ ..... :> ... 2.0 .~ 1.9 1.8 0 Or---~--~--'----r---r---r--~--~ 0 U ;:: .... ... - 0.2 - 0.4 u (b; = .83432 bi ) ~.~ ~ ;; :; . ~ I II c::: en ---'." Tfme s tep fndex. 2 Tfme step index. 1 Figure 7.7 ... ..c..:::; ... 361 0 From polynomial model f or T. See Example 6 .2.4. CD ~ _L~~~mode~ _ _ 7.5 GAUSS M ETIIOD INVOLVING D IFFERENTIAL E QUATIONS Figure 7.1 Sensitivity coefficients for Model 4 of Table 7.4 for n -1. Parameter estimates for Models 1 .2. 
a nd 3 of Table 7.4 ( i-tJAt). similar. The constant h given by Model I does not apppea~ to ~e adequate from inspection of Fig. 7.7 because the other results are qUIte dIfferent a nd appear to be consistent with each other. Anoth~r a~gument that suggests that Model I is not adequate is the large reductIOn t n the sum of s~uares functions (38.7 to 0.789). This sum R also suggests that Model 3 IS not needed because the decrease is very slight between Models 2 and 3; further note that the estimated standard error· of th~ temperatures actually increases for Model 3. For further related discussIOn, see Example 7.7.4. Model 4 results given in Table 7.4 show a slightly increased value of R compared with Model 2 which also has two parameter~. Even though R for Model 4 is larger one mighl prefer Model 4 because It represents a more reasonable physical model as mentioned above. . An attempt to obtain simultaneous estimates of fit, f32' and n t n (7.5.14) with n = I initially was unsuccessful. A further calculation was perfor~ed for Model 4 to estimate just n with the converged values o.f f3 t and .f32 gIven in Table 7.4. The value obtained was \.000006. Since t~ls value IS ~~a~ly unity, the value previously used, linear. depe~dence t n the sensItIvIty coefficients is suggested. To investigate thIs prevIOusly u~sus~ected dep~n­ dence, the sensitivity coefficients were plotted as shown t n FIg. 7.8. NotIce 0ln Table 7.4 the value of n _ 16 was used rather than 17. the number of the observations. because the first value was used to determine To· that the sensitivity coefficients for n and 112 a re very nearly proportional, which tends to make XTX singular a nd thus the parameters very difficult to estimate simultaneously. Let us now compare from another point of view these results based on the solution of the describing differential equation with these from the power series method of Section 6.2. The results of this section required the evaluation of two nonlinear parameters whereas the Section 6.2 method requires the estimation o f five linear parameters. This is an illustration of the principle of parsimony which states that we employ the smallest possible n umber of parameters for adequate representation [22]. Also, note that the estimated standard deviation of the measured temperatures was O.243°F for the T series, 0.238°F for Model 2, a nd O.288°F for Model 4; the small differences between these values indicate that the mechanistic models are good in this case. After obtaining parameters using OLS o r ML, say, it is advisable to examine the residuals to see what assumptions seem to be valid regarding the measurement errors. The residuals for Model 2 are very similar to those given in the upper part of Fig. 6.1. Visual inspection of these residuals does not lead to a contradiction of the assumptions of additive, zero mean, constant variance, independent, and normal errors. Hence accurate parameter estimates would be expected with the least squares method used for this problem. T 362 CHAPTER 7 MlNlMIZAnON O f SUM O f SQUARES f UNCIlONS 7.6 MODIFICATIONS OF GAUSS METItOD T he G auss method has the feature of giving both the direction and t~e m agnitude of the change in the estimate o f the parameters. of each. ste~ t n the iteration procedure. Small changes in the parameters 1 0 the direction indicated by the Gauss method decrease the sum of squ~res. 
Occasionally, however, the size of the change indicated by the method IS so large that the successive estimates oscillate and, even worse, the procedure may be unstable. This can result from near-linear dependence of the sensitivit! coefficients a nd/or very poor initial parameter estimates. When the sensItivity coefficients are nearly dependent (which .mi~t b e t~~ed "overparameterization"), one should co~sider altematl~es 1 0 a ddition to other minimization procedures. O ne obvIous procedure IS to decrease the number of parameters being estimated. Another is to redesign the experiment so that the correlation between parameters is reduced. See C hapter 8 for optimal experiment d~sign. . A great many algorithms have been proposed to Improve the convergence of the Gauss method. Some of these may be termed modifications to the Gauss method whereas other methods some would call distinctly different methods' in the latter category are the Levenberg (3) a nd M arquardt (4) methods. We choose to treat these methods as modifications of the Gauss method, however. . . This section considers just a few of the possible methods. In thiS as 1 0 o ther iterative problems there appears to be no end to the possibi~ities; the ingenuity of various researchers evidenced by the numerous algonthms for this problem is impressive. For a survey, see Bard 11,5). 7.6 MODlflCAnONS o r GAUSS M£11IOD c ation to the Box-Kanemasu method that has been used b y Bard (5) a nd others is included. Since the linear approximation is valid over some region, a sufficiently small correction in the direction given by the Gauss method should improve the estimate (i.e., reduce S). Many methods have been proposed which use the direction provided by the Gauss method b ut modify the step size. We generalize (7.4.6) to (7.6.1) (7.6.2) where h (k+ I ) is a scalar interpolation factor. Note that this factor may be iteration-dependent. I f h (A:+I) is set equal to I, we have the Gauss method. In one class of methods, a search on h is performed to precisely determine the minimum S along the Gauss direction (5). Interpolation methods attempt to find good, acceptable values o f h (A:+ I ) without bothering to locate precisely the value associated with the minimum S value. O f the many methods possible, one of these is the halving a nd doubling method (6-8). T he modification that we describe utilizes an equation given by Box and Kanemasu [9]. T he modification is more general, however, since ( a) the sum of squares function given by (1.3.1) is used rather the OLS S function a nd ( b) a check for decreasing S is included. In the Box-Kanemasu method, S is approximated a t each iteration by (7.6.3) 7.6.1 Box-Kanemasu Interpolation Method Since the Gauss method depends on a linear approximation to " , in some nonlinear estimation cases the corrections can oscillate with increasing amplitudes a nd thus lead to nonconvergence. In this section we give the Box-Kanemasu modification of the Gauss method which may converge when the Gauss method does not. The Box-Kanemasu method does not, however, include a check that the sum of squares function S decreases from iteration to iteration. Bard (5) has made the point that all acceptable methods should ensure that S does monotonically decrease to a minimum. This is a reasonable requirement b ut in some cases it may lead to more calculations than without it; this is illustrated by Table 7.6 which is discussed later. On the other hand, this requirement might improve convergence in other cases. 
In order to ensure S continually decreases, a modifi- where aOt a i' a nd a z a re constants characteristic of each iteration. The value is taken where S given by (7.6.3) is minimized. A second approximation in this method is t hat IJ is given by h (A:+I) (7.6.4) A minimum of three conditions are needed to find the parameters a Ot a i' a nd az. O ne condition is to use the S value a t h - 0, t hat is, a t IJ = b(k); this S value is designated Sdk). A second S value, denoted S:k), is found a t h - a. Initially a is set equal to I . The third condition for finding the a /s uses (7.6.4) to find the derivative of S a t h - 0 a nd in the 4,b(k) direction. This derivative is (7.6.5) CHAPTER 7 MINIMIZATION OF SUM OF SQUARES FUNCIlONS 7.6 MODIFICATIONS OF GAUSS METHOD The matrix derivative Vp S is found from (7.4.1) to be s (7.6.6a) s o ( k) and the derivative of f j with respect to h is found from (7 .6.4) to be d fj/ dh = !:J.gb(lc) (7.6.6b) Then using (7 .6.6a. b) in (7 .6.5) yields ( dS/dh)lh_O= - 2G(Ic) G (It) (7.6.7) == [!:J.gb(k) JT[ XT(lc)We CIc ) + U (" - b(lc») ] (7.6.8a) (7.6.8b) Note that G(Ic) is a scalar so that it is also equal to its transpose. From the definition of G it can be proved that G ;> O. Using the three conditions for S yields The minimum S is located where the derivative of S [given by (7.6.3») with respect to h is equal to zero; it occurs at the h value of - a l /2a 2 o r (7.6.10) This h value is used in (7.6.1) to find the ( k+ I)st iterate for the b vector. The equation given by Box and Kanemasu (9) is obtained from (7 .6.10) by setting a = I . An equation similar to (7.6.10) is given by Hartley [8) a nd is attributed to Dr. K. Ruedenburg. There are some restrictions on the use of (7.6. 10). These relate to the possible values of a2. See Fig. 7.9. Three different cases are for Q2=0, a2 < 0, and a2 > 0 which are discussed individually below. In each case a condition suggested by Bard (5) is that (7.6.11) The parameter a is made sufficiently small for this condition to be a nd the satisfied. I f this inequality is not true for a = I. a is made inequality is checked again. Should the inequality require the investigation ! o 2 3 h F Ipre 7.9 Sum of squares venus the h parameter for the Box-Kanemasu method using (7.6.3) for approximating S. o f a values less than 0.01, say, the calculations are terminated. I t may be that the problem has been incorrectly programmed; for example, the sensitivity coefficients may be incorrect. I t is also possible that the sensitivity coefficients are nearly linearly dependent. In the Box-Kanemasu method, the inequality given by (7.6.11) is not considered and a is always one. The first case that we consider is for a2 =0. I f this occurs the S expression as given by (7.6.3) is a straight line which has no minimum at any finite value of h; see Fig. 7.9. Hence for this case we set h (Ic+I)=Aa where A is some constant equal to or slightly larger than unity; one possible value is 1.1. T he second case if for a2 < 0 which would cause h given by (7.6.10) to be negative. Again we set h(lc+ 1 )= A a. The third and most interesting case is for a2 > 0 as shown in Fig. 7.9 . For this case h is calculated using (7.6.10) provided h(1c+ I)..;; A a; if inequality is not satisfied, again we set h (Ic+I)=Aa. All three cases can be included by requiring that the inequality, S~Ic) ;> SJIc) - (2 - A - l)aG(Ic) (7.6.12) be satisfied in order to use (7.6.10). 
I f it is not satisfied, we suggest that h(lc+l) be set equal to A a a nd the calculation proceeds to the next iteration. A computer program flow chart incorporating the above constraints is CHAPTER 1 MINIMIZATION OF SUM OF SQUARES FUNCflONS U M ODlnCATIONS OF GAUSS M£J1IOD F or t he B ox-Kanemasu m ethod. t he s ection o f t he n ow c hart in Fig. 1 .10 t hat is enclosed in a b ox w ith d ashed lines would be b ypassed a nd a w ould b e a lways unity. C alculate S ( k) a t !1.(k) C alculate 4 b(k) and G(k) E xample 7.6.1 For the model TJ - Iltl + exp( - Ill/) and the observations of 2 at 1 - I and 3 at , - 2, use the Box-Kanemasu method for one step starting at Il, - I and I ll- 2. Use OLS. S et a • 1 .0 and A • 1 .1 S olution C alculate Salk) a t b (k) + - G IS NEGATIVE 4 b (k) Q The S function for this is ~ase is shown in Fig. 7.2. F or OLS the matrix correction ~.b(·) 9'" For tl.e first iteration, k - 0 and XT is "0------ --- XT(OI_[ - exp/-biOI) Also I (XT(O)X(~ - I and XT('»(Y - - 2exP:-2blOI) ,,(~ J-[ -0.~353 are I I I I No I I r-----~------, I I Yes I Yes I ___ - ____ I : , P rint: P rint: AlPHA LESS THAN .01 a , S ( k), s ( k), , (k) r-----------------J-------, I : I , , , Tenllinate C .lcul.tlon XT(DI(y - ,,(01) - [ I - 0.1353 0.2086 ] 5 2 -0.03663 J[ 0.9817 J - [ -0.1529 J 0.8647 2.828 which give (~.b(O)JT - (0.4323 - 3.195). Hence the Gauss parameter estimates are 1.4323 and - 1.195 which results in an S value of 123.42. Such a large relative change In bl ( - 3.195 compared with 2.0) suggests that "overshooting" and nonconvergence might occur using the Gauss method. The method does converge for this example, however, as shown below. For the Bolt-Kanemasu method we check the inequality (7.6.12) for a - I. rr the number of observations n equals the number of parameters p , it can be shown that G (')- sdk). In this example n -p-2. Then using A - 1.1, (7.6.12) gives S/0)-123.42;> s dO)-(2- A -I)G(DI ,-------- --- - -- --- - ---- -- --FJpre 1.10 (X T(OIX(DI) - , _ ___ _ [ 0.0\ 966 1 0.05477 0.2086 Proceed t o n ext i teration on k . Flow c hart of a procedure using the BOll-Kanemasu equation. - 1.7113-(2-1.1- 1)1.7113- -0 .1556 so that (7 .6.\0) can be used to get the small value of h (l)- G(O)[ SIO)- S~O)+2G(O)r' =0.0137 given b y Fig. 7.10. I n a ddition to the two inequalities given by (7.6.11) a nd (7.6.12), there is a check o n t he sign of G. F rom t he definition o f G, it must be positive; thus if it is negative s omething is incorrect. I t is i mportant in such a program to calculate SJ t ) a nd s~t) c orrectly; the s ame w eighting W m ust be used a s is used t o e valuate 6 ,b(t) a nd t he term involving U in (1.3.1) m ust be included if it is a lso implied in the 6 ,b(t) c alculation. With this value we use (7.6.1) to get b(I)-b(O)+h(I)A b(O)_[ I J +00137[ 0.4323 2 . - 3.195 • J-[ UIOS9] 1.9563 which has an associated S value of 1.6645, a value lower than S(O). CHAPTER 7 MINIMIZA n ON O F SUM O F SQUARES FUNCIlONS F or the modified Box-Kanemasu method we use (7.6.11) to check if sf 0)= 123.42 is less than SJO)= 1.7113 . Since it is not, a is m ade equal to t a nd sl~ is calculated at blO) + t .:1,rb(O) to get S 11~2 "" 0 .0279 ; now the inequality given by (7 .6. 11) is satisfied. Hence set a = t. The right side of (7.6.12), which is Sdo, - (2 - A - l)aG(O)= 1.7113 - (2 - 1.1 - 1)( t (7.6.13) h (l)=Aa= I iI = 0.55 which gives S(l)=0.00884. I f we seek the location of the minimum S for this iteration, we get h = 0.5336 a nd S = 0.000862. 
I t is partially fortuitous that the modified Box-Kanemasu method happened to yield an h value so near this latter value. As a result, the modified B ox-Kanemasu method a t the e nd o f the first iteration is much nearer the minimum than the other two methods. A summary of the h and S values for the three methods for several iterations is given below. The modified Box- Kanemasu method converging most rapidly of the three methods was a direct result of the excellent choice of h ... 0 .55 in the first iteration . As is shown in Section 7.6.4.3, the same method is n ot most efficient for all cases. Iteration Number 0 I 2 3 4 6 8 10 15 7.6.2 h S 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.7113 1.234 x 102 4.559 5.520 x 10 - 1 6.19X 1 0- 2 5.5 X 1 0- 4 3. lxlO- 6 1.37 x 1 0- 1 1.4 x 1 0- 14 Box-Kanemasu Modified Box-Kanemasu h h S 0 .550 0.924 0 .955 0.951 0.944 0.942 0 .941 0.941 1.7113 8.7xlO- J 8 .8x 10 - 4 5.2xlO - s 3 x 1 0- 6 1.74 x 1 0- 1 I x 1 0- 10 6 x IO- IJ 2 xlO- 18 S 1.7113 0 .0137 1.664 0 .0208 1.595 0 .036 1.48 0 .077 1.25 1.48 x 1 0- 2 0 .638 0 .372 2.8x 1 0- 4 0 .444 6 .6x 10- 5 3.9xlO- 1 0 .933 Levenberg's function by using (7.3.1) with po being b (k), fJ b eing b (k+ I ), a nd U b eing replaced by Ml where 0 is a d iagonal matrix. U sing these definitions in (7.4.6) gives )(1.7113) = 0.779 is found to be greater than S m so that (7.6.12) is n ot satisfied. Consequently h(l) is calculated using Gauss 7.6 MODIFICATIONS OF GAUSS ME11IOD Levenberg Damped Least Squares Method Levenberg (3) tried to overcome the instability o f " overshooting" in the Gauss method by introducing constraints into the minimization of S . T he function that Levenberg considered was the O LS s um of square function plus a n a ddition term. By using a WLS function we can generalize T he effects of the 0 matrix are to reduce the size a nd t o change the direction of the step. Provided there is a unique minimum which is the only stationary p oint a nd the iteration procedure given by (7.6.13) c onvergences, the estimates found would be those sought. T he presence o f the ~o term tends to reduce oscillations o r instabilities particularly a s t he diagonal components o f ~o are made relatively large c ompared t o the diagonal terms in XTWX. Box a nd K anemasu (9) in describing the changes in the estimates o f the parameters in progressing to the minimum o f S s tate that the term ~Ib(k) - b(k+ I )fo(klb(k) - b(k + I») i ntroduces a spherical constraint that causes a spiral path. Levenberg proved that S d ecreases in the initial iterations if A is first large a nd t hen allowed to decrease (provided S does not have a stationary point a t b(k~. O ne r ecommendation o f Levenberg was to make 0 =1 Incidently if 0 =1 a nd ~ (7.6.14) is very large, (7.6.13) can be written as (7.6.15) which is c alled the m ethod o f steepest descent. T his method gives a direction for the step b ut n ot a s tep size. Since the step size is a rbitrary, this method can be very inefficient particularly as the minimum is a pproached. Hence the method o f steepest descent is n ot recommended. Note, however, that it does n ot require the inverse of a matrix as d o the Gauss a nd Levenberg methods. Concomitant with 0 = I, L evenberg suggested the two possibilities of ( I) letting ~ b e a c onstant value a nd (2) varying A a s the minimum is a pproached. O ne possibility is t hat the k th value o f A is (7.6.16) We shall call this the unsealed Levenberg procedure. 
Note that λ(k) given by (7.6.16) goes to zero as the minimum S is approached because each component of X^T W e goes to zero at the minimum of S. (See point (f) in Section 7.4.3.) As the minimum is approached, S also decreases, but the numerator decreases more rapidly. Box and Kanemasu [9] have made the point that the use of Ω = I in (7.6.15) results in the method not being invariant under linear transformations of the parameters, unlike the Gauss method, which is invariant.

Another recommendation of Levenberg was to set Ω equal to the diagonal terms of X^T W X, or

Ω_m = diag[C₁₁ C₂₂ ... C_pp]    (7.6.17)

where C_jj is given by (7.4.13). This choice for Ω has the effect of making the iteration problem invariant under scale changes in the parameters [9]. For this choice of Ω, an expression for λ has been suggested by Davies and Whitting [10]; it is given by (7.6.18) and also goes to zero as the minimum is approached. Using this expression and (7.6.17) gives what we call the scaled Levenberg method. When Ω is set equal to I and when λ is given [10] by (7.6.19), we term the associated procedure the modified Levenberg method. Though the Levenberg method and its modifications can remove instability and reduce oscillations, they also can increase considerably the number of iterations in a given case.

7.6.3 Marquardt's Method

A method similar to Levenberg's is the well-known method due to Marquardt [4]. This method uses (7.6.13) with Ω given by Ω_m, which is defined by (7.6.17), but Marquardt uses a different choice for λ than does Levenberg. Again, if λΩ_m is large compared to X^T W X, the parameter correction is in the same direction as given by steepest descent, which does not require that |X^T W X| ≠ 0. It is for this reason that the Levenberg and Marquardt methods are helpful when X^T W X is poorly conditioned at the starting parameter vector but is better conditioned in the neighborhood of the least squares solution (Gallant [11]). Both methods also provide a compromise between the steepest descent and Gauss methods, with the initial iterations close to the steepest descent method and the final iterations close to the Gauss method.

Marquardt proposed what Box and Kanemasu call the (λ,ν) algorithm, in which λ(k) is calculated from (7.6.20) by successive division by ν; the initial value λ₀ corresponds to the first iteration, and ν is some constant greater than unity. This method supposedly possesses the virtues of the steepest descent and Gauss methods where each is most effective. Marquardt's recommendations have been followed by many and have been incorporated in numerous computer programs. Though the Marquardt method has been widely used since the publication of Marquardt's paper in 1963, there are still some unresolved questions regarding the effectiveness of this method compared to others. This is discussed further in the next section. Moreover, Box and Kanemasu [9] have presented an analysis that indicates there is no need to compromise the direction of a step to be between those given by the steepest descent and Gauss methods. They demonstrate this by showing that the steepest descent and Gauss vectors have the same direction if the parameters are transformed into a linearly invariant metric. They also showed that the constrained minimization in this metric is merely equivalent to using a modification employing (7.6.1,2). Nevertheless, Gallant's observation given above regarding the value of the Marquardt method when the initial X^T W X matrix is poorly conditioned is valid.
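For concreteness, here is a minimal sketch of the damped correction (7.6.13) with Marquardt's scaling (7.6.17) for Ω; the function name is ours, and this is an illustration rather than code from the text:

```python
import numpy as np

def damped_step(X, W, e, lam):
    # Correction of eq. (7.6.13): (X^T W X + lam*Omega)^-1 X^T W e,
    # with Omega = diag(X^T W X), Marquardt's scaling of eq. (7.6.17).
    C = X.T @ W @ X
    Omega = np.diag(np.diag(C))
    return np.linalg.solve(C + lam * Omega, X.T @ W @ e)
```

With lam = 0 this returns the Gauss correction; making lam large shrinks the step and rotates it toward the steepest descent direction, which is the compromise described above.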
7.6.4 Comparison of Methods

In addition to the modifications of the Gauss method given above, doubtless many more will be suggested in the future. Moreover, many quite different approaches are presently available. These include derivative-free methods [5], quasilinearization [12], stochastic approximation [13], and invariant embedding [13]. For our purposes, however, some interpolation scheme such as that of Section 7.6.1 combined with a sequential procedure (such as given in Section 7.8) is usually adequate, particularly if the experiment is carefully designed as discussed in Chapter 8.

The purpose of this section is to provide comparisons of several methods. Assuming that all the methods converge to the same parameter vector, one of the most important considerations in selection of a method is the relative computer time needed. In many parameter estimation problems, the model is a set of ordinary or partial differential equations which may require time-consuming finite-difference methods for solution. In such cases the time used in repeatedly solving the model is much greater than that used in performing the other requisite calculations in parameter estimation. Hence the number of evaluations of the model gives an approximate relative measure of the computer time required for a particular method.

Another criterion for comparison of methods is the power to solve difficult cases. Unfortunately many of the more powerful procedures can take considerably more computer time than a relatively simple method such as the Gauss method. The relative amount of time depends on the problem. Indeed there are cases for which the Gauss method does not converge. Usually, at the expense of more computer time and greater programming complexity, computer methods can be evolved to treat more difficult cases. Though there is an advantage in having more powerful computer programs available, such programs can never supplant careful design of experiments. Thus our tendency is to recommend more efficient though less powerful methods, not only to save computer time but also to encourage careful experiment design. In a case that is poorly designed, not only is the minimum S difficult to locate but the associated parameter values are probably very sensitive to the measurement errors. Good designs yield more accurate parameter estimates and require less computer time.

One way to compare methods is to apply them to a large variety of problems and to also vary the initial parameter estimates. Some authors have done this. Usually the cases are not randomly selected. Rather, there is a tendency to use as test cases those that have S functions which are difficult to minimize. One such case is given next.

7.6.4.1 Box-Kanemasu Example

The model and data investigated by Box and Kanemasu [9] are

η = β₁β₂ξ₁ (1 + β₁ξ₁ + 5000ξ₂)⁻¹    (7.6.21)

ξ₁   ξ₂   Y
2    2    0.1165
1    2    0.0684
2    1    0.2114
1    1    0.1159

The independent variables are ξ₁ and ξ₂. The errors in Y are assumed to be additive, to have zero mean, to have constant variance, and to be independent and normal, or in symbols 11111-11. Ordinary least squares and maximum likelihood give the same parameter estimates in this case. The sensitivity coefficients can be given by

X₁ = ∂η/∂β₁ = (η/β₁)(1 − η/β₂),    X₂ = ∂η/∂β₂ = η/β₂    (7.6.22)

where the scaled coefficient X̄₁ = (β₁/β₂)X₁ is used rather than X₁ for its greater convenience.
From (7.6.21) we can observe that η is linear in β₂ but nonlinear in β₁. Also, the sensitivity coefficients as expressed by (7.6.22) can be plotted or tabulated versus β₁ξ₁ for ξ₂ equal to 1 and 2; see Table 7.5. The initial β₁ and β₂ values chosen were 300 and 6, respectively, and the converged values are 716.955 and 0.944469. Then in the vicinity of the minimum, the maximum value of β₁ξ₁ would be about 1500. Notice that in Table 7.5 the X̄₁ and X₂ sensitivities for β₁ξ₁ less than 1000 are approximately proportional at both ξ₂ = 1 and 2. This means that the minimum of S is probably ill-defined. This is demonstrated by the long narrow valley shown in Fig. 7.11a; such cases might pose difficulty in convergence for the Gauss method.

[Figure 7.11a Sum of squares contours for the example of Section 7.6.4.1, including the contour for S = 3.82750×10⁻³. (Reprinted by permission of Prof. G. E. P. Box.)]

Table 7.5 Sensitivity Coefficients for the Model Given by (7.6.21)

             ξ₂ = 1              ξ₂ = 2
β₁ξ₁       X̄₁       X₂         X̄₁       X₂
0          0         0          0         0
250        .0453     .0476      .0238     .0244
500        .0826     .0909      .0453     .0476
1000       .1389     .1667      .0826     .0909
2000       .2041     .2857      .1389     .1667
3000       .2344     .3750      .1775     .2308
100,000    .0453     .9524      .0827     .9091
∞          0         1          0         1

Another indication that the minimum S is poorly defined can be obtained by examining the (X^TX)⁻¹ matrix for the final b's; the computed entries include the off-diagonal term C₁₂ = −2.57548×10³ and the diagonal term C₂₂ = 2.79758×10³. The correlation coefficient of the estimators is approximated by C₁₂/(C₁₁C₂₂)^(1/2) = −0.997909. Since the absolute value of this number is very near unity, the two parameters are shown to be highly correlated.
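The entries of Table 7.5 can be checked directly from (7.6.21) and (7.6.22), since X₂ and the scaled X̄₁ = X₂(1 − X₂) depend only on β₁ξ₁ and ξ₂. A short sketch (function name ours):

```python
def sensitivities(b1x1, xi2):
    # From (7.6.21): eta/beta2 = b1x1 / (1 + b1x1 + 5000*xi2), so
    # X2 = eta/beta2 and Xbar1 = (beta1/beta2)*X1 = X2*(1 - X2).
    X2 = b1x1 / (1.0 + b1x1 + 5000.0 * xi2)
    return X2 * (1.0 - X2), X2

for b1x1 in [0.0, 250.0, 500.0, 1000.0, 2000.0, 3000.0, 1.0e5]:
    print(b1x1, sensitivities(b1x1, 1.0), sensitivities(b1x1, 2.0))
```

The printed pairs reproduce the tabulated values, for example X̄₁ = 0.0453 and X₂ = 0.0476 at β₁ξ₁ = 250, ξ₂ = 1.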
Figure 7.11b shows a comparison of results given by Box and Kanemasu [9] for the Marquardt method described in Section 7.6.3, the modified Gauss method involving halving and doubling mentioned in Section 7.6.1, and the Box-Kanemasu method of Section 7.6.1. The number of times, n_f, that the function η had to be evaluated is plotted versus the initial value of λ chosen in the Marquardt method. (Note that the number of iterations is not equivalent to the number of function evaluations.)

[Figure 7.11b Comparison of the Marquardt, halving and doubling, and Box-Kanemasu modifications of the Gauss method. (Reprinted by permission of Prof. G. E. P. Box.)]

The following conclusions can be drawn from Fig. 7.11b:

1. The Box-Kanemasu method is superior to any of those investigated.
2. The next best method is usually the halving-doubling method.
3. The Marquardt method is usually the slowest of those considered.

These conclusions should not be overgeneralized because only one particular example was considered. From theoretical considerations and from this example, however, Box and Kanemasu found that Marquardt's method was not superior to the Gauss method with the modifications using (7.6.1,2), with h replaced by 1/(1 + λ), λ being the value that Marquardt would recommend.

7.6.4.2 Bard Comparisons

Y. Bard [1] gave an excellent survey of 13 of the best known gradient methods, including several modifications of the Gauss method. Bard found that several modifications of the Gauss method were better than the Marquardt method. In his book, Bard [5, p. 111] appears to favor a modification of the Box-Kanemasu method which he calls the interpolation-extrapolation method.

7.6.4.3 Davies and Whitting Comparison

In the paper by Davies and Whitting [10] a comparison is given of five methods for seven problems considered by Jones [14]. The methods include the Levenberg procedures using λ given by (7.6.16) and (7.6.18). There was negligible difference in the rate of convergence between the two methods. They also used the modified Levenberg method, which used (7.6.19) for Ω = I and another equation for λ when Ω = Ω_m; again negligible differences were observed in the rate of convergence. Another method (SPIRAL) is due to Jones [14], who estimates the Levenberg parameter λ in another manner.

The first six columns of Table 7.6 contain the results given by Davies and Whitting [10]. We have added the last two columns, which were obtained using the Box-Kanemasu method (α = 1 always) and the modified version, which reduces α if the condition given by (7.6.11) is not satisfied. For both methods sequential estimation was used with the initial P matrix being a large multiple of the identity (or equivalently a small U); this reduces difficulties when X^TX is singular.

[Table 7.6 Comparison of the numbers of iterations required by the Gauss, Marquardt, SPIRAL, Levenberg, modified Levenberg, and Box-Kanemasu methods for the seven test problems of Jones [14].]

For all problems except Problem 5, the results of Table 7.6 indicate that the Gauss method is greatly superior to the Marquardt, SPIRAL, Levenberg, and modified Levenberg methods. A weakness of the Gauss method is indicated by its inability to treat Problem 5; this was caused by the X^TX matrix being initially singular. The Gauss method works for Problem 4, which has the same S function but different initial parameter estimates. The unmodified and modified Box-Kanemasu methods (with small U) compare very well with the Gauss method (based on the number of iterations in Table 7.6) for Problems 3, 4, 6, and 7 and are superior for Problem 5. Only one iteration is needed for Problems 6 and 7, which have the linear model η_i = β₁x_i1 + β₂x_i2 + β₃x_i3. Hence these methods have the very desirable characteristic of not requiring extensive iterations for linear problems; notice, in contrast, that the other techniques necessitated from 9 to 78 iterations. For Problem 4 the methods using the Box-Kanemasu equation are about equal to the Gauss method but much better for Problem 5; all the other methods are much less effective for Problems 4 and 5.

It is surprising that for Problems 1 and 2 the Gauss method converges in two iterations whereas all the others take many more. For these two problems the S function is

S = 100(β₂ − β₁²)² + (1 − β₁)²
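Writing this S function as the sum of the squares of the two residuals 10(β₂ − β₁²) and 1 − β₁ (our own formulation of the same function) makes the two-iteration convergence discussed next easy to verify with a short sketch:

```python
import numpy as np

def residuals(b):
    return np.array([10.0 * (b[1] - b[0] ** 2), 1.0 - b[0]])

def jacobian(b):
    return np.array([[-20.0 * b[0], 10.0], [-1.0, 0.0]])

b = np.array([-1.2, 1.0])                 # problem 1 starting values
for k in range(3):
    r, J = residuals(b), jacobian(b)
    print(k, b, r @ r)                    # S = 24.2, then 2342.6, then 0.0
    b = b + np.linalg.solve(J.T @ J, -J.T @ r)   # Gauss correction
```

Because the second residual is linear in β₁, the first Gauss step lands on β₁ = 1 exactly (with a much larger S), and the second step then drives S to zero.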
The two problems have different initial parameter estimates, problem 1 starting with β₁ = −1.2 and β₂ = 1. Regardless of the initial parameter values, the Gauss method results in the correct β₁ and β₂ values in exactly two iterations. The first iteration results in the correct value of β₁ but usually with S(1) being much larger than S(0). In the second iteration in the Gauss method S is reduced to zero. From this example (as in others) we see that S(1) being larger than S(0) does not necessarily mean that the Gauss method will encounter difficulty in convergence. In fact, in this problem the other methods were much less efficient.

In conclusion, although the Gauss method is very competitive in many problems, it may not work because X^TX is temporarily singular. For that reason we suggest that the U be made equal to cI, where c is a very small value compared to the diagonal terms of X^TX (or that P(0) = KI be used with large K in the sequential method as discussed in Section 6.7). In addition, the Box-Kanemasu method is recommended in order to be more certain of finding the parameter values minimizing S, even though in some cases the Gauss method would be more efficient. In cases when the Box-Kanemasu method does not converge, the modification of it that we have suggested is recommended, provided one is convinced that a unique minimum point exists.

7.7 MODEL BUILDING AND CONFIDENCE REGIONS

Most of the developments discussed in this section utilize an assumption of a multivariate normal distribution of errors. In particular, this assumption is used in determining confidence intervals and regions for parameters, in locating confidence intervals for Ŷ, and in applying the F test for model building. The assumptions used in this section include additive, zero mean, normal errors in Y. The independent variables are assumed to be errorless. The assumptions are designated 11--1-11.

Unlike linear estimation, the expressions below are all approximate, with the approximation being better for cases which are less nonlinear than others. For a measure of the nonlinearity, see Beale [15] and Guttman and Meeter [16]. The expressions may be approximations due to (1) the sensitivity coefficients being functions of estimated parameters, (2) linear approximations used to derive equations such as (7.7.1), and (3) the usual estimates for σ².

7.7.1 Approximate Covariance Matrix of Parameters

The approximate covariance matrix of the parameters has different forms depending upon the method of estimation. In general, the expressions are similar to comparable linear estimation cases. For ordinary least squares estimation with the assumptions denoted 11----11, the approximate covariance matrix of b_LS is analogous to (6.2.11),

cov(b_LS) ≈ (X^TX)⁻¹ X^T ψ X (X^TX)⁻¹    (7.7.1)

where X is the sensitivity matrix, which is a function of b_LS, and where ψ = E(εε^T). (Recall that in OLS estimation using (7.4.6) we set W = I and U = 0.)
With the additional standard assumptions of uncorrelated and constant variance measurement errors and σ² being unknown (1111-011), the estimated covariance of b_LS simplifies to

cov(b_LS) ≈ (X^TX)⁻¹ s²,    s² = (Y − Ŷ)^T (Y − Ŷ)/(n − p)    (7.7.2)

When maximum likelihood estimation is used and the measurement errors are normal, the approximate covariance matrix of b_ML is

cov(b_ML) ≈ (X^T ψ⁻¹ X)⁻¹    (7.7.3)

The assumptions are 11-1-11 (see Section 6.1.5). If ψ is known to within the multiplicative constant σ², that is, ψ = σ²Ω (11--1011), then

cov(b_ML) ≈ (X^T Ω⁻¹ X)⁻¹ s²    (7.7.4)

where s² can be estimated using

s² = (Y − Ŷ)^T Ω⁻¹ (Y − Ŷ)/(n − p)    (7.7.5)

7.7.2 Approximate Correlation Matrix

The approximate OLS and ML correlation matrices can be obtained using the above equations. The ij element of the correlation matrix is given by

r_ij = P_ij (P_ii P_jj)^(−1/2)    (7.7.6)

where P_ij is a term of (7.7.1), (7.7.2), or (7.7.4), depending on the case. The diagonal elements of r are all unity, and the off-diagonal elements must be in the interval [−1, 1]. If ψ is known only to within a multiplicative constant σ², the σ² value need not be known for r since it cancels in (7.7.6).

Whenever all the off-diagonal elements exceed 0.9 in magnitude, the estimates are highly correlated and tend to be inaccurate. Bacon [17] has suggested that when this is true, a simpler model form than the one originally proposed may be appropriate. A poor experimental design may also be responsible for the high correlations. In that case it is recommended that the sensitivity coefficients be examined and that the experimental design strategies discussed in Chapter 8 be employed. Box [18,19] has shown, however, that high correlations among the parameters can be due to a large extent to the nature of the model itself, and thus no experimental design could be expected to yield uncorrelated parameter estimates. See Problem 7.18. A rule of thumb for anticipating the inaccuracy in calculation is given by Gallant [11]; he suggested that difficulty in computation may be encountered when the common logarithm of the ratio of the largest to smallest eigenvalues of r exceeds one-half the number of significant decimal digits used by the computer. See also Appendix A.4.
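These formulas translate directly into code. A minimal sketch, assuming the sensitivity matrix X and the residuals at the converged estimates are available (function names are ours):

```python
import numpy as np

def approx_cov_corr(X, resid):
    # Approximate covariance (7.7.2) and correlation (7.7.6) matrices for OLS.
    n, p = X.shape
    s2 = resid @ resid / (n - p)
    P = s2 * np.linalg.inv(X.T @ X)
    d = np.sqrt(np.diag(P))
    return P, P / np.outer(d, d)

def gallant_criterion(r):
    # Gallant's rule of thumb: log10 of the eigenvalue ratio of r should not
    # exceed half the number of significant decimal digits carried.
    lam = np.linalg.eigvalsh(r)       # ascending eigenvalues
    return np.log10(lam[-1] / lam[0])
```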
Example 7.7.1

Use the Gallant criterion for the Box-Kanemasu example of Section 7.6.4.1.

Solution

For that example r can be written

r = [1, r₁₂; r₁₂, 1]

where r₁₂ = −0.997909. Using (d) and (e) of Example 6.8.1 gives λ₁ = 1 − r₁₂ and λ₂ = 1 + r₁₂. The Gallant criterion then is log₁₀[(1 − r₁₂)/(1 + r₁₂)] ≈ 2.98. Hence, in order to solve the Box-Kanemasu data, the Gallant criterion indicates that at least six significant figures be used in the calculation.

7.7.3 Approximate Variance of Ŷ

Approximate variances for the predicted values of the model, Ŷ, for OLS and ML are given by expressions similar to (6.2.12) and (6.5.6).

7.7.4 Approximate Confidence Intervals and Regions

As frequently happens for nonlinear problems, there are several approaches for providing approximate confidence intervals and regions. Some of these are discussed below. Consider first calculation of approximate confidence intervals. Use the linear approximation given by the Taylor series of (7.2.1) (which is used in the Gauss equation); then, analogous to (6.8.4), we have the approximate 100(1 − α)% confidence interval

b_k ± est. s.e.(b_k) t_{1−α/2}(ν);    k = 1, 2, ..., p    (7.7.7)

for any one of the parameters, where

est. s.e.(b_k) = (P_kk)^(1/2)    (7.7.8)

and ν is the number of degrees of freedom, frequently n − p, associated with the estimate of σ². The term P_kk is the kth diagonal term of P and represents the variance of b_k. For the assumptions indicated, (7.7.2) is used to find P_kk for OLS and (7.7.4) for ML estimation. The expression given by (7.7.8) is approximate not only because s² is used for σ² in (7.7.2) and (7.7.4), but also because the sensitivity coefficients are approximate, since they are functions of b.

Example 7.7.2

The 90% confidence interval is to be found for each of the parameters of a two-parameter nonlinear model fitted to three observations. Assume that the standard assumptions designated by 11111011 are valid.

Solution

The sensitivities and the sensitivity matrix X(b) follow from the model; at the converged estimates, X^TX = [6, 6; 6, 9]. The minimum sum of squares is 0.5 and occurs at b₁ = 1.5 and b₂ = ln 2 = 0.693. Owing to the standard assumptions, OLS and ML give the same estimates and approximate covariance matrix of the parameters. Using (7.7.2) gives

P ≈ [9, −6; −6, 6] (0.5)/(54 − 36) = [1/4, −1/6; −1/6, 1/6]

From t tables we get t.95(1) = 6.31 for 90% confidence (see Table 2.15). Then, for 90% confidence intervals, (7.7.7) and (7.7.8) yield

b₁ ± (1/4)^(1/2)(6.31) = 1.5 ± 3.16,    b₂ ± (1/6)^(1/2)(6.31) = 0.693 ± 2.58

Generally, confidence intervals provide poor approximations to the confidence region even for linear cases. This is even more true for this nonlinear case, as is shown below. In addition to the approximations that are present in this example, the reader should be aware that this example was constructed for pedagogical simplicity and not because it is a realistic example. We can expect accurate results from the equations given above for real cases only when the assumptions are nearly valid and when the number of observations n is large.

Gallant has given a Monte Carlo study [11] for a certain nonlinear model with four parameters that showed small differences using the above method for determining the confidence intervals compared to a more exact method (which involves the likelihood ratio). He indicated, however, that when P is an ill-conditioned matrix the likelihood ratio method can yield considerably shorter confidence intervals than those using the above method, which is based on the asymptotic normality of the least squares estimator. We will describe this more powerful (but in application more time-consuming) procedure in relation to the confidence regions.

Let us now consider two methods of finding confidence regions. Again a number of standard assumptions are needed, including knowledge of the probability density, which we shall assume to be normal. Assume that the conditions denoted 11--1011 are valid. Then, using a Taylor series approximation for η, the 100(1 − α)% confidence region for both OLS and ML estimation is given by

(b − β)^T (P^C)⁻¹ (b − β) ≤ p F₁₋α(p, n − p)    (7.7.9)

where for OLS and ML estimation (P^C)⁻¹ is

(P^C)⁻¹ = X^T(b_LS) X(b_LS)/s²    (7.7.10a)

(P^C)⁻¹ = X^T(b_ML) Ω⁻¹ X(b_ML)/s²    (7.7.10b)

In (7.7.10a) the sensitivity matrices are functions of b_LS, whereas those in (7.7.10b) depend upon b_ML. In giving (7.7.9,10), results of Section 6.8.3 are used along with (6.2.11) and (6.5.5). The approximate confidence contours given by (7.7.9) are ellipsoids.

A more natural confidence region is found using a likelihood ratio [20]. It is the nonlinear analogue of the F statistic given by (6.2.24),

{[S(β) − R]/p} / {R/(n − p)} ≤ F₁₋α(p, n − p)    (7.7.11)

where p is present instead of q because a confidence region is needed for all the parameters. (R is the minimum S.) Usually (7.7.11) does not produce ellipsoids in parameter space.
The contours are along lines of constant likelihood ratio, but the confidence level 1 − α is approximate [15,16]. The approximation enters in (7.7.11) because the numerator and denominator no longer contain independent χ² distributions. If we are dealing with a model which is nonlinear in the parameters, computation of confidence contours using (7.7.11) is more difficult and time-consuming than that using (7.7.9), in which a linear approximation is incorporated. For example, if there are two parameters, a separate numerical search problem is involved to obtain β₁ for each choice of β₂.

Example 7.7.3

The 50, 75, and 90% confidence regions are to be found for β₁ and β₂ for the model and data of Example 7.7.2. Use both methods given above.

Solution

Method Based on (7.7.9). For this method (P^C)⁻¹ is X^TX/s², since Ω = I for the assumptions given. The value of s² is R/(n − p) = 0.5/(3 − 2) = 0.5. Then using the X^TX matrix of Example 7.7.2 in (7.7.9) gives

[β₁ − 1.5, β₂ − 0.693] [6, 6; 6, 9] [β₁ − 1.5; β₂ − 0.693] ≤ 2(0.5) F₁₋α(2, 1)    (a)

Finding the confidence region using this expression is readily accomplished using the results of Example 6.8.1. Note that C of that example is the square matrix in (a) given above. From (d) and (e) of Example 6.8.1 the eigenvalues are found to be

λ₁ = [15 − (153)^(1/2)]/2 = 1.31534,    λ₂ = [15 + (153)^(1/2)]/2 = 13.68466

Then using (g)-(j) of Example 6.8.1 gives

e = [−0.788205, 0.615412; 0.615412, 0.788205]

Since e is orthonormal and since ps² = 1, we can write, analogous to (l) and (m) of that example,

(b₁ − β₁)_maj = ±0.68726 F₁₋α^(1/2)(2, 1),    (b₂ − β₂)_maj = ∓0.53660 F₁₋α^(1/2)(2, 1)

(b₁ − β₁)_min = ±0.16636 F₁₋α^(1/2)(2, 1),    (b₂ − β₂)_min = ±0.21307 F₁₋α^(1/2)(2, 1)

The three confidence regions can be found by using these equations with the F values, which are F.50(2,1) = 1.5, F.75(2,1) = 7.5, and F.90(2,1) = 49.5. They are shown in Fig. 7.12 as dashed ellipses that are centered at β₁ = 1.5 and β₂ = 0.693. Notice that the 90% confidence region extends as far as (b₁ − β₁)_maj = ±0.68726(49.5)^(1/2) = ±4.84 and (b₂ − β₂)_maj = ∓3.78. Since 4.84/1.5 = 3.22 and 3.78/0.693 = 5.45, the confidence region is relatively large and the parameters must be considered inaccurate or ill-determined. If the confidence region were sufficiently small, we would say that the parameters are well-determined.

From the above equations the ratio of the major to minor axes of the ellipses is found to be (λ₂/λ₁)^(1/2) = 3.23, which indicates that the ellipses are neither extremely narrow nor approach a circle. The major axis forms an angle of tan⁻¹(0.615412/0.788205) = 37.98° with the b₁ − β₁ axis; along this direction the S function varies most slowly. Another way of thinking about the major axis is to note that it can be described by δβ₂ = −0.7808 δβ₁, or δβ₂ + 0.7808 δβ₁ = 0; this means that we can estimate the sum β₂ + 0.78β₁ more accurately than β₂ or β₁ alone.
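As a check on these numbers, here is a minimal sketch that recomputes the eigenvalues and the semi-axes of the (7.7.9) ellipses (the signs of numerically computed eigenvectors are arbitrary):

```python
import numpy as np

XtX = np.array([[6.0, 6.0], [6.0, 9.0]])
p, s2 = 2, 0.5
lam, e = np.linalg.eigh(XtX)                  # ascending: 1.31534, 13.68466
for conf, F in [(0.50, 1.5), (0.75, 7.5), (0.90, 49.5)]:
    # Semi-axis along eigenvector e_j has length sqrt(p*s2*F/lambda_j);
    # the smallest eigenvalue gives the major axis.
    major = np.sqrt(p * s2 * F / lam[0]) * e[:, 0]
    minor = np.sqrt(p * s2 * F / lam[1]) * e[:, 1]
    print(conf, major, minor)                 # 90%: major about (4.84, -3.78)
```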
One should distinguish between the ill-determined condition indicated by the confidence region and difficulty in convergence indicated by high parameter correlations, or equivalently by a large value of the Gallant criterion. Difficult convergence is related to the character of X^TX (and thus near linearity of the sensitivity coefficients), but not to the accuracy of the measurements (the s² value). The confidence region depends on both. For related discussion, see reference 21.

Method Based on (7.7.11). For this method the S function is needed; it is given by (b), the sum of squares for the model of Example 7.7.2. Because of the relative simplicity of this function, we can solve for exp(β₂) in terms of β₁ and S; then taking the natural logarithm of exp(β₂) gives β₂, equation (c). In more realistic examples a search for β₂ would be needed for fixed β₁ and S. Since R = 0.5, n = 3, and p = 2, (7.7.11) becomes

S = R[1 + p F₁₋α(p, n − p)/(n − p)] = 0.5[1 + 2F₁₋α(2, 1)]    (d)

that is, S = 2, 8, and 50 for the 50, 75, and 90% levels. By introducing this value of S into (c) and solving for β₂ over a range of β₁ values, we obtain the confidence regions shown as solid lines in Fig. 7.12.

[Figure 7.12 Likelihood ratio and approximate confidence contours for 50, 75, and 90%.]

Unlike the dashed confidence regions, which are all ellipses with the same ratio of major to minor diameters, these regions are quite different in shape, particularly as S becomes large, or equivalently, as the percent confidence approaches 100. It is significant, however, that at the smallest S value (corresponding to 50%) the true shape of the confidence region does approach the same ellipses given by (7.7.9). From Fig. 7.12 note that a likelihood confidence region tends to be larger than the corresponding approximate region. If a likelihood ratio contour had been inside the associated approximate contour, then the latter would be suggested as a conservative estimate of the likelihood ratio confidence region. Though this condition is not satisfied in this case, it may be in some cases. See Bard [5, p. 209]. For better-designed experiments and for more observations (usually n > 3), the two methods show much better agreement. For an example, see the Box and Hunter paper [22].

It is instructive to compare a confidence region constructed using the confidence intervals given in Example 7.7.2 with the confidence regions shown in Fig. 7.12. Consider the 90% case, which covers the region −1.66 < β₁ < 4.66 and −1.89 < β₂ < 3.27. This region contains some of the 90% confidence regions and additional areas, but there are also areas in the more correct confidence regions not covered. This indicates that a confidence region constructed from confidence intervals may not be a good approximation of the approximate regions; a poor approximation to the likelihood ratio confidence region is provided, particularly as the percent confidence approaches 100.

7.7.5 Model Building Using the F Test

Example 7.7.4

Using the results of the estimates of parameters for the cooling billet problem, determine the 95% confidence model, selecting from those given by (7.5.11,12). Assume that the standard assumptions indicated by 11111011 are valid.

Solution

The F test can be used in this example in a manner that is very similar to that used for linear problems. Using the sums of squares given in Table 7.4 we can construct Table 7.7. The F values are compared with F.95(1, n − p) = 4.54, 4.60, and 4.67 for n − p equal to 15, 14, and 13, respectively.
Since 0.0077 is less than 4.67 but 672.8 is not less than 4.60, we have an indication that the two parameters β₁ and β₂ in (7.5.11) are needed but the β₃ parameter is not.

Table 7.7 Sum of Squares and F Statistic for Example 7.7.4

Model   No. of       Degrees of   Sum of        Mean Square,
No.     Parameters   Freedom      Squares, R    s² = R/(n−p)    ΔR         F
1       1            15           38.73731      2.582           -          -
2       2            14           0.7896890     0.05641         37.9476    672.8
3       3            13           0.7892203     0.06071         0.000469   0.0077

The F statistic analogous to (6.2.24),

F = {[R(b₁, W₂) − R(b₁, b₂)]/q} / {R(b₁, b₂)/(n − p)}    (7.7.12)

used for confidence regions can also be used for nonlinear model building. As pointed out below (7.7.11), this expression is approximate for nonlinear models. The assumptions denoted 11--1011 are used in connection with (7.7.12). Recall that (7.7.12) implies two groups of parameters: β₁ is assumed needed, and β₂ may or may not be needed. There are p parameters in all, with β₁ having p − q and β₂ having q. The objective is to see if the β₂ parameters should be included in the model. W₂ may contain values suggested by a competing model. (See the discussion following Theorem 6.2.4.) When all p parameters are simultaneously estimated, the minimum sum of squares is designated R(b₁, b₂); if the standard assumptions of 11111011 are valid, OLS can be used, but if the errors neither have a constant variance nor are independent (11001011), then maximum likelihood estimation should be used. The statistic F of (7.7.12) is utilized by comparing its value with F₁₋α(q, n − p). If F > F₁₋α(q, n − p), then we have an indication that the parameters β₂ are needed in the model and should not be the values implied by W₂.
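The arithmetic of Table 7.7 is easy to reproduce; a short sketch, with the critical F values quoted above:

```python
# F test of (7.7.12) applied to Table 7.7: q = 1 parameter added at a time.
R    = {1: 38.73731, 2: 0.7896890, 3: 0.7892203}   # minimum sums of squares
dof  = {1: 15, 2: 14, 3: 13}                       # n - p for each model
Fcrit = {14: 4.60, 13: 4.67}                       # F.95(1, n - p)

for m in (2, 3):
    F = (R[m - 1] - R[m]) / (R[m] / dof[m])
    print(m, F, F > Fcrit[dof[m]])                 # about 672.8 True; 0.0077 False
```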
7.8 SEQUENTIAL ESTIMATION FOR MULTIRESPONSE DATA

There are a number of advantages of sequential estimation, including the development of simpler, yet more general, computer programs and the possibility of providing more insight into model building. For other advantages, see Section 6.7.7. The process is considered to be sequential with time because dynamic experiments are so common. However, there may be cases when one might wish to have some other independent variable, such as position, be the sequential variable. At each time the experiment is considered to involve m (≥ 1) different responses. The m different responses could include different dependent variables such as temperature, concentration, velocity, or voltage; they could also represent measurements from m similar sensors.

Two sequential methods are given in this section. Both utilize the Gauss method and can incorporate the Box-Kanemasu modification. The first sequential method is simply a direct adaptation of (7.4.6). It would be recommended when a sequential method is being used to reduce the memory requirements in a computer program and the sequential values of the parameters are not of interest. The second method is similar to the linear sequential method of Section 6.7. The measurements must either be actually independent or be treatable as if they are. The latter situation is the case, for example, when OLS is used. Moreover, autoregressive and moving average correlated errors can be analyzed by using certain differences which are uncorrelated. Provided the errors are or can be treated as being uncorrelated, this method is recommended when m ≤ p because the order of the matrix inversion is reduced. Only a scalar needs to be inverted if m = 1. Advantage can be taken of this fact by renumbering the observations as if they were all at different times even though physically m > 1. Prior information can be included through the use of μ and U.

7.8.1 Assumptions

The standard assumptions of additive, zero mean, and noncorrelated errors are used. The errors have a known covariance matrix ψ. The independent variables are considered to be nonstochastic. The parameters are assumed to be constant, that is, not time variable or random. These assumptions are designated 11-1-11-. The assumptions regarding the measurement errors can be written as

Y = E(Y|β) + ε = η + ε    (7.8.1)

E(ε) = 0    (7.8.2)

E[ε_j(u) ε_k(v)] = σ_j²(u) for u = v, j = k; = 0 otherwise    (7.8.3)

The measurement vector Y is composed of the m-dimensional vectors Y(i),

Y = [Y(1); Y(2); ... ; Y(n)]    (7.8.4)

where

Y(i) = [Y₁(i); Y₂(i); ... ; Y_m(i)]    (7.8.5)

The associated error vector ε and dependent variable vector η are similarly composed of the m-dimensional vectors ε(i) and η(i), where i = 1, ..., n. The covariance matrix of the errors for the given assumptions is the diagonal matrix ψ,

ψ = diag[ψ(1) ... ψ(n)],    where ψ(i) = diag[σ₁²(i) ... σ_m²(i)]

Consistent with the above assumptions is estimation using the assumptions 11---011, which are the assumptions used in Gauss-Markov estimation for linear models. With the additional assumption of normal errors, the results are equivalent to those obtained using maximum likelihood. The sequential procedure given below could also be used with OLS estimation by simply replacing ψ by I.

7.8.2 Direct Method

Equations (7.4.6) apply to the multiresponse case. In order to reduce the plethora of subscripts and superscripts we write (7.4.6) as

b* = b + P[X^T ψ⁻¹ (Y − η) + U(μ − b)]    (7.8.6)

P = [U + X^T ψ⁻¹ X]⁻¹    (7.8.7)

where X and η are evaluated at b, which stands for b(k). The vector b* is the same as b(k+1) in (7.4.6a). Recall that k is an iteration index which changes only after all the data (m × n observations) have been considered. The sensitivity matrix in (7.8.6,7) can be partitioned so that

X = [X(1); X(2); ... ; X(n)]    (7.8.8)

where X(i) is the m × p block with entries X_jk(i)    (7.8.9)

Notice that X_jk(i) is the sensitivity coefficient for the jth dependent variable and kth parameter at the ith time. A typical term of the triple product X^T ψ⁻¹ X of (7.8.7) can be indicated by

[X^T ψ⁻¹ X]_kl = Σ_{i=1}^{n} Σ_{j=1}^{m} X_jk(i) X_jl(i) σ_j⁻²(i);    k, l = 1, 2, ..., p    (7.8.10)

Rather than waiting to evaluate the sums in X^T ψ⁻¹ X after a complete iteration, a running summation can be performed to reduce the computer memory requirements. Let us define C_sl(i) as

C_sl(i) = U_sl + Σ_{t=1}^{i} Σ_{j=1}^{m} X_js(t) X_jl(t) σ_j⁻²(t)    (7.8.11)

Then the sl term of the P⁻¹ matrix can be constructed from

C_sl(i+1) = C_sl(i) + Σ_{j=1}^{m} X_js(i+1) X_jl(i+1) σ_j⁻²(i+1)    (7.8.12)

which forms a recursion relation with the starting value of C_sl(0) = U_sl. Both s and l go from 1 to p. A similar recursion relation can be given for the components of the vector d,

d_s(i+1) = d_s(i) + Σ_{j=1}^{m} X_js(i+1) e_j(i+1) σ_j⁻²(i+1)    (7.8.13)

which has the starting condition of

d_s(0) = Σ_{j=1}^{p} U_sj (μ_j − b_j)    (7.8.14)

The range of s is 1 to p. The procedure to find the new parameter vector is first to use (7.8.12) and (7.8.13). Then the (i+1)th matrix P is found by inverting,

P(i+1) = [C(i+1)]⁻¹    (7.8.15)

Finally the parameter vector is found using

b*(i+1) = b + P(i+1) d(i+1)    (7.8.16)

Note that b*(i+1) is the parameter vector at the (i+1)th time in the kth iteration, where k has the same meaning as in (7.4.6). The vector b in (7.8.16) is μ for the first iteration and b*(n) of the preceding iteration for the subsequent iterations on k. At the end of an iteration the b vector is replaced by b*(n) and another iteration on k is begun with X being evaluated at the new b. The process is continued until convergence is attained. If only the final converged parameter values are of interest, it is necessary to invert the P matrix just once for each complete iteration; in other words, it would be necessary to just find P(n) and then use (7.8.16) at i = n.
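A compact sketch of this running-summation form follows; eta_fn and sens_fn are assumed user-supplied helpers returning the m model values and the m × p sensitivity block at time i:

```python
import numpy as np

def direct_gauss_pass(b, Y, eta_fn, sens_fn, sigma2, U, mu):
    # One k iteration of the direct method (7.8.11)-(7.8.16); all sensitivities
    # and model values are evaluated at the fixed vector b.
    C = U.copy()                          # C(0) = U, eq. (7.8.11)
    d = U @ (mu - b)                      # d(0), eq. (7.8.14)
    for i, Yi in enumerate(Y):            # Y is a sequence of m-vectors
        Xi = sens_fn(b, i)                # m x p sensitivity block
        W = np.diag(1.0 / sigma2[i])      # psi^-1(i) for independent errors
        C += Xi.T @ W @ Xi                # eq. (7.8.12)
        d += Xi.T @ W @ (Yi - eta_fn(b, i))   # eq. (7.8.13)
    P = np.linalg.inv(C)                  # eq. (7.8.15), inverted once per pass
    return b + P @ d                      # b*(n), eq. (7.8.16)
```

For OLS the sigma2 entries are simply set to unity, which replaces ψ by I as noted above.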
7.8.3 Sequential Method Using the Matrix Inversion Lemma

The usually preferred sequential method involves the matrix inversion lemma. Using expressions in the preceding section, we can write

b*(i+1) = b + P(i+1)[d(i) + X^T(i+1) ψ⁻¹(i+1) e(i+1)]    (7.8.17)

P(i+1) = [P⁻¹(i) + X^T(i+1) ψ⁻¹(i+1) X(i+1)]⁻¹    (7.8.18)

where typical terms are

X^T(i+1) ψ⁻¹(i+1) e(i+1) = [Σ_{j=1}^{m} X_js(i+1) e_j(i+1) σ_j⁻²(i+1)]    (7.8.20)

X^T(i+1) ψ⁻¹(i+1) X(i+1) = [Σ_{j=1}^{m} X_js(i+1) X_jl(i+1) σ_j⁻²(i+1)]    (7.8.21)

The range of s and l in these last two equations is 1 to p. Using (6.7.5a) and (6.7.5b) we can write, analogous to (6.7.6a,b,c,f),

A.  A(i+1) = P(i) X^T(i+1)    (7.8.22a)

B.  Δ(i+1) = ψ(i+1) + X(i+1) A(i+1)    (7.8.22b)

C.  K(i+1) = A(i+1) Δ⁻¹(i+1)    (7.8.22c)

D.  P(i+1) = P(i) − K(i+1) X(i+1) P(i)    (7.8.22d)

Utilizing (6.7.5b) and (7.8.22c) in (7.8.17) yields, after some rearrangement,

E.  b*(i+1) = b*(i) + K(i+1){e(i+1) − X(i+1)[b*(i) − b]}    (7.8.22e)

At each time i, the above five equations are to be used in the order A, B, C, D, and E.

It is important to understand that three different parameter vectors appear in (7.8.22e). First, b*(i+1) is the estimated vector for the (i+1)th time and the (k+1)th iteration. Second, b*(i) is for the ith time and (k+1)th iteration. Finally, b is the vector found at the nth time of the kth iteration; the vector b does not change during an iteration over the time index i. Note that P(i+1), X(i+1), ψ(i+1), and e(i+1) = Y(i+1) − η(i+1) are functions of b and not of b*(i). When there is prior information, each iteration starts with b*(0) = μ and P(0) = V_β = U⁻¹. A detailed example is given in Section 7.9.1.

When there is no prior information, a special starting procedure is needed at i = 1 for each of the k iterations. As suggested in Section 6.7, we can select P(0) to be a diagonal matrix with relatively large terms. Since P represents the approximate variance-covariance matrix of the parameter vector β, a P(0) with large diagonal components indicates poor prior information. In general, P(0) is fixed from iteration to iteration (i.e., over the k index). Provided no problems are encountered with round-off, P(0) could have any diagonal values that are large (by a factor of 10⁶, say) compared to the corresponding values of P(n).

7.8.3.1 Sequential Method for m = 1, p Arbitrary

An important application of the previous analysis is for m = 1, because the only inverse that must be found (that of Δ) becomes the inverse of a scalar. As shown in Section 6.7.5, uncorrelated multiresponse measurements can be renumbered so that the m = 1 analysis can be employed.
From (7.8.22) evaluated for m = 1 and p arbitrary we find

A_u(i+1) = Σ_{j=1}^{p} X_j(i+1) P_uj(i),    u = 1, 2, ..., p    (7.8.23a)

Δ(i+1) = σ²(i+1) + Σ_{j=1}^{p} X_j(i+1) A_j(i+1)    (7.8.23b)

K_u(i+1) = A_u(i+1)/Δ(i+1),    u = 1, 2, ..., p    (7.8.23c)

H(i+1) = Y(i+1) − η(i+1) − Σ_{j=1}^{p} X_j(i+1)[b*_j(i) − b_j]    (7.8.23d)

b*_u(i+1) = b*_u(i) + K_u(i+1) H(i+1),    u = 1, ..., p    (7.8.23e)

P_uv(i+1) = P_uv(i) − K_u(i+1) A_v(i+1),    u, v = 1, ..., p    (7.8.23f)

The above scalar equations should be used in the order given. These equations can be programmed in a straightforward manner, but one must remember that three b vectors are involved and that the P(0) matrix does not vary from iteration to iteration. Only two sets of b storage locations are needed, however: one is for b = b*(n) and the other is for b*(i) and b*(i+1), since the latter replaces the former as calculations proceed. Since m = 1, only one subscript is needed for X_j(i+1) and none is needed for H(i+1). When programming, the i+1 index need not be carried.

The sequential procedure given by (7.8.23) reduces to that given by the linear analysis of (6.7.8) by setting b = 0. For MAP estimation the starting value of b is μ and the starting P(0) matrix is U⁻¹ = V_β for each iteration. All the sensitivity coefficients for the first iteration (which includes i = 1, 2, ..., n) and each η(i+1) are evaluated using b = μ. For subsequent iterations X and η are evaluated at b(n), the final vector of the previous iteration. See the Section 7.9.1 example. When there is no prior information, follow the procedure described below (7.8.22).

7.8.4 Correlated Errors with Known Correlation Parameters

If the measurement errors in Y are additive and correlated in time, the above analyses can be used with slight modifications for the autoregressive and moving average cases discussed in Section 6.9 and Appendix 6A, provided the correlation parameters are known. The analyses can be performed by replacing X by Z = D⁻¹X and e by f = D⁻¹e. For example, for first-order autoregressive errors, the terms in the matrices would be

Z_jk(i) = X_jk(i) − ρ_j(i) X_jk(i−1)    (7.8.24a)

f_j(i) = e_j(i) − ρ_j(i) e_j(i−1)    (7.8.24b)

for i = 1, 2, ..., n, and where X_jk(0) and e_j(0) are defined to be equal to zero.
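A minimal sketch of one pass (one k iteration) of the m = 1 equations (7.8.23) follows; the helper-function names are ours, and for OLS the sigma2 entries are set to unity:

```python
import numpy as np

def sequential_gauss_pass(b, Y, eta_fn, sens_fn, sigma2, P0, mu=None):
    # One pass of (7.8.23). b is the linearization vector (mu on the first
    # iteration, b*(n) of the previous iteration thereafter); for MAP the
    # running estimate b* restarts at mu and P restarts at P0 = V_beta.
    P = P0.copy()
    bstar = mu.copy() if mu is not None else b.copy()
    for i in range(len(Y)):
        X = sens_fn(b, i)                          # X_j(i+1), j = 1..p
        A = P @ X                                  # (7.8.23a)
        Delta = sigma2[i] + X @ A                  # (7.8.23b), a scalar
        K = A / Delta                              # (7.8.23c)
        H = Y[i] - eta_fn(b, i) - X @ (bstar - b)  # (7.8.23d)
        bstar = bstar + K * H                      # (7.8.23e)
        P = P - np.outer(K, A)                     # (7.8.23f)
    return bstar, P
```

Iteration on k consists of calling this pass repeatedly, each time setting b to the bstar returned at the previous pass, until the changes in bstar are negligible. For correlated errors, X and the residuals are first filtered as in (7.8.24).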
7.9 EXAMPLES UTILIZING SEQUENTIAL ESTIMATION

The purpose of this section is to present four different examples in which a sequential method is used. In the first example, detailed results are given to illustrate the sequential MAP method based on the matrix inversion lemma; the physical problem involves heat conduction in a finite plate. The second problem, involving an ordinary differential equation, is the cooling billet problem previously considered. The third problem uses simulated data for heat conduction in a semi-infinite body; the errors are correlated. The final example is a realistic example involving about 1000 actual measurements; it is again for heat conduction in a finite plate. Except for the second example, each of the problems in this section involves multiresponse data.

7.9.1 Simple MAP Example Involving Multiresponse Data

This example provides a detailed analysis of a maximum a posteriori sequential estimation procedure for some multiresponse data. The data are abstracted from the extensive values contained in Table 7.14. The physical problem is that of a flat plate heated on one side and insulated on the other. More extensive use of these data is discussed in Section 7.9.4, where a more complete description of the problem is given. In that section the results of estimation for the thermal conductivity k and the specific heat-density product c are given; a finite difference procedure is used to calculate the dependent variable (temperature). In this section a limited amount of data is used; K = k⁻¹ and α (= k/c) are estimated; an analytical solution is used for the dependent variable; and subjective prior information is used.

The measurements for this case are given in Table 7.8, and other known information is given in Table 7.9.

Table 7.8 Data for Section 7.9.1 Problem

Time    Time from Start    T1 (°F)   T4 (°F)   T5 (°F)   T6 (°F)
(sec)   of Heating (hr)    (x = L)   (x = L)   (x = 0)   (x = 0)
15.3    12/3600            85.83     85.37     94.73     94.48
18.3    15/3600            87.46     87.16     96.44     96.03

Table 7.9 Known Conditions for Section 7.9.1 Problem

q = 9612 Btu/ft²·hr
μ₁ (= K) = 45⁻¹ = 0.022222 hr·ft·°F/Btu
L = 1/12 ft
μ₂ (= α) = 0.8 ft²/hr
T₀ = 81.69°F
V_β,11 = 4×10⁻⁶ (hr·ft·°F/Btu)²
V_β,12 = V_β,21 = 0
V_β,22 = 0.01 ft⁴/hr²

The temperature measurements in Table 7.8 can be considered to be additive, zero mean, constant variance, uncorrelated, and normal. Since at each time there are two measurements at each location, an estimate of pure error can be obtained. Using the fact that V(ε₁ − ε₂) = 2σ² (for assumptions 1111----), an estimate of σ² from the data of Table 7.8 can be found to be about 0.0625. The errors in the time measurements are negligible compared to those in the temperature rise, that is, the rise above the initial temperature, which is about 81.69°F. The assumptions given above, together with that of subjective prior information, are denoted 11111113.

The temperatures corresponding to η are calculated using the solution given by (8.5.25), which can also be differentiated with respect to K and α to get the sensitivity coefficients. For the times given in Table 7.8, at most the first three terms are needed in the summation. Values for the sensitivity coefficients of K and α, designated X₁ and X₂ respectively, are given in Table 7.10. The corresponding calculated values of temperature, denoted η, are also given. All the values of X₁, X₂, and η in Table 7.10 are based on the parameter values K = μ₁ and α = μ₂. The information in Table 7.10 is given in an order appropriate for sequential estimation with a single observation being added at one "time." Other orders are possible, however.

Table 7.10 Sensitivity Coefficients and η for the First Iteration of the Section 7.9.1 Problem

i   Time (hr)   x   T    X₁(i)   X₂(i)   η(i)
1   12/3600     L   T1   177.8   8.158   85.640
2   12/3600     L   T4   177.8   8.158   85.640
3   12/3600     0   T5   570.9   8.930   94.377
4   12/3600     0   T6   570.9   8.930   94.377
5   15/3600     L   T1   252.4   10.49   87.299
6   15/3600     L   T4   252.4   10.49   87.299
7   15/3600     0   T5   650.0   10.87   96.136
8   15/3600     0   T6   650.0   10.87   96.136

In Table 7.11 are the sequential values for the first iteration. These values use the sensitivity coefficients given in Table 7.10 (except to more significant figures).
Notice that all the X_j(i)'s and η(i) are evaluated for the same values of K = μ₁ and α = μ₂; the sensitivity coefficients and η in Table 7.11 are not evaluated using the updated values of b₁(i) and b₂(i). The sequential procedure used is given by (7.8.23), which is identical to that given by (6.7.8) for the linear case. For the nonlinear case, however, iteration is required. The starting values of b₁, b₂, P₁₁, P₁₂, and P₂₂ for this MAP analysis are μ₁, μ₂, V_β,11, V_β,12, and V_β,22, respectively. These starting values are used for every iteration, as can be observed from the starting conditions in Table 7.12, which is for the second iteration. In this iteration, however, all the values of X₁(i), X₂(i), and η(i) are found based on the values of the previous iteration, b₁⁽¹⁾(n) = 0.022654 and b₂⁽¹⁾(n) = 0.78912. If this case had not been one involving prior information, then the initial values of b₁⁽²⁾(0) and b₂⁽²⁾(0) for the second iteration would be simply the final values of the first iteration, at which values the sensitivity coefficients and η are also evaluated. The P⁽²⁾(0) values would not be different, however (unless one wishes to use P⁽²⁾(0) to introduce the Levenberg or Marquardt modifications). The iterations in general continue until negligible changes in b⁽ᵏ⁾(n) occur. In this case only two iterations (k = 2) apparently are needed.

Table 7.11 Sequential Estimation of K and α in Problem of Section 7.9.1, First Iteration, Based on Parameter Values K = μ₁ = 0.022222 and α = μ₂ = 0.8

i   Time (sec)   x   T    b₁×10²   b₂        P₁₁×10⁶   P₁₂×10⁶   P₂₂×10⁶
0   -            -   -    2.2222   0.8       4          0          10000
1   12           L   T1   2.2380   0.81814   3.408      −67.89     2211
2   12           L   T4   2.2188   0.79603   3.386      −70.47     1915
3   12           0   T5   2.3074   0.78031   0.5543     −20.23     1024
4   12           0   T6   2.2836   0.78452   0.3988     −17.48     975.0
5   15           L   T1   2.2701   0.79407   0.3321     −12.79     645.0
6   15           L   T4   2.2815   0.78603   0.3030     −10.74     500.9
7   15           0   T5   2.2873   0.78492   0.2372     −9.48      476.8
8   15           0   T6   2.2654   0.78912   0.2064     −8.89      465.5

Table 7.12 Sequential Estimation of K and α in Problem of Section 7.9.1, Second Iteration, Based on Parameter Values K = b₁⁽¹⁾(n) = 0.022654 and α = b₂⁽¹⁾(n) = 0.78912

i   Time (sec)   x   T    b₁×10²   b₂        P₁₁×10⁶   P₁₂×10⁶   P₂₂×10⁶
0   -            -   -    2.2222   0.8       4          0          10000
1   12           L   T1   2.2372   0.81793   3.446      −66.17     2103
2   12           L   T4   2.2188   0.79589   3.425      −68.63     1809
3   12           0   T5   2.3077   0.78075   0.5606     −19.87     979.6
4   12           0   T6   2.2838   0.78481   0.4039     −17.20     934.2
5   15           L   T1   2.2701   0.79421   0.3354     −12.53     615.6
6   15           L   T4   2.2816   0.78632   0.3056     −10.50     476.6
7   15           0   T5   2.2874   0.78525   0.2393     −9.28      454.2
8   15           0   T6   2.2654   0.78930   0.2083     −8.71      443.8
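The first row of Table 7.11 can be verified directly from (7.8.23) using the values in Tables 7.8-7.10; a short numeric check:

```python
import numpy as np

b = np.array([0.022222, 0.8])             # mu1, mu2 (Table 7.9)
P = np.diag([4.0e-6, 0.01])               # V_beta (Table 7.9)
bstar = b.copy()
X = np.array([177.8, 8.158])              # X1(1), X2(1) from Table 7.10
Y, eta, sigma2 = 85.83, 85.640, 0.0625    # T1 at 12 s; estimated variance

A = P @ X                                 # (7.8.23a)
Delta = sigma2 + X @ A                    # (7.8.23b)
K = A / Delta                             # (7.8.23c)
H = Y - eta - X @ (bstar - b)             # (7.8.23d); correction term is zero here
bstar = bstar + K * H                     # (7.8.23e) -> [0.022380, 0.81814]
P = P - np.outer(K, A)                    # (7.8.23f) -> 3.408e-6, -67.9e-6, 2211e-6
print(bstar, P)
```

The printed values match row 1 of Table 7.11; continuing the loop over the remaining seven observations reproduces the rest of the table.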
a nd 1'/(;) a re found based on the values of the previous iteration. bll)(n) = 0.022654 and b~l)(n)=0.78912. I f this case had not been one involving prior information. then the initial values of bI1)(0) a nd bfl(O) for the second iteration would be simply the final values of the first iteration at which values the sensitivity coefficients a nd 1'/ a re also evaluated. The p1;I(O) values would not be different, however (unless one wishes to use Pk~2)(0) to introduce the Levenberg o r M arquardt modifications). The iterations in general continue until negligible changes in b ll(n) occur. In this case only two iterations ( k = 2) a pparently are needed. r. , • I:. 2.95 . . +• I:. • 2.9 bl ,hr-1 • 2.85 ++ • 2 .8 + I:. + I:. ••• • 2 4 6 I:. + ~ .. I:. LHo•.' 2 • ~ Model I 0 I:. + Model 3 I:. I:. I:. 2.75 2.7 +++ I 8 - •• •• •• . 10 12 14 t 16 Time step index. 1 F lpre 7.13 Estimates of P. for Models 1 ,2, a nd 3 for the cooling billet data. N 0 OC 3911 M INIMIZATION O F S UM O F S QUARES F UNCI10NS C HAPTER 7 I 6 0 -1 bZ,hl" -2 -2 6 r- .. + • •6 + •• t ;. -3 t ;. Hodel -4 ~j 20 6 6 + ... + . + t• ! at 3 7.8 15 61 Hodel 3 10 s -5 6 -6 Figure 7.14 0 2 Estimates of 4 10 12 8 Time s tep I ndex, 6 14 -s fJ 2 for Models 2 and 3 for the cooling billet d ata . F or M odel 3 (hA / p cV = { jl + { j2' + PJ I 2), the b l values a re relatively stable in Fig. 7. 13. those o f b 2 in Fig. 7. 14 less so, a nd those in Fig. 7. 15 s howing bJ are quite oscillatory. T he oscillatory n ature o f h J • p assing through zero several times, suggests that Model 3 is n ot a ppropr iate. Sometimes o ne w onders a bout the errects o n the parameters o f a dditional d ata . Without o btaining the d ata o ne c annot be certain. However, the sequential method applied to existing d ata s hould give insight i nto the effects. Add itional d ata would be expected to continue trends already observed in the parameters. F or e xample, Fig. 7.15 suggests t hat b J m ight continue to oscillate a bout zero with decreasing a mplitude . T he s equential results for PI a nd P2 in Model 4. h A/pcV=P I +{J2(TToo). a re s hown in Fig. 7. 16. T he hI values are relatively c onstant b ut the h2 v alues a re not. T he 50% v ariation o f b 2 in the last five steps suggests t hat t he model should be further examined by using additional d ata o r r elated models should be tried. Physically the model is q uite attractive b ut s ome i mprovement is n eeded such as adding an exponent n on T - Too; this was tried bu t was unsuccessful as mentioned in S ection' 7.5.2 although additional d ata might be needed also to estimate n. 7.9.1.1 0'-~r--------T--------,4--~~-.~ 16 Other Pouible Model! I n a ddition to the models mentioned above, m any m ore could be suggested. F or example. one extension of Models I, 2. a nd 3 is to investigate o ther t erms such as /1413. A nother approach is to always use the linear-in-I model (Model 2) b ut to use dirrerent P pairs (such a s /1l i' /11. for the ith time interval) for successive time intervals [23-25). -1 Oo ~-t-----t---t---i------rl:--*--+'---"!'-...J F lpre 7.15 Estimates o f Time s tep I ndex, /J) for Model J for cooling billet data. 2.5 .005 2.3 2.2 . 001 2.1 ( left s Clle) Time s tep I ndex, F Jpre 7. 16 Sequential parameter estimates for Model 4 for cooling billet data. 
7.9.3 Semi-Infinite Body Heat Conduction Example

Consider a semi-infinite, heat-conducting body initially at the temperature T₀ of 100°F (311 K) and whose surface temperature takes a step increase to T∞ = 200°F (366 K). The temperature is measured every 9 sec until 252 sec at the two locations x₁ = 0.125 in. (0.00318 m) and x₂ = 0.25 in. (0.00635 m), where x is measured from the heated surface. This example is of the Monte Carlo type because the data are simulated using the known value of thermal diffusivity α, equal to 0.01 ft²/hr (2.58×10⁻⁷ m²/sec), and because 15 different sets of random errors are considered. The first five cases use independent, normal errors with unit variance. The measurement errors for the first case are taken from the first two columns of Table XXIII of [26], the second case uses those from the next two columns, etc. Correlated errors are obtained for the first-order autoregressive process described in Section 6.9.2; the error ε_i at time t_i is given by

ε_i = ρε_{i−1} + u_i    (7.9.1)

where ρ = 0.5 for cases 6-10 and ρ = 0.9 for the remaining cases. The u_i random variable values are the same values taken from [26] as mentioned above. A mathematical statement of the physics of the problem is

∂T/∂t = α ∂²T/∂x²;    T(0,t) = T∞,  T(x,0) = T₀,  T(∞,t) = T₀    (7.9.2)

which has the exact solution

[T(x,t) − T₀]/(T∞ − T₀) = erfc[x/(2(αt)^(1/2))]    (7.9.3)

where

erfc(y) = (2/π^(1/2)) ∫_y^∞ e^(−u²) du    (7.9.4)

Note that T(x,t) is a nonlinear function of α. Because there is no characteristic length or time in this problem, the dimensionless temperature given by (7.9.3) and the α sensitivity (given by (8.5.4)) can be plotted versus the single variable τ⁺ = αt/x²; see Fig. 8.12.

The parameter α was estimated using nonlinear ML sequential estimation for the multiresponse data mentioned above (two locations, at each of which 28 measurements were simulated). In most cases only three iterations were needed to converge to α̂ after starting with the initial estimate of 0.006 ft²/hr, which is low compared to the true value of 0.01 ft²/hr. Estimated α values for the 15 cases are given in the third column of Table 7.13. The first five cases tabulated, which are for ρ = 0, are more accurate on the average than the next five, which are for ρ = 0.5; these in turn are more accurate (on the average) than the last five cases (ρ = 0.9). Hence we can conclude that correlated errors can substantially reduce the accuracy of the parameter estimates.

Table 7.13 Results for Estimating Thermal Diffusivity for Cases with Noncorrelated and Correlated Errors

Case   ρ     α̂×10⁴ (ft²/hr)   R       Z^TZ×10⁻⁶   s²      est. s.e.(α̂)×10⁴
1      0     99.71            59.39   197         1.080   0.740
2      0     100.64           54.15   192         0.985   0.716
3      0     100.17           58.71   195         1.067   0.740
4      0     99.29            52.55   199         0.955   0.693
5      0     98.56            65.49   202         1.191   0.768
6      0.5   99.49            60.85   52.4        1.106   1.45
7      0.5   101.88           53.76   49.7        0.977   1.40
8      0.5   100.39           59.2    51.3        1.076   1.45
9      0.5   98.69            52.7    53.4        0.958   1.34
10     0.5   97.22            56.3    55.3        1.024   1.36
11     0.9   98.6             64.67   4.275       1.176   5.25
12     0.9   115.3            53.40   3.09        0.971   5.61
13     0.9   101.7            59.37   4.00        1.079   5.19
14     0.9   97.2             55.15   4.40        1.003   4.97
15     0.9   91.0             61.90   5.06        1.125   4.72

The minimum sum of squares R also has a tendency to increase in variability with increasing ρ. The values of R were calculated using the converged values of α̂ in (7.3.1) with U = 0 and W = Ω⁻¹; rather than using Ω⁻¹ directly, however, it was introduced by replacing X by Z and e by f, as mentioned in Section 7.8.4. The mean sum of squares was formed using s² = R/(n − p), where n = 2×28 = 56 and p = 1 in this example. The estimated standard error of α̂ was found using (6.8.3), which can be written for this one-parameter case as

est. s.e.(α̂) = [Z^TZ]^(−1/2) s    (7.9.5)

These values increase by a factor of about 7 from ρ = 0 to 0.9. In each set of five cases for a given ρ value, the true α value of 100×10⁻⁴ ft²/hr is in the interval given by α̂ ± est. s.e.(α̂) in three of the five cases. This is consistent with the 67% confidence interval that could be calculated using the t distribution.
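A minimal sketch of how such data can be simulated, per (7.9.1), (7.9.3), and (7.9.4), follows. The random-number generator and seed are ours, so the numbers will not reproduce the specific cases of Table 7.13, which used the tabulated errors of [26]:

```python
import numpy as np
from math import erfc

T0, Tinf = 100.0, 200.0
alpha_true = 0.01 / 3600.0                    # ft^2/sec (0.01 ft^2/hr)
depths = np.array([0.125, 0.25]) / 12.0       # sensor depths, ft
times = np.arange(1, 29) * 9.0                # 9 to 252 sec

def temp(alpha, x, t):
    # Exact solution (7.9.3): T = T0 + (Tinf - T0)*erfc(x / (2 sqrt(alpha t)))
    return T0 + (Tinf - T0) * erfc(x / (2.0 * np.sqrt(alpha * t)))

def ar1_errors(rng, n, rho):
    # First-order autoregressive errors, eq. (7.9.1), unit-variance innovations
    eps, prev = np.empty(n), 0.0
    for i in range(n):
        prev = rho * prev + rng.standard_normal()
        eps[i] = prev
    return eps

rng = np.random.default_rng(1)
rho = 0.5
Y = np.array([[temp(alpha_true, x, t) for t in times] for x in depths])
Y += np.array([ar1_errors(rng, len(times), rho) for _ in depths])
```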
7.9.4 Analysis of Finite Heat-Conducting Body with Multiresponse Experimental Data

7.9.4.1 Description of Experiment

In this experiment two adjacent, identical Armco iron cylindrical specimens (1 in. (0.0254 m) thick and 3 in. (0.0762 m) in diameter) were heated by a single flat heater placed between them. The heat flow was in the axial direction. All surfaces except those where the heater was located were insulated. Four thermocouples (numbered 5, 6, 7, and 8) were carefully attached to the heated surface of each specimen and four (numbered 1-4) were attached to the opposite flat insulated surfaces. The sensors were located at angles of 0, 90, 180, and 270°. The sensors at 0 and 180° were electrically averaged, as were those at 90 and 270°. This then provided eight temperature histories, with four being at a heated surface and four being at an insulated surface. Before the start of an experiment the specimens were allowed to come to a uniform temperature. The heating period lasted 15.3 sec, after which the specimens attained a higher equilibrium temperature. For further discussion of the experiments, see reference 27.

The data used to find the parameters in this case are given in Table 7.14. An IBM 1800 computer was used to digitize the analogue signals produced by the thermocouples. Typical results for the heated and insulated surfaces are shown in Fig. 7.17; the heated surface temperature increases only during heating, whereas the insulated surface temperature increases after heating and finally approaches the same equilibrium temperature as the heated surface.

[Table 7.14 Measured Temperatures for Finite Heat-Conducting Specimen: time (sec) versus temperature (°F) for the numbered thermocouples TC1-TC8, from 0.3 to 30.0 sec in 0.3-sec steps.]
Owing to the statistical design of the experiment there are two types of replicated measurements. First, until the heating starts at 3.3 sec, the temperatures are the same (within experimental error) for all times and thermocouples. Second, there are four sensors at both the heated and insulated surfaces. The standard deviation of the temperatures until 3.0 sec for any given sensor is about 0.1 F° (0.06 K); the standard deviation of the average temperatures of the eight sensors is also about 0.1 F° (0.06 K). During heating and at the heated surface, the standard deviations of temperature at fixed times are almost 0.4 F° (0.22 K). These values of the standard deviation provided estimates of the pure error.

Even though the temperature calibration for the data shown in Table 7.14 was carefully performed, there are slight differences in the initial temperature readings given by the various sensors. In an attempt to reduce bias, corrections of −0.06, +0.18, +0.04, +0.03, +0.09, −0.09, −0.06, and −0.17 F° were respectively added to the measurements given by sensors 1 through 8.

Another consideration in the design of the experiment was to achieve a relatively large signal-to-noise ratio while still not having such a large variation in temperature that the parameters would change. A maximum temperature rise of about 15 F° (8.33 K) satisfies both conditions; using the pure error standard deviation value of 0.1 F° (0.06 K), the maximum signal-to-noise ratio is 150.

Figure 7.17 Typical temperature histories for the finite heat-conducting specimens (temperature in °F and K versus time in sec; heated-surface and insulated-surface sensors shown).
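A sketch of the replication computations above, with stand-in arrays in place of the Table 7.14 readings (the names and values here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
times = 0.3 * np.arange(1, 101)                     # sec, stand-in time grid
temps = 81.5 + 0.1 * rng.standard_normal((100, 8))  # stand-in for Table 7.14

pre = temps[times <= 3.0]                 # replicates taken before heating begins
pure_sigma = pre.std(axis=0, ddof=1)      # pure-error estimate per sensor (~0.1 F)
avg_sigma = pre.mean(axis=1).std(ddof=1)  # scatter of the eight-sensor average

# Bias corrections: shift each sensor so its pre-heating mean matches the grand
# mean, analogous to the corrections quoted in the text.
offsets = pre.mean() - pre.mean(axis=0)
corrected = temps + offsets
print(pure_sigma.round(3), avg_sigma.round(3))
```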
7.9.4.2 Physical Model of Heat-Conducting Body

The temperature in each specimen can be mathematically described by

    k ∂²T/∂x² = c ∂T/∂t,  0 < x < L,  t > 0    (7.9.6)

    −k ∂T(0,t)/∂x = q(t),  ∂T(L,t)/∂x = 0    (7.9.7)

    T(x,0) = 81.54°F    (7.9.8)

where k is the thermal conductivity and c is the density-specific heat product. In both specimens the coordinate x is measured from the heated surface. The specimen thickness L was 1/12 ft (0.0254 m). The parameters are k and c. The heat flux q(t) was zero until t = 3.3 sec and was zero after t = 18.6 sec; the value of q(t) in the interval 3.3 ≤ t ≤ 18.6 sec was 2.67 Btu/ft²-sec (30,300 W/m²).

One can derive the exact solution of this problem in terms of infinite series. However, PROGRAM PROPERTY (developed at MSU), which utilizes finite differences in the solution of (7.9.6)-(7.9.8), was used to obtain the parameters. The sensitivity coefficients shown in Fig. 7.18 were also obtained by using finite differences, as indicated by (7.10.1). Note that those for c are always negative, whereas those for k are both positive and negative. Hence the sensitivities are not linearly dependent, and no difficulty would be anticipated in estimating the parameters.

Figure 7.18 Sensitivity coefficients for the finite heat-conducting specimens.

7.9.4.3 Parameter Estimates

For the data given, the ordinary least squares parameter estimates are k̂ = 43.343 Btu/hr-ft-F (75.014 W/m-K) and ĉ = 55.596 Btu/ft³-F (3728.6 kJ/m³-C). If the data for only part of the interval had been used, the k and c values would have been modified by the additive changes Δk and Δc given in Fig. 7.19. Note that the change in k over the last third of the time interval is less than 1% of the final k value; the c value changes about 2% in the same interval. These small values suggest that the model is correct. For times less than 7 sec the corrections become very large, which would be expected since the sensitivity coefficients shown in Fig. 7.18 are approximately linearly dependent for such small times.

Figure 7.19 Sequential corrections to the parameters for the finite heat-conducting example.

The residuals for all eight temperature histories are graphed in Figs. 7.20 and 7.21. Evidently, the measurement errors cannot be considered to be independent. From reference 28, p. 97, the mean number of runs (changes of sign plus 1) of the residuals is about n/2 for independent observations.

Figure 7.20 Residuals for thermocouples 1-4, which are located at the insulated surface of the specimen.

Figure 7.21 Residuals for thermocouples 5-8, which are located at the heated surface of the specimen.

There are 89 observations for each thermocouple, and the number of runs varies from 4 to 23 with the average being 12.75. This value is considerably less than 89/2 = 44.5. For first-order cumulative errors, the expected number of runs is about n^{1/2}, which is 9.4 in this case. Hence the measurement errors would be more appropriately modeled as being cumulative rather than as being independent. (An analysis for ρ similar to that in Section 6.9.5, modified for nonlinear parameters, gives ρ ≈ 0.87.)
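The runs count quoted above (number of sign changes plus one) takes only a few lines; a minimal sketch for one thermocouple's residual history:

```python
import numpy as np

def runs_count(residuals):
    # Number of runs = number of sign changes plus 1 (reference 28, p. 97).
    s = np.sign(residuals)
    s = s[s != 0]                          # ignore exact zeros
    return int(np.sum(s[1:] != s[:-1])) + 1

e = np.array([0.3, 0.1, -0.2, -0.4, 0.2, 0.5, -0.1])  # illustrative residuals
print(runs_count(e))                       # 4 runs for this short sequence
# Compare with about n/2 expected for independent errors and about n**0.5
# for first-order cumulative errors.
```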
If the usual equation for estimating σ² is used, we have

    s² = R/(mn − p)

where n = 89 and m = 8. The resulting s value is 0.339, which is different from the pure error estimates, which were near 0.1, partly because the residuals are highly correlated.

In finishing this analysis we should calculate the confidence region for k and c. But since the confidence region is discussed above for other cases, it is not given. Note, however, that various basic assumptions should be checked first. The standard assumptions of additive, zero mean, constant variance, normal measurement errors seem reasonable. Also, the time and location measurement errors are probably negligible, and no prior information is used. The assumption of independent errors is not appropriate. These assumptions are designated 11101011. If a first-order autoregressive model is used, ρ is found to be about 0.95. Using this ρ, the covariance matrix of the errors is approximated as indicated in Appendix 6A. For this case, in which OLS is used, an approximate confidence interval could be constructed using (7.7.9) and (7.7.10a), as discussed following these equations.

A final comment is made regarding the estimation procedure. Data from all eight thermocouples were used and eight sets of residuals were found. Because the measurements at x = 0 and x = L are each repeated four times, the same parameter values would be found using the average values at each time at these two locations. In other words, instead of having m = 8, we would have m = 2. Justification for this simplification is given in Section 5.5.
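The autoregressive parameter quoted above can be estimated from the ordered residuals by the lag-one least squares fit of Section 6.9.5; a minimal sketch, assuming the residual history for one sensor is held in an array:

```python
import numpy as np

def ar1_rho(e):
    # OLS estimate of rho in e_i = rho*e_(i-1) + u_i, for one residual history
    e = np.asarray(e, dtype=float)
    return float(e[1:] @ e[:-1] / (e[:-1] @ e[:-1]))

print(ar1_rho([0.3, 0.25, 0.28, 0.2, 0.15, 0.1]))  # ~0.84 for this correlated toy series
```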
7.10 SENSITIVITY COEFFICIENTS

There are several methods of determining the sensitivity coefficients. One can evaluate them by generating sensitivity equations by differentiation of the model equations. One can also use finite differences, which are simpler in terms of computer programming and are usually adequate, although there are cases when direct solution of the sensitivity equations is needed. This section discusses both methods. Even though one may evaluate the sensitivity coefficients utilizing finite differences, the study of the sensitivity equations can yield insights into the sensitivity coefficients themselves.

7.10.1 Finite Difference Method

One finite difference approximation for the sensitivity for the ith observation, jth parameter, and lth dependent variable is the forward difference approximation,

    X_lj(i) ≡ ∂η_l(i)/∂b_j ≈ [η_l(b₁,...,b_j + δb_j,...,b_p) − η_l(b₁,...,b_j,...,b_p)]/δb_j    (7.10.1)

where δb_j is some relatively small quantity. One possible value of δb_j is given by

    δb_j = 0.0001 b_j    (7.10.2)

This simple procedure is frequently quite satisfactory. Note, however, that if the initial value of b_j is chosen to be zero, or if b_j approaches zero, (7.10.2) can lead to difficulty.

Two sources of error contribute to inaccuracies in approximating X_lj(i): (1) the rounding error when two closely spaced values of η are subtracted, and (2) the truncation error due to the inexact nature of (7.10.1). In order to reduce the truncation error, the constant 0.0001 in (7.10.2) is made small. If, however, it is too small, then the rounding error becomes important. Bard [5, p. 118] gives a brief discussion leading him to recommend a particular value of δb_j, (7.10.3), in which ε is the relative error in the computed values of S and C_jj is defined by (7.4.13). He also recommends that lower and upper limits be placed on the values given by (7.10.3), such as

    10⁻⁵|b_j| < δb_j < 10⁻²|b_j|

A more accurate finite difference approximation is the central difference scheme,

    X_lj(i) ≈ [η_l(b₁,...,b_j + δb_j,...,b_p) − η_l(b₁,...,b_j − δb_j,...,b_p)]/(2δb_j)    (7.10.4)

Unfortunately this approximation requires almost twice as many values of η_l to be calculated as does (7.10.1). For this reason the use of (7.10.4) is not recommended. If there is uncertainty due to the use of (7.10.1), one could repeat the final iteration by replacing (7.10.1) with a backward difference approximation (replacing δb_j by −δb_j) and then averaging the parameter estimates from the forward and backward difference approximations.
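In code, (7.10.1), (7.10.2), and (7.10.4) are only a few lines each. The sketch below is generic; `eta` stands for any model routine returning predictions for a parameter vector:

```python
import numpy as np

def forward_diff_sensitivity(eta, b, j, frac=1e-4):
    # Forward difference (7.10.1) with db_j = frac*b_j, as in (7.10.2).
    # Fails if b[j] == 0, as noted in the text; a floor on db would then be needed.
    b = np.asarray(b, dtype=float)
    db = frac * b[j]
    bp = b.copy(); bp[j] += db
    return (eta(bp) - eta(b)) / db

def central_diff_sensitivity(eta, b, j, frac=1e-4):
    # Central difference (7.10.4): smaller truncation error, but nearly twice the cost.
    b = np.asarray(b, dtype=float)
    db = frac * b[j]
    bp = b.copy(); bp[j] += db
    bm = b.copy(); bm[j] -= db
    return (eta(bp) - eta(bm)) / (2.0 * db)

# Example with the exponential model of Problem 7.4: eta = 81.5 + 198.1*exp(-beta*t)
t = np.array([10.0, 20.0, 40.0])
model = lambda b: 81.5 + 198.1 * np.exp(-b[0] * t)
print(forward_diff_sensitivity(model, [7.5e-4], 0))  # ~ -198.1*t*exp(-beta*t)
```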
7.10.2 Sensitivity Equation Method

When the model is a set of first-order ordinary differential equations or a partial differential equation, the solution frequently cannot be written explicitly. In such cases explicit expressions for the sensitivities cannot be given. One can, however, derive sensitivity equations which can be solved separately from the model. Consider, for example, the model

    k ∂²T/∂x² = c ∂T/∂t    (7.10.5)

where k and c are parameters. The boundary and initial conditions are

    q = −k ∂T/∂x at x = 0,  ∂T/∂x = 0 at x = L,  T(x,0) = T_i    (7.10.6)

In this case T represents the dependent variable η. Differentiating (7.10.5) and (7.10.6) with respect to k yields

    k ∂²X₁/∂x² + ∂²T/∂x² = c ∂X₁/∂t    (7.10.7a)

    0 = k ∂X₁/∂x|ₓ₌₀ + ∂T/∂x|ₓ₌₀,  ∂X₁/∂x|ₓ₌L = 0,  X₁(x,0) = 0    (7.10.7b)

where X₁ ≡ ∂T/∂k. Notice that (7.10.7a) contains a nonhomogeneous term which is contained in (7.10.5). Solution of (7.10.7) provides the sensitivity coefficient for k. Unlike the use of differences such as (7.10.1), which utilizes the program calculating η, a special computer program must be written for calculating X₁ using (7.10.7); in many cases this would involve a finite difference computer program.

The sensitivity equation for c is found by differentiating (7.10.5) and (7.10.6) with respect to c,

    k ∂²X₂/∂x² = c ∂X₂/∂t + ∂T/∂t    (7.10.8a)

    ∂X₂/∂x|ₓ₌₀ = 0,  ∂X₂/∂x|ₓ₌L = 0,  X₂(x,0) = 0    (7.10.8b)

where X₂ ≡ ∂T/∂c.

From a knowledge of the solutions of the above equations one can predict several results. First, the basic problem (7.10.5) and (7.10.6) is linear, and so are the sensitivity problems for X₁ and X₂. Second, X₁ is a function of k and X₂ is a function of c; hence T is a nonlinear function of k and c. Third, in general X₁ and X₂ will not be zero at the boundaries x = 0 and L. Next, in (7.10.8) the only nonhomogeneous term is ∂T/∂t which, for monotonically increasing values of T, will be positive; such a term acts like a heat sink, which will cause X₂ to be always negative (provided T is monotonically increasing). Fifth, the ∂²T/∂x² term in (7.10.7a), which is equal to (c/k) ∂T/∂t, simulates a heat source, but the ∂T/∂x at x = 0 term in (7.10.7b) simulates a heat sink at x = 0. Hence X₁ could be either positive or negative and would in general be smaller in magnitude than X₂.

Another point relates to the relationships between T, T_i, X₁, and X₂. By multiplying (7.10.7) by k, multiplying (7.10.8) by c, and then adding the corresponding equations together, one can show that

    T(x,t) − T_i = −kX₁(x,t) − cX₂(x,t)    (7.10.9)

Insight regarding the sensitivity coefficients can be obtained from this equation. For example, the larger T(x,t) − T_i is, the larger on the average are the magnitudes of X₁ and X₂. Equation (7.10.9) also suggests that both sensitivity coefficients need not be calculated, because T must always be calculated.

A final point is that in this case the parameters can be transformed so that T is linear in one of them. This is true for the parameters α = k/c and K = k⁻¹; with these parameters the model (7.10.5) and (7.10.6) becomes

    ∂²T/∂x² = (1/α) ∂T/∂t    (7.10.10a)

    qK = −∂T/∂x|ₓ₌₀,  ∂T/∂x|ₓ₌L = 0,  T(x,0) = T_i    (7.10.10b)

The sensitivity coefficient for K, denoted X₃, is found from a solution of

    ∂²X₃/∂x² = (1/α) ∂X₃/∂t    (7.10.11a)

    q = −∂X₃/∂x|ₓ₌₀,  ∂X₃/∂x|ₓ₌L = 0,  X₃(x,0) = 0    (7.10.11b)

Notice that X₃ is independent of K, and hence T is linear in K. By multiplying (7.10.11) by K and then comparing with (7.10.10), one can find

    T(x,t) − T_i = KX₃(x,t)    (7.10.12)

and thus X₃(x,t) can be very simply found from T(x,t) using this equation.

In summary, we have given two basic methods for evaluating sensitivity coefficients when the model is not an algebraic form but rather involves solution of differential equations. It is assumed that these equations cannot be readily solved to obtain closed forms, since otherwise algebraic forms can be written. We imagine that these equations are solved numerically using finite differences or some other method. The first method of evaluating the sensitivities involves only a computer program to approximate the differential equations for the model; (7.10.1), which requires only dependent variable values, is used. In general this is the simplest and recommended method. The second method involves numerically approximating the sensitivity equations derived from the model. Examples of sensitivity equations are given by (7.10.7) and (7.10.8). Even though the sensitivity equations are not solved, inspection of these equations can sometimes provide considerable insight and/or relations between the sensitivity coefficients.
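The relation (7.10.9) can be checked numerically. The sketch below solves (7.10.5) and (7.10.6) with a simple explicit finite-difference scheme (a stand-in for a production code such as PROGRAM PROPERTY), forms X₁ and X₂ by the forward difference (7.10.1), and confirms that T − T_i ≈ −kX₁ − cX₂. The property values are rounded SI equivalents of the Armco iron estimates of Section 7.9.4:

```python
import numpy as np

def solve_T(k, c, L=0.0254, q=30300.0, Ti=300.0, t_end=15.0, nx=51, nt=4000):
    # Explicit FTCS solution of k*T_xx = c*T_t with -k*T_x(0,t)=q, T_x(L,t)=0.
    dx, dt = L / (nx - 1), t_end / nt
    r = k * dt / (c * dx**2)
    assert r <= 0.5, "explicit scheme unstable; increase nt"
    T = np.full(nx, Ti)
    for _ in range(nt):
        Tn = T.copy()
        T[1:-1] = Tn[1:-1] + r * (Tn[2:] - 2.0 * Tn[1:-1] + Tn[:-2])
        T[0] = Tn[0] + 2.0 * r * (Tn[1] - Tn[0]) + 2.0 * dt * q / (c * dx)  # heated face
        T[-1] = Tn[-1] + 2.0 * r * (Tn[-2] - Tn[-1])                        # insulated face
    return T

k, c, Ti = 75.0, 3.73e6, 300.0                   # W/m-K, J/m^3-K, K
T = solve_T(k, c)
X1 = (solve_T(1.0001 * k, c) - T) / (1e-4 * k)   # X1 = dT/dk via (7.10.1)
X2 = (solve_T(k, 1.0001 * c) - T) / (1e-4 * c)   # X2 = dT/dc via (7.10.1)
print(np.abs(T - Ti + k * X1 + c * X2).max())    # near zero, verifying (7.10.9)
```

Consistent with the discussion above, the X₂ values from this run are negative everywhere during heating, and the residual of (7.10.9) is small compared with T − T_i, reflecting only the finite-difference step error.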
0 ., "The Modified Gauss-Newton Method for the Filling of Nonlinear Regression Functions by Least Squares." Tec"nometrics 3 (1961), 269-280. 9. BOll, G . E. P. and Kanemasu, H., "Topics in Model Building, Part I I. On Non-linear Least Squares." Tech. Rep. N o_ 321. University of Wisconsin, Dept. of Statistics. Madison, Wis .. Nov. 191~. 10. Davies, M. and Whitling. I. J .• " A Modified Form of Levenberg's Correction." in Numrrical Met"ods for Non-Unear Optimization. edited by F. A. Lootsma. Academic Press. London. 1912. pp. 191-201. 11. Ganant, A. R.• "Nonlinear Regression," A m Stat. 19 (197S). 73-81. 12. Seinfeld, J. H. and Lapidus, L., Mat"tmatical Modtls in C"emical En,inrrri",. Vol. J., Proctu Modeling, Estimation and Identification, Prentice-Hall, Inc ., Englewood Cliffs, N.J ., 1914. 13 . Graupe. D .• identification o f Systems. Van Nostrand-Reinhold Company, New York. 1971. 14. Jones, A.: " SPIRAL-a New Algorithm for Nonlinear Parameter Estimation Using Least Squares," Comput. J . 13 (1970),301-308. IS . Beale, E. M. L., "Confidence Regions in Nonlinear Estimation," J. Roy. Stat. Soc. B ll (1%0).41-76. Box, O . E. P. &lid HWlter, W. G .. " A Useful Method l or Mociel.BuildiD. ... T «"-trIu 4 (1962), 301-311. Beck, J. V., "DeterminatioD 01 Optimum, Transient EXperiments for Thermal Contact Conductance," Inl. J. Heal MtUS TrtllUfrr 11 (1969). 621~ll. 24. Beck, J. V., "Transient Sensitivity Coerficienta for the Thermal C ontact Conductance," Int. J. Heat MtUS TrlltUfrr 10 (1967), 161S-1617, 23. 2S. Van Fossen, O. J., Jr., "Desian o f Experiments for Measurina Heat.Transfer Coefficients with a Lumped-Parameter Calorimeter," N ASA T«1t. Note N ASA T N 0·7857, Jan. I 97S. . 26. Burinlton, R. S. a nd May, D. C .• HaNIbooIr. Df Prohobility a NI S tlltutic, wit" Tabln. 2nd ed .• McGraw-Hili Book Company, New York, 1970. F amia, K.• "Computer·Assisted Experimental a nd Analytical Study o f T ime/Tempera. ture-Dependent Thermal Properties o f the Aluminum Alloy 2024-T3S I ," Pb.D. Thesis, Dept. o f Mechanical Enaineerina. M ichipn State University. 1976. 28. Draper, N. R. a nd Smith. H., Applirt! Rqrrssion ANllysu, J ohn Wiley.t: Sons, Inc.• New York,I966. 27. P ROBLEMS 7.1 U sing OLS w ith t he G auss l inearization m ethod, e stimate , ,-377 / (1+ fJl) r or t he datA I, Y, (a) S tart w ith fJ-6 0.2S ISO O.S 90 0.7S 70 2 5S i n t he m odel 1 30 fJ 20 b eing t he i nitial e stimate. Aaswer. 6 .0648. (b) S tart w ith f J-1 b eing t he i nitial e stimate. ( e) S tart w ith f J-12 b eing t he i nitial e stimate. 7.2 16. Gultman. I . and Meeter, D. A., "On Beale'. Measures of Nonlinearity," Trc"nomrtrics 7 (1965). 62~37. 17. Bllcon. D. W. and Henson. T. L., "Statistical Design and Model Building," Course notes, Dept. of Chemical Engineering. Queen's University, Kingston, Ontario. 18. Boll, G . E. P., "Filling Empirical Data." A"n. N. Y. Acad. S d, 116 (1960),792. 19. BOll, G. E. P., "Some Notes on Nonlinear Estimation." Tech. Rep. No. 2S, University o f Wisconsin, Dept. of Statistics, Madison, Wis .• 1964. 20. Ganant, A. R., "The Power of the Likelihood Ratio Test of Location in Nonlinear Regression Models," J. Am. Stat. Auoc. 70 (1975). 198-203: 21. BOll, G. E. P. and Hunter, J. S., "A Confidence Region for the Solution of a Set of Simultaneous Equations with an Application to Experimental Design," BiorMtrilco 41 (1954), 190-199. 7.3 7.4 U sing OLS w ith t he G auss l inearization m ethod. e stimate m odel , ,- fJd ( I + fJzl) r or t he d ata g iven i n P roblem 7.1. 
7.2 Using OLS with the Gauss linearization method, estimate β₁ and β₂ in the model η = β₁/(1 + β₂t) for the data given in Problem 7.1.
(a) Start with β₁ = 377 and β₂ = 5.8 being the initial estimates.
(b) Start with β₁ = 300 and β₂ = 4 being the initial estimates.
(c) Start with β₁ = 600 and β₂ = 8 being the initial estimates.
(d) Start with β₁ = 600 and β₂ = 4 being the initial estimates.

7.3 Using OLS with the Gauss linearization method, estimate a₀ and a₁ for the model and data in Example 5.2.1.

7.4 For the model T = 81.5 + 198.1e^{−βt}, estimate using OLS and the Gauss method the parameter β for the data given in Table 6.2. Use as the data the Y values for observations 5, 9, 13, and 17. Answer: β̂ = 7.49611 × 10⁻⁴.

7.5 Repeat Problem 7.1 using the Box-Kanemasu modification.

7.6 Repeat Problem 7.4 using the Box-Kanemasu modification.

7.7 Repeat Problem 7.1a using the sequential method of Section 7.8.3.1.

7.8 For the model T = β₁ + β₂e^{−β₃t}, estimate the parameters β₁, β₂, and β₃ for the Y data given in Table 6.2. Use a computer program incorporating the sequential method. Use OLS. Answer: 100.60, 178.83, 0.000882.

7.9 For the model given by Nu = β₁Re^{β₂ + β₃ log Re}, estimate the parameters using a sequential computer program with OLS for the data given in Example 6.2.2. (Do not take logarithms of both sides.)

7.10 Repeat Problem 7.9 using as the sum of squares function S = Σ(1 − Zᵢ)², where Zᵢ ≡ ηᵢ/Yᵢ.

7.11 Using the Box-Kanemasu approximation and the initial estimate of M(0) = 10, estimate M for the problem of Section 7.5.1. Use a programmable calculator or a computer.

7.12 Sometimes two functions are known and one or more parameters in one function is to be adjusted to cause "agreement." An example is u tanh u, which is to be approximated by u²/(1 + βu²) for "small" values of u.
(a) By using a Taylor series, show that "agreement" is obtained by letting β = 1/3.
(b) Suggest a mathematical function which would be minimized to obtain agreement in this case.

7.13 Another set of functions as in Problem 7.12 is u²/[2(1 + βu²)] and uI₁(u)/I₀(u), where Iₙ(u) is a modified Bessel function. By using a Taylor series, show that "agreement" is obtained by letting β = 1/8.

7.14 The following measurements and associated standard deviations are for the model ηᵢ = β₁exp(−β₂tᵢ):

    tᵢ:  0.125  0.25  0.375  0.5
    Yᵢ:  66     44    28     20
    σᵢ:  3      6     2      1

Use maximum likelihood estimation to estimate β₁ and β₂, assuming the standard assumptions indicated by 11011111 are valid. Let the initial values of β₁ and β₂ be 100 and 3, respectively. Use a sequential method of solution. Also find the covariance matrix of the estimates and the minimum sum of squares. Answer: b₁ = 100.922, b₂ = 3.2877, P₁₁ = 0.9744, P₁₂ = 0.0245, P₂₂ = 0.007933, S_min = 0.8502820.

7.15 When there are large numbers of equally spaced measurements in the region 0 ≤ t ≤ 1, the OLS sum of squares function is proportional to the integral

    S = ∫₀¹ (Y − η)² dt

For the special case of Y = t⁶ + t and η = β₁t⁶ + β₂t, show that S is a quadratic function of Z₁ and Z₂, where Zᵢ ≡ 1 − βᵢ. Use the method given in Section 6.8 to find the major and minor axes of this curve. Plot for S = 1 and 10.
7.16 For the model η = β₁t + exp(−β₂t) and the data

    t:  1  2
    Y:  2  3

and using the OLS sum of squares function, show that for a fixed value of S the value of β₁ is related to β₂ and S by

    β₁ = (1/5){8 − e^{−β₂} − 2e^{−2β₂} ± [5S − (1 − e^{−β₂})⁴]^{1/2}}

(a) Plot the contour of S = 0.1 in the β₁, β₂ plane.
(b) From the discriminant in the expression for β₁, show that the β₂ value can be as small as −ln[1 + (5S)^{1/4}] and can be as large as −ln[1 − (5S)^{1/4}], provided S < 0.2. If S > 0.2, what is the maximum value of β₂? What condition(s) must be given with respect to β₁ to insure that the contour for fixed S is closed?
(c) Derive an expression for the β₁ value at the end point(s) of an S contour. (At the end of the contour, the discriminant is equal to zero.) For S = 1, show that the end point is β₁ = −1.3898 and β₂ = −0.9144.
(d) Expand S for small values of β₂ to obtain S ≈ 5(1 − β₁ + β₂)² and plot for S = 0.1. Compare with the result of part (a).
(e) Show that |XᵀX| = [2e^{−β₂}(1 − e^{−β₂})]². Find the location of the β₂ values corresponding to the minimum and maximum values of |XᵀX|. Plot |XᵀX| versus β₂ for −1.5 < β₂ < 4.

7.17 Using the data of the preceding problem and the initial values of β₁ = 1.5 and β₂ = 1.5, estimate β₁ and β₂ using the Box-Kanemasu modification of the Gauss method.

7.18 Show that the sensitivity coefficients for the model

    η = (1 + β₁t)/(β₂ + β₃t)

are linearly dependent for a certain combination of the parameter values. (Hint: write the sensitivities in terms of z = β₃t/β₂ and R = β₁β₂/β₃.) What can you conclude from this linear dependence? What new parameters could be selected to eliminate the linear dependence mentioned above?

7.19 Repeat Problem 6.26 for the model …

7.20 Using the data in Table 7.14 between 3.6 and 18.0 sec for the temperature histories from thermocouple 1 (which is at x = L) and thermocouple 5 (which is at x = 0), estimate k and α in the model given by (8.5.25) with T₀ = 81.66°F, L = 1/12 ft, and q = 2.67 Btu/ft²-sec. Let t in (8.5.25) be the times in Table 7.14 minus 3.3. In other words, 3.6 sec in Table 7.14 corresponds to time 0.3 sec in (8.5.25). Use as initial estimates k = 40 Btu/hr-ft-°F and α = 1 ft²/hr. (Be careful with units.) Use OLS with the Gauss or other method.

7.21 Repeat Problem 7.20 but use the average of temperatures 1, 2, 3, and 4 instead of 1 and the average of 5, 6, 7, and 8 instead of 5.

7.22 Derive (7.5.16) and (7.5.17).

7.23 Derive (7.8.18).

7.24 Verify the sensitivity coefficient values given in Fig. 7.8 using the approximate equation (7.10.1). A programmable calculator or computer should be used. Investigate using values of δbⱼ = εbⱼ where ε is equal to (a) 0.01, (b) 0.001, and (c) 0.0001.

CHAPTER 8

DESIGN OF OPTIMAL EXPERIMENTS

8.1 INTRODUCTION

Carefully designed experiments can result in greatly increased accuracy of the estimates. This has been demonstrated by various authors, but special mention should be made of the work of G. E. P. Box and collaborators. See, for example, Box and Lucas [1] and Box and Hunter [2]. An important work on optimal experiments is the book by Fedorov [3].

In many areas of research, great flexibility is permitted in the proposed experiments. This is particularly true with the present ready accessibility of large-scale digital computers for analysis of the data and automatic digital data acquisition equipment for obtaining the data. This means that transient, complex experiments can be performed that involve numerous measurements for many sensors. With this great flexibility comes the opportunity of designing experiments to obtain the greatest precision of the required parameters.
A common measure of the precision of an estimator is its variance; the smaller the variance, the greater the precision. Information regarding the variances and covariances is included in the determination of confidence regions. We shall utilize the minimum confidence region to provide a basis for the design of experiments having minimum variance estimators.

The design of optimal experiments is complicated by the necessity of adding practical considerations and constraints. The best design for a