Back to Parameter Estimation in Engineering and Science Home
Table of Contents
CHAPTER 5 INTRODUCTION T O LINEAR ESTIMATION 112 C HAPTER 6_________________ _ April 1-170 2-228 3-008 4-340 5-005 6-092 7-303 8-180 9-025 10-147 11-031 12-133 13-205 14-047 15-093 16-131 17-264 18-134 19-036 20-359 21-183 22-101 23-280 24-080 25-110 26-053 27-277 28-050 29-105 30-343 M ATRIX ANALYSIS F OR LINEAR PARAMETER ESTIMATION S eptember 1-175 2-263 3-087 4-199 5-236 6-221 7-322 8-341 9-349 10-347 11 - 173 12-161 13-325 14-343 15-135 16-117 17-307 18-019 19-041 20-230 21-086 22-128 23-156 24-227 25-209 26-231 27-022 28-102 29-089 30-064 Use the model 1'/ = flo · ( a) W hich s tandard a ssumptions a re valid? ( b) E stimate flo using O LS u sing the April d ata. ( c) E stimate 0 2 u sing the April d ata . 5.J1 5.J2 6.1 INTRODUCTION T O MATRIX NOTATION AND OPERATIONS R epeat P roblem 5.30b a nd c using the September d ata . U sing the April d ata in P roblem 5.30. e stimate flo a nd fl. i n the model 1'/, = flo + fl. X, using O LS . Also estimate their s tandard e rrors. T he extension of parameter estimation to more than two parameters is effectively accomplished through the use of matrices. The notation becomes more compact, facilitating manipulations, encouraging further insights, a nd permitting greater generality. This chapter develops matrix methods for linear parameter estimation and Chapter 7 considers the nonlinear case. Linear estimation requires that the model be linear in the parameters. For linear maximum likelihood estimation, it is also necessary that the independent variables be errorless and that the covariances of the measurement errors be known to within the same multiplicative constant. Before discussing various estimation procedures, this section presents various properties of matrices a nd matrix calculus that are used in b oth linear a nd nonlinear parameter estimation. 6.1.1 Elementary Matrix Operations A matrix Y consisting of a single column is called a column vector. We use 113 ..... . ..... 214 C HAPTER' MATRIX ANALYSIS f OR U NUR PAItAMEl'ER t snMAnON 1.1 I NnODUcnON TO MATRIX NOTADON AND O PDAnONS 215 0) boldface letters to designate matrices. In display form V is written as ; a nd C is .r x r, is f ound using (6.1.3a) to be YI :" Y, V- (6.1.1 ) Y" where 0 is a n m X r matrix. Example 6.1.1 For the matrix A being 3 x 2 and B being 2 x 2, find the product AB. S quare brackets are used to enclose the clements of a matrix. This matrix has n elements a nd is sometimes called a n n X I matrix, that is, n rows a nd o ne column. An m X n rectangular matrix A is given by [." 0 '1 A - [all ]- 0 12 a22 : a", I a",2 .'"j S olution Using (6.l.3a) we get a,,, (6.1.2) 6.1.1.1 a",,, TNIUpO. o f M .trbt T he transpose o f a n m X n matrix A is given by I n the notation for an clement, a ll' the first subscript refers to the row a nd t he second to the column. You may may find it helpful in memorizing this o rder by mentally visualizing ! _ with subscripts i j. I f m - n in (6.1.2), the matrix A is said to be square. I f the matrix is s quare a nd also all - ajl for all i a nd j , the matrix is termed symmetric. 6.1.1.1 (6.1.4) If A is a symmetric matrix, A T_A. T he transpose o f the product AB is e qual to Prodllct 0 / Matrice.r T he product o f A times B, where A is a n m X n matrix a nd B is n x.s, is a n m X s matrix C, (6.I.Sa) which c an b e extended t o the product o f a ny finite number o f matrices, (6.l.3a) (6.I.Sb) Note that it is meaningful to form the product A times B only when the number o f columns in A is equal 10 the number o f rows in B. A square matrix A is said 10 b e idempotent if AA - A. The triple matrix product o f ABC, where A is a n m X n matrix, B is n x s, 6.1.1.1 IIIW. ... 1 htI"";".",, I11III E;~s F or a s quare matrix, A, the notation A - I indicates the inverse o f A. A nonsingu!ar matrix is a square matrix whose determinant is not zero. 116 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION (Examples of determinants are given below.) I t can be shown that if and only if the matrix A is nonsingular, it possesses an inverse A - I, that is 6.1 INTRODUCTION T O MATRIX NOTATION AND OPERATIONS 117 T he inverse of the s ymmetric 3 X 3 matrix A is (6 . 1.6) where 1 is a diagonal matrix and is called the unit o r identity matrix. 1 is given by I o 1= 0 = 111 [ a"a" - al, oo I o o I = diag[ I I . .. I] (6.1.1) 0 UOl) - 01l0)) ° llOl3 - °UOll ° 11 °)3 - 0:3 ° IlOIl-OIIOn symmetnc 1 (6. l.Ila) ° 11°22- 0 :2 ( 6.l.Ilb) Notice the notation "diag" in (6.1.1), which indicates the elements of a diagonal matrix. Let f/I be a diagonal matrix, (6.1.8) Clearly nondiagonal matrices are much more difficult to invert than diagonal matrices. For the inverses to exist the determinants given by (6.1.9b), (6.1.I0b), a nd ( 6.l.Ilb) c annot be equal to zero. For a method for evaluating higher-order determinants, see Hildebrand [IJ. See Problem 6.4 for a method for finding inverses of larger matrices. The determinant of the product of two square matrices is IABI = IAIIBI (6.1.12a) I f (6.1.12a) is applied to (6.1.6), the reciprocal of IA -II is found to be equal to IAI, Its inverse and determinant are given by (6.1.9a) (6.1.12b) (6.1.9b) Evidently diagonal matrices have some very convenient mathematical properties. Another convenient relation involving products of two square n X n matrices A and B is (6.1.13) A 2 X 2 matrix A and its inverse are (6.1.I0a) which is a relation similar to (6.1.5) for transposes. F or any square matrix A it is also true that (6.1.14) .~ - where IAI is the determinant of A, (6.1.I0b) or in words, the transpose of a n inverse is equal to the inverse of the transpose. An eigenvalue (also called characteristic value o r latent root) Ai of a 2 1' CHAPTER 6 MATRIX ANALYSIS FOR UNEAR PARAMETER f SllMAnON U be pO$iti~ definite a re t hat the inequalities, s quare matrix A is o btained by solving the equation a.. a ll a l2 a u a u a2) (6.\.15) There are n eigenvalues if A is n x n (counting repealed values). 1S.1.1.4 2 1t I NltODVcnON TO M ADIX NOTAnON AND O PDAnONS a ll a2J >0, etc. (6.1.18) an be satisfied. Also A is positive definite (semidefinite) i f a nd only i f all o f its e igenvalues a re p ositive (nonnegative. i.e.• AI~O with a t l east o ne e qual t o z ero). I f - A is positive definite (semidefinite), A is negative definite (semidefinite). PartitioMd Matrix Let B b e a n n x n matrix that is partitioned as follows (6.1.16a) 6.1.1.6 where By has size n jXnl'iJ=I,2, a nd where n l +n 2 =n. Suppose that IBI~O, IBIII~O, a nd IB221+0. Set A =B- I a nd p artition A as T ra« T he Irace o f a s quare matrix A is the sum o f the d iagonal elements a nd is also e qual t o the sum o f the eigenvalues A" ( 6.1.I6b) t r(A). I" all - I -I where AIj h as size n, X nJ for i ,j = 1,2. T he c omponents AI} a re A ll" [BII - BIlB221B21 r ' "" B ii' + Bii 'B Il A22 B2,Bii I (6. 1.1 1 a) All=- -BiiIBu[Bn-B2IBiiIBurl- -BiiIBIlAu A ll'" [ B 22 - B21BIIIBur I = B n l + B n lB 21AIIBI2 Bn A21 == - Bn IB21 [BII - BIlBn IB21 r I ... - Bn IB 2I A" (6.1.11c) (6.1.11d) T he d eterminant of B is ' B -1 ... d lag [ B-IB-'] II 12 6.1.1.5 (6.1.I1f) Positive Definite M atri«s Necessary and sufficient conditions for the square symmetric matrix A t o (6.1.19) I t c an also be shown t hat the determinant o f A is equal t o the product o f the eigenvalues, (6.1.20) 6.1.1 Matrix Cakulus I n p arameter estimation the a p v ector o f p arameters, operator VfJ by f~rst P. derivative with respect to the elements of is frequently needed. Let us define the a aP I (6. 1.1 1e) I f B u a nd B21 a re null matrices (only zero components). then the inverse o f B is m ore simply given by A, I-I (6.1.11b) l I" (6.1.21 ) V fJE a ap, Because VII is p x I i t must be applied t o the transpose o f a c olumn vector o r t o a row vector. Let C be a n n X I c olumn vector which is a function o f p. I no CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION 6 .. I NTRODUCTION T O MATRIX NOTATION AND O PERATIONS 6.1.3 I II Quadratic Form T hen we o btain aC I aC2 afJl a/31 (6. \.22) [cl" ' cn ]= v cT = (I I n estimation sum o l s quares lunctions a re l requently encountered. These functions c an b e c onsidered to b e quadratic forms Q, aCn afJl a afJl a aC I aC 2 aCn afJp a/3p a/3p a/3p which is a p X n matrix a nd whose determina.n t is called the Jacob;~n. M any times the matrix derivative is applied. to a scalar. ~onsld~r Vp o perating upon the matrix p roduct AO where A IS I X n and 0 IS n X I , then where III is a symmetric positive semidefinite matrix. Q is t ermed positive definite if the conditions given in (6.1.18) a re s atislied b y III [ I, pp. 48-52); for these conditions Q is always nonnegative for all real values o f the variable a j • I n some analytical considerations the equivalent expression for Q. Q =tr(IIIAAT) = (V~>< IIAII >< nl)oln >< II + (V~>< IIOTII >< nl)ATln>< II (6.1.23) T he sizes of the matrices are indicated by the superscripts in brackets. O ne a pplication of (6.1.23) is for A = CT , 0 =~. an~ C ap X I c~lumn v ector whose elements are not functions of the /3 s. U smg (6.1.23) gives (6.1.24) is m ore convenient than (6.1.27). Applying the o perator V p t o Q d efined b y (6.1.27). assuming t hat A is a f unction o f P a nd III is not, gives V p Q-V p (ATIIIA)=(V pAT)IIIA+ [ V p(ATIII)]A I Vp Q=2(V pAT)IIIA since (6.1.25) A nother i mportant application of the matrix d~rivative i~ V{ I o perating o n the product AO where A is a I X n m atrix t hat IS a f unction of p a nd 0 is a n n X m matrix which is n ot a function o f p. In the same m anner t hat (6.1.23) was derived we o btain (6.\.26a) I (6.1.30) Aln>< 11= xln><pl plp>< II w here X is i ndependent o f p. Vp Q=2(V pPTXT)IIIXP w here (6.1.26b) is used. (6.1.31 ) T hen s ubstituting (6.1.31) into (6.1.30) gives T Vp Q=2X IIIXP (6.1.26b) (6.1.29) w here (6. \.23) a nd (6.1.26a) are used. In linear estimation, A in the q uadratic form is given b y I I f A"" PT a nd 0 has the size p X m, (6. \.26a) then yields (6.1.28) I; Q = (XP ) TIIIXP (6.1.32) (6.1.33) 222 C HAPTER' MATRIX ANALYSIS f OR LINEAR PARAMETER £S11MATION 6.1." Expected Value of Matrix and Varlance-Covarlance Matrix 6.1.1.1 ...... 6.. I NnODUcnON TO MA11UX NOTADON AND O PUAnONS m A n i mportant special case in connection with the covariance matrix (sometimes this term is used rather t han v ariance-covariance) is for Y c ontaining additive zero m ean errors e, o r Expected Value Matrix T he integral of a matrix is given by (6.1.31) 1" o f al,,(t)dt Let 1/1 be d efined as the covariance matrix o f Y. F or t he conditions given in (6.1.31). 1/1 is also equal t o cov(e) o r (6.1.34) 1/1 == cov(Y) - cov( e) (6.1.38a) where 1/1 is in detail This result c an be used to find the expected value o f a column vector E(e:> i y,,,)(I), 1/1- : E (Y.) E (ele,,) (6.1.35 ) E (Y,,) ~(.'..) ]' E(r~) f Y"J( Y" )dY" (6.1.38b) w here J( Y,) is the probability density of Y,. 6.1.1.1 VarUurc~CovaritlMe Matrix A n i mportant application of the expected value o f a m atri" is the variance-covariance matrix of a random column vector Y which is c ov(v''' )( I I) = E { [ Y - E (Y) ] [ Y _ E (Y) ] T -E Il I } 1 Y I- E( Y .) : [ YI-E(Y.) . .. Y ,,-E(Y,,)] since 0 ,2_ V (r;)- E(rl) . I n m any e stimation methods, 1/1 given by (6.1.38b) plays a n i mportant role. 6.1.4.1 Coval'ialw 0 / L iMtu ComlHlllltiort 0/ Vector Random Let zI",)(I) b e l inear in the r andom v ector e,,,)(I) o r Y ,,- E( Y,,) (6.1.39) ) cov( YI • Y2 ) cov( YI' Y 2 ) cov( Y 2• Y 2 ) T he m atrix G is not a r andom matrix. I t c an be proved that the covariance matrix of z is given by cov( Y.. Y,,) cov( Y2' Y,,) cov( YI' YI (6.1.40) (6.1.36) cov( YI• Y,,) cov( Y 2• Y,,) N otice that this is a symmetric VaMbies n )( n matrix. cov( Y". Y,,) where 1/I-cov(e). A d erivation of (6.1.40) is given next. Let (6.1.41) m CHAPTER 6 MATRIX ANALVSIS FOR LINEAR PARAMETER ESTIMATION 6 .l I NTRODUcnON T O MATRIX NOTATION AND OPERATIONS 6.1.S Model In Matrix Terms a nd then E (ZZT) c an be written as (6.1.42) C ertain conditions regarding the model a nd assumptions c an b e expressed more generally using matrix rather than algebraic notation. Let the dependent variable 1) b e given by the linear-in-parameters model where ;.1 = 1.2•. ..• n a nd where we used (6.1.51 ) (6.1.43) a nd (6. 1.3b). Using the linearity property of the expected value operator. E (afl + bf2) = aE ( f l ) + bE ( E 2 ) (6.1.44) where for the same dependent variable measured n times, the dimensions o f 1) are [n X I]. a nd those o f X a re [n x p] . W e term X the sensitivity matrix; it is a matrix o f derivatives with respect to the parameters. The parameter vector fJ is [ f x I]. Thus we have permits (6.1 .42) to be written as E(ZZT) = [ X II L L gij E(h Jk ) glk] = GE(H)G 1)11 X l, X II' fJ. fJ2 p= Xy X II1 (6. 1.52 ) fJp (6.1.46) E need not be zero (2). E quation 6 . 1.51 includes all the models given in C hapter 5. F or example. for Model 5 (TJ;= fJ.XjJ + fJ2X;2) the X matrix is -[~II ~121 Expected Value o f a Quadratic Form T he q uadratic form given by (6 . 1.27) a nd (6.1.28) is X-. Q =AT cJ)A = tr(cJ)AA T) (6.1.47) Let A be an I n x I J random vector but cJ) be not a random matrix. Using the expanded form of a matrix p roduct given by (6.1.3b) and the linearity property of the expected value operator indicated by (6 .1.44), it can be shown that E ( Q ) = tr (cJ) E (AAT) } _ XIII . (6.1.53) X II2 I t is p~ssible t hat the terms Xjj could represent different functions o f the same variable such as t ;. Some examples a re X ;I= I . (6.1.48) In general the covariance matrix of A is equal to X ;)=sin6t; In other cases Xjj might be composed o f different coordinates such as time (6.1.49) ... . . X, p X Z2 X= 1J= cov(z) = G cov( E)G T where the expected values of the components of 1)2 (6.1.45) X. 2 X 2• 1)1 T S ince'" = E (H). we have proved (6.1.40). A more g.eneral form of (6.1.40) which can also be proved is 6.1.4.4 115 I; a nd position Zi' Using this equation in (6.1.48) a nd also using (6.1.47) yields E ( Q ) = E (A T)cJ) E (A) + tr ( cJ) cov( A) } This expression can be used to obtain an estimator for 0 2 (6.1.50) for certain cases. Similar to the last case 1J might represent the output o f a chemical process which might be a linear function of air flow rate qj. cooling water inlet temperature 1'; . a nd acid concentration C; ; here if there is a constant term .0 •• 226 CHAPTER 6 M ATRIX ANALYSIS FOR LINEAR PARAMETER E S11MAll0N 6.1 I NTRODUcnON T O M ATRIX N OTATION A ND OPERATORS 227 we have would set This same example involving these independent variables is given in Section 13.12 o f Brownlee (3). C hapter 6 of Draper a nd Smith (4). a nd C hapter 5 o f Daniel and Wood (5). A nother situation that can be represented by ' I in (6.1.51) is called the multiresponse case in the statistical literature. In this case ' I a nd X m ay be given by Unfortunately in most such cases the model is n onlinear in terms of the parameters which are usually rate constants in the reaction problem just mentioned. F or didactic purposes. however. let CA(/;) a nd CB(t;) b e given by CA(I,) - P. I. (I,) + P212 (I;) CII (I;) - P.Jl (I,) + I ll!" (I;) ' Ie I) 11 I ( i) '1(2) 112 ( i) where the ./.;(/;) a re known functions. Then the sensitivity coefficients are where ' I ( i) = ' 1= (6.1.54) 11", ( i) ' I(n) X (I) X I. ( i) X(2) X I, ( i) ( i) X 2p ( i) X li where X (i)= X= X (n) (6.1.55) X",I ( i) An example involving the partial differential equation describing heat conduction is h eat transfer in a plate o f thickness L with temperatures measured a t several locations as a function o f time. Suppose t hat the observations a t m positions X j have been made a t times 11./2• •••• 1". T he temperatures. which would be the dependent variables. are designated T (x"I;) a nd then X"" ( i) for i -I • ...• n ; j -I • .. .• m N ote that the variable XjA ( i). which is called a sensitivity coefficient. is for the j th d ependent variable in 11. for the k th parameter. a nd a t the ith time. Also observe that i = I . .. .. n ; j = I • ...• m F or the boundary conditions o f a known constant heat flux o f qo a t x =O a nd no heat now a t x = L . the temperature distribution for large times is given by (6.1.56) where In the above definitions of ' I a nd X in (6.1.54) a nd (6.1.55) there is a total o f n " times" a nd m d ependent variables resulting in ' I being a (mn X I) vector a nd X having dimensions o f (mn X p). I f m = I. then the j subscripts in (6.1.54)--(6.1.56) a re dropped a nd replaced by i. o r X IA(i)=XiA a nd ')_ qo/, JI(' L' " I(i) ... ,,;. Multiresponse cases commonly arise in parameter estimation problems involving ordinary a nd partial differential equations. An example involving ordinary differential equations occurs in chemical engineering when the concentrations of several components present in a chemical reaction are measured as a function of time. T o be more precise. let the concentrations o f c omponents A a nd B b e designated CAU;) a nd CIIU/) a nd then we T he quantity p is density. c is specific heat. a nd k is thermal conductivity. The initial value (i.e .• a t 1 - 0) o f T is TC)t which is known. I f there are measurements a t two locations (such as a t x =O a nd X " L ). m =2 a nd the "ii) values are j -I.2 UI C HAPTER 6 MATRIX ANALVSIS F OR LINEAR PARAMETER ESTIMATION 6.1 a nd the sensitivity coefficients for i = I • ... •n are X I I ( i) = f l ( i), X 21 ( i)=fl ( i), 5. X 12 ( i) = f2 ( I ) X 22(i)=f2(2) 6. Actually this case could also be considered a single response but multiple sensors are required for m > I . Many other examples can be generated. 6.1.5.1 Identifiability Condition In order to uniquely estimate a ll p parameters in the fJ vector it is necessary that 7. (6.1.57) 8. for estimation problems involving least squares or maximum likelihood. This criterion is analogous to those given by (5.1.2) for some simple models. When maximum a posteriori estimation is used, it may not be necessary that (6.1.57) be true. See Appendix A. 6.1.5.1 Assumptions T he s tandard assumptions given at the end of Section 5.1 are repeated and amplified below. In making this list all cases considered in this text are included but doubtless many other possibilities exist that are not explicitly covered below. I. 2. ~ . ..... r-~ c..~ Y = E (YI fJ) + e = 'Jf(X.fJ) + e (additive errors in measurements) (6.1.58) No. measurement errors are not additive. I Yes, measurement errors are additive. Also the regression function 'Jf is correct and does not involve any random variables. o E (e) = 0 (zero mean measurement errors) (6.1.59) No. the errors in Y do not have zero means. In other words, the errors are biased. Yes, the errors in Y have zero mean. 3. V ( Yil f J)= 0 2 (constant variance errors) (6.1.60) o No. V ( Y,I fJ)~constant; that is. V ( Y,I f J)= 1 Yes. the errors Ei , i = I. .. . , n have a common variance 0 2• 4. E { [ f j - E ( fj)] [ f j - E ( s) J} = 0 for i~ j (uncorrelated errors) (6.1.61) o No. the errors Ej are correlated. 1 Yes. the errors f j are uncorrelated . o or I NTRODUCTION T O M ATRIX N OTATION A ND OPERATIONS 2 No, the errors are described by an autoregressive process. See Section 6.9 e has a normal distribution (normality) (6.1.62) o No, the distribution is not normal. I Yes, the distribution is normal. Known statistical parameters describing e (6.1.63) o No, y,=cov(e) is known only to within an arbitrary multiplicative constant, or y,= 0 20 where 0 2 is unknown and 0 is known. I Yes. y ,=cov(e) is completely known. 2 No, but £ is known to be described by a n autoregressive process although the associated parameters are unknown. See Section 6.9. 3 No, y ,=cov(e) is completely unknown. Errorless independent variables [ V ( Xij) = 0 f odinear models] (6.1.64) o No, there are errors in the independent variables. I Yes, there are no errors in the independent variables. fJ is a constant parameter vector and there is no prior information (nature of parameters) (6.1.65) o No, fJ is a random parameter vector and the statistics o f fJ are unknown. Yes, fJ is a constant parameter vector and there is no prior information. 2 fJ is a r andom parameter vector with a known mean o f I' a nd known covariance matrix Vp ' Also fJ is normal. The random vectors fJ a nd Y are uncorrelated. All the measurements are considered to be from the same " batch." 3 fJ is a constant but unknown vector but there is subjective prior information. Since fJ is unknown, the prior information regarding fJ is summarized by fJ b~ing N ( I'p, Vp) a nd by cov( Pi' lj) = O. As indicated in Chapter 5 the eight standard assumptions are denoted 11111111. The estimation problems are generally less difficult (if the " best" estimates and their variances are of interest) when the standard assumptions are valid. I f, for example, the covariance matrix y, o f the errors is completely unknown and there are no repeated observations, the complete y, matrix cannot be estimated along w ith' the parameters. From repeated observations, investigations o f the measuring devices. and familiarity with estimation in such cases, the y, matrix gradually becomes better known, however. Then estimators other than OLS can be used although a nondiagonal y, does introduce complications; see Sections 5.13 and 6.9. I f Y, is completely unknown in a multiresponse case, Box a nd D raper [6] recommend the use of a different method than OLS; see (6.1.73a). ., 231 C HAPlER' MATRIX ANALYSIS f OR UNEAR PARAMETER f S11MAnON '.1.6 Maximum Ukellhood Sum or Squares FundioDS I N'nODUcnON 1 '0 MATIIX NOTAnON AND O PlllADONS 1.1.1.1 & otrallhpadertl JllltWIt. (MIIltiIYIpOIUe) Suppose that the measurement errors are additive. and have zero mean. normal distribution. known statistical parameters. and errorless independent variables. Also there is no prior information. This case is designated 11--1111. The probability density function ror the above assumptions is f (VIII) =(21t) - N'21~1-'/2exp[ - (V -1I{~-1 (Y - tl)/2] (6.1.66) where ~ - I is the inverse of ~ given by (6.1.38) a nd N is the total number of observations. Before the data are available 1(YIII). the probability density of Y given fJ. associates a probability density with each different outcome Y of the experiment for a fixed parameter vector II. After the data are available. we wish to find the various values in II which might have led to the set of measurement Y actually obtained. In the maximum likelihood method the likelihood junction L ( lilY) is maximized; L ( lilY) has the same form as 1(YIII) but now II is considered variable and Y is fixed . The procedure for forming estimators from L ( II I Y ) usually first involves taking the natural logarithm of (6.1.66) to get 1(IIIY)-lnL(IIIY)~ - U i[Nln(21t)+lnl"'I+SMLJ 231 Q r.w Suppose t hat m different dependent variables are measured a t n different discrete instants. F or example. temperature and concentration might be recorded with time. The formulation below is also appropriate if the same physical quantity. such as temperature. is dynamically measured utilizing several thermocouples each positioned a t a different location. In multiresponse cases, the notation given by (6.1.54) is u sed for 11 a nd a similar expression is used for the nan observation column vector, Y. Let the inverse of ' " be W which is partitioned into a symmetric n x n matrix with components being m x m matrices. ~(I' I ) w- . W (I,2) W (I,n) W (2,n) [ T(I.O) l' (6.1.69) W (n,n) Then the maximum likelihood sum o f squares function is (6. 1.70) (6.1.67a) where (6.1.67b) When wit is known as assumed above. maximizing L (IIIY) is equivalent to minimizing SML' Several possible algebraic forms of SML are given next. which can be simplified further for independent errors. F or example. let the errors be zero mean and independent in "time," t hat is, E[I!(i)I!(j)T]_ o for i '*j. Then (6.1.70) reduces to SML - 6.1.6.1 Smgle I Hpnukttt JltI,u,ble (SiIIgle R~$pOIIM) C IIR Let ' I represent a single dependent variable that is measured at N - n different times o r conditions. For correlated errors, wit - I is not diagonal. Let the inverse of wit be given by the symmetric matrix W with components WI/, F or this case SML is given in algebraic form by " (6.1.71) I f. in addition, the f)(i) values are independent for a given i, that is, EI~(i)!k(i)J-O for j '* k, SML further simplifies to I II " SML - ~ ~ [ lj(i)-1Jj(i)Ya;2(i) ( 6.1.72) ) -1 i - I " S ML- ~ ~ ( Y;-1JI)(Yj -1J)Wij (6.1.68a) I -I j - I I f ~ is diagonal with components f [Y(i)-1I(i)(W(i,i)[Y(i)-1I(i)] I -I a1, then " SML ~ ~ ( Y, -1Jio,- 2 (6.1.68b) I -I F or further discussion of SML for single response cases. see Section 6.5. since W.u(i)-o.u-~i). In this section, SML is given with wit a nd its inverse assumed known. Exactly the same estimates for the physical parameters (fJ) are obtained if ~ is known only to within a multiplicative constant, that is. ~- 0 20 where 1 0 is unknown b ut 0 is known. This case is designated 11-1011. Should the ~ matrix contain more unknown parameters than 0 1, the ML procedure becomes more complicated. See Section 6.9.5. 231 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION When the y, matrix is completely unknown, Box a nd D raper (6) recommend for the multiresponse case that the following determinant be minimized, 6.1 I NTRODUcnON T O MATRIX NOTATION AND OPERATIONS rTfJ, say, which can be expressed as a linear combination o f the observations, the one with minimum variance is fTbMV ' T o s tart the proof, we note that b MV is simply expressible in terms o f fJ a nd I: . (6.1.76) (6.1.73a) where (6.1.17) where SIJ is given by so that (6.1.73b) E ( b MV )'" fJ + A E (I:) = fJ ) F or m = I , this reduces to ordinary least squares. F or m ;;' 2 the minimization of (6.1.73a) must be accomplished by the solution of a set o f nonlinear equations even when the model is linear in the parameters. For other discussion of multiresponse cases. see Sections 6.7.5 a nd 7.8 a nd the Hunter paper (1). 6.1.7 (6.1. 78) o r in words, bMV is a n unbiased estimator of fJ. T he covariance matrix of bMV is cov(b MV )= E[ (Ae)(Ae)T] = E[ Aee T AT] = AOA T 2 =(X TO- 1X) - la 2 a (6.1.79) Gauss-Markov 1 beorem An important result in estimation is the Gauss-Markov theorem: if 11 = XfJ is a linear regression function. if errors are additive and have zero mean. if the covariance matrix of E, y,. is known to within a multiplicative constant ('" = Oa 2). a nd is positive definite, if there is no error in the independent variable. and if the estimation procedure does not use prior information (assumptions 11---011), then of all unbiased estimators of any component Pi of fJ which are unbiased a nd which are linear combinations of the observations, the component biMV of b MV has minimum variance where Consider now the scalar eTb MV ' It follows from (6.1.78) a nd (6.1.79) that (6.1.80) a nd (6.1.81 ) Next we consider any unbiased linear estimator of ,TfJ. It c an be written as (6.1.74) (6.1.82) The covariance matrix of b MV is (6.1.75) Since we are usually more interested in the use o f o ur estimates of fJ in estimating expected values of Y o r o ther linear combinations of the components of fJ, we shall prove a stronger theorem which includes the one just stated. namely: under the conditions stated above. of all unbiased estimators of any particular linear combination of the components of fJ, where C is a vector of constants. Since rTb MV + C Ty is unbiased a nd since (6.1.83) we see that CTXfJ is s imply zero. Since, by definition o f unbiased estimator, this condition cannot depend o n the value o f fJ, CTX must be a . ... 2J4 CHAPT£R 6 M ATRIX ANALYSIS FOR LINEAR PARAMETER £ SJ1MAnON 6.2 LEAsT SQUARES F SnMAnON r .J ~. null vector. The variance of fTbMV+CTy is V(fTbMV+CTy) . . E (fT(XTO-'X) - IXTO-le+CTe) = E (fT(XTO-IX) - IXTO-IU e qual t o zero. Using (6.1.30) a nd (6.1.26b) we g et 1 V~SLS-2[V ~(y _ XI)T][y - XI)] V~(Y - XI) ) T - T O-IX(XTO-IX) - I,) ( 6.2.2) (6.2.3) - V/JIJ T XT _ - XT a nd thus (6.2.2) equated to z ero a t I) - bLS p roduces + E (CTu TC) + 2E(C T T O-IX(XTO-IX) - If) ee = fT(XTO-IX) - 1'a 2 + C TSlCa 2 + 2C TX(X TO-IX) -1 101 ; estimator, 6 .U I bu - (XTX) - I XTY I (6.2.5) This estimator requires for unique estimation o f all the p p arameters t hat the p X P matrix XTX b e n onsingular o r IXTXI + 0. T his m eans t hat a nyone c olumn in X cannot b e p roportional t o a ny other column o r a ny linear combination o f o ther columns because i f s uch a proportionality (i.e., linear dependence) exists, IXTXI - O. (See Appendix A a t the e nd o f t he text.) T he c ondition IXTXI+O also requires t hat n, the n umber o f m easurements o f ~, b e c:.qual t o o r g reater than t he n umber o f p arameters p . I f t he predicted curve ~ is not to pass through each observation i t is further necessary t hat L EAST S QUARES E STIMATION I n o rder to obtain parameter estimates using ordinary least squares (OLS). n one of the standard assumptions need be valid. Instead a prescribed sum o f squares is always minimized. When nothing is known regarding the measurement errors. OLS is recommended. T he r eader should be made aware. however. that he may know more a bout the measurement errors than h e first thinks. After analyzin., similar sets of data. for example, he c an learn much regarding the errors. When information regarding the measurement errors is p resent a nd a ;l is far from constant, some estimation procedure other than OLS is recommended. (6.2.4) Pr~multiplYing b y t he inverse ( XTX)-I results in the ordinary least squares (6.1.84) . Since CTX =O. the third term is o T he first term is unaffected by a choice o f C. Since Sl is positive definite cTac is positive unless C is the null vector. T hus no unbiased linear estimator of ,TIJ has a variance less than the variance of ,Tb MV . Hence ,Tb MV is a minimum variance unbiased linear estimator of ,TIJ a nd in p articular biMV is a minimum variance unbiased linear estimator of /JiMV . U - XTy + XTXbLS - 0 n >p+1. Estimators obtained from (6.2.5) for Model 2, i "1- /lXI' a nd M odel S, ' 11- /lIXII + /l2 X I1' a re those obtained in C hapter 5; i n particular see Table 5.3. F or t he single response case a nd using X given by (6.1.52), XTX a nd X Ty a re given by (6.2.6) OrdInary Least Squares EstImator (OLS) T he sum o f squares function used for ordinary least squares with the linear model '1 - XIJ is symmetric ~. XI'Yll (6.2. 1) X Ty_ which is a q uadratic form . T he principle of least squares asserts that a set of estimates of parameters c an b e obtained by minimizing SLS. T his is accomplished by setting the matrix derivatives o f SLS with respect to I) [ l :X"YI where the summations are from I t o n. (6.2.7) 136 CHAPTER 6 MATRIX ANALYSIS FOR UNEAR PARAMETER ESTIMATION . .' T here a re m any p ossible a pplications 0 f O LS . O ne is t o find a c urve . A h . t o find p arameters w hich is " best" fit t o s ome a ctual d ata. n ot e r IS . h aving p hysical significance. O ne c an a lso use O LS t o pas~ a n a pproximate c urve t hough s ome p oints t hat h ave b een a nalytically derived. 6.1 LEAST S QUARES ESTIMATION S olution The model can be written in the more familiar form where Y = log N u, 11 = E ( Y), a nd I - log Re. In terms o f Y a nd E xample 6.2.1 Give the components of the XTX and XTy matrices for the model 'I, = PI 'i Y; + P2 1, + Pll/ -I - 0.347 2 0 - 0.076 3 I 0.263 4 2 0.708 5 3 1.20 I, the data are 6 4 1.75 7 5 2.39 which is equally spaced in I . The X T matrix is given by S olution T In this example X is X II '2 I~ I. X= II • I II symmet;ic It'l I I:l , X Ty= Itj~j [IV 1 E xample 6.2.2 2 Also give N u as a function o f Re a nd compare with the below given values. fv Nu I 3 9 I 4 16 ~l 25 56 224 ] , 980 X Ty_ 5.885 ] 24 .57 [ 101.3 The inverse of the symmetric XTX matrix is the symmetric matrix (see (6.1.11»), - 1176 3724 - 784 -7~1 196 Then from b u-(XTX)-IXTy we find the estimated parameters to be 1 0gNu= PI + P210gRe+ III (JogRe) + f Re 14 56 224 I llY j I t: Many experimentally based heat transfer correlations are given by the Nusselt number (Nu) as a function o f the Reyr.olds (Re) and Prandtl (Pr) numbers. One such correlation is for flow normal to heated cylinders and wires. Even though Nu may vary several orders of magnitude. typically the variation in the measurements for Nu is ± 30% irrespective of the magnitude of Nu. On a log- log plot of Nu versus Re for the geometry mentioned. the data appear to be described by a second d ree curve and the variance of logNu is nearly constant. For the below values g :!n by Welty (8. p. 268). use OLS to find the parameters in the Iinear-in-theparameters model . - .J [I~ 56 [ n It, I 2 4 11 X TX_ x Tx= I 0 0 Note that the rows in X T (or columns of X ) are not proportional o r even nearly so. This indicates that little difficulty will be encountered due to XTX being nearly singular. Using the results o f Example 6.2. 1, the XTX a nd X Ty matrices are found to be and then ,... c[ -l 0.1 0.45 \0 0.84 1.83 102 loJ lo-t 105 5.1 15 .7 56 .5 245 bl"s - [ - 0.0734 0.314 0.0358] In o rder to express the model in terms o f N u, let I llz:logB which produces A... 0.844. With B in the log Nu expression we can write N u , . BReb2 + b , I oaR. ... 0.844 ReO.ll .. + 0.03511", R . Values obtained using this expression are given in the I hird column o f Table 6.1. A comparison of the recommended a nd N u values shows an agreement within ± 3%. Note that the largest residual ( Nu- Nu) occurs at R e-lQ5 a nd is 0.9. which is much larger than those for small Re. I f a n OLS analysis on N u ( and not 10gNu) had been used. the magnitude of the residuals would have been more uniform over the Nu range, causing the relative differences in columns two a nd three of Table 6.1 to be much larger for the small Nu values than the large values. Hence in this case in which the relative differences in the given a nd predicted values o f N u were to be nearly constant, OLS estimation o n log Nu produced the desired result 2JI CHAPTER' MATRIX ANALYSIS FOR LlN£AR PARAMETER £S11MAnON U whereas O lS on Nu would nol. Furthermore the transformation from Nu and Re to 10gNu and log Re. respectively. results in a linear estimation problem while the problem in terms of Nu and Re is nonlinear. These data are considered further in Example 6.3.1. T able 6.1 UAS1' SQUAUS I'S11MADON s ince AX - I. T hen c ov(bu) i l c ov(b u ) - E[ ( bu - fJ ) (bu - fJ )T] - E[ (IJ+Ae- fJ)(IJ+Ar- fJ )T] -A-ItA T Given a nd O LS Values l or Nu; l or Example 6.1.1 since -It- E(u T). (6.2.10) Utilizing t he definition o f A t hen gives Nu, Re 0.1 I 10 100 lol I~ lOS 6.1.1 Given Values 0.45 0.84 1.83 5 .1 15.7 56.5 245 (6.2.11 ) Values 0.4452 0.8445 1.889 4.983 15.50 56.85 245.9 OLS in which we also introduce the symbol P u ' W ithout the a dditional s tandard a ssumptions o f u ncorrelated a nd c onstant v ariance measurement e rron (1111--11), t he OLS e stimator does not provide the minimum variance estimator. F or this reason. bLS is said to be n ot efficient. S uppose t hat the s tandard a ssumptions o f 1111-11 a re valid, then ." is given b y a nd we have Mean of the 0 1.8 E stimator In order to obtain the p arameter v ector b lS it is n ot necessary to invoke any of the standard assumptions. However, if some statistical statements are to be made regarding b LS • we m ust know several facts regarding the errors. Let there be additive. zero mean e rrors in Y a nd let X a nd P b e nonstochastic (11----11). T hen the expected value o f bLS is which is the m inimum c ovariance matrix o f bLS a nd t hus for these assumptions, which include additive, zero mean, un c orrelated, a nd c onstant variance e rrors, bLS d oes provide a n e fficient estimator. (See Section 6.1.1 o n t he G auss-Markov t heorem.) I n a ddition t o n eeding the covariance o f t he parameters, o ne m ay desire the covariance matrix o f a c ollection o f p redicted points o n the regression line o r s urface. U sing (6.1.46) with t he a ssumptions o f a dditive, zero mean e rrors a nd n onstochastic X a nd fJ (11----11) we find for the set o f points represented by Y- X,b LS, £ (bLS) = E [ (XTX) - 'XTy] = (XTX) - 'XTE(Y) = (XTX) - 'XTXP_ fJ ( 6 .2.8) With the four assumptions given above. the least squares estimator is thus shown to be unbiased. 6.1.3 I Variance-Covarlance M atrix o f bLS (6.2.12a) I See Fig. S.1. F or i ndependent, c onstant v ariance e rrors this expression reduces to I , (6.2.12b) 1 W ith the use of the four assumptions denoted (11----11) a nd utilizing (6.2.8) we c an find the covariance of b LS. Let bLS = A(XP+ t ) = fJ + A r cov(Y) - cov(X,'b u ) - X, c ov(bLS)xf J (6.2.9) I I i T he d iagonal elements o f this square matrix give the variances o f 'f"~ F or a 140 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION typical element o f (6. 2. 12b) see (5.2.41 a), which is f or the simple model TJ = f30 + f3 I X . 6.2.4 141 6.1 LEAST S QUARES ESTIMATION a q uadratic form in terms o f f . T hen using (6.1.50) with £ (£)-0 a nd cov( £) ... 0 21, the expected value o f RLS is E (RLS ) = t r[ 1 - X(XTX) -IXT]02 Relations Involving the Sum of Squares of Residuals F or o rdinary least squares with the linear model1J = XIJ, the minimum sum of squares is (6.2.13) which is the ~um of squares of residuals. ( It is also called error sum of squares and designated SSE.) By expanding R LS a nd employing the bLS expression given by (6.2.5), we can also write (6.2.14a,b) = t r(I )0 2- tr[ (XTX) -I(XTX) ]0 2=(11- p)02 (6.2.18) I t follows that a n unbiased estimate o f 0 2 (provided the assumptions 1111--11 are valid) is R_ s 2=a 2 _ -LS n -p (6.2.19) This is a generalization o f the results in Chapter 5 where p = I a nd 2. A s ummary o f the basic equations for OLS estimation is given in Appendix B. since (6.2.4) and the scalar nature of R LS can be used to obtain yTXb LS = b[SXTy = b[sXTXb LS (6.2.15) The expression given by (6 .2.14a) can be convenient to use for numerically evaluating RLS because the individual residuals need not be calculated a nd also because XTy is known from the b LS calculation as indicated by (6.2.5) a nd (6.2.7). F or the standard assumptions of additive, zero mean, constant variance, uncorrelated errors; errorless X matrix; a nd n onrandom parameters (1111--11), the expected value of R LS given by (6.2. 13) is o btained next. Substituting (6.2.5) into (6 .2.13) gives R LS = ( y - X(XTX) - IXTY ) T( y _ X(XTX) - I XTY ) y T(I_ X(XTX) - IXT)y I f in addition to the assumptions used above, the errors f l , ; = I, . .. , n have a normal probability density, several theorems can be s tated regarding RLS a nd b LS ' T hey will be presented without proof. Theorem 6.2.1 The quantity RLS/ a 2 has the ~emm x2(n - p) distribution. 6.2.2 The vector bu -IJ is distributed as N (0, a 2(XTX) - I). 1beoremm 6.2.3 Let Obi be the ith diagonal element of (XTX)-IS1. Then the quantity (bt • LS - Ili)/ Obi has the t (n - p) distribution. =y T(I_ X(XTX) - IXT ) T ( 1_ X(XTX) - IXT)y = 6.2.5 DlstributloD of R LS and bLS (6.2. 16) since 1 - X(XTX)-IX T is symmetric a nd idempotent. Let us substitute XIJ + f for Y in (6 .2. 16) to find (6.2. 17) Another theorem is c oncerned with testing whether certain subsets o f the parameters are needed in the model. Let the parameter vector fJ be p artitioned so that IJ T=[fJr fJJl a nd let X b e p artitioned conformably so that X =[X 1 X2 ]. T hen Y c an be written as (6.2.20) where IJI c ontains p - q elements, say, a nd fJl contains q elements. Assume that the hypothesis to be tested, simultaneously specifies all the compo- U2 CHAPTER 6 MATRIX ANALYSIS FOR U NEAl P ARAMEttR f STIMAnON :. nents of 112' O ne possible hypothesis is t hat fJ2 = 111 where fJf c ould be a z ero vector. T he test is b ased upon the relative reduction in the residual sum of squares when all p p arameters are estimated as compared with the ...residual sum o f s quares when 11. is estimated with 112 =11!- If the reduction .. js large. the hypothesis that 11 2 '" 111 is untenable. .. T he least squares estimate for all the parameters is the usual expression (6.2.21) which has the associated residual sum of squares corresponding to (6.2. 11) of U L UST SQUARES I'.S11MAnON used t o investigate whether the model should include the X2fJ1 t erms; in this case if the ratio in Theorem 6.2.4 is greater t han F ._.(q,n-p), t hen we h ave a n i ndication that the X2112 t erms may be n eeded . Example 6.1.3 A solid copper billel 1.82 in. (0.0462 m) long a nd I in. (0.0254 m) in diameter was heated in a furnace a nd then removed. Two thermocouples were attached to the billet. The temperatures. f " given by one thermocouple are given in Table 6.2 as a function o f time. See also the plot o f f , versus time shown in Fig. 6.1. T he other thermocouple gave values about 0.2 P (0.11 K ) larger for smaller times and decreased gradually to about 0.14 P (0.08 K ) larger a t 1536 seconds. F or a c onstant temperature environment the thermocouple temperature readings have a n estimated standard deviation of 0.03 FO (0.017 K ). (6.2.22) Now suppose that 112 were set equal to b •. LS( 111> a re obtained. 111 a nd estimates of 11 •• d enoted by Theorem 6.2.4 T he statistic R (bl.l.5'~ ' LS)} / q R ( b l. LS.b2.1s)/(n - p) o 96 192 288 384 480 576 672 768 864 5 6. 7 112 - 111. Now we give the theorem. ).lIrJ - ( sec) I 3 4 (6.2.23b) { R [bl.Ls( l it T ime 2 which produces the R expression. F- f" Observation No. (6.2.23a) when Table 6.1 D ata'or Example 6.1.3 l lR / q - R (bl.LS.bz.LS)/(n - p) (6.2.24) has the F 1_.(q.n - p) distribution. See Section 2.8.10 for an F table. Also see Section 5.7 for other discussion. F or a proof. see Goldfeld a nd Q uandt (9. p. 46). In using Theorem 6.2.4. the hypothesis that 112" 111 is tested a t the a level o f significance by comparing F o f T heorem 6 .2.4 with the critical value F ._ .. ( q.n-p). I f this critical value is exceeded. the hypothesis that 111 -111 is rejected. N ote that the particular null vector of 111'" 0 c an be 8 9 10 II 12 13 14 IS 16 17 9 60 1056 1152 1248 1344 1440 1536 T emperature o f Billet (OF) 279.59 264.87 251.53 239.30 228.18 217.24 207.86 199.36 191.65 184.44 177.64 171.41 165.04 159.89 155.19 150.78 146.68 Find a satisfactory power series approximation to the temperature history o f the billet given in Table 6.2. Assume that all the standard assumptions except known a l are valid. Use the F test at the S% level o f significance. z.u CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION " 280 410 0 0 0 0 .2 0 0 0 " 10 400 \. 260 390 \ ..". L 0. - 0.1 - 0.2 ' ;; " '0 -:;; 8. \ Table 6.3 \~ ,~,,,""" Degrees of Freedom (d.f.) R(PLS)' Residual Sum of Squares Mean Square. S2 (S2= R ld.f.) I lR I lRIs2 16 15 14 13 12 II 27624.667 894.491 16.1613 1.0959 0.7186 0.5781 59.633 1.1544 0.0843 0.0599 0.0526 26730.176 878.33 15.0654 0.3773 0.1405 448.3 760.9 178:8 6.30 2.67 200 360 180 350 160 340 F= 6 t • 96 seconds 140 Figure 6.1 Sum of Squares and F ratio for Example 6.2.3 I 2 3 4 5 6 ::0 .i ... OJ E >< to calculate the parameters and residual sum of squares. R . for p a nd p - I parameters for p "" 2. 3. 4. .... Then an F test based o n Theorem 6.2.4 is used. In each case we have q = I a nd the hypothesis is / 3;=0. T he sum of squares. R LS• for p -I to 6 are listed in Table 6.3. T he mean square (RLS divided by the number of degrees of freedom) is also given. The F ratio is the I lR value in the table. R (bl . .... b p)-R(bl . .... b p-I). divided by the mean square. S2. This ratio is compared with F. 9S (1. 1 7-p). which is 4.75 a nd 4.84 for p -5 a nd 6. respectively. Then a t the 5% level of significance. the p "" 5 model is selected. (This decision is based on the standard assumptions. 11111011. being valid.) The parameters correspond· ing to equation a in English units are 279.660. - 15.451.0.71350. - 0.023024. a nd 0.00039479 for b l . ... • b,. respectively. 0. L O ....J 0 0 145 L EAST S QUARES E STIMATION p L L 0 . "0 - 0.1 ' ;; ::0 0 - 0.2 -':;; - 0.3 OJ '" - 0.4 \ \. .; ..... ~Res1dU'1 ~ 0 '\ "- 0 370 0 0 \ 220 .; , \ "\ ' 240 380 >< 0 0 .1 0.1 0 U 0 6 8 10 12 No. o f t1me s teps. 1 14 The residuals for the p = 5 model are shown in upper portion of Fig. 6.1. They seem to be somewhat correlated and are even more so for smaller time steps. (See Fig. 6.6.) T he assumption of uncorrelated measurements may not be valid. Hence the selection of a five-parameter model based on the uncorrelated errors assumption may not be correct. The p = 4 model (cubic in I) might be actually more appropriate. The inverse of the XTX matrix is .16 Temperatures and residuals for cooling billet example (Example 6.2.3) Solution The suggested model for the temperature in this case has the for m 0.7853 Y= /31 + /32/+ /33 /2 + . .. + /3plp-l+ e XTX I f the actual values of I as given in Table 6.2 are used. the components of the matrix would have disparate values. For example. the first diagonal term is 17 and the fifth is 2.07 x lOll. For this reason. the model is written as I I Y=/3I+/32 ( - II ) + . .. +/1P ( -II I I ), -1 +l ( a) where I II is 96 sec. There are many ways of determining the "best" model. One way for this model is (XTX) - I = symmetric - 0.5581 0.6697 0.1174 - 0.1709 0.04739 - 0.00945 0.01525 - 0.00044 4.305 x 1 0- 4 2 .58x 1 0- 4 - 4.43 x 1 0- 4 1.34 x 1 0- 4 - 1.32x 1 0-' 4 .13x 1 0- 1 I f the assumptions 11111011 are valid. the estimated covariance matrix of bLS is ( XTX)-t. given above. multiplied by S2. which is 0.0599 (the mean square value in Table 6.3 for p = 5). From these values we can find. for example. that the estimated standard error of hi is (0.7853 (0.0599»)1/2=0.217 a nd that of b , is 0.000157. A 246 ".: comparison of these values with b l a nd bs shows that the relative uncertainty in b , is much greater than for b l' T he value of ., - (0.0599)1/2 - 0.245 FO is worthy of comment. First, for p - 6, " a nd 8, its value would not decrease greatly. Next, this value of 0.245 is considerably larger than 0.03 found for a constant, low temperature environment. I t is, however, d ose to the difference in temperature between the two thermocouples. At the higher temperatures the specimen's temperature may be more sensitive to air currents, etc. Also the calibration error over the whole temperature range is greater than 0.03 P . F or these reasons additive errors with as large a standard error as 0.24 are quite possible. J41 6.1 LEAST SQUARES £ STIMAnON CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER t snMAnON 2.4 • • 2.3 • .... , +' .... , 2.2 - 13 2.1 lS! . .... • • •• ~ • +' Example 6.2.4 I II Using the model for temperature found in the preceding example. estimate h in the differential equation. dT pc V--hA(T .., - T) dt This equation describes the temperature T of the billet. assuming that there is negligible internal resistance to heat now. The density. p. of copper is 555 Ib m/H3 (8890 kg/m3); c is the specific heat and has the value of 0.092 B tu/lbm·oF (0.385 k J/kg.K); V is volume; A is heated surface area; h is the heat transfer coefficient; and Too is the air temperature. S olution Solving the above equation for h gives • h- pcV dY A(Too- Y) dt where the true temperature T has been replaced by the estimated value Y. T he ratio V / A is found to be 0.01634 H (0.00498 m). The temperature Y a nd the derivative is obtained from the power series model with p - 5. The derivatives are calculated using c S! - 14 • • ...... ::J N . · 2. •• 2.0 .... . c S! • - 11 • 0 F lpre 6.1 2 • 4 . 6 , , . 8 10 12 Time step fndex, f • , 14 16 Heat transfer coefficient u a function o f time for Example 6.2.4. ( i-td4t). One advantage of this method o f analysis for this problem is that it is not necessary to specify a functional form for hi versus time. Another is t hat the estimation for hi is a linear problem. Some disadvantages are that (1) a functional form for T is needed; (2) sometimes this method yields h values that are more variable than are expected; a nd (3) the method is not "parsimoniOUs" in its use of parameters. The method that is usually recommended for this problem involves the solution of the differential equation given above. This,would give T as a nonlinear function of time and h a nd thus nonlinear estimation would be needed. See Section 7.5.2. 6.2.6 Weighted l ast Squares ( WLS) ("'> where the parameters are given in the preceding example a nd tot - 96 sec or 0.02667 hr. The temperature Too is the room temperature which is nearly constant at 81.5 OF (300.6 K). The hi estimates are depicted in Fig. 6.2. From knowledge of the heat transfer process we know that the magnitude of the hi values are reasonable and that they should drop with the temperature difference, T - Too· ~ • 1 .9 1 .8 ...E I - 12 There are cases when the covariance matrix of the errors is not known but yet the experimenter wishes to include some general information regarding the errors. He might know, for example, that as Y increases in magnitude the variance of ~ also increases. This was the c~se discussed with the heat transfer example in Example 6.2.2 in which o J Y/ was about constant. I t might be appropriate to assume that some symmetric weight~ng matrix IrJ is given. This matrix m ayor may not be equal to " ,-I, the Inverse o f the error covariance matrix. 2a CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER F SllMAll0N In this estimation problem, the function to minimize for the linear model o f" = XP is S WlS = (Y - Xp ) T" ,(Y - Xp ) 6.3 ORTIIOCONAL POLYNOMIAlS IN OLS E S11MAll0N F or these orthogonal polynomials, OLS yields (6.2.25) which yields the estimator (6.3.4) T he first few orthogonal polynomials satisfying the above conditions are bWLS = (XT",X) - I XT" ,y (6.2.26) Using the standard assumptions of additive, zero mean errors a nd errorless X a nd p (11----11), the covariance matrix of b WlS can be shown using (6.2. 10) to be (6.3.5) (6.2.27) where t =Itdn. Notice that the p itl) values are dimensionless a nd a re functions only of n (for given j a nd ; values). A recursive scheme for obtaining Pj+ 1(1;) in terms of Pj(t/) is 6.3 ORTIIOGONAL POLYNOMIALS IN O LS ESTIMATION The polynomial regression model with evenly spaced (in t) observations can be analyzed with certain computational simplifications through the use of orthogonal polynomials. There are many different sets of orthogonal polynomials that could be employed. Some are appropriate for use with uniform spacing in the variable t. Others have been developed for other particular spacings a nd for weighted least squares, Himmelblau [10, p. 152}. A set of orthogonal polynomials is given below for OLS a nd uniform spacing. Suppose that the model is given by the polynomial which is a function of a single independent variable, ) -1,2, . ..• ; -1; ; = 1.2•...• n ( 6.3.6) -en The starting conditions o nj a re pJ,.tl ) = 1 a nd P I(/;) =; + 1 )/2. I f desired, after the parameters ao,al, . .. , a, have been estimated, one c an find the Po, ... ,p, from the cij values. F or example, for r -O, I, a nd 2 we have [since (6.3.1) 'a nd (6.3.2) a re identical polynomials in I) r=O: Po= ciao (6.3.1 ) a nd that Y; has been measured at n values of t; + I = t ; + I lt. One can rewrite this model as t; which are evenly spaced or r =l: r =2: (6.3.7) (6.3.2) -! ' where p i/) is a polynomial in I of order j . The PJ(/) functions are chosen so that they are orthogonal, that is, f or) a nd k = 0, I, . ..• r, ' " ~ PJ(/;)Pk(t,)=O E xample 6.3.1 f or) =1= k ; -1 =1=0 f or)= k (6.3.3) Using the evenly spaced in log Re data of Example 6.2.2. estimate af» . .. , a, and compare the results with those in Example 6.2.2. Let r ~ 3 which causes 11 to be a cubic in I . ~ CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER E STIMAnON U O RmOGONAL POLYNOMIALS IN OLS E STIMAnON l SI The models for , -0, I, a nd 2 in terms of PI(I/) are Solullon Using the orthogonal polynomials given by (6 .3.5), we can find for the given data . . (the i, ' I' Y; table of Example 6.2 .2) that n ..,7 , l :!,-I, a nd X is given by x- 5 0 -3 6 6 0 -6 -6 6 -4 -3 0 5 , -0: which has the simple inverse These are the X, XTX, and ( XTX)-' matrices for r -3 and for any set of seven equally spaced values of I . The XTy vector becomes ~- -0.07340+0.3138611+0.0357911 where the parameter estimates change in most cases as , is increased. In the above example the X matrix for seven equally spaced observations contained only whole numbers. I f a nother column were added, fractions would enter. T o avoid fractions. tables have been prepared ( II , 12) which contain factors ~ a nd elements +1/' F or the n '" 7 case above. a table frequently has the information as given in Table 6.4. The component XI) o f X is related to +iJ by 5.8846] [ P j(I/)-XI/= 3.0066 - 0 . 1516 Then from either ci=(XTX) - 'XTy or from (6.3 .4) we find 0.84066] 0.45703 [ 0.03579 -0.00070 Since the polynomials p ,(/) are orthogonal, the a /s are independent. That is, regardless of the value of r in (6.3.2) we find ao - 0.84. The residual sum of squares as r increases can be easily calculated from the above information. The total sum of squares, y Ty, is given by YTy _ I YI- -0.07340+0.4570311 , -2: 12 .7968 X Ty_ c i- YI -O.84066 , -1: 0] 0 0 216 YI-O.84066+ 0.45703p, (II) +0.03579p2(11) where we note that each parameter iii is unchanged as , is increased. In the standard form given by (6.3.1) this is not true; using (6.3.7) yields Note that X is composed of orthogonal vectors which results in XTX reducing to the diagonal matrix, o 84 o YI-0.84066+0.45703p,(II) , -2: -6 f /- 0.84066 , -1: -3 -2 -I 0 I 2 3 , - 0: Y /- 10.90349564 The residual sum of squares can be found using RLS - yTy - cr TXTy . +iJ A; j =O.I • ... , ,; (6.3 .8) i = 1.2. ... , n 'j Table 6.4 Orthogonal Polynomials for n - 7 +;0 I 2 3 4 5 6 7 DI ~ +/1 1 1 1 1 1 1 1 7 -3 -2 -I 0 1 2 3 28 4»/2 5 0 -3 -4 -3 0 5 84 I 4»;) -I 4»;4 3 1 1 0 -7 I 6 -I -I I 6 , i 1 -7 3 ._---- - ---- 154 7 il 151 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION a nd Dj is related to the ith d iagonal term of XTX b y n D. 'Ii:' 2 ) £.J X,,= >,.2 ; -1 (6.3.9) j Using these relations in (6.3.4) gives m=O, I, . .. , >"0= I (6.3.10) Some o f the advantages o f the use of orthogonal polynomials a re the following. First, n o difficulty c an o ccur in finding the inverse o f the XTX m atrix because it is a d iagonal matrix with nonzero diagonal components. I f r is i ncreased by one. (XTX)-I is c hanged merely by the addition o f a d iagonal element a nd the sum o f s quares RLS is r educed by ( L Y;Xir ) 2 LXj~ Second, for a n umber o f cases, the terms o f XTX a re tabulated, thereby reducing the n umber o f calculations. Third, the p arameters ao,a l , . . · a re unchanged as additional values are estimated. T hus o ne c an easily a dd a dditional parameters by increasing the degree o f the orthogonal polynomials. Fourth, when orthogonal polynomials are used, there is usually less accumulation of rounding errors in the calculations o f the estimates o f t he parameters. A d isadvantage o f o rthogonal polynomials is t hat t hey are n ot c onvenient for sequential estimation; one c annot easily a dd a nother o bservation. See Section 6.7. A f urther disadvantage is t hat polynomials, orthogonal o r not, are limited in their ability to fit functions. There is a t endency to think that orthogonal polynomials c an b e used to fit a ny single independent variable data, provided the degree is h igh enough. T he m odel might b e i mpractical, however, if the degree is h igher t han t he fourth o r fifth, say. The intervals between the d ata p oints could have predicted Y values that are quite oscillatory. F or curves t hat seem to require even higher degrees, splines a re r ecommended (13). 6.4 FACTORIAL EXPERIMENTS 6.4.1 Introduction In Chapter 5 we found that the best design for estimating f3 1 o f M odel 3 is o ne in which one half o f the observations are taken a t the smallest 6." FACTORIAL EXPERIMENTS p ermissible value o f Xi a nd t he o ther h alf a t t he largest. T his is a r ecommended p rocedure when the model is k nown t o b e l inear in X a nd w hen the s tandard a ssumptions indicated b y 1111--11 a re valid. I n ~ther cases there may b e several i ndependent v ariables X li' X 2i ,oo.,X" w hich are commonly called factors. T he s election o f p rescribed values o r levels o f the factors constitutes a design. T here a re m any possible designs discussed in the st~tistic~1 lite~ature. T he o ne t o b e d iscussed below is the complete factonal deSign With t wo levels for e ach f actor. F or o ne f actor a s in Model 3, the two levels are the lowest a nd h ighest permissible values o f X .. A f actor is termed quantitative w hen its possible levels c an be ra~ked in m agnitude o r qualitative w hen they c annot. E xamples o f q uantitative factors a re t emperature, p H, R eynolds number, a nd time. Examples o f q ualitative factors a re t he type o f c atalyst used a nd t he presence o r a bsence o f a p articular additive. 6.4.2 Two-Level Factorial Design Consider the model (6.4.1 ) where each x ij c an a ssume o nly t wo values, termed " high" a nd " low." T here a re t hen 2 ' possible c ombinations o f f actor levels which constitutes a complete 2 ' factorial design. E xample 6.4.1 In a certain chemical process the effects of three operating variables on the overall process yield are to be studied. These are temperature (Xii)' type of catalyst ( xi 2), a nd pressure ( xi 3)' Over the ranges of temperature and pressure studied it is known that the yield is linear in both temperature and pressure and thus two levels of each can be chosen. Also two different catalysts are chosen. The following levels are selected: Low Level Temperature Type of catalyst Pressure Obse~~ High Level 4200K B 2 .2x I WN/m2 that two factors, temperature and pressure, are quantitative and the remalDlD~ fa.ctor,. type of catalyst, is qualitative. A 23 experimental design for this example IS given ID Table 6.5. When each factor is set at a certain level a nd a test is performe~. the result is t~rmed an experimental run. T he runs are not'necessarilY performed ID the order given; rather, the order of the actual tests should be random. " ~. .W ... c:1) •• 254 C HAPTER' MATRIX ANALVSIS FOR LINEAR PARAMtTER E STIMAnON Table 6.~ Run Number A I I Experimental Design l or Example 6.4.1 Catalyst Type ( K) I 2 3 4 s 400 420 7 8 2.0x 10' 2.0x 10' 2.0x 10' 2.0x 10' 2.2x 10' 2 .2x 10' 2.2x 10' 2.2x 10' 420 6 Note that X given by (6.4.2) contains orthogonal columns. This matrix can b e conceived of as representing a particular design for a two-level complete factorial design for three factors. F or o ne factor the upper left 2 X 2 matrix in X is the X matrix for a complete factorial design. F or a two-factor, two-level complete factorial design the design matrix, if interactions are assumed zero, is the upper left 4 X 3 matrix. F or more than three factors, the pattern of the elements of X in (6.4.2) can b e repeated. F or example, if there are four factors a fifth column would b e a dded which would b e a vector o f eight - I's followed by eight + I 's; t he second column would continue the - I, I, - I, I pattern for a total o f ~ - 16 terms, etc. Pressure ( N/m 2 A A B B A A B B Temperature 420 400 420 400 400 6.4.4 Inclusion 01 I ntendloa Terms In tile Model 6.4.3 Coding the Factors I f X is composed of column vectors that are orthogonal, XTX is a diagonal matrix. Hence the analysis can proceed in a similar manner as for orthogonal polynomials. A set of orthogonal column vectors can be o btained by coding. For quantitative factors the coding is typically given by actual temp - 0.5 (high temp + low temp) h t·gh temp - Iow temp Coded t emp-2 which produces either + I or - I. The qualitative factors are also assigned values of + I a nd - I; for example, catalyst A could be signified by - I a nd catalyst B by + I . For the case of three factors such as in Example 6.4.1 the X matrix for the 2) factorial design can be given as XIO XII x I2 X I) I -I -I -I X- I I I I I I I I -I -I -I I -I I I -I -I -I I I -I I -I I I III (6.4.2) In terms of (6.4.1) a nd Example 6.4.1, the first column is a set of coefficients of flo, the second for coded temperature, the third for the coded catalyst type, and the last for coded pressure. T he complete 2 ' factorial design can b e used in cases for which interaction o r cross product terms are included in the regression model. An example is (6.4.3) where fllxllx;2 is a n interaction term. The X matrix remains orthogonal. F or the model given by (6.4.3), the design matrix X can b e x-[ i -I I -I 1 -I -I 1 1 -: 1 -I I (6.4.4) where the fourth column contains the products x/ lx/ 2; X /I is given in the second column and x12 in the third. In such a design as indicated by (6.4.4) there are four observations and four parameters and thus the predicted values Y/ exactly equal Yj" Hence no estimate of 0 2 c an b e obtained. I f each run is replicated the same number of times, then 0 2 c an b e estimated. One should be careful to perform the experiments in a randomized order. 6.4.5 Estimation We have designed the orthogonal nature o f X for use with ordinary least squares. The OLS estimator is given by (6.4.5) Since X is orthogonal and contains only + I o r - I terms, the inverse of 256 C HAPTER' MATRIX ANALYSIS F OR LINEAR PARAMETER ESTIMATION XTX is easily found. S uppose t hat t here a re f our p arameters ( three f actors) so t hat X is given by (6.4.2) a nd t hen 6.4 F AcrORIAL EXPERIMENTS plus or minus one as indicated in XT. T he resulting predicted equation is . Y -54+ 11.5zll-2.5za+0.75zI3+0.5zIIZ,'2+4.75z/IZ;3 I XTX = d iag [ 8 8 8 8 ] = 81 (XTX) - I = - 0.25zaZ13 + 0.25z;IZ2Z;3 ,' .! I 8 Example 6.4.2 Estimate the parameters using OLS for the experimental design given in Table 6.5 and the Y; values given by Since there are eight observations and eight parameters, the calculated Y values I and the measured values Y; are equal; hence the sum of residuals is zero. ( b) T he effect o f each term in the model can be examined independently because the design is orthogonal. This is an important benefit of such designs. From (6.2.14a,b) the reduction in the sum of squares is given by 7 bTXTY=bTXTXb=8bTIb=-8 ~ y T =(49 62 44 58 42 73 35 691 where Y; is in percent. (0) Estimate the parameters in a linear model permitting interaction using OLS. ( b) Suppose that the standard assumptions designated 11111011 apply. (These include normal. independent, constant variance errors.) Also assume that an estimate of the variance of Y; is .r 2 = 2.5 which we shall consider to have come from a separate experiment and to have 20 degrees of freedom. Using the F statistic find a "parsimonious" (in terms of the parameters) model. Solution ( 0) The complete model including all the interaction terms can be written in terms of the coded factors z ij as 11;= f3o+ f3l z jJ + f12 Zi2+ f13 z;3+ f34Z; IZ;2 + PSZ/IZ,l where Z;I' z,2. and Z;l correspond respectively to coded temperature. catalyst. and pressure. The X matrix then becomes 1n teraction terms Main effects -I I -I X= I -I I -I I -I -I I I -I -I I I -I -I -I -I I I I I I -I -I I I -I -I I I -I I -I -I I I -I -I I -I I I -I I -I bI k -II -I I I -I I -I -I I ~ where XTX-81 is used. The sum y Ty is 24624. The residual sum of squares ( R), reduction in R, a nd the F statistic are given in Table 6.6. The F statistic is (~R/I) divided by .r 2( = 2.5). At the 5% level of significance, F.9S(l,20) - 4.35. (.r 2 was stated to have 20 degrees of freedom.) We have a n indication that only the first, second, third, and sixth parameters are needed since only for these is F > 4.35. The first parameter is Po. the second is for coded temperature, the third is for the catalyst, T able 6.6 S um o f S quares and F ratio for Example 6.4.1 R, Residual No. of I 2 3 4 5 6 7 8 Sum of Squares ~R F 1296 238 188 183.5 181.5 I 0.5 0 Parameters 23328 1058 50 4.5 2 180.5 0.5 0.5 9331.2 423.2 20 1.8 0.8 72.2 0.2 0.2 a nd the sixth is for the interaction between temperature and pressure. Since the pressure enters through the interaction term, we also include the linear pressure term (fourth parameter). We assume that these provide a parsimonious set of parameters. The model then is YI - 54+ 11.5zll - 2.5z;2 +0.75z;3 +4.75zII Z 13 or, in terms of the original factors, which contains eight orthogonal columns. The inverse of XTX is (1/8)1 and the XTy terms are obtained simply by taking sums of the Y; terms each multiplied by a 251 CHAPTER 6 M AnlX ANALYSIS FOR LINEAR PARAMETER f S11MAnON ... here T is temperature in K. C - -I for catalyst A. C -I for catalyst B. a nd P is pressure in N/m2. This expression can be simplified to Y -3656 .5 - 8.825T- 2.5C-0.OOI94P+4.75 x I O-'TP The predicted values of T able 6.7 Run No. 1 2 3 4 5 6 7 8 Y are given in Table 6 .7 along with the residuals. I P redicted V alues a nd R esiduals f or E xample 6.4.1 Observed Y; (%) Predicted YI (%) Residual Y ;- Y, (%) 49 62 44 58 42 73 35 69 49 62.5 44 57.5 41 73.5 36 68.5 0 - 0.5 0 0 .5 1.0 - 0.5 -I 0.5 U M AXIMUM U KELIHOOD F SnMATOR 6.5 MAXIMUM LIKELIHOOD ESTIMATOR The treatment of the maximum likelihood (ML) method by some authors leaves the impression that the M L method applies only to cases of noncorrelated errors. In the ML method discussed below, however, correlated errors may be present. Also the errors may have nonconstant ,variance. With only these two exceptions, the standard assumptions are used; these are denoted 11--1111. Note that the errors are assumed to have a normal probability density. I t is noteworthy that ML estimation assumes a great deal of information regarding ~. This is in contrast to OLS estimation. 65.1 M L E stimation F or the above assumptions the ML parameter estimator is derived by minimizing the ML loss function given by (6.1.67b). F or the linear model 11 - XIJ we then seek to minimize (6.5.1) 6.4.6 Importance of R eplicates In the above example the significance of each term in the fitted model was tested using an "external" estimate of the pure error variance. A superior estimate of the pure variance may be obtained by replicating some or all of the runs. In order to preserve the orthogonal character of a 2 ' design it is necessary to repeat all the 2' runs the same number of times. It is important. however. that the replicated run is a genuine repeal. Repeatedly measuring the same response does not constitute a genuine replication. I t is better to interpose at least one change in a factor level before a run is repeated to obtain an independent response. A random choice of the order in which the runs are performed is important in this connection. 6....7 O ther E xperfment Designs Many other experimental designs are possible in addition to a complete two-level factorial. For example, if there are two factors, and one is desired at two levels and the other a t three, then we would use a 2 X 3 factorial . design. The XTX matrix can also be made diagonal for this case. Other possible designs are made up of treatments which are selected from among the treatments of a complete factorial design, the selection being made in such a way that the possibilities of eliminating certain parameters are tested in a systematic manner which preserves the possibility of making the remaining tests (14). with respect to the parameter vector IJ. Using (6.1.26b) and (6.1.30) as in Section 6.2.1 results in I bML c (XT",-IXfIXT",-ly By using V - I (6.5.2) Xp + ~ in (6.5.2) and taking the expected value of bML we get (6.5.3) o r bMl is an unbiased estimator of p. The covariance matrix of bMl can be found as in Section 6.2.3. Let bMl-A(XP+~) (6.5.4) Then from (6.2.10) the covariance of b Ml is (6.5.5) Observe that the maximum likelihood estimator is exactly the same as given by the Gauss-Markov theorem (Section 6.1.7). Thus the maximum 160 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION likelihood method produces a minimum variance unbiased linear estimator for the linear model a nd the s tandard a ssumptions given. T he o nly additional assumption required by M L e stimation a nd n ot b y the G auss-Markov theorem is the knowledge t hat the errors have a normal probability density. The covariance of a collection of predicted points o n the regression line o r s urface represented by Y= X1b ML is o btained in the same manner. as is (6.2.l2a). 161 6.5 MAXIMUM LIKELIHOOD E SllMAT OR Then the ratio is which for X i s o I reduces to (6.5.6) E xample 6.5.2 This expression simplifies to t hat given by OLS. (6.2.12b). if 1#t=021. T he d ifference between LS a nd M L e stimators given by (6.2.5) a nd (6.5.2) is clearly the presence o f t he I#t - I m atrix in (6.5.2) b ut n ot in (6.2.5). (Recall t hat I#t - 1 is the inverse o f the covariance matrix of the measurements errors.) I f I#t = 0 21. the estimators are exactly the same. When I#t deviates considerably from the 0 21 c ondition. the LS a nd M L e stimates c an b e quite different. T he LS e stimators c an have variances that a re a rbitrarily larger than those given by m aximum likelihood. O ne case in which this can o ccur is when (6.5 .7) a'nd the o~ terms are quite disparate in m agnitudes such as the set of values o f 0.1. I. 10. 100. 1000. a nd so on. E xample 6.5.1 For . f- I given by (6.5.7) derive an expression for the ratio of the variance of hLS to the variance of h ML for the model TI, = {JX; and simplify the result for X; = I. Solution From (6.2. 11) the variance of b LS is found to be 0.- Investigate the solution in Example 6.5.1 for TI = {J and 1000. etc. 01 = I. 02 O J""' 100. S olution For n =2. the ratio is ( 1/2)2 (101) ( 1.01)=25.5. For n =3. it is ( 1/3)2 (10101) (1.0101)= 1133.7. In general the approximate result is 1Q21"-I)/(.99n)2 which increases rapidly with n . Hence in such cases the M L estimate is far superior to the O LS estimate. E xample 6.5.3 Investigate the solution of Example 6.5.1 for X i= I. 2. 3. and 4 and for 100. and 1000 respectively. 0 ;= I. 10. S olution The ratio for n = J, 2. 3. and 4 are respectively I. 16.68.480.1. and 18610. The ratio does not increase as rapidly for the model TI = {J but still indicates the superiority of maximum likelihood estimation compared with ordinary least squares for unequal error variances. I f the m easurement e rrors a re c orrelated. the I#t m atrix is n ot diagonal. S ome c ases t hat p roduce n ondiagonal I#t m atrices a re those involving autoregressive a nd m oving average errors. See S ections 5.13 a nd 6.9. Typical terms in XTI#t-1X a nd XTI#t-1y for the single response case a re ; ,j= 1.2 . ... , p 'i. = 10. (6.5.8a) while (6.5.5) yields V (b ML ) - [ f k -I -I Xl 2 0 t- ] ; = I . ....p ( 6.5.8b) · 262 CHAPTER' MAllUX ANALYSIS .-OR LINEAR PARAMETER ES11MAll0N where ~ is the k l c omponent of ~ - I. I f ' " - I is given by (6.5.7) the double summati~~s above can be replaced by single summations as in I .}- t.2 .....p (6.5.9a) .~ 1= 1.2... .• p 6.5.1 Estimation of 0 (6.5.9b) 6.5 MAXIMUM LIKELIHOOD ESnMATOR N ote t hat the physical o r s tructural parameters. 11. c an be e stimated directly from (6.5.13) without knowledge o f 0 2, Using these estimated values permits the estimation o f 0 2 from (6.S.12) as d~L - 0' * R adbMd- , f2. (6.5.10) (6.S.IS) A n advantage the m aximum likelihood method is t hat it c an p rovide a direct method for estimating 0 2• T he e stimator a~L is u nfortunately biased. however. An unbiased estimator for 0 2 is 1 One advantage of the M L formulation is t hat i t c an provide a direct method for evaluating certain " statistical" p arameters such as the variance of the errors. F or example. assume that the covariance matrix is k nown except for the multiplicative constant 0 2 ( assumptions 11--1011). *[yTO-'Y-b~LXTO-ly] - '-Radb . .L) n -p on (6.5.16) See Section 6.8.3 for a derivation. Since (6.5.16) is unbiased. it is recom2 0 0. A s ummary o f t he basic M L mended for estimating 0 2 when e stimation equations is given in Appendix B. . T heorem 6.2.4 regarding a n F d istribution can also be s tated for the M L a ssumptions 11--1011 for a linear model. T he statistic += R~dbl.Mt< (11).11:] - where 0 is completely known b ut (72 is u nknown. We start the analysis with the logarithm of the likelihood function as given by (6.1.61a). with R~dbl.ML.b2 . Md (6.5.11) n -N) (6.5. lIa) where (6.5. l ib) T ake the derivative with respect to o l a nd with respect to the parameters Pl . ... ,P, for t he linear model ,,= XII· Set II = bMl a nd (72 = a~l; t hen setting the derivatives equal to zero gives (6.5.12) ; = 1.2, . ...p (6.5.13) where (6.5.14) Example 6.5.4 T he t hermal conductivity, Ie. o f A rmco iron has been measured using a new method discussed in Beck a nd AI-Araji (15J. T emperatures between 100 a nd 3 62°F (311 a nd 456 K) were covered along with power i nputs between 272 a nd 602 W as given in Table 6.8. T he t emperature a nd p ower c an be c onsidered to be measured so much more accurately than is the measured Ie t hat the error will be a ssumed to b e in the Ie m easurement only. Also it is r easonable to assume additive errors. Comparison o f values o f m easurements from different laboratories frequently indicate nonzero mean for t he errors. Nevertheless a zero mean value for f is a ssumed. I E ach Y, value given in the last column o f T able 6 .8 is the average o f four values o f t he conductivity. T he values a re given to varying significant figures because the original values were given to four figures a nd s ometimes the average yielded four digits a nd o ther times more. Notice that there are approximately two p ower levels. a bout 275 a nd 550. T he e stimated s tandard d eviations found using the f our observations for e ach P a nd T for the smaller powers is a bout 0 .278 whereas for the higher power the estimated s tandard d eviation is 0.16. smaller b y a f actor o f a bout 2. H ence the variance for the even-numbered runs of Table 6.8 is a bout f our times those for the odd-numbered ones which a re for the larger powers. 264 CHAPTER 6 MATRIX ANALYSIS F OR LINEAR P ARAMETER ESTIMATION Table 6.8 6.5 MAXIMUM L IKEUHOOD ES11MATOR Data for Thermal Conductivity Measurements of Armco Iron for Example 6.5.4 K 44 Run No. Temp. (OF) Power (W) Measured Thermal Conductivity ( Btu/hr-ft-OF) ,> I 100 2 3 4 5 6 7 8 9 10 90 545 276 602 275 538 274 550 274 522 272 41.60 42 .345 37 .7875 39.5375 36.4975 37.3525 35 .785 36.36 34 .53 33.915 > 161 149 227 206 270 247 362 352 .c., ... ;, .. - '" = (72 diag[ I 4 I 4 I 4 I 4 I 4) where (72 is unknown. In terms of the parameter estimates the statistical parameter need not be known. An estimate of (72 is given by using the standard deviation values quoted in the example. A plot of the data is shown by Fig. 6.3 . I t a ppears that at least a second degree curve in T may be required and that P may not be needed. From physical considerations it is also expected that k is not a function of P . T here are many possible models and a number of ways of proceeding. Some possible models are as follows : ( 72 :~ t -~ t -- 1. 2. 3. 4. 5. 6. k =/1I ' k=/1I+/12 T. k = /1 1 + P P. 2 k=/1I+/12T+/13P. k =/1I+fl2T+/13 T2 . k=/11 + /1 2T+ /13T2+ /1.T 3. One way to build the model is to start with the simplest a nd add o ne term a t a time. As one progresses, the need of adding terms can be assessed by examining the X About • 550w ~ ~ About 275w 70 40 t-J8 65 ::::I~ . ." C . .. c~ u ~ ~ ~ > L 36 .. :E U ::::I .c., c u ~.e: .EO:: .. ' ..... 60 X • ~ e 34 l- The objective is to find a parsimonious model of thermal conductivity versus temperature T a nd power P using maximum likelihood. The errors f , are assumed to be independent and to have a normal probability density. The assumptions mentioned above of additive. zero mean. independent. a nd normal errors in Y, but errorless 7; and P, c an be designated 11011011. We c an write the covariance m atrix", as • 42 32 Solution 75 0 200 100 300 T emperature, l GI .e: ~ 400 OF F 1pre63 T hermal conductivities measured for Armco iron a nd used in Example 6.5.4. reduction in the R~l value and by utilizing the F test based on Theorem 6.2.4. With tit as given above the parameters were estimated using (6.5.2) with the components given by (6.5.9). The mean square. $ 2, is found from (6.5.16). The results o f the calculations including the F statistic ( ... ~R / $ 2) are tabulated in Table 6.9. These F values can be compared with those given by F.9s(I,n - p) which are 5.99, 5.59, 5.32, and 5.12 for n -p=6, 7, 8, a nd 9, respectively. From a comparison of these F values with those in Table 6.9, it appears that the power factor, PI' need not be used. Also Theorem 6.2.4 leads us to select Model 5 because Model 6 has an unnecessary additional term since F =5.729<5.99=F.9s(I,6). However, since these values o f 5.729 a nd 5.99 are so close, it might be that another Table 6.9 Results of Calculations o f R and F for Example 6 5.4 Model No. I 2 3 4 5 6 No. of Degrees of Parameters Freedom I 2 2 3 3 4 9 8 8 7 7 6 R~l 40.0078 4.47506 39.91282 4.18133 1.13092 0 .57852 Mean Square, $ 2 4.445 0.5594 4.9891 0.5973 0.1616 0.09642 ~R" F 1973 8769.36 35.5327 63 .52 0.09495 0.01903 0.29373 0.4918 3.34413 20.70 0.55241 5.729 "~RI - yTy - Rlk I. ~R2 - RllL.l- RllL.2' ~R3 - Rrlk J - RaL.3' ~R4 - RaL.2RaL.4' etc. 266 .. ' < ~ ~ 6.5 MAXIMUM L IXEUHOOD ESTIMATOR CHAPTER 6 MATRIX ANALYSIS r OR LINEAR PARAMETER ESTIMATION .•. Y -47.5191-0 .07084T+9.669x .5 z: 1 O-'T 2 . - e st. s .e.(Y f ) •X• • ) •25 co ~ a nd the estimated covariance matrix of the estimated parameter vector is 450 400 350 300 Ito. ~ N T e.perlwre. K -....•- experiment or a larger sample size might show significance. that is. indicate M<><,fel 6 in preference to Model 5. F or Model 5 the regression equation for Ie is (in English units) 0 1.0 X 0.5 W 0 .... . .>- - 6.J33x 1 0- 2 6 .080 x 1 0- 4 6.741 est. cov(b Ml ) - ,,2 [ . symmetric . ! 1.242 x 1 0- ] - 1.282 x 1 0- 6 2.796 x 1 0-' -.75 " M II where , ,2-0. 1616. T he variances of b ,.Ml' b 2. Ml • a nd b3• Ml a re the diagonal terms. The estimated standard errors (std . dev.) for them are est. s .e.(b,.Ml)- [ 0.1616(6.740 ) est. s.e.(b 2. Ml ) - '/1 6.5.3 0.00991 100 200 Tetnperlture. OF 400 300 Expected Values o f SML and RML E (SML)T he sizes of the estimated standard errors of the bj.Ml·S d o not reveal the relative importance of the terms in describing Ie. T he estimated standard error of b ,.Ml is 1.04. whereas one standard error in b,. Ml gives a value of 2.6 for the T2 term of Y for T -3S00F. Hence the effects on Ie due to uncertainty in bl,Ml a nd b ,. Ml a re about the same. E[ (Y _ XII)T.,,-I(y - - tr[ XII)] - E[ eT.,,-le ] .,,-I cove ] -tr[ .,,-1.,,] - n (6.5.18) where n is the number o f observations. In finding E (RMJ. let Example 6.5.5 RML - (Y-XbML{.,,-I(y - XbMd Y/ :I :... ! II T he expected value of SML (parameters not estimated) given by (6.5.1) using (6.1.50) c an be written w ith." being known as est. s .e.(b,. Md-2.13 x 1 0-' Find the estimated standard error in • X 0 -1.0 F lpre 6.4 Residuals o f thermal conductivity values for Example 6.S ••• - 1.04 1 [0.1616(6.080 X 10- 4 ) ] '/ _ -0.5 • -.25 >- -.5 4 II! ~ (6.5.19) for the data of Example 6.5.4. and use A defined in (6.5.4). Then (6.5.19) c an be written as Solution T he estimated s tandard error of Y/ is found by taking the square root of the diagonal terms o f (6.5.6) with P Ml replaced by est. f ov(h,..d which is displayed just above:,. F or a rbitrary T, a n expression for est. JI (Y/). the ith diagonal term o f est. cov( Y/). is est. JI ( Yj ) - [ PT. + 2 T,Pf2 + T / (P12 + 2 Pf,) + 2 T/P1J + since r,4pfJ ],,2 Introducing Y - XII + e in (6.5.20a) yields where PJ - P"Ja 2. F rom above. note that a typical P: is P f.-6.741. Also , ,2_ 0 . 1616. The est. s.e.( YI)' which is the square root of est. JI( Y,), is displayed as the curve in Fig. 6.4. RML:a e T.,,-I(I- XA)e I (6.5.21) 261 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION 6.6 LINEAR MAXIMUM A P OSTERIORI ESTIMATOR (MAP) where (6 .5.20b) and the following relation are used Solution Since there are 10 o bservations ( n"" 10) a nd t hree parameters, n -p-7 degrees o f freedom for which a table o f t he X 2 d istribution gives (6.5.22) From (6.5 .21), R ML R ML = can be given by P (X2 < 14.07) = .95, £ TI/t -1/2[ 1-1/t - 1/2X(X T - 'xf IXTI/t - 1/2]1/t -1/2£ (6.5.23) I/t This means that only in 5% o f similar cases would R ML - X2 exceed 14.07. Also only in 5% o f the cases would R ML b e less t han 2.107. Since 2 .167< 13 .52< 14.07, we h ave a n i ndication a t the 10% level o f significance t hat the model' a nd y, a re satisfactory. Then taking the expected value of (6.5.23) and utilizing (6 .1.50) we find E (RMd = tr{ [1-1/t-1/2X(X T I/t-IX) - I XTI/t - I/2] cov(I/t-I/2£)} = tr(l)-tr[ (XT", - IXfl(XTI/t - IX)] (6.5.24) The above expression for E (R ML ) can be utilized to check the validity of the model and the I/t matrix. I t is important to have a check because in many cases there is some uncertainty in ' I o r in I/t . F or the assumptions used above (11--1111), R ML has a X2 distribution with n - p degrees of freedom; this then can provide a statistical test. Table 2. 14 gives the X~ value for which p(X2<X~) is a specified value. For example, there is a column of values of X2 for which the probability of X2 being less than X~ is equal to .95, or p (x 2 <xn=·95 (6 .5.25) Example 6.5.6 F or a second set of data similar to those given in Example 6.5.4, ML estimation has been used. There are 10 s eparate measurements of the thermal conductivity as a function o f t emperature a nd power input. T he model used is the one found from the preceding analysis, a nd t he errors in k a re assumed to be independent a nd the y, matrix based on the results o f Example 6.5.4 is y ,=0.16diagI14 I 4 I 4 I 4 1 41 F or the analysis based o n these data, this model a nd y, yielded R ML = 13 .52. Assume that the standard assumptions indicated by 11011111 a re valid a nd investigate the "goodness o f fit" using the X2 distribution. 1 r I An important source of error which could cause R ML to be either too small o r too large (compared to an interval based in the Xl distribution) is an incorrect choice of I/t. F or example, I/t might be assumed to be diagonal while the measurement errors are correlated causing I/t to be nondiagonal. For this reason it is always advisable to investigate if the residuals suggest correlation among the errors. Such correlation" can be inherent in the measurements themselves or a result of selection of an incorrect model. Often the model is dictated by physical considerations. In other cases there may be several models that fit equally well. 6.6 LINEAR M AXIMUM A P OSTERIORI E STIMATOR ( MAP) I ;. 6.6.1 Introduction Maximum a posteriori estimation utilizes prior information regarding the parameters in addition to information regarding the measurement errors. Inclusion of prior parameter information can have the beneficial effect of reduction of variances of parameter estimators. In Section 5.4 two MAP cases are considered. The first is for random parameters. One convenient way to visualize this case is to think of some product that is produced in "batches," each of which is different. The prior information is relative to the mean a nd variance of these batches. The measurements Y, however, deal with a sp<lcific (new) batch. In the second MAP case, the information is visualized as coming from subjective information-belief of an investigator regarding the parameters. This could include mean and variances. In this case the parameters are not random, for example, a parameter o f interest might be a fundamental constant such as the speed of light. The knowledge about the parameters is probabilistic. This probabilistic information leads us to view the parameters as having probability distributions, similar to the random parameter case. ; . ' 1. <• • .. ; m CHAPTER 6 MATRIX ANALYSIS FOR UNtAR PARAMETtR ES11MAnON In both cases or views the derived parameter estimators are formally the same but the meaning of the various terms may be different. In order to be succinct the case of random parameters is treated first and the results for subjective information are given without derivations. U UNtAR MAXIMUM A POSTERIORI ES11MATOR (MAP) 211 When the only parameters of interest occur in IJ, not in 4t. for example, maximizing J( IJ IV) can b e accomplished by minimizing SMAP' Following the usual procedure of taking the matrix derivatives with respect to IJ. etc., one finds using (6.6.4b). (6.1.33), and (6.1.26b), 6.6.2 Assumptions Let us consider the case of MAP estimation involving a random parameter vector. We shall investigate estimation for the s tandard assumptions denoted 11--1112. Each assumption except for the last is the same as used for M L estimation in Section 6.5. Mathematically the assumptions can be given as (6.6.la. b) Y -E(YIIJ)+t. cov( IJ, t ) - 0 V (Xy ) -O.IJ-N ( "/J' V/J)' (6.6.1c.d.e) (6.6.5) Sblving for the estimator b MAP yields ~AP-PMAP[XT"'-'Y+Vi'I'/J]' p~ip=xT"'-'X+Vi' (6.6.6a,b) Notice the definition of P MAP given by (6.6.6b). By a dding a nd subtracting 2X T",-IXI'/J from (6.6.5), the additional expression o f where "', I'/J' a nd V/J are known. Note that the distributions fo'r both IJ, the random parameter column vector, and t are normal. 6.6.3 fAtlmatlon (nyolylng Random Parameters In MAP estimation the estimated parameter vector is the one that maximizes the probability density f<YIIJ) . This density is related to f ( IJIV) a nd that for the random parameter, f ( IJ), by f (IJIY)- f(YIIJ)f(lJ) (6.6.6c) c an be given for bMAP. In this form the second term on the right side can be considered as a correction to the known mean value of the random parameter vector I 'll; it is a result o f the new information, Y, for a given " batch." F or the case of Model 2. 11,- fJlX" (6.6.6 a, b,c) reduces t o (5 .4. I I a, b). Since Y -XIJ+r, E (IJ)-I'/I' a nd E (,)-O. use o f (6.6.6c) shows that the expected value of bMAP is (6.6.2) (6.6.7) which is a form of Bayes's theorem. Recall that f<YllJ) is the same density used in maximum likelihood estimation; for the given assumptions it is given by (6.1.66). F or the conditions indicated by (6 .6.1 d), f ( IJ) is and thus bMAP is a biased estimator. Even so it is recommended whenever appropriate owing to the reduced error covariance matrix as shown below. An example o f M AP estimation is given in Section 6.7.4 in connection with the sequential M AP method. The covariance matrix of interest is that of bMAP - IJ as explained in connection with (5.4.12) and 5.4.13); bMAP - IJ is the difference between the estimator and the parameter vec'.or for the particular batch. Utilizing (6.6.6a) we c an write f ( IJ) - (2,.,.) - ,/lIV / JI-I/l exp[ - fey) ! (IJ - I'/J) TV i I( IJ - I'/J) 1 (6.6.3) where p is the number of parameters. The probability density f(Y) need n ot be explicitly given since it is not a function of IJ· T he maximum of J( IJ IV) given by (6.6.2) occurs at the same parameter values as does the maximum of its natural logarithm, I n[I( IJIY)] - - ! [ (n + p)ln2,.,. + Inl"'l + In IV/JI + SMAP] - lnf(Y) (6.6.4a) SMAP=(Y -1I)T",-1 (Y -11) + ( IJ - I'/J{ V i I( IJ - I'/J) (6.6.4b) bMAP - IJ - P MA,xT", - I(XIJ + e ) - IJ + P MAPV i II'/J - (PMA,xT",-IX - () I J+ P MA,xT",-I,+ PMAPVi Ip./J (6.6.8a) (6.6.8b) Now taking the covariance o f b MAP - IJ, the error covariance matrix o f 172 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION b MAP' yields (6.6.9) where (6.1 .40) is used . Expanding the right side o f (6.6.9) produces COV(b MAP - ", P) = PMAPX T - IXV /lXT",-IXP MAP - PMAPXT",-IXV /l - VpXT",-IXP MAP + VP + PMAPXT",-IXPMAP = (6.6. lOa) -V/iXT", - IXPMAP+V/l = _V/lXT",-IXPMAP+V/lPM~PPMAP (6.6. lOb) 6.6 LINEAR MAXIMUM A P OSTERIORI ESTIMATOR (MAP) 273 u sed regarding Y a bove, the assumptions are denoted 11--1113. T he e stimator a nd c ovariance matrix a re t hen the same as derived in Section 6.6.3. Now, however, IIp a nd Vp have different meanings. Let us briefly consider certain implications o f t he M AP e stimators for subjective p rior i nformation. Suppose first t hat the prior information is very poor. This implies t hat t he Vp m atrix h as large diagonal components. I f Vp h as large diagonal components, b MAP given b y (6.6.6a) approaches the b ML expression given b y (6.5.2) a nd COV(b MAP - IJ) a pproaches cov(b ML ) given by (6.5.5). Hence a c omputer p rogram developed for M AP e stimation could be also used for M L e stimation (for the assumptions denoted 11--1111). T he s ame program c ould be also used for O LS e stimation by letting Vp have large diagonal c omponents a nd b y r eplacing'" b y 2 2 0 1. I f ' " w ere equal to 0 1 a nd all the s tandard a ssumptions were valid, we would have b MAP = b ML = bLS = Vf/[ - XT",-IX+XT",-IX+Vil]PMAP a nd thus the covariance of b MAP - (6.6. lOc) P is I f the s tandard a ssumptions a re n ot valid, the same estimator for O LS given by (6.2.5) is o btained from (6.6.6) b y simply replacing " ,-I b y I a nd b y s etting IIp = 0 a nd V i I = 0; in this case, COV(b MAP - P) will n ot yield the correct relation for cov(bLS)' however. In (6.6.6) all components o f b MAP c an be uniquely found if (6.6. 11) This is valid for the stated assumptions which are denoted 11--1112. N ote t hat the effect of the random parameter behavior disappears as XT", - IX is m ade large. which usually results from a large n umber o f m easurements. A summary of the M AP e stimation equations is given in A ppendix B. 6.6.4 I M I-IXT.'·-IX+V-II~O pr Ip- AP . - . , . (6.6.12) Hence for M AP e stimations it is n either necessary that IXTXI~O n or t hat In both M L a nd LS e stimation,lJ c annot b e estimated if IXTXI = 0 which is the case if n < p o r if there is linear dependence a mong the sensitivity coefficients. T he f act t hat b MAP e stimates c an b e o btained for n as small as I is used in the sequential method discussed in the next section. n ;> p . Estimation with SUbjective Information C onsider now the case o f c onstant parameters with subjective information a bout t hem . This information c an be expressed in probabilistic terms if we imagine that the parameters have probability distributions. that is, are random. A close analogue can be drawn with the random p arameter case. Now IIp becomes the prior p arameter v ector based o n belief a nd V/l is the covariance matrix for a normal distribution . T hat is , o ur knowledge o r belief regarding f1 is expressed by the probability density. f ( (1) = N (Il/l' VII)' We can conceive that this knowledge has been developed from investigation of many tests on similar " batches ," from literature values o r from other sources. This information is i ndependent o f t hat obtained from the new information contained in Y. W ith these assumptions a nd those 6.6.5 Uncertainty In '" O ne m ajor difficulty in using M L a nd M AP e stimation is t hat the s tandard a ssumptions may n ot b e valid. I n p articular, the '" m atrix may n ot b e known. T here a re certain checks a nd c orrections that c an b e used when there is some uncertainty in ",. O ne check involves the expected value o f SMAP which is c alled the p rior value because the new d ata a re n ot y et used to o btain p arameter e stimates. Using (6.1.50), E (SMAP) b ecomes (6.6.13) Now we also know for M L t hat E (SML)=n a nd E (RML)=n-p . H ence it 27~ CHAPTER 6 MATRIX ANALYSIS FOR LINEAR P ARAMttER f S11MAnON . .. is reasonable to assume that the expected value of RMAP' the minimum value of SMAP' is about equal to n. Again, if n "»p, the difference between n + p a nd n is relatively small. Hence for many cases RMAP c an be considered to have a X2 distribution with n degrees of freedom. (This is true provided the assumptions designated 11--1112 or 11--1113 are valid.) - A related example is given by Example 6.5.6. 2 Suppose next that there is uncertainty in ~; let ~ be equal to 17 0, where 1 17 2 is unknown a nd 0 is known. Both 17 a nd p c an be estimated by maximizing f ( PlY) with respect to 17 1 to find its MAP estimator and with respect to P to find the mode of the distribution of the p. After taking the 1 derivative of I nf( PIV} given by (6.6.4) with respect to P a nd 17 a nd setting 2 the resulting equations equal to zero where P= b MAP and 17 = O~AP' we o btain the set of p + I equations given by (6.6.14a, b), OM~P[ - XTO-Iy + x Tn-IXbMAP] - Vi lI'p + V i IbMAP=O (6.6.14a) n(o~AP(I_(O~AP(lQ~AP=O (6.6.14b) Q~AP= (V - XbMAP)TO-I(y - Xb MAP ) (6.6.14c) Unfortunately these equations are no longer linear in O~AP a nd b MAP ' Nevertheless we c an solve them to obtain (6.6.15b) The nonlinearity has not heen removed, hut two different approaches are apparent from these equations. First. if it happens that a~APv til is known. the nonlinearity disappears and a direct solution is given for b MAP a nd O~AP' This case might occur when two (or more) sets of data are analyzed separately but there is a common 17 2 for all the data. Second. an iterative procedure is suggested by (6.6.15). I f an initial guess t or a~AP is available. it is used in (6.6.15a), and then an improved value of a~AP is found using (6.6.15b). whereupon this value is used in (6.6.15a). etc .• until the changes in b MAP a nd a~AP are negligible. I f the initial value o f a~AP were zero, the 2 first estimator for P would be b ML. At the other extreme of a -+00. b MAP = I'p. In this iterative procedure. note that some matrix products. namely. those in braces in (6.6.15). need be evaluated only once. 2 For another case with uncertainty in ~. see Problem 6.29. where ~ = 17 0 a nd V II " " (f2V with (J2 unknown and n a nd V known. 6.1 SEQUEN11AL E SllMAnON 275 6.1 S EQUENTIAL ESTIMATION 6.7.1 Introduction The sequential estimation procedures developed in this section refer to continually updating parameter estimates as new observations are added. O ne o f the most important advantages o f this method is t hat matrix inverses may not be needed. Another is that the computer memory storage can be greatly reduced. Moreover, the method can be utilized to produce an " on-line" method o f p arameter estimation for dynamic processes. These a nd o ther advantages are discussed further a t the end o f this section. T he mathematical form derived for M AP estimation. (6.6.6). includes those derived for ML a nd OLS estimation. F or M L estimation with the standard assumptions o f 11--1111, (6.6.6) mathematically reduces to (6.5.2) if V i 1 -+0. F or the subjective prior information case, this corresponds to no prior information. In this case the value o f I'p is unimportant (provided V i lI'p =O). F or (6.6.6) to reduce to the estimator given for OLS estimation, (6.2.5), we may set ~,. 17 11 a nd V i 10:: 0 in (6.6.6). Whether o r n ot these assumptions are valid. the OLS e stimator is obtained. If the assumptions denoted 11111-11 (17 2 need not be known) are valid, then the estimates obtained using b MAP will equal those given by b ML a nd bOLS ' In the sequential p rocedure we use the fact the M L a nd O LS estimates c an bev~ry closely ~pproximated as indicated above if the VfJ matrix is diagonal WIth large dIagonal components. The sequential procedure also includes ML a nd OLS estimation when a set o f d ata has been analyzed to estimate the parameters a nd then later this information is c ombined with more data; the information for the first set o f d ata summarized by b ML a nd (XT~-IX)-I ( or b OLS a nd ( XTX)-I) c an be mathematically treated in the same manner as I'p a nd Vp in M AP estimation. See Section 5.3.4. Two different sequential procedures a re given. The first is the direct method; it involves matrix inverses o f dimensions p x p . In the alternate. a nd recommended. formulation the inverses have dimensions m X m where m is the number o f responses a t each " time." In the case o f a single response, m is equal to one a nd thus results in only scalar inverses being required. 6.1.2 Direct Method Since the MAP estimator can mathematically include ML a nd O LS estimators a nd since estimates can be obtained for n as small as one, the sequential estimator given by (6.6.6) is used as a building block for t~e sequential method. 276 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION O ne important assumption for sequential M AP a nd M l e stimation is t hat the measurements are i ndependent in " time." T hat is . for the multiresponse case. '" c an be partitioned into the diagonal matrix (6.7.1) where «1», is m X m a nd m is the n umber o f observations taken at each time. T he m easurements a t each time may be correlated since «1», need not be diagonal. I f o rdinary least squares estimation is used. the measurement errors in Y may be correlated in time since in O lS e stimation the matrix '" in (6.6.6) is replaced by I; in this case P MAP would not yield the covariance o f bols ' Sequential M AP a nd M l e stimation can be used w hen", is not given by (6.7. 1) b ut a transformation o f the measurements is necessary to produce pseudo-observations that are uncorrelated in time. See Section 6.9. A s equential estimator can be derived by letting V p -+P,. (6.7.2) a nd i ntroducing into (6.6 .6) to find (6.7.3) _ . -1 P; +1- [XT+1 .....'+1 X; +1+ p ;- I] - I ; ... (6.7.4) T he i s ubscript refers to " time" ( or whatever the independent variable in terms of which measurements a re being added). Thus b,+ I is a n e stimator for all p p arameters based on the d ata Y I' Y 2. . .. . Y,+ I as well as on the prior information. if any . In the sequential procedure (6.7:3.4) a re used for i = 1.2. ... . n. In the above equation X,+ I is a n m X p matrix a nd Y,+ I is a n m X I vector. In order to use the above formulation . it is necessary to invert the p X P matrix P, + 1 a nd the m x m m atrix «1»; + I a t e ach time. 6.7.3 Sequential Method Using Matrix Inversion Lemma 6.7 SEQUENTIAL ESTIMATION 277 See A ppendix 6B for a derivation o f these equations; (6.7.5a) is k nown as the m atrix inversion lemma. N ote t hat even though P;+ I is a p x p m atrix, the matrix that must be inverted o n the right sides o f (6.7.5a, b) is m x m. By i ntroducing (6.7.5) into (6.7.3,4) we o btain (6.7.6a) (6.7.6b) (6.7.6c) (6.7.6d) (6.7.6e) (6.7.6f) where K ;+ I is sometimes called the gain matrix. T his gives a general sequential procedure that can be used for O lS, W lS. G auss-Markov, M l, a nd M AP e stimation. T he s ame c omputer p rogram c an b e used for each. Parenthetically we note t hat the same c omputer p rogram c an a lso provide a Jilter . T hat is, the_estimator b;+1 c an b e u sed to find the best estimate o f Y;+ I ' design,!lted Y;+ I ' b ased o n all the d ata until a nd i ncluding time i + I . N otice t hat Y i+I=X;+lbi+1 is n ot the same vector as would b e o btained from using a ll the d ata ( i=).2, . .. , n) t o evaluate b; when we use all the d ata , as we usually do, the Y; values are termed smoothed values rather than filtered values. In starting the sequential procedure given by (6.7.6) the bo a nd Po m atrices are required. F or M AP e stimation bo is I'p a nd P o=Vp . F or M L a nd O LS e stimation bo m ay be set equal t o a z ero c olumn v ector a nd Po is m ade to b e a d iagonal matrix; t hejth d iagonal term o f Po s hould be large c ompared with bj .". T he Po m atrix for the M L a nd O LS cases is discussed further below. W LS a nd G auss-Markov e stimates are o btained in a similar way. A nother expression for P HI given b y M endel [16, p. 128] is T he l abor in finding the inverses o f the matrices in (6.7 .4) can be reduced if m < p by using the matrix identities (6.7.7) P _ [XT ..... - 1 X p - I] - I , +1' +1 ....'+1 , +1+ , _ T ( T - P; -P;X; +I X '+IP,X'+I+«I»,+I) . ..... ~ -I X ,+IP, (6.7.5a) (6.7.5b) This expression can b e s hown t o b e equal to that given b y (6.7.6f) b y i ntroducing the definitions o f K ;+ I a nd .::1;+ I ' I t is t rue t hat (6.7.7) is a more time-consuming expression to evaluate t han (6.7.6f). b ut M endel shows that it is less sensitive to propagation o f e rrors in K t han is (6.7.6f). .... ~ co ' m C HAPTER' MATRIX ANALYSIS FOR UNEAR PARAMETER ES11MAnON 6.7.1.1 EstimotiOll wit" O nly O M Oburvtltiolt t it Eae" f ilM ( In - I) A n i mportant simplification occurs in the sequential form given (6.1.6) w hen t here is a single observation a t e ach time. This is because 4 . is a scalar a nd t hus its inverse is a scalar. Also n ote t hat ,+, V i+I--' Y H I' .,+ 1--. 0 1+ I ' XH ,- {XH , .,' . . XH 6., SEQUEN11AL E S11MAnoN X I,' a nd X u. T he memory registers caD b e assigned as follows: o 2 bl b2 P II P XHI .• A •. 1 +, H' S TOS , (6.1.8b) P 12 - S T02 Ui+1 ... I e K- (6.1.8d) X ,+uble •1 - A ,A 2 -r +P'2 S T03 A~ P22 --T+ P22 (6.1.8c) " .I+'--A-- ,- 9 A2 S T09 A2 I e-' eHI - YH 7 A 2- X,P 12 + X 2P22 P II - - T+PII I B X 2 AI S T08 (6.1.8a) X H, .• P"•. I ... 6 XI ~-A2X2+A,X,+G2 ~i+,-ol+l+ A 2 A ,-X I P II +X2P'2 I e-I 'K S G I ,... ] ... I 4 P22 Later in the calculations. register S c an be used for ~ a nd then e/~. A set o f equations a nd storage locations are as follows: w here 0,2+, is the variance o f Y H , for M L a nd M AP e stimation. b ut is replaced by unity for O LS e stimation. T he s equential p rocedure for m = I i mplied b y (6.1.6) is A... i +,'"' 3 P 12 S T04 Y -X,b,-X 2b2 STOS ~ I e-I b ,-A'l + b, STOO e b2-A 2K+ b2 STO I (6.1.8e) . ,-1.2 . ....,. (6.1.8f) where " -1.2. ....p . I t is i mportant t o o bserve t hat t here a re n o s imultaneous e quations t o solve o r n onscalar matrices t o i nvert with this method. T his is a somewhat surprising result a nd i t is true for a ny v alue o f p ~ l . T his p rocedure d oes require starting values for b and p . however. E xample 6.7.1 Give a set of equations based on (6.7.8) for two parameters that is appropriate for a small programmable calculator. Also indicate the memory locations. T he i subscript has been d ropped b ut i t is implied; for example. in the ' . . e quation. o n the right is a t time i whereas o n the left is a t time i + \. T he above set o f equations are used for each value of i. A special storage location for Y is n ot necessary because i t is read in a nd used as needed. '11 '11 E xample 6.7.1 Using sequential ordinary least squares. estimate the two parameters for observations Y a nd sensitivity matrix X given by Y-[~] .X-[ ~ ~] 10 Solution b,. ' ... 'n. G'. Before the calculations. values can be stored for b l' 'I"~ X,. a nd X l' T he first five are for " time" index zero whereas X I a nd X , are for index I . that is. I Let 110- 0 a nd P o-Io't. 10'ot. a nd 101l1. (These large Po values simulate no prior information.) ., CHAPTER 6 MATRIX ANALYSIS FOR U NEAR PARAMETER ESTIMATION S olution Since OLS is to be used, Example 6.7.1. ol = I f or; = 1,2,3. The equations to be used are given in 11 1 ", 3 x 105(3) + 4 x IIP(4) + 1 - 2,500,001 The rest of the calculations for the first time are given in the third column of Table 6.10 along with results for times 2 and 3. The calculations were performed using a Texas Instruments SR-56 programmable calculator which has a 12 digit accuracy. For this problem the calculator accuracy can be important since subtractions of nearly identical large values occur while calculating the P....,1 values. The parameters are only slightly affected for a large range of Po matrices such as from l()l to 10 10 for this example. I f the Po matrix is KI where K > lOl" however, the PI matrices are 0 f or; ;> 2 and the parameters do not change after ~. T able 6_10 Quantity PII,I PIl,I Pll,l bl. 1 bl,l P lI,l Pll,l P ll,l bl,l b1,l P lI,l PI1,l Pll,3 bl,3 bl,l . -;. .s:;:.. CD Table 6.11 is given to illustrate the relative errors in the parameters a t the third data point for different values of Po. The large values of P o·I()l1 to 10'1 lead to accurate estimates. Small a nd large values of K in Po=- KI can lead, however, to relatively inaccurate parameter values. Small values imply prior parameter estimates are accurately known, which is not compatible with OLS estimation. Small o r large values of K should be compared with the values of the square of the parameters. In the present case the parameters are about unity so that K <; I is termed "small" a nd K ;> l()l may be termed large. Another indication that K is chosen sufficiently large is that K is large compared with the largest diagonal term of PI f or; ;> 2 (for two parameters). T able 6.11 R elative E rrors In hi,) and h2,) f or E xample 6.7.2 Relative Errors in K in Po ... K I - S.l4x 1 0- 3 - S.24x 1 0- 6 - S.13x 1 0- 1 4.5SX 1 0- 7 1.35 x 1 0-' - 2.21 X 1 0- 4 - 6.02 x 1 0- 3 1.94 x I O- l - 0.1792 R esults for E xample 6.7.2 U sing 1 1 S R-56 P rogrammable C alculator Exact Values .52 - .56 .68 2 I 0.0131826742 - 0,022598870 I 0.1101694915 2.436911488 0.5367231638 P o-IIPI 36000.0256 - 47999.9808 64000.0144 1.759999296 1.319999472 0.51999427 -.55999339 0.67999228 1.9999952 1.0000044 0.0131826666 - 0.0225988336 0.1101692781 2.43691129 0.5367231198 P o-IOlot 9 3 .6x 10 - 4.8x 109 6 .4x 109 1.76 1.32 0.542 - 0.572 0.679 2.0 1.0 0.01311537163 - 0,02206035 0.107167128 2.436373456 0.5462544 Po-10 1l1 3.6 x lOll - 4.8X lOll 6 .4x lOll 1.76 1.32 0 0 0 2.0 1.0 0 0 0 2.0 1.0 Physically P -O implies that the variance of the parameters is zero and thus nothing more can be learned from additional data if P -O; hence the parameters do not change with time for Po-IO"I after the second data point. However, P is effectively zero in this example only because of our method and the limited accuracy of the calculator. Hence though Po can be selected from a large range of values to simulate no prior information, it can be made too large. Z It 6.7 SEQUENTIAL ESTIMATION - 7.56XIO- 3 - 7.56x 1 0- 6 - S.20x 1 0- 1 - 2.16x 1 0-' 5.40 x 1 0- 4 0.0178 0.348 - 0.148 O,S63 I t can be shown that K in p o· K I is too large for the two-parameter case when a nd nc is greater than the number of significant figures used by the computer o r calculator. I t is not difficult to show that oUI1I-IO-~ of 111 - of K[Xll+Xll]+of (6.7.9) Let oU111 be equal to or greater than IO-~, nc being the number of significant calculational digits. Also let K -IO' where K is large. Then for K not too large, we should have n" <; n c-log( Xfl+Xfl) ' 0:' (6.7.10) Using the values for the above example and nc - 12, we find n" <;.1~.6. In ~ther words, K should be less that 1010.6 in order not to be too large. This IS consistent with the results of Tables 6.10 a nd 6.11. To be not near the critical number of significant figures, four less are recommended, that is, Po..,IWI in this case. 212 ~ &qll~lItial A uIyJ;J 6.7.1.1 c,n. o· CHAPTER 6 M ATlIX ANALYSIS FOR UNEAR PARAMETER E S11MAnON o f EXllmp/~ 1.1.4 From (6.6.6b) we can write Computer programs can be readily written based on (6 .7.8). One advantage is that no separate method is needed for the solution of a set of simultaneous algebraic equations. Moreover. the procedure is readily modified for any number of parameters p. F or two parameters a small programmable calculator can also be used. A computer program was written to estimate. using sequential OLS. the parameters in the model TI; - {J, + {J1X/ for the data of Example S.2.4 . Ordinary least squares analysis implies that the variance a} is constant, prior parameter values are unknown. and the diagonal components of Po must be large. Since OLS analysis is unarrected by the choice of 0;1. replace til by I f or;" 1. .. .. 9. For simplicity. let the initial values of {J, and {Jl be zero. as no prior information is given. I f rough estimates were available for the parameters. then the diagonal components of Po could be chosen about loJ to 10' times larger. I f we d o not have this information. then for this two-parameter case. (6.7.10) can be used; since XI . ,-I and X l.,-O. we find that nA:<:n('. Thus if a computer with IS significant digit accuracy is available. Po should have diagonal terms less than 1015. It would be safer to reduce Po by four orders of magnitude. however. say to 1 0". Shown in Table 6.12 are those obtained using P o-IO'I but the parameters are identical to the seven decimal places given to those for P o-KI.lo'<K<IO". for a IS significant digit computer. Actually the values in Table 6.12 after ; - I are exactly the same as those given by the usual least squares procedure if the data were first analyzed for the first two data points. then the first three. etc. One way to check if Po is made large enough is to repeat the calculation with Po made larger. This is not efficient. however. Another way is to compare the diagonal components o f Po and PII • the matri" for all the data. T ab1e6.U Sequential Analysis 01 Example 5.2.4 b, 0 I 2 3 4 S 6 7 8 9 1.1 SEQUENT1AL I S11MAnON b1 PII P12 PZl 0 0.2580000 0.2580000 0 .1281667 0 .4197000 0 .8238000 0.9545238 0 .7957500 1.1195833 1.2864667 0 0.0 0. 17080000 0.2097500 0.1660200 0.1256100 0.1158057 0.1253321 0.1091405 0. 1019883 1 0' 1.0000000 1.0000000 0.8333333 0.7000000 0.6000000 0.5238095 0.4642857 0.4166667 0.3777178 0 0.0 - 0. 1000000 - 0.0500000 - 0.0300000 - 0.0200000 - 0.0142857 - 0.0107143 - 0.0083333 - 0.0066667 1 0' 10" 0.0200000 0 .0050000 0.0020000 0 .0010000 0 .0005714 0.0003571 0.0002381 0.0001667 P ,,- [XT.,,-IX+p;'r l (6.7.11) which for the OLS analysis above becomes (6.7.12) Now as K.... ~. p " ....(XTX)-I. T hen as this condition is approached, the diagonal components of XTX . a nd hence those of P I I' must be small compared to K . Consequently we can check if Po';' K I is large enough by comparing the diagonal components of PIt with K. N ote that in Table 6.12 the P II a nd Pzz values for ;">2 a re much less than K -Io'. Some further advantages are given below of the sequential method compared to the usual OLS analysis illustrated by Example S.2.4. Each advantage relates to the ability o f the sequential method to provide more information than is apparent from the usual OLS analysis. First, the effect of adding a single observation is apparent. For example, the effect o f the fourth observation is to make b l much larger than if only the first three observations are used. Second, decreasing changes in the parameters w ith; show that each new observation tends to contribute less new information than the previous one. S ee b z versus i in Table 6. 12. Third, time variations of the parameters can yield insight into the accuracy of the measurements a nd/or adequacy of the regression function. F or example, b l seems to be increasing with t he; index whereas bl is more constant. This increase in b l could be due to inaccurate data o r to actual time dependence 01 the parameter. (In this example we know that the former is the case because the regression function used is the correct one.) Owing to the larger variation o f b l we suspect that the relative errors in the estimate b l are greater than in the estimate bl • T he possible time dependence of b l could be further investigated by adding measurements o r by repeating the analysis with a new set of data. I f the increase in b l persists, then a change in the regression model would be indicated. Fourth, some conclusions can be drawn from the time variation o f the parameters without any prior statistical knowledge of the measurement errors. I f, however, there is statistical knowledge more can be learned. The sequential method also yields time variation of components of P. Note that P II is decreasing much more slowly than P22• I f the measurement errors are independent a nd have constant variance (or more precisely 1111-01-), P II is proportional to V (b,) a nd Pzz to V (b l ) . Hence the lI4 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION m easurements for ; > 2 in this example are m ore effective in r educing e rrors in b l t han in b •. T he d ecrease in p •• a nd P 22 w ith; s hown in T able 6.12 is n ecessary as indicated b y t he equations in E xample 6.7.1 ( because A:/~>O a nd AiI~ > 0). Physically this is r easonable because a dded m easurements i ncrease the available information. which results in the diagonal c omponents o f P d ecreasing o r a t least n ot i ncreasing. 6.7.4 Sequential MAP Estimation 6.7 SEQUENTIAL ESTIMATION ~. Notice that the measurements tend to be concentrated at the extreme temperatures of 20 and 300°C. I f there were no uncertainty in the adequacy of the linear in T model and if there were no prior information, the optimum design would consist of one-half of the measurements being at each extreme T. The experimenter compromised by putting most of the measurements at the extremes but some intermediate values were included. Estimate sequentially the parameters in the model k - P. + P2T with and without the prior information. Solution k \ A nother a dvantage o f the sequential procedure is t hat the s ame p rocedure ( and t hus c omputer p rogram) c an b e u sed for M AP e stimation as well as for WLS. G auss-Markov. M L. a nd O LS e stimation provided c ertain s tandard a ssumptions a re valid. Sets of assumptions permitting sequential analysis are 11-11112 a nd 11-11113. T he c ondition that the m easurement e rrors b e uncorrelated in time is p articularly important. Example 6.7.3 An engineer has been given the task of measuring the thermal conductivity k of .1 new electrical resistance heating wire. A linear curve of k versus temperture T is needed. Based on his experience with similar alloys he feels that the model ." - P. + P1T is reasonable with prior estimates of P. and Pl of 1'. = 12 WI m-oC and ILl - 0.01 W I m-oC l where T is in 0c. He estimates that the covariance matri1: for these values is v _[ fJ - 2 - 0.002 ] -0.002 IO - s and the prior distribution is normal. For the new alloy he obtained the following measurements. The error in the temperature level can be neglected but the standard deviation in each measurement of k is about 0.2. Also the errors are independent and normal. T (°C) .... 'CJl .- I 2 3 4 5 6 7 8 9 10 Measured Value of k (W I m-°C) 20 20 21 100 150 200 250 297 300 302 11.47 10.94 11.15 11.85 12.55 13.18 13.48 13.90 14.54 14.36 ;: This problem can be viewed as being one involving subjective prior information. The prior means of h. and h2 are 12 and 0.01, respectively. The Po elements are P II -2, p . 2 - -0.002, and P n - 10- 5• The 01 values are 0.04. The algorithm given in Example 6.7.1 can be used to estimate the parameten with X I-I a nd X 2 being the T values. The results are given in Table 6.13. Notice that the first two observations (both of which are at T -20°C) yield estimates of h I and h2 which are near the final values. The variance of h I which is given by P II reduces considerably as a result of the f int two observations. This is not true, however, for h2 since P22 decreases only slightly. This result is reasonable because h l represents the slope of k which. in the absence of prior information, requires measurements at two or more different 7j values. With all the observations used, both P II and P22 have decreased considerably, indicating that the new measurements substantially reduced the experimenter's uncertainty. Table 6.14 gives typical results for sequential estimation with no prior information. Using (6.7.10) it is found that P o-IQ41 is large but not too large for 12 significant figure accuracy. In contrast with the prior information case, the f int two observations do not yield estimates that are reasonable. The reduction in the P matrix is negligible from the f int to second observation. This is because both are at Table 6.13 Sequential Estimates Using PrIor Infonnstlon for Example 6.7.3 b. 0 I 2 3 4 5 6 7 8 9 10 b2 x 102 P II x 102 P I2 X lOS P nX 107 12 11.271 10.997 10.971 10.973 10.961 10.940 10.960 10.979 10.933 10.922 I 1.0669 1.0921 1.0934 .9591 1.0228 1.0803 1.0409 1.0127 1.0813 1.0977 200 4.399 2.386 1.719 1.718 1.634 1.513 1.393 1.290 1.246 1.222 - 200 - 20.37 - 18.52 - 18.18 - 17.56 - 13.33 - 9.916 - 7.557 - 5.977 - 5.318 - 4.953 100 83.50 83.33 83.32 42.58 21.20 11.58 6.930 4.512 3.521 2.982 216 C HAPTER' MATRIX ANALYSIS FOR UNEAR PARAMETER f SIlMAnON 1.1 SEQUIN'IlAL r snMAnON f ,..·• . .... Table 6.14 Sequential Estimates for No PrIor Information a nd Po-I~I for Example 6.7.3; o l-O.04.bo·O b lx 102 bl CJl N 0 I 2 3 4 5 6 7 8 9 10 0 .0286 .0279 12 .274 11 .018 10.967 10.935 10.958 10.977 10.928 10.916 I 'll x 102 I 'll X 105 0 57.207 55 .885 - 5 .349 .8318 1.0054 1.0824 1.0399 1.0113 1.0833 1.1001 1 0' 9 .98 x 105 9.98x 105 2475.9 2.361 1.875 1.627 1.455 1.328 1.276 1.247 0 - 4.99x 1 0' - 4 .99x 1 0' - 1.22x 105 - 33 .82 - 17.28 - 11.27 - 8.127 - 6.258 - 5.510 - 5.104 6.7.6 Ridge Regressloa EstImation S tarting a bout 1960 A. E. H oerl a nd R. W. K ennard [17-21] developed a p rocedure called ridge analysis, which is a graphical m ethod for depicting the characteristics o f s econd-order regression functions having many independent variables. T o a r elated p rocedure Hoerl gave the name "ridge regression." which he a nd K ennard h ave pointed o ut c an h ave a general Bayesian interpretation. Hence the Bayesian estimation p rocedure which we call maximum a posteriori is related t o ridge regression. F or t he s tandard a ssumptions implied b y 1111-11, OLS estimation provides the estimator given b y (6.2.S). T his estimator among all linear unbiased estimators provides the minimum variance (for these assumptions). T he c ovariance matrix o f bu is given by (6.2.11). F or convenience. Hoerl a nd K ennard scale the independent variables s o t hat XTX has diagonal elements all e qual to one. I f the eigenvalues o f XTX a re d enoted " .iJ-I.2 .....p . then a seriously "ill-conditioned" (relatively small IXTXI) p roblem is characterized b y t he smallest eigenvalue Amia b eing very much smaller than unity. Hoerl a nd K ennard h ave noted t hat OLS estimation provides inadequate estimators for a n ill-conditioned problem since 02 /".18 is a lower b ound for the average squared distance between bLS a nd IJ. T hus for such cases bu is expected t o b e far from the true vector IJ with the absolute values o f t he elements o f bLS being too large. T he ridge regression e stimator is given b y I 'll X 1 0' 1 0" loa loa 2 .49x 2 .49 x 5 .99x 105 8 4.02 27.78 13 .24 7.475 4 .732 3.652 3 .075 T - 20 oe. Reasonable values o f b l a nd bl a ppear o nly a t ; :> 4. I t is only a t ; - 4 t hat T changes to l oooe arter being n ear T - 20 e for the first t hree o bservations. 8 0th b l a nd bl c an be estimated for the linear model k - PI + P2 T o nly if measurements a re m ade a t 2 o r more T values. Notice t hat ~" c omponents in T able 6.14 decrease in magnitude more rapidly than for T able 6.13. H ence a s t he n umber o f o bservations increases. the importance o f t he prior information diminishes. P rior information always reduces parameter uncertainty. however. 0 6.7.5 Multlresponse Sequential P arameter Estimation (6.7.13) W hen several ( m> I ) dependent variables are measured a t the same time, it is sometimes possible to renumber them so t hat in effect m - I. T his c an b e done if OLS estimation is being used o r if M L a nd M AP estimation is used a nd the measurements are independent at each time as well as with time. As a n example consider the temperature d ata given in Table 1.14 where there are eight measurements made a t e ach time. Assume t hat a sequential OLS analysis is to be performed. T he t emperature measurements o f this table can be described either by '1 (i); j ... I, 2, .. . , 8. for K :> O. W ith XTX scaled t o h ave unity d iagonal terms, values o f K in the range o f 1 0- 4 to. 1 a re typical. T here is a n " optimum" value o f K for a ny p roblem; Hoerl a nd K ennard discuss methods for selecting K. T he M AP e stimator given b y (6.6.6c) yields t he s ame estimates a s (6.7.13) if 1'/1 is r eplaced b y O. - I by I. a nd V i 1 b y KI. T his has the effect o f i ntroducing the subjective p rior i nformation t hat the mean parameter values are zero; then the estimates given b y (6.7.13) h ave smaller absolute values a s K becomes larger. H ence a s i ndicated b y T heorem 2 o f M arquardt'S p aper (21). (6.7.13) has the p otential o f reducing the innated O LS p arameter estimates found in iII-c:onditioned cases. T hough the estimator given b y (6.7.13) provides biased estimates, Hoerl a nd K ennard (11) h ave demonstrated t hat t here exists a K > 0 s uch t hat E [(b· _1J)T(b. -IJ») < E (bu-IJ)T(bu-IJ)] provided IJTIJ is bounded. Evidently the M AP f ormulation has m any d ifferent interpretations a nd uses. + ; - I, 2, . .. , n or Y,,; k -I.2. .... 8.9.10. . ... 8 n-2,8n-I.8n By using the laller numbering, the problem is changed from o ne with m - 8 t o m - I. See Problem 6.26. l 188 6.7.7 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER E SIlMATION 6. • MATRIX FORMULATION FOR CONFIDENCE INTERVALS Comments and Conclusions on the Sequential Estimation Method Y. = X b .. These a re based o n I. 2. 3. 4. S. ,... c.n w 6. T he method is general as it includes MAP, ML, Gauss- Markov, WLS, a nd OLS estimators. When M AP a nd M L estimators are used it is necessary that the measurement errors t be additive, normal, a nd i ndependent in " time" designated by i . I f the observations are not independent in time, sometimes transformations can be made to obtain new dependent variables that are independent. In Appendix 6A it is shown how certain autoregressive (AR) models for the observation errors can be treated by constructing independent combinations of the observations. I t is assumed that the statistical parameters of (12 a nd p a re known. See also Section 6.9.2.2. I f ( I) (12 is unknown for I/; = (12r! where r! is known a nd (2) there is no prior information, the sequential procedure with (12 replaced by I c an be used to estimate the parameters. After bn is found, (12 c an be estimated using S 2; the estimated covariance matrix would then be P n times s2 . I f the problem can be formulated so that there is only one independent observation at each i, only a scalar needs to be inverted regardless o f how many parameters are present. There are no simultaneous e quations to solve. The method readily extends to more than o ne unknown parameter. T he summations in (6.7.8) can be easily programmed for a n a rbitrary value of p, the number of parameters in the b vector. An examination of the parameters as a function of the index i c an yield information that is not readily available otherwise. First, the models are usually chosen to contain parameters that are constant with time. I f inspection of the parameters indicates that there is a time dependence (as in Table 6.12), then the adequacy o f the model is questioned. Second, one can obtain a n immediate " feel" o f the effect o f a n additional observation which does not depend upon any statistical knowledge of the probability densities. The change in the p arameters becomes less as more observations are used. G ood practice usually entails an inspection of the residuals. F or the linear parameter case the true residuals are not obtained directly in a sequential manner. The residuals e based on the final parameter values (b,,) are e = Y - Y= Y - Xb n • These values are not the same as those calculated based on bi ' _ T he sequential method provides a t e ach i a filtered estimate of Yi as the d ata until t ime; a nd c an b e used in a~ on~li~e analysis. T he sequential method given in this section c an b e In this subsection some observations regarding the sequential method are given. 7. related to the discrete K alman filter (16, p. 159]. The sequential M AP e stimator c an also be interpreted as providing a ridge regression estimator which c an b e helpful when the d ata a re ill-conditioned, t hat is, IXTXI is nearly zero. 6.8 MATRIX FORMULATION F OR CONFIDENCE INTERVALS AND REGIONS M uch more information c an b e conveyed regarding parameters b y specifying confidence intervals o r regions in addition t o p arameter estimates. Whenever possible a nd a ppropriate it is r ecommended t hat confidence regions be presented in addition to estimates. In order to present meaningful confidence regions it is necessary that the underlying assumptions be valid. Two assumptions frequently violated in scientific work are t hat the errors have zero mean a nd t hat t he errors are uncorrelated. Erroneously taking these assumptions to be true has led many to present overly small confidence intervals. Physical parameters have been presented b y d ifferent experimenters with each successive estimate being outside the preceding confidence interval. This has happened so orten t hat o ne s hould be very careful to check his underlying assumptions before presenting his results. F urther discussion o f assumptions is given in Sections 5.10--5.14 a nd Section 6.9. Presentation o f confidence regions is r ecommended b ut m ust be carefully a nd honestly given. F or additive, zero mean, normal measurement errors the j oint probability density for the parameter vector b can be written as ! (b)=(2'IT)-P/2IVbl-I/2exp[ - !(b-II)TV;'(b- II )] (6.8.1 ) where Vb is the covariance matrix o f b. I t is given by (6.2.11) for ordinary least squares, by (6.5.5) for maximum likelihood, a nd by (6.6.11) for M AP estimation. (The assumptions are different for each case.) F or convenience in representing cov(b) [or c ov(b- II) for M AP cases], let us use P for each case, (6.8.2) T o o btain confidence regions, the covariance matrix of t , t hat is, 1/;, s hould be known a t least within a multiplicative constant. Section 6.8.1 . ",.. .c:,.T1 .~ 2M C HAPTER' MATRIX ANALYSIS FOR LINEAR P ARAMEttR f S11MAnON gives confidence intervals. Section 6.8.2 p rovides a derivation of a confidence region p rovided'" is completely known. F or the more general case o f ", ... 0 10 where 0 1 is u nknown a nd 0 known, a confidence region analysis is given in Section 6.8.3. 6.8.1 Confidence Intervals A confidence interval can be found for each parameter P. through the use o f the k th diagonal term in P a nd the I d istribution (if it applies). See T heorem 6.2.3. Suppose that the measurement errors a re additive, zero mean, a nd normal. Also let there be no errors in the independent variables a nd n o p rior information regarding the parameters. Also let ", ... 0 10 where 2 0 is u nknown and 0 known. These assumptions are designated 11--1011. S uppose t hat OLS o r M L h as been used a nd 0 2 was replaced by a ny c onstant c l a nd the matrilt P was calculated; for e umple, for M L it is - p ... C2 (X T O-IX) -I U 291 MATRIX FORMULAnON FOR C ONnDtNCI: INTERVALS L et us think o f t he hyperellipsoid as being centered a t t he origin with coordinates being b l - PI'" .• b, - /1,. Let { Z b e s ome ~pecific value. F or , 2 <; 12, (6.8.5) r epresents the interior o f a hypereJlipsOid. At I, (6.8.5) p roduces hypersurfaces o f c onstant p robability density. A met~od for determining values for I is derived below. Although (6:8.5) WIth. , .-1 describes the ellipsoid. for m any p urposes a m ore convenient descnptlon c an b e given in terms o f t he directions a nd lengths ~f t he axes of ~he ellipsoid. T o t ransform (6.8.5) in such a way as to provIde such a deSCription. we first find the eigenvalues o f P o r P - I since the eigenvalues of P a re simply the reciprocals o f those of p -I. . F or c onvenience in the following derivation, let p - I be desIgnated C whose eigenvalues a re f ound by solving the d eterniinantal e quation ,-= , c~O T he tilde C) is used because 0 1 is u nknown. With the above assumptions, for OLS o r ML we can give the estimated s tandard e rror of hie as - e st.s.e.(hk)=(Pu ) where S 2 is the estimated value of interval is given by 2 0• 1/2 S - c (6.8.3) T he eigenvalues of C a re d esignated AI,A 2 ,.· • • A,. . F or convenience in numbering the A;'s, let AI <; Aj for i < j. L et C =p-I a nd let C be given by T hen t he 1 00(1- a)% c onfidence hie - est. s.e.( hk )ll-tJ/2( n - p) < Pie < hie + est. s.e.( hle )ll-tJ/2( n - p) (6.8.4) (6.8.7) where (6.8.8) where I I_a/in - p) is the I statistic for n - p degrees o f freedom. F or 95% c onfidence a nd n - p .,.. 10, we find in T able 2.15, t.",< 10) ... 2.23. F or a n onlinear example, see E umple 7.7.2. It should be noted that there is considerable danger in constructing confidence ,egions from c onfidence intervals found as above, since such a procedure can yield highly inaccurate confidence regions. 6.8.2 Confidence Regions for Known '" a nd where e is a p x p matrix, e ll e ll (6.8.5) el" e 2p -[ee···e] I2 , ee,,1 I n this s ection", a nd t hus P a re assumed to be known. Consider the matrix product in the exponent o f (6.8.1) a nd set it equal to , 2 s ince it is a nonnegative scalar, e ll en (6.8.9) e". e,,2 T he v ector components of e a re o rthogonal a nd o f u nit length (i.e., orthonormal). or (6.8. lOa. b ) 292 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION 6.8 MATRIX FORMULATION FOR CONFIDENCE INTERVALS which implies t hat T he p robability o f a p oint z ( or b -IJ) lying inside the h ypersphere (6.8. IDe) 12;> r2, w here I is s ome fixed value, is f ound b y using the a bove t ransfor- (6.8. 11) 22 P (r <.1 )= f f··· f(27T)-p'2 exp [ - ; ]dZ 1 dZ 2 · ··dZp (6.8.19) P ostmultiplying (6.8.7) by e gives I ntroducing the vector c omponents of e i nto (6.8. 11) y ields ee, =A,e, mation in (6.8. 1) t o o btain T he i ntegration is p erformed o ver t he i nterior o f t he hypersphere described b y (6.8.18). N ote t hat ( 6.8. 12) which can be considered to c omprise a set o f p l inear homogeneous e quations with the u nknowns e l ,. e 2, • . • .• ep ;' F or e xample. for p = 3 we have ( 6.8.l3a) (6.8.20) is used in deriving (6.8. 19). A v olume e lement i nside t he hypereJlipse c an b e d escribed b y ( '12 e l, + « ('22 - A, )e2i + ( 2 )c), = 0 ( 6.8.l3b) ClJc" + C2 )e2, + ( C)) - \)e), = (6.8.13c) (6.8.21 ) which constitutes three equations with three unknowns, b ut s ince these e quations a re homogeneous. there are a t m ost two i ndependent e quations . A t hird equation is f ound from (6.8. lOb). w here f ( . ) is t he g amma f unction. T he u sing (6.8.21) in (6.8.19) results in 0 e~, + ei, + ej, = I (6.8.14) T hen e l i ' e 2i • a nd e ), w ould usually be found from a solution of (6.8 . 13a,b) a nd (6.8.14). A new coordinate vector can be defined by or (6.8.15) (6.8.22) ( If t he t ransformation r 2 = x is used, (6.8.22) is t ransformed t o a form which is t he integral o f t he c hi-squared p robability density function with p d egrees o f f reedom.) F or p = I , (6.8.22) gives T hen i ntroducing (6.8.7) a nd (6.8 . 15) in (6.8.5) produces (6.8.23a) ,=1 (6.8.16) a nd f or p =2 U sing t he further transformation 22 P (r <.1 ) = 10'exp ( - ~ ) rJr= I -exp ( - ~) ( 6.8.23b) (6.8. 17) . ... we c an write C )1 CJl (6.8.18) T hese p robabilities for 1= I , 2, a nd 3 a re given in T able 6.15 for several values o f t he n umber o f p arameters p . T hese t hree I v alues a re s ometimes called the one-, two-, o r t hree-sigma probabilities. Also given in T able 6.15 a re t he I v alues associated with t he 9 0 a nd 95% c onfidence regions. O ther .c... .n 294 CHAPTER 6 MATRIX ANAlVSIS FOR LINEAR PARAMETER E SnMAnON 1.1 MATRIX FORMULAnON FOR CONnOENCE I NttRV.US C;; Table 6.15 Values o f Confidence Region Probabilities for Various Numbers of Parameters p la I 1=2 1 -3 I 2 3 4 6 8 10 .683 .3935 .1987 .0902 .0144 .00175 .00017 .955 .8647 .739 .594 .323 .143 .0527 .997 .9889 .971 .939 .8264 .658 .468 F or 95% Confidence 1.960 2.441 2.795 1 080 3.548 3 .938 4.279 For 90% Confidence 1.645 2.146 2.500 2.789 3.263 3.655 3.998 parameters. v alues o f 'I -. (6.8.27) 1 Value Probability for No. of the matrix form U . . ± 11 __ ( p)diag[",-l/l "1 1/2 • •. ,.,-1 / 2] -=eT8 fJ for the where 8 _ ( 8 8 . .. 8 ] a nd 8 , contains the coordinates o f b t2 , O c)' ith u is. Solving for 8 using (6.8.1 gives 8- ± '1-_ (p)ediag[",-I/l •.. ~-'12] (6.8.28) Example 6.8.1 Consider the case o f two parameters. Give a n algebraic fonnulation of the confidence region for fJ using C - p- , . Solution ( P) c an b e o btained u sing the F or X2 t ables since ' I_ .. ( p)= [ p FI _.. ( p,oo) ] 1/2 [2 = X ( p)] 1/ 2 Since C is given by ( 6 .8.24) I n s ummary Ihe c onfidence region for k nown P ( and t hus ~) is t he i nterior o f t he hyperellipsoid, (6.8.25) we c an e xpand (6.8.25) to ( b - PI )IC U + 2 (b l - PI ) (b1 - P1 ) C Il + ( b1 l pdc21 -ll -0 ( 2) However, we need new coordinates hi a nd h l s o that ( a) c an be written I l_o-Alhf+AlhJ w here ' l-a(P) is t he I v alue associaled with the I OO(I-a) % c onfidence r egion for P p arameters in the model. T his r egion is m ore c onveniently d escribed b y locating the principal axes o f t he h yperellipsoid. T he e xtremes o f t he a xes in terms o f t he new c oordinales hl . .... h, a re g iven by (6.8.15). T he m aximum c oordinate v alues a long t he new axes a re given by h i = ± 1 1_,,(p)A Ih i = O. h 2 =O, h)~O. ... , h,=O h2 = ± 11 _ 0 (P)>-'2- 1!2. ( b) T he eigenvalues AI a nd A1 a re found using (6.8.6). ( C II -A)(C21 -A)- cll-AI-A(C II + C ll ) +(CIICn - C~I ) -0 (d which can b e solved for AI (being the smaller) a nd A2• I I ]I/I} / 2 ( d) I I ]l/l} /2 (~) AI - (CII+Cn-[(CII+Cn ) - 4C II Cn +4C Il h) = 0, . .. , h,"; 0 I2 /, ( a) ( major a xis) A I-{CII+Cll+[(CII+Cll) - 4C II Cn +4C II T he coordinates hi a nd h I a re given by (6.8.15). h ( 6.8.26) w here (6.8.16) is used with r 2= 112_o(P). T he h, v alues d epend o n e,. which is f ound a s suggested b y ( 6.8.13a, b ) a nd (6.8.14). W e wish t o r elate these values to p oints in the b - fJ c oordinates . E quation 6 .8.26 c an b e w ritten in T he I _~ II (bl-PI)+~21(bl-PI)' . hl-~II(bl-PI)+en(bl-P2) e components are found using (6.8.10) and (6.8.12) o r for ~II a nd ~Il' u ( C II - AI)ell + CIl~II-O ~f,+ell-I ( J) . 296 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER fS11MATION U ~TRIX FORMULATION FOR CONFIDENCE I Nn:RVALS m which yields From ( d) a nd (e) of Example 6.8.1 we find AI =0.990001 a nd A I= 101.009999. The components of e are found from ( g), (II), ( i), a nd ( j) to be ( g) ( II) _ [ -0.9949382 e - +0.1004886 + 0.1004886] +0.9949382 F rom ( k) the end points o f the ellipse are given by ( b l - PI )m~'" ± (2.447)( - 0.9949382)(0.990001) -liZ - =+ ~.447 Similarly for e ll a nd e ll, we have ( b 2 - /32 )~- ±2.447(0.1004886)(0.990001 ) -I/Z ... ±0.2471 C llell + ( C22 - A2 )e22 = 0 ( b l - PI )min - ± 2.447(0.1004886)(101.009999) -liZ,.. ± 0.02447 which has a solution of ( b 2 - P2 )miD'" ±2.447(0.9949382)(l01.009999) -liZ - ±0.2422 These are the end points o f the ellipse ( iJ) If - a (2) ... 2.4472 ""AIIIl +A211~ c :0.99000llIi+ 1OI.00000011l For two-parameter cases it can be shown that e ll = e21 a nd that ell = Symmetry can not generally be arranged for p > 2. The end points of the axes are given by (6 .8.28), e2 ' 2 which is shown as the larger ellipse in Fig. 6.5. F or i =9, the C matrix is 9 c = [ 360 23 60] 0400 (k ) for which AI .,2.646235 a nd A2. . 20406.35377. The e matrix is - 0.99984429 e = [ 0.0176466 since e llen - elle21 = ± I . This equation means, for example, 0 .0176466] 0.99984429 a nd the end points of the confidence region are ( b l - /31 )maJ = ± 11 - (2)eIlA I 1/ 2. 0 ( b l - /31 ) m in = ± 11 _ . . (2)eIlA 2- 1/2. ( b 2-/32)maj= ± /1_ .. (2)e21'\1-1/2 ( b 2 - /32 )min = ± '1- . ( 2)enA2-1/2 ( I) ( m) E xample 6.8.2 Find the 95% confidence region for i = 2 and i = 9 of Example 5.2 .4. S olution Consider the ; =2 case first . The C =p-I matrix for OLS estimation with the standard assumptions is a-1XTX. Since 17 2 = I . we have II c= [ l :X; 10] 100 This curve is also shown in Fig. 6.5; it is very narrow indicating less uncertainty in b 2 than for b l' ... . The estimated values o f b l a nd bl usmg the Rlne observations m Table 6.12 are 1.286 a nd 0.10199. respectively, a nd thus b l - /31-0.286 a nd bl - /32-=0.0199. (The true values of /31 a nd P2 are I a nd 0.1.) This value is outside the 95% confidence region for i = 9 because the , 2 value given by (6.8.5). ( b- II)Tp-l(b- II)-=(b l - PI )IC U + 2(b l - PI ) (b2- P2 ) C I2 +(b l - P2 ) ICU is equal to 12.9, which is greater than Q_,,(2)ca5.99. On~ c~n obse~e directl~ from the plot in Fig. 6.5 that the estimates bl.2 a nd bl •1 are mSlde the , -2 confidence region. (See the point X a t b l - /11" - 0.75 in the figure.) 199 6.1 MATRIX f ORMUUnON f OR CONflDENCE INTERVALS 6.8.3 Confidence Regions for '" =- CJ2Sl with N c ~ Known and CJ2 Unknown In the previous subsection we considered the case of known ",. In this section", is equal to CJ2 times 0 where CJ2 is unknown and S l is known. This 2 case is analyzed by developing an F statistic. which is the ratio of two X statistics each divided by their respective degrees of freedom. The first X2 statistic is ( b- fJ)Tp-I(b- fJ), which has p degrees of freedom. The assumptions are designated 11--101 I which include zero mean, normal measurement errors. In this section we assume that there is no prior information regarding the parameters. Maximum likelihood estimation is L. of Sl ... r .. .. .... u 0::: "0 ~ ~ 0::: • o assumed. The second statistic is RML, u~ I N ..; .Q .!! -! . .., which we wish to show is a X2 statistic with " - p degrees of freedom. As usual, ", ... cov(~). Notice that ordinary least squares analysis is not permitted unless 0 -1 for which case R LS - RML' I n general, 0 is not diagonal. I t could result from autoregressive observation errors, for example. We shalt transform R ML to a sum of squares using the identity . D .2 N 0 a .8 (6.8.30) 0 "i L. .. 0 a ~ c ..... . 0 u c a (6.8.31 ) (3 t , .. ;: c "0 u ~ a , ~ IX where 0 a nd. a re" x " matrices a nd. is diagonal. Note t hat· ~ j I I , '"; and thus lao R ML can be written R ML - • )T -I( • ) - 2 (Y-Y ML 0 Y -Y ML (J (6.8.32) where (6.8.33) I N I Notice that (6.8.32) is a sum of squares. For the linear-in-the-parameters - For Ibis case the square root of Ibe diagonal m atrix. is a diagonal matrix having elements that are the positive square roots of corresponding elements in • . I 300 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER FSTIMATION 6 .t MATRIX ANALYSIS w rm CORRELATED OBSERVATION ERRORS 301 known. W hen", = a ll, (6.8.39) reduces to the more common expression, model 'IJ = XfJ. (6.8.32) can be written (6.8.40) = T · r ) _2 F (F T -b~ll . Z F a. (6.8.34) which has a x 2 distribution with n - p degrees of freedom. See Theorem 6.2. 1. Observe thai Ihe covariance malrix of F is c ov(F)= E[ <I>-I/2D - u 1 T (D - F or convenience. define R · and ( p.)-I 1 { 10 <1>-1/2) = a 21 since then bMl = bLS • W hen dynamic experiments a re p erformed a nd automatic digital data acquisition equipment is used, n is usually quite l arge-possibly several hundred o r even thousands. In such cases n - p is large a nd p FI _ a ( p,n -p):::::/~-a ( p) a nd then (6.8.41) be (6.8.35a) (6.8.35b) Since F in (6.8.35a) is a random vector satisfying the first five s tandard assumptions. is known. and there is no prior information. (6.8.35a) is analogous to (6.2. 19) which can be used to get i F or example, for a - 0.05, c orresponding to the 95% confidence region, and for two parameters ( p=-2), the values o f 2 F._ a(p,n-p) a re 6.18, 6.08, 6.04, a nd 5.99 for n = 1 00,200,400, a nd 0 0, respectively. 6.9 MATRIX ANALYSIS WITII CORRELATED OBSERVATION ERRORS 6.9.1 R· (6.8.36) S2=_- n-p In order to be consistent, let ( p.) - I be found for M L estimation o r (6.8.37) Now the F statistic is the ratio of Iwo independent random variables, each with a X2 -divided-by-degrees-of-freedom distribution (see Section 2.8.10). Hence a joint confidence region for the parameter estimates can be found from (6.8.38) or 2 In deriving (6.8.39) we have assumed that '" is equal to a D where D is Introduction When automatic digital d ata acquisition equipment is used for dynamic experiments, very large numbers o f measurements can be obtained. These observations may not be independent, however. Correlated measurements a re frequently obtained when the s ame specimen is tested over some range. An example is the measurement o f the electrical resistance o f a piece of wire as a function o f t emperature such as a t 20, 21, a nd 2 rc. The measurements a t 20 a nd 21°C m ay be correlated for a given specimen, but a 2 0°C value for one specimen probably would not be with a 21°C value for another specimen. Another example o f c orrelated errors is provided by the case of a cooling billet given by Example 6.2.3. Some o f the temperatures recorded are depicted in Fig. 6.1 a long with the associated residuals. These data are shown for 96-sec time steps, but observations were actually made a t the smaller steps of 12 sec. I f all the d ata between 0 a nd 1536 sec are used, the regression curve for the 129 observations is very close to that obtained using 17 observations. F or the fourth-degree model given in Example 6.2.3, residuals in P are plotted in Fig. 6.6 for the first 25 d ata points for 1l1= 12 sec. O n the upper side of the horizontal axis is a scale that corresponds to Fig. 6.1. T he solid circles show the residuals. There are I I consecutive negative residuals, followed by 14 positive residuals. U ..... L- ~ a ~" 1 0" . - la = lav sa:>uaJaJJ~a ..... 0 ..... 0 0 I U 'I r '), " -...----- ------)( N o N Q/ r- u "'n e ~ .... en "0 C 0 u CII en CII u "n' e ~,. ... 01 o r- L CII en ::::I ... I )( .c 'ui Ci. u ~ . 4 :E . c .E ~ ) (/ ", .c .N ~ .... II / r- I <; ..." . !I ~ ~ .~ )( ... .. o c 0 i ... a: .~ > "' :2 en j L CII l'\c, __ __ ~ 0 -- N o ... .... o 0 -- ---=-~)( 0 . z: 1&0 M Ana ANALYSIS WI11I CORRELATED OBSERVAnON ERRORS J I3 Residuals are not precisely independent even if the observation errors are, but for a large number of independent observations, the residuals are nearly independent. This is far from true for the residuals shown in Fig. 6.6; rather, the residuals are highly correlated. A least squares analysis of the measurements containing highly correlated errors may produce satisfactory parameter estimates. There are, however, a t least two different dangers in the OLS analysis of highly correlated errors (i.e., ." having relatively large off-diagonal terms). One o f these is t hat o ne might present too small confidence regions based on the erroneous assumption of independent errors. Another danger relates to this point and also experimental design when observations are independent and unbiased; doubling the number of them significantly improves the accuracy of the parameter estimates. This may not be true if the additional measurements result in higher correlation between the measurements as in the billet example illustrated by Fig. 6.1 when the time step is halved and the number of measurements is doubled. In such cases one might erroneously design the experiment to have too small a time step. A simple check to see if the residuals are approximately independent is based on the number o f runs. (The number of runs is the number of changes in the signs of the residuals plus one. F or example, for + - + + + - -, there are four runs.) F or n independent, zero mean random variables the expected number of runs is (n + 1 )/2. For significance tests based on runs, see references 4 and 22. F or cumulative errors (see below) the expected n umber of runs is about n l / 2 • T he residuals of Fig. 6.1 exhibit six runs compared with ( n+ 1 )/2=9. There seems to be no reason to question the independence of residuals. I f there are still about six runs when the number of observations is doubled. we should question the independence. Certainly the residuals shown by the solid circles in Fig. 6.6 c annot be considered independent with only two .runs while ( n + 1 )/2 = 13. Also shown in Fig. 6.6 as crosses are first differences of the residuals . There are IS runs o ut o f 24 points, indicating that the first differences are much closer to being independent than the residuals themselves. 6.9.1 Autoregressive Errors (AR) o o o I In this section a first-order, single response, autoregressive model of errors is considered. (A more general analysis is given in Appendix 6A.) Let t l , the ith measurement error, be described by . (6.9.1 a ) £0=0, E (ul)-O for i=l=j, (6.9. lb) (6.9.1 c) , 304 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION for j = 1,2, ... ,n. In words, (6.9.1a) states that the present error is a fraction p of the previous error plus a zero mean component u; that is independently distributed. This is called a first order autoregressive process. It is convenient to relate the n x I vectors e and u by (6.9.2) e =Du 6.9 MATRIX ANALYSIS WITH CORRELATED OBSERVATION ERRORS Also, several special autoregressive (AR) cases have error variances as follows for i =2,3, . .. , n for which t he'" matrix, designated p p2 D= 0 P p n-2 p n-t cp I +cp2 c p,,-t (6.9.3) p(1 +Cp2) Cp3 p 2(I+cp2) p ,,-2(1 + Cp2) ( I + p2+cp4) c 0 0 0 0 "'a' becomes "'a = D+DT where D can be found from (6.9.1 a) to be the lower triangular matrix, 0 I (6.9.S) p(1 + p2+cp4) p ,,-3(1 + p2+ cp4) Cp2 symmetric p n-l (6.9.9) Also from (6.9.la) we find [see (6A.5a») the inverse of D to be 0 D- t = -p 0 0 I -p 0 0 0 0 0 0 0 0 0 0 See (6A.IS) of Appendix 6A for a derivation. Three special cases associated with (6.9.9) are as follows: 0 0 0 (6.9.4) which contains a main diagonal of ones and a diagonal just below of - p. The covariance matrix of the errors, "', is given by 1 "'= cov(e) = E (EE T) = E(DuuTDT) = Dq>D T "'at I 2 2 a" a2 = .E ( e .)=--· • (6.9.5b) Several classes of -f matrices can be generated. For p=O and a;2= a; .. a constant, the " standard" covariance matrix, -fST' is found, (6.9.6) -fST= a;1 Next, if p=O, we have -fw which is a,n "'a2 has c = I and p = I; (6A.22b). "'al has c = I; (6A.22c). , l -p2 ' (6.9.5a) where (6.1.40) is used and where q> is defined to be the diagonal matrix "'w=q>=diag[ af o~ . .. "'at has c =(I_p2)-t; (6A.22a). The matrix is the most common of the three special cases. I t might be called a "steady-state" case because the diagonal terms are all equal, -p 0 I. 2. 3. (6.9.7) Notice that as p..... l , the a,2 values become much larger than a;. Physically, time before the taking of measurements. The other extreme physical situation is for measurements starting when the process starts; this is better described by case 3, "'a3. In case 3, the variance of El is a minimum at i = I (provided p > 0) a nd gradually increases to the steady-state value of j ust given above. Case 2 has a simple", matrix as given by (6A.22b). Notice, however, that the i th diagonal term is equal to ia;. Sometimes this case is considered to be unstable because the variance continually increases. We can use it however, with p = I (although case I can not have p = I). F or case 2 E; is "'at is appropriate for some process which has been going on a "long" a; f- -- 306 CHAPTER 41 MATRIX ANALYSIS FOR U NtAR PARAMETER ES11MATION U MATRIX ANALYSIS WI11f CORRELATED OBSERVAnON EIlIlORS ., ~.- . ' " . f ound rrom (6.9. 1) to be the sum N otice t hat t he iy values a re f ound starting a t t he " bottom" o f e ach column. Using this n otation t he XTl/IX p roduct h as typical terms as indicated b y t he Ik t erm below, ' F or this reason case 2 is called the cumulative error case. T he d ifference o f the successive values E ;_I a nd E, is E; - Ei _ 1 =- u; which is i ndependent from uj • j "i' i. Figure 6.6 shows residuals a nd differences o f residuals for the billet example. Since the residuals a re a pparently correlated a nd t he dirrerences are not. the cumulative error model for E; is b eller t han that o f i ndependent E;. ( Estimation o f p is discussed in Section 6.9.5.) 6.9.2.1 (6.9.12) U nfortunately simple algebraic expressions for the covariance o f bLS d o n ot r esult from (6.2.4) for A R e rrors; the c omputer e valuation o f t he terms is straightforward, however. Example 6.9.1 O LS EJtimatiorr Wit" A R Error! O rdinary least squares can always be used irrespective o f a ny o f t he s tandard assumptions. In specifying the covariance matrix o f the p arameters o r the confidence regions. some conditions must be known ( or a ssumed). S uppose t hat the conditions o f a dditive, zero mean, first-order autoregressive errors a re valid. Also assume t hat l/I is k nown within a multiplicative constant. These conditions a re d esignated 11-2-0/ I . T he covariance matrix o f bLS is given b y (6.2. I I ) Derive an expression for V(bLS) for the simple model S olution F or this case, X T - (II· . . I) a nd iT_[ f p i-I I -I (6.2. 11) p ,,-J p ,,-2 p ,,-I Z =OTX= 0 0 0 0 0 0 X,,_l.1 X,,_2., P X,,_I.I X"I = [.iy ] X,,_I.,. I I 0 i is given by " i:lp/-I . . . l +p+p2 I +p I] I -p,,-I . .. I -p' l -p2 l -p]_II-p ( I-p") V". ( b LS )- -I [ II XI,. p2 p 0 0 XII ( I-p) matrix 2- - 2 I I -p ,-I + ,~ ( I_pi) 21 i _I The result for V".(b LS ), the variance of bLS for 1/1- 1/1,,1' can be written in the form C onsider now the OTX matrix product, designated Z. P ( 1.-2 [ --2 (6.9.10) X TX_II. T he I _I _ (I_p" T he t erms in XTX a re given in a detailed form by (6.2.6). T he XTl/IX p ortion o f (6.2.11) is a lillie m ore difficult to evaluate. Using (6.9.5a) we c an write XTI/tXcXTDcI»0TX =(OTX) T cflOTX fJ-P a nd for 1/1_1/1,,1' X", 2p + - - - ( 1- -1 - _ _ _._ - P")] (12 1 I(1-p) 1I(1-p) l -p1 For any fixed value of p between 0 and I a nd increasing values or always decreases. (6.9.13) II, V".(b LS ) E xample 6.9.2 Suppose p a nd the number of observations II are related by (6.9.1 l a) where the c omponents iy c an be found for (6.9.14) ; ... n -l.n-2 . . .. , I; j =I,2, . .. , p which gives greater correlation between observations for a fixed experimental range as the observations become more "dense." In (6.9.14), Q is some constant characteristic o f the data. Using the result of Example 6 .9.1, investigate V .,(b LS ) for n-+oo. Assume that , ,; - ,,~( I - p2) - , is held constant. using the expressions (6.9. l ib) J08 CHAPTER 6 MATRIX ANALYSIS F OR LINEAR PARAMETER E STIMATION U MATRIX ANALYSIS w rm CORRELATED OBSERVATION E RRORS where Solution a; In (6.9 . 13). a J/(I-p2) is replaced by a nd p b ye-a/ft . The result is a n indeterminant form. After using I'Hospital's rule we obtain (6.9.16c) Typical terms of X a nd F are given by (6.9. 15) See Fig. 6.7 for a plot of (6 .9.15). I f the measurements become more "correlated" as n becomes larger in the manner described by (6.9. 14), Va,(b LS ) approaches a constant value for large n rather than going to zero as one obtains from (6.9. 13) for p =constant and n~oo . V (b ) a1 LS V (b ML J a1 1 .l, Z tj=X tj , Z ij=XjJ-pXj_'J F;= Y j-pYj _, for i =2.3 . .... n (6.9.17a) i =2.3 •.. .•i I (6.9.17b) for where 0 -' displayed by (6.9.4) is used. Note that by replacing y. by F . the modified observations F; a re uncorrelated with i since fro~ (6.9.1 a), F; = T/j - PT/j _ , + uj a nd the uj's are uncorrelated. Another way to define the modified sensitivities a nd observations is by using (6.9.18) J. O O. 8 il, r 1 - iI which permits us to write ( I-e - a ) I . E quat i on (6 . 9 . 15) (6.9.19a.b) O. 6 Z a t Z . S ee P roblem 5 .Z6 b where bML has a similar form as given for OLS. O. 4 p O. Results for ModelT/ = p = e - a/n l, F or A R errors for the simple model T/ = P. the components of Z are given by 0 0 15 10 ZO l ,5 30 Z ,=I. a F lpre 6.7 Variance or b ror 1 1-+00 round ror rirst order autoregressive errors using least squares and maximum likelihood. 6.9.2.2 In maximum likelihood or Gauss-Markov estimation, the estimator for the linear model is given by (6 .5.2), b ML = (XTI}-IXrIXTI} - ly (6.5.2) For ML, this equation follows from the assumptions 11--1111. I t simplifies calculations in (6 .5.2) if the relation ~ = ~DT is used, bML = (XT(O-I{4> - 'O-'Xf I XT(O-I{4> - IO - Iy . (6.9.16a) (6.9.16b) . F or case I A R errors, the Zj components for this same model are -, Z ,-o, • M L Estimation With A R E "on . (6.9.20a) ••• Z Z=Z3 . .... · = Z,,=[(I-p)/(I+p)] ' /Z 0,-' (6.9.20b) which are shown in Fig. 6.8 for p =O. 0.5. 0.9, a nd 1.0. The net effect of p ' being between zero and one is to reduce the value of the modified sensitivity compared to Xj' This results in the variance of b being greater than for p = O. F or the simple model the F; values are F ,= Y,. Fz = Y z-pY,. F3= Y3-pYZ" " (6.9.2Ia) a nd the components of 4>a' are 4>.. , =o~diag[ ( 1- pZf' I I . .. 1) (6.9.21b) 3 1. CHAPTER 6 MATRIX ANALVSIS FOR LINEAR PARAMETER £ S11MAnON U , 1 .0 p p·O 0 .8 (. • 1 .0 fo- ~ 0 .6 311 MATRIX ANALYSIS' wrm C ORRElATI:D O BSERVAnON E RRORS = I. 0- 0.8 '0.6 0 .5 ° tZt 0 . 4 fo- o F lpre 6.11 0 .9 l 0. 2 , o W- P 1 2 4 1.~ • I 6 t Modiried sensitivity coerricients for 8 II I 10 I 12 o 14 errors a nd the m odel" - fJ . " ( l- p 2)Yt+ ~(Y;-pYI-d(l-p) 2 =-------------------------l_ p 2+(n_I)(I_p)2 50 40 30 n Using (6.9.16b) with (6 .9.20a) and (6.9.21) gives for the simple m odel" - {J and a l errors, bML zo 10 (6.9.22) which has the variance F Ipre U Variance o f bML for correlated errors, (6.9.23). in a dynamic experiment. F or example, if in Example 6.2.3 I II were decreased from 96 sec to 48, then 24, and finally 12, while keeping the same total time of 1536 sec, the observation errors would become more and more correlated. This is demonstrated by the residuals shown in Fig. 6.6. The asymptotic values of Fig. 6.10 for n-+oo are given by 2 a:/(a+2). (Sec Problem 5.26b.) Figure 6.7 depicts this relation as a function of Q as well as the ratio of the variances for LS a nd ML estimators. The maximum , ,2 V"t ( bMd= 2 u I -p + (n-I)(I-p) 2 I + (n-I)(I-p)/(I + p) (6.9.23) 1 .0 which is depicted in Fig. 6.9 versus n for various p values. For any fixed value of p between zero and unity, V ".(bMJ always decreases with increasing n values. Physically this represents the case of constant spacing of measurements but an increasing number of them. See Fig. 6.1, for example, which is for the billet problem with time steps of 96 sec. I f more measurements are added with the same spacing of 96 sec, n would be increasing while p would be fixed, as in Fig. 6.9. Suppose that the observations become correlated as the time step is reduced and that p is related to n, the maximum number of observations, by p -exp(-a/n). Then for variable n a nd fixed a, we can obtain Fig. 6.10. Here as n becomes large (about 20 for the case of a '" I), V"t(bMJ approaches a constant value. In this case increasing an already "large" value of n will not significantly improve the accuracy. This case of correlated errors can occur as the time steps are made smaller and smaller a = O. 1 0 .8 -..:I ~ -t"b'" Jl 0 .6 z >'" 0 .4 10 o. z OIl 10 zo 40 30 so n t"Ipre U ' Variance of ~L for correlated errors for p_~-.I. a nd " II" filled, (6.9.23). J Il CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION ratio is \.139. occurring at a =2 .7. For this first-order autoregressive error model and simple physical model example. negligible improvement in accuracy occurs using maximum likelihood (or Gauss-Markov estimation) rather than ordinary least squares. Other cases can be exhibited. however. for which ML estimators are far superior to OLS estimators. U M AnlX ANALYSIS WI11I COItRElATED OBSERVAnON ERRORS XTt/lX. By using (6.9.10) we see that the modified sensitivity given by •_ T Z ... = O",X • Z/J, ....- X/J-OX1- (6.9.24) E(u;)=O Uo=O. I n matrix form f! (6.9.26) (6.9.27) where I 0 I - fJ 0 o= 0 -0 0 0 0 I 0 0 0 0 0 0 0 - fJ '" (6.9.28) I 0m 1 = - 0 I oj 0 0 (J I 02 (J 0 0 0 (6.9.29) Analogous to (6 .9.5a) t/I is given by t/I= cov( f !) = O",~o~. (6.9.30) In determining the covariance of bLS as given by (6.2.11). we have the term (6.9.32) (6.9.33) which has components 1J, ... f ori=2, . .. ,11 (6.9.34) • Notice that the AR Z for LS analysis is similar to the Z for M L moving average errors, whereas the A R Z for M L analysis is similar to the LS MA analysis. 6.9.4 Summary o f First-Order Correlated Cases for the Model whose inverse is 0 02 f ori-2,3, . .. ,1I Z ",-O,;;IX Zij, ... = Xij+fJZ1_ can be written as 1J and for j = 1,2. ... ,p. When using ML estimation, we usc the modified sensitivity matrix, (6.9.25) for ;=j:.). E (uju)=O (6.9.31) can be c onveniint to usc. The m subscript denotes moving average. Components of Z", are 6.9.3 Moving Average Errors (MA) A brief treatment is given below for first-order moving average errors. A model for first-order moving average errors is 313 ,,-II I n Table 6.16 the variances of b are given for six different correlated errors; three o f these are for autoregressive errors and the others are for moving average errors. The variances are given for both LS a nd ML estimators of p. Also the ratio o f the variances for LS a nd M L estimation is given for large values of I I, the number of observations. The following are some conclusions relative to correlated errors. 1. I f the measurement errors are actually correlated b ut are erroneously assumed to be independent, two deleterious effects can result. First, the experimenter Dught r eport much more accurate results than he . should. Next, the experimental strategy might be based on a n incorrect premise. That is, the accuracy o f a n estimate as measured b y the variance always decreases as more independent measurements are included, but this is not necessarily true for correlated errors. Thus the experimenter might take 1000 observations expecting to achieve much more accuracy than if 100 were taken; it might be t hat the extra 900 measurements are o f dubious value for this purpose. 1. F or autoregressive errors with O<pC; I, the variances o f estimates can be much less than for independent errors ( p=0). 6.9 MA11l1X ANALYSIS w rm C ORRELAttD OBSERVATION ERRORS 315 I- m . en J. ' :l 1 r:. ' -' u ,. :::.. f!l ........ 01 '3-= ~.E ...V. .Q .. ...V. .Q .. r:.1 ..... .E .q , ..V ... .E r:.1 ..... M .E :::: ' :l ::t ~ ~ ~ 11 :::: u u ...Q. 01 I r:: ·c > c "8 :S 'e !! ' u a. a. ~ ..J U ...+ ...+ 01 1 ~ ,q, q ,... 1:,» " '» ~ I --- ~ .~ ~ I s:- i ~ ~ l B oQ c '"3 ~ ~ + r:. - - Nit :. -r-- M I r:. --'" I 1:,a ~ :;:U u r:: 01 '5 > ... U ~ 0- f Il ~ ~ Ia. Q e .I 1r:. I - --- ~ M .lf Q_ '-' " 'a ~ + r:. M --+ r:. --,..... :a u \0 Q ~-. I'"-IQ. I 13 ...+ .. ,--- .. ..... --jI l -Q. e Q. q, I r:. tr 71" ..,. r:. r:. '''triNe: Q. I .• .. '-' L ..-...I +... by - N.• • .. Q. - 17 I"'... I-IDI 1+... 1oTI == 1+... 1 ~ l = -p ,..... ..... 13 • ... v -q, ~~ I '-' 211 . MI ~ e ~.:. ..... ,..... '-' (6.9.37) Introducing (6.9.37) into (6.1.67a) gives I nL- - i ["ln2W+ln( 1"!.' )h.-'S, 1 II s ._(I-p2)( Y .-1J.)2+ ~ [ ( Y ,-pY,-d-(1J/-P'rI-.)] i -2 314 (6.9.36) I] Then the determinant o f " '... is I r:. 13 where 0 is given by (6.9.3) a nd +... - (J~diag[ ( l- p2f' Q. .-. When high speed digital d ata acquisition equipment is used, the measurements o n o ne specimen are frequently correlated. Neither the best model t o describe the correlations nor the statistical parameters such as p, 9, a nd (J~ may be known. In this section the error model is assumed to be the first-order autoregressive designated a I. T he parameters p a nd (J~ are unknown, however. Maximum likelihood estimation is used because, unlike least squares, it can directly provide estimates o f p a nd (J~. The analysis starts with the logarithm o f the likelihood function as given by (6.1.67). F or a single response case with the errors as described by (6.9.1), t he'" matrix is T (6.9.35) "'... ""D+... O I r:. M .. 6.9.5 Simultaneous Fstlmatlon o f P. a!. and Physkal Parameters for the a l Cases '-' - 10:: - tr r:. -( 0:: j' u r:. + > .0 .!! \0 ~. a. ' "~al"'Q. I i 10 - ... + r:. " '» ,~ ~ ..... ... ~ oS ~Iq, II -g r:. .0.. ~ ~ . + ." q, --- --- E! I I q, q, r:. I r:. ::I .......... ,..... ,..... "il I -- "'» -- --I I 1:,» -... - F or some correlated error cases, the M L estimators can be greatly s uperior to those obtained using least squares. See the last column o f T able 6.16. 4. When the statistical parameters p a nd 9 are known, the estimation p rocedure for maximum likelihood o r generalized least squares is not much more complicated for A R a nd MA errors than for independent errors. 5. Since the covariance matrix o f b for both ML a nd LS estimation can b e readily given for A R a nd M A errors, the confidence intervals a nd regions for known p a nd 9 c an b e developed in exactly the same m anner as in Section 6.8. (6.9.38a) 2 (6.9.38b) 316 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION I n t he M L m ethod we seek to minimize In L s imultaneously w ith respect to the p arameters p, o~, a nd p. N ecessary c onditions f or t he m inimum a re t hat t he first derivatives o f In L w ith respect to these p arameters b e e qual t o zero, Vp Sd,,-b.,.l'P-'; = 0 (6.9.39a) -n S , ( bML,P) 0u -4 0u - =2 + =0 as,/ p -2-2+I2 ap -1 _p °u =0 (6.9.39b) ( 6 .9.39c) b.,.l''; S olving (6.9.39b) for a~ yields (6.9.40a) which c ould be used directly to o btain a ( biased) e stimate o f o~ if p w ere k nown. I f p is u nknown, (6.9.39c) is e mployed t o find ' p= ,.-, ~e,e; +, ["-'][ a~(1-p2)-'+~e] ]-', 6 J MATRIX ANALYSIS W I11I C ORRELAnD OBSERVATION ERRORS The estimate for p is given by (6.9.4Ob) with Y, replaced by bML• These equations must be solved iteratively even for this simple problem. Example 6.9.4 For the a3 autoregressive case investigate using the Monte Carlo method the estimates of bl. b2, a nd for the model1JI-IOO+O.IX, where X ,.O. 10,20, . .. a nd 11; are chosen from random normal numbers of unit variance and zero mean. For p . . l . let n ... lO. 30,40, SO, a nd 120. F or n -60, let p '" - I, - 0.5,0,0.5, a nd J. i>. a; S olution For this a3 case a solution for a given set of measurements is found in a similar manner as for the a I case discussed above. The main difference is tbat the determinant of ~..3 is 0;" rather than the expression given by (6.9.37). The result is the set of equations bML _ (ZTZ)-IZTF, ,. _ ~ ( e/-pe;_I) ( 6.9.40b) Z :=D-IX, F :=D-Iy (6.9.41) I a ;----n--- The diagonal matrix ( 0) 2 (b) i>-("f e;e, + I )( ~I e1)-' w here we have used (6.9.38b) to find 317 ( c) +.'3 1 is not needed in (0) because it is equal to 0 ..- 21 and the . ou- 2 term cancels. The Z matrix is equal to T he s olution for bML • P. a nd a~ is n onlinear e ven t hough t he m odel is linear in t he p arameters o ther t han p a nd 0 2 . V arious i terative procedures c an b e suggested to find P. a~. a nd b ML . O ne is to first guess a reasonable value o f p ( such as p = 0) a nd t hen c alculate b ML u sing (6.9.39a) o r e quivalently (6.5 .2). Next. (6.9.40a) is used t o g et a value for a~ w hich in t urn is used in (6.9.40b) t o o btain a n i mproved v alue o f p. T he p rocedure is r epeated until p is essential1y u nchanged. T he c onverged values o f P. a~, a nd b ML a re t he d esired values. -p ( n-J)IO -p I o Give the set of iterative equations for the model ' I = /3 and a I autoregressive errors . Solution The estimator for bMl' is given b y (6.9.22) with p replaced b y p. The estimated variance of U; is found using (6.9.4Oa). a;= { (/-p2)(y,-b d+ .f M ,-2 [y; - Py, _,-bMl(t-p>t}/n . - 'I-p -p 10 2 0-IOp I -p Example 6.9.3 .~ o 10 20 -p 1 0[(n- J)-(n-2)p] One array of measurements is generated by using Y,-1J1 + '1 where ' 1- Pf'-I + u, a nd the U; are found from a table of normal random numbers with unit variance and zero mean. The Monte Carlo analysis is obtained by generating a large number 3 .. C HAPTER' MATRIX ANALYSIS FOR LINEAR P ARAMttER £ S11MAnON o f sets of b l , b z, p, a nd a~ where each set corresponds to an array o f measuremenls. Table 6.17 is a summary of results of the Monte Carlo analysis for p -I with n -IO to 120 a nd Table 6 . 18 summarizes results for n -60 a nd p - -I 10 I . F or , ,-10 10 60 there were 34 sets of random data for each n ; for n -120, 17 sets were used. (In general many more than even 34 sets of dala should be used.) The terms with bars over them are average values found from Ihe Monte Carlo simulation. T able 6.17 Summary of Monte Carlo Investigation for p = 1 for Example 6.9.4 n R £FUENCIS 3.' Let us examine T able 6.17. T he relative errors i n;, ;'!, ",. a nd"2 decrease a s" is increased. T he first two estimates a ppear to be biased. I t is n ot c lear whether b, a nd "2 a re; however. except for h2 with p -I a nd , ,-10 t he biases indicated by (hl - 1l1)/ Il, a nd (h2 - Ili)/ 112 are very small. T he average estimated standard error o f b l divided by the true value is biased b ut n ot nearly as m uch as t he same ratio for which is about 0.36. This means that confidence regions based o n these estimated standard errors will b e t oo small. o r i n other words. the parameter estimates would b e presented as b einl m ore accurate than they really are. F or p -I in this example, the true value o f the covariance o f b l a nd b2 is zero while Table 6 . 17 shows small n flllive values. T he average number of runs for the modified residuals, O -'(Y - V). is nearly n/2. which is close t o t he(n+ 1 )/2 value expected for independent observations. From Table 6.18 most o f the same conclusions as drawn from Table 6.17 a re valid. An additional one is that only the cumulative error case ( P-I) for b2 has a much smaller estimated standard e rror r atio than for the other p values listed. F or further discussion, see reference 23. "2 JO 30 40 SO 120 0 .362 0 .714 0.783 0.855 0.932 OM 0.803 0.888 0.886 0.928 0.953 (bl - fJM f Jl - 0.00142 - 0.00047 0.00001 0 .00039 0.00058 eSl.s.e. (b l ) / s .e.(b l ) 0 .837 0 .831 0.837 0.901 0 .932 0.0663 - 0.00186 0.0013 0.0045 -0.00003 e sl.s.e.(b2 ) I s.e.(b1 ) 0.426 0.327 0.316 0.363 0.360 H ildebrand, F . B., M~t1totb o j A ppliN M at"-tic$, 2 nd e d ., Prentice-Hall, Inc., E nalewood O irr. . N . J., I96S. 2. Deutsch, R., & timatio" TMory, Prentice-Hall, Inc., Englewood Cliff•• N . J., 1965. eSl.cov(b l ,b 2 ) - 0.0091 - 0.0030 - 0.0023 - 0.0020 - 0.0009 l . Brownlee. K. A., StatuticaJ TMory a NI M~t1tDdol0fY I " S cimce aNI E ",i/tftrl"" 2 nd ed., runs III for O-Ie 0 .512 0 .472 0.486 0.481 0 .491 ... Draper, N . R . a nd Smith. H., A ppliN R egnulo" A M/ylu, J ohn W iley.t Sons, Inc., N ew P -;; (h2 - fJz)1 f Jl Table 6.18 Summary o f Monte Carlo Investigation for 11=60 for Example 6.9.4 p -I P - 0.5 0 0.5 - 0.485 - 0.024 0.430 0.881 0. 0.958 0.966 0.966 0.963 0.933 fJM I II 0 .00013 0.00016 0.00020 0.00014 0.00012 1.024 1.029 1.005 0.938 0.912 - 0.00048 - 0.00060 - 0 .00080 -0.00093 -0.00048 e st.s.e.(b2 ) / s .e.(b1) 0.997 1.004 0.981 0.917 0.353 e st.cov(b l .b2) - 0.00004 - 0 .00008 - 0.00016 - 0.00049 -0.00175 rUriS I n for O -I e 0.523 0.516 0.500 0.493 0.490 ( hi - e sl.s.e.(b l ) Is.e.(b l ) (h2 - fJz)1 fJz I. J ohn W iley.t Sons, Inc., N ew York, I96S. York, 1966. 5. Daniel. C . a nd Wood, F. S., Filti,., EquatiOfl,f to Data, Wiley-Interscience, New York, 1971. 6. Boll, G . E. P. a nd D raper, N. R., " The Bayesian Estimation o f C ommon P aramelen from Several Responses," BiOlfWtrlluJ 52 (I96S). lSS-l6S. 7. H unter, W. G ., " Eslimation o f U nknown C onstants from Multiresponse D ata." INI. Cllr",. , (1967), 461. 8. Welly, J. R., E ",in«rlttg Heal TralUJ~', J ohn W iley.t Sons, Inc., N ew York, 197... 9. Goldfeld. S. M. a nd Q uandt, R. E., NOfIliMtl' M~tllodr i,. EcoflOlftrtrlc$, N orth-Holland Publishins Company, Amsterdam. 1972. 10. Himmelblau, D. M., Proceu AM/y$U b y S latutical M~tllodr, J ohn Wiley . t Sons, Inc., N ew York. 1970. I I. B urinston, R. S. a nd May, D. c., HaNIbooIc o j Probability a NI Slatutic$ willi Tab/~$, 2 nd ed., McGraw-Hili Book C ompany, N ew York, 1970. 12. Beyer, W. H., HaNIboolc o f T obin JtW Probability aNI S tatink$, T he C hemical Rubber Co., 2 nd e d .• Cleveland, O hio, 1968. 13. Rke. J . R., 1 M A "'",1(imati_ o j FlUtCliOtU, Vol. I -U_, TMory, Addison-Wesley Publishina Co., R adina. M a• ., 1964. 14. Myers, R . H., Ruporu~ SJUjaa M~tllodolorY, Allyn a nd Bacon. I nc., Boslon. 1971. IS. Beck, J. V. a nd AI-Araji, S.• " Investiption o f a N ew Simple T ransient Method o f E",. - 0.969 -;; REFERENCES 310 16 . 17 . IS. 19 . 20. 21. C HAPTER 6 MATRIX ANALYSIS F OR LINEAR P ARAMETER E STIMATION T hermal Property Measurement," J . Heat TrafUfer, Trans . A SME, Ser. C 96 (1974), 59-64. Mendel. J . M.. D isaete Techniques o f Parameter Estimation The Equation Error Formulation, Marcel Dekker. Inc .• New York. 1973 . Hoerl. A. E.. "Application of Ridge Analysis to Regression Problems." Chem. Eng. Progr. SS (1962). 54--59. Hoerl. A. E .• "Ridge Analysis." Chem. Eng. Progr .• Symposium Series Vol. 60 (1964). 67-77. Hoerl. A. E. a nd Kennard . R. W ., " Ridge Regression. Biased Estimation for N onorthosonal Problems." Technometrics. 12 (1970). 55~7 . Hoerl. A. E. and Kennard. R. W .• "Ridge Regression. Applications to Nononhogonal Problems,' : Technometrics. 12 (1970). 69 - S2 . M arquardt. D. W .• "Generalized Inverses. Ridge Regression. Biased Linear Estimation. and Nonlinear Estimation." Technometrics. 12 (1970). 591~12 . 22 . Swed. F. S. and Eisenhan, C . " Tables for Testing Randomness o f G rouping in a Sequence of Alternatives." Ann. Math. Stat. 14 (1943). 6~S7. 23 . Beck. J. V.. " Parameter Estimation with Cumulative Errors." Technometrics 16 (1974). 85-92. 24. U. S. Bureau of the Census. Statistical Abstract o f the United States 1974. 95th Annual Edition. Washington. D. c.. 1974. A PPENDIX 6A A PPENDIX 6A A UTOREGRESSIVE M EASUREMENT E RRORS 311 • where (6AAb) P21 Do = P)IP21 + P)2 (6A.4c) P li The lower triangular matrix Do becomes cumbersome as the dimensions n X n become large. The inverse of Do can be found relatively easily, however. Write out (6A.I) for uj as U I=£I u2 = - P2 1f',+f'2 u )= - P)2f',-P)I£2+f') A UTOREGRESSIVE M EASUREMENT E RRORS or Consider the second-order model of autoregressive (AR) measurement errors. (6A.I) u =D;le where D; 1 is a square matrix with (6A.5a) three nonzero diagonals, for which "0= f' _ 1 =0. £ (u,u)=O for (6A.2) i=l=j (6A.3) Let us write out the first few terms of (6A.I). - P2' - P)2 D0- 1 = 0 - P)I (6A.5b) - P42 - P41 0 0 - P,,2 - P"I T he covariance matrix of the e rrors'" c an be written as (6A.6a) Then in general we can write the £ vector in terms of the u vector as (6AAa) where t/J is the diagonal matrix t/J = d iag[ a~ a: ... a,n (6A.6b) 311 F or n onlinear estimation problems the sum of squares function is or interest. It is given by T he inverse of '" is (6A.7) o where the inverse of Da has the relatively simple form given by (6A.5b). Because of the form of Da which is lower triangular with ones along the diagonal. the determinant of Da is unity. Hence the determinant of '" is " n (J~ , ''''I = let-I = where (6A.14b) (6A.8) i -I Again because et- is diagonal. we can write the relatively simple summation For maximum likelihood estimation we know that S ML " = ~ (F; ... - T ~I ( 6A.l5a) H;.a)2a;-2 i -I (6A .9a) cov(bMd = (X "'X) 313 APPENDIX 6A A UfORt:GRESSIVE MEASUREMENT ERRORS CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION " = ~ [( Y; - (6A.9b) P;I Y;~ I -Pll Y ;-l) - (1J; - P;I'II-I-P;2'J1-2)] 1 ( J;-2 ( 6A.l5b) ; -1 for the linear model 1J = X /l By defining F a=Da ly - T he above analysis is readily modified for first-order autoregressive cases by setting P;l equal to zero. Higher-order cases a re also treated simply by adding terms in Z ij... a nd F;.... (6A.IO) T he M L relations in (6A.9) c an be written in the weighted least squares form o f Covariance Matrices for First-Order Autoregressive Errors with Constant p T T b ML =( Z aet- ~ I Z" )~ I Zaet- F.. T ~I cov( b Md = (Zaet-a Za) (6A. lla) ~I Some covariance matrices o f ' " for the first-order AR case with constant P a nd ( 6A.llb) (6A.12a) F; .a= Y ; - Pi! Y;-I - P;2 Y; ~2 for i =2.3 . .... n (6A.16) are given below. F or this case 0 .. a nd et- are given by (6A.12b) Also because et- is a diagonal matrix. the matrices in ( 6A.II) c an be written in summation form as P pl j = 1.2• .. .• p al=a~ a~=ca~. Because of the simple form of the inverse of Da. the terms in Za a nd Fa are a nd Ie = 1.2 •...• p .. 0= pl 0 I 0 0 P p2 0 0 0 0 0 0 0 P et-=diag[ ca~ a~ . . , a~] ( 6A.l3a) p ,,-I j = 1.2 • ...• p ( 6A.l3b) P ( 6A.l7) . '.' 314 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION 325 Case 5. a nd then !J; can be multiplied out to get c APPENDIX 6A AUTOREGRESSIVE MEASUREMENT E RRORS (6A.2Ie) cp I +Cp2 l +p2+cp4 p (l+p2+Cp4) I + p2+p4+ Cp 6 symmetric (6A.18) Consider the terms in the square matrix in (6A.18). Notice that the diagonal term of the j th row, which we designate Gj' also appears as a product in t hejth row for the ( j+ I)th and succeeding terms. For p2:p 1 ,0 is given by l _p2J G =I+p2+ . .. +p2(j - II+{c_l)p2(j - Il= _ _ + {c_l)p2(j - 1l J l _p2 (6A.19) A sj-+oo and p 2<1, Gj approaches the value ( l_p2)-I. This could be called the "steady-state" value of the variance of EJ (1. Evaluating the difference betweeen successive diagonal terms of Gj for any p2 yields ~GJ=GJ+I- G =p2 V - Il[I-c(l- p2)] J I n each of these cases except case 2 a nd if p2 < I, the value of G is the same, namely, ( I_p2)-I. In Case 1 ,0 is this value o f ( l_p2)-1 f~r a llj. In Case 4, 0 increases monotonically to this value whereas in Case 5, G1 is the largest value and G decreases asymptotically to Goo' Hence it is possible j to have constant, increasing, o r decreasing V(~), depending on whether c =(I-p2)-I, c «I- p2)-I, o r c >(I- p2)-I, respectively. I f p =1 as in case 2, Gj =j , which clearly has n o steady-state value; Case 2 is called the cumulative error case. I f p2 > I, the Gj values grow very rapidly w ithj. Some matrices for the above cases are P (12 I P p2 P (6A.20) p2 p' I p ,,-I !J;al=~ -p p ,,-2 p ,,-I p ,,-2 p ,,-l p ,,-l (6A.22a) At least five cases can be identified using (6A.20): (6A.2Ia) !J;a2 = (1; Case 2. 8 0= I ; c= I, p = I. I 2 2 I 2 3 I 2 3 2 Case 1. 3 n (6A.22b) (6A .2Ib) p (6A .2Ic) pl p ,,-I I + pl Case 3. p(1 + pl) p ,,-2(1 + p2) I + p2+p4 p ,,-l(1 + p2+ p4) 1/!al = (1; Case 4. ( 6A.2Id) symmetric (6A.22c) · 316 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION P ROBLEMS 327 Using the substitutions suggested above for A a nd B results in APPENDIX 6B M ATRIX INVERSION L EMMA I mportant relations for inverses occurring in p arameter estimation are derived below. Let A be p x m a nd B be m x p. Let I", be the m x m identity matrix. An identity and some rearrangements o f it are as follows: - A(I",+BA)= - (I,+AB)A ( 68.1) I PXT+-I=V/lXT(XV/lX T + +fll PROBLEMS 6.1 Evaluate the following matrix products (a) - AD = - (I, + A8)A(I", + DA) - IB ( b) (68.2) I , = ( I, + AB) - (I, + AB)A(I", + BA) - IB ( 68 .3) o -I (e) Premultiplying (68.3) by ( I, + AB) - I yields ( I, + AB)-I = I , - A(I", + D A)-I B (68.8) I] [~ 1 ( d) (68.4) Let P be defined by (68.5) ( J) ( t') Using (68.4) for (68.5), with A=V/lXT",-1 a nd B =X. gives P = [ 1 - VIIXT", - I (I + XV/lX T - I) - IX ", [ri ]vII o ][ a ll e2 021 a ll ] 0 22 [ a ll AU 0 12 ][ e l AU 0 ~] 6.1 For X given by ( 68.6) I x-[i J] 2 3 4 This equation is called the matrix inversion lemma. Equation 6.1.5a is found from ( 68 .6) by letting a nd 0 - 1 with p -O.S in (6.9.4), evaluate XTO-IX. where O -OOT Choose the proper size for 0 - 1 6.3 Prove for A a nd B being nonsingular square matrices (68. 1) c an be written as ( I, + AB) - IA=A(I", + B A)-I ( AB)-I-B-IA - I ( 68.1) 6 ." A commonly given method for finding the inverse of a square matrix A given 328 CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER E STIMATION P ROBLEMS 319 convection from horizontal cylinders to liquids a nd gases. Also given are the logarithms to the base 10. by ra" (/21 A= a ll (/22 . 0 .1 0 .2 N uo a,"] G roPr 10gN~o log G roPr 1 0- 4 1 0- 3 1 0- 2 1 0-' I 10 J()2 loJ 104 1()5 - 0.3098 - 0.2596 - 0. 1798 - 0.0752 0.0334 0.1790 0.3243 0.4997 0.7300 0.9699 1.2095 1.4594 1.7101 1.9699 -4 -3 -2 -I 0 I 2 3 4 5 6 7 8 9 ( /2. 0.490 0 .550 0.661 0 .841 1.08 1.51 2.11 3.16 5.37 9.33 16.2 28 .8 51.3 93 .3 OM involves the use of cofactors . A cofactor Aij is ( - I );+jD" where D" is called a minor a nd is obtained by taking the determinant of the submatrix formed by striking out the row and column corresponding to the element a i} ' T he inverse of A contains the elements AJI/IA I. Using this expression for the inverse. verify (6 . 1.10) and (6 . 1.11). 6.S F or Model 5 of Chapter 5. 11, = fJIX;1 + fJ2X;2' show that the matrices (XTX) - ' a nd X Ty used in OLS are )(1 1 0' lOS 109 Let Y =logNuo a nd X =logGroPr. ( a) Using the last seven d ata pairs in the above d ata with orthogonal polynomials, find a o.a ••.. . •a4. Answer. 1.2212, 0.2450, 2.644 X 1 0- 3, 1.6667 X 1 0-', 5.4545 X 10 - '. ( b) F ind the sum of squares for each of the models that can be obtained from the results o f ( 0). 6.6 ( a) F or Model 5 of C hapter 5 give the components, XT -IX a nd XT -Iy, of ¥¥b ML given by (6.5.2). Let ¥- - I be given by W where the components are wij a nd wij~O for i~j. Answer. ( c) Assume that the assumptions 11111011 a re valid. A t the 5% level of significance give a recommended model in terms of N uo a nd G roPr. How does y our model compare with ( b) Using the results of ( a), write out the equation for b l • ML . Check y our answer by letting ¥- = 0 21 and using the Table 5.3 results for Model 5. 6.7 N uo- .53(Gro Pr)"2S Find (XTX) - I for the cases (a) Y, = fJl + fJ2X, + f j ( b) Y, =fJIXi+fJ2 X, 2+f; ( c) Y; =fJ,XiI+fJ 2X;2+E ; X I. = X 4 • = X 'I = - I; X Il=Xn=Xn= - I; 6.8 1.6814, 5.95 X 1 0- 4 ,8.21 x 1 0- 6, 8 .15x 1 0- 6, 6.80 X 1 0- 6 . with X I =1. X 2=2. X 3 =3, a nd X 4=4 . with X ,= - 2 , - I. O. I, and 2. for i = 1.2, . . .• 9. X 21=X51=XSI=0; X 42 -= X 51 = X 62 = 0 ; X 31=X61=X91=J. X 72 = X 82 = X 92 = J. In Welty [8, p . 247) the following are recommended coordinates for natural which is o ften used for 104 < G roPr < I o'? Repeat Problem 6.8 for the f irst seven d ata pairs. Using all the data o f Problem 6.8, repeat Problem 6.8. Use the orthogonal tables in Beyer [12, p. 505] o r s ome o ther book. 6.11 Using the last seven d ata pairs of Problem 6.8 with Y -Nuo a nd X -GroPr, estimate, using OLS, 6 .9 6.10 (0) fJ. a nd /12 in Y/ - /1.+/12X;+f/. 1 lI CHAPTER 6 MAllUX ANALYSIS FOR LINEAR PARAMETER ES11MAll0N (6) PI. P l' and PJ in Y ; p, + P1X + p)x,2 +«. j ( e) C ompare the residuals found using the resulls of ( b) with those given by the model • N uo-0.5638(GroPr) 6.12 PROBLEMS 6.16 t he average rainrall, wind velocity, temperature, etc. a t a ny location over a number o r years is periodic. Assume t hat the dependent v ariable" is the function of , given b y " . II. + III cos 2"" .H50 Using the data of Problem 6.8 a nd starting with the last pair of "observations" for Y -logNuo a nd X -logGroPr, use OLS sequential estimation for the model ( a) F rom reference 24, Table N o. 319, the average wind speed (in mph) a t the airport o f G reat Falls, Montana, is as follows: J an. 15.7 6.13 Repeat Problem 6.12 for the model Y - p, + P2 XI + fJJX/ + ( I I T he following data are a continuation of those in Table 6.2. Using the Y, d ata below and orthogonal polynomials. find a satisfactory model utilizing the F test at the 5% level of significance. Obs. No. Y, ( OF) 18 19 20 21 22 23 24 25 26 27 28 29 30 6.15 Time (sec) 1632 1728 1824 1920 2016 2112 2208 2304 2400 2496 2592 2688 2784 142 .93 139.34 136.04 132.94 130.07 127.39 124 .88 122.46 120.18 118.04 115.97 114.13 112.35 Suppose t hat" is given by Feb. 14.8 Mar. 13.4 Apr. 13.2 May 11.4 J une 11.4 J uly 10.3 Y - p, + 112Xj + (,. I 6.14 331 Aug. 10.5 Sept. 11.7 Oct. 13.8 Nov. 15.0 Dec. 16.0 Estimate II, a nd III using 0 1.8. C alculate the residuals. (Let the January value b e a t , ·0.5/12, etc). A _wer. 13 .1,2.64 (6) Suggest a nother model t hat may b e a ble t o fit the d ata b eller than the one given in (a). 6.17 T he following d ata a re normal monthly average temperatures (in o f) given b y reference 24, Table No. 310, for St. Paul, Minnesota, a nd S an Francisco, California. St. Paul S an F ran. St. Paul S an F ran. June 66.2 58.7 Oct. 49.2 61.4 Nov. 31.9 57.4 Dec. 18.2 52 .0 Apr. 44.6 50.9 Mar. 28.0 54.3 July 71.2 58.5 Aug.. 69.4 59.4 Sept. 59.1 62.2 ( a) Suggest a n a ppropriate function f or" t o describe the St. Paul d ata. The normal minimum monthly temperature is 8 -IO°F less for all months and the normal maximum t emperature is 8 -IO°F g reater for all months. a nd t hat m ,(J) a nd - - - 0 dl Show that for these conditions the model becomes ( b) Suggest a n a ppropriate f unction" to describe the S an Francisco data. T he n ormal minimum a nd m aximum monthly temperatures are within ± 7°F for all months. 6.18 T he model Y, • fJ. + fJl sin 2w" where fJ is the single parameter. 55.3 May 56.5 56.7 Feb. 16.2 53.4 Jan. 11.8 + (, is proposed for the following data. Very little is known regarding (" Estimate , I CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION P ROBLEMS P, l J2 6 .15 a nd 112 using the sequential method. Y, o 1 /12 1 /4 5 /12 1 /2 7 /12 3 /4 1 1/12 I 114 152 198 157 96 54 - 10 51 96 333 U sing the t emperature m easurements a t t imes 0.3, 0.6, 0.9, a nd 1.2 sec for all eigh t thermocouples o f T able 7.14. estima te P, i~ t he m odel T.,." - fJ ,. A ssume ' I}(;) = fJ, + f P) f il)=uj(t) f ii) = f j(i - J) + uj T- Use orthogonal polynomials for the Example 6.2.3 d ata. T he c omponents o f X Ty a re 3390.68. - 3302.44. 2608.68. - 1448.64. 966.00. a nd 2347.2. U se F test a t 5% level o f s ignificance to find the regression line o f s uitable d egree. R epeat P roblem 6 . 18 using the s tandard a ssumptions a nd the subjective p rior . i nformation t hat = 100 200 V p= [ 100 6.17 D erive (6.7.9) a nd (6.7.10). 6.18 T he following a re t emperature m easurements t aken f rom T able 7.14: IIp, = 50 100 ] 3000 IIp, I. time (sec) 94.56 96.52 15 18 E stimate fJ, a nd fJ 2 using this information a nd find the c ovariance m atrix of the estimates. T he s equential m ethod n eed n ot b e used. 6.21 p, + fJ, ( 1- 3.3)'/' f rom 1 = 3.3 to 7.5 sec. fJ, is e qual t o 2 q('lTkpc)-'/' w ith q b eing h eat flux. k t hermal c onductivity. p d ensity. a nd c s pecific heal. U sing t emperatures f rom thermocouples 5 a nd 6 f rom 3.3 to 7.5 sec. e stimate P, a nd fJ2' U se the sequential estimation m ethod f or m '" I . C alculate a nd e xamine t he residuals. Discuss y our a ssumptions . H ow c an t he model be improved? for I, = I. b , = 100.89. b 2 = 103.33. P , I = 4 .464 . P 2 = 0.0000. P n = 13.391 6 .20 i -2.3.4 U se m aximum l ikelihood estimation. Also find the 95% c onfidence i nterval. T he t emperatures for t hermocouples 5 -8 o f T able 7.14 c an b e d escribed b y Assuming t hat the s tandard a ssumptions hold. estimate the c ovariance m atrix o f these estimates. 6.19 f or j = 1. ... . 8; U j(i)-N(O.a'). E [ u j(i)ul(I)] = 0 e xcept w hen ; =1 a ndj = k. 6.26 Answer. (;) , ( 0) E stimate S how t hat P in 93.91 95.70 94.75 96.30 94.17 95.96 t he m odel T = 92.5 + f J(l- 12) using O LS e stimation. ( b) F ind t he 95% c onfidence i nterval for the e stimate o f p. W hat a ssump- tions a re n eeded? 6.29 6.22 6.23 6.24 N ote t hat if V p is sufficiently large c ompared w ith P MI.' t hen b MAP"",bML for a ny finite " fl' W rite a F ORTRAN o r p rogrammable c alculator. program to calculate the o rthogonal c oefficients using (6.3 .6). G ive c oefficients for r = 4. n = 9. a nd i = 1.2. .. .. 6. Using the eight m easurements a t t ime 0.3 sec o f T able 7.14 estimate fJ, in t he model T = fl" F ind t he 95% c onfidence interval. What assumptions a re n eeded? F ind t he 95% c onfidence r egion for the d ata of EJlample 6.7.3 for the two p arameters in the model k = fJl + fJ 2 T . A ssume the errors are additive. have zero mean a nd c onstant v ariance. a nd a re uncorrelated a nd n ormal. Also let T h ave negligible errors. T here is n o prior information regarding the p arameteTS. Let the a ssumptions o n t he m easurement e rrors be 2 -N(0.a 20) w ith a' unknown and 0 known X is errorless. T he p rior i nformation r egarding t he r andom p arameter v ector is ( J-N(pp. a 2V) w here V is k nown . Derive the M AP e stimators f or (J a nd a 2• bMAP '" P p+PXTO-'(Y - Xpp) 62 _ w here n! p { (Y - Y) T O -I(y - Y)+(b MAP - pp{ V -'(b MAP - I'p) }