CHAPTER 5 INTRODUCTION TO LINEAR ESTIMATION

5.1 MOTIVATION, MODELS, AND ASSUMPTIONS

5.1.1 Motivation

One of the basic principles in engineering is to start analysis with simple cases. For that reason, this chapter studies estimation of parameters in several simple linear algebraic models. Many of the estimation ideas can be introduced with these models without the added complexities of nonlinear algebraic models or of models described by differential equations.

In addition to the pedagogic value of simple algebraic cases, there are numerous physical situations for which the regression function is linear in the parameters. Moreover, when the regression function is unknown and cannot be derived from first principles, simple models are usually proposed.

Simple linear models have been widely studied by statisticians, economists, and others. Various terms that designate certain parts of the study of parameter estimation in statistical models have also been used to refer to much larger segments of that study. When the models are linear in the parameters, regression analysis and analysis of variance are sometimes used interchangeably. However, regression analysis also specifically refers to the analysis of the dependence of the expected value of a random variable on the conditions* under which the experiment is conducted; the method of least squares is frequently used to estimate the parameters. Analysis of variance refers to the breakdown of the variability of the observed values of the dependent variable into two kinds of parts. One part is the sum of squares about the fitted regression function. The other parts are due to the exclusion of parameters, or groups of parameters, from the regression function. Those who use analysis of variance methods when the independent variables are limited to the values 0 and 1 (presence or absence) tend to be unaware that a model is implied [1, p. 243]. Analysis of covariance combines techniques specially adapted to 0-or-1 independent variables with techniques needed in more general cases.

*"Conditions" refer, for example, to the $X_i$ values in (5.1.1c).

5.1.2 Models

This section discusses certain aspects of models. First considered is the model functional form, which is termed the regression function. Some restrictions on designs for these functions are also given. Second, two error models are discussed. In one there are measurement errors, and in the other the random component is in the equation describing the system. Third, the next subsection gives various standard assumptions relating to the statistics of the errors.

The regression functions used here are considered to have the correct functional forms; that is, they are not empirical approximations or best guesses. The functions considered in this chapter are linear in the parameters and contain at most two parameters. For convenience in later references, the regression functions used in this chapter are listed and labeled as follows:

$$\text{Model 1:}\quad \eta_i = \beta_0 \tag{5.1.1a}$$
$$\text{Model 2:}\quad \eta_i = \beta_1 X_i \tag{5.1.1b}$$
$$\text{Model 3:}\quad \eta_i = \beta_0 + \beta_1 X_i \tag{5.1.1c}$$
$$\text{Model 4:}\quad \eta_i = \beta_0' + \beta_1 (X_i - \bar X), \qquad \bar X = \frac{1}{n}\sum_{i=1}^{n} X_i \tag{5.1.1d}$$
$$\text{Model 5:}\quad \eta_i = \beta_1 X_{i1} + \beta_2 X_{i2} \tag{5.1.1e}$$

The variable $\eta$ is sometimes called the dependent variable†. The quantities $X_i$, $X_{i1}$, and $X_{i2}$ are independent variables that might represent time, position, temperature, velocity, cost, and so on. Some of these models are clearly related. For example, Model 2 reduces to Model 1 if $X_i = 1$. Also, Model 5 includes both Models 3 and 4.

†In the statistical literature $Y$ is called the dependent variable.

In each case there is a restriction related to the measurements. Assume that there are $n$ observations. For Model 1 the restriction is simply that there is at least one observation, or $n \ge 1$.
For Models 2, 3, 4, and 5 the respective restrictions are as follows:

$$\sum_{i=1}^{n} X_i^2 \neq 0 \quad \text{(at least one } X_i \neq 0 \text{ needed)} \tag{5.1.2a}$$
$$\sum_{i=1}^{n} (X_i - \bar X)^2 \neq 0 \quad \text{(at least 2 different } X_i \text{ values needed)} \tag{5.1.2b}$$
$$\sum_{i=1}^{n} (X_i - \bar X)^2 \neq 0 \quad \text{(at least 2 different } X_i \text{ values needed)} \tag{5.1.2c}$$
$$\sum_{i=1}^{n} X_{i1}^2 \sum_{i=1}^{n} X_{i2}^2 - \Big(\sum_{i=1}^{n} X_{i1} X_{i2}\Big)^2 \neq 0 \quad \text{(at least 2 sets of } X_{i1}, X_{i2} \text{ that are not proportional)} \tag{5.1.2d}$$

where $\bar X = \sum_{i=1}^{n} X_i / n$. In each model except the first, the independent variables $X_i$ or $X_{ij}$ could represent a number of equally or unequally spaced values. Alternatively, $X_i$ might represent values of various functions of time, $t$. Examples include $t$, $t^2/2$, $t^3/3$, ..., $t^i/i$, $\sin \omega t$, $\cos \omega t$, $e^{-\omega t}$, and $\ln t$, or some combination of them. The quantity $\omega$ is assumed here to be known.

In most of this chapter the errors are considered to be additive. Then for Model 3

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \tag{5.1.3}$$

where $\varepsilon_i$ is the unknown error and $Y_i$ is the measurement at $X_i$. The model given by (5.1.3) can, however, represent the following two cases:

Error Model A. Errors in Measurements

$$Y_i = \eta_i + \varepsilon_i, \qquad \eta_i = \beta_0 + \beta_1 X_i \tag{5.1.4}$$

Error Model D. Errors (Noise) in Process

$$Y_i = \eta_i, \qquad \eta_i = \beta_0 + \beta_1 X_i + \varepsilon_i \tag{5.1.5}$$

Here $\eta_i$ represents the quantity being measured and $Y_i$ is its measurement. Implicit in these models is the assumption that there is no error in $X_i$. That is, $X_i$ is not a random variable, whereas $Y_i$ and $\varepsilon_i$ are. In Error Model D, $\eta_i$ is also a random variable.

In Error Model A there are errors in the measurements but none in $\eta$. In order to quantify $\varepsilon_i$, one can study the error characteristics of the measuring devices, be they thermocouples, hot-wire anemometers, micrometers, etc. These errors can be reduced by using more precise devices. As technology improves, one would expect $\varepsilon_i$ in Error Model A to decrease. The system model itself is assumed to be errorless or noiseless. This implies that the physics is well understood and that no stochastic noise enters $\eta$. This would be the case for many physical measurements. Consider, for example, a steady-state temperature distribution in a flat plate that is linear with position. The randomness in observed temperatures for repeated measurements would be the result of measurement noise rather than some physical phenomenon causing the fluctuation.

In Error Model D the measurements are assumed errorless, but the model ($\eta$) contains "noise"; that is, the variable being measured deviates from its expected value by some stochastic component. An example is turbulent flow between two parallel plates. Part of the universal velocity profile for turbulent flow is described by an expression of the form $u^+ = \beta_0 + \beta_1 \ln y^+$. Here the dependent variable $u^+$ is a dimensionless velocity, and $y^+$, the independent variable, is a dimensionless distance. In this case instantaneous velocity measurements fluctuate about the mean value $u^+$ owing more to the turbulence phenomenon than to measurement inaccuracies. Hence this is an Error Model D case. For Error Model D, $\varepsilon_i$ would not be expected to decrease with time (that is, with improved measurement capability). Also, a study of the sensor would not yield any information regarding $\varepsilon_i$.

Regardless of whether Error Model A or D is correct, the estimation problem is formally the same for the physical models considered in this chapter. The meanings of $\eta$ and $\varepsilon$ differ, however, as do the statistics of $\varepsilon$. We shall visualize Error Model A as the model considered in this chapter.

5.1.3 Statistical Assumptions Regarding the Measurement Errors

Assumptions regarding the measurement errors should be carefully stated in each estimation problem. If the assumptions do not accurately describe the data, then one can at least pinpoint the assumption(s) that are not satisfied. The mere identification of the incorrect assumptions may lead to more realistic assumptions and thus better estimators. Different assumptions lead to different estimation methods. In this chapter we consider three commonly used methods: ordinary least squares (OLS), maximum likelihood (ML), and maximum a posteriori (MAP).

The following conditions, given in terms of Error Model A and Model 3, are termed the standard statistical assumptions for $i = 1, 2, \ldots, n$:

1. $Y_i = E(Y_i \mid \beta_0, \beta_1) + \varepsilon_i = \eta_i + \varepsilon_i$ (additive errors) (5.1.6)
2. $E(\varepsilon_i) = 0$ (zero mean errors) (5.1.7)
3. $V(Y_i \mid \beta_0, \beta_1) = \sigma^2$ (constant variance errors, homoskedasticity) (5.1.8). Note that $E(\varepsilon_i^2) = \sigma^2$ if $E(\varepsilon_i) = 0$.
4. $E\{[\varepsilon_i - E(\varepsilon_i)][\varepsilon_j - E(\varepsilon_j)]\} = 0$ for $i \neq j$ (uncorrelated errors) (5.1.9). Equivalently, $E(\varepsilon_i \varepsilon_j) = 0$ if $E(\varepsilon_i) = 0$ and $i \neq j$.
5. $\varepsilon_i$ has a normal probability distribution (5.1.10)
6. Known statistical parameters (5.1.11)
7. $V(X_i) = 0$ (nonstochastic independent variable) (5.1.12)
8. No prior information regarding $\beta_0$ and $\beta_1$, and parameters nonrandom (5.1.13)

To describe the assumptions concisely and explicitly, we assign a 1 or 0 to each of the above assumptions, where 1 means yes and 0 means no. When all the assumptions are satisfied we designate them as 11111111. The first 1 on the left refers to the additive error assumption, the second 1 refers to the zero mean assumption, and so on. In some cases additional numbers are used to convey more information than a simple no. For example, for the uncorrelated error condition, 2 designates first-order autoregressive errors. See Section 6.1.5 for a more complete list of possibilities other than 1 or 0. If an assumption is not used, a dash is written in place of a 1 or 0. For example, 1111-011 denotes the following: additive, zero mean, constant variance, uncorrelated errors; normality not used; $\sigma^2$ unknown; nonstochastic $X_i$; and no prior information. Assumptions 2, 3, 4, and 7 are sometimes referred to as the Gauss-Markov assumptions.

5.2 ORDINARY LEAST SQUARES ESTIMATORS (OLS)

In ordinary least squares estimation the sum of squares function to be minimized with respect to the parameters is simply

$$S = \sum_{i=1}^{n} (Y_i - \eta_i)^2 \tag{5.2.1}$$

where $\eta_i$ is a function of parameters such as $\beta_0$ and $\beta_1$. It is important to observe that no statistical assumptions are used in obtaining OLS parameter estimates; that is, the assumptions are --------. In order to make statistical statements regarding the estimators, however, it is necessary to possess information regarding the measurement errors.

Derivations given later may need the variance of $\sum d_i Y_i$, where $d_i$ is not a random variable. Assume that the errors in $Y_i$ are additive, have zero mean, and are uncorrelated (assumptions 1, 2, and 4, respectively). Then

$$\begin{aligned}
V\Big(\sum_{i=1}^{n} d_i Y_i\Big) &= E\Big\{\Big[\sum_{i=1}^{n} d_i(\eta_i + \varepsilon_i) - E\Big(\sum_{i=1}^{n} d_i(\eta_i + \varepsilon_i)\Big)\Big]^2\Big\} \\
&= E\Big[\Big(\sum_{i=1}^{n} d_i \varepsilon_i\Big)^2\Big] \\
&= \sum_{i=1}^{n} d_i^2 \sigma_i^2
\end{aligned} \tag{5.2.2}$$

The first assumption is used on the first line of (5.2.2), the second assumption on the second line, and the fourth assumption on the third line. Here $\sigma_i^2 = V(\varepsilon_i)$, which equals $\sigma^2$ when assumption 3 also holds. Equation (5.2.2) is a special case of (2.6.20).

5.2.1 Models 1 and 2 ($\eta_i = \beta_0$ and $\eta_i = \beta_1 X_i$)

This section covers both Models 1 and 2. Since Model 2 is the more general, we start with it and then apply the results to Model 1. For Model 2 ($\eta_i = \beta_1 X_i$), (5.2.1) can be written

$$S = \sum_{i=1}^{n} [Y_i - \beta_1 X_i]^2 \tag{5.2.3}$$

Differentiating $S$ with respect to $\beta_1$, replacing $\beta_1$ by the estimator $b_1$, and setting the result equal to zero gives the normal equation

$$-2\sum_{i=1}^{n} [Y_i - b_1 X_i] X_i = 0 \tag{5.2.4}$$

whose solution for Model 2 is

$$b_1 = \frac{\sum_{i=1}^{n} Y_i X_i}{\sum_{i=1}^{n} X_i^2} \tag{5.2.5}$$

By setting $X_i = 1$ in (5.2.5), the Model 1 estimator is

$$b_0 = \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar Y \tag{5.2.6}$$

which is the average of the $Y_i$. These two estimators use no statistical assumptions. However, at least one observation must be made, and for Model 2 at least one $X_i$ must be nonzero.

The predicted, regression, or smoothed value is denoted $\hat Y_i$ and is called "$Y_i$ hat." For Models 1 and 2, respectively, $\hat Y_i$ is

$$\hat Y_i = b_0, \qquad \hat Y_i = b_1 X_i \tag{5.2.7a,b}$$

The residual $e_i$ is the measured value of $Y_i$ minus the predicted value, or

$$e_i = Y_i - \hat Y_i \tag{5.2.8}$$

The residual $e_i$ is not equal to the error $\varepsilon_i$, but it can be used to estimate $\varepsilon_i$.

Mean and Variances of Estimates

Assume additive, zero mean errors, nonstochastic $X_i$, and nonrandom $\beta_0$ and $\beta_1$ (11----11). Under these standard statistical assumptions, the expected value of the Model 2 parameter estimate is

$$E(b_1) = \frac{\sum_{i=1}^{n} E(Y_i) X_i}{\sum_{i=1}^{n} X_i^2} = \frac{\beta_1 \sum_{i=1}^{n} X_i^2}{\sum_{i=1}^{n} X_i^2} = \beta_1$$

One can also show for Model 1 that $E(b_0) = \beta_0$. Hence the least squares estimators $b_0$ and $b_1$ are unbiased for the stated assumptions (see Section 3.2.1).

Suppose that all the standard assumptions are valid except two: $\varepsilon_i$ need not possess a normal density, and $\sigma^2$ may or may not be known (assumptions 1111--11). Then the variance of $b_0$, using (5.2.6) and (5.2.2), is

$$V(b_0) = \frac{\sigma^2}{n} \tag{5.2.9}$$

From (5.2.5) and (5.2.2) the variance of $b_1$ is

$$V(b_1) = \frac{\sigma^2}{\sum_{i=1}^{n} X_i^2} \tag{5.2.10}$$

Equations (5.2.9) and (5.2.10) both indicate that estimates as accurate as desired can be obtained simply by taking a sufficiently large number of observations. This naturally requires that the underlying assumptions be valid. If the measurements were correlated, for example, this conclusion might not be true. Also note that for Model 2 ($\eta_i = \beta_1 X_i$) there is an optimum placement of observations. Suppose that $n$ observations are to be obtained, with each $X_i$ chosen subject to $|X_i| \le |X_m|$, and that a minimum variance estimate is desired. Then the variance of $b_1$ is minimized if all the measurements are concentrated at $X_m$, giving $V(b_1) = \sigma^2 / (n X_m^2)$. This would be the best choice of the $X_i$ values provided there is no uncertainty in the model (i.e., in the functional form of $\eta_i$).

Now suppose that all the standard assumptions are valid except two: there may or may not be normality, and $\sigma^2$ is unknown (1111-011). Then the variances of $b_0$ and $b_1$ are estimated by replacing $\sigma^2$ with an estimate designated $s^2$. The square roots of $V(b_0)$ and $V(b_1)$ with this replacement are called the estimated standard errors (or standard deviations):

$$\text{est. s.e.}(b_0) = s\, n^{-1/2} \tag{5.2.11}$$

$$\text{est. s.e.}(b_1) = s \Big[\sum_{i=1}^{n} X_i^2\Big]^{-1/2} \tag{5.2.12}$$
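The unbiasedness of $b_1$ and the variance formula (5.2.10) can be checked by simulation. What follows is a minimal sketch, not a reference implementation. It assumes Error Model A with independent normal errors of known standard deviation (assumptions 1111-111). The design $X = (1, 2, 3)$, the values $\beta_1 = 1$ and $\sigma = 0.1$, the random seed, and the helper names `ols_model2` and `simulate_b1` are arbitrary, hypothetical choices made for this illustration. The sketch also evaluates (5.2.10) for the design that places every observation at $X_m = 3$, which the text identifies as the minimum-variance placement.

```python
import random
import statistics


def ols_model2(x, y):
    """OLS for Model 2 (eta_i = beta1 * X_i): b1 = sum(Y_i X_i) / sum(X_i^2), eq. (5.2.5)."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("at least one X_i must be nonzero, see (5.1.2a)")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


def simulate_b1(x, beta1, sigma, reps, seed=1):
    """Return `reps` OLS estimates b1, each computed from hypothetical simulated data
    Y_i = beta1 * X_i + eps_i with independent eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        y = [beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        estimates.append(ols_model2(x, y))
    return estimates


x = [1.0, 2.0, 3.0]
beta1, sigma = 1.0, 0.1
b1_values = simulate_b1(x, beta1, sigma, reps=100_000)

# Theoretical variance from (5.2.10): sigma^2 / sum(X_i^2) = 0.01 / 14.
theory_var = sigma**2 / sum(xi * xi for xi in x)
# Same n = 3 observations, all placed at X_m = 3: sigma^2 / (n * X_m^2) = 0.01 / 27.
concentrated_var = sigma**2 / (len(x) * 3.0**2)

print("mean of b1:", statistics.fmean(b1_values))
print("sample variance of b1:", statistics.variance(b1_values))
print("V(b1) from (5.2.10):", theory_var, "  all at X_m:", concentrated_var)
```

The sample mean of the simulated $b_1$ values should be close to $\beta_1$, and their sample variance should be close to $\sigma^2/\sum X_i^2$. Both comparisons become sharper as the number of replications grows.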
Expected Value of $S_{\min}$

Unlike ML estimation, OLS does not directly provide an estimator for $\sigma^2$. For the assumptions 1111-011, however, one can relate $\sigma^2$ to the expected value of the minimum sum of squares, designated $S_{\min}$:

$$E(S_{\min}) = E\Big[\sum_{i=1}^{n} (Y_i - \hat Y_i)^2\Big] = \sum_{i=1}^{n} E\big[(Y_i - \hat Y_i)^2\big] \tag{5.2.13a}$$

Since $E(Y_i - \hat Y_i) = 0$, the expected value of $S_{\min}$ is

$$E(S_{\min}) = \sum_{i=1}^{n} V(e_i) \tag{5.2.13b}$$

Equation (5.2.13b) is valid for any number of parameters. It remains to find $V(e_i)$ in terms of $\sigma^2$. It is always true that

$$V(e_i) = V(Y_i - \hat Y_i) = V(Y_i) + V(\hat Y_i) - 2\,\mathrm{cov}(Y_i, \hat Y_i) \tag{5.2.14}$$

The $V(Y_i)$ term is simply $\sigma^2$. The other two terms are considered below. For the one-parameter models we can write

$$\hat Y_i = b_1 X_i = X_i \sum_{j=1}^{n} d_j Y_j, \qquad d_j = \frac{X_j}{\sum_{k=1}^{n} X_k^2} \tag{5.2.15}$$

Using (5.2.2) with constant error variance $\sigma^2$, the variance of $\hat Y_i$ for Model 2 is

$$V(\hat Y_i) = X_i^2 \sum_{j=1}^{n} d_j^2 \sigma^2 = \frac{X_i^2 \sigma^2}{\sum_{k=1}^{n} X_k^2} \tag{5.2.16}$$

Letting $X_i = 1$, we then have for Model 1 ($\eta_i = \beta_0$)

$$V(\hat Y_i) = \frac{\sigma^2}{n} \tag{5.2.17}$$

Observe that the variance of the predicted value of $Y_i$ is constant for Model 1 but increases with $X_i^2$ for Model 2. For assumptions 1111--11 and Model 2, the third term on the right side of (5.2.14) is

$$-2\,\mathrm{cov}(Y_i, \hat Y_i) = -2 X_i d_i \sigma^2 = -2\Big[\sum_{k=1}^{n} X_k^2\Big]^{-1} X_i^2 \sigma^2 \tag{5.2.18}$$

Combining the above results yields, for Models 1 and 2 respectively,

$$V(e_i) = \Big(1 - \frac{1}{n}\Big)\sigma^2, \qquad V(e_i) = \Big(1 - \frac{X_i^2}{\sum_{k=1}^{n} X_k^2}\Big)\sigma^2 \tag{5.2.19a,b}$$

Both are less than $V(\varepsilon_i) = \sigma^2$. In both cases, (5.2.13) and (5.2.19) give the expected value of $S_{\min}$ as

$$E(S_{\min}) = (n-1)\sigma^2 \tag{5.2.20}$$

Thus an unbiased estimator for $\sigma^2$, designated $s^2$ or $\hat\sigma^2$, is

$$s^2 = \hat\sigma^2 = \frac{S_{\min}}{n-1} = \frac{\sum_{i=1}^{n} (Y_i - \hat Y_i)^2}{n-1}, \qquad n > 1 \tag{5.2.21}$$

This expression is valid for one parameter with assumptions 1111-011 and can be used in (5.2.11) or (5.2.12). For one parameter, $s^2$ can be estimated only when there are two or more observations.

Example 5.2.1

An automobile travels at a constant speed. The distances traveled at the end of 1, 2, and 3 min are measured to be 1.01, 2.03, and 3.00 km. Assume that distance is the dependent variable and time the independent variable. For this case the regression function states that the distance traveled, $h$, equals the velocity, $v$, times the duration traveled, $t$; in symbols, $h = vt$. Use OLS to estimate $v$.

Solution

This is a Model 2 case with $v$ as the parameter. Using (5.2.5) with $Y_i$ being the $i$th measurement, we find

$$\hat v = \frac{1.01(1) + 2.03(2) + 3.00(3)}{1 + 4 + 9} = \frac{14.07}{14} = 1.005 \text{ km/min}$$

Example 5.2.2

An object is dropped in a vacuum, and its position $h$ is observed at various times $t_i$. The observations of $h_i$, designated $Y_i$, are given as

t_i (sec)   0.1    0.2   0.3   0.4
Y_i (m)     0.05   0.2   0.4   0.8

The measurements are to be used to estimate the local gravitational constant $g$. The position $h$ is described by the differential equation $\ddot h = g$ with the initial conditions $h = \dot h = 0$ at $t = 0$; the solution is $h = g t^2/2$.

(a) Using ordinary least squares, find an estimate of $g$.
(b) Using the standard assumptions, except that $\sigma^2$ is unknown and $\varepsilon_i$ need not be normal, estimate the standard error of $g$.

Solution

(a) The given model is the same as Model 2, with $g$ as $\beta_1$ and $t_i^2/2$ as $X_i$. The OLS estimator is (5.2.5), which can be written as

$$\hat g = \frac{\sum_{i=1}^{4} Y_i t_i^2/2}{\sum_{i=1}^{4} t_i^4/4}$$

The numerator and denominator are, respectively,

$$\sum_{i=1}^{4} \frac{Y_i t_i^2}{2} = \frac{1}{2}\big[0.05(0.1)^2 + 0.2(0.2)^2 + 0.4(0.3)^2 + 0.8(0.4)^2\big] = 0.08625$$

$$\sum_{i=1}^{4} \frac{t_i^4}{4} = \frac{1}{4}\big[(0.1)^4 + (0.2)^4 + (0.3)^4 + (0.4)^4\big] = 0.00885$$

Thus the estimate is $\hat g = 0.08625/0.00885 = 9.7458$ m/sec².

(b) The residuals $e_i = Y_i - \hat Y_i$ are, respectively, 0.00127, 0.00508, -0.03856, and 0.02034. The sum of squares of these terms is $S_{\min} = 0.001928$. From (5.2.21) the estimated standard deviation is

$$s = \Big[\frac{S_{\min}}{n-1}\Big]^{1/2} = \Big[\frac{0.001928}{4-1}\Big]^{1/2} = 0.02535$$

Then from (5.2.12) the estimated standard error of $\hat g$ is

$$\text{est. s.e.}(\hat g) = s\Big[\sum_{i=1}^{4} \frac{t_i^4}{4}\Big]^{-1/2} = \frac{0.02535}{(0.00885)^{1/2}} = 0.2695 \text{ m/sec}^2$$

This can be compared with the estimate of 9.7458 m/sec².

5.2.2 Two-Parameter Models

Model 5, $\eta_i = \beta_1 X_{i1} + \beta_2 X_{i2}$

To simplify the presentation of the two-parameter cases, the general two-parameter case, Model 5, is considered first. Using the sum of squares function (5.2.1) with Model 5 (5.1.1e), we have

$$S = \sum_{i=1}^{n} [Y_i - \beta_1 X_{i1} - \beta_2 X_{i2}]^2 \tag{5.2.22}$$

We differentiate $S$ with respect to $\beta_1$, set the derivative equal to zero, and replace $\beta_1$ and $\beta_2$ by their estimators $b_1$ and $b_2$. Repeating the same procedure for $\beta_2$ then yields the two normal equations

$$b_1 c_{11} + b_2 c_{12} = d_1 \tag{5.2.23a}$$
$$b_1 c_{12} + b_2 c_{22} = d_2 \tag{5.2.23b}$$

where

$$c_{kj} = \sum_{i=1}^{n} X_{ik} X_{ij}, \qquad d_k = \sum_{i=1}^{n} Y_i X_{ik} \tag{5.2.23c}$$

Notice that the coefficient $c_{12}$ appears symmetrically in (5.2.23a,b). Solving (5.2.23a,b) for $b_1$ and $b_2$ yields (for Model 5)

$$b_1 = \frac{d_1 c_{22} - d_2 c_{12}}{\Delta}, \qquad b_2 = \frac{d_2 c_{11} - d_1 c_{12}}{\Delta} \tag{5.2.24a}$$

$$\Delta = c_{11} c_{22} - c_{12}^2 \tag{5.2.24b}$$

No statistical assumptions were necessary to derive the estimators given in (5.2.24a). Using three of the standard assumptions (additive errors, zero mean errors, and nonstochastic $X_{ij}$), it can be shown that $b_1$ and $b_2$ are unbiased estimates of $\beta_1$ and $\beta_2$. The variance of $b_1$ is readily found by writing $b_1$ as

$$b_1 = \sum_{i=1}^{n} f_i Y_i, \qquad f_i = \frac{X_{i1} c_{22} - X_{i2} c_{12}}{\Delta} \tag{5.2.25}$$

Then, using the standard statistical assumptions 1111--11 and (5.2.2), the variance of $b_1$ is

$$V(b_1) = \sum_{i=1}^{n} f_i^2 \sigma^2 = \big[c_{11} c_{22}^2 - 2 c_{22} c_{12}^2 + c_{12}^2 c_{22}\big]\frac{\sigma^2}{\Delta^2}$$

or, after simplifying,

$$V(b_1) = \frac{c_{22}\sigma^2}{\Delta} \tag{5.2.26a}$$

In a similar manner it can be shown that $V(b_2)$ and $\mathrm{cov}(b_1, b_2)$ are given by

$$V(b_2) = \frac{c_{11}\sigma^2}{\Delta}, \qquad \mathrm{cov}(b_1, b_2) = -\frac{c_{12}\sigma^2}{\Delta} \tag{5.2.26b,c}$$

The predicted value of $Y_i$ is

$$\hat Y_i = b_1 X_{i1} + b_2 X_{i2} \tag{5.2.27}$$

Using (5.2.26), the variance of $\hat Y_i$ is then

$$V(\hat Y_i) = \big[X_{i1}^2 c_{22} - 2 X_{i1} X_{i2} c_{12} + X_{i2}^2 c_{11}\big]\frac{\sigma^2}{\Delta} \tag{5.2.28}$$

It can also be shown that $\mathrm{cov}(Y_i, \hat Y_i)$ equals this same value:

$$\mathrm{cov}(Y_i, \hat Y_i) = V(\hat Y_i) \tag{5.2.29}$$

From (5.2.14), (5.2.28), and (5.2.29), the variance of the residual $e_i$ ($= Y_i - \hat Y_i$) is

$$V(e_i) = \sigma^2 - V(\hat Y_i) \tag{5.2.30}$$

By (5.2.13b), $E(S_{\min})$ equals $\sum V(e_i)$, so we find

$$E(S_{\min}) = n\sigma^2 - \big[c_{11}c_{22} - 2c_{12}^2 + c_{22}c_{11}\big]\frac{\sigma^2}{\Delta} = (n-2)\sigma^2 \tag{5.2.31}$$

since $\Delta = c_{11}c_{22} - c_{12}^2$. Consequently, for the two-parameter case with Model 5 and assumptions 1111-011, an unbiased estimator for $\sigma^2$ is

$$s^2 = \frac{S_{\min}}{n-2}, \qquad n > 2 \tag{5.2.32}$$

This differs from (5.2.21) in having a divisor of $n-2$ rather than $n-1$. Observe that (5.2.32) is meaningless for $n = 2$: with two parameters and two observations, both residuals must be zero, which also gives $S_{\min} = 0$. Consequently, for two parameters, $\sigma^2$ can be estimated only if $n > 2$.

Model 3, $\eta_i = \beta_0 + \beta_1 X_i$

Model 3 results can be found from those of Model 5 by making the following replacements in Model 5: $\beta_1$ by $\beta_0$, $b_1$ by $b_0$, $\beta_2$ by $\beta_1$, $b_2$ by $b_1$, $X_{i1}$ by 1, and $X_{i2}$ by $X_i$. This gives

$$c_{11} = n, \quad c_{12} = \sum X_i, \quad c_{22} = \sum X_i^2, \quad d_1 = \sum Y_i, \quad d_2 = \sum Y_i X_i \tag{5.2.33}$$

$$\Delta = n\sum X_i^2 - \Big(\sum X_i\Big)^2 \tag{5.2.34}$$

One must be careful where the squares are placed in $\Delta$. Note that $\sum X_i^2$ means the sum of the $X_i^2$, whereas $(\sum X_i)^2$ means the square of the sum of the $X_i$ values. It can also be shown that $\Delta$ equals

$$\Delta = n\sum (X_i - \bar X)^2 = n\Big[\sum X_i^2 - n\bar X^2\Big] \tag{5.2.35a,b}$$

From the above relations, $b_1$ (the estimator of $\beta_1$ in Model 3, $\eta_i = \beta_0 + \beta_1 X_i$) can be found from $b_2$ in (5.2.24a) to be

$$b_1 = \frac{n\sum Y_i X_i - (\sum Y_i)(\sum X_i)}{\Delta} \tag{5.2.36}$$

Using (5.2.35a), this expression can also be written (Model 3)

$$b_1 = \frac{\sum (X_i - \bar X) Y_i}{\sum (X_i - \bar X)^2} = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2} \tag{5.2.37}$$

where $\bar Y = \sum Y_i / n$ and each summation runs from $i = 1$ to $n$. The estimator $b_0$ can also be found from (5.2.24a) by using the expression for $b_1$. Instead, we divide (5.2.23a) by $n$ (with $b_1 \to b_0$ and $b_2 \to b_1$) to get

$$b_0 = \bar Y - b_1 \bar X \tag{5.2.38}$$

Hence, if $\bar X = \sum X_i / n$ is zero, $b_0$ is simply $\bar Y$.
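The formulas of Sections 5.2.1 and 5.2.2 translate directly into code. The sketch below is illustrative rather than a reference implementation. The function names are hypothetical, and plain Python lists of floats are assumed. `model2_fit` applies (5.2.5), (5.2.21), and (5.2.12) and is checked against Example 5.2.2. `model5_fit` solves the normal equations (5.2.23a,b) in the closed form (5.2.24a,b). `model3_fit` uses the centered forms (5.2.37) and (5.2.38). The last lines use a small made-up data set to confirm numerically that Model 5 with $X_{i1} = 1$ and $X_{i2} = X_i$ reproduces the Model 3 estimates.

```python
import math


def model2_fit(x, y):
    """Model 2: b1 from (5.2.5), s from (5.2.21) with divisor n - 1, est. s.e.(b1) from (5.2.12)."""
    sxx = sum(xi * xi for xi in x)
    b1 = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    s_min = sum((yi - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(s_min / (len(x) - 1))
    return b1, s, s / math.sqrt(sxx)


def model5_fit(x1, x2, y):
    """Model 5: solve b1*c11 + b2*c12 = d1 and b1*c12 + b2*c22 = d2, eqs. (5.2.23)-(5.2.24)."""
    c11 = sum(a * a for a in x1)
    c12 = sum(a * b for a, b in zip(x1, x2))
    c22 = sum(b * b for b in x2)
    d1 = sum(a * yi for a, yi in zip(x1, y))
    d2 = sum(b * yi for b, yi in zip(x2, y))
    delta = c11 * c22 - c12 ** 2
    if delta == 0:
        raise ValueError("Delta = 0: restriction (5.1.2d) is violated")
    return (d1 * c22 - d2 * c12) / delta, (d2 * c11 - d1 * c12) / delta


def model3_fit(x, y):
    """Model 3: b1 from (5.2.37), then b0 = Ybar - b1 * Xbar from (5.2.38)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xbar) * yi for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    return ybar - b1 * xbar, b1


# Example 5.2.2: X_i = t_i^2 / 2; Y_i are the observed positions in meters.
t = [0.1, 0.2, 0.3, 0.4]
positions = [0.05, 0.2, 0.4, 0.8]
g_hat, s_g, se_g = model2_fit([ti ** 2 / 2 for ti in t], positions)
print(f"g = {g_hat:.4f}, s = {s_g:.5f}, est. s.e.(g) = {se_g:.4f}")
# prints: g = 9.7458, s = 0.02535, est. s.e.(g) = 0.2695

# Model 3 as a special case of Model 5 (small made-up data set).
x = [1.0, 2.0, 4.0, 7.0]
y = [2.1, 2.9, 5.2, 7.8]
print(model5_fit([1.0] * len(x), x, y), model3_fit(x, y))
```

Cramer's rule is adequate here because there are only two normal equations. For larger problems a matrix formulation is the usual choice.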
Because $b_0 = \bar Y$ when $\bar X = 0$, and because of the resulting simplifications in (5.2.37), a transformation sometimes used in hand calculations redefines $X_i$ so that $\bar X = 0$.

As mentioned several times above, no statistical assumptions are used to obtain the estimators $b_0$ and $b_1$ given by (5.2.38) and (5.2.37), respectively. Suppose now that the standard assumptions are valid. Figure 5.1 illustrates a number of them for Model 3, $\eta_i = \beta_0 + \beta_1 X_i$. A normal probability density is superimposed on the regression line at several $X_i$ values. The first two assumptions, additive and zero mean errors, are implied in Fig. 5.1. The third assumption, constant variance, is depicted explicitly, as is the normality assumption (number 5). The nonstochastic $X_i$ assumption (number 7) is implied by the lack of a probability density in the $X_i$ direction.

Figure 5.1 Linear model with $Y$ being a random variable with constant $\sigma$ and normal probability distribution.

Mean and Variances for Model 3

The OLS estimates of $\beta_0$ and $\beta_1$ are unbiased for additive, zero mean errors, as was demonstrated for the more general case, Model 5. From (5.2.26), (5.2.33), and (5.2.34), the variances and covariance of $b_0$ and $b_1$ are

$$V(b_0) = \frac{\sigma^2 \sum X_i^2}{\Delta}, \qquad V(b_1) = \frac{n\sigma^2}{\Delta} = \frac{\sigma^2}{\sum (X_i - \bar X)^2} \tag{5.2.39a,b}$$

$$\mathrm{cov}(b_0, b_1) = -\frac{\sigma^2 \sum X_i}{\Delta} = -\frac{\bar X \sigma^2}{\sum (X_i - \bar X)^2} \tag{5.2.40}$$

where $\Delta$ is given by (5.2.35a). Assumptions 1111-111 are used. From (5.2.28) and (5.2.30), the variances of the predicted value $\hat Y_i$ and of the residual $e_i$ can be written

$$V(\hat Y_i) = \Big[\frac{1}{n} + \frac{(X_i - \bar X)^2}{\sum_{k=1}^{n} (X_k - \bar X)^2}\Big]\sigma^2 \tag{5.2.41a}$$

$$V(e_i) = \Big[1 - \frac{1}{n} - \frac{(X_i - \bar X)^2}{\sum_{k=1}^{n} (X_k - \bar X)^2}\Big]\sigma^2 \tag{5.2.41b}$$

Unlike the variances of $b_0$ and $b_1$, the variances of $\hat Y_i$ and $e_i$ depend on $i$. Note that $V(\hat Y_i)$ has a minimum at $X_i = \bar X$ and a maximum at the smallest or largest value of $X_i$. The variance of the residual $e_i$ behaves in the opposite way: its maximum is at $X_i = \bar X$.

For assumptions 1111-011, (5.2.39a,b) give the estimated standard errors of $b_0$ and $b_1$ as

$$\text{est. s.e.}(b_0) = s\Big[\frac{\sum X_i^2}{\Delta}\Big]^{1/2} = s\Big[\frac{\sum X_i^2}{n\sum (X_i - \bar X)^2}\Big]^{1/2} \tag{5.2.42a}$$

$$\text{est. s.e.}(b_1) = s\Big[\frac{n}{\Delta}\Big]^{1/2} = s\Big[\sum (X_i - \bar X)^2\Big]^{-1/2} \tag{5.2.42b}$$

where, from (5.2.32), $s = [S_{\min}/(n-2)]^{1/2}$.

For Model 3 the sum of the residuals is zero:

$$\sum_{i=1}^{n} e_i = 0 \tag{5.2.43}$$

This interesting result can be used to check the accuracy of calculations for the parameters. It holds for any linear or nonlinear model estimated by OLS, provided the model contains a $\beta_0$ term, that is, a parameter not multiplied by a function of an independent variable.

Example 5.2.3

Experiments have been performed on heat transfer to air flowing in a pipe. A dimensionless group related to the heat flow rate is the Nusselt number, designated Nu. It is a function of the Reynolds number, denoted Re, which is proportional to the average velocity in the tube. Below are some values for the turbulent flow range.

Re   1×10^4   2×10^4   4×10^4   5×10^4
Nu   32       60       90       119

The suggested model is $\mathrm{Nu} = a_0 \mathrm{Re}^{a_1}$, with parameters $a_0$ and $a_1$. Reduce the model to a linear form and estimate $a_0$ and $a_1$ using ordinary least squares, with log Nu as the dependent variable.

Solution

Taking logarithms to the base 10 gives

$$\log \mathrm{Nu} = \log a_0 + a_1 \log \mathrm{Re}$$

For convenience, write the model in the Model 3 form, $\eta_i = \beta_0 + \beta_1 X_i$, with $X_i = \log \mathrm{Re}$ and $\eta_i = \log \mathrm{Nu}$. The tabulated values of Nu are used to obtain log Nu, which is now $Y_i$:

X_i (= log Re)   4.0      4.3010   4.6021   4.6990
Y_i (= log Nu)   1.5051   1.7782   1.9542   2.0755

The estimates $b_0$ and $b_1$ are found using (5.2.37) and (5.2.38). These equations require the following quantities:

$$\bar X = \frac{1}{4}\sum X_i = \frac{4.0 + 4.301 + 4.6021 + 4.699}{4} = 4.400525$$

$$\bar Y = \frac{1}{4}\sum Y_i = \frac{1.5051 + 1.7782 + 1.9542 + 2.0755}{4} = 1.82825$$

$$\sum (X_i - \bar X)^2 = (4 - 4.400525)^2 + \cdots + (4.699 - 4.400525)^2 = 0.3000453$$

$$\sum (X_i - \bar X) Y_i = (4 - 4.400525)(1.5051) + (4.301 - 4.400525)(1.7782) + \cdots = 0.2335972$$

Then (5.2.37) gives

$$b_1 = \frac{\sum (X_i - \bar X) Y_i}{\sum (X_i - \bar X)^2} = \frac{0.2335972}{0.3000453} = 0.7785397$$

and from (5.2.38), $b_0$ is

$$b_0 = \bar Y - b_1 \bar X = 1.82825 - (0.7785397)(4.400525) = -1.597733$$

The estimates of the original parameters are $\hat a_0 = 10^{b_0} = 0.02525$ and $\hat a_1 = b_1 = 0.7785$. The prediction equation for Nu is therefore

$$\widehat{\mathrm{Nu}} = 0.02525\, \mathrm{Re}^{0.7785}$$

where some of the decimal places have been dropped.

Example 5.2.4

Normal random error terms ($\varepsilon_i$) with zero mean and unit variance have been added to the model $\eta_i = \beta_0 + \beta_1 X_i$, with $\beta_0 = 1$ and $\beta_1 = 0.1$. The resulting "data" are tabulated in Table 5.1.

(a) Estimate the parameters $\beta_0$ and $\beta_1$ using ordinary least squares.
(b) Find the estimated standard errors of $b_0$, $b_1$, and $\hat Y_i$. Use the standard assumptions, except that the errors need not be normal and $\sigma^2$ is unknown (1111-011).

Table 5.1 Data for Example 5.2.4

Observation i   X_i   η_i   ε_i      Y_i
1                 0   1.0   -0.742    0.258
2                10   2.0   -0.034    1.966
3                20   3.0    1.453    4.453
4                30   4.0    0.963    4.963
5                40   5.0    0.040    5.040
6                50   6.0    0.418    6.418
7                60   7.0    1.792    8.792
8                70   8.0   -0.374    7.626
9                80   9.0   -0.222    8.778
Sum             360          3.294   48.294

Solution

(a) The OLS estimators $b_0$ and $b_1$ are given by (5.2.38) and (5.2.37). These equations require $\bar X$ and $\bar Y$:

$$\bar X = \frac{1}{9}\sum_{i=1}^{9} X_i = \frac{0 + 10 + 20 + \cdots + 80}{9} = \frac{360}{9} = 40$$

$$\bar Y = \frac{1}{9}\sum_{i=1}^{9} Y_i = \frac{0.258 + 1.966 + \cdots + 8.778}{9} = \frac{48.294}{9} = 5.366$$

Additional required calculations are given in the second, third, and fourth columns of Table 5.2. The estimates of $\beta_1$ and $\beta_0$ are then

$$b_1 = \frac{\sum (X_i - \bar X) Y_i}{\sum (X_i - \bar X)^2} = \frac{611.93}{6000} = 0.10198833$$

$$b_0 = \bar Y - b_1 \bar X = 5.366 - (0.10198833)(40) = 1.2864667$$

These happen to be about 2% and 29% larger than the true values.

Table 5.2 Calculations for Example 5.2.4

X_i   X_i - X̄   (X_i - X̄)²   (X_i - X̄)Y_i   Ŷ_i       e_i = Y_i - Ŷ_i
  0    -40        1600          -10.32          1.28647   -1.02847
 10    -30         900          -58.98          2.30635   -0.34035
 20    -20         400          -89.06          3.32623    1.12677
 30    -10         100          -49.63          4.34612    0.61688
 40      0           0            0.00          5.36600   -0.32600
 50     10         100           64.18          6.38588    0.03212
 60     20         400          175.84          7.40577    1.38623
 70     30         900          228.78          8.42565   -0.79965
 80     40        1600          351.12          9.44553   -0.66753
Sum                 6000          611.93                     0.00000

Not all eight significant figures given in these estimates are needed. However, it is usually wise to carry a couple of extra significant digits in the calculations, because small differences of large numbers can occur. The predicted value of the dependent variable, $\hat Y_i$, can be found from

$$\hat Y_i = \bar Y + b_1 (X_i - \bar X) = 5.366 + 0.10198833(X_i - 40)$$

It is also given in Table 5.2, together with the residuals $e_i$. Note that the residuals sum to zero.

(b) To find the estimated standard errors, it is necessary to evaluate $s^2$, which in turn needs $S_{\min} = \sum e_i^2 = 5.937718$. Then from (5.2.32)

$$s = \Big[\frac{S_{\min}}{n-2}\Big]^{1/2} = \Big[\frac{5.937718}{7}\Big]^{1/2} = 0.921002$$

which is an estimate of the standard deviation. Compared with the true value of unity, it is only about 8% too low. From (5.2.42a) the standard error of $b_0$ is

$$\text{est. s.e.}(b_0) = \Big[\frac{\sum X_i^2}{n\sum (X_k - \bar X)^2}\Big]^{1/2} s = \Big[\frac{20400}{9(6000)}\Big]^{1/2}(0.921002) = 0.56608$$

The standard error of $b_1$ is obtained from (5.2.42b):

$$\text{est. s.e.}(b_1) = s\Big[\sum (X_i - \bar X)^2\Big]^{-1/2} = \frac{0.921002}{(6000)^{1/2}} = 0.011890$$

Notice that $b_0 \pm \text{est. s.e.}(b_0)$ is $1.286 \pm 0.566$, which includes the true value $\beta_0 = 1$. Also, $b_1 \pm \text{est. s.e.}(b_1)$ is $0.10199 \pm 0.0119$, which includes the true value of 0.1.

Statistical statements regarding the accuracy of the estimates are discussed in Chapter 6 in connection with the confidence region. Using (5.2.41a), the estimated standard error of the predicted (or smoothed) value of $Y$ is

$$\text{est. s.e.}(\hat Y_i) = \Big[\frac{1}{n} + \frac{(X_i - \bar X)^2}{\sum (X_k - \bar X)^2}\Big]^{1/2} s = \Big[\frac{1}{9} + \frac{(X_i - 40)^2}{6000}\Big]^{1/2}(0.921002)$$

It varies from a minimum of 0.307 at $X_i = 40$ to maximums of 0.566 at $X_i = 0$ and 80. The maximum equals est. s.e.($b_0$) because, in this case, $b_0$ is also the value of $\hat Y_i$ at $X_i = 0$.

Estimators for Model 4, $\eta_i = \beta_0' + \beta_1 (X_i - \bar X)$

Model 4 is interesting because a number of its results have simple forms. Without any statistical assumptions, the OLS estimator for $\beta_0'$ is

$$b_0' = \bar Y = \frac{\sum Y_i}{n} \tag{5.2.44}$$

and the OLS estimator for $\beta_1$ is the same as that given for Model 3. Using the assumptions 1111-011, the variance of $b_0'$ is

$$V(b_0') = \frac{\sigma^2}{n} \tag{5.2.45}$$

and that of $b_1$ is given by (5.2.39b). The covariance of $b_0'$ and $b_1$ is simply

$$\mathrm{cov}(b_0', b_1) = 0 \tag{5.2.46}$$

The variances of $\hat Y_i$ and $e_i$ equal those given for Model 3, (5.2.41a,b).

Optimal Experiments for Models 3 and 4

If one is free to take the observations at any $X_i$ values when estimating the parameters in Models 3 and 4, the $X_i$ values should be chosen to produce the most accurate parameter estimates. Such experimental designs are termed optimal and yield optimal parameter estimates. The criterion of optimality in this section is minimum variance of $b_1$. A more general criterion and analysis are given in Chapter 8.

Models 3 and 4 provide exactly the same OLS $\hat Y_i$ values. For that reason we consider the variances for Model 4 under assumptions 1111-011. The variance of $b_0'$ is independent of the $X_i$, and the covariance of $b_0'$ and $b_1$ is zero. Hence only the variance of $b_1$, given by (5.2.39b), needs to be considered. Note that $V(b_1)$ is minimized by maximizing $\sum (X_i - \bar X)^2$. Let the maximum permissible range of $X_i$ be between $X_{\min}$ and $X_{\max}$. It can then be rigorously shown that $V(b_1)$ is minimized by making one half of the measurements at $X_{\min}$ and the other half at $X_{\max}$, with no intermediate measurements. The optimal case is illustrated by Fig. 5.2.

Figure 5.2 Recommended location of measurements when the model is known to be a straight line in $X$.
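The Model 3 calculations of Examples 5.2.3 and 5.2.4, and the design comparison just described, can be reproduced in a few lines of Python. This is a sketch under stated assumptions. `fit_model3`, `sum_sq_dev`, and `sxx_designs` are hypothetical helper names. The standard errors follow the 1111-011 formulas (5.2.32) and (5.2.42a,b). Example 5.2.3 is fitted to the four-decimal logarithms tabulated in that example. For the design comparison, (5.2.39b) implies that the ratio of $\sum (X_i - \bar X)^2$ for the two-point design to that for uniform spacing equals the ratio of the variances of $b_1$ (uniform to two-point).

```python
import math


def fit_model3(x, y):
    """OLS for Model 3 with estimated standard errors under assumptions 1111-011."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx       # (5.2.37)
    b0 = ybar - b1 * xbar                                            # (5.2.38)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))          # (5.2.32)
    return {
        "b0": b0,
        "b1": b1,
        "s": s,
        "se_b0": s * math.sqrt(sum(xi * xi for xi in x) / (n * sxx)),  # (5.2.42a)
        "se_b1": s / math.sqrt(sxx),                                   # (5.2.42b)
        "residual_sum": sum(residuals),                                # zero by (5.2.43)
    }


# Example 5.2.4 (Table 5.1).
x = [0, 10, 20, 30, 40, 50, 60, 70, 80]
y = [0.258, 1.966, 4.453, 4.963, 5.040, 6.418, 8.792, 7.626, 8.778]
fit = fit_model3(x, y)
print({k: round(v, 6) for k, v in fit.items()})

# Example 5.2.3: log10 Nu = log10 a0 + a1 log10 Re, using the tabulated logarithms.
log_re = [4.0, 4.3010, 4.6021, 4.6990]
log_nu = [1.5051, 1.7782, 1.9542, 2.0755]
nu_fit = fit_model3(log_re, log_nu)
a0, a1 = 10 ** nu_fit["b0"], nu_fit["b1"]
print(f"a0 = {a0:.5f}, a1 = {a1:.4f}")


def sum_sq_dev(values):
    """Sum of squared deviations from the mean, sum (X_i - Xbar)^2."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def sxx_designs(n, x_min, x_max):
    """sum (X_i - Xbar)^2 for n uniformly spaced points and for half the points at each end."""
    if n % 2:
        raise ValueError("the two-point design needs an even number of observations")
    delta = (x_max - x_min) / (n - 1)
    uniform = [x_min + i * delta for i in range(n)]
    two_point = [x_min] * (n // 2) + [x_max] * (n // 2)
    return sum_sq_dev(uniform), sum_sq_dev(two_point)


s_uniform, s_two_point = sxx_designs(10, 0.0, 90.0)
# V(b1) is proportional to 1 / sum (X_i - Xbar)^2, so this ratio is V_u(b1) / V_n(b1).
print(s_two_point / s_uniform, 3 * (10 - 1) / (10 + 1))
```

With $n = 10$ the ratio is already about 2.45. As $n$ grows it approaches 3, which quantifies the advantage of the two-point design when the straight-line model is known to be correct.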
The variances of $b_1$ for uniform spacing of the $X_i$ values, given by

$$X_i = (i - 1 + c)\delta, \qquad i = 1, 2, \ldots, n \qquad (5.2.47)$$

are listed for various models in the fifth column of Table 5.3, which summarizes the results of this section. The spacing between the $X_i$ values is $\delta$, and the first value is $X_1 = c\delta$, where $c$ is a factor locating $X_1$. The largest value is $X_n = (n-1+c)\delta$. For this uniform spacing the variance of $b_1$ is

$$V_u(b_1) = \frac{12\sigma^2}{n(n^2-1)\delta^2} \qquad (5.2.48)$$

If one half of the observations were located at $X_{\min} = c\delta$ and the other half at $X_{\max} = (n-1+c)\delta$, the variance of $b_1$ for this nonuniform spacing would be

$$V_n(b_1) = \frac{4\sigma^2}{n(n-1)^2\delta^2} \qquad (5.2.49)$$

The ratio $V_u(b_1)/V_n(b_1)$ is

$$\frac{V_u(b_1)}{V_n(b_1)} = \frac{3(n-1)}{n+1} \qquad (5.2.50)$$

This ratio equals 1 for $n = 2$ and increases monotonically to 3 as $n \to \infty$. Hence, for large $n$, the variance of $b_1$ with uniform spacing is about three times the variance obtained by placing the observations at the extremes.

When using the next to last column of Table 5.3, note that

$$X_{\min} = c\delta, \qquad X_{\max} = (n-1+c)\delta \qquad (5.2.51)$$

and thus

$$\delta = \frac{X_{\max}-X_{\min}}{n-1}, \qquad c = \frac{X_{\min}}{\delta} = \frac{X_{\min}(n-1)}{X_{\max}-X_{\min}} \qquad (5.2.52)$$

This discussion of optimal experimental design relies on the standard assumptions 1111-011. It also requires that there be no uncertainty about the validity of the model. If the model is in question, one would be better advised to choose equal spacing of the $X_i$ values. If $X_i$ is a function of time, such as $t^2/2$, equal spacing in "time" serves the same purpose.

5.2.3 Comments Regarding Definitions

A number of definitions are given in this section, and some of them can be confusing. There are, for example, several expressions related to $Y_i$. We have

$Y_i = \eta_i + \epsilon_i$, measured value of $Y_i$
$\eta_i = E(Y_i)$, expected value of $Y_i$, or model, or dependent variable
$\hat Y_i = b_0 + b_1X_i$, predicted value of $Y_i$ for Model 3
$\bar Y = \sum Y_i/n$, average value of $Y_i$ for $i = 1$ to $i = n$

Figure 5.2 Recommended location of measurements when the model is known to be a straight line in X (measurements only at $X_{\min}$ and $X_{\max}$).

The symbol $\epsilon_i$ is also used, for measurement error or noise. It should not be confused with the residual $e_i$, which is $Y_i - \hat Y_i$. The independent variable $X_i$ is assumed to be errorless, and its average value is $\bar X = \sum X_i/n$. All these terms are illustrated in Fig. 5.3. Modified definitions for $\bar X$ and $\bar Y$ may be used in subsequent sections when $\sigma_i$ is not a constant.

Figure 5.3 Some terms used in Section 5.2: the true regression line, the predicted line $\hat Y$, and the residual $e_i = Y_i - \hat Y_i$.

Table 5.3 Summary of Estimators, Variances, and Covariances for Five Simple Linear Models. Standard Assumptions of 1111-011 Apply.

Columns: (1) model number; (2) model; (3) estimators; (4) variances and covariances; (5) variances and covariances for uniformly increasing $X_i$, $X_i = (i-1+c)\delta$, $i = 1, 2, \ldots$; (6) variances and covariances for one half of the measurements at $X = c\delta$ and the rest at $X = (n-1+c)\delta$, $n = 2, 4, 6, \ldots$

Model 1: $\eta_i = \beta_0$
  (3) $b_0 = \bar Y$
  (4) $V(b_0) = \sigma^2/n$
  (5) $\sigma^2/n$
  (6) $\sigma^2/n$

Model 2: $\eta_i = \beta_1X_i$
  (3) $b_1 = \sum Y_iX_i/\sum X_i^2$
  (4) $V(b_1) = \sigma^2/\sum X_i^2$
  (5)^b $6\sigma^2/[n(n+1)(2n+1)\delta^2]$ for $c = 1$; $3\sigma^2/(n^3\delta^2)$ for large $n$
  (6)^a $2\sigma^2/\{n[c^2+(n-1+c)^2]\delta^2\}$; $2\sigma^2/(n^3\delta^2)$ for $n \gg c$

Model 3: $\eta_i = \beta_0 + \beta_1X_i$
  (3) $b_1 = \sum (X_i-\bar X)Y_i/\sum (X_i-\bar X)^2$, $b_0 = \bar Y - b_1\bar X$
  (4) $V(b_1) = \sigma^2/\sum (X_i-\bar X)^2$, $V(b_0) = \sigma^2\sum X_i^2/[n\sum (X_i-\bar X)^2]$, $\operatorname{cov}(b_0,b_1) = -\bar X\sigma^2/\sum (X_i-\bar X)^2$
  (5)^b $V(b_1) = 12\sigma^2/[n(n^2-1)\delta^2]$; $V(b_0) = 2(2n+1)\sigma^2/[n(n-1)]$ for $c = 1$; $\operatorname{cov}(b_0,b_1) = -6(n-1+2c)\sigma^2/[n(n^2-1)\delta]$; for large $n$ and $c = 1$ these become $12\sigma^2/(n^3\delta^2)$, $4\sigma^2/n$, and $-6\sigma^2/(n^2\delta)$
  (6) $V(b_1) = 4\sigma^2/[n(n-1)^2\delta^2]$; $V(b_0) = 2[c^2+(n-1+c)^2]\sigma^2/[n(n-1)^2]$; $\operatorname{cov}(b_0,b_1) = -2(n-1+2c)\sigma^2/[n(n-1)^2\delta]$; for $n \gg c$ these become $4\sigma^2/(n^3\delta^2)$, $2\sigma^2/n$, and $-2\sigma^2/(n^2\delta)$

Model 4: $\eta_i = \beta_0' + \beta_1(X_i-\bar X)$
  (3) $b_0' = \bar Y$, $b_1 = \sum (X_i-\bar X)Y_i/\sum (X_i-\bar X)^2$
  (4) $V(b_0') = \sigma^2/n$, $V(b_1) = \sigma^2/\sum (X_i-\bar X)^2$, $\operatorname{cov}(b_0',b_1) = 0$
  (5) $\sigma^2/n$, $12\sigma^2/[n(n^2-1)\delta^2]$, 0
  (6) $\sigma^2/n$, $4\sigma^2/[n(n-1)^2\delta^2]$, 0

Model 5: $\eta_i = \beta_1X_{i1} + \beta_2X_{i2}$
  (3) $b_1 = (d_1c_{22} - d_2c_{12})/\Delta$, $b_2 = (d_2c_{11} - d_1c_{12})/\Delta$
  (4) $V(b_1) = c_{22}\sigma^2/\Delta$, $V(b_2) = c_{11}\sigma^2/\Delta$, $\operatorname{cov}(b_1,b_2) = -c_{12}\sigma^2/\Delta$,
      where $c_{kl} = \sum_{i=1}^n X_{ik}X_{il}$, $d_k = \sum_{i=1}^n Y_iX_{ik}$, $\Delta = c_{11}c_{22} - c_{12}^2$

^a If the measurements are only at $X = (n-1+c)\delta$ and $n \gg c$, the variances are one half of those indicated.
^b For uniform spacing, that is, $X_i = (i-1+c)\delta$, $i = 1, 2, \ldots, n$, we have

$$\bar X = \frac{1}{n}\sum X_i = \frac{1}{2}(n-1+2c)\delta, \qquad \sum (X_i-\bar X)^2 = \frac{n(n^2-1)\delta^2}{12}$$

$$\sum X_i^2 = \frac{\delta^2}{6}\left\{(n+c)(n+c-1)(2n+2c-1) - c(c-1)(2c-1)\right\}$$

5.3 MAXIMUM LIKELIHOOD (ML) ESTIMATION

Maximum likelihood estimates make use of whatever information we have about the distribution of the observations. We illustrate ML estimation for additive errors, $Y_i = \eta_i(X_i, \boldsymbol\beta) + \epsilon_i$, where the errors $\epsilon_i$ have zero mean, are independent, are normal, and have known variances $\sigma_i^2$. The $X$'s are errorless and the parameters are nonrandom. These assumptions are designated 11-11111. This information can be used to obtain estimates of parameter variances.

The natural logarithm of the normal probability density for independent measurements is

$$\ln f(Y_1,\ldots,Y_n\mid\boldsymbol\beta) = -\frac{1}{2}\left[n\ln 2\pi + \sum_{i=1}^n \ln\sigma_i^2 + S_{ML}\right] \qquad (5.3.1)$$

where the "physical" parameters are contained only in

$$S_{ML} = \sum_{i=1}^n \left[\frac{Y_i - \eta_i}{\sigma_i}\right]^2 \qquad (5.3.2)$$

The one- and two-parameter cases are considered briefly in this section. We show that the ML estimators for Models 2 and 5 can be written in a form similar to the OLS estimators.

5.3.1 One-Parameter Cases

Consider the linear model $\eta_i = \beta_1X_i$ (Model 2) and introduce this expression into (5.3.2). The function $\ln f(Y_1,\ldots,Y_n\mid\beta_1)$ is maximized with respect to $\beta_1$ by minimizing $S_{ML}$, since $\beta_1$ appears only in $S_{ML}$. Differentiate with respect to $\beta_1$, replace $\beta_1$ by its estimator $b_1$, and set the derivative equal to zero. This yields the normal equation

$$\sum_{i=1}^n (Y_i - b_1X_i)X_i\sigma_i^{-2} = 0 \qquad (5.3.3)$$

which can be solved for $b_1$ to obtain (for Model 2)

$$b_1 = \frac{\sum Y_iX_i\sigma_i^{-2}}{\sum X_i^2\sigma_i^{-2}} \qquad (5.3.4)$$

This expression reduces to exactly the OLS expression (5.2.5) if $\sigma_i^2 = \sigma^2$, a constant. Also, by defining

$$F_i = \frac{Y_i}{\sigma_i}, \qquad Z_i = \frac{X_i}{\sigma_i} \qquad (5.3.5)$$

(5.3.4) can be written as

$$b_1 = \frac{\sum F_iZ_i}{\sum Z_i^2} \qquad (5.3.6)$$

This is also similar to the OLS expression (5.2.5), with $F_i$ analogous to $Y_i$ and $Z_i$ analogous to $X_i$. In terms of $F_i$ and $Z_i$, $S_{ML}$ is a sum of squares of terms that have constant variance, so it has the same form as for OLS. Finally, note that the variance of $F_i$ is unity. Using these analogies between $Y_i$ and $F_i$, $X_i$ and $Z_i$, and $\sigma^2$ and unity, the variance of $b_1$ follows from (5.2.10):

$$V(b_1) = \left(\sum Z_i^2\right)^{-1} = \left[\sum X_i^2\sigma_i^{-2}\right]^{-1} \qquad (5.3.7)$$

For Model 1, $\eta_i = \beta_0$, the estimator $b_0$ and its variance are found by letting $X_i = 1$ in the above two equations:

$$b_0 = \bar Y, \qquad \bar Y = \left(\sum Y_i\sigma_i^{-2}\right)\left(\sum\sigma_i^{-2}\right)^{-1} \qquad (5.3.8a,b)$$

$$V(b_0) = \left[\sum\sigma_i^{-2}\right]^{-1} \qquad (5.3.8c)$$

5.3.2 Two-Parameter Cases

For the general model, Model 5, given by $\eta_i = \beta_1X_{i1} + \beta_2X_{i2}$, the estimators for $\beta_1$ and $\beta_2$ and their variances can be obtained by letting

$$F_i = \frac{Y_i}{\sigma_i}, \qquad Z_{i1} = \frac{X_{i1}}{\sigma_i}, \qquad Z_{i2} = \frac{X_{i2}}{\sigma_i} \qquad (5.3.9)$$

These are analogous to $Y_i$, $X_{i1}$, and $X_{i2}$. Thus (5.2.24) and (5.2.26), with $\sigma^2$ replaced by unity, can be used for the estimators $b_1$ and $b_2$, their variances, and their covariance.

For Model 3, $\eta_i = \beta_0 + \beta_1X_i$, with assumptions 11-11111, (5.3.9) can be used to find

$$b_1 = \frac{\sum Y_i\sigma_i^{-2}(X_i-\bar X)}{\sum (X_i-\bar X)^2\sigma_i^{-2}}, \qquad b_0 = \bar Y - b_1\bar X \qquad (5.3.10a,b)$$

$$\bar X = \frac{\sum X_i\sigma_i^{-2}}{\sum\sigma_i^{-2}}, \qquad \bar Y = \frac{\sum Y_i\sigma_i^{-2}}{\sum\sigma_i^{-2}} \qquad (5.3.11a,b)$$

$$V(b_1) = \left[\sum (X_i-\bar X)^2\sigma_i^{-2}\right]^{-1}, \qquad V(b_0) = \frac{\sum X_i^2\sigma_i^{-2}}{\sum\sigma_j^{-2}\sum (X_k-\bar X)^2\sigma_k^{-2}} \qquad (5.3.12a,b)$$

Note the new definition of $\bar X$ given by (5.3.11a). The same definition of $\bar Y$ is given in (5.3.8b) and (5.3.11b). For constant $\sigma_i$, these definitions of $\bar X$ and $\bar Y$ reduce to those given in Section 5.2.

Example 5.3.1

Simple harmonic motion can be described by $\eta_i = \beta_0 + \beta_1\sin t_i$, where $\beta_0$ is a shift of the axis and $\beta_1$ is the amplitude of the motion. The measurements and their standard deviations are given in the following table.

   i            1        2        3        4        5
   t_i (deg)    0        30       90       150      180
   σ_i          0.01     0.05     0.1      0.05     0.01
   Y_i          0.4926   0.9985   1.3547   0.9519   0.4996

(a) Estimate the parameters using ML. Let the standard assumptions apply, except that we do not assume $\sigma_i^2$ equals a constant, $\sigma^2$.
(b) Find the standard errors of $b_0$ and $b_1$.

Solution

(a) The model here is Model 3, so the estimators are given by (5.3.10) and (5.3.11), with $X_i = \sin t_i$. Some of the detailed calculations are given below.

   i    X_i   σ_i^-2    X_i σ_i^-2   X_i - X̄    (X_i - X̄)² σ_i^-2   Y_i σ_i^-2   Y_i σ_i^-2 (X_i - X̄)
   1    0     10,000    0            -0.0239    5.723               4926.0       -117.847
   2    0.5   400       200           0.4761    90.660              399.4         190.145
   3    1     100       100           0.9761    95.273              135.47        132.229
   4    0.5   400       200           0.4761    90.660              380.76        181.271
   5    0     10,000    0            -0.0239    5.723               4996.0       -119.522
   Σ          20,900    500                     288.039             10837.63      266.276

Using the sums in the table, (5.3.11) gives

$$\bar X = \frac{500}{20900} = 0.0239234, \qquad \bar Y = \frac{10837.63}{20900} = 0.518547$$

Then from (5.3.10)

$$b_1 = \frac{266.276}{288.039} = 0.924449, \qquad b_0 = \bar Y - b_1\bar X = 0.518547 - 0.924449(0.0239234) = 0.496431$$

(b) The standard errors are the square roots of (5.3.12a,b):

$$\text{s.e.}(b_1) = \left[\sum (X_i-\bar X)^2\sigma_i^{-2}\right]^{-1/2} = (288.039)^{-1/2} = 0.05892$$

$$\text{s.e.}(b_0) = \left[\frac{\sum X_i^2\sigma_i^{-2}}{\sum\sigma_j^{-2}\sum (X_k-\bar X)^2\sigma_k^{-2}}\right]^{1/2} = \left[\frac{300}{20900(288.039)}\right]^{1/2} = 0.0070593$$

where $\sum X_i^2\sigma_i^{-2} = 0.25(400) + 1(100) + 0.25(400) = 300$.

The least squares estimates of the parameters for this example are $b_0 = 0.510329$ and $b_1 = 0.872829$. The OLS value of $b_0$ lies outside the $b_0 \pm \text{s.e.}(b_0)$ interval found using maximum likelihood.

5.3.3 Estimating $\sigma^2$ Using Maximum Likelihood

When the error variance is a constant, that is, $\sigma_i^2 = \sigma^2$, an estimator for $\sigma^2$ can be obtained by differentiating (5.3.1) with respect to $\sigma^2$ and setting the result equal to zero. The result is

$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (Y_i - \hat Y_i)^2$$
Unfortunately, this is a biased estimator of $\sigma^2$. For one parameter, the denominator should be $n - 1$ to give an unbiased estimator. For that and other reasons, use (5.2.21) to estimate $\sigma^2$ for one parameter and (5.2.32) for two parameters when assumptions 1111-011 are valid.

5.3.4 Maximum Likelihood Estimation Using Information from Prior Experiments

After one set of data has been used to estimate the parameters, a second set of data may become available. If the second set of observations is independent of the first and parameter estimates based on all the data are needed, the first set of data can provide prior information for the analysis of the second set. The method given below reduces the number of calculations needed to analyze all the data simultaneously by using the results of the analysis of the first set.

For simplicity we derive the method for one parameter. The ML estimator for one set of data, when the standard assumptions 11-11111 are valid, is given by (5.3.6). Assume there are $n_1$ observations and write (5.3.6) as

$$b_{,1} = V_{b,1}\sum_{i=1}^{n_1} F_iZ_i \qquad (5.3.15)$$

where $V_{b,1}$ is the variance of $b_{,1}$,

$$V_{b,1} = V(b_{,1}) = \left(\sum_{i=1}^{n_1} Z_i^2\right)^{-1} \qquad (5.3.16)$$

Now consider a combined analysis of $n = n_1 + n_2$ observations. Then (5.3.6) becomes

$$b = \frac{V_{b,1}^{-1}b_{,1} + \sum_{i=n_1+1}^{n} F_iZ_i}{V_{b,1}^{-1} + \sum_{i=n_1+1}^{n} Z_i^2} \qquad (5.3.17a)$$

$$V(b) = \left[V_{b,1}^{-1} + \sum_{i=n_1+1}^{n} Z_i^2\right]^{-1} \qquad (5.3.17b)$$

Note that (5.3.17) uses only the previously calculated values of $b_{,1}$ and $V_{b,1}$. No other information about the first $n_1$ observations is needed to calculate improved values of $b$ and $V(b)$. The same procedure can be used for more than one parameter.

5.4 MAXIMUM A POSTERIORI (MAP) ESTIMATION

There are several ways to introduce prior information. One of these, for ML estimation, is given in Section 5.3.4 above.
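Section 5.3.4 claims that (5.3.17a,b) reproduce the estimate and variance of a single combined analysis. A small Python sketch can check this numerically. It assumes Model 2 with known $\sigma_i$ and borrows the four observations of Example 5.4.1 as test data, split into two sets of two. The helper names are hypothetical.

```python
def sums_fz_zz(xs, ys, sigmas):
    """Return sum F_i Z_i and sum Z_i^2, with F_i = Y_i/sigma_i, Z_i = X_i/sigma_i."""
    fz = sum((y / s) * (x / s) for x, y, s in zip(xs, ys, sigmas))
    zz = sum((x / s) ** 2 for x, s in zip(xs, sigmas))
    return fz, zz


def ml_batch(xs, ys, sigmas):
    """ML estimate and variance for Model 2, following (5.3.6) and (5.3.7)."""
    fz, zz = sums_fz_zz(xs, ys, sigmas)
    return fz / zz, 1.0 / zz


def ml_update(b_prev, v_prev, xs, ys, sigmas):
    """Combine a previous estimate (b_prev, v_prev) with new observations,
    following (5.3.17a,b); the earlier observations themselves are not needed."""
    fz, zz = sums_fz_zz(xs, ys, sigmas)
    denom = 1.0 / v_prev + zz
    return (b_prev / v_prev + fz) / denom, 1.0 / denom


x = [0.01, 0.1, 1.0, 10.0]
y = [0.02, 0.12, 0.8, 13.0]
sig = [0.01, 0.05, 0.1, 2.0]

b_all, v_all = ml_batch(x, y, sig)                 # all four observations at once
b_first, v_first = ml_batch(x[:2], y[:2], sig[:2])  # first set only
b_seq, v_seq = ml_update(b_first, v_first, x[2:], y[2:], sig[2:])
print(b_all, v_all)
print(b_seq, v_seq)
```

Both lines print the same estimate and variance (up to rounding), which is the property the text attributes to the Section 5.3.4 method.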
The Section 5.3.4 method includes information from previous tests in such a way that exactly the same estimates are obtained as if all the data were analyzed together. That ML method also assumes the parameters are nonrandom. Another way to include prior information is the maximum a posteriori (MAP) method. MAP estimators are based on Bayes's theorem and are therefore called bayesian estimators. In the MAP method the parameters either are random or are conceived as being random. Hence MAP estimators might be used in two situations: (1) when the parameters are random and (2) when there is subjective information. What is meant by random parameters is discussed further below.

In this section the standard assumptions of additive, zero mean, uncorrelated, normal errors are considered valid, as are known statistical parameters and nonstochastic independent variables. There is also information about a prior distribution of the parameter values ($\beta$), which we assume to be normal with known mean and variance. We assume that the $\beta$'s are constant, that is, nonrandom, throughout our experiment. These assumptions are designated 11011110. (Chapter 6 gives a more detailed set of standard assumptions; the two particular sets of MAP assumptions considered there are designated 11--1112 and 11--1113.)

5.4.1 Random Parameter Case

In the random parameter case, the parameter for a particular experiment or set of experiments is considered to be constant (or nonrandom). An example may clarify this. A plant occasionally produces a particular steel whose thermal conductivity is known to vary from batch to batch. The long-run room-temperature average thermal conductivity (the parameter $\beta$ of interest) is 20 W/m·°C, and the standard deviation among batch averages is 0.1 W/m·°C. The distribution is normal.
This information about the random nature of $\beta$ from batch to batch is then described by the probability density

$$f(\beta) = \left[(2\pi)^{1/2}(0.1)\right]^{-1}\exp\left[-\frac{1}{2}\left(\frac{\beta-20}{0.1}\right)^2\right] \qquad (5.4.1)$$

The standard deviation of measurements $Y_i$ for a given batch is known to be 0.4. For a single normal measurement, the probability density of this measurement given the true conductivity $\beta$ of the batch is

$$f(Y\mid\beta) = \left[(2\pi)^{1/2}(0.4)\right]^{-1}\exp\left[-\frac{1}{2}\left(\frac{Y-\beta}{0.4}\right)^2\right] \qquad (5.4.2)$$

We use Bayes's theorem in the form

$$f(\beta\mid Y) = \frac{f(Y\mid\beta)\,f(\beta)}{f(Y)} \qquad (5.4.3)$$

where $f(\beta\mid Y)$ is the posterior distribution of $\beta$ given $Y$. It includes information both from a large number of batches, through $f(\beta)$, and from a given batch, through $f(Y\mid\beta)$. Any additional measurements $Y_i$ are also considered to come from this given batch. The parameter $\beta$ appears only in the numerator of (5.4.3), and it is convenient to take the logarithm of (5.4.3). Hence $f(\beta\mid Y)$ is maximized by minimizing

$$\left(\frac{Y-\beta}{0.4}\right)^2 + \left(\frac{\beta-20}{0.1}\right)^2$$

with respect to $\beta$.

Notice that in this example the conductivity of a batch chosen at random is a random parameter. Once the batch is chosen, however, all our specimens come from this batch, so each has the same expected value. If we examine the conductivity as a function of temperature, we have a regression function containing a number of parameters instead of a single room-temperature conductivity. These parameters vary from batch to batch, but our estimates are estimates of the specific values for this particular batch.

Now let us develop an estimator for the parameter $\beta_1$ in Model 2, $\eta_i = \beta_1X_i$, where $\beta_1$ is chosen at random from a given population. With the assumptions mentioned above, and with $\beta_1$ independent of $\epsilon_i$, we have

$$Y_i = \beta_1X_i + \epsilon_i$$

$$\beta_1 \sim N(\mu_\beta, V_\beta) \qquad (5.4.4a)$$

$$\epsilon_i \sim N(0, \sigma_i^2), \qquad E(\epsilon_i\beta_1) = 0 \qquad (5.4.4b)$$

Thus the (prior) probability density of the random parameter $\beta_1$ is

$$f(\beta_1) = (2\pi V_\beta)^{-1/2}\exp\left[-\frac{1}{2}\frac{(\beta_1-\mu_\beta)^2}{V_\beta}\right] \qquad (5.4.5)$$

and the density of $Y_1, \ldots, Y_n$ given $\beta_1$ is

$$f(Y_1,\ldots,Y_n\mid\beta_1) = \left[\prod_{i=1}^n (2\pi\sigma_i^2)^{-1/2}\right]\exp\left[-\frac{1}{2}\sum_{i=1}^n (Y_i-\beta_1X_i)^2\sigma_i^{-2}\right] \qquad (5.4.6)$$

Introducing (5.4.5) and (5.4.6) into (5.4.3) and taking the logarithm of $f(\beta_1\mid Y_1,\ldots,Y_n)$ gives

$$\ln f(\beta_1\mid Y_1,\ldots,Y_n) = -\frac{1}{2}\left[(n+1)\ln 2\pi + \ln V_\beta + \sum_{i=1}^n \ln\sigma_i^2 + \frac{(\beta_1-\mu_\beta)^2}{V_\beta} + \sum_{i=1}^n (Y_i-\beta_1X_i)^2\sigma_i^{-2}\right] - \ln f(Y_1,\ldots,Y_n) \qquad (5.4.7)$$

Note that $f(Y_1,\ldots,Y_n)$ is not a function of the parameter $\beta_1$. In (5.4.7) we are effectively considering the joint probability of each random choice of $\beta_1$ together with the subsequent collection of observations. We concentrate on those possible choices that include the observations actually obtained and search among them for the $\beta_1$ with the greatest probability. We use this $\beta_1$ as the estimate of the particular value for the batch chosen. Note that two things are involved here: a random variable $\beta_1$, which is a collection of possible values, and a constant $\beta_1$, which is the value actually chosen, that is, the parameter for the particular batch used in the experiment. Taking the derivative of (5.4.7) with respect to $\beta_1$ yields the normal equation

$$(b_1-\mu_\beta)V_\beta^{-1} - \sum (Y_i - b_1X_i)X_i\sigma_i^{-2} = 0 \qquad (5.4.8)$$

After adding and subtracting $\mu_\beta X_i$ within the summation, this can be written as

$$(b_1-\mu_\beta)\left(V_\beta^{-1} + \sum Z_i^2\right) - \sum (F_i - \mu_\beta Z_i)Z_i = 0 \qquad (5.4.9)$$

where

$$F_i = \frac{Y_i}{\sigma_i}, \qquad Z_i = \frac{X_i}{\sigma_i} \qquad (5.4.10a,b)$$

Solving (5.4.9) for $b_1$ then yields

$$b_1 = \mu_\beta + \frac{\sum (F_i - \mu_\beta Z_i)Z_i}{\sum Z_i^2 + V_\beta^{-1}} = \frac{\sum F_iZ_i + \mu_\beta V_\beta^{-1}}{\sum Z_i^2 + V_\beta^{-1}} \qquad (5.4.11a,b)$$

The expected value of $b_1$ given by (5.4.11) is $\mu_\beta$. Hence the MAP estimator $b_1$ is biased, because its expected value is not $\beta_1$, the value for the particular batch. The variance of $b_1$ is affected not only by the errors in the measurements $Y_i$ but also by the variability of $\beta_1$ from batch to batch. For measurements on a particular batch, we are interested in the variability of $b_1$ relative to the value for that batch ($\beta_1$). Hence we are interested in the variance of the difference,
$b_1 - \beta_1$. Using (5.4.11b) we can show that

$$b_1 - \beta_1 = \frac{\sum (\epsilon_i/\sigma_i)Z_i - (\beta_1-\mu_\beta)V_\beta^{-1}}{\sum Z_i^2 + V_\beta^{-1}} \qquad (5.4.12)$$

The variance of the difference $b_1 - \beta_1$ is then

$$V(b_1 - \beta_1) = \left(\sum Z_i^2 + V_\beta^{-1}\right)^{-1} \qquad (5.4.13)$$

where $V(\beta_1) = V_\beta$ has been used. As more observations are taken, the relative effect of the prior information about the random parameter diminishes. As the number of measurements becomes arbitrarily large, $\sum Z_i^2 \to \infty$ and thus $V(b_1 - \beta_1) \to 0$. In other words, the variability of the estimator (5.4.11) for a particular batch approaches zero when a very large number of measurements are taken on that batch.

Equations for the two-parameter cases involving Model 5 are given in Problem 5.21.

5.4.2 Subjective Prior Information

Some authors, such as Box and Tiao [3], regard the prior probability distribution as a mathematical expression of the degree of belief in a certain proposition. In this context, developing probabilities from repeated observations is regarded merely as a way of calibrating a subjective attitude. In this view, saying that the probability is one half that candidate A will be elected president means that we believe the proposition "candidate A will be elected president" as strongly as the proposition "a toss of a fair coin will produce a head." We need not imagine an infinite series of elections, in half of which A is elected and in half of which he is defeated.

This view can also be applied to estimating a physical property. The following example is given in reference 3. Two physicists, A and B, want more accurate estimates of some physical constant $\beta$ that is known only approximately. Physicist A is very familiar with previous measurements of $\beta$ and can therefore make a moderately good guess of the true value. His prior opinion about $\beta$ is approximately represented by a normal density centered at 900 with a standard deviation of 20:

$$f_A(\beta) = \left[(2\pi)^{1/2}(20)\right]^{-1}\exp\left[-\frac{1}{2}\left(\frac{\beta-900}{20}\right)^2\right] \qquad (5.4.14a)$$

This implies that A believes the chance of $\beta$ lying outside the interval 860 to 940 is only about one in 20. Physicist B, by contrast, has little experience with values of $\beta$. His rather vague prior beliefs are represented by a normal density with a mean of 800 and a standard deviation of 200:

$$f_B(\beta) = \left[(2\pi)^{1/2}(200)\right]^{-1}\exp\left[-\frac{1}{2}\left(\frac{\beta-800}{200}\right)^2\right] \qquad (5.4.14b)$$

B is much less certain of the true value of $\beta$, since he considers any value between 400 and 1200 plausible.

Suppose one of the physicists performs an experiment in which $\beta$ is observed once. Assume further that this measurement contains an additive, zero mean, normal error with a standard deviation of 40. The probability density of $Y$ is then (5.4.2) with 0.4 replaced by 40.

To make the results more general, let $f(\beta_1\mid\mu)$ denote the prior subjective information for $\beta_1$. For a normal distribution we have

$$f(\beta_1\mid\mu) = \left[(2\pi)^{1/2}\sigma_\mu\right]^{-1}\exp\left[-\frac{1}{2}\frac{(\beta_1-\mu)^2}{\sigma_\mu^2}\right] \qquad (5.4.15)$$

The conditional probability density $f(Y_1,\ldots,Y_n\mid\beta_1)$ is given by (5.4.6). Here, Bayes's theorem leads to maximizing the natural logarithm of the product $f(\beta_1\mid\mu)f(Y_1,\ldots,Y_n\mid\beta_1)$:

$$\ln\left[f(\beta_1\mid\mu)f(Y_1,\ldots,Y_n\mid\beta_1)\right] = -\frac{1}{2}\left[(n+1)\ln 2\pi + \ln\sigma_\mu^2 + \sum\ln\sigma_i^2 + \frac{(\beta_1-\mu)^2}{\sigma_\mu^2} + \sum\frac{(Y_i-\beta_1X_i)^2}{\sigma_i^2}\right] \qquad (5.4.16)$$

which is quite similar to (5.4.7). The estimate for $\beta_1$ is

$$b_1 = \mu + \frac{\sum (F_i - \mu Z_i)Z_i}{\sum Z_i^2 + \sigma_\mu^{-2}} = \frac{\sum F_iZ_i + \mu\sigma_\mu^{-2}}{\sum Z_i^2 + \sigma_\mu^{-2}} \qquad (5.4.17a,b)$$

This is identical to (5.4.11a,b), with $\mu$ in place of $\mu_\beta$ and $\sigma_\mu^2$ in place of $V_\beta$. It is also very similar to (5.3.17a,b), which give the ML estimators for a combined analysis of two sets of observations. As in the random parameter case, the expected value of $b_1$ and the variance of $b_1 - \beta_1$ are

$$E(b_1) = \mu, \qquad V(b_1 - \beta_1) = \left(\sum Z_i^2 + \sigma_\mu^{-2}\right)^{-1} \qquad (5.4.18a,b)$$

Although the estimators (5.4.11) and (5.4.17) are identical in form, the quantities $\mu_\beta$, $V_\beta$, $\mu$, and $\sigma_\mu^2$ have different meanings.

Now return to the two physicists. For one measurement, $Y = 850$, the estimator $b$ and its variance for physicist A are (since $X_i = 1$ for $\eta = \beta$)

$$b_A = \frac{850(40)^{-2} + 900(20)^{-2}}{(40)^{-2} + (20)^{-2}} = 890, \qquad V(b_A) = \left[(40)^{-2} + (20)^{-2}\right]^{-1} = 320$$

The same calculation for physicist B gives $b_B = 848$ and $V(b_B) = 1538$. Although both physicists used the same observation, their different normal prior distributions give physicist A the posterior distribution $N(890, 17.9^2)$ and physicist B the posterior $N(848, 39.2^2)$. Physicists A and B therefore have different estimates, with standard deviations of 17.9 and 39.2, respectively.

After the single observation, the views of A and B about $\beta$ (represented by the posterior distributions) are much closer than before. Note that A learned less from the experiment than B did. For A, the measurement uncertainty, $\sigma = 40$, was larger than the prior standard deviation, $\sigma_\mu = 20$. For B, by contrast, the measurement uncertainty was considerably smaller than that of his prior ($\sigma_\mu = 200$). As a result, the prior has the greater influence on A's posterior distribution, whereas the measurement has the greater effect for B. As more and more $Y_i$ measurements are used to estimate $\beta$, however, (5.4.17) and (5.4.18) show that the prior information has less and less effect on the estimate and its standard deviation.

5.4.3 Comparison of Viewpoints

Three different types of prior information have been discussed. The first, in Section 5.3.4, combines prior information from actual experiments with a new set of experiments. Only maximum likelihood is needed, and the ideas are relatively straightforward. In the MAP cases, which use Bayes's theorem, the ideas are less clear and have been the subject of controversy. In the first MAP case the parameters are random, as with the thermal conductivities of different batches of steel in the example above. In the second MAP case the parameters are not random, but our prior belief can be incorporated as a subjective prior. The parameter estimators have identical forms under each viewpoint. The only differences are in the symbols and meanings of the prior mean and variance. In each case, the variance of $b_1 - \beta_1$ has the same mathematical expression. Problem 5.21 gives the estimators for the two-parameter model (Model 5).

Example 5.4.1

A scientist has measured a certain physical phenomenon and obtained the data below. He knows his measuring device, so the variances of the measurements are also given. From previous experience, he feels he can assign the parameter a prior normal distribution with a mean of 1.01 and a variance of 0.001.

   i      1       2       3       4
   X_i    0.01    0.1     1       10
   Y_i    0.02    0.12    0.8     13
   σ_i    0.01    0.05    0.1     2

The regression function is $\eta_i = \beta_1X_i$. The assumptions about the data include $V(X_i) = 0$, and the $\sigma_i^2$ values are known. Estimate $\beta_1$ using (a) OLS, (b) ML, and (c) MAP estimators, and find the variance of the estimate in each case.

Solution

The assumptions given above can be designated 11011110. The different estimation methods use different sets of these assumptions.

(a) The OLS estimator does not use any statistical assumption. Using (5.2.5), the estimate is

$$b_{1,OLS} = \frac{\sum Y_iX_i}{\sum X_i^2} = \frac{0.01(0.02) + 0.1(0.12) + 1(0.8) + 10(13)}{0.0001 + 0.01 + 1 + 100} = 1.2950$$

Calculating the variance of $b_{1,OLS}$ does require some assumptions; we use those designated 1101-11-. Because $\sigma_i^2$ is not constant, (5.2.10) is not valid for finding the variance.
Instead, the reader should derive

$$V(b_{1,OLS}) = \left[\sum X_i^2\sigma_i^2\right]\left[\sum X_i^2\right]^{-2} = \frac{0.0001(0.01)^2 + 0.01(0.05)^2 + 1(0.1)^2 + 100(4)}{(101.0101)^2} = 0.0392$$

(b) ML estimation needs the assumptions given above, without the prior information. From (5.3.4) and (5.3.7) we find

$$b_{1,ML} = \frac{\sum Y_iX_i\sigma_i^{-2}}{\sum X_i^2\sigma_i^{-2}} = \frac{0.02(0.01)(0.01)^{-2} + \cdots + 13(10)(2)^{-2}}{(0.01)^2(0.01)^{-2} + (0.1)^2(0.05)^{-2} + 1(0.1)^{-2} + 100(2)^{-2}} = \frac{119.3}{130} = 0.91769$$

$$V(b_{1,ML}) = \left[\sum X_i^2\sigma_i^{-2}\right]^{-1} = (130)^{-1} = 0.00769$$

(c) MAP estimation also includes the subjective prior information. The assumptions given above permit the use of (5.4.17b) and (5.4.18b), which give

$$b_{1,MAP} = \frac{\sum F_iZ_i + \mu\sigma_\mu^{-2}}{\sum Z_i^2 + \sigma_\mu^{-2}} = \frac{119.3 + 1.01(0.001)^{-1}}{130 + (0.001)^{-1}} = 0.99938$$

$$V(b_{1,MAP}) = \left[\sum Z_i^2 + \sigma_\mu^{-2}\right]^{-1} = (1130)^{-1} = 0.000885$$

OLS estimation uses no statistical assumptions, which means it uses no information about the errors. Maximum likelihood estimation uses information about the measurement errors. MAP estimation uses the prior information about the parameter in addition to the information used in ML estimation. This suggests that the parameter variance for ML should be less than that for OLS, and that the MAP variance should be the smallest, which is indeed what occurs in this example. If many additional measurements were available, however, the effect of the prior information would diminish and the difference between the ML and MAP values would shrink. If the errors do not have constant variance, the OLS values could differ from the ML and MAP values even for a large number of observations.

5.5 MULTIPLE DATA POINTS

Repeated measurements offer one way to gain insight into the assumption of constant error variance (that is, $\sigma_i^2 = \sigma^2$).
For Models 2, 3, and 4, this means having more than one measurement of $Y$ at each $X_i$. For Model 5, it means having more than one $Y_i$ value at each combination of $X_{i1}$ and $X_{i2}$. Repeated measurements are not always possible. Whenever they are, they should be obtained for each new problem until the dependence of $\sigma_i$ on $i$ is understood. Multiple data points can also help check other assumptions, such as zero mean, uncorrelated, and normal errors.

Sometimes repeated measurements can be obtained simply by testing another specimen at the "same" conditions. In other cases they can be obtained by attaching several sensors to the same specimen. An example of the latter is temperature measurement in solids and fluids, where several thermocouples (if used) might all be placed to measure the same temperature. Other sensors can be used in the same way.

It is important to distinguish between repeated measurements and repeated readings of the same measurement. Failing to do so may lead to inefficient experimental designs and to erroneous statements about the accuracy of the parameters. The difference can be illustrated by the temperature history of a solid copper block that is initially hot and then allowed to cool in open air. Several thermocouples are attached to the block. Because copper has a high thermal conductivity, the temperature is quite uniform throughout the block at any given time, although it gradually decreases with time.

Consider first a single thermocouple. At any time its temperature measurement is in error owing to a number of factors. The largest of these is probably calibration error.
Over the whole calibration temperature range the average calibration error is nearly zero, but at most individual temperatures it is not zero. Hence, if several temperature measurements are made with only a short time interval between them, each contains the "same" calibration error. The measurements are then very nearly identical and can be considered repeated readings of the same measurement. These repeated readings may contain random components, but their variance is small compared with the calibration error. A true repeated measurement of the temperature at a specified time is better provided by another thermocouple embedded in the specimen. That thermocouple also has a calibration error, but its error is independent of the first one's, provided each sensor is calibrated independently. If a measurement is taken later, after the temperature has dropped considerably, its calibration error will be nearly independent of the errors in the early measurements from the same sensor.

Repeated measurements can also be obtained from the same thermocouple (or other sensor). In the above example this would happen if the calibration were very good, so that the associated variance was small compared with fluctuations in the readings due to electronic noise. For example, unbiased measurements of the temperature of a stirred water-ice mixture might produce values of 0.11, -0.06, -0.01, 0.03, ..., 0.05°C when the correct value is 0°C. Random measurements of the same type might be produced for small or large time spacings between the measurements. The errors here are random with zero mean, so these measurements can be considered repeated values even though the "same" specimen and sensor are used.

These examples show that care is needed to distinguish between repeated measurements and repeated readings.
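A small Monte Carlo sketch illustrates the distinction. For illustration only, it assumes normally distributed calibration errors with a standard deviation of 0.5 °C that stay fixed for a given sensor, plus independent reading noise with a standard deviation of 0.02 °C. These numbers are hypothetical and are not from the text. Averaging k readings from one sensor leaves the calibration error untouched. Averaging one reading from each of k independently calibrated sensors reduces that error by roughly a factor of $\sqrt{k}$.

```python
import random
import statistics


def simulate(trials=2000, k=4, calib_sd=0.5, noise_sd=0.02, seed=1):
    """Standard deviation of the error of an average of k values when
    (a) all k readings come from ONE sensor (shared calibration error), and
    (b) one reading comes from each of k independently calibrated sensors."""
    rng = random.Random(seed)
    err_readings, err_sensors = [], []
    for _ in range(trials):
        # (a) a single calibration error is shared by every reading
        bias = rng.gauss(0.0, calib_sd)
        readings = [bias + rng.gauss(0.0, noise_sd) for _ in range(k)]
        err_readings.append(statistics.fmean(readings))
        # (b) every sensor has its own, independent calibration error
        sensors = [rng.gauss(0.0, calib_sd) + rng.gauss(0.0, noise_sd)
                   for _ in range(k)]
        err_sensors.append(statistics.fmean(sensors))
    return statistics.stdev(err_readings), statistics.stdev(err_sensors)


sd_a, sd_b = simulate()
print(f"repeated readings, one sensor:      sd of mean error = {sd_a:.3f}")
print(f"repeated measurements, k sensors:   sd of mean error = {sd_b:.3f}")
```

Treating case (a) as if it were case (b) would understate the uncertainty of the average by about a factor of two here. This is the kind of erroneous accuracy statement the text warns about.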
5.5.1 Sum of Squares

We consider ordinary least squares first. The observations can always be numbered so that

$$S = \sum_{i=1}^n (Y_i - \eta_i)^2 \qquad (5.5.1)$$

If there are repeated values, the estimators given in Section 5.2 still apply. Some effort can sometimes be saved, however, by denoting the observations $Y_{ij}$ and the regression function $\eta_i$. There might be $m_1$ measurements of $Y$ at $X_1$, $m_2$ measurements at $X_2$, ..., and $m_r$ at $X_r$. The $Y$ values at location $X_i$ are designated $Y_{ij}$, with $j = 1, 2, \ldots, m_i$. Then (5.5.1) can be written

$$S = \sum_{i=1}^r\sum_{j=1}^{m_i} (Y_{ij} - \eta_i)^2, \qquad \text{where } \sum_{i=1}^r m_i = n \qquad (5.5.2)$$

We now derive another expression for $S$ that is frequently easier to use than (5.5.2). It applies equally to linear and nonlinear cases, and it shows that minimizing $S$ need involve only the mean of the $Y_{ij}$ values for each $i$. Consider first the identity

$$Y_{ij} - \eta_i = (Y_{ij} - \bar Y_i) + (\bar Y_i - \eta_i) \qquad (5.5.3)$$

where $\bar Y_i$ and another mean (used later) are

$$\bar Y_i = \frac{1}{m_i}\sum_{j=1}^{m_i} Y_{ij}, \qquad \bar Y = \frac{1}{n}\sum_{i=1}^r m_i\bar Y_i \qquad (5.5.4)$$

Squaring (5.5.3) and summing over $i$ and $j$ gives

$$S = \sum_{i,j} (Y_{ij}-\bar Y_i)^2 + \sum_{i,j} (\bar Y_i - \eta_i)^2 + 2\sum_{i,j} (Y_{ij}-\bar Y_i)(\bar Y_i - \eta_i) \qquad (5.5.5a)$$

$$\phantom{S} = \sum_{i,j} (Y_{ij}-\bar Y_i)^2 + \sum_{i=1}^r m_i(\bar Y_i - \eta_i)^2 \qquad (5.5.5b)$$

The cross-product sum in (5.5.5a) is zero because the sum of $Y_{ij} - \bar Y_i$ over $j$ is zero. The first summation in (5.5.5b) does not depend on the parameters. Hence, for both linear and nonlinear estimation problems with repeated measurements, minimizing the function

$$S' = \sum_{i=1}^r m_i(\bar Y_i - \eta_i)^2 \qquad (5.5.6)$$

gives the same parameters as minimizing (5.5.2), but (5.5.6) requires less computation.

If the measurement errors are independent but have variances that depend only on $i$, maximum likelihood estimation (with assumptions 11-11111) can be performed by minimizing

$$S_1 = \sum_{i=1}^r (F_i - H_i)^2 \qquad (5.5.7)$$

where

$$F_i = \frac{m_i^{1/2}\bar Y_i}{\sigma_i}, \qquad H_i = \frac{m_i^{1/2}\eta_i}{\sigma_i} \qquad (5.5.8)$$
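A numerical check of the identity (5.5.5b), and of the fact that minimizing (5.5.6) needs only the group means, is sketched below in Python. It uses the repeated measurements of Example 5.5.1 (Model 3, $X_1 = 0$, $X_2 = 80$, $m_1 = m_2 = 4$). The trial parameter values and helper names are arbitrary choices for illustration.

```python
def ss_full(groups, eta):
    """S of (5.5.2): sum over i and j of (Y_ij - eta_i)^2."""
    return sum((y - eta[i]) ** 2 for i, ys in enumerate(groups) for y in ys)


def ss_split(groups, eta):
    """The two terms on the right side of (5.5.5b): the parameter-free part
    sum (Y_ij - Ybar_i)^2 and the part sum m_i (Ybar_i - eta_i)^2."""
    pure, lack = 0.0, 0.0
    for i, ys in enumerate(groups):
        m = len(ys)
        ybar = sum(ys) / m
        pure += sum((y - ybar) ** 2 for y in ys)
        lack += m * (ybar - eta[i]) ** 2
    return pure, lack


# Data of Example 5.5.1
x = [0.0, 80.0]
groups = [[0.258, 0.966, 2.453, 1.963], [9.418, 10.792, 8.626, 8.778]]

# Any trial parameters satisfy the identity S = pure + lack
eta = [1.0 + 0.1 * xi for xi in x]
pure, lack = ss_split(groups, eta)
print(ss_full(groups, eta), pure + lack)

# Minimizing (5.5.6) for Model 3 uses only m_i and the means Ybar_i
m = [len(g) for g in groups]
ybar = [sum(g) / len(g) for g in groups]
n = sum(m)
xbar = sum(mi * xi for mi, xi in zip(m, x)) / n
ybar_all = sum(mi * yi for mi, yi in zip(m, ybar)) / n
b1_hat = (sum(mi * yi * (xi - xbar) for mi, yi, xi in zip(m, ybar, x))
          / sum(mi * (xi - xbar) ** 2 for mi, xi in zip(m, x)))
b0_hat = ybar_all - b1_hat * xbar
print(b0_hat, b1_hat, pure)
```

Here `pure` is the parameter-free first term of (5.5.5b). Because only two $X_i$ values are used, it also equals the minimum sum of squares.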
When estimating parameters using repeated measurements, it is necessary that $r \ge p$, where $p$ is the number of parameters. In Model 3, for example, estimates of $\beta_0$ and $\beta_1$ would require measurements at no fewer than two different $X_i$ values, however many measurements are taken at each $X_i$.

5.5.2 Parameter Estimates

Parameters can be estimated by minimizing (5.5.7) for the various models given in this chapter. Economy in obtaining estimators can be achieved by utilizing previous results. Consider first Model 2 ($\eta_i = \beta_1 X_i$) and ML estimation with the assumptions 11-11111. Then $b_1$ is given by (5.3.6) with $F_i$ as defined by (5.5.8) and $Z_i$ by

$$Z_i = \frac{m_i^{1/2} X_i}{\sigma_i} \tag{5.5.9}$$

The variance of $b_1$ is given by (5.3.7) with $Z_i$ defined by (5.5.9). For Model 5, given by $\eta_i = \beta_1 X_{i1} + \beta_2 X_{i2}$, the estimators $b_1$ and $b_2$, their variances, and their covariance can be obtained from (5.2.24) and (5.2.26) by letting

$$Y_i \to \frac{m_i^{1/2} \bar{Y}_i}{\sigma_i}, \quad X_{i1} \to \frac{m_i^{1/2} X_{i1}}{\sigma_i}, \quad X_{i2} \to \frac{m_i^{1/2} X_{i2}}{\sigma_i}, \quad \sigma^2 \to 1 \tag{5.5.10a-d}$$

with the sums running over $i = 1, \ldots, r$. For Model 3, $\eta_i = \beta_0 + \beta_1 X_i$, the ML results can be obtained from the above procedure or, more simply, from (5.3.10) to (5.3.12) by replacing $\sigma_i^2$ by $\sigma_i^2/m_i$ and $Y_i$ by $\bar{Y}_i$.

The number of terms related to $Y$ and $\eta$ has increased in this section:

- $Y_{ij}$ is the observed value.
- $\bar{Y}_i$ is the average of the $Y_{ij}$ values at a given $X_i$.
- $\hat{Y}_i$ is the predicted regression value at $X_i$.
- $\eta_i$ is the actual regression value at $X_i$. By definition it is $E(Y_{ij})$, and it is therefore also the expected value of $\bar{Y}_i$ (and of $\hat{Y}_i$ when the model is correct).
- $\bar{Y}$ is the weighted average of the $Y_{ij}$ values over all the $X_i$ values.

These symbols are illustrated by Fig. 5.4.

Figure 5.4 Relationships among observations, etc., for repeated measurements (regression value $\hat{Y}_i$ and expected value $\eta_i$ at $X_i$).

Example 5.5.1

Four measurements are made at each of $X_1 = 0$ and $X_2 = 80$. The errors $\epsilon_i$ are the same as in Example 5.2.4, except that the fifth error is not used. The $Y_{1j}$ measurements at $X_1$ are 0.258, 0.966, 2.453, and 1.963, whereas at $X_2 = 80$ the $Y_{2j}$ are 9.418, 10.792, 8.626, and 8.778. The assumptions of additive, zero mean, constant variance, uncorrelated, normal errors and errorless $X_i$ are valid. There is no prior information, and $\sigma^2$ is unknown.

(a) Using expressions developed in this section, estimate the parameters $\beta_0$ and $\beta_1$ in Model 3.
(b) Find the estimated standard errors of $b_0$ and $b_1$.

Solution

(a) With the assumptions given, 11111011, the estimates can be obtained using OLS or ML. The simplest expressions to use are (5.3.10) to (5.3.12), with $\sigma_i^2$ replaced by $\sigma^2/m_i$ and $Y_i$ by $\bar{Y}_i$. Since $\sigma^2$ is a constant, (5.3.10) and (5.3.11) can be written

$$b_1 = \frac{\sum_i m_i \bar{Y}_i (X_i - \bar{X})}{\sum_i m_i (X_i - \bar{X})^2} \tag{a}$$

$$b_0 = \bar{Y} - b_1 \bar{X} \tag{b}$$

$$\bar{X} = \frac{\sum_i m_i X_i}{\sum_i m_i}, \qquad \bar{Y} = \frac{\sum_i m_i \bar{Y}_i}{\sum_i m_i} \tag{c}$$

In the above equations $r = 2$, $m_1 = m_2 = 4$, $X_1 = 0$, and $X_2 = 80$. The group means are $\bar{Y}_1 = 1.410$ and $\bar{Y}_2 = 9.4035$, and

$$\bar{X} = \tfrac{1}{8}[0 + 4(80)] = 40, \qquad \bar{Y} = \tfrac{1}{2}(1.410 + 9.4035) = 5.40675$$

Then using expression (a) for $b_1$, we obtain

$$b_1 = \frac{1.41(4)(-40) + 9.4035(4)(40)}{4(1600) + 4(1600)} = 0.09991875$$

and from (b), $b_0 = 5.40675 - 0.09991875(40) = 1.41$.

(b) The expressions for the estimated standard errors can be obtained from (5.3.12) by replacing $\sigma_i^{-2}$ by $m_i s^{-2}$ to get

$$\text{est. s.e.}(b_0) = s \left[ \frac{\sum_k m_k X_k^2 / n}{\sum_k m_k (X_k - \bar{X})^2} \right]^{1/2} \tag{d}$$

$$\text{est. s.e.}(b_1) = s \left[ \sum_k m_k (X_k - \bar{X})^2 \right]^{-1/2} \tag{e}$$

Since there are two $X_i$ values and two parameters, the predicted line passes through $\bar{Y}_1$ and $\bar{Y}_2$. The minimum sum of squares resulting from (5.5.2) is then the first term on the right side of (5.5.5b),

$$S_{\min} = \sum_{i,j} (Y_{ij} - \bar{Y}_i)^2 = 2.9179 + 2.9239 = 5.8418$$

so that the estimated variance of the errors is

$$s^2 = \frac{S_{\min}}{n - 2} = \frac{5.8418}{6} = 0.9736$$

and the estimated standard error is $s = 0.9867$. Then using (d) and (e),

$$\text{est. s.e.}(b_0) = 0.9867 \left[ \frac{6400(4)/8}{2(1600)(4)} \right]^{1/2} = 0.4933$$

$$\text{est. s.e.}(b_1) = 0.9867 \, [2(1600)(4)]^{-1/2} = 0.00872$$

The value of $b_0$ is less accurate than that given in Example 5.2.4. Even so, the estimated variances are smaller in this example than in Example 5.2.4. These estimated variances corroborate the theoretical result that smaller variances are generally obtained for Models 3 and 4 by concentrating the measurements at the minimum and maximum $X_i$ values.

5.6 COEFFICIENT OF MULTIPLE DETERMINATION ($R^2$)

In this section the sums of squares are compared for two different models applied to the same data. Ordinary least squares is used as the estimation procedure. The analysis starts in sufficient generality to permit the models to be linear or nonlinear in the parameters. Later the results are specialized to Models 1 and 4.

In the following discussion we consider two models, designated A and B. Frequently Model B has the same functional form and parameters as Model A, except that Model B has an additional parameter. Many authors restrict the meaning of $R^2$ to the case where Model A is Model 1.

Let $\hat{Y}_{A,i}$ be the predicted value of $Y_i$ for Model A and $\hat{Y}_{B,i}$ that for Model B. We start with the identity

$$Y_i - \hat{Y}_{A,i} = (Y_i - \hat{Y}_{B,i}) + (\hat{Y}_{B,i} - \hat{Y}_{A,i}) \tag{5.6.1}$$

which can also be written as

$$e_{A,i} = e_{B,i} + (\hat{Y}_{B,i} - \hat{Y}_{A,i}) \tag{5.6.2}$$

for which the residuals for Models A and B are defined by

$$e_{A,i} = Y_i - \hat{Y}_{A,i}, \qquad e_{B,i} = Y_i - \hat{Y}_{B,i} \tag{5.6.3}$$

Let us square and sum (5.6.2) over $i$ to get

$$\sum_i e_{A,i}^2 = \sum_i e_{B,i}^2 + \sum_i (\hat{Y}_{B,i} - \hat{Y}_{A,i})^2 + 2 \sum_i e_{B,i} (\hat{Y}_{B,i} - \hat{Y}_{A,i}) \tag{5.6.4a}$$

$$\mathrm{SST} = \mathrm{SSE} + \mathrm{SSR} + 2\,\mathrm{SC} \tag{5.6.4b}$$

Each term in (5.6.4b) corresponds to the term in (5.6.4a) directly above it. Note that SST is the minimum sum of squares for Model A and SSE is the minimum sum of squares for Model B. Let us specify Models A and B so that

$$\mathrm{SST} \ge \mathrm{SSE} \tag{5.6.5}$$

This would always be true if Model A could be obtained from Model B by setting a certain parameter in Model B equal to zero. Divide (5.6.4b) by its left side and rearrange to the form

$$R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} \tag{5.6.6}$$

where $R^2$ is called the coefficient of multiple determination and is defined by

$$R^2 \equiv \frac{\mathrm{SSR} + 2\,\mathrm{SC}}{\mathrm{SST}} \tag{5.6.7}$$

Because of condition (5.6.5), an examination of (5.6.6) reveals that $0 \le R^2 \le 1$. A value $R^2 \approx 0$ corresponds to both models being nearly equally effective, and $R^2 \approx 1$ corresponds to Model B being much better than Model A. Thus $R^2$ can be used to say something about the improvement in the "goodness of fit" obtained by using Model B rather than Model A: $R^2 \approx 0$ indicates the least improvement and $R^2 \approx 1$ the greatest.

For nonlinear problems, the parameter estimates and sums of squares can be found separately for Models A and B, and $R^2$ would then be evaluated using (5.6.6). For the simple linear models given next, a simplified form of (5.6.7) is frequently used. A classical case considered in connection with $R^2$ is that of Models 1 and 4 being A and B, respectively,

$$\text{Model A (Model 1):} \quad Y_i = \beta_0 + \epsilon_i \tag{5.6.8}$$

$$\text{Model B (Model 4):} \quad Y_i = \beta_0 + \beta_1 (X_i - \bar{X}) + \epsilon_i \tag{5.6.9}$$

Since $b_0 = \bar{Y}$ for Model 4, the term SC in (5.6.4b) and (5.6.7) is then

$$\mathrm{SC} = \sum_i (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y}) = b_1 \sum_i (Y_i - \hat{Y}_i)(X_i - \bar{X}) = 0 \tag{5.6.10}$$

where the normal equation for Model 4 and parameter $\beta_1$ was used. Hence

$$R^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \frac{b_1^2 \sum (X_i - \bar{X})^2}{\sum (Y_i - \bar{Y})^2} = \frac{b_1 \sum (X_i - \bar{X}) Y_i}{\sum (Y_i - \bar{Y})^2} \tag{5.6.11}$$

where $\bar{Y}$ is associated with Model A (Model 1 in this case) and $\hat{Y}_i$ with Model B (Model 4). If $\hat{Y}_i = Y_i$, that is, the prediction is perfect, then $R^2 = 1$. If $\hat{Y}_i = \bar{Y}$, that is, $b_1 = 0$ or the model $Y = \beta_0 + \epsilon$ alone fits the data, then $R^2 = 0$. Thus $R^2$ measures the usefulness of the term $\beta_1(X_i - \bar{X})$ in the model. The term is not needed for $R^2 \approx 0$ and is needed for $R^2 \approx 1$. $R^2$ as given by (5.6.11) is the square of the correlation coefficient of (2.6.17).

Example 5.6.1

Investigate the goodness of fit as indicated by $R^2$ for Example 5.2.4.

Solution

Using (5.6.11) and the values given in Example 5.2.4 gives

$$R^2 = \frac{0.101988 \sum (X_i - \bar{X}) Y_i}{\sum (Y_i - 5.366)^2} = 0.9131$$

This is nearly unity, indicating that the $\beta_1(X_i - \bar{X})$ term may be needed in the model.

5.7 ANALYSIS OF VARIANCE ABOUT THE SAMPLE MEAN

The subject of analysis of variance is a broad one with many different facets. In this section only certain aspects of the analysis of variance (ANOVA) are considered. The preceding section employed no statistical information, and thus no probabilistic statements could be made.
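The partition of Section 5.6 is purely algebraic, so it can be checked numerically before any statistical assumptions are introduced. The Python sketch below fits Models 1 and 4 by OLS and evaluates SST, SSE, SSR, and SC of (5.6.4b). It also computes $R^2$ both from (5.6.6) and from the last form of (5.6.11), together with the squared sample correlation coefficient. The data values are hypothetical and chosen only for illustration.

```python
def partition_models_1_and_4(xs, ys):
    """OLS fits of Model 1 (A) and Model 4 (B) and the sums of squares of (5.6.4)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * y for x, y in zip(xs, ys))
    b1 = sxy / sxx
    yhat_a = [ybar] * n                            # Model 1: b0 = ybar
    yhat_b = [ybar + b1 * (x - xbar) for x in xs]  # Model 4: b0 = ybar as well
    sst = sum((y - a) ** 2 for y, a in zip(ys, yhat_a))
    sse = sum((y - b) ** 2 for y, b in zip(ys, yhat_b))
    ssr = sum((b - a) ** 2 for b, a in zip(yhat_b, yhat_a))
    sc = sum((y - b) * (b - a) for y, b, a in zip(ys, yhat_b, yhat_a))
    return {
        "SST": sst, "SSE": sse, "SSR": ssr, "SC": sc,
        "R2_from_5_6_6": 1.0 - sse / sst,
        "R2_from_5_6_11": b1 * sxy / sst,
        "r_squared": sxy ** 2 / (sxx * sst),  # squared correlation coefficient
    }


# Hypothetical data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 2.8, 5.1, 6.9, 9.2, 10.8]
result = partition_models_1_and_4(xs, ys)
for name, value in result.items():
    print(f"{name:>15s} = {value:.6f}")
```

SC comes out as zero to rounding, because Model 4 is fitted by OLS and its normal equation for $\beta_1$ holds. For a Model B fitted some other way, only the general form (5.6.4b) would apply.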
This section uses many of the standard assumptions:

- The errors are additive, uncorrelated, and normal, with zero mean and constant variance.
- The value of $\sigma^2$ is unknown.
- There is no prior information regarding the constant parameters.
- The $X_i$ values are nonstochastic (i.e., errorless).

These assumptions are designated 11111011. For Models 1 and 4 given by (5.6.8) and (5.6.9), equation (5.6.4a) can be written

$$\sum (Y_i - \bar{Y})^2 = \sum (Y_i - \hat{Y}_i)^2 + \sum (\hat{Y}_i - \bar{Y})^2 \tag{5.7.1a}$$

$$\mathrm{SST} = \mathrm{SSE} + \mathrm{SSR} \tag{5.7.1b}$$

where $\bar{Y}$ is for Model A (or 1) and $\hat{Y}_i$ is for Model B (or 4). The sum of squares on the left side of (5.7.1a) is sometimes called the total sum of squares and designated SST. The first term on the right of (5.7.1a) is called the error sum of squares, SSE. The remaining term is called the regression sum of squares, SSR. It can be proved that SSE and SSR are independent.

Any sum of squares has associated with it a number called its degrees of freedom. Let the sum of squares be written as a sum of the squares of independent linear forms. (A linear form, for example, is $\sum a_i Y_i$, where the $a_i$'s are constants and the $Y_i$'s are variables.) The number of independent linear forms is then the number of degrees of freedom. For the assumptions 11111011, the sum of squares of $Y_i - \hat{Y}_i$ has $n - p$ degrees of freedom, where $n$ is the number of observations and $p$ the number of independent parameters. Hence SST has $n - 1$ degrees of freedom and SSE has $n - 2$. Since SSE and SSR are independent, we know from Cochran's theorem [4] that the degrees of freedom of SSE and of SSR sum to the degrees of freedom of SST. This information can be used to obtain the entries displayed in Table 5.4.

Table 5.4 ANOVA Table for Partition of Variance About $\bar{Y}$, (5.7.1)

Source of variation | Sum of squares | Degrees of freedom | Mean square
1. Deviation about regression line (residuals) | $\mathrm{SSE} = \sum (Y_i - \hat{Y}_i)^2$ | $n - 2$ | $s^2 = \mathrm{SSE}/(n - 2)$
2. Deviation between the regression line and mean | $\mathrm{SSR} = \sum (\hat{Y}_i - \bar{Y})^2$ | 1 | SSR
3. Total deviation between data and mean | $\mathrm{SST} = \sum (Y_i - \bar{Y})^2$ | $n - 1$ |

We now wish to employ an F test to indicate whether the $\beta_1(X_i - \bar{X})$ term in Model 4 is needed. For the assumptions indicated by 11111011, an F statistic can be given. Recall that an F statistic is the ratio of two independent random variables, each having a $\chi^2$ distribution and each divided by its respective degrees of freedom. One $\chi^2$ statistic can be formed by dividing SSE by $\sigma^2$. Another, independent $\chi^2$ statistic is $\mathrm{SSR}/\sigma^2$ (when $\beta_1 = 0$). Then an F statistic is

$$F = \frac{(\mathrm{SSR}/\sigma^2)/1}{(\mathrm{SSE}/\sigma^2)/(n - 2)} = \frac{\mathrm{SSR}}{s^2} \tag{5.7.2}$$

This statistic can provide a measure of how much the additional parameter $\beta_1$ is needed, that is, whether to use the model $Y_i = \beta_0 + \beta_1(X_i - \bar{X}) + \epsilon_i$ rather than $Y_i = \beta_0 + \epsilon_i$. If $F$ is near unity [corresponding to $R^2 \approx 0$ in (5.6.11)], then the two-parameter model (Model 4) does not significantly improve the fit compared with the one-parameter model (Model 1). The other extreme is a large $F$ [which corresponds to $R^2 \approx 1$ in (5.6.11)]; in this case we can be confident that the $\beta_1$ parameter is needed.

A probability statement can be made using the F statistic and a table of its distribution, which gives the value of $F_{1-\alpha}(1, n - 2)$; see Section 2.8.10. The probability of $F$ being less than $F_{1-\alpha}(1, n - 2)$ is $1 - \alpha$, or (here $p = 2$)

$$P[F < F_{1-\alpha}(1, n - p)] = 1 - \alpha \tag{5.7.3}$$

Alternatively we can write

$$P[F > F_{1-\alpha}(1, n - p)] = \alpha \tag{5.7.4}$$

In words, if the null hypothesis $H_0: \beta_1 = 0$ is true, the probability that the calculated value $F$ exceeds the tabulated value is $\alpha$. If $F$ is greater than $F_{1-\alpha}(1, n - p)$, we reject the null hypothesis at the given significance level $\alpha$. If the calculated $F$ value is less than $F_{1-\alpha}(1, n - p)$, we say that we cannot reject the null hypothesis; that is, it may be that $\beta_1 = 0$.

Example 5.7.1

Using the data of Example 5.2.4, develop an analysis of variance table and determine whether the $\beta_1$ parameter is needed. Make the probability of falsely deciding that $\beta_1$ is needed 1%.

Solution

Using the data from Example 5.2.4, the following ANOVA table is constructed.

Source | Sum of squares | Degrees of freedom | Mean square | Calculated F
1. Residual | 5.9377 | 7 | 0.84824 |
2. Deviation between line and mean | 62.4097 | 1 | 62.4097 | 73.58
3. Total | 68.3474 | 8 | |

From a table of the F distribution, we find $F_{0.99}(1, 7) = 12.25$. Since $F > F_{0.99}(1, 7)$, we reject the null hypothesis that $\beta_1 = 0$. If $\beta_1$ is not needed, our method has only a 1% chance of leading us to use the model $\eta_i = \beta_0 + \beta_1(X_i - \bar{X})$ rather than $\eta_i = \beta_0$. The use of the F test for model building is considered further in a later chapter.

5.8 ANALYSIS OF VARIANCE ABOUT THE REGRESSION LINE FOR MULTIPLE MEASUREMENTS AT EACH $X_i$

Consider the case of partitioning the variation about the predicted regression line for multiple measurements at each $X_i$. From (5.5.5b) with $\eta_i$ replaced by $\hat{Y}_i$, which applies to both linear and nonlinear parameter estimation, we have

$$\underbrace{\sum_{i,j} (Y_{ij} - \hat{Y}_i)^2}_{SS_r} = \underbrace{\sum_{i,j} (Y_{ij} - \bar{Y}_i)^2}_{SS_e} + \underbrace{\sum_i m_i (\bar{Y}_i - \hat{Y}_i)^2}_{SS_f} \tag{5.8.1}$$

The three terms in (5.8.1) are:

- $SS_r$: the total sum of squares between the data and the regression line, the "residuals" (d.f. = $n - p$). $SS_r$ is our former SSE.
- $SS_e$: the sum of squares of the data about the local mean within data sets, the "pure error sum of squares" (d.f. = $n - r$).
- $SS_f$: the sum of squares of the local means about the regression line, the "lack of fit sum of squares" (d.f. = $r - p$).

Here d.f. stands for degrees of freedom. The number of degrees of freedom on the left has been discussed previously; it is the total number of points minus the number of parameters. In the first term on the right, the contribution from $i = 1$ has $m_1 - 1$ degrees of freedom, the contribution from $i = 2$ has $m_2 - 1$, and so on. Hence for the first term on the right-hand side of (5.8.1), the number of degrees of freedom is

$$\text{d.f.} = \sum_{i=1}^{r} (m_i - 1) = n - r \tag{5.8.2}$$

The number of degrees of freedom of the last term is given by subtraction, $(n - p) - (n - r) = r - p$. The terms are labeled $SS_r$, $SS_e$, and $SS_f$; note that they are not completely analogous to those in (5.7.1), although they are similarly labeled. In fact, (5.8.1) can be used in (5.7.1) to get

$$\mathrm{SST} = \mathrm{SSE} + \mathrm{SSR} = [SS_e + SS_f] + \mathrm{SSR} \tag{5.8.3}$$

where an additional summation is used in (5.7.1), and then

$$\mathrm{SST} = \sum_{i,j} (Y_{ij} - \bar{Y})^2 \tag{5.8.4}$$

$$\mathrm{SSR} = \sum_i m_i (\hat{Y}_i - \bar{Y})^2 \tag{5.8.5}$$

where $\bar{Y}$ is defined by (5.5.4). Table 5.5 shows the analysis of variance table for (5.8.1) in lines 1 to 3; the table as a whole illustrates (5.8.3).

Table 5.5 ANOVA Table for Partition of Variance About $\hat{Y}_i$ and About $\bar{Y}$, (5.8.1) and (5.8.3)

Source of variation | Sum of squares | Degrees of freedom | Mean square
1. Pure error sum of squares | $SS_e = \sum\sum (Y_{ij} - \bar{Y}_i)^2$ | $n - r$ | $s_e^2 = SS_e/(n - r)$
2. Lack of fit sum of squares | $SS_f = \sum m_i (\bar{Y}_i - \hat{Y}_i)^2$ | $r - p$ | $s_f^2 = SS_f/(r - p)$
3. Residual sum of squares | $SS_r = \sum\sum (Y_{ij} - \hat{Y}_i)^2$ | $n - p$ | $s^2 = SS_r/(n - p)$
4. Sum of squares between line and mean | $\mathrm{SSR} = \sum m_i (\hat{Y}_i - \bar{Y})^2$ | $p - 1$ | $\mathrm{SSR}/(p - 1)$
5. Sum of squares between data and mean | $\mathrm{SST} = \sum\sum (Y_{ij} - \bar{Y})^2$ | $n - 1$ |

The mean square defined by

$$s_e^2 = \frac{SS_e}{n - r} \tag{5.8.6}$$

is an unbiased estimate of $\sigma^2$ even if the true model is not used or if the model is nonlinear. Hence this estimate of $\sigma^2$ is said to arise from "pure error." On the other hand,

$$s^2 = \frac{SS_r}{n - 2} \tag{5.8.7}$$

is not an unbiased estimate of $\sigma^2$ if the model is incorrect.

5.8.1 Expected Values of $s^2$ for Incorrect Model

Let us investigate the effect of an incorrect mathematical model upon $s^2$. Recall that $e_{ij}$ is the residual for the $j$th measurement at $X_i$. It "contains all available information on the ways in which the fitted model fails to properly explain the observed variation in the dependent variable Y" [1, p. 26]. Recalling that $\eta_i = E(Y_{ij})$ and writing

$$e_{ij} = Y_{ij} - \hat{Y}_i = (Y_{ij} - \hat{Y}_i) - E(Y_{ij} - \hat{Y}_i) + E(Y_{ij} - \hat{Y}_i) \tag{5.8.8}$$

$$e_{ij} = \{(Y_{ij} - \hat{Y}_i) - [\eta_i - E(\hat{Y}_i)]\} + [\eta_i - E(\hat{Y}_i)] = q_{ij} + B_i \tag{5.8.9}$$

where

$$q_{ij} = (Y_{ij} - \hat{Y}_i) - [\eta_i - E(\hat{Y}_i)], \qquad B_i = \eta_i - E(\hat{Y}_i) \tag{5.8.10}$$

$B_i$ is called the bias error at $X_i$; it is zero if the model is correct ($E[\hat{Y}_i] = \eta_i$). The random variable $q_{ij}$ has a zero mean whether or not the model is correct, since $E(Y_{ij}) = \eta_i$ is true in any case. These statements regarding $B_i$ and $q_{ij}$ are true for nonlinear as well as linear models. When the model is incorrect, the residuals contain both random ($q_{ij}$) and systematic or biased ($B_i$) components. These are called, respectively, the variance and bias error components of the residuals.

Consider Model 5 with the assumptions denoted 1111--11, except that $E(\hat{Y}_i) = \eta_i - B_i$. For OLS and ML estimation we have

$$E(s^2) = \frac{1}{n - 2} E\left[ \sum_{i,j} (q_{ij} + B_i)^2 \right] \tag{5.8.11}$$

and it can be shown that this reduces to

$$E(s^2) = \sigma^2 + \frac{1}{n - 2} \sum_{i=1}^{r} m_i B_i^2 \tag{5.8.12}$$

where (5.2.31) is used. If the model is correct, the last term in (5.8.12) disappears. An incorrect model therefore results in an inflated residual mean square.

5.8.2 F Test with Repeated Data

For this case of repeated observations, an F statistic is (for $p = 2$)

$$F_f = \frac{SS_f/(r - 2)}{SS_e/(n - r)} = \frac{s_f^2}{s_e^2} \tag{5.8.13}$$

where the numerator and denominator contain $\chi^2$ distributions if the model is correct. The quantity $s_f^2$ is called the mean square due to lack of fit. This $F_f$ value should be compared with $F_{1-\alpha}(r - 2, n - r)$.

If $F_f > F_{1-\alpha}(r - 2, n - r)$, we say that $F_f$ is significant, meaning that the model is inadequate. An estimate of $\sigma^2$ using $s_e^2$ would be unbiased, but using $s^2$ or $s_f^2$ would be biased and tend to yield too large an estimate.

If, on the other hand, $F_f < F_{1-\alpha}(r - 2, n - r)$, $F_f$ is said to be not significant and there is no reason to doubt the adequacy of the model. Both the pure error and the lack of fit mean squares ($s_e^2$ and $s_f^2$) can then be used as estimates of $\sigma^2$. Moreover, $s^2$ is then a pooled estimate of $\sigma^2$.

See Fig. 5.5 for a schematic diagram summarizing the steps for checking for lack of fit with repeated observations.

The use of the $F_f$ statistic given by (5.8.13) does not preclude the use of the F statistic given by (5.7.2); the two give different information.

- The $F$ of (5.7.2) can be used whether or not there are repeated measurements. It tells whether $\beta_1$ is needed, and it can be generalized to investigate the validity of adding one or several further parameters to the model.
- When there are repeated measurements, the $F_f$ test can indicate whether the model is satisfactory (with no reference to adding another parameter) and whether $\sigma^2$ can be estimated from $s^2$.

For repeated measurements both tests should be used. With the two F tests there are four combinations, formed from (a) significant (or not significant) lack of fit and (b) significant (or not significant) linear regression. These combinations are illustrated in Fig. 5.6 and the results are summarized in Table 5.6. In each case the model

$$Y = \beta_0 + \beta_1 X + \epsilon = \beta_0' + \beta_1 (X - \bar{X}) + \epsilon$$

is used.
Figure 5.5 Schematic diagram for checking lack of fit with repeated observations. (Adapted from Applied Regression Analysis by Norman R. Draper and Harry Smith, John Wiley & Sons.)

Figure 5.6 Typical straight line situations, with one panel per case:
- Case 1: no lack of fit and significant linear regression (model adequate, $\beta_1 \ne 0$).
- Case 2: no lack of fit and linear regression not significant (model adequate, $\beta_1$ may be zero).
- Case 3: significant lack of fit (model inadequate) and significant linear regression ($\beta_1 \ne 0$).
- Case 4: significant lack of fit (model inadequate) and linear regression not significant ($\beta_1$ may be zero).

(Adapted from Applied Regression Analysis by Norman R. Draper and Harry Smith, John Wiley & Sons.)

Table 5.6 Summary of Observations from Figure 5.6

Observation | Case 1 | Case 2 | Case 3 | Case 4
Significant lack of fit, $F_f > F_{1-\alpha}(r - 2, n - r)$ | | | X | X
Significant linear regression, $F > F_{1-\alpha}(1, n - 2)$ | X | | X |

- Case 1: the linear model is adequate, since there is no lack of fit and the linear regression is significant.
- Case 2: the linear regression is not significant; hence the model $\hat{Y} = \bar{Y}$ would be recommended.
- Case 3: there is lack of fit, but the linear regression is significant; thus one might try $Y = \beta_0 + \beta_1 X + \beta_{11} X^2 + \epsilon$.
- Case 4: there is significant lack of fit and the linear regression is not significant. A model such as $Y = \beta_0 + \beta_1 X + \beta_{11} X^2 + \epsilon$ would be recommended even though the linear regression is not significant. (Why?)

Neither test is limited to checking the adequacy of the simple linear model $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$. Both can be applied to linear estimation with more parameters and even to nonlinear parameter estimation. This requires repeated observations and the standard conditions of zero mean, independent, constant variance, and normal errors.
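The quantities of Table 5.5, the regression $F$ of (5.7.2), and the lack-of-fit $F_f$ of (5.8.13) can be computed together, and the outcome mapped to a case of Table 5.6. The Python sketch below does this for a straight-line model ($p = 2$) fitted by OLS to repeated measurements, stored as a mapping from $X_i$ to the list of $Y_{ij}$. The critical values $F_{1-\alpha}(\cdot,\cdot)$ are passed in as numbers read from an F table rather than computed. The data set and helper names are hypothetical. The last lines compute $F = \mathrm{SSR}/s^2$ directly from the sums of squares listed in Example 5.7.1, with $s^2 = \mathrm{SSE}/7$.

```python
def fit_line(xs, ys):
    """OLS fit of eta = b0 + b1*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * y for x, y in zip(xs, ys)) / sxx
    return ybar - b1 * xbar, b1


def repeated_anova(groups, p=2):
    """Sums of squares of Table 5.5 and the F statistics (5.7.2) and (5.8.13)."""
    xs = [x for x, ys in groups.items() for _ in ys]
    ys_all = [y for ys in groups.values() for y in ys]
    n, r = len(ys_all), len(groups)
    if r <= p:
        raise ValueError("lack-of-fit test needs more distinct X values than parameters")
    b0, b1 = fit_line(xs, ys_all)
    ybar = sum(ys_all) / n
    ss_e = ss_f = ss_r = ssr = sst = 0.0
    for x, ys in groups.items():
        yhat = b0 + b1 * x
        ymean = sum(ys) / len(ys)
        ss_f += len(ys) * (ymean - yhat) ** 2   # lack of fit
        ssr += len(ys) * (yhat - ybar) ** 2     # line about mean
        for y in ys:
            ss_e += (y - ymean) ** 2            # pure error
            ss_r += (y - yhat) ** 2             # residual
            sst += (y - ybar) ** 2              # total about mean
    return {
        "SS_e": ss_e, "SS_f": ss_f, "SS_r": ss_r, "SSR": ssr, "SST": sst,
        "F": (ssr / (p - 1)) / (ss_r / (n - p)),
        "F_f": (ss_f / (r - p)) / (ss_e / (n - r)),
    }


def table_5_6_case(lack_of_fit_significant, regression_significant):
    """Case number of Table 5.6 for the two test outcomes."""
    if not lack_of_fit_significant:
        return 1 if regression_significant else 2
    return 3 if regression_significant else 4


# Hypothetical repeated measurements (X_i -> Y_ij), for illustration only.
data = {0.0: [1.0, 1.4, 0.8], 1.0: [2.9, 3.3, 3.1],
        2.0: [5.2, 4.8, 5.0], 3.0: [6.9, 7.3, 7.1]}
res = repeated_anova(data)
# 5% critical values from an F table: F_0.95(2, 8) = 4.46, F_0.95(1, 10) = 4.96.
case = table_5_6_case(res["F_f"] > 4.46, res["F"] > 4.96)
print(res, case)

# Example 5.7.1: SSR = 62.4097 (1 d.f.), SSE = 5.9377 (7 d.f.).
f_example = 62.4097 / (5.9377 / 7)
print(f_example, f_example > 12.25)  # F_0.99(1, 7) = 12.25 from an F table
```

For these hypothetical data the lack of fit is not significant and the regression is, which corresponds to case 1 of Table 5.6.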
Having said the above, it should be emphasized that considerable insight can sometimes be gained in unfamiliar cases if the residuals are plotted and inspected visually.

5.9 CONFIDENCE INTERVAL ABOUT THE POINTS ON THE REGRESSION LINE

Let us consider a confidence interval about any point on the regression line

$$\hat{Y}_k = b_0 + b_1 (X_k - \bar{X}) \tag{5.9.1}$$

This requires the variance of $\hat{Y}_k$, which is given by (5.2.41a). Using this expression with $\sigma$ replaced by $s$, the estimated standard error is

$$\text{est. s.e.}(\hat{Y}_k) = s \left[ \frac{1}{n} + \frac{(X_k - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]^{1/2} \tag{5.9.2}$$

This is clearly a minimum at $X_k = \bar{X}$ and becomes larger toward the extremities. The use of $s$ in (5.9.2) implies that $\sigma$ is not known. The confidence limits for $\eta_k$ are

$$\hat{Y}_k \pm t_{1-\alpha/2}(n - p)\, \text{est. s.e.}(\hat{Y}_k) \tag{5.9.3}$$

for $n$ observations of $Y_i$, $p$ parameters, and $100(1 - \alpha)$% confidence. Figure 5.7 shows the 95%, say, confidence limits for the model (5.9.1); the curved, hyperbolic lines about the straight regression line give the confidence limits.

These limits can be interpreted as follows. Suppose that repeated sets of measurements of $Y$ are taken at the same $X$ values as were used to find the confidence limits given in Fig. 5.7. Then, of all the 95% confidence intervals constructed for $\eta_k = E(Y_k)$ at $X_k$, 95% will contain $E(Y_k)$. Confidence intervals and regions for parameters are discussed in Chapter 6.

Figure 5.7 Confidence intervals about points on the regression line (95% confidence limits on $\hat{Y}_k$ for each $X_k$).

5.10 VIOLATION OF THE STANDARD ASSUMPTION OF ZERO MEAN ERRORS

In the next few sections violations of the basic assumptions are considered. One of the easiest to treat is the case of additive errors that do not have a zero mean. The assumptions then are 10111111. We are concerned here with nonzero mean errors that remain after any appropriate corrections have been made.
Suppose, however, that after all known corrections have been made, the errors still do not have a zero mean, so that

$$E(\epsilon_i) = f_i \tag{5.10.1}$$

where $f_i \ne 0$. Let $\epsilon_i$ be written as two terms, one of which has a zero mean,

$$\epsilon_i = f_i + v_i \tag{5.10.2}$$

Consider several functions $f_i$ in connection with Model 2, $\eta_i = \beta_1 X_i$, with $X_i$ not the same for all $i$.

The first function that we consider is $f_i = c$, a constant. Then $Y_i$ for Model 2 can be written

$$Y_i = c + \beta_1 X_i + v_i \tag{5.10.3}$$

where the bias $c$ is now a parameter to be estimated in addition to $\beta_1$. In this case a one-parameter Model 2 problem becomes a two-parameter Model 3 problem.

If $f_i$ happens to be proportional to $X_i$, that is, $f_i = cX_i$, then instead of (5.10.3) we write

$$Y_i = (\beta_1 + c) X_i + v_i \tag{5.10.4}$$

and thus it is possible to estimate only the sum $\beta_1 + c$.

Another case is $f_i = cZ_i$, where $Z_i$ is some known function that is not proportional to $X_i$. This reduces to a Model 5 estimation problem, which involves two parameters.

5.11 VIOLATION OF THE STANDARD ASSUMPTION OF NORMALITY

If the standard assumptions excluding that of normality are valid (11110111), ordinary least squares estimation can still be used. The resulting least squares estimators are unbiased and have minimum variance among all linear unbiased estimators, but they are not efficient. However, a consequence of the central limit theorem is that the least squares estimators are consistent and asymptotically efficient almost regardless of the distribution of the errors. Hence when the normality assumption is not justified, least squares estimators still retain most of their desirable properties.

We note that the previously used estimators of the variances of the parameters are unchanged. The confidence intervals and tests for significance given in this chapter, however, are based on the assumption of normal errors; for small numbers of observations the intervals and tests could be substantially in error. Fortunately, for larger sample sizes, and provided the distribution is not radically nonnormal, the confidence limits and tests of significance can be used as reasonable approximations.

If the form of the underlying probability density of the errors is known, then the maximum likelihood and maximum a posteriori methods can be used. For example, assume that all the standard assumptions apply except that the probability density of $\epsilon_i$ is the double exponential (Laplace) density, proportional to $\exp(-|\epsilon_i|/\theta)$ for some scale constant $\theta > 0$ (5.11.1). Then the ML function to minimize is

$$S_{ML} = \sum_{i=1}^{n} |Y_i - \eta_i| \tag{5.11.2}$$

Unfortunately, minimizing $S_{ML}$ is not as simple as it would be for normal measurement errors.

Example 5.11.1

For Model 1, $\eta_i = \beta_0$, estimate $\beta_0$ for the data given below. Assume that the assumptions 11110111 are valid and that $f(\epsilon)$ is given by (5.11.1).

(a) $Y_1 = 0$, $Y_2 = 1$.
(b) $Y_1 = 0$, $Y_2 = 0.5$, $Y_3 = 1$.
(c) $Y_1 = 0$, $Y_2 = 0.25$, $Y_3 = 0.5$, $Y_4 = 1$.
(d) Generalize the results.

Solution

(a) For the observations $Y_1 = 0$ and $Y_2 = 1$, a plot of $S_{ML}$ versus $\beta_0$ shows that $S_{ML}$ has its minimum between 0 and 1. In that range $S_{ML}$ is equal to 1. Thus there is neither a unique minimum nor a unique parameter estimate.

(b) For the three observations 0, 0.5, and 1, a plot of $S_{ML}$ versus $\beta_0$ gives a minimum value of $S_{ML}$, also equal to 1, at $b_0 = 0.5$.

(c) For this case a plot of $S_{ML}$ versus $\beta_0$ shows that a minimum occurs for $b_0$ between 0.25 and 0.5.

(d) From the pattern of the answers obtained, it appears that there are two possibilities, one for an even number of observations $n$ and the other for an odd number. Let the $Y_i$ values be ordered so that the smallest value is $Y_1$, the next larger value is $Y_2$, and so on. Then for $n$ even, the $b_0$ value is located between $Y_{n/2}$ and $Y_{n/2+1}$. For $n$ odd, $b_0$ is equal to $Y_{(n+1)/2}$.*

*The estimator $b_0$ conforms to the definition of the median given in Section 3.1.1.

Another example with a distribution other than the normal is given in Section 4.9 in connection with Monte Carlo methods.

5.12 VIOLATION OF THE STANDARD ASSUMPTION OF CONSTANT VARIANCE

When $V(\epsilon_i) = \sigma_i^2$ varies with $i$, ordinary least squares estimation does not yield minimum variance estimators. Minimum variance estimators can be obtained, however, using maximum likelihood. These estimators for one- and two-parameter cases are given in Sections 5.2 and 5.3. The effect upon the estimator(s) can be investigated for many $\sigma_i^2$ functions. Assume that the standard assumptions (11011111) apply in this section, in which two possible functions are considered. For illustrative purposes the one-parameter case, Model 2 ($\eta_i = \beta_1 X_i$), is used. The OLS and ML estimators and variances are

$$b_{1,\mathrm{OLS}} = \frac{\sum X_i Y_i}{\sum X_i^2}, \qquad V(b_{1,\mathrm{OLS}}) = \frac{\sum X_i^2 \sigma_i^2}{\left(\sum X_i^2\right)^2} \tag{5.12.1a,b}$$

$$b_{1,\mathrm{ML}} = \frac{\sum X_i Y_i \sigma_i^{-2}}{\sum X_i^2 \sigma_i^{-2}}, \qquad V(b_{1,\mathrm{ML}}) = \left(\sum X_i^2 \sigma_i^{-2}\right)^{-1} \tag{5.12.2a,b}$$

For the ML estimator and variance, the quantity $Z_i = X_i/\sigma_i$ can be considered as a modified sensitivity coefficient. $Z_i$ plays the same role as $X_i$ does when OLS is used with all the standard assumptions valid.

Before investigating some cases of nonuniform $\sigma_i^2$, we suggest some situations where nonuniform $\sigma_i^2$ might arise. Error variances tend to increase with the amplitude of the signal (or observation). When the response $Y_i$ varies over several orders of magnitude, say from 0.001 to 100, the accuracy of the measuring device(s) is rarely constant. For small signals the errors usually are even smaller. For the large signals the standard deviation of the errors may be the same small fraction of the signal, but the actual error may be many times the value of the smallest signal.

For example, suppose the voltage of some device, such as a heat flow meter, varies from 0.00001 to 0.1 V in a series of observations. (Another device with large variations in output is a thermistor, whose electric resistance varies greatly with temperature.) To measure such a range, a digital voltmeter with several full-scale settings could be used. One range might go up to 0.001 V, another might be used for 0.001 to 0.01 V, and so on. Then for readings near 0.001 and 0.01 V the percent accuracy might be the same. Note that this implies a varying $\sigma_i^2$ that is approximately proportional to $\eta_i^2$.

5.12.1 Variance of $\epsilon_i$ Given by $\sigma_i^2 = (X_i/\xi)^2 \sigma^2$

One possible variation of $\sigma_i^2$ is $\sigma_i^2 = (X_i/\xi)^2 \sigma^2$, where $\xi$ is some quantity with the same units as $X_i$. The OLS estimator is unaffected, but the variance of $b_{1,\mathrm{OLS}}$ becomes

$$V(b_{1,\mathrm{OLS}}) = \frac{\sigma^2}{\xi^2} \frac{\sum X_i^4}{\left(\sum X_i^2\right)^2} \tag{5.12.3}$$

The $b_{1,\mathrm{ML}}$ estimator and variance become

$$b_{1,\mathrm{ML}} = \frac{1}{n} \sum \frac{Y_i}{X_i}, \qquad V(b_{1,\mathrm{ML}}) = \frac{\sigma^2}{n \xi^2} \tag{5.12.4}$$

Note that the variance of $b_{1,\mathrm{ML}}$ is a simple expression, but that for OLS is not. To make a comparison, let $X_i = i\xi$. One can derive the following summation expressions:

$$\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}, \qquad \sum_{i=1}^{n} i^4 = \frac{n(n+1)(2n+1)(3n^2+3n-1)}{30}$$

For the stipulated $\sigma_i^2$ these yield

$$V(b_{1,\mathrm{OLS}}) = \frac{6(3n^2 + 3n - 1)\sigma^2}{5n(n+1)(2n+1)\xi^2}$$

For large values of $n$ this expression reduces to $9\sigma^2/(5n\xi^2)$. Hence for large $n$, the OLS estimate of $b_1$ for this Model 2 case with $\sigma_i^2 = (X_i/\xi)^2\sigma^2$ has a variance that is 80% larger than that of the ML estimate. This means that ML estimation is substantially superior to OLS estimation in this case.
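The 80% figure can be checked numerically. The Python sketch below evaluates (5.12.1b) and (5.12.2b) directly for $X_i = i\xi$ and $\sigma_i = (X_i/\xi)\sigma$. It compares the resulting ratio $V(b_{1,\mathrm{OLS}})/V(b_{1,\mathrm{ML}})$ with the closed form that follows from the summation expressions above, namely $6(3n^2 + 3n - 1)/[5(n + 1)(2n + 1)]$, and shows the ratio approaching 9/5. The default $\xi = \sigma = 1$ is a convenience of the sketch; the ratio depends on neither.

```python
import math


def variance_ratio_direct(n, xi=1.0, sigma=1.0):
    """V(b1,OLS)/V(b1,ML) for Model 2 with X_i = i*xi and sigma_i = (X_i/xi)*sigma."""
    xs = [i * xi for i in range(1, n + 1)]
    var = [((x / xi) * sigma) ** 2 for x in xs]                # sigma_i^2
    sxx = sum(x * x for x in xs)
    v_ols = sum(x * x * v for x, v in zip(xs, var)) / sxx ** 2  # (5.12.1b)
    v_ml = 1.0 / sum(x * x / v for x, v in zip(xs, var))       # (5.12.2b)
    return v_ols / v_ml


def variance_ratio_closed_form(n):
    """Ratio implied by the summation formulas for sum i^2 and sum i^4."""
    return 6 * (3 * n**2 + 3 * n - 1) / (5 * (n + 1) * (2 * n + 1))


for n in (1, 2, 5, 20, 100, 10_000):
    direct = variance_ratio_direct(n)
    closed = variance_ratio_closed_form(n)
    assert math.isclose(direct, closed, rel_tol=1e-9)
    print(n, round(direct, 4))
```

For $n = 1$ the two estimators coincide and the ratio is exactly 1. As $n$ grows, the ratio climbs toward 1.8, the large-$n$ limit quoted above.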
O ne further benefit of the maximum likelihood (ML) method of estimation is that it can be used to provide an estimate of a 2• This can be accomplished by replacing a 2 in (5.3.1) and (5.3.2) by ( X;! /i)2a 2, dirferenti,2 . 2 -2 dbating (5.3.1) with respect to a , and then replacmg a by a an TI/ y Y; to get (5.12.5) which is a consistent, asymptotically efficient, and biased estimate. A commonly occurring case is ror the standard deviation of the error to be proportional to the dependent variable T lj' In terms of the variance or l j' this can be expressed by (5.12.6) The OLS estimator is the same as usual, but the variance can only be approximated. For our purposes it is permissible to replace E ( Y;) = Tlj by Yj , the regression value for OLS; then let (5.12.7) In ML estimation the a; = a"r,} relation makes the problem nonlinear 2 because the parameters appear in both the denominator and numerator of SML given by (5.3.2) and also in the In a} term contained in (5.3.1). A suggested procedure to get approximate ML values is to first solve for the parameter(s) using OLS and so obtain ~pproximate values of YOLS ' These j• are then used to approximate 0,2 as a 2 yj • OLS in the ML estimators such as (5.12.2a). V IOLATION O F S TANDARD A SSUMPTION O F U NCORRELATED ERRORS In the past decade there has been widespread use of automatic digital data acquisition equipment in connection with dynamic experiments. Transient temperatures have been measured, for example, by using such equipment .'1 to digitize the response of thermocouples. However, measurement error! tend to become correlated as the high sampling rate capability is used. In such cases the standard assumption of independent observation errors b n ot valid. 
5.13 VIOLATION OF STANDARD ASSUMPTION OF UNCORRELATED ERRORS

In the past decade there has been widespread use of automatic digital data acquisition equipment in connection with dynamic experiments. Transient temperatures have been measured, for example, by using such equipment to digitize the response of thermocouples. However, measurement errors tend to become correlated as the high sampling rate capability is used. In such cases the standard assumption of independent observation errors is not valid.

Correlated measurements can also arise when the same specimen is tested using the same sensors for different ranges of the independent variable $X_i$. Examples are measurements on a particular steel specimen at different temperatures for a property such as thermal conductivity, electric resistance, or hardness.

The standard assumptions of zero mean and uncorrelated measurement errors given by (5.1.7) and (5.1.9) result in

$$E(\varepsilon_i\varepsilon_j) = 0 \quad \text{for } i \neq j \qquad (5.13.1)$$

When this equation is not true, many descriptive terms have been used, including colored, correlated, not independent, and dependent errors. Some specific types of correlated errors are called autoregressive (AR), moving average (MA), and autoregressive-moving average (ARMA). Only AR errors are considered in this section; for further discussion see Chapter 6.

Let us consider a case with additive, zero mean, autoregressive errors in $Y_i$. There are no errors in the $X_i$'s. We can then write

$$Y_i = \eta_i + \varepsilon_i, \qquad E(Y_i \mid \boldsymbol\beta) = \eta_i \qquad (5.13.2)$$

The measurement errors are described by the model

$$\varepsilon_i = \rho_i\varepsilon_{i-1} + u_i, \qquad E(u_i) = 0, \qquad E(u_iu_j) = \begin{cases}\sigma_i^2 & i = j\\ 0 & i \neq j\end{cases} \qquad (5.13.3)$$

This model is called first-order autoregressive because the error $\varepsilon_i$ depends on the error $\varepsilon_{i-1}$ at the preceding time. (Second-order errors would depend on the two preceding times, and so on.) In the following analysis the $\rho_i$ and $\sigma_i^2$ values are assumed to be known, and there is no prior information. The associated assumptions are designated 1102-111.

Rather than using the direct matrix maximum likelihood approach of Chapter 6, we shall construct a sum of squares of terms that are uncorrelated and have constant variance. In other words, a transformation is used to obtain modified measurements for which the assumptions 1111-111 are valid. Using (5.13.2) and (5.13.3), write the measurements at times $i$ and $i-1$ as

$$Y_i = \eta_i + \rho_i\varepsilon_{i-1} + u_i \qquad (5.13.4a)$$
$$Y_{i-1} = \eta_{i-1} + \varepsilon_{i-1} \qquad (5.13.4b)$$

Multiply (5.13.4b) by $\rho_i$ and subtract the result from (5.13.4a) to get

$$Y_i - \rho_iY_{i-1} = (\eta_i - \rho_i\eta_{i-1}) + u_i \qquad (5.13.5)$$

Define the transformed observation $F_i$ and the transformed model $H_i$ as

$$F_i = Y_i - \rho_iY_{i-1}, \qquad H_i = \eta_i - \rho_i\eta_{i-1} \qquad (5.13.6a,b)$$

Then, analogous to (5.13.2), a transformed model is

$$F_i = H_i + u_i \qquad (5.13.7)$$

where the error $u_i$ in $F_i$ is now uncorrelated with the errors in the other $F_j$ ($j \neq i$). Notice that $u_i$ divided by $\sigma_i$ has a variance of unity for all $i$. This suggests that a sum of squares of uncorrelated, constant-variance terms can be constructed from the $u_i/\sigma_i$ values, or

$$S = \sum_{i=1}^{n}\frac{(F_i - H_i)^2}{\sigma_i^2} \qquad (5.13.8)$$

where $F_i$ and $H_i$ are given by (5.13.6), with the values at $i = 0$ specified by (5.13.9).

It is important to note that (5.13.8) has been derived without restricting the problem to cases for which $\eta$ is linear in the parameters. Hence it can be used for both linear and nonlinear cases. In Chapter 6 it is shown that the function given by (5.13.8) must be minimized for ML estimation if, in addition to the assumptions given above, the errors $u_i$ are normal.
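Problem 5.23 at the end of this chapter asks for the estimator of Model 2 ($\eta_i = \beta X_i$) that results from minimizing (5.13.8). The sketch below simply evaluates that estimator numerically. It assumes known $\rho_i$ and $\sigma_i$ and takes the starting values $X_0 = Y_0 = 0$, as in Problem 5.23. The function name is hypothetical. The returned variance $1/\sum Z_i^2$ follows from the fact that each scaled error $u_i/\sigma_i$ has unit variance.

```python
import numpy as np


def ar1_model2_estimate(x, y, rho, sigma):
    """Minimize S of (5.13.8) for Model 2, eta_i = beta * x_i, with
    first-order autoregressive errors eps_i = rho_i*eps_{i-1} + u_i.

    rho and sigma may be scalars or length-n sequences (rho_i, sigma_i).
    The starting values X_0 = Y_0 = 0 are an assumption of this sketch.
    Returns b and its variance 1 / sum(Z_i^2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rho = np.broadcast_to(np.asarray(rho, dtype=float), x.shape)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), x.shape)

    x_prev = np.concatenate(([0.0], x[:-1]))  # X_{i-1}, with X_0 = 0
    y_prev = np.concatenate(([0.0], y[:-1]))  # Y_{i-1}, with Y_0 = 0

    # F_i of (5.13.6a) divided by sigma_i; for Model 2, H_i / sigma_i = beta * Z_i
    F = (y - rho * y_prev) / sigma
    Z = (x - rho * x_prev) / sigma

    # S(beta) = sum (F_i - beta*Z_i)^2 is minimized by:
    szz = np.sum(Z * Z)
    b = np.sum(F * Z) / szz
    return b, 1.0 / szz


# Illustrative values only
x = np.arange(1.0, 11.0)
y = np.array([2.1, 4.3, 6.2, 8.5, 10.2, 12.6, 14.3, 16.4, 18.2, 20.5])
b, var_b = ar1_model2_estimate(x, y, rho=0.6, sigma=0.1)
print("b =", b, " V(b) =", var_b)
```

With $\rho_i = 0$ the transformation does nothing, and the estimator reduces to the familiar weighted least squares slope through the origin.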
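5.14 ERRORS IN INDEPENDENT AND DEPENDENT VARIABLES

Another violation of the standard assumptions occurs when the independent variables, designated $X_{ij}$ in this chapter, are stochastic as well as $Y_i$. To present a method of solution that can be generalized to complex situations, the method of Lagrange multipliers is introduced in this section. The method is not required for the simple example given later, but that example is used to illustrate it. Before giving the example, the method of Lagrange multipliers is presented.

5.14.1 Method of Lagrange Multipliers

We consider the problem of finding a stationary value (a relative maximum or minimum) of the continuously differentiable function $J(a_1, a_2, \ldots, a_m)$ subject to $n$ equality constraints,

$$\phi_j(a_1, a_2, \ldots, a_m) = 0, \qquad j = 1, 2, \ldots, n \qquad (5.14.1)$$

where $m > n$ and the $\phi_j$ are differentiable functions. Since the $m$ variables $a_1, a_2, \ldots, a_m$ must satisfy $n$ constraints, there are in effect only $m - n$ independent variables. A stationary value of $J(a_1, \ldots, a_m)$ requires that

$$dJ = \frac{\partial J}{\partial a_1}da_1 + \cdots + \frac{\partial J}{\partial a_m}da_m = 0 \qquad (5.14.2)$$

but the differentials $da_i$ are not independent. The constraints (5.14.1) imply the $n$ differential relations

$$d\phi_j = \frac{\partial \phi_j}{\partial a_1}da_1 + \frac{\partial \phi_j}{\partial a_2}da_2 + \cdots + \frac{\partial \phi_j}{\partial a_m}da_m = 0, \qquad j = 1, \ldots, n \qquad (5.14.3)$$

A direct method of solution can be illustrated by a simple case. Suppose $m = 3$, so that (5.14.2) becomes

$$dJ = \frac{\partial J}{\partial a_1}da_1 + \frac{\partial J}{\partial a_2}da_2 + \frac{\partial J}{\partial a_3}da_3 = 0 \qquad (5.14.4a)$$

Let there be only one constraint, so that $n = 1$. Then (5.14.3) gives

$$\frac{\partial \phi_1}{\partial a_1}da_1 + \frac{\partial \phi_1}{\partial a_2}da_2 + \frac{\partial \phi_1}{\partial a_3}da_3 = 0 \qquad (5.14.4b)$$

which could be solved for $da_3$, say. Substituting this expression for $da_3$ in (5.14.4a) would then give

$$dJ = (\ldots)\,da_1 + (\ldots)\,da_2 \qquad (5.14.4c)$$

Here the two expressions in parentheses must each be zero, because $da_1$ and $da_2$ can now be assigned arbitrarily. These two equations from (5.14.4c), plus $\phi_1 = 0$, provide three equations for the three unknowns $a_1$, $a_2$, and $a_3$.

An alternative procedure is called the Lagrange multiplier method. It is introduced here using the same example of $m = 3$ and one constraint. Multiply (5.14.4b) by $\lambda_1$ and add the result to (5.14.4a). Since the right-hand members are zero, there follows, for an arbitrary value of $\lambda_1$,

$$\left(\frac{\partial J}{\partial a_1} + \lambda_1\frac{\partial \phi_1}{\partial a_1}\right)da_1 + \left(\frac{\partial J}{\partial a_2} + \lambda_1\frac{\partial \phi_1}{\partial a_2}\right)da_2 + \left(\frac{\partial J}{\partial a_3} + \lambda_1\frac{\partial \phi_1}{\partial a_3}\right)da_3 = 0 \qquad (5.14.5)$$

Now let $\lambda_1$ be determined so that one of the parentheses in (5.14.5) vanishes. The two differentials multiplying the remaining parentheses can then be assigned arbitrarily, and hence these two parentheses must also vanish.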
Consequently we must have

$$\frac{\partial J}{\partial a_1} + \lambda_1\frac{\partial \phi_1}{\partial a_1} = 0 \qquad (5.14.6a)$$
$$\frac{\partial J}{\partial a_2} + \lambda_1\frac{\partial \phi_1}{\partial a_2} = 0 \qquad (5.14.6b)$$
$$\frac{\partial J}{\partial a_3} + \lambda_1\frac{\partial \phi_1}{\partial a_3} = 0 \qquad (5.14.6c)$$

These three equations, (5.14.6a,b,c), plus the constraint $\phi_1 = 0$, comprise four equations for the four unknowns $a_1$, $a_2$, $a_3$, and $\lambda_1$. The quantity $\lambda_1$ is known as a Lagrange multiplier. Introducing these multipliers frequently simplifies and organizes the algebra in minimization problems with equality constraints.

It is important to note that the conditions given by (5.14.6) are equivalent to requiring that $J + \lambda_1\phi_1$ be stationary without any further constraints being imposed. Applying this observation to the more general problem given above suggests that

$$J + \sum_{j=1}^{n}\lambda_j\phi_j$$

be extremized with respect to $a_1, a_2, \ldots, a_m$. Hence the following $m$ equations must be satisfied:

$$\frac{\partial J}{\partial a_i} + \sum_{j=1}^{n}\lambda_j\frac{\partial \phi_j}{\partial a_i} = 0, \qquad i = 1, 2, \ldots, m \qquad (5.14.7)$$

along with the $n$ constraints given by (5.14.1). Thus (5.14.1) and (5.14.7) constitute a set of $m + n$ equations for the $m + n$ unknowns $a_1, \ldots, a_m, \lambda_1, \ldots, \lambda_n$.
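A small numerical case may make the bookkeeping of (5.14.6) concrete. The function $J = a_1^2 + a_2^2 + a_3^2$ and the constraint $\phi_1 = a_1 + a_2 + a_3 - 1 = 0$ below are hypothetical. They were chosen so that the stationarity conditions are linear and can be solved directly with NumPy.

```python
import numpy as np

# Hypothetical example: make J = a1^2 + a2^2 + a3^2 stationary subject to
# phi_1 = a1 + a2 + a3 - 1 = 0  (m = 3, n = 1).
# (5.14.6a,b,c): dJ/da_i + lam * dphi_1/da_i = 2*a_i + lam = 0, i = 1, 2, 3
# plus the constraint phi_1 = 0: four linear equations in four unknowns.
A = np.array([
    [2.0, 0.0, 0.0, 1.0],
    [0.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 2.0, 1.0],
    [1.0, 1.0, 1.0, 0.0],
])
rhs = np.array([0.0, 0.0, 0.0, 1.0])
a1, a2, a3, lam = np.linalg.solve(A, rhs)
print(a1, a2, a3, lam)  # expected: a_i = 1/3 and lam = -2/3
```

Here the conditions are linear only because $J$ is quadratic and the constraint is linear. In the errors-in-variables problem that follows, the corresponding equations are nonlinear.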
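5.14.2 Problem of Errors in the Independent and Dependent Variables

Estimating the parameters when there are errors in the independent variables, as well as in the dependent variables, is a nonlinear problem even when the model is linear in the parameters. The problem is formulated in this subsection, and the solution of a simple case is considered in the next.

Consider first the dependent variable $Y_i$, which is related to the model by

$$Y_i = \eta_i + \varepsilon_{Y_i} \qquad (5.14.8)$$

so that the error $\varepsilon_{Y_i}$ is additive. Also let $\varepsilon_{Y_i}$ have a zero mean, be independent of $\varepsilon_{Y_j}$ for $i \neq j$, and have a normal probability density with known variance, or

$$E(\varepsilon_{Y_i}) = 0, \quad E(\varepsilon_{Y_i}^2) = \sigma_{Y_i}^2, \quad E(\varepsilon_{Y_i}\varepsilon_{Y_j}) = 0 \ \text{for } i \neq j, \quad \varepsilon_{Y_i} \text{ is normal} \qquad (5.14.9)$$

These assumptions are designated 110111--. With this information the probability density of $\varepsilon_{Y_1}, \ldots, \varepsilon_{Y_n}$ is

$$f(\varepsilon_{Y_1}, \ldots, \varepsilon_{Y_n}) = \frac{1}{(2\pi)^{n/2}\sigma_{Y_1}\cdots\sigma_{Y_n}}\exp\left[-\frac{1}{2}\sum_{i=1}^{n}\varepsilon_{Y_i}^2\sigma_{Y_i}^{-2}\right] \qquad (5.14.10)$$

There are also errors in the independent variables $X_{ij}$, which are described by

$$X_{ij} = \xi_{ij} + \varepsilon_{X_{ij}}, \quad E(\varepsilon_{X_{ij}}) = 0, \quad E(\varepsilon_{X_{ij}}^2) = \sigma_{X_{ij}}^2, \quad E(\varepsilon_{X_{ij}}\varepsilon_{X_{kl}}) = 0 \ \text{except when } i = k \text{ and } j = l \qquad (5.14.11)$$

and $\varepsilon_{X_{ij}}$ has a normal density. The $\sigma_{X_{ij}}^2$ values are assumed to be known. The value $X_{ij}$ is measured, and $\xi_{ij}$ is the true value of $X_{ij}$. The errors $\varepsilon_{Y_i}$ and $\varepsilon_{X_{jk}}$ are considered to be independent for all values of $i$, $j$, and $k$. Analogous to (5.14.10) we can write

$$f(\varepsilon_{X_{11}}, \ldots, \varepsilon_{X_{np}}) = \frac{1}{(2\pi)^{np/2}\sigma_{X_{11}}\cdots\sigma_{X_{np}}}\exp\left[-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{p}\varepsilon_{X_{ij}}^2\sigma_{X_{ij}}^{-2}\right] \qquad (5.14.12)$$

Owing to the independence of the $\varepsilon_Y$ and $\varepsilon_X$ errors, the maximum likelihood method requires that the product of (5.14.10) and (5.14.12) be maximized. The maximization is with respect to the parameters $\beta_1, \ldots, \beta_p$ and the values $\eta_1, \ldots, \eta_n$, $\xi_{11}, \ldots, \xi_{np}$. This is equivalent to minimizing

$$S(\boldsymbol\eta, \boldsymbol\xi) = \sum_{i=1}^{n}(Y_i - \eta_i)^2\sigma_{Y_i}^{-2} + \sum_{i=1}^{n}\sum_{j=1}^{p}(X_{ij} - \xi_{ij})^2\sigma_{X_{ij}}^{-2} \qquad (5.14.13)$$

with respect to $\beta_1, \ldots, \xi_{np}$, a total of $p + n + np$ parameters. This produces the estimates $b_1, \ldots, b_p$, $\hat Y_1, \ldots, \hat Y_n$, and $\hat X_{11}, \ldots, \hat X_{np}$. The $\eta_i$, $\beta_k$, and $\xi_{ij}$ values are not independent, however. They must be related through the model for $\eta_i$, which can be written as the equality constraint

$$g_i \equiv \eta_i - \eta(\xi_{i1}, \ldots, \xi_{ip};\, \beta_1, \ldots, \beta_p) = 0, \qquad i = 1, 2, \ldots, n \qquad (5.14.14)$$

The method of Lagrange multipliers involves minimizing the function

$$L = \frac{1}{2}S + \sum_{j=1}^{n}\lambda_j g_j \qquad (5.14.15)$$

with respect to the parameters $\beta_1, \ldots, \beta_p$, $\eta_1, \ldots, \eta_n$, and $\xi_{11}, \ldots, \xi_{np}$. Necessary conditions for a minimum are

$$\frac{\partial L}{\partial \beta_k} = \frac{1}{2}\frac{\partial S}{\partial \beta_k} + \sum_{j=1}^{n}\lambda_j\frac{\partial g_j}{\partial \beta_k} = 0, \qquad k = 1, 2, \ldots, p \qquad (5.14.16a)$$
$$\frac{\partial L}{\partial \eta_i} = \frac{1}{2}\frac{\partial S}{\partial \eta_i} + \sum_{j=1}^{n}\lambda_j\frac{\partial g_j}{\partial \eta_i} = 0, \qquad i = 1, 2, \ldots, n \qquad (5.14.16b)$$
$$\frac{\partial L}{\partial \xi_{ik}} = \frac{1}{2}\frac{\partial S}{\partial \xi_{ik}} + \sum_{j=1}^{n}\lambda_j\frac{\partial g_j}{\partial \xi_{ik}} = 0, \qquad i = 1, \ldots, n;\ k = 1, \ldots, p \qquad (5.14.16c)$$

The expressions in (5.14.16) are all evaluated at $\beta_1 = b_1, \ldots, \beta_p = b_p$, $\eta_1 = \hat Y_1, \ldots, \eta_n = \hat Y_n$, and $\xi_{11} = \hat X_{11}, \ldots, \xi_{np} = \hat X_{np}$. It is important to note that

$$S = S(\eta_1, \ldots, \eta_n, \xi_{11}, \ldots, \xi_{np}) \qquad (5.14.17a)$$
$$g_j = g_j(\eta_j, \beta_1, \ldots, \beta_p, \xi_{j1}, \ldots, \xi_{jp}) \qquad (5.14.17b)$$

Thus $S$ is not an explicit function of the parameters $\beta_1, \ldots, \beta_p$, and each $g_j$ involves only the $j$th $\eta$ and $\xi$ values. Then (5.14.16), for any model, linear or nonlinear, can be written as

$$\sum_{i=1}^{n}\lambda_i\frac{\partial g_i}{\partial \beta_k} = 0, \qquad k = 1, \ldots, p \qquad (5.14.18a)$$
$$-(Y_i - \hat Y_i)\sigma_{Y_i}^{-2} + \lambda_i\frac{\partial g_i}{\partial \eta_i} = 0, \qquad i = 1, \ldots, n \qquad (5.14.18b)$$
$$-(X_{ik} - \hat X_{ik})\sigma_{X_{ik}}^{-2} + \lambda_i\frac{\partial g_i}{\partial \xi_{ik}} = 0 \qquad (5.14.18c)$$

where (5.14.18c) applies for $i = 1, \ldots, n$ and $k = 1, \ldots, p$. In addition to the equations given by (5.14.18), there are the constraints $g_i = 0$. For the linear model considered in this section, these are equivalent to

$$\hat Y_i = \sum_{k=1}^{p}b_k\hat X_{ik}, \qquad i = 1, 2, \ldots, n \qquad (5.14.19)$$

Then (5.14.18) and (5.14.19) provide $p + 2n + np$ equations for the same number of unknowns, which are $b_1, \ldots, b_p$, $\hat Y_1, \ldots, \hat Y_n$, $\lambda_1, \ldots, \lambda_n$, and $\hat X_{11}, \ldots, \hat X_{np}$.

Consider first (5.14.18) without introducing the assumption of a model linear in the parameters, such as (5.14.19). Because $\partial g_i/\partial \eta_i = 1$ for the constraint (5.14.14), (5.14.18b) yields in general

$$\lambda_i = (Y_i - \hat Y_i)\sigma_{Y_i}^{-2} \qquad (5.14.20)$$

Thus the Lagrange multipliers are weighted residuals. Introducing (5.14.20) into (5.14.18a,c) eliminates $\lambda_i$ and gives

$$\sum_{i=1}^{n}(Y_i - \hat Y_i)\sigma_{Y_i}^{-2}\frac{\partial g_i}{\partial \beta_k} = 0, \qquad k = 1, \ldots, p \qquad (5.14.21a)$$
$$(X_{ik} - \hat X_{ik})\sigma_{X_{ik}}^{-2} - (Y_i - \hat Y_i)\sigma_{Y_i}^{-2}\frac{\partial g_i}{\partial \xi_{ik}} = 0, \qquad i = 1, \ldots, n;\ k = 1, \ldots, p \qquad (5.14.21b)$$

Hence, for the general nonlinear case, (5.14.21) and the constraint equations $g_i = 0$ can be solved for the $p + n + np$ unknowns $b_1, \ldots, b_p$, $\hat Y_1, \ldots, \hat Y_n$, and $\hat X_{11}, \ldots, \hat X_{np}$.

Now let the linear model and its constraint (5.14.19) be used. Then $\partial g_i/\partial \beta_k = -\hat X_{ik}$ and $\partial g_i/\partial \xi_{ik} = -b_k$, so (5.14.21) can be written as

$$\sum_{i=1}^{n}\Big(Y_i - \sum_{j=1}^{p}b_j\hat X_{ij}\Big)\sigma_{Y_i}^{-2}\hat X_{ik} = 0, \qquad k = 1, 2, \ldots, p \qquad (5.14.22a)$$
$$(X_{ik} - \hat X_{ik})\sigma_{X_{ik}}^{-2} + \Big(Y_i - \sum_{j=1}^{p}b_j\hat X_{ij}\Big)\sigma_{Y_i}^{-2}b_k = 0, \qquad i = 1, \ldots, n;\ k = 1, \ldots, p \qquad (5.14.22b)$$

These comprise $p + np$ equations for the unknowns $b_1, \ldots, b_p$ and $\hat X_{11}, \ldots, \hat X_{np}$. Notice that, even though the model is linear in the parameters $\beta_1, \ldots, \beta_p$, the solution of (5.14.22) is nonlinear and thus is not straightforward. One way to start is to note that, for fixed $i$, (5.14.22b) provides a set of linear equations for $\hat X_{i1}, \ldots, \hat X_{ip}$. These can be solved in terms of the $b_1, \ldots, b_p$ values. When the resulting $\hat X_{ij}$ values are substituted into (5.14.22a), a set of $p$ nonlinear equations results for the unknowns $b_1, \ldots, b_p$. The simplest case is $p = 1$, which is considered next.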
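The "one way to start" just described can be turned into a numerical sketch. For simplicity it assumes that every $\sigma_{X_{ik}}$ equals a constant $\sigma_X$ and every $\sigma_{Y_i}$ equals a constant $\sigma_Y$, so that only $\alpha = \sigma_X^2/\sigma_Y^2$ enters. The sketch eliminates $\hat X_{ij}$ with (5.14.22b) and then solves the $p$ equations (5.14.22a) for $b$. It does this by a Newton iteration with a finite-difference Jacobian, started from the OLS estimate. The function names are hypothetical, and convergence of Newton's method is not guaranteed in general.

```python
import numpy as np


def xhat_given_b(X, Y, b, alpha):
    """Solve (5.14.22b) for every row i with constant sigma_X, sigma_Y.

    Multiplying (5.14.22b) by sigma_X^2 gives
    (I + alpha*b*b^T) xhat_i = x_i + alpha*Y_i*b.
    """
    M = np.eye(len(b)) + alpha * np.outer(b, b)
    rhs = X + alpha * np.outer(Y, b)  # row i: x_i + alpha*Y_i*b
    return np.linalg.solve(M, rhs.T).T


def residual_22a(b, X, Y, alpha):
    """Left side of (5.14.22a), times sigma_Y^2, after eliminating xhat."""
    Xh = xhat_given_b(X, Y, b, alpha)
    return Xh.T @ (Y - Xh @ b)


def fit_errors_in_variables(X, Y, alpha, tol=1e-12, max_iter=50):
    """Newton iteration on (5.14.22a) with a forward-difference Jacobian."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)  # OLS starting values
    p = len(b)
    for _ in range(max_iter):
        r = residual_22a(b, X, Y, alpha)
        J = np.empty((p, p))
        for k in range(p):
            h = 1e-7 * max(1.0, abs(b[k]))
            bk = b.copy()
            bk[k] += h
            J[:, k] = (residual_22a(bk, X, Y, alpha) - r) / h
        step = np.linalg.solve(J, -r)
        b = b + step
        if np.max(np.abs(step)) <= tol * max(1.0, np.max(np.abs(b))):
            break
    return b, xhat_given_b(X, Y, b, alpha)


# Illustrative use with p = 1 (X must be an n-by-p array)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b, Xhat = fit_errors_in_variables(X, Y, alpha=0.5)
print("b =", b)
```

For $p = 1$ this iteration can be checked against the closed-form result of the next subsection.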
5.14.3 Model 2 ($\eta_i = \beta\xi_i$) Example with Errors in Both $\eta_i$ and $\xi_i$

As an example of the above procedure, consider a case involving Model 2, $\eta_i = \beta\xi_i$, where there are errors in both the dependent variable $\eta_i$ and the independent variable $\xi_i$:

$$Y_i = \eta_i + \varepsilon_{Y_i} \qquad (5.14.23a)$$
$$X_i = \xi_i + \varepsilon_{X_i} \qquad (5.14.23b)$$

Let the assumptions given above for $\varepsilon_Y$ and $\varepsilon_X$ apply, except let $\sigma_{Y_i}^2$ and $\sigma_{X_i}^2$ be the constants $\sigma_Y^2$ and $\sigma_X^2$, respectively. We can obtain the solution for $b$, an estimate of $\beta$, through the use of (5.14.22a,b). Using (5.14.22b) first gives

$$(X_i - \hat X_i)\sigma_X^{-2} + (Y_i - b\hat X_i)\,b\,\sigma_Y^{-2} = 0 \qquad (5.14.24)$$

which can be solved for $\hat X_i$ to obtain

$$\hat X_i = \frac{X_i + Y_ib\alpha}{1 + b^2\alpha}, \qquad i = 1, \ldots, n \qquad (5.14.25)$$

where $\alpha \equiv \sigma_X^2/\sigma_Y^2$. Note that $\hat X_i$ is a nonlinear function of $b$. Introducing (5.14.25) into (5.14.22a) gives the nonlinear equation

$$\sum_{i=1}^{n}\left[Y_i - b\,\frac{X_i + Y_ib\alpha}{1 + b^2\alpha}\right]\frac{X_i + Y_ib\alpha}{1 + b^2\alpha} = 0 \qquad (5.14.26)$$

For convenience let

$$S_{YY} = \sum_{i=1}^{n}Y_i^2, \qquad S_{XX} = \sum_{i=1}^{n}X_i^2, \qquad S_{XY} = \sum_{i=1}^{n}X_iY_i \qquad (5.14.27a,b,c)$$

Multiplying (5.14.26) by $(1 + b^2\alpha)^2$ reduces each bracketed term to $Y_i - bX_i$, so that (5.14.26) becomes $\sum_i (Y_i - bX_i)(X_i + \alpha bY_i) = 0$, or

$$S_{XY} + b\alpha S_{YY} - bS_{XX} - b^2\alpha S_{XY} = 0$$

which can be rearranged to

$$\alpha S_{XY}b^2 + (S_{XX} - \alpha S_{YY})\,b - S_{XY} = 0 \qquad (5.14.28)$$

This in turn can be solved for $b$:

$$b = \frac{\alpha S_{YY} - S_{XX} \pm \left[(\alpha S_{YY} - S_{XX})^2 + 4\alpha S_{XY}^2\right]^{1/2}}{2\alpha S_{XY}} \qquad (5.14.29)$$

The positive sign is chosen in (5.14.29) because the estimate then converges to the OLS value $S_{XY}/S_{XX}$ as $\alpha = \sigma_X^2/\sigma_Y^2 \to 0$. If $\alpha \to \infty$, $b$ approaches $S_{YY}/S_{XY}$. Equation (5.14.29) also gives $b = S_{XY}/S_{XX}$ for all values of $\alpha$ if $S_{XY}/S_{XX}$ happens to equal $S_{YY}/S_{XY}$, or in other terms, if $S_{XX}S_{YY} - S_{XY}^2 = 0$. In ordinary least squares estimation involving Model 2, we do not permit $S_{XX}$ to be equal to zero. If $S_{XY}$ is equal to zero, (5.14.28) gives $b = 0$. After $b$ is calculated using (5.14.29), the estimated values $\hat X_i$ can be obtained from (5.14.25).
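For $p = 1$ the result is available in closed form. The following hedged sketch evaluates (5.14.29) with the positive sign and then (5.14.25). The function name and the data are hypothetical. The only liberty taken concerns the case $d \le 0$, where $d = \alpha S_{YY} - S_{XX}$ and $s$ is the square root in (5.14.29). In that case the sketch uses the algebraically equivalent form $b = 2S_{XY}/(s - d)$, because the textbook form subtracts nearly equal numbers as $\alpha \to 0$.

```python
import numpy as np


def model2_both_errors(x, y, alpha):
    """Slope b of Model 2 (eta_i = beta*xi_i) with errors in X and Y.

    Returns b from (5.14.29) with the positive sign, Xhat from (5.14.25)
    and Yhat = b*Xhat.  alpha = sigma_X^2 / sigma_Y^2 must be positive.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxx, syy, sxy = np.sum(x * x), np.sum(y * y), np.sum(x * y)  # (5.14.27)
    d = alpha * syy - sxx
    s = np.sqrt(d * d + 4.0 * alpha * sxy * sxy)
    if sxy == 0.0:
        b = 0.0  # (5.14.28) reduces to b*(S_XX - alpha*S_YY) = 0
    elif d <= 0.0:
        # (d + s)/(2*alpha*S_XY) equals 2*S_XY/(s - d); no cancellation here
        b = 2.0 * sxy / (s - d)
    else:
        b = (d + s) / (2.0 * alpha * sxy)
    xhat = (x + alpha * b * y) / (1.0 + alpha * b * b)  # (5.14.25)
    return b, xhat, b * xhat


# Illustrative data (hypothetical), with sigma_X^2 / sigma_Y^2 = 0.5
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b, xhat, yhat = model2_both_errors(x, y, alpha=0.5)
print("b =", b, " S_XY/S_XX =", np.sum(x * y) / np.sum(x * x),
      " S_YY/S_XY =", np.sum(y * y) / np.sum(x * y))
```

Whenever $S_{XY} > 0$, the estimate lies between the two limiting values $S_{XY}/S_{XX}$ ($\alpha \to 0$) and $S_{YY}/S_{XY}$ ($\alpha \to \infty$), as the print statement allows one to see.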
Observe that a different $\hat X_i$ is calculated from (5.14.25) for each value of $i$ if the $Y_i$ values are different, even if the $X_i$ values are actually the same. Physically, a given $X_i$ value may not be known precisely, but it may be known to be constant for several measurements. In that case, however, the assumption of independent errors in each $X_i$ is violated. Hence another analysis is required for this special case of repeated $Y_i$ values at precisely the same $X_i$ value.

Example 5.14.1

Consider a case involving Model 2, with errors in either $Y_i$ or $X_i$ or both, that satisfies the assumptions given above in Sections 5.14.2 and 5.14.3. The data are given below.

i     X_i     Y_i
1     −1      1 − δ
2      1      1

Let $\delta$ be a positive value, and also investigate the case $\delta \to 0$.
(a) Find $b$, $\hat X_i$, and $\hat Y_i$ for $\sigma_X = \sigma_Y$.
(b) Find $b$ and $\hat Y_i$ for $\sigma_X = 0$.
(c) Find $b$ and $\hat X_i$ for $\sigma_Y = 0$.

Solution

To find the $b$ values, (5.14.29) can be used. First find $S_{XX}$, $S_{YY}$, and $S_{XY}$ from (5.14.27): $S_{XX} = 2$, $S_{YY} = 2 - 2\delta + \delta^2$, and $S_{XY} = \delta$.

(a) In this case $\alpha = 1$, and (5.14.29) gives

$$b = \frac{-2 + \delta + (8 - 4\delta + \delta^2)^{1/2}}{2}$$

If $\delta \to 0$, then $b \to -1 + \sqrt{2} \approx 0.4142$. The $\hat X_i$ values are found from (5.14.25),

$$\hat X_i = \frac{X_i + Y_ib}{1 + b^2}$$

and $\hat Y_i$ is equal to $b\hat X_i$. For $\delta \to 0$ we obtain $\hat X_1 = -0.5$ and $\hat Y_1 = -0.2071067$, and $\hat X_2 = 1.2071067$ and $\hat Y_2 = 0.5$. For $\sigma_X = \sigma_Y = 1$, the sum $S$ given by (5.14.13) is precisely 2.

(b) This is the usual least squares case, and $b = S_{XY}/S_{XX}$ is equal to $\delta/2$. The $\hat Y_i$ values are $\hat Y_1 = -\delta/2$ and $\hat Y_2 = \delta/2$. For $\delta = 0$ these values are zero; hence the predicted line is $\hat Y = 0$. Again the minimum $S$ for $\delta = 0$ is 2.

(c) For this case $b = S_{YY}/S_{XY} = (2 - 2\delta + \delta^2)/\delta$. The $\hat X_i$ values are found from $\hat X_i = Y_i/b$, and $\hat Y_i = Y_i$. For $\delta \to 0$, $b \to \infty$ and $\hat X_1 = \hat X_2 = 0$. Unlike part (b), the predicted line is now the vertical axis, $\hat X_i = 0$ for all $i$. The minimum $S$ is again 2.

It is instructive to examine the predicted lines for each of the cases above; see Fig. 5.8 for $\delta \to 0^+$. Notice that the usual least squares case ($\alpha = 0$) has the predicted line $\hat Y = 0$. The observation $Y_1 = 1$, $X_1 = -1$ is replaced with $\hat Y_1 = 0$, $\hat X_1 = -1$, and the observation $Y_2 = 1$, $X_2 = 1$ is replaced with $\hat Y_2 = 0$, $\hat X_2 = 1$. The case $\alpha \to \infty$ has the vertical predicted line $\hat X = 0$, and the two observations are replaced by the single point $\hat Y_1 = \hat Y_2 = 1$ with $\hat X = 0$. For the $\alpha = 1$ case, the predicted line is inclined as shown. It is thus clear that the three $\alpha$ values can yield quite different predicted values. In other words, it can make a large difference to the predicted line whether the errors are in $Y$, in $X$, or in both. The case shown in Fig. 5.8 is an extreme one, however; often the predicted lines are quite close.

Figure 5.8 Predicted lines for errors in dependent and independent variables for data points (−1, 1) and (1, 1) for Example 5.14.1. (Lines: least squares on $Y$, $\alpha = 0$; least squares on $X$, $\alpha \to \infty$; observations $(X_i, Y_i)$.)

A case in which the predicted lines are much closer together is $\delta = 1$, which gives $Y_1 = 0$ with $X_1 = -1$, as shown in Fig. 5.9. If $\delta$ were equal to 2, so that $Y_1 = -1$, the predicted lines would be the same for every value of $\alpha$.

Figure 5.9 Predicted lines for errors in dependent and independent variables for data points (−1, 0) and (1, 1) for Example 5.14.1.
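The calculations of Example 5.14.1 can be checked with a few lines of standard-library Python. A small positive $\delta$ stands in for the limit $\delta \to 0$. The helper function is a hypothetical restatement of (5.14.29) and (5.14.25), not code from the text.

```python
import math


def model2_both_errors(x, y, alpha):
    """Slope b from (5.14.29), positive root, plus Xhat (5.14.25) and Yhat."""
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    d = alpha * syy - sxx
    s = math.sqrt(d * d + 4.0 * alpha * sxy * sxy)
    # Algebraically equivalent forms of the positive root; pick the stable one
    b = 2.0 * sxy / (s - d) if d <= 0.0 else (d + s) / (2.0 * alpha * sxy)
    xhat = [(xi + alpha * b * yi) / (1.0 + alpha * b * b) for xi, yi in zip(x, y)]
    return b, xhat, [b * xh for xh in xhat]


delta = 1e-9  # small positive value standing in for delta -> 0
x = [-1.0, 1.0]
y = [1.0 - delta, 1.0]

# Part (a): sigma_X = sigma_Y = 1, hence alpha = 1
b, xhat, yhat = model2_both_errors(x, y, alpha=1.0)
# S of (5.14.13) with unit standard deviations
S = (sum((xi - xh) ** 2 for xi, xh in zip(x, xhat))
     + sum((yi - yh) ** 2 for yi, yh in zip(y, yhat)))
print(f"(a) b = {b:.7f}  Xhat = {xhat}  Yhat = {yhat}  S = {S:.6f}")

# Part (b): sigma_X = 0 (OLS on Y); part (c): sigma_Y = 0 (least squares on X)
sxx = sum(v * v for v in x)
syy = sum(v * v for v in y)
sxy = sum(u * v for u, v in zip(x, y))
print(f"(b) b = {sxy / sxx:.3e}   (c) b = {syy / sxy:.3e}")
```

As $\delta$ shrinks, the printed slopes for parts (b) and (c) head toward 0 and infinity respectively, matching the horizontal and vertical predicted lines of Fig. 5.8.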
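Example 5.14.2

Near a wall over which a turbulent fluid is flowing, the velocity is a linear function of position. Let the velocity (in cm/sec) be designated $U$ and the distance from the wall (in cm) be designated $X$. The data below were taken from Fig. 6.20 of Kreith [5].

i     X_i (cm)    σ_X (cm)    U_i (cm/sec)    σ_U (cm/sec)
1     0.0112      0.0003       80             3
2     0.0162      0.0003      125             3
3     0.0215      0.0003      165             3
4     0.0310      0.0003      235             3

The models for the true velocity $u$ and the true distance $\xi$ are

$$u = \beta\xi, \qquad U = u + \varepsilon_U, \qquad X = \xi + \varepsilon_X$$

where $U$ and $X$ are measured values and $\beta$ is a parameter proportional to the shear stress at the wall. Estimates for $\beta$ are to be obtained using (a) the above information; (b) $\sigma_X = 0$ with $\sigma_U$ unknown; and (c) $\sigma_X$ unknown with $\sigma_U = 0$. Also calculate the $\hat U$ and $\hat X$ values for each case. The assumptions indicated in Section 5.14.2 are valid.

Solution

In each case $u$ and $U$ are analogous to $\eta$ and $Y$, and $\xi$ and $X$ play the same roles as in the notation of this section. With this in mind, evaluate $S_{YY}$, $S_{XX}$, $S_{XY}$, and $\alpha$ for use in (5.14.29):

$$S_{YY} = \sum_{i=1}^{4}U_i^2 = (80)^2 + (125)^2 + (165)^2 + (235)^2 = 104475$$
$$S_{XX} = \sum_{i=1}^{4}X_i^2 = (0.0112)^2 + (0.0162)^2 + (0.0215)^2 + (0.031)^2 = 0.00181113$$
$$S_{XY} = \sum_{i=1}^{4}X_iU_i = 0.0112(80) + 0.0162(125) + 0.0215(165) + 0.031(235) = 13.7535$$
$$\alpha = \frac{\sigma_X^2}{\sigma_U^2} = \frac{(0.0003)^2}{3^2} = 10^{-8}$$

(a) For the above values, (5.14.29) gives the estimate $b = 7594.7438$/sec. The values of $\hat X_i$ are obtained from (5.14.25) as

$$\hat X_i = \frac{X_i + 7.5947438\times10^{-5}\,U_i}{1.5768013}$$

Once the $\hat X_i$ values have been calculated, the values of $\hat U_i$ are found using $\hat U_i = b\hat X_i = 7594.7438\,\hat X_i$. The resulting values are given below.

i     X̂_i (cm)    Û_i (cm/sec)
1     0.01096     83.21
2     0.01629     123.75
3     0.02158     163.91
4     0.03098     235.28

(b) This is the usual least squares analysis, for which $b = S_{XY}/S_{XX} = 7593.8778$/sec. The predicted or regression line is now $\hat U_i = bX_i$. The values of $X_i$ and $\hat U_i$ are tabulated next.

i     X_i (cm)    Û_i (cm/sec)
1     0.0112      85.05
2     0.0162      123.02
3     0.0215      163.27
4     0.0310      235.41

(c) In this case the roles of $X$ and $U$ are interchanged in the least squares analysis. Here $b = S_{YY}/S_{XY} = 7596.2482$/sec, $\hat X_i$ is obtained from $\hat X_i = U_i/b$, and $\hat U_i$ is the measured value. The results of the calculations are as follows:

i     X̂_i (cm)    U_i (cm/sec)
1     0.01053     80
2     0.01646     125
3     0.02172     165
4     0.03094     235

A comparison of the $b$ values in this example shows that they differ, but only slightly; the largest difference in the $b$ values is 0.03%. This situation is more common than the one shown in Figs. 5.8 and 5.9, where the predicted lines are quite different. Because the curves are so similar in this example, only the lower two points are shown in Fig. 5.10. There are negligible differences between the curves that can be drawn through the three sets of predicted points.

Figure 5.10 Predicted line for errors in dependent and independent variables for Example 5.14.2. (Axes: $U_i$ (cm/sec) versus $X_i$ and $\hat X_i$ (cm); symbols: data, least squares on $U_i$, least squares on $X_i$, errors in both $U_i$ and $X_i$.)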
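The numbers in Example 5.14.2 can be reproduced with the sketch below, which assumes NumPy is available. It evaluates (5.14.27), the positive root of (5.14.28), and (5.14.25) for case (a), and the two ordinary least squares limits for cases (b) and (c). The variable names are illustrative only. Small differences in the last digit of $b$ relative to the text are to be expected from rounding.

```python
import numpy as np

# Data of Example 5.14.2 (distance X in cm, velocity U in cm/sec)
X = np.array([0.0112, 0.0162, 0.0215, 0.0310])
U = np.array([80.0, 125.0, 165.0, 235.0])
sigma_x, sigma_u = 0.0003, 3.0

sxx, syy, sxy = np.sum(X * X), np.sum(U * U), np.sum(X * U)  # (5.14.27)
alpha = sigma_x**2 / sigma_u**2  # 1e-8

# (a) errors in both variables: positive root of (5.14.28), written in the
#     equivalent form 2*S_XY/(s - d), which is accurate because d < 0 here
d = alpha * syy - sxx
s = np.sqrt(d * d + 4.0 * alpha * sxy**2)
b_a = 2.0 * sxy / (s - d)
xhat_a = (X + alpha * b_a * U) / (1.0 + alpha * b_a**2)  # (5.14.25)
uhat_a = b_a * xhat_a

# (b) sigma_X = 0: ordinary least squares on U
b_b = sxy / sxx
uhat_b = b_b * X

# (c) sigma_U = 0: least squares on X
b_c = syy / sxy
xhat_c = U / b_c

for label, b in (("(a)", b_a), ("(b)", b_b), ("(c)", b_c)):
    print(label, f"b = {b:.4f} 1/sec")
print("(a) Xhat:", np.round(xhat_a, 5), " Uhat:", np.round(uhat_a, 2))
spread = (max(b_a, b_b, b_c) - min(b_a, b_b, b_c)) / b_b
print(f"largest relative difference in b: {spread:.4%}")
```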
REFERENCES

1. Draper, N. R. and Smith, H., Applied Regression Analysis, John Wiley and Sons, Inc., New York, 1966.
2. Burington, R. S. and May, D. C., Handbook of Probability and Statistics with Tables, 2nd ed., McGraw-Hill Book Company, New York, 1970.
3. Box, G. E. P. and Tiao, G. C., Bayesian Inference in Statistical Analysis, Addison-Wesley Publishing Co., Reading, Mass., 1973.
4. Brownlee, K. A., Statistical Theory and Methodology in Science and Engineering, 2nd ed., John Wiley and Sons, Inc., New York, 1965.
5. Kreith, F., Principles of Heat Transfer, 3rd ed., Intext Educational Publishers, New York, 1973.

PROBLEMS

5.1 Prove, using the standard assumptions, that … and indicate which assumptions are used.

5.2 Show that … and use it to show that (5.2.37) follows from (5.2.36).

5.3 What is the expected value of $e_i$ for OLS estimation when the following assumptions apply?

$$Y_i = E(Y_i \mid \beta) + \varepsilon_i = \eta_i + \varepsilon_i = \beta X_i + \varepsilon_i, \qquad E(\varepsilon_i) = \theta \neq 0, \qquad V(X_i) = 0, \qquad V(\beta) = 0$$

5.4 Prove (5.2.43) for $Y_i = \beta_0 + \beta_1X_i + \beta_2X_i^2 + \beta_3X_i^3 + \varepsilon_i$ when using OLS.

5.5 For Model 5, what weighting functions for maximum likelihood estimation would cause the sum of the residuals to be equal to zero? Assume that the assumptions designated 11011111 apply.

5.6 Show that the minimum value of $S$ for $Y_i = \beta_0 + \beta_1X_i + \varepsilon_i$ is …

5.7 The following data are given:

i      1   2   3   4   5
X_i    1   2   3   4   5
Y_i    2   1   3   3   6

Assume that the standard assumptions apply.
(a) Find estimates of the parameters in $Y_i = \beta_0 + \beta_1X_i + \varepsilon_i$. Answer. 0, 1.
(b) Find estimates of the parameters in $Y_i = \beta_0' + \beta_1(X_i - \bar X) + \varepsilon_i$. Answer. 3, 1.
(c) Give the residuals $e_i$. (Do they add up to zero?)
(d) Estimate the variance of $\varepsilon_i$. Answer. 1.333.
(e) Give the estimated standard error of $b_0$. Answer. 1.211.
(f) Give the estimated standard error of $b_1$. Answer. 0.365.
(g) Give the estimated standard error of $b_0'$. Answer. 0.516.
(h) Give the estimated covariance of $b_0$ and $b_1$. Answer. −0.4.
(i) Give the estimated covariance of $b_0'$ and $b_1$. Answer. 0.

5.8 Answer the same questions as in Problem 5.7 for the following data, assuming that the standard assumptions apply: …

5.9 The following values have been reported for a certain set of experiments:

i      1      2      3      4      5      6      7
X_i    40     50     60     70     80     90     100
Y_i    0.325  0.332  0.340  0.347  0.353  0.359  0.364

Assume that the standard assumptions apply.
(a) Estimate $\beta_0$ and $\beta_1$ for the model $Y_i = \beta_0 + \beta_1X_i + \varepsilon_i$. Answer. 0.2997, 0.000657.
(b) Estimate the variances of $b_0$ and $b_1$. Answer. $2.38\times10^{-6}$, $4.49\times10^{-10}$.
(c) Calculate $e_i$ and plot.
(d) Are the residuals correlated?
(e) Based on the conclusions of (d), are the estimates given in (b) valid?
(f) How could the model be improved?

5.10 A study was made on the effect of temperature on the yield of a chemical process. The following data were collected, with $X$ linearly related to temperature and $Y$ to the yield: …
(a) For the model $Y_i = \beta_0 + \beta_1X_i + \varepsilon_i$, estimate $\beta_0$ and $\beta_1$. What is the prediction equation?
(b) Construct an analysis of variance table.
(c) Test the null hypothesis that $\beta_1 = 0$ with a risk of 0.05.
(d) What are the 95% confidence limits for $\beta_1$?
(e) What are the confidence limits about $\eta$ at $X = 3$?
(f) Are there any indications that another model should be tried?

5.11 Consider the model $Y_i = \beta + \varepsilon_i$, where the standard assumptions apply for $\varepsilon_i$. The following data are given: …
(a) Derive an unbiased, minimum variance estimator for $\beta$.
(b) Give an unbiased estimate of the variance of $\varepsilon_i$ ($\sigma^2$ is unknown).

5.12 Repeat Problem 5.11 for the model $Y_i = \beta X_i^2 + \varepsilon_i$.

5.13 Repeat Problem 5.11 for the model $Y_i = \beta\sin X_i + \varepsilon_i$.

5.14 Use the $\varepsilon$ column of Table 5.1 as data (that is, $Y_1 = -0.742$, $Y_2 = -0.034$, etc.) and use the model of Problem 5.11.
(a) Estimate $\beta$.
(b) Estimate $\sigma$.

5.15 Repeat Example 5.2.4 with the $\varepsilon$ values replaced by nine consecutive values from a column of Table XXIII of reference 2. The column is to be the one corresponding to your birth date, and the first value used in the column is to correspond to your birthday month. For example, if your birthday is March 14, pick the fourteenth column and start with the third entry, since March is the third month.

5.16 The temperature of a fluid flowing over a plate is nearly linear near the plate. Let $Y$ be proportional to the temperature and $X$ be the distance from the wall. The following results are obtained:

$$\bar X = 0.05, \quad \sum(X_i - \bar X)(Y_i - \bar Y) = \ldots, \quad \sum(X_i - \bar X)^2 = 0.016, \quad \sum(Y_i - \bar Y)^2 = 8320, \quad \bar Y = 300$$

Assume that the model $Y_i = \beta_0 + \beta_1X_i + \varepsilon_i$ and the standard assumptions apply.
(a) Estimate $\beta_0$ and $\beta_1$.
(b) Prepare the analysis of variance table.

5.17 Show that $\sum_{i=1}^{n}(X_i - \bar X)^2$ is maximized for $n$ even, with $-R \le X_i \le R$, by choosing one-half of the $X_i$ to be $-R$ and the other half to be $R$.

5.18 Derive an expression for $\operatorname{cov}(\hat Y_i, \hat Y_j)$, thus proving (5.2.29).

5.19 Derive the expression for $V(b_1 - \beta_1)$ given in (5.4.12).

5.20 Modify the analysis given in Section 5.5 to obtain estimates for $\beta_1$ and $\beta_2$ in Model 5 when maximum likelihood estimation is used for … and whatever other standard assumptions are needed.

5.21 Consider MAP estimation for a random parameter for Model 5. Let the standard assumptions implied by 11011110 be valid. All the measurements are taken from the same batch. The random parameters $\beta_1$ and $\beta_2$ have the joint density … . The quantities $V_1$, $V_2$, and $\Delta_\mu$ must be greater than zero.
(a) Derive … for $i = 1$ and $j = 2$, or $i = 2$ and $j = 1$, where

$$\Delta = [11][22] - [12]^2, \qquad [kl] = \sum_i Z_{ik}Z_{il}, \qquad Z_{ik} = X_{ik}\sigma_i^{-1}, \qquad D_k = \sum_i (F_i - \mu_1Z_{i1} - \mu_2Z_{i2})Z_{ik}$$

(b) It can be shown that $b_1 - \beta_1$ can be put in the form

$$b_1 - \beta_1 = \frac{\beta_1A_1 + \beta_2A_2 + \sum_i\varepsilon_iB_i}{\Delta + C}$$

where $B_i = (Z_{i1}[22] - Z_{i2}[12])\sigma_i^{-1}$ and $C$ is not a random variable. Derive the given expression for $V(b_1 - \beta_1)$.
(c) The expressions given in (a) and (b) can also be applied to the case of subjective prior information. Reinterpret the meaning of $\mu_1$, $\mu_2$, $V_1$, $V_2$, $V_{12}$, $b_1$, $b_2$, $V(b_1 - \beta_1)$, $V(b_2 - \beta_2)$, and $\operatorname{cov}(b_1 - \beta_1, b_2 - \beta_2)$ for this case.

5.22 Before measuring the thermal conductivity of a particular steel alloy, a research engineer has developed, from experience, knowledge of $k$ values for steel alloys in general. The thermal conductivity over a limited range of temperature can be described by the regression model $\eta_i = \beta_1 + \beta_2X_i$, where $X_i$ is temperature in °C. This prior information regarding $\beta_1$ and $\beta_2$ can be described by the $f(\beta_1, \beta_2)$ given in Problem 5.21 with $\mu_1 = 38$, $\mu_2 = -0.01$, $V_1 = 2$, $V_2 = 10^{\ldots}$, and $V_{12} = -0.001$. Assume that the standard assumptions designated 11011113 apply. Using the results of Problem 5.21, find $b_1$, $b_2$, $V(b_1 - \beta_1)$, $V(b_2 - \beta_2)$, and $\operatorname{cov}(b_1 - \beta_1, b_2 - \beta_2)$ for the following data:

i     X_i (°C)    Y_i (W/m-°C)    σ_i
1     100         36.3            0.2
2     200         36.3            0.3
3     300         34.6            0.5
4     400         32.9            0.7
5     600         31.2            1.0

5.23 Utilizing (5.13.8), derive for Model 2 ($\eta_i = \beta X_i$) the following estimator for $\beta$ for first-order autoregressive errors:

$$b = \frac{\sum_{i=1}^{n}F_iZ_i}{\sum_{i=1}^{n}Z_i^2}, \qquad F_i = (Y_i - \rho_iY_{i-1})\sigma_i^{-1}, \qquad Z_i = (X_i - \rho_iX_{i-1})\sigma_i^{-1}, \qquad i = 1, 2, \ldots, n$$

where $X_0 = 0$ and $Y_0 = 0$.

5.24 Simplify the results of Problem 5.23 for the case of Model 1. Show that

$$b_0 = A^{-1}\Big[Y_1\sigma_1^{-2} + \sum_{i=2}^{n}(1 - \rho_i)(Y_i - \rho_iY_{i-1})\sigma_i^{-2}\Big], \qquad V(b_0) = A^{-1}, \qquad A = \sigma_1^{-2} + \sum_{i=2}^{n}(1 - \rho_i)^2\sigma_i^{-2}$$

5.25 For the case of first-order autoregressive errors, show that the variance of $\varepsilon_i$ given by (5.13.3) is the constant value … for $i \ge 1$ when … and $\rho_i = \rho$ for $i = 2, \ldots, n$.

5.26 (a) Using the results of Problem 5.25, show that $A^{-1}$ of Problem 5.24 can be written as …
(b) Suppose that as $n$ becomes larger the measurements become more correlated, as indicated by the expression $\rho = \exp(-a/n)$, where $a$ is some positive constant characteristic of the errors. Show that … for fixed $a$. What is the physical significance of this result?
(c) Modify the result of part (a) for fixed $\sigma^2$ and $\rho$ and large $n$. What is the physical significance of this result?

5.27 The following are actual data obtained for the thermal conductivity $k$ of Pyrex. The temperature $T$ (in K) is related to the voltage $V$ (in mV) by $T = 301.6 + 18.24V$.

Test   V_i (mV)   k_i (W/m-K)      Test   V_i (mV)   k_i (W/m-K)
1      8.81       1.178            11     6.11       1.129
2      8.35       1.133            12     5.96       1.133
3      7.97       1.148            13     5.75       1.101
4      7.66       1.159            14     5.51       1.101
5      7.38       1.148            15     5.34       1.091
6      7.10       1.136            16     5.10       1.087
7      6.86       1.144            17     4.77       1.084
8      6.64       1.136            18     4.52       1.087
9      6.44       1.133            19     4.19       1.080
10     6.27       1.136            20     3.87       1.058

For the model $k_i = \beta_0 + \beta_1T_i$, find the estimates $b_0$ and $b_1$. Also find est. s.e.$(b_0)$, est. s.e.$(b_1)$, $s$, and $e_i$. Let the standard assumptions be valid.

5.28 Modify the program for Problem 5.27 for variable $\sigma_i$, letting $\sigma_i$ be a table of input values. In particular, let $\sigma_i = 0.01V_i$ and obtain new $b_0$ and $b_1$ values for the data of Problem 5.27.

5.29 The Moody chart provides the following data for the friction factor $f_{Dw}$ as a function of the Reynolds number Re for a roughness ratio $\epsilon/D = 0.0001$. Fit these data to an equation of the form $f_{Dw} = c + d(\mathrm{Re})^m$ using the linear OLS method, with $c$ set equal to 0.0118. (Notice that $f_{Dw}$ approaches a constant for large values of Re.) Let $\log(f_{Dw} - c)$ be the dependent variable and $\log \mathrm{Re}$ be the independent variable. Also calculate the residuals in terms of $f_{Dw} - \hat f_{Dw}$ and the relative residuals $(f_{Dw} - \hat f_{Dw})/\hat f_{Dw}$.

Re          f_Dw        Re          f_Dw
5 × 10³     0.0370      5 × 10⁶     0.0123
1 × 10⁴     0.0310      1 × 10⁷     0.0121
5 × 10⁴     0.0214      5 × 10⁷     0.0120
1 × 10⁵     0.0180      1 × 10⁸     0.0120
5 × 10⁵     0.0145
1 × 10⁶     0.0135

5.30 The United States draft lottery issued in March 1975 gave the call order for the standby draft for men born in 1956. Results for the birthday months of April and September are given below (day-call number).

April: 1-170 2-228 3-008 4-340 5-005 6-092 7-303 8-180 9-025 10-147 11-031 12-133 13-205 14-047 15-093 16-131 17-264 18-134 19-036 20-359 21-183 22-101 23-280 24-080 25-110 26-053 27-277 28-050 29-105 30-343

September: 1-175 2-263 3-087 4-199 5-236 6-221 7-322 8-341 9-349 10-347 11-173 12-161 13-325 14-343 15-135 16-117 17-307 18-019 19-041 20-230 21-086 22-128 23-156 24-227 25-209 26-231 27-022 28-102 29-089 30-064

Use the model $\eta_i = \beta_0$.
(a) Which standard assumptions are valid?
(b) Estimate $\beta_0$ using OLS and the April data.
(c) Estimate $\sigma^2$ using the April data.

5.31 Repeat Problem 5.30b and c using the September data.

5.32 Using the April data in Problem 5.30, estimate $\beta_0$ and $\beta_1$ in the model $\eta_i = \beta_0 + \beta_1X_i$ using OLS. Also estimate their standard errors.

CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION

6.1 INTRODUCTION TO MATRIX NOTATION AND OPERATIONS

The extension of parameter estimation to more than two parameters is effectively accomplished through the use of matrices. The notation becomes more compact, which facilitates manipulations, encourages further insights, and permits greater generality. This chapter develops matrix methods for linear parameter estimation, and Chapter 7 considers the nonlinear case. Linear estimation requires that the model be linear in the parameters. For linear maximum likelihood estimation, it is also necessary that the independent variables be errorless and that the covariances of the measurement errors be known to within the same multiplicative constant.

Before discussing various estimation procedures, this section presents properties of matrices and of matrix calculus that are used in both linear and nonlinear parameter estimation.

6.1.1 Elementary Matrix Operations

A matrix Y consisting of a single column is called a column vector. We use