CHAPTER 5

INTRODUCTION TO LINEAR ESTIMATION

5.1 MOTIVATION, MODELS, AND ASSUMPTIONS

5.1.1 Motivation

One of the basic principles in engineering is to start analysis with simple cases. For that reason, estimation of parameters in several simple linear algebraic models is studied in this chapter. Many of the estimation ideas can be introduced in connection with these models without the added complexities introduced by nonlinear algebraic models or by models described by differential equations.

In addition to the pedagogic value of simple algebraic cases, there are numerous physical situations for which the regression function is linear in the parameters. Moreover, when the regression function is unknown and cannot be derived from first principles, simple models are usually proposed.

Simple linear models have been widely studied by statisticians, economists, and others. Various terms designating certain parts of the study of estimation of parameters in statistical models have also been used to refer to much larger segments of that study. When the models are linear in the parameters, "regression analysis" and "analysis of variance" are sometimes used interchangeably. However, regression analysis also specifically refers to the analysis of the dependence of the expected value of a random variable on the conditions* under which the experiment is conducted; the method of least squares is frequently used to estimate the parameters. Analysis of variance refers to the breakdown of the variability of the observed values of the dependent variable into a part which is the sum of squares about the fitted regression function and other parts due to the exclusion of parameters or groups of parameters from the regression function. Those using analysis of variance methods when the independent variables are limited in possible values to 0 and 1 (presence or absence) tend to be unaware that a model is implied [1, p. 243]. Analysis of covariance uses a combination of techniques which are specially adapted to 0 or 1 independent variables and techniques needed in more general cases.

5.1.2 Models

Certain aspects of models are discussed in this section. First considered is the model functional form, which is termed the regression function. Some restrictions on designs for these functions are also given. Second, two error models are discussed: in one there are measurement errors, and in the other the random component is in the equation describing the system. Third, in the next subsection various standard assumptions relating to the statistics of the errors are given.

The regression functions for the cases used are considered to have the correct functional forms, that is, not empirical approximations or best guesses. The functions considered in this chapter are linear in the parameters and contain at most two parameters. For convenience in later references, the regression functions used in this chapter are listed and labeled as follows:
Model 1:  \eta_i = \beta_0    (5.1.1a)

Model 2:  \eta_i = \beta_1 X_i    (5.1.1b)

Model 3:  \eta_i = \beta_0 + \beta_1 X_i    (5.1.1c)

Model 4:  \eta_i = \beta_0' + \beta_1 (X_i - \bar{X}), \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i    (5.1.1d)

Model 5:  \eta_i = \beta_1 X_{i1} + \beta_2 X_{i2}    (5.1.1e)
The variable \eta is sometimes called the dependent variable;† X_i, X_{i1}, and X_{i2} are independent variables that might represent time, position, temperature, velocity, cost, and so on. Clearly some of these models are related. For example, Model 2 reduces to Model 1 if X_i = 1. Also, Model 5 includes both Models 3 and 4.

* "Conditions" refer, for example, to the X_i values in (5.1.1c).
† In the statistical literature Y is called the dependent variable.
In each case there is a restriction related to the measurements. Assume that there are n observations. For Model 1 the restriction is simply that there is at least one observation, or n \ge 1. For Models 2, 3, 4, and 5 the respective restrictions are as follows:

Model 2:  \sum_{i=1}^{n} X_i^2 \ne 0    (at least one X_i \ne 0 needed)    (5.1.2a)

Model 3:  \sum_{i=1}^{n} (X_i - \bar{X})^2 \ne 0    (at least 2 different X_i values needed)    (5.1.2b)

Model 4:  \sum_{i=1}^{n} (X_i - \bar{X})^2 \ne 0    (at least 2 different X_i values needed)    (5.1.2c)

Model 5:  \sum_{i=1}^{n} X_{i1}^2 \sum_{i=1}^{n} X_{i2}^2 - \left(\sum_{i=1}^{n} X_{i1} X_{i2}\right)^2 \ne 0    (at least 2 different sets of X_{i1}, X_{i2} needed)    (5.1.2d)

where \bar{X} = \sum_{i=1}^{n} X_i / n.
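These restrictions are easy to check numerically before attempting a fit. The sketch below is mine, not the text's; the function name and the tuple return convention are assumptions.

```python
# Checks the design restrictions (5.1.2a)-(5.1.2d) for a set of observations.
# Returns (Model 2 ok, Models 3/4 ok, Model 5 ok or None if X1, X2 not given).
def check_designs(X, X1=None, X2=None):
    n = len(X)
    xbar = sum(X) / n
    ok_model2 = sum(x * x for x in X) != 0              # (5.1.2a): some X_i != 0
    ok_model34 = sum((x - xbar) ** 2 for x in X) != 0   # (5.1.2b,c): 2 distinct X_i
    ok_model5 = None
    if X1 is not None and X2 is not None:               # (5.1.2d)
        c11 = sum(a * a for a in X1)
        c22 = sum(b * b for b in X2)
        c12 = sum(a * b for a, b in zip(X1, X2))
        ok_model5 = (c11 * c22 - c12 ** 2) != 0
    return ok_model2, ok_model34, ok_model5

# A design with all X_i equal satisfies (5.1.2a) but violates (5.1.2b):
print(check_designs([2.0, 2.0, 2.0]))    # (True, False, None)
```

A design that fails (5.1.2b) makes the Model 3 slope indeterminate; the restriction is exactly the division by zero that the closed-form estimators would otherwise encounter.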
In each of the models except the first, the independent variables X_i or X_{ij} could represent a number of equally or unequally spaced values. Alternately, X_i might represent values of various functions of time, t, such as

t_i, \quad t_i^2, \quad t_i^3, \quad \ldots, \quad \sin(a t_i), \quad \cos(a t_i), \quad e^{-a t_i}, \quad \ln t_i

or some combination of them. The quantity a is here assumed to be known.

In most of this chapter the errors are considered to be additive. Then for Model 3,

Y_i = \beta_0 + \beta_1 X_i + \epsilon_i    (5.1.3)

where \epsilon_i is the unknown error and Y_i is the measurement at X_i. The model given by (5.1.3) can, however, represent the following two cases:

Error Model A. Errors in Measurements

\eta_i = \beta_0 + \beta_1 X_i, \qquad Y_i = \eta_i + \epsilon_i    (5.1.4)

Error Model D. Errors (Noise) in Process

\eta_i = \beta_0 + \beta_1 X_i + \epsilon_i, \qquad Y_i = \eta_i    (5.1.5)

where \eta_i represents the quantity being measured and Y_i is its measurement. Implicit in these models is the assumption that there is no error in X_i; that is, X_i is not a random variable as are Y_i and \epsilon_i. In Error Model D, \eta_i is also a random variable.

In Error Model A there are errors in the measurements but there is none in \eta. In order to quantify \epsilon_i one can study the error characteristics of the measuring devices, be they thermocouples, hot-wire anemometers, micrometers, etc. These errors can be reduced by more precise devices. As technology improves, one would expect \epsilon_i in Error Model A to decrease. The system model itself is assumed to be errorless or noiseless. This implies that the physics is well understood and that there is no stochastic noise entering in \eta. This would be the case for many physical measurements. Consider, for example, the steady-state temperature distribution in a flat plate which is linear with position. The randomness in observed temperatures for repeated measurements would be the result of measurement noise rather than some physical phenomenon causing the fluctuation.

In Error Model D the measurements are assumed errorless, but the model (\eta) contains "noise"; that is, the variable being measured deviates by some stochastic component from its expected value. An example is turbulent flow between two parallel plates. Part of the universal velocity profile for turbulent flow is described by an expression linear in \ln y^+,

u^+ = \beta_0 + \beta_1 \ln y^+

where the dependent variable u^+ is a dimensionless velocity and y^+, the independent variable, is a dimensionless distance. In this case instantaneous velocity measurements fluctuate about the mean value u^+ owing more to the turbulence phenomenon than to measurement inaccuracies. Hence this is an Error Model D case. For Error Model D, \epsilon_i would not be expected to decrease with time (that is, with improved measurement capability). Also, a study of the sensor would not yield any information regarding \epsilon_i.

Regardless of whether Error Model A or D is correct, the estimation problem is formally the same for the physical models considered in this chapter. The meaning of \eta and \epsilon is different, however, as are the statistics for \epsilon. We shall visualize Error Model A as the model considered in this chapter.
5.1.3 Statistical Assumptions Regarding the Measurement Errors

Assumptions regarding the measurement errors should be carefully stated in each estimation problem. If the assumptions do not accurately describe the data, then one can at least pinpoint the assumption(s) which are not satisfied. The mere identification of the incorrect assumptions may lead to more realistic assumptions and thus better estimators.

Different assumptions lead to different estimation methods. In this chapter we consider three commonly used methods: ordinary least squares (OLS), maximum likelihood (ML), and maximum a posteriori (MAP). The following conditions, given in terms of Error Model A and Model 3, are termed the standard statistical assumptions for i = 1, 2, \ldots, n:

1. Y_i = E(Y_i | \beta_0, \beta_1) + \epsilon_i = \eta_i + \epsilon_i    (additive errors)    (5.1.6)

2. E(\epsilon_i) = 0    (zero mean errors)    (5.1.7)

3. V(Y_i | \beta_0, \beta_1) = \sigma^2    (constant variance errors, homoskedasticity)    (5.1.8)
   (Note V(\epsilon_i) = E(\epsilon_i^2) = \sigma^2 if E(\epsilon_i) = 0.)

4. E\{[\epsilon_i - E(\epsilon_i)][\epsilon_j - E(\epsilon_j)]\} = 0 for i \ne j    (uncorrelated errors)    (5.1.9)
   (or E(\epsilon_i \epsilon_j) = 0 if E(\epsilon_i) = 0 and i \ne j.)

5. \epsilon_i has a normal probability distribution    (5.1.10)

6. Known statistical parameters    (5.1.11)

7. V(X_i) = 0    (nonstochastic independent variable)    (5.1.12)

8. No prior information regarding \beta_0 and \beta_1, and parameters nonrandom    (5.1.13)

In order to describe the assumptions concisely and explicitly, we assign a 1 or 0 to the above assumptions, where 1 means yes and 0 means no. For a case when all the assumptions are satisfied we designate them as 11111111, where the first 1 on the left refers to the additive error assumption, the second 1 refers to the zero mean assumption, etc. In some cases additional numbers are used to indicate more information than a simple no. For example, for the uncorrelated error condition, 2 designates first-order autoregressive errors. See Section 6.1.5 for a more complete list of possibilities other than 1 or 0. If an assumption is not used, then a dash will be used in lieu of a 1 or 0.

Assumptions 2, 3, 4, and 7 are sometimes referred to as the Gauss-Markov assumptions.

5.2 ORDINARY LEAST SQUARES ESTIMATORS (OLS)

In ordinary least squares estimation the sum of squares function to be minimized with respect to the parameters is simply

S = \sum_{i=1}^{n} [Y_i - \eta_i]^2    (5.2.1)

where \eta_i is a function of the parameters such as \beta_0 and \beta_1.

It is important to observe that no statistical assumptions are used in obtaining OLS parameter estimates; that is, the assumptions are --------. In order to make statistical statements regarding the estimators, however, it is necessary to possess information regarding the measurement errors.

In derivations to be given we may need the variance of \sum d_i Y_i, where d_i is not a random variable. Assume that the errors in Y_i are additive, have zero mean, and are uncorrelated (assumptions 1, 2, and 4, respectively). Then

V\left(\sum_{i=1}^{n} d_i Y_i\right) = E\left\{\left[\sum_{i=1}^{n} d_i \epsilon_i\right]^2\right\}
                                    = \sum_{i=1}^{n} d_i^2 E(\epsilon_i^2)
                                    = \sigma^2 \sum_{i=1}^{n} d_i^2    (5.2.2)

where the first assumption is used on the first line of (5.2.2), the second assumption on the second line, and the fourth on the third line. Equation (5.2.2) is a special case of (2.6.20).
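Equation (5.2.2) can be sanity-checked by simulation. The sketch below is my construction, not the text's; the d_i, \eta_i, and \sigma values are arbitrary, and the errors are drawn additive, zero mean, and uncorrelated (assumptions 1, 2, and 4).

```python
# Monte Carlo check of V(sum d_i Y_i) = sigma^2 * sum d_i^2, Eq. (5.2.2).
import random

random.seed(0)
d = [0.5, -1.0, 2.0, 1.5]            # arbitrary nonrandom coefficients d_i
eta = [1.0, 2.0, 3.0, 4.0]           # nonrandom model values eta_i
sigma = 0.3

samples = []
for _ in range(100_000):
    Y = [e + random.gauss(0.0, sigma) for e in eta]   # Y_i = eta_i + eps_i
    samples.append(sum(di * yi for di, yi in zip(d, Y)))

mean = sum(samples) / len(samples)
var = sum((z - mean) ** 2 for z in samples) / (len(samples) - 1)
predicted = sigma ** 2 * sum(di ** 2 for di in d)     # sigma^2 * sum d_i^2

print(round(predicted, 3))    # 0.675; the sample variance lands close to this
```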
5.2.1 Models 1 and 2 (\eta_i = \beta_0 and \eta_i = \beta_1 X_i)

Both Models 1 and 2 are covered in this section. Since Model 2 is the more general, we start with it and then apply the results to Model 1. For Model 2 (\eta_i = \beta_1 X_i), (5.2.1) can be written

S = \sum_{i=1}^{n} [Y_i - \beta_1 X_i]^2    (5.2.3)
Differentiating S with respect to \beta_1, replacing \beta_1 by the estimator b_1, and setting the result equal to zero gives the normal equation

\sum_{i=1}^{n} X_i (Y_i - b_1 X_i) = 0    (5.2.4)

whose solution for Model 2 is

b_1 = \frac{\sum_{i=1}^{n} Y_i X_i}{\sum_{i=1}^{n} X_i^2}    (5.2.5)

By setting X_i = 1 in (5.2.5) the Model 1 estimator is

b_0 = \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y}    (5.2.6)

which is the average Y_i. For these two estimators no statistical assumptions are used, but at least one observation must be made, and in the case of Model 2 at least one X_i must not be zero.

The predicted, regression, or smoothed value is denoted \hat{Y}_i and is called "Y_i hat." For Models 1 and 2, respectively, \hat{Y}_i is

\hat{Y}_i = b_0, \qquad \hat{Y}_i = b_1 X_i    (5.2.7a,b)

The residual e_i is the measured value of Y_i minus the predicted value, or

e_i = Y_i - \hat{Y}_i    (5.2.8)

The residual e_i is not equal to the error \epsilon_i, but it can be used to estimate \epsilon_i.

5.2.1.1 Mean and Variances of Estimates

Using the standard statistical assumptions of additive, zero mean errors and nonstochastic X_i, \beta_0, and \beta_1 (11----11), we get for the expected value of the Model 2 parameter

E(b_1) = \frac{\sum X_i E(Y_i)}{\sum X_i^2} = \frac{\sum X_i \eta_i}{\sum X_i^2} = \beta_1

One can also show for Model 1 that E(b_0) = \beta_0. Hence the least squares estimators b_0 and b_1 are unbiased for the stated assumptions (see Section 3.2.1).

Suppose that all the standard assumptions are valid except that \epsilon need not possess a normal density and \sigma^2 may or may not be known (assumptions 1111--11); then the variance of b_0 using (5.2.6) and (5.2.2) is

V(b_0) = \frac{\sigma^2}{n}    (5.2.9)

From (5.2.5) and (5.2.2) the variance of b_1 is

V(b_1) = \frac{\sigma^2}{\sum_{i=1}^{n} X_i^2}    (5.2.10)

Notice that (5.2.9) and (5.2.10) both indicate that estimates as accurate as desired can be obtained by simply taking a sufficiently large number of observations. This naturally requires that the underlying assumptions be valid. If the measurements were correlated, for example, this conclusion might not be true.

Also note that for Model 2 (\eta_i = \beta_1 X_i) there is optimum placement of observations. Suppose that n observations are to be obtained and it is desired to obtain a minimum variance estimate by selecting the X_i so that |X_i| \le |X_m|. Then the variance of b_1 is minimized if all the measurements are concentrated at X_m, giving V(b_1) = \sigma^2/(n X_m^2). This would be the best choice of the X_i values provided there is no uncertainty in the model (i.e., the functional form of \eta_i).

Suppose that all the standard assumptions are valid except that there may or may not be normality and \sigma^2 is unknown (1111-011). Then the variances of b_0 and b_1 are estimated by replacing \sigma^2 by an estimate which is designated s^2. The square roots of V(b_0) and V(b_1) with this replacement are called the estimated standard errors (or standard deviations),

est. s.e.(b_0) = s\, n^{-1/2}    (5.2.11)

est. s.e.(b_1) = s\left[\sum_{i=1}^{n} X_i^2\right]^{-1/2}    (5.2.12)
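As a concrete sketch of (5.2.5), (5.2.8), and (5.2.10): the function name and the data below are illustrative only (they are not from the text), and \sigma^2 is taken as known.

```python
# Model 2 (line through the origin): estimator (5.2.5), residuals (5.2.8),
# and the standard error from the variance (5.2.10) with sigma^2 known.
import math

def fit_model2(X, Y, sigma2):
    sxx = sum(x * x for x in X)                   # at least one X_i must be nonzero
    b1 = sum(y * x for x, y in zip(X, Y)) / sxx   # (5.2.5)
    resid = [y - b1 * x for x, y in zip(X, Y)]    # e_i = Y_i - Yhat_i, (5.2.8)
    se_b1 = math.sqrt(sigma2 / sxx)               # square root of (5.2.10)
    return b1, resid, se_b1

b1, resid, se_b1 = fit_model2([1.0, 2.0, 3.0], [2.1, 3.9, 6.0], sigma2=0.01)
print(b1)    # 27.9/14 = 1.9928571...
```

The normal equation (5.2.4) implies \sum X_i e_i = 0, which makes a convenient check on the arithmetic.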
5.2.1.2 Expected Value of S_min

An estimator for \sigma^2 is not directly obtained using OLS as it is using ML estimation. One can, however, for the assumptions 1111-011, relate the expected value of the minimum sum of squares, designated S_min, to \sigma^2. Since E(Y_i - \hat{Y}_i) = 0,

E(S_{min}) = E\left[\sum_{i=1}^{n} e_i^2\right] = \sum_{i=1}^{n} V(e_i)    (5.2.13a,b)

(5.2.13b) is valid for any number of parameters. It still remains to find V(e_i) in terms of \sigma^2. It is always true that

V(e_i) = V(Y_i - \hat{Y}_i) = V(Y_i) + V(\hat{Y}_i) - 2\,cov(Y_i, \hat{Y}_i)    (5.2.14)

The V(Y_i) term is simply \sigma^2. The other two terms are considered below. For the one-parameter models we can write

\hat{Y}_i = b_1 X_i = X_i \frac{\sum_j X_j Y_j}{\sum_j X_j^2}    (5.2.15)

so that, using (5.2.2), for constant error variance \sigma^2 the variance of \hat{Y}_i for Model 2 is

V(\hat{Y}_i) = \frac{X_i^2 \sigma^2}{\sum_j X_j^2}    (5.2.16)

and then letting X_i = 1 we have for Model 1 (\eta_i = \beta_0),

V(\hat{Y}_i) = \frac{\sigma^2}{n}    (5.2.17)

Observe that the variance of the predicted value of Y_i is a constant for Model 1 but increases with X_i^2 for Model 2.

The third term on the right side of (5.2.14) for assumptions 1111--11 and Model 2 is

-2\,cov(Y_i, \hat{Y}_i) = -2 X_i d_i \sigma^2 = -2\left[\frac{X_i^2}{\sum_j X_j^2}\right]\sigma^2    (5.2.18)

where d_i = X_i / \sum_j X_j^2. Combining the above results yields for Models 1 and 2, respectively,

V(e_i) = \left(1 - \frac{1}{n}\right)\sigma^2, \qquad V(e_i) = \left(1 - \frac{X_i^2}{\sum_j X_j^2}\right)\sigma^2    (5.2.19a,b)

which are both less than V(\epsilon_i) = \sigma^2. In both cases the expected value of S_min is found using (5.2.13) and (5.2.19) to be

E(S_{min}) = (n - 1)\sigma^2    (5.2.20)

and thus an unbiased estimator for \sigma^2, designated s^2 or \hat{\sigma}^2, is

s^2 = \frac{S_{min}}{n - 1} = \frac{\sum_i (Y_i - \hat{Y}_i)^2}{n - 1}, \qquad n > 1    (5.2.21)

This expression is valid for one parameter with assumptions 1111-011 and can be used in (5.2.11) or (5.2.12). For one parameter, s^2 can be estimated using only two or more observations.

Example 5.2.1

An automobile is traveling at a constant speed and the distances traveled at the end of 1, 2, and 3 min are measured to be 1.01, 2.03, and 3.00 km. Assume that distance is the dependent variable and time the independent variable. The regression function for this case is that the distance traveled, h, is equal to the velocity, v, times the duration traveled, t; in symbols, h = vt. Use OLS to estimate v.

Solution

This is a Model 2 case with v being the parameter. Using (5.2.5) with Y_i being the h_i measurement, we find

\hat{v} = \frac{\sum Y_i t_i}{\sum t_i^2} = \frac{1.01(1) + 2.03(2) + 3.00(3)}{1 + 4 + 9} = 1.005 \text{ km/min}

where Y_i is the observation of h_i.

Example 5.2.2

An object is dropped in a vacuum and the position h is observed at various times t_i. The observations of h_i, designated Y_i, are given as

t_i (sec):   0.1    0.2    0.3    0.4
Y_i (m):     0.05   0.2    0.4    0.8

The measurements are to be used to estimate the local gravitational constant g. The position h is described by the differential equation \ddot{h} = g and the initial conditions h = \dot{h} = 0 at t = 0; the solution for h is h = g t^2/2.

(a) Using ordinary least squares, find an estimate of g.
(b) Using the standard assumptions except that \sigma^2 is unknown and \epsilon_i need not be normal, give an estimate of the standard error of \hat{g}.

Solution

(a) The given model is the same as Model 2 with g being \beta_1 and X_i being t_i^2/2. The estimator for OLS is (5.2.5), which can be written as

\hat{g} = \left[\sum_{i=1}^{4} Y_i \frac{t_i^2}{2}\right] \bigg/ \left[\sum_{i=1}^{4} \left(\frac{t_i^2}{2}\right)^2\right]

Then the numerator and denominator are, respectively,

\sum_{i=1}^{4} Y_i \frac{t_i^2}{2} = \frac{1}{2}\{0.05(0.1)^2 + 0.2(0.2)^2 + 0.4(0.3)^2 + 0.8(0.4)^2\} = 0.08625

\sum_{i=1}^{4} \left(\frac{t_i^2}{2}\right)^2 = \frac{1}{4}[(0.1)^4 + (0.2)^4 + (0.3)^4 + (0.4)^4] = 0.00885

and thus the estimate is \hat{g} = 0.08625/0.00885 = 9.7458 m/sec^2.

(b) The residuals, e_i = Y_i - \hat{Y}_i, are, respectively, 0.00127, 0.00508, -0.03855, and 0.02034, and the sum of squares of these terms is S_min = 0.001928. From (5.2.21) the estimated standard deviation is

s = \left[\frac{S_{min}}{n - 1}\right]^{1/2} = \left[\frac{0.001928}{4 - 1}\right]^{1/2} = 0.02535

and then from (5.2.12) the estimated standard error of \hat{g} is

est. s.e.(\hat{g}) = s\left[\sum_{i=1}^{4}\left(\frac{t_i^2}{2}\right)^2\right]^{-1/2} = \frac{0.02535}{(0.00885)^{1/2}} = 0.2695 \text{ m/sec}^2

which can be compared with the estimate of 9.7458 m/sec^2.

5.2.2 Two-Parameter Models

5.2.2.1 Model 5, \eta_i = \beta_1 X_{i1} + \beta_2 X_{i2}

In order to simplify the presentation of the two-parameter cases, the general two-parameter case, Model 5, is considered first. Using the sum of squares function, (5.2.1), with Model 5, (5.1.1e), we have

S = \sum_{i=1}^{n} [Y_i - \beta_1 X_{i1} - \beta_2 X_{i2}]^2    (5.2.22)

We differentiate S with respect to \beta_1, set the derivative equal to zero, and replace \beta_1 by its estimator b_1 and \beta_2 by b_2. Repeating the same procedure for \beta_2 then yields the two normal equations

b_1 c_{11} + b_2 c_{12} = d_1    (5.2.23a)

b_1 c_{12} + b_2 c_{22} = d_2    (5.2.23b)

where

c_{jk} = \sum_{i=1}^{n} X_{ij} X_{ik}, \qquad d_k = \sum_{i=1}^{n} Y_i X_{ik}    (5.2.23c)

Notice that the coefficient c_{12} appears in a symmetric manner in (5.2.23a,b). Solving (5.2.23a,b) for b_1 and b_2 yields (for Model 5)

b_1 = \frac{d_1 c_{22} - d_2 c_{12}}{\Delta}, \qquad b_2 = \frac{d_2 c_{11} - d_1 c_{12}}{\Delta}    (5.2.24a)

where

\Delta = c_{11} c_{22} - c_{12}^2    (5.2.24b)

No statistical assumptions were necessary to derive the estimators given in (5.2.24a). Using the three standard assumptions of additive, zero mean errors and nonstochastic X_{ij}, it can be shown that b_1 and b_2 are unbiased estimates of \beta_1 and \beta_2.

The variance of b_1 can be readily found by writing b_1 as

b_1 = \sum_{i=1}^{n} \frac{c_{22} X_{i1} - c_{12} X_{i2}}{\Delta}\, Y_i    (5.2.25)

Then, using the standard statistical assumptions 1111--11 and (5.2.2), the variance of b_1 is

V(b_1) = \sum_{i=1}^{n} \left[\frac{c_{22} X_{i1} - c_{12} X_{i2}}{\Delta}\right]^2 \sigma^2 = \left[\frac{c_{22}^2 c_{11} - 2 c_{22} c_{12}^2 + c_{12}^2 c_{22}}{\Delta^2}\right]\sigma^2

or, simplifying,

V(b_1) = \frac{c_{22}}{\Delta}\,\sigma^2    (5.2.26a)
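The normal equations (5.2.23a,b) and their solution (5.2.24a,b) translate directly into code; this is a minimal sketch with my own naming, not a library routine.

```python
# Model 5 OLS: builds c_jk and d_k of (5.2.23c) and applies (5.2.24a,b).
def fit_model5(X1, X2, Y):
    c11 = sum(a * a for a in X1)
    c12 = sum(a * b for a, b in zip(X1, X2))
    c22 = sum(b * b for b in X2)
    d1 = sum(y * a for a, y in zip(X1, Y))
    d2 = sum(y * b for b, y in zip(X2, Y))
    delta = c11 * c22 - c12 ** 2        # must be nonzero, restriction (5.1.2d)
    b1 = (d1 * c22 - d2 * c12) / delta  # (5.2.24a)
    b2 = (d2 * c11 - d1 * c12) / delta
    return b1, b2

# Setting X_i1 = 1 for every i turns Model 5 into Model 3, with b1 playing
# the role of the intercept b0:
b0, b1 = fit_model5([1.0] * 4, [0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.1, 7.0])
print(b0, b1)    # approximately 1.04 and 1.99
```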
In a similar manner it can be shown that V(b_2) and cov(b_1, b_2) are given by

V(b_2) = \frac{c_{11}}{\Delta}\,\sigma^2, \qquad cov(b_1, b_2) = -\frac{c_{12}}{\Delta}\,\sigma^2    (5.2.26b,c)

The predicted value of Y_i is

\hat{Y}_i = b_1 X_{i1} + b_2 X_{i2}    (5.2.27)

The variance of \hat{Y}_i is then

V(\hat{Y}_i) = \left[\frac{c_{22} X_{i1}^2 - 2 c_{12} X_{i1} X_{i2} + c_{11} X_{i2}^2}{\Delta}\right]\sigma^2    (5.2.28)

where (5.2.26) is used. It can also be shown that cov(Y_i, \hat{Y}_i) is equal to the same value, or

cov(Y_i, \hat{Y}_i) = V(\hat{Y}_i)    (5.2.29)

From (5.2.14), (5.2.28), and (5.2.29) the variance of the residual e_i (= Y_i - \hat{Y}_i) is equal to

V(e_i) = \left[1 - \frac{c_{22} X_{i1}^2 - 2 c_{12} X_{i1} X_{i2} + c_{11} X_{i2}^2}{\Delta}\right]\sigma^2    (5.2.30)

Then, using the result that E(S_min) is equal to \sum V(e_i) given by (5.2.13b), we find that

E(S_{min}) = (n - 2)\sigma^2    (5.2.31)

since \Delta = c_{11} c_{22} - c_{12}^2. Consequently, for the two-parameter case with Model 5 and assumptions 1111-011, an unbiased estimator for \sigma^2 is

s^2 = \frac{S_{min}}{n - 2} \qquad (n > 2)    (5.2.32)

which differs from (5.2.21) in that there is a factor of n - 2 rather than n - 1. Observe that (5.2.32) is properly meaningless for n = 2. For two parameters and two observations the two residuals must be zero, giving S_min = 0. Consequently, for two parameters, \sigma^2 can be estimated only if n > 2.

5.2.2.2 Model 3, \eta_i = \beta_0 + \beta_1 X_i

Model 3 results can be found from those of Model 5 by replacing in Model 5 \beta_1 by \beta_0, b_1 by b_0, \beta_2 by \beta_1, b_2 by b_1, X_{i1} by 1, and X_{i2} by X_i. This gives

c_{11} = n, \quad c_{12} = \sum X_i, \quad c_{22} = \sum X_i^2, \quad d_1 = \sum Y_i, \quad d_2 = \sum Y_i X_i    (5.2.33)

\Delta = n \sum X_i^2 - \left(\sum X_i\right)^2    (5.2.34)

One must be careful where the squares are placed in (5.2.34); note that \sum X_i^2 means the sum of the X_i^2 whereas (\sum X_i)^2 means the square of the sum of the X_i values. It can also be shown that \Delta is equal to

\Delta = n \sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i    (5.2.35a,b)

From the above relations b_1, the estimator of \beta_1 in Model 3, which is \eta_i = \beta_0 + \beta_1 X_i, can be found from b_2 in (5.2.24a) to be

b_1 = \frac{n \sum Y_i X_i - \left(\sum Y_i\right)\left(\sum X_i\right)}{\Delta}    (5.2.36)

Using (5.2.35a) this expression can also be written (Model 3)

b_1 = \frac{\sum (X_i - \bar{X}) Y_i}{\sum (X_i - \bar{X})^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}    (5.2.37)

where \bar{Y} = \sum Y_i / n and the range of each summation is from i = 1 to n. The estimator b_0 can also be found from (5.2.24a) by using the expression for b_1. Instead we shall use (5.2.23a) divided by n (with b_1 \to b_0 and b_2 \to b_1) to get

b_0 = \bar{Y} - b_1 \bar{X}    (5.2.38)

Hence if \bar{X} = \sum X_i / n is equal to zero, b_0 is simply \bar{Y}. For this reason and the resulting simplifications in (5.2.37), a transformation sometimes used in hand calculations redefines X_i so that \bar{X} = 0.

As mentioned several times above, no statistical assumptions are used to obtain the estimators for b_0 and b_1 given respectively by (5.2.38) and (5.2.37). Suppose now that the standard assumptions are valid. A number of these are illustrated by Fig. 5.1 for Model 3, \eta_i = \beta_0 + \beta_1 X_i. The normal probability density is superimposed upon the curve for several X_i values. The first two assumptions of additive, zero mean errors are implied in Fig. 5.1. The third assumption of constant variance is depicted explicitly, as is the normality assumption (number 5). The nonstochastic X_i assumption (number 7) is implied by the lack of a probability density in the X_i direction.

[Figure 5.1. Linear model with Y_i being a random variable with constant variance \sigma^2 and normal probability distribution.]

Mean and Variances for Model 3

The OLS estimates of \beta_0 and \beta_1 are unbiased for additive, zero mean errors, as was demonstrated for the more general case, Model 5. From (5.2.26), (5.2.33), and (5.2.34) the variances and covariance of b_0 and b_1 are

V(b_0) = \frac{\sigma^2 \sum X_i^2}{\Delta} = \frac{\sigma^2 \sum X_i^2}{n \sum (X_i - \bar{X})^2}, \qquad V(b_1) = \frac{n\sigma^2}{\Delta} = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}    (5.2.39a,b)

cov(b_0, b_1) = -\frac{\sigma^2 \sum X_i}{\Delta} = -\frac{\bar{X}\sigma^2}{\sum (X_i - \bar{X})^2}    (5.2.40)

where \Delta is given by (5.2.35a). Assumptions 1111--11 are used.

From (5.2.28) and (5.2.30) the variances of the predicted value \hat{Y}_i and the residual e_i can be written

V(\hat{Y}_i) = \left[\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_j (X_j - \bar{X})^2}\right]\sigma^2    (5.2.41a)

V(e_i) = \left[1 - \frac{1}{n} - \frac{(X_i - \bar{X})^2}{\sum_j (X_j - \bar{X})^2}\right]\sigma^2    (5.2.41b)

Unlike the variances of b_0 and b_1, the variances of \hat{Y}_i and e_i are functions of i. Note that V(\hat{Y}_i) has a minimum at X_i = \bar{X} and a maximum value at the smallest or largest value of X_i. The variance of the residual e_i is different in that it has a maximum at X_i = \bar{X}.

The estimated standard errors of b_0 and b_1 are found from (5.2.39a,b) to be, for assumptions 1111-011,

est. s.e.(b_0) = s\left[\frac{\sum X_i^2}{n \sum (X_k - \bar{X})^2}\right]^{1/2}    (5.2.42a)

est. s.e.(b_1) = s\left[\sum (X_i - \bar{X})^2\right]^{-1/2}    (5.2.42b)

where, from (5.2.32), s = [S_{min}/(n - 2)]^{1/2}.

For Model 3 the sum of the residuals is equal to zero, or

\sum_{i=1}^{n} e_i = 0    (5.2.43)

This interesting result can be used to check the accuracy of calculations for the parameters. This result is true for any linear or nonlinear model provided there is a \beta_0-term in the model, that is, a parameter not multiplied by a function of an independent variable, and provided OLS is used.
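A short sketch (my naming and data) of the Model 3 closed forms (5.2.37) and (5.2.38), together with s from (5.2.32), the standard errors (5.2.42a,b), and the zero-residual-sum check (5.2.43) that the text recommends:

```python
# Model 3 OLS with estimated standard errors; needs n > 2 and two distinct X_i.
import math

def fit_model3(X, Y):
    n = len(X)
    xbar, ybar = sum(X) / n, sum(Y) / n
    sxx = sum((x - xbar) ** 2 for x in X)
    b1 = sum((x - xbar) * y for x, y in zip(X, Y)) / sxx     # (5.2.37)
    b0 = ybar - b1 * xbar                                    # (5.2.38)
    resid = [y - (b0 + b1 * x) for x, y in zip(X, Y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))       # (5.2.32)
    se_b0 = s * math.sqrt(sum(x * x for x in X) / (n * sxx)) # (5.2.42a)
    se_b1 = s / math.sqrt(sxx)                               # (5.2.42b)
    return b0, b1, se_b0, se_b1, resid

b0, b1, se_b0, se_b1, resid = fit_model3([0.0, 1.0, 2.0, 3.0],
                                         [1.1, 2.9, 5.1, 7.0])
print(abs(sum(resid)) < 1e-12)    # True: the residuals sum to zero, (5.2.43)
```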
Example 5.2.3

Experiments have been performed for the heat transfer to air flowing in a pipe. A dimensionless group related to the heat flow rate is the Nusselt number, designated Nu. This is a function of the Reynolds number, denoted Re, which is proportional to the average velocity in the tube. Below are some values for the turbulent fluid flow range.

Re:   10^4   2 \times 10^4   4 \times 10^4   5 \times 10^4
Nu:   32     60              90              119

The suggested model is Nu = a_0 Re^{a_1}, where the parameters are a_0 and a_1. Reduce to a linear form and estimate a_0 and a_1 using ordinary least squares with log Nu being the dependent variable.

Solution

Take the logarithm to the base 10 to get

log Nu = log a_0 + a_1 log Re

For convenience write the model in the Model 3 form, \eta_i = \beta_0 + \beta_1 X_i, with

log Nu \to \eta_i, \qquad log Re \to X_i

The tabulated values of Nu are used to obtain log Nu, which is now Y_i, as given below.

X_i (log Re):   4.0      4.3010   4.6021   4.6990
Y_i (log Nu):   1.5051   1.7782   1.9542   2.0755

The estimates of b_0 and b_1 are found using (5.2.37) and (5.2.38). In these equations the following are needed:

\bar{X} = \frac{\sum X_i}{4} = \frac{4.0 + 4.3010 + 4.6021 + 4.6990}{4} = 4.400525

\bar{Y} = \frac{\sum Y_i}{4} = \frac{1.5051 + 1.7782 + 1.9542 + 2.0755}{4} = 1.82825

\sum (X_i - \bar{X})^2 = (4.0 - 4.400525)^2 + \cdots + (4.6990 - 4.400525)^2 = 0.3000453

\sum (X_i - \bar{X}) Y_i = (4.0 - 4.400525)(1.5051) + (4.3010 - 4.400525)(1.7782) + \cdots = 0.2335972

Then (5.2.37) gives

b_1 = \frac{\sum (X_i - \bar{X}) Y_i}{\sum (X_i - \bar{X})^2} = \frac{0.2335972}{0.3000453} = 0.7785397

and from (5.2.38) b_0 is

b_0 = \bar{Y} - b_1 \bar{X} = 1.82825 - (0.7785397)(4.400525) = -1.59773

The estimate of a_0 is 10^{b_0} \approx 0.0253. Thus the prediction equation for Nu is

Nu = 0.0253\, Re^{0.7785}

Example 5.2.4

Normal random error terms \epsilon_i with a mean of zero and unit variance have been added to the model \eta_i = \beta_0 + \beta_1 X_i with \beta_0 set equal to 1 and \beta_1 set equal to 0.1. The "data" are tabulated in Table 5.1.

(a) Estimate the parameters \beta_0 and \beta_1 using ordinary least squares.
(b) Find the estimated standard errors for b_0, b_1, and \hat{Y}_i using the standard assumptions except that the errors need not be normal and that \sigma^2 is unknown (1111-011).

Table 5.1  Data for Example 5.2.4

Observation i     X_i     \epsilon_i     Y_i
1                   0      -0.742        0.258
2                  10      -0.034        1.966
3                  20       1.453        4.453
4                  30       0.963        4.963
5                  40       0.040        5.040
6                  50       0.418        6.418
7                  60       1.792        8.792
8                  70      -0.374        7.626
9                  80      -0.222        8.778
Sum               360       3.294       48.294

Solution

(a) The OLS estimators for b_0 and b_1 are given by (5.2.37) and (5.2.38). In these equations \bar{X} and \bar{Y} are needed:

\bar{X} = \frac{1}{9}\sum X_i = \frac{1}{9}(0 + 10 + 20 + \cdots + 80) = \frac{360}{9} = 40

\bar{Y} = \frac{1}{9}\sum Y_i = \frac{1}{9}(0.258 + 1.966 + \cdots + 8.778) = \frac{48.294}{9} = 5.366

Additional required calculations are given in the second, third, and fourth columns of Table 5.2. Then the estimates of \beta_1 and \beta_0 are

b_1 = \frac{\sum (X_i - \bar{X}) Y_i}{\sum (X_i - \bar{X})^2} = \frac{611.93}{6000} = 0.10198833

b_0 = \bar{Y} - b_1 \bar{X} = 5.366 - (0.10198833)(40) = 1.2864667

where some of the decimal places have been dropped. These estimates happen to be about 2% and 29% larger than the true values.
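Both examples can be re-run numerically. The helper below is a direct transcription of (5.2.37) and (5.2.38). One caution: with full-precision logarithms, Example 5.2.3 gives a slope near 0.77862; the hand value 0.7785397 comes from the four-decimal logs used above.

```python
# Re-running Examples 5.2.3 and 5.2.4(a) with the Model 3 formulas.
import math

def slope_intercept(X, Y):
    n = len(X)
    xbar, ybar = sum(X) / n, sum(Y) / n
    sxx = sum((x - xbar) ** 2 for x in X)
    b1 = sum((x - xbar) * y for x, y in zip(X, Y)) / sxx   # (5.2.37)
    return ybar - b1 * xbar, b1                            # (5.2.38)

# Example 5.2.3: log10(Nu) regressed on log10(Re).
Re = [1e4, 2e4, 4e4, 5e4]
Nu = [32.0, 60.0, 90.0, 119.0]
log_a0, a1 = slope_intercept([math.log10(r) for r in Re],
                             [math.log10(v) for v in Nu])

# Example 5.2.4(a).
X = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
Y = [0.258, 1.966, 4.453, 4.963, 5.040, 6.418, 8.792, 7.626, 8.778]
b0, b1 = slope_intercept(X, Y)
print(round(b1, 8), round(b0, 7))    # 0.10198833 1.2864667
```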
All eight significant figures given in these estimates are not needed, but it is usually wise to carry a couple of extra significant digits in the calculations because there can be small differences of large numbers.

The predicted value of the dependent variable, \hat{Y}_i, can be found from

\hat{Y}_i = \bar{Y} + b_1 (X_i - \bar{X}) = 5.366 + 0.10198833 (X_i - 40)

and is also given in Table 5.2. The residuals e_i are also given. Note that their sum is zero.

Table 5.2  Calculations for Example 5.2.4

X_i    X_i - \bar{X}    (X_i - \bar{X})^2    Y_i(X_i - \bar{X})    \hat{Y}_i    e_i = Y_i - \hat{Y}_i
  0        -40               1600                -10.32             1.28647        -1.02847
 10        -30                900                -58.98             2.30635        -0.34035
 20        -20                400                -89.06             3.32623         1.12677
 30        -10                100                -49.63             4.34612         0.61688
 40          0                  0                   0               5.36600        -0.32600
 50         10                100                 64.18             6.38588         0.03212
 60         20                400                175.84             7.40577         1.38623
 70         30                900                228.78             8.42565        -0.79965
 80         40               1600                351.12             9.44553        -0.66753
360 = \sum X_i              6000 = \sum (X_i - \bar{X})^2    611.93 = \sum Y_i(X_i - \bar{X})    0.00000 = \sum e_i

(b) In order to find the estimated standard errors it is necessary to evaluate s^2, which in turn needs S_min = \sum e_i^2, which is 5.937718. Then from (5.2.32)

s = \left[\frac{S_{min}}{n - 2}\right]^{1/2} = \left[\frac{5.937718}{7}\right]^{1/2} = 0.921002

which is an estimate of the standard deviation. Compared with the true value of unity this is only about 8% too low.

From (5.2.42a) the standard error of b_0 is

est. s.e.(b_0) = \left[\frac{\sum X_i^2}{n \sum (X_k - \bar{X})^2}\right]^{1/2} s = \left[\frac{20400}{9(6000)}\right]^{1/2}(0.921002) = 0.56608

and the standard error of b_1 is obtained from (5.2.42b):

est. s.e.(b_1) = s\left[\sum (X_i - \bar{X})^2\right]^{-1/2} = \frac{0.921002}{(6000)^{1/2}} = 0.011890

Notice that b_0 \pm est. s.e.(b_0) is 1.286 \pm 0.566, which includes the true value of \beta_0 = 1, and b_1 \pm est. s.e.(b_1) is 0.10199 \pm 0.0119, which also includes the true value of 0.1.

The estimated standard error of the predicted (or smoothed) value \hat{Y}_j using (5.2.41a) is

est. s.e.(\hat{Y}_j) = \left[\frac{1}{9} + \frac{(X_j - 40)^2}{6000}\right]^{1/2}(0.921002)

which varies from a minimum at X_j = 40 of 0.307 to maximum values of 0.566 at X_j = 0 and 80. This latter value is the same as est. s.e.(b_0) because b_0 in this case is also the X_j = 0 value of \hat{Y}_j.

Statistical statements regarding the accuracy of the estimates are discussed in Chapter 6 in connection with the confidence region.

5.2.2.3 Estimators for Model 4, \eta_i = \beta_0' + \beta_1 (X_i - \bar{X})

Model 4 is interesting because a number of the results have simple forms. Without any statistical assumptions the OLS estimator for \beta_0' is

b_0' = \bar{Y} = \frac{1}{n}\sum Y_i    (5.2.44)

and the OLS estimator for \beta_1 is the same as that given for Model 3. Using the assumptions 1111-011 the variance of b_0' is

V(b_0') = \frac{\sigma^2}{n}    (5.2.45)

and that of b_1 is given by (5.2.39b). The covariance of b_0' and b_1 is simply

cov(b_0', b_1) = 0    (5.2.46)

The variances of \hat{Y}_i and e_i are equal to those given for Model 3, (5.2.41a,b).
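The Model 4 properties (5.2.44)-(5.2.46) can be illustrated by simulation; the setup values below are mine. Centering the predictor makes b_0' = \bar{Y} uncorrelated with b_1, and V(b_0') is close to \sigma^2/n:

```python
# Monte Carlo sketch of (5.2.44)-(5.2.46) for Model 4 with a centered predictor.
import random

random.seed(1)
X = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
n = len(X)
xbar = sum(X) / n
sxx = sum((x - xbar) ** 2 for x in X)
beta0p, beta1, sigma = 5.0, 0.1, 1.0

b0s, b1s = [], []
for _ in range(50_000):
    Y = [beta0p + beta1 * (x - xbar) + random.gauss(0.0, sigma) for x in X]
    b0s.append(sum(Y) / n)                                         # (5.2.44)
    b1s.append(sum((x - xbar) * y for x, y in zip(X, Y)) / sxx)

m0 = sum(b0s) / len(b0s)
m1 = sum(b1s) / len(b1s)
var0 = sum((a - m0) ** 2 for a in b0s) / (len(b0s) - 1)
cov01 = sum((a - m0) * (b - m1) for a, b in zip(b0s, b1s)) / (len(b0s) - 1)

# var0 should be near sigma^2/n = 1/9, per (5.2.45); cov01 near 0, per (5.2.46)
print(abs(var0 - sigma ** 2 / n) < 0.005, abs(cov01) < 0.001)
```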
5.2.2.4 Optimal Experiments for Models 3 and 4

If one has the freedom of taking the observations at any X_i values for estimating parameters in Models 3 and 4, then one should select the X_i values so that the most accurate estimates of parameter values are produced. Such designs of experiments are termed optimal and yield optimal parameter estimates. Our criterion of optimality in this section is that of minimum variance of b_1. A more general criterion and analysis is given in Chapter 8.

Models 3 and 4 provide exactly the same OLS \hat{Y}_i values. For that reason we consider the variances for Model 4 for assumptions 1111-011. The variance of b_0' is independent of X_i and the covariance of b_0' and b_1 is zero. Hence only the variance of b_1, which is given by (5.2.39b), need be considered. Note that V(b_1) is minimized by maximizing \sum (X_i - \bar{X})^2. Let the maximum permissible range of X_i be between X_min and X_max. Then it can be rigorously shown that V(b_1) is minimized if one half the measurements are made at X_min and the other half at X_max. No intermediate measurements are taken. The optimal case is illustrated by Fig. 5.2.

[Figure 5.2. Recommended location of measurements when the model is known to be a straight line in X: half the observations at X_min and half at X_max.]

The variances of b_1 with uniform spacing of the X_i values given by

X_i = (i - 1 + c)\delta, \qquad i = 1, 2, \ldots, n    (5.2.47)

for various models are given in the fifth column of Table 5.3, which is a summary of the results of this section. The spacing between the X_i values is \delta and the first X_i value is X_1 = c\delta, where c is a factor locating X_1. The largest X_i value is X_n = (n - 1 + c)\delta. For this uniform spacing the variance of b_1 is

V_u(b_1) = \frac{12\sigma^2}{n(n^2 - 1)\delta^2}    (5.2.48)

If one half the observations were located at X_min = c\delta and the other half at X_max = (n - 1 + c)\delta, the variance of b_1 is (for this nonuniform spacing)

V_n(b_1) = \frac{4\sigma^2}{n(n - 1)^2\delta^2}    (5.2.49)

The ratio of V_u(b_1) to V_n(b_1) is

\frac{V_u(b_1)}{V_n(b_1)} = \frac{3(n - 1)}{n + 1}    (5.2.50)

which is equal to 1 for n = 2 and monotonically increases to 3 as n \to \infty. Hence for large n there is a factor of 3 in the ratio of variances of b_1 for the uniformly spaced case and the case of placement of the observations at the extremes.

In using the next to last column of Table 5.3 one should note that

X_{min} = c\delta    (5.2.51)

and thus

\delta = \frac{X_{max} - X_{min}}{n - 1}, \qquad c = \frac{X_{min}}{\delta} = \frac{X_{min}(n - 1)}{X_{max} - X_{min}}    (5.2.52)

In this discussion of optimal design of experiments it is important to note that the standard assumptions of 1111-011 are assumed. Also, there should be no uncertainty regarding the validity of the model. If the model is in question, then one would be better advised to choose equal spacing of the X_i values, or equal spacing in "time" if X_i is a function of time such as t^2/2.
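The comparison embodied in (5.2.50) can be made directly from V(b_1) = \sigma^2/\sum (X_i - \bar{X})^2 without the closed forms; the design values below are arbitrary:

```python
# Uniform spacing (5.2.47) versus half the points at each end of the interval.
def var_b1(X, sigma2=1.0):
    xbar = sum(X) / len(X)
    return sigma2 / sum((x - xbar) ** 2 for x in X)

n, c, delta = 10, 1.0, 1.0
uniform = [(i - 1 + c) * delta for i in range(1, n + 1)]           # (5.2.47)
extremes = [c * delta] * (n // 2) + [(n - 1 + c) * delta] * (n // 2)

ratio = var_b1(uniform) / var_b1(extremes)
print(ratio, 3 * (n - 1) / (n + 1))    # both are 27/11 = 2.4545..., per (5.2.50)
```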
'
5.2.3
Comments Regarding Definitions
In this section a number of definitions are given. Some of these can be
confusing. There are, for example, several expressions related to Y_i. We
have

Y_i = η_i + ε_i,   the measured value of Y_i

E(Y_i) = η_i,   the expected value of Y_i, or the model or dependent variable

Ŷ_i = b_0 + b_1X_i,   the predicted value of Y_i for Model 3

Ȳ = (1/n) Σ Y_i,   the average value of Y_i for i = 1 to i = n

[Figure 5.1 Recommended location of measurements (at X_min and X_max) when the model is known to be a straight line in X.]

Also used is the symbol ε_i for measurement error or noise. This should not
be confused with the residual e_i, which is Y_i - Ŷ_i. The independent variable
X_i is assumed to be errorless and has an average value given by X̄ =
Σ X_i/n. All these terms are illustrated in Fig. 5.3. Modified definitions for
X̄ and Ȳ may be used in subsequent sections when σ_i² is not a constant.
Table 5.3 Summary of Estimators, Variances, and Covariances for Five Simple
Linear Models. Standard Assumptions of 1111111 Apply.

For each model the table gives the estimators, the general variances and
covariances, the variances and covariances for uniformly increasing X_i,
X_i = (i - 1 + c)δ, i = 1, 2, ..., n (n = 2, 3, ...), and the variances and
covariances for one half the measurements at X = cδ and the rest at
X = (n - 1 + c)δ (n = 2, 4, 6, ...).(a)

Model 1: η_i = β_0
  Estimator: b_0 = Ȳ
  Variance: V(b_0) = σ²/n, for the general case, for uniform spacing, and for
  measurements at the extremes.

Model 2: η_i = β_1X_i
  Estimator: b_1 = Σ Y_iX_i / Σ X_i²
  Variance: V(b_1) = σ² / Σ X_i²
  Uniform spacing: V(b_1) = 6σ²/[n(n + 1)(2n + 1)δ²] for c = 1;
    V(b_1) ≈ 3σ²/(n³δ²) for large n.
  Half at each extreme: V(b_1) = 2σ²/{n[c² + (n - 1 + c)²]δ²};
    V(b_1) ≈ 2σ²/(n³δ²) for n ≫ c.

Model 3: η_i = β_0 + β_1X_i
  Estimators: b_1 = Σ(X_i - X̄)Y_i / Σ(X_i - X̄)²,   b_0 = Ȳ - b_1X̄
  Variances and covariance: V(b_0) = σ² Σ X_i² / [n Σ(X_i - X̄)²],
    V(b_1) = σ² / Σ(X_i - X̄)²,   cov(b_0, b_1) = -X̄σ² / Σ(X_i - X̄)²
  Uniform spacing:(b) V(b_0) = 2(2n + 1)σ²/[n(n - 1)] for c = 1;
    V(b_1) = 12σ²/[n(n² - 1)δ²] ≈ 12σ²/(n³δ²) for large n;
    cov(b_0, b_1) = -6(n - 1 + 2c)σ²/[n(n² - 1)δ] = -6σ²/[n(n + 1)δ] for c = 1.
  Half at each extreme: V(b_0) = 2[c² + (n - 1 + c)²]σ²/[n(n - 1)²];
    V(b_1) = 4σ²/[n(n - 1)²δ²] ≈ 4σ²/(n³δ²) for n ≫ c;
    cov(b_0, b_1) = -2(n - 1 + 2c)σ²/[n(n - 1)²δ].

Model 4: η_i = β_0 + β_1(X_i - X̄)
  Estimators: b_0 = Ȳ,   b_1 = Σ(X_i - X̄)Y_i / Σ(X_i - X̄)²
  Variances and covariance: V(b_0) = σ²/n,   V(b_1) = σ² / Σ(X_i - X̄)²,
    cov(b_0, b_1) = 0
  Uniform spacing: V(b_1) = 12σ²/[n(n² - 1)δ²] ≈ 12σ²/(n³δ²) for large n.
  Half at each extreme: V(b_1) = 4σ²/[n(n - 1)²δ²] ≈ 4σ²/(n³δ²) for n ≫ c.
  In all cases cov(b_0, b_1) = 0.

Model 5: η_i = β_1X_{i1} + β_2X_{i2}
  Estimators: b_1 = (d_1c_22 - d_2c_12)/Δ,   b_2 = (d_2c_11 - d_1c_12)/Δ
  Variances and covariance: V(b_1) = c_22σ²/Δ,   V(b_2) = c_11σ²/Δ,
    cov(b_1, b_2) = -c_12σ²/Δ
  where c_jk = Σ X_{ij}X_{ik},   d_j = Σ Y_iX_{ij},   Δ = c_11c_22 - c_12².

(a) If the measurements are only at X = (n - 1 + c)δ and n ≫ c, the variances
are one half of those indicated.
(b) For uniform spacing, that is, X_i = (i - 1 + c)δ, i = 1, 2, ..., n, we have

X̄ = (1/n) Σ X_i = ½(n - 1 + 2c)δ,   Σ(X_i - X̄)² = (1/12)n(n² - 1)δ²

Σ X_i² = (δ²/6){(n + c)(n + c - 1)(2n + 2c - 1) - c(c - 1)(2c - 1)}
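The closed-form sums quoted in footnote b can be checked against direct summation; the short sketch below does so for a few assumed values of n, c, and δ.

```python
# Check the uniform-spacing sums of footnote b to Table 5.3:
#   Xbar = (1/2)(n - 1 + 2c) delta
#   sum (X_i - Xbar)^2 = n (n^2 - 1) delta^2 / 12
#   sum X_i^2 = (delta^2/6){(n+c)(n+c-1)(2n+2c-1) - c(c-1)(2c-1)}
# for X_i = (i - 1 + c) delta, i = 1, ..., n.

def check(n, c, delta):
    xs = [(i - 1 + c) * delta for i in range(1, n + 1)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs)
    assert abs(xbar - 0.5 * (n - 1 + 2 * c) * delta) < 1e-8
    assert abs(s2 - n * (n**2 - 1) * delta**2 / 12) < 1e-8
    sx2 = delta**2 / 6 * ((n + c) * (n + c - 1) * (2 * n + 2 * c - 1)
                          - c * (c - 1) * (2 * c - 1))
    assert abs(sum(x * x for x in xs) - sx2) < 1e-8

for n in (2, 5, 12):
    for c in (0.0, 1.0, 2.5):
        check(n, c, delta=0.7)
print("footnote b verified")
```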
5.3 MAXIMUM LIKELIHOOD (ML) ESTIMATION

Maximum likelihood estimates make use of whatever information we have
about the distribution of the observations. We illustrate ML estimation for
the case of additive errors, Y_i = η_i(X_i, β) + ε_i, and when the errors ε_i have
zero mean, are independent, are normal, and have known variances σ_i².
The X's are errorless and the parameters are nonrandom. These assumptions are designated 1111111. This information can be used to obtain
estimates of parameter variances.

The natural logarithm of the normal probability density for independent
measurements is given by

ln f(Y_1, ..., Y_n | β) = -½[n ln 2π + Σ ln σ_i²] - ½S_ML   (5.3.1)

where the "physical" parameters are only contained in

S_ML = Σ_{i=1}^{n} [Y_i - η_i]² σ_i^{-2}   (5.3.2)

The one- and two-parameter cases are considered briefly in this section.
It is pointed out that the ML estimators for Models 2 and 5 can be given in
a similar form to those given by OLS.

5.3.1 One-Parameter Cases

Consider the linear model η_i = β_1X_i (Model 2) and introduce this
expression in (5.3.2). The function ln f(Y_1, ..., Y_n | β_1) is maximized with
respect to β_1 by minimizing S_ML, since β_1 appears only in S_ML. Differentiating with respect to β_1, replacing β_1 by its estimator b_1, and setting the
derivative equal to zero yields the normal equation

Σ (Y_i - b_1X_i) X_i σ_i^{-2} = 0   (5.3.3)

which can be solved for b_1 to obtain (for Model 2)

b_1 = [Σ Y_iX_i σ_i^{-2}] / [Σ X_i² σ_i^{-2}]   (5.3.4)

Note that this expression reduces to exactly the same one as given by
(5.2.5) for OLS if σ_i² = σ², a constant. Also note that by defining

F_i = Y_i/σ_i,   Z_i = X_i/σ_i   (5.3.5)

(5.3.4) can be written as

b_1 = Σ F_iZ_i / Σ Z_i²   (5.3.6)

which is also similar to the OLS expression, (5.2.5); here F_i is analogous to
Y_i and Z_i to X_i. In terms of F_i and Z_i, S_ML is a sum of squares of terms
which have constant variance and has the same form as for OLS. Finally
note that the variance of F_i is unity.

From the analogies given above between Y_i and F_i, X_i and Z_i, and σ²
and unity, the variance of b_1 can be found from (5.2.10) to be

V(b_1) = (Σ Z_i²)^{-1}   (5.3.7)

For Model 1, η_i = β_0, the estimator b_0 and the variance of b_0 are found
by letting X_i = 1 in the above two equations,

b_0 = Ỹ,   Ỹ = (Σ Y_iσ_i^{-2})(Σ σ_i^{-2})^{-1}   (5.3.8a,b)

V(b_0) = [Σ σ_i^{-2}]^{-1}   (5.3.8c)

[Figure 5.3 Figure showing some terms used in Section 5.2: the measured
values Y_i, the residual e_i = Y_i - Ŷ_i, the true regression line η, and the
predicted line Ŷ.]
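As a concrete sketch of (5.3.4)-(5.3.7), the following Python fragment (the data are assumed, purely for illustration) forms the scaled variables of (5.3.5) and returns the Model 2 estimate and its variance.

```python
# ML estimate of beta_1 in Model 2 (eta_i = beta_1 X_i) with known,
# nonconstant error variances, via the scaled variables of (5.3.5):
#   F_i = Y_i / sigma_i,  Z_i = X_i / sigma_i,
#   b1 = sum(F Z) / sum(Z^2)    (5.3.6)
#   V(b1) = 1 / sum(Z^2)        (5.3.7)
# The data below are assumed for illustration.

def ml_model2(xs, ys, sigmas):
    zs = [x / s for x, s in zip(xs, sigmas)]
    fs = [y / s for y, s in zip(ys, sigmas)]
    szz = sum(z * z for z in zs)
    b1 = sum(f * z for f, z in zip(fs, zs)) / szz
    return b1, 1.0 / szz  # estimate and its variance

b1, var = ml_model2([1.0, 2.0, 3.0], [2.1, 3.9, 6.2], [0.1, 0.2, 0.3])
print(b1, var)
```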
5.3.2 Two-Parameter Cases

For the general model, Model 5, given by η_i = β_1X_{i1} + β_2X_{i2}, the estimators
for β_1 and β_2 and their variances can be obtained by letting

F_i = Y_i/σ_i,   Z_{i1} = X_{i1}/σ_i,   Z_{i2} = X_{i2}/σ_i   (5.3.9)

and thus (5.2.24) and (5.2.26) could be used for the estimators b_1 and b_2,
their variances, and covariance.

For Model 3, η_i = β_0 + β_1X_i, with assumptions 1111111, (5.3.9) can be
used to find

b_1 = Σ(X_i - X̃)Y_iσ_i^{-2} / Σ(X_i - X̃)²σ_i^{-2},   b_0 = Ỹ - b_1X̃   (5.3.10a,b)

X̃ = (Σ X_iσ_i^{-2})(Σ σ_i^{-2})^{-1},   Ỹ = (Σ Y_iσ_i^{-2})(Σ σ_i^{-2})^{-1}   (5.3.11a,b)

Note the new definition of X̃ given by (5.3.11a). The same definition of Ỹ
is given in (5.3.8b) and (5.3.11b). For constant σ², these definitions for X̃
and Ỹ reduce to those given in Section 5.2.

Example 5.3.1
Simple harmonic motion can be described by η_i = β_0 + β_1 sin t_i, where β_0 is a shift of
the axis and β_1 is the amplitude of the motion. Measurements and their standard
deviations vary as indicated in the following table.

i          1       2       3       4       5
t_i (°)    0       30      90      150     180
σ_i        0.01    0.05    0.1     0.05    0.01
Y_i        0.4926  0.9985  1.3547  0.9519  0.4996

(a) Estimate the parameters using ML. Let the standard assumptions apply
except that we do not assume that σ_i² equals a constant, σ².
(b) Find the standard errors for b_0 and b_1.

Solution
(a) For this example, the model is Model 3 and the estimators are given by (5.3.10)
and (5.3.11). Note that X_i = sin t_i. Some of the required detailed calculations are
given below.

i    X_i   X_i - X̃   (X_i - X̃)²σ_i^{-2}   σ_i^{-2}   X_iσ_i^{-2}   Y_iσ_i^{-2}   Y_iσ_i^{-2}(X_i - X̃)
1    0     -0.0239    5.723                10,000     0            4926.0       -117.847
2    0.5    0.4761    90.660               400        200          399.4         190.145
3    1      0.9761    95.273               100        100          135.47        132.229
4    0.5    0.4761    90.660               400        200          380.76        181.271
5    0     -0.0239    5.723                10,000     0            4996.0       -119.522
Sum                   288.039              20,900     500          10837.63      266.276

In addition to the sums indicated in the above table, X̃ and Ỹ are found from
(5.3.11) to be

X̃ = 500/20900 = 0.0239234,   Ỹ = 10837.63/20900 = 0.518547

Then from (5.3.10)

b_1 = 266.276/288.039 = 0.924449

b_0 = Ỹ - b_1X̃ = 0.518547 - 0.924449(0.0239234) = 0.496431

(b) The standard errors are found from the square roots of (5.3.12a,b),

s.e.(b_0) = {Σ X_k²σ_k^{-2} / [Σ σ_j^{-2} Σ(X_k - X̃)²σ_k^{-2}]}^{1/2} = [200/20900]^{1/2}[288.039]^{-1/2} = 0.0057639

s.e.(b_1) = [Σ(X_k - X̃)²σ_k^{-2}]^{-1/2} = (288.039)^{-1/2} = 0.05892

Least squares estimates of the parameters for this example are b_0 = 0.510329 and
b_1 = 0.872829. The b_0 value is outside the b_0 ± s.e.(b_0) interval found using maximum likelihood.
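The arithmetic of Example 5.3.1 can be reproduced directly from (5.3.10) and (5.3.11); the sketch below recomputes the weighted means and both estimates.

```python
# Reproduce the ML estimates of Example 5.3.1 (Model 3 with
# nonconstant variances): eta_i = beta_0 + beta_1 sin(t_i).
import math

t_deg = [0, 30, 90, 150, 180]
Y = [0.4926, 0.9985, 1.3547, 0.9519, 0.4996]
sig = [0.01, 0.05, 0.1, 0.05, 0.01]

X = [math.sin(math.radians(t)) for t in t_deg]
w = [1 / s**2 for s in sig]                      # sigma_i^{-2}
sw = sum(w)
x_t = sum(wi * xi for wi, xi in zip(w, X)) / sw  # X-tilde, (5.3.11a)
y_t = sum(wi * yi for wi, yi in zip(w, Y)) / sw  # Y-tilde, (5.3.11b)
b1 = (sum(wi * (xi - x_t) * yi for wi, xi, yi in zip(w, X, Y))
      / sum(wi * (xi - x_t) ** 2 for wi, xi in zip(w, X)))  # (5.3.10a)
b0 = y_t - b1 * x_t                              # (5.3.10b)
print(round(b1, 6), round(b0, 6))
```

The printed values agree with the text's b_1 = 0.924449 and b_0 = 0.496431 to rounding.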
5.3.3 Estimating σ² Using Maximum Likelihood

When the error variance is a constant, that is, σ_i² = σ², an estimator for σ²
can be obtained by differentiating (5.3.1) with respect to σ² and setting the
result equal to zero. The result is

-n/(2σ²) + (1/2σ⁴) Σ (Y_i - Ŷ_i)² = 0   (5.3.13)

or

σ̂² = n^{-1} Σ (Y_i - Ŷ_i)²   (5.3.14)

This is unfortunately a biased estimator for σ². For one parameter, the
denominator should be n - 1 to provide an unbiased estimator. For that
and other reasons use (5.2.21) to estimate σ² for one parameter and use
(5.2.32) for two parameters when the assumptions 1111011 are valid.
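The bias of (5.3.14) is easy to see in a small simulation; the sketch below (pure stdlib, with an assumed true mean and variance) compares the divisor n against n - 1 for the one-parameter Model 1.

```python
# Illustrate the bias of the ML variance estimator (5.3.14): for Model 1
# (eta = beta_0), dividing the residual sum of squares by n underestimates
# sigma^2 on average, while dividing by n - 1 does not.
import random

random.seed(1)
n, sigma2, trials = 5, 4.0, 20000
ml_sum = unbiased_sum = 0.0
for _ in range(trials):
    y = [random.gauss(10.0, sigma2 ** 0.5) for _ in range(n)]
    ybar = sum(y) / n               # b0 = Ybar for Model 1
    ss = sum((yi - ybar) ** 2 for yi in y)
    ml_sum += ss / n                # biased, Eq. (5.3.14)
    unbiased_sum += ss / (n - 1)    # unbiased divisor
print(ml_sum / trials, unbiased_sum / trials)
```

The ML average settles near σ²(n - 1)/n = 3.2 rather than the true 4.0.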
5.3.4 Maximum Likelihood Estimation Using Information from Prior
Experiments

After one set of data has been used to estimate the parameters, a second
set of data may become available. If the second set of observations is
independent of the first and parameter estimates based on all the data are
needed, then the first set of data can provide prior information for analysis
of the second set. A method is given below whereby the number of
calculations in simultaneously analyzing all the data can be reduced by
taking advantage of the results of the analysis of the first set of data.

For simplicity let us derive the method for one parameter. The ML
estimator for one set of data when the standard assumptions 1111111 are
valid is given by (5.3.6); assume that there are n_1 observations and write
(5.3.6) as

b_{·1} = (Σ_{i=1}^{n_1} F_iZ_i)(Σ_{i=1}^{n_1} Z_i²)^{-1}   (5.3.15)

where V_{b1} is the variance of b_{·1},

V_{b1} = V(b_{·1}) = (Σ_{i=1}^{n_1} Z_i²)^{-1}   (5.3.16)

Consider now a combined analysis of n = n_1 + n_2 observations. Then (5.3.6)
becomes

b_1 = [Σ_{i=n_1+1}^{n} F_iZ_i + b_{·1}V_{b1}^{-1}] / [Σ_{i=n_1+1}^{n} Z_i² + V_{b1}^{-1}]   (5.3.17a)

V(b_1) = [Σ_{i=n_1+1}^{n} Z_i² + V_{b1}^{-1}]^{-1}   (5.3.17b)

We point out that (5.3.17) uses only the previously calculated b_{·1} and V_{b1}
values; no other information regarding the first n_1 observations is needed
to calculate improved values of b and V. The same procedure can be used
for more than one parameter.
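The equivalence claimed for (5.3.17) can be demonstrated in a few lines; the sketch below (with assumed scaled data F_i, Z_i for two sets) checks that the sequential update matches a single combined analysis.

```python
# Sketch of Section 5.3.4: combining a previous estimate b.1 (with
# variance V1) and a new data set via (5.3.17) reproduces the estimate
# from analyzing all the data at once. The numbers are assumed.

def ml(zs, fs):
    # One-parameter ML estimate (5.3.6) and variance (5.3.7)
    szz = sum(z * z for z in zs)
    return sum(f * z for f, z in zip(fs, zs)) / szz, 1.0 / szz

# Z_i = X_i/sigma_i, F_i = Y_i/sigma_i for two data sets
z1, f1 = [1.0, 2.0, 3.0], [1.1, 1.9, 3.2]
z2, f2 = [1.5, 2.5], [1.4, 2.6]

b_1, V_1 = ml(z1, f1)                      # first set alone, (5.3.15)-(5.3.16)
num = sum(f * z for f, z in zip(f2, z2)) + b_1 / V_1
den = sum(z * z for z in z2) + 1.0 / V_1
b_seq, V_seq = num / den, 1.0 / den        # (5.3.17a,b)

b_all, V_all = ml(z1 + z2, f1 + f2)        # all data at once
assert abs(b_seq - b_all) < 1e-12 and abs(V_seq - V_all) < 1e-12
print(b_seq, V_seq)
```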
5.4 MAXIMUM A POSTERIORI (MAP) ESTIMATION
Therf'! a re several ways to introduce prior information. O ne o f these is
given in Section 5.3.4 a bove for M L estimation. In this method, information from previous tests is included in such a way that exactly the same
estimates a re o btained as if all the d ata were analyzed together. This M L
m ethod also assumed that the parameters were nonrandom.
Another way to include prior information utilizes the maximum a
posteriori ( MAP) method. T he M AP estimators a re based o n Bayes's
theorem a nd a re therefore called bayesian estimators. I n t he M AP method
the parameters either a re r andom o r a re conceived as being random.
Hence there a re two situations when M AP estimators might be used: ( I)
when the parameters a re r andom a nd (2) when there is subjective information. W hat is meant by random parameters is discussed further below.
In this section the s tandard a ssumptions o f additive, zero mean, uncorrelated, normal errors as well as known statistical parameters a nd nonstochastic independent variables a re considered to be .valid. Also, there is
i nformation a bout a p rior distribution o f values o f t he parameters ({J). We
assume this prior distribution to b e n ormal with known mean a nd variance. We assume throughout o ur experiment t hat the {J's a re c onstant, that
is, nonrandom. These assumptions are designated 11011110. ( In Chapter 6
where a more detailed set o f s tandard assumptions are given, two particular sets o f M AP assumptions considered are designated 111112 a nd
111113.)
5.4.1 Random Parameter Case

In the random parameter case the parameter for a particular experiment or
set of experiments is considered to be constant (or nonrandom). This may
be clarified by an example. A particular steel is occasionally produced by a
plant. The thermal conductivity is known to vary from batch to batch. The
long-run room-temperature average thermal conductivity (the parameter,
β, of interest) is 20 W/m·°C with the standard deviation among batch
averages being 0.1 W/m·°C. The distribution is normal. Then this information regarding the random nature of β from batch to batch is described
by the probability density of

f(β) = [(2π)^{1/2}(0.1)]^{-1} exp[-½((β - 20)/0.1)²]   (5.4.1)
The standard deviation of measurements Y for a given batch is known to
be 0.4. For a single normal measurement the probability of this measurement given the true conductivity β of the batch is

f(Y|β) = [(2π)^{1/2}(0.4)]^{-1} exp[-½((Y - β)/0.4)²]   (5.4.2)

Let us use Bayes's theorem in the form

f(β|Y) = f(Y|β)f(β)/f(Y)   (5.4.3)

where f(β|Y) is the posterior distribution of β given Y. It includes information both from a large number of batches, f(β), and from a given batch,
f(Y|β). If additional measurements Y_i are made, they are also considered to
be from this given batch.

Since the parameter β appears only in the numerator of (5.4.3) and since
it is convenient to take the logarithm of (5.4.3), we find that f(β|Y) is
maximized by minimizing

[(Y - β)/0.4]² + [(β - 20)/0.1]²

with respect to β.

Notice in this example that the conductivity of a batch chosen at
random is a random parameter. Once the batch is chosen, however, all our
specimens are from this batch and thus the expected value of each is the
same.

If we examine the conductivity as a function of temperature, instead of
having a single parameter corresponding to room temperature conductivity
we have a regression function containing a number of parameters. These
parameters vary from batch to batch but our estimates are estimates of the
specific values of this particular batch.

Let us now develop an estimator for the parameter β_1 in Model 2,
η_i = β_1X_i, β_1 being chosen at random from a given population. With the
assumptions mentioned above and that β_1 is independent of ε_i, we have

Y_i = β_1X_i + ε_i,   β_1 ~ N(μ_β, V_β)   (5.4.4a)

ε_i ~ N(0, σ_i²),   E(ε_iβ_1) = 0   (5.4.4b)

and thus the (prior) probability density of the random parameter β_1 is

f(β_1) = (2πV_β)^{-1/2} exp[-½(β_1 - μ_β)²/V_β]   (5.4.5)

and that of Y_1, ..., Y_n given β_1 is

f(Y_1, ..., Y_n|β_1) = {Π_{i=1}^{n} (2πσ_i²)^{-1/2}} exp[-½ Σ_{i=1}^{n} (Y_i - β_1X_i)²σ_i^{-2}]   (5.4.6)

Introducing (5.4.5) and (5.4.6) into (5.4.3) and then taking the logarithm of
f(β_1|Y_1, ..., Y_n) gives

ln[f(β_1|Y_1, ..., Y_n)] = -½[(n + 1)ln 2π + ln V_β + Σ ln σ_i²
    + (β_1 - μ_β)²/V_β + Σ(Y_i - β_1X_i)²σ_i^{-2}] - ln f(Y_1, ..., Y_n)   (5.4.7)

Note that f(Y_1, ..., Y_n) is not a function of the parameter β_1.

In (5.4.7) we are effectively considering the joint probability of each
random choice of (both) β_1 and the subsequent collection of observations.
We concentrate our attention on those possible choices which include the
observations we actually obtained and hunt among them for that β_1 for
which the probability is greatest. This β_1 we use as an estimate of the
particular value for the batch chosen. Note that we are dealing with a
random variable, β_1, a collection of possible values, and a constant β_1, the
value actually chosen, that is, the parameter for the particular batch used
in the experiment.

Taking the derivative of (5.4.7) with respect to β_1 yields the normal
equation,

Σ(Y_i - b_1X_i)X_iσ_i^{-2} - (b_1 - μ_β)V_β^{-1} = 0   (5.4.8)
which, after the addition and subtraction of μ_βZ_i within the summation,
can be written as

Σ(F_i - μ_βZ_i)Z_i = (b_1 - μ_β)(Σ Z_i² + V_β^{-1})   (5.4.9)

where

F_i = Y_i/σ_i,   Z_i = X_i/σ_i   (5.4.10a,b)

Solving (5.4.9) for b_1 then yields

b_1 = μ_β + [Σ(F_i - μ_βZ_i)Z_i]/[Σ Z_i² + V_β^{-1}] = [Σ F_iZ_i + μ_βV_β^{-1}]/[Σ Z_i² + V_β^{-1}]   (5.4.11a,b)

The expected value of b_1 given by (5.4.11) is μ_β. Hence the MAP
estimator for b_1 is biased since it is not β_1, the value for the particular
batch.

The variance of b_1 is affected not only by the errors in the measurements, Y_i, but by the variability of β_1 from batch to batch. For measurements involving a particular batch we are interested in the variability of b_1
compared to the value of the batch (β_1). Hence we are interested in the
variance of the difference, b_1 - β_1. Using (5.4.11b) we can show that

b_1 - β_1 = [Σ(F_i - β_1Z_i)Z_i - (β_1 - μ_β)V_β^{-1}]/[Σ Z_i² + V_β^{-1}]   (5.4.12)

Then the variance of the difference, b_1 - β_1, is given by

V(b_1 - β_1) = [Σ Z_i² + V_β^{-1}]^{-1}   (5.4.13)

where V(β_1) = V_β is used. Notice that as more observations are taken, the
relative effect of the prior information regarding the random parameter
diminishes. As the number of measurements becomes arbitrarily large,
Σ Z_i² → ∞ and thus V(b_1 - β_1) → 0. This means that the variability of
estimators obtained using (5.4.11) approaches zero for a particular batch if
a very large number of measurements are taken for this batch.

Equations for the two-parameter cases involving Model 5 are given in
Problem 5.21.
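The behavior described around (5.4.11) and (5.4.13) can be sketched numerically; the fragment below (with assumed data and an assumed prior μ_β, V_β) shows the prior's influence fading as observations accumulate.

```python
# Sketch of the MAP estimator (5.4.11b) for Model 2 and the variance
# (5.4.13): as observations accumulate, the prior (mu_beta, V_beta)
# stops mattering and V(b1 - beta1) shrinks. Data are assumed.

def map_b1(xs, ys, sigmas, mu, V):
    zs = [x / s for x, s in zip(xs, sigmas)]
    fs = [y / s for y, s in zip(ys, sigmas)]
    den = sum(z * z for z in zs) + 1.0 / V
    b1 = (sum(f * z for f, z in zip(fs, zs)) + mu / V) / den  # (5.4.11b)
    return b1, 1.0 / den                                      # (5.4.13)

xs, ys, sig = [1.0, 2.0, 4.0], [2.2, 3.8, 8.1], [0.2, 0.2, 0.4]
b_few, v_few = map_b1(xs, ys, sig, mu=1.5, V=0.01)
# 50 replications of the same design: the estimate approaches pure ML
b_many, v_many = map_b1(xs * 50, ys * 50, sig * 50, mu=1.5, V=0.01)
print(b_few, b_many)
```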
5.4.2 Subjective Prior Information

Some authors such as Box and Tiao [3] regard the prior probability
distribution as a mathematical expression of degree of belief with respect
to a certain proposition. In this context the concept of developing probabilities utilizing repeated observations is regarded merely as a means of
calibrating a subjective attitude. In this view, to say that one thinks the
probability is one half that candidate A will be elected president means
that we have the same belief in the proposition "candidate A will be elected
president" as we would in the proposition "a toss of a fair coin will produce
a head." We need not imagine an infinite series of elections in half of
which A is elected and in half of which he is defeated.

This view can also be applied to the estimation of a physical property.
The following is an example given in reference 3. Two physicists, A and B,
are concerned with obtaining more accurate estimates of some physical
constant β, known only approximately. Imagine physicist A is very familiar
with previous measurements of β and thus can make a moderately good
guess of the true β value; let his prior opinion about β be approximately
represented as a normal density centered at 900 and having a standard
deviation of 20,

f_A(β) = [(2π)^{1/2}(20)]^{-1} exp[-½((β - 900)/20)²]   (5.4.14a)

This implies that A believes that the chance of β being outside the interval
of 860 to 940 is only about one in 20. By contrast, suppose that physicist B
has little experience regarding values of β and that his rather vague prior
beliefs can be represented by a normal density with mean of 800 and
standard deviation of 200,

f_B(β) = [(2π)^{1/2}(200)]^{-1} exp[-½((β - 800)/200)²]   (5.4.14b)

We can see that B is much less certain of the true β value because any
value between 400 and 1200 is considered plausible.

Suppose that one of the physicists performs an experiment and an
observation of β is made. Further assume that this measurement contains
an additive, zero mean, normal error with a standard deviation of 40. The
probability density of Y is the same as given by (5.4.2) with the 0.4
replaced by 40.

To make the results more general, let us use the notation f(β_1|μ) for the
prior subjective information for β_1; for a normal distribution we have

f(β_1|μ) = (2πσ_μ²)^{-1/2} exp[-½(β_1 - μ)²/σ_μ²]   (5.4.15)
The conditional probability density f(Y_1, ..., Y_n|β_1) is given by (5.4.6).
For this case the use of Bayes's theorem leads to maximizing the natural
logarithm of the product f(β_1|μ)f(Y_1, ..., Y_n|β_1), or

ln[f(β_1|μ)f(Y_1, ..., Y_n|β_1)] = -½[(n + 1)ln 2π + ln σ_μ² + Σ ln σ_i²
    + (β_1 - μ)²/σ_μ² + Σ(Y_i - β_1X_i)²σ_i^{-2}]   (5.4.16)

which is quite similar to (5.4.7). The estimate for β_1 is

b_1 = μ + [Σ(F_i - μZ_i)Z_i]/[Σ Z_i² + σ_μ^{-2}] = [Σ F_iZ_i + μσ_μ^{-2}]/[Σ Z_i² + σ_μ^{-2}]   (5.4.17a,b)

which is identical to (5.4.11a,b), with μ being μ_β and V_β being σ_μ². It is also
very similar to (5.3.17a,b), which give ML estimates for a combined
analysis of two sets of observations.

As for the random parameter case, the expected value of b_1 and the
variance of b_1 - β_1 are

E(b_1) = μ,   V(b_1 - β_1) = [Σ Z_i² + σ_μ^{-2}]^{-1}   (5.4.18a,b)

Note that though the estimators given by (5.4.11) and (5.4.17) are identical
in form, the meanings attached to the quantities μ_β, V_β and μ, σ_μ² are
different.

Let us return to the example of the two physicists. For one measurement
Y = 850 the estimator b and its variance for physicist A are (since X_i = 1 for
η = β)

b_A = [850(40)^{-2} + 900(20)^{-2}]/[(40)^{-2} + (20)^{-2}] = 890

V(b_A) = [(40)^{-2} + (20)^{-2}]^{-1} = 320 = (17.9)²

Repeating the same calculation for physicist B gives b_B = 848 and V(b_B) =
1538. Note that though the observation was the same for both physicists,
the different normal prior distributions resulted in physicist A having the
posterior distribution of n(890, 17.9²) and physicist B having n(848, 39.2²).
Hence physicists A and B have different estimates and different standard
deviations of 17.9 and 39.2, respectively.

We see that after the single observation the ideas of A and B about β
(represented by the posterior distributions) are much closer than before
using the observation. Note that A did not learn as much from the
experiment as did B. The reason is that for A the uncertainty in the
measurement indicated by σ = 40 was larger than that indicated by the
prior standard deviation, σ_μ = 20. In contrast, for B the uncertainty in the
measurement was considerably smaller than that of B's prior (σ_μ = 200).
For A the greater influence on the posterior distribution is the prior
whereas for B the measurement has greater effect. As, however, more and
more Y_i measurements are used for estimating β, (5.4.17) and (5.4.18)
indicate that the prior information has less and less effect upon the
estimate and its standard deviation.

5.4.3 Comparison of Viewpoints

Three different types of prior information have been discussed. First, in
Section 5.3.4 prior information from actual experiments is combined with
that from a new set of experiments. Only maximum likelihood need be
used and the ideas are relatively straightforward. In the MAP cases, which
use Bayes's theorem, the ideas are less clear and have been the subject of
controversy. In the first case, the parameters are random, as in the case of
the thermal conductivities of different batches of steel in the example
above. In the second MAP case the parameters are not random but our
prior belief can be incorporated into a subjective prior.

For each viewpoint the form of the parameter estimators is identical.
The only differences are in symbols and meanings of the terms for the
prior mean and variance. In each case, the variance of b - β gives the
same mathematical expression.

Problem 5.21 gives the estimators for the two-parameter model (Model
5).

Example 5.4.1
A scientist has measured a certain physical phenomenon and obtained the data
given below. From knowledge of his measuring device, the variances of the
measurements are also given. From his previous experience he feels that he can
give a prior normal distribution with a mean of 1.01 and a variance of 0.001 for the
parameter.

i      1      2      3      4
X_i    0.01   0.1    1      10
Y_i    0.02   0.12   0.8    13
σ_i    0.01   0.05   0.1    2
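The two-physicists calculation above can be reproduced in a few lines; the sketch below applies (5.4.17b) and (5.4.18b) with X_i = 1 (the function name is illustrative).

```python
# Reproduce the two-physicists posteriors via (5.4.17b) and (5.4.18b):
# one observation Y = 850 with sigma = 40, prior N(900, 20^2) for
# physicist A and N(800, 200^2) for physicist B.

def posterior(y, sigma, mu, sigma_mu):
    den = 1 / sigma**2 + 1 / sigma_mu**2
    b = (y / sigma**2 + mu / sigma_mu**2) / den  # (5.4.17b) with X_i = 1
    return b, 1 / den                            # (5.4.18b)

bA, vA = posterior(850, 40, 900, 20)
bB, vB = posterior(850, 40, 800, 200)
print(bA, vA)              # near 890 and 320 = 17.9^2
print(round(bB, 1), round(vB, 1))
```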
The regression function is η_i = β_1X_i, and the assumptions regarding the data
are that the errors ε_i are additive, zero mean, normal, and uncorrelated
(E(ε_iε_j) = 0 for i ≠ j), that V(X_i) = 0, and that the σ_i² values are known.
Estimate β_1 using (a) OLS, (b) ML, and (c) MAP estimators. Also find the
variance of the estimate in each case.

Solution
The assumptions given above can be designated 11011110. Various sets of assumptions are used in the different estimation methods.

(a) The OLS estimator does not use any statistical assumption. Using (5.2.5) the
estimate is

b_{1,OLS} = [Σ Y_iX_i][Σ X_i²]^{-1}
= [0.01(0.02) + 0.1(0.12) + 1(0.8) + 10(13)]/[0.0001 + 0.01 + 1 + 100]
= 1.2950

The calculation of the variance of b_{1,OLS} does require some assumptions; we use
those designated 110111. With the nonconstant σ_i², (5.2.10) is not valid for finding
the variance. Instead the reader should derive

V(b_{1,OLS}) = [Σ X_i²σ_i²][Σ X_i²]^{-2}
= [0.0001(0.01)² + 0.01(0.05)² + 1(0.1)² + (100)(4)]/(101.0101)²
= 0.0392

(b) For ML estimation the assumptions needed are those given above. Prior
information is not used. From (5.3.4) and (5.3.7) we find

b_{1,ML} = [Σ Y_iX_iσ_i^{-2}]/[Σ X_i²σ_i^{-2}]
= [0.02(0.01)(0.01)^{-2} + ... + 13(10)(2)^{-2}]/[(0.01)²(0.01)^{-2} + (0.1)²(0.05)^{-2} + 1²(0.1)^{-2} + 10²(2)^{-2}]
= 119.3/130 = 0.91769

V(b_{1,ML}) = [Σ X_i²σ_i^{-2}]^{-1} = (130)^{-1} = 0.00769

(c) For MAP estimation the subjective prior information is included. Using the
assumptions given above permits the use of (5.4.17b) and (5.4.18b) to get

b_{1,MAP} = [119.3 + 1.01(0.001)^{-1}]/[130 + (0.001)^{-1}] = 0.99938

V(b_{1,MAP}) = [Σ Z_i² + σ_μ^{-2}]^{-1} = (1130)^{-1} = 0.000885

For the OLS estimation no statistical assumptions are used; this implies that no
information is used regarding the errors. Maximum likelihood estimation uses
information regarding the measurement errors. MAP estimation uses the prior
information regarding the parameter in addition to the information used in ML
estimation. This suggests that the parameter variance for ML would be less than
that of OLS and that of MAP would be the smallest. This is indeed what occurs in
this example. However, if many additional measurements are given, the effect of
the prior information is to reduce the disparity in values given by ML and MAP. If
the errors do not have constant variance, the OLS values could be different from
those given by ML and MAP even for a large number of observations.
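The three estimates of Example 5.4.1 can be reproduced directly; the sketch below computes the OLS, ML, and MAP values from the same four observations.

```python
# Reproduce Example 5.4.1: estimate beta_1 in eta_i = beta_1 X_i by
# OLS, ML, and MAP from the same four observations.

X = [0.01, 0.1, 1.0, 10.0]
Y = [0.02, 0.12, 0.8, 13.0]
sig = [0.01, 0.05, 0.1, 2.0]
mu, V_prior = 1.01, 0.001          # subjective prior mean and variance

b_ols = sum(y * x for x, y in zip(X, Y)) / sum(x * x for x in X)

w = [1 / s**2 for s in sig]        # sigma_i^{-2}
num_ml = sum(wi * y * x for wi, x, y in zip(w, X, Y))   # = 119.3
den_ml = sum(wi * x * x for wi, x in zip(w, X))         # = 130
b_ml = num_ml / den_ml
v_ml = 1 / den_ml

b_map = (num_ml + mu / V_prior) / (den_ml + 1 / V_prior)  # (5.4.17b)
v_map = 1 / (den_ml + 1 / V_prior)                        # (5.4.18b)

print(round(b_ols, 4), round(b_ml, 5), round(b_map, 5))
```

The printed values agree with the worked example: 1.2950, 0.91769, and 0.99938.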
5.5 MULTIPLE DATA POINTS

One way to gain insight into the assumption of constant error variance
(that is, σ_i² = σ²) is to use repeated measurements. For Models 2, 3, and 4,
this means to have more than one measurement of Y at each X_i. For
Model 5 repeated measurements occur for more than one Y_i value at each
combination of X_{i1}, X_{i2}. Repeated measurements are not always possible to
obtain, but whenever possible they should be obtained for each new
problem until the nature of the dependence of σ_i² on i is understood.
Furthermore, multiple data points could be useful in investigating the
validity of other assumptions such as those of zero mean, uncorrelated,
and normal errors.
In some cases repeated measurements can be simply obtained by investigating another specimen at the "same" conditions. In other cases,
repeated measurements can be obtained by using several sensors attached
to the same specimen. An example of the latter is for temperature measurements in solids and fluids; the thermocouples (if they are used) might be
all placed to measure the same temperature. The same could be true for
other sensors as well.
It is important to distinguish between repeated measurements and
taking repeated readings of the same measurement: a failure to do so may
lead to inefficient design of experiments and to erroneous statements
regarding accuracy of the parameters. The difference between repeated
measurements and those that are essentially repeated readings can be
illustrated by an example involving the temperature history of a solid
copper block that is initially hot and then allowed to cool in open air.
Several thermocouples are attached to it. Because of the high thermal
conductivity of the copper the temperature of the block is quite uniform
throughout at any given time. The temperature of the block gradually
decreases with time, however.
Consider first a given thermocouple. At any time the thermocouple
would yield a temperature measurement which is in error owing to a
number of different factors. Perhaps the largest factor is that due to
calibration errors. Over the whole calibration temperature range the
average error is nearly zero, but at most temperatures the calibration error
is not zero. Hence if several temperature measurements are made with only
a short time interval between them, the "same" calibration errors would be
in each measurement. Very nearly the same measurements would be
obtained, so these could be considered repeated readings of the same
measurement. These repeated readings may contain random components,
but the variance would be small compared to the calibration error.
A repeated measurement of the temperature at a specified time is more
appropriately given by another thermocouple embedded in the specimen. It
too would have a calibration error, but the error would be independent of
that of the first one (provided the calibrations are independently made for
each sensor).

If a measurement is taken at some later time when the temperature has
dropped considerably, the calibration error in the temperature measurement will be nearly independent of the early measurements for the same
sensor.
It is also possible to obtain repeated measurements involving thermocouples (or other sensors) using the same sensor. This would occur in the
above example if the calibration were very good and the associated
variance were small compared to fluctuations in the readings due to
electronic noise. For example, it might be that unbiased measurements of
the temperature of a stirred water-ice mixture would produce values of
-0.11, -0.06, -0.01, 0.03, ..., 0.05°C when the correct value is 0°C. The
same type of random measurements might be produced for small or large
time spacing between the measurements. In this case the errors are random
with zero mean. These measurements can be considered repeated values
even if the "same" specimen and sensor are used. The above examples
illustrate that it is necessary to be careful to distinguish between repeated
measurements and repeated readings.
5.5.1 Sum of Squares

The case of ordinary least squares is first considered. One can always
number the observations so that we can write

S = Σ_{i=1}^{n} (Y_i - η_i)²   (5.5.1)

if there are any repeated values; the estimators given in Section 5.2 still
apply. Some saving in effort, however, can sometimes be achieved by
denoting the observations Y_{ij} and the regression function η_i. There might
5.5 M ULTIPLE DATA P OINI'S
b e m l measurements of Y a t X I' m2 measurements a t X 2, . .. , and m, a t X,.
Typically the Y values will be designated Ylr for location X; with j = I,
2, . .. ,m;. T hen (5.5.1) c an be written
,
S=
,
WIj
LL
( Yij1Ji
L
where
m j=n
(5.5.2)
jI
iI jI
Let us now derive another expression for .S t hat is frequently easier to
use than (5.5.2). I t applies equally well for both linear a nd n onlinear cases
a nd shows that minimizing S n eed involve only means of the Yy's for each
i. C onsider first the identity,
(5.S.3)
where Y/ a nd a nother mean (to be used later) are
(S.S.4)
Squaring a nd summing (S.5.3) over i a ndj gives
+ 2 L ( Yij
Y;)( Y;1J;)
(S.5.5a)
I ,j
(S.5.Sb)
T he crossproduct sum in (S.S.Sa) is zero because the summation on j is
e qual to zero. Note that the first summation in (S.S.5b) is n ot a function of
the parameters. Hence for linear a nd nonlinear p arameter estimation problems with repeated measurements the same parameters will be found if we
start with the function
(5.5.6)
rather than (5.5.2). Note that (5.S.6) requires less computation, however. I f
the measurement errors are independent, b ut h ave variances dependent
only o n i , maximum likelihood estimation (with the assumptions 1111111)
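The decomposition (5.5.5) is easy to verify numerically; the sketch below (with assumed repeated measurements and assumed trial values of η_i) confirms that the full sum of squares splits into a pure-error part plus the means part (5.5.6).

```python
# Verify the decomposition (5.5.5): with repeated measurements the full
# sum of squares (5.5.2) equals the pure-error scatter about the group
# means plus the weighted means part (5.5.6), so only the means matter
# for estimation. Data and the trial eta_i values are assumed.

Ys = {0.0: [1.2, 0.8, 1.1], 1.0: [2.9, 3.3], 2.0: [5.1, 4.7, 5.0, 5.2]}
eta = {0.0: 1.0, 1.0: 3.0, 2.0: 5.0}   # eta_i at each X_i for trial betas

S_full = sum((y - eta[x]) ** 2 for x, ys in Ys.items() for y in ys)

pure_error = 0.0
S_means = 0.0
for x, ys in Ys.items():
    m = len(ys)
    ybar = sum(ys) / m
    pure_error += sum((y - ybar) ** 2 for y in ys)
    S_means += m * (ybar - eta[x]) ** 2          # term of (5.5.6)

assert abs(S_full - (pure_error + S_means)) < 1e-12
print(S_full, pure_error, S_means)
```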
can be performed by minimizing

S_1 = Σ_{i=1}^{r} (F_i - H_i)²   (5.5.7)

where

F_i = m_i^{1/2}Ȳ_i/σ_i,   H_i = m_i^{1/2}η_i/σ_i   (5.5.8)

When estimating parameters using repeated measurements, it is necessary
that r > p, where p is the number of parameters. In Model 3, for example,
estimates of β_0 and β_1 would require measurements at no less than two
different X_i values regardless of how large r is.

5.5.2 Parameter Estimates

Parameters can be estimated by minimizing (5.5.7) for various models
given in this chapter. Economy in obtaining estimators can be obtained by
utilizing previous results. Consider first Model 2 (η_i = β_1X_i) and ML
estimation with the assumptions 1111111. Then b_1 is given by (5.3.6) with
F_i as defined by (5.5.8) and Z_i by

Z_i = m_i^{1/2}X_i/σ_i   (5.5.9)

The variance of b_1 is given by (5.3.7) with Z_i defined by (5.5.9).

For Model 5, given by η_i = β_1X_{i1} + β_2X_{i2}, the estimators b_1 and b_2, their
variances, and covariance can be obtained from (5.2.24) and (5.2.26) by
letting

Z_{i1} = m_i^{1/2}X_{i1}/σ_i,   Z_{i2} = m_i^{1/2}X_{i2}/σ_i   (5.5.10c,d)

with F_i as in (5.5.8). For Model 3, η_i = β_0 + β_1X_i, the ML results can be
obtained from the above procedure more simply from (5.3.10)-(5.3.12) by
replacing σ_i² by σ_i²/m_i and Y_i by Ȳ_i.

The number of terms related to Y and η has increased in this section. In
addition to the observed value Y_{ij}, there is the value Ȳ_i, which is the
average of the Y_{ij} values at a given X_i. Ŷ_i is the predicted regression value
at X_i; η_i is the actual regression value at X_i, that is, by definition E(Y_{ij})
and thus the expected value of Ȳ_i and of Ŷ_i also; and Ȳ is the weighted
average of the Y_{ij} values over all the X_i values. These symbols are
illustrated by Fig. 5.4.

[Figure 5.4 Relationships among observations, etc., for repeated measurements:
at each X_i the figure shows the individual observations Y_{ij}, their average Ȳ_i, the
predicted regression value Ŷ_i at X_i, and the expected value η_i at X_i.]

Example 5.5.1
Four measurements are made for both X_1 = 0 and X_2 = 80 with the same errors ε_i as
in Example 5.2.4 except that the fifth error is not used. Then the Y_{1j} measurements at
X_1 are 0.258, 0.966, 2.453, and 1.963, whereas at X_2 = 80 the Y_{2j} are 9.418, 10.792,
8.626, and 8.778. The assumptions of additive, zero mean, constant variance, uncorrelated, normal errors, and errorless X_i are valid. There is no prior information and
σ² is unknown.

(a) Using expressions developed in this section, estimate the parameters β_0 and
β_1 in Model 3.
(b) Find the estimated standard errors of b_0 and b_1.

Solution
(a) With the assumptions given, 11111011, the estimates can be obtained using OLS or ML. The simplest expressions to use are those given by (5.3.10,12) by replacing σᵢ² by σ²/mᵢ and Yᵢ by Ȳᵢ. Since σᵢ is a constant, (5.3.10) and (5.3.11) can be written

    b₁ = Σᵢ Ȳᵢmᵢ(Xᵢ − X̄) / Σᵢ mᵢ(Xᵢ − X̄)²    (a)

    b₀ = Y̿ − b₁X̄    (b)

where

    X̄ = (1/n) Σᵢ mᵢXᵢ,    Y̿ = (1/n) Σᵢ mᵢȲᵢ    (c)

In the above equations r = 2, m₁ = m₂ = 4, X₁ = 0, and X₂ = 80;

    X̄ = (1/8)[0(4) + 80(4)] = 40

    Y̿ = ½(1.4100 + 9.4035) = 5.40675

Then using the expression (a) for b₁, we obtain

    b₁ = [1.41(4)(−40) + 9.4035(4)(40)] / [4(1600) + 4(1600)] = 0.09991875

and from (b), b₀ = 5.40675 − 0.09991875(40) = 1.41.

Since there are two Xᵢ values and two parameters, the predicted line passes through Ȳ₁ and Ȳ₂. Then the minimum sum of squares resulting from (5.5.2) is the first term on the right side of (5.5.5),

    S_min = Σᵢ Σⱼ (Yᵢⱼ − Ȳᵢ)² = 2.9179 + 2.9239 = 5.8418

so that the estimated variance of the errors is

    s² = S_min/(n − 2) = 5.8418/6 = 0.9736

and the estimated standard error is s = 0.9867.

(b) The expressions for the estimated standard errors can be obtained from (5.3.12) by replacing σᵢ² by s²/mᵢ to get

    est. s.e.(b₁) = s[Σₖ mₖ(Xₖ − X̄)²]^{−1/2}    (d)

    est. s.e.(b₀) = s[Σₖ mₖXₖ² / (n Σₖ mₖ(Xₖ − X̄)²)]^{1/2}    (e)

Then using (d) and (e),

    est. s.e.(b₁) = 0.9867[2(1600)(4)]^{−1/2} = 0.00872

    est. s.e.(b₀) = 0.9867[6400(4)/8 ÷ 2(1600)(4)]^{1/2} = 0.4933

Though the value of b₀ is less accurate than that given in Example 5.2.4, the variances are smaller in this example than in Example 5.2.4. These estimated variances corroborate the theoretical result that smaller estimated variances are generally obtained for Models 3 and 4 by concentrating the measurements at the minimum and maximum Xᵢ values.

5.6 COEFFICIENT OF MULTIPLE DETERMINATION (R²)

In this section the sums of squares are compared for two different models applied to the same data. Ordinary least squares is used as the estimation procedure. The analysis will start in sufficient generality to permit the models to be linear or nonlinear in the parameters. Later the results are specialized to Models 1 and 4. In the following discussion we consider two models, designated A and B. Frequently Model B has the same functional form and parameters as Model A except that there is an additional parameter in Model B. Many authors restrict the meaning of R² to the case where Model A is Model 1.

Let ᴬŶᵢ be the predicted value of Yᵢ for Model A and ᴮŶᵢ for Model B. We start with the identity

    Yᵢ − ᴬŶᵢ = (Yᵢ − ᴮŶᵢ) + (ᴮŶᵢ − ᴬŶᵢ)    (5.6.1)

which can be also written as

    ᴬeᵢ = ᴮeᵢ + (ᴮŶᵢ − ᴬŶᵢ)    (5.6.2)

for which the residuals for Models A and B are defined by

    ᴬeᵢ = Yᵢ − ᴬŶᵢ    (5.6.3)

and ᴮeᵢ = Yᵢ − ᴮŶᵢ. Let us square and sum (5.6.2) over i to get

    Σᵢ ᴬeᵢ² = Σᵢ ᴮeᵢ² + Σᵢ (ᴮŶᵢ − ᴬŶᵢ)² + 2 Σᵢ ᴮeᵢ(ᴮŶᵢ − ᴬŶᵢ)    (5.6.4a)

    SST  =  SSE  +  SSR  +  2SC    (5.6.4b)
Each term in (5.6.4b) corresponds to the term in (5.6.4a) directly above. Note that SST is the minimum sum of squares for Model A and SSE is the minimum sum of squares for Model B. Let us specify Models A and B so that

    SC = 0    (5.6.5)

which would be always true if Model A could be obtained from Model B by making a certain parameter in Model B equal to zero.

Divide (5.6.4) by the left side and rearrange to the form

    1 − SSE/SST = (SSR + 2SC)/SST    (5.6.6)

where R² is called the coefficient of multiple determination and is defined by

    R² = 1 − SSE/SST    (5.6.7)

Because of condition (5.6.5), an examination of (5.6.6) reveals that 0 ≤ R² ≤ 1, where R² ≈ 0 corresponds to both models being nearly as effective and R² ≈ 1 corresponds to Model B being much better than Model A. Then R² can be used to say something about the improvement in the "goodness of fit," R² ≈ 0 being the poorest and R² ≈ 1 being the best improvement in using Model B rather than Model A.

For nonlinear problems, the parameter estimates and sums of squares can be found separately for Models A and B and then R² would be evaluated using (5.6.6). For the simple linear models given next a simplified form of (5.6.7) is frequently used.

A classical case considered in connection with R² is for Models 1 and 4 being A and B, respectively,

    Yᵢ = β₀ + εᵢ    (5.6.8)

    Yᵢ = β₀ + β₁(Xᵢ − X̄) + εᵢ    (5.6.9)

The term SC in (5.6.4b) and (5.6.7) is then

    SC = Σᵢ ᴮeᵢ(ᴮŶᵢ − Ȳ) = 0    (5.6.10)

where the normal equation for Model 4 and parameter β₁ was used. Hence R² can be calculated from (5.6.7), which becomes

    R² = b₁² Σᵢ(Xᵢ − X̄)² / Σᵢ(Yᵢ − Ȳ)² = b₁ Σᵢ(Xᵢ − X̄)Yᵢ / Σᵢ(Yᵢ − Ȳ)²    (5.6.11)

where Ȳ is associated with Model A (Model 1 in this case) and Ŷᵢ with Model B (i.e., 4). If Ŷᵢ = Yᵢ, that is, the prediction is perfect, then R² = 1. If Ŷᵢ = Ȳ, that is, b₁ = 0 or the model Y = β₀ + ε alone fits the data, R² = 0. Thus R² is a measure of the usefulness of the term β₁(Xᵢ − X̄) in the model, it being not needed for R² ≈ 0 and needed for R² ≈ 1. R² as given by (5.6.11) is the correlation coefficient of (2.6.17).

Example 5.6.1

Investigate the goodness of fit as indicated by R² for Example 5.2.4.

Solution

Using (5.6.11) and values given in Example 5.2.4 gives

    R² = (0.101988)² Σᵢ(Xᵢ − X̄)² / Σᵢ(Yᵢ − 5.366)² = 0.9131

which is nearly unity, indicating that the β₁(Xᵢ − X̄) term may be needed in the model.
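The agreement between the defining form (5.6.7) and the specialized form (5.6.11) for the straight-line case can be demonstrated numerically. The sketch below uses hypothetical data (not from the text) and plain OLS.

```python
# Sketch: the two forms of R^2, eq. (5.6.7) as 1 - SSE/SST and the
# straight-line form (5.6.11), agree when Model A is Model 1 (Y = b0)
# and Model B is Model 4 (Y = b0 + b1*(X - Xbar)). Data are hypothetical.

xs = [0.0, 10.0, 20.0, 30.0, 40.0]
ys = [1.2, 2.1, 2.8, 4.2, 4.9]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n                      # Model 1 prediction is Ybar
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx

yhat = [ybar + b1 * (x - xbar) for x in xs]          # Model 4 predictions
sst = sum((y - ybar) ** 2 for y in ys)               # min SS for Model A
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))  # min SS for Model B

r2_def = 1.0 - sse / sst               # eq. (5.6.7)
r2_line = b1 ** 2 * sxx / sst          # eq. (5.6.11)
print(r2_def, r2_line)
```

The two values are identical because the cross term SC of (5.6.10) vanishes by the normal equation for β₁.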
5.7 ANALYSIS OF VARIANCE ABOUT THE SAMPLE MEAN

The subject of analysis of variance is a broad one and contains many different facets. In this section only certain aspects of the analysis of variance (ANOVA) are considered.

The preceding section employed no statistical information and thus no probabilistic statements could be made. This section uses many of the standard assumptions. Assume that the errors are additive, uncorrelated, and normal and have zero mean and constant variance. The σ² value is unknown and there is no prior information regarding the constant parameters. The Xᵢ values are nonstochastic (i.e., errorless). These assumptions are designated 11111011.

For Models 1 and 4 given by (5.6.8) and (5.6.9), equation (5.6.4a) can be written

    Σᵢ(Yᵢ − Ȳ)² = Σᵢ(Yᵢ − Ŷᵢ)² + Σᵢ(Ŷᵢ − Ȳ)²    (5.7.1a)

    SST  =  SSE  +  SSR    (5.7.1b)

Ȳ is for Model A (or 1) and Ŷᵢ is for Model B (or 4). The sum of squares on the left side of (5.7.1a) is sometimes called the total sum of squares and designated SST. The first term on the right of (5.7.1a) is called the error sum of squares, SSE. The remaining term in (5.7.1a) is called the regression sum of squares, SSR. It can be proved that SSE and SSR are independent.

Any sum of squares has associated with it a number called its degrees of freedom. Let the sum of squares be written as a sum of the squares of independent linear forms. (A linear form, for example, is Σᵢ aᵢYᵢ where the aᵢ's are constants and the Yᵢ's are variables.) Then the number of independent linear forms is the number of degrees of freedom. The sum of squares of Yᵢ − Ŷᵢ for the assumptions 11111011 has n − p degrees of freedom, for n being the number of observations and p the number of independent parameters. Hence SST has n − 1 degrees of freedom and SSE has n − 2. Since SSE and SSR are independent, we know from Cochran's theorem [4] that the sum of the degrees of freedom of SSE and of SSR is equal to the degrees of freedom of SST. This information can be used to obtain that which is displayed in Table 5.4.
Table 5.4 ANOVA Table for Partition of Variance About Ȳ, (5.7.1)

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square
1. Deviation about regression line (residuals) | SSE | n − 2 | s² = SSE/(n − 2)
2. Deviation between the regression line and mean | SSR | 1 | SSR/1
3. Total deviation between data and mean | SST = Σ(Yᵢ − Ȳ)² | n − 1 |

We now wish to employ an F test to obtain an indication if the β₁(Xᵢ − X̄) term in Model 4 is needed. For the assumptions indicated by 11111011, an F statistic can be given. Recall that an F statistic is the ratio of two independent random variables, each having a χ² distribution and each divided by its respective degrees of freedom. One χ² statistic can be formed by SSE divided by σ², and another independent χ² statistic is SSR/σ². Then an F statistic is

    F = [(SSR/σ²)/1] / [(SSE/σ²)/(n − 2)]    (5.7.2)

This statistic can provide a measure of how much the additional parameter β₁ (i.e., using the model Yᵢ = β₀ + β₁(Xᵢ − X̄) + εᵢ rather than Yᵢ = β₀ + εᵢ) is needed. If F is near unity [corresponding to R² ≈ 0 in (5.6.11)], then the two-parameter model (Model 4) does not significantly improve the fit compared to the one-parameter model (Model 1). The other extreme is large F [which corresponds to R² ≈ 1 in (5.6.11)]; in this case we can be confident that the β₁ parameter is needed.

A probability statement can be made utilizing the F statistic and a table of its distribution, which could be used to obtain the value of F_{1−α}(1, n − 2). See Section 2.8.10. The probability of F being less than F_{1−α}(1, n − 2) is 1 − α, or

    P[F < F_{1−α}(1, n − p)] = 1 − α    (5.7.3)

Alternatively we can write

    P[F ≥ F_{1−α}(1, n − p)] = α    (5.7.4)

In words, if the null hypothesis H₀: β₁ = 0 is true, the probability that the calculated value F exceeds the tabulated value is α. If F is greater than F_{1−α}(1, n − p), we reject the null hypothesis at the given significance level α. If the calculated F value is less than F_{1−α}(1, n − p), we say that we cannot reject the null hypothesis; that is, it may be that β₁ = 0.

Example 5.7.1

Using the data of Example 5.2.4, develop an analysis of variance table and determine if the β₁ parameter is needed. Make the probability 1% of falsely deciding that β₁ is needed.

Solution

Using the data from Example 5.2.4, the following ANOVA table is constructed.

Source | Sum of Squares | Degrees of Freedom | Mean Square | Calculated F
1. Residual | 5.9377 | 7 | 0.92100 |
2. Deviation between line and mean | 62.4097 | 1 | 62.4097 | 67.763
3. Total | 68.3474 | 8 | |

From a table of the F distribution, we find the value of F₀.₉₉(1,7). Since the calculated F = 67.763 exceeds F₀.₉₉(1,7), we reject the null hypothesis that β₁ = 0. If β₁ is not needed, our method has only a 1% chance of causing us to use the model ηᵢ = β₀ + β₁(Xᵢ − X̄) rather than ηᵢ = β₀.
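The ANOVA partition and F statistic of (5.7.1) and (5.7.2) can be sketched in a few lines; the data below are hypothetical, and the tabulated critical value is not computed here.

```python
# Sketch of the ANOVA partition about the sample mean, (5.7.1)-(5.7.2):
# SST = SSE + SSR, and F = (SSR/1) / (SSE/(n-2)) tests whether the
# slope term is needed. Data are hypothetical.

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]
n = len(xs)

xbar = sum(xs) / n
ybar = sum(ys) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
     / sum((x - xbar) ** 2 for x in xs)
yhat = [ybar + b1 * (x - xbar) for x in xs]

sst = sum((y - ybar) ** 2 for y in ys)               # d.f. = n - 1
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))  # d.f. = n - 2
ssr = sum((yh - ybar) ** 2 for yh in yhat)           # d.f. = 1

f_stat = (ssr / 1.0) / (sse / (n - 2))
print(sst, sse + ssr, f_stat)
```

A large f_stat would be compared against a tabulated F value at the chosen significance level, exactly as in Example 5.7.1.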
The use of the F test for model building is considered further in Chapter 6.

5.8 ANALYSIS OF VARIANCE ABOUT THE REGRESSION LINE FOR MULTIPLE MEASUREMENTS AT EACH Xᵢ

Consider the case of partitioning the variation about the predicted regression line for multiple measurements at each Xᵢ. From (5.5.7), which applies for linear and nonlinear parameter estimation, we have

    Σᵢ Σⱼ (Yᵢⱼ − Ŷᵢ)² = Σᵢ Σⱼ (Yᵢⱼ − Ȳᵢ)² + Σᵢ mᵢ(Ȳᵢ − Ŷᵢ)²    (5.8.1)

or

    SSᵣ = SSₑ + SSₗ    (5.8.2)

where SSᵣ is the total sum of squares between data and regression line, the "residuals" (d.f. = n − p); SSₑ is the pure error sum of squares within data sets (d.f. = n − r); and SSₗ is the "lack of fit sum of squares" of the local mean about the regression line (d.f. = r − p). Here d.f. stands for degrees of freedom. (SSᵣ is our former SSE.)

The number of degrees of freedom on the left has been discussed previously; it is the total number of points minus the number of parameters. The first term on the right has the contribution from i = 1 of Σⱼ(Y₁ⱼ − Ȳ₁)², which has m₁ − 1 degrees of freedom; the second contribution (i = 2) would have m₂ − 1 degrees of freedom. Hence for the first term on the right-hand side of (5.8.1), the number of degrees of freedom is

    d.f. = Σᵢ₌₁ʳ (mᵢ − 1) = n − r

The number of degrees of freedom of the last term is given by subtraction. The various terms are labeled SSₑ, SSₗ, and SSᵣ; note that the terms are not completely analogous to those in (5.7.1), but are similarly labeled. In fact, (5.8.1) can be used in (5.7.1) to get

    SST = SSE + SSR = [SSₑ + SSₗ] + SSR    (5.8.3)

where an additional summation is used in (5.7.1), and then

    Σᵢ Σⱼ (Yᵢⱼ − Y̿)² = [Σᵢ Σⱼ (Yᵢⱼ − Ȳᵢ)² + Σᵢ mᵢ(Ȳᵢ − Ŷᵢ)²] + Σᵢ mᵢ(Ŷᵢ − Y̿)²    (5.8.4)

where Y̿ is defined by (5.5.4).

Table 5.5 shows the analysis of variance table for (5.8.1) in lines 2 and 3; the table as a whole illustrates (5.8.3).

Table 5.5 ANOVA Table for Partition of Variance About Ŷᵢ and About Y̿, (5.8.3)

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square
1. Pure error sum of squares | SSₑ = ΣΣ(Yᵢⱼ − Ȳᵢ)² | n − r | sₑ² = SSₑ/(n − r)
2. Lack of fit sum of squares | SSₗ = Σmᵢ(Ȳᵢ − Ŷᵢ)² | r − p | sₗ² = SSₗ/(r − p)
3. Residual sum of squares | SSᵣ = ΣΣ(Yᵢⱼ − Ŷᵢ)² | n − p | s² = SSᵣ/(n − p)
4. Sum of squares between line and mean | SSR = Σmᵢ(Ŷᵢ − Y̿)² | p − 1 |
5. Sum of squares between data and mean | SST = ΣΣ(Yᵢⱼ − Y̿)² | n − 1 |

The mean square sₑ², which is defined by

    sₑ² = SSₑ/(n − r)    (5.8.6)
is an unbiased estimate of σ² even if the true model is not used or if the model is nonlinear. Hence this estimate of σ² is said to arise from "pure error." On the other hand, s²,

    s² = SSᵣ/(n − 2)    (5.8.7)

is not an unbiased estimate of σ² if the model is incorrect.

5.8.1 Expected Value of s² for an Incorrect Model

Let us investigate the effect upon s² of an incorrect mathematical model. We recall that eᵢⱼ is the residual for the jth measurement at Xᵢ; it "contains all available information on the ways in which the fitted model fails to properly explain the observed variation in the dependent variable Y" [1, p. 26]. Recalling ηᵢ = E(Yᵢⱼ) and writing

    eᵢⱼ = Yᵢⱼ − Ŷᵢ = {(Yᵢⱼ − Ŷᵢ) − [ηᵢ − E(Ŷᵢ)]} + [ηᵢ − E(Ŷᵢ)]    (5.8.8)

or

    eᵢⱼ = qᵢⱼ + Bᵢ    (5.8.9)

where

    qᵢⱼ = (Yᵢⱼ − Ŷᵢ) − [ηᵢ − E(Ŷᵢ)],    Bᵢ = ηᵢ − E(Ŷᵢ)    (5.8.10)

Bᵢ is called the bias error at Xᵢ; it is zero if the model is correct (E[Ŷᵢ] = ηᵢ). The random variable qᵢⱼ has a zero mean whether the model is correct or not, since E(Yᵢⱼ) = ηᵢ is true in any case. These statements regarding Bᵢ and qᵢⱼ are true for nonlinear as well as linear models.

For Model 5 with the assumptions denoted 111111 (except that E(Yᵢ) = ηᵢ − Bᵢ), it can be shown for OLS and ML estimation that the expected residual sum of squares contains a bias contribution, (5.8.11), which reduces to (5.8.12), where (5.2.31) is used. If the model is correct, the last term in (5.8.12) disappears.

When the model is incorrect, the residuals contain both random (qᵢⱼ) and systematic or biased components (Bᵢ), which are respectively called variance and bias error components of the residuals. An incorrect model results in an inflated residual mean square.

5.8.2 F Test with Repeated Data

For this case of repeated observations, an F statistic is (for p = 2)

    Fₗ = [SSₗ/(r − 2)] / [SSₑ/(n − r)]    (5.8.13)

where numerator and denominator contain χ² distributions if the model is correct; sₗ² is called the mean square due to lack of fit. This Fₗ value should be compared with F_{1−α}(r − 2, n − r). If Fₗ > F_{1−α}(r − 2, n − r), we say that Fₗ is significant and we mean that the model is inadequate. An estimate of σ² using sₑ² would be unbiased, but using s² or sₗ² would be biased and would tend to yield too large an estimate. If, on the other hand, Fₗ < F_{1−α}(r − 2, n − r), Fₗ is said to be not significant; there is no reason to doubt the adequacy of the model, and both the pure error and lack of fit mean squares (sₑ² and sₗ²) can be used as estimates of σ². Moreover, s² is a pooled estimate of σ². See Fig. 5.5 for a schematic diagram summarizing the steps for checking for lack of fit with repeated observations.

The use of the Fₗ statistic as given by (5.8.13) does not preclude the use of the F statistic given by (5.7.2). They give different information. F, (5.7.2), can be used whether there are repeated measurements or not; it tells whether β₁ is needed and can be generalized to investigate the validity of adding another or several parameters to the model. For cases where there are repeated measurements, the Fₗ test can indicate if the model is satisfactory (with no reference to adding another parameter) and can tell if σ² can be estimated from sₑ². For repeated measurements both tests should be used.

With the two F tests we can have four combinations associated with (a) significant (or not significant) lack of fit and (b) significant (or not significant) linear regression. These combinations are illustrated in Fig. 5.6 and the results are summarized in Table 5.6. In each case the model

    Y = β₀ + β₁X + ε = β₀′ + β₁(X − X̄) + ε

is used.
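The lack-of-fit computation of (5.8.1) and (5.8.13) can be sketched directly; the data below are hypothetical, and the comparison with a tabulated F value is omitted.

```python
# Sketch of the lack-of-fit partition (5.8.1)-(5.8.2) and the statistic
# (5.8.13), F_l = [SS_l/(r-p)] / [SS_e/(n-r)], for a straight-line fit
# with repeated observations at r distinct X_i. Data are hypothetical.

data = {1.0: [1.1, 0.9], 2.0: [2.2, 1.8], 3.0: [2.9, 3.3], 4.0: [3.8, 4.0]}

xs, ys = [], []
for x, reps in data.items():
    for y in reps:
        xs.append(x)
        ys.append(y)
n, r, p = len(ys), len(data), 2

# straight-line OLS fit to all observations
xbar = sum(xs) / n
ybar = sum(ys) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
     / sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar
yhat = {x: b0 + b1 * x for x in data}

# pure error SS (within sets, d.f. n-r) and lack-of-fit SS (d.f. r-p)
ss_e = sum((y - sum(reps) / len(reps)) ** 2
           for reps in data.values() for y in reps)
ss_l = sum(len(reps) * (sum(reps) / len(reps) - yhat[x]) ** 2
           for x, reps in data.items())
ss_r = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

f_l = (ss_l / (r - p)) / (ss_e / (n - r))
print(ss_r, ss_e + ss_l, f_l)
```

The identity SSᵣ = SSₑ + SSₗ holds exactly because the within-group deviations sum to zero about each group mean.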
[Figure 5.5 Schematic diagram for checking lack of fit with repeated observations: the residual sum of squares SSᵣ (d.f. = n − p) splits into the lack of fit sum of squares SSₗ (d.f. = r − p) and the pure error sum of squares SSₑ from repeated measurements (d.f. = n − r); the mean squares sₗ² and sₑ², eq. (5.8.6), form Fₗ = sₗ²/sₑ², which is compared with F_{1−α}(r − 2, n − r). If the test is not significant there is no reason to question the model; if significant, the model is inadequate. sₑ² provides an estimate of σ² even if the model is inadequate, whereas sₗ² estimates σ² only if the model is correct and contains a bias term if it is inadequate. (Adapted from Applied Regression Analysis by Norman R. Draper and Harry Smith, John Wiley & Sons.)]
[Figure 5.6 Typical straight-line situations. Case 1: no lack of fit, significant linear regression (model adequate, β₁ ≠ 0). Case 2: no lack of fit, linear regression not significant (model adequate, β₁ may be zero). Case 3: significant lack of fit (model inadequate), significant linear regression (β₁ ≠ 0). Case 4: significant lack of fit (model inadequate), linear regression not significant (β₁ may be zero). (Adapted from Applied Regression Analysis by Norman R. Draper and Harry Smith, John Wiley & Sons.)]
Table 5.6 Summary of Observations from Figure 5.6

Observation | Case 1 | Case 2 | Case 3 | Case 4
Significant lack of fit, Fₗ > F_{1−α}(r − 2, n − r) |  |  | X | X
Significant linear regression, F > F_{1−α}(1, n − 2) | X |  | X |  
For case 1 the linear model is adequate since there is no lack of fit and there is significant linear regression. For case 2 the linear regression is not significant; hence the model Ŷ = Ȳ would be recommended. For case 3 there is lack of fit, but the linear regression is significant; thus one might try Y = β₀ + β₁X + β₁₁X² + ε. In case 4 there is a significant lack of fit and not significant linear regression. A model such as Y = β₀ + β₁X + β₁₁X² + ε would be recommended even though there is not significant linear regression. (Why?)

Both tests need not be limited to testing the adequacy of the simple linear model Yᵢ = β₀ + β₁Xᵢ + εᵢ, but can be applied to linear estimation with more parameters and even to nonlinear parameter estimation; this can be done if there are repeated observations for the standard conditions of zero mean, independent, constant variance, and normal errors.

After saying the above, it should be emphasized that considerable insight can sometimes be gained in unfamiliar cases if the residuals are plotted and inspected visually.
5.9 CONFIDENCE INTERVAL ABOUT THE POINTS ON THE REGRESSION LINE

Let us consider a confidence interval about any point on the regression line

    Ŷₖ = b₀ + b₁(Xₖ − X̄)    (5.9.1)

This requires the variance of Ŷₖ, which is given by (5.2.41a). Using this expression with σ replaced by s, the estimated standard error is

    est. s.e.(Ŷₖ) = s[1/n + (Xₖ − X̄)²/Σᵢ(Xᵢ − X̄)²]^{1/2}    (5.9.2)

which is clearly a minimum at Xₖ = X̄ and becomes larger toward the extremities; (5.9.2) implies that we do not know σ. The confidence limits for ηₖ are

    Ŷₖ ± t_{1−α/2}(n − p) est. s.e.(Ŷₖ)    (5.9.3)

for n observations of Yᵢ, p parameters, and 100(1 − α)% confidence. Figure 5.7 shows the 95%, say, confidence limits for the model (5.9.1); the curved, hyperbolic lines about the straight regression line give the confidence limits.

[Figure 5.7 Confidence intervals about points on the regression line: 95% confidence limits on Ŷₖ for each Xₖ about the regression line.]

These limits can be interpreted as follows. Suppose that repeated sets of measurements of Y are taken at the same X values as were used to find the confidence limits given in Fig. 5.7. Then, of all the 95% confidence intervals constructed for ηₖ = E(Ŷₖ) at Xₖ, 95% of these intervals will contain E(Ŷₖ).

Confidence intervals and regions for parameters are discussed in Chapter 6.
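The behavior of the limits (5.9.2) and (5.9.3), narrowest at X̄ and widening toward the extremities, can be sketched as below. The data are hypothetical, and 2.776 is the tabulated t value for 97.5% and 4 degrees of freedom used here as an assumed constant.

```python
# Sketch of the confidence limits (5.9.2)-(5.9.3) about a fitted line
# Yhat = b0 + b1*(X - Xbar). Data are hypothetical; t = 2.776 is an
# assumed tabulated value for n - p = 4 degrees of freedom.

xs = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
ys = [0.1, 1.1, 1.8, 3.2, 3.9, 5.1]
n, p = len(xs), 2

xbar = sum(xs) / n
ybar = sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
yhat = [ybar + b1 * (x - xbar) for x in xs]
s = (sum((y - yh) ** 2 for y, yh in zip(ys, yhat)) / (n - p)) ** 0.5

def limits(xk, t=2.776):
    """Confidence limits for eta_k at X = xk, eq. (5.9.3)."""
    se = s * (1.0 / n + (xk - xbar) ** 2 / sxx) ** 0.5   # eq. (5.9.2)
    yk = ybar + b1 * (xk - xbar)
    return yk - t * se, yk + t * se

# the interval is narrowest at Xbar and widens toward the extremities
w_mid = limits(xbar)[1] - limits(xbar)[0]
w_end = limits(10.0)[1] - limits(10.0)[0]
print(w_mid, w_end)
```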
5.10 VIOLATION OF THE STANDARD ASSUMPTION OF ZERO MEAN ERRORS
In the next few sections violations of the basic assumptions are considered. One of the easiest to treat is the case of additive errors that do not have a zero mean. The assumptions then are 10111111.
We are concerned here with nonzero mean errors that remain after any appropriate corrections have been made. Suppose, however, after all known corrections have been made, the errors still do not have a zero mean, so that

    E(εᵢ) = fᵢ    (5.10.1)

where fᵢ ≠ 0. Let εᵢ be written as two terms, one of which has a zero mean,

    εᵢ = fᵢ + vᵢ    (5.10.2)

Consider several functions of fᵢ in connection with Model 2, ηᵢ = β₁Xᵢ, with Xᵢ not being the same for all i. The first function that we consider is fᵢ = c, a constant. Then Yᵢ for Model 2 can be written

    Yᵢ = β₁Xᵢ + c + vᵢ    (5.10.3)

where now the bias c is a parameter to be estimated in addition to β₁. In this case a one-parameter Model 2 problem becomes a two-parameter Model 3 problem.

If fᵢ happens to be proportional to Xᵢ, or fᵢ = cXᵢ, then instead of (5.10.3) we write

    Yᵢ = (β₁ + c)Xᵢ + vᵢ    (5.10.4)

and thus it is possible to estimate only the sum β₁ + c.

Another case is when fᵢ = cZᵢ, where Zᵢ is some known function which is not proportional to Xᵢ. This reduces to a Model 5 estimation problem which involves two parameters.
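The unidentifiability in (5.10.4) can be seen in one line of computation: with a bias proportional to Xᵢ, OLS for Model 2 recovers only the sum β₁ + c. The sketch below uses hypothetical, noise-free values.

```python
# Sketch for Section 5.10: with Model 2 (eta = beta1*X) and an error
# bias f_i = c*X_i, OLS recovers only the sum beta1 + c, eq. (5.10.4).
# Values are hypothetical and noise-free apart from the bias.

beta1, c = 2.0, 0.3
xs = [1.0, 2.0, 3.0, 4.0]
ys = [beta1 * x + c * x for x in xs]

# OLS for the one-parameter Model 2: b1 = sum(X*Y) / sum(X^2)
b1 = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(b1)   # equals beta1 + c; the two are not separately estimable
```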
5.11 VIOLATION OF THE STANDARD ASSUMPTION OF NORMALITY

If the standard assumptions excluding that of normality are valid (11110111), ordinary least squares estimation can still be used. The resulting least squares estimators are unbiased and have minimum variance among all linear unbiased estimators, but they are not efficient. A consequence of the central limit theorem is that the least squares estimators are consistent and asymptotically efficient almost regardless of the distribution of the errors, however. Hence when the normality assumption is not justified, least squares estimators still retain most of their desirable properties.

We note that the previously used estimators of the variances of the parameters are unchanged. Confidence intervals and tests for significance given in this chapter are based on the assumption of normal errors, however; for small numbers of observations the intervals and tests could be substantially in error. Fortunately, for larger sample sizes, and provided the distribution is not radically non-normal, the confidence limits and tests of significance can be used as reasonable approximations.

If the form of the underlying probability density of the errors is known, then the maximum likelihood and maximum a posteriori methods can be used. For example, assume that all the standard assumptions apply except that the probability density of εᵢ is given by

    f(εᵢ) = (2σ)⁻¹ exp(−|εᵢ|/σ)    (5.11.1)

Then the ML function to minimize is

    S_ML = Σᵢ₌₁ⁿ |Yᵢ − ηᵢ|    (5.11.2)

Unfortunately, minimizing S_ML is not as simple as it would be for normal measurement errors.

Example 5.11.1

For Model 1, ηᵢ = β₀, estimate β₀ for the data as given below. Assume that the assumptions 11110111 are valid and that f(εᵢ) is given by (5.11.1).

(a) Y₁ = 0, Y₂ = 1.
(b) Y₁ = 0, Y₂ = 0.5, Y₃ = 1.
(c) Y₁ = 0, Y₂ = 0.25, Y₃ = 0.5, Y₄ = 1.
(d) Generalize the results.

Solution

(a) For the observations Y₁ = 0 and Y₂ = 1, a plot of S_ML versus β₀ shows that S_ML has a minimum between 0 and 1. In that range S_ML is equal to 1. Thus there is neither a unique minimum nor a unique parameter estimate.

(b) For the three observations of 0, 0.5, and 1, a plot of S_ML versus β₀ gives a minimum value of S_ML, also equal to 1, at b₀ = 0.5.

(c) For this case a plot of S_ML versus β₀ shows that a minimum occurs between b₀ = 0.25 and 0.5.

(d) From the pattern of the answers obtained, it appears that there are two possibilities; one is for an even number of observations n and the other is for an odd number. Let the Yᵢ values be ordered so that the smallest Yᵢ value is Y₁, the next larger value is Y₂, etc. Then for n even, the b₀ value is located between Y_{n/2} and Y_{n/2+1}. For n odd, b₀ is equal to Y_{(n+1)/2}.*

Another example with other than the normal distribution is given in Section 4.9 in connection with Monte Carlo methods.
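The behavior found in Example 5.11.1 can be reproduced numerically: S_ML of (5.11.2) for Model 1 is minimized at the sample median and is flat between the two middle order statistics when n is even.

```python
# Sketch for Example 5.11.1: for Model 1 (eta = beta0) and the
# double-exponential density (5.11.1), S_ML = sum|Y_i - beta0| is
# minimized at the sample median; for even n the minimum is flat
# between the two middle order statistics.

def s_ml(b0, ys):
    """The ML sum (5.11.2) for Model 1 at candidate estimate b0."""
    return sum(abs(y - b0) for y in ys)

ys_odd = [0.0, 0.5, 1.0]         # part (b): minimum at b0 = 0.5
ys_even = [0.0, 0.25, 0.5, 1.0]  # part (c): flat minimum on [0.25, 0.5]

print(s_ml(0.5, ys_odd), s_ml(0.3, ys_even), s_ml(0.45, ys_even))
```

The printed values illustrate parts (b) and (c): the odd-n sum is smallest at the median, while the two even-n sums agree anywhere inside the flat interval.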
5.12 VIOLATION OF THE STANDARD ASSUMPTION OF CONSTANT VARIANCE

When V(εᵢ) = σᵢ² varies with i, ordinary least squares estimation does not yield minimum variance estimators. Minimum variance estimators can be obtained, however, using maximum likelihood. These estimators for one- and two-parameter cases are given in Sections 5.2 and 5.3.

The effect upon the estimator(s) can be investigated for many σᵢ² functions. Assume that the standard assumptions (11011111) apply in this section, where two possible functions are considered. For illustrative purposes the one-parameter case, Model 2, which is ηᵢ = β₁Xᵢ, is used. The OLS and ML estimators and variances are

    b₁,OLS = Σᵢ XᵢYᵢ / Σᵢ Xᵢ²,    V(b₁,OLS) = Σᵢ Xᵢ²σᵢ² / (Σᵢ Xᵢ²)²    (5.12.1a,b)

    b₁,ML = Σᵢ (XᵢYᵢ/σᵢ²) / Σᵢ (Xᵢ²/σᵢ²),    V(b₁,ML) = (Σᵢ Xᵢ²/σᵢ²)⁻¹    (5.12.2a,b)

In the case of the ML estimator and variance the quantity Zᵢ = Xᵢ/σᵢ can be considered as a modified sensitivity coefficient; Zᵢ plays the same role as Xᵢ when OLS is used with all the standard assumptions being valid.

Before investigating some cases of nonuniform σᵢ², some situations are suggested where nonuniform σᵢ² might arise. Error variances tend to increase with the amplitude of the signal (or observation). When the response Yᵢ varies over several orders of magnitude, say, from 0.001 to 100, the accuracy of the measuring device(s) is rarely constant. For small signals the errors usually are even smaller; for the large signals the standard deviation of the errors may be the same small fraction of the signal, but the actual error may be many times the value of the smallest signal. For example, suppose the voltage of some device, such as a heat flow meter, varies from 0.00001 to 0.1 V in a series of observations. (Another device having large variations in output is a thermistor, for which the electric resistance varies greatly with temperature.) In order to measure such a range, a digital voltmeter with several full-scale settings could be used. One range might go up to 0.001 V, another range might be used for 0.001 to 0.01 V, and so on. Then for readings near 0.001 and 0.01 V the percent accuracy might be the same; note that this implies a varying σᵢ² that is approximately proportional to ηᵢ².

*The estimator b₀ conforms to the definition of the median given in Section 3.1.1.
5.12.1 Variance of εᵢ Given by σᵢ² = (Xᵢ/δ)²σ²

One possible variation of σᵢ² is σᵢ² = (Xᵢ/δ)²σ², where δ is some quantity with the same units as Xᵢ. The OLS estimator is unaffected, but the variance of b₁,OLS becomes

    V(b₁,OLS) = (σ²/δ²) Σᵢ Xᵢ⁴ / (Σᵢ Xᵢ²)²    (5.12.3)

The b₁,ML estimator and variance become

    b₁,ML = (1/n) Σᵢ (Yᵢ/Xᵢ),    V(b₁,ML) = σ²/(nδ²)    (5.12.4)

Note that the variance of b₁,ML is a simple expression, but that for OLS is not. In order to make a comparison let Xᵢ = iδ. One can derive the following summation expressions

    Σᵢ₌₁ⁿ i² = n(n + 1)(2n + 1)/6

    Σᵢ₌₁ⁿ i⁴ = n(n + 1)(2n + 1)(3n² + 3n − 1)/30

which yield for the stipulated σᵢ² the expression for V(b₁,OLS) of

    V(b₁,OLS) = 6(3n² + 3n − 1)σ² / [5n(n + 1)(2n + 1)δ²]

For large values of n this expression reduces to 9σ²/5nδ². Hence for large values of n, the OLS estimate for this Model 2 case with σᵢ² = (Xᵢ/δ)²σ² has a variance of b₁ which is 80% larger than that of b₁ given by ML. This means that ML estimation is substantially superior in this case to OLS estimation.

One further benefit of the maximum likelihood (ML) method of estimation is that it can be used to provide an estimate of σ². This can be accomplished by replacing σᵢ² in (5.3.1) and (5.3.2) by (Xᵢ/δ)²σ², differentiating (5.3.1) with respect to σ², and then replacing σ² by σ̂² and ηᵢ by Ŷᵢ to get

    σ̂² = (δ²/n) Σᵢ [(Yᵢ − Ŷᵢ)/Xᵢ]²    (5.12.5)

which is a consistent, asymptotically efficient, and biased estimate.
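The 80% penalty claimed above can be checked without any random simulation, since both variances follow from the summation expressions; the sketch below evaluates the exact ratio V(b₁,OLS)/V(b₁,ML) for Xᵢ = iδ.

```python
# Sketch for Section 5.12.1: with sigma_i^2 = (X_i/delta)^2 sigma^2 and
# X_i = i*delta, the ratio V(b1_OLS)/V(b1_ML) tends to 1.8 as n grows,
# i.e., the OLS variance is 80% larger than the ML variance.

def variance_ratio(n):
    """Exact V(b1_OLS)/V(b1_ML); sigma and delta cancel out."""
    s2 = sum(i ** 2 for i in range(1, n + 1))
    s4 = sum(i ** 4 for i in range(1, n + 1))
    v_ols = s4 / s2 ** 2      # eq. (5.12.3), in units of sigma^2/delta^2
    v_ml = 1.0 / n            # eq. (5.12.4), same units
    return v_ols / v_ml

print(variance_ratio(5), variance_ratio(100000))
```

For small n the penalty is milder; it approaches the limiting value 1.8 from below as n increases.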
5.12.2 Variance of εᵢ Equal to σ²ηᵢ²

A commonly occurring case is for the standard deviation of the error to be proportional to the dependent variable ηᵢ. In terms of the variance of Yᵢ, this can be expressed by

    V(εᵢ) = σᵢ² = σ²ηᵢ²    (5.12.6)

The OLS estimator is the same as usual, but the variance can only be approximated. For our purposes it is permissible to replace E(Yᵢ) = ηᵢ by Ŷᵢ, the regression value for OLS; then let

    V(b₁,OLS) ≈ σ² Σᵢ Xᵢ²Ŷᵢ² / (Σᵢ Xᵢ²)²    (5.12.7)

In ML estimation the σᵢ² = σ²ηᵢ² relation makes the problem nonlinear because the parameters appear in both the denominator and numerator of S_ML given by (5.3.2) and also in the ln σᵢ² term contained in (5.3.1). A suggested procedure to get approximate ML values is to first solve for the parameter(s) using OLS and so obtain approximate values of Ŷᵢ,OLS. These are then used to approximate σᵢ² as σ²Ŷᵢ,OLS² in the ML estimators such as (5.12.2a).
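The suggested two-step procedure can be sketched as below: an OLS fit supplies regression values Ŷᵢ, which then serve as stand-ins for ηᵢ in the weights of the ML estimator (5.12.2a). The data are hypothetical.

```python
# Sketch of the approximate-ML procedure of Section 5.12.2 for Model 2
# (eta = beta1*X) with sigma_i^2 proportional to eta_i^2: fit by OLS,
# then reweight using the OLS regression values. Data are hypothetical.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.3, 3.8, 5.2]

# step 1: OLS for Model 2, b1 = sum(X*Y) / sum(X^2)
b1_ols = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
yhat = [b1_ols * x for x in xs]

# step 2: weighted estimate (5.12.2a) with sigma_i^2 ~ yhat_i^2
num = sum(x * y / yh ** 2 for x, y, yh in zip(xs, ys, yhat))
den = sum(x * x / yh ** 2 for x, yh in zip(xs, yhat))
b1_ml = num / den
print(b1_ols, b1_ml)
```

The reweighting step could be iterated, updating Ŷᵢ from the newest estimate, although the text only suggests the single correction.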
V IOLATION O F S TANDARD A SSUMPTION O F U NCORRELATED
ERRORS
In the past decade there has been widespread use of automatic digital data
acquisition equipment in connection with dynamic experiments. Transient
temperatures have been measured, for example, by using such equipment
.'1
to digitize the response of thermocouples. However, measurement error!
tend to become correlated as the high sampling rate capability is used. In
such cases the standard assumption of independent observation errors b
n ot valid.
One might also obtain correlated measurements by testing the sam(
specimen using the same sensors for different ranges o f the independent
variable XI' Examples are measurements for a particular steel specimen at
different temperatures for a property such as thermal conductivity, electric
resistance, o r hardness.
The standard assumptions of zero mean and uncorrelated measurement
errors given by (5.1.7) a nd (5.1.9) result in
for
5.12.1 Variance of l ; Equal to a"r,l
5.13
5 .U STANDARD ASSUMP110N OF UNCORRELATED ERRORS
(5.13.1)
i+1c
When this equation is n ot true many descriptive terms have ~en used;
these terms include colored, correlated, not independent, a nd dependent
errors. Some specific types of correlated errors are called autoregressive
(AR), moving average (MA), a nd autoregressivemoving average (ARMA).
Only AR errors are considered in this section. For further discussion see
C hapter 6.
Let us consider a case with additive, zero mean, autoregressive errors in Y_i. There are no errors in the X_i's. We can then write

E(Y_i) = eta_i    (5.13.2)

The measurement errors are described by the model

e_i = rho_i e_{i-1} + u_i,   E(u_i) = 0,   E(u_i u_j) = sigma_i^2 for i = j and 0 for i != j    (5.13.3)

which is called first-order autoregressive since the error e_i depends on the error e_{i-1}, which is for the preceding time. (Second-order errors would depend on two preceding times, etc.) In the following analysis the rho_i and sigma_i^2 values are assumed to be known. There is no prior information. The associated assumptions are designated 1102111.

Rather than using the direct matrix maximum likelihood approach of Chapter 6, we shall attempt to construct some sums of squares of terms that are uncorrelated and have constant variance. In other words, a transformation is to be used to obtain modified measurements for which the assumptions 1111111 are valid. Then write (5.13.3) at times i and i-1 as

Y_i = eta_i + rho_i e_{i-1} + u_i    (5.13.4a)

Y_{i-1} = eta_{i-1} + e_{i-1}    (5.13.4b)
Multiply (5.13.4b) by rho_i and subtract from (5.13.4a) to get

Y_i - rho_i Y_{i-1} = (eta_i - rho_i eta_{i-1}) + u_i    (5.13.5)

Define the transformed observation F_i and model H_i as

F_i = (Y_i - rho_i Y_{i-1}) sigma_i^{-1},   H_i = (eta_i - rho_i eta_{i-1}) sigma_i^{-1}    (5.13.6a,b)

Then analogous to (5.13.2) a transformed model is

E(F_i) = H_i    (5.13.7)

where the value F_i is now independent from the other F_j (j != i) values. Notice that the term u_i divided by sigma_i has a variance of unity for all i's. This suggests that a sum of squares of independent, constant variance terms can be constructed from the u_i/sigma_i values, or

S = Sum_{i=1}^{n} (F_i - H_i)^2    (5.13.8)

where F_i and H_i are given by (5.13.6) provided

Y_0 = 0,   eta_0 = 0    (5.13.9)

It is important to note that (5.13.8) has been derived without restricting the problem to cases for which eta is linear in the parameters; hence it can be used for linear and nonlinear cases.

In Chapter 6 it is shown that the function given by (5.13.8) must be minimized for ML estimation if, in addition to the assumptions given above, the errors u_i are normal.
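The transformation (5.13.6) is simple to carry out numerically. The sketch below (my own illustration; the function and variable names are not from the text) builds the transformed sequences from the measurements and model values, assuming the rho_i and sigma_i are known, and evaluates S of (5.13.8) as an ordinary sum of squares of the transformed residuals.

```python
def prewhiten(values, rho, sigma, v0=0.0):
    """Apply (5.13.6): out_i = (v_i - rho_i * v_{i-1}) / sigma_i, with v_0 = 0 per (5.13.9)."""
    out = []
    prev = v0
    for v, r, s in zip(values, rho, sigma):
        out.append((v - r * prev) / s)
        prev = v
    return out

def sum_of_squares(Y, eta, rho, sigma):
    """S of (5.13.8): sum of squares of F_i - H_i for the transformed data."""
    F = prewhiten(Y, rho, sigma)      # transformed observations
    H = prewhiten(eta, rho, sigma)    # transformed model values
    return sum((f - h) ** 2 for f, h in zip(F, H))
```

Because each transformed residual is u_i/sigma_i, the terms are uncorrelated with unit variance, so ordinary least squares machinery can be applied to the F_i and H_i.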
5.14 ERRORS IN INDEPENDENT AND DEPENDENT VARIABLES

Another violation of the standard assumptions is that of the independent variables, designated X_ij in this chapter, being stochastic as well as Y_i. In order to present a method of solution that can be generalized to complex situations, the method of Lagrange multipliers is introduced in this section. For the simple example to be given it is not required, but this method of solution is illustrated. Before giving the example, the method of Lagrange multipliers is presented.

5.14.1 Method of Lagrange Multipliers

We consider the problem of finding a stationary value (a relative maximum or minimum) of the continuously differentiable function J(a_1, a_2, ..., a_m) that is subject to n equality constraints,

phi_i(a_1, a_2, ..., a_m) = 0,   i = 1, 2, ..., n    (5.14.1)

where m > n and the phi_i are differentiable functions. Since the m variables a_1, a_2, ..., a_m must satisfy n constraints, there are in effect only m - n independent variables. A stationary value of J(a_1, ..., a_m) requires that

dJ = (dJ/da_1) da_1 + ... + (dJ/da_m) da_m = 0    (5.14.2)

but the differentials da_i are not independent. The constraints (5.14.1) imply the n differential relations

(dphi_j/da_1) da_1 + (dphi_j/da_2) da_2 + ... + (dphi_j/da_m) da_m = 0,   j = 1, 2, ..., n    (5.14.3)

A direct method of solution can be illustrated by a simple case. Suppose m = 3 so that (5.14.2) becomes

dJ = (dJ/da_1) da_1 + (dJ/da_2) da_2 + (dJ/da_3) da_3    (5.14.4a)

Let there be only one constraint so that n = 1 and then (5.14.3) gives
(dphi_1/da_1) da_1 + (dphi_1/da_2) da_2 + (dphi_1/da_3) da_3 = 0    (5.14.4b)
which could be solved for da_3, say. This expression substituted for da_3 in (5.14.4a) then would give

dJ = ( ... ) da_1 + ( ... ) da_2    (5.14.4c)

where the two different expressions in the parentheses are set equal to zero because the da_1 and da_2 terms can now be arbitrarily assigned. These two equations coming from the parentheses in (5.14.4c), plus phi_1 = 0, would provide three equations for the three unknowns, a_1, a_2, and a_3.
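As a concrete numerical illustration (my own example, not from the text): take J = a_1^2 + a_2^2 + a_3^2 with the single constraint phi_1 = a_1 + a_2 + a_3 - 1 = 0. Eliminating a_3 through the constraint and setting the two free partial derivatives to zero yields a_1 = a_2 = a_3 = 1/3, which the sketch below verifies by finite differences.

```python
def reduced_J(a1, a2):
    # Constraint a1 + a2 + a3 = 1 used to eliminate a3, as in the direct method.
    a3 = 1.0 - a1 - a2
    return a1 ** 2 + a2 ** 2 + a3 ** 2

def is_stationary(f, a1, a2, h=1e-6, tol=1e-5):
    """Check that both partial derivatives of f vanish at (a1, a2) via central differences."""
    d1 = (f(a1 + h, a2) - f(a1 - h, a2)) / (2.0 * h)
    d2 = (f(a1, a2 + h) - f(a1, a2 - h)) / (2.0 * h)
    return abs(d1) < tol and abs(d2) < tol
```

The constrained minimum value is J = 3(1/3)^2 = 1/3.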
An alternative procedure is called the Lagrange multiplier method. This method is introduced using the same example of m = 3 and one constraint.
Multiply (5.14.4b) by lambda_1 and add the result to (5.14.4a). Since the right-hand members are zeros, there follows

dJ + lambda_1 dphi_1 = (dJ/da_1 + lambda_1 dphi_1/da_1) da_1 + (dJ/da_2 + lambda_1 dphi_1/da_2) da_2 + (dJ/da_3 + lambda_1 dphi_1/da_3) da_3 = 0    (5.14.5)

for an arbitrary value of lambda_1. Now let lambda_1 be determined so that one of the parentheses in (5.14.5) vanishes. Then the two differentials multiplying the remaining parentheses can be arbitrarily assigned and hence these two parentheses must also vanish. Consequently we must have

dJ/da_1 + lambda_1 dphi_1/da_1 = 0    (5.14.6a)

dJ/da_2 + lambda_1 dphi_1/da_2 = 0    (5.14.6b)

dJ/da_3 + lambda_1 dphi_1/da_3 = 0    (5.14.6c)

Then these three equations, (5.14.6a,b,c), plus the constraint phi_1 = 0 comprise four equations for solving for the four unknowns a_1, a_2, a_3, and lambda_1.

The quantity lambda_1 is known as a Lagrange multiplier. The introduction of these multipliers frequently simplifies and organizes the relevant algebra in minimization problems with equality constraints. It is important to note that the conditions given by (5.14.6) are equivalent to requiring that J + lambda_1 phi_1 be stationary without any further constraints being imposed. Applying this observation to the more general problem given above suggests that

J + Sum_{j=1}^{n} lambda_j phi_j

be extremized with respect to a_1, a_2, ..., a_m. Hence the following m equations must be satisfied,

dJ/da_i + Sum_{j=1}^{n} lambda_j dphi_j/da_i = 0,   i = 1, 2, ..., m    (5.14.7)

along with the n constraints given by (5.14.1). Thus (5.14.1) and (5.14.7) constitute a set of m + n equations for the m + n unknowns a_1, ..., a_m, lambda_1, ..., lambda_n.

5.14.2 Problem of Errors in the Independent and Dependent Variables

A problem which is nonlinear even though the model is linear in the parameters is the estimation of the parameters in the presence of errors in the independent variables as well as the dependent variables. The problem is formulated in this subsection and the solution of a simple case is considered in the next.

Consider first the dependent variable Y_i which is related to the model by

Y_i = eta_i + e_{Y_i}    (5.14.8)

and thus the error e_{Y_i} is additive. Also let e_{Y_i} have a zero mean, be independent from e_{Y_j} for i != j, and have a normal probability density with known variance terms, or

E(e_{Y_i}) = 0,   E(e_{Y_i}^2) = sigma_{Y_i}^2,   E(e_{Y_i} e_{Y_j}) = 0 for i != j,   e_{Y_i} is normal    (5.14.9)

These assumptions are designated 110111. With this information the probability density of e_{Y_1}, ..., e_{Y_n} is

f(e_{Y_1}, ..., e_{Y_n}) = (2 pi)^{-n/2} (sigma_{Y_1} ... sigma_{Y_n})^{-1} exp[ -(1/2) Sum_{i=1}^{n} e_{Y_i}^2 sigma_{Y_i}^{-2} ]    (5.14.10)

There are also errors in the independent variables X_ij which are described by

E(e_{X_ij}) = 0,   E(e_{X_ij} e_{X_kl}) = 0 except when i = k and j = l,   E(e_{X_ij}^2) = sigma_{X_ij}^2    (5.14.11)

and e_{X_ij} has a normal density. The sigma_{X_ij}^2 values are assumed to be known. The value X_ij is measured and xi_ij is the true value of X_ij. The errors e_{Y_i} and e_{X_jk} are considered to be independent for all values of i, j, and k. Analogous to (5.14.10) we can write
f(e_{X_11}, ..., e_{X_np}) = (2 pi)^{-np/2} (sigma_{X_11} ... sigma_{X_np})^{-1} exp[ -(1/2) Sum_{j=1}^{p} Sum_{i=1}^{n} e_{X_ij}^2 sigma_{X_ij}^{-2} ]    (5.14.12)
Owing to the independence of the e_Y and e_X errors, the maximum likelihood method of estimating the parameters requires that the product of (5.14.10) and (5.14.12) be maximized with respect to the parameters beta_1, beta_2, ..., beta_p and the eta_1, ..., eta_n, xi_11, ..., xi_np values. This is equivalent to minimizing
S(eta, xi) = Sum_{i=1}^{n} (Y_i - eta_i)^2 sigma_{Y_i}^{-2} + Sum_{j=1}^{p} Sum_{i=1}^{n} (X_ij - xi_ij)^2 sigma_{X_ij}^{-2}    (5.14.13)

with respect to beta_1, ..., xi_np, or a total of (p + n + np) parameters. This will produce the estimates b_1, b_2, ..., b_p, Yhat_1, ..., Yhat_n, Xhat_11, ..., Xhat_np. The eta_i, beta_k, and xi_ij values are not independent, however, and must be related through the model for eta_i, which can be written as the equality constraint

g_i = g_i(eta_i, beta_1, beta_2, ..., beta_p, xi_i1, xi_i2, ..., xi_ip) = 0    (5.14.14)

which applies for i = 1, 2, ..., n.

The method of Lagrange multipliers involves minimizing the function

L = S/2 + Sum_{i=1}^{n} lambda_i g_i    (5.14.15)

with respect to the parameters beta_k, the eta_i, and the xi_ik. Necessary conditions for a minimum are

dL/dbeta_k = (1/2) dS/dbeta_k + Sum_{j=1}^{n} lambda_j dg_j/dbeta_k = 0,   k = 1, 2, ..., p    (5.14.16a)

dL/deta_i = (1/2) dS/deta_i + Sum_{j=1}^{n} lambda_j dg_j/deta_i = 0,   i = 1, 2, ..., n    (5.14.16b)

dL/dxi_ik = (1/2) dS/dxi_ik + Sum_{j=1}^{n} lambda_j dg_j/dxi_ik = 0,   i = 1, ..., n; k = 1, ..., p    (5.14.16c)

The expressions in (5.14.16) are all evaluated at beta_1 = b_1, ..., beta_p = b_p, eta_1 = Yhat_1, ..., eta_n = Yhat_n, xi_11 = Xhat_11, ..., xi_np = Xhat_np. It is important to note that

S = S(eta_1, ..., eta_n, xi_11, ..., xi_np)    (5.14.17a)

g_j = g_j(eta_j, beta_1, beta_2, ..., beta_p, xi_j1, xi_j2, ..., xi_jp)    (5.14.17b)

Thus S is not an explicit function of the parameters beta_1, ..., beta_p. Then (5.14.16) for any model, linear or nonlinear, can be written as

Sum_{j=1}^{n} lambda_j (dg_j/dbeta_k) = 0,   k = 1, ..., p    (5.14.18a)

-(Y_i - Yhat_i) sigma_{Y_i}^{-2} + lambda_i (dg_i/deta_i) = 0,   i = 1, ..., n    (5.14.18b)

-(X_ik - Xhat_ik) sigma_{X_ik}^{-2} + lambda_i (dg_i/dxi_ik) = 0    (5.14.18c)

where the derivatives are evaluated at the estimates and (5.14.18c) applies for i = 1, ..., n; k = 1, ..., p. In addition to the equations given by (5.14.18) there are the constraints g_i = 0 which, for the linear model considered in this section, are equivalent to

Yhat_i = Sum_{k=1}^{p} b_k Xhat_ik,   i = 1, 2, ..., n    (5.14.19)

Then (5.14.18) and (5.14.19) provide p + 2n + np equations for the same number of unknowns, which are b_1, ..., b_p, Yhat_1, ..., Yhat_n, lambda_1, ..., lambda_n, Xhat_11, ..., Xhat_np.

Consider first (5.14.18) without introducing the assumption of a model linear in the parameters such as (5.14.19). With g_i written so that dg_i/deta_i = 1, (5.14.18b) yields in general

lambda_i = (Y_i - Yhat_i) sigma_{Y_i}^{-2}    (5.14.20)

Thus the Lagrange multipliers are weighted residuals. Introducing (5.14.20) into (5.14.18a,c) eliminates lambda_i and gives

Sum_{j=1}^{n} (Y_j - Yhat_j) sigma_{Y_j}^{-2} (dg_j/dbeta_k) = 0,   k = 1, ..., p    (5.14.21a)

-(X_ik - Xhat_ik) sigma_{X_ik}^{-2} + (Y_i - Yhat_i) sigma_{Y_i}^{-2} (dg_i/dxi_ik) = 0,   i = 1, ..., n; k = 1, ..., p    (5.14.21b)

Hence, for the general nonlinear case, (5.14.21) and the set of constraint equations, g_i = 0, can be solved for the p + n + np unknowns b_1, ..., b_p, Yhat_1, ..., Yhat_n, Xhat_11, ..., Xhat_np.

Let now the linear model and its constraint, (5.14.19), be used. Then (5.14.21) can be given as

Sum_{j=1}^{n} (Y_j - Sum_{l=1}^{p} b_l Xhat_jl) sigma_{Y_j}^{-2} Xhat_jk = 0,   k = 1, 2, ..., p    (5.14.22a)

(X_ik - Xhat_ik) sigma_{X_ik}^{-2} + (Y_i - Sum_{l=1}^{p} b_l Xhat_il) sigma_{Y_i}^{-2} b_k = 0,   i = 1, ..., n; k = 1, ..., p    (5.14.22b)

which comprise p + np equations for the unknowns b_1, ..., b_p, Xhat_11, ..., Xhat_np. Notice that, even though the model is linear in the parameters beta_1, ..., beta_p, the solution of (5.14.22) is nonlinear and thus is not straightforward. One way to start is to note that (5.14.22b) for fixed i provides a set of linear equations for Xhat_i1, ..., Xhat_ip which can be solved in terms of the b_1, ..., b_p values. When the Xhat_ik values are substituted into (5.14.22a), a set of p nonlinear equations results for the unknowns b_1, ..., b_p. The simplest case is for p = 1, which is considered next.
5.14.3 Model 2 (eta_i = beta xi_i) Example with Errors in both Y_i and X_i

As an example of the above procedure consider a case involving Model 2, eta_i = beta xi_i, where there are errors in both the dependent variable Y_i and the independent variable X_i,

Y_i = eta_i + e_{Y_i}    (5.14.23a)

X_i = xi_i + e_{X_i}    (5.14.23b)

Let the assumptions given above for e_Y and e_X apply except let sigma_{Y_i}^2 and sigma_{X_i}^2 be the constants sigma_Y^2 and sigma_X^2, respectively.

We can obtain the solution for b, an estimate of beta, through the use of (5.14.22a,b). Using first (5.14.22b) gives

(X_i - Xhat_i) sigma_X^{-2} + (Y_i - b Xhat_i) b sigma_Y^{-2} = 0    (5.14.24)

which can be solved for Xhat_i to obtain

Xhat_i = (X_i + Y_i b a) / (1 + b^2 a),   where a = sigma_X^2 / sigma_Y^2    (5.14.25)

Note that Yhat_i = b Xhat_i, i = 1, ..., n. Introducing (5.14.25) into (5.14.22a) gives the nonlinear equation

Sum_{j=1}^{n} [ Y_j - b (X_j + Y_j b a)(1 + b^2 a)^{-1} ] sigma_Y^{-2} (X_j + Y_j b a)(1 + b^2 a)^{-1} = 0    (5.14.26)

For convenience let

S_YY = Sum Y_i^2,   S_XX = Sum X_i^2,   S_XY = Sum X_i Y_i,   i = 1, 2, ..., n    (5.14.27a,b,c)

and then (5.14.26) can be expanded and simplified to

a S_XY b^2 + (S_XX - a S_YY) b - S_XY = 0    (5.14.28)

which in turn can be solved for b,

b = { a S_YY - S_XX +/- [ (a S_YY - S_XX)^2 + 4 a S_XY^2 ]^{1/2} } / (2 a S_XY)    (5.14.29)

The positive sign is chosen in the +/- sign in (5.14.29) because then the estimate converges to the correct value of S_XY/S_XX when a = sigma_X^2/sigma_Y^2 -> 0. If a -> infinity, b approaches S_YY/S_XY. Equation 5.14.29 also gives b = S_XY/S_XX for all values of a if it happens that S_XY/S_XX is equal to S_YY/S_XY, or in other terms, S_XX S_YY - S_XY^2 = 0. In ordinary least squares estimation involving Model 2, we do not permit S_XX to be equal to zero. If S_XY is equal to zero, (5.14.28) gives b = 0.

After b is calculated using (5.14.29), the estimated values Xhat_i can be obtained from (5.14.25). Observe that a different Xhat_i is calculated from (5.14.25) for each i value if the Y_i values are different even if the X_i values are actually the same. Physically a given X_i value may not be known precisely, but it may be known that it is constant for several measurements. However, if this is the case the assumption of independent errors in each X_i is violated. Hence another analysis is required for this special case of repeated Y_i values at precisely the same X_i value.
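The closed form (5.14.29) is easy to implement. The following sketch (mine; the names are illustrative) returns the estimate b given the sums (5.14.27) and the variance ratio a = sigma_X^2/sigma_Y^2, handling the special cases noted above.

```python
import math

def eiv_slope(Sxx, Syy, Sxy, a):
    """Slope estimate for Model 2 with errors in both variables, per (5.14.29)."""
    if Sxy == 0.0:
        return 0.0                      # (5.14.28) then reduces to b = 0
    if a == 0.0:
        return Sxy / Sxx                # no X error: ordinary least squares on Y
    disc = (a * Syy - Sxx) ** 2 + 4.0 * a * Sxy ** 2
    return (a * Syy - Sxx + math.sqrt(disc)) / (2.0 * a * Sxy)
```

As a -> 0 the estimate tends to S_XY/S_XX, and as a -> infinity it tends to S_YY/S_XY, matching the limiting cases discussed above.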
Example 5.14.1

Consider a case involving Model 2, with errors in either Y_i or X_i or both, that satisfies the assumptions given above in Sections 5.14.2 and 5.14.3. The data are given below.

i    X_i    Y_i
1    -1     1 - delta
2     1     1

Let delta be a positive value. Also investigate the case for delta -> 0.
(a) Find b, Xhat_i, and Yhat_i for sigma_X = sigma_Y.
(b) Find b and Yhat_i for sigma_X = 0.
(c) Find b and Xhat_i for sigma_Y = 0.

Solution
To find the b values (5.14.29) can be used. Hence find S_XX, S_YY, and S_XY from (5.14.27) to be S_XX = 2, S_YY = 2 - 2 delta + delta^2, and S_XY = delta.

(a) In this case a = 1 and (5.14.29) gives

b = [ -2 + delta + (8 - 4 delta + delta^2)^{1/2} ] / 2

If delta -> 0, b -> -1 + sqrt(2) = 0.4142. The Xhat_i values are found from (5.14.25),

Xhat_i = (X_i + Y_i b) / (1 + b^2)

and Yhat_i is equal to b Xhat_i. For delta -> 0 we obtain Xhat_1 = -0.5, Yhat_1 = -0.2071067 and Xhat_2 = 1.2071067, Yhat_2 = 0.5. For sigma_X = sigma_Y = 1 the sum S given by (5.14.13) is precisely 2.

(b) This is the usual least squares case and b = S_XY/S_XX is equal to delta/2. The Yhat_i values are Yhat_1 = -delta/2 and Yhat_2 = delta/2. For delta = 0, the values are zero; hence the predicted line is Yhat_i = 0. Again the minimum S for delta = 0 is 2.

(c) For this case b = S_YY/S_XY = (2 - 2 delta + delta^2)/delta. The Xhat_i values are found from Xhat_i = Y_i/b. For delta -> 0, b -> infinity and Xhat_1 = Xhat_2 = 0. Unlike part (b) the predicted line is now the vertical axis, with Xhat_i = 0 for all i. The minimum S is again 2.

It is instructive to examine the predicted lines for each of the cases above. See Fig. 5.8 for delta -> 0. Notice that the usual least squares case (a = 0) has the predicted line of Yhat = 0; the Y_1 = 1, X_1 = -1 observation is replaced with Yhat_1 = 0 and X_1 = -1, and Y_2 = 1, X_2 = 1 is replaced with Yhat_2 = 0, X_2 = 1. The case for a -> infinity has the vertical predicted line of Xhat = 0; the two observations are replaced by the single point Yhat_1 = Yhat_2 = 1 with Xhat = 0. For the a = 1 case, the predicted line is inclined as shown. It is thus clear that the three a values can yield quite different predicted values. In other words, it can make a large difference in the predicted line whether the errors are in Y or X or both. The case shown in Fig. 5.8 is an extreme one, however, because many times the predicted lines are quite close.

Figure 5.8 Predicted lines for errors in dependent and independent variables for data points (-1, 1) and (1, 1) for Example 5.14.1.

A case for which the predicted lines are much closer together is when delta = 1, causing Y_1 = 0 with X_1 = -1 as shown in Fig. 5.9. If delta were equal to 2, so that Y_1 = -1, then for any a such that 0 <= a <= infinity the predicted lines are all the same.

Figure 5.9 Predicted lines for errors in dependent and independent variables for data points (-1, 0) and (1, 1) for Example 5.14.1.
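The delta -> 0 numbers in part (a) can be verified directly (a small check of mine using (5.14.25) with a = 1 and the limiting slope b = sqrt(2) - 1):

```python
import math

X = [-1.0, 1.0]
Y = [1.0, 1.0]                 # Example 5.14.1 data in the limit delta -> 0
b = math.sqrt(2.0) - 1.0       # limit of (5.14.29) for delta -> 0, a = 1

Xhat = [(x + y * b) / (1.0 + b * b) for x, y in zip(X, Y)]   # (5.14.25)
Yhat = [b * xh for xh in Xhat]

# S of (5.14.13) with sigma_X = sigma_Y = 1
S = sum((y - yh) ** 2 for y, yh in zip(Y, Yhat)) \
  + sum((x - xh) ** 2 for x, xh in zip(X, Xhat))
```

This reproduces Xhat_1 = -0.5, Yhat_1 = -0.2071, Xhat_2 = 1.2071, Yhat_2 = 0.5, and S = 2.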
Example 5.14.2

Near a wall over which a turbulent fluid is flowing, the velocity is a linear function of position. Let the velocity (in cm/sec) be designated u and the distance from the wall (in cm) be designated x. The data below were taken from Fig. 6.20 of Kreith [5]. Estimated values of sigma_X and sigma_U are also given.

i    X_i (cm)    sigma_X    U_i (cm/sec)    sigma_U
1    0.0112      0.0003     80              3
2    0.0162      0.0003     125             3
3    0.0215      0.0003     165             3
4    0.0310      0.0003     235             3

The models for the true velocity u and the true distance x are

u = beta x,   U = u + e_U,   X = x + e_X

where U and X are measured values and beta is a parameter which is proportional to the shear stress at the wall. Estimates for beta are to be obtained using (a) the above information; (b) sigma_X = 0 and sigma_U unknown; and (c) sigma_X unknown and sigma_U = 0. Also calculate the Uhat and Xhat values for each case. The assumptions indicated in Section 5.14.2 are valid.

Solution
In each of the cases u and U are analogous to eta and Y, and x and X to xi and X in the notation given in this section. With this in mind let us then evaluate S_YY, S_XX, S_XY, and a in (5.14.29).

S_YY = Sum U_i^2 = (80)^2 + (125)^2 + (165)^2 + (235)^2 = 104475
S_XX = Sum X_i^2 = (0.0112)^2 + (0.0162)^2 + (0.0215)^2 + (0.031)^2 = 0.00181113
S_XY = Sum X_i U_i = 0.0112(80) + 0.0162(125) + 0.0215(165) + 0.031(235) = 13.7535
a = sigma_X^2 / sigma_U^2 = (0.0003)^2 / 3^2 = 10^{-8}

(a) For the above values the parameter beta is estimated using (5.14.29) to be b = 7594.7438/sec. The values of Xhat_i are obtained from (5.14.25) to be

Xhat_i = (X_i + U_i b a) / (1 + b^2 a) = (X_i + 7.5947438 x 10^{-5} U_i) / 1.5768013

After values are calculated for Xhat_i, the values of Uhat_i are found using Uhat_i = b Xhat_i = 7594.7438 Xhat_i. The resulting values are given below.

i    Xhat_i     Uhat_i
1    0.01096    83.21
2    0.01629    123.75
3    0.02158    163.91
4    0.03098    235.28

(b) This is the usual least squares analysis for which b = S_XY/S_XX = 7593.8778. The predicted or regression line is now Uhat_i = b X_i. The values for Xhat_i and Uhat_i are tabulated next.

i    Xhat_i    Uhat_i
1    0.0112    85.05
2    0.0162    123.02
3    0.0215    163.27
4    0.0310    235.41

(c) In this case the roles of X and Y are interchanged in the least squares analysis. Here b = S_YY/S_XY = 7596.2482; Xhat_i is obtained from Xhat_i = U_i/b; and Uhat_i is the measured value. The results of the calculations are as follows:

i    Xhat_i     Uhat_i
1    0.01053    80
2    0.01646    125
3    0.02172    165
4    0.03094    235

A comparison of the b values in this example reveals that there are some differences but they are very small; the largest difference in the b values is 0.03%. This case is more common than that shown in Figs. 5.8 and 5.9 where the predicted lines are quite different. Because the curves are so similar in this example, only the lower two points are shown in Fig. 5.10. There are negligible differences in the curves that can be drawn between the three sets of predicted points.
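The three slope estimates in this example can be reproduced with a few lines (my own check, using the data tabulated above):

```python
import math

X = [0.0112, 0.0162, 0.0215, 0.0310]   # distance, cm
U = [80.0, 125.0, 165.0, 235.0]        # velocity, cm/sec
a = (0.0003 / 3.0) ** 2                # sigma_X^2 / sigma_U^2 = 1e-8

Sxx = sum(x * x for x in X)
Syy = sum(u * u for u in U)
Sxy = sum(x * u for x, u in zip(X, U))

# Part (a): errors in both variables, (5.14.29) with the positive root
b_ml = (a * Syy - Sxx + math.sqrt((a * Syy - Sxx) ** 2 + 4.0 * a * Sxy ** 2)) / (2.0 * a * Sxy)
b_ols = Sxy / Sxx   # part (b): errors in U only
b_inv = Syy / Sxy   # part (c): errors in X only
```

The three estimates agree to within about 0.03%, as noted above.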
PROBLEMS
5.1 Prove using the standard assumptions that

...

and indicate which assumptions are used.

5.2 Show that

...

and use to show that (5.2.37) follows from (5.2.36).

5.3 What is the expected value of e_i for OLS estimation when the following assumptions apply?

Y_i = E(Y_i) + e_i = eta_i + e_i = beta X_i + e_i,   E(e_i) != 0,   V(X_i) = 0,   V(beta) = 0

5.4 Prove (5.2.43) for Y_i = beta_0 + beta_1 X_i + beta_2 X_i^2 + beta_3 X_i^3 + e_i when using OLS.

5.5 For Model 5 what weighting functions for maximum likelihood estimation would cause the sum of the residuals to be equal to zero? Assume that the assumptions designated 11011111 apply.

5.6 Show that the minimum value of S for Y_i = beta_0 + beta_1 X_i + e_i is

...
Figure 5.10 Predicted line for errors in dependent and independent variables for Example 5.14.2 (legend: + data; least squares on Uhat_i; least squares on Xhat_i; errors in both Uhat_i and Xhat_i; axes: X_i and Xhat_i (cm) versus Uhat_i (cm/sec)).

5.7 The following data are given.
REFERENCES

1. Draper, N. R. and Smith, H., Applied Regression Analysis, John Wiley and Sons, Inc., New York, 1966.
2. Burington, R. S. and May, D. C., Handbook of Probability and Statistics with Tables, 2nd ed., McGraw-Hill Book Company, New York, 1970.
3. Box, G. E. P. and Tiao, G. C., Bayesian Inference in Statistical Analysis, Addison-Wesley Publishing Co., Reading, Mass., 1973.
4. Brownlee, K. A., Statistical Theory and Methodology in Science and Engineering, 2nd ed., John Wiley and Sons, Inc., New York, 1965.
5. Kreith, F., Principles of Heat Transfer, 3rd ed., Intext Educational Publishers, New York, 1973.
2 2 2 1 4 4
3 3 3 3 5 5 6

Assume that the standard assumptions apply.
(a) Find estimates of the parameters in Y_i = beta_0 + beta_1 X_i + e_i.
Answer: 0, 1.
(b) Find estimates of the parameters in Y_i = beta_0 + beta_1 (X_i - Xbar) + e_i.
Answer: 3, 1.
(c) Give the residuals e_i. (Do they add up to zero?)
(d) Estimate the variance of e_i.
Answer: 1.333.
(e) Give the estimated standard error of b_0.
Answer: 1.211.
(f) Give the estimated standard error of b_1.
Answer: 0.365.
(g) Give the estimated standard error of b_0 in the model of part (b).
Answer: 0.516.
(h) Give the estimated covariance of b_0 and b_1.
Answer: -0.4.
(i) Give the estimated covariance of b_0 in the model of part (b) and b_1.
Answer: 0.

5.8 The following values have been reported for a certain set of experiments.

X_i: 5, 4, 3, 2, 1
Y_i: 11, 12, 13, 12, 9

Assume that the standard assumptions apply. Answer the same questions as in Problem 5.7.

5.9 The following data are given.

i:    1    2    3    4    5    6    7    8    9    10
X_i:  0    0    0    0    0    10   10   10   10   10
Y_i:  95   100  110  90   105  40   50   55   45   60

(a) For the model Y_i = beta_0 + beta_1 X_i + e_i estimate beta_0 and beta_1. What is the prediction equation?
(b) Construct an analysis of variance table. Let the null hypothesis be that beta_1 = 0 with a risk of 0.05.
(c) What are the 95% confidence limits for beta_1?
(d) What are the confidence limits about eta_i at X_3?
(e) Are there any indications that another model should be tried?

5.10 A study was made on the effect of temperature on the yield of a chemical process. The following data were collected with X linearly related to temperature and Y to the yield:

i:    1      2      3      4      5      6      7
X_i:  40     50     60     70     80     90     100
Y_i:  0.325  0.332  0.340  0.347  0.353  0.359  0.364

Assume that the standard assumptions apply.
(a) Estimate beta_0 and beta_1 for the model Y_i = beta_0 + beta_1 X_i + e_i.
Answer: 0.2997, 0.000657.
(b) Estimate variances for b_0 and b_1.
Answer: 2.38 x 10^{-6}, 4.49 x 10^{-10}.
(c) Calculate e_i and plot.
(d) Are the residuals correlated?
(e) Based on the conclusions of (d), are the estimates given in (b) valid?
(f) How could the model be improved?

5.11 Consider the model

Y_i = beta + e_i

where the standard assumptions apply for e_i.
(a) Derive an unbiased, minimum variance estimator for beta.
(b) Give an unbiased estimate of the variance of e_i (sigma^2 is unknown).

5.12 Repeat Problem 5.11 for the model

Y_i = beta X_i + e_i

5.13 Repeat Problem 5.11 for the model

Y_i = beta sin X_i + e_i

5.14 Use the e column of Table 5.1 as data (that is, Y_1 = 0.742, Y_2 = -0.034, etc.) and use the model of Problem 5.11.
(a) Estimate beta.
(b) Estimate sigma.

5.15 Repeat Example 5.2.4 with the e values replaced with nine consecutive values of a column of Table XXIII of reference 2. The column is to be the one corresponding to your birth date and the first value used in the column is to correspond to the birthday month. For example, if your birthday is March 14, then pick the fourteenth column and start with the third entry since March is the third month.
5.16 The temperature of a fluid flowing over a plate is nearly linear near the plate. Let Y be proportional to the temperature and X be the distance from the wall. The following results are obtained:

Xbar = 0.05,   Sum (X_i - Xbar)(Y_i - Ybar) = 80,   Sum (X_i - Xbar)^2 = 0.016,   Ybar = 300,   Sum (Y_i - Ybar)^2 = 8320

Assume that the model Y_i = beta_0 + beta_1 X_i + e_i and the standard assumptions apply.
(a) Estimate beta_0 and beta_1.
(b) Prepare the analysis of variance table.
5.17 Show that Sum_{i=1}^{n} (X_i - Xbar)^2 is maximized for n even with -R <= X_i <= R by choosing one-half of the X_i to be at -R and the other half to be at R.
5.18 Derive an expression for cov(Yhat_i, Yhat_j), thus proving (5.2.29).

5.19 Derive the expression for V(b_1 | beta_1) given in (5.4.12).
5.20 Modify the analysis given in Section 5.5 to obtain estimates for beta_1 and beta_2 in Model 5 when maximum likelihood estimation is used for

...

Show that b_1 - beta_1 can be put in the form

b_1 - beta_1 = (beta_1 A_1 + beta_2 A_2 + Sum e_i B_i) / Delta + C

where B_i = (Z_i1 [22] - Z_i2 [12]) sigma_i^{-1}, C is not a random variable, E(e_ij e_kl) = 0 for i != k or j != l, and whatever other standard assumptions are needed.
5.21 Consider MAP estimation for a random parameter for Model 5. Let the standard assumptions implied by 11011110 be valid. All the measurements are taken from the same batch. The random parameters beta_1 and beta_2 have the joint density

...

The quantities V_1, V_2, and Delta must be greater than zero. Let

Delta = [11][22] - [12]^2,   [kl] = Sum_i Z_ik Z_il,   D_k = Sum_i (F_i - mu_1 Z_i1 - mu_2 Z_i2) Z_ik

(a) Derive ... for i = 1 and j = 2 or i = 2 and j = 1.
(b) It can be shown that

...

Derive the given expression for V(b_1 - beta_1).
(c) The expressions given in (a) and (b) can also be applied for the case of subjective prior information. Reinterpret the meaning of mu_1, mu_2, V_1, V_2, V_12, b_1, b_2, V(b_1 - beta_1), V(b_2 - beta_2), and cov(b_1 - beta_1, b_2 - beta_2) for this case.

5.22 Before measuring the thermal conductivity of a particular steel alloy, a research engineer has developed from experience knowledge relative to values for steel alloys in general. The thermal conductivity over a limited range of temperature can be described by the regression model eta_i = beta_1 + beta_2 X_i, where X_i is temperature in deg C. This prior information regarding beta_1 and beta_2 can be described by f(beta_1, beta_2) given by that in Problem 5.21 with mu_1 = 38, mu_2 = -0.01, V_1 = 2, V_2 = 10^{-5}, and V_12 = -0.001. Assume that the standard assumptions designated 11011113 apply. Using the results of Problem 5.21, find b_1, b_2, V(b_1 - beta_1), V(b_2 - beta_2), and cov(b_1 - beta_1, b_2 - beta_2) for the following data:

i:                 1     2     3     4     5
X_i (deg C):       100   200   300   400   600
Y_i (W/m deg C):   36.3  36.3  34.6  32.9  31.2
sigma_i:           0.2   0.3   0.5   0.7   1.0
5.23 Utilizing (5.13.8) derive for Model 2 (eta_i = beta_1 X_i) the following estimator for first-order autoregressive errors,

b_1 = [ Sum_{i=1}^{n} F_i Z_i ] [ Sum_{i=1}^{n} Z_i^2 ]^{-1}

where

F_i = (Y_i - rho_i Y_{i-1}) sigma_i^{-1},   Z_i = (X_i - rho_i X_{i-1}) sigma_i^{-1},   i = 1, 2, ..., n

and where X_0 = 0 and Y_0 = 0.
5.24 Simplify the results of Problem 5.23 for the case of Model 1. Show that

b = A^{-1} [ Y_1 sigma_1^{-2} + Sum_{i=2}^{n} (1 - rho_i)(Y_i - rho_i Y_{i-1}) sigma_i^{-2} ],   V(b) = A^{-1}

where

A = sigma_1^{-2} + Sum_{i=2}^{n} (1 - rho_i)^2 sigma_i^{-2}

5.25 For the case of first-order autoregressive errors show that the variance of e_i given by (5.13.3) is the constant value sigma^2 when rho_i = rho and

sigma_1^2 = sigma^2,   sigma_i^2 = sigma^2 (1 - rho^2) for i = 2, ..., n
5.26 (a) Using the results of Problem 5.25 show that A of Problem 5.24 can be written as

...

(b) Suppose that as n becomes larger the measurements become more correlated as indicated by the expression rho = exp(-a/n), where a is some positive constant characteristic of the errors. Show that

...

for fixed a. What is the physical significance of this result?
(c) Modify the result of part (a) for fixed sigma^2 and rho and large n. What is the physical significance of this result?

5.27 The following are actual data obtained for the thermal conductivity k of Pyrex. The temperature T (in K) is related to the voltage V (in mV) by T = 301.6 + 18.24 V.

Test I:
i:             1      2      3      4      5      6      7      8      9      10
V_i (mV):      8.81   8.35   7.97   7.66   7.38   7.10   6.86   6.64   6.44   6.27
k_i (W/m-K):   1.178  1.133  1.148  1.159  1.148  1.136  1.144  1.136  1.133  1.136

Test II:
i:             11     12     13     14     15     16     17     18     19     20
V_i (mV):      6.11   5.96   5.75   5.51   5.34   5.10   4.77   4.52   4.19   3.87
k_i (W/m-K):   1.129  1.133  1.101  1.101  1.091  1.087  1.084  1.087  1.080  1.058

For the model k_i = beta_0 + beta_1 T_i find the estimates b_0 and b_1. Also find est. s.e.(b_0), est. s.e.(b_1), ..., and e_i. Let the standard assumptions be valid.

5.28 Modify the program for Problem 5.27 for variable sigma_i. Let sigma_i be a table of input values. In particular, let sigma_i = .01 V_i and obtain new b_0 and b_1 values for the data of Problem 5.27.

5.29 The Moody chart provides the following data for the friction factor f_Dw as a function of the Reynolds number Re for a roughness ratio e/D = 0.0001. Fit these data to an equation of the form f_Dw = c + d(Re)^m using the linear OLS method with c set equal to 0.0118. (Notice that f_Dw is approaching a constant for large values of Re.)

Re:     5 x 10^3   1 x 10^4   5 x 10^4   1 x 10^5   5 x 10^5   1 x 10^6   5 x 10^6   1 x 10^7   5 x 10^7   1 x 10^8
f_Dw:   0.0370     0.0310     0.0214     0.0180     0.0145     0.0135     0.0123     0.0121     0.0120     0.0120

Let log(f_Dw - c) be the dependent variable and log Re be the independent variable. Calculate also the residuals in terms of f_Dw - fhat_Dw and the relative residuals, (f_Dw - fhat_Dw)/fhat_Dw.
5.30 The United States draft lottery issued in March 1975 gave the call order for the standby draft for men born in 1956. Results for birthday months of April and September are given below (day-call order).

April: 1-170, 2-228, 3-008, 4-340, 5-005, 6-092, 7-303, 8-180, 9-025, 10-147, 11-031, 12-133, 13-205, 14-047, 15-093, 16-131, 17-264, 18-134, 19-036, 20-359, 21-183, 22-101, 23-280, 24-080, 25-110, 26-053, 27-277, 28-050, 29-105, 30-343

September: 1-175, 2-263, 3-087, 4-199, 5-236, 6-221, 7-322, 8-341, 9-349, 10-347, 11-173, 12-161, 13-325, 14-343, 15-135, 16-117, 17-307, 18-019, 19-041, 20-230, 21-086, 22-128, 23-156, 24-227, 25-209, 26-231, 27-022, 28-102, 29-089, 30-064

Use the model eta_i = beta_0.
(a) Which standard assumptions are valid?
(b) Estimate beta_0 using OLS using the April data.
(c) Estimate sigma^2 using the April data.

5.31 Repeat Problem 5.30b and c using the September data.

5.32 Using the April data in Problem 5.30, estimate beta_0 and beta_1 in the model eta_i = beta_0 + beta_1 X_i using OLS. Also estimate their standard errors.

CHAPTER 6

MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION

The extension of parameter estimation to more than two parameters is effectively accomplished through the use of matrices. The notation becomes more compact, facilitating manipulations, encouraging further insights, and permitting greater generality. This chapter develops matrix methods for linear parameter estimation and Chapter 7 considers the nonlinear case.

Linear estimation requires that the model be linear in the parameters. For linear maximum likelihood estimation, it is also necessary that the independent variables be errorless and that the covariances of the measurement errors be known to within the same multiplicative constant.

6.1 INTRODUCTION TO MATRIX NOTATION AND OPERATIONS

Before discussing various estimation procedures, this section presents various properties of matrices and matrix calculus that are used in both linear and nonlinear parameter estimation.

6.1.1 Elementary Matrix Operations

A matrix Y consisting of a single column is called a column vector. We use