CHAPTER 5   INTRODUCTION TO LINEAR ESTIMATION

Daily data for Problem 5.30:

    Day   April   September        Day   April   September
     1     170       175            16    131       117
     2     228       263            17    264       307
     3     008       087            18    134       019
     4     340       199            19    036       041
     5     005       236            20    359       230
     6     092       221            21    183       086
     7     303       322            22    101       128
     8     180       341            23    280       156
     9     025       349            24    080       227
    10     147       347            25    110       209
    11     031       173            26    053       231
    12     133       161            27    277       022
    13     205       325            28    050       102
    14     047       343            29    105       089
    15     093       135            30    343       064

Use the model η_i = β_0.
(a) Which standard assumptions are valid?
(b) Estimate β_0 using OLS using the April data.
(c) Estimate σ² using the April data.

5.31  Repeat Problem 5.30b and c using the September data.

5.32  Using the April data in Problem 5.30, estimate β_0 and β_1 in the model η_i = β_0 + β_1 X_i using OLS. Also estimate their standard errors.

CHAPTER 6

MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION

The extension of parameter estimation to more than two parameters is effectively accomplished through the use of matrices. The notation becomes more compact, facilitating manipulations, encouraging further insights, and permitting greater generality. This chapter develops matrix methods for linear parameter estimation; Chapter 7 considers the nonlinear case.

Linear estimation requires that the model be linear in the parameters. For linear maximum likelihood estimation, it is also necessary that the independent variables be errorless and that the covariances of the measurement errors be known to within the same multiplicative constant.

6.1  INTRODUCTION TO MATRIX NOTATION AND OPERATIONS

Before discussing various estimation procedures, this section presents various properties of matrices and matrix calculus that are used in both linear and nonlinear parameter estimation.
6.1.1  Elementary Matrix Operations

A matrix Y consisting of a single column is called a column vector. We use boldface letters to designate matrices. In display form Y is written as

    Y = [Y_1]
        [Y_2]
        [ ⋮ ]
        [Y_n]                                                (6.1.1)

Square brackets are used to enclose the elements of a matrix. This matrix has n elements and is sometimes called an n×1 matrix, that is, n rows and one column.

An m×n rectangular matrix A is given by

    A = [a_11  a_12  ⋯  a_1n]
        [a_21  a_22  ⋯  a_2n]
        [ ⋮     ⋮          ⋮ ]
        [a_m1  a_m2  ⋯  a_mn]                                (6.1.2)

In the notation for an element, a_ij, the first subscript refers to the row and the second to the column. You may find it helpful in memorizing this order to think "row first, then column."

If m = n in (6.1.2), the matrix A is said to be square. If the matrix is square and also a_ij = a_ji for all i and j, the matrix is termed symmetric.

6.1.1.1  Product of Matrices

The product of A times B, where A is an m×n matrix and B is n×s, is an m×s matrix C,

    C = AB,    c_ij = Σ_{k=1}^{n} a_ik b_kj                  (6.1.3a)

Note that it is meaningful to form the product A times B only when the number of columns in A is equal to the number of rows in B. A square matrix A is said to be idempotent if AA = A.

The triple matrix product D = ABC, where A is an m×n matrix, B is n×s, and C is s×r, is found using (6.1.3a) to be

    d_ij = Σ_{k=1}^{n} Σ_{l=1}^{s} a_ik b_kl c_lj            (6.1.3b)

where D is an m×r matrix.

Example 6.1.1
For the matrix A being 3×2 and B being 2×2, find the product AB.

Solution
Using (6.1.3a) we get

    AB = [a_11 b_11 + a_12 b_21    a_11 b_12 + a_12 b_22]
         [a_21 b_11 + a_22 b_21    a_21 b_12 + a_22 b_22]
         [a_31 b_11 + a_32 b_21    a_31 b_12 + a_32 b_22]

which is a 3×2 matrix.

6.1.1.2  Transpose of a Matrix

The transpose of an m×n matrix A is given by

    A^T = [a_11  a_21  ⋯  a_m1]
          [a_12  a_22  ⋯  a_m2]
          [ ⋮     ⋮          ⋮ ]
          [a_1n  a_2n  ⋯  a_mn]                              (6.1.4)

If A is a symmetric matrix, A^T = A. The transpose of the product AB is equal to

    (AB)^T = B^T A^T                                         (6.1.5a)

which can be extended to the product of any finite number of matrices,

    (AB ⋯ Z)^T = Z^T ⋯ B^T A^T                               (6.1.5b)

6.1.1.3  Inverse, Determinants, and Eigenvalues

For a square matrix A, the notation A^{-1} indicates the inverse of A. A nonsingular matrix is a square matrix whose determinant is not zero.
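As a quick numerical check of the product rule (6.1.3a), the transpose rule (6.1.5a), and the idempotent property, the following minimal sketch uses NumPy (an assumed tool of this illustration; the specific numbers are invented):

```python
import numpy as np

# A is 3x2 and B is 2x2, as in Example 6.1.1.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
B = np.array([[7.0, 8.0],
              [9.0, 10.0]])

C = A @ B                     # (6.1.3a): C is 3x2

# Transpose of a product, (6.1.5a): (AB)^T = B^T A^T
lhs = (A @ B).T
rhs = B.T @ A.T

# An idempotent matrix satisfies AA = A; the projection matrix
# built from A is a standard example.
P = A @ np.linalg.inv(A.T @ A) @ A.T
idempotent = np.allclose(P @ P, P)
```

Such checks are a convenient way to confirm conformability (columns of A must equal rows of B) before working with larger estimation problems.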
(Examples of determinants are given below.) It can be shown that if and only if the matrix A is nonsingular, it possesses an inverse A^{-1}, that is,

    A^{-1} A = A A^{-1} = I                                  (6.1.6)

where I is a diagonal matrix and is called the unit or identity matrix. I is given by

    I = [1  0  ⋯  0]
        [0  1  ⋯  0]
        [⋮      ⋱  ⋮]
        [0  0  ⋯  1]  = diag[1  1  ⋯  1]                     (6.1.7)

Notice the notation "diag" in (6.1.7), which indicates the elements of a diagonal matrix.

Let Ω be a diagonal matrix,

    Ω = diag[Ω_11  Ω_22  ⋯  Ω_nn]                            (6.1.8)

Its inverse and determinant are given by

    Ω^{-1} = diag[Ω_11^{-1}  Ω_22^{-1}  ⋯  Ω_nn^{-1}]        (6.1.9a)

    |Ω| = Ω_11 Ω_22 ⋯ Ω_nn                                   (6.1.9b)

Evidently diagonal matrices have some very convenient mathematical properties.

A 2×2 matrix A and its inverse are

    A = [a_11  a_12],    A^{-1} = (1/|A|) [ a_22  −a_12]
        [a_21  a_22]                      [−a_21   a_11]     (6.1.10a)

where |A| is the determinant of A,

    |A| = a_11 a_22 − a_12 a_21                              (6.1.10b)

The inverse of the symmetric 3×3 matrix A is

    A^{-1} = (1/|A|) [a_22 a_33 − a_23²   a_13 a_23 − a_12 a_33   a_12 a_23 − a_13 a_22]
                     [    (symmetric)      a_11 a_33 − a_13²       a_12 a_13 − a_11 a_23]
                     [                                             a_11 a_22 − a_12²   ]   (6.1.11a)

where

    |A| = a_11 a_22 a_33 + 2 a_12 a_13 a_23 − a_11 a_23² − a_22 a_13² − a_33 a_12²   (6.1.11b)

Clearly nondiagonal matrices are much more difficult to invert than diagonal matrices. For the inverses to exist the determinants given by (6.1.9b), (6.1.10b), and (6.1.11b) cannot be equal to zero. For a method for evaluating higher-order determinants, see Hildebrand [1]. See Problem 6.4 for a method for finding inverses of larger matrices.

The determinant of the product of two square matrices is

    |AB| = |A||B|                                            (6.1.12a)

If (6.1.12a) is applied to (6.1.6), the reciprocal of |A^{-1}| is found to be equal to |A|,

    |A^{-1}| = 1/|A|                                         (6.1.12b)

Another convenient relation involving products of two square n×n matrices A and B is

    (AB)^{-1} = B^{-1} A^{-1}                                (6.1.13)

which is a relation similar to (6.1.5a) for transposes.

For any square matrix A it is also true that

    (A^{-1})^T = (A^T)^{-1}                                  (6.1.14)

or in words, the transpose of an inverse is equal to the inverse of the transpose.
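The inverse and determinant identities above can be verified numerically; the sketch below (NumPy assumed, matrices invented for illustration) checks the explicit 2×2 inverse (6.1.10a) and the rules (6.1.12a), (6.1.13), and (6.1.14):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
B = np.array([[2.0, 5.0],
              [1.0, 4.0]])

# (6.1.10b): |A| = a11 a22 - a12 a21
detA = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]

# (6.1.10a): explicit 2x2 inverse
Ainv = (1.0 / detA) * np.array([[ A[1, 1], -A[0, 1]],
                                [-A[1, 0],  A[0, 0]]])

ok_inverse   = np.allclose(Ainv, np.linalg.inv(A))
ok_det_prod  = np.isclose(np.linalg.det(A @ B),
                          np.linalg.det(A) * np.linalg.det(B))   # (6.1.12a)
ok_inv_prod  = np.allclose(np.linalg.inv(A @ B),
                           np.linalg.inv(B) @ np.linalg.inv(A))  # (6.1.13)
ok_inv_trans = np.allclose(np.linalg.inv(A).T,
                           np.linalg.inv(A.T))                   # (6.1.14)
```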
An eigenvalue (also called characteristic value or latent root) λ_i of a square matrix A is obtained by solving the equation

    |A − λI| = 0                                             (6.1.15)

There are n eigenvalues if A is n×n (counting repeated values).

6.1.1.4  Partitioned Matrix

Let B be an n×n matrix that is partitioned as follows

    B = [B_11  B_12]
        [B_21  B_22]                                         (6.1.16a)

where B_ij has size n_i × n_j, i, j = 1, 2, and where n_1 + n_2 = n. Suppose that |B| ≠ 0, |B_11| ≠ 0, and |B_22| ≠ 0. Set A = B^{-1} and partition A as

    A = [A_11  A_12]
        [A_21  A_22]                                         (6.1.16b)

where A_ij has size n_i × n_j for i, j = 1, 2. The components A_ij are

    A_11 = [B_11 − B_12 B_22^{-1} B_21]^{-1} = B_11^{-1} + B_11^{-1} B_12 A_22 B_21 B_11^{-1}   (6.1.17a)

    A_12 = −B_11^{-1} B_12 [B_22 − B_21 B_11^{-1} B_12]^{-1} = −B_11^{-1} B_12 A_22             (6.1.17b)

    A_22 = [B_22 − B_21 B_11^{-1} B_12]^{-1} = B_22^{-1} + B_22^{-1} B_21 A_11 B_12 B_22^{-1}   (6.1.17c)

    A_21 = −B_22^{-1} B_21 [B_11 − B_12 B_22^{-1} B_21]^{-1} = −B_22^{-1} B_21 A_11             (6.1.17d)

The determinant of B is

    |B| = |B_11| |B_22 − B_21 B_11^{-1} B_12| = |B_22| |B_11 − B_12 B_22^{-1} B_21|             (6.1.17e)

If B_12 and B_21 are null matrices (only zero components), then the inverse of B is more simply given by

    B^{-1} = diag[B_11^{-1}  B_22^{-1}]                      (6.1.17f)

6.1.1.5  Positive Definite Matrices

Necessary and sufficient conditions for the square symmetric matrix A to be positive definite are that the inequalities

    a_11 > 0,    | a_11  a_12 |          | a_11  a_12  a_13 |
                 | a_12  a_22 | > 0,     | a_12  a_22  a_23 | > 0,    etc.    (6.1.18)
                                         | a_13  a_23  a_33 |

be satisfied. Also A is positive definite (semidefinite) if and only if all of its eigenvalues are positive (nonnegative, i.e., λ_i ≥ 0 with at least one equal to zero). If −A is positive definite (semidefinite), A is negative definite (semidefinite).

6.1.1.6  Trace

The trace of a square matrix A is the sum of the diagonal elements and is also equal to the sum of the eigenvalues λ_i,

    tr(A) = Σ_{i=1}^{n} a_ii = Σ_{i=1}^{n} λ_i               (6.1.19)

It can also be shown that the determinant of A is equal to the product of the eigenvalues,

    |A| = Π_{i=1}^{n} λ_i                                    (6.1.20)

6.1.2  Matrix Calculus

In parameter estimation the first derivative with respect to the elements of β, a p vector of parameters, is frequently needed. Let us define the operator ∇_β by

    ∇_β = [∂/∂β_1]
          [   ⋮  ]
          [∂/∂β_p]                                           (6.1.21)

Because ∇_β is p×1 it must be applied to the transpose of a column vector or to a row vector. Let C be an n×1 column vector which is a function of β.
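The partitioned-inverse formulas (6.1.17a)–(6.1.17d) and the eigenvalue identities (6.1.19)–(6.1.20) are easy to check numerically. The following sketch (NumPy assumed; the random positive definite matrix is invented for the check) assembles B^{-1} block by block and compares it with a direct inversion:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric positive definite 5x5 matrix, partitioned 2 + 3.
M = rng.standard_normal((5, 5))
B = M @ M.T + 5.0 * np.eye(5)
n1 = 2
B11, B12 = B[:n1, :n1], B[:n1, n1:]
B21, B22 = B[n1:, :n1], B[n1:, n1:]

inv = np.linalg.inv
# (6.1.17a) and (6.1.17c): diagonal blocks of B^{-1}
A11 = inv(B11 - B12 @ inv(B22) @ B21)
A22 = inv(B22 - B21 @ inv(B11) @ B12)
# (6.1.17b) and (6.1.17d): off-diagonal blocks
A12 = -inv(B11) @ B12 @ A22
A21 = -inv(B22) @ B21 @ A11

Binv = np.block([[A11, A12], [A21, A22]])
ok_block_inverse = np.allclose(Binv, inv(B))

# (6.1.19) and (6.1.20): trace = sum of eigenvalues, det = product
lam = np.linalg.eigvalsh(B)
ok_trace  = np.isclose(np.trace(B), lam.sum())
ok_det    = np.isclose(np.linalg.det(B), lam.prod())
ok_posdef = np.all(lam > 0)   # consistent with the conditions of (6.1.18)
```

Block inversion of this kind is exactly what arises later when normal-equation matrices are partitioned into groups of parameters.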
Then we obtain

    ∇_β C^T = [∂c_1/∂β_1  ∂c_2/∂β_1  ⋯  ∂c_n/∂β_1]
              [    ⋮          ⋮              ⋮    ]
              [∂c_1/∂β_p  ∂c_2/∂β_p  ⋯  ∂c_n/∂β_p]           (6.1.22)

which is a p×n matrix and whose determinant (for n = p) is called the Jacobian.

Many times the matrix derivative is applied to a scalar. Consider ∇_β operating upon the matrix product AD where A is 1×n and D is n×1; then

    ∇_β(AD) = (∇_β A^{[1×n]}) D^{[n×1]} + (∇_β D^T{}^{[1×n]}) A^T{}^{[n×1]}     (6.1.23)

The sizes of the matrices are indicated by the superscripts in brackets. One application of (6.1.23) is for A = C^T, D = β, and C a p×1 column vector whose elements are not functions of the β's. Using (6.1.23) gives

    ∇_β(C^T β) = C                                           (6.1.24)

since

    ∇_β β^T = I                                              (6.1.25)

Another important application of the matrix derivative is ∇_β operating on the product AD where A is a 1×n matrix that is a function of β and D is an n×m matrix which is not a function of β. In the same manner that (6.1.23) was derived we obtain

    ∇_β(AD) = (∇_β A) D                                      (6.1.26a)

If A = β^T and D has the size p×m, (6.1.26a) then yields

    ∇_β(β^T D) = D                                           (6.1.26b)

6.1.3  Quadratic Form

In estimation, sum of squares functions are frequently encountered. These functions can be considered to be quadratic forms Q,

    Q = A^T Φ A                                              (6.1.27)

where Φ is a symmetric positive semidefinite matrix. Q is termed positive definite if the conditions given in (6.1.18) are satisfied by Φ [1, pp. 48–52]; for these conditions Q is always nonnegative for all real values of the variable a_i. In some analytical considerations the equivalent expression for Q,

    Q = tr(Φ A A^T)                                          (6.1.28)

is more convenient than (6.1.27).

Applying the operator ∇_β to Q defined by (6.1.27), assuming that A is a function of β and Φ is not, gives

    ∇_β Q = ∇_β(A^T Φ A) = (∇_β A^T)Φ A + [∇_β(A^T Φ)]A      (6.1.29)

or, since Φ is symmetric,

    ∇_β Q = 2(∇_β A^T)Φ A                                    (6.1.30)

where (6.1.23) and (6.1.26a) are used.

In linear estimation, A in the quadratic form is given by

    A^{[n×1]} = X^{[n×p]} β^{[p×1]}                          (6.1.31)

where X is independent of β, so that

    Q = (Xβ)^T Φ Xβ                                          (6.1.32)

Then substituting (6.1.31) into (6.1.30) gives

    ∇_β Q = 2(∇_β β^T X^T)Φ Xβ = 2 X^T Φ Xβ                  (6.1.33)

where (6.1.26b) is used.
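The gradient formula (6.1.33) can be checked against a finite-difference derivative; the following sketch (NumPy assumed, with invented X, β, and weight matrix Φ) does exactly that:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 3
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
S = rng.standard_normal((n, n))
Phi = S @ S.T                      # symmetric positive semidefinite weight

def Q(b):
    A = X @ b                      # (6.1.31): A = X beta
    return A @ Phi @ A             # (6.1.32): Q = (X beta)^T Phi (X beta)

grad_analytic = 2.0 * X.T @ Phi @ X @ beta     # (6.1.33)

# Central-difference check of the analytic gradient; for a quadratic Q
# the central difference has no truncation error.
h = 1e-6
grad_numeric = np.array([
    (Q(beta + h * e) - Q(beta - h * e)) / (2.0 * h)
    for e in np.eye(p)
])
ok_gradient = np.allclose(grad_analytic, grad_numeric, rtol=1e-4)
```

This is the same derivative that is set to zero in Section 6.2 to obtain the least squares estimator.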
6.1.4  Expected Value of Matrix and Variance–Covariance Matrix

6.1.4.1  Expected Value Matrix

The integral of a matrix is the matrix of the integrals of its elements,

    ∫ A(t) dt = [∫ a_ij(t) dt]                               (6.1.34)

This result can be used to find the expected value of a column vector Y,

    E(Y) = [E(Y_1)]
           [  ⋮   ],    E(Y_i) = ∫ Y_i f(Y_i) dY_i           (6.1.35)
           [E(Y_n)]

where f(Y_i) is the probability density of Y_i.

6.1.4.2  Variance–Covariance Matrix

An important application of the expected value of a matrix is the variance–covariance matrix of a random column vector Y, which is

    cov(Y)^{[n×n]} = E{[Y − E(Y)][Y − E(Y)]^T}

        = E{ [Y_1 − E(Y_1)]
             [      ⋮      ] [Y_1 − E(Y_1)  ⋯  Y_n − E(Y_n)] }
             [Y_n − E(Y_n)]

        = [V(Y_1)          cov(Y_1, Y_2)  ⋯  cov(Y_1, Y_n)]
          [cov(Y_1, Y_2)   V(Y_2)         ⋯  cov(Y_2, Y_n)]
          [      ⋮                                 ⋮      ]
          [cov(Y_1, Y_n)   cov(Y_2, Y_n)  ⋯  V(Y_n)       ]      (6.1.36)

Notice that this is a symmetric n×n matrix.

An important special case in connection with the covariance matrix (sometimes this term is used rather than variance–covariance) is for Y containing additive zero mean errors ε, or

    Y = η + ε,    E(ε) = 0                                   (6.1.37)

Let ψ be defined as the covariance matrix of Y. For the conditions given in (6.1.37), ψ is also equal to cov(ε), or

    ψ ≡ cov(Y) = cov(ε)                                      (6.1.38a)

where ψ is in detail

    ψ = E(εε^T) = [E(ε_1²)      E(ε_1 ε_2)  ⋯  E(ε_1 ε_n)]
                  [E(ε_2 ε_1)   E(ε_2²)     ⋯  E(ε_2 ε_n)]
                  [     ⋮                          ⋮     ]
                  [E(ε_n ε_1)   E(ε_n ε_2)  ⋯  E(ε_n²)  ]    (6.1.38b)

since σ_i² ≡ V(ε_i) = E(ε_i²). In many estimation methods, ψ given by (6.1.38b) plays an important role.

6.1.4.3  Covariance of Linear Combination of Vector Random Variables

Let z^{[m×1]} be linear in the random vector ε^{[n×1]}, or

    z = Gε                                                   (6.1.39)

The matrix G is not a random matrix. It can be proved that the covariance matrix of z is given by

    cov(z) = G ψ G^T                                         (6.1.40)

where ψ ≡ cov(ε). A derivation of (6.1.40) is given next. Let

    Z ≡ Gε   and   H ≡ εε^T                                  (6.1.41)
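Before completing the derivation, the relation (6.1.40) itself can be illustrated by simulation. The sketch below (NumPy assumed; ψ and G are invented for the check) compares the sample covariance of z = Gε against GψG^T:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 3, 200_000

# A known covariance psi for the errors and a fixed (non-random) G
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
psi = L @ L.T
G = np.array([[1.0, -1.0, 0.5],
              [0.0,  2.0, 1.0]])

# Zero-mean errors with cov(eps) = psi, one sample per row
eps = rng.standard_normal((N, n)) @ L.T
z = eps @ G.T                               # (6.1.39): z = G eps

cov_theory = G @ psi @ G.T                  # (6.1.40)
cov_sample = np.cov(z, rowvar=False)
ok_cov = np.allclose(cov_sample, cov_theory,
                     atol=0.05 * np.abs(cov_theory).max())
```

With a couple of hundred thousand samples the agreement is typically within a fraction of a percent.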
and then E(ZZ^T) can be written componentwise as

    [E(ZZ^T)]_il = E( Σ_j g_ij ε_j Σ_k g_lk ε_k ) = Σ_j Σ_k g_ij E(h_jk) g_lk     (6.1.42)

where i, l = 1, 2, ..., m and where we used

    h_jk = ε_j ε_k                                           (6.1.43)

and (6.1.3b). Using the linearity property of the expected value operator,

    E(a ε_1 + b ε_2) = a E(ε_1) + b E(ε_2)                   (6.1.44)

permits (6.1.42) to be written as

    E(ZZ^T) = G E(H) G^T                                     (6.1.45)

Since ψ = E(H), we have proved (6.1.40).

A more general form of (6.1.40), which can also be proved, is

    cov(z) = G cov(ε) G^T                                    (6.1.46)

where the expected values of the components of ε need not be zero [2].

6.1.4.4  Expected Value of a Quadratic Form

The quadratic form given by (6.1.27) and (6.1.28) is

    Q = A^T Φ A = tr(Φ A A^T)                                (6.1.47)

Let A be an [n×1] random vector but Φ be not a random matrix. Using the expanded form of a matrix product given by (6.1.3b) and the linearity property of the expected value operator indicated by (6.1.44), it can be shown that

    E(Q) = tr{Φ E(AA^T)}                                     (6.1.48)

In general the covariance matrix of A is equal to

    cov(A) = E(AA^T) − E(A) E(A^T)                           (6.1.49)

Using this equation in (6.1.48) and also using (6.1.47) yields

    E(Q) = E(A^T) Φ E(A) + tr{Φ cov(A)}                      (6.1.50)

This expression can be used to obtain an estimator for σ² for certain cases.

6.1.5  Model in Matrix Terms

Certain conditions regarding the model and assumptions can be expressed more generally using matrix rather than algebraic notation. Let the dependent variable η be given by the linear-in-parameters model

    η = Xβ                                                   (6.1.51)

where, for the same dependent variable measured n times, the dimensions of η are [n×1] and those of X are [n×p]. We term X the sensitivity matrix; it is a matrix of derivatives with respect to the parameters. The parameter vector β is [p×1]. Thus we have

    η = [η_1]      X = [X_11  X_12  ⋯  X_1p]      β = [β_1]
        [η_2]          [X_21  X_22  ⋯  X_2p]          [β_2]
        [ ⋮ ]          [ ⋮                  ⋮ ]        [ ⋮ ]
        [η_n]          [X_n1  X_n2  ⋯  X_np]          [β_p]      (6.1.52)

Equation 6.1.51 includes all the models given in Chapter 5. For example, for Model 5 (η_i = β_1 X_i1 + β_2 X_i2) the X matrix is

    X = [X_11  X_12]
        [X_21  X_22]
        [ ⋮      ⋮ ]
        [X_n1  X_n2]                                         (6.1.53)

It is possible that the terms X_ij could represent different functions of the same variable such as t_i. Some examples are

    X_i1 = 1,    X_i2 = t_i,    X_i3 = sin 6t_i

In other cases X_ij might be composed of different coordinates such as time t_i and position z_i.

Similar to the last case, η might represent the output of a chemical process which might be a linear function of air flow rate q_i, cooling water inlet temperature T_i, and acid concentration C_i; here if there is a constant term
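The expectation formula (6.1.50) lends itself to a simulation check. The sketch below (NumPy assumed; Φ, the mean, and the covariance of A are invented for the check) compares a Monte Carlo average of Q = A^T Φ A with E(A^T)Φ E(A) + tr{Φ cov(A)}:

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 3, 400_000

Phi = np.array([[2.0, 0.3, 0.0],
                [0.3, 1.0, 0.2],
                [0.0, 0.2, 1.5]])      # symmetric weight matrix
mu = np.array([1.0, -2.0, 0.5])        # E(A)
L = np.array([[1.0, 0.0, 0.0],
              [0.4, 0.8, 0.0],
              [0.1, 0.2, 0.6]])
covA = L @ L.T                         # cov(A)

A = mu + rng.standard_normal((N, n)) @ L.T
Q = np.einsum('ij,jk,ik->i', A, Phi, A)        # Q = A^T Phi A per sample

EQ_theory = mu @ Phi @ mu + np.trace(Phi @ covA)   # (6.1.50)
EQ_sample = Q.mean()
ok_EQ = np.isclose(EQ_sample, EQ_theory, rtol=0.02)
```

It is this decomposition into a "signal" term and a trace term that later yields estimators of σ² from residual sums of squares.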
we would set X_i1 = 1, X_i2 = q_i, X_i3 = T_i, and X_i4 = C_i. This same example involving these independent variables is given in Section 13.12 of Brownlee [3], Chapter 6 of Draper and Smith [4], and Chapter 5 of Daniel and Wood [5].

Another situation that can be represented by η in (6.1.51) is called the multiresponse case in the statistical literature. In this case η and X may be given by

    η = [η(1)]                    [η_1(i)]
        [η(2)]       where η(i) = [η_2(i)]
        [ ⋮  ]                    [  ⋮   ]
        [η(n)]                    [η_m(i)]                   (6.1.54)

    X = [X(1)]                    [X_11(i)  X_12(i)  ⋯  X_1p(i)]
        [X(2)]       where X(i) = [X_21(i)  X_22(i)  ⋯  X_2p(i)]
        [ ⋮  ]                    [   ⋮                    ⋮   ]
        [X(n)]                    [X_m1(i)  X_m2(i)  ⋯  X_mp(i)]     (6.1.55)

Note that the variable X_jk(i), which is called a sensitivity coefficient, is for the jth dependent variable in η, for the kth parameter, and at the ith time. Also observe that

    X_jk(i) = ∂η_j(i)/∂β_k,    i = 1, ..., n;  j = 1, ..., m     (6.1.56)

In the above definitions of η and X in (6.1.54) and (6.1.55) there is a total of n "times" and m dependent variables, resulting in η being an (mn×1) vector and X having dimensions of (mn×p). If m = 1, then the j subscripts in (6.1.54)–(6.1.56) are dropped and replaced by i, or X_1k(i) = X_ik and η_1(i) = η_i.

Multiresponse cases commonly arise in parameter estimation problems involving ordinary and partial differential equations. An example involving ordinary differential equations occurs in chemical engineering when the concentrations of several components present in a chemical reaction are measured as a function of time. To be more precise, let the concentrations of components A and B be designated C_A(t_i) and C_B(t_i). Unfortunately, in most such cases the model is nonlinear in terms of the parameters, which are usually rate constants in the reaction problem just mentioned. For didactic purposes, however, let C_A(t_i) and C_B(t_i) be given by

    C_A(t_i) = β_1 f_1(t_i) + β_2 f_2(t_i)
    C_B(t_i) = β_1 f_3(t_i) + β_2 f_4(t_i)

where the f_k(t_i) are known functions. Then the sensitivity coefficients are

    X_11(i) = f_1(t_i),   X_12(i) = f_2(t_i)
    X_21(i) = f_3(t_i),   X_22(i) = f_4(t_i)

An example involving the partial differential equation describing heat conduction is heat transfer in a plate of thickness L with temperatures measured at several locations as a function of time. Suppose that the observations at m positions x_j have been made at times t_1, t_2, ..., t_n. The temperatures, which would be the dependent variables, are designated T(x_j, t_i). For the boundary conditions of a known constant heat flux of q_0 at x = 0 and no heat flow at x = L, the temperature distribution for large times is given by

    T(x, t) = T_0 + β_1 f_1(t) + β_2 f_2(x)

where

    β_1 = 1/(ρc),    f_1(t) = q_0 t / L
    β_2 = 1/k,       f_2(x) = q_0 L [3(x/L)² − 6(x/L) + 2] / 6

The quantity ρ is density, c is specific heat, and k is thermal conductivity. The initial value (i.e., at t = 0) of T is T_0, which is known. If there are measurements at two locations (such as at x = 0 and x = L), m = 2 and the η_j(i) values are

    η_j(i) = T(x_j, t_i) − T_0 = β_1 f_1(t_i) + β_2 f_2(x_j),    j = 1, 2

and the sensitivity coefficients for i = 1, ..., n are

    X_11(i) = f_1(t_i),   X_12(i) = f_2(x_1)
    X_21(i) = f_1(t_i),   X_22(i) = f_2(x_2)

Actually this case could also be considered a single response, but multiple sensors are required for m > 1. Many other examples can be generated.

6.1.5.1  Identifiability Condition

In order to uniquely estimate all p parameters in the β vector it is necessary that

    |X^T X| ≠ 0                                              (6.1.57)

for estimation problems involving least squares or maximum likelihood. This criterion is analogous to those given by (5.1.2) for some simple models. When maximum a posteriori estimation is used, it may not be necessary that (6.1.57) be true. See Appendix A.
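The identifiability condition (6.1.57) is easy to probe numerically. The sketch below (NumPy assumed; the designs are invented for illustration) builds a sensitivity matrix from the functions 1, t_i, and sin 6t_i mentioned above, and contrasts it with a degenerate design whose third column is a linear combination of the others:

```python
import numpy as np

t = np.arange(1.0, 7.0)               # six observation times (assumed)

# Identifiable design: columns 1, t, sin(6t) are linearly independent
X_good = np.column_stack([np.ones_like(t), t, np.sin(6.0 * t)])
# Non-identifiable design: third column is the combination 2 + 3t
X_bad = np.column_stack([np.ones_like(t), t, 2.0 + 3.0 * t])

det_good = np.linalg.det(X_good.T @ X_good)
rank_bad = np.linalg.matrix_rank(X_bad)

ok_good = abs(det_good) > 1e-6        # (6.1.57) satisfied
ok_bad = rank_bad < 3                 # |X^T X| = 0: parameters not unique
```

With the degenerate design, no amount of data makes all three parameters separately estimable; only certain linear combinations of them are.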
6.1.5.2  Assumptions

The standard assumptions given at the end of Section 5.1 are repeated and amplified below. In making this list all cases considered in this text are included, but doubtless many other possibilities exist that are not explicitly covered below.

1. Y = E(Y|β) + ε = η(X, β) + ε (additive errors in measurements)   (6.1.58)
   0  No, measurement errors are not additive.
   1  Yes, measurement errors are additive. Also the regression function η is correct and does not involve any random variables.

2. E(ε) = 0 (zero mean measurement errors)   (6.1.59)
   0  No, the errors in Y do not have zero means. In other words, the errors are biased.
   1  Yes, the errors in Y have zero mean.

3. V(Y_i|β) = σ² (constant variance errors)   (6.1.60)
   0  No, V(Y_i|β) ≠ constant; that is, V(Y_i|β) = σ_i².
   1  Yes, the errors ε_i, i = 1, ..., n have a common variance σ².

4. E{[ε_i − E(ε_i)][ε_j − E(ε_j)]} = 0 for i ≠ j (uncorrelated errors)   (6.1.61)
   0  No, the errors ε_i are correlated.
   1  Yes, the errors ε_i are uncorrelated.
   2  No, the errors are described by an autoregressive process. See Section 6.9.

5. ε has a normal distribution (normality)   (6.1.62)
   0  No, the distribution is not normal.
   1  Yes, the distribution is normal.

6. Known statistical parameters describing ε   (6.1.63)
   0  No, ψ = cov(ε) is known only to within an arbitrary multiplicative constant, or ψ = σ²Ω where σ² is unknown and Ω is known.
   1  Yes, ψ = cov(ε) is completely known.
   2  No, but ε is known to be described by an autoregressive process although the associated parameters are unknown. See Section 6.9.
   3  No, ψ = cov(ε) is completely unknown.

7. Errorless independent variables [V(X_ij) = 0 for linear models]   (6.1.64)
   0  No, there are errors in the independent variables.
   1  Yes, there are no errors in the independent variables.

8. β is a constant parameter vector and there is no prior information (nature of parameters)   (6.1.65)
   0  No, β is a random parameter vector and the statistics of β are unknown.
   1  Yes, β is a constant parameter vector and there is no prior information.
   2  β is a random parameter vector with a known mean of μ_β and known covariance matrix V_β. Also β is normal. The random vectors β and Y are uncorrelated. All the measurements are considered to be from the same "batch."
   3  β is a constant but unknown vector but there is subjective prior information. Since β is unknown, the prior information regarding β is summarized by β being N(μ_β, V_β) and by cov(β_i, Y_j) = 0.

As indicated in Chapter 5, the eight standard assumptions are denoted 11111111.

The estimation problems are generally less difficult (if the "best" estimates and their variances are of interest) when the standard assumptions are valid. If, for example, the covariance matrix ψ of the errors is completely unknown and there are no repeated observations, the complete ψ matrix cannot be estimated along with the parameters. From repeated observations, investigations of the measuring devices, and familiarity with estimation in such cases, the ψ matrix gradually becomes better known, however. Then estimators other than OLS can be used, although a nondiagonal ψ does introduce complications; see Sections 5.13 and 6.9. If ψ is completely unknown in a multiresponse case, Box and Draper [6] recommend the use of a different method than OLS; see (6.1.73a).
6.1.6  Maximum Likelihood Sum of Squares Functions

Suppose that the measurement errors are additive and have zero mean, normal distribution, known statistical parameters, and errorless independent variables. Also there is no prior information. This case is designated 11--1111.

The probability density function for the above assumptions is

    f(Y|β) = (2π)^{−N/2} |ψ|^{−1/2} exp[ −(Y − η)^T ψ^{−1} (Y − η)/2 ]     (6.1.66)

where ψ^{−1} is the inverse of ψ given by (6.1.38) and N is the total number of observations. Before the data are available, f(Y|β), the probability density of Y given β, associates a probability density with each different outcome Y of the experiment for a fixed parameter vector β. After the data are available, we wish to find the various values in β which might have led to the set of measurements Y actually obtained. In the maximum likelihood method the likelihood function L(β|Y) is maximized; L(β|Y) has the same form as f(Y|β) but now β is considered variable and Y is fixed.

The procedure for forming estimators from L(β|Y) usually first involves taking the natural logarithm of (6.1.66) to get

    l(β|Y) ≡ ln L(β|Y) = −(1/2)[ N ln(2π) + ln|ψ| + S_ML ]   (6.1.67a)

where

    S_ML = (Y − η)^T ψ^{−1} (Y − η)                          (6.1.67b)

When ψ is known as assumed above, maximizing L(β|Y) is equivalent to minimizing S_ML. Several possible algebraic forms of S_ML are given next.

6.1.6.1  Single Dependent Variable (Single Response) Case

Let η represent a single dependent variable that is measured at N = n different times or conditions. For correlated errors, ψ^{−1} is not diagonal. Let the inverse of ψ be given by the symmetric matrix W with components W_ij. For this case S_ML is given in algebraic form by

    S_ML = Σ_{i=1}^{n} Σ_{j=1}^{n} (Y_i − η_i)(Y_j − η_j) W_ij     (6.1.68a)

If ψ is diagonal with components σ_i², then

    S_ML = Σ_{i=1}^{n} (Y_i − η_i)² σ_i^{−2}                 (6.1.68b)

For further discussion of S_ML for single response cases, see Section 6.5.

6.1.6.2  Several Dependent Variables (Multiresponse) Case

Suppose that m different dependent variables are measured at n different discrete instants. For example, temperature and concentration might be recorded with time. The formulation below is also appropriate if the same physical quantity, such as temperature, is dynamically measured utilizing several thermocouples each positioned at a different location.

In multiresponse cases, the notation given by (6.1.54) is used for η and a similar expression is used for the [mn×1] observation column vector Y. Let the inverse of ψ be W, which is partitioned into a symmetric n×n array whose components are m×m matrices,

    W = [W(1,1)  W(1,2)  ⋯  W(1,n)]
        [W(2,1)  W(2,2)  ⋯  W(2,n)]
        [  ⋮                  ⋮   ]
        [W(n,1)  W(n,2)  ⋯  W(n,n)]                          (6.1.69)

Then the maximum likelihood sum of squares function is

    S_ML = Σ_{i=1}^{n} Σ_{j=1}^{n} [Y(i) − η(i)]^T W(i,j) [Y(j) − η(j)]     (6.1.70)

which can be simplified further for independent errors. For example, let the errors be zero mean and independent in "time," that is, E[ε(i) ε(j)^T] = 0 for i ≠ j. Then (6.1.70) reduces to

    S_ML = Σ_{i=1}^{n} [Y(i) − η(i)]^T W(i,i) [Y(i) − η(i)]  (6.1.71)

If, in addition, the ε_j(i) values are independent for a given i, that is, E[ε_j(i) ε_k(i)] = 0 for j ≠ k, S_ML further simplifies to

    S_ML = Σ_{j=1}^{m} Σ_{i=1}^{n} [Y_j(i) − η_j(i)]² σ_j^{−2}(i)     (6.1.72)

since W_jj(i) = σ_j^{−2}(i).

In this section, S_ML is given with ψ and its inverse assumed known. Exactly the same estimates for the physical parameters (β) are obtained if ψ is known only to within a multiplicative constant, that is, ψ = σ²Ω where σ² is unknown but Ω is known. This case is designated 11--1011. Should the ψ matrix contain more unknown parameters than σ², the ML procedure becomes more complicated. See Section 6.9.5.
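The equivalence of the matrix form (6.1.67b) with the element-by-element sums (6.1.68a) and (6.1.68b) is worth checking once numerically; the sketch below (NumPy assumed, with invented residuals and covariances) does so for both a nondiagonal and a diagonal ψ:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
Y = rng.standard_normal(n)
eta = rng.standard_normal(n)
r = Y - eta                            # residual vector Y - eta

# A nondiagonal (correlated-error) covariance psi and W = psi^{-1}
L = np.tril(rng.standard_normal((n, n))) + 3.0 * np.eye(n)
psi = L @ L.T
W = np.linalg.inv(psi)

# (6.1.67b): S_ML as a quadratic form
S_form = r @ W @ r
# (6.1.68a): the same quantity written as a double sum
S_sum = sum(r[i] * r[j] * W[i, j] for i in range(n) for j in range(n))

# (6.1.68b): for a diagonal psi the double sum collapses
sigma2 = np.array([0.5, 1.0, 2.0, 0.8, 1.5])
S_diag_form = r @ np.linalg.inv(np.diag(sigma2)) @ r
S_diag_simple = np.sum(r**2 / sigma2)

ok_full = np.isclose(S_form, S_sum)
ok_diag = np.isclose(S_diag_form, S_diag_simple)
```

In practice one minimizes S_ML over β; the algebraic simplifications above only change how the same objective is evaluated.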
When the ψ matrix is completely unknown, Box and Draper [6] recommend for the multiresponse case that the following determinant be minimized,

    |S| ,    S = [S_jk]^{[m×m]}                              (6.1.73a)

where S_jk is given by

    S_jk = Σ_{i=1}^{n} [Y_j(i) − η_j(i)][Y_k(i) − η_k(i)]    (6.1.73b)

For m = 1, this reduces to ordinary least squares. For m ≥ 2 the minimization of (6.1.73a) must be accomplished by the solution of a set of nonlinear equations even when the model is linear in the parameters. For other discussion of multiresponse cases, see Sections 6.7.5 and 7.8 and the Hunter paper [7].

6.1.7  Gauss–Markov Theorem

An important result in estimation is the Gauss–Markov theorem: if η = Xβ is a linear regression function, if errors are additive and have zero mean, if the covariance matrix of ε, ψ, is known to within a multiplicative constant (ψ = Ωσ²) and is positive definite, if there is no error in the independent variables, and if the estimation procedure does not use prior information (assumptions 11---011), then of all estimators of any component β_i of β which are unbiased and which are linear combinations of the observations, the component b_i,MV of b_MV has minimum variance, where

    b_MV = (X^T Ω^{−1} X)^{−1} X^T Ω^{−1} Y                  (6.1.74)

The covariance matrix of b_MV is

    cov(b_MV) = (X^T Ω^{−1} X)^{−1} σ²                       (6.1.75)

Since we are usually more interested in the use of our estimates of β in estimating expected values of Y or other linear combinations of the components of β, we shall prove a stronger theorem which includes the one just stated, namely: under the conditions stated above, of all unbiased estimators of any particular linear combination of the components of β, f^T β say, which can be expressed as a linear combination of the observations, the one with minimum variance is f^T b_MV.

To start the proof, we note that b_MV is simply expressible in terms of β and ε:

    b_MV = (X^T Ω^{−1} X)^{−1} X^T Ω^{−1} (Xβ + ε) = β + Aε  (6.1.76)

where

    A ≡ (X^T Ω^{−1} X)^{−1} X^T Ω^{−1}                       (6.1.77)

so that

    E(b_MV) = β + A E(ε) = β                                 (6.1.78)

or in words, b_MV is an unbiased estimator of β. The covariance matrix of b_MV is

    cov(b_MV) = E[(Aε)(Aε)^T] = E[A εε^T A^T] = A Ω A^T σ² = (X^T Ω^{−1} X)^{−1} σ²     (6.1.79)

Consider now the scalar f^T b_MV. It follows from (6.1.78) and (6.1.79) that

    E(f^T b_MV) = f^T β                                      (6.1.80)

and

    V(f^T b_MV) = f^T (X^T Ω^{−1} X)^{−1} f σ²               (6.1.81)

Next we consider any unbiased linear estimator of f^T β. It can be written as

    f^T b_MV + C^T Y                                         (6.1.82)

where C is a vector of constants. Since f^T b_MV + C^T Y is unbiased, and since

    E(f^T b_MV + C^T Y) = f^T β + C^T Xβ                     (6.1.83)

we see that C^T Xβ is simply zero. Since, by definition of unbiased estimator, this condition cannot depend on the value of β, C^T X must be a
null vector. The variance of f^T b_MV + C^T Y is

    V(f^T b_MV + C^T Y) = E{[f^T (X^T Ω^{−1} X)^{−1} X^T Ω^{−1} ε + C^T ε]²}
        = E{f^T (X^T Ω^{−1} X)^{−1} X^T Ω^{−1} εε^T Ω^{−1} X (X^T Ω^{−1} X)^{−1} f}
          + E(C^T εε^T C) + 2 E{C^T εε^T Ω^{−1} X (X^T Ω^{−1} X)^{−1} f}
        = f^T (X^T Ω^{−1} X)^{−1} f σ² + C^T Ω C σ² + 2 C^T X (X^T Ω^{−1} X)^{−1} f σ²     (6.1.84)

Since C^T X = 0, the third term is zero. The first term is unaffected by a choice of C. Since Ω is positive definite, C^T Ω C is positive unless C is the null vector. Thus no unbiased linear estimator of f^T β has a variance less than the variance of f^T b_MV. Hence f^T b_MV is a minimum variance unbiased linear estimator of f^T β, and in particular b_i,MV is a minimum variance unbiased linear estimator of β_i.

6.2  LEAST SQUARES ESTIMATION

In order to obtain parameter estimates using ordinary least squares (OLS), none of the standard assumptions need be valid. Instead a prescribed sum of squares is always minimized. When nothing is known regarding the measurement errors, OLS is recommended. The reader should be made aware, however, that he may know more about the measurement errors than he first thinks. After analyzing similar sets of data, for example, he can learn much regarding the errors.

When information regarding the measurement errors is present and σ_i² is far from constant, some estimation procedure other than OLS is recommended.

6.2.1  Ordinary Least Squares Estimator (OLS)

The sum of squares function used for ordinary least squares with the linear model η = Xβ is

    S_LS = (Y − Xβ)^T (Y − Xβ)                               (6.2.1)

which is a quadratic form. The principle of least squares asserts that a set of estimates of parameters can be obtained by minimizing S_LS. This is accomplished by setting the matrix derivative of S_LS with respect to β equal to zero. Using (6.1.30) and (6.1.26b) we get

    ∇_β S_LS = 2[∇_β (Y − Xβ)^T](Y − Xβ)                     (6.2.2)

    ∇_β (Y − Xβ)^T = −∇_β β^T X^T = −X^T                     (6.2.3)

and thus (6.2.2) equated to zero at β = b_LS produces

    −X^T Y + X^T X b_LS = 0                                  (6.2.4)

Premultiplying by the inverse (X^T X)^{−1} results in the ordinary least squares estimator,

    b_LS = (X^T X)^{−1} X^T Y                                (6.2.5)

This estimator requires for unique estimation of all the p parameters that the p×p matrix X^T X be nonsingular, or |X^T X| ≠ 0. This means that any one column in X cannot be proportional to any other column or to any linear combination of other columns, because if such a proportionality (i.e., linear dependence) exists, |X^T X| = 0. (See Appendix A at the end of the text.) The condition |X^T X| ≠ 0 also requires that n, the number of measurements, be equal to or greater than the number of parameters p. If the predicted curve η̂ is not to pass through each observation it is further necessary that n ≥ p + 1.

Estimators obtained from (6.2.5) for Model 2, η_i = βX_i, and Model 5, η_i = β_1 X_i1 + β_2 X_i2, are those obtained in Chapter 5; in particular see Table 5.3. For the single response case and using X given by (6.1.52), X^T X and X^T Y are given by

    X^T X = [ΣX_i1²       ΣX_i1 X_i2   ⋯   ΣX_i1 X_ip]
            [             ΣX_i2²       ⋯   ΣX_i2 X_ip]
            [ (symmetric)                   ⋮        ]
            [                               ΣX_ip²   ]       (6.2.6)

    X^T Y = [ΣX_i1 Y_i]
            [ΣX_i2 Y_i]
            [    ⋮    ]
            [ΣX_ip Y_i]                                      (6.2.7)

where the summations are from 1 to n.
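The estimator (6.2.5) and the Gauss–Markov result of Section 6.1.7 can both be illustrated in a few lines. The following sketch (NumPy assumed; the model, "true" parameters, and error-variance shape Ω are all invented for the demonstration) simulates data with unequal error variances, then compares OLS with the minimum variance estimator (6.1.74):

```python
import numpy as np

rng = np.random.default_rng(6)
n, N = 12, 20_000

t = np.linspace(-1.0, 5.0, n)
X = np.column_stack([np.ones_like(t), t, t**2])
beta = np.array([2.0, -1.0, 0.5])      # assumed "true" parameters
omega = np.linspace(0.2, 3.0, n)       # known shape: psi = diag(omega) sigma^2
sigma = 0.3

# Estimator matrices: OLS (6.2.5) and minimum variance (6.1.74)
P_ols = np.linalg.inv(X.T @ X) @ X.T
Oinv = np.diag(1.0 / omega)
P_mv = np.linalg.inv(X.T @ Oinv @ X) @ X.T @ Oinv

# Simulate N data sets (rows) and apply both linear estimators
E = rng.standard_normal((N, n)) * (sigma * np.sqrt(omega))
Ys = X @ beta + E
b_ols = Ys @ P_ols.T
b_mv = Ys @ P_mv.T

# Both estimators are unbiased ...
ok_unbiased = np.allclose(b_mv.mean(axis=0), beta, atol=0.02)
# ... but b_MV has the smaller variance, component by component
ok_min_var = np.all(b_mv.var(axis=0) <= b_ols.var(axis=0))
```

When the σ_i² really are far from constant, as here, the variance advantage of weighting by Ω^{−1} is substantial, which is the practical content of the recommendation above against plain OLS in that situation.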
136
CHAPTER 6 MATRIX ANALYSIS FOR UNEAR PARAMETER ESTIMATION
.
.'
T here a re m any p ossible a pplications 0 f O LS . O ne is t o find a c urve
.
A
h . t o find p arameters
w hich is " best" fit t o s ome a ctual d ata. n ot e r IS
.
h aving p hysical significance. O ne c an a lso use O LS t o pas~ a n a pproximate
c urve t hough s ome p oints t hat h ave b een a nalytically derived.
Example 6.2.1

Give the components of the XᵀX and XᵀY matrices for the model

    η_i = β₁ + β₂ t_i + β₃ t_i²

Solution

In this example Xᵀ is

    Xᵀ = [ 1    1    ...  1
           t₁   t₂   ...  t_n
           t₁²  t₂²  ...  t_n² ]

so that

    XᵀX = [ n            Σt_i    Σt_i²
                         Σt_i²   Σt_i³
            (symmetric)          Σt_i⁴ ] ,    XᵀY = [ ΣY_i
                                                      Σt_i Y_i
                                                      Σt_i² Y_i ]

Example 6.2.2

Many experimentally based heat transfer correlations give the Nusselt number (Nu) as a function of the Reynolds (Re) and Prandtl (Pr) numbers. One such correlation is for flow normal to heated cylinders and wires. Even though Nu may vary several orders of magnitude, typically the variation in the measurements of Nu is ±30% irrespective of the magnitude of Nu. On a log-log plot of Nu versus Re for the geometry mentioned, the data appear to be described by a second-degree curve, and the variance of log Nu is nearly constant. For the values given below by Welty [8, p. 268], use OLS to find the parameters in the linear-in-the-parameters model

    log Nu = β₁ + β₂ log Re + β₃ (log Re)² + ε

Also give Nu as a function of Re and compare with the given values.

    Re   0.1    1     10    100   10³   10⁴   10⁵
    Nu   0.45   0.84  1.83  5.1   15.7  56.5  245

Solution

The model can be written in the more familiar form η_i = β₁ + β₂ t_i + β₃ t_i², where Y = log Nu, η = E(Y), and t = log Re. In terms of Y and t the data are

    i     1       2       3      4      5     6     7
    t_i   −1      0       1      2      3     4     5
    Y_i   −0.347  −0.076  0.263  0.708  1.20  1.75  2.39

which is equally spaced in t. The Xᵀ matrix is given by

    Xᵀ = [ 1   1  1  1  1   1   1
          −1   0  1  2  3   4   5
           1   0  1  4  9  16  25 ]

Note that the rows in Xᵀ (or columns of X) are not proportional or even nearly so. This indicates that little difficulty will be encountered due to XᵀX being nearly singular. Using the results of Example 6.2.1, the XᵀX and XᵀY matrices are found to be

    XᵀX = [  7   14   56
            14   56  224
            56  224  980 ] ,    XᵀY = [   5.885
                                         24.57
                                        101.3  ]

The inverse of the symmetric XᵀX matrix is the symmetric matrix [see (6.1.11)]

    (XᵀX)⁻¹ = (1/16464) [  4704  −1176     0
                          −1176   3724  −784
                              0   −784   196 ]

Then from b_LS = (XᵀX)⁻¹XᵀY we find the estimated parameters to be

    b_LSᵀ = [ −0.0734   0.314   0.0358 ]

In order to express the model in terms of Nu, let b₁ = log B, which produces B = 0.844. With B in the log Nu expression we can write

    Nu ≈ B Re^(b₂ + b₃ log Re) = 0.844 Re^(0.314 + 0.0358 log Re)

Values obtained using this expression are given in the third column of Table 6.1. A comparison of the recommended and OLS Nu values shows agreement within ±3%. Note that the largest residual (Nu − N̂u) occurs at Re = 10⁵ and is 0.9, which is much larger than those for small Re. If an OLS analysis on Nu (and not log Nu) had been used, the magnitudes of the residuals would have been more uniform over the Nu range, causing the relative differences in columns two and three of Table 6.1 to be much larger for the small Nu values than the large values. Hence in this case, in which the relative differences in the given and predicted values of Nu were to be nearly constant, OLS estimation on log Nu produced the desired result
whereas OLS on Nu would not. Furthermore, the transformation from Nu and Re to log Nu and log Re, respectively, results in a linear estimation problem, while the problem in terms of Nu and Re is nonlinear. These data are considered further in Example 6.3.1.
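The fit of Example 6.2.2 can be checked numerically. The sketch below reproduces the estimates to within rounding, assuming base-10 logarithms as in the text.

```python
import numpy as np

# Reproduce (approximately) the OLS fit of Example 6.2.2:
# log10(Nu) = b1 + b2*log10(Re) + b3*(log10 Re)^2.
Re = np.array([0.1, 1.0, 10.0, 100.0, 1e3, 1e4, 1e5])
Nu = np.array([0.45, 0.84, 1.83, 5.1, 15.7, 56.5, 245.0])

t = np.log10(Re)            # -1, 0, 1, 2, 3, 4, 5
Y = np.log10(Nu)
X = np.column_stack([np.ones_like(t), t, t**2])

b = np.linalg.solve(X.T @ X, X.T @ Y)
# b is close to the text's (-0.0734, 0.314, 0.0358)

# Back to Nu form: Nu ~ B * Re^(b2 + b3*log10 Re), with B = 10^b1
B = 10.0**b[0]              # about 0.844
Nu_pred = B * Re**(b[1] + b[2] * np.log10(Re))

# Agreement is within a few percent over four decades of Nu
assert np.all(np.abs(Nu_pred / Nu - 1.0) < 0.05)
```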
Table 6.1  Given and OLS Values of Nu_i for Example 6.2.2

    Re     Given Values   OLS Values
    0.1    0.45           0.4452
    1      0.84           0.8445
    10     1.83           1.889
    100    5.1            4.983
    10³    15.7           15.50
    10⁴    56.5           56.85
    10⁵    245            245.9

6.2.2  Mean of the OLS Estimator

In order to obtain the parameter vector b_LS it is not necessary to invoke any of the standard assumptions. However, if some statistical statements are to be made regarding b_LS, we must know several facts regarding the errors. Let there be additive, zero mean errors in Y, and let X and β be nonstochastic (1111). Then the expected value of b_LS is

    E(b_LS) = E[(XᵀX)⁻¹XᵀY] = (XᵀX)⁻¹XᵀE(Y) = (XᵀX)⁻¹XᵀXβ = β    (6.2.8)

With the four assumptions given above, the least squares estimator is thus shown to be unbiased.

6.2.3  Variance-Covariance Matrix of b_LS

With the use of the four assumptions denoted (1111) and utilizing (6.2.8), we can find the covariance of b_LS. Let

    b_LS = A(Xβ + ε) = β + Aε,    A ≡ (XᵀX)⁻¹Xᵀ    (6.2.9)

since AX = I. Then cov(b_LS) is

    cov(b_LS) = E[(b_LS − β)(b_LS − β)ᵀ] = E[(β + Aε − β)(β + Aε − β)ᵀ] = AψAᵀ    (6.2.10)

since ψ = E(εεᵀ). Utilizing the definition of A then gives

    cov(b_LS) = (XᵀX)⁻¹Xᵀ ψ X(XᵀX)⁻¹ ≡ P_LS    (6.2.11)

in which we also introduce the symbol P_LS.

Without the additional standard assumptions of uncorrelated and constant variance measurement errors (111111), the OLS estimator does not provide the minimum variance estimator. For this reason b_LS is said to be not efficient. Suppose that the standard assumptions of 111111 are valid; then ψ is given by ψ = σ²I and we have

    cov(b_LS) = σ²(XᵀX)⁻¹

which is the minimum covariance matrix of b_LS, and thus for these assumptions, which include additive, zero mean, uncorrelated, and constant variance errors, b_LS does provide an efficient estimator. (See Section 6.1.1 on the Gauss-Markov theorem.)

In addition to needing the covariance of the parameters, one may desire the covariance matrix of a collection of predicted points on the regression line or surface. Using (6.1.46) with the assumptions of additive, zero mean errors and nonstochastic X and β (1111), we find for the set of points represented by Ŷ = Xb_LS,

    cov(Ŷ) = cov(Xb_LS) = X cov(b_LS) Xᵀ    (6.2.12a)

See Fig. 5.1. For independent, constant variance errors this expression reduces to

    cov(Ŷ) = σ² X(XᵀX)⁻¹Xᵀ    (6.2.12b)

The diagonal elements of this square matrix give the variances of the Ŷ_i. For a typical element of (6.2.12b) see (5.2.41a), which is for the simple model η = β₀ + β₁X.
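A small numerical check of (6.2.9)-(6.2.11), with illustrative data: for ψ = σ²I the general sandwich form AψAᵀ collapses to σ²(XᵀX)⁻¹.

```python
import numpy as np

# Sketch of (6.2.9)-(6.2.11): cov(b_LS) = A psi A^T with A = (X^T X)^(-1) X^T,
# collapsing to sigma^2 (X^T X)^(-1) when psi = sigma^2 I. Values illustrative.
t = np.arange(6.0)
X = np.column_stack([np.ones_like(t), t])

A = np.linalg.inv(X.T @ X) @ X.T      # A X = I
assert np.allclose(A @ X, np.eye(2))

sigma2 = 0.25
psi = sigma2 * np.eye(len(t))         # uncorrelated, constant variance errors
cov_general = A @ psi @ A.T           # (6.2.11)
cov_ols = sigma2 * np.linalg.inv(X.T @ X)
assert np.allclose(cov_general, cov_ols)
```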
6.2.4  Relations Involving the Sum of Squares of Residuals

For ordinary least squares with the linear model η = Xβ, the minimum sum of squares is

    R_LS = (Y − Xb_LS)ᵀ(Y − Xb_LS)    (6.2.13)

which is the sum of squares of residuals. (It is also called error sum of squares and designated SSE.) By expanding R_LS and employing the b_LS expression given by (6.2.5), we can also write

    R_LS = YᵀY − b_LSᵀXᵀY = YᵀY − b_LSᵀXᵀXb_LS    (6.2.14a,b)

since (6.2.4) and the scalar nature of R_LS can be used to obtain

    YᵀXb_LS = b_LSᵀXᵀY = b_LSᵀXᵀXb_LS    (6.2.15)

The expression given by (6.2.14a) can be convenient for numerically evaluating R_LS because the individual residuals need not be calculated and also because XᵀY is known from the b_LS calculation, as indicated by (6.2.5) and (6.2.7).

For the standard assumptions of additive, zero mean, constant variance, uncorrelated errors; errorless X matrix; and nonrandom parameters (111111), the expected value of R_LS given by (6.2.13) is obtained next. Substituting (6.2.5) into (6.2.13) gives

    R_LS = (Y − X(XᵀX)⁻¹XᵀY)ᵀ(Y − X(XᵀX)⁻¹XᵀY)
         = Yᵀ(I − X(XᵀX)⁻¹Xᵀ)ᵀ(I − X(XᵀX)⁻¹Xᵀ)Y
         = Yᵀ(I − X(XᵀX)⁻¹Xᵀ)Y    (6.2.16)

since I − X(XᵀX)⁻¹Xᵀ is symmetric and idempotent. Let us substitute Xβ + ε for Y in (6.2.16) to find

    R_LS = εᵀ(I − X(XᵀX)⁻¹Xᵀ)ε    (6.2.17)

a quadratic form in terms of ε. Then using (6.1.50) with E(ε) = 0 and cov(ε) = σ²I, the expected value of R_LS is

    E(R_LS) = tr[I − X(XᵀX)⁻¹Xᵀ]σ² = tr(I)σ² − tr[(XᵀX)⁻¹(XᵀX)]σ² = (n − p)σ²    (6.2.18)

It follows that an unbiased estimate of σ² (provided the assumptions 111111 are valid) is

    s² = σ̂² = R_LS / (n − p)    (6.2.19)

This is a generalization of the results in Chapter 5 where p = 1 and 2. A summary of the basic equations for OLS estimation is given in Appendix B.

6.2.5  Distribution of R_LS and b_LS

If in addition to the assumptions used above the errors ε_i, i = 1, ..., n, have a normal probability density, several theorems can be stated regarding R_LS and b_LS. They will be presented without proof.

Theorem 6.2.1  The quantity R_LS/σ² has the χ²(n − p) distribution.

Theorem 6.2.2  The vector b_LS − β is distributed as N(0, σ²(XᵀX)⁻¹).

Theorem 6.2.3  Let σ̂²_bi be the ith diagonal element of (XᵀX)⁻¹s². Then the quantity (b_i,LS − β_i)/σ̂_bi has the t(n − p) distribution.
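The residual-sum-of-squares identities (6.2.13) and (6.2.14a) and the estimate (6.2.19) can be sketched as follows; the straight-line data are illustrative, not from the text.

```python
import numpy as np

# Sketch of (6.2.13)-(6.2.19): residual sum of squares and the unbiased
# estimate s^2 = R_LS/(n - p). Data are illustrative.
t = np.arange(8.0)
Y = 2.0 + 0.5 * t + np.array([0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.05])
X = np.column_stack([np.ones_like(t), t])
n, p = X.shape

b = np.linalg.solve(X.T @ X, X.T @ Y)

# (6.2.13) directly, and (6.2.14a) without forming residuals
R_direct = float((Y - X @ b) @ (Y - X @ b))
R_short = float(Y @ Y - b @ (X.T @ Y))
assert abs(R_direct - R_short) < 1e-9

s2 = R_direct / (n - p)                       # (6.2.19)
# Estimated standard errors: sqrt of diagonal of s^2 (X^T X)^(-1),
# as used with the t distribution in Theorem 6.2.3
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
```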
Another theorem is concerned with testing whether certain subsets of the parameters are needed in the model. Let the parameter vector β be partitioned so that βᵀ = [β₁ᵀ β₂ᵀ], and let X be partitioned conformably so that X = [X₁ X₂]. Then Y can be written as

    Y = X₁β₁ + X₂β₂ + ε    (6.2.20)

where β₁ contains p − q elements, say, and β₂ contains q elements. Assume that the hypothesis to be tested simultaneously specifies all the components of β₂. One possible hypothesis is that β₂ = β₂*, where β₂* could be a zero vector. The test is based upon the relative reduction in the residual sum of squares when all p parameters are estimated as compared with the residual sum of squares when β₁ is estimated with β₂ = β₂*. If the reduction is large, the hypothesis that β₂ = β₂* is untenable.

The least squares estimate for all the parameters is the usual expression

    b_LS = (XᵀX)⁻¹XᵀY    (6.2.21)

which has the associated residual sum of squares corresponding to (6.2.13) of

    R(b₁,LS, b₂,LS) = (Y − Xb_LS)ᵀ(Y − Xb_LS)    (6.2.22)

Now suppose that β₂ were set equal to β₂* and estimates of β₁, denoted by b₁,LS(β₂*), are obtained. This produces the residual sum of squares

    R[b₁,LS(β₂*), β₂*]    (6.2.23a)

and the reduction

    ΔR = R[b₁,LS(β₂*), β₂*] − R(b₁,LS, b₂,LS)    (6.2.23b)

when β₂ = β₂*. Now we give the theorem.

Theorem 6.2.4  The statistic

    F = {R[b₁,LS(β₂*), β₂*] − R(b₁,LS, b₂,LS)}/q ÷ [R(b₁,LS, b₂,LS)/(n − p)]
      = (ΔR/q) / [R(b₁,LS, b₂,LS)/(n − p)]    (6.2.24)

has the F₁₋ₐ(q, n − p) distribution. See Section 2.8.10 for an F table. Also see Section 5.7 for other discussion.

For a proof, see Goldfeld and Quandt [9, p. 46].

In using Theorem 6.2.4, the hypothesis that β₂ = β₂* is tested at the α level of significance by comparing F of Theorem 6.2.4 with the critical value F₁₋ₐ(q, n − p). If this critical value is exceeded, the hypothesis that β₂ = β₂* is rejected. Note that the particular null vector β₂* = 0 can be used to investigate whether the model should include the X₂β₂ terms; in this case, if the ratio in Theorem 6.2.4 is greater than F₁₋ₐ(q, n − p), then we have an indication that the X₂β₂ terms may be needed.

Example 6.2.3

A solid copper billet 1.82 in. (0.0462 m) long and 1 in. (0.0254 m) in diameter was heated in a furnace and then removed. Two thermocouples were attached to the billet. The temperatures Y_i given by one thermocouple are given in Table 6.2 as a function of time. See also the plot of Y_i versus time shown in Fig. 6.1. The other thermocouple gave values about 0.2 F° (0.11 K) larger for smaller times, decreasing gradually to about 0.14 F° (0.08 K) larger at 1536 seconds. For a constant temperature environment the thermocouple temperature readings have an estimated standard deviation of 0.03 F° (0.017 K).

Table 6.2  Data for Example 6.2.3

    Observation No.   Time (sec)   Temperature of Billet (°F)
    1                 0            279.59
    2                 96           264.87
    3                 192          251.53
    4                 288          239.30
    5                 384          228.18
    6                 480          217.24
    7                 576          207.86
    8                 672          199.36
    9                 768          191.65
    10                864          184.44
    11                960          177.64
    12                1056         171.41
    13                1152         165.04
    14                1248         159.89
    15                1344         155.19
    16                1440         150.78
    17                1536         146.68

Find a satisfactory power series approximation to the temperature history of the billet given in Table 6.2. Assume that all the standard assumptions except known σ² are valid. Use the F test at the 5% level of significance.
[Figure 6.1  Temperatures and residuals for the cooling billet example (Example 6.2.3). The time step is Δt = 96 seconds.]

Solution

The suggested model for the temperature in this case has the form

    Y = β₁ + β₂t + β₃t² + ... + β_p t^(p−1) + ε

If the actual values of t as given in Table 6.2 are used, the components of the XᵀX matrix would have disparate values. For example, the first diagonal term is 17 and the fifth is 2.07 × 10¹¹. For this reason, the model is written as

    Y = β₁ + β₂(t/Δt) + ... + β_p(t/Δt)^(p−1) + ε    (a)

where Δt is 96 sec.

There are many ways of determining the "best" model. One way for this model is to calculate the parameters and residual sum of squares, R, for p and p − 1 parameters for p = 2, 3, 4, .... Then an F test based on Theorem 6.2.4 is used. In each case we have q = 1 and the hypothesis is β_p = 0. The sums of squares R_LS for p = 1 to 6 are listed in Table 6.3. The mean square (R_LS divided by the number of degrees of freedom) is also given. The F ratio is the ΔR value in the table, ΔR = R(b₁, ..., b_(p−1)) − R(b₁, ..., b_p), divided by the mean square s². This ratio is compared with F.95(1, 17 − p), which is 4.75 and 4.84 for p = 5 and 6, respectively. Then at the 5% level of significance the p = 5 model is selected. (This decision is based on the standard assumptions, 11111011, being valid.) The parameters corresponding to equation (a) in English units are 279.660, −15.451, 0.71350, −0.023024, and 0.00039479 for b₁, ..., b₅, respectively.

Table 6.3  Sum of Squares and F Ratio for Example 6.2.3

    p   Degrees of      R(b_LS), Residual   Mean Square s²     ΔR          F = ΔR/s²
        Freedom (d.f.)  Sum of Squares      (s² = R/d.f.)
    1   16              27624.667           —                  —           —
    2   15              894.491             59.633             26730.176   448.3
    3   14              16.1613             1.1544             878.33      760.9
    4   13              1.0959              0.0843             15.0654     178.8
    5   12              0.7186              0.0599             0.3773      6.30
    6   11              0.5781              0.0526             0.1405      2.67

The residuals for the p = 5 model are shown in the upper portion of Fig. 6.1. They seem to be somewhat correlated and are even more so for smaller time steps. (See Fig. 6.6.) The assumption of uncorrelated measurements may not be valid. Hence the selection of a five-parameter model based on the uncorrelated errors assumption may not be correct. The p = 4 model (cubic in t) might actually be more appropriate.

The inverse of the XᵀX matrix is

    (XᵀX)⁻¹ = [ 0.7853
                −0.5581      0.6697
                 0.1174     −0.1709      0.04739
                −0.00945     0.01525    −0.00445     4.305 × 10⁻⁴
                 2.58 × 10⁻⁴  −4.43 × 10⁻⁴  1.34 × 10⁻⁴  −1.32 × 10⁻⁵  4.13 × 10⁻⁷ ]

                (symmetric)
If the assumptions 11111011 are valid, the estimated covariance matrix of b_LS is (XᵀX)⁻¹, given above, multiplied by s², which is 0.0599 (the mean square value in Table 6.3 for p = 5). From these values we can find, for example, that the estimated standard error of b₁ is [0.7853(0.0599)]^(1/2) = 0.217 and that of b₅ is 0.000157. A comparison of these values with b₁ and b₅ shows that the relative uncertainty in b₅ is much greater than for b₁.

The value of s = (0.0599)^(1/2) = 0.245 F° is worthy of comment. First, for p = 6, 7, and 8 its value would not decrease greatly. Next, this value of 0.245 is considerably larger than the 0.03 found for a constant, low temperature environment. It is, however, close to the difference in temperature between the two thermocouples. At the higher temperatures the specimen's temperature may be more sensitive to air currents, etc. Also the calibration error over the whole temperature range is greater than 0.03 F°. For these reasons additive errors with as large a standard error as 0.24 are quite possible.
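The stepwise procedure of Example 6.2.3 can be sketched as below. The cooling data here are synthetic stand-ins for Table 6.2 (the same 17-point, Δt = 96 s layout), so the F values differ from Table 6.3.

```python
import numpy as np

# Stepwise polynomial selection in the spirit of Example 6.2.3: raise the
# degree and compute the F statistic of Theorem 6.2.4 with q = 1 at each
# step, F = dR / s^2. Synthetic data, not the text's values.
t = np.arange(17.0)                  # time in units of dt = 96 s
Y = 280.0 * np.exp(-0.04 * t)        # smooth decay resembling the billet data

n = len(t)
R_prev, Fs = None, []
for p in range(1, 7):                # models with 1..6 parameters
    X = np.column_stack([t**j for j in range(p)])
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    r = Y - X @ b
    R = float(r @ r)                 # residual sum of squares (6.2.13)
    if R_prev is not None:
        s2 = R / (n - p)             # mean square for the larger model
        Fs.append((R_prev - R) / s2) # q = 1, so dR/1 over s^2
    R_prev = R
# A term is retained while its F exceeds the F.95(1, n - p) critical value
# (4.75 for p = 5 in the text's example).
```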
Example 6.2.4

Using the model for temperature found in the preceding example, estimate h in the differential equation

    ρcV dT/dt = hA(T∞ − T)

This equation describes the temperature T of the billet, assuming that there is negligible internal resistance to heat flow. The density ρ of copper is 555 lbm/ft³ (8890 kg/m³); c is the specific heat and has the value 0.092 Btu/lbm·°F (0.385 kJ/kg·K); V is volume; A is heated surface area; h is the heat transfer coefficient; and T∞ is the air temperature.

Solution

Solving the above equation for h gives

    h = [ρcV / A(T∞ − Ŷ)] dŶ/dt

where the true temperature T has been replaced by the estimated value Ŷ. The ratio V/A is found to be 0.01634 ft (0.00498 m). The temperature Ŷ and the derivative are obtained from the power series model with p = 5. The derivatives are calculated using

    dŶ/dt = (1/Δt)[b₂ + 2b₃(t_i/Δt) + 3b₄(t_i/Δt)² + 4b₅(t_i/Δt)³]    (*)

where the parameters are given in the preceding example and Δt = 96 sec or 0.02667 hr. The temperature T∞ is the room temperature, which is nearly constant at 81.5°F (300.6 K).

The h_i estimates are depicted in Fig. 6.2. From knowledge of the heat transfer process we know that the magnitudes of the h_i values are reasonable and that they should drop with the temperature difference T − T∞.

[Figure 6.2  Heat transfer coefficient as a function of time for Example 6.2.4 (t_i = iΔt).]

One advantage of this method of analysis for this problem is that it is not necessary to specify a functional form for h_i versus time. Another is that the estimation for h_i is a linear problem. Some disadvantages are that (1) a functional form for T is needed; (2) sometimes this method yields h values that are more variable than are expected; and (3) the method is not "parsimonious" in its use of parameters. The method that is usually recommended for this problem involves the solution of the differential equation given above. This would give T as a nonlinear function of time and h, and thus nonlinear estimation would be needed. See Section 7.5.2.

6.2.6  Weighted Least Squares (WLS)

There are cases when the covariance matrix of the errors ψ is not known, but the experimenter nevertheless wishes to include some general information regarding the errors. He might know, for example, that as Y increases in magnitude the variance of Y also increases. This was the case discussed with the heat transfer example in Example 6.2.2, in which σ_i/Y_i was about constant. It might be appropriate to assume that some symmetric weighting matrix W is given. This matrix may or may not be equal to ψ⁻¹, the inverse of the error covariance matrix.
In this estimation problem, the function to minimize for the linear model η = Xβ is

    S_WLS = (Y − Xβ)ᵀW(Y − Xβ)    (6.2.25)

which yields the estimator

    b_WLS = (XᵀWX)⁻¹XᵀWY    (6.2.26)

Using the standard assumptions of additive, zero mean errors and errorless X and β (1111), the covariance matrix of b_WLS can be shown using (6.2.10) to be

    cov(b_WLS) = (XᵀWX)⁻¹XᵀWψWX(XᵀWX)⁻¹    (6.2.27)
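A minimal sketch of (6.2.26), assuming a diagonal weighting matrix chosen in the spirit of the heat transfer discussion (weights proportional to 1/Y²); the data are illustrative.

```python
import numpy as np

# Sketch of (6.2.26): b_WLS = (X^T W X)^(-1) X^T W Y with a diagonal
# weighting matrix. Weights here are illustrative: larger Y implies larger
# variance, hence smaller weight, as when sigma_i/Y_i is roughly constant.
t = np.arange(5.0)
Y = np.array([1.0, 2.7, 7.4, 20.0, 55.0])
W = np.diag(1.0 / Y**2)            # weight ~ 1/variance when sigma_i ~ Y_i

X = np.column_stack([np.ones_like(t), t])
b_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)

# With W = I the estimator reduces to ordinary least squares.
b_ols = np.linalg.solve(X.T @ X, X.T @ Y)
```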
6.3  ORTHOGONAL POLYNOMIALS IN OLS ESTIMATION

The polynomial regression model with evenly spaced (in t) observations can be analyzed with certain computational simplifications through the use of orthogonal polynomials. There are many different sets of orthogonal polynomials that could be employed. Some are appropriate for use with uniform spacing in the variable t. Others have been developed for other particular spacings and for weighted least squares; see Himmelblau [10, p. 152]. A set of orthogonal polynomials is given below for OLS and uniform spacing.

Suppose that the model is given by a polynomial in a single independent variable,

    η_i = β₀ + β₁t_i + β₂t_i² + ... + β_r t_i^r    (6.3.1)

and that Y_i has been measured at n values of t_i which are evenly spaced, or t_(i+1) = t_i + Δt. One can rewrite this model as

    η_i = α₀p₀(t_i) + α₁p₁(t_i) + ... + α_r p_r(t_i)    (6.3.2)

where p_j(t) is a polynomial in t of order j. The p_j(t) functions are chosen so that they are orthogonal, that is, for j and k = 0, 1, ..., r,

    Σ_(i=1)^n p_j(t_i)p_k(t_i) = 0  for j ≠ k
                               ≠ 0  for j = k    (6.3.3)

For these orthogonal polynomials, OLS yields

    a_j = Σ_(i=1)^n p_j(t_i)Y_i / Σ_(i=1)^n p_j²(t_i)    (6.3.4)

The first few orthogonal polynomials satisfying the above conditions are

    p₀(t_i) = 1,    p₁(t_i) = i − (n + 1)/2 = (t_i − t̄)/Δt    (6.3.5)

where t̄ = Σt_i/n. Notice that the p_j(t_i) values are dimensionless and are functions only of n (for given j and i values). A recursive scheme for obtaining p_(j+1)(t_i) in terms of p_j(t_i) is

    p_(j+1)(t_i) = p₁(t_i)p_j(t_i) − [j²(n² − j²) / 4(4j² − 1)] p_(j−1)(t_i),
        j = 1, 2, ..., r − 1;  i = 1, 2, ..., n    (6.3.6)

The starting conditions on j are p₀(t_i) = 1 and p₁(t_i) = i − (n + 1)/2.

If desired, after the parameters a₀, a₁, ..., a_r have been estimated, one can find the β₀, ..., β_r from the a_j values. For example, for r = 0, 1, and 2 we have [since (6.3.1) and (6.3.2) are identical polynomials in t]

    r = 0:  β̂₀ = a₀
    r = 1:  β̂₀ = a₀ − a₁t̄/Δt,  β̂₁ = a₁/Δt
    r = 2:  β̂₀ = a₀ − a₁t̄/Δt + a₂[t̄²/Δt² − (n² − 1)/12],
            β̂₁ = a₁/Δt − 2a₂t̄/Δt²,  β̂₂ = a₂/Δt²    (6.3.7)

Example 6.3.1
Using the evenly spaced (in log Re) data of Example 6.2.2, estimate a₀, ..., a₃ and compare the results with those in Example 6.2.2. Let r = 3, which causes η to be a cubic in t.
Solution

Using the orthogonal polynomials given by (6.3.5) and (6.3.6), we can find for the given data (the i, t_i, Y_i table of Example 6.2.2) that n = 7, Δt = 1, and X is given by

    X = [ 1  −3   5  −6
          1  −2   0   6
          1  −1  −3   6
          1   0  −4   0
          1   1  −3  −6
          1   2   0  −6
          1   3   5   6 ]

Note that X is composed of orthogonal vectors, which results in XᵀX reducing to the diagonal matrix

    XᵀX = diag[ 7  28  84  216 ]

which has the simple inverse

    (XᵀX)⁻¹ = diag[ 1/7  1/28  1/84  1/216 ]

These are the X, XᵀX, and (XᵀX)⁻¹ matrices for r = 3 and for any set of seven equally spaced values of t. The XᵀY vector becomes

    XᵀY = [ 5.8846
           12.7968
            3.0066
           −0.1516 ]

Then from either a = (XᵀX)⁻¹XᵀY or from (6.3.4) we find

    aᵀ = [ 0.84066  0.45703  0.03579  −0.00070 ]

Since the polynomials p_j(t) are orthogonal, the a_j's are independent. That is, regardless of the value of r in (6.3.2) we find a₀ = 0.84.

The residual sum of squares as r increases can be easily calculated from the above information. The total sum of squares, YᵀY, is given by

    YᵀY = 10.90349564

The residual sum of squares can be found using R_LS = YᵀY − aᵀXᵀY.

The models for r = 0, 1, and 2 in terms of p_j(t_i) are

    r = 0:  Ŷ_i = 0.84066
    r = 1:  Ŷ_i = 0.84066 + 0.45703 p₁(t_i)
    r = 2:  Ŷ_i = 0.84066 + 0.45703 p₁(t_i) + 0.03579 p₂(t_i)

where we note that each parameter a_j is unchanged as r is increased. In the standard form given by (6.3.1) this is not true; using (6.3.7) yields

    r = 0:  Ŷ_i = 0.84066
    r = 1:  Ŷ_i = −0.07340 + 0.45703 t_i
    r = 2:  Ŷ_i = −0.07340 + 0.31386 t_i + 0.03579 t_i²

where the parameter estimates change in most cases as r is increased.

In the above example the X matrix for seven equally spaced observations contained only whole numbers. If another column were added, fractions would enter. To avoid fractions, tables have been prepared [11, 12] which contain factors λ_j and elements φ_ij. For the n = 7 case above, a table frequently has the information as given in Table 6.4. The component x_ij of X is related to φ_ij by

    x_ij = p_j(t_i) = φ_ij / λ_j,    j = 0, 1, ..., r;  i = 1, 2, ..., n    (6.3.8)

Table 6.4  Orthogonal Polynomials for n = 7

    i     φ_i0   φ_i1   φ_i2   φ_i3   φ_i4
    1     1      −3     5      −1     3
    2     1      −2     0      1      −7
    3     1      −1     −3     1      1
    4     1      0      −4     0      6
    5     1      1      −3     −1     1
    6     1      2      0      −1     −7
    7     1      3      5      1      3
    D_j   7      28     84     6      154
    λ_j   1      1      1      1/6    7/12

and D_j is related to the jth diagonal term of XᵀX by

    D_j = λ_j² Σ_(i=1)^n x_ij²    (6.3.9)

Using these relations in (6.3.4) gives

    a_m = λ_m Σ_(i=1)^n φ_im Y_i / D_m,    m = 0, 1, ..., r;  λ₀ = 1    (6.3.10)
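The arithmetic of Example 6.3.1 can be verified directly from the tabulated columns:

```python
import numpy as np

# Orthogonal-polynomial fit of Example 6.3.1: the columns of X are the
# p_j(t_i) values for n = 7 (phi_ij / lambda_j from Table 6.4), so X^T X is
# diagonal and each a_j is the simple ratio (6.3.4).
p0 = np.ones(7)
p1 = np.array([-3., -2., -1., 0., 1., 2., 3.])
p2 = np.array([5., 0., -3., -4., -3., 0., 5.])
p3 = np.array([-6., 6., 6., 0., -6., -6., 6.])   # phi_3/lambda_3, lambda_3 = 1/6
X = np.column_stack([p0, p1, p2, p3])

Y = np.log10(np.array([0.45, 0.84, 1.83, 5.1, 15.7, 56.5, 245.0]))

# X^T X reduces to diag(7, 28, 84, 216); no matrix inversion is needed.
assert np.allclose(X.T @ X, np.diag([7., 28., 84., 216.]))

# (6.3.4): a_j = sum(p_j * Y) / sum(p_j^2)
a = (X.T @ Y) / np.sum(X**2, axis=0)
# a is close to the text's (0.84066, 0.45703, 0.03579, -0.00070)
```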
Some of the advantages of the use of orthogonal polynomials are the following. First, no difficulty can occur in finding the inverse of the XᵀX matrix because it is a diagonal matrix with nonzero diagonal components. If r is increased by one, (XᵀX)⁻¹ is changed merely by the addition of a diagonal element, and the sum of squares R_LS is reduced by

    (Σ Y_i x_ir)² / Σ x_ir²

Second, for a number of cases the terms of XᵀX are tabulated, thereby reducing the number of calculations. Third, the parameters a₀, a₁, ... are unchanged as additional values are estimated. Thus one can easily add additional parameters by increasing the degree of the orthogonal polynomials. Fourth, when orthogonal polynomials are used, there is usually less accumulation of rounding errors in the calculations of the estimates of the parameters. A disadvantage of orthogonal polynomials is that they are not convenient for sequential estimation; one cannot easily add another observation. See Section 6.7. A further disadvantage is that polynomials, orthogonal or not, are limited in their ability to fit functions. There is a tendency to think that orthogonal polynomials can be used to fit any single independent variable data, provided the degree is high enough. The model might be impractical, however, if the degree is higher than the fourth or fifth, say. The intervals between the data points could have predicted Y values that are quite oscillatory. For curves that seem to require even higher degrees, splines are recommended [13].
6.4 FACTORIAL EXPERIMENTS
6.4.1  Introduction

In Chapter 5 we found that the best design for estimating β₁ of Model 3 is one in which one half of the observations are taken at the smallest permissible value of X_i and the other half at the largest. This is a recommended procedure when the model is known to be linear in X and when the standard assumptions indicated by 111111 are valid. In other cases there may be several independent variables X_1i, X_2i, ..., which are commonly called factors. The selection of prescribed values or levels of the factors constitutes a design. There are many possible designs discussed in the statistical literature. The one to be discussed below is the complete factorial design with two levels for each factor. For one factor, as in Model 3, the two levels are the lowest and highest permissible values of X_i.

A factor is termed quantitative when its possible levels can be ranked in magnitude, or qualitative when they cannot. Examples of quantitative factors are temperature, pH, Reynolds number, and time. Examples of qualitative factors are the type of catalyst used and the presence or absence of a particular additive.
6.4.2  Two-Level Factorial Design

Consider the model

    η_i = β₀ + β₁x_i1 + β₂x_i2 + ... + β_m x_im    (6.4.1)

where each x_ij can assume only two values, termed "high" and "low." There are then 2^m possible combinations of factor levels, which constitute a complete 2^m factorial design.
Example 6.4.1

In a certain chemical process the effects of three operating variables on the overall process yield are to be studied. These are temperature (x_i1), type of catalyst (x_i2), and pressure (x_i3). Over the ranges of temperature and pressure studied it is known that the yield is linear in both temperature and pressure, and thus two levels of each can be chosen. Also two different catalysts are chosen. The following levels are selected:

                       Low Level          High Level
    Temperature        400 K              420 K
    Type of catalyst   A                  B
    Pressure           2.0 × 10⁷ N/m²     2.2 × 10⁷ N/m²

Observe that two factors, temperature and pressure, are quantitative and the remaining factor, type of catalyst, is qualitative. A 2³ experimental design for this example is given in Table 6.5. When each factor is set at a certain level and a test is performed, the result is termed an experimental run. The runs are not necessarily performed in the order given; rather, the order of the actual tests should be random.
Table 6.5  A 2³ Experimental Design for Example 6.4.1

    Run Number   Temperature (K)   Catalyst Type   Pressure (N/m²)
    1            400               A               2.0 × 10⁷
    2            420               A               2.0 × 10⁷
    3            400               B               2.0 × 10⁷
    4            420               B               2.0 × 10⁷
    5            400               A               2.2 × 10⁷
    6            420               A               2.2 × 10⁷
    7            400               B               2.2 × 10⁷
    8            420               B               2.2 × 10⁷

6.4.3  Coding the Factors

If X is composed of column vectors that are orthogonal, XᵀX is a diagonal matrix. Hence the analysis can proceed in a manner similar to that for orthogonal polynomials. A set of orthogonal column vectors can be obtained by coding. For quantitative factors the coding is typically given by

    coded temp = [actual temp − 0.5(high temp + low temp)] / [0.5(high temp − low temp)]

which produces either +1 or −1. The qualitative factors are also assigned values of +1 and −1; for example, catalyst A could be signified by −1 and catalyst B by +1. For the case of three factors, such as in Example 6.4.1, the X matrix for the 2³ factorial design can be given as

         x_i0  x_i1  x_i2  x_i3
    X = [ 1    −1    −1    −1
          1     1    −1    −1
          1    −1     1    −1
          1     1     1    −1
          1    −1    −1     1
          1     1    −1     1
          1    −1     1     1
          1     1     1     1 ]    (6.4.2)

In terms of (6.4.1) and Example 6.4.1, the first column is a set of coefficients of β₀, the second is for coded temperature, the third for the coded catalyst type, and the last for coded pressure.

Note that X given by (6.4.2) contains orthogonal columns. This matrix can be conceived of as representing a particular design for a two-level complete factorial design for three factors. For one factor the upper left 2 × 2 matrix in X is the X matrix for a complete factorial design. For a two-factor, two-level complete factorial design the design matrix, if interactions are assumed zero, is the upper left 4 × 3 matrix. For more than three factors, the pattern of the elements of X in (6.4.2) can be repeated. For example, if there are four factors a fifth column would be added, which would be a vector of eight −1's followed by eight +1's; the second column would continue the −1, 1, −1, 1 pattern for a total of 2⁴ = 16 terms, etc.

6.4.4  Inclusion of Interaction Terms in the Model

The complete 2^m factorial design can be used in cases for which interaction or cross-product terms are included in the regression model. An example is

    η_i = β₀ + β₁x_i1 + β₂x_i2 + β₃x_i1x_i2    (6.4.3)

where β₃x_i1x_i2 is an interaction term. The X matrix remains orthogonal. For the model given by (6.4.3), the design matrix X can be

    X = [ 1  −1  −1   1
          1   1  −1  −1
          1  −1   1  −1
          1   1   1   1 ]    (6.4.4)

where the fourth column contains the products x_i1x_i2; x_i1 is given in the second column and x_i2 in the third.

In such a design as indicated by (6.4.4) there are four observations and four parameters, and thus the predicted values Ŷ_i exactly equal Y_i. Hence no estimate of σ² can be obtained. If each run is replicated the same number of times, then σ² can be estimated. One should be careful to perform the experiments in a randomized order.

6.4.5  Estimation

We have designed the orthogonal nature of X for use with ordinary least squares. The OLS estimator is given by

    b = (XᵀX)⁻¹XᵀY    (6.4.5)

Since X is orthogonal and contains only +1 or −1 terms, the inverse of XᵀX is easily found. Suppose that there are four parameters (three factors) so that X is given by (6.4.2); then

    XᵀX = diag[ 8  8  8  8 ] = 8I,    (XᵀX)⁻¹ = (1/8)I
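A short sketch of building the coded design matrix (6.4.2) for an arbitrary number of factors; the helper names are illustrative, not from the text.

```python
import numpy as np
from itertools import product

# Building the coded two-level design matrix (6.4.2) for m factors: rows are
# all 2^m combinations of -1/+1, with a leading column of ones for beta_0.
def full_factorial(m):
    rows = [list(levels) for levels in product([-1, 1], repeat=m)]
    # Reverse the columns so the first factor alternates fastest,
    # matching the -1, 1, -1, 1 pattern of (6.4.2).
    Z = np.array(rows, dtype=float)[:, ::-1]
    return np.column_stack([np.ones(2**m), Z])

X = full_factorial(3)
# The columns are orthogonal, so X^T X is diagonal (here 8 I)
assert np.allclose(X.T @ X, 8 * np.eye(4))

# Coding a quantitative factor, e.g. temperature between 400 K and 420 K:
def code(value, low, high):
    return (value - 0.5 * (low + high)) / (0.5 * (high - low))

assert code(420.0, 400.0, 420.0) == 1.0 and code(400.0, 400.0, 420.0) == -1.0
```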
Example 6.4.2

Estimate the parameters using OLS for the experimental design given in Table 6.5 and the Y_i values given by

    Yᵀ = [ 49  62  44  58  42  73  35  69 ]

where Y_i is in percent.

(a) Estimate the parameters in a linear model permitting interaction using OLS.
(b) Suppose that the standard assumptions designated 11111011 apply. (These include normal, independent, constant variance errors.) Also assume that an estimate of the variance of Y_i is s² = 2.5, which we shall consider to have come from a separate experiment and to have 20 degrees of freedom. Using the F statistic, find a "parsimonious" (in terms of the parameters) model.

Solution

(a) The complete model including all the interaction terms can be written in terms of the coded factors z_ij as

    η_i = β₀ + β₁z_i1 + β₂z_i2 + β₃z_i3 + β₄z_i1z_i2 + β₅z_i1z_i3 + β₆z_i2z_i3 + β₇z_i1z_i2z_i3

where z_i1, z_i2, and z_i3 correspond respectively to coded temperature, catalyst, and pressure. The X matrix then becomes

          Main effects        Interaction terms
    X = [ 1  −1  −1  −1     1   1   1  −1
          1   1  −1  −1    −1  −1   1   1
          1  −1   1  −1    −1   1  −1   1
          1   1   1  −1     1  −1  −1  −1
          1  −1  −1   1     1  −1  −1   1
          1   1  −1   1    −1   1  −1  −1
          1  −1   1   1    −1  −1   1  −1
          1   1   1   1     1   1   1   1 ]

which contains eight orthogonal columns. The inverse of XᵀX is (1/8)I, and the XᵀY terms are obtained simply by taking sums of the Y_i terms, each multiplied by plus or minus one as indicated in Xᵀ. The resulting predicted equation is

    Ŷ_i = 54 + 11.5z_i1 − 2.5z_i2 + 0.75z_i3 + 0.5z_i1z_i2 + 4.75z_i1z_i3 − 0.25z_i2z_i3 + 0.25z_i1z_i2z_i3

Since there are eight observations and eight parameters, the calculated Ŷ_i values and the measured values Y_i are equal; hence the sum of squares of residuals is zero.

(b) The effect of each term in the model can be examined independently because the design is orthogonal. This is an important benefit of such designs. From (6.2.14a,b) the reduction in the sum of squares is given by

    bᵀXᵀY = bᵀXᵀXb = 8bᵀIb = 8 Σ_(j=0)^7 b_j²

where XᵀX = 8I is used. The sum YᵀY is 24624. The residual sum of squares (R), reduction in R, and the F statistic are given in Table 6.6. The F statistic is (ΔR/1) divided by s² (= 2.5). At the 5% level of significance, F.95(1, 20) = 4.35. (s² was stated to have 20 degrees of freedom.) We have an indication that only the first, second, third, and sixth parameters are needed, since only for these is F > 4.35. The first parameter is β₀, the second is for coded temperature, the third is for the catalyst,

Table 6.6  Sum of Squares and F Ratio for Example 6.4.1

    No. of Parameters   R, Residual Sum of Squares   ΔR          F
    1                   1296                         23328       9331.2
    2                   238                          1058        423.2
    3                   188                          50          20
    4                   183.5                        4.5         1.8
    5                   181.5                        2           0.8
    6                   1                            180.5       72.2
    7                   0.5                          0.5         0.2
    8                   0                            0.5         0.2

and the sixth is for the interaction between temperature and pressure. Since the pressure enters through the interaction term, we also include the linear pressure term (fourth parameter). We assume that these provide a parsimonious set of parameters. The model then is

    Ŷ_i = 54 + 11.5z_i1 − 2.5z_i2 + 0.75z_i3 + 4.75z_i1z_i3

or, in terms of the original factors,

    Ŷ = 54 + 11.5(T − 410)/10 − 2.5C + 0.75(P − 2.1 × 10⁷)/10⁶ + 4.75(T − 410)(P − 2.1 × 10⁷)/10⁷

where T is temperature in K, C = −1 for catalyst A, C = 1 for catalyst B, and P is pressure in N/m². This expression can be simplified to

    Ŷ = 3656.5 − 8.825T − 2.5C − 1.94 × 10⁻⁴P + 4.75 × 10⁻⁷TP
The predicted values of Ŷ are given in Table 6.7 along with the residuals.

Table 6.7 Predicted Values and Residuals for Example 6.4.1

Run No.   Observed Yᵢ (%)   Predicted Ŷᵢ (%)   Residual Yᵢ − Ŷᵢ (%)
1         49                49                  0
2         62                62.5               −0.5
3         44                44                  0
4         58                57.5                0.5
5         42                41                  1.0
6         73                73.5               −0.5
7         35                36                 −1.0
8         69                68.5                0.5
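The whole calculation above is a few lines of linear algebra. The sketch below (a minimal reconstruction using NumPy, with the runs in the standard order of the X matrix shown earlier) recovers the parameter estimates of Table 6.6 and the predictions and residuals of Table 6.7.

```python
import numpy as np

# Coded factor levels for the 2^3 design, runs in standard order
z = np.array([[-1, -1, -1], [1, -1, -1], [-1, 1, -1], [1, 1, -1],
              [-1, -1, 1], [1, -1, 1], [-1, 1, 1], [1, 1, 1]], dtype=float)
Y = np.array([49.0, 62, 44, 58, 42, 73, 35, 69])  # observed yields (%)

# Build X = [1, z1, z2, z3, z1z2, z1z3, z2z3, z1z2z3]
X = np.column_stack([np.ones(8), z[:, 0], z[:, 1], z[:, 2],
                     z[:, 0] * z[:, 1], z[:, 0] * z[:, 2],
                     z[:, 1] * z[:, 2], z[:, 0] * z[:, 1] * z[:, 2]])

# Orthogonal columns: X'X = 8I, so OLS reduces to b = X'Y / 8
b = X.T @ Y / 8.0

# Parsimonious model keeps beta0, z1, z2, z3, and the z1*z3 interaction
keep = [0, 1, 2, 3, 5]
Yhat = X[:, keep] @ b[keep]
resid = Y - Yhat
print(b[keep])   # 54, 11.5, -2.5, 0.75, 4.75
print(Yhat)     # 49, 62.5, 44, 57.5, 41, 73.5, 36, 68.5
```

Because the columns are orthogonal, dropping terms does not change the remaining estimates, which is why the F test for each added parameter can be applied one column at a time.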
6.5 MAXIMUM LIKELIHOOD ESTIMATOR
The treatment of the maximum likelihood (ML) method by some authors leaves the impression that the ML method applies only to cases of noncorrelated errors. In the ML method discussed below, however, correlated errors may be present. Also the errors may have nonconstant variance. With only these two exceptions, the standard assumptions are used; these are denoted 111111. Note that the errors are assumed to have a normal probability density. It is noteworthy that ML estimation assumes a great deal of information regarding ψ. This is in contrast to OLS estimation.
6.5.1 ML Estimation

For the above assumptions the ML parameter estimator is derived by minimizing the ML loss function given by (6.1.67b). For the linear model η = Xβ we then seek to minimize

S_ML = (Y − Xβ)ᵀψ⁻¹(Y − Xβ)    (6.5.1)
6.4.6 Importance of Replicates

In the above example the significance of each term in the fitted model was tested using an "external" estimate of the pure error variance. A superior estimate of the pure error variance may be obtained by replicating some or all of the runs. In order to preserve the orthogonal character of a 2ᵏ design it is necessary to repeat all the 2ᵏ runs the same number of times. It is important, however, that the replicated run be a genuine repeat. Repeatedly measuring the same response does not constitute a genuine replication. It is better to interpose at least one change in a factor level before a run is repeated to obtain an independent response. A random choice of the order in which the runs are performed is important in this connection.
6.4.7 Other Experiment Designs

Many other experimental designs are possible in addition to a complete two-level factorial. For example, if there are two factors, and one is desired at two levels and the other at three, then we would use a 2×3 factorial design. The XᵀX matrix can also be made diagonal for this case.

Other possible designs are made up of treatments which are selected from among the treatments of a complete factorial design, the selection being made in such a way that the possibilities of eliminating certain parameters are tested in a systematic manner which preserves the possibility of making the remaining tests [14].
with respect to the parameter vector β. Using (6.1.26b) and (6.1.30) as in Section 6.2.1 results in

b_ML = (Xᵀψ⁻¹X)⁻¹Xᵀψ⁻¹Y    (6.5.2)

By using Y = Xβ + ε in (6.5.2) and taking the expected value of b_ML we get

E(b_ML) = β    (6.5.3)

or b_ML is an unbiased estimator of β. The covariance matrix of b_ML can be found as in Section 6.2.3. Let

b_ML = A(Xβ + ε),   A ≡ (Xᵀψ⁻¹X)⁻¹Xᵀψ⁻¹    (6.5.4)

Then from (6.2.10) the covariance of b_ML is

cov(b_ML) = (Xᵀψ⁻¹X)⁻¹    (6.5.5)
Observe that the maximum likelihood estimator is exactly the same as given by the Gauss-Markov theorem (Section 6.1.7). Thus the maximum likelihood method produces a minimum variance unbiased linear estimator for the linear model and the standard assumptions given. The only additional assumption required by ML estimation and not by the Gauss-Markov theorem is the knowledge that the errors have a normal probability density.

The covariance of a collection of predicted points on the regression line or surface represented by Ŷ = X₁b_ML is obtained in the same manner as is (6.2.12a).
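The estimator (6.5.2) and its covariance (6.5.5) are easy to evaluate directly. The sketch below, with hypothetical design values and error variances, checks that the estimator reproduces the true parameters exactly when the errors happen to be zero, and that (6.5.5) agrees with the covariance propagated through A of (6.5.4).

```python
import numpy as np

# Hypothetical design: eta = beta1 + beta2*x, with strongly unequal variances
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
psi = np.diag([0.1, 1.0, 10.0, 100.0, 1000.0])   # cov. of measurement errors
psi_inv = np.linalg.inv(psi)

beta = np.array([2.0, 0.5])
Y = X @ beta                                     # error-free observations

# b_ML = (X' psi^-1 X)^-1 X' psi^-1 Y           (6.5.2)
M = X.T @ psi_inv @ X
b_ml = np.linalg.solve(M, X.T @ psi_inv @ Y)

# cov(b_ML) = (X' psi^-1 X)^-1                  (6.5.5)
cov_b = np.linalg.inv(M)
# Cross-check (6.5.5) against A psi A', with A from (6.5.4)
A = cov_b @ X.T @ psi_inv
print(np.allclose(A @ psi @ A.T, cov_b))   # True
print(b_ml)                                # recovers [2.0, 0.5] exactly
```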
cov(Ŷ) = X₁(Xᵀψ⁻¹X)⁻¹X₁ᵀ    (6.5.6)

This expression simplifies to that given by OLS, (6.2.12b), if ψ = σ²I.

The difference between the LS and ML estimators given by (6.2.5) and (6.5.2) is clearly the presence of the ψ⁻¹ matrix in (6.5.2) but not in (6.2.5). (Recall that ψ⁻¹ is the inverse of the covariance matrix of the measurement errors.) If ψ = σ²I, the estimators are exactly the same. When ψ deviates considerably from the σ²I condition, the LS and ML estimates can be quite different. The LS estimators can have variances that are arbitrarily larger than those given by maximum likelihood. One case in which this can occur is when

ψ = diag[σ₁² σ₂² ⋯ σₙ²]    (6.5.7)

and the σᵢ² terms are quite disparate in magnitude, such as the set of values 0.1, 1, 10, 100, 1000, and so on.

Example 6.5.1

For ψ given by (6.5.7) derive an expression for the ratio of the variance of b_LS to the variance of b_ML for the model ηᵢ = βXᵢ, and simplify the result for Xᵢ = 1.

Solution

From (6.2.11) the variance of b_LS is found to be

V(b_LS) = (Σₖ Xₖ²)⁻² Σₖ Xₖ²σₖ²

while (6.5.5) yields

V(b_ML) = [Σₖ Xₖ²/σₖ²]⁻¹

Then the ratio is

V(b_LS)/V(b_ML) = (Σₖ Xₖ²)⁻²(Σₖ Xₖ²σₖ²)(Σₖ Xₖ²/σₖ²)

which for Xᵢ = 1 reduces to

V(b_LS)/V(b_ML) = n⁻²(Σₖ σₖ²)(Σₖ σₖ⁻²)

Example 6.5.2

Investigate the solution in Example 6.5.1 for η = β and σ₁ = 1, σ₂ = 10, σ₃ = 100, 1000, etc.

Solution

For n = 2 the ratio is (1/2)²(101)(1.01) = 25.5. For n = 3 it is (1/3)²(10101)(1.0101) = 1133.7. In general the approximate result is 10^(2(n−1))/(0.99n)², which increases rapidly with n. Hence in such cases the ML estimate is far superior to the OLS estimate.

Example 6.5.3

Investigate the solution of Example 6.5.1 for Xᵢ = 1, 2, 3, and 4 and for σᵢ = 1, 10, 100, and 1000, respectively.

Solution

The ratios for n = 1, 2, 3, and 4 are respectively 1, 16.68, 480.1, and 18610. The ratio does not increase as rapidly as for the model η = β but still indicates the superiority of maximum likelihood estimation compared with ordinary least squares for unequal error variances.
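The ratios quoted in Examples 6.5.2 and 6.5.3 follow directly from the expression derived in Example 6.5.1. A short numerical check:

```python
import numpy as np

def var_ratio(x, sig2):
    """V(b_LS)/V(b_ML) for the model eta_i = beta*X_i with psi = diag(sig2)."""
    x, sig2 = np.asarray(x, float), np.asarray(sig2, float)
    v_ls = (x**2 * sig2).sum() / (x**2).sum()**2   # from (6.2.11)
    v_ml = 1.0 / (x**2 / sig2).sum()               # from (6.5.5)
    return v_ls / v_ml

# Example 6.5.2: X_i = 1, standard deviations 1, 10, 100
print(round(var_ratio([1, 1], [1, 100]), 1))            # 25.5
print(round(var_ratio([1, 1, 1], [1, 100, 1e4]), 1))    # 1133.7

# Example 6.5.3: X_i = 1, 2, 3, 4 and sigma_i = 1, 10, 100, 1000
print(round(var_ratio([1, 2], [1, 100]), 2))            # 16.68
print(round(var_ratio([1, 2, 3], [1, 100, 1e4]), 1))    # 480.1
```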
If the measurement errors are correlated, the ψ matrix is not diagonal. Some cases that produce nondiagonal ψ matrices are those involving autoregressive and moving average errors. See Sections 5.13 and 6.9.

Typical terms in Xᵀψ⁻¹X and Xᵀψ⁻¹Y for the single response case are

[Xᵀψ⁻¹X]ᵢⱼ = Σₖ Σₗ Xₖᵢ ψᵏˡ Xₗⱼ,   i, j = 1, 2, ..., p    (6.5.8a)

[Xᵀψ⁻¹Y]ᵢ = Σₖ Σₗ Xₖᵢ ψᵏˡ Yₗ,   i = 1, 2, ..., p    (6.5.8b)

where ψᵏˡ is the kl component of ψ⁻¹. If ψ is given by (6.5.7), the double summations above can be replaced by single summations as in

[Xᵀψ⁻¹X]ᵢⱼ = Σₖ Xₖᵢ Xₖⱼ/σₖ²,   i, j = 1, 2, ..., p    (6.5.9a)

[Xᵀψ⁻¹Y]ᵢ = Σₖ Xₖᵢ Yₖ/σₖ²,   i = 1, 2, ..., p    (6.5.9b)

6.5.2 Estimation of σ²
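The componentwise sums (6.5.8) and (6.5.9) are simply expanded forms of the matrix products; a quick check with hypothetical numbers confirms the equivalence (and that the single sums suffice when ψ is diagonal):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 2
X = rng.standard_normal((n, p))
Y = rng.standard_normal(n)
sig2 = np.array([0.1, 1.0, 10.0, 100.0, 0.5, 2.0])
psi_inv = np.diag(1.0 / sig2)            # diagonal psi, as in (6.5.7)

# Double sums (6.5.8a,b) written out explicitly
XtpX = np.array([[sum(X[k, i] * psi_inv[k, l] * X[l, j]
                      for k in range(n) for l in range(n))
                  for j in range(p)] for i in range(p)])
XtpY = np.array([sum(X[k, i] * psi_inv[k, l] * Y[l]
                     for k in range(n) for l in range(n)) for i in range(p)])

# Single sums (6.5.9a,b), valid because psi is diagonal
XtpX_s = np.array([[(X[:, i] * X[:, j] / sig2).sum() for j in range(p)]
                   for i in range(p)])
XtpY_s = np.array([(X[:, i] * Y / sig2).sum() for i in range(p)])

print(np.allclose(XtpX, XtpX_s) and np.allclose(XtpY, XtpY_s))   # True
```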
One advantage of the ML formulation is that it can provide a direct method for evaluating certain "statistical" parameters such as the variance of the errors. For example, assume that the covariance matrix is known except for the multiplicative constant σ² (assumptions 111011),

ψ = σ²Ω    (6.5.10)

where Ω is completely known but σ² is unknown. We start the analysis with the logarithm of the likelihood function as given by (6.1.61a), with

ln f(Y|β) = −(1/2)[n ln 2π + n ln σ² + ln|Ω| + σ⁻²S_Ω]    (6.5.11a)

where

S_Ω = (Y − Xβ)ᵀΩ⁻¹(Y − Xβ)    (6.5.11b)

Take the derivative with respect to σ² and with respect to the parameters β₁, ..., βₚ for the linear model η = Xβ. Set β = b_ML and σ² = σ̂²_ML; then setting the derivatives equal to zero gives

σ̂²_ML = n⁻¹R_Ω(b_ML)    (6.5.12)

XᵀΩ⁻¹X b_ML = XᵀΩ⁻¹Y    (6.5.13)

where

R_Ω(b_ML) = (Y − Xb_ML)ᵀΩ⁻¹(Y − Xb_ML)    (6.5.14)

Note that the physical or structural parameters, β, can be estimated directly from (6.5.13) without knowledge of σ². Using these estimated values permits the estimation of σ² from (6.5.12) as

σ̂²_ML = R_Ω(b_ML)/n    (6.5.15)

An advantage of the maximum likelihood method is that it can provide a direct method for estimating σ². The estimator σ̂²_ML is unfortunately biased, however. An unbiased estimator for σ² is

s² = [YᵀΩ⁻¹Y − b_MLᵀXᵀΩ⁻¹Y]/(n − p) = R_Ω(b_ML)/(n − p)    (6.5.16)

See Section 6.8.3 for a derivation. Since (6.5.16) is unbiased, it is recommended for estimating σ². A summary of the basic ML estimation equations is given in Appendix B.

Theorem 6.2.4 regarding an F distribution can also be stated for the ML assumptions 111011 for a linear model. The statistic

F = {R_Ω[b₁,ML] − R_Ω[b₁,ML, b₂,ML]}/(p₂s²)

where p₂ is the number of parameters added in b₂, can then be compared with the tabulated F(p₂, n − p) value.
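The bias of σ̂²_ML = R_Ω/n and the unbiasedness of s² = R_Ω/(n − p) in (6.5.16) can be illustrated by a seeded simulation with hypothetical values of X, Ω, and σ²:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma2 = 10, 3, 4.0
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x, x**2])
beta = np.array([1.0, -2.0, 3.0])
omega_diag = np.tile([1.0, 4.0], n // 2)       # known Omega; psi = sigma2*Omega
W = np.diag(1.0 / omega_diag)                  # Omega^-1

R = []
for _ in range(4000):
    eps = rng.standard_normal(n) * np.sqrt(sigma2 * omega_diag)
    Y = X @ beta + eps
    b = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)   # (6.5.13)
    r = Y - X @ b
    R.append(r @ W @ r)                             # R_Omega(b_ML), (6.5.14)
R = np.array(R)

print(R.mean() / n)        # biased (6.5.15): near sigma2*(n - p)/n = 2.8
print(R.mean() / (n - p))  # unbiased (6.5.16): near sigma2 = 4.0
```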
Example 6.5.4

The thermal conductivity, k, of Armco iron has been measured using a new method discussed in Beck and Al-Araji [15]. Temperatures between 100 and 362°F (311 and 456 K) were covered along with power inputs between 272 and 602 W as given in Table 6.8. The temperature and power can be considered to be measured so much more accurately than is the measured k that the error will be assumed to be in the k measurement only. Also it is reasonable to assume additive errors. Comparisons of values of measurements from different laboratories frequently indicate nonzero means for the errors. Nevertheless a zero mean value for ε is assumed.

Each Yᵢ value given in the last column of Table 6.8 is the average of four values of the conductivity. The values are given to varying significant figures because the original values were given to four figures and sometimes the average yielded four digits and other times more. Notice that there are approximately two power levels, about 275 and 550 W. The estimated standard deviation found using the four observations for each P and T for the smaller powers is about 0.278, whereas for the higher powers the estimated standard deviation is 0.16, smaller by a factor of about 2. Hence the variances for the even-numbered runs of Table 6.8 are about four times those for the odd-numbered ones, which are for the larger powers.
Table 6.8 Data for Thermal Conductivity Measurements of Armco Iron for Example 6.5.4

Run   Temp.   Power   Measured Thermal
No.   (°F)    (W)     Conductivity (Btu/hr-ft-°F)
1     100     545     41.60
2      90     276     42.345
3     161     602     37.7875
4     149     275     39.5375
5     227     538     36.4975
6     206     274     37.3525
7     270     550     35.785
8     247     274     36.36
9     362     522     34.53
10    352     272     33.915
Solution

The objective is to find a parsimonious model of thermal conductivity versus temperature T and power P using maximum likelihood. The errors εᵢ are assumed to be independent and to have a normal probability density.

The assumptions mentioned above of additive, zero mean, independent, and normal errors in Yᵢ but errorless Tᵢ and Pᵢ can be designated 11011011. We can write the covariance matrix ψ as

ψ = σ² diag[1 4 1 4 1 4 1 4 1 4]

where σ² is unknown. For the parameter estimates themselves the statistical parameter σ² need not be known. An estimate of σ² is given by using the standard deviation values quoted in the example.

A plot of the data is shown in Fig. 6.3. It appears that at least a second degree curve in T may be required and that P may not be needed. From physical considerations it is also expected that k is not a function of P. There are many possible models and a number of ways of proceeding. Some possible models are as follows:

1. k = β₁
2. k = β₁ + β₂T
3. k = β₁ + β₂P
4. k = β₁ + β₂T + β₃P
5. k = β₁ + β₂T + β₃T²
6. k = β₁ + β₂T + β₃T² + β₄T³

Figure 6.3 Thermal conductivities measured for Armco iron and used in Example 6.5.4 (conductivity versus temperature, °F; × denotes runs at about 550 W and • runs at about 275 W).

One way to build the model is to start with the simplest and add one term at a time. As one progresses, the need of adding terms can be assessed by examining the
reduction in the R_ML value and by utilizing the F test based on Theorem 6.2.4.

With ψ as given above the parameters were estimated using (6.5.2) with the components given by (6.5.9). The mean square, s², is found from (6.5.16). The results of the calculations including the F statistic (= ΔR/s²) are tabulated in Table 6.9. These F values can be compared with those given by F₀.₉₅(1, n − p), which are 5.99, 5.59, 5.32, and 5.12 for n − p = 6, 7, 8, and 9, respectively. From a comparison of these F values with those in Table 6.9, it appears that the power factor, P, need not be used. Also Theorem 6.2.4 leads us to select Model 5 because Model 6 has an unnecessary additional term since F = 5.729 < 5.99 = F₀.₉₅(1,6). However, since these values of 5.729 and 5.99 are so close, it might be that another
Table 6.9 Results of Calculations of R and F for Example 6.5.4

Model   No. of       Degrees of                Mean
No.     Parameters   Freedom      R_ML         Square, s²   ΔRᵃ        F
1       1            9            40.0078      4.445        8769.36    1973
2       2            8            4.47506      0.5594       35.5327    63.52
3       2            8            39.91282     4.9891       0.09495    0.01903
4       3            7            4.18133      0.5973       0.29373    0.4918
5       3            7            1.13092      0.1616       3.34413    20.70
6       4            6            0.57852      0.09642      0.55241    5.729

ᵃΔR₁ = YᵀΩ⁻¹Y − R_ML,1, ΔR₂ = R_ML,1 − R_ML,2, ΔR₃ = R_ML,1 − R_ML,3, ΔR₄ = R_ML,2 − R_ML,4, etc.
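The Model 5 entries of Table 6.9 come from the weighted normal equations (6.5.13) with Ω = diag[1 4 1 4 ...]. A sketch of that computation, using the data as transcribed in Table 6.8 (so the printed digits, not the original laboratory records, are the assumption here):

```python
import numpy as np

T = np.array([100.0, 90, 161, 149, 227, 206, 270, 247, 362, 352])
Y = np.array([41.60, 42.345, 37.7875, 39.5375, 36.4975,
              37.3525, 35.785, 36.36, 34.53, 33.915])
w = 1.0 / np.tile([1.0, 4.0], 5)   # Omega = diag[1 4 1 4 ...]; weights = 1/omega_ii

def wls(X, Y, w):
    """Solve the weighted normal equations (6.5.13); return b, R_Omega, s^2."""
    M = (X * w[:, None]).T @ X
    b = np.linalg.solve(M, (X * w[:, None]).T @ Y)
    r = Y - X @ b
    R = (w * r**2).sum()
    return b, R, R / (len(Y) - X.shape[1])

# Model 5: k = beta1 + beta2*T + beta3*T^2
X5 = np.column_stack([np.ones_like(T), T, T**2])
b5, R5, s2_5 = wls(X5, Y, w)
print(b5)       # compare with Y-hat = 47.5191 - 0.07084*T + 9.669e-5*T^2
print(R5, s2_5)  # compare with Table 6.9, Model 5
```

The same `wls` helper, applied to each of the six candidate X matrices, reproduces the whole table column by column.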
experiment or a larger sample size might show significance, that is, indicate Model 6 in preference to Model 5.

For Model 5 the regression equation for k is (in English units)

Ŷ = 47.5191 − 0.07084T + 9.669×10⁻⁵T²

and the estimated covariance matrix of the estimated parameter vector is

                       [ 6.741       −6.133×10⁻²    1.242×10⁻⁴ ]
est. cov(b_ML) = s²    [              6.080×10⁻⁴   −1.282×10⁻⁶ ]
                       [ symmetric                   2.796×10⁻⁹ ]

where s² = 0.1616. The variances of b₁,ML, b₂,ML, and b₃,ML are the diagonal terms. The estimated standard errors (std. dev.) for them are

est. s.e.(b₁,ML) = [0.1616(6.741)]^(1/2) = 1.04
est. s.e.(b₂,ML) = [0.1616(6.080×10⁻⁴)]^(1/2) = 0.00991
est. s.e.(b₃,ML) = [0.1616(2.796×10⁻⁹)]^(1/2) = 2.13×10⁻⁵

The sizes of the estimated standard errors of the b_j,ML's do not reveal the relative importance of the terms in describing k. The estimated standard error of b₁,ML is 1.04, whereas one standard error in b₃,ML gives a value of 2.6 for the T² term of Ŷ for T = 350°F. Hence the effects on k due to uncertainty in b₁,ML and b₃,ML are about the same.

Example 6.5.5

Find the estimated standard error in Ŷᵢ for the data of Example 6.5.4.

Solution

The estimated standard error of Ŷᵢ is found by taking the square root of the diagonal terms of (6.5.6) with P_ML replaced by est. cov(b_ML), which is displayed just above. For arbitrary Tᵢ, an expression for est. V(Ŷᵢ), the ith diagonal term of est. cov(Ŷ), is

est. V(Ŷᵢ) = [P₁₁* + 2TᵢP₁₂* + Tᵢ²(P₂₂* + 2P₁₃*) + 2Tᵢ³P₂₃* + Tᵢ⁴P₃₃*]s²

where P_jk* = P_jk/σ². From above, note that a typical P_jk* is P₁₁* = 6.741. Also s² = 0.1616. The est. s.e.(Ŷᵢ), which is the square root of est. V(Ŷᵢ), is displayed as the curve in Fig. 6.4.

Figure 6.4 Residuals of thermal conductivity values for Example 6.5.4, with the est. s.e.(Ŷᵢ) curve.

6.5.3 Expected Values of S_ML and R_ML

The expected value of S_ML (parameters not estimated) given by (6.5.1), using (6.1.50), can be written with ψ being known as

E(S_ML) = E[(Y − Xβ)ᵀψ⁻¹(Y − Xβ)] = E[εᵀψ⁻¹ε]
        = tr[ψ⁻¹ cov(ε)] = tr[ψ⁻¹ψ] = n    (6.5.18)

where n is the number of observations.

In finding E(R_ML), let

R_ML = (Y − Xb_ML)ᵀψ⁻¹(Y − Xb_ML)    (6.5.19)

and use A defined in (6.5.4). Then (6.5.19) can be written as

R_ML = Yᵀ(I − XA)ᵀψ⁻¹(I − XA)Y    (6.5.20a)

since b_ML = AY and

AX = I    (6.5.20b)

Introducing Y = Xβ + ε in (6.5.20a) yields

R_ML = εᵀψ⁻¹(I − XA)ε    (6.5.21)
where (6.5.20b) and the following relation are used:

(I − XA)ᵀψ⁻¹(I − XA) = ψ⁻¹(I − XA)    (6.5.22)

From (6.5.21), R_ML can be given by

R_ML = εᵀψ⁻¹ᐟ²[I − ψ⁻¹ᐟ²X(Xᵀψ⁻¹X)⁻¹Xᵀψ⁻¹ᐟ²]ψ⁻¹ᐟ²ε    (6.5.23)

Then taking the expected value of (6.5.23) and utilizing (6.1.50) we find

E(R_ML) = tr{[I − ψ⁻¹ᐟ²X(Xᵀψ⁻¹X)⁻¹Xᵀψ⁻¹ᐟ²] cov(ψ⁻¹ᐟ²ε)}
        = tr(I) − tr[(Xᵀψ⁻¹X)⁻¹(Xᵀψ⁻¹X)] = n − p    (6.5.24)

The above expression for E(R_ML) can be utilized to check the validity of the model and the ψ matrix. It is important to have a check because in many cases there is some uncertainty in η or in ψ. For the assumptions used above (111111), R_ML has a χ² distribution with n − p degrees of freedom; this then can provide a statistical test. Table 2.14 gives the χ²ₐ value for which P(χ² < χ²ₐ) is a specified value. For example, there is a column of values of χ² for which the probability of χ² being less than χ²ₐ is equal to .95, or

P(χ² < χ²ₐ) = .95    (6.5.25)

Example 6.5.6

For a second set of data similar to those given in Example 6.5.4, ML estimation has been used. There are 10 separate measurements of the thermal conductivity as a function of temperature and power input. The model used is the one found from the preceding analysis,

k = β₁ + β₂T + β₃T²

and the errors in k are assumed to be independent, and the ψ matrix based on the results of Example 6.5.4 is

ψ = 0.16 diag[1 4 1 4 1 4 1 4 1 4]

For the analysis based on these data, this model and ψ yielded R_ML = 13.52. Assume that the standard assumptions indicated by 11011111 are valid and investigate the "goodness of fit" using the χ² distribution.

Solution

Since there are 10 observations (n = 10) and three parameters, there are n − p = 7 degrees of freedom, for which a table of the χ² distribution gives

P(χ² < 14.07) = .95,   P(χ² < 2.167) = .05

This means that only in 5% of similar cases would R_ML exceed 14.07. Also only in 5% of the cases would R_ML be less than 2.167. Since 2.167 < 13.52 < 14.07, we have an indication at the 10% level of significance that the model and ψ are satisfactory.

An important source of error which could cause R_ML to be either too small or too large (compared to an interval based on the χ² distribution) is an incorrect choice of ψ. For example, ψ might be assumed to be diagonal while the measurement errors are correlated, causing ψ to be nondiagonal. For this reason it is always advisable to investigate whether the residuals suggest correlation among the errors. Such correlation can be inherent in the measurements themselves or a result of selection of an incorrect model. Often the model is dictated by physical considerations. In other cases there may be several models that fit equally well.
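The property E(R_ML) = n − p from (6.5.24), which underlies the χ² check in Example 6.5.6, can be demonstrated by a seeded simulation with a hypothetical model and the ψ pattern of the example:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 3
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x, x**2])
psi_diag = 0.16 * np.tile([1.0, 4.0], n // 2)   # psi = 0.16 diag[1 4 1 4 ...]
W = np.diag(1.0 / psi_diag)

vals = []
for _ in range(5000):
    # R_ML depends only on the errors (eq. 6.5.21), so beta may be taken as 0
    eps = rng.standard_normal(n) * np.sqrt(psi_diag)
    b = np.linalg.solve(X.T @ W @ X, X.T @ W @ eps)
    r = eps - X @ b
    vals.append(r @ W @ r)

print(np.mean(vals))   # near n - p = 7
```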
6.6 LINEAR MAXIMUM A POSTERIORI ESTIMATOR (MAP)
6.6.1 Introduction
Maximum a posteriori estimation utilizes prior information regarding the parameters in addition to information regarding the measurement errors. Inclusion of prior parameter information can have the beneficial effect of reducing the variances of parameter estimators.

In Section 5.4 two MAP cases are considered. The first is for random parameters. One convenient way to visualize this case is to think of some product that is produced in "batches," each of which is different. The prior information is relative to the mean and variance of these batches. The measurements Y, however, deal with a specific (new) batch.

In the second MAP case, the information is visualized as coming from subjective information, the belief of an investigator regarding the parameters. This could include means and variances. In this case the parameters are not random; for example, a parameter of interest might be a fundamental constant such as the speed of light. The knowledge about the parameters is probabilistic. This probabilistic information leads us to view the parameters as having probability distributions, similar to the random parameter case.
In both cases or views the derived parameter estimators are formally the same, but the meanings of the various terms may be different. In order to be succinct, the case of random parameters is treated first and the results for subjective information are given without derivations.
6.6.2 Assumptions

Let us consider the case of MAP estimation involving a random parameter vector. We shall investigate estimation for the standard assumptions denoted 111112. Each assumption except for the last is the same as used for ML estimation in Section 6.5. Mathematically the assumptions can be given as

Y = E(Y|β) + ε,   E(Y|β) = Xβ    (6.6.1a,b)

ε ~ N(0, ψ),   cov(β, ε) = 0,   β ~ N(μ_β, V_β)    (6.6.1c,d,e)

where ψ, μ_β, and V_β are known. Note that the distributions for both β, the random parameter column vector, and ε are normal.

6.6.3 Estimation Involving Random Parameters

In MAP estimation the estimated parameter vector is the one that maximizes the probability density f(β|Y). This density is related to f(Y|β) and that for the random parameter, f(β), by

f(β|Y) = f(Y|β)f(β)/f(Y)    (6.6.2)

which is a form of Bayes's theorem. Recall that f(Y|β) is the same density used in maximum likelihood estimation; for the given assumptions it is given by (6.1.66). For the conditions indicated by (6.6.1d), f(β) is

f(β) = (2π)^(−p/2)|V_β|^(−1/2) exp[−(1/2)(β − μ_β)ᵀV_β⁻¹(β − μ_β)]    (6.6.3)

where p is the number of parameters. The probability density f(Y) need not be explicitly given since it is not a function of β.

The maximum of f(β|Y) given by (6.6.2) occurs at the same parameter values as does the maximum of its natural logarithm,

ln[f(β|Y)] = −(1/2)[(n + p)ln 2π + ln|ψ| + ln|V_β| + S_MAP] − ln f(Y)    (6.6.4a)

S_MAP = (Y − η)ᵀψ⁻¹(Y − η) + (β − μ_β)ᵀV_β⁻¹(β − μ_β)    (6.6.4b)

When the only parameters of interest occur in β, not in ψ, for example, maximizing f(β|Y) can be accomplished by minimizing S_MAP. Following the usual procedure of taking the matrix derivatives with respect to β, etc., one finds using (6.6.4b), (6.1.33), and (6.1.26b),

[Xᵀψ⁻¹X + V_β⁻¹]b_MAP − Xᵀψ⁻¹Y − V_β⁻¹μ_β = 0    (6.6.5)

Solving for the estimator b_MAP yields

b_MAP = P_MAP[Xᵀψ⁻¹Y + V_β⁻¹μ_β],   P_MAP⁻¹ = Xᵀψ⁻¹X + V_β⁻¹    (6.6.6a,b)

Notice the definition of P_MAP given by (6.6.6b). By adding and subtracting Xᵀψ⁻¹Xμ_β in (6.6.5), the additional expression of

b_MAP = μ_β + P_MAP Xᵀψ⁻¹(Y − Xμ_β)    (6.6.6c)

can be given for b_MAP. In this form the second term on the right side can be considered as a correction to the known mean value of the random parameter vector μ_β; it is a result of the new information, Y, for a given "batch." For the case of Model 2, ηᵢ = βXᵢ, (6.6.6a,b,c) reduces to (5.4.11a,b). Since Y = Xβ + ε, E(β) = μ_β, and E(ε) = 0, use of (6.6.6c) shows that the expected value of b_MAP is

E(b_MAP) = μ_β    (6.6.7)

and thus b_MAP is a biased estimator. Even so it is recommended whenever appropriate owing to the reduced error covariance matrix as shown below. An example of MAP estimation is given in Section 6.7.4 in connection with the sequential MAP method.

The covariance matrix of interest is that of b_MAP − β, as explained in connection with (5.4.12) and (5.4.13); b_MAP − β is the difference between the estimator and the parameter vector for the particular batch. Utilizing (6.6.6a) we can write

b_MAP − β = P_MAP Xᵀψ⁻¹(Xβ + ε) − β + P_MAP V_β⁻¹μ_β    (6.6.8a)
          = (P_MAP Xᵀψ⁻¹X − I)β + P_MAP Xᵀψ⁻¹ε + P_MAP V_β⁻¹μ_β    (6.6.8b)
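A minimal numerical sketch of (6.6.6a,b), with hypothetical prior values, also verifies that the "correction" form (6.6.6c) gives the identical estimate:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
X = np.column_stack([np.ones(n), np.arange(1.0, n + 1)])
psi = 0.5 * np.eye(n)                  # measurement-error covariance
mu = np.array([1.0, 2.0])              # prior mean of the random parameters
V = np.diag([0.04, 0.01])              # prior covariance V_beta

beta = rng.multivariate_normal(mu, V)  # one "batch"
Y = X @ beta + rng.multivariate_normal(np.zeros(n), psi)

psi_inv = np.linalg.inv(psi)
V_inv = np.linalg.inv(V)
P = np.linalg.inv(X.T @ psi_inv @ X + V_inv)        # (6.6.6b)
b_map = P @ (X.T @ psi_inv @ Y + V_inv @ mu)        # (6.6.6a)

# Equivalent "correction" form (6.6.6c)
b_alt = mu + P @ X.T @ psi_inv @ (Y - X @ mu)
print(np.allclose(b_map, b_alt))   # True
```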
Now taking the covariance of b_MAP − β, the error covariance matrix of b_MAP, yields

cov(b_MAP − β) = [P_MAP Xᵀψ⁻¹X − I]V_β[P_MAP Xᵀψ⁻¹X − I]ᵀ + P_MAP Xᵀψ⁻¹X P_MAP    (6.6.9)

where (6.1.40) is used. Expanding the right side of (6.6.9) produces

cov(b_MAP − β) = P_MAP Xᵀψ⁻¹X V_β Xᵀψ⁻¹X P_MAP − P_MAP Xᵀψ⁻¹X V_β
               − V_β Xᵀψ⁻¹X P_MAP + V_β + P_MAP Xᵀψ⁻¹X P_MAP    (6.6.10a)
               = −V_β Xᵀψ⁻¹X P_MAP + V_β P_MAP⁻¹ P_MAP    (6.6.10b)
               = V_β[−Xᵀψ⁻¹X + Xᵀψ⁻¹X + V_β⁻¹]P_MAP

and thus the covariance of b_MAP − β is

cov(b_MAP − β) = P_MAP    (6.6.10c)

This is valid for the stated assumptions which are denoted 111112. Note that the effect of the random parameter behavior disappears as Xᵀψ⁻¹X is made large, which usually results from a large number of measurements. A summary of the MAP estimation equations is given in Appendix B.

6.6.4 Estimation with Subjective Information

Consider now the case of constant parameters with subjective information about them. This information can be expressed in probabilistic terms if we imagine that the parameters have probability distributions, that is, are random. A close analogue can be drawn with the random parameter case. Now μ_β becomes the prior parameter vector based on belief, and V_β is the covariance matrix for a normal distribution. That is, our knowledge or belief regarding β is expressed by the probability density f(β) = N(μ_β, V_β). We can conceive that this knowledge has been developed from investigation of many tests on similar "batches," from literature values, or from other sources. This information is independent of that obtained from the new information contained in Y. With these assumptions and those used regarding Y above, the assumptions are denoted 111113. The estimator and covariance matrix are then the same as derived in Section 6.6.3. Now, however, μ_β and V_β have different meanings.

Let us briefly consider certain implications of the MAP estimators for subjective prior information. Suppose first that the prior information is very poor. This implies that the V_β matrix has large diagonal components. If V_β has large diagonal components, b_MAP given by (6.6.6a) approaches the b_ML expression given by (6.5.2), and cov(b_MAP − β) approaches cov(b_ML) given by (6.5.5). Hence a computer program developed for MAP estimation could also be used for ML estimation (for the assumptions denoted 111111). The same program could also be used for OLS estimation by letting V_β have large diagonal components and by replacing ψ by σ²I. If ψ were equal to σ²I and all the standard assumptions were valid, we would have

b_MAP ≈ b_ML = b_LS    (6.6.11)

If the standard assumptions are not valid, the same estimator as for OLS given by (6.2.5) is obtained from (6.6.6) by simply replacing ψ⁻¹ by I and by setting μ_β = 0 and V_β⁻¹ = 0; in this case, cov(b_MAP − β) will not yield the correct relation for cov(b_LS), however.

In (6.6.6) all components of b_MAP can be uniquely found if

|P_MAP⁻¹| = |Xᵀψ⁻¹X + V_β⁻¹| ≠ 0    (6.6.12)

Hence for MAP estimation it is neither necessary that |XᵀX| ≠ 0 nor that n ≥ p. In both ML and LS estimation, β cannot be estimated if |XᵀX| = 0, which is the case if n < p or if there is linear dependence among the sensitivity coefficients. The fact that b_MAP estimates can be obtained for n as small as 1 is used in the sequential method discussed in the next section.
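The two claims above, that the error covariance P_MAP never exceeds the ML covariance and that b_MAP tends to b_ML as the prior becomes vague, are easy to check numerically (hypothetical numbers throughout):

```python
import numpy as np

X = np.array([[1.0, 0.2], [1.0, 0.7], [1.0, 1.3], [1.0, 1.9]])
psi_inv = np.diag([4.0, 1.0, 2.0, 0.5])   # psi^-1, uncorrelated errors
Y = np.array([1.1, 2.0, 3.2, 4.1])
mu = np.array([0.0, 1.0])                 # prior mean

M = X.T @ psi_inv @ X
b_ml = np.linalg.solve(M, X.T @ psi_inv @ Y)
cov_ml = np.linalg.inv(M)                 # (6.5.5)

for v in [1.0, 1e3, 1e6]:                 # increasingly vague prior
    V_inv = np.eye(2) / v
    P_map = np.linalg.inv(M + V_inv)      # (6.6.6b) = cov(b_MAP - beta)
    b_map = P_map @ (X.T @ psi_inv @ Y + V_inv @ mu)
    # prior information can only shrink the error covariance
    print(np.all(np.diag(P_map) <= np.diag(cov_ml) + 1e-12), np.round(b_map, 4))
```

As v grows, the printed estimates converge to b_ml, which is the numerical content of (6.6.11).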
6.6.5 Uncertainty in ψ

One major difficulty in using ML and MAP estimation is that the standard assumptions may not be valid. In particular, the ψ matrix may not be known. There are certain checks and corrections that can be used when there is some uncertainty in ψ.

One check involves the expected value of S_MAP, which is called the prior value because the new data are not yet used to obtain parameter estimates. Using (6.1.50), E(S_MAP) becomes

E(S_MAP) = n + p    (6.6.13)
Now we also know for ML that E(S_ML) = n and E(R_ML) = n − p. Hence it is reasonable to assume that the expected value of R_MAP, the minimum value of S_MAP, is about equal to n. Again, if n ≫ p, the difference between n + p and n is relatively small. Hence for many cases R_MAP can be considered to have a χ² distribution with n degrees of freedom. (This is true provided the assumptions designated 111112 or 111113 are valid.) A related example is given by Example 6.5.6.
Suppose next that there is uncertainty in ψ; let ψ be equal to σ²Ω, where σ² is unknown and Ω is known. Both σ² and β can be estimated by maximizing f(β|Y) with respect to σ² to find its MAP estimator and with respect to β to find the mode of the distribution of β. After taking the derivatives of ln f(β|Y) given by (6.6.4) with respect to β and σ² and setting the resulting equations equal to zero where β = b_MAP and σ² = σ̂²_MAP, we obtain the set of p + 1 equations given by (6.6.14a,b),

(σ̂²_MAP)⁻¹[−XᵀΩ⁻¹Y + XᵀΩ⁻¹X b_MAP] − V_β⁻¹μ_β + V_β⁻¹b_MAP = 0    (6.6.14a)

n(σ̂²_MAP)⁻¹ − (σ̂²_MAP)⁻²Q_MAP = 0    (6.6.14b)

Q_MAP = (Y − Xb_MAP)ᵀΩ⁻¹(Y − Xb_MAP)    (6.6.14c)

Unfortunately these equations are no longer linear in σ̂²_MAP and b_MAP. Nevertheless we can solve them to obtain

b_MAP = {XᵀΩ⁻¹X + σ̂²_MAP V_β⁻¹}⁻¹{XᵀΩ⁻¹Y + σ̂²_MAP V_β⁻¹μ_β}    (6.6.15a)

σ̂²_MAP = n⁻¹Q_MAP    (6.6.15b)

The nonlinearity has not been removed, but two different approaches are apparent from these equations. First, if it happens that σ̂²_MAP V_β⁻¹ is known, the nonlinearity disappears and a direct solution is given for b_MAP and σ̂²_MAP. This case might occur when two (or more) sets of data are analyzed separately but there is a common σ² for all the data. Second, an iterative procedure is suggested by (6.6.15). If an initial guess for σ̂²_MAP is available, it is used in (6.6.15a), and then an improved value of σ̂²_MAP is found using (6.6.15b), whereupon this value is used in (6.6.15a), etc., until the changes in b_MAP and σ̂²_MAP are negligible. If the initial value of σ̂²_MAP were zero, the first estimator for β would be b_ML. At the other extreme of σ² → ∞, b_MAP = μ_β. In this iterative procedure, note that some matrix products, namely, those in braces in (6.6.15), need be evaluated only once.

For another case with uncertainty in ψ, see Problem 6.29, where ψ = σ²Ω and V_β = σ²V with σ² unknown and Ω and V known.
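The alternating solution of (6.6.15a) and (6.6.15b) can be sketched as follows (hypothetical data; as the text notes, the products in braces are formed once, outside the loop, and starting from σ̂² = 0 makes the first β estimate equal to b_ML):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 12
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
omega_inv = np.eye(n)                    # Omega known; psi = sigma2 * Omega
mu = np.array([0.5, 1.5])                # prior mean
V_inv = np.linalg.inv(np.diag([0.25, 0.25]))

beta_true, sigma2_true = np.array([0.4, 1.7]), 0.09
Y = X @ beta_true + rng.standard_normal(n) * np.sqrt(sigma2_true)

# Products needed by (6.6.15) -- evaluated only once
XoX, XoY, Vm = X.T @ omega_inv @ X, X.T @ omega_inv @ Y, V_inv @ mu

s2 = 0.0                                 # s2 = 0 gives b_ML on the first pass
for _ in range(50):
    b = np.linalg.solve(XoX + s2 * V_inv, XoY + s2 * Vm)   # (6.6.15a)
    r = Y - X @ b
    s2_new = (r @ omega_inv @ r) / n                       # (6.6.15b)
    if abs(s2_new - s2) < 1e-12:
        break
    s2 = s2_new
print(b, s2)
```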
6.7 SEQUENTIAL ESTIMATION
6.7.1 Introduction
The sequential estimation procedures developed in this section refer to continually updating parameter estimates as new observations are added. One of the most important advantages of this method is that matrix inverses may not be needed. Another is that the computer memory storage can be greatly reduced. Moreover, the method can be utilized to produce an "on-line" method of parameter estimation for dynamic processes. These and other advantages are discussed further at the end of this section.
The mathematical form derived for MAP estimation, (6.6.6), includes those derived for ML and OLS estimation. For ML estimation with the standard assumptions of 111111, (6.6.6) mathematically reduces to (6.5.2) if V_β⁻¹ → 0. For the subjective prior information case, this corresponds to no prior information. In this case the value of μ_β is unimportant (provided V_β⁻¹μ_β = 0). For (6.6.6) to reduce to the estimator given for OLS estimation, (6.2.5), we may set ψ = σ²I and V_β⁻¹ = 0 in (6.6.6). Whether or not these assumptions are valid, the OLS estimator is obtained. If the assumptions denoted 1111111 (σ² need not be known) are valid, then the estimates obtained using b_MAP will equal those given by b_ML and b_OLS. In the sequential procedure we use the fact that the ML and OLS estimates can be very closely approximated as indicated above if the V_β matrix is diagonal with large diagonal components. The sequential procedure also includes ML and OLS estimation when a set of data has been analyzed to estimate the parameters and then later this information is combined with more data; the information for the first set of data summarized by b_ML and (Xᵀψ⁻¹X)⁻¹ (or b_OLS and (XᵀX)⁻¹) can be mathematically treated in the same manner as μ_β and V_β in MAP estimation. See Section 5.3.4.

Two different sequential procedures are given. The first is the direct method; it involves matrix inverses of dimensions p × p. In the alternate, and recommended, formulation the inverses have dimensions m × m, where m is the number of responses at each "time." In the case of a single response, m is equal to one, and thus only scalar inverses are required.
6.7.2 Direct Method

Since the MAP estimator can mathematically include the ML and OLS estimators and since estimates can be obtained for n as small as one, the sequential estimator given by (6.6.6) is used as a building block for the sequential method.
One important assumption for sequential MAP and ML estimation is that the measurements are independent in "time." That is, for the multiresponse case, ψ can be partitioned into the block-diagonal matrix

    ψ = diag[Φ_1, Φ_2, ..., Φ_n]   (6.7.1)

where Φ_i is m×m and m is the number of observations taken at each time. The measurements at each time may be correlated since Φ_i need not be diagonal. If ordinary least squares estimation is used, the measurement errors in Y may be correlated in time since in OLS estimation the matrix ψ in (6.6.6) is replaced by I; in this case P_MAP would not yield the covariance of b_OLS.

Sequential MAP and ML estimation can be used when ψ is not given by (6.7.1), but a transformation of the measurements is necessary to produce pseudo-observations that are uncorrelated in time. See Section 6.9.
A sequential estimator can be derived by letting

    V_β ≡ P_0   (6.7.2)

and introducing this into (6.6.6) to find

    b_{i+1} = P_{i+1}[P_i^{-1} b_i + X_{i+1}^T Φ_{i+1}^{-1} Y_{i+1}]   (6.7.3)

    P_{i+1} = [X_{i+1}^T Φ_{i+1}^{-1} X_{i+1} + P_i^{-1}]^{-1}   (6.7.4)

The i subscript refers to "time" (or whatever the independent variable in terms of which measurements are being added). Thus b_{i+1} is an estimator for all p parameters based on the data Y_1, Y_2, ..., Y_{i+1} as well as on the prior information, if any. In the sequential procedure, (6.7.3,4) are used for i = 1, 2, ..., n. In the above equations X_{i+1} is an m×p matrix and Y_{i+1} is an m×1 vector. In order to use this formulation, it is necessary to invert the p×p matrix P_{i+1} and the m×m matrix Φ_{i+1} at each time.
6.7 SEQUENTIAL ESTIMATION

6.7.3 Sequential Method Using Matrix Inversion Lemma
The labor in finding the inverses of the matrices in (6.7.4) can be reduced when m < p by using the matrix identities

    P_{i+1} = [X_{i+1}^T Φ_{i+1}^{-1} X_{i+1} + P_i^{-1}]^{-1}   (6.7.5a)
            = P_i − P_i X_{i+1}^T (X_{i+1} P_i X_{i+1}^T + Φ_{i+1})^{-1} X_{i+1} P_i   (6.7.5b)

See Appendix 6B for a derivation of these equations; (6.7.5a) is known as the matrix inversion lemma. Note that even though P_{i+1} is a p×p matrix, the matrix that must be inverted on the right sides of (6.7.5a,b) is m×m.

By introducing (6.7.5) into (6.7.3,4) we obtain

    A_{i+1} = P_i X_{i+1}^T   (6.7.6a)

    Δ_{i+1} = Φ_{i+1} + X_{i+1} A_{i+1}   (6.7.6b)

    K_{i+1} = A_{i+1} Δ_{i+1}^{-1}   (6.7.6c)

    e_{i+1} = Y_{i+1} − X_{i+1} b_i   (6.7.6d)

    b_{i+1} = b_i + K_{i+1} e_{i+1}   (6.7.6e)

    P_{i+1} = P_i − K_{i+1} A_{i+1}^T   (6.7.6f)

where K_{i+1} is sometimes called the gain matrix. This gives a general sequential procedure that can be used for OLS, WLS, Gauss-Markov, ML, and MAP estimation. The same computer program can be used for each.

Parenthetically we note that the same computer program can also provide a filter. That is, the estimator b_{i+1} can be used to find the best estimate of Y_{i+1}, designated Ŷ_{i+1}, based on all the data until and including time i+1. Notice that Ŷ_{i+1} = X_{i+1} b_{i+1} is not the same vector as would be obtained from using all the data (i = 1, 2, ..., n) to evaluate b_n; when we use all the data, as we usually do, the Ŷ_i values are termed smoothed values rather than filtered values.

In starting the sequential procedure given by (6.7.6), the b_0 and P_0 matrices are required. For MAP estimation b_0 is μ_β and P_0 = V_β. For ML and OLS estimation b_0 may be set equal to a zero column vector and P_0 is made to be a diagonal matrix; the jth diagonal term of P_0 should be large compared with β_j². The P_0 matrix for the ML and OLS cases is discussed further below. WLS and Gauss-Markov estimates are obtained in a similar way.

Another expression for P_{i+1}, given by Mendel [16, p. 128], is

    P_{i+1} = (I − K_{i+1} X_{i+1}) P_i (I − K_{i+1} X_{i+1})^T + K_{i+1} Φ_{i+1} K_{i+1}^T   (6.7.7)

This expression can be shown to be equal to that given by (6.7.6f) by introducing the definitions of K_{i+1} and Δ_{i+1}. It is true that (6.7.7) is a more time-consuming expression to evaluate than (6.7.6f), but Mendel shows that it is less sensitive to propagation of errors in K than is (6.7.6f).
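The recursion (6.7.6) is compact enough to state directly in code. The following NumPy sketch (hypothetical data; OLS case, so Φ_{i+1} = I) applies the gain-matrix update one observation at a time, starting from b_0 = 0 and a large diagonal P_0, and checks that the result agrees with the batch OLS solution.

```python
import numpy as np

def sequential_update(b, P, Xn, Yn, Phi):
    """One step of (6.7.6a)-(6.7.6f); Xn is m x p, Yn is length m."""
    A = P @ Xn.T                      # (6.7.6a)
    Delta = Phi + Xn @ A              # (6.7.6b), only m x m
    K = A @ np.linalg.inv(Delta)      # (6.7.6c), the gain matrix
    e = Yn - Xn @ b                   # (6.7.6d)
    b = b + K @ e                     # (6.7.6e)
    P = P - K @ A.T                   # (6.7.6f)
    return b, P

# Hypothetical single-response data (m = 1), model eta = beta1 + beta2*x
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Y = np.array([0.9, 2.1, 2.9, 4.2])

b, P = np.zeros(2), 1e8 * np.eye(2)   # vague starting values for OLS
for i in range(len(Y)):
    b, P = sequential_update(b, P, X[i:i+1], Y[i:i+1], np.eye(1))

b_batch = np.linalg.solve(X.T @ X, X.T @ Y)
assert np.allclose(b, b_batch, atol=1e-4)
```

Only the m×m matrix Delta is inverted at each step, which is the practical advantage of this form when m < p.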
6.7.3.1 Estimation with Only One Observation at Each Time (m = 1)

An important simplification occurs in the sequential form given by (6.7.6) when there is a single observation at each time. This is because Δ_{i+1} is a scalar and thus its inverse is a scalar. Also note that in this case

    Φ_{i+1} = σ_{i+1}²,   X_{i+1} = [X_{i+1,1}  X_{i+1,2}  ···  X_{i+1,p}]

where σ_{i+1}² is the variance of Y_{i+1} for ML and MAP estimation, but is replaced by unity for OLS estimation.

The sequential procedure for m = 1 implied by (6.7.6) is

    A_{j,i+1} = Σ_{k=1}^{p} P_{jk,i} X_{i+1,k}   (6.7.8a)

    Δ_{i+1} = σ_{i+1}² + Σ_{j=1}^{p} X_{i+1,j} A_{j,i+1}   (6.7.8b)

    P_{jk,i+1} = P_{jk,i} − A_{j,i+1} A_{k,i+1}/Δ_{i+1}   (6.7.8c)

    e_{i+1} = Y_{i+1} − Σ_{j=1}^{p} X_{i+1,j} b_{j,i}   (6.7.8d)

    b_{j,i+1} = b_{j,i} + A_{j,i+1} e_{i+1}/Δ_{i+1}   (6.7.8e)

where j, k = 1, 2, ..., p.   (6.7.8f)

It is important to observe that there are no simultaneous equations to solve or nonscalar matrices to invert with this method. This is a somewhat surprising result, and it is true for any value of p ≥ 1. This procedure does require starting values for b and P, however.

Example 6.7.1
Give a set of equations based on (6.7.8) for two parameters that is appropriate for a small programmable calculator. Also indicate the memory locations.

Solution
Before the calculations, values can be stored for b1, b2, P11, P12, P22, σ², X1, and X2. The first five are for "time" index zero whereas X1 and X2 are for index 1, that is, X_{1,1} and X_{1,2}. The memory registers can be assigned as follows:

    0: b1   1: b2   2: P11   3: P12   4: P22   5: σ²   6: X1   7: X2   8: A1   9: A2

Later in the calculations, register 5 can be used for Δ and then e/Δ. A set of equations and storage locations are as follows:

    A1 = X1 P11 + X2 P12                 (STO 8)
    A2 = X1 P12 + X2 P22                 (STO 9)
    Δ = A1 X1 + A2 X2 + σ²               (STO 5)
    e/Δ = (Y − X1 b1 − X2 b2)/Δ          (STO 5)
    P11 − A1²/Δ → P11                    (STO 2)
    P12 − A1 A2/Δ → P12                  (STO 3)
    P22 − A2²/Δ → P22                    (STO 4)
    b1 + A1(e/Δ) → b1                    (STO 0)
    b2 + A2(e/Δ) → b2                    (STO 1)

The i subscript has been dropped but it is implied; for example, in the P11 equation, P11 on the right is at time i whereas P11 on the left is at time i+1. The above set of equations is used for each value of i. A special storage location for Y is not necessary because it is read in and used as needed.

Example 6.7.2
Using sequential ordinary least squares, estimate the two parameters for observations Y and sensitivity matrix X given by
Y[~] .X[ ~ ~]
10
Let b_0 = 0 and P_0 = 10⁵I, 10¹⁰I, and 10¹¹I. (These large P_0 values simulate no prior information.)
Solution
Since OLS is to be used, σ_i² = 1 for i = 1, 2, 3. The equations to be used are given in Example 6.7.1. For the first observation,

    Δ_1 = 3×10⁵(3) + 4×10⁵(4) + 1 = 2,500,001

The rest of the calculations for the first time are given in the third column of Table 6.10 along with results for times 2 and 3. The calculations were performed using a Texas Instruments SR-56 programmable calculator, which has 12-digit accuracy. For this problem the calculator accuracy can be important since subtractions of nearly identical large values occur while calculating the P_{jk,1} values. The parameters are only slightly affected for a large range of P_0 matrices, such as from 10³I to 10¹⁰I for this example. If the P_0 matrix is KI where K ≥ 10¹¹, however, the P_i matrices are 0 for i ≥ 2 and the parameters do not change after b_2.
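The recursion used in this example can be sketched in a few lines of code. The data below are hypothetical stand-ins (the Y and X of this example are not reproduced here); the sketch runs sequential OLS via (6.7.8) from b_0 = 0 and P_0 = 10⁵I and checks the result against batch OLS. Note that no matrix is ever inverted.

```python
import numpy as np

def scalar_update(b, P, x, y, sigma2=1.0):
    """One step of (6.7.8) for a scalar observation y with sensitivity row x."""
    A = P @ x                           # (6.7.8a)
    Delta = sigma2 + x @ A              # (6.7.8b), a scalar
    e = y - x @ b                       # (6.7.8d)
    b = b + A * (e / Delta)             # (6.7.8e)
    P = P - np.outer(A, A) / Delta      # (6.7.8c)
    return b, P

# Hypothetical sensitivity rows and measurements
X = np.array([[3.0, 4.0], [1.0, 2.0], [2.0, 1.0]])
Y = np.array([10.0, 4.1, 4.9])

b, P = np.zeros(2), 1e5 * np.eye(2)
for x, y in zip(X, Y):
    b, P = scalar_update(b, P, x, y)

b_batch = np.linalg.solve(X.T @ X, X.T @ Y)
assert np.allclose(b, b_batch, atol=1e-3)
```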
Table 6.10  Results for Example 6.7.2 Using an SR-56 Programmable Calculator

    Quantity   Exact Values     P_0 = 10⁵I      P_0 = 10¹⁰I      P_0 = 10¹¹I
    P11,1      —                36000.0256      3.6×10⁹          3.6×10¹⁰
    P12,1      —                −47999.9808     −4.8×10⁹         −4.8×10¹⁰
    P22,1      —                64000.0144      6.4×10⁹          6.4×10¹⁰
    b1,1       —                1.759999296     1.76             1.76
    b2,1       —                1.319999472     1.32             1.32
    P11,2      0.0131826742     0.0131826666    0.01311537163    0
    P12,2      −0.0225988701    −0.0225988336   −0.02206035      0
    P22,2      0.1101694915     0.1101692781    0.107167128      0
    b1,2       2.436911488      2.43691129      2.436373456      2.0
    b2,2       0.5367231638     0.5367231198    0.5462544        1.0
    P11,3      0.52             0.51999427      0.542            0
    P12,3      −0.56            −0.55999339     −0.572           0
    P22,3      0.68             0.67999228      0.679            0
    b1,3       2                1.9999952       2.0              2.0
    b2,3       1                1.0000044       1.0              1.0

Table 6.11 is given to illustrate the relative errors in the parameters at the third data point for different values of P_0. Large values of P_0, about 10³I to 10¹⁰I, lead to accurate estimates. Small and large values of K in P_0 = KI can lead, however, to relatively inaccurate parameter values. Small values imply that prior parameter estimates are accurately known, which is not compatible with OLS estimation. Small or large values of K should be compared with the values of the squares of the parameters. In the present case the parameters are about unity, so that K ≤ 1 is termed "small" and K ≥ 10³ may be termed large. Another indication that K is chosen sufficiently large is that K is large compared with the largest diagonal term of P_i for i ≥ 2 (for two parameters).

Table 6.11  Relative Errors in b1,3 and b2,3 for Example 6.7.2

    K in P_0 = KI    Relative Error in b1,3    Relative Error in b2,3
    —                −5.14×10⁻³                −7.56×10⁻³
    —                −5.24×10⁻⁶                −7.56×10⁻⁶
    —                −5.13×10⁻⁷                −5.20×10⁻⁷
    —                4.55×10⁻⁷                 −2.16×10⁻⁷
    —                1.35×10⁻⁵                 5.40×10⁻⁴
    —                −2.21×10⁻⁴                0.0178
    —                −6.02×10⁻³                0.348
    —                1.94×10⁻²                 −0.148
    —                −0.1792                   0.563

Physically, P = 0 implies that the variance of the parameters is zero and thus nothing more can be learned from additional data; hence the parameters do not change with time for P_0 = 10¹¹I after the second data point. However, P is effectively zero in this example only because of our method and the limited accuracy of the calculator. Hence, though P_0 can be selected from a large range of values to simulate no prior information, it can be made too large.

It can be shown that K in P_0 = KI is too large for the two-parameter case when σ_1²Δ_1^{-1} falls below 10^{-n_c}, where n_c is the number of significant figures used by the computer or calculator. It is not difficult to show that

    σ_1²Δ_1^{-1} = σ_1²/(K[X_{11}² + X_{12}²] + σ_1²)   (6.7.9)

Let σ_1²Δ_1^{-1} be equal to or greater than 10^{-n_c}, n_c being the number of significant calculational digits. Also let K = 10^{n_K} where K is large. Then for K not too large, we should have

    n_K ≤ n_c − log(X_{11}² + X_{12}²)   (6.7.10)

Using the values for the above example and n_c = 12, we find n_K ≤ 10.6. In other words, K should be less than 10^10.6 in order not to be too large. This is consistent with the results of Tables 6.10 and 6.11. To be not near the critical number of significant figures, four less are recommended, that is, P_0 ≈ 10⁷I in this case.
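The bound (6.7.10) is simple to evaluate. The snippet below recomputes n_K for this example (n_c = 12 digits and X_{11}² + X_{12}² = 25) and evaluates the ratio σ_1²Δ_1^{-1} of (6.7.9) for a safe and a too-large K, showing that for K = 10¹¹ the ratio falls below the 10⁻¹² resolution of a 12-digit machine.

```python
import math

nc = 12                      # significant calculational digits (SR-56)
ss = 3.0**2 + 4.0**2         # X11^2 + X12^2 = 25 for this example
nK = nc - math.log10(ss)     # (6.7.10)
assert abs(nK - 10.6) < 0.05

sigma2 = 1.0                 # OLS: sigma_1^2 is replaced by unity
for K in (1e5, 1e11):
    ratio = sigma2 / (K * ss + sigma2)    # (6.7.9)
    lost = ratio < 10.0**(-nc)            # invisible at 12 significant digits
    print(K, ratio, lost)

assert 1.0 / (1e5 * ss + 1.0) > 1e-12     # K = 1e5: contribution survives
assert 1.0 / (1e11 * ss + 1.0) < 1e-12    # K = 1e11: contribution is lost
```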
6.7.3.2 Sequential Analysis of Example 5.2.4
Computer programs can be readily written based on (6.7.8). One advantage is that no separate method is needed for the solution of a set of simultaneous algebraic equations. Moreover, the procedure is readily modified for any number of parameters p. For two parameters a small programmable calculator can also be used.

A computer program was written to estimate, using sequential OLS, the parameters in the model η_i = β_1 + β_2 X_i for the data of Example 5.2.4. Ordinary least squares analysis implies that the variance σ² is constant, prior parameter values are unknown, and the diagonal components of P_0 must be large. Since OLS analysis is unaffected by the choice of σ_i², replace σ_i² by 1 for i = 1, ..., 9. For simplicity, let the initial values of β_1 and β_2 be zero, as no prior information is given.

If rough estimates were available for the parameters, then the diagonal components of P_0 could be chosen about 10³ to 10⁵ times larger. If we do not have this information, then for this two-parameter case (6.7.10) can be used; since X_{i,1} = 1 and X_{1,2} = 0, we find that n_K ≤ n_c. Thus if a computer with 15 significant digit accuracy is available, P_0 should have diagonal terms less than 10¹⁵. It would be safer to reduce P_0 by four orders of magnitude, however, say to 10¹¹. Shown in Table 6.12 are the results obtained using P_0 = 10⁵I, but the parameters are identical, to the seven decimal places given, to those for P_0 = KI, 10⁵ < K < 10¹¹, on a 15 significant digit computer. Actually the values in Table 6.12 after i = 1 are exactly the same as those given by the usual least squares procedure if the data were first analyzed for the first two data points, then the first three, etc.

One way to check if P_0 is made large enough is to repeat the calculation with P_0 made larger. This is not efficient, however. Another way is to compare the diagonal components of P_0 and P_n, the matrix for all the data.
Table 6.12  Sequential Analysis of Example 5.2.4

    i    b1           b2            P11          P12           P22
    0    0            0             10⁵          0             10⁵
    1    0.2580000    0.0           1.0000000    0.0           10⁵
    2    0.2580000    0.17080000    1.0000000    −0.1000000    0.0200000
    3    0.1281667    0.2097500     0.8333333    −0.0500000    0.0050000
    4    0.4197000    0.1660200     0.7000000    −0.0300000    0.0020000
    5    0.8238000    0.1256100     0.6000000    −0.0200000    0.0010000
    6    0.9545238    0.1158057     0.5238095    −0.0142857    0.0005714
    7    0.7957500    0.1253321     0.4642857    −0.0107143    0.0003571
    8    1.1195833    0.1091405     0.4166667    −0.0083333    0.0002381
    9    1.2864667    0.1019883     0.3777178    −0.0066667    0.0001667
From (6.6.6b) we can write

    P_n = [X^T ψ^{-1} X + P_0^{-1}]^{-1}   (6.7.11)

which for the OLS analysis above becomes

    P_n = [X^T X + K^{-1} I]^{-1}   (6.7.12)

Now as K → ∞, P_n → (X^T X)^{-1}. As this condition is approached, the diagonal components of (X^T X)^{-1}, and hence those of P_n, must be small compared to K. Consequently we can check if P_0 = KI is large enough by comparing the diagonal components of P_n with K. Note that in Table 6.12 the P11 and P22 values for i ≥ 2 are much less than K = 10⁵.
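This check can be scripted directly from (6.7.12). The design matrix below is a hypothetical straight-line design, not the data of Example 5.2.4; the point is only that for an adequately large K the diagonal of P_n sits orders of magnitude below K.

```python
import numpy as np

# Hypothetical straight-line design with 9 observations
X = np.column_stack([np.ones(9), np.arange(9.0)])

for K in (1e2, 1e5):
    Pn = np.linalg.inv(X.T @ X + np.eye(2) / K)   # (6.7.12)
    print(K, Pn.diagonal(), Pn.diagonal().max() / K)

# For K = 1e5 the largest diagonal term of Pn is about 0.38, far below K,
# so this K comfortably simulates "no prior information".
Pn = np.linalg.inv(X.T @ X + np.eye(2) / 1e5)
assert Pn.diagonal().max() < 1e-3 * 1e5
```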
Some further advantages of the sequential method compared to the usual OLS analysis illustrated by Example 5.2.4 are given below. Each advantage relates to the ability of the sequential method to provide more information than is apparent from the usual OLS analysis. First, the effect of adding a single observation is apparent. For example, the effect of the fourth observation is to make b1 much larger than if only the first three observations are used. Second, decreasing changes in the parameters with i show that each new observation tends to contribute less new information than the previous one. See b2 versus i in Table 6.12.

Third, time variations of the parameters can yield insight into the accuracy of the measurements and/or the adequacy of the regression function. For example, b1 seems to be increasing with the i index whereas b2 is more constant. This increase in b1 could be due to inaccurate data or to actual time dependence of the parameter. (In this example we know that the former is the case because the regression function used is the correct one.) Owing to the larger variation of b1 we suspect that the relative errors in the estimate b1 are greater than in the estimate b2. The possible time dependence of b1 could be further investigated by adding measurements or by repeating the analysis with a new set of data. If the increase in b1 persists, then a change in the regression model would be indicated.

Fourth, some conclusions can be drawn from the time variation of the parameters without any prior statistical knowledge of the measurement errors. If, however, there is statistical knowledge, more can be learned. The sequential method also yields the time variation of the components of P. Note that P11 is decreasing much more slowly than P22. If the measurement errors are independent and have constant variance (or more precisely, for assumptions 111101), P11 is proportional to V(b1) and P22 to V(b2). Hence the
measurements for i > 2 in this example are more effective in reducing errors in b2 than in b1.

The decrease in P11 and P22 with i shown in Table 6.12 is necessary, as indicated by the equations in Example 6.7.1 (because A1²/Δ > 0 and A2²/Δ > 0). Physically this is reasonable because added measurements increase the available information, which results in the diagonal components of P decreasing or at least not increasing.
6.7.4 Sequential MAP Estimation

Another advantage of the sequential procedure is that the same procedure (and thus computer program) can be used for MAP estimation as well as for WLS, Gauss-Markov, ML, and OLS estimation provided certain standard assumptions are valid. Sets of assumptions permitting sequential analysis are 1111112 and 1111113. The condition that the measurement errors be uncorrelated in time is particularly important.

Example 6.7.3
An engineer has been given the task of measuring the thermal conductivity k of a new electrical resistance heating wire. A linear curve of k versus temperature T is needed. Based on his experience with similar alloys he feels that the model η = β_1 + β_2 T is reasonable, with prior estimates of β_1 and β_2 of μ_1 = 12 W/m-°C and μ_2 = 0.01 W/m-°C², where T is in °C. He estimates that the covariance matrix for these values is

    V_β = [  2        −0.002 ]
          [ −0.002    10⁻⁵   ]

and the prior distribution is normal.

For the new alloy he obtained the following measurements. The error in the temperature level can be neglected, but the standard deviation in each measurement of k is about 0.2. Also the errors are independent and normal.

    i     T (°C)    Measured Value of k (W/m-°C)
    1     20        11.47
    2     20        10.94
    3     21        11.15
    4     100       11.85
    5     150       12.55
    6     200       13.18
    7     250       13.48
    8     297       13.90
    9     300       14.54
    10    302       14.36

Notice that the measurements tend to be concentrated at the extreme temperatures of 20 and 300°C. If there were no uncertainty in the adequacy of the linear-in-T model and if there were no prior information, the optimum design would consist of one-half of the measurements being at each extreme T. The experimenter compromised by putting most of the measurements at the extremes, but some intermediate values were included.

Estimate sequentially the parameters in the model k = β_1 + β_2 T with and without the prior information.

Solution
This problem can be viewed as being one involving subjective prior information. The prior means of b1 and b2 are 12 and 0.01, respectively. The P_0 elements are P11 = 2, P12 = −0.002, and P22 = 10⁻⁵. The σ_i² values are 0.04. The algorithm given in Example 6.7.1 can be used to estimate the parameters with X_{i1} = 1 and X_{i2} being the T values. The results are given in Table 6.13. Notice that the first two observations (both of which are at T = 20°C) yield estimates of b1 and b2 which are near the final values. The variance of b1, which is given by P11, reduces considerably as a result of the first two observations. This is not true, however, for b2 since P22 decreases only slightly. This result is reasonable because b2 represents the slope of k which, in the absence of prior information, requires measurements at two or more different T_i values. With all the observations used, both P11 and P22 have decreased considerably, indicating that the new measurements substantially reduced the experimenter's uncertainty.

Table 6.14 gives typical results for sequential estimation with no prior information. Using (6.7.10) it is found that P_0 = 10⁴I is large but not too large for 12 significant figure accuracy. In contrast with the prior-information case, the first two observations do not yield estimates that are reasonable. The reduction in the P matrix is negligible from the first to the second observation. This is because both are at T = 20°C.
Table 6.13  Sequential Estimates Using Prior Information for Example 6.7.3

    i     b1        b2×10²    P11×10²    P12×10⁵    P22×10⁷
    0     12        1         200        −200       100
    1     11.271    1.0669    4.399      −20.37     83.50
    2     10.997    1.0921    2.386      −18.52     83.33
    3     10.971    1.0934    1.719      −18.18     83.32
    4     10.973    0.9591    1.718      −17.56     42.58
    5     10.961    1.0228    1.634      −13.33     21.20
    6     10.940    1.0803    1.513      −9.916     11.58
    7     10.960    1.0409    1.393      −7.557     6.930
    8     10.979    1.0127    1.290      −5.977     4.512
    9     10.933    1.0813    1.246      −5.318     3.521
    10    10.922    1.0977    1.222      −4.953     2.982
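Table 6.13 can be reproduced with a few lines of code. The sketch below runs the scalar recursion (6.7.8) on the ten (T, k) measurements above, starting from b_0 = (12, 0.01) and P_0 = V_β, with σ_i² = 0.04; the deliberately loose assertions only check that the final estimates land near the tabulated values.

```python
import numpy as np

T = np.array([20., 20., 21., 100., 150., 200., 250., 297., 300., 302.])
k = np.array([11.47, 10.94, 11.15, 11.85, 12.55, 13.18, 13.48, 13.90, 14.54, 14.36])

b = np.array([12.0, 0.01])                      # prior means mu_beta
P = np.array([[2.0, -0.002], [-0.002, 1e-5]])   # prior covariance V_beta
sigma2 = 0.2**2

for Ti, ki in zip(T, k):
    x = np.array([1.0, Ti])                     # sensitivities for beta1 + beta2*T
    A = P @ x                                   # (6.7.8a)
    Delta = sigma2 + x @ A                      # (6.7.8b)
    e = ki - x @ b                              # (6.7.8d)
    b = b + A * (e / Delta)                     # (6.7.8e)
    P = P - np.outer(A, A) / Delta              # (6.7.8c)

print(b)    # should land near the last row of Table 6.13
assert 10.5 < b[0] < 11.5
assert 0.009 < b[1] < 0.013
```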
Table 6.14  Sequential Estimates for No Prior Information and P_0 = 10⁴I for Example 6.7.3; σ² = 0.04, b_0 = 0

    i     b1        b2×10²    P11×10²     P12×10⁵      P22×10⁷
    0     0         0         10⁶         0            10¹¹
    1     0.0286    57.207    9.98×10⁵    −4.99×10⁷    2.49×10⁸
    2     0.0279    55.885    9.98×10⁵    −4.99×10⁷    2.49×10⁸
    3     12.274    −5.349    2475.9      −1.22×10⁵    5.99×10⁵
    4     11.018    0.8318    2.361       −33.82       84.02
    5     10.967    1.0054    1.875       −17.28       27.78
    6     10.935    1.0824    1.627       −11.27       13.24
    7     10.958    1.0399    1.455       −8.127       7.475
    8     10.977    1.0113    1.328       −6.258       4.732
    9     10.928    1.0833    1.276       −5.510       3.652
    10    10.916    1.1001    1.247       −5.104       3.075

Reasonable values of b1 and b2 appear only at i ≥ 4. It is only at i = 4 that T changes to 100°C after being near T = 20°C for the first three observations. Both b1 and b2 can be estimated for the linear model k = β_1 + β_2 T only if measurements are made at 2 or more T values. Notice that the P_{jk} components in Table 6.14 decrease in magnitude more rapidly than for Table 6.13. Hence as the number of observations increases, the importance of the prior information diminishes. Prior information always reduces parameter uncertainty, however.

6.7.5 Multiresponse Sequential Parameter Estimation

When several (m > 1) dependent variables are measured at the same time, it is sometimes possible to renumber them so that in effect m = 1. This can be done if OLS estimation is being used, or if ML and MAP estimation is used and the measurements are independent at each time as well as with time. As an example consider the temperature data given in Table 1.14 where there are eight measurements made at each time. Assume that a sequential OLS analysis is to be performed. The temperature measurements of this table can be described either by

    Y_j(i);   j = 1, 2, ..., 8,   i = 1, 2, ..., n

or by

    Y_k;   k = 1, 2, ..., 8, 9, 10, ..., 8n−2, 8n−1, 8n

By using the latter numbering, the problem is changed from one with m = 8 to m = 1. See Problem 6.26.

6.7.6 Ridge Regression Estimation

Starting about 1960, A. E. Hoerl and R. W. Kennard [17-21] developed a procedure called ridge analysis, which is a graphical method for depicting the characteristics of second-order regression functions having many independent variables. To a related procedure Hoerl gave the name "ridge regression," which he and Kennard have pointed out can have a general Bayesian interpretation. Hence the Bayesian estimation procedure which we call maximum a posteriori is related to ridge regression.

For the standard assumptions implied by 111111, OLS estimation provides the estimator given by (6.2.5). This estimator, among all linear unbiased estimators, provides the minimum variance (for these assumptions). The covariance matrix of b_LS is given by (6.2.11). For convenience, Hoerl and Kennard scale the independent variables so that X^T X has diagonal elements all equal to one. If the eigenvalues of X^T X are denoted λ_i, i = 1, 2, ..., p, then a seriously "ill-conditioned" (relatively small |X^T X|) problem is characterized by the smallest eigenvalue λ_min being very much smaller than unity. Hoerl and Kennard have noted that OLS estimation provides inadequate estimators for an ill-conditioned problem since σ²/λ_min is a lower bound for the average squared distance between b_LS and β. Thus for such cases b_LS is expected to be far from the true vector β, with the absolute values of the elements of b_LS being too large.

The ridge regression estimator is given by

    b* = [X^T X + KI]^{-1} X^T Y   (6.7.13)

for K ≥ 0. With X^T X scaled to have unity diagonal terms, values of K in the range of 10⁻⁴ to 1 are typical. There is an "optimum" value of K for any problem; Hoerl and Kennard discuss methods for selecting K. The MAP estimator given by (6.6.6c) yields the same estimates as (6.7.13) if μ_β is replaced by 0, ψ by I, and V_β^{-1} by KI. This has the effect of introducing the subjective prior information that the mean parameter values are zero; then the estimates given by (6.7.13) have smaller absolute values as K becomes larger. Hence, as indicated by Theorem 2 of Marquardt's paper [21], (6.7.13) has the potential of reducing the inflated OLS parameter estimates found in ill-conditioned cases. Though the estimator given by (6.7.13) provides biased estimates, Hoerl and Kennard [18] have demonstrated that there exists a K > 0 such that E[(b*−β)^T(b*−β)] < E[(b_LS−β)^T(b_LS−β)] provided β^Tβ is bounded. Evidently the MAP formulation has many different interpretations and uses.
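The shrinkage behavior of (6.7.13) can be seen directly. The sketch below uses hypothetical, nearly collinear data (an assumption for illustration only), scales the columns of X so that X^T X has unit diagonal as Hoerl and Kennard recommend, and computes the ridge estimator for increasing K; at K = 0 it reproduces OLS, and the length of b* decreases as K grows.

```python
import numpy as np

# Hypothetical ill-conditioned data: two nearly collinear regressors
X = np.array([[1.0, 1.01], [1.0, 0.99], [1.02, 1.0], [0.98, 1.0]])
Y = np.array([2.1, 1.9, 2.0, 2.0])

X = X / np.sqrt((X**2).sum(axis=0))   # scale so X^T X has unit diagonal

def ridge(K):
    return np.linalg.solve(X.T @ X + K * np.eye(2), X.T @ Y)   # (6.7.13)

b_ols = np.linalg.solve(X.T @ X, X.T @ Y)
assert np.allclose(ridge(0.0), b_ols)     # K = 0 recovers OLS

norms = [np.linalg.norm(ridge(K)) for K in (0.0, 1e-2, 1e-1, 1.0)]
print(norms)
assert all(a > b for a, b in zip(norms, norms[1:]))   # monotone shrinkage
```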
6.7.7 Comments and Conclusions on the Sequential Estimation Method

In this subsection some observations regarding the sequential method are given.

1. The method is general, as it includes MAP, ML, Gauss-Markov, WLS, and OLS estimators. When MAP and ML estimators are used it is necessary that the measurement errors ε be additive, normal, and independent in "time" designated by i. If the observations are not independent in time, sometimes transformations can be made to obtain new dependent variables that are independent. In Appendix 6A it is shown how certain autoregressive (AR) models for the observation errors can be treated by constructing independent combinations of the observations. It is assumed that the statistical parameters σ² and ρ are known. See also Section 6.9.2.2.

2. If (1) σ² is unknown for ψ = σ²Ω where Ω is known, and (2) there is no prior information, the sequential procedure with σ² replaced by 1 can be used to estimate the parameters. After b_n is found, σ² can be estimated using s²; the estimated covariance matrix would then be P_n times s².

3. If the problem can be formulated so that there is only one independent observation at each i, only a scalar needs to be inverted regardless of how many parameters are present. There are no simultaneous equations to solve.

4. The method readily extends to more than one unknown parameter. The summations in (6.7.8) can easily be programmed for an arbitrary value of p, the number of parameters in the b vector.

5. An examination of the parameters as a function of the index i can yield information that is not readily available otherwise. First, the models are usually chosen to contain parameters that are constant with time. If inspection of the parameters indicates that there is a time dependence (as in Table 6.12), then the adequacy of the model is questioned. Second, one can obtain an immediate "feel" of the effect of an additional observation, which does not depend upon any statistical knowledge of the probability densities. The change in the parameters becomes less as more observations are used.

6. Good practice usually entails an inspection of the residuals. For the linear parameter case the true residuals are not obtained directly in a sequential manner. The residuals e based on the final parameter values (b_n) are e = Y − Ŷ = Y − Xb_n. These values are not the same as those calculated based on b_i.

7. The sequential method provides at each i a filtered estimate of Y_i, Ŷ_i = X_i b_i; these are based on the data until time i and can be used in an on-line analysis. The sequential method given in this section can be related to the discrete Kalman filter [16, p. 159]. The sequential MAP estimator can also be interpreted as providing a ridge regression estimator, which can be helpful when the data are ill-conditioned, that is, |X^T X| is nearly zero.
6.8 MATRIX FORMULATION FOR CONFIDENCE INTERVALS AND REGIONS

Much more information can be conveyed regarding parameters by specifying confidence intervals or regions in addition to parameter estimates. Whenever possible and appropriate, it is recommended that confidence regions be presented in addition to estimates.

In order to present meaningful confidence regions it is necessary that the underlying assumptions be valid. Two assumptions frequently violated in scientific work are that the errors have zero mean and that the errors are uncorrelated. Erroneously taking these assumptions to be true has led many to present overly small confidence intervals. Physical parameters have been presented by different experimenters with each successive estimate being outside the preceding confidence interval. This has happened so often that one should be very careful to check the underlying assumptions before presenting results. Further discussion of assumptions is given in Sections 5.10-5.14 and Section 6.9. Presentation of confidence regions is recommended, but must be carefully and honestly given.
For additive, zero mean, normal measurement errors the joint probability density for the parameter vector b can be written as

    f(b) = (2π)^{-p/2} |V_b|^{-1/2} exp[−½(b−β)^T V_b^{-1} (b−β)]   (6.8.1)

where V_b is the covariance matrix of b. It is given by (6.2.11) for ordinary least squares, by (6.5.5) for maximum likelihood, and by (6.6.11) for MAP estimation. (The assumptions are different for each case.) For convenience in representing cov(b) [or cov(b|Y) for MAP cases], let us use P for each case,

    P ≡ cov(b)   (6.8.2)
To obtain confidence regions, the covariance matrix of ε, that is, ψ, should be known at least within a multiplicative constant. Section 6.8.1 gives confidence intervals. Section 6.8.2 provides a derivation of a confidence region provided ψ is completely known. For the more general case of ψ = σ²Ω where σ² is unknown and Ω known, a confidence region analysis is given in Section 6.8.3.
6.8.1 Confidence Intervals

A confidence interval can be found for each parameter β_k through the use of the kth diagonal term in P and the t distribution (if it applies). See Theorem 6.2.3. Suppose that the measurement errors are additive, zero mean, and normal. Also let there be no errors in the independent variables and no prior information regarding the parameters. Also let ψ = σ²Ω where σ² is unknown and Ω known. These assumptions are designated 111011. Suppose that OLS or ML has been used, σ² was replaced by any constant c², and the matrix P̃ was calculated; for example, for ML it is

    P̃ = c²(X^T Ω^{-1} X)^{-1}

The tilde (˜) is used because σ² is unknown. With the above assumptions, for OLS or ML we can give the estimated standard error of b_k as

    est. s.e.(b_k) = (P̃_kk)^{1/2}(s/c)   (6.8.3)

where s² is the estimated value of σ². Then the 100(1−α)% confidence interval is given by

    b_k − est. s.e.(b_k) t_{1−α/2}(n−p) < β_k < b_k + est. s.e.(b_k) t_{1−α/2}(n−p)   (6.8.4)

where t_{1−α/2}(n−p) is the t statistic for n−p degrees of freedom. For 95% confidence and n−p = 10, we find in Table 2.15 that t_{.975}(10) = 2.23. For a nonlinear example, see Example 7.7.2.

It should be noted that there is considerable danger in constructing confidence regions from confidence intervals found as above, since such a procedure can yield highly inaccurate confidence regions.

6.8.2 Confidence Regions for Known ψ

In this section ψ, and thus P, are assumed to be known. Consider the matrix product in the exponent of (6.8.1) and set it equal to r², since it is a nonnegative scalar,

    r² = (b − β)^T P^{-1} (b − β)   (6.8.5)

Let us think of the hyperellipsoid as being centered at the origin with coordinates b_1−β_1, ..., b_p−β_p. Let r̄² be some specific value. For r² < r̄², (6.8.5) represents the interior of a hyperellipsoid. At r² = r̄², (6.8.5) produces hypersurfaces of constant probability density. A method for determining values for r̄ is derived below. Although (6.8.5) with r = r̄ describes the ellipsoid, for many purposes a more convenient description can be given in terms of the directions and lengths of the axes of the ellipsoid. To transform (6.8.5) in such a way as to provide such a description, we first find the eigenvalues of P or P^{-1}, since the eigenvalues of P are simply the reciprocals of those of P^{-1}.

For convenience in the following derivation, let P^{-1} be designated C, whose eigenvalues are found by solving the determinantal equation

    |C − λI| = 0   (6.8.6)

The eigenvalues of C are designated λ_1, λ_2, ..., λ_p. For convenience in numbering the λ_i's, let λ_i ≤ λ_j for i < j. Let C be given by

    C = eΛe^T   (6.8.7)

where

    Λ = diag[λ_1  λ_2  ···  λ_p]   (6.8.8)

and where e is a p×p matrix whose columns are the eigenvectors of C,

    e = [e_1  e_2  ···  e_p] = [ e_11  e_12  ···  e_1p ]
                               [ e_21  e_22  ···  e_2p ]
                               [  ⋮     ⋮           ⋮  ]
                               [ e_p1  e_p2  ···  e_pp ]   (6.8.9)

The vector components of e are orthogonal and of unit length (i.e., orthonormal), or

    e_i^T e_j = 0 for i ≠ j,   e_i^T e_i = 1   (6.8.10a,b)
6.8 MATRIX FORMULATION FOR CONFIDENCE INTERVALS

The orthonormality condition (6.8.10a) implies that

    e_{1i}² + e_{2i}² + ··· + e_{pi}² = 1  and  e_i^T e_j = 0 for i ≠ j    (6.8.10c)

The eigenvalue problem for C can be written

    Ce = e diag[λ_1 λ_2 ··· λ_p]    (6.8.11)

Introducing the vector components of e into (6.8.11) yields

    Ce_i = λ_i e_i    (6.8.12)

which can be considered to comprise a set of p linear homogeneous equations with the unknowns e_{1i}, e_{2i}, ..., e_{pi}. For example, for p = 3 we have

    (C_{11} − λ_i)e_{1i} + C_{12}e_{2i} + C_{13}e_{3i} = 0    (6.8.13a)
    C_{21}e_{1i} + (C_{22} − λ_i)e_{2i} + C_{23}e_{3i} = 0    (6.8.13b)
    C_{31}e_{1i} + C_{32}e_{2i} + (C_{33} − λ_i)e_{3i} = 0    (6.8.13c)

which constitutes three equations with three unknowns, but since these equations are homogeneous, there are at most two independent equations. A third equation is found from (6.8.10b),

    e_{1i}² + e_{2i}² + e_{3i}² = 1    (6.8.14)

Then e_{1i}, e_{2i}, and e_{3i} would usually be found from a solution of (6.8.13a,b) and (6.8.14).

A new coordinate vector can be defined by

    z = e^T (b − β)    (6.8.15)

Then introducing (6.8.7) and (6.8.15) in (6.8.5) produces

    r² = Σ_{i=1}^{p} λ_i z_i²    (6.8.16)

Using the further transformation

    Z_i = λ_i^{1/2} z_i    (6.8.17)

we can write

    r² = Z_1² + Z_2² + ··· + Z_p²    (6.8.18)

The probability of a point Z (or b − β) lying inside the hypersphere r² ≤ t², where t is some fixed value, is found by using the above transformation in (6.8.1) to obtain

    P(r² ≤ t²) = ∫∫···∫ (2π)^{−p/2} exp(−r²/2) dZ_1 dZ_2 ··· dZ_p    (6.8.19)

The integration is performed over the interior of the hypersphere described by (6.8.18). Note that

    db_1 db_2 ··· db_p = |P|^{1/2} dZ_1 dZ_2 ··· dZ_p    (6.8.20)

is used in deriving (6.8.19). A volume element inside the hypersphere can be described by

    dV = [2π^{p/2}/Γ(p/2)] r^{p−1} dr    (6.8.21)

where Γ(·) is the gamma function. Then using (6.8.21) in (6.8.19) results in

    P(r² ≤ t²) = [2^{p/2−1} Γ(p/2)]^{−1} ∫_0^t r^{p−1} exp(−r²/2) dr    (6.8.22)

(If the transformation r² = x is used, (6.8.22) is transformed to a form which is the integral of the chi-squared probability density function with p degrees of freedom.) For p = 1, (6.8.22) gives

    P(r² ≤ t²) = (2/π)^{1/2} ∫_0^t exp(−r²/2) dr    (6.8.23a)

and for p = 2

    P(r² ≤ t²) = ∫_0^t exp(−r²/2) r dr = 1 − exp(−t²/2)    (6.8.23b)

These probabilities for t = 1, 2, and 3 are given in Table 6.15 for several values of the number of parameters p. These three t values are sometimes called the one-, two-, or three-sigma probabilities. Also given in Table 6.15 are the t values associated with the 90 and 95% confidence regions.
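The probability (6.8.22) can be evaluated without tables. The sketch below uses the closed forms (6.8.23a,b) for p = 1 and 2, and a recurrence obtained by integrating (6.8.22) by parts for larger p; the function name is mine, not from the text.

```python
import math

def region_prob(t, p):
    """P(r^2 <= t^2) for p parameters, i.e., the chi-squared CDF of t^2
    with p degrees of freedom, evaluated via (6.8.22)."""
    if p == 1:
        return math.erf(t / math.sqrt(2.0))            # (6.8.23a)
    if p == 2:
        return 1.0 - math.exp(-t * t / 2.0)            # (6.8.23b)
    # integration by parts of (6.8.22):
    # P_p = P_{p-2} - t^{p-2} e^{-t^2/2} / (2^{p/2-1} Gamma(p/2))
    return region_prob(t, p - 2) - t**(p - 2) * math.exp(-t * t / 2.0) / (
        2.0**(p / 2.0 - 1.0) * math.gamma(p / 2.0))
```

Evaluating this for t = 1, 2, 3 reproduces the entries of Table 6.15; for example, region_prob(1, 4) gives the .0902 listed for p = 4.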
Table 6.15  Values of Confidence Region Probabilities for Various Numbers of Parameters

    No. of           Probability for              t Value for
    Parameters p   t = 1    t = 2    t = 3    90% Confidence   95% Confidence
    1              .683     .955     .997     1.645            1.960
    2              .3935    .8647    .9889    2.146            2.447
    3              .1987    .739     .971     2.500            2.795
    4              .0902    .594     .939     2.789            3.080
    6              .0144    .323     .8264    3.263            3.548
    8              .00175   .143     .658     3.655            3.938
    10             .00017   .0527    .468     3.998            4.279

Other values of t_{1−α}(p) can be obtained using the F or χ² tables since

    t_{1−α}(p) = [p F_{1−α}(p, ∞)]^{1/2} = [χ²_{1−α}(p)]^{1/2}    (6.8.24)

In summary, the confidence region for known P (and thus ψ) is the interior of the hyperellipsoid,

    (b − β)^T P^{−1} (b − β) ≤ t²_{1−α}(p)    (6.8.25)

where t_{1−α}(p) is the t value associated with the 100(1−α)% confidence region for p parameters in the model. This region is more conveniently described by locating the principal axes of the hyperellipsoid. The extremes of the axes in terms of the new coordinates z_1, ..., z_p are given by (6.8.15). The maximum coordinate values along the new axes are given by

    z_1 = ± t_{1−α}(p) λ_1^{−1/2},  z_2 = 0, z_3 = 0, ..., z_p = 0  (major axis)
    z_2 = ± t_{1−α}(p) λ_2^{−1/2},  z_1 = 0, z_3 = 0, ..., z_p = 0
    ···    (6.8.26)

where (6.8.16) is used with r² = t²_{1−α}(p). The z_i values depend on e_i, which is found as suggested by (6.8.13a,b) and (6.8.14). We wish to relate these values to points in the b − β coordinates. Equation 6.8.26 can be written in the matrix form

    ± t_{1−α}(p) diag[λ_1^{−1/2} λ_2^{−1/2} ··· λ_p^{−1/2}] = e^T B    (6.8.27)

where B = [B_1 B_2 ··· B_p] and B_i contains the coordinates of b − β for the ith axis. Solving for B using (6.8.10b) gives

    B = ± t_{1−α}(p) e diag[λ_1^{−1/2} ··· λ_p^{−1/2}]    (6.8.28)

Example 6.8.1

Consider the case of two parameters. Give an algebraic formulation of the confidence region for β using C = P^{−1}.

Solution

We can expand (6.8.25) to

    (b_1 − β_1)² C_{11} + 2(b_1 − β_1)(b_2 − β_2) C_{12} + (b_2 − β_2)² C_{22} = t²    (a)

However, we need new coordinates z_1 and z_2 so that (a) can be written

    t²_{1−α} = λ_1 z_1² + λ_2 z_2²    (b)

The eigenvalues λ_1 and λ_2 are found using (6.8.6),

    (C_{11} − λ)(C_{22} − λ) − C_{12}² = λ² − λ(C_{11} + C_{22}) + (C_{11}C_{22} − C_{12}²) = 0    (c)

which can be solved for λ_1 (being the smaller) and λ_2,

    λ_1 = {C_{11} + C_{22} − [(C_{11} + C_{22})² − 4C_{11}C_{22} + 4C_{12}²]^{1/2}}/2    (d)
    λ_2 = {C_{11} + C_{22} + [(C_{11} + C_{22})² − 4C_{11}C_{22} + 4C_{12}²]^{1/2}}/2    (e)

The coordinates z_1 and z_2 are given by (6.8.15),

    z_1 = e_{11}(b_1 − β_1) + e_{21}(b_2 − β_2),  z_2 = e_{12}(b_1 − β_1) + e_{22}(b_2 − β_2)    (f)

The e components are found using (6.8.10) and (6.8.12), or for e_{11} and e_{21},

    (C_{11} − λ_1)e_{11} + C_{12}e_{21} = 0    (g)
    e_{11}² + e_{21}² = 1    (h)
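The two-parameter algebra of (c) through (h) can be carried out numerically. This is an illustrative sketch (the function names and interfaces are mine):

```python
import math

def ellipse_eigen(C11, C12, C22):
    """Eigenvalues from (d),(e) and the unit eigenvector of the smaller
    eigenvalue from (g),(h) of Example 6.8.1."""
    s = C11 + C22
    disc = math.sqrt(s * s - 4.0 * C11 * C22 + 4.0 * C12 * C12)
    lam1, lam2 = (s - disc) / 2.0, (s + disc) / 2.0   # lam1 <= lam2
    # (C11 - lam1) e11 + C12 e21 = 0, normalized to unit length
    e11, e21 = -C12, C11 - lam1
    norm = math.hypot(e11, e21)
    return lam1, lam2, e11 / norm, e21 / norm

def axis_half_lengths(lam1, lam2, t):
    """Semi-axis lengths t * lambda^{-1/2} of the ellipse in (b)."""
    return t / math.sqrt(lam1), t / math.sqrt(lam2)
```

Applied to the C matrix of the following Example 6.8.2, this reproduces the eigenvalues 0.990001 and 101.009999 quoted there.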
Similarly for e_{12} and e_{22}, we have

    C_{12}e_{12} + (C_{22} − λ_2)e_{22} = 0    (i)
    e_{12}² + e_{22}² = 1    (j)

For two-parameter cases it can be shown that e_{12} = −e_{21} and that e_{11} = e_{22}, since e_{11}e_{22} − e_{12}e_{21} = ±1. Such symmetry cannot generally be arranged for p > 2.

The end points of the axes are given by (6.8.28),

    (b_1 − β_1)_maj = ± t_{1−α}(2) e_{11} λ_1^{−1/2},  (b_2 − β_2)_maj = ± t_{1−α}(2) e_{21} λ_1^{−1/2}    (k)
    (b_1 − β_1)_min = ± t_{1−α}(2) e_{12} λ_2^{−1/2},  (b_2 − β_2)_min = ± t_{1−α}(2) e_{22} λ_2^{−1/2}    (l)

Example 6.8.2

Find the 95% confidence region for n = 2 and n = 9 of Example 5.2.4.

Solution

Consider the n = 2 case first. The C = P^{−1} matrix for OLS estimation with the standard assumptions is σ^{−2}X^T X. Since σ² = 1, we have

    C = [ 2   10 ;
          10  100 ]

From (d) and (e) of Example 6.8.1 we find λ_1 = 0.990001 and λ_2 = 101.009999. The components of e are found from (g), (h), (i), and (j) to be

    e = [ −0.9949382  +0.1004886 ;
          +0.1004886  +0.9949382 ]

From (k) and (l) the end points of the ellipse are given by

    (b_1 − β_1)_maj = ±(2.447)(−0.9949382)(0.990001)^{−1/2} = ∓2.447
    (b_2 − β_2)_maj = ±2.447(0.1004886)(0.990001)^{−1/2} = ±0.2471
    (b_1 − β_1)_min = ±2.447(0.1004886)(101.009999)^{−1/2} = ±0.02447
    (b_2 − β_2)_min = ±2.447(0.9949382)(101.009999)^{−1/2} = ±0.2422

These are the end points of the ellipse

    t²_{.95}(2) = 2.447² = λ_1 z_1² + λ_2 z_2² = 0.990001 z_1² + 101.009999 z_2²

which is shown as the larger ellipse in Fig. 6.5.

For n = 9, the C matrix is

    C = [ 9    360   ;
          360  20400 ]

for which λ_1 = 2.646235 and λ_2 = 20406.35377. The e matrix is

    e = [ −0.99984429  0.0176466 ;
          0.0176466    0.99984429 ]

and the end points of the confidence region follow from (k) and (l) with these values. This curve is also shown in Fig. 6.5; it is very narrow, indicating less uncertainty in b_2 than in b_1.

The estimated values of b_1 and b_2 using the nine observations in Table 6.12 are 1.286 and 0.10199, respectively, and thus b_1 − β_1 = 0.286 and b_2 − β_2 = 0.0199. (The true values of β_1 and β_2 are 1 and 0.1.) This point is outside the 95% confidence region for n = 9 because the r² value given by (6.8.5),

    (b − β)^T P^{−1} (b − β) = (b_1 − β_1)² C_{11} + 2(b_1 − β_1)(b_2 − β_2) C_{12} + (b_2 − β_2)² C_{22}

is equal to 12.9, which is greater than t²_{.95}(2) = 5.99. One can observe directly from the plot in Fig. 6.5 that the estimates for the n = 2 case are inside the r² confidence region. (See the point X at b_1 − β_1 = −0.75 in the figure.)
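The r² test applied at the end of Example 6.8.2 is a one-line computation; here is a sketch using the n = 9 numbers from the example (the helper name is mine):

```python
def r_squared(db1, db2, C11, C12, C22):
    """(6.8.5) for p = 2: r^2 = db1^2*C11 + 2*db1*db2*C12 + db2^2*C22,
    where db_k = b_k - beta_k."""
    return db1 * db1 * C11 + 2.0 * db1 * db2 * C12 + db2 * db2 * C22

# n = 9 case of Example 6.8.2: b - beta = (0.286, 0.0199), C = [[9, 360], [360, 20400]]
r2 = r_squared(0.286, 0.0199, 9.0, 360.0, 20400.0)
inside_95 = r2 <= 2.447**2    # t_{.95}(2)^2 = 5.99
```

Here r2 evaluates to about 12.9, exceeding 5.99, so the point lies outside the 95% region, as stated in the example.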
[Figure 6.5: 95% confidence regions for the n = 2 and n = 9 cases of Example 6.8.2; the point X at b_1 − β_1 = −0.75 marks the n = 2 estimates.]

6.8.3 Confidence Regions for ψ = σ²Ω with Ω Known and σ² Unknown

In the previous subsection we considered the case of known ψ. In this section ψ is equal to σ²Ω, where σ² is unknown and Ω is known. This case is analyzed by developing an F statistic, which is the ratio of two χ² statistics, each divided by its respective degrees of freedom. The first χ² statistic is (b − β)^T P^{−1}(b − β), which has p degrees of freedom. The assumptions include additive, zero mean, normal measurement errors. In this section we assume that there is no prior information regarding the parameters. Maximum likelihood estimation is assumed.

The second statistic is R_ML,

    R_ML = (Y − ŷ_ML)^T ψ^{−1} (Y − ŷ_ML)

which we wish to show is a χ² statistic with n − p degrees of freedom. As usual, ψ = cov(ε). Notice that ordinary least squares analysis is not permitted unless Ω = I, for which case R_LS = R_ML. In general, Ω is not diagonal. It could result from autoregressive observation errors, for example.

We shall transform R_ML to a sum of squares using the identity

    Ω = DΦD^T    (6.8.30)

where D and Φ are n × n matrices and Φ is diagonal. Note that*

    Ω^{−1} = (D^{−1})^T Φ^{−1/2} Φ^{−1/2} D^{−1}    (6.8.31)

and thus R_ML can be written

    R_ML = (Y − ŷ_ML)^T Ω^{−1} (Y − ŷ_ML) σ^{−2} = F^T F σ^{−2}    (6.8.32)

where

    F = Φ^{−1/2} D^{−1} (Y − ŷ_ML)    (6.8.33)

Notice that (6.8.32) is a sum of squares. For the linear-in-the-parameters model η = Xβ, (6.8.32) can be written

    R_ML = F^T F σ^{−2},  F = Φ^{−1/2} D^{−1} (Y − X b_ML)    (6.8.34)

which has a χ² distribution with n − p degrees of freedom. See Theorem 6.2.1. Observe that the covariance matrix of F is

    cov(F) = E[Φ^{−1/2} D^{−1} εε^T (D^{−1})^T Φ^{−1/2}] = σ² I

* For this case the square root of the diagonal matrix Φ is a diagonal matrix having elements that are the positive square roots of the corresponding elements in Φ.

For convenience, define R* and (P*)^{−1} to be

    R* = σ² R_ML = (Y − ŷ_ML)^T Ω^{−1} (Y − ŷ_ML)    (6.8.35a)
    (P*)^{−1} = X^T Ω^{−1} X    (6.8.35b)

Since F in (6.8.35a) is a random vector satisfying the first five standard assumptions, Ω is known, and there is no prior information, (6.8.35a) is analogous to (6.2.19), which can be used to get

    E(R*) = (n − p)σ²    (6.8.36)

and thus an estimate of σ² is

    s² = R*/(n − p)    (6.8.37)

In order to be consistent, let (P*)^{−1} be found for ML estimation. Now the F statistic is the ratio of two independent random variables, each with a χ²-divided-by-degrees-of-freedom distribution (see Section 2.8.10). Hence a joint confidence region for the parameter estimates can be found from

    [(b − β)^T (P*)^{−1} (b − β)/p] / [R*/((n − p)σ²) · σ²] ≤ F_{1−α}(p, n − p)    (6.8.38)

or

    (b − β)^T (P*)^{−1} (b − β) ≤ p s² F_{1−α}(p, n − p)    (6.8.39)

In deriving (6.8.39) we have assumed that ψ is equal to σ²Ω where Ω is known. When ψ = σ²I, (6.8.39) reduces to the more common expression,

    (b − β)^T X^T X (b − β) ≤ p s² F_{1−α}(p, n − p)    (6.8.40)

since then b_ML = b_LS.

When dynamic experiments are performed and automatic digital data acquisition equipment is used, n is usually quite large, possibly several hundred or even thousands. In such cases n − p is large and

    p F_{1−α}(p, n − p) ≈ t²_{1−α}(p)    (6.8.41)

For example, for α = 0.05, corresponding to the 95% confidence region, and for two parameters (p = 2), the values of 2F_{1−α}(p, n − p) are 6.18, 6.08, 6.04, and 5.99 for n = 100, 200, 400, and ∞, respectively.

6.9 MATRIX ANALYSIS WITH CORRELATED OBSERVATION ERRORS

6.9.1 Introduction
When automatic digital data acquisition equipment is used for dynamic experiments, very large numbers of measurements can be obtained. These observations may not be independent, however. Correlated measurements are frequently obtained when the same specimen is tested over some range. An example is the measurement of the electrical resistance of a piece of wire as a function of temperature, such as at 20, 21, and 22°C. The measurements at 20 and 21°C may be correlated for a given specimen, but a 20°C value for one specimen probably would not be correlated with a 21°C value for another specimen.

Another example of correlated errors is provided by the case of a cooling billet given by Example 6.2.3. Some of the temperatures recorded are depicted in Fig. 6.1 along with the associated residuals. These data are shown for 96-sec time steps, but observations were actually made at the smaller steps of 12 sec. If all the data between 0 and 1536 sec are used, the regression curve for the 129 observations is very close to that obtained using 17 observations. For the fourth-degree model given in Example 6.2.3, residuals are plotted in Fig. 6.6 for the first 25 data points for Δt = 12 sec. On the upper side of the horizontal axis is a scale that corresponds to Fig. 6.1. The solid circles show the residuals. There are 11 consecutive negative residuals, followed by 14 positive residuals.
[Figure 6.6: Residuals (solid circles) and first differences of the residuals (crosses) for the first 25 data points of the cooling billet example, Δt = 12 sec.]
Residuals are not precisely independent even if the observation errors are, but for a large number of independent observations the residuals are nearly independent. This is far from true for the residuals shown in Fig. 6.6; rather, the residuals are highly correlated. A least squares analysis of measurements containing highly correlated errors may still produce satisfactory parameter estimates. There are, however, at least two different dangers in the OLS analysis of highly correlated errors (i.e., ψ having relatively large off-diagonal terms). One of these is that one might present too-small confidence regions based on the erroneous assumption of independent errors. The other danger relates to this point and also to experimental design: when observations are independent and unbiased, doubling the number of them significantly improves the accuracy of the parameter estimates. This may not be true if the additional measurements result in higher correlation between the measurements, as in the billet example illustrated by Fig. 6.1 when the time step is halved and the number of measurements is doubled. In such cases one might erroneously design the experiment to have too small a time step.

A simple check to see if the residuals are approximately independent is based on the number of runs. (The number of runs is the number of changes in the signs of the residuals plus one. For example, for + − − + + + −, there are four runs.) For n independent, zero mean random variables the expected number of runs is (n + 1)/2. For significance tests based on runs, see references 4 and 22. For cumulative errors (see below) the expected number of runs is about n^{1/2}. The residuals of Fig. 6.1 exhibit six runs compared with (n + 1)/2 = 9. There seems to be no reason to question the independence of those residuals. If there are still about six runs when the number of observations is doubled, we should question the independence. Certainly the residuals shown by the solid circles in Fig. 6.6 cannot be considered independent, with only two runs while (n + 1)/2 = 13. Also shown in Fig. 6.6 as crosses are first differences of the residuals. There are 15 runs out of 24 points, indicating that the first differences are much closer to being independent than the residuals themselves.
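The run-counting rule just described is simple enough to state as code. This is an illustrative sketch; the function names and interfaces are mine, not from the text:

```python
def count_runs(residuals):
    """Number of runs: the number of sign changes in the sequence plus one.
    For the text's example + - - + + + - this gives 4."""
    signs = [r > 0 for r in residuals if r != 0]   # ignore exact zeros
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def expected_runs_independent(n):
    """Expected number of runs for n independent, zero-mean variables: (n + 1)/2."""
    return (n + 1) / 2
```

For the 25 residuals of Fig. 6.6 this count would return 2, far below (25 + 1)/2 = 13, flagging the correlation.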
6.9.2 Autoregressive Errors (AR)

In this section a first-order, single response, autoregressive model of errors is considered. (A more general analysis is given in Appendix 6A.) Let ε_i, the ith measurement error, be described by

    ε_i = ρ ε_{i−1} + u_i    (6.9.1a)
    ε_0 = 0,  E(u_i) = 0    (6.9.1b)
    E(u_i u_j) = 0  for i ≠ j    (6.9.1c)
for i = 1, 2, ..., n. In words, (6.9.1a) states that the present error is a fraction ρ of the previous error plus a zero mean component u_i that is independently distributed. This is called a first-order autoregressive process.

It is convenient to relate the n × 1 vectors ε and u by

    ε = Du    (6.9.2)

where D can be found from (6.9.1a) to be the lower triangular matrix,

    D = [ 1                              ;
          ρ        1                     ;
          ρ²       ρ        1            ;
          ···                            ;
          ρ^{n−1}  ρ^{n−2}  ···  ρ  1    ]    (6.9.3)

Also from (6.9.1a) we find [see (6A.5a)] the inverse of D to be

    D^{−1} = [ 1                    ;
               −ρ  1                ;
               0   −ρ  1            ;
               ···                  ;
               0   ···  0  −ρ  1    ]    (6.9.4)

which contains a main diagonal of ones and a diagonal just below of −ρ.

The covariance matrix of the errors, ψ, is given by

    ψ = cov(ε) = E(εε^T) = E(Duu^T D^T) = DφD^T    (6.9.5a)

where (6.1.40) is used and where φ is defined to be the diagonal matrix

    φ = diag[σ_1² σ_2² ··· σ_n²]    (6.9.5b)

Several classes of ψ matrices can be generated. For ρ = 0 and σ_i² = σ_u², a constant, the "standard" covariance matrix ψ_ST is found,

    ψ_ST = σ_u² I    (6.9.6)

Next, if ρ = 0, we have ψ_W, which is

    ψ_W = φ = diag[σ_1² σ_2² ··· σ_n²]    (6.9.7)

Also, several special autoregressive (AR) cases have error variances of the form

    φ_a = σ_u² diag[c 1 1 ··· 1]    (6.9.8)

for which the ψ matrix, designated ψ_a, becomes

    ψ_a = Dφ_a D^T = σ_u² [ c   cρ        cρ²              ···  cρ^{n−1}              ;
                            ·   1+cρ²     ρ(1+cρ²)         ···  ρ^{n−2}(1+cρ²)        ;
                            ·   ·         1+ρ²+cρ⁴         ···  ρ^{n−3}(1+ρ²+cρ⁴)     ;
                            (symmetric)                              ···              ]    (6.9.9)

See (6A.15) of Appendix 6A for a derivation. Three special cases associated with (6.9.9) are as follows:

1. ψ_a1 has c = (1−ρ²)^{−1}; (6A.22a).
2. ψ_a2 has c = 1 and ρ = 1; (6A.22b).
3. ψ_a3 has c = 1; (6A.22c).

The ψ_a1 matrix is the most common of the three special cases. It might be called a "steady-state" case because the diagonal terms are all equal,

    σ_i² = E(ε_i²) = σ_u²/(1 − ρ²)

Notice that as ρ → 1, the σ_i² values become much larger than σ_u². Physically, ψ_a1 is appropriate for some process that has been going on a "long" time before the taking of measurements. The other extreme physical situation is for measurements starting when the process starts; this is better described by case 3, ψ_a3. In case 3, the variance of ε_i is a minimum at i = 1 (provided ρ > 0) and gradually increases to the steady-state value of σ_u²/(1−ρ²) just given above.

Case 2 has a simple ψ matrix as given by (6A.22b). Notice, however, that the ith diagonal term is equal to iσ_u². Sometimes this case is considered to be unstable because the variance continually increases. We can use it, however, with ρ = 1 (although case 1 cannot have ρ = 1).
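The matrices (6.9.3) through (6.9.9) are easy to form numerically. The sketch below builds ψ_a = Dφ_aD^T for a chosen c and lets the special cases be checked directly; the function names are mine:

```python
import numpy as np

def ar1_psi(n, rho, sigma_u2=1.0, c=None):
    """psi = D phi D^T of (6.9.5a) with phi = sigma_u^2 diag[c, 1, ..., 1];
    c = (1 - rho^2)^{-1} gives the steady-state case psi_a1."""
    if c is None:
        c = 1.0 / (1.0 - rho**2)                      # case 1
    D = np.array([[rho**(i - j) if i >= j else 0.0 for j in range(n)]
                  for i in range(n)])                 # (6.9.3)
    phi = sigma_u2 * np.diag([c] + [1.0] * (n - 1))   # (6.9.5b)
    return D @ phi @ D.T
```

For ψ_a1 every diagonal entry equals σ_u²/(1−ρ²) and the (i, j) entry is that value times ρ^{|i−j|}; for c = 1 and ρ = 1 (case 2, cumulative errors) the ith diagonal entry is iσ_u².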
For case 2, ε_i is found from (6.9.1) to be the sum

    ε_i = u_1 + u_2 + ··· + u_i

For this reason case 2 is called the cumulative error case. The difference of the successive values ε_{i−1} and ε_i is ε_i − ε_{i−1} = u_i, which is independent from u_j, j ≠ i. Figure 6.6 shows residuals and differences of residuals for the billet example. Since the residuals are apparently correlated and the differences are not, the cumulative error model for ε_i is better than that of independent ε_i. (Estimation of ρ is discussed in Section 6.9.5.)

6.9.2.1 OLS Estimation with AR Errors

Ordinary least squares can always be used irrespective of any of the standard assumptions. In specifying the covariance matrix of the parameters or the confidence regions, some conditions must be known (or assumed). Suppose that the conditions of additive, zero mean, first-order autoregressive errors are valid. Also assume that ψ is known to within a multiplicative constant. These conditions are designated 1120/1.

The covariance matrix of b_LS is given by (6.2.11),

    cov(b_LS) = (X^T X)^{−1} X^T ψ X (X^T X)^{−1}    (6.2.11)

The terms in X^T X are given in a detailed form by (6.2.6). The X^T ψ X portion of (6.2.11) is a little more difficult to evaluate. Using (6.9.5a) we can write

    X^T ψ X = X^T DφD^T X = (D^T X)^T φ D^T X    (6.9.10)

Consider now the D^T X matrix product, designated Z̃,

    Z̃ = D^T X = [Z̃_ij]    (6.9.11a)

where the components Z̃_ij can be found using the expressions

    Z̃_nj = X_nj,  Z̃_ij = X_ij + ρZ̃_{i+1,j}  for i = n−1, n−2, ..., 1;  j = 1, 2, ..., p    (6.9.11b)

Notice that the Z̃_ij values are found starting at the "bottom" of each column. Using this notation the X^T ψ X product has typical terms as indicated by the lk term below,

    [X^T ψ X]_{lk} = Σ_{i=1}^{n} φ_ii Z̃_il Z̃_ik    (6.9.12)

Unfortunately, simple algebraic expressions for the covariance of b_LS do not result from (6.2.4) for AR errors; the computer evaluation of the terms is straightforward, however.

Example 6.9.1

Derive an expression for V(b_LS) for the simple model η = β.

Solution

For this case, X^T = [1 1 ··· 1], and Z̃ is given by

    Z̃^T = [ (1−ρ^n)/(1−ρ)  (1−ρ^{n−1})/(1−ρ)  ···  1+ρ+ρ²  1+ρ  1 ]

with X^T X = n. The result for V_a1(b_LS), the variance of b_LS for ψ = ψ_a1, can be written in the form

    V_a1(b_LS) = (σ²/n) {1 + [2ρ/(1−ρ)] [1 − (1−ρ^n)/(n(1−ρ))]}    (6.9.13)

where σ² = σ_u²/(1−ρ²). For any fixed value of ρ between 0 and 1 and increasing values of n, V_a1(b_LS) always decreases.
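For the model η = β, b_LS is the sample mean, so (6.9.13) can be checked against a direct evaluation of 1^T ψ_a1 1/n². The sketch below does this; the function names are mine:

```python
import numpy as np

def v_a1_ls(n, rho, sigma2=1.0):
    """V_a1(b_LS) from (6.9.13); sigma2 = sigma_u^2 / (1 - rho^2)."""
    return (sigma2 / n) * (1.0 + (2.0 * rho / (1.0 - rho))
                           * (1.0 - (1.0 - rho**n) / (n * (1.0 - rho))))

def v_a1_ls_direct(n, rho, sigma2=1.0):
    """Direct evaluation: V = 1^T psi 1 / n^2 with psi_ij = sigma2 * rho^{|i-j|}."""
    i = np.arange(n)
    psi = sigma2 * rho**np.abs(i[:, None] - i[None, :])
    return psi.sum() / n**2
```

The two agree to machine precision, which is a useful sanity check on (6.9.13).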
Example 6.9.2

Suppose ρ and the number of observations n are related by

    ρ = e^{−a/n}    (6.9.14)

which gives greater correlation between observations for a fixed experimental range as the observations become more "dense." In (6.9.14), a is some constant characteristic of the data. Using the result of Example 6.9.1, investigate V_a1(b_LS) for n → ∞. Assume that σ² = σ_u²(1−ρ²)^{−1} is held constant.

Solution

In (6.9.13), σ_u²/(1−ρ²) is replaced by σ² and ρ by e^{−a/n}. The result is an indeterminate form. After using l'Hôpital's rule we obtain

    V_a1(b_LS) → 2σ²(a − 1 + e^{−a})/a²    (6.9.15)

See Fig. 6.7 for a plot of (6.9.15). If the measurements become more "correlated" as n becomes larger in the manner described by (6.9.14), V_a1(b_LS) approaches a constant value for large n rather than going to zero as one obtains from (6.9.13) for ρ = constant and n → ∞.

[Figure 6.7: Variance of b for n → ∞ found for first-order autoregressive errors using least squares and maximum likelihood, and the ratio of the two, as functions of a. See Problem 5.26b.]

6.9.2.2 ML Estimation with AR Errors

In maximum likelihood or Gauss-Markov estimation, the estimator for the linear model is given by (6.5.2),

    b_ML = (X^T ψ^{−1} X)^{−1} X^T ψ^{−1} Y    (6.5.2)

For ML, this equation follows from the assumptions 111111.

It simplifies calculations in (6.5.2) if the relation ψ = DΦD^T is used,

    b_ML = [X^T (D^{−1})^T Φ^{−1} D^{−1} X]^{−1} X^T (D^{−1})^T Φ^{−1} D^{−1} Y    (6.9.16a)

Define the modified sensitivity matrix and modified observation vector

    Z = D^{−1} X,  F = D^{−1} Y    (6.9.16b,c)

Typical terms of Z and F are given by

    Z_1j = X_1j,  Z_ij = X_ij − ρX_{i−1,j}  for i = 2, 3, ..., n    (6.9.17a)
    F_1 = Y_1,  F_i = Y_i − ρY_{i−1}  for i = 2, 3, ..., n    (6.9.17b)

where D^{−1} displayed by (6.9.4) is used. Note that by replacing Y by F, the modified observations F_i are uncorrelated, since from (6.9.1a), F_i = η_i − ρη_{i−1} + u_i and the u_i's are uncorrelated.

Another way to define the modified sensitivities and observations is by using

    Z* = Φ^{−1/2} D^{−1} X,  F* = Φ^{−1/2} D^{−1} Y    (6.9.18)

which permits us to write

    b_ML = (Z*^T Z*)^{−1} Z*^T F*    (6.9.19a,b)

where b_ML has a form similar to that given for OLS.

Results for the Model η = β

For AR errors for the simple model η = β, the components of Z are Z_1 = 1 and Z_2 = ··· = Z_n = 1 − ρ. For case 1 AR errors, the components of Z*, normalized so that the first equals unity, are

    Z*_1 = 1    (6.9.20a)
    Z*_2 = Z*_3 = ··· = Z*_n = [(1−ρ)/(1+ρ)]^{1/2}    (6.9.20b)

which are shown in Fig. 6.8 for ρ = 0, 0.5, 0.9, and 1.0. The net effect of ρ being between zero and one is to reduce the value of the modified sensitivity compared to X_i. This results in the variance of b being greater than for ρ = 0.

For the simple model the F_i values are

    F_1 = Y_1,  F_2 = Y_2 − ρY_1,  F_3 = Y_3 − ρY_2, ...    (6.9.21a)

and the components of Φ_a1 are

    Φ_a1 = σ_u² diag[(1−ρ²)^{−1} 1 ··· 1]    (6.9.21b)
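The whitening of (6.9.17) through (6.9.19) translates directly into code. The sketch below forms Z and F by the recursions, applies the case 1 weights implied by (6.9.21b), and, as a check, can reproduce the closed form (6.9.22) for η = β; the function name is mine:

```python
import numpy as np

def gls_ar1_case1(X, Y, rho):
    """b_ML of (6.9.16a) via Z = D^{-1}X, F = D^{-1}Y with
    Phi_a1 = sigma_u^2 diag[(1-rho^2)^{-1}, 1, ..., 1]; sigma_u^2 cancels."""
    X = np.atleast_2d(np.asarray(X, float))
    if X.shape[0] == 1:
        X = X.T                            # ensure n x p
    Y = np.asarray(Y, float)
    Z = X.copy(); F = Y.copy()
    Z[1:] = X[1:] - rho * X[:-1]           # (6.9.17a)
    F[1:] = Y[1:] - rho * Y[:-1]           # (6.9.17b)
    w = np.ones(len(Y)); w[0] = 1.0 - rho**2   # inverse of the first Phi weight
    A = Z.T @ (w[:, None] * Z)
    return np.linalg.solve(A, Z.T @ (w * F))
```

With X a column of ones, the normal equation collapses to [(1−ρ²) + (n−1)(1−ρ)²] b = (1−ρ²)Y_1 + (1−ρ)Σ(Y_i − ρY_{i−1}), which is (6.9.22) below.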
[Figure 6.8: Modified sensitivity coefficients for a1 errors and the model η = β, for ρ = 0, 0.5, 0.9, and 1.0.]

Using (6.9.16b) with (6.9.20a) and (6.9.21) gives, for the simple model η = β and a1 errors,

    b_ML = [(1−ρ²)Y_1 + (1−ρ) Σ_{i=2}^{n} (Y_i − ρY_{i−1})] / [1 − ρ² + (n−1)(1−ρ)²]    (6.9.22)

which has the variance

    V_a1(b_ML) = σ_u² / [1 − ρ² + (n−1)(1−ρ)²] = σ² / [1 + (n−1)(1−ρ)/(1+ρ)]    (6.9.23)

which is depicted in Fig. 6.9 versus n for various ρ values. For any fixed value of ρ between zero and unity, V_a1(b_ML) always decreases with increasing n values. Physically this represents the case of constant spacing of measurements but an increasing number of them. See Fig. 6.1, for example, which is for the billet problem with time steps of 96 sec. If more measurements are added with the same spacing of 96 sec, n would be increasing while ρ would be fixed, as in Fig. 6.9.

[Figure 6.9: Variance of b_ML for correlated errors, (6.9.23).]

Suppose that the observations become correlated as the time step is reduced and that ρ is related to n, the maximum number of observations, by ρ = exp(−a/n). Then for variable n and fixed a, we can obtain Fig. 6.10. Here as n becomes large (about 20 for the case of a = 1), V_a1(b_ML) approaches a constant value. In this case increasing an already "large" value of n will not significantly improve the accuracy. This case of correlated errors can occur as the time steps are made smaller and smaller in a dynamic experiment. For example, if in Example 6.2.3 Δt were decreased from 96 sec to 48, then 24, and finally 12, while keeping the same total time of 1536 sec, the observation errors would become more and more correlated. This is demonstrated by the residuals shown in Fig. 6.6.

The asymptotic values of Fig. 6.10 for n → ∞ are given by 2σ²/(a+2). (See Problem 5.26b.) Figure 6.7 depicts this relation as a function of a, as well as the ratio of the variances for the LS and ML estimators.

[Figure 6.10: Variance of b_ML for correlated errors for ρ = e^{−a/n} and a fixed, (6.9.23).]
The maximum ratio is 1.139, occurring at a = 2.7. For this first-order autoregressive error model and simple physical model example, negligible improvement in accuracy occurs using maximum likelihood (or Gauss-Markov estimation) rather than ordinary least squares. Other cases can be exhibited, however, for which ML estimators are far superior to OLS estimators.

6.9.3 Moving Average Errors (MA)

A brief treatment is given below for first-order moving average errors. A model for first-order moving average errors is

    ε_i = u_i + θu_{i−1}    (6.9.25)
    u_0 = 0,  E(u_i) = 0,  E(u_i u_j) = 0  for i ≠ j    (6.9.26)

In matrix form

    ε = D_m u    (6.9.27)

where

    D_m = [ 1              ;
            θ  1           ;
            0  θ  1        ;
            ···            ;
            0  ···  θ  1   ]    (6.9.28)

whose inverse is

    D_m^{−1} = [ 1                        ;
                 −θ    1                  ;
                 θ²    −θ    1            ;
                 −θ³   θ²    −θ    1      ;
                 ···                      ]    (6.9.29)

The m subscript denotes moving average. Analogous to (6.9.5a), ψ is given by

    ψ = cov(ε) = D_m φ D_m^T    (6.9.30)

In determining the covariance of b_LS as given by (6.2.11), we have the term X^T ψ X. By using (6.9.10) we see that the modified sensitivity matrix given by

    Z̃_m = D_m^T X    (6.9.31)

can be convenient to use. Components of Z̃_m are

    Z̃_nj,m = X_nj,  Z̃_ij,m = X_ij + θX_{i+1,j}  for i = 1, ..., n−1    (6.9.32)

and for j = 1, 2, ..., p. When using ML estimation, we use the modified sensitivity matrix

    Z_m = D_m^{−1} X    (6.9.33)

which has components

    Z_1j,m = X_1j,  Z_ij,m = X_ij − θZ_{i−1,j,m}  for i = 2, ..., n    (6.9.34)

Notice that the AR Z̃ for LS analysis is similar to the Z for ML moving average errors, whereas the AR Z for ML analysis is similar to the LS MA analysis.
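The MA recursions can be checked against the matrices they represent; the sketch below verifies that the recursion (6.9.34) inverts D_m of (6.9.28). The function names are mine:

```python
import numpy as np

def ma1_whiten(x, theta):
    """Z = D_m^{-1} x via (6.9.34): Z_1 = x_1, Z_i = x_i - theta * Z_{i-1}."""
    z = np.array(x, dtype=float)
    for i in range(1, len(z)):
        z[i] -= theta * z[i - 1]
    return z

def ma1_D(n, theta):
    """D_m of (6.9.28): ones on the diagonal, theta just below it."""
    return np.eye(n) + theta * np.eye(n, k=-1)
```

Multiplying the whitened vector back by D_m recovers the original sequence, and the explicit inverse shows the alternating powers of −θ of (6.9.29).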
6.9.4 Summary of First-Order Correlated Cases for the Model η = β

In Table 6.16 the variances of b are given for six different correlated error cases; three of these are for autoregressive errors and the others are for moving average errors. The variances are given for both LS and ML estimators of β. Also, the ratio of the variances for LS and ML estimation is given for large values of n, the number of observations.

The following are some conclusions relative to correlated errors.

1. If the measurement errors are actually correlated but are erroneously assumed to be independent, two deleterious effects can result. First, the experimenter might report much more accurate results than he should. Next, the experimental strategy might be based on an incorrect premise. That is, the variance of an estimate always decreases as more independent measurements are included, but this is not necessarily true for correlated errors. Thus the experimenter might take 1000 observations expecting to achieve much more accuracy than if 100 were taken; it might be that the extra 900 measurements are of dubious value for this purpose.

2. For autoregressive errors with 0 < ρ ≤ 1, the variances of estimates can be much greater than for independent errors (ρ = 0).
3. For some correlated error cases, the ML estimators can be greatly superior to those obtained using least squares. See the last column of Table 6.16.

4. When the statistical parameters ρ and θ are known, the estimation procedure for maximum likelihood or generalized least squares is not much more complicated for AR and MA errors than for independent errors.

5. Since the covariance matrix of b for both ML and LS estimation can be readily given for AR and MA errors, the confidence intervals and regions for known ρ and θ can be developed in exactly the same manner as in Section 6.8.

[Table 6.16: Variances of b for the model η = β for six first-order correlated error cases (three autoregressive, three moving average), for both LS and ML estimators, with the ratio of LS to ML variances for large n.]

6.9.5 Simultaneous Estimation of ρ, σ_u², and Physical Parameters for the a1 Cases

When high speed digital data acquisition equipment is used, the measurements on one specimen are frequently correlated. Neither the best model to describe the correlations nor the statistical parameters such as ρ, θ, and σ_u² may be known. In this section the error model is assumed to be the first-order autoregressive case designated a1. The parameters ρ and σ_u² are unknown, however. Maximum likelihood estimation is used because, unlike least squares, it can directly provide estimates of ρ and σ_u².

The analysis starts with the logarithm of the likelihood function as given by (6.1.67). For a single response case with the errors as described by (6.9.1), the ψ matrix is

    ψ_a1 = Dφ_a1 D^T    (6.9.35)

where D is given by (6.9.3) and

    φ_a1 = σ_u² diag[(1−ρ²)^{−1} 1 ··· 1]    (6.9.36)

Then the determinant of ψ_a1 is

    |ψ_a1| = σ_u^{2n} (1−ρ²)^{−1}    (6.9.37)

Introducing (6.9.37) into (6.1.67a) gives

    ln L = −(1/2) {n ln 2π + ln[(1−ρ²)^{−1} σ_u^{2n}] + σ_u^{−2} S_1}    (6.9.38a)

    S_1 = (1−ρ²)(Y_1 − η_1)² + Σ_{i=2}^{n} [(Y_i − ρY_{i−1}) − (η_i − ρη_{i−1})]²    (6.9.38b)
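The concentrated quantities (6.9.37) and (6.9.38) can be verified against a direct multivariate-normal evaluation with ψ_a1. The sketch below does this; the function name is mine:

```python
import numpy as np

def ar1_loglik(Y, eta, rho, sigma_u2):
    """ln L of (6.9.38a) with S_1 from (6.9.38b) for a1 errors."""
    n = len(Y)
    e = np.asarray(Y, float) - np.asarray(eta, float)
    S1 = (1.0 - rho**2) * e[0]**2 + np.sum((e[1:] - rho * e[:-1])**2)
    return -0.5 * (n * np.log(2.0 * np.pi)
                   - np.log(1.0 - rho**2) + n * np.log(sigma_u2)
                   + S1 / sigma_u2)
```

Because ψ_a1 has entries [σ_u²/(1−ρ²)] ρ^{|i−j|}, this should agree with −(1/2)[n ln 2π + ln|ψ_a1| + ε^T ψ_a1^{−1} ε] computed by dense linear algebra.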
316
CHAPTER 6 MATRIX ANALYSIS FOR LINEAR PARAMETER ESTIMATION
I n t he M L m ethod we seek to minimize In L s imultaneously w ith respect
to the p arameters p, o~, a nd p. N ecessary c onditions f or t he m inimum a re
t hat t he first derivatives o f In L w ith respect to these p arameters b e e qual
t o zero,
Vp Sd,,b.,.l'P'; = 0
(6.9.39a)
n
S , ( bML,P)
0u
4
0u
 =2 +
=0
as,/
p
22+I2 ap
1 _p
°u
=0
(6.9.39b)
( 6 .9.39c)
b.,.l'';
S olving (6.9.39b) for a~ yields
(6.9.40a)
which c ould be used directly to o btain a ( biased) e stimate o f o~ if p w ere
k nown. I f p is u nknown, (6.9.39c) is e mployed t o find
' p=
,.,
~e,e; +,
["'][ a~(1p2)'+~e] ]',
6 J MATRIX ANALYSIS W I11I C ORRELAnD OBSERVATION ERRORS
The estimate for ρ is given by (6.9.40b) with η_i replaced by b_ML. These equations must be solved iteratively even for this simple problem.
Example 6.9.4

For the a3 autoregressive case investigate using the Monte Carlo method the estimates of b₁, b₂, ρ̂, and σ̂_u² for the model η = 100 + 0.1X, where X_i = 0, 10, 20, …, and the u_i are chosen from random normal numbers of unit variance and zero mean. For ρ = 1, let n = 10, 30, 40, 50, and 120. For n = 60, let ρ = −1, −0.5, 0, 0.5, and 1.

Solution

For this a3 case a solution for a given set of measurements is found in a similar manner as for the a1 case discussed above. The main difference is that the determinant of ψ_a3 is σ_u^{2n} rather than the expression given by (6.9.37). The result is the set of equations

    b_ML = (Z^T Z)^{−1} Z^T F        (a)

    σ̂_u² = n^{−1} [e₁² + Σ_{i=2}^n (e_i − ρ̂ e_{i−1})²]        (b)

    ρ̂ = [Σ_{i=2}^n e_i e_{i−1}] [Σ_{i=1}^{n−1} e_i²]^{−1}        (c)

where

    Z = D^{−1} X,    F = D^{−1} Y        (6.9.41)

and where we have used (6.9.38b) to find (b). The diagonal matrix Ω is not needed in (a) because it is equal to σ_u²I and the σ_u² term cancels. The Z matrix is equal to

    Z = [ 1      0
          1−ρ    10
          1−ρ    20 − 10ρ
          ⋮      ⋮
          1−ρ    10[(n−1) − (n−2)ρ] ]
The solution for b_ML, ρ̂, and σ̂_u² is nonlinear even though the model is linear in the parameters other than ρ and σ_u². Various iterative procedures can be suggested to find ρ̂, σ̂_u², and b_ML. One is to first guess a reasonable value of ρ (such as ρ = 0) and then calculate b_ML using (6.9.39a) or equivalently (6.5.2). Next, (6.9.40a) is used to get a value for σ_u², which in turn is used in (6.9.40b) to obtain an improved value of ρ. The procedure is repeated until ρ is essentially unchanged. The converged values of ρ̂, σ̂_u², and b_ML are the desired values.
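The iterative procedure just described can be sketched in a few lines of NumPy. This is a minimal illustration, not the text's own program: the data, the starting guess ρ = 0, and the convergence tolerance are assumptions, and the first observation is left untransformed in the GLS step for simplicity.

```python
import numpy as np

def ar1_ml(X, Y, tol=1e-8, max_iter=500):
    """Iterative scheme of Section 6.9.5: guess rho, fit b by GLS on the
    differenced model, update sigma_u^2 via (6.9.40a) and rho via (6.9.40b),
    and repeat until rho is essentially unchanged."""
    n = len(Y)
    rho = 0.0                                    # reasonable first guess
    b = None
    for _ in range(max_iter):
        # GLS step: apply D^{-1} for the current rho (first differences)
        Z = np.vstack([X[:1], X[1:] - rho * X[:-1]])
        F = np.concatenate([Y[:1], Y[1:] - rho * Y[:-1]])
        b = np.linalg.lstsq(Z, F, rcond=None)[0]
        e = Y - X @ b                            # residuals e_i = Y_i - Yhat_i
        # (6.9.40a): biased ML estimate of sigma_u^2
        S1 = (1 - rho**2) * e[0]**2 + np.sum((e[1:] - rho * e[:-1])**2)
        sig2 = S1 / n
        # (6.9.40b): improved value of rho (fixed-point update)
        num = np.sum(e[1:] * e[:-1])
        den = sig2 / (1 - rho**2) + np.sum(e[1:-1]**2)
        rho_new = num / den
        if abs(rho_new - rho) < tol:
            rho = rho_new
            break
        rho = rho_new
    return b, rho, sig2
```

On simulated data with |ρ| < 1 the loop typically settles in a handful of iterations; the cumulative case ρ = 1 needs the separate treatment discussed in the examples.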
Example 6.9.3

Give the set of iterative equations for the model η = β and a1 autoregressive errors.

Solution

The estimator for b_ML is given by (6.9.22) with ρ replaced by ρ̂. The estimated variance of the u_i is found using (6.9.40a),

    σ̂_u² = {(1−ρ̂²)(Y₁ − b_ML)² + Σ_{i=2}^n [Y_i − ρ̂Y_{i−1} − b_ML(1−ρ̂)]²} / n
One array of measurements is generated by using Y_i = η_i + ε_i where ε_i = ρε_{i−1} + u_i and the u_i are found from a table of normal random numbers with unit variance and zero mean. The Monte Carlo analysis is obtained by generating a large number of sets of b₁, b₂, ρ̂, and σ̂_u², where each set corresponds to an array of measurements. Table 6.17 is a summary of results of the Monte Carlo analysis for ρ = 1 with n = 10 to 120, and Table 6.18 summarizes results for n = 60 and ρ = −1 to 1. For n = 10 to 60 there were 34 sets of random data for each n; for n = 120, 17 sets were used. (In general many more than even 34 sets of data should be used.) The terms with bars over them are average values found from the Monte Carlo simulation.
Table 6.17  Summary of Monte Carlo Investigation for ρ = 1 for Example 6.9.4
Let us examine Table 6.17. The relative errors in ρ̄, σ̄_u, b̄₁, and b̄₂ decrease as n is increased. The first two estimates appear to be biased. It is not clear whether b₁ and b₂ are; however, except for b₂ with ρ = 1 and n = 10, the biases indicated by (b̄₁ − β₁)/β₁ and (b̄₂ − β₂)/β₂ are very small. The average estimated standard error of b₁ divided by the true value is biased but not nearly as much as the same ratio for b₂, which is about 0.36. This means that confidence regions based on these estimated standard errors will be too small, or in other words, the parameter estimates would be presented as being more accurate than they really are. For ρ = 1 in this example, the true value of the covariance of b₁ and b₂ is zero while Table 6.17 shows small negative values. The average number of runs for the modified residuals, D⁻¹(Y − Ŷ), is nearly n/2, which is close to the (n+1)/2 value expected for independent observations.

From Table 6.18 most of the same conclusions as drawn from Table 6.17 are valid. An additional one is that only the cumulative error case (ρ = 1) for b₂ has a much smaller estimated standard error ratio than for the other ρ values listed. For further discussion, see reference 23.
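The runs statistic quoted above is easy to compute. The short function below is an illustrative sketch (the name and interface are not from the text); it counts runs of like sign, which for independent errors should average about (n + 1)/2.

```python
def count_runs(residuals):
    """Number of runs of like sign in a residual sequence, e.g. the
    modified residuals D^{-1}(Y - Yhat); exact zeros are skipped."""
    signs = [r > 0 for r in residuals if r != 0]
    if not signs:
        return 0
    # a new run starts at every sign change
    return 1 + sum(a != b for a, b in zip(signs, signs[1:]))
```

For example, the sequence +, +, −, +, − contains four runs; strongly autocorrelated residuals give counts well below (n + 1)/2, as in the tables.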
"2
JO
30
40
SO
120
0 .362
0 .714
0.783
0.855
0.932
OM
0.803
0.888
0.886
0.928
0.953
(bl  fJM f Jl
 0.00142
 0.00047
0.00001
0 .00039
0.00058
eSl.s.e. (b l ) / s .e.(b l )
0 .837
0 .831
0.837
0.901
0 .932
0.0663
 0.00186
0.0013
0.0045
0.00003
e sl.s.e.(b2 ) I s.e.(b1 )
0.426
0.327
0.316
0.363
0.360
H ildebrand, F . B., M~t1totb o j A ppliN M at"tic$, 2 nd e d ., PrenticeHall, Inc., E nalewood O irr. . N . J., I96S.
2. Deutsch, R., & timatio" TMory, PrenticeHall, Inc., Englewood Cliff•• N . J., 1965.
eSl.cov(b l ,b 2 )
 0.0091
 0.0030
 0.0023
 0.0020
 0.0009
l . Brownlee. K. A., StatuticaJ TMory a NI M~t1tDdol0fY I " S cimce aNI E ",i/tftrl"" 2 nd ed.,
runs III for OIe
0 .512
0 .472
0.486
0.481
0 .491
... Draper, N . R . a nd Smith. H., A ppliN R egnulo" A M/ylu, J ohn W iley.t Sons, Inc., N ew
P
;;
(h2 
fJz)1 f Jl
Table 6.18  Summary of Monte Carlo Investigation for n = 60 for Example 6.9.4
    ρ                         −1        −0.5       0         0.5       1
    ρ̄                        −0.969    −0.485    −0.024     0.430     0.881
    σ̄_u                       0.958     0.966     0.966     0.963     0.933
    (b̄₁−β₁)/β₁               0.00013   0.00016   0.00020   0.00014   0.00012
    est.s.e.(b₁)/s.e.(b₁)     1.024     1.029     1.005     0.938     0.912
    (b̄₂−β₂)/β₂              −0.00048  −0.00060  −0.00080   0.00093   0.00048
    est.s.e.(b₂)/s.e.(b₂)     0.997     1.004     0.981     0.917     0.353
    est.cov(b₁,b₂)           −0.00004  −0.00008  −0.00016  −0.00049   0.00175
    runs/n for D⁻¹(Y−Ŷ)       0.523     0.516     0.500     0.493     0.490
REFERENCES

1. Hildebrand, F. B., Methods of Applied Mathematics, 2nd ed., Prentice-Hall, Inc., Englewood Cliffs, N.J., 1965.
2. Deutsch, R., Estimation Theory, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1965.
3. Brownlee, K. A., Statistical Theory and Methodology in Science and Engineering, 2nd ed., John Wiley & Sons, Inc., New York, 1965.
4. Draper, N. R. and Smith, H., Applied Regression Analysis, John Wiley & Sons, Inc., New York, 1966.
5. Daniel, C. and Wood, F. S., Fitting Equations to Data, Wiley-Interscience, New York, 1971.
6. Box, G. E. P. and Draper, N. R., "The Bayesian Estimation of Common Parameters from Several Responses," Biometrika 52 (1965), 355-365.
7. Hunter, W. G., "Estimation of Unknown Constants from Multiresponse Data," Ind. Eng. Chem. Fundam. 6 (1967), 461.
8. Welty, J. R., Engineering Heat Transfer, John Wiley & Sons, Inc., New York, 1974.
9. Goldfeld, S. M. and Quandt, R. E., Nonlinear Methods in Econometrics, North-Holland Publishing Company, Amsterdam, 1972.
10. Himmelblau, D. M., Process Analysis by Statistical Methods, John Wiley & Sons, Inc., New York, 1970.
11. Burington, R. S. and May, D. C., Handbook of Probability and Statistics with Tables, 2nd ed., McGraw-Hill Book Company, New York, 1970.
12. Beyer, W. H., Handbook of Tables for Probability and Statistics, 2nd ed., The Chemical Rubber Co., Cleveland, Ohio, 1968.
13. Rice, J. R., The Approximation of Functions, Vol. I: Linear Theory, Addison-Wesley Publishing Co., Reading, Mass., 1964.
14. Myers, R. H., Response Surface Methodology, Allyn and Bacon, Inc., Boston, 1971.
15. Beck, J. V. and Al-Araji, S., "Investigation of a New Simple Transient Method of Thermal Property Measurement," J. Heat Transfer, Trans. ASME, Ser. C 96 (1974), 59-64.
16. Mendel, J. M., Discrete Techniques of Parameter Estimation: The Equation Error Formulation, Marcel Dekker, Inc., New York, 1973.
17. Hoerl, A. E., "Application of Ridge Analysis to Regression Problems," Chem. Eng. Progr. 58 (1962), 54-59.
18. Hoerl, A. E., "Ridge Analysis," Chem. Eng. Progr., Symposium Series Vol. 60 (1964), 67-77.
19. Hoerl, A. E. and Kennard, R. W., "Ridge Regression: Biased Estimation for Nonorthogonal Problems," Technometrics 12 (1970), 55-67.
20. Hoerl, A. E. and Kennard, R. W., "Ridge Regression: Applications to Nonorthogonal Problems," Technometrics 12 (1970), 69-82.
21. Marquardt, D. W., "Generalized Inverses, Ridge Regression, Biased Linear Estimation, and Nonlinear Estimation," Technometrics 12 (1970), 591-612.
22. Swed, F. S. and Eisenhart, C., "Tables for Testing Randomness of Grouping in a Sequence of Alternatives," Ann. Math. Stat. 14 (1943), 66-87.
23. Beck, J. V., "Parameter Estimation with Cumulative Errors," Technometrics 16 (1974), 85-92.
24. U.S. Bureau of the Census, Statistical Abstract of the United States 1974, 95th Annual Edition, Washington, D.C., 1974.
APPENDIX 6A  AUTOREGRESSIVE MEASUREMENT ERRORS

Consider the second-order model of autoregressive (AR) measurement errors,

    ε_i = ρ_{i1} ε_{i−1} + ρ_{i2} ε_{i−2} + u_i        (6A.1)

for which

    ε₀ = ε₋₁ = 0        (6A.2)

    E(u_i u_j) = 0   for   i ≠ j        (6A.3)

Let us write out the first few terms of (6A.1). Then in general we can write the ε vector in terms of the u vector as

    ε = D_a u        (6A.4a)

where D_a is lower triangular with ones on the diagonal,

    D_a = [ 1
            D₂₁   1
            D₃₁   D₃₂   1
            ⋮            ⋱ ]        (6A.4b)

with, for example,

    D₂₁ = ρ₂₁,    D₃₁ = ρ₃₁ρ₂₁ + ρ₃₂,    D₃₂ = ρ₃₁        (6A.4c)

The lower triangular matrix D_a becomes cumbersome as the dimensions n × n become large. The inverse of D_a can be found relatively easily, however. Write out (6A.1) for u_i as

    u₁ = ε₁
    u₂ = −ρ₂₁ε₁ + ε₂
    u₃ = −ρ₃₂ε₁ − ρ₃₁ε₂ + ε₃

or

    u = D_a⁻¹ ε        (6A.5a)

where D_a⁻¹ is a square matrix with three nonzero diagonals,

    D_a⁻¹ = [  1
              −ρ₂₁    1
              −ρ₃₂   −ρ₃₁    1
               0     −ρ₄₂   −ρ₄₁   1
                       ⋱      ⋱     ⋱
               0   ⋯   0    −ρ_{n2}  −ρ_{n1}   1 ]        (6A.5b)

The covariance matrix of the errors, ψ, can be written as

    ψ = D_a Ω D_a^T        (6A.6a)

where Ω is the diagonal matrix

    Ω = diag[σ₁²  σ₂²  ⋯  σ_n²]        (6A.6b)
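The construction in (6A.4)-(6A.6) is easy to check numerically. The sketch below uses arbitrary illustrative values for the ρ_{ij} and σ_i²; it builds D_a⁻¹ directly from its three nonzero diagonals and then verifies the inverse and determinant relations (6A.7) and (6A.8) quoted in this appendix.

```python
import numpy as np

n = 5
rho1 = 0.6 * np.ones(n)                       # rho_{i1}; illustrative constants
rho2 = 0.2 * np.ones(n)                       # rho_{i2}
sig2 = np.array([2.0, 1.0, 1.0, 1.0, 1.0])    # diagonal of Omega

# D_a^{-1}: ones on the diagonal, -rho_{i1} and -rho_{i2} below (6A.5b)
Dinv = np.eye(n)
for i in range(1, n):
    Dinv[i, i - 1] = -rho1[i]
    if i >= 2:
        Dinv[i, i - 2] = -rho2[i]

D = np.linalg.inv(Dinv)                       # D_a of (6A.4a)
Omega = np.diag(sig2)
psi = D @ Omega @ D.T                         # (6A.6a)

# |D_a| = 1, so |psi| = |Omega| = prod sigma_i^2  (6A.8)
assert np.isclose(np.linalg.det(psi), sig2.prod())
# psi^{-1} = (D_a^{-1})^T Omega^{-1} D_a^{-1}   (6A.7)
assert np.allclose(np.linalg.inv(psi), Dinv.T @ np.diag(1 / sig2) @ Dinv)
```

Because D_a⁻¹ is banded, ψ⁻¹ never needs to be formed explicitly in practice; this is exactly what makes the transformed least squares form below attractive.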
The inverse of ψ is

    ψ⁻¹ = (D_a⁻¹)^T Ω⁻¹ D_a⁻¹        (6A.7)

where the inverse of D_a has the relatively simple form given by (6A.5b). Because of the form of D_a, which is lower triangular with ones along the diagonal, the determinant of D_a is unity. Hence the determinant of ψ is

    |ψ| = |Ω| = ∏_{i=1}^n σ_i²        (6A.8)

For maximum likelihood estimation we know that

    b_ML = (X^T ψ⁻¹ X)⁻¹ X^T ψ⁻¹ Y        (6A.9a)

    cov(b_ML) = (X^T ψ⁻¹ X)⁻¹        (6A.9b)

for the linear model η = Xβ. By defining

    Z_a = D_a⁻¹ X,    F_a = D_a⁻¹ Y        (6A.10)

the ML relations in (6A.9) can be written in the weighted least squares form of

    b_ML = (Z_a^T Ω⁻¹ Z_a)⁻¹ Z_a^T Ω⁻¹ F_a        (6A.11a)

    cov(b_ML) = (Z_a^T Ω⁻¹ Z_a)⁻¹        (6A.11b)

Because of the simple form of the inverse of D_a, the terms in Z_a and F_a are

    Z_{ij,a} = X_{ij} − ρ_{i1} X_{i−1,j} − ρ_{i2} X_{i−2,j},    j = 1, 2, …, p        (6A.12a)

    F_{i,a} = Y_i − ρ_{i1} Y_{i−1} − ρ_{i2} Y_{i−2},    i = 2, 3, …, n        (6A.12b)

Also because Ω is a diagonal matrix, the matrices in (6A.11) can be written in the relatively simple summation form

    [Z_a^T Ω⁻¹ Z_a]_{jk} = Σ_{i=1}^n Z_{ij,a} Z_{ik,a} σ_i⁻²,    j, k = 1, 2, …, p        (6A.13a)

    [Z_a^T Ω⁻¹ F_a]_j = Σ_{i=1}^n Z_{ij,a} F_{i,a} σ_i⁻²,    j = 1, 2, …, p        (6A.13b)

For nonlinear estimation problems the sum of squares function is of interest. It is given by

    S_ML = Σ_{i=1}^n (F_{i,a} − H_{i,a})² σ_i⁻²        (6A.15a)

         = Σ_{i=1}^n [(Y_i − ρ_{i1}Y_{i−1} − ρ_{i2}Y_{i−2}) − (η_i − ρ_{i1}η_{i−1} − ρ_{i2}η_{i−2})]² σ_i⁻²        (6A.15b)

The above analysis is readily modified for first-order autoregressive cases by setting ρ_{i2} equal to zero. Higher-order cases are also treated simply by adding terms in Z_{ij,a} and F_{i,a}.

Covariance Matrices for First-Order Autoregressive Errors with Constant ρ

Some covariance matrices ψ for the first-order AR case with constant ρ are given below. For this case D_a and Ω are given by

    D_a = [ 1
            ρ        1
            ρ²       ρ        1
            ⋮                  ⋱
            ρ^(n−1)  ρ^(n−2)  ⋯  ρ  1 ]        (6A.16)

    Ω = diag[cσ_u²  σ_u²  ⋯  σ_u²],    σ₁² = cσ_u²        (6A.17)
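Equations (6A.10)-(6A.11) say that ML estimation with AR errors is just weighted least squares on the differenced variables. The sketch below, with a constant ρ and arbitrary illustrative data, checks that the transformed form (6A.11a) agrees with the direct estimator (6A.9a); c is taken as the stationary value 1/(1−ρ²).

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho, su2 = 8, 0.5, 1.0
c = 1.0 / (1.0 - rho**2)                     # Case 1 value of c (see 6A.21)

X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
Y = rng.normal(size=n)                       # any data will do for the identity

# D_a^{-1} for constant rho, first order (rho_{i2} = 0)
Dinv = np.eye(n) - rho * np.eye(n, k=-1)
Omega = su2 * np.diag([c] + [1.0] * (n - 1))
psi = np.linalg.inv(Dinv) @ Omega @ np.linalg.inv(Dinv).T

# direct ML / GLS estimator (6A.9a)
psi_inv = np.linalg.inv(psi)
b_direct = np.linalg.solve(X.T @ psi_inv @ X, X.T @ psi_inv @ Y)

# transformed weighted least squares form (6A.10)-(6A.11a)
Za, Fa, W = Dinv @ X, Dinv @ Y, np.linalg.inv(Omega)
b_trans = np.linalg.solve(Za.T @ W @ Za, Za.T @ W @ Fa)

assert np.allclose(b_direct, b_trans)        # the two forms agree
```

The transformed form only touches the banded D_a⁻¹ and the diagonal Ω, which is why it scales to large n far better than forming ψ⁻¹.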
and then ψ can be multiplied out to get

    ψ = σ_u² [ c    cρ        cρ²            cρ³            ⋯
               ·    1+cρ²     ρ(1+cρ²)       ρ²(1+cρ²)      ⋯
               ·    ·         1+ρ²+cρ⁴       ρ(1+ρ²+cρ⁴)    ⋯
               ·    ·         ·              1+ρ²+ρ⁴+cρ⁶    ⋯
                                  (symmetric)                 ]        (6A.18)

Consider the terms in the square matrix in (6A.18). Notice that the diagonal term of the jth row, which we designate G_j, also appears as a product in the jth row for the (j+1)th and succeeding terms. For ρ² ≠ 1, G_j is given by

    G_j = 1 + ρ² + ⋯ + ρ^{2(j−1)} + (c−1)ρ^{2(j−1)} = (1 − ρ^{2j})/(1 − ρ²) + (c−1)ρ^{2(j−1)}        (6A.19)

As j → ∞ and ρ² < 1, G_j approaches the value (1−ρ²)⁻¹. This could be called the "steady-state" value of the variance of ε_j divided by σ_u². Evaluating the difference between successive diagonal terms G_j for any ρ² yields

    ΔG_j = G_{j+1} − G_j = ρ^{2(j−1)} [1 − c(1−ρ²)]        (6A.20)

At least five cases can be identified using (6A.20):

    Case 1.  c = (1−ρ²)⁻¹        (6A.21a)
    Case 2.  c = 1,  ρ = 1        (6A.21b)
    Case 3.  c = 1,  ρ² < 1        (6A.21c)
    Case 4.  c < (1−ρ²)⁻¹        (6A.21d)
    Case 5.  c > (1−ρ²)⁻¹        (6A.21e)

In each of these cases except Case 2, and if ρ² < 1, the limiting value of G_j is the same, namely (1−ρ²)⁻¹. In Case 1, G_j is this value of (1−ρ²)⁻¹ for all j. In Case 4, G_j increases monotonically to this value, whereas in Case 5, G₁ is the largest value and G_j decreases asymptotically to G_∞. Hence it is possible to have constant, increasing, or decreasing V(ε_j), depending on whether c = (1−ρ²)⁻¹, c < (1−ρ²)⁻¹, or c > (1−ρ²)⁻¹, respectively. If ρ = 1 as in Case 2, G_j = j, which clearly has no steady-state value; Case 2 is called the cumulative error case. If ρ² > 1, the G_j values grow very rapidly with j.

Some matrices for the above cases are

    ψ_a1 = σ_ε² [ 1        ρ        ρ²   ⋯  ρ^(n−1)
                  ·        1        ρ    ⋯  ρ^(n−2)
                  ·        ·        1    ⋯  ρ^(n−3)
                        (symmetric)  ⋱      ⋮
                                            1       ]        (6A.22a)

where σ_ε² = σ_u²/(1−ρ²),

    ψ_a2 = σ_u² [ 1  1  1  ⋯  1
                  1  2  2  ⋯  2
                  1  2  3  ⋯  3
                  ⋮            ⋮
                  1  2  3  ⋯  n ]        (6A.22b)

    ψ_a3 = σ_u² [ 1    ρ         ρ²           ⋯  ρ^(n−1)
                  ·    1+ρ²      ρ(1+ρ²)      ⋯  ρ^(n−2)(1+ρ²)
                  ·    ·         1+ρ²+ρ⁴      ⋯  ρ^(n−3)(1+ρ²+ρ⁴)
                            (symmetric)          ⋮               ]        (6A.22c)
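The closed form (6A.19) for the diagonal terms G_j, and the difference formula (6A.20), can be compared directly against the matrix product ψ = D_a Ω D_a^T; the values of ρ, c, and n below are arbitrary test choices.

```python
import numpy as np

rho, c, n = 0.7, 2.0, 6

# psi / sigma_u^2 for constant rho with Omega/sigma_u^2 = diag[c, 1, ..., 1]
Dinv = np.eye(n) - rho * np.eye(n, k=-1)
D = np.linalg.inv(Dinv)
psi = D @ np.diag([c] + [1.0] * (n - 1)) @ D.T

# (6A.19): diagonal terms G_j, 1-based j
for j in range(1, n + 1):
    Gj = (1 - rho**(2 * j)) / (1 - rho**2) + (c - 1) * rho**(2 * (j - 1))
    assert np.isclose(psi[j - 1, j - 1], Gj)

# (6A.20): successive differences G_{j+1} - G_j
for j in range(1, n):
    dG = psi[j, j] - psi[j - 1, j - 1]
    assert np.isclose(dG, rho**(2 * (j - 1)) * (1 - c * (1 - rho**2)))
```

With c = 2 > (1−0.49)⁻¹ ≈ 1.96 this is a Case 5 situation: G₁ = c is the largest diagonal entry and the differences are negative, decaying geometrically toward the steady-state variance.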
APPENDIX 6B  MATRIX INVERSION LEMMA

Important relations for inverses occurring in parameter estimation are derived below. Let A be p × m and B be m × p. Let I_m be the m × m identity matrix. An identity and some rearrangements of it are as follows:

    A(I_m + BA) = (I_p + AB)A        (6B.1)

    −AB = −(I_p + AB)A(I_m + BA)⁻¹B        (6B.2)

    I_p = (I_p + AB) − (I_p + AB)A(I_m + BA)⁻¹B        (6B.3)

Premultiplying (6B.3) by (I_p + AB)⁻¹ yields

    (I_p + AB)⁻¹ = I_p − A(I_m + BA)⁻¹B        (6B.4)

Let P be defined by

    P = (I + V_β X^T ψ⁻¹ X)⁻¹ V_β        (6B.5)

Using (6B.4) for (6B.5), with A = V_β X^T ψ⁻¹ and B = X, gives

    P = [I − V_β X^T ψ⁻¹ (I + X V_β X^T ψ⁻¹)⁻¹ X] V_β        (6B.6)

This equation is called the matrix inversion lemma. Equation 6.1.5a is found from (6B.6) by letting … . Also, (6B.1) can be written as

    (I_p + AB)⁻¹A = A(I_m + BA)⁻¹        (6B.7)

Using the substitutions suggested above for A and B results in

    P X^T ψ⁻¹ = V_β X^T (X V_β X^T + ψ)⁻¹        (6B.8)

PROBLEMS

6.1 Evaluate the following matrix products:

    (a) [ e₁  0   [ a₁₁  a₁₂          (b) [ a₁₁  a₁₂   [ e₁  0
          0   e₂ ]  a₂₁  a₂₂ ]              a₂₁  a₂₂ ]   0   e₂ ]

6.2 For X given by

    X = [ 1  1
          1  2
          1  3
          1  4 ]

and Ω⁻¹ with ρ = 0.5 in (6.9.4), evaluate X^T Ω⁻¹ X, where Ω = D D^T. Choose the proper size for Ω⁻¹.

6.3 Prove for A and B being nonsingular square matrices that

    (AB)⁻¹ = B⁻¹A⁻¹
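Identities (6B.4), (6B.6), and (6B.8) are easy to spot-check numerically with random matrices of compatible sizes; the dimensions and the particular V_β and ψ below are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
p, m = 4, 3
A = rng.normal(size=(p, m))
B = rng.normal(size=(m, p))
Ip, Im = np.eye(p), np.eye(m)

# (6B.4): (I_p + AB)^{-1} = I_p - A (I_m + BA)^{-1} B
lhs = np.linalg.inv(Ip + A @ B)
rhs = Ip - A @ np.linalg.inv(Im + B @ A) @ B
assert np.allclose(lhs, rhs)

# (6B.5) and (6B.8) with A = V_beta X^T psi^{-1} and B = X
n = 5
X = rng.normal(size=(n, p))
V = 2.0 * np.eye(p)                        # illustrative prior covariance V_beta
psi = np.diag(rng.uniform(0.5, 1.5, n))    # illustrative error covariance
P = np.linalg.inv(np.eye(p) + V @ X.T @ np.linalg.inv(psi) @ X) @ V   # (6B.5)
rhs8 = V @ X.T @ np.linalg.inv(X @ V @ X.T + psi)                     # (6B.8)
assert np.allclose(P @ X.T @ np.linalg.inv(psi), rhs8)
```

The (6B.8) form is the useful one in sequential estimation: it replaces a p × p inverse involving ψ⁻¹ by an n × n inverse of X V_β X^T + ψ, or vice versa, whichever is smaller.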
6 ." A commonly given method for finding the inverse of a square matrix A given
328
CHAPTER 6
MATRIX ANALYSIS FOR LINEAR PARAMETER E STIMATION
P ROBLEMS
319
convection from horizontal cylinders to liquids a nd gases. Also given are the
logarithms to the base 10.
by
ra"
(/21
A=
a ll
(/22
.
0 .1
0 .2
N uo
a,"]
G roPr
10gN~o
log G roPr
1 0 4
1 0 3
1 0 2
1 0'
I
10
J()2
loJ
104
1()5
 0.3098
 0.2596
 0. 1798
 0.0752
0.0334
0.1790
0.3243
0.4997
0.7300
0.9699
1.2095
1.4594
1.7101
1.9699
4
3
2
I
0
I
2
3
4
5
6
7
8
9
( /2.
0.490
0 .550
0.661
0 .841
1.08
1.51
2.11
3.16
5.37
9.33
16.2
28 .8
51.3
93 .3
OM
involves the use of cofactors . A cofactor Aij is (  I );+jD" where D" is called a
minor a nd is obtained by taking the determinant of the submatrix formed by
striking out the row and column corresponding to the element a i} ' T he inverse
of A contains the elements AJI/IA I. Using this expression for the inverse.
verify (6 . 1.10) and (6 . 1.11).
6.S F or Model 5 of Chapter 5. 11, = fJIX;1 + fJ2X;2' show that the matrices (XTX)  '
a nd X Ty used in OLS are
)(1
1 0'
lOS
109
Let Y =logNuo a nd X =logGroPr.
( a) Using the last seven d ata pairs in the above d ata with orthogonal
polynomials, find a o.a ••.. . •a4.
Answer.
1.2212, 0.2450, 2.644 X 1 0 3, 1.6667 X 1 0', 5.4545 X 10  '.
( b) F ind the sum of squares for each of the models that can be obtained from
the results o f ( 0).
6.6 ( a) F or Model 5 of C hapter 5 give the components, XT IX a nd XT Iy, of
¥¥b ML given by (6.5.2). Let ¥  I be given by W where the components are wij
a nd wij~O for i~j.
Answer.
( c) Assume that the assumptions 11111011 a re valid. A t the 5% level of
significance give a recommended model in terms of N uo a nd G roPr. How
does y our model compare with
( b) Using the results of ( a), write out the equation for b l • ML . Check y our
answer by letting ¥ = 0 21 and using the Table 5.3 results for Model 5.
6.7
N uo .53(Gro Pr)"2S
Find (XTX)  I for the cases
(a) Y, = fJl + fJ2X, + f j
( b) Y, =fJIXi+fJ2 X, 2+f;
( c) Y; =fJ,XiI+fJ 2X;2+E ;
X I. = X 4 • = X 'I =  I;
X Il=Xn=Xn=  I;
6.8
1.6814, 5.95 X 1 0 4 ,8.21 x 1 0 6, 8 .15x 1 0 6, 6.80 X 1 0 6 .
with X I =1. X 2=2. X 3 =3, a nd X 4=4 .
with X ,=  2 ,  I. O. I, and 2.
for i = 1.2, . . .• 9.
X 21=X51=XSI=0;
X 42 = X 51 = X 62 = 0 ;
X 31=X61=X91=J.
X 72 = X 82 = X 92 = J.
In Welty [8, p . 247) the following are recommended coordinates for natural
which is o ften used for 104 < G roPr < I o'?
Repeat Problem 6.8 for the f irst seven d ata pairs.
Using all the data o f Problem 6.8, repeat Problem 6.8. Use the orthogonal
tables in Beyer [12, p. 505] o r s ome o ther book.
6.11 Using the last seven d ata pairs of Problem 6.8 with Y Nuo a nd X GroPr,
estimate, using OLS,
6 .9
6.10
(0) fJ. a nd
/12 in Y/  /1.+/12X;+f/.
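As a quick illustration of the straight-line OLS fits asked for in Problems 6.8 and 6.11, the last seven (log Gr₀Pr, log Nu₀) pairs tabulated above can be fitted directly; the code below is a sketch, not a full solution to the problems.

```python
import numpy as np

# last seven data pairs from the Problem 6.8 table
logGrPr = np.array([3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
logNu = np.array([0.4997, 0.7300, 0.9699, 1.2095, 1.4594, 1.7101, 1.9699])

X = np.column_stack([np.ones(7), logGrPr])
b = np.linalg.lstsq(X, logNu, rcond=None)[0]   # OLS intercept and slope

# slope b[1] is about 0.245, matching a1 = 0.2450 in the Answer above;
# on the original variables this is Nu0 ~ C (Gr0 Pr)^0.245
print(b)
```

A slope near 1/4 is why the correlation Nu₀ = 0.53(Gr₀Pr)^{0.25} works well over this range.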
(b) β₁, β₂, and β₃ in Y_i = β₁ + β₂X_i + β₃X_i² + ε_i.
(c) Compare the residuals found using the results of (b) with those given by the model

    Nu₀ = 0.5638(Gr₀Pr)^…

6.12 Using the data of Problem 6.8 and starting with the last pair of "observations" for Y = log Nu₀ and X = log Gr₀Pr, use OLS sequential estimation for the model

    Y_i = β₁ + β₂X_i + ε_i

6.13 Repeat Problem 6.12 for the model

    Y_i = β₁ + β₂X_i + β₃X_i² + ε_i

6.14 The following data are a continuation of those in Table 6.2. Using the Y_i data below and orthogonal polynomials, find a satisfactory model utilizing the F test at the 5% level of significance.

    Obs. No.    Time (sec)    Y_i (°F)
    18          1632          142.93
    19          1728          139.34
    20          1824          136.04
    21          1920          132.94
    22          2016          130.07
    23          2112          127.39
    24          2208          124.88
    25          2304          122.46
    26          2400          120.18
    27          2496          118.04
    28          2592          115.97
    29          2688          114.13
    30          2784          112.35

6.15 Suppose that η is given by … and that dη/dt = 0 at … . Show that for these conditions the model becomes one in which β is the single parameter.

6.16 The average rainfall, wind velocity, temperature, etc., at any location over a number of years is periodic. Assume that the dependent variable η is the function of t given by

    η = β₁ + β₂ cos 2πt

(a) From reference 24, Table No. 319, the average wind speed (in mph) at the airport of Great Falls, Montana, is as follows:

    Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.   Sept.  Oct.   Nov.   Dec.
    15.7   14.8   13.4   13.2   11.4   11.4   10.3   10.5   11.7   13.8   15.0   16.0

Estimate β₁ and β₂ using OLS. Calculate the residuals. (Let the January value be at t = 0.5/12, etc.)
Answer: 13.1, 2.64
(b) Suggest another model that may be able to fit the data better than the one given in (a).

6.17 The following data are normal monthly average temperatures (in °F) given by reference 24, Table No. 310, for St. Paul, Minnesota, and San Francisco, California.

             St. Paul   San Fran.               St. Paul   San Fran.
    Jan.     11.8       50.9         July       71.2       58.5
    Feb.     16.2       53.4         Aug.       69.4       59.4
    Mar.     28.0       54.3         Sept.      59.1       62.2
    Apr.     44.6       55.3         Oct.       49.2       61.4
    May      56.5       56.7         Nov.       31.9       57.4
    June     66.2       58.7         Dec.       18.2       52.0

(a) Suggest an appropriate function for η to describe the St. Paul data. The normal minimum monthly temperature is 8-10°F less for all months and the normal maximum temperature is 8-10°F greater for all months.
(b) Suggest an appropriate function for η to describe the San Francisco data. The normal minimum and maximum monthly temperatures are within ±7°F for all months.

6.18 The model

    Y_i = β₁ + β₂ sin 2πt_i + ε_i

is proposed for the following data. Very little is known regarding ε_i. Estimate β₁ and β₂ using the sequential method.

    t_i:   0     1/12   1/4   5/12   1/2   7/12   3/4   11/12   1
    Y_i:   114   152    198   157    96    54     −10   51      96
6.19 Use orthogonal polynomials for the Example 6.2.3 data. The components of X^T Y are 3390.68, −3302.44, 2608.68, −1448.64, 966.00, and 2347.2. Use the F test at the 5% level of significance to find the regression line of suitable degree.

6.20 Repeat Problem 6.18 using the standard assumptions and the subjective prior information that

    μ_β₁ = 100,    μ_β₂ = 50,    V_β = [ 200   100
                                          100   3000 ]

Estimate β₁ and β₂ using this information and find the covariance matrix of the estimates. The sequential method need not be used. Note that if V_β is sufficiently large compared with P_ML, then b_MAP ≈ b_ML for any finite μ_β.

6.21 The temperatures for thermocouples 5-8 of Table 7.14 can be described by

    T = β₁ + β₂(t − 3.3)^{1/2}

from t = 3.3 to 7.5 sec. β₂ is equal to 2q(πkρc)^{−1/2}, with q being heat flux, k thermal conductivity, ρ density, and c specific heat. Using temperatures from thermocouples 5 and 6 from 3.3 to 7.5 sec, estimate β₁ and β₂. Use the sequential estimation method for m = 1. Calculate and examine the residuals. Discuss your assumptions. How can the model be improved?
Answer: for t₁ = 1, b₁ = 100.89, b₂ = 103.33, P₁₁ = 4.464, P₁₂ = 0.0000, P₂₂ = 13.391

6.22 Write a FORTRAN or programmable calculator program to calculate the orthogonal coefficients using (6.3.6). Give coefficients for r = 4, n = 9, and i = 1, 2, …, 6.

6.23 Using the eight measurements at time 0.3 sec of Table 7.14, estimate β₁ in the model T = β₁. Find the 95% confidence interval. What assumptions are needed?

6.24 Find the 95% confidence region for the data of Example 6.7.3 for the two parameters in the model k = β₁ + β₂T. Assume the errors are additive, have zero mean and constant variance, and are uncorrelated and normal. Also let T have negligible errors. There is no prior information regarding the parameters.

6.25 Using the temperature measurements at times 0.3, 0.6, 0.9, and 1.2 sec for all eight thermocouples of Table 7.14, estimate β₁ in the model T = β₁. Assume

    η_j(i) = β₁ + ε_j(i)
    ε_j(1) = u_j(1)
    ε_j(i) = ε_j(i−1) + u_j(i),    i = 2, 3, 4

for j = 1, …, 8; u_j(i) ~ N(0, σ²), and E[u_j(i)u_k(l)] = 0 except when i = l and j = k. Use maximum likelihood estimation. Also find the 95% confidence interval.

6.27 Derive (6.7.9) and (6.7.10).

6.28 The following are temperature measurements taken from Table 7.14:

    time (sec)    measurements
    15            94.56   93.91   94.75   94.17
    18            96.52   95.70   96.30   95.96

(a) Estimate β in the model T = 92.5 + β(t − 12) using OLS estimation.
(b) Find the 95% confidence interval for the estimate of β. What assumptions are needed?

6.29 Let the assumptions on the measurement errors be ε ~ N(0, σ²Ω) with σ² unknown and Ω known. X is errorless. The prior information regarding the random parameter vector is β ~ N(μ_β, σ²V), where V is known. Derive the MAP estimators for β and σ².
Answer:

    b_MAP = μ_β + P X^T Ω⁻¹ (Y − Xμ_β)

    σ̂² = {(Y − Ŷ)^T Ω⁻¹ (Y − Ŷ) + (b_MAP − μ_β)^T V⁻¹ (b_MAP − μ_β)} / n

where P = (I + V X^T Ω⁻¹ X)⁻¹ V.
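The MAP estimator quoted in the answer to Problem 6.29 can be sanity-checked against the equivalent penalized least squares computation; all the numbers below are illustrative stand-ins, and P is taken in the form (I + V X^T Ω⁻¹ X)⁻¹ V.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 6, 2
X = rng.normal(size=(n, p))
Omega = np.diag(rng.uniform(0.5, 2.0, n))   # known error covariance shape
V = np.diag([4.0, 9.0])                     # known prior covariance shape
mu = np.array([1.0, -2.0])                  # prior mean
Y = rng.normal(size=n)

# b_MAP = mu + P X^T Omega^{-1} (Y - X mu)
Oi = np.linalg.inv(Omega)
P = np.linalg.inv(np.eye(p) + V @ X.T @ Oi @ X) @ V
b_map = mu + P @ X.T @ Oi @ (Y - X @ mu)

# same estimate from minimizing
#   (Y - Xb)' Omega^{-1} (Y - Xb) + (b - mu)' V^{-1} (b - mu)
b_check = np.linalg.solve(X.T @ Oi @ X + np.linalg.inv(V),
                          X.T @ Oi @ Y + np.linalg.inv(V) @ mu)
assert np.allclose(b_map, b_check)
```

The agreement follows from the matrix inversion lemma of Appendix 6B, which is exactly how the closed-form b_MAP is obtained from the penalized normal equations.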