CHAPTER 3 INTRODUCTION TO STATISTICS

...parameter of the exponential distribution with density function

f(x|λ) = (1/λ)e^(−x/λ),  x > 0;  f(x|λ) = 0 otherwise,

will have a rejection region which is the solution of an inequality of the form

(x/λ₀)exp(−x/λ₀) ≤ constant.

3.15 If f(x|p) = pˣ(1−p)^(1−x), x = 0, 1, and ℓ(p̂, p) = (p̂ − p)²,
(a) find r[p, p̂₁(x)] for p̂₁(x) = (x+1)/3;
(b) find r[p, p̂₀(x)] for p̂₀(x) = …;
(c) if g(p) = 2(1−p), 0 ≤ p ≤ 1, find E(r[g(p), p̂₁(x)]).

3.16 If X has a binomial distribution with n = 4, if p̂ = (x+1)/6, and if ℓ(p̂, p) = (p̂ − p)², find r(p̂, p).

3.17 From a normal distribution with standard deviation 5 a sample of 16 independent observations was obtained. x̄ was calculated and found to be 31.5. Find a 90% confidence interval for μ.
Answer. (29.44, 33.56).

3.18 From a normal distribution a sample of 16 was obtained. x̄ and s were calculated and found to be 31.5 and 5, respectively. Find a 90% confidence interval for μ.
Answer. (29.32, 33.68).

3.19 From a normal distribution with standard deviation 5 a sample of 16 independent observations was obtained. x̄ was calculated and found to be 31.5. Test at the 5% level of significance the hypothesis that μ = 30 against alternatives that μ ≠ 30.

3.20 If f(x|θ) = 1 + θ²(x − 1/2), 0 ≤ x ≤ 1, 0 ≤ θ ≤ √2, and, using one observation, a rejection region x > .9 was decided upon, find the power function of this test.
Answer. .1 + .045θ².

CHAPTER 4 PARAMETER ESTIMATION METHODS

4.1 INTRODUCTION

In this chapter we introduce some of the concepts which we shall develop in the remainder of the book. Some canonical forms for the models of problems of parameter estimation are presented, least squares estimators are described, and modifications suggested by various criteria for good estimators are mentioned. We close the chapter with a short discussion of simulation techniques for comparing methods of estimation.
4.2 RELATIONS BETWEEN OBSERVED RANDOM VARIABLES AND PARAMETERS
We assume a functional relationship among several measurable variables, (Y, X₁, ..., Xₖ), one or more parameters (β₁, β₂, ..., βₚ), and particular values of one or more random variables (ε, ε⁽¹⁾, ..., ε⁽ᵐ⁾). (In this book we deal almost exclusively with cases in which each observation involves only one random variable; thus we have no need for the index indicated by the superscript.) The measurement of the measurable variables provides the observations. The parameters are unknown and we wish to estimate at least some of them. The particular values of the random variables, the ε's, are unknown. We may estimate the ε's themselves in order to get a picture of how well the estimates fit with our preconception of the distribution of the ε's.
It is convenient to pick one of the measurable variables and express this variable in terms of the others. It is the picked variable with which we associate the pronoun Y. Thus we write

Y = F(X₁, ..., Xₖ; β₁, ..., βₚ; ε)          (4.2.1)

The variable to be called Y is traditionally chosen because we are interested in how its value is affected by the values assumed by X₁, ..., Xₖ. It is traditionally called the dependent variable. The variables X₁, ..., Xₖ are called the independent variables. (Be clear in noting that independent in this context does not mean independence in the statistical sense.) The X's may be thought of as the causes of the Y, as when Y represents the yield in a chemical process into which amounts X₁, ..., Xₖ of material from sources 1, ..., k are combined. The X's may merely describe the physical environment, as when Y represents temperature at point (X₁, X₂, X₃) in space at time X₄.

We are fortunate if the ε's can be combined into one ε and especially fortunate if the errors are additive, that is, if we can write

Y = η(X₁, ..., Xₖ; β₁, ..., βₚ) + ε          (4.2.2)

in which the distribution of ε does not depend on the unknown β's although it may depend on parameters which do not appear in the other term of (4.2.2). It will sometimes be convenient to index the β's by integers beginning with 0; thus β₀, β₁, ..., βₚ₋₁. It will also be convenient to use vector notation to abbreviate; thus

X = (X₁, ..., Xₖ)

β = (β₁, ..., βₚ)  or  (β₀, β₁, ..., βₚ₋₁)  as appropriate

and

Y = η(X, β) + ε          (4.2.3)
Thus the ith observation will be signified by adding a subscript i to (Y, X₁, ..., Xₖ); that is, Yᵢ, Xᵢ₁, ..., Xᵢₖ. εᵢ is found by (4.2.2) or (4.2.3) to be such that

Yᵢ = η(Xᵢ₁, ..., Xᵢₖ; β₁, ..., βₚ) + εᵢ = η(Xᵢ, β) + εᵢ = ηᵢ + εᵢ          (4.2.4)

in which the last relation introduces one further abbreviation, ηᵢ. And yet one more abbreviation is to use Y for (Y₁, ..., Yₙ), ε for (ε₁, ..., εₙ), and X for the matrix

        | X₁₁  X₂₁  ...  Xₙ₁ |
    X = | X₁₂  X₂₂  ...  Xₙ₂ | = (X₁, X₂, ..., Xₙ)          (4.2.5)
        | ...  ...       ... |
        | X₁ₖ  X₂ₖ  ...  Xₙₖ |

Note that the Xᵢ's are column vectors.
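To make this notation concrete, here is a small numerical sketch (the model, parameter values, and sizes are our own illustrative assumptions, not the book's): it stores the observations Xᵢ as the columns of the k × n matrix X of (4.2.5) and forms Yᵢ = ηᵢ + εᵢ as in (4.2.4).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical regression function eta(X, beta); a linear form stands in
# for the book's general eta(X, beta).
def eta(x, beta):
    return beta @ x  # x is one column vector X_i, beta the parameter vector

beta = np.array([2.0, -1.0])          # assumed "true" parameters
n, k = 5, 2                           # n observations, k independent variables
X = rng.uniform(0, 1, size=(k, n))    # the k x n matrix of (4.2.5); columns are X_i

eps = rng.normal(0, 0.1, size=n)      # additive errors as in (4.2.2)
Y = np.array([eta(X[:, i], beta) for i in range(n)]) + eps  # Y_i = eta_i + eps_i

print(X.shape)  # (2, 5): each observation X_i is a column vector
print(Y.shape)  # (5,)
```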
The distribution of the ε's is generally unknown. If the εᵢ's are correlated, estimation of parameters may be much more difficult, the estimators less reliable, and the reliability difficult to assess. We therefore deal first with cases of independent εᵢ's, later investigating estimation under various assumptions regarding correlation. For those methods of estimation which require some assumption about the form of the distribution of the εᵢ's, or when the evaluation of the method requires assumptions of the form, we shall invariably investigate first under the assumption of normality. Usually we go no further. Fortunately many aspects of the relationships between sample moments and moments of the random variable being investigated do not depend on details of the form of the distribution.

One trouble arises from too great a preoccupation with cases in which the ε's may be assumed to be normally distributed. Different criteria for evaluating estimators may be expected to lead to different choices of estimators to be used. We seem frequently to deal with criteria which in general do suggest somewhat different estimators but which, when applied to a case in which normal ε's are assumed, lead to identical estimators. The casual student is sometimes misled to assume that if the estimators are the same the criteria must be essentially equivalent. Beware. We have used and shall use assumptions other than normality not only to illuminate the difference but also to give insight into the effect of wrongly assuming normality.

To estimate parameters we must first gather data and then analyze them. It is essential that experimental procedures be such that the data can be analyzed. The form of the function to be used in (4.2.1) and the experimental procedure must be developed together. "Design of experiments" deals, for the most part, with choosing values of the X's to facilitate analysis and to improve accuracy of estimation.
4.3 EXPECTED VALUES, VARIANCES, COVARIANCES

With a sequence of random variables, ε₁, ..., εₙ, we associate expected values E(ε₁), ..., E(εₙ), variances V(ε₁), ..., V(εₙ), and covariances cov(ε₁,ε₂), cov(ε₁,ε₃), ..., cov(εₙ₋₁,εₙ). In Chapter 6 we use vector and matrix forms to save space and to increase clarity:

E(ε) = [E(ε₁), E(ε₂), ..., E(εₙ)]          (4.3.1)
             | V(ε₁)        cov(ε₁,ε₂)  ...  cov(ε₁,εₙ) |
    cov(ε) = | cov(ε₁,ε₂)   V(ε₂)       ...  cov(ε₂,εₙ) |          (4.3.2)
             | ...                                      |
             | cov(ε₁,εₙ)   cov(ε₂,εₙ)  ...  V(εₙ)      |
In cov(ε) it is convenient to think of each covariance as occupying two positions symmetrically situated with respect to the main diagonal.

4.4 LINEAR PROBLEMS

If we can write η(X, β) in the form

η(X, β) = β₁X₁ + β₂X₂ + ··· + βₚXₚ          (4.4.1)

so that

Y = β₁X₁ + β₂X₂ + ··· + βₚXₚ + ε          (4.4.2)

we find the problem of estimating β simpler than otherwise. At the same time, if η is not linear in its parameters, we may find a form such as (4.4.1) a useful approximation to η(X, β) in the neighborhood of some particular value of X and β. Chapters 5 and 6 deal with linear estimation.

4.5 LEAST SQUARES

In forms (4.2.2) or (4.2.3) or (4.4.2) we shall be interested in E(Y|Xᵢ). The ith observation satisfies

Yᵢ = η(Xᵢ, β) + εᵢ          (4.5.1)

If E(εᵢ) = β₀, we can easily rewrite (4.5.1) as

Yᵢ = [η(Xᵢ, β) + β₀] + (εᵢ − β₀)          (4.5.2)

We see that (4.5.2) has the same form as (4.5.1) with η(Xᵢ, β) of (4.5.1) replaced by η(Xᵢ, β) + β₀ of (4.5.2) and εᵢ replaced by a new random variable εᵢ − β₀. If β₀ is known, it is just a number in (4.5.2). If β₀ is unknown, it plays the role of one of the β's. We shall lose nothing and gain simplicity if we assume εᵢ has expected value 0. We deal further with this question in Section 5.10.

If η(X, β) is a constant, say η(X, β) = β₀ for all X, the problem of estimation is one we handled in Chapter 3. Our estimate of β₀ is Ȳ, and if the ε's are normally distributed Ȳ is the best estimator of β₀ from many points of view. In looking for some property of Ȳ which is at the same time simple and generalizable, mathematicians a couple of centuries ago turned to the fact that, if we have a set of numbers Y₁, Y₂, ..., Yₙ, Ȳ is the value of the variable a which minimizes Σᵢ₌₁ⁿ (Yᵢ − a)². As estimator of β of (4.2.2) they chose that b that minimizes

S = ||Y − η(X, β)||² = Σᵢ₌₁ⁿ [Yᵢ − η(Xᵢ, β)]²          (4.5.3)

with respect to changes in β. This method of estimation is known as the ordinary least squares method. For (4.4.1) or (4.4.2) the computation of the least squares estimates of β can be described in a straightforward manner without using successive approximations or iterations. If b is unique, it is called the least squares estimator of β.

4.6 GAUSS-MARKOV ESTIMATION

If in (4.2.2) or (4.4.2) the ε's are independent but do not all have the same variance, and if we know the proportions σ₁² : σ₂² : ··· : σₙ², we would almost certainly wish to consider in place of Σᵢ₌₁ⁿ (Yᵢ − ηᵢ)²

S(β) = Σᵢ₌₁ⁿ (Yᵢ − ηᵢ)²/σᵢ²          (4.6.1)

in order that more accurate measurements be counted more heavily. The minimization of S with respect to β does not depend on the size of any σᵢ² but only on their proportions. In Chapter 5 we expand on this idea.

If the ε's are not independent, the weighting that suggests itself involves the covariances among the ε's as well as the variances. We shall deal with some such cases in later chapters. The sum of squares to be minimized is

S(β) = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ (Yᵢ − ηᵢ) Wᵢⱼ (Yⱼ − ηⱼ)          (4.6.2)

where the Wᵢⱼ's are elements of the inverse of the variance-covariance matrix of the ε's. See Section 6.1.7, Gauss-Markov theorem.
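The two criteria can be sketched numerically. In this illustration (the model, variances, and values are our own assumptions), b_OLS minimizes the ordinary sum of squares of (4.5.3), while b_GM minimizes the weighted form with W the inverse of the (here diagonal) error covariance matrix, as in (4.6.1) and (4.6.2).

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed linear model (4.4.2): Y = beta1*X1 + beta2*X2 + eps, X1 = 1.
n = 50
X = np.column_stack([np.ones(n), np.linspace(0, 1, n)])
beta_true = np.array([1.0, 3.0])
var = np.linspace(0.01, 0.25, n)          # unequal variances, known in proportion
eps = rng.normal(0, np.sqrt(var))
Y = X @ beta_true + eps

# Ordinary least squares: b minimizes ||Y - X b||^2, as in (4.5.3)
b_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Gauss-Markov / weighted least squares: minimize (Y - Xb)' W (Y - Xb),
# with W the inverse of the error covariance matrix, as in (4.6.2)
W = np.diag(1.0 / var)
b_gm = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)

print(np.round(b_ols, 2), np.round(b_gm, 2))  # both near (1, 3)
```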
4.7 SOME OTHER ESTIMATORS

If we know the form of the joint distribution of the εᵢ's and therefore of the Y's, we can seek joint maximum likelihood estimators of the βᵢ's. We may sometimes need to estimate parameters of the distribution of the εᵢ's in the process. Maximum likelihood estimators are discussed in Chapters 5 and 6 under various sets of assumptions about the distribution of the ε's. We shall find that, if the εᵢ's are normally independently distributed with zero mean and common variance, the maximum likelihood estimators turn out to be the ordinary least squares estimators. If the εᵢ's have a multivariate normal distribution, not necessarily with equal variances or zero covariances, but possessing known proportions among the variances and covariances, the maximum likelihood estimators are generalizations of least squares estimates.

If the conditional distribution of Y₁, ..., Yₙ given β₁, ..., βₚ can be described by a probability density function (or by a discrete probability function) which has continuous second partial derivatives with respect to each βᵢ and jointly with respect to each pair, then the first partial derivatives will be zero at the maximum likelihood estimate b_ML of β, that is,

∂f(Y|β)/∂βᵢ |_(b_ML) = 0 = ∂ ln f(Y|β)/∂βᵢ |_(b_ML)   for i = 1, 2, ..., p          (4.7.1)

If the form of the distribution of the random variable ε is known, and it is known that the parameter(s) (σ, for example) of the distribution of ε are chosen in accordance with a known probability distribution, and if we know or can adequately approximate the prior distribution of the parameters, we can use Bayes's theorem to obtain the posterior distribution, the MAP estimators, and the squared error loss estimators.

The MAP estimates are found by maximizing f(β|Y). If there is a unique β which maximizes f(β|Y), that is, if there is a unique mode, the MAP estimate of β is the mode of the posterior distribution of β. If the distribution of β is described by a probability density function which has continuous second partial derivatives with respect to each βᵢ and jointly with respect to each pair of βᵢ's, the first partial derivative will be zero at the mode, that is,

∂f(β|Y)/∂βᵢ |_(b_MAP) = 0   for i = 1, 2, ..., p          (4.7.2)

The estimator which minimizes the expected value of the square of the deviation of the estimator from the parameter being estimated is called the squared error loss Bayes estimator, and we symbolize it by b_SEL. For scalar β, if the expected value of the posterior distribution of β given Y exists, it is b_SEL; that is,

b_SEL = ∫₋∞^∞ β f(β|Y) dβ          (4.7.3)

Another possible estimator is the median of the posterior distribution. This estimator is associated with minimizing the expected absolute deviation of estimator from estimated.

If the posterior distribution of β is symmetric, the median and mean coincide. Some symmetric densities for scalar β are shown in Fig. 4.1. For such cases the b_SEL defined by (4.7.3) is given by the mean or median value of the conditional distribution of β given Y, f(β|Y). If the density f(β|Y), in addition to being symmetric, is also unimodal, the mean, median, and mode will all be at the same location. Hence when f(β|Y) is symmetric about the parameter vector β and is also unimodal, b_SEL is b_MAP. When the distribution is not symmetric or not unimodal, b_SEL and b_MAP are rarely the same. Some nonsymmetric unimodal probability densities are depicted in Fig. 4.2. Note that the modes do not coincide with the means. This causes the parameters b_SEL given by (4.7.3) and associated with the mean to be not equivalent to those given by the mode, which are indicated by (4.7.2).

Figure 4.1 Some symmetric conditional probability densities.

Figure 4.2 Some nonsymmetric conditional probability densities.

The conditional probability density f(β|Y) used in (4.7.2) can be written in terms of other densities using the form of Bayes's theorem written as

f(β|Y) = f(Y|β) f(β) / f(Y)          (4.7.4)

The probability density f(β) contains the prior information known regarding the parameter vector β. Notice that the parameters appear only in the numerator of the right side of (4.7.4); this numerator can also be written as

f(Y, β) = f(Y|β) f(β)          (4.7.5)

Then the necessary conditions given by (4.7.2) can be written equivalently as

∂ ln[f(Y, β)]/∂βᵢ |_(b_MAP) = ∂ ln[f(Y|β)]/∂βᵢ |_(b_MAP) + ∂ ln[f(β)]/∂βᵢ |_(b_MAP) = 0          (4.7.6)

since the maximum of f(Y, β) exists at the same location as the maximum of its natural logarithm.

The estimators b_MAP and b_SEL are described without reference to the linearity or nonlinearity of the expected value of Y in the β's nor to the independence of the Yᵢ's. Under some assumptions about the structure of η and under some assumptions about the prior distribution of the β's, the MAP and SEL procedures are equivalent in arithmetic to certain least squares or Gauss-Markov procedures.
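A small worked example may help fix the distinction between (4.7.2) and (4.7.3). Suppose, purely for illustration (this posterior is our own assumption, not from the text), that the posterior of a scalar β is a Beta(3, 7) density. Because the density is not symmetric, the mode (b_MAP) and the mean (b_SEL) differ.

```python
# For a Beta(a, b) posterior with a, b > 1 the mode and mean have closed forms;
# the mode maximizes f(beta|Y) as in (4.7.2), the mean is the integral (4.7.3).
a, b = 3.0, 7.0
b_map = (a - 1) / (a + b - 2)    # mode of the Beta(a, b) density
b_sel = a / (a + b)              # mean of the Beta(a, b) density

print(b_map)  # 0.25
print(b_sel)  # 0.3
```

The skewed posterior pulls the mean to the right of the mode, exactly the situation depicted in Fig. 4.2.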
4.8 COST

Methods of collecting data and analyzing them must be coordinated. If observations are expensive, sophisticated methods of analysis to extract all pertinent information are justified. Sometimes more expensive methods of collecting data yield net returns by drastically reducing the cost of analysis. Increased costs due to collecting more data or using more sophisticated methods of analysis may or may not reduce the cost occasioned by the degree to which the estimate is incorrect. Some remarks in Chapter 3 were directed to these matters.
4.9 MONTE CARLO METHODS

One method for investigating the effects of nonlinearity or various other effects that are difficult to analyze otherwise is called the Monte Carlo method. Actually, what we describe is sometimes referred to as the "crude" Monte Carlo method. More sophisticated Monte Carlo methods often provide the same amount of information as the crude method but at a lower cost [1].

The Monte Carlo method can be used to investigate the properties of a proposed estimation method. To simulate a series of experiments on the computer we proceed as follows:
1. Define the system by prescribing (a) the model equation, also called the regression function, (b) the way in which "errors" are incorporated in the model of the observations, (c) the probability distribution of all the errors and, where applicable, (d) a prior distribution. Assign "true" values to all the parameters (β) in the regression function and to those in the distribution of errors.

2. Select a set of values of the independent variables. Then calculate the associated set of "true" values of η from the regression equations.

3. Use the computer to produce a set of errors ε drawn from the prescribed probability distribution. For most computers programs are
available which can generate a stream of numbers that have all the important characteristics of successive independent observations on a population uniform over the interval (0, 1). Since they are generated by a deterministic scheme, they are not actually random. Such numbers are called pseudorandom numbers. Suitable transformations are used to obtain samples for any other distribution.

To obtain a sequence of pseudorandom observations on a normal population with expected value 0 and variance 1, we can make use of the Box-Muller transformation [2]. If U₂ᵢ₋₁ and U₂ᵢ are independent (0, 1) random numbers,

X₂ᵢ₋₁ = (−2 ln U₂ᵢ₋₁)^(1/2) cos(2πU₂ᵢ)          (4.9.1a)

and

X₂ᵢ = (−2 ln U₂ᵢ₋₁)^(1/2) sin(2πU₂ᵢ)          (4.9.1b)

are independent random observations on a normal distribution with expected value 0 and variance 1. The normal random numbers are then adjusted to have the desired variances and covariances.

The simulated measurements are obtained by combining the errors with the regression values. For additive errors, the ith error is simply added to the ith η value. This then provides the simulated measurements.
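Step 3 can be sketched directly from (4.9.1a) and (4.9.1b); the seed and sample size below are arbitrary choices.

```python
import math
import random

random.seed(0)

# Box-Muller transformation (4.9.1a, b): two independent uniform (0, 1)
# pseudorandom numbers yield two independent standard normal observations.
def box_muller(u1, u2):
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

pairs = [box_muller(random.random(), random.random()) for _ in range(50_000)]
xs = [x for pair in pairs for x in pair]

mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(round(mean, 1), round(var, 1))  # ~ 0.0 and 1.0
```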
4. Acting as though the parameters are unknown, we estimate the parameters, denoting the estimates b.

5. Replicate the series of simulated experiments N times by repeating steps 3 and 4, each time with a new set of errors.

6. We use appropriate methods to estimate properties of the distribution of parameter estimates. (We consider the estimates actually obtained by our pseudorandom number scheme to be a random sample from the distribution of all possible estimates.) The expected value of our parameter estimator is estimated by the mean of our parameter estimates,

b̄ⱼ = (1/N) Σᵢ₌₁ᴺ bᵢⱼ          (4.9.2)

where bᵢⱼ is the jth component of the b found on the ith replication. If b may be a biased estimator, b̄ − β is an estimate of the bias. If it is not clear whether or not b is biased, the size of b̄ − β needs to be compared with an estimate of its variance-covariance matrix.

The variances and covariances of the distribution of b may be estimated by

est. cov(bⱼ, bₖ) = [1/(N−1)] Σᵢ₌₁ᴺ (bᵢⱼ − b̄ⱼ)(bᵢₖ − b̄ₖ)          (4.9.3a)

If b is known to be unbiased, we can make use of our knowledge of β and use a slightly more efficient estimator,

est. cov(bⱼ, bₖ) = (1/N) Σᵢ₌₁ᴺ (bᵢⱼ − βⱼ)(bᵢₖ − βₖ)          (4.9.3b)

If b is biased, the right side of (4.9.3b) gives estimates of mean square error and corresponding product moments, which may be more interesting than variances and covariances. If we use actual experiments rather than simulated ones, (4.9.3b) will not be available although (4.9.2) and (4.9.3a) are.
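The six steps can be sketched end to end. In this illustration (the two-parameter model and all numerical values are our own assumptions, not the book's), each replication simulates data, estimates β by least squares, and then (4.9.2), (4.9.3a), and (4.9.3b) are applied to the N replicates.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed model for the sketch: eta = beta1 + beta2*X, additive normal errors.
beta_true = np.array([1.0, 2.0])
X = np.column_stack([np.ones(5), np.arange(1.0, 6.0)])  # step 2: chosen X values
N = 1000

b = np.empty((N, 2))
for i in range(N):
    y = X @ beta_true + rng.normal(0, 0.5, size=5)   # step 3: simulated data
    b[i], *_ = np.linalg.lstsq(X, y, rcond=None)     # step 4: estimate
                                                     # step 5: the loop replicates

b_bar = b.mean(axis=0)                               # (4.9.2): estimated E(b)
est_cov = (b - b_bar).T @ (b - b_bar) / (N - 1)      # (4.9.3a): estimated cov(b)
est_mse = (b - beta_true).T @ (b - beta_true) / N    # (4.9.3b): uses known beta

print(np.round(b_bar, 1))   # ~ [1. 2.]
```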
The flexibility of the above simulation procedure is great. We can estimate the sample properties for any model, linear or nonlinear, and for any parameter values. We can estimate the effect of different probability distributions upon ordinary least squares estimation or other estimation methods. Many other possibilities also exist. An example of a Monte Carlo simulation is given below and another one is given in Section 6.9. These simulations can be accomplished on a modern high-speed computer at a small fraction of the cost, in time and money, of a comparable set of physical experiments.

The great power of the Monte Carlo procedure is that we can investigate the properties of estimators in cases for which the character of the estimators cannot be derived. To demonstrate the validity of a Monte Carlo procedure an example is considered which is simple enough to be
analyzed without recourse to simulation. We investigate estimating β in the model ηᵢ = βXᵢ for the case of additive, zero mean, constant variance, uncorrelated errors; that is,

E(εᵢ) = 0,  V(εᵢ) = σ²,  cov(εᵢ, εⱼ) = 0 for i ≠ j

The distribution of εᵢ is uniform in the interval (−.5, .5); each εᵢ is found using a pseudorandom number generator. There are no errors in Xᵢ and there is no prior information.

The Xᵢ values are Xᵢ = i for i = 1, 2, ..., 10 and β = 1. For the kth set of simulated measurements, βₖ* is found using the ordinary least squares estimator,

βₖ* = [Σᵢ₌₁¹⁰ Xᵢ Yᵢₖ][Σᵢ₌₁¹⁰ Xᵢ²]⁻¹

The estimated expected value of βₖ*, (4.9.2), the estimated variance of βₖ*, (4.9.3a), and the estimated mean square error of βₖ*, (4.9.3b), are obtained by using

β̄* = (1/N) Σₖ₌₁ᴺ βₖ*

est. V(β*) = [1/(N−1)] Σₖ₌₁ᴺ (βₖ* − β̄*)²

est. mean square error (β*) = (1/N) Σₖ₌₁ᴺ (βₖ* − 1)²

For independent sets of errors, estimates were calculated for N = 5, 25, 50, 100, 200, and 500. The results are shown in Table 4.1 where the estimated standard deviation and estimated root mean square error are given rather than their squares. In Table 4.2 comparable results for a simulation involving normal errors are given. The variance of εᵢ in this case was taken as 1/12, the same as the variance for the uniform case.

Table 4.1  Monte Carlo Simulation for ηᵢ = βXᵢ, with β = 1 and Xᵢ = i, i = 1, 2, ..., 10. Uniform Distribution of Errors

Sample Size N    β̄*       Est. Std Dev (β*)    Est. Root Mean Square Error (β*)
5                1.0044    0.00950              0.00958
25               1.0014    0.01616              0.01589
50               0.9992    0.01350              0.01339
100              0.9996    0.01425              0.01418
200              1.0018    0.01440              0.01448
500              0.9987    0.01415              0.01419

Table 4.2  Monte Carlo Simulation for ηᵢ = βXᵢ, with β = 1 and Xᵢ = i, i = 1, 2, ..., 10. Normal Distribution of Errors

Sample Size N    β̄*       Est. Std Dev (β*)    Est. Root Mean Square Error (β*)
5                1.0021    0.01156              0.01055
25               0.9969    0.01608              0.01606
50               0.9972    0.01496              0.01507
100              0.9973    0.01486              0.01502
200              0.9995    0.01410              0.01407
500              0.9997    0.01480              0.01478

In both Tables 4.1 and 4.2 the sample mean β̄* tends to approach the true value of 1 as N becomes large. Hence β* is an unbiased estimator of β. Also the estimated standard error of β* and the estimated root mean square error tend to their common exact value

[σ²(Σᵢ₌₁¹⁰ Xᵢ²)⁻¹]^(1/2) = {(1/12)/385}^(1/2) = 0.014712
This example shows that the number of simulations N must be quite large in order to provide accurate estimates of the variance of the parameter estimate. Such simulations are still inexpensive compared to actual experiments to determine the variance. Moreover, methods are available for making the simulation procedure more efficient [1].
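The example above is easy to rerun. The sketch below reproduces the N = 500 row of Table 4.1 in spirit (the seed is arbitrary, so the sample statistics will differ slightly from the tabulated values) and checks the exact standard error [(1/12)/385]^(1/2).

```python
import numpy as np

rng = np.random.default_rng(4)

# eta_i = beta*X_i with beta = 1 and X_i = i for i = 1..10,
# uniform (-.5, .5) errors, N = 500 replications.
x = np.arange(1, 11, dtype=float)
N = 500

b = np.empty(N)
for k in range(N):
    y = x + rng.uniform(-0.5, 0.5, size=10)   # beta = 1 plus uniform errors
    b[k] = (x @ y) / (x @ x)                  # ordinary least squares estimator

exact = ((1 / 12) / (x ** 2).sum()) ** 0.5    # [(1/12)/385]**(1/2)
print(round(exact, 6))                        # 0.014712
print(round(b.std(ddof=1), 3))                # near 0.015 (sampling varies)
```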
REFERENCES

1. Hammersley, J. M. and Handscomb, D. C., Monte Carlo Methods, Methuen & Co. Ltd., London, 1964.

2. Box, G. E. P. and Muller, M. E., "A Note on the Generation of Random Normal Deviates," Ann. Math. Stat., 29 (1958), 610-611.