… parameter of the exponential distribution with density function f(x|λ) = λ⁻¹ e^{-x/λ} for x > 0 (and 0 otherwise) will have a rejection region which is the solution of an inequality of the form (x/λ₀) exp(-x/λ₀) ≤ constant.

3.15  If f_{X|p}(x|p) = pˣ(1 − p)^{1−x}, x = 0, 1, and ℓ(p̂, p) = (p̂ − p)²,
(a) find r[p, p̂₁(x)] for p̂₁(x) = (x + 1)/3;
(b) find r[p, p̂₀(x)] for p̂₀(x) = …;
(c) if g(p) = 2(1 − p), 0 ≤ p ≤ 1, find E(r[g(p), p̂₁(x)]).

3.16  If X has a binomial distribution with n = 4, if p̂ = (x + 1)/6, and if ℓ(p̂, p) = (p̂ − p)², find r(p̂, p).

3.17  From a normal distribution with standard deviation 5 a sample of 16 independent observations was obtained. x̄ was calculated and found to be 31.5. Find a 90% confidence interval for μ. Answer: (29.44, 33.56).

3.18  From a normal distribution a sample of 16 was obtained. x̄ and s were calculated and found to be 31.5 and 5, respectively. Find a 90% confidence interval for μ. Answer: (29.32, 33.68).

3.19  From a normal distribution with standard deviation 5 a sample of 16 independent observations was obtained. x̄ was calculated and found to be 31.5. Test at the 5% level of significance the hypothesis that μ = 30 against alternatives that μ ≠ 30.

3.20  If f(x|θ) = 1 + θ²(x − 0.5), 0 ≤ x ≤ 1, 0 ≤ θ ≤ √2, and, using one observation, a rejection region x > .9 was decided upon, find the power function of this test. Answer: .1 + .045θ².


CHAPTER 4  PARAMETER ESTIMATION METHODS

4.1 INTRODUCTION

In this chapter we introduce some of the concepts which we shall develop in the remainder of the book. Some canonical forms for the models of problems of parameter estimation are presented, least squares estimators are described, and modifications suggested by various criteria for good estimators are mentioned. We close the chapter with a short discussion of simulation techniques for comparing methods of estimation.

4.2 RELATIONS BETWEEN OBSERVED RANDOM VARIABLES AND PARAMETERS

We assume a functional relationship among several measurable variables (Y, X_1, ..., X_k), one or more parameters (β_1, β_2, ..., β_p), and particular values of one or more random variables (ε, ε^(1), ..., ε^(m)). (In this book we deal almost exclusively with cases in which each observation involves only one random variable; thus we have no need for the index indicated by the superscript.) The measurement of the measurable variables provides the observations. The parameters are unknown and we wish to estimate at least some of them. The particular values of the random variables ε are also unknown. We may estimate the ε's themselves in order to get a picture of how well the estimates fit with our preconception of the distribution of the ε's.

It is convenient to pick one of the measurable variables and express this variable in terms of the others. It is the picked variable with which we associate the pronoun Y. Thus we write

    Y = \eta(X_1, \ldots, X_k; \beta_1, \ldots, \beta_p; \varepsilon)                                   (4.2.1)

The variable to be called Y is traditionally chosen because we are interested in how its value is affected by the values assumed by X_1, ..., X_k. It is traditionally called the dependent variable. The variables X_1, ..., X_k are called the independent variables.
(Be clear in noting that "independent" in this context does not mean independence in the statistical sense.) The X's may be thought of as the causes of the Y, as when Y represents the yield in a chemical process into which amounts X_1, ..., X_k of material from sources 1, ..., k are combined. The X's may merely describe the physical environment, as when Y represents temperature at point (X_1, X_2, X_3) in space at time X_4.

We are fortunate if the ε's can be combined into one ε and especially fortunate if the errors are additive, that is, if we can write

    Y = \eta(X_1, \ldots, X_k; \beta_1, \ldots, \beta_p) + \varepsilon                                   (4.2.2)

in which the distribution of ε does not depend on the unknown β's, although it may depend on parameters which do not appear in the other term of (4.2.2). It will sometimes be convenient to index the β's by integers beginning with 0; thus β_0, β_1, ..., β_{p-1}. It will also be convenient to use vector notation to abbreviate; thus

    X = (X_1, \ldots, X_k),  \quad  \beta = (\beta_1, \ldots, \beta_p)  \text{ or }  (\beta_0, \beta_1, \ldots, \beta_{p-1})  \text{ as appropriate,}

and

    Y = \eta(X, \beta) + \varepsilon                                   (4.2.3)

Thus the ith observation will be signified by adding a subscript i to (Y, X_1, ..., X_k), that is, Y_i, X_{i1}, ..., X_{ik}; from (4.2.2) or (4.2.3), ε_i is such that

    Y_i = \eta(X_{i1}, \ldots, X_{ik}; \beta_1, \ldots, \beta_p) + \varepsilon_i = \eta(X_i, \beta) + \varepsilon_i = \eta_i + \varepsilon_i                                   (4.2.4)

It is convenient to introduce in the last relation one further abbreviation, η_i. And yet one more abbreviation is to use Y for (Y_1, ..., Y_n), ε for (ε_1, ..., ε_n), and X for the matrix

    X = \begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1k} \\ X_{21} & X_{22} & \cdots & X_{2k} \\ \vdots & \vdots &        & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{nk} \end{bmatrix} = (X_1, X_2, \ldots, X_k)                                   (4.2.5)

Note that the X_j's in this matrix notation are column vectors, each containing the n observed values of one independent variable.

The distribution of the ε's is generally unknown. If the ε_i's are correlated, estimation of parameters may be much more difficult, the estimators less reliable, and the reliability difficult to assess. We therefore deal first with cases of independent ε_i's, later investigating estimation under various assumptions regarding correlation. For those methods of estimation which require some assumption about the form of the distribution of the ε_i's, or when the evaluation of the method requires assumptions about the form, we shall invariably investigate first under the assumption of normality. Usually we go no further. Fortunately many aspects of the relationships between sample moments and moments of the random variable being investigated do not depend on details of the form of the distribution.

One trouble arises from too great a preoccupation with cases in which the ε's may be assumed to be normally distributed. Different criteria for evaluating estimators may be expected to lead to different choices of estimators to be used. We seem frequently to deal with criteria which in general do suggest somewhat different estimators but which, when applied to a case in which normal ε's are assumed, lead to identical estimators. The casual student is sometimes misled to assume that if the estimators are the same the criteria must be essentially equivalent. Beware. We have used and shall use assumptions other than normality not only to illuminate the differences but also to give insight into the effect of wrongly assuming normality.

To estimate parameters we must first gather data and then analyze them. It is essential that experimental procedures be such that the data can be analyzed. The form of the function to be used in (4.2.1) and the experimental procedure must be developed together. "Design of experiments" deals, for the most part, with choosing values of the X's to facilitate analysis and to improve accuracy of estimation.

4.3 EXPECTED VALUES, VARIANCES, COVARIANCES

With a sequence of random variables ε_1, ..., ε_n we associate expected values E(ε_1), ..., E(ε_n), variances V(ε_1), ..., V(ε_n), and covariances cov(ε_1, ε_2), cov(ε_1, ε_3), ..., cov(ε_{n-1}, ε_n). In Chapter 6 we use vector and matrix forms to save space and to increase clarity:

    E(\varepsilon) = [E(\varepsilon_1), E(\varepsilon_2), \ldots, E(\varepsilon_n)]                                   (4.3.1)

    \operatorname{cov}(\varepsilon) = \begin{bmatrix} V(\varepsilon_1) & \operatorname{cov}(\varepsilon_1,\varepsilon_2) & \cdots & \operatorname{cov}(\varepsilon_1,\varepsilon_n) \\ \operatorname{cov}(\varepsilon_1,\varepsilon_2) & V(\varepsilon_2) & \cdots & \operatorname{cov}(\varepsilon_2,\varepsilon_n) \\ \vdots & \vdots & & \vdots \\ \operatorname{cov}(\varepsilon_1,\varepsilon_n) & \operatorname{cov}(\varepsilon_2,\varepsilon_n) & \cdots & V(\varepsilon_n) \end{bmatrix}                                   (4.3.2)

In cov(ε) it is convenient to think of each covariance as occupying two positions symmetrically situated with respect to the main diagonal.
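As a concrete illustration of the vector and matrix forms (4.3.1) and (4.3.2), the short sketch below (not part of the original text) simulates repeated realizations of a small error vector with an arbitrarily chosen correlation structure and estimates E(ε) and cov(ε) from the sample. The vector length n, the number of realizations, and the AR(1)-style error recursion are all assumptions made only for this illustration.

```python
# A minimal sketch illustrating (4.3.1) and (4.3.2): estimating E(eps) and
# cov(eps) from repeated samples of an error vector.  The error model below
# (an AR(1)-like recursion) is an arbitrary assumption chosen only so that
# the off-diagonal covariances are nonzero.
import numpy as np

rng = np.random.default_rng(0)
n = 4             # length of the error vector (eps_1, ..., eps_n)
n_rep = 100_000   # number of simulated realizations

# Build correlated errors: eps_i = 0.5 * eps_{i-1} + w_i with w_i ~ N(0, 1).
w = rng.standard_normal((n_rep, n))
eps = np.empty_like(w)
eps[:, 0] = w[:, 0]
for i in range(1, n):
    eps[:, i] = 0.5 * eps[:, i - 1] + w[:, i]

print("estimated E(eps):", eps.mean(axis=0))                # cf. (4.3.1), near zero
print("estimated cov(eps):\n", np.cov(eps, rowvar=False))   # cf. (4.3.2), symmetric
```

Note that in the printed covariance matrix each covariance indeed appears twice, symmetrically about the main diagonal.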
4.4 LINEAR PROBLEMS

If we can write η(X, β) in the form

    \eta(X, \beta) = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p                                   (4.4.1)

so that

    Y = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon                                   (4.4.2)

we find the problem of estimating β simpler than otherwise. At the same time, if η is not linear in its parameters, we may find a form such as (4.4.1) a useful approximation to η(X, β) in the neighborhood of some particular value of X and β. Chapters 5 and 6 deal with linear estimation.

4.5 LEAST SQUARES

In forms (4.2.2) or (4.2.3) or (4.4.2) we shall be interested in E(Y_i | X_i). It is

    E(Y_i \mid X_i) = \eta(X_i, \beta)  \quad  \text{when }  E(\varepsilon_i) = 0                                   (4.5.1)

If instead E(ε_i) = μ_0, we can easily rewrite the model as

    Y_i = [\eta(X_i, \beta) + \mu_0] + (\varepsilon_i - \mu_0)                                   (4.5.2)

We see that (4.5.2) has the same form as (4.5.1) with η(X_i, β) of (4.5.1) replaced by η(X_i, β) + μ_0 of (4.5.2) and ε_i replaced by a new random variable ε_i − μ_0. If μ_0 is known, it is just a number in (4.5.2). If μ_0 is unknown it plays the role of one of the β's. We shall lose nothing and gain simplicity if we assume ε_i has expected value 0. We deal further with this question in Section 5.10.

If η(X, β) is a constant, say η(X, β) = β_0 for all X, the problem of estimation is one we handled in Chapter 3. Our estimate of β_0 is Ȳ, and if the ε's are normally distributed Ȳ is the best estimator of β_0 from many points of view. In looking for some property of Ȳ which is at the same time simple and generalizable, mathematicians a couple of centuries ago turned to the fact that, if we have a set of numbers Y_1, Y_2, ..., Y_n, Ȳ is the value of the variable μ which minimizes Σ_{i=1}^{n} (Y_i − μ)². As estimator of β of (4.2.2) they chose that b which minimizes

    S = \| Y - \eta(X, \beta) \|^2 = \sum_{i=1}^{n} [Y_i - \eta(X_i, \beta)]^2                                   (4.5.3)

with respect to changes in β. This method of estimation is known as the ordinary least squares method. For (4.4.1) or (4.4.2) the computation of the least squares estimates of β can be described in a straightforward manner without using successive approximations or iterations. If b is unique, it is called the least squares estimator of β.

4.6 GAUSS-MARKOV ESTIMATION

If in (4.2.2) or (4.4.2) the ε's are independent but do not all have the same variance, and if we know the proportion σ_1² : σ_2² : ... : σ_n², we would almost certainly wish to consider in place of Σ_{i=1}^{n} (Y_i − η_i)² the sum

    S(\beta) = \sum_{i=1}^{n} \frac{(Y_i - \eta_i)^2}{\sigma_i^2}                                   (4.6.1)

in order that more accurate measurements be counted more heavily. The minimization of S with respect to β does not depend on the size of any σ_i² but only on their proportion. In Chapter 5 we expand on this idea.

If the ε's are not independent, the weighting that suggests itself involves the covariances among the ε's as well as the variances. We shall deal with some such cases in later chapters. The sum of squares to be minimized is

    S(\beta) = \sum_{i=1}^{n} \sum_{j=1}^{n} (Y_i - \eta_i) \, W_{ij} \, (Y_j - \eta_j)                                   (4.6.2)

where the W_{ij}'s are elements of the inverse of the variance-covariance matrix of the ε's; see the Gauss-Markov theorem in Section 6.1.7.
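To make the two sums of squares concrete, the following sketch fits a simulated linear model of the form (4.4.2) twice: once by minimizing the ordinary least squares sum (4.5.3) and once by minimizing the weighted sum (4.6.2) with W taken as the inverse of the (here diagonal) error covariance. It is an illustration rather than a procedure given in the text; the closed-form solutions it uses are the normal-equation results developed in Chapters 5 and 6, and the simulated data and variances are arbitrary.

```python
# A minimal sketch (assumed linear model, simulated data, normal-equation
# solutions from Chapters 5 and 6) contrasting ordinary least squares (4.5.3)
# with the weighted sum of squares (4.6.2).
import numpy as np

rng = np.random.default_rng(1)
n = 20
X = np.column_stack([np.ones(n), np.arange(1, n + 1)])   # columns X_1, X_2
beta_true = np.array([2.0, 0.5])

# Unequal but known error variances: the Gauss-Markov setting of Section 4.6.
sigma2 = np.linspace(0.1, 1.0, n)
Y = X @ beta_true + rng.normal(0.0, np.sqrt(sigma2))

# Ordinary least squares: minimize S = sum_i [Y_i - eta_i]^2.
b_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Weighted least squares: minimize (4.6.2) with W = inverse of cov(eps),
# diagonal here because the errors are independent.
W = np.diag(1.0 / sigma2)
b_gm = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)

print("true beta:              ", beta_true)
print("ordinary least squares: ", b_ols)
print("Gauss-Markov (weighted):", b_gm)
```

If all the variances are set equal, the two estimates coincide, which is one simple check that the weighting only matters through the proportions of the σ_i².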
4.7 SOME OTHER ESTIMATORS

If we know the form of the joint distribution of the ε_i's and therefore of the Y's, we can seek joint maximum likelihood estimators of the β_i's. We may sometimes need to estimate parameters of the distribution of the ε_i's in the process. Maximum likelihood estimators are discussed in Chapters 5 and 6 under various sets of assumptions about the distribution of the ε's. We shall find that, if the ε_i's are normally and independently distributed with zero mean and common variance, the maximum likelihood estimators turn out to be the ordinary least squares estimators. If the ε_i's have a multivariate normal distribution, not necessarily with equal variances or zero covariances, but possessing known proportions among the variances and covariances, the maximum likelihood estimators are generalizations of least squares estimates.

If the conditional distribution of Y_1, ..., Y_n given β_1, ..., β_p can be described by a probability density function (or by a discrete probability function) which has continuous second partial derivatives with respect to each β_i and jointly with respect to each pair, then the first partial derivatives will be zero at the maximum likelihood estimate b_ML of β; that is,

    \left. \frac{\partial f(Y|\beta)}{\partial \beta_i} \right|_{b_{ML}} = 0 = \left. \frac{\partial \ln f(Y|\beta)}{\partial \beta_i} \right|_{b_{ML}}  \quad  \text{for } i = 1, 2, \ldots, p                                   (4.7.1)

If the form of the distribution of the random variable ε is known, if it is known that the parameter(s) of the distribution of ε (σ, for example) are chosen in accordance with a known probability distribution, and if we know we can adequately approximate the prior distribution of the parameters, we can use Bayes's theorem to obtain the posterior distribution, the MAP estimators, and the squared error loss estimators. The MAP estimates are found by maximizing f(β|Y). If there is a unique β which maximizes f(β|Y), that is, if there is a unique mode, the MAP estimate of β is the mode of the posterior distribution of β. If the distribution of β is described by a probability density function which has continuous second partial derivatives with respect to each β_i and jointly with respect to each pair of β_i's, the first partial derivative will be zero at the mode; that is,

    \left. \frac{\partial f(\beta|Y)}{\partial \beta_i} \right|_{b_{MAP}} = 0  \quad  \text{for } i = 1, 2, \ldots, p                                   (4.7.2)

The estimator which minimizes the expected value of the square of the deviation of the estimator from the parameter being estimated is called the squared error loss Bayes estimator, and we symbolize it by b_SEL. For scalar β, if the expected value of the posterior distribution of β given Y exists, it is b_SEL; that is,

    b_{SEL} = \int_{-\infty}^{\infty} \beta \, f(\beta|Y) \, d\beta                                   (4.7.3)

Another possible estimator is the median of the posterior distribution. This estimator is associated with minimizing the expected absolute deviation of estimator from estimated. If the posterior distribution of β is symmetric, the median and mean coincide. Some symmetric densities for scalar β are shown in Fig. 4.1. For such cases the b_SEL vector defined by (4.7.3) is given by the mean or median value of the conditional distribution of β given Y, f(β|Y). If the density f(β|Y), in addition to being symmetric, is also unimodal, the mean, median, and mode will all be at the same location. Hence when f(β|Y) is symmetric about the parameter vector β and is also unimodal, b_SEL is b_MAP.

[Figure 4.1  Some symmetric conditional probability densities.]

When the distribution is not symmetric or not unimodal, b_SEL and b_MAP are rarely the same. Some nonsymmetric unimodal probability densities are depicted in Fig. 4.2. Note that the modes do not coincide with the means. This causes the estimates b_SEL given by (4.7.3), associated with the mean, to be not equivalent to those given by the mode, which are indicated by (4.7.2).

[Figure 4.2  Some nonsymmetric conditional probability densities.]
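A small numerical sketch may help fix the distinction among the three point estimates. The posterior densities below are arbitrary illustrative shapes, not distributions derived in the text; for each one the sketch computes the posterior mean b_SEL of (4.7.3), the posterior median, and the posterior mode b_MAP of (4.7.2) on a grid.

```python
# A minimal sketch (the posterior shapes are arbitrary illustrations) comparing
# the three point estimates of Section 4.7 for a scalar parameter: the posterior
# mean b_SEL of (4.7.3), the posterior median, and the posterior mode b_MAP.
import numpy as np

beta = np.linspace(0.0, 10.0, 20001)
d_beta = beta[1] - beta[0]

def summarize(density, label):
    f = density / (density.sum() * d_beta)       # normalize f(beta | Y) on the grid
    mean = (beta * f).sum() * d_beta             # b_SEL, eq. (4.7.3)
    cdf = np.cumsum(f) * d_beta
    median = beta[np.searchsorted(cdf, 0.5)]     # minimizes expected absolute error
    mode = beta[np.argmax(f)]                    # b_MAP, eq. (4.7.2)
    print(f"{label}: mean={mean:.3f}  median={median:.3f}  mode={mode:.3f}")

# Symmetric, unimodal posterior: mean, median, and mode coincide (cf. Fig. 4.1).
summarize(np.exp(-0.5 * (beta - 5.0) ** 2), "symmetric")
# Skewed, unimodal posterior: the three estimates differ (cf. Fig. 4.2).
summarize(beta ** 2 * np.exp(-beta), "skewed   ")
```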
The conditional probability density f(β|Y) used in (4.7.2) can be written in terms of other densities using the form of Bayes's theorem written as

    f(\beta|Y) = \frac{f(Y|\beta) \, f(\beta)}{f(Y)}                                   (4.7.4)

The probability density f(β) contains the prior information known regarding the parameter vector β. Notice that the parameters appear only in the numerator of the right side of (4.7.4); this numerator can also be written as

    f(Y, \beta) = f(Y|\beta) \, f(\beta)                                   (4.7.5)

Then the necessary conditions given by (4.7.2) can be written equivalently as

    \left. \frac{\partial \ln f(Y, \beta)}{\partial \beta_i} \right|_{b_{MAP}} = \left. \frac{\partial \ln f(Y|\beta)}{\partial \beta_i} \right|_{b_{MAP}} + \left. \frac{\partial \ln f(\beta)}{\partial \beta_i} \right|_{b_{MAP}} = 0                                   (4.7.6)

since the maximum of f(Y, β) occurs at the same location as the maximum of its natural logarithm.

The estimators b_MAP and b_SEL are described without reference to the linearity or nonlinearity of the expected value of Y in the β's, nor to the independence of the Y_i's. Under some assumptions about the structure of η and under some assumptions about the prior distribution of the β's, the MAP and SEL procedures are equivalent in arithmetic to certain least squares or Gauss-Markov procedures.
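The following sketch illustrates that last remark for one simple case. Assuming, purely for illustration, a scalar parameter with a normal likelihood and a normal prior, the MAP condition (4.7.6) has a closed-form solution that looks like a penalized least squares estimate; a grid search over the log posterior confirms it. The model, the prior, and all numerical values are assumptions, not results from the text.

```python
# A minimal sketch (model, prior, and all numbers are illustrative assumptions)
# of the MAP condition (4.7.6) for a scalar parameter: with Y_i = beta*X_i + eps_i,
# eps_i ~ N(0, sigma^2), and a normal prior beta ~ N(mu0, tau^2), maximizing
# ln f(Y|beta) + ln f(beta) gives a least-squares-like closed form.
import numpy as np

rng = np.random.default_rng(2)
X = np.arange(1.0, 11.0)
sigma2, mu0, tau2 = 0.5, 0.8, 0.04            # assumed error variance and prior
Y = 1.0 * X + rng.normal(0.0, np.sqrt(sigma2), X.size)

# Closed form obtained by setting (4.7.6) to zero for this normal-normal case.
b_map = (X @ Y / sigma2 + mu0 / tau2) / (X @ X / sigma2 + 1.0 / tau2)

# Direct numerical maximization of ln f(Y|beta) + ln f(beta) on a grid.
grid = np.linspace(0.0, 2.0, 20001)
log_post = (-0.5 * ((Y[:, None] - grid * X[:, None]) ** 2).sum(axis=0) / sigma2
            - 0.5 * (grid - mu0) ** 2 / tau2)

print("closed-form b_MAP:", b_map)
print("grid-search b_MAP:", grid[np.argmax(log_post)])
```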
4.8 COST

Methods of collecting data and analyzing them must be coordinated. If observations are expensive, sophisticated methods of analysis to extract all pertinent information are justified. Sometimes more expensive methods of collecting data yield net returns by drastically reducing the cost of analysis. Increased costs due to collecting more data or using more sophisticated methods of analysis may or may not reduce the cost occasioned by the degree to which the estimate is incorrect. Some remarks in Chapter 3 were directed to these matters.

4.9 MONTE CARLO METHODS

One method for investigating the effects of nonlinearity or various other effects that are difficult to analyze otherwise is called the Monte Carlo method. Actually, what we describe is sometimes referred to as the "crude" Monte Carlo method. More sophisticated Monte Carlo methods often provide the same amount of information as the crude method but at a lower cost [1]. The Monte Carlo method can be used to investigate the properties of a proposed estimation method. To simulate a series of experiments on the computer we proceed as follows (a short computational sketch of these steps appears just before the example below):

1. Define the system by prescribing (a) the model equation, also called the regression function, (b) the way in which "errors" are incorporated in the model of the observations, (c) the probability distribution of all the errors and, where applicable, (d) a prior distribution. Assign "true" values to all the parameters (β) in the regression function and to those in the distribution of error.

2. Select a set of values of the independent variables. Then calculate the associated set of "true" values of η from the regression equations.

3. Use the computer to produce a set of errors ε drawn from the prescribed probability distribution. For most computers, programs are available which can generate a stream of numbers that have all the important characteristics of successive independent observations on a population uniform over the interval (0, 1). Since they are generated by a deterministic scheme, they are not actually random. Such numbers are called pseudorandom numbers. Suitable transformations are used to obtain samples from any other distribution. To obtain a sequence of pseudorandom observations on a normal population with expected value 0 and variance 1, we can make use of the Box-Muller transformation [2]. If u_{2i-1} and u_{2i} are independent uniform (0, 1) random numbers,

    x_{2i-1} = (-2 \ln u_{2i-1})^{1/2} \cos(2\pi u_{2i})                                   (4.9.1a)

and

    x_{2i} = (-2 \ln u_{2i-1})^{1/2} \sin(2\pi u_{2i})                                   (4.9.1b)

are independent random observations on a normal distribution with expected value 0 and variance 1. The normal random numbers are then adjusted to have the desired variances and covariances. The simulated measurements are obtained by combining the errors with the regression values. For additive errors, the ith error is simply added to the ith η value. This then provides the simulated measurements.

4. Acting as though the parameters are unknown, we estimate the parameters, denoting the estimates β*.

5. Replicate the series of simulated experiments N times by repeating steps 3 and 4, each time with a new set of errors.

6. Use appropriate methods to estimate properties of the distribution of the parameter estimates. (We consider the estimates actually obtained by our pseudorandom number scheme to be a random sample from the distribution of all possible estimates.) The expected value of our parameter estimator is estimated by the mean of our parameter estimates,

    \bar\beta_j^* = \frac{1}{N} \sum_{i=1}^{N} \beta_{ji}^*                                   (4.9.2)

where β*_{ji} is the jth component of the β* found on the ith replication. β* may be a biased estimator; β̄* − β is an estimate of the bias. If it is not clear whether or not β* is biased, the size of β̄* − β needs to be compared with an estimate of its variance-covariance matrix.

The variances and covariances of the distribution of β* may be estimated by

    \text{est. cov}(\beta_j^*, \beta_k^*) = \frac{1}{N-1} \sum_{i=1}^{N} (\beta_{ji}^* - \bar\beta_j^*)(\beta_{ki}^* - \bar\beta_k^*)                                   (4.9.3a)

If β* is known to be unbiased, we can make use of our knowledge of β and use a slightly more efficient estimator,

    \text{est. cov}(\beta_j^*, \beta_k^*) = \frac{1}{N} \sum_{i=1}^{N} (\beta_{ji}^* - \beta_j)(\beta_{ki}^* - \beta_k)                                   (4.9.3b)

If β* is biased, the right side of (4.9.3b), which gives estimates of the mean square errors and the corresponding product moments, may be more interesting than the variances and covariances. If we use actual experiments rather than simulated ones, (4.9.3b) will not be available although (4.9.2) and (4.9.3a) are.

The flexibility of the above simulation procedure is great. We can estimate the sample properties for any model, linear or nonlinear, and for any parameter values. We can estimate the effect of different probability distributions upon ordinary least squares estimation or other estimation methods. Many other possibilities also exist. An example of a Monte Carlo simulation is given below and another one is given in Section 6.9. These simulations can be accomplished on a modern high-speed computer at a small fraction of the cost, in time and money, of a comparable set of physical experiments. The great power of the Monte Carlo procedure is that we can investigate the properties of estimators in cases for which the character of the estimators cannot be derived.
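Before turning to the tabulated example, here is a minimal computational sketch of the six-step procedure for the model η_i = βX_i treated below, with errors uniform on (−.5, .5) and the ordinary least squares estimator. The random seed, the choice N = 500, and the helper name box_muller are arbitrary.

```python
# A minimal sketch of the crude Monte Carlo procedure of Section 4.9 applied to
# eta_i = beta * X_i with beta = 1, X_i = i for i = 1,...,10, additive errors
# uniform on (-.5, .5), and ordinary least squares estimation of beta.
import numpy as np

rng = np.random.default_rng(3)

def box_muller(u1, u2):
    """Box-Muller transformation, eqs. (4.9.1a) and (4.9.1b)."""
    r = np.sqrt(-2.0 * np.log(u1))
    return r * np.cos(2.0 * np.pi * u2), r * np.sin(2.0 * np.pi * u2)

# Steps 1 and 2: define the system and the "true" regression values.
beta_true = 1.0
X = np.arange(1.0, 11.0)                  # X_i = i, i = 1, ..., 10
eta = beta_true * X
N = 500                                   # number of replications (step 5)

beta_star = np.empty(N)
for k in range(N):
    # Step 3: pseudorandom uniform (-.5, .5) errors and simulated measurements.
    eps = rng.uniform(-0.5, 0.5, X.size)
    Y = eta + eps
    # Step 4: ordinary least squares estimate, beta* = (sum X_i Y_i) / (sum X_i^2).
    beta_star[k] = (X @ Y) / (X @ X)

# Step 6: estimated properties of the distribution of beta*.
print("mean of beta* (4.9.2): ", beta_star.mean())
print("est. std dev  (4.9.3a):", beta_star.var(ddof=1) ** 0.5)
print("est. RMSE     (4.9.3b):", np.mean((beta_star - beta_true) ** 2) ** 0.5)

# For the normal-error variant of Table 4.2, errors with variance 1/12 could be
# produced from box_muller(...) outputs scaled by (1/12) ** 0.5.
```

For large N the printed standard deviation and root mean square error should settle near the exact value 0.014712 quoted for this model below.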
To demonstrate the validity of a Monte Carlo procedure, an example is considered which is simple enough to be analyzed without recourse to simulation. We investigate estimating β in the model η_i = βX_i for the case of additive, zero mean, constant variance, uncorrelated errors; that is,

    Y_i = \beta X_i + \varepsilon_i, \quad E(\varepsilon_i) = 0, \quad V(\varepsilon_i) = \sigma^2, \quad \operatorname{cov}(\varepsilon_i, \varepsilon_j) = 0  \text{ for } i \ne j

The distribution of ε_i is uniform in the interval (−.5, .5); each ε_i is found using a pseudorandom number generator. There are no errors in X_i and there is no prior information. The X_i values are X_i = i for i = 1, 2, ..., 10 and β = 1. For the kth set of simulated measurements, β_k* is found using the ordinary least squares estimator,

    \beta_k^* = \left[ \sum_{i=1}^{10} X_i Y_{ik} \right] \left[ \sum_{i=1}^{10} X_i^2 \right]^{-1}

The estimated expected value of β_k*, (4.9.2), the estimated variance of β_k*, (4.9.3a), and the estimated mean square error of β_k*, (4.9.3b), are obtained by using

    \bar\beta^* = \frac{1}{N} \sum_{k=1}^{N} \beta_k^*, \qquad \text{est. } V(\beta^*) = \frac{1}{N-1} \sum_{k=1}^{N} (\beta_k^* - \bar\beta^*)^2, \qquad \text{est. mean square error}(\beta^*) = \frac{1}{N} \sum_{k=1}^{N} (\beta_k^* - 1)^2

For independent sets of errors, estimates were calculated for N = 5, 25, 50, 100, 200, and 500. The results are shown in Table 4.1, where the estimated standard deviation and estimated root mean square error are given rather than their squares. In Table 4.2 comparable results for a simulation involving normal errors are given. The variance of ε_i in this case was taken as 1/12, the same as the variance for the uniform case. In both Tables 4.1 and 4.2 the sample mean β̄* tends to approach the true value of 1 as N becomes large. Hence β* is an unbiased estimator of β. Also the estimated standard error of β* and the estimated root mean square error tend to their common exact value

    \left\{ \sigma^2 \left[ \sum_{i=1}^{10} X_i^2 \right]^{-1} \right\}^{1/2} = \left\{ \frac{1/12}{385} \right\}^{1/2} = 0.014712

Table 4.1  Monte Carlo Simulation for η_i = βX_i, with β = 1 and X_i = i, i = 1, 2, ..., 10. Uniform Distribution of Errors

Sample Size N    β̄*        Est. Std Dev (β*)    Est. Root Mean Square Error (β*)
5                1.0044     0.00950              0.00958
25               1.0014     0.01616              0.01589
50               0.9992     0.01350              0.01339
100              0.9996     0.01425              0.01418
200              1.0018     0.01440              0.01448
500              0.9987     0.01415              0.01419

Table 4.2  Monte Carlo Simulation for η_i = βX_i, with β = 1 and X_i = i, i = 1, 2, ..., 10. Normal Distribution of Errors

Sample Size N    β̄*        Est. Std Dev (β*)    Est. Root Mean Square Error (β*)
5                1.0021     0.01156              0.01055
25               0.9969     0.01608              0.01606
50               0.9972     0.01496              0.01507
100              0.9973     0.01486              0.01502
200              0.9995     0.01410              0.01407
500              0.9997     0.01480              0.01478

This example shows that the number of simulations N must be quite large in order to provide accurate estimates of the variance of the parameter estimate. Such simulations are still inexpensive compared to actual experiments to determine the variance. Moreover, methods are available for making the simulation procedure more efficient [1].

REFERENCES

1. Hammersley, J. M. and Handscomb, D. C., Monte Carlo Methods, Methuen & Co. Ltd., London, 1964.
2. Box, G. E. P. and Muller, M. E., "A Note on the Generation of Random Normal Deviates," Ann. Math. Stat., 29 (1958), 610-611.