# Statistical Inference I


Shirsendu Mukherjee

Department of Statistics, Asutosh College, Kolkata, India.

shirsendu st@yahoo.co.in

## Introduction to statistical inference

In the present module we introduce the concept of point estimation, which is a part of statistical inference. Statistical inference is the process of using information gained from a sample to draw conclusions about the population from which the sample is taken. There are two aspects of statistical inference that we will study in this course: (i) estimation and (ii) hypothesis testing. In an estimation problem, some feature of the population in which an enquirer is interested may be completely unknown to him, and he may want to make a guess about this feature entirely on the basis of a random sample from the population. There are two types of estimation problem: (i) point estimation and (ii) interval estimation. In this lecture we shall discuss some preliminary concepts of point estimation. Let us start our discussion with a brief history of the estimation problem.

## Historical Perspective

The problem of estimation arose in a very natural way in problems of Astronomy and Geodesy in the first half of the 18th century. In Astronomy, for example, the determination of interplanetary distances and of the positions of planets and their movements in time were some of the important problems, whereas in Geodesy, determining the spheroidal shape of the earth was one of the most important problems. It is known that the figure of the earth is almost a sphere, except for some flatness near the poles. Observations were obtained on the measurement of the length of one degree of a certain meridian, and the problem was to determine the parameters, say α and β, which specified the spheroid of the earth. Indirect observations on (α, β) were given by the relation

Yi = α + β xi,  i = 1, 2, ..., n

where the xi's are known fixed constants. Note that (α, β) are uniquely determined if only two observations on Y at different values x1, x2 are available. However, as is customary in science, several observations were made at different values (x1, x2, ..., xn), and this led to the theory of combination of observations with random error, which directly or indirectly measured "magnitudes of interest", or parameters. To estimate α and β on the basis of the


given data, the first attempt was made by Roger Boscovich (1757) in the course of a geodetic study of the ellipticity (extent of flatness at the poles) of the earth.

He suggested that the estimates of (α, β) are to be determined such that:

(i) the sum of the positive and negative residuals, or errors, balances, i.e. ∑_{i=1}^{n} (yi − α − βxi) = 0, and

(ii) subject to the above constraint, (α, β) are chosen so that R = ∑_{i=1}^{n} |yi − α − βxi|, the sum of the absolute values of the errors ei = yi − α − βxi, is as small as possible.

Using a geometric argument, Boscovich solved the problem for the five observations that he had. Laplace (1789) gave a general algebraic algorithm to obtain estimates of α and β on the above principles for any number of observations. This problem was later solved by Gauss and Legendre using the method of least squares.
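Boscovich's two conditions can be turned into a short computation. The sketch below is my own illustration, not his geometric construction: condition (i) forces α = ȳ − βx̄, after which condition (ii) reduces to a weighted median of the pairwise slopes.

```python
def boscovich_fit(x, y):
    """Fit y = alpha + beta*x under Boscovich's two conditions:
    (i) residuals sum to zero; (ii) sum of |residuals| is minimal."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # Condition (i) gives alpha = ybar - beta * xbar.  Substituting into
    # (ii), we must minimize sum_i w_i * |s_i - beta| with
    # w_i = |x_i - xbar| and s_i = (y_i - ybar) / (x_i - xbar),
    # whose minimizer is a weighted median of the slopes s_i.
    pairs = sorted(((yi - ybar) / (xi - xbar), abs(xi - xbar))
                   for xi, yi in zip(x, y) if xi != xbar)
    half = sum(w for _, w in pairs) / 2.0
    acc = 0.0
    for s, w in pairs:
        acc += w
        if acc >= half:
            beta = s
            break
    alpha = ybar - beta * xbar
    return alpha, beta
```

For points lying exactly on a line the procedure recovers that line; unlike least squares, it is much less sensitive to a single outlying observation.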

Boscovich made the assumption that the errors of overestimation and underestimation must balance out, an idea used by many later researchers. For estimating the parameter θ in the simplest model Yi = θ + ei, Simpson (1776) used this idea by assuming that the errors are symmetrically uniformly distributed about zero, i.e. the probability density function of the error is f(e) = 1/(2h), −h < e < h, h > 0. Euler (1778) proposed the arc of a parabolic curve, f(e) = (3/(4r³))(r² − e²), −r < e < r, r > 0, as the pdf of the random error. Laplace suggested the probability density function f(e) = (1/(2h)) exp(−|e|/h), −∞ < e < ∞, as the model for the distribution of errors, and Gauss proposed the normal distribution with probability density function f(e) = (1/√(2πh²)) exp(−e²/(2h²)), −∞ < e < ∞. It is important to point out here that the double exponential distribution used by Laplace to represent the error distribution led to the sample median as the "best" estimator of the "true value" of the parameter θ, whereas the normal distribution used by Gauss led to the sample mean as the "best" estimator of the "true value".
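This median-versus-mean contrast is easy to check numerically. The simulation below (an illustration of the claim; all settings are invented) estimates the mean squared error of both estimators under the two error models.

```python
import random
import statistics

def mse(estimator, error_sampler, theta=0.0, n=51, reps=2000, seed=1):
    """Monte Carlo mean squared error of an estimator of theta in the
    model Y_i = theta + e_i."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [theta + error_sampler(rng) for _ in range(n)]
        total += (estimator(sample) - theta) ** 2
    return total / reps

# Laplace (double exponential) and normal error distributions.
laplace = lambda rng: rng.expovariate(1.0) * rng.choice([-1.0, 1.0])
normal = lambda rng: rng.gauss(0.0, 1.0)
```

Under Laplace errors the sample median has the smaller mean squared error, while under normal errors the sample mean does, matching the historical observation.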

## Theory of Point Estimation

### Background

We consider a random experiment E. The outcome of E is represented by an observable random vector X = (X1, X2, ..., Xn), n ≥ 1. A particular value of X is denoted by x = (x1, x2, ..., xn). The character X may be real or vector valued, and the set of all values of X is called the sample space; it is denoted by X ⊂ R^n.

The random vector X is generated by F(x) = P(X ≤ x), x ∈ X, the distribution function of X.


In a parametric point estimation problem we assume that the functional form of F(x) is known, except perhaps for a certain number of parameters. Let θ = (θ1, θ2, ..., θk) be the unknown parameters associated with F(x). The parameter θ may be real valued or vector valued and is usually called a labelling or indexing parameter. The labelling parameter θ varies over a set of values, called the parameter space and denoted by Θ ⊂ R^k. So F(x) can be looked upon as a function of θ, and henceforth we will write it as Fθ(x).

If X is discrete or absolutely continuous, then F(x) is generated by fθ(x), the probability mass function (p.m.f.) or probability density function (p.d.f.) of X. We write Fθ = {p(x, θ) : θ ∈ Θ} for the class of all probability mass or density functions. The object of inference is the parameter θ, or a function g(θ) of the parameter that is of interest. Let us consider a few examples.

Example 1 Suppose a coin is tossed 50 times.

The outcome of the ith toss can be described by a random variable Xi such that Xi = 1 or 0 according as the ith toss results in a head or a tail. The sample space is

X = {(x1, x2, ..., x50) : xi = 0 or 1 for all i}

If θ is the probability of getting a head in any toss, then Θ = (0, 1) and the probability function of X is p(x, θ) = ∏_{i=1}^{50} θ^{xi} (1 − θ)^{1−xi}, x ∈ X, θ ∈ Θ. We may want to estimate θ or any function of θ.
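As a quick check of the formula above, the joint probability function can be evaluated directly; note that it depends on the data only through the total number of heads (the toss sequence below is invented).

```python
def joint_pmf(x, theta):
    """p(x, theta) = product over i of theta^xi * (1 - theta)^(1 - xi)."""
    p = 1.0
    for xi in x:
        p *= theta**xi * (1.0 - theta)**(1 - xi)
    return p

tosses = [1, 0, 1, 1, 0] * 10   # a made-up sequence of 50 tosses
s = sum(tosses)                 # total number of heads (here 30)
# The product collapses to theta^s * (1 - theta)^(50 - s).
```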

Example 2 Suppose that 100 seeds of a certain flower were planted, one in each pot, and let Xi equal one or zero according as the seed in the ith pot germinates or not. The data consist of (x1, x2, ..., x100), a sequence of ones and zeros, regarded as a realization of (X1, X2, ..., X100) whose components are i.i.d. random variables with P[X1 = 1] = θ and P[X1 = 0] = 1 − θ, where θ represents the probability that a seed germinates. The object of estimation is θ itself or a function g(θ) that may be of interest. For example, consider g(θ) = C(10, 8) θ^8 (1 − θ)^2, which is the probability that in a batch of 10 seeds exactly 8 will germinate.
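A minimal sketch of the plug-in idea for this example (the germination count below is invented): estimate θ by the sample proportion and substitute it into g.

```python
from math import comb

def g(theta):
    # P(exactly 8 of 10 seeds germinate) = C(10, 8) * theta^8 * (1 - theta)^2
    return comb(10, 8) * theta**8 * (1 - theta)**2

theta_hat = 73 / 100     # e.g. 73 of the 100 planted seeds germinated
g_hat = g(theta_hat)     # plug-in estimate of g(theta)
```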

Example 3 In a pathbreaking experiment, Rutherford, Chadwick and Ellis (1920) observed 2608 time intervals of 7.5 seconds each and counted the number of time intervals Nr in which exactly r α-particles hit the counter.

They obtained the following table

| r  | 0  | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8  | 9  | ≥ 10 |
|----|----|-----|-----|-----|-----|-----|-----|-----|----|----|------|
| Nr | 57 | 203 | 383 | 525 | 532 | 408 | 273 | 139 | 45 | 27 | 18   |


It is quite well known that the Poisson distribution with p.m.f. fθ(x) = exp(−θ) θ^x / x!, x = 0, 1, 2, ..., θ > 0, serves as a good model for the number of times a given event E occurs in a unit time interval. If Xi denotes the number of α-particles hitting the counter in the i-th time interval, then X1, X2, ..., Xn, where n = 2608, are i.i.d. Poisson random variables with parameter θ. We may want to estimate θ on the basis of the given data.
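For the data of Example 3 the natural estimate of θ is the sample mean number of hits per interval. The sketch below treats the "≥ 10" cell as exactly 10 (a simplifying assumption) and compares an observed count with its fitted Poisson expectation.

```python
from math import exp, factorial

# Observed counts from the table, with the ">= 10" cell taken as r = 10.
counts = {0: 57, 1: 203, 2: 383, 3: 525, 4: 532, 5: 408,
          6: 273, 7: 139, 8: 45, 9: 27, 10: 18}
n = sum(counts.values())
theta_hat = sum(r * nr for r, nr in counts.items()) / n  # sample mean

def expected_count(r):
    """Fitted expected number of intervals with exactly r hits."""
    return n * exp(-theta_hat) * theta_hat**r / factorial(r)
```

Here theta_hat comes out near 3.87 hits per interval, and the fitted expected counts track the observed ones closely, e.g. about 509 intervals with exactly 4 hits against the observed 532.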

Example 4 Consider the determination of an ideal physical constant such as the acceleration due to gravity g. The usual way to estimate g is through the pendulum experiment: one observes X = 4π²l/T², where l is the length of the pendulum and T the period of one oscillation (obtained from the time required for a fixed number of oscillations). Owing to variation depending on several factors, such as the skill of the experimenter and measurement errors, the i-th observation is Xi = g + ei, where ei is a random error. Assuming the distribution of the errors is normal with zero mean and variance σ², we have that X1, X2, ..., Xn are i.i.d. N(g, σ²). Here the parameter θ = (g, σ²) is a two-dimensional vector. We may view the problem as estimation of g; on the other hand, one may be interested in estimating the error variance σ², through which we can assess the precision of the experimenter.
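A simulated version of this experiment (all numerical values invented) shows the two estimation targets side by side: the sample mean for g and the sample variance for σ².

```python
import random
import statistics

rng = random.Random(42)
g_true, sigma = 9.81, 0.05            # invented "true" values
# 1000 repeated measurements X_i = g + e_i with e_i ~ N(0, sigma^2)
xs = [g_true + rng.gauss(0.0, sigma) for _ in range(1000)]

g_hat = statistics.mean(xs)           # estimates g
sigma2_hat = statistics.variance(xs)  # estimates sigma^2 (unbiased version)
```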

Example 5 Suppose an experiment is conducted by measuring the length of lives in hours of n electric bulbs produced by a certain company.

Let Xi be the length of life of the ith bulb. The sample space is

X = {(x1, x2, ..., xn) : xi ≥ 0 for all i}

If we assume that the distribution of each Xi is exponential with mean θ, then Θ = (0, ∞) and the probability function of X is p(x, θ) = ∏_{i=1}^{n} (1/θ) e^{−xi/θ}, x ∈ X, θ ∈ Θ. We may want to estimate the parameter θ or g(θ) = e^{−60/θ}, which represents the probability that the lifetime of a bulb is at least 60 hours.
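The same plug-in device as before applies here (the lifetimes below are invented): the sample mean estimates θ, and substituting it into g gives an estimate of the survival probability at 60 hours.

```python
from math import exp

lifetimes = [120.0, 45.0, 210.0, 95.0, 60.0, 30.0, 150.0, 80.0]  # hours
theta_hat = sum(lifetimes) / len(lifetimes)  # sample mean estimates theta
g_hat = exp(-60.0 / theta_hat)               # estimates P(lifetime >= 60)
```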

## Objective

The distribution of X is characterized by the unknown parameter θ, about which we only know that it belongs to the parameter space Θ. To discuss the problem of point estimation, for the sake of simplicity, we consider the case where the parameter of interest is a real valued function g = g(θ) of θ.

In point estimation we try to approximate g(θ) on the basis of the observed value x of X. In other words, we try to put forward a particular statistic, or function of X, say T = T(X), which would represent the unknown g(θ)


very closely. Such a statistic T is called an estimator, or a point estimator, of g(θ). Mathematically, T is a measurable mapping from X to the space of values of g(θ), and such a mapping is called an admissible estimator. Any observed value of T is called an estimate of g(θ). In a nutshell, a point estimate of a parameter θ is a single number that can be regarded as a sensible value for θ. A point estimate is obtained by selecting a suitable statistic and computing its value from the given sample data. The selected statistic is called a point estimator of θ. It is to be noted that, for a particular estimator T of a parameter θ, the estimate of θ may vary from sample to sample.

Example Suppose we want to estimate θ in Example 1. We may use the statistic T = (1/n) ∑_{i=1}^{n} Xi as an estimator of θ. Here T is a mapping from X to (0, 1), and it is admissible.
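The remark that the estimate varies from sample to sample can be seen directly: drawing several samples of 50 tosses from the same coin (a simulation with invented settings) gives generally different observed values of T, all clustering around θ.

```python
import random

rng = random.Random(7)
theta, n = 0.3, 50   # invented true head probability and sample size

# Five independent samples of n tosses; T = (1/n) * sum(X_i) each time.
estimates = [sum(rng.random() < theta for _ in range(n)) / n
             for _ in range(5)]
# The observed estimates differ from sample to sample but stay near theta.
```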
