### Statistical Inference I

Shirsendu Mukherjee

Department of Statistics, Asutosh College, Kolkata, India.

shirsendu_st@yahoo.co.in

### Introduction to Statistical Inference

In the present module we introduce the concept of point estimation, which is a part of statistical inference. Statistical inference is the process of using information gained from a sample to draw conclusions about the population from which the sample is taken. There are two aspects of statistical inference that we will study in this course: (i) estimation and (ii) hypothesis testing. In an estimation problem, some feature of the population in which an enquirer is interested may be completely unknown to him, and he may want to make a guess about this feature entirely on the basis of a random sample from the population. There are two types of estimation problem: (i) point estimation and (ii) interval estimation. In this lecture we shall discuss some preliminary concepts of point estimation. Let us start our discussion with a brief history of the estimation problem.

### Historical Perspective

The problem of estimation arose in a very natural way in problems of astronomy and geodesy in the first half of the 18th century. In astronomy, for example, the determination of interplanetary distances and of the positions of planets and their movements in time were some of the important problems, whereas in geodesy, determining the spheroidal shape of the earth was one of the most important problems. It is known that the figure of the earth is almost a sphere except for some flatness near the poles. Observations were obtained on the length of one degree of a certain meridian, and the problem was to determine the parameters, say α and β, which specified the spheroid of the earth. Indirect observations on (α, β) were given by the relation

Y_{i} = α + βx_{i}, i = 1, 2, . . . , n,

where the x_{i}'s are known fixed constants. Note that (α, β) are uniquely determined if only two observations on Y at different values (x_{1}, x_{2}) are available. However, as is customary in science, several observations were made at different values (x_{1}, x_{2}, . . . , x_{n}), and this led to the theory of combination of observations with random error, which directly or indirectly measured "magnitudes of interest" or parameters. To estimate α and β on the basis of the


given data, the first attempt was made by Roger Boscovich (1757) in the course of a geodetic study of the ellipticity (extent of flatness at the poles) of the earth.

He suggested that the estimates of (α, β) are to be determined such that

(i) the sum of positive and negative residuals or errors should balance, i.e.

Σ_{i=1}^{n} (y_{i} − α − βx_{i}) = 0, and

(ii) subject to the above constraint, (α, β) are determined such that

R = Σ_{i=1}^{n} |y_{i} − α − βx_{i}|,

the sum of absolute values of the errors e_{i} = y_{i} − α − βx_{i}, is as small as possible.

Using a geometric argument, Boscovich solved the problem for the five observations that he had. Laplace (1789) gave a general algebraic algorithm to obtain estimates of (α, β) on the above principles for any number of observations. This problem was later solved by Gauss and Legendre using the method of least squares.
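Boscovich's two conditions admit a direct computational reading: constraint (i) forces α = ȳ − βx̄, and minimizing R then reduces to taking a weighted median of the centred slopes (y_{i} − ȳ)/(x_{i} − x̄) with weights |x_{i} − x̄|. The sketch below illustrates this reduction (the function name is my own; this is an illustration, not Boscovich's geometric construction or Laplace's actual algorithm):

```python
def boscovich_fit(x, y):
    # Constraint (i): sum of residuals = 0 forces alpha = ybar - beta*xbar.
    # Substituting into (ii), we minimize over beta the sum of
    # |x_i - xbar| * |(y_i - ybar)/(x_i - xbar) - beta|,
    # whose minimizer is a weighted median of the centred slopes.
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # (slope, weight) pairs; points with x_i = xbar contribute a constant
    pairs = [((y[i] - ybar) / (x[i] - xbar), abs(x[i] - xbar))
             for i in range(n) if x[i] != xbar]
    pairs.sort()
    total = sum(w for _, w in pairs)
    acc = 0.0
    for slope, w in pairs:           # walk up to half the total weight
        acc += w
        if acc >= total / 2:
            beta = slope
            break
    alpha = ybar - beta * xbar       # enforce the balance constraint
    return alpha, beta
```

For data lying exactly on a line, the fit recovers the line; with outliers it behaves like least absolute deviations rather than least squares.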

Boscovich made the assumption that the errors of overestimation and underestimation must balance out. This idea was used by many later researchers. For estimating the parameter θ in the simplest model Y_{i} = θ + e_{i}, Simpson (1776) used this idea by assuming that the errors are uniformly distributed symmetrically about zero, i.e. that the probability density function of the error is

f(e) = 1/(2h), −h < e < h, h > 0.

Euler (1778) proposed the arc of a parabolic curve, given by

f(e) = (3/(4r^{3}))(r^{2} − e^{2}), −r < e < r, r > 0,

as the pdf of the random error. Laplace suggested the probability density function

f(e) = (1/(2h)) exp(−|e|/h), −∞ < e < ∞,

as the model for the distribution of errors, and Gauss proposed the normal distribution with probability density function

f(e) = (1/√(2πh^{2})) exp(−e^{2}/(2h^{2})), −∞ < e < ∞.

It is important to point out here that the double exponential distribution used by Laplace to represent the error distribution leads to the median of the sample as the "best" estimator of the "true value" of the parameter θ, whereas the normal distribution used by Gauss leads to the mean of the sample as the "best" estimator of the "true value".
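The contrast between Laplace's and Gauss's error laws can be checked numerically: maximizing the double exponential likelihood amounts to minimizing Σ|y_{i} − θ|, which the sample median solves, while maximizing the normal likelihood amounts to minimizing Σ(y_{i} − θ)^{2}, which the sample mean solves. A minimal sketch with illustrative data (function names are my own):

```python
def abs_loss(data, theta):
    # criterion minimized by the Laplace (double exponential) MLE
    return sum(abs(y - theta) for y in data)

def sq_loss(data, theta):
    # criterion minimized by the normal (Gaussian) MLE
    return sum((y - theta) ** 2 for y in data)

data = [1.0, 2.0, 10.0]                         # deliberately skewed sample
grid = [i / 100 for i in range(0, 1101)]        # candidate values of theta

best_abs = min(grid, key=lambda t: abs_loss(data, t))  # the median, 2.0
best_sq = min(grid, key=lambda t: sq_loss(data, t))    # near the mean, 13/3
```

The skewed sample makes the difference visible: the outlier 10 drags the mean toward it but leaves the median untouched.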

### Theory of Point Estimation

Background

We consider a random experiment E. The outcome of E is represented by an observable random vector X = (X_{1}, X_{2}, . . . , X_{n}), n ≥ 1. A particular value of X is denoted by x = (x_{1}, x_{2}, . . . , x_{n}). The variable X could be real or vector valued, and the set of all values of X is called the sample space, denoted by X ⊂ R^{n}.

The random vector X is generated by F(x) = P(X ≤ x), x ∈ X, the distribution function of X.


In a parametric point estimation problem we assume that the functional form of F(x) is known except perhaps for a certain number of parameters. Let θ = (θ_{1}, θ_{2}, . . . , θ_{k}) be the unknown parameters associated with F(x). The parameter θ may be real valued or vector valued and is usually called a labelling or indexing parameter. The labelling parameter θ varies over a set of values, called the parameter space, denoted by Θ ⊂ R^{k}. So F(x) can be looked upon as a function of θ, and henceforth we will write it as F_{θ}(x).

If X is discrete or absolutely continuous, then F_{θ}(x) is generated by f_{θ}(x), the probability mass function (p.m.f.) or probability density function (p.d.f.) of X. We write F_{θ} = {p(x, θ) : θ ∈ Θ} for the class of all probability mass or density functions. The object of inference is the parameter θ or a function of the parameter θ, say g(θ), which is of interest. Let us consider a few examples.

Example 1 Suppose a coin is tossed 50 times. The outcome of the ith toss can be described by a random variable X_{i} such that X_{i} = 1 or 0 according as the ith toss results in a head or a tail. Then

X = {(x_{1}, x_{2}, . . . , x_{50}) : x_{i} = 0 or 1 for all i}.

If θ is the probability of getting a head in any toss, then Θ = (0, 1) and the probability function of X is

p(x, θ) = Π_{i=1}^{50} θ^{x_{i}}(1 − θ)^{1−x_{i}}, x ∈ X, θ ∈ Θ.

We may want to estimate θ or any function of θ.
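For a concrete sample x, the probability function above collapses to θ^{Σx_{i}}(1 − θ)^{n−Σx_{i}} and is easy to evaluate; a small sketch (illustrative, with a short sequence standing in for the 50 tosses; the function name is my own):

```python
def joint_pmf(xs, theta):
    # p(x, theta) = prod theta^{x_i} (1-theta)^{1-x_i}
    #             = theta^{sum x_i} * (1-theta)^{n - sum x_i}
    s = sum(xs)
    return theta ** s * (1 - theta) ** (len(xs) - s)
```

For example, joint_pmf([1, 0, 1], 0.5) gives (1/2)^3 = 0.125.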

Example 2 Suppose that 100 seeds of a certain flower were planted, one in each pot, and let X_{i} equal one or zero according as the seed in the ith pot germinates or not. The data consist of (x_{1}, x_{2}, . . . , x_{100}), a sequence of ones and zeroes, regarded as a realization of (X_{1}, X_{2}, . . . , X_{100}) whose components are i.i.d. random variables with P[X_{1} = 1] = θ and P[X_{1} = 0] = 1 − θ, where θ represents the probability that a seed germinates. The object of estimation is θ itself or a function g(θ) that may be of interest. For example, consider

g(θ) = (10 choose 8) θ^{8}(1 − θ)^{2},

which is the probability that in a batch of 10 seeds exactly 8 will germinate.
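This g(θ) can be evaluated directly once θ is known or estimated; a minimal sketch using Python's math.comb for the binomial coefficient:

```python
from math import comb

def g(theta):
    # P(exactly 8 of 10 seeds germinate), assuming i.i.d. Bernoulli(theta)
    # germination, as in the text: C(10,8) * theta^8 * (1-theta)^2
    return comb(10, 8) * theta ** 8 * (1 - theta) ** 2
```

For instance, at θ = 0.5 this gives 45/1024 ≈ 0.044.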

Example 3 In a pathbreaking experiment, Rutherford, Chadwick and Ellis (1920) observed 2608 time intervals of 7.5 seconds each and counted the number N_{r} of time intervals in which exactly r α-particles hit the counter. They obtained the following table:

| r | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ≥ 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| N_{r} | 57 | 203 | 383 | 525 | 532 | 408 | 273 | 139 | 45 | 27 | 18 |


It is quite well known that the Poisson distribution with p.m.f.

f_{θ}(x) = exp(−θ) θ^{x} / x!, x = 0, 1, 2, . . . , θ > 0,

serves as a good model for the number of times a given event E occurs in a unit time interval. If X_{i} denotes the number of α-particles hitting the counter in the ith time interval, then (X_{1}, X_{2}, . . . , X_{n}), where n = 2608, are i.i.d. Poisson random variables with parameter θ. We may want to estimate θ on the basis of the given data.
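A natural estimator of θ here is the sample mean, which can be computed from the frequency table; a sketch (the open cell "≥ 10" is treated as exactly 10 for illustration, so the value is a slight approximation of the exact sample mean):

```python
# Frequency table from the text; the open-ended cell "r >= 10" is treated
# here as r = 10, so theta_hat slightly underestimates the true sample mean.
counts = {0: 57, 1: 203, 2: 383, 3: 525, 4: 532, 5: 408,
          6: 273, 7: 139, 8: 45, 9: 27, 10: 18}

n = sum(counts.values())
theta_hat = sum(r * N for r, N in counts.items()) / n   # sample mean, ~3.87
```

The estimate of roughly 3.87 particles per 7.5-second interval can then be plugged into the Poisson p.m.f. to compare fitted and observed frequencies.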

Example 4 Consider the determination of an ideal physical constant such as the acceleration due to gravity g. The usual way to estimate g is to perform the pendulum experiment and observe X = 4π^{2}l/T^{2}, where l is the length of the pendulum and T the time required for a fixed number of oscillations. Due to variation which depends on several factors, such as the skill of the experimenter and measurement errors, the ith observation is X_{i} = g + e_{i}, where e_{i} is the random error. Assuming the distribution of the error is normal with zero mean and variance σ^{2}, we have that X_{1}, X_{2}, . . . , X_{n} are i.i.d. N(g, σ^{2}). Here the parameter θ is a two dimensional vector, θ = (g, σ^{2}). We may wish to estimate g; on the other hand, one may be interested in estimating the error variance σ^{2}, through which we can assess the ability of the experimenter.
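A minimal simulation sketch of this setup, assuming illustrative values for g and σ (not from the text): the sample mean estimates g and the sample variance estimates σ^{2}.

```python
import random

random.seed(0)
true_g, sigma = 9.81, 0.05          # assumed values for the simulation only
xs = [random.gauss(true_g, sigma) for _ in range(200)]   # i.i.d. N(g, sigma^2)

n = len(xs)
g_hat = sum(xs) / n                                      # estimates g
s2_hat = sum((x - g_hat) ** 2 for x in xs) / (n - 1)     # estimates sigma^2
```

A small s2_hat would indicate a careful experimenter; a large one, noisy measurements.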

Example 5 Suppose an experiment is conducted by measuring the lifetimes, in hours, of n electric bulbs produced by a certain company. Let X_{i} be the lifetime of the ith bulb. Then

X = {(x_{1}, x_{2}, . . . , x_{n}) : x_{i} ≥ 0 for all i}.

If we assume that the distribution of each X_{i} is exponential with mean θ, then Θ = (0, ∞) and the probability function of X is

p(x, θ) = Π_{i=1}^{n} (1/θ) e^{−x_{i}/θ}, x ∈ X, θ ∈ Θ.

We may want to estimate the parameter θ or g(θ) = e^{−60/θ}, which represents the probability that the lifetime of a bulb will be at least 60 hours.
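With data in hand, θ is naturally estimated by the sample mean, and g(θ) by plugging that estimate into e^{−60/θ}; a sketch with made-up lifetimes (the data values are illustrative, not from the text):

```python
from math import exp

lifetimes = [72.0, 31.5, 150.2, 8.9, 95.4, 60.1]   # illustrative data, hours

theta_hat = sum(lifetimes) / len(lifetimes)        # sample mean estimates theta
g_hat = exp(-60 / theta_hat)                       # estimated P(lifetime >= 60)
```

Here theta_hat is about 69.7 hours, giving an estimated survival probability g_hat of roughly 0.42.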

Objective

The distribution of X is characterized by the unknown parameter θ, about which we only know that it belongs to the parameter space Θ. To discuss the problem of point estimation, for the sake of simplicity, we consider the case when the parameter of interest is a real valued function g = g(θ) of θ.

In point estimation we try to approximate g(θ) on the basis of the observed value x of X. In other words, we try to put forward a particular statistic, or a function of X, say T = T(X), which would represent the unknown g(θ) very closely. Such a statistic T is called an estimator or a point estimator of g(θ). Mathematically, T is a measurable mapping from X to the space of g(θ), and such a mapping is called an admissible estimator. Any observed value of T is called an estimate of g(θ). In a nutshell, a point estimate of a parameter θ is a single number that can be regarded as a sensible value for θ. A point estimate is obtained by selecting a suitable statistic and computing its value from the given sample data. The selected statistic is called a point estimator of θ. It is to be noted that for a particular estimator T of a parameter θ, the estimate of θ may vary from sample to sample.

Example Suppose we want to estimate θ in Example 1. We may use the statistic

T = (1/n) Σ_{i=1}^{n} X_{i}

as an estimator of θ. Here T is a mapping from X to (0, 1) and it is admissible.
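The sample-to-sample variability of the estimate noted above can be seen directly by simulation (the true θ below is an assumed value, unknown to the statistician in practice):

```python
import random

def T(xs):
    # the estimator T(X) = (1/n) * sum of X_i, the sample proportion of heads
    return sum(xs) / len(xs)

random.seed(1)
theta = 0.6   # assumed true head probability, for illustration only
sample1 = [1 if random.random() < theta else 0 for _ in range(50)]
sample2 = [1 if random.random() < theta else 0 for _ in range(50)]
# T(sample1) and T(sample2) generally differ: the estimate varies by sample,
# though both tend to lie close to the true theta.
```

Repeating this many times traces out the sampling distribution of T, a theme taken up when estimators are compared later in the course.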