Subject: Statistics
Paper: Regression Analysis III Module: Components of GLM
Regression Analysis III 1 / 14
Development Team
Principal investigator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta
Paper co-ordinator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta
Content writers: Sayantee Jana, Graduate student, Department of Mathematics and Statistics, McMaster University; Sujit Kumar Ray, Analytics professional, Kolkata
Content reviewer: Department of Statistics, University of Calcutta
Components of a GLM
There are three components of a GLM:
- Random component
- Systematic component
- Link function
The random component
- Consider $Y$ to be the response variable.
- $Y = (Y_1, Y_2, \ldots, Y_n)$: the $Y_i$'s are independently distributed.
- We consider one-parameter exponential families here, so we assume the distribution of each $Y_i$ belongs to that family and has the form
  $$f(y) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + C(y, \phi) \right\}$$
- $\theta$: natural or canonical parameter
- $\phi$: dispersion parameter
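As a worked check (not part of the original slides), the Poisson distribution can be written in this exponential-family form with $\theta = \log\lambda$, $b(\theta) = e^{\theta}$, $a(\phi) = 1$ and $C(y, \phi) = -\log y!$. A minimal Python sketch verifying that this decomposition reproduces the usual pmf:

```python
import math

# Sketch: the Poisson pmf in the exponential-family form
# f(y) = exp{ (y*theta - b(theta)) / a(phi) + C(y, phi) },
# with theta = log(lam), b(theta) = exp(theta), a(phi) = 1, C(y, phi) = -log(y!).

def poisson_pmf_direct(y, lam):
    """Standard Poisson pmf: lam^y * exp(-lam) / y!."""
    return lam**y * math.exp(-lam) / math.factorial(y)

def poisson_pmf_expfam(y, lam):
    """Same pmf via the exponential-family decomposition."""
    theta = math.log(lam)                 # natural parameter
    b = math.exp(theta)                   # b(theta) = lam
    c = -math.log(math.factorial(y))      # C(y, phi)
    return math.exp(y * theta - b + c)    # a(phi) = 1

for y in range(6):
    assert math.isclose(poisson_pmf_direct(y, 2.5), poisson_pmf_expfam(y, 2.5))
```

The illustrative rate 2.5 is arbitrary; any positive $\lambda$ gives the same agreement.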
Examples of the one-parameter exponential family
- Normal: $f(y) = \exp\left\{ \dfrac{y\mu - \mu^2/2}{\sigma^2} + \left[ -\dfrac{y^2}{2\sigma^2} - \log(\sqrt{2\pi}\,\sigma) \right] \right\}$
- Binomial: $f(y) = \exp\left\{ y \log\dfrac{\pi}{1-\pi} + n\log(1-\pi) + \log\binom{n}{y} \right\}$
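The binomial rewrite above can be checked numerically (an illustration, not from the slides): the exponential-family expression with natural parameter $\log\frac{\pi}{1-\pi}$ matches the usual pmf $\binom{n}{y}\pi^y(1-\pi)^{n-y}$.

```python
import math

# Verifying f(y) = exp{ y*log(pi/(1-pi)) + n*log(1-pi) + log C(n, y) }
# against the standard binomial pmf, at illustrative n = 10, pi = 0.3.

def binom_pmf_direct(y, n, p):
    return math.comb(n, y) * p**y * (1 - p)**(n - y)

def binom_pmf_expfam(y, n, p):
    theta = math.log(p / (1 - p))  # natural parameter: the logit of pi
    return math.exp(y * theta + n * math.log(1 - p) + math.log(math.comb(n, y)))

for y in range(11):
    assert math.isclose(binom_pmf_direct(y, 10, 0.3), binom_pmf_expfam(y, 10, 0.3))
```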
Members of the one-parameter exponential family
Other members of the exponential family:
- Poisson
- Binomial
- Negative Binomial
- Gamma
- Inverse Gaussian
Systematic component
- Consists of the explanatory variables $x$ and a linear function of $x$ and $\beta$
- $x = (x_1, x_2, \ldots, x_p)$: set of $p$ explanatory variables
- $\beta = (\beta_1, \beta_2, \ldots, \beta_p)$: parameter vector
- $\eta = x'\beta = \sum_{j=1}^{p} \beta_j x_j$: linear predictor
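The linear predictor is just an inner product; a minimal sketch with made-up values for $x$ and $\beta$ (illustrative only, not from the slides):

```python
# Linear predictor eta = x'beta = sum_j beta_j * x_j,
# with illustrative (made-up) values for x and beta, p = 3.

x = [1.0, 2.0, 0.5]        # explanatory variables for one observation
beta = [0.3, -1.2, 2.0]    # parameter vector

eta = sum(b * xj for b, xj in zip(beta, x))  # linear predictor
```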
Link function
- It is the link between the random and systematic components.
- It is a function that specifies the relationship between the expected value of the random component and the systematic component.
- $\eta_i = g(\mu_i)$, where $\mu_i = E(Y_i)$ and $g(\cdot)$ is a continuous, monotone and differentiable function
- Identity link: $g(\mu_i) = \mu_i$
- Canonical link: $g(\mu_i) = \theta_i$
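As a concrete illustration of the canonical link (assuming the Poisson case, which is not worked on this slide): since $\mu = b'(\theta) = e^{\theta}$ for Poisson, the canonical link is the function that recovers $\theta$ from $\mu$, namely $g(\mu) = \log\mu$. A small numerical check:

```python
import math

# For Poisson, b(theta) = exp(theta), so mu = b'(theta) = exp(theta)
# and the canonical link g(mu) = theta is the log.

def b_prime(theta):
    return math.exp(theta)   # mean function mu(theta) for Poisson

def canonical_link(mu):
    return math.log(mu)      # inverts mu = exp(theta)

theta = 0.7
mu = b_prime(theta)
assert math.isclose(canonical_link(mu), theta)  # g(mu_i) = theta_i
```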
Example
Example: $Y_i \sim \text{Bernoulli}(p_i)$, so $\mu_i = E(Y_i) = p_i$
- Logit link: $g(\mu_i) = \log\dfrac{\mu_i}{1-\mu_i}$
- Probit link: $g(\mu_i) = \Phi^{-1}(\mu_i)$
- Linear probability model: $g(\mu_i) = \mu_i$
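The three Bernoulli links can be evaluated side by side; a minimal sketch at an illustrative $\mu_i = 0.8$ (the value is made up for the example):

```python
import math
from statistics import NormalDist

# The three links from the slide, evaluated at mu_i = 0.8.

mu = 0.8
logit = math.log(mu / (1 - mu))      # g(mu) = log(mu / (1 - mu))
probit = NormalDist().inv_cdf(mu)    # g(mu) = Phi^{-1}(mu)
linear = mu                          # linear probability model: identity

# Logit and probit map (0, 1) onto the whole real line;
# inverting either one recovers mu.
assert math.isclose(1 / (1 + math.exp(-logit)), mu)
assert math.isclose(NormalDist().cdf(probit), mu)
```

Both transformed values are positive here because $\mu > 0.5$; the linear probability model leaves $\mu$ untouched, which is why its fitted values can escape $(0, 1)$.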
Likelihood estimation
Log-likelihood function based on a single observation:
- $L_i = \dfrac{y_i\theta_i - b(\theta_i)}{a(\phi_i)} + C(y_i, \phi_i)$
- $\dfrac{\partial L_i}{\partial \theta_i} = \dfrac{y_i - b'(\theta_i)}{a(\phi_i)}$
- $E\left(\dfrac{\partial L_i}{\partial \theta_i}\right) = 0 \Rightarrow E(Y_i) = \mu_i = b'(\theta_i)$
- $\dfrac{\partial^2 L_i}{\partial \theta_i^2} = -\dfrac{b''(\theta_i)}{a(\phi_i)}$
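The identity $E(Y_i) = b'(\theta_i)$ can be verified numerically; a sketch for the Poisson case (an illustration, assuming $\theta = \log\lambda$ and $b(\theta) = e^{\theta}$ as before):

```python
import math

# Check E(Y) = b'(theta) for Poisson(lam): b(theta) = exp(theta), so
# b'(theta) = exp(theta) = lam, which should equal the pmf-weighted mean.

lam = 2.0
theta = math.log(lam)
b_prime = math.exp(theta)   # b'(theta) = lam

# E(Y) computed directly from the pmf (truncated sum; the tail is negligible).
mean = sum(y * lam**y * math.exp(-lam) / math.factorial(y) for y in range(60))

assert math.isclose(mean, b_prime, rel_tol=1e-9)
```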
Variance function
$$E\left(\frac{\partial L_i}{\partial \theta_i}\right)^2 = E\left[\frac{y_i - b'(\theta_i)}{a(\phi_i)}\right]^2 = \frac{Var(Y_i)}{[a(\phi_i)]^2} = \frac{b''(\theta_i)}{a(\phi_i)}$$
$$\Rightarrow Var(Y_i) = b''(\theta_i)\,a(\phi_i)$$
$V = b''(\theta_i)$: variance function
$$V = V(\mu_i) = \frac{d\,b'(\theta_i)}{d\theta_i} = \frac{d\mu_i}{d\theta_i}$$
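A numerical illustration of $Var(Y_i) = b''(\theta_i)\,a(\phi_i)$, again assuming the Poisson case: there $b''(\theta) = e^{\theta} = \mu$ and $a(\phi) = 1$, so the variance function is $V(\mu) = \mu$.

```python
import math

# Check Var(Y) = b''(theta) * a(phi) for Poisson: b''(theta) = lam, a(phi) = 1,
# so the pmf-weighted variance should equal lam itself (V(mu) = mu).

lam = 3.5
pmf = [lam**y * math.exp(-lam) / math.factorial(y) for y in range(80)]
mean = sum(y * p for y, p in enumerate(pmf))
var = sum((y - mean)**2 * p for y, p in enumerate(pmf))

assert math.isclose(var, lam, rel_tol=1e-8)   # variance function V(mu) = mu
```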
Salient features of GLM
- A GLM does not transform $Y$; it transforms only the mean $E(Y)$ and models it as a function of the linear predictor.
- The objective is to investigate whether, and how, the mean varies as a function of the levels of our predictor or explanatory variables.
- The link function transforms the model to a linear model in the predictors, with a different mean for each observation $Y_i$.
Salient features of GLM contd.
- GLM relaxes the normality assumption.
- GLM allows for non-uniform variance.
- The variance of each observation $Y_i$ is a function of its mean $\mu_i$.
- The distribution is completely specified in terms of its mean and variance.
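The mean-variance tie can be seen directly in the Bernoulli case (an illustration, not from the slides): $Var(Y) = \mu(1-\mu)$, so observations with different means automatically carry different variances.

```python
# Non-uniform variance: for Bernoulli(mu), Var(Y) = mu * (1 - mu),
# computed here as E(Y^2) - E(Y)^2 with Y taking values in {0, 1}.

def bernoulli_var(mu):
    return mu - mu**2   # E(Y^2) = E(Y) = mu for a 0/1 variable

assert bernoulli_var(0.5) == 0.25                # largest at mu = 0.5
assert bernoulli_var(0.9) < bernoulli_var(0.5)   # shrinks toward the ends
```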
Summary
- Random component: the response variable $Y_i$, whose distribution belongs to the one-parameter exponential family.
- Systematic component: a linear function of the explanatory variables (the linear predictor).
- Link function: links the random component with the systematic component to make the relationship linear.
- The variance of each observation is a function of the mean of that observation.
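The three components can be put together in one small end-to-end sketch (an illustration, not from the slides): a Bernoulli random component, a linear predictor $\eta_i = \beta_0 + \beta_1 x_i$, and the logit link, fitted by Newton-Raphson on a tiny made-up data set.

```python
import math

# Random component: Y_i ~ Bernoulli(mu_i); systematic: eta_i = b0 + b1*x_i;
# link: logit. Newton-Raphson on the Bernoulli log-likelihood (made-up data).

x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [0, 0, 1, 0, 1, 1]

b0, b1 = 0.0, 0.0
for _ in range(25):                                  # Newton iterations
    mu = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
    # score vector: sum of (y_i - mu_i) times each covariate
    g0 = sum(yi - mi for yi, mi in zip(y, mu))
    g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
    # information matrix uses the variance function V(mu) = mu(1 - mu)
    w = [mi * (1 - mi) for mi in mu]
    h00 = sum(w)
    h01 = sum(wi * xi for wi, xi in zip(w, x))
    h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = h00 * h11 - h01 * h01
    b0 += (h11 * g0 - h01 * g1) / det                # solve H * step = g
    b1 += (h00 * g1 - h01 * g0) / det

probs = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
assert all(a < b for a, b in zip(probs, probs[1:]))  # fitted probs rise with x
```

At convergence the intercept score equation forces the fitted probabilities to sum to the observed successes, which is a useful sanity check on any logistic fit.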