Subject: Statistics
Paper: Regression Analysis III Module: Polytomous regression II
Regression Analysis III 1 / 19
Development Team
Principal investigator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta
Paper co-ordinator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta
Content writer: Sayantee Jana, Graduate student, Department of Mathematics and Statistics, McMaster University; Sujit Kumar Ray, Analytics professional, Kolkata
Content reviewer: Department of Statistics, University of Calcutta
Polytomous data

- The response falls into one of m categories with probabilities $\pi_1, \pi_2, \ldots, \pi_m$, where
  $$\sum_{j=1}^{m} \pi_j = 1$$
- Examples:
  Colours - nominal
  Opinion categories - ordinal
  Religion - nominal
  Education level - ordinal
- For ordinal data it is usual to model on the basis of the cumulative probabilities
  $$\gamma_j = \sum_{k \le j} \pi_k, \quad j = 1(1)m$$
Cumulative logit model

- (I) Generally easier to interpret.
- (II) Categories may not be well defined, and hence $\gamma_j$ is more meaningful than $\pi_j$, $j = 1(1)m$.
- Link functions:
  Multiple logit:
  $$\log\frac{\gamma_j}{1-\gamma_j} = \alpha_j - \underline{x}'\underline{\beta}, \quad j = 1(1)m$$
  with $\alpha_1 \le \alpha_2 \le \ldots \le \alpha_m$ to ensure $\gamma_1 \le \gamma_2 \le \ldots \le \gamma_m$.
  The log odds of the $j$th-or-less category against the greater-than-$j$th category is modelled against the linear predictor.
  The linear predictors for different categories are parallel.
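Under the cumulative logit link, the category probabilities are recovered as $\pi_j = \gamma_j - \gamma_{j-1}$ (with $\gamma_0 = 0$). A minimal numerical sketch; the cut-points and linear predictor below are made-up values purely for illustration:

```python
import math

def cumulative_probs(alphas, eta):
    """gamma_j = P(Y <= j) = logistic(alpha_j - eta); the last category has gamma_m = 1."""
    gammas = [1 / (1 + math.exp(-(a - eta))) for a in alphas]
    return gammas + [1.0]

def category_probs(gammas):
    """pi_j = gamma_j - gamma_{j-1}, with gamma_0 = 0."""
    return [g - g_prev for g_prev, g in zip([0.0] + gammas[:-1], gammas)]

alphas = [-1.0, 0.0, 1.5]   # hypothetical cut-points, alpha_1 <= alpha_2 <= alpha_3
eta = 0.4                   # hypothetical linear predictor x'beta
gammas = cumulative_probs(alphas, eta)
pis = category_probs(gammas)
# the gammas are non-decreasing and the pi_j telescope to 1
assert all(g1 <= g2 for g1, g2 in zip(gammas, gammas[1:]))
assert abs(sum(pis) - 1.0) < 1e-12
```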
Proportional Odds model

- For any two individuals with covariate vectors $\underline{x}_1$ and $\underline{x}_2$,
  $$\frac{\gamma_j(\underline{x}_1)/[1-\gamma_j(\underline{x}_1)]}{\gamma_j(\underline{x}_2)/[1-\gamma_j(\underline{x}_2)]} = e^{(\underline{x}_2-\underline{x}_1)'\underline{\beta}} \quad \leftarrow \text{independent of } j$$
  $\Rightarrow$ the same ratio of odds over all categories for any two individuals: the proportional odds model.
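That the cumulative odds ratio is free of $j$ follows because the cut-point $\alpha_j$ cancels. A quick numerical check, with hypothetical cut-points and linear predictors:

```python
import math

def cum_odds(alpha, eta):
    """Odds of Y <= j under the cumulative logit: gamma_j/(1-gamma_j) = exp(alpha_j - eta)."""
    gamma = 1 / (1 + math.exp(-(alpha - eta)))
    return gamma / (1 - gamma)

alphas = [-1.0, 0.3, 1.2]    # hypothetical cut-points
eta1, eta2 = 0.7, -0.2       # hypothetical x_1'beta and x_2'beta for two individuals
ratios = [cum_odds(a, eta1) / cum_odds(a, eta2) for a in alphas]
# the ratio equals exp(eta2 - eta1) = exp((x_2 - x_1)'beta) for every j
expected = math.exp(eta2 - eta1)
assert all(abs(r - expected) < 1e-9 for r in ratios)
```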
Complementary log - log link

- $$\log\{-\log(1-\gamma_j)\} = \alpha_j - \underline{x}'\underline{\beta}, \quad \alpha_1 \le \alpha_2 \le \ldots \le \alpha_m$$
  For two individuals with $\underline{x}_1$ and $\underline{x}_2$,
  $$\frac{-\log(1-\gamma_j(\underline{x}_1))}{-\log(1-\gamma_j(\underline{x}_2))} = e^{(\underline{x}_2-\underline{x}_1)'\underline{\beta}} \quad \leftarrow \text{proportional hazards}$$
- Let there be n individuals, each making a single choice, and let $Y_j$ = the number of individuals choosing the $j$th category. Then
  $$(Y_1, Y_2, \ldots, Y_m) \sim \text{Mult}(n, \pi_1, \ldots, \pi_m), \quad \sum_{j=1}^{m} Y_j = n, \quad \sum_{j=1}^{m} \pi_j = 1$$
- $$E(Y_j) = n\pi_j, \quad Var(Y_j) = n\pi_j(1-\pi_j), \quad Cov(Y_j, Y_k) = -n\pi_j\pi_k$$
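The proportional-hazards ratio is again free of $j$, since $-\log(1-\gamma_j(\underline{x})) = e^{\alpha_j - \underline{x}'\underline{\beta}}$ and the $e^{\alpha_j}$ factor cancels. A numerical sketch with made-up cut-points and predictors:

```python
import math

def surv(alpha, eta):
    """1 - gamma_j under the complementary log-log link:
    log(-log(1 - gamma_j)) = alpha_j - eta  =>  1 - gamma_j = exp(-exp(alpha_j - eta))."""
    return math.exp(-math.exp(alpha - eta))

alphas = [-0.5, 0.4, 1.3]    # hypothetical cut-points
eta1, eta2 = 0.9, 0.1        # hypothetical linear predictors for two individuals
ratios = [(-math.log(surv(a, eta1))) / (-math.log(surv(a, eta2))) for a in alphas]
expected = math.exp(eta2 - eta1)   # = exp((x_2 - x_1)'beta), free of j
assert all(abs(r - expected) < 1e-9 for r in ratios)
```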
Polytomous data

- Very often, while working with ordinal data and the $\gamma_j$, we instead look at $Z_j = Y_1 + Y_2 + \ldots + Y_j$, i.e.
  $$(z_1, z_2, \ldots, z_m)' = A\,(Y_1, Y_2, \ldots, Y_m)'$$
  where A is a lower triangular matrix with all non-zero elements equal to 1.
  $$E(Z_j) = n(\pi_1 + \pi_2 + \ldots + \pi_j) = n\gamma_j, \quad Var(Z_j) = n\gamma_j(1-\gamma_j)$$
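The matrix form $Z = AY$ makes the first moment immediate: $E(Z) = A\,E(Y) = nA\underline{\pi} = n\underline{\gamma}$. A small numpy sketch, with a made-up $\underline{\pi}$:

```python
import numpy as np

n = 100
pi = np.array([0.1, 0.2, 0.3, 0.4])   # hypothetical category probabilities
m = len(pi)
A = np.tril(np.ones((m, m)))          # lower triangular, all non-zero entries equal to 1
gamma = A @ pi                        # cumulative probabilities gamma_j
EZ = n * gamma                        # E(Z_j) = n * gamma_j, since Z = A Y and E(Y) = n * pi
assert np.allclose(EZ, n * np.cumsum(pi))
assert abs(gamma[-1] - 1.0) < 1e-12   # gamma_m = 1
```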
Polytomous data

For $j < k$,
$$Cov(Z_j, Z_k) = Cov\Big(\sum_{l=1}^{j} Y_l,\ \sum_{l'=1}^{k} Y_{l'}\Big)$$
$$= \sum_{l=1}^{j} Var(Y_l) + \sum_{l=1}^{j} \sum_{\substack{l'=1 \\ l' \ne l}}^{k} Cov(Y_l, Y_{l'})$$
$$= n\sum_{l=1}^{j} \pi_l(1-\pi_l) - n\sum_{l=1}^{j} \sum_{\substack{l'=1 \\ l' \ne l}}^{k} \pi_l \pi_{l'}$$
$$= n\gamma_j - n\gamma_j\gamma_k = n\gamma_j(1-\gamma_k)$$
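The identity $Cov(Z_j, Z_k) = n\gamma_j(1-\gamma_k)$ for $j \le k$ can be verified by propagating the multinomial covariance $Cov(Y) = n(\text{diag}(\underline{\pi}) - \underline{\pi}\,\underline{\pi}')$ through $A$. A numpy sketch with a made-up $\underline{\pi}$:

```python
import numpy as np

n = 50
pi = np.array([0.15, 0.25, 0.35, 0.25])   # hypothetical category probabilities
m = len(pi)
# multinomial covariance: Cov(Y) = n * (diag(pi) - pi pi')
cov_Y = n * (np.diag(pi) - np.outer(pi, pi))
A = np.tril(np.ones((m, m)))
cov_Z = A @ cov_Y @ A.T                   # Cov(Z) = A Cov(Y) A'
gamma = np.cumsum(pi)
# Cov(Z_j, Z_k) = n * gamma_min(j,k) * (1 - gamma_max(j,k))
expected = n * np.minimum.outer(gamma, gamma) * (1 - np.maximum.outer(gamma, gamma))
assert np.allclose(cov_Z, expected)
```

On the diagonal this reduces to $Var(Z_j) = n\gamma_j(1-\gamma_j)$, matching the previous slide.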
Summary

- Cumulative probabilities: $\gamma_j = \sum_{k \le j} \pi_k$, $j = 1(1)m$
- Link function. Multiple logit:
  $$\log\frac{\gamma_j}{1-\gamma_j} = \alpha_j - \underline{x}'\underline{\beta}, \quad j = 1(1)m$$
  with $\alpha_1 \le \alpha_2 \le \ldots \le \alpha_m$ to ensure $\gamma_1 \le \gamma_2 \le \ldots \le \gamma_m$.
- Proportional odds model:
  $$\frac{\gamma_j(\underline{x}_1)/[1-\gamma_j(\underline{x}_1)]}{\gamma_j(\underline{x}_2)/[1-\gamma_j(\underline{x}_2)]} = e^{(\underline{x}_2-\underline{x}_1)'\underline{\beta}} \quad \leftarrow \text{independent of } j$$
- Complementary log - log link:
  $$\log\{-\log(1-\gamma_j)\} = \alpha_j - \underline{x}'\underline{\beta}, \quad \alpha_1 \le \alpha_2 \le \ldots \le \alpha_m$$
  For two individuals with $\underline{x}_1$ and $\underline{x}_2$,
  $$\frac{-\log(1-\gamma_j(\underline{x}_1))}{-\log(1-\gamma_j(\underline{x}_2))} = e^{(\underline{x}_2-\underline{x}_1)'\underline{\beta}} \quad \leftarrow \text{proportional hazards}$$
ILLUSTRATION ON MENTAL IMPAIRMENT DATA

## Source of data: Agresti, A. (2014). Categorical Data Analysis. John Wiley and Sons.
## Script adapted from: http://www.stat.uchicago.edu/~meiwang/courses/CDA-Rfiles/mlogit-Rcode-class.txt
impair = read.table("mental.txt", header=T)
summary(impair)
dim(impair)
attach(impair)
tapply(life, mental, mean)
tapply(life[ses==1], mental[ses==1], mean)
tapply(life[ses==0], mental[ses==0], mean)
Regression Analysis III Sample R script to complement this module 10 / 19
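The tapply calls above compute the mean of life events within each mental-impairment category. The same grouping can be sketched in plain Python; the rows below are synthetic stand-ins, not the Agresti mental-impairment data:

```python
from collections import defaultdict

# tapply(life, mental, mean)-style group means.
# Synthetic (mental, life) pairs, purely for illustration.
rows = [
    (1, 1), (1, 3), (2, 2), (2, 6), (3, 5), (3, 9), (4, 8), (4, 10),
]
groups = defaultdict(list)
for mental, life in rows:
    groups[mental].append(life)
group_means = {k: sum(v) / len(v) for k, v in sorted(groups.items())}
assert group_means == {1: 2.0, 2: 4.0, 3: 7.0, 4: 9.0}
```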
Illustration contd ...

###### boxplots ##########
par(mfrow=c(2,2))
boxplot(life~mental, xlab="mental", ylab="life events")
boxplot(mental~ses, ylab="mental", xlab="ses")
boxplot(life[ses==0]~mental[ses==0], xlab="mental", ylab="life events")
title("ses = 0")
boxplot(life[ses==1]~mental[ses==1], xlab="mental", ylab="life events")
title("ses = 1")
cbind(impair[1:10,], impair[11:20,], impair[21:30,], impair[31:40,])
Illustration contd ...

#####################################################
#
# Use 'vglm' in library(VGAM) - ordinal logit
# on Mental Impairment data
#
#####################################################
# install.packages("VGAM")
library(VGAM)
##### Model ordinal logit: mental ~ life, without ses
summary(vglm(mental~life, family=cumulative(parallel=T)))
##### Model ordinal logit: mental ~ life + ses
summary(vglm(mental~life+ses, family=cumulative(parallel=T)))
##### Model ordinal logit: mental ~ life + ses + life*ses
summary(vglm(mental~life*ses, family=cumulative(parallel=T)))
Illustration contd ...

# Use 'polr' in library(MASS) --- Alternative, not so good
# !!!! Important !!!!
# Negative of the effects (-\beta) is used in "polr":
# log(P(Y =< j)/P(Y>j)) = intercept - beta*x
# So change signs of all input variables !!!
# Or change signs of all estimated parameters !!!
fmental = factor(mental)
library(MASS)
newlife = -life
newses = -ses
######## mental ~ newlife ###############
summary(polr(fmental~newlife))
######### Compare with using the old variable without *(-1)
summary(polr(fmental~life))
######## mental ~ newlife + newses ###############
summary(polr(fmental~newlife+newses))
Illustration contd ...

########### compared with old vars without *(-1) ######
summary(polr(fmental~life+ses))
##### plot prob curve (fmental ~ newlife + newses) for y > j
# ps.options(paper="letter")
x = c(0:90)/10
# x = c(0:100)/2 - 20   # fuller range
#### ses = 1 (dash)
# P(Y>3)
plot(x, 1/(1+exp((-0.32*x+1.11+2.21))),
     type="l", lwd=2, ylim=c(0,1), col="blue", xlab="life",
     ylab="P(Y>j)", lty=2)   ## w/o "-" in the exp()
# P(Y>2)
points(x, 1/(1+exp((-0.32*x+1.11+1.21))), type="l", lwd=2, col="green", lty=2)
# P(Y>1)
points(x, 1/(1+exp((-0.32*x+1.11-0.28))), type="l", lwd=2, col="pink", lty=2)
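The three curves above use the rounded estimates from the fit: cut-points $\approx (-0.28, 1.21, 2.21)$ and, for ses = 1, linear predictor $0.32\cdot\text{life} - 1.11$. Because the cut-points are ordered, the curves cannot cross: $P(Y>1) \ge P(Y>2) \ge P(Y>3)$ at every value of life. A quick Python check of the same formulas:

```python
import math

def p_greater(alpha, x):
    # P(Y > j) = 1/(1 + exp(alpha_j - eta)), with eta = 0.32*x - 1.11 (ses = 1)
    return 1 / (1 + math.exp(-0.32 * x + 1.11 + alpha))

alphas = [-0.28, 1.21, 2.21]   # rounded cut-point estimates used in the script
for x in [0, 3, 6, 9]:
    probs = [p_greater(a, x) for a in alphas]   # P(Y>1), P(Y>2), P(Y>3)
    assert probs[0] >= probs[1] >= probs[2]
    assert all(0 < p < 1 for p in probs)
```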
Illustration contd ...

#### ses=0 (solid)
# P(Y>3)
points(x, 1/(1+exp((-0.32*x+2.21))), type="l", col="blue", lwd=2)
# P(Y>2)
points(x, 1/(1+exp((-0.32*x+1.21))), type="l", col="green", lwd=2)
# P(Y>1)
points(x, 1/(1+exp((-0.32*x-0.28))), type="l", col="pink", lwd=2)
title("Mental=well(=1,higher curve),mild,moderate,\n impaired(=4); ses=0(solid),1")
dev.copy2eps()
##### plot prob curve (fmental ~ newlife + newses) for Y=j
# ps.options(paper="letter")
x = c(0:90)/10
par(mfrow=c(2,1))
# x = c(0:100)/2 - 20   # fuller range
### P(Y=4)=P(Y>3) ses=1
plot(x, 1/(1+exp((-0.32*x+1.11+2.21))), type="l", lwd=2,
     ylim=c(0,1), col="red", xlab="life", ylab="P(Y=j)", lty=2)
title("Mental = well(pink), mild,moderate,impaired(red); ses=1")
### P(Y=3)=P(Y>2)-P(Y>3)
points(x, 1/(1+exp((-0.32*x+1.11+1.21))) - 1/(1+exp((-0.32*x+1.11+2.21))),
       type="l", lwd=2, col="blue", lty=2)
Illustration contd ...

### P(Y=2)=P(Y>1)-P(Y>2)
points(x, 1/(1+exp((-0.32*x+1.11-0.28))) - 1/(1+exp((-0.32*x+1.11+1.21))),
       type="l", lwd=2, col="green", lty=2)
### P(Y=1)=P(Y<=1)
points(x, 1/(1+exp(-(-0.32*x+1.11-0.28))), type="l", lwd=2, col="pink", lty=2)
### same curves for ses=0 (solid)
plot(x, 1/(1+exp((-0.32*x+2.21))), type="l", lwd=2,
     ylim=c(0,1), col="red", xlab="life", ylab="P(Y=j)")
points(x, 1/(1+exp((-0.32*x+1.21))) - 1/(1+exp((-0.32*x+2.21))),
       type="l", col="blue", lwd=2)
points(x, 1/(1+exp((-0.32*x-0.28))) - 1/(1+exp((-0.32*x+1.21))),
       type="l", col="green", lwd=2)
points(x, 1/(1+exp(-(-0.32*x-0.28))), type="l", col="pink", lwd=2)
title("Mental = well(pink), mild,moderate,impaired(red); ses=0")
dev.copy2eps()
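The P(Y=j) curves above are built as differences of adjacent P(Y>j) curves plus P(Y<=1), so at any life value they telescope to exactly 1. A check with the same rounded coefficients (ses = 1):

```python
import math

def p_gt(alpha, x):
    # P(Y > j) = 1/(1 + exp(alpha_j - eta)), with eta = 0.32*x - 1.11 (ses = 1)
    return 1 / (1 + math.exp(-0.32 * x + 1.11 + alpha))

a1, a2, a3 = -0.28, 1.21, 2.21   # rounded cut-point estimates used in the script
for x in [0.0, 2.5, 5.0, 9.0]:
    p4 = p_gt(a3, x)                  # P(Y=4) = P(Y>3)
    p3 = p_gt(a2, x) - p_gt(a3, x)    # P(Y=3) = P(Y>2) - P(Y>3)
    p2 = p_gt(a1, x) - p_gt(a2, x)    # P(Y=2) = P(Y>1) - P(Y>2)
    p1 = 1 - p_gt(a1, x)              # P(Y=1) = P(Y<=1)
    assert abs(p1 + p2 + p3 + p4 - 1.0) < 1e-12
    assert min(p1, p2, p3, p4) >= 0   # ordered cut-points keep each piece non-negative
```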
Illustration contd ...

##### plot prob curve (fmental ~ newlife + newses) for y <= j
# ps.options(paper="letter")
x = c(0:90)/10
# x = c(0:100)/2 - 20   # fuller range
##### ses = 1 (dash)
# P(Y<=3)
plot(x, 1/(1+exp(-(-0.32*x+1.11+2.21))),
     type="l", lwd=2, ylim=c(0,1), col="blue", xlab="life",
     # ylab="P(Y>j)", lty=2)   ## w/o "-" in the exp()
     ylab="P(Y=<j)", lty=2)    ## w/ "-"
# P(Y<=2)
points(x, 1/(1+exp(-(-0.32*x+1.11+1.21))), type="l", lwd=2, col="green", lty=2)
# P(Y<=1)
points(x, 1/(1+exp(-(-0.32*x+1.11-0.28))), type="l", lwd=2, col="pink", lty=2)
Illustration contd ...

###### ses=0 (solid)
# P(Y<=3)
points(x, 1/(1+exp(-(-0.32*x+2.21))), type="l", col="blue", lwd=2)
# P(Y<=2)
points(x, 1/(1+exp(-(-0.32*x+1.21))), type="l", col="green", lwd=2)
# P(Y<=1)
points(x, 1/(1+exp(-(-0.32*x-0.28))), type="l", col="pink", lwd=2)
title("Mental = well(=1,lower curve), mild, moderate,\n impaired(=4); ses=0(solid),1")
dev.copy2eps()
Regression Analysis III Sample R script to complement this module 18 / 19