
Paper: Regression Analysis III
Module: Polytomous regression II


Academic year: 2022

Share "Paper: Regression Analysis III Module: Polytomous regression II"

Copied!
29
0
0

Loading.... (view fulltext now)

Full text

(1)

Subject: Statistics


Development Team

Principal investigator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta

Paper co-ordinator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta

Content writer: Sayantee Jana, Graduate student, Department of Mathematics and Statistics, McMaster University; Sujit Kumar Ray, Analytics professional, Kolkata

Content reviewer: Dr. Kalyan Das, Professor, Department of Statistics, University of Calcutta


Polytomous data

- Polytomous data: the response falls into one of $m$ categories, with probabilities $\pi_1, \pi_2, \ldots, \pi_m$ satisfying
  $$\sum_{j=1}^{m} \pi_j = 1.$$

- Examples:
  Colours - nominal
  Opinion categories - ordinal
  Religion - nominal
  Education level - ordinal

- For ordinal data it is usual to model on the basis of the cumulative probabilities
  $$\gamma_j = \sum_{k \le j} \pi_k, \qquad j = 1(1)m,$$
  as in the short numerical sketch below.
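A minimal numerical sketch of the cumulative probabilities, using made-up category probabilities rather than any data from this module:

# hypothetical category probabilities for m = 4 categories
pi    <- c(0.1, 0.3, 0.4, 0.2)
gamma <- cumsum(pi)   # cumulative probabilities gamma_j = pi_1 + ... + pi_j
gamma                 # 0.1 0.4 0.8 1.0; gamma_m is always 1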


Cumulative logit model

- (I) Generally easier to interpret.

- (II) Categories may not be well defined, and hence $\gamma_j$ is more meaningful than $\pi_j$, $j = 1(1)m$.

- Link functions. Multiple logit:
  $$\log\frac{\gamma_j}{1-\gamma_j} = \alpha_j - \underline{x}'\underline{\beta}, \qquad j = 1(1)m,$$
  with $\alpha_1 \le \alpha_2 \le \ldots \le \alpha_m$ to ensure $\gamma_1 \le \gamma_2 \le \ldots \le \gamma_m$.

  The log odds of the $j$th or lower category against the categories above the $j$th is modelled by a linear predictor, and the linear predictors for the different categories are parallel (a minimal R fitting sketch follows).
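Such a cumulative logit model with parallel linear predictors can be fitted in R, for example with MASS::polr, whose documented parameterisation matches the $\alpha_j - \underline{x}'\underline{\beta}$ form above. The sketch below is self-contained and uses simulated data; the variable names and the true parameter values are assumptions made purely for illustration, not part of this module's example.

library(MASS)

set.seed(10)
n     <- 500
x     <- rnorm(n)
beta  <- 1.2
alpha <- c(-1, 0, 1.5)    # alpha_1 <= alpha_2 <= alpha_3, so m = 4 categories

# simulate from logit(gamma_j) = alpha_j - x*beta
gamma <- sapply(alpha, function(a) plogis(a - x*beta))   # n x 3 matrix of cumulative probabilities
prob  <- cbind(gamma, 1) - cbind(0, gamma)               # category probabilities pi_1, ..., pi_4
y     <- factor(apply(prob, 1, function(p) sample(1:4, 1, prob = p)), ordered = TRUE)

fit <- polr(y ~ x)   # fits logit P(Y <= j) = zeta_j - x*beta
summary(fit)         # slope estimate close to 1.2, intercepts close to alpha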


Proportional Odds model

- For any two individuals with covariate vectors $\underline{x}_1$ and $\underline{x}_2$,
  $$\frac{\gamma_j(\underline{x}_1)/[1-\gamma_j(\underline{x}_1)]}{\gamma_j(\underline{x}_2)/[1-\gamma_j(\underline{x}_2)]} = e^{(\underline{x}_2-\underline{x}_1)'\underline{\beta}},$$
  independent of $j$, since the $\alpha_j$ cancel in the ratio.

  The same ratio of cumulative odds holds over all categories for any two individuals; hence the name proportional odds model (see the numeric check below).
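A quick numeric check of this property, using assumed values $\alpha = (-1, 0.5, 2)$, $\beta = 0.8$ and two covariate values $x_1 = 1$, $x_2 = 3$ (all hypothetical):

alpha <- c(-1, 0.5, 2)
beta  <- 0.8
x1 <- 1; x2 <- 3

odds <- function(x) exp(alpha - x*beta)   # odds of (Y <= j) at covariate value x
odds(x1) / odds(x2)                       # the same number for every j ...
exp((x2 - x1) * beta)                     # ... namely e^{(x2 - x1) * beta}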


Complementary log-log link

- $$\log\{-\log(1-\gamma_j)\} = \alpha_j - \underline{x}'\underline{\beta}, \qquad \alpha_1 \le \alpha_2 \le \ldots \le \alpha_m.$$
  For two individuals with covariate vectors $\underline{x}_1$ and $\underline{x}_2$,
  $$\frac{-\log(1-\gamma_j(\underline{x}_1))}{-\log(1-\gamma_j(\underline{x}_2))} = e^{(\underline{x}_2-\underline{x}_1)'\underline{\beta}},$$
  again independent of $j$: a proportional hazards model.

- Let there be $n$ individuals, each making a single choice, and let $Y_j$ = number of individuals choosing the $j$th category. Then
  $$(Y_1, Y_2, \ldots, Y_m) \sim \operatorname{mult}(n, \pi_1, \ldots, \pi_m), \qquad \sum_{j=1}^{m} Y_j = n, \quad \sum_{j=1}^{m} \pi_j = 1.$$

- $$E(Y_j) = n\pi_j, \qquad \operatorname{Var}(Y_j) = n\pi_j(1-\pi_j), \qquad \operatorname{Cov}(Y_j, Y_k) = -n\pi_j\pi_k$$
  (a simulation check of these moments follows).
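These multinomial moments are easy to verify by simulation; the sketch below uses hypothetical values of n and pi. (The complementary log-log version of the cumulative model itself can be fitted in R, e.g. with MASS::polr(..., method = "cloglog").)

set.seed(1)
n  <- 100
pi <- c(0.2, 0.3, 0.4, 0.1)
Y  <- rmultinom(50000, size = n, prob = pi)   # m x 50000 matrix of multinomial counts

rowMeans(Y)             # approx n*pi = 20 30 40 10
apply(Y, 1, var)        # approx n*pi*(1 - pi)
cov(Y[1, ], Y[2, ])     # approx -n*pi[1]*pi[2] = -6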


Polytomous data

- Very often, when working with ordinal data and the $\gamma_j$, we instead look at the cumulative counts
  $$Z_j = Y_1 + Y_2 + \ldots + Y_j,$$
  that is,
  $$(Z_1, Z_2, \ldots, Z_m)' = A\,(Y_1, Y_2, \ldots, Y_m)',$$
  where $A$ is a lower triangular matrix with all non-zero elements equal to 1. Then
  $$E(Z_j) = n(\pi_1 + \pi_2 + \ldots + \pi_j) = n\gamma_j, \qquad \operatorname{Var}(Z_j) = n\gamma_j(1-\gamma_j)$$
  (see the simulation sketch below).
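A short simulation sketch, with the same hypothetical n and pi as before, constructing A explicitly and checking the two moment formulas for the cumulative counts:

set.seed(2)
n     <- 100
pi    <- c(0.2, 0.3, 0.4, 0.1)
gamma <- cumsum(pi)
m     <- length(pi)

A <- matrix(0, m, m)
A[lower.tri(A, diag = TRUE)] <- 1            # lower triangular matrix of ones

Y <- rmultinom(50000, size = n, prob = pi)   # each column is one multinomial draw
Z <- A %*% Y                                 # cumulative counts Z_1, ..., Z_m per draw

rowMeans(Z)        # approx n*gamma = 20 50 90 100
apply(Z, 1, var)   # approx n*gamma*(1 - gamma); the last entry is 0 since Z_m = n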


Polytomous data

For $j < k$,
$$\begin{aligned}
\operatorname{Cov}(Z_j, Z_k) &= \operatorname{Cov}\Big(\sum_{l=1}^{j} Y_l,\ \sum_{l'=1}^{k} Y_{l'}\Big)\\
&= n\sum_{l=1}^{j}\pi_l(1-\pi_l) + \sum_{l=1}^{j}\sum_{\substack{l'=1 \\ l' \neq l}}^{k}\operatorname{Cov}(Y_l, Y_{l'})\\
&= n\gamma_j - n\sum_{l=1}^{j}\pi_l\,\gamma_k \;=\; n\gamma_j(1-\gamma_k),
\end{aligned}$$
which the simulation check below confirms.
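A matching simulation check of this covariance formula, again with hypothetical n and pi:

set.seed(3)
n     <- 100
pi    <- c(0.2, 0.3, 0.4, 0.1)
gamma <- cumsum(pi)

Y <- rmultinom(50000, size = n, prob = pi)
Z <- apply(Y, 2, cumsum)               # cumulative counts, one column per draw

cov(Z[1, ], Z[3, ])                    # empirical Cov(Z_1, Z_3)
n * gamma[1] * (1 - gamma[3])          # theoretical n*gamma_j*(1 - gamma_k) = 2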


Summary

- Cumulative probabilities: $\gamma_j = \sum_{k \le j} \pi_k$, $j = 1(1)m$.

- Link function, multiple logit:
  $$\log\frac{\gamma_j}{1-\gamma_j} = \alpha_j - \underline{x}'\underline{\beta}, \qquad j = 1(1)m,$$
  with $\alpha_1 \le \alpha_2 \le \ldots \le \alpha_m$ to ensure $\gamma_1 \le \gamma_2 \le \ldots \le \gamma_m$.

- Proportional odds model:
  $$\frac{\gamma_j(\underline{x}_1)/[1-\gamma_j(\underline{x}_1)]}{\gamma_j(\underline{x}_2)/[1-\gamma_j(\underline{x}_2)]} = e^{(\underline{x}_2-\underline{x}_1)'\underline{\beta}},$$
  independent of $j$.

- Complementary log-log link:
  $$\log\{-\log(1-\gamma_j)\} = \alpha_j - \underline{x}'\underline{\beta}, \qquad \alpha_1 \le \alpha_2 \le \ldots \le \alpha_m.$$
  For two individuals with $\underline{x}_1$ and $\underline{x}_2$,
  $$\frac{-\log(1-\gamma_j(\underline{x}_1))}{-\log(1-\gamma_j(\underline{x}_2))} = e^{(\underline{x}_2-\underline{x}_1)'\underline{\beta}},$$
  a proportional hazards model.


ILLUSTRATION ON MENTAL IMPAIRMENT DATA
(Sample R script to complement this module)

## Source of data: Agresti, A. (2014). Categorical data
## analysis. John Wiley and Sons.
## Script source: http://www.stat.uchicago.edu/ meiwang/courses/CDA-Rfiles/mlogit-Rcode-class.txt
impair = read.table("mental.txt", header = T)
summary(impair)
dim(impair)
attach(impair)

# mean number of life events by mental impairment category, overall and by ses
tapply(life, mental, mean)
tapply(life[ses == 1], mental[ses == 1], mean)
tapply(life[ses == 0], mental[ses == 0], mean)


Illustration contd ...

###### boxplots ##########
par(mfrow = c(2, 2))
boxplot(life ~ mental, xlab = "mental", ylab = "life events")
boxplot(mental ~ ses, ylab = "mental", xlab = "ses")
boxplot(life[ses == 0] ~ mental[ses == 0], xlab = "mental", ylab = "life events")
title("ses = 0")
boxplot(life[ses == 1] ~ mental[ses == 1], xlab = "mental", ylab = "life events")
title("ses = 1")

# view the data, 10 rows per column block
cbind(impair[1:10, ], impair[11:20, ], impair[21:30, ], impair[31:40, ])


Illustration contd ...

#####################################################
#
# Use 'vglm' in library(VGAM) - ordinal logit
# on Mental Impairment data
#
#####################################################
# install.packages("VGAM")
library(VGAM)

##### Model ordinal logit: mental ~ life, without ses
summary(vglm(mental ~ life, family = cumulative(parallel = T)))

##### Model ordinal logit: mental ~ life + ses
summary(vglm(mental ~ life + ses, family = cumulative(parallel = T)))

##### Model ordinal logit: mental ~ life + ses + life*ses
summary(vglm(mental ~ life * ses, family = cumulative(parallel = T)))


Illustration contd ...

# Use 'polr' in library(MASS) --- Alternative, not so good
# !!!! Important !!!!
# Negative of the effects (-\beta) is used in "polr":
# log(P(Y =< j)/P(Y > j)) = intercept - beta*x
# So change signs of all input variables !!!
# Or change signs of all estimated parameters !!!
fmental = factor(mental)
library(MASS)
newlife = -life
newses  = -ses

######## mental ~ newlife ###############
summary(polr(fmental ~ newlife))

######### Compare with using the old variable without *(-1)
summary(polr(fmental ~ life))

######## mental ~ newlife + newses ###############
summary(polr(fmental ~ newlife + newses))
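The sign convention described in the comments above can be seen on a small self-contained example with simulated data (hypothetical, not the mental-impairment data): negating the input variable flips only the sign of the slope that polr reports.

library(MASS)
set.seed(20)
x <- rnorm(300)
# ordinal response from a latent-variable model with true slope +2
y <- cut(2*x + rlogis(300), breaks = c(-Inf, -1, 1, Inf),
         labels = c("low", "mid", "high"), ordered_result = TRUE)

coef(polr(y ~ x))       # approx +2 under polr's  zeta_j - beta*x  parameterisation
coef(polr(y ~ I(-x)))   # same magnitude, opposite sign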


Illustration contd ...

########### compared with old vars without *(-1) ######
summary(polr(fmental ~ life + ses))

##### plot prob curve (fmental ~ newlife + newses) for y > j
# ps.options(paper="letter")
x = c(0:90)/10
# x = c(0:100)/2 - 20   # fuller range

#### ses = 1 (dash)
# P(Y>3)
plot(x, 1/(1+exp((-0.32*x+1.11+2.21))),
     type="l", lwd=2, ylim=c(0,1), col="blue", xlab="life",
     ylab="P(Y>j)", lty=2)   ## w/o "-" in the exp()

# P(Y>2)
points(x, 1/(1+exp((-0.32*x+1.11+1.21))), type="l", lwd=2, col="green", lty=2)

# P(Y>1)
points(x, 1/(1+exp((-0.32*x+1.11-0.28))), type="l", lwd=2, col="pink", lty=2)


Illustration contd ...

#### ses = 0 (solid)
points(x, 1/(1+exp((-0.32*x+2.21))), type="l", col="blue",  lwd=2)   # P(Y>3)
points(x, 1/(1+exp((-0.32*x+1.21))), type="l", col="green", lwd=2)   # P(Y>2)
points(x, 1/(1+exp((-0.32*x-0.28))), type="l", col="pink",  lwd=2)   # P(Y>1)
title("Mental=well(=1,higher curve),mild,moderate,\n impaired(=4); ses=0(solid),1")
dev.copy2eps()

##### plot prob curve (fmental ~ newlife + newses) for Y = j
# ps.options(paper="letter")
x = c(0:90)/10
par(mfrow = c(2, 1))
# x = c(0:100)/2 - 20   # fuller range

### P(Y=4) = P(Y>3), ses = 1
plot(x, 1/(1+exp((-0.32*x+1.11+2.21))), type="l", lwd=2,
     ylim=c(0,1), col="red", xlab="life", ylab="P(Y=j)", lty=2)
title("Mental = well(pink), mild,moderate,impaired(red); ses=1")

### P(Y=3) = P(Y>2) - P(Y>3)
points(x, 1/(1+exp((-0.32*x+1.11+1.21))) - 1/(1+exp((-0.32*x+1.11+2.21))),
       type="l", lwd=2, col="blue", lty=2)


Illustration contd ...

### P(Y=2) = P(Y>1) - P(Y>2)
points(x, 1/(1+exp((-0.32*x+1.11-0.28))) - 1/(1+exp((-0.32*x+1.11+1.21))),
       type="l", lwd=2, col="green", lty=2)

### P(Y=1) = P(Y<=1)
points(x, 1/(1+exp(-(-0.32*x+1.11-0.28))), type="l", lwd=2, col="pink", lty=2)

### the same four curves for ses = 0 (solid)
plot(x, 1/(1+exp((-0.32*x+2.21))), type="l", lwd=2,
     ylim=c(0,1), col="red", xlab="life", ylab="P(Y=j)")
points(x, 1/(1+exp((-0.32*x+1.21))) - 1/(1+exp((-0.32*x+2.21))),
       type="l", col="blue", lwd=2)
points(x, 1/(1+exp((-0.32*x-0.28))) - 1/(1+exp((-0.32*x+1.21))),
       type="l", col="green", lwd=2)
points(x, 1/(1+exp(-(-0.32*x-0.28))), type="l", col="pink", lwd=2)
title("Mental = well(pink), mild,moderate,impaired(red); ses=0")
dev.copy2eps()


Illustration contd ...

##### plot prob curve (fmental ~ newlife + newses) for y <= j
# ps.options(paper="letter")
x = c(0:90)/10
# x = c(0:100)/2 - 20   # fuller range

##### ses = 1 (dash)
# P(Y<=3)
plot(x, 1/(1+exp(-(-0.32*x+1.11+2.21))),
     type="l", lwd=2, ylim=c(0,1), col="blue", xlab="life",
     ylab="P(Y=<j)", lty=2)   ## w/ "-" in the exp()

# P(Y<=2)
points(x, 1/(1+exp(-(-0.32*x+1.11+1.21))), type="l", lwd=2, col="green", lty=2)

# P(Y<=1)
points(x, 1/(1+exp(-(-0.32*x+1.11-0.28))), type="l", lwd=2, col="pink", lty=2)


Illustration contd ...

###### ses = 0 (solid)
points(x, 1/(1+exp(-(-0.32*x+2.21))), type="l", col="blue",  lwd=2)   # P(Y<=3)
points(x, 1/(1+exp(-(-0.32*x+1.21))), type="l", col="green", lwd=2)   # P(Y<=2)
points(x, 1/(1+exp(-(-0.32*x-0.28))), type="l", col="pink",  lwd=2)   # P(Y<=1)
title("Mental = well(=1,lower curve), mild, moderate,\n impaired(=4); ses=0(solid),1")
dev.copy2eps()

########### compared with old vars without *(-1) ######
summary(polr(fmental ~ life + ses))

