Paper: Regression Analysis III

Academic year: 2022


Subject: Statistics

Paper: Regression Analysis III

Module: Types of Data


Development Team

Principal investigator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta

Paper co-ordinator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta

Content writers: Sayantee Jana, Graduate student, Department of Mathematics and Statistics, McMaster University; Sujit Kumar Ray, Analytics professional, Kolkata

Content reviewer: Department of Statistics, University of Calcutta


Classification of Categorical data

Categorical data are classified into two types:

• Nominal data

• Ordinal data


Nominal data

• When a categorical variable has no natural order among its categories, it is a nominal variable. It corresponds to the nominal scale of measurement. Data consisting of nominal variables are called nominal data.

• Examples of nominal data include nationality, religion and gender.

• A nominal variable may or may not be binary, since it can have more than two categories.


Definition and examples of Nominal data

Formal definition

A categorical variable whose categories do not have any natural order is called a nominal variable.

Examples

• Name of school

• Name of books

• Different subjects studied in school

• Party affiliation

• Favourite sport
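In R, a nominal variable is naturally stored as an unordered factor. The following is a minimal sketch, assuming a made-up vector of favourite sports; the data and the name `sport` are illustrative and do not come from the slides.

```r
# Hypothetical data: favourite sport of six respondents (illustrative only)
sport <- factor(c("cricket", "football", "hockey",
                  "football", "cricket", "football"))

levels(sport)      # category labels; their stored order carries no meaning
is.ordered(sport)  # FALSE: R treats this factor as nominal
table(sport)       # frequency counts are a meaningful summary for nominal data
```

Because the factor is unordered, comparisons such as "greater than" are not defined for it; only equality and counting make sense.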


Ordinal data

• A categorical variable, such as social class or educational level, whose categories are ordered but for which the distance or spacing between the categories is undefined or unknown is an ordinal variable. It corresponds to the ordinal scale of measurement. Data consisting of ordinal variables are called ordinal data.

• Examples of ordinal data include the age group of people and pay bands in a company.

• An ordinal variable may or may not be binary.


Definition and examples of Ordinal data

Formal definition

A categorical variable whose categories have a clear natural order, but for which the absolute distances between the categories are not defined, is called an ordinal variable.

Examples

• Diagnostic stages of lung cancer: normal, benign, suspicious and malignant

• Satisfaction level with a service availed: satisfactory, indifferent, unsatisfactory

• Level of pain: extremely painful, painful, slightly painful, not at all painful
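An ordinal variable can be sketched in R as an ordered factor. The example below reuses the satisfaction levels listed above; the responses themselves and the names `lv` and `satisfaction` are hypothetical.

```r
# Satisfaction levels from the example above, listed from lowest to highest
lv <- c("unsatisfactory", "indifferent", "satisfactory")

# Hypothetical responses (illustrative only)
satisfaction <- factor(
  c("satisfactory", "indifferent", "unsatisfactory", "satisfactory"),
  levels = lv, ordered = TRUE
)

is.ordered(satisfaction)           # TRUE
satisfaction[1] > satisfaction[2]  # order comparisons are defined
max(satisfaction)                  # highest observed category
as.integer(satisfaction)           # ranks 1..3; the spacing between ranks is arbitrary
```

The integer codes only record the order. Treating them as equally spaced numbers would add an assumption that the ordinal scale does not provide.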


Concordant pairs

Figure: a) concordant pairs [1], b) concordant pairs [2]

[1] © Dayton Creative Photography

[2] © Laura Farris Photography


Discordant pairs

Figure: discordant pairs


Concordant-Discordant pairs

• Let $X$ be the predictor variable and $Y$ be the response variable.

• Concordant pairs: pairs in which the observation that is larger in $X$ is also larger in $Y$, i.e. $X_i > X_j$ and $Y_i > Y_j$, or $X_i < X_j$ and $Y_i < Y_j$.

• Discordant pairs: pairs in which the observation that is larger in $X$ is smaller in $Y$, i.e. $X_i > X_j$ and $Y_i < Y_j$, or $X_i < X_j$ and $Y_i > Y_j$.

• Tied pairs: pairs which have the same $X$ and/or the same $Y$.

• $T_X$: pairs tied in $X$

• $T_Y$: pairs tied in $Y$
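The three cases above can be sketched as a small R helper. The function name `classify_pair` is hypothetical; it relies on the fact that the product of the signs of the two differences is positive for a concordant pair, negative for a discordant pair and zero for a pair tied in $X$ and/or $Y$.

```r
# Classify the pair of observations (xi, yi) and (xj, yj).
classify_pair <- function(xi, yi, xj, yj) {
  s <- sign(xi - xj) * sign(yi - yj)
  if (s > 0) "concordant" else if (s < 0) "discordant" else "tied"
}

classify_pair(1, 2, 3, 5)  # X and Y both increase: "concordant"
classify_pair(1, 5, 3, 2)  # X increases, Y decreases: "discordant"
classify_pair(1, 2, 1, 5)  # same X: "tied"
```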


Example

Example [3]: Judges at a singing competition rank 5 contestants, where 1 is the best.

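Ranks as used in the pairwise comparisons of this example:

Contestant     Juan  Sophia  Robert  Helena  Marie
Judge 1 (X)       1       2       3       4      5
Judge 2 (Y)       2       3       1       5      4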

• Juan vs Sophia: $(X_J, X_S) = (1, 2)$ and $(Y_J, Y_S) = (2, 3)$; $1 < 2$ and $2 < 3$: concordant pair.

• Juan vs Robert: $(X_J, X_R) = (1, 3)$ and $(Y_J, Y_R) = (2, 1)$; $1 < 3$ and $2 > 1$: discordant pair.

• Juan vs Helena: $(X_J, X_H) = (1, 4)$ and $(Y_J, Y_H) = (2, 5)$; $1 < 4$ and $2 < 5$: concordant pair.

• Juan vs Marie: $(X_J, X_M) = (1, 5)$ and $(Y_J, Y_M) = (2, 4)$; $1 < 5$ and $2 < 4$: concordant pair.

[3] http://support.minitab.com/en-us/minitab/17/topic-library/basic-statistics-and-graphs/tables/data-and-table-layouts/what-are-concordant-and-discordant-pairs/


Example continued ...

• Sophia vs Robert: $(X_S, X_R) = (2, 3)$ and $(Y_S, Y_R) = (3, 1)$; $2 < 3$ and $3 > 1$: discordant pair.

• Sophia vs Helena: $(X_S, X_H) = (2, 4)$ and $(Y_S, Y_H) = (3, 5)$; $2 < 4$ and $3 < 5$: concordant pair.

• Sophia vs Marie: $(X_S, X_M) = (2, 5)$ and $(Y_S, Y_M) = (3, 4)$; $2 < 5$ and $3 < 4$: concordant pair.

• Robert vs Helena: $(X_R, X_H) = (3, 4)$ and $(Y_R, Y_H) = (1, 5)$; $3 < 4$ and $1 < 5$: concordant pair.

• Robert vs Marie: $(X_R, X_M) = (3, 5)$ and $(Y_R, Y_M) = (1, 4)$; $3 < 5$ and $1 < 4$: concordant pair.

• Helena vs Marie: $(X_H, X_M) = (4, 5)$ and $(Y_H, Y_M) = (5, 4)$; $4 < 5$ and $5 > 4$: discordant pair.

Number of concordant pairs = 7

Number of discordant pairs = 3
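The ten comparisons can be reproduced with a short base R sketch. It assumes the ranks used in the bullets above, X = 1, 2, 3, 4, 5 and Y = 2, 3, 1, 5, 4 for Juan, Sophia, Robert, Helena and Marie; the variable names are illustrative.

```r
contestant <- c("Juan", "Sophia", "Robert", "Helena", "Marie")
x <- c(1, 2, 3, 4, 5)  # ranks X used in the comparisons above
y <- c(2, 3, 1, 5, 4)  # ranks Y used in the comparisons above

idx <- combn(length(x), 2)  # 2 x 10 matrix: every pair of contestants once
s <- sign(x[idx[1, ]] - x[idx[2, ]]) * sign(y[idx[1, ]] - y[idx[2, ]])

result <- data.frame(
  first  = contestant[idx[1, ]],
  second = contestant[idx[2, ]],
  type   = ifelse(s > 0, "concordant", ifelse(s < 0, "discordant", "tied"))
)
print(result)

n_concordant <- sum(s > 0)  # 7
n_discordant <- sum(s < 0)  # 3
```

With 5 contestants there are 5 × 4 / 2 = 10 pairs, so the 7 concordant and 3 discordant pairs account for every pair and none is tied.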


An example dataset and its illustrative plot [4]

Figure: a) example dataset, b) a plot showing the concordant and discordant pairs

[4] http://mathematica.stackexchange.com/questions/51345/measures-of-association-concordant-and-discordant


Concordance-Discordance Matrix: An example in R

> library(asbio)  # assumption: ConDis.matrix() is taken from the asbio package

> crab <- data.frame(gill.wt = c(159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210),
+                    body.wt = c(14.4, 15.2, 11.3, 2.5, 22.7, 14.9, 1.41, 15.81, 4.19, 15.39, 17.25, 9.52))

> attach(crab)

> crabm <- ConDis.matrix(gill.wt, body.wt); crabm

    1  2  3  4  5  6  7  8  9 10 11 12
1  NA NA NA NA NA NA NA NA NA NA NA NA
2   1 NA NA NA NA NA NA NA NA NA NA NA
3   1  1 NA NA NA NA NA NA NA NA NA NA
4   1  1  1 NA NA NA NA NA NA NA NA NA
5   1  1  1  1 NA NA NA NA NA NA NA NA
6   1 -1  1  1  1 NA NA NA NA NA NA NA
7   1  1  0 -1  1  1 NA NA NA NA NA NA
8   1  1  1  1  1  1  1 NA NA NA NA NA
9   1  1  1  1  1  1 -1  1 NA NA NA NA
10  1  1  1  1  1 -1  1  1  1 NA NA NA
11  1  1  1  1  1  1  1  0  1  1 NA NA
12 -1 -1 -1  1  1  1  1  1  1  1  1 NA

In the output, 1 marks a concordant pair, -1 a discordant pair and 0 a tied pair; only the lower triangle is filled.
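The same matrix can be sketched in base R without any add-on package. The snippet below is an illustrative alternative and not the implementation behind `ConDis.matrix`; it takes the sign of the product of the pairwise differences and keeps the lower triangle, using the 1 / -1 / 0 coding seen in the output above.

```r
gill.wt <- c(159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210)
body.wt <- c(14.4, 15.2, 11.3, 2.5, 22.7, 14.9, 1.41, 15.81, 4.19, 15.39, 17.25, 9.52)

dx <- outer(gill.wt, gill.wt, "-")  # dx[i, j] = gill.wt[i] - gill.wt[j]
dy <- outer(body.wt, body.wt, "-")  # dy[i, j] = body.wt[i] - body.wt[j]

m <- sign(dx * dy)                     # 1 concordant, -1 discordant, 0 tied
m[upper.tri(m, diag = TRUE)] <- NA     # keep each pair once (lower triangle)

n_conc <- sum(m == 1,  na.rm = TRUE)
n_disc <- sum(m == -1, na.rm = TRUE)
n_tied <- sum(m == 0,  na.rm = TRUE)
c(concordant = n_conc, discordant = n_disc, tied = n_tied)
```

For the crab data this gives 57 concordant, 7 discordant and 2 tied pairs out of 12 × 11 / 2 = 66, which matches a count of the entries in the printed matrix.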


Concordant-Discordant pairs

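Here $n_{ij}$ is the count and $\pi_{ij}$ the probability of cell $(i, j)$ in the cross-classification of $X$ and $Y$.

• No. of concordant pairs: $C = \sum_{k} \sum_{l} \Big( \sum_{i<k} \sum_{j<l} n_{ij} \, n_{kl} \Big)$

• No. of discordant pairs: $D = \sum_{k} \sum_{l} \Big( \sum_{i<k} \sum_{j>l} n_{ij} \, n_{kl} \Big)$

• Probability of concordance: $\pi_C = 2 \sum_{k} \sum_{l} \Big( \sum_{i<k} \sum_{j<l} \pi_{ij} \, \pi_{kl} \Big)$

• Probability of discordance: $\pi_D = 2 \sum_{k} \sum_{l} \Big( \sum_{i<k} \sum_{j>l} \pi_{ij} \, \pi_{kl} \Big)$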
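The sums for C and D can be sketched directly in base R. The function name `count_pairs` is hypothetical, and the estimates of the two probabilities assume the plug-in choice of replacing each cell probability by its observed proportion $n_{ij}/n$. The 2 × 5 table is the one used in Example 3 of this module.

```r
# Count concordant and discordant pairs in a two-way table of counts.
count_pairs <- function(tab) {
  nr <- nrow(tab); nc <- ncol(tab)
  C <- 0; D <- 0
  for (k in seq_len(nr)) {
    for (l in seq_len(nc)) {
      above <- seq_len(k - 1)        # rows i < k
      left  <- seq_len(l - 1)        # columns j < l
      right <- seq_len(nc - l) + l   # columns j > l
      C <- C + tab[k, l] * sum(tab[above, left])
      D <- D + tab[k, l] * sum(tab[above, right])
    }
  }
  n <- sum(tab)
  # Plug-in estimates (assumption): cell probabilities replaced by n_ij / n
  list(C = C, D = D, pi_C = 2 * C / n^2, pi_D = 2 * D / n^2)
}

tab <- rbind(c(26, 26, 23, 18, 9), c(6, 7, 9, 14, 23))
res <- count_pairs(tab)
res$C  # 3839
res$D  # 1175
```

Each cell $(k, l)$ is paired with the cells above and to its left for concordance, and with the cells above and to its right for discordance, so every pair of cells is counted once.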


Summary

• A nominal variable is a categorical variable whose categories are not ordered.

• An ordinal variable is a categorical variable whose categories have a natural order.

• However, the distance between the ordered categories of an ordinal variable is not defined.

• Concordant pairs are those which have the same order for the explanatory and response variables (either both are increasing or both are decreasing).

• Discordant pairs are those which have a different order for the explanatory and response variables (one is increasing and the other is decreasing).


Example 1 [5]

## creating an example dataset for calculating concordance
admit = c(0,0,1,0,0,1,0,0,0,1,0,1,0,0,1)
gre   = c(636,660,800,640,520,760,487,890,765,345,456,675,666,546,786)
gpa   = c(3.61,3.67,4,3.19,2.93,3,2.98,3.4,3.2,1.98,4,5.1,3.3,5.1,4.7)
rank  = c(3,3,1,4,4,2,4,4,4,3,3,3,2,2,1)
admission = data.frame(admit, gre, gpa, rank)

## modelling response variable on the set of predictors
model = glm(admit ~ ., family = "binomial", data = admission)

[5] Source: Statour Blog on Concordance and Discordance in Logistic Regression by Vaibhav Mainkar


Example 1 contd ...

#***FUNCTION TO CALCULATE CONCORDANCE AND DISCORDANCE STARTS***#
Association = function(ModelName) {
  # observed response and fitted probabilities of the model passed in
  Con_Dis_Data = cbind(ModelName$y, ModelName$fitted.values)
  ones  = Con_Dis_Data[Con_Dis_Data[,1] == 1, , drop = FALSE]
  zeros = Con_Dis_Data[Con_Dis_Data[,1] == 0, , drop = FALSE]

  # one cell per (zero, one) pair of observations
  conc = matrix(0, dim(zeros)[1], dim(ones)[1])
  disc = matrix(0, dim(zeros)[1], dim(ones)[1])
  ties = matrix(0, dim(zeros)[1], dim(ones)[1])

  for (j in 1:dim(zeros)[1]) {
    for (i in 1:dim(ones)[1]) {
      # concordant if the observed 1 has the higher fitted probability
      if (ones[i,2] > zeros[j,2]) {conc[j,i] = 1}
      else if (ones[i,2] < zeros[j,2]) {disc[j,i] = 1}
      else if (ones[i,2] == zeros[j,2]) {ties[j,i] = 1}
    }
  }

  Pairs = dim(zeros)[1] * dim(ones)[1]
  PercentConcordance = (sum(conc)/Pairs)*100
  PercentDiscordance = (sum(disc)/Pairs)*100
  PercentTied = (sum(ties)/Pairs)*100
  Concordance = sum(conc)
  Discordance = sum(disc)

  return(list("Percent Concordance" = PercentConcordance,
              "Percent Discordance" = PercentDiscordance,
              "Percent Tied" = PercentTied, "Pairs" = Pairs,
              "Concordance" = Concordance,
              "Discordance" = Discordance))
} #***FUNCTION TO CALCULATE CONCORDANCE AND DISCORDANCE ENDS***#

Association(model)
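The double loop compares every observed 1 with every observed 0 on their fitted probabilities. A vectorised sketch of the same comparison is given below, assuming only a 0/1 response vector and a vector of fitted probabilities. The function name `pair_concordance` and the values of `y` and `p` are hypothetical and are not output from the model above.

```r
# Compare the fitted probability of every observed 1 with that of every observed 0.
pair_concordance <- function(y, p) {
  d <- outer(p[y == 1], p[y == 0], "-")  # d[i, j] = p(one i) - p(zero j)
  c(concordant = sum(d > 0),   # the 1 has the higher fitted probability
    discordant = sum(d < 0),   # the 0 has the higher fitted probability
    tied       = sum(d == 0),
    pairs      = length(d))
}

# Hypothetical response and fitted probabilities (illustrative only)
y <- c(0, 0, 1, 1, 0, 0)
p <- c(0.2, 0.8, 0.7, 0.6, 0.1, 0.6)

res <- pair_concordance(y, p)
res
100 * res[["concordant"]] / res[["pairs"]]  # percent concordance
```

With two observed 1s and four observed 0s there are 8 pairs; here 5 are concordant, 2 are discordant and 1 is tied.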


Example 2 [6]

## calculating concordance and discordance without
## using a GLM model

## creating a dataset
set.seed(1)
n = 40
library(mnormt)
X = rmnorm(n, c(0,0), matrix(c(1,.4,.4,1),2,2))

## enumerate all pairs (i, j) with i < j, then count
## concordant and discordant pairs
i = rep(1:(n-1), (n-1):1); j = 2:n
for (k in 3:n) {j = c(j, k:n)}
M = cbind(X[i,], X[j,])
concordant = sum((M[,1]-M[,3])*(M[,2]-M[,4]) > 0)
discordant = sum((M[,1]-M[,3])*(M[,2]-M[,4]) < 0)

[6] Source: R Bloggers on Association and concordance measures by Arthur Charpentier


Example 3 [7]

library(DescTools)  # provides ConDisPairs()
tab <- as.table(rbind(c(26,26,23,18,9), c(6,7,9,14,23)))
ConDisPairs(tab)

[7] Source: help file of the function ConDisPairs in the DescTools library, R
