Module: Simpson’s Paradox

Paper: Regression Analysis III

Principal investigator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta

Paper co-ordinator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta

Content writer: Sayantee Jana, Graduate student, Department of Mathematics and Statistics, McMaster University

Content reviewer: Department of Statistics, University of Calcutta

What is Simpson’s Paradox?

Sometimes the marginal association between two variables points in a different direction from each of the conditional associations obtained by controlling for a third variable. This phenomenon is called Simpson’s paradox. Although the paradox is named after Simpson (1951), it dates back to Yule (1903).

• This phenomenon can occur in both quantitative and categorical data.

• Statisticians commonly use it to caution against inferring causal effects from an association between the response and predictor variables.

• Even when a response and a predictor appear strongly associated, one or more confounding factors may exist that alter or even reverse that association. Hence it is necessary to control for the relevant factors to study the true association.


Let us consider data from a study of death penalty verdicts by race of defendant and race of victim. Ignoring the victim’s race gives the following 2 × 2 contingency table.

Table: Death Penalty Verdict by Defendant’s Race

Defendant’s Race    Death Penalty: Yes    Death Penalty: No
White               53                    430
Black               15                    176

Total no. of cases = 674

• Y: death penalty verdict (categories: yes, no)

• X: race of defendant (categories: black, white)

• Objective of the study: to study the effect of the defendant’s race on the death penalty verdict

• Death penalty rate, white defendants: 53/483 = 11.0%

• Death penalty rate, black defendants: 15/191 = 7.9%
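The two percentages above are the proportion of "yes" verdicts within each row of the marginal table. A minimal Python sketch of that arithmetic, assuming the table is stored as a dictionary mapping defendant’s race to (yes, no) counts (the names `marginal` and `death_penalty_rate` are illustrative, not part of the module):

```python
# Marginal table from the module: defendant's race (X) -> (yes, no) counts
# of the death penalty verdict (Y). Variable names are illustrative only.
marginal = {
    "white": (53, 430),
    "black": (15, 176),
}


def death_penalty_rate(yes, no):
    """Proportion of cases in one row that received the death penalty."""
    return yes / (yes + no)


total = sum(yes + no for yes, no in marginal.values())
print("total cases:", total)
for race, (yes, no) in marginal.items():
    rate = death_penalty_rate(yes, no)
    print(f"{race} defendants: {yes}/{yes + no} = {rate:.1%}")
```

Run as is, this prints 674 total cases, 11.0% for white defendants and 7.9% for black defendants.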


• Thus it seems that the death penalty was imposed less often on black defendants than on white defendants.

• Now let us consider another factor: the victim’s race.

• Z: race of victim (categories: black, white)

• Let us stratify the marginal table with respect to Z.
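Stratifying by Z turns the data into a three-way X × Y × Z table, and summing the two partial tables that follow over victim’s race must give back the marginal table. A small Python sketch of that check, assuming the counts are keyed by (victim’s race, defendant’s race); `partial` and `collapse_over_victim` are hypothetical names chosen for illustration:

```python
# Three-way table: (victim's race Z, defendant's race X) -> (yes, no) counts
# of the death penalty verdict Y, taken from the module's two partial tables.
partial = {
    ("white", "white"): (53, 414),
    ("white", "black"): (11, 37),
    ("black", "white"): (0, 16),
    ("black", "black"): (4, 139),
}


def collapse_over_victim(table):
    """Sum the partial tables over Z to recover the X-by-Y marginal table."""
    marginal = {}
    for (_victim, defendant), (yes, no) in table.items():
        prev_yes, prev_no = marginal.get(defendant, (0, 0))
        marginal[defendant] = (prev_yes + yes, prev_no + no)
    return marginal


print(collapse_over_victim(partial))
# {'white': (53, 430), 'black': (15, 176)}
```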


Let us consider the partial table for white victims only.

Defendant’s Race    Death Penalty: Yes    Death Penalty: No
White               53                    414
Black               11                    37

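• Death penalty rate, white defendants: 53/467 = 11.3%

• Death penalty rate, black defendants: 11/48 = 22.9%

• Conclusion: when the victim was white, the death penalty was imposed more often on black defendants than on white defendants.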
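The conditional rates for white victims come from the same row-wise calculation, now applied to the partial table. A minimal sketch, assuming the partial table is a dictionary of (yes, no) counts; `row_rates` and `white_victims` are illustrative names:

```python
def row_rates(table):
    """Death penalty rate (yes / row total) for each row of a 2 x 2 table."""
    return {race: yes / (yes + no) for race, (yes, no) in table.items()}


# Partial table for white victims only: defendant's race -> (yes, no)
white_victims = {"white": (53, 414), "black": (11, 37)}

for race, rate in row_rates(white_victims).items():
    yes, no = white_victims[race]
    print(f"{race} defendants: {yes}/{yes + no} = {rate:.1%}")
```

This prints 11.3% for white defendants and 22.9% for black defendants, matching the bullets above.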


Let us consider the partial table for black victims only.

Defendant’s Race    Death Penalty: Yes    Death Penalty: No
White               0                     16
Black               4                     139

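• Death penalty rate, white defendants: 0/16 = 0.0%

• Death penalty rate, black defendants: 4/143 = 2.8%

• Thus, by controlling for the victim’s race, we find that the death penalty was imposed more often on black defendants than on white defendants, for white and black victims alike.

• But if we ignore the victim’s race, the conclusion is reversed.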
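The reversal can be checked mechanically: compute the black-minus-white difference in death penalty rates within each victim group and in the marginal table, then test whether every conditional difference has the opposite sign to the marginal one. A sketch under the assumption that the tables are stored as nested dictionaries; `black_minus_white` and `paradox` are hypothetical names:

```python
def death_penalty_rate(yes, no):
    """Proportion of cases in one row that received the death penalty."""
    return yes / (yes + no)


# Partial tables keyed by victim's race; rows are defendant's race -> (yes, no)
partial = {
    "white victims": {"white": (53, 414), "black": (11, 37)},
    "black victims": {"white": (0, 16), "black": (4, 139)},
}
marginal = {"white": (53, 430), "black": (15, 176)}


def black_minus_white(table):
    """Rate for black defendants minus rate for white defendants."""
    return death_penalty_rate(*table["black"]) - death_penalty_rate(*table["white"])


conditional_diffs = {z: black_minus_white(t) for z, t in partial.items()}
marginal_diff = black_minus_white(marginal)

for z, d in conditional_diffs.items():
    print(f"{z}: {d:+.3f}")
print(f"marginal: {marginal_diff:+.3f}")

# Simpson's paradox: every conditional difference has the opposite sign
# to the marginal difference.
paradox = all(d * marginal_diff < 0 for d in conditional_diffs.values())
print("direction reversed:", paradox)
```

With these counts both conditional differences are positive, the marginal difference is negative, and `paradox` is `True`.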


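• The association between X and Y changes when we ignore the victim’s race versus when we control for it.

• This is due to the nature of the association between Z and each of X and Y.

• The association between Z and X is extremely strong. Summing over the verdict, white defendants had white victims in 467 cases and black victims in 16, whereas black defendants had white victims in 48 cases and black victims in 143.

• The odds ratio between X and Z is $(467 \times 143)/(48 \times 16) \approx 87.0$.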
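The X–Z counts in the odds ratio are row totals of the partial tables (for example, 467 = 53 + 414). A minimal sketch that derives them and applies the usual sample odds ratio $(a d)/(b c)$ for a 2 × 2 table $[[a, b], [c, d]]$; the names `cases` and `odds_ratio` are illustrative:

```python
# (victim's race, defendant's race) -> (yes, no) counts from the partial tables
partial = {
    ("white", "white"): (53, 414),
    ("white", "black"): (11, 37),
    ("black", "white"): (0, 16),
    ("black", "black"): (4, 139),
}


def cases(victim, defendant):
    """Number of cases for one victim/defendant combination, summed over Y."""
    return sum(partial[(victim, defendant)])


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c) for the 2 x 2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# Rows: white / black defendant; columns: white / black victim
a, b = cases("white", "white"), cases("black", "white")
c, d = cases("white", "black"), cases("black", "black")
theta_xz = odds_ratio(a, b, c, d)
print(f"{(a, b, c, d)} -> {theta_xz:.1f}")
```

This prints `(467, 16, 48, 143) -> 87.0`: the odds that a white defendant’s victim was white are about 87 times the corresponding odds for a black defendant.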


• From the partial tables we see that, regardless of the defendant’s race, the death penalty was much more likely when the victim was white than when the victim was black.

• So white defendants tended to kill white victims, and killing a white victim was more likely to result in the death penalty.

• Thus the marginal association shows a greater tendency for white defendants to receive the death penalty than the conditional associations do.
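One way to see this mechanism numerically: each defendant group’s marginal death penalty rate is a weighted average of its two conditional rates, weighted by the share of that group’s cases with white versus black victims. A sketch using exact fractions so the identity holds without rounding; `decompose` and `VICTIM_RACES` are hypothetical names, and the decomposition is a standard identity rather than a method stated in the module:

```python
from fractions import Fraction

# (victim's race Z, defendant's race X) -> (yes, no) counts from the partial tables
partial = {
    ("white", "white"): (53, 414),
    ("white", "black"): (11, 37),
    ("black", "white"): (0, 16),
    ("black", "black"): (4, 139),
}
VICTIM_RACES = ("white", "black")


def decompose(defendant):
    """Write one defendant group's marginal death penalty rate as a weighted
    average of its conditional rates, weighting by victim's race."""
    counts = {v: partial[(v, defendant)] for v in VICTIM_RACES}
    total = sum(yes + no for yes, no in counts.values())
    # share of this group's cases with a victim of each race
    weights = {v: Fraction(yes + no, total) for v, (yes, no) in counts.items()}
    # conditional death penalty rate for each victim's race
    rates = {v: Fraction(yes, yes + no) for v, (yes, no) in counts.items()}
    marginal_rate = sum(weights[v] * rates[v] for v in VICTIM_RACES)
    return weights, rates, marginal_rate


for defendant in ("white", "black"):
    weights, rates, marginal_rate = decompose(defendant)
    print(
        f"{defendant} defendants: white-victim share {float(weights['white']):.1%}, "
        f"marginal rate {float(marginal_rate):.1%}"
    )
```

About 96.7% of white defendants’ cases involve white victims, against about 25.1% for black defendants, so the white defendants’ average leans almost entirely on the high white-victim rate. That pulls their marginal rate (11.0%) above that of black defendants (7.9%), even though black defendants have the higher rate within each victim group.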


• Simpson’s paradox occurs when the direction of the association between the response and a predictor variable changes (for example, reverses) once another variable is introduced and controlled for.

• Usually this variable is a confounder, that is, a variable associated with both the predictor and the response.

• Simpson’s paradox can occur in both categorical and continuous data.
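For continuous data the same reversal can appear in regression slopes. The sketch below uses a tiny toy data set invented purely for illustration (not from any study): within each of two hypothetical groups the least-squares slope of y on x is +1, but pooling the groups gives a negative slope, because the group label acts as a confounder that shifts both x and y. The function `ols_slope` is an illustrative name for the ordinary least-squares slope $S_{xy}/S_{xx}$:

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x: S_xy / S_xx."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical toy data: group A has low x and high y, group B the reverse.
group_a = ([1, 2, 3], [11, 12, 13])
group_b = ([6, 7, 8], [1, 2, 3])

within = [ols_slope(xs, ys) for xs, ys in (group_a, group_b)]
pooled = ols_slope(group_a[0] + group_b[0], group_a[1] + group_b[1])

print("within-group slopes:", within)
print(f"pooled slope: {pooled:.2f}")
```

This prints within-group slopes of `[1.0, 1.0]` and a pooled slope of `-1.71`; adding the group indicator to the regression would recover the positive within-group slope.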
