Paper: Regression Analysis III
Module: Simpson’s Paradox
Principal investigator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta
Paper co-ordinator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta
Content writer: Sayantee Jana, Graduate student, Department of Mathematics and Statistics, McMaster University
Content reviewer: Department of Statistics, University of Calcutta
What is Simpson’s Paradox?
Sometimes a marginal association points in a different direction from each of the conditional associations. This phenomenon is called Simpson’s paradox. Although the paradox is named after Simpson (1951), it dates back to Yule (1903).
- This phenomenon can occur in both quantitative and categorical data.
- Statisticians commonly use it to caution against inferring causal effects from an association between the response and predictor variables.
- Even when the response and predictor variables are very strongly associated, there may be one or more confounding factors that alter the result altogether. Hence controlling for the relevant factors is necessary to study the true association.
Let us consider the following 2 × 2 contingency table from a study of death penalty verdicts by race of defendant and victim.
Table: Death Penalty Verdict by Defendant’s Race
Defendant’s Race    Death Penalty: Yes    Death Penalty: No
White               53                    430
Black               15                    176
Total no. of cases = 674
- Y: death penalty verdict (categories: yes, no)
- X: race of defendant (categories: black, white)
- Objective of the study: to study the effect of defendant’s race on the death penalty verdict
- Percentage of white defendants receiving the death penalty: 53/483 = 11.0%
- Percentage of black defendants receiving the death penalty: 15/191 = 7.9%
- Thus it seems that the death penalty was imposed less often on black defendants than on white defendants.
- Now let us consider another factor: the victim’s race.
- Z: race of victim (categories: black, white)
- Let us stratify the marginal table with respect to Z.
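Before stratifying, the marginal percentages quoted above can be reproduced directly from the counts in the marginal table. The following is a minimal sketch in Python; the names `marginal` and `death_penalty_rate` are illustrative choices and do not come from the original study.

```python
# Marginal table: (death penalty yes, death penalty no) by defendant's race,
# copied from the marginal table above.
marginal = {
    "White": (53, 430),
    "Black": (15, 176),
}


def death_penalty_rate(yes, no):
    """Proportion of cases that ended in a death penalty verdict."""
    return yes / (yes + no)


# Rate for each defendant's race, ignoring the victim's race.
marginal_rates = {
    race: death_penalty_rate(yes, no) for race, (yes, no) in marginal.items()
}

for race, rate in marginal_rates.items():
    print(f"{race} defendants: {100 * rate:.1f}%")
```

The printed percentages should agree with the values quoted in the bullets, up to rounding.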
Let us consider the partial table for white victims only.
Table: Death Penalty Verdict by Defendant’s Race (White Victims)
Defendant’s Race    Death Penalty: Yes    Death Penalty: No
White               53                    414
Black               11                    37
Total no. of cases = 515
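- Percentage of white defendants receiving the death penalty: 53/467 = 11.3%
- Percentage of black defendants receiving the death penalty: 11/48 = 22.9%
- Conclusion: when the victim was white, the death penalty was imposed more often on black defendants than on white defendants.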
Let us consider the partial table for black victims only.
Table: Death Penalty Verdict by Defendant’s Race (Black Victims)
Defendant’s Race    Death Penalty: Yes    Death Penalty: No
White               0                     16
Black               4                     139
Total no. of cases = 159
- Percentage of white defendants receiving the death penalty: 0/16 = 0%
- Percentage of black defendants receiving the death penalty: 4/143 = 2.8%
- Conclusion: when the victim was black, the death penalty was again imposed more often on black defendants than on white defendants.
- Thus, controlling for victim’s race, the death penalty was imposed more often on black defendants than on white defendants.
- But if we ignore victim’s race, the conclusion is reversed.
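The reversal can be checked numerically from the two partial tables alone. The sketch below, written in Python with illustrative names (`partial`, `conditional`, `pooled`), computes the conditional rates within each victim's-race stratum, collapses the strata to recover the marginal table, and compares the direction of the two comparisons.

```python
# Partial tables: (death penalty yes, no) by victim's race, then defendant's race.
partial = {
    "White victims": {"White": (53, 414), "Black": (11, 37)},
    "Black victims": {"White": (0, 16), "Black": (4, 139)},
}


def rate(yes, no):
    """Proportion of cases that ended in a death penalty verdict."""
    return yes / (yes + no)


# Conditional rates: defendant's race compared within each stratum of Z.
conditional = {
    stratum: {race: rate(yes, no) for race, (yes, no) in table.items()}
    for stratum, table in partial.items()
}

# Collapse over victim's race to recover the marginal table.
pooled = {}
for table in partial.values():
    for race, (yes, no) in table.items():
        y, n = pooled.get(race, (0, 0))
        pooled[race] = (y + yes, n + no)
marginal = {race: rate(yes, no) for race, (yes, no) in pooled.items()}

# Direction of the comparison, with and without controlling for Z.
black_higher_in_every_stratum = all(
    rates["Black"] > rates["White"] for rates in conditional.values()
)
black_higher_marginally = marginal["Black"] > marginal["White"]

print("Black rate higher in every stratum:", black_higher_in_every_stratum)
print("Black rate higher marginally:", black_higher_marginally)
```

The first comparison holds in both strata while the second does not, which is exactly the change of direction that defines the paradox.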
- The association between X and Y changes direction depending on whether we ignore victim’s race or control for it.
- This is due to the nature of the association between Z and each of X and Y.
- The association between Z and X is extremely strong.
- The odds ratio between X and Z is (467 × 143)/(48 × 16) = 87.0.
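The four counts in this odds ratio are the row totals of the two partial tables: 467 white defendants and 48 black defendants with white victims, and 16 white defendants and 143 black defendants with black victims. A short Python sketch of the arithmetic follows; the variable names are illustrative.

```python
# Cross-classification of victim's race (Z) by defendant's race (X),
# obtained by summing the yes/no counts in each row of the partial tables.
white_victim_white_defendant = 53 + 414  # 467
white_victim_black_defendant = 11 + 37   # 48
black_victim_white_defendant = 0 + 16    # 16
black_victim_black_defendant = 4 + 139   # 143

# Sample odds ratio for the 2 x 2 table of X by Z.
odds_ratio_xz = (
    white_victim_white_defendant * black_victim_black_defendant
) / (white_victim_black_defendant * black_victim_white_defendant)

print(round(odds_ratio_xz, 1))
```

An odds ratio this far from 1 means that the odds of a white defendant having a white victim are far higher than the odds of a black defendant having a white victim.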
- From the partial tables we see that, regardless of defendant’s race, the death penalty was much more likely when the victim was white than when the victim was black.
- So whites tend to kill whites, and killing a white victim is more likely to result in the death penalty.
- Thus the marginal association shows a greater tendency for white defendants to receive the death penalty than the conditional associations do.
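One way to see why this happens is that each marginal rate is a weighted average of the two conditional rates, with weights given by the share of that defendant group's cases falling in each victim's-race stratum. The Python sketch below makes the weights explicit; the function `decompose` and the layout of `counts` are illustrative choices, not part of the original study.

```python
# (death penalty yes, no) by defendant's race, then victim's race.
counts = {
    "White": {"White victims": (53, 414), "Black victims": (0, 16)},
    "Black": {"White victims": (11, 37), "Black victims": (4, 139)},
}


def decompose(strata):
    """Return per-stratum (weight, conditional rate) pairs and their weighted sum."""
    total = sum(yes + no for yes, no in strata.values())
    terms = {}
    for victim, (yes, no) in strata.items():
        weight = (yes + no) / total     # share of this defendant group's cases
        cond_rate = yes / (yes + no)    # death penalty rate within the stratum
        terms[victim] = (weight, cond_rate)
    marginal_rate = sum(w * r for w, r in terms.values())
    return terms, marginal_rate


for race, strata in counts.items():
    terms, marginal_rate = decompose(strata)
    for victim, (weight, cond_rate) in terms.items():
        print(f"{race} defendants, {victim}: weight {weight:.3f}, rate {cond_rate:.3f}")
    print(f"{race} defendants, marginal rate: {marginal_rate:.3f}")
```

White defendants place almost all of their weight on the white-victim stratum, where death penalty rates are high, while black defendants place most of their weight on the black-victim stratum, where rates are low. That difference in weights is what pulls the marginal comparison in the opposite direction.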
- Simpson’s paradox occurs when the direction of the association between the response and predictor variables changes once another variable is controlled for.
- Usually this variable is a confounder.
- Simpson’s paradox can occur in both categorical and continuous data.
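For the continuous case, a small synthetic illustration may help. The numbers below are invented purely for this sketch and do not come from any study: within each of two groups the least-squares slope of y on x is positive, but the slope fitted to the pooled data is negative because the group with larger x values has much lower y values overall. The helper `ols_slope` is an illustrative name.

```python
def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Synthetic data (made up for illustration): two groups with the same
# positive within-group trend but very different overall levels.
group_a = [(x, x + 10) for x in range(0, 5)]    # low x, high y
group_b = [(x, x - 6) for x in range(6, 11)]    # high x, low y

slope_a = ols_slope(group_a)
slope_b = ols_slope(group_b)
slope_pooled = ols_slope(group_a + group_b)

print("Slope within group A:", slope_a)
print("Slope within group B:", slope_b)
print("Slope ignoring group:", round(slope_pooled, 3))
```

Here group membership plays the role of Z: ignoring it reverses the sign of the fitted slope, just as ignoring victim's race reversed the comparison of proportions in the death penalty example.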