Subject: Statistics
Paper: Statistical Inference
Module: Bayesian Hypothesis Testing and
Bayes Factors
Development Team
Principal investigator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta
Paper co-ordinator: Dr. Dipak K. Dey, Associate Dean and BOT Distinguished Professor, Department of Statistics, University of Connecticut
Content writer: Dr. Sourish Das, Assistant Professor, Chennai Mathematical Institute
Content reviewer: Department of Statistics, University of Calcutta
Outline
1. Bayesian p-values
2. Bayes Factors for model comparison
3. Easy-to-implement alternatives for model comparison
Bayesian Hypothesis Testing
- Bayesian hypothesis testing is less formal than the frequentist approach.
- In fact, Bayesian researchers typically summarize the posterior distribution without applying a rigid decision process.
- If one wanted to apply a formal process, Bayesian decision theory is the way to go: because the posterior is a probability distribution over the parameter space, one can make expected-utility calculations based on the costs and benefits of different outcomes.
- Considerable energy has nevertheless been devoted to mapping Bayesian statistical models into the null hypothesis significance testing framework, with mixed results at best.
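The decision-theoretic route mentioned above can be sketched numerically: given posterior draws of a parameter, the expected loss of each candidate action is just an average over the draws, and we pick the action with the smallest expected loss. The sketch below is purely illustrative; the "treat"/"no treat" actions and their loss functions are hypothetical, not part of the lecture.

```python
import random

random.seed(1)

# Hypothetical posterior draws for a treatment effect theta
# (in practice these would come from an MCMC run); simulated as N(0.3, 0.2).
draws = [random.gauss(0.3, 0.2) for _ in range(10_000)]

# Hypothetical losses: treating costs 1 unit but yields benefit 5*theta;
# not treating forgoes that benefit whenever theta > 0.
def loss_treat(theta):
    return 1.0 - 5.0 * theta

def loss_no_treat(theta):
    return 5.0 * theta if theta > 0 else 0.0

# Expected loss of each action = posterior average of its loss.
exp_loss = {
    "treat": sum(loss_treat(t) for t in draws) / len(draws),
    "no_treat": sum(loss_no_treat(t) for t in draws) / len(draws),
}
best = min(exp_loss, key=exp_loss.get)
print(exp_loss, best)
```

With the posterior concentrated on positive effects, treating has the lower expected loss here; changing the losses or the posterior can flip the decision, which is exactly the formality the bullet describes.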
Similarities between Bayesian and Frequentist Hypothesis Testing
- Maximum likelihood estimates of parameter means and standard errors are equivalent to Bayesian estimates under flat priors.
- Asymptotically, the data overwhelm the choice of prior: with infinite data, priors would be irrelevant and Bayesian and frequentist results would converge.
- Frequentist one-tailed tests are essentially equivalent to what a Bayesian would obtain using credible intervals.
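The one-tailed correspondence can be checked directly in the simplest setting: for a Normal mean with known sigma and a flat prior, the posterior probability that mu <= 0 equals the frequentist one-sided p-value. A minimal sketch (the data summary below is made up):

```python
from statistics import NormalDist

# Made-up data summary: n observations, known sigma.
xbar, sigma, n = 0.42, 1.0, 25
z = xbar * n**0.5 / sigma

# Frequentist one-sided p-value for H0: mu <= 0 vs H1: mu > 0.
p_freq = 1 - NormalDist().cdf(z)

# Flat-prior posterior: mu | data ~ N(xbar, sigma^2 / n),
# so the posterior probability that mu <= 0 is:
post = NormalDist(mu=xbar, sigma=sigma / n**0.5)
p_bayes = post.cdf(0.0)

print(p_freq, p_bayes)  # identical up to floating point
```

Both quantities equal Phi(-z), which is why the two approaches agree for one-tailed questions in this conjugate, flat-prior case.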
Differences between Frequentist and Bayesian Hypothesis Testing
- The most important pragmatic difference between Bayesian and frequentist hypothesis testing is that Bayesian methods are poorly suited to two-tailed tests.
- Why? Because the probability of any single point, such as zero, in a continuous distribution is zero. The best solution proposed so far is to calculate the probability that, say, a regression coefficient B lies in some range near zero, e.g., two-sided p-value = P(−e < B < e).
- However, the choice of e seems very ad hoc unless there is some decision-theoretic basis.
- The other important difference is more philosophical: frequentist p-values violate the likelihood principle.
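The quantity P(−e < B < e) is trivial to estimate from MCMC output: it is just the fraction of posterior draws inside the interval. The draws below are simulated stand-ins for real posterior samples, and the cutoff e = 0.1 is exactly the kind of ad hoc choice the slide warns about.

```python
import random

random.seed(2)

# Stand-in for posterior draws of a regression coefficient B;
# in practice these come from the sampler, here simulated as N(0.25, 0.1).
draws = [random.gauss(0.25, 0.1) for _ in range(50_000)]

e = 0.1  # ad hoc half-width of the "practically zero" interval
p_two_sided = sum(-e < b < e for b in draws) / len(draws)
print(p_two_sided)  # posterior probability that B is "near zero"
```

A small value means little posterior mass near zero, i.e., the coefficient is probably not practically negligible; but the conclusion shifts with e, which is the ad hoc element.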
Bayes Factors
- Bayes Factors are the dominant method of Bayesian model testing. They are the Bayesian analogues of likelihood ratio tests.
- The basic intuition is that prior and posterior information are combined in a ratio that provides evidence in favor of one model specification versus another.
- Bayes Factors are very flexible: multiple hypotheses can be compared simultaneously, and nested models are not required in order to make comparisons. It goes without saying that the compared models should have the same dependent variable.
The General Form for Bayes Factors
- Suppose that we observe data x and wish to test two competing models, M1 and M2, relating these data to two different sets of parameters, θ1 and θ2.
- We would like to know which of the following likelihood specifications is better:
M1 : f1(x|θ1) and M2 : f2(x|θ2)
- Obviously, we need prior distributions for θ1 and θ2 and prior probabilities for M1 and M2.
The General Form for Bayes Factors
- The posterior odds ratio in favor of M1 over M2 is:
Posterior Odds = Prior Odds × Bayes Factor
π(M1|x) / π(M2|x) = [p(M1) / p(M2)] × [∫ f1(x|θ1) p1(θ1) dθ1 / ∫ f2(x|θ2) p2(θ2) dθ2]
- Rearranging terms, we find the Bayes factor is:
Bayes Factor = B(x) = [π(M1|x) / p(M1)] / [π(M2|x) / p(M2)]
- If we have nested models and P(M1) = P(M2) = 0.5, then the Bayes factor reduces to the likelihood ratio.
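The ratio of integrals above is a ratio of marginal likelihoods, and for low-dimensional θ it can be computed by brute-force integration. A toy sketch with invented data (x = 7 successes in n = 10 Bernoulli trials), comparing a point null M1: θ = 0.5 against M2: θ ~ Uniform(0, 1):

```python
from math import comb

# Invented data: x successes in n Bernoulli trials.
x, n = 7, 10

# Marginal likelihood of M1: theta is fixed at 0.5, so no integral is needed.
m1 = comb(n, x) * 0.5**n

# Marginal likelihood of M2: integrate f(x|theta) over the Uniform(0,1)
# prior, here with a simple Riemann sum on a fine grid.
G = 100_000
m2 = sum(comb(n, x) * (i / G)**x * (1 - i / G)**(n - x) for i in range(1, G)) / G

bf_12 = m1 / m2
print(bf_12)  # > 1 favors the point null M1
```

For this model m2 has the closed form 1/(n+1), so the grid sum can be checked exactly; in realistic multiparameter models no such shortcut exists, which is why Bayes Factors are hard to compute in practice.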
Rule of Thumb
- Bayes Factor:
B(x) = [π(M1|x) / p(M1)] / [π(M2|x) / p(M2)]
- With this setup, if we interpret model 1 as the null model, then:
1. If B(x) ≥ 1, model 1 is supported.
2. If 1 > B(x) ≥ 10^(−1/2), there is minimal evidence against model 1.
3. If 10^(−1/2) > B(x) ≥ 10^(−1), there is substantial evidence against model 1.
4. If 10^(−1) > B(x) ≥ 10^(−2), there is strong evidence against model 1.
5. If 10^(−2) > B(x), there is decisive evidence against model 1.
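The five thresholds above translate directly into a small lookup function; the function name is mine, but the cutoffs and labels are exactly the slide's.

```python
def evidence_against_null(bf):
    """Map a Bayes factor B(x), with model 1 as the null, to the
    evidence categories in the rule of thumb above."""
    if bf >= 1:
        return "supports model 1"
    if bf >= 10**-0.5:          # about 0.316
        return "minimal evidence against model 1"
    if bf >= 10**-1:
        return "substantial evidence against model 1"
    if bf >= 10**-2:
        return "strong evidence against model 1"
    return "decisive evidence against model 1"

print(evidence_against_null(0.2))
```

For example, B(x) = 0.2 falls between 10^(−1) and 10^(−1/2), so it counts as substantial evidence against model 1.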
The Bad News
- Unfortunately, while Bayes Factors are rather intuitive, as a practical matter they are often quite difficult to calculate.
- However, the MCMCpack package in R computes Bayes Factors routinely for standard statistical models.
- You may also want to use Carlin and Chib's technique for computing Bayes Factors for competing non-nested regression models, reported in the Journal of the Royal Statistical Society, Series B, 57(3), 1995.
- Our discussion will focus on alternatives to the Bayes Factor.
Alternatives to the Bayes Factor for model assessment
- Let θ* denote your estimates of the parameter means (or medians, or modes) in your model, and suppose that the Bayes estimate is approximately equal to the maximum likelihood estimate. Then the following statistics used in frequentist inference are useful diagnostics.
- Good: The Likelihood Ratio
Ratio = −2[log L(θ*_Restricted Model | y) − log L(θ*_Full Model | y)]
- This statistic always favors the unrestricted model, but when the Bayes estimators are equivalent to the maximum likelihood estimates, the Ratio is distributed as χ² with degrees of freedom equal to the number of tested parameters.
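A concrete check of the χ² calibration, with simulated data: fit a Normal mean with known sigma = 1 by maximum likelihood (full model) against the restriction mu = 0, so the Ratio has one degree of freedom. For df = 1 the χ² tail probability can be written with the standard Normal CDF, so no special functions are needed.

```python
import random
from math import log, pi, sqrt
from statistics import NormalDist

random.seed(3)

# Simulated data from N(0.4, 1); sigma = 1 is treated as known.
xs = [random.gauss(0.4, 1.0) for _ in range(100)]
n = len(xs)
xbar = sum(xs) / n  # MLE of mu under the full model

def loglik(mu):
    # Log-likelihood of a Normal(mu, 1) sample.
    return sum(-0.5 * log(2 * pi) - 0.5 * (x - mu) ** 2 for x in xs)

# Ratio = -2 [ log L(restricted) - log L(full) ]; restriction is mu = 0.
ratio = -2 * (loglik(0.0) - loglik(xbar))

# For 1 degree of freedom, P(chi2_1 > r) = 2 * (1 - Phi(sqrt(r))).
p_value = 2 * (1 - NormalDist().cdf(sqrt(ratio)))
print(ratio, p_value)
```

In this model the Ratio simplifies algebraically to n · x̄², which the code reproduces; plugging in Bayes point estimates instead of MLEs gives the same diagnostic whenever the two nearly coincide.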
Alternatives to the Bayes Factor for model assessment
- Better: Akaike Information Criterion (AIC)
AIC = −2 log L(θ*|y) + 2p
where p = the number of parameters, including the intercept.
- To compare two models, compare the AIC of model 1 against the AIC of model 2; the model with the smaller AIC is preferred.
Alternatives to the Bayes Factor for model assessment
- The models do not need to be nested.
- The AIC tends to be biased in favor of more complicated models, because the log-likelihood tends to increase faster than the number of parameters.
- Bayesian Information Criterion (BIC):
BIC = −2 log L(θ*|y) + p × log(n)
where p is the number of parameters and n is the sample size.
- This statistic can also be used for non-nested models.
- BIC1 − BIC2 ≈ −2 log(Bayes Factor_12) for Model 1 vs. Model 2.
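Once the maximized log-likelihood is in hand, the AIC and BIC are one line each, and the slide's BIC approximation gives a cheap Bayes Factor estimate. The log-likelihoods and parameter counts below are hypothetical, chosen only to show the arithmetic.

```python
from math import log, exp

def aic(loglik, p):
    # AIC = -2 log L + 2p
    return -2 * loglik + 2 * p

def bic(loglik, p, n):
    # BIC = -2 log L + p log(n)
    return -2 * loglik + p * log(n)

# Hypothetical maximized log-likelihoods for two non-nested models
# fit to the same n = 189 observations.
n = 189
ll1, p1 = -110.4, 5   # model 1: better fit, more parameters
ll2, p2 = -113.0, 3   # model 2: worse fit, fewer parameters

aic1, aic2 = aic(ll1, p1), aic(ll2, p2)
bic1, bic2 = bic(ll1, p1, n), bic(ll2, p2, n)

# BIC1 - BIC2 ~ -2 log(BF_12), so an approximate Bayes Factor is:
bf_12_approx = exp(-(bic1 - bic2) / 2)
print(aic1, aic2, bic1, bic2, bf_12_approx)
```

Note how the two criteria can disagree: AIC prefers model 1 here, while BIC's stiffer log(n) penalty on the two extra parameters tips the comparison toward model 2.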
Application
- Consider the 'birthwt' dataset available in the MASS package of R.
- The dataset records risk factors associated with low infant birth weight.
low_i = 1 if birth weight is less than 2.5 kg, 0 otherwise, for i = 1, 2, ..., 189
z_i = β0 + β1 Age_i + β2 I(race_i = black) + β3 I(race_i = other) + β4 I(smoke_i = yes) + ε_i,
ε_i ∼ N(0, 1), and
P(low = 1 | Age, race, smoke) = P(z > 0 | Age, race, smoke)
Application
Posterior Summary
> library(MCMCpack)
> set.seed(8135)
> data(birthwt)
> M1 <- MCMCprobit(low~as.factor(race)+age+smoke
+ , data=birthwt, b0 = 0, B0 = 10
+ ,marginal.likelihood="Chib95")
> M2 <- MCMCprobit(low~as.factor(race) +smoke
+ , data=birthwt, b0 = 0, B0 = 10
+ ,marginal.likelihood="Chib95")
> M3 <- MCMCprobit(low~as.factor(race) +age
+ , data=birthwt, b0 = 0 , B0 = 10
+ ,marginal.likelihood="Chib95")
Application
Posterior Summary
> BF <- BayesFactor(M1, M2, M3)
> round(BF$BF.mat, digits=3)
      M1    M2    M3
M1 1.000 1.445 6.807
M2 0.692 1.000 4.711
M3 0.147 0.212 1.000
- BF_{1,2} = 1.445 indicates that the data are 1.445 times more likely under Model 1 (M1) than under Model 2 (M2). This can be considered anecdotal evidence.
- BF_{1,3} = 6.807 indicates that the data are 6.807 times more likely under Model 1 (M1) than under Model 3 (M3). This can be considered moderate evidence.
- BF_{2,3} = 4.711 indicates that the data are 4.711 times more likely under Model 2 (M2) than under Model 3 (M3).