Subject: Statistics
Paper: Regression Analysis III
Module: Predicting the failure of O Ring
Regression Analysis III 1 / 12
Development Team
Principal investigator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta
Paper co-ordinator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta
Content writers: Sayantee Jana, Graduate student, Department of Mathematics and Statistics, McMaster University; Sujit Kumar Ray, Analytics professional, Kolkata
Content reviewer: Department of Statistics, University of Calcutta
The O-rings dataset
I Title of the dataset: Space Shuttle Challenger O-rings.
I Source of the dataset: S. Dalal, E. Fowlkes and B. Hoadley (1989) “Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure.” Journal of the American Statistical Association. 84: 945-957.
I Source of the dataset in R: library faraway, under the name ‘orings’.
I Description of the dataset: The 1986 crash of the space shuttle Challenger was linked to failure of O-ring seals in the rocket engines. Data was collected on the 23 previous shuttle missions. The launch temperature on the day of the crash was 31F.
Description of the variables
I The data frame has 23 observations on the following 2 variables.
I temp : temperature at launch in degrees F
I damage: number of damage incidents out of 6 possible.
Analysis of the data
I Objective: To understand how the probability of failure of a given O-ring is related to the launch temperature, and to predict that probability when the temperature is 31F.
I Let us consider ‘damage’ to be the response variable and ‘temp’ the predictor variable.
I Now we fit a linear model.
I It was observed from the plot of the linear model that it can predict probabilities greater than one or less than zero.
I So we consider a Binomial distribution.
I A binomial response has non-constant variance (heteroscedasticity), which violates the constant-variance assumption of the linear model.
I Also, since each launch involves only 6 trials (the 6 O-rings), a normal approximation is not recommended.
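The variance argument can be written out explicitly. As a sketch, let Y_i be the number of damaged O-rings on launch i and p_i the per-ring damage probability. These symbols are introduced here for illustration and are not named in the slides. The 6 O-rings on a launch are assumed to behave independently:

```latex
Y_i \sim \mathrm{Binomial}(6,\, p_i), \qquad
\mathrm{E}\!\left[\frac{Y_i}{6}\right] = p_i, \qquad
\mathrm{Var}\!\left[\frac{Y_i}{6}\right] = \frac{p_i\,(1 - p_i)}{6}.
```

Because the variance depends on p_i, which changes with temperature, the constant error variance assumed by ordinary least squares cannot hold.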
Analysis of the data contd ...
I So we consider a logit model with the binomial family.
I The logit fit tends asymptotically toward zero and one at high and low temperatures, respectively.
I The fitted values never actually reach zero or one, so the model never predicts anything to be completely certain or completely impossible.
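The asymptotic behaviour can be checked numerically. The sketch below is illustrative only: it uses the coefficient estimates quoted in this module's R script (11.6630 and -0.2162), the helper name `prob_damage` is hypothetical, and base R's `plogis` stands in for the inverse logit.

```r
# Inverse logit of the linear predictor, using the coefficient estimates
# quoted in the module's script (intercept 11.6630, slope -0.2162).
# `prob_damage` is an illustrative helper name, not part of the module.
prob_damage <- function(temp) plogis(11.6630 - 0.2162 * temp)

# Temperatures in degrees F, including values outside the plotted 25-85 range.
temps <- c(0, 31, 53, 81, 120)
p <- prob_damage(temps)

# Probabilities fall as temperature rises but stay strictly inside (0, 1).
print(data.frame(temp = temps, prob = signif(p, 4)))
```

Even at the extreme temperatures in `temps`, the fitted probability approaches 0 or 1 without reaching either.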
Analysis of the data contd ...
I Now we consider the binomial model with a probit link.
I On comparing the logit and probit models, we found that the coefficients are different but the fits are similar.
I We then predicted the response at 31F with both models and found a high probability of damage with either model.
I The goodness-of-fit test for the logit model showed that it fits well.
I By testing the goodness of fit of the model without temperature, we were able to conclude that the effect of launch temperature is statistically significant.
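The arithmetic behind these statements can be sketched from numbers quoted in this module's R script: the coefficient estimates (11.6630 and -0.2162 for the logit fit, 5.5915 and -0.1058 for the probit fit) and the deviances (38.9 without temperature, 16.9 with it). The symbols \hat{p}, D_0 and D_1 are notation introduced here, and the results are rounded:

```latex
\begin{align*}
\hat{p}_{\text{logit}}(31) &= \frac{1}{1 + \exp\{-(11.6630 - 0.2162 \times 31)\}}
  = \frac{1}{1 + e^{-4.9608}} \approx 0.993, \\
\hat{p}_{\text{probit}}(31) &= \Phi(5.5915 - 0.1058 \times 31)
  = \Phi(2.3117) \approx 0.990, \\
D_0 - D_1 &= 38.9 - 16.9 = 22.0 \quad \text{on } 1 \text{ degree of freedom}.
\end{align*}
```

A chi-squared variable with 1 degree of freedom exceeds 22.0 only with very small probability (the 0.1% critical value is about 10.83), so the temperature effect is judged significant.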
Summary
I The ‘orings’ dataset contains information on 23 shuttle missions. It has 2 variables.
I The objective is to understand how the probability of failure in a given O-ring is related to the launch temperature.
I The response variable is ‘damage’. A simple linear regression model was fitted first, but it was not appropriate.
I Hence we use a binomial model with logit link and the goodness of fit test shows that this model fits quite well.
I We were also able to conclude that the effect of launch temperature is statistically significant.
Analysis of the orings dataset 1
# install.packages("faraway")
library(faraway)
data(orings)
## graphical display
plot(damage/6 ~ temp, orings, xlim=c(25,85), ylim=c(0,1), xlab="Temperature", ylab="Prob of damage")
## fitting a linear model
lmod <- lm(damage/6 ~ temp, orings)
## adding the fitted regression line to the plot
abline(lmod)
## Problems: The linear model can predict probabilities greater
## than 1 or less than 0. One might suggest truncating predictions
## outside the range to zero or one as appropriate, but it does
## not seem credible that these probabilities would be exactly 0
## or 1, in this particular example or many others. Here we
## assumed that the errors are approximately normally distributed.
1 Faraway, J.J. (2006). Extending the Linear Model with R. Chapman
Example contd ...
## So let us consider the Binomial distribution. Since there
## are only 6 trials per launch, a normal approximation is not
## recommended. The variance of a binomial variable is not
## constant, which violates the assumptions of the linear model.
## Logit model
logitmod <- glm(cbind(damage,6-damage) ~ temp, family=binomial, orings)
summary(logitmod)
plot (damage/6 ~ temp, orings, xlim=c(25,85), ylim=c(0,1), xlab="Temperature", ylab="Prob of damage")
x <- seq(25,85,1)
lines(x,ilogit(11.6630-0.2162*x))
## Interpretation : the logit fit tends asymptotically toward 0
## and 1 at high and low temperatures, respectively. The fitted
## values never actually reach zero or one, so the model never
## predicts anything to be completely certain or completely impossible.
Example contd ...
## Probit model
probitmod <- glm(cbind(damage,6-damage) ~ temp, family=binomial(link=probit), orings)
summary(probitmod)
lines(x, pnorm(5.5915-0.1058*x), lty=2)
## the coefficients are different, but the fits are similar
## predicting the response at 31°F for both models
ilogit(11.6630-0.2162*31)
pnorm(5.5915-0.1058*31)
## high probability of damage with either model
Example contd ...
## Testing goodness of fit
pchisq(deviance(logitmod), df.residual(logitmod), lower.tail=FALSE)
## Conclusion : model fits well
## this does not mean that this model is correct or that a
## simpler model might not also fit adequately
## The model without temperature is just the null model and the
## difference in degrees of freedom or parameters is one:
pchisq(38.9-16.9, 1, lower.tail=FALSE)
## Since the p-value is so small, we conclude that the effect of
## launch temperature is statistically significant.
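The script above types the coefficient estimates and deviances in by hand. As an alternative, the following sketch reads them from the fitted objects. The helper names `fit_damage`, `predict_damage` and `temp_effect_pvalue` are hypothetical, and the usage part assumes the faraway package is installed; it is skipped otherwise.

```r
# Sketch only: same analysis as the script, but taking estimates from the
# fitted model objects. All helper names here are illustrative.

# Fit the binomial GLM; `d` needs columns `temp` and `damage` (count out of 6).
fit_damage <- function(d, link_name = "logit") {
  glm(cbind(damage, 6 - damage) ~ temp,
      family = binomial(link = link_name), data = d)
}

# Predicted per-ring probability of damage at the given temperature(s).
predict_damage <- function(model, temp) {
  unname(predict(model, newdata = data.frame(temp = temp), type = "response"))
}

# Deviance-difference test of the fitted model against the null model.
temp_effect_pvalue <- function(model) {
  pchisq(model$null.deviance - deviance(model),
         df = model$df.null - df.residual(model),
         lower.tail = FALSE)
}

# Usage on the module's dataset, assuming the faraway package is available.
if (requireNamespace("faraway", quietly = TRUE)) {
  data(orings, package = "faraway")
  logitmod  <- fit_damage(orings, "logit")
  probitmod <- fit_damage(orings, "probit")
  print(predict_damage(logitmod, 31))    # logit prediction at 31F
  print(predict_damage(probitmod, 31))   # probit prediction at 31F
  print(temp_effect_pvalue(logitmod))    # test of the temperature effect
}
```

Taking the values from the fitted objects avoids rounding and transcription errors if the model or data change.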