
Paper: Regression Analysis III


Academic year: 2022


Subject: Statistics


Module: Predicting the failure of O Ring


Development Team

Principal investigator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta

Paper co-ordinator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta

Content writers: Sayantee Jana, Graduate student, Department of Mathematics and Statistics, McMaster University; Sujit Kumar Ray, Analytics professional, Kolkata

Content reviewer: Department of Statistics, University of Calcutta


The O-rings dataset

• Title of the dataset: Space Shuttle Challenger O-rings.

• Source of the dataset: S. Dalal, E. Fowlkes and B. Hoadley (1989) “Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure.” Journal of the American Statistical Association, 84: 945-957.

• Source of the dataset in R: the faraway package, under the name ‘orings’.

• Description of the dataset: The 1986 crash of the space shuttle Challenger was linked to failure of O-ring seals in the rocket engines. Data were collected on the 23 previous shuttle missions. The launch temperature on the day of the crash was 31°F.


Description of the variables

• The data frame has 23 observations on the following 2 variables.

• temp: temperature at launch, in degrees Fahrenheit.

• damage: number of damage incidents out of 6 possible.


Analysis of the data

• Objective: to understand how the probability of failure of a given O-ring is related to the launch temperature, and to predict that probability when the temperature is 31°F.

• We take ‘damage’ as the response variable and ‘temp’ as the predictor variable.

• We first fit a linear model to the proportion of damaged O-rings, damage/6.

• The plot of the fitted line shows that the linear model can predict probabilities greater than one or less than zero.

• So we consider a binomial distribution for the response instead.

• The variance of a binomial response is not constant (it depends on the probability of damage), which violates the constant-variance assumption of the linear model.

• Also, since each launch involves only 6 trials (one per O-ring), a normal approximation is not recommended.
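The two objections to the linear model can be checked numerically. The following is a minimal base-R sketch, not part of the original module: the probabilities in `p` are illustrative values chosen here, and only the number of trials (6) comes from the data description.

```r
n_rings <- 6                        # O-rings per launch (from the data description)
p <- c(0.05, 0.25, 0.50)            # illustrative damage probabilities (assumed values)

# Variance of the observed proportion damage/6 under a binomial model:
# it changes with p, so the constant-variance assumption cannot hold.
var_prop <- p * (1 - p) / n_rings
print(round(var_prop, 4))

# With only 6 trials the normal approximation is poor. Compare the exact
# probability of zero damage incidents at p = 0.05 with its normal
# approximation (using a continuity correction).
exact_zero  <- dbinom(0, size = n_rings, prob = 0.05)
normal_zero <- pnorm(0.5, mean = n_rings * 0.05,
                     sd = sqrt(n_rings * 0.05 * 0.95))
print(c(exact = exact_zero, normal = normal_zero))
```

The variance is smallest near p = 0 and largest at p = 0.5, and the two probabilities in the last line differ noticeably, which is the motivation for moving to a binomial model.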


Analysis of the data contd ...

• So we consider a logit model with the binomial family.

• The logit fit tends asymptotically toward zero at high temperatures and toward one at low temperatures.

• The fitted values never actually reach zero or one, so the model never predicts that damage is completely certain or completely impossible.
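In symbols, the logit model described above can be written as follows. The notation (p_i for the damage probability of an O-ring on launch i, and β0, β1 for the coefficients) is introduced here for illustration and does not appear in the original slides.

```latex
\[
\text{damage}_i \sim \mathrm{Binomial}(6,\, p_i), \qquad
\log\frac{p_i}{1-p_i} = \beta_0 + \beta_1\,\text{temp}_i
\quad\Longleftrightarrow\quad
p_i = \frac{e^{\beta_0 + \beta_1\,\text{temp}_i}}{1 + e^{\beta_0 + \beta_1\,\text{temp}_i}} .
\]
```

When β1 is negative, as in the fit reported in the sample script, p_i approaches 0 as the temperature increases and 1 as it decreases, while 0 < p_i < 1 at every finite temperature.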


Analysis of the data contd ...

• Now we consider the binomial model with a probit link.

• Comparing the logit and probit models, we found that the coefficients are different but the fits are similar.

• We then predicted the response at 31°F for both models and found a high probability of damage with either model.

• The goodness-of-fit test based on the residual deviance showed that the logit model fits well.

• By comparing the deviance of the model without temperature (the null model) with that of the logit model, we were able to conclude that the effect of launch temperature is statistically significant.
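The probit link and the two deviance-based tests can be summarised as below. The symbols (Φ for the standard normal distribution function, D for the residual deviance of the logit model, D_0 for the null deviance) are notation assumed here; the numerical values 38.9 and 16.9 are the deviances used in the sample R script, and both chi-squared reference distributions are approximations.

```latex
\[
\text{Probit link:}\quad \Phi^{-1}(p_i) = \beta_0 + \beta_1\,\text{temp}_i
\]
\[
\text{Goodness of fit:}\quad D \;\approx\; \chi^2_{21}
\qquad (23 \text{ observations} - 2 \text{ parameters})
\]
\[
\text{Effect of temperature:}\quad D_0 - D = 38.9 - 16.9 = 22.0 \;\approx\; \chi^2_{1}
\]
```

A large p-value in the first test gives no evidence of lack of fit, and a very small p-value in the second indicates that dropping temperature makes the fit significantly worse.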


Summary

• The ‘orings’ dataset contains information on 23 shuttle missions. It has 2 variables.

• The objective is to understand how the probability of failure of a given O-ring is related to the launch temperature.

• The response variable considered is ‘damage’, and a simple linear regression model was fitted first, but this model was not appropriate.

• Hence we use a binomial model with a logit link, and the goodness-of-fit test shows that this model fits quite well.

• We were also able to conclude that the effect of launch temperature is statistically significant.


Analysis of the orings dataset [1]

# install.packages("faraway")   ## run once if the package is not installed
library(faraway)
data(orings)

## graphical display
plot(damage/6 ~ temp, orings, xlim=c(25,85), ylim=c(0,1), xlab="Temperature", ylab="Prob of damage")
lmod <- lm(damage/6 ~ temp, orings)   ## fitting a linear model
abline(lmod)                          ## adding the fitted line to the plot

## Problems : The linear model can predict probabilities greater
## than 1 or less than 0. One might suggest truncating predictions
## outside the range to zero or one as appropriate, but it does
## not seem credible that these probabilities would be exactly 0
## or 1, in this particular example or many others. Here we had
## assumed that the errors are approximately normally distributed.

[1] Faraway, J.J. (2006). Extending the Linear Model with R. Chapman


Example contd ...

## So let us consider the binomial distribution. Since there
## are only 6 trials, a normal approximation is not
## recommended. The variance of a binomial variable is not
## constant, which violates the linear model assumptions.

## Logit model
logitmod <- glm(cbind(damage,6-damage) ~ temp, family=binomial, orings)
summary(logitmod)

plot(damage/6 ~ temp, orings, xlim=c(25,85), ylim=c(0,1), xlab="Temperature", ylab="Prob of damage")
x <- seq(25,85,1)
lines(x, ilogit(11.6630-0.2162*x))   ## coefficients as reported by summary(logitmod)

## Interpretation : the logit fit tends asymptotically toward 0
## and 1 at high and low temperatures, respectively. The fitted
## values never actually reach zero or one, so the model never
## predicts damage to be completely certain or completely impossible.
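The interpretation above can be verified without any plotting. The sketch below uses base R's `plogis()`, which computes the same inverse-logit transformation as `ilogit()` from the faraway package, together with the coefficients quoted in the script; the object names `temp_grid`, `eta` and `p_damage` are chosen here for illustration.

```r
temp_grid <- seq(25, 85, by = 1)        # same temperature range as the plot
eta <- 11.6630 - 0.2162 * temp_grid     # linear predictor, coefficients as quoted in the script
p_damage <- plogis(eta)                 # inverse logit: exp(eta) / (1 + exp(eta))

# Fitted probabilities at the two ends of the grid
print(round(range(p_damage), 4))

# Strictly between 0 and 1, and decreasing in temperature
print(all(p_damage > 0 & p_damage < 1))
print(all(diff(p_damage) < 0))
```

Both checks should print TRUE: the curve gets close to 1 at low temperatures and close to 0 at high temperatures without reaching either bound.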


Example contd ...

## Probit model
probitmod <- glm(cbind(damage,6-damage) ~ temp, family=binomial(link=probit), orings)
summary(probitmod)
lines(x, pnorm(5.5915-0.1058*x), lty=2)   ## add the probit fit as a dashed line

## the coefficients are different, but the fits are similar

## predicting the response at 31°F for both models
ilogit(11.6630-0.2162*31)
pnorm(5.5915-0.1058*31)

## high probability of damage with either model


Example contd ...

## Testing goodness of fit
pchisq(deviance(logitmod), df.residual(logitmod), lower.tail=FALSE)

## Conclusion : model fits well
## this does not mean that this model is correct or that a
## simpler model might not also fit adequately

## The model without temperature is just the null model and the
## difference in degrees of freedom or parameters is one:
pchisq(38.9-16.9, 1, lower.tail=FALSE)   ## null deviance minus residual deviance

## Since the p-value is so small, we conclude that the effect of
## launch temperature is statistically significant.
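The script above types the fitted coefficients and deviances in by hand. A possible alternative, sketched here under the assumption that the faraway package is installed, is to let R extract them with `predict()` and `anova()`. The object names `new_launch`, `nullmod` and `lrt` are introduced for illustration and are not part of the original script.

```r
# Assumes the 'faraway' package is available: install.packages("faraway")
library(faraway)
data(orings)

logitmod  <- glm(cbind(damage, 6 - damage) ~ temp,
                 family = binomial, data = orings)
probitmod <- glm(cbind(damage, 6 - damage) ~ temp,
                 family = binomial(link = "probit"), data = orings)

# Predicted probability of damage at 31 degrees F, on the probability scale
new_launch <- data.frame(temp = 31)
p_logit  <- predict(logitmod,  newdata = new_launch, type = "response")
p_probit <- predict(probitmod, newdata = new_launch, type = "response")
print(c(logit = unname(p_logit), probit = unname(p_probit)))

# Deviance-based goodness of fit of the logit model
gof_p <- pchisq(deviance(logitmod), df.residual(logitmod), lower.tail = FALSE)
print(gof_p)

# Null model (no temperature) versus logit model: deviance difference on 1 df
nullmod <- update(logitmod, . ~ 1)
lrt <- anova(nullmod, logitmod, test = "Chisq")
print(lrt)
```

Working from the fitted objects avoids rounding the coefficients and keeps the predictions in step with the model if the data or the formula change.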
