# Binary Logistic Regression Question


This topic contains 3 replies, has 2 voices, and was last updated by Eileen Beachell 8 months, 2 weeks ago.

March 13, 2018 at 4:21 am #55958

Anonymous

How do I draw inferences from an analysis of 24 distinct Xs?

I am having difficulty drawing meaningful conclusions from data on uncoated patches on hot rolled coils, which I analyzed using Minitab 18. The project is about reducing the uncoated patches on the coils. This is the CTQ, and it is discrete in nature (Yes/No). I am analyzing 24 distinct Xs.

My questions:

1. No interaction effects are included in the analysis even though I checked the option to include them. What could be the reason?

2. The predicted R-sq value is only 39%, but from a technical point of view all possible Xs have been included. What could be the reason, and how can it be resolved?

3. The residual plot is not normal. How should I proceed?

4. How can the information in “Fits and diagnostics for unusual observations” be used?

5. What is the significance of AIC, and how is it used during analysis?

I have attached a file for the same.

March 13, 2018 at 4:28 am #202362

Anonymous

Please find the attached file.

March 13, 2018 at 9:38 am #202364

There are some things you should have done before you even tried to run a multivariable logistic regression.

24 variables: how do you know you can include all of these terms in a multivariable model? Unless you built and ran a rather large experimental design, the odds that these variables are independent of one another are close to nil. In short, any model you might build with the 24 will contain variables so confounded with one another that you cannot make any statements about the actual correlation between a given variable and the outcome.

The ideal situation would be to check your variables using eigenvalues and condition indices but, to the best of my knowledge, Minitab does not have this capability. The next best thing would be a check using Variance Inflation Factors (VIFs). It is my understanding that this is available in Minitab. Once you have checked the VIFs and eliminated any variables whose VIFs recommend their exclusion, you can use the reduced list for your multivariable analysis.
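For anyone who wants to see what the VIF check is doing outside of Minitab, here is a minimal sketch in plain Python. It relies on the identity that the VIF of each predictor is the corresponding diagonal element of the inverse of the predictors' correlation matrix. The three columns `x1`, `x2`, `x3` are made-up illustration data, not the poster's coil data, and the cut-offs people use (5 or 10 are common rules of thumb) are a judgment call rather than a fixed rule.

```python
from statistics import mean, pstdev

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        z.append([(v - m) / s for v in col])  # standardize each column
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def vifs(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Hypothetical data: x2 is roughly 2 * x1, x3 is unrelated to both.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
x3 = [5, 3, 8, 1, 7, 2, 6, 4]

for name, v in zip(["x1", "x2", "x3"], vifs([x1, x2, x3])):
    print(f"{name}: VIF = {v:.1f}")
```

In this toy case the two nearly collinear columns show very large VIFs while the unrelated column stays close to 1, which is the pattern you would use to decide which of a redundant pair to drop.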

Once you have the possible model terms you don’t just toss them into the machine and hit run. You will need to run backward elimination and stepwise logistic regression to identify your reduced model, and it is the fit of the reduced models that you need to check for predictive capability.

To your questions:

1. No interaction effects are included in the analysis even though I checked the option to include them. What could be the reason?

I don’t know Minitab but it would be my guess that you have done something incorrectly with respect to interaction specification.

2. The predicted R-sq value is only 39%, but from a technical point of view all possible Xs have been included. What could be the reason, and how can it be resolved?

Low R2 in logistic models is a fact of life. A bigger question is why you are investigating the model capability using R2. You can compute a generalized R2, but it CANNOT be interpreted as a proportion of variance “explained” by the independent variables.

The way you assess a logistic model is to see how well your model discriminates between the two outcome groups (patches yes and patches no). What you need to do is build the ROC curve and look at summary measures of discrimination such as the area under that curve.

There’s no neat table for the area under the curve (c), but there are general rules of thumb:

c = .5: might as well toss a coin and call the results in the air

.5 < c < .7: not much better than a coin toss

.7 < c < .8: acceptable discrimination

.8 < c < .9: excellent discrimination

c > .9: outstanding discrimination

You will, of course, want to look at other discrimination statistics in addition to the area under the curve.
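As a rough illustration of what c measures, the sketch below computes the area under the ROC curve directly from its pairwise definition: the fraction of (patch, no-patch) pairs in which the model gives the higher predicted probability to the coil that actually had a patch, with ties counted as half. The `labels` and `scores` are invented for illustration. In practice the scores would be the fitted probabilities stored from your reduced model.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of the two groups.

    labels: 1 for the event (patch = Yes), 0 for the non-event (patch = No).
    scores: predicted probability of the event for each observation.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one observation in each outcome group")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # event ranked above non-event
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes and fitted probabilities for eight coils.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.7, 0.6]

c = auc(labels, scores)
print(f"c = {c:.3f}")
```

The double loop is fine for a few thousand observations. For larger data sets a rank-based calculation gives the same number faster.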

A good discussion of this issue and the steps you need to take to assess your model can be found in Applied Logistic Regression, 3rd Edition (Hosmer, Lemeshow, Sturdivant), Chapter 5, Assessing the Fit of the Model. You should be able to get this book through inter-library loan.

3. The residual plot is not normal. How should I proceed?

True – it’s a logistic regression so they shouldn’t be. The way you proceed with assessment is as stated above.

4. How can the information in “Fits and diagnostics for unusual observations” be used?

You use them to assess the stability of your model.

DFBETAS give a measure of how much each regression coefficient changes when a particular observation is deleted. Usually you will find the extreme values, drop them from the list and re-run the analysis to see if there is any major change in final model terms.

Hat matrix diagonal – measures how extreme the observation is in the “space” of explanatory variables – use the same way as DFBETAS statistics.

etc.

It is my understanding that Minitab has some good on-line text which describes the value of the “Fits and diagnostics for unusual observations” output and what you can do with it.

5. What is the significance of AIC, and how is it used during analysis?

This is just one of a number of statistics used to compare models with different numbers of parameters. A lower AIC is better and, no, there is no statistical test to compare values of AIC.

Again, you should check the Minitab website for more details of this statistic and others.
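To make the AIC comparison concrete, here is a small sketch using the standard definition AIC = 2k - 2 ln(L), where k is the number of estimated parameters and ln(L) is the log-likelihood of the fitted model. For a binary response the log-likelihood is the sum of ln(p) over the events and ln(1 - p) over the non-events. The outcomes, fitted probabilities, and parameter counts below are invented for illustration, and your software may also report adjusted variants of the statistic, so treat this as an outline of the idea rather than a reproduction of Minitab's output.

```python
import math

def log_likelihood(labels, probs):
    """Binary log-likelihood: ln(p) for events, ln(1 - p) for non-events."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for y, p in zip(labels, probs))

def aic(labels, probs, n_params):
    """AIC = 2k - 2 ln(L); n_params counts the intercept as well."""
    return 2 * n_params - 2 * log_likelihood(labels, probs)

# Hypothetical outcomes and fitted probabilities from two candidate models.
labels = [1, 1, 0, 0, 1, 0]
probs_small = [0.7, 0.6, 0.4, 0.3, 0.6, 0.4]      # e.g. intercept + 1 term
probs_large = [0.75, 0.6, 0.4, 0.3, 0.65, 0.35]   # e.g. intercept + 4 terms

aic_small = aic(labels, probs_small, n_params=2)
aic_large = aic(labels, probs_large, n_params=5)
print(f"small model AIC = {aic_small:.2f}")
print(f"large model AIC = {aic_large:.2f}")
```

The larger model fits these six points slightly better, but the penalty of 2 per extra parameter outweighs the gain, so the smaller model ends up with the lower AIC. That trade-off between fit and model size is all the statistic is summarizing.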

April 2, 2018 at 6:33 am #202430

Interesting that you have focused on hot rolling for a coating problem. Seems to me this is not a planned experiment. You are inspecting coils after the coating process and determining whether there are patches of uncoated areas on the coil. Then you are taking 24 variables from the hot line rolling process, which are automated data collections, and hoping to find the needle in the haystack, so to speak. Most bare spots are not caused by the hot line.

Although you have focused your questions on the statistics, I don’t think you will find the root cause based on your questions.

You need to gather your process experts, lay out the entire process (usually cold mill to finish processing) and determine the key variables to study. Then take some dedicated coils and actually plan a design of experiment to focus on the vital few variables to understand the main effects and the interactions. Aluminum finishing is a minefield of interaction effects.

Your issues are really not with the statistical analysis but with your hope of finding a quick solution in the automated data collection process from the hot line. Not likely.
