# Binary Logistic Regression


- This topic has 5 replies, 2 voices, and was last updated 9 years, 11 months ago by Robert Butler.

- June 21, 2010 at 3:22 pm #53490

Doa (Participant, @optomist1)

Hi all,

I am working on a process improvement problem (in Minitab). I have run a correlation matrix and stepwise regression, and I am trying to get the best fit for two input variables. Stepwise regression yielded two significant input variables, both with p-values of 0.00 and R2 values of 39.74 and 61.21.

When running Binary Logistic Regression to get the best fit and establish suggested operating ranges, do you do so with both variables at once or one at a time? It seems like a naive question; it's my first time through.

Obviously the calculated event probabilities will be different in each case. I think they should be done serially but I would like some input or confirmation.

Many thanks…..

Marty

June 21, 2010 at 7:47 pm #190354

Robert Butler (Participant, @rbutler)

I think you will need to provide some clarification.

1. You say you have R2 values of 39 and 61. Do you mean .39 and .61? R2 can only vary between 0 and 1.

2. If you just dropped the decimal points in the R2 values in #1 – what do you mean by the statement “Regression yielded two significant input variables both have with p-values of 0.00 and R2 values of 39.74 and 61.21.”

a. Do you mean that you developed a model with both X’s and the final R2 was .61, or do you mean something else?

b. If you built a model and the R2 value for the first term was .39, and it rose to .61 when the second was added (we’re assuming you were running stepwise and that the final model has two terms with a resultant R2 of .61), what kind of checks did you run to make sure the various X’s were independent enough to be considered for inclusion in the model building effort?

3. The initial part of your post gives the impression that you built a model using standard regression methods. The second part of your post gives the impression that you have shifted from standard regression to logistic regression for some reason.

a. Did you do this or was the entire exercise run using logistic regression (we’re assuming the Y was binomial)?

4. If, in fact, you ran the entire exercise using stepwise and backward elimination (BE) logistic techniques and your reduced model consists of a Y and two X’s, then what you have is a model with odds ratios, instead of coefficients, associated with each X.

You test logistic model adequacy by running sensitivity and specificity tests of the reduced model and then you compare these results to the sensitivity and specificity of either the full model or some other model with different/additional terms.
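As a rough illustration of the sensitivity and specificity comparison described above, here is a minimal Python sketch. The data, the two sets of fitted probabilities, and the 0.5 classification cutoff are all hypothetical and chosen only for the example; in practice the probabilities would come from your fitted reduced and full models.

```python
def sensitivity_specificity(y_true, event_prob, cutoff=0.5):
    """Return (sensitivity, specificity) for a binary outcome.

    y_true     : observed 0/1 responses (1 = event)
    event_prob : fitted event probabilities from a logistic model
    cutoff     : probability at or above which an event is predicted
                 (0.5 is an assumption; pick what suits the problem)
    """
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, event_prob):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn)  # share of actual events that were caught
    specificity = tn / (tn + fp)  # share of non-events correctly cleared
    return sensitivity, specificity


# Hypothetical observed outcomes and fitted probabilities from two models.
y_obs = [1, 1, 1, 0, 0, 0, 0, 1]
p_reduced = [0.81, 0.62, 0.35, 0.20, 0.45, 0.10, 0.55, 0.70]
p_full = [0.85, 0.66, 0.52, 0.18, 0.40, 0.12, 0.48, 0.74]

sens_reduced, spec_reduced = sensitivity_specificity(y_obs, p_reduced)
sens_full, spec_full = sensitivity_specificity(y_obs, p_full)
print("reduced:", sens_reduced, spec_reduced)
print("full:   ", sens_full, spec_full)
```

If the reduced model gives up little sensitivity and specificity relative to the larger model, that is an argument for keeping the simpler one.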

If you want to check assumptions of fit, you run the final model and then plot the standardized Pearson residuals against the ordered observation number for the individual measures. This index plot allows you to identify observations with unusually large residuals relative to other observations in the data set. To test for influence, you then re-run the analysis with these data points removed.
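The residual check above can be sketched in a few lines of Python. This is a simplified version under stated assumptions: it computes plain Pearson residuals, (y - p) / sqrt(p(1 - p)), whereas the standardized form also divides by sqrt(1 - h), with h the leverage reported by the fitting software. The data and the |residual| > 2 flagging threshold are hypothetical.

```python
import math


def pearson_residuals(y_true, event_prob):
    """Pearson residual per observation: (y - p) / sqrt(p * (1 - p))."""
    return [(y - p) / math.sqrt(p * (1.0 - p))
            for y, p in zip(y_true, event_prob)]


def flag_large(residuals, threshold=2.0):
    """Return 1-based observation numbers with |residual| above threshold.

    The threshold of 2 is only a rule-of-thumb assumption.
    """
    return [i for i, r in enumerate(residuals, start=1) if abs(r) > threshold]


# Hypothetical observed outcomes and fitted event probabilities.
y_obs = [0, 0, 1, 0, 1, 1, 0, 1]
p_hat = [0.10, 0.25, 0.80, 0.30, 0.05, 0.70, 0.40, 0.90]

resid = pearson_residuals(y_obs, p_hat)
for obs, r in enumerate(resid, start=1):
    print(f"{obs:>3}  {r:+.2f}")  # crude text stand-in for the index plot

suspects = flag_large(resid)
print("observations to re-check:", suspects)
```

The flagged observations are the ones you would drop before re-running the model to see how much the fit changes.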

June 21, 2010 at 8:52 pm #190355

Doa (Participant, @optomist1)

Hi Robert,

The model was constructed in Minitab, using stepwise regression to narrow the field of potential predictors, or “X”s. Yes, the R2 values are decimals, and they are the cumulative results from Minitab.

Your points 1 & 2 are correct. I will endeavor to be more specific rather than assume; this site serves a diverse geographic and business community, and as I am discovering, details and specifics are essential. Your input is well taken.

1) My first step was to build a correlation matrix of all potential predictors to give some general insight into what may be significant, as well as any possible correlation among the “X”s. Crude, but it gives some insight.

2) Stepwise regression: inputting all eight of the potential predictors (setting alpha “in and out” at 0.15), Minitab gave me two predictors with p-values of 0.00 each and cumulative adjusted R2 of 39.74 and 61.21 respectively. The only check I made was in the correlation matrix table: the two predictors in question have an R of 0.29, or R2 of 0.08.

3) I then ran a Binary Logistic Regression with the same two variables; this yielded p-values of 0.046 and 0.086. This was done to establish a best fit for the model, utilizing the “Event Probability” capabilities in Minitab, then plotting EPRO1 (Y axis) vs. one “X” or the other to establish the setting that meets my probability goal, in this case 10% or less. I am aware of the “Odds Ratios”; however, my prof is using the EPRO function instead. We use this plot to select settings for

I am aware that if there is some collinearity or correlation among the “X”s, stepwise helps to “minimize” the effects, and that the ultimate answer is DOE… our subject tomorrow.

Thanks for your insight Robert..
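For readers following the event-probability step in item 3, the calculation behind such a plot can be sketched as below. This assumes a standard logistic model with both predictors in it, P(event) = 1 / (1 + exp(-(b0 + b1*x1 + b2*x2))). The coefficient values and the fixed x2 setting are hypothetical placeholders, not results from this thread; real values would come from the fitted model output.

```python
import math


def event_probability(b0, b1, b2, x1, x2):
    """Event probability from a two-predictor logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))


def x1_for_target(b0, b1, b2, x2, target):
    """Value of x1 at which P(event) equals target, holding x2 fixed."""
    logit = math.log(target / (1.0 - target))
    return (logit - b0 - b2 * x2) / b1


# Hypothetical coefficients and a hypothetical fixed setting for x2.
b0, b1, b2 = -4.0, 0.5, 0.8
x2_fixed = 1.0

boundary = x1_for_target(b0, b1, b2, x2_fixed, target=0.10)
print("x1 at the 10% boundary:", boundary)
print("check:", event_probability(b0, b1, b2, boundary, x2_fixed))
```

Because b1 is positive in this made-up example, any x1 at or below the boundary keeps the event probability at 10% or less for that x2. The boundary moves when x2 changes, which is why the event probabilities differ between fitting both variables together and fitting them one at a time.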

June 21, 2010 at 9:11 pm #190356

Robert Butler (Participant, @rbutler)

That helps.

A couple of things.

I don’t know Minitab, but everything I’ve read on this site concerning it indicates it has VIF capability. VIF is much better than a simple correlation matrix if you want to try to get some sense of X variable collinearity. The best bet is running eigenvectors and looking at condition indices but, as far as I know, this isn’t an option with Minitab. As for stepwise minimizing the effects of collinearity: not really.
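To make the VIF idea concrete, here is a small Python sketch for the two-predictor case discussed in this thread. In general VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the other predictors; with only two predictors, R_j^2 is simply the squared correlation between them. The condition index shown is one common variant, computed from the eigenvalues of the 2x2 correlation matrix, and is included only as an illustration. Note that the original screening involved eight candidate predictors, which this simplified two-variable version does not cover.

```python
import math


def vif_two_predictors(r):
    """VIF when the model has exactly two predictors with correlation r."""
    return 1.0 / (1.0 - r * r)


def condition_index_two_predictors(r):
    """Largest condition index of the correlation matrix [[1, r], [r, 1]].

    Its eigenvalues are 1 + |r| and 1 - |r|; the index is sqrt(max / min).
    """
    return math.sqrt((1.0 + abs(r)) / (1.0 - abs(r)))


r = 0.29  # correlation between the two predictors, as reported in the thread
vif = vif_two_predictors(r)
cond = condition_index_two_predictors(r)
print("VIF:", round(vif, 3))
print("condition index:", round(cond, 3))
```

A VIF this close to 1 suggests little collinearity between these two X’s; commonly quoted rules of thumb only start to worry at values around 5 to 10.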

It is also my understanding that Minitab can do stepwise regression in the logistic format. If that is true, and if the VIF analysis indicates your X’s are sufficiently independent of one another, then instead of picking the X’s for model inclusion using a correlation matrix, I’d recommend putting all of the terms in the model, letting Minitab test them in the usual stepwise manner, and seeing what that gives you as far as term significance is concerned. If Minitab can’t do this, then I’d recommend putting all of the terms in the model and running your backward elimination by hand.

Take the model that results from this effort and run the tests I mentioned earlier.

June 21, 2010 at 9:25 pm #190357

Doa (Participant, @optomist1)

Hi Robert,

I’ll give your advice a shot in the next couple of days… thanks for the assistance. If not Minitab, what statistical analysis software do you use?

Regards,

Marty

June 22, 2010 at 11:52 am #190358

Robert Butler (Participant, @rbutler)

I use SAS.
