# regression analysis

- This topic has 9 replies, 8 voices, and was last updated 18 years, 2 months ago by Robert Butler.

June 6, 2002 at 2:23 am #29585

Jay Fleischmann (Participant, @Jay-Fleischmann)

Hi. We’re preparing to perform a DOE for an issue we’re having at work. Today, we were considering which factors to include in the experiment. This was done by taking suggestions on people’s experience with the process. This was our first mistake.

What I’m thinking is performing a multiple regression on parts that are already built. We would include what we believe are the significant factors, and whichever of these had significant coefficients we would carry into our DOE.

Then I really got to thinking…isn’t a multiple regression kind of a ‘free’ DOE? What I mean is, you’re taking data after the fact from parts that have already been created. There are no resources dedicated to this other than one person’s time. I realize this is different from a DOE in that it is impossible to block any noise, randomize, or consider any interactions between potential main effects or higher. It seems like a really great way to head into a DOE…any thoughts?

June 6, 2002 at 12:08 pm #76137

Jay:

Indeed, design of experiment is based on regression science but offers a lot more to the user.

Taking historical data and plugging it into simple or multiple regression can certainly lend insight as to the movers and shakers of the measures that you deem key to your performance objectives. However, using historical data makes it very difficult to discern and separate out everything else that may have been contributing to the effect presumably caused by the independent variables – i.e., the reliability of your data can always be questioned because it was not produced and gathered in a controlled manner. On the other hand, in your words, it is “free” and is certainly an excellent generator of insight as to what factors to experiment with.

Taking the factors that regression flags as influential on your key measures into a DOE would let you experiment in a controlled environment, where you can be bolder with the factor levels and really take a stab at seeing the true magnitude of their effects. Also, as you stated, a DOE will allow you to block and/or restrict randomization as well as replicate the design to uncover variation effects. Most importantly, however, a DOE will shed light on possible interactions that would be very difficult to tease out with regression.

June 6, 2002 at 1:00 pm #76140

gretchen ingvason (Participant, @gretchen-ingvason)

Jay,

To add to Bob’s comments, regression analysis will tell you where you’ve been and may help to determine where to start.

However, DOE is a very powerful tool, not only in the actual running and analysis portion, but the planning step. We have recently begun to use this tool and I’ve found that 2 weeks planning for a one day experiment has led to a much greater understanding of the process in general.

I disagree that your first mistake was asking operations personnel about their process. They know it best and are not prone to making assumptions.

During the planning phase, everything that can affect the process needs to be discussed and evaluated. There are definitely things you can control and others you cannot. But understanding what can affect your process before starting the experiment is crucial. While it can appear overwhelming, understanding the “noise” will assist in the final analysis.

When performing DOEs on our processes, while it may take longer, I generally follow a two-pronged approach: run a screening design to prioritize the factors I can control, followed by optimization experiments to set run conditions.
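
For a sense of how small a screening design can be, here is a minimal sketch of a saturated two-level layout, 15 factors in 16 runs. It assumes every factor has just two levels, coded -1/+1, and that interactions are ignored for the first pass; the function name is made up for illustration and this is not the output of any particular DOE package.

```python
from itertools import combinations, product


def saturated_two_level_design(n_base=4):
    """Return a 2**n_base-run design with 2**n_base - 1 two-level columns.

    The columns are the n_base base factors plus every product of two or
    more of them, coded -1/+1. Every column is balanced and all columns
    are mutually orthogonal, so each one can carry one main effect.
    """
    runs = list(product((-1, 1), repeat=n_base))  # full factorial in the base factors
    subsets = [c for k in range(1, n_base + 1)
               for c in combinations(range(n_base), k)]
    design = []
    for run in runs:
        row = []
        for subset in subsets:
            level = 1
            for i in subset:
                level *= run[i]
            row.append(level)
        design.append(row)
    return design


design = saturated_two_level_design(4)
print(len(design), "runs x", len(design[0]), "factors")
for row in design[:4]:  # first few runs of the plan
    print(" ".join(f"{v:+d}" for v in row))
```

The price of so few runs is that every main effect is aliased with interactions, which is why a design like this is only a screen and the optimization experiments come afterwards.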

Hope this helps.

Gretchen

June 6, 2002 at 1:15 pm #76142

Robert Butler (Participant, @rbutler)

You can certainly use regression analysis to look at historical data and the results may help guide you in your thinking but there are a number of caveats that you need to keep in mind.

1. There is a very high probability that your analysis will show variables that you know to be important to your process are not significant. The reason this will occur is because these are the process variables that you control. The fact that they don’t appear merely suggests that you have done an excellent job of controlling them. Because of this control, any interactions with these variables will also appear to be not significant.

2. You will have to do a great deal of data preparation. In particular you will have to perform a full blown regression analysis of your X’s – eigenvector analysis, VIF, etc. What you cannot do, unless you plan on going wrong with great assurance, is to take your X’s and just plug them into a simple correlation matrix.

3. Data consistency: When recording production data people do not record everything that is done to the process. Consequently, many changes are made to variables that are not part of the record-keeping process. If you have enough data and enough X’s to play with there is a good chance that, just by dumb luck, some of these unrecorded changes will correlate with the X’s that were tracked. The end result will be correlation with no hope of identifying the underlying cause.
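
For the VIF part of point 2, a minimal sketch follows. It assumes the X’s are available as plain lists of numbers; the data and the helper names (`correlation_matrix`, `invert`, `vifs`) are made up for illustration. It relies on the standard identity that the VIFs are the diagonal of the inverse of the correlation matrix of the X’s.

```python
import random


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = sum(d * d for d in dev) ** 0.5
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled]
            for ci in scaled]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Made-up data: x2 nearly duplicates x1, while x3 is unrelated to both.
rng = random.Random(0)
x1 = [rng.gauss(0, 1) for _ in range(200)]
x2 = [a + rng.gauss(0, 0.1) for a in x1]
x3 = [rng.gauss(0, 1) for _ in range(200)]
for name, v in zip(("x1", "x2", "x3"), vifs([x1, x2, x3])):
    print(f"{name}: VIF = {v:.1f}")
```

A common rule of thumb treats VIFs above roughly 5 to 10 as a sign that the X’s are too entangled for their individual coefficients to be trusted; a plain pairwise correlation matrix can miss this when three or more X’s are jointly related.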

Your initial attempt to ask for suggestions concerning possible factors is a good one. You mentioned that this was a mistake. I guess I’d like to know why this was so. Too many X’s? Too few? If it was too many, I’d recommend brainstorming with a wider audience and then a secret ballot of all of the proposals, with everyone rating the list from most important to least important. If the people filling out the ballots are the people who know the process, you should wind up with a pretty decent set of X’s to investigate. Also, remember that for a first look you are going after big hitters. Don’t sweat the interactions; use saturated designs: 15 variables in 16 experiments plus one additional experiment for an error estimate, 31 variables in 32 experiments, etc.

June 6, 2002 at 1:22 pm #76144

Jay –

Try to get a copy of Box, Hunter and Hunter, “Statistics for Experimenters”, and read the section on the pitfalls of what they call “happenstance data”. This will expand on the excellent advice Robert posted. It’s about 10 pages of the best writing on the use and abuse of regression versus designed experimentation.

June 6, 2002 at 1:46 pm #76149

Carl Haeger (Participant, @Carl-Haeger)

If you have not already done so, consider looking at how noisy and stable the measurement(s) of the Y(s) are.

With historical data, sometimes changes in Y values are not necessarily due to the Xs, but to changes in the measurements. If the measurement is “noisy”, with a high Prec./Total ratio, then you may not easily see many significant Xs in your DOEs.

June 7, 2002 at 4:32 pm #76195

Jay Fleischmann (Participant, @Jay-Fleischmann)

hi bob. thanks for an excellent reply. you said:

“There is a very high probability that your analysis will show variables that you know to be important to your process are not significant. The reason this will occur is because these are the process variables that you control. The fact that they don’t appear merely suggests that you have done an excellent job of controlling them.”

so, to confirm, you’re saying that if i have a predictor (x) in my regression analysis that is not varying at all, i.e. i am extremely capable, then it will not appear as significant in my regression analysis…right? i think i understand but want to make sure.

Also, you asked why i didn’t like taking suggestions from process people. i totally didn’t deliver what i was thinking. i think choosing the factors for any doe comes from brainstorming sessions with people that interact with the process on a daily basis. but simply taking their word for it and including it in your experiment where each treatment costs, say $35,000, just isn’t logical. hence, the reason for the topic of performing a multiple regression analysis to quantify people’s suggestions on factors. sorry, i wasn’t clear at all. thanks for your help.

jay

June 7, 2002 at 11:02 pm #76207

Mike R. (Participant, @Mike-R.)

Jay,

You’ve gotten lots of good advice and suggestions. I’d like to add one more thing. Probably the biggest difference between regression analysis on historical data and a properly designed experiment is the ability to infer cause/effect relationships. Historical data only provide associations between independent and dependent variables. A properly designed and conducted experiment allows one to make causal inferences that cannot be made appropriately with historical data. Good luck with your experimentation and may all your statistics be significant.
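
To make the association-versus-cause point concrete, here is a small simulation sketch. Everything in it is invented: `z` stands for some unrecorded process change that really drives `y`, while `x` has no effect on `y` at all. In the "historical" data `x` happens to drift along with `z`; in the "designed" data `x` is set by a randomized plan.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)
n = 200

# z is the unrecorded (hypothetical) cause; y depends on z only.
z = [rng.gauss(0, 1) for _ in range(n)]
y = [3.0 * zi + rng.gauss(0, 1) for zi in z]

# Historical data: x drifted along with z, so it looks related to y.
x_hist = [zi + rng.gauss(0, 0.5) for zi in z]

# Designed experiment: x levels are assigned at random, independent of z.
x_doe = [rng.choice((-1.0, 1.0)) for _ in range(n)]

r_hist = pearson_r(x_hist, y)
r_doe = pearson_r(x_doe, y)
print(f"historical data:     r(x, y) = {r_hist:+.2f}")
print(f"randomized settings: r(x, y) = {r_doe:+.2f}")
```

In a sketch like this the historical correlation comes out strong even though changing `x` would do nothing to `y`, while the randomized settings break the link to `z` and the apparent effect disappears.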

Mike.

June 10, 2002 at 4:41 am #76238

If people would just read BH2 (and understand at least the concepts, if not the mathematical details) before offering their opinions or problems, then much of the clutter on this Forum wouldn’t be there.

June 10, 2002 at 12:22 pm #76246

Robert Butler (Participant, @rbutler)

jay,

That is correct. A well-controlled X variable will typically not appear as significant when running a regression on historical data. Given your interest in the subject, Dave’s comment, which was seconded by others, is a good one. The Box, Hunter, Hunter book is a very readable statistics book and it covers many of the issues that you will face in your efforts.
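
A quick way to see this for yourself is a small simulation. The sketch below is only an illustration under made-up assumptions: y truly depends on x with a slope of 2, the noise has a standard deviation of 1, and the only thing that changes between the two cases is how much x is allowed to vary. The function names and numbers are invented for the example.

```python
import random


def slope_t_statistic(x, y):
    """t statistic for the slope in a simple least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se_slope = (sse / (n - 2) / sxx) ** 0.5  # standard error of the slope
    return slope / se_slope


def simulate(x_sd, n=50, true_slope=2.0, noise_sd=1.0, seed=1):
    """Generate y = true_slope * x + noise with x spread x_sd, return the slope's t."""
    rng = random.Random(seed)
    x = [rng.gauss(10.0, x_sd) for _ in range(n)]
    y = [true_slope * a + rng.gauss(0.0, noise_sd) for a in x]
    return slope_t_statistic(x, y)


t_controlled = simulate(x_sd=0.01)  # x held tightly at its set point
t_varied = simulate(x_sd=1.0)       # x deliberately moved over a wide range
print(f"tightly controlled x: t = {t_controlled:.2f}")
print(f"widely varied x:      t = {t_varied:.2f}")
```

The true effect of x is identical in both cases. With x held tight, the standard error of the slope is so large that the effect is buried in the noise; once x is moved over a wide range, as it would be in a designed experiment, the same effect stands out clearly.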
