DOE, Normality Test

This topic contains 7 replies, has 5 voices, and was last updated by Robert Butler 3 months, 4 weeks ago.

A question regarding DOE and normality (through residuals)

In most cases I have found that when running a DOE:

first run all variables, and normality is not ok

run only significant variables, normality is OK.

But I have found one case where, after keeping only the significant variables, normality is still not OK, yet if I remove the interactions (and keep only the main factors) normality is OK again.

Do you know why that is?

How do I fix it? Should I run with only the main factors to comply with the normality test?

What’s even more important is whether the variance is equal across the region of results.

After you have your reduced model, have you run the process at the “optimized” settings to see if you get similar enough results? This is part of validating your model from the DOE.

Yes, I ran it and I get the same results whether I remove or keep some interactions.

I just don’t get why the residuals change from normal to non-normal or vice versa.

One needs to be careful when doing a DOE since there are assumptions which need to be verified in order to use it.

One can do this with Minitab’s Four in One residual plots (Stat > DOE > Factorial > Analyze Factorial Design > Graphs).

Speaking from a statistics background, DOE is just a regression model at the end of the day. If your residuals are not normally distributed, then there is evidence that your model may have some special cause acting on it.

Thomas, thanks, but that doesn’t answer my question.

Let me give an example. After running and keeping only the significant variables:

OPTION 1

Let’s say I have: A, B, C, D, AB, AD, ABC

but with that the normality assumption is not met.

OPTION 2

So then I decided to eliminate the ABC interaction, and I have: A, B, C, D, AB, AD

With that, the normality assumption passes.

Should I then proceed with the second option (because I pass the normality assumption with it)?

Let’s start with the following model:

response = function of (A,B,C,D,AB,AC,AD,BC,BD,CD,ABC,ABD,BCD)

If one ran a stepwise regression on all these variables, one could see the AIC differences between models. It would be interesting to see whether the model with (A, B, C, D, AB, AD, ABC) had a lower AIC than the model with (A, B, C, D, AB, AD).

In any event, you could use the stepwise regression model to determine which models were better than others in terms of AIC.
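As a rough illustration of the AIC comparison described above, here is a minimal Python sketch. Everything in it is assumed for illustration only: the 2^4 design with two replicates, the simulated response and its effect sizes, and the helper names (`design_2k`, `column`, `fit`, `aic`). The shortcut in `fit` (coefficient = contrast sum divided by n) holds only for balanced, orthogonal two-level designs in coded -1/+1 units; it is not a general regression routine.

```python
import itertools
import math
import random

FACTORS = "ABCD"

def design_2k(factors=FACTORS):
    """Full two-level factorial in coded units (-1/+1), one dict per run."""
    return [dict(zip(factors, levels))
            for levels in itertools.product((-1, 1), repeat=len(factors))]

def column(runs, term):
    """Contrast column for a term such as 'A' or 'ABC' (product of its factors)."""
    return [math.prod(run[f] for f in term) for run in runs]

def fit(runs, y, terms):
    """Least-squares fit; valid as written only for balanced orthogonal designs."""
    n = len(y)
    cols = {t: column(runs, t) for t in terms}
    intercept = sum(y) / n
    coefs = {t: sum(c * v for c, v in zip(cols[t], y)) / n for t in terms}
    fitted = [intercept + sum(coefs[t] * cols[t][i] for t in terms)
              for i in range(n)]
    residuals = [v - f for v, f in zip(y, fitted)]
    return coefs, residuals

def aic(residuals, n_terms):
    """Gaussian AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    k = n_terms + 2  # model terms + intercept + error variance
    return n * math.log(rss / n) + 2 * k

# Simulated (made-up) data: two replicates of the 16-run design.
runs = design_2k() * 2
rng = random.Random(1)
y = [10 + 2 * r["A"] - 1.5 * r["B"] + r["C"] + 0.5 * r["D"]
     + 1.2 * r["A"] * r["B"] - 0.8 * r["A"] * r["D"]
     + 1.0 * r["A"] * r["B"] * r["C"] + rng.gauss(0, 0.5)
     for r in runs]

option1 = ["A", "B", "C", "D", "AB", "AD", "ABC"]
option2 = ["A", "B", "C", "D", "AB", "AD"]
coefs1, res1 = fit(runs, y, option1)
coefs2, res2 = fit(runs, y, option2)
aic1, aic2 = aic(res1, len(option1)), aic(res2, len(option2))
print(f"AIC option 1: {aic1:.2f}   AIC option 2: {aic2:.2f}")
```

Because the columns are orthogonal, dropping ABC adds exactly n times its squared coefficient to the residual sum of squares. Whatever the ABC term explained ends up inside the residuals of the smaller model, which is one way the shape of the residuals can change when terms are added or removed.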

One needs to be careful when using stepwise regression. If some of your variables are highly correlated then one may eliminate a variable which may not be wise in some cases.

This technique may be useful for you!

You haven’t told us what you mean by “first run all variables, and normality is not ok

run only significant variables, normality is OK.” Specifically, how are you determining that “normality is not ok”? If all you are doing is running your residuals through some test such as Shapiro-Wilk, then you need to back up and run a real residual analysis. That means looking at plots of residuals vs. predicted values, plotting the residuals on a normal probability plot, looking at a residual histogram, etc. Any good book on regression will tell you what you need to do.

It is very easy to have residuals fail one or more of the normality tests for the simple reason that these tests are very sensitive to non-normality. Indeed, it is easy for data from a random number generator with an underlying normal distribution to fail these tests, and it is for this reason that so much emphasis is placed on evaluating the graphics of the residuals.
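For anyone who wants to see what goes into a normal probability plot, here is a small Python sketch using only the standard library. It is not taken from any particular package: the function names, the sample size of 32, and the two example samples are assumptions for illustration. It uses Blom plotting positions, which is one of several conventions. The correlation of the plotted points is only a rough straightness summary, not a replacement for looking at the plot.

```python
import math
import random
from statistics import NormalDist, mean

def normal_plot_points(residuals):
    """(theoretical normal quantile, ordered residual) pairs for a probability plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Blom plotting positions (i - 0.375) / (n + 0.25); other conventions exist.
    quantiles = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
                 for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))

def plot_correlation(points):
    """Pearson correlation of the plot points; near 1 means close to a straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

n = 32
rng = random.Random(7)

# Hypothetical "residuals": one random normal sample, one idealized skewed sample
# (exact exponential quantiles, so the curvature is not hidden by sampling noise).
normal_sample = [rng.gauss(0, 1) for _ in range(n)]
skewed_sample = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]

r_normal = plot_correlation(normal_plot_points(normal_sample))
r_skewed = plot_correlation(normal_plot_points(skewed_sample))
print(f"normal sample: r = {r_normal:.3f}   skewed sample: r = {r_skewed:.3f}")

# To draw the plot, pass the pairs to any plotting tool:
# x = theoretical quantile, y = ordered residual.
```

Rerunning the normal sample with different seeds gives a feel for how much a probability plot of truly normal data can wander at this sample size.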

The t-test is robust to non-normality and the issue

…Rats, cut off the last part of my post.

The t-test is robust to non-normality and the issue is that of “approximate” normality. In order to get a good visual understanding of what that means, you should borrow Fitting Equations to Data by Daniel and Wood from the library and look at the cumulative distribution plots of normal data (for various sample sizes) on pages 34-43 (in the 2nd edition). Another thing you should do is run side-by-side checks for significant differences between data distributions using both the t-test and the Wilcoxon-Mann-Whitney test, to give yourself an understanding of just how crazy non-normal data has to be before the t-test fails to indicate significance when the Wilcoxon-Mann-Whitney test indicates that a significant difference exists.
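One way to try the side-by-side check is a small simulation, sketched here in Python with only the standard library. Several things are assumptions for illustration: the distributions, the shift of 0.6, the sample size of 40 per group, and the function names. To stay dependency-free, both p-values use a large-sample normal approximation (the pooled t statistic is referred to the normal rather than the t distribution, and the Mann-Whitney U has no tie correction), so treat the numbers as approximate. A statistics package such as SciPy (`scipy.stats.ttest_ind`, `scipy.stats.mannwhitneyu`) provides the standard versions.

```python
import math
import random
from statistics import NormalDist, mean, variance

def two_sided_p(z):
    """Two-sided p-value from a standard normal reference distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def pooled_t_pvalue(x, y):
    """Pooled two-sample t statistic; p-value via normal approximation."""
    n, m = len(x), len(y)
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    z = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n + 1 / m))
    return two_sided_p(z)

def mann_whitney_pvalue(x, y):
    """Wilcoxon-Mann-Whitney U; large-sample normal approximation, no tie correction."""
    n, m = len(x), len(y)
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    mu = n * m / 2
    sigma = math.sqrt(n * m * (n + m + 1) / 12)
    return two_sided_p((u - mu) / sigma)

def rejection_rates(sampler, shift, n=40, sims=200, alpha=0.05, seed=0):
    """Fraction of simulated data sets in which each test reports significance."""
    rng = random.Random(seed)
    t_hits = w_hits = 0
    for _ in range(sims):
        x = [sampler(rng) for _ in range(n)]
        y = [sampler(rng) + shift for _ in range(n)]
        t_hits += pooled_t_pvalue(x, y) < alpha
        w_hits += mann_whitney_pvalue(x, y) < alpha
    return t_hits / sims, w_hits / sims

# Same shift applied to a normal population and to a heavily skewed one.
normal_rates = rejection_rates(lambda r: r.gauss(0, 1), shift=0.6)
skewed_rates = rejection_rates(lambda r: r.lognormvariate(0, 1.5), shift=0.6)
print("normal data  (t, WMW):", normal_rates)
print("skewed data  (t, WMW):", skewed_rates)
```

Changing the spread parameter of the lognormal sampler toward zero makes the data less skewed, which lets you watch the point at which the two tests start to agree again.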
