DOE, Normality Test
April 18, 2018 at 7:22 am #55981
A question regarding DOE and normality (checked through the residuals).
In most of the cases I have found that when running a DOE:
first run with all variables: normality is not OK;
run with only the significant variables: normality is OK.
But I have found one case where, after keeping only the significant variables, normality is still not OK; yet if I remove the interactions (and keep only the main factors), normality is OK again.
Do you know why that is?
How do I fix it? Should I run with only the main factors to comply with the normality test?
April 18, 2018 at 9:22 am #202486
What’s even more important is whether the variance is equal across the region of results.
After you have your reduced model, have you run the process at the “optimized” settings to see if you get similar enough results? This is part of validating your model from the DOE.
April 18, 2018 at 9:46 am #202488
Yes, I ran it, and I get the same results whether I remove or leave some interactions.
I just don’t get why it changes the residuals from normal to non-normal or vice versa.
April 18, 2018 at 10:31 am #202489
One needs to be careful when doing a DOE since there are assumptions which need to be verified in order to use it.
One can do this with Minitab’s Four in One residual plots (Stat > DOE > Factorial > Analyze Factorial Design > Graphs).
Speaking from a statistics background, DOE is just a regression model at the end of the day. If your residuals are not normally distributed, then there is evidence that your model may have some special cause acting on it.
April 18, 2018 at 12:46 pm #202490
Thomas, thanks, but that doesn’t answer my question.
Let me put an example:
After running and keeping only the significant variables, let’s say I have: A, B, C, D, AB, AD, ABC.
But with that, the normality assumption does not hold.
So then I decided to eliminate the ABC interaction, leaving: A, B, C, D, AB, AD.
With that, the normality assumption passes.
Should I proceed then with the second option (because the normality assumption passes with it)?
April 18, 2018 at 2:06 pm #202491
Let’s start with the following model:
response = function of (A,B,C,D,AB,AC,AD,BC,BD,CD,ABC,ABD,BCD)
If one did a stepwise regression model of all these variables, then one could see the AIC differences between models. It would be interesting to see whether the model with (A, B, C, D, AB, AD, ABC) had a lower AIC than the model with (A, B, C, D, AB, AD).
In any event, you could use the stepwise regression model to determine which models were better than others in terms of AIC.
One needs to be careful when using stepwise regression. If some of your variables are highly correlated then one may eliminate a variable which may not be wise in some cases.
This technique may be useful for you!
April 21, 2018 at 5:59 am #202500
You haven’t told us what you mean by “first run all variables, and normality is not ok
run only significant variables, normality is OK.” Specifically, how are you determining that “normality is not OK”? If all you are doing is running your residuals through some test such as Shapiro-Wilk, then you need to back up and run a real residual analysis. That means looking at plots of residuals vs. predicted values, plotting the residuals on a normal probability plot, a residual histogram, etc. Any good book on regression will tell you what you need to do.
It is very easy to have residuals fail one or more of the normality tests for the simple reason that these tests are very sensitive to non-normality. Indeed, it is easy to have data from a random number generator with an underlying normal distribution fail these tests, and it is for this reason that so much emphasis is placed on a graphical evaluation of the residuals.
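The point about random normal data failing the test can be checked with a quick simulation: by construction, Shapiro-Wilk rejects truly normal samples at roughly the chosen alpha rate (5% here), so an occasional failed test on genuinely normal residuals is expected, not alarming. The trial counts and sample size below are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_trials, n = 2000, 30

# Count how often Shapiro-Wilk rejects (p < 0.05) samples that are
# drawn from an exactly normal distribution.
rejections = sum(
    stats.shapiro(rng.normal(size=n)).pvalue < 0.05 for _ in range(n_trials)
)
print(f"rejection rate on truly normal data: {rejections / n_trials:.3f}")
```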
The t-test is robust to normality and the issue
April 21, 2018 at 7:50 am #202501
…Rats, the last part of my post got cut off…
The t-test is robust to normality and the issue is that of “approximate” normality. In order to get a good visual understanding of what that means, you should borrow Fitting Equations to Data by Daniel and Wood from the library and look at the cumulative distribution plots of normal data (for various sample sizes) on pages 34-43 (in the 2nd edition). Another thing you should do is run side-by-side checks for significant differences between data distributions using both the t-test and the Wilcoxon-Mann-Whitney test, to give yourself an understanding of just how crazy non-normal data has to be before the t-test fails to indicate significance when the Wilcoxon-Mann-Whitney test indicates a significant difference exists.
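The side-by-side check suggested above might look like the following sketch, assuming strongly skewed (lognormal) samples with a deliberate location shift; the distributions and shift size are invented for illustration, and in practice you would vary them to see where the two tests start to disagree.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Two skewed samples; the second is shifted upward on the log scale.
x = rng.lognormal(mean=0.0, sigma=0.8, size=30)
y = rng.lognormal(mean=1.2, sigma=0.8, size=30)

# Run both tests on the same data and compare the conclusions.
t_p = stats.ttest_ind(x, y).pvalue
u_p = stats.mannwhitneyu(x, y).pvalue
print(f"t-test p = {t_p:.4f}, Mann-Whitney p = {u_p:.4f}")
```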