Anova and normality
 This topic has 5 replies, 5 voices, and was last updated 19 years, 6 months ago by MMBB.


February 10, 2003 at 4:38 pm #31427
Hylton (Participant, @hyltoto)
Our MBB insists that data used in ANOVA (Minitab) must be normal for accurate results. I can find no mention of this in Minitab. The test populations are small, just 6 pieces per group, so how could a normality test be accurate anyhow?

February 10, 2003 at 5:08 pm #82887
Score a point for the MBB. Normality is the underlying assumption in the use of ANOVA. Do the six data points represent a subgroup that you are comparing against four or five other subgroups? The rule of thumb I have used is that you need 30-50 data points before any preliminary conclusions can be made.

February 10, 2003 at 9:48 pm #82897
Robert Butler (Participant, @rbutler)
Your MBB is taking the conservative approach to ANOVA, and there is certainly nothing wrong with that. However, you should know that both ANOVA and the t-test are quite robust with respect to non-normal data; in other words, you can still get accurate results with non-normal data. Pages 51-53 of The Design and Analysis of Industrial Experiments, Second Edition, by Davies have a good discussion of the issue, and the end of the chapter cites papers about this topic if you wish to read more.
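The robustness claim can be checked for a case like the one in this thread with a small simulation. The sketch below is illustrative only. It assumes three groups of six, draws every group from the same distribution (so any "significant" result is a false alarm), and uses 3.68, the tabled 5% critical value of F with 2 and 15 degrees of freedom. The function names, the choice of an exponential distribution as the skewed case, and the trial count are assumptions made for the example, not anything taken from Davies or Minitab.

```python
import random

def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of numbers)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def false_alarm_rate(draw, n_groups=3, n_per_group=6, f_crit=3.68,
                     trials=5000, seed=1):
    """Fraction of trials where F exceeds f_crit although all groups share one distribution.

    f_crit=3.68 is the tabled 5% point of F(2, 15); it only matches the
    default 3 groups x 6 observations, so change it if you change the layout.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        groups = [[draw(rng) for _ in range(n_per_group)]
                  for _ in range(n_groups)]
        if f_statistic(groups) > f_crit:
            rejections += 1
    return rejections / trials

# Normal data: the textbook case, the rate should sit near the nominal 5%.
normal_rate = false_alarm_rate(lambda rng: rng.gauss(0.0, 1.0))
# Exponential data: strongly right-skewed, chosen here as a hypothetical bad case.
skewed_rate = false_alarm_rate(lambda rng: rng.expovariate(1.0))

print(f"false alarm rate, normal data: {normal_rate:.3f}")
print(f"false alarm rate, skewed data: {skewed_rate:.3f}")
```

If the two rates come out close to each other and close to 0.05, that is what "robust" means here. The sketch says nothing about unequal variances or unequal group sizes, which are separate questions.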
As for the accuracy of a normality test with six pieces of data… it will be as accurate as 6 pieces of data will permit it to be! I realize that this sounds flippant; I don’t mean it to be. If you have the luxury of gathering 20, 30, 40, etc. measures per population before running an ANOVA or a t-test, by all means do so, but if your situation is such (as mine often is) that 6 measurements per population is a genuine strain on resources, then it is better to proceed with an analysis of the 6 measurements per population than to do nothing.
Regardless of the amount of data you have, if you have gathered it over time, and if you made a point of keeping track of when you gathered it, you should first plot the data by population against time. Often these kinds of plots will reveal much more about your process than all of your other efforts combined.
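A plot of this kind does not need special software. As a rough sketch, the snippet below prints a text run chart for each population, in collection order, on a shared scale. The readings, the population names "Tool A" and "Tool B", and the `run_chart` helper are all made up for illustration.

```python
# Hypothetical measurements, listed in the order they were collected.
readings = {
    "Tool A": [10.1, 10.3, 9.9, 10.2, 10.0, 10.1],
    "Tool B": [10.0, 10.2, 10.5, 10.7, 11.0, 11.2],  # drifts upward over time
}

def run_chart(values, lo, hi, width=30):
    """Return one text row per reading, with '*' placed on a shared lo..hi scale."""
    span = (hi - lo) or 1.0  # avoid dividing by zero if every value is identical
    rows = []
    for order, value in enumerate(values, start=1):
        column = round((value - lo) / span * (width - 1))
        rows.append(f"{order:>2} |" + " " * column + "*")
    return rows

# Use one scale for all populations so the charts can be compared directly.
all_values = [v for series in readings.values() for v in series]
lo, hi = min(all_values), max(all_values)

charts = {name: run_chart(values, lo, hi) for name, values in readings.items()}
for name, rows in charts.items():
    print(name)
    print("\n".join(rows))
```

With these invented numbers, Tool A scatters around one level while Tool B climbs steadily. A trend like that would make both a normality test and an ANOVA on the pooled values hard to interpret.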
There is an excellent article, Statistics and Reality - Part 2 by Balestracci, in the Winter 2002 ASQ Statistics Division Newsletter, which discusses the benefits of the kind of data plotting I mentioned.

February 11, 2003 at 2:14 am #82900
Chip Hewette (Participant, @ChipHewette)
If you are testing six of method A vs. six of method B, creating six replicates of each factor level, it is better to evaluate each individual measurement for ‘quality.’ What is the range of values for the six observations? Is this range as expected? Is one value way different from all the others? Why? The MBB is correct in a sense, that the observations must all be of high quality and truly represent the factor. This is not the same as requiring all six observations to fit on a normal distribution line.

February 11, 2003 at 1:52 pm #82906
Hylton (Participant, @hyltoto)
Minitab’s onboard tutorial mentions no need for normality, only that the factors must be discrete. You can even use attributes for responses. How can we then demand normality? Their own example for one-way ANOVA uses only four carpet samples per wear test group.
Our experiment was with springs manufactured on different tools, machines and lines. The scrap is so great we were looking for a screening tool to direct us toward the key variable. As for quality of data, we did insist on a robust gauge using the ANOVA GRR.

February 11, 2003 at 8:40 pm #82918
Robert knows his stuff. Beware of the other replies.
The one-way ANOVA (and t-tests) assumes that the MEANS are normally distributed. That can be achieved in one of two ways (ignoring use of transformations):
1 – the raw data themselves are normally distributed, so naturally the means of the raw data will be normally distributed.
2 – the raw data are NOT normally distributed, but the sample sizes are large enough that, as predicted by the central limit theorem (CLT), the means are normally distributed. If the raw data are relatively symmetric, the sample sizes can be as small as n=5 and the CLT will ensure normality of the means. If the raw data are very skewed, the sample sizes will need to be much larger, maybe even n=50, before the CLT will provide normality of the means.
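The second route can be illustrated with a simulation of sample means. The sketch below is a minimal example under stated assumptions: it uses a uniform distribution as the "relatively symmetric" case and an exponential distribution as the "very skewed" case, and it uses the skewness of the simulated means as a crude stand-in for "how far from normal". The function names and trial count are hypothetical choices.

```python
import random

def skewness(values):
    """Sample skewness: third central moment over the 1.5 power of the variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def skew_of_sample_means(draw, n, trials=20000, seed=1):
    """Draw `trials` samples of size n and return the skewness of their means."""
    rng = random.Random(seed)
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(trials)]
    return skewness(means)

# Symmetric raw data (uniform on 0..1), small samples of n=5.
symmetric_n5 = skew_of_sample_means(lambda rng: rng.uniform(0.0, 1.0), n=5)
# Strongly right-skewed raw data (exponential), n=5 versus n=50.
skewed_n5 = skew_of_sample_means(lambda rng: rng.expovariate(1.0), n=5)
skewed_n50 = skew_of_sample_means(lambda rng: rng.expovariate(1.0), n=50)

# A normal distribution has skewness 0, so values nearer 0 are "more normal"
# by this one measure. For exponential data the theoretical skewness of the
# mean of n observations is 2 / sqrt(n).
print(f"uniform,     n=5 : {symmetric_n5:+.2f}")
print(f"exponential, n=5 : {skewed_n5:+.2f}")
print(f"exponential, n=50: {skewed_n50:+.2f}")
```

Skewness near zero is only one symptom of normality, so treat this as a quick look at the trend with n rather than a test of whether the means are normal.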