# Anova and normality


• #31427

Hylton
Participant

Our MBB insists that data used in ANOVA (Minitab) must be normal for accurate results. I can find no mention of this in Minitab. The test populations are small, just 6 pieces each, so how could a normality test be accurate anyway?

#82887

Ed
Participant

Score a point for the MBB. Normality is the underlying assumption in the use of ANOVA. Do the six data points represent a subgroup that you are comparing against four or five other subgroups? The rule of thumb I have used is that you need 30-50 data points before any preliminary conclusions can be made.

#82897

Robert Butler
Participant

Your MBB is taking the conservative approach to ANOVA and there is certainly nothing wrong with that. However, you should know that both ANOVA and the t-test are quite robust with respect to non-normal data; in other words, you can still get accurate results with non-normal data. Pages 51-53 of The Design and Analysis of Industrial Experiments, Second Edition, by Davies has a good discussion of the issue, and the end of the chapter cites papers about this topic if you wish to read more.
As for the accuracy of a normality test with six pieces of data…it will be as accurate as 6 pieces of data will permit it to be! I realize that this sounds flippant; I don't mean it to be. If you have the luxury of gathering 20, 30, 40, etc. measures per population before running an ANOVA or a t-test, by all means do so. But if your situation is such (as mine often is) that 6 measurements per population is a genuine strain on resources, then it is better to proceed with an analysis of the 6 measurements per population than to do nothing.
Regardless of the amount of data you have, if you gathered it over time and made a point of keeping track of when you gathered it, you should first plot the data by population against time. Often these kinds of plots will reveal much more about your process than all of your other efforts combined.
There is an excellent article, "Statistics and Reality-Part 2" by Balestracci, in the Winter 2002 ASQ Statistics Division Newsletter, which discusses the benefits of the kind of data plotting I mentioned.
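The robustness claim above can be checked roughly with a small simulation. The sketch below is not from the thread: it assumes three groups of six observations, all drawn from the same population, and counts how often a one-way ANOVA at the 5% level wrongly reports a difference. The group count, the exponential distribution used as the "non-normal" case, the function names, and the tabulated critical value 3.68 for F(2, 15) are all choices made for illustration.

```python
import random

K_GROUPS = 3      # assumed number of populations being compared
N_PER_GROUP = 6   # six pieces per population, as in the original question
F_CRIT = 3.68     # tabulated upper 5% point of F(2, 15); only valid for K=3, N=6

def f_statistic(groups):
    """One-way ANOVA F ratio: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def false_alarm_rate(draw, n_sims=4000, seed=1):
    """Fraction of simulated experiments with no true difference that exceed F_CRIT."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        groups = [[draw(rng) for _ in range(N_PER_GROUP)] for _ in range(K_GROUPS)]
        if f_statistic(groups) > F_CRIT:
            rejections += 1
    return rejections / n_sims

# Same population for every group, so every rejection is a false alarm.
normal_rate = false_alarm_rate(lambda rng: rng.gauss(0.0, 1.0))
skewed_rate = false_alarm_rate(lambda rng: rng.expovariate(1.0))
print(f"normal data: {normal_rate:.3f}   skewed data: {skewed_rate:.3f}")
```

If ANOVA is robust in this setting, both rates should land near the nominal 0.05. This only probes the false-alarm rate for one skewed distribution; it says nothing about power or about data with outliers or unequal variances.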

#82900

Chip Hewette
Participant

If you are testing six of method A vs. six of method B, creating six replicates of each factor level, it is better to evaluate each individual measurement for 'quality.' What is the range of values for the six observations? Is this range as expected? Is one value way different from all the others? Why? The MBB is correct in a sense, that the observations must all be of high quality and truly represent the factor. This is not the same as requiring all six observations to fit on a normal distribution line.
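The questions in this post (what is the range, is one value way different) can be asked of each group with a few lines of code. This is only a sketch: the measurements are hypothetical, not data from the thread, and `summarize_group` is a made-up helper name.

```python
from statistics import median

def summarize_group(values):
    """Return the range, the median, and the observation farthest from the median."""
    med = median(values)
    farthest = max(values, key=lambda v: abs(v - med))
    return {"range": max(values) - min(values), "median": med, "farthest": farthest}

# Hypothetical measurements, six replicates per method (illustration only).
method_a = [10.1, 10.3, 9.9, 10.2, 10.0, 10.4]
method_b = [10.6, 10.5, 10.8, 13.9, 10.7, 10.4]

summary_a = summarize_group(method_a)
summary_b = summarize_group(method_b)
print("A:", summary_a)
print("B:", summary_b)
```

A range for method B that is several times the range for method A, driven by one reading, is the kind of thing worth investigating before running any ANOVA. The code only flags the value; deciding why it differs still takes process knowledge.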

#82906

Hylton
Participant

Minitab's on-board tutorial mentions no need for normality, only that the factors must be discrete. You can even use attributes for responses. How can we then demand normality? Their own example for one-way ANOVA uses only four carpet samples per wear test group.
Our experiment was with springs manufactured on different tools, machines and lines. The scrap is so great we were looking for a screening tool to direct us toward the key variable. As for quality of data, we did insist on a robust gauge using the ANOVA GRR.

#82918

MMBB
Participant

Robert knows his stuff. Beware of the other replies.
The one-way ANOVA (and t-tests) assumes that the MEANS are normally distributed. That can be achieved in one of two ways (ignoring use of transformations):
1 – the raw data themselves are normally distributed, so naturally the means of the raw data will be normally distributed.
2 – the raw data are NOT normally distributed, but the sample sizes are large enough that, as predicted by the central limit theorem (CLT), the means are approximately normally distributed. If the raw data are relatively symmetric, the sample sizes can be as small as n=5 and the CLT will make the means close to normal. If the raw data are very skewed, the sample sizes will need to be much larger, maybe even n=50, before the CLT will provide approximate normality of the means.
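The two cases above can be illustrated by simulating many sample means and looking at how lopsided their distribution is. The sketch below is an illustration under assumptions, not part of the original post: it uses a uniform distribution as the "relatively symmetric" case, an exponential distribution as the "very skewed" case, and sample skewness as a rough stand-in for departure from normality. All function names are hypothetical.

```python
import random

def skewness(values):
    """Sample skewness: third central moment divided by variance to the power 1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def skewness_of_means(draw, n, reps=5000, seed=7):
    """Draw `reps` samples of size n and return the skewness of their means."""
    rng = random.Random(seed)
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(reps)]
    return skewness(means)

def uniform_draw(rng):
    return rng.random()            # symmetric raw data

def exponential_draw(rng):
    return rng.expovariate(1.0)    # strongly right-skewed raw data

symmetric_n5 = skewness_of_means(uniform_draw, n=5)
skewed_n5 = skewness_of_means(exponential_draw, n=5)
skewed_n50 = skewness_of_means(exponential_draw, n=50)

print(f"uniform, n=5:      {symmetric_n5:+.2f}")
print(f"exponential, n=5:  {skewed_n5:+.2f}")
print(f"exponential, n=50: {skewed_n50:+.2f}")
```

A skewness near zero is consistent with (but does not prove) normality of the means. For exponential data the skewness of the mean shrinks in proportion to 1/sqrt(n), which is why the n=50 figure should be clearly smaller than the n=5 figure yet still not zero.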
