In my ongoing journey to master the tools and techniques of Lean Sigma, I have been reviewing the materials on ANOVA. This has been very useful and has really moved me up a level.

ANOVA comes across in training as a very useful tool, and Tukey’s pairwise comparison really hits the spot. However, in our business the samples are skewed, non-normal and of unequal variance, so I rarely meet the stated assumptions for use (“Samples must be normally distributed and of equal variance”) and swap over to Mood’s Median Test instead.

So when I reviewed this topic in my own time, I was surprised and intrigued to read in Principles of Applied Statistics that “in practise the normality and equal variance assumptions are not important”. Not having a statistical background, I had always assumed these were fixed assumptions and used the tools accordingly. Could it be that I had misunderstood the theory? And what about the other statistical tests based on the normal distribution, e.g. the 1-sample t-test, 2-sample t-test, paired t-test and F-test for equal variance: could these be used as well?

I started doing some empirical tests in Minitab. This was useful research, but it did not answer the questions. I then reread my collection of Six Sigma books. These gave differing and sometimes vague advice on how to handle hypothesis tests on samples that are non-normal or of unequal variance. From this I took two things: first, I needed to look outside of my “Universe” to find an information source I could trust; second, I needed to be more specific in my questions. So I framed the questions:

  • To what degree does non-normality impact the hypothesis test?
  • To what degree does unequal variance impact the hypothesis test?
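
One way to put a number on the first question is a small simulation. The sketch below is only an illustration, not the Minitab work described above: it assumes exponentially distributed data as the "strongly skewed" case, a sample size of 15, and a hard-coded two-sided 5% critical value for 14 degrees of freedom. The function name and parameters are made up for this example.

```python
import math
import random

def t_test_rejection_rate(draw, true_mean, n, t_crit, n_sims=20000, seed=1):
    """Fraction of simulated samples in which a two-sided 1-sample t-test
    rejects H0: mean == true_mean, although H0 is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        sample = [draw(rng) for _ in range(n)]
        mean = sum(sample) / n
        # Sample variance with the usual n - 1 divisor.
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        t = (mean - true_mean) / math.sqrt(var / n)
        if abs(t) > t_crit:
            rejections += 1
    return rejections / n_sims

# Assumption: two-sided 5% critical value of t with 14 df (n = 15).
T_CRIT_DF14 = 2.1448

# Normal data with true mean 0: the rate should sit close to the nominal 5%.
normal_rate = t_test_rejection_rate(
    lambda rng: rng.gauss(0.0, 1.0), 0.0, 15, T_CRIT_DF14)

# Exponential data with true mean 1: strongly right-skewed.
skewed_rate = t_test_rejection_rate(
    lambda rng: rng.expovariate(1.0), 1.0, 15, T_CRIT_DF14)

print(f"normal data:      {normal_rate:.3f}")
print(f"exponential data: {skewed_rate:.3f}")
```

The gap between the two printed rates is the "degree of impact" for that particular shape and sample size; rerunning with a larger n (and the matching critical value) shows how quickly it shrinks.
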

To find an information source I could trust I went to the library and borrowed the biggest academic book on statistics I could find, Introduction to the Practice of Statistics. It went in-depth in the areas I was looking for guidance.

In terms of output, I wanted to produce a simple table of practical advice that could be used. Here is what it looks like now; I am still working on it and going through the theory.

Hypothesis Test | Non-normal Data | Unequal Variance
1-sample t-test | Robust test except against outliers or strongly skewed data. Samples less than 15: must have normal data; less than 40: no outliers or strongly skewed; greater than 40: no outliers but can be skewed | No significant impact
2-sample t-test | In research | In research
Paired t-test | In research | In research
F-test | Do not run test, use Levene’s test | In research
ANOVA | In research | Largest standard deviation should be less than twice smallest standard deviation
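
The ANOVA rule of thumb in the last row is easy to check before running the test. A minimal sketch, with a hypothetical function name and made-up data:

```python
import statistics

def sd_ratio_check(groups, max_ratio=2.0):
    """Return (ok, ratio), where ratio is the largest sample standard
    deviation divided by the smallest, and ok means ratio < max_ratio."""
    sds = [statistics.stdev(g) for g in groups]
    if min(sds) == 0:
        raise ValueError("a group has zero spread; the ratio is undefined")
    ratio = max(sds) / min(sds)
    return ratio < max_ratio, ratio

# Hypothetical cycle-time samples for three teams.
team_a = [10.0, 12.0, 11.0, 13.0, 12.0]
team_b = [20.0, 22.0, 19.0, 23.0, 21.0]
team_c = [15.0, 25.0, 10.0, 30.0, 20.0]

ok_all, ratio_all = sd_ratio_check([team_a, team_b, team_c])
ok_ab, ratio_ab = sd_ratio_check([team_a, team_b])
print(ok_all, round(ratio_all, 2))
print(ok_ab, round(ratio_ab, 2))
```
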

This is the start of the research and joins the growing body of knowledge I am building in the subject. What it does is provoke the next set of questions: what is the risk of using the results, how does sample size impact on confidence intervals in these situations, what other situations should be considered (e.g. outliers and different distributions), and when should non-parametric tests be used?


Anyone out there with the answers?

Comments 13

  1. Robert Butler

    You should note that ALL of the instructions concerning rules about statistical methods you receive in BB training are, of necessity, conservative boiler plate. They set artificial limits so that you, as a novice user of statistical methods, won’t make some egregious mistake.

    To your table – look up two sample t-test with unequal sample sizes and unequal variances. As for the issues surrounding non-normality look up message ID 110215 over in the discussion forum.
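
For readers who look this up: the two-sample t-test for unequal variances is usually called Welch's t-test. The sketch below computes its statistic and the Welch-Satterthwaite degrees of freedom by hand. The function name and the data are illustrative only, and turning the statistic into a p-value still needs a t table or a stats package.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    diff = statistics.fmean(sample_a) - statistics.fmean(sample_b)
    t = diff / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical data: unequal sample sizes and unequal spread.
t_stat, df = welch_t([1.0, 2.0, 3.0, 4.0, 5.0],
                     [2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
print(f"t = {t_stat:.4f}, df = {df:.2f}")
```

Note that the degrees of freedom come out as a non-integer that is smaller than the pooled value of n1 + n2 - 2; that reduction is how the test pays for not assuming equal variances.
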

  2. Ron

    Great questions and post Robin.

    Here is my take on it. If we go back to the basics… like why we generally prefer to use the median over the mean with non normal data I always come back to the "house price" example.

    In most American subdivisions we come across a few very expensive homes… these homes can skew the "mean home price" to the right while the "median home price" is a little more left (and more realistic).

    This is at the heart, in my opinion, of why we need to be careful when testing non normal data. The "expensive" homes can skew the results moving us away from reality.
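
The house price point can be seen with a handful of made-up numbers (prices in thousands, purely hypothetical):

```python
import statistics

# Six ordinary homes and one very expensive one (hypothetical prices, in $k).
prices = [210, 225, 240, 250, 265, 280, 1900]

mean_price = statistics.fmean(prices)
median_price = statistics.median(prices)

print(f"mean:   {mean_price:.1f}")
print(f"median: {median_price}")
```

The single expensive home pulls the mean well above what any of the ordinary homes cost, while the median stays at the middle ordinary home.
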

    Another option is this… don’t worry too much about it and do both a parametric AND non parametric test when faced with non normal data. It is just a few more clicks in Minitab and you can share the results of both tests proving how diligent you are! I find in most cases the results are the same anyhow.
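
For anyone curious what the non-parametric half of this "run both" approach does behind the clicks, here is a rough sketch of Mood's median test for two groups. It is an illustration under stated assumptions, not Minitab's implementation: values equal to the overall median are counted as "not above", no continuity correction is applied, and the function name and data are made up.

```python
import math
import statistics

def moods_median_test(group_a, group_b):
    """Mood's median test for two groups.
    Returns (chi_square, p_value) from the 2x2 table of counts
    above / not above the pooled median (1 degree of freedom)."""
    grand_median = statistics.median(list(group_a) + list(group_b))
    a_above = sum(1 for x in group_a if x > grand_median)
    b_above = sum(1 for x in group_b if x > grand_median)
    a_below = len(group_a) - a_above
    b_below = len(group_b) - b_above
    n = len(group_a) + len(group_b)
    margins = ((a_above + a_below) * (b_above + b_below)
               * (a_above + b_above) * (a_below + b_below))
    if margins == 0:
        raise ValueError("all values fall on one side of the pooled median")
    chi2 = n * (a_above * b_below - a_below * b_above) ** 2 / margins
    # For 1 degree of freedom, the chi-square tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical data: two clearly separated groups, then two identical ones.
chi2_sep, p_sep = moods_median_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
chi2_same, p_same = moods_median_test([1, 2, 3, 4], [1, 2, 3, 4])
print(chi2_sep, p_sep)
print(chi2_same, p_same)
```

Because the test only looks at which side of the median each value falls on, an extreme "expensive home" counts no more than a value just above the median.
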

    Of course you won’t read this in some stuffy stats book… but this less dogmatic approach has served me well.

    Plus, ask 4 different statisticians this question and you will likely hear 4 different things. And since many of them never leave the classroom (or hotel conference room where they teach others) let’s go to the gemba and learn for ourselves. That’s my take on it at least.

    Perhaps I will blog about this in the future.

  3. Robin Barnwell

    Thanks for the feedback. I couldn’t figure out how to access the message ID; it seemed too far back to access. Could you send a link please?

    Good example, problem I have is translating medians back to business-speak. Customers tend to work with averages and their eyes start to shut at the mention of medians. But I focus on the outliers and talk about how eliminating these will have the biggest impact on improving the average.

    Agree with the do-all-tests approach. I like to be absolutely sure (95% confidence?) on results because of the danger of making type I & II errors. Calling something and being wrong is embarrassing!

    What I am doing is pushing the academic side and have already got a few wins. For example found an immediate application for the Spearman Ranking Coefficient.

    Please publish the link to the blog when you write it.

  4. Robert Butler

    This blog site won’t take links for some reason. I’ve just copy pasted the message from my original post.

    The best short discussion of this question of which I’m aware can be found on pages 48-56 of The Design and Analysis of Industrial Experiments by Davies, 2nd Edition.

    The text is too long to quote here but the short version is this: if we use skewness and kurtosis to define what we mean by departures from normality (remembering that an ideal normal distribution has skewness and kurtosis = 0) and if we ask how much these departures impact our level of significance (that is, we wish to test for P < .05 but what are we really testing when the data isn’t normal?) then Table 2A.3 on p. 56 gives the following:

    For an Analysis of Variance of 5 groups of 5 observations each the true percentage probabilities associated with 5% normal theory significance are


    Kurtosis    Skewness 0    Skewness 1    Skewness 2
    -1.5        5.36
    0           5.00          5.1           5.2
    2           4.52          4.62          4.72

    The 5.00 entry (bolded in the original post) is the P < .05.

    So, worst case (that is worst case in terms of overall shape) with a 2,2 instead of a 0,0 for skewness and kurtosis your P < .05 would actually be a P < .0472. In short, even extreme non-Normality has little se

  5. Robert Butler

    has little serious effect on the probability levels. Which means that, for ANOVA (and the t-test which is also discussed in the above citation), non-normality isn’t much of an issue.
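
The layout behind that table (5 groups of 5 observations, nominal 5% significance) can be mimicked with a quick simulation. This is a sketch under stated assumptions, not a reproduction of the Davies table: the skewed case uses exponential data, whose skewness and kurtosis are more extreme than any cell quoted above, and the F critical value for (4, 20) degrees of freedom is hard-coded. All names are illustrative.

```python
import random

# Assumption: upper 5% point of the F distribution with (4, 20) df.
F_CRIT_4_20 = 2.866

def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def anova_rejection_rate(draw, n_sims=20000, seed=1):
    """Share of simulated data sets (5 groups x 5 observations, all drawn
    from the same distribution) where ANOVA rejects at the nominal 5%."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        groups = [[draw(rng) for _ in range(5)] for _ in range(5)]
        if anova_f(groups) > F_CRIT_4_20:
            rejections += 1
    return rejections / n_sims

normal_anova_rate = anova_rejection_rate(lambda rng: rng.gauss(0.0, 1.0))
skewed_anova_rate = anova_rejection_rate(lambda rng: rng.expovariate(1.0))
print(f"normal data:      {normal_anova_rate:.3f}")
print(f"exponential data: {skewed_anova_rate:.3f}")
```

Both rates estimate the true significance level when the nominal one is 5%, which is the same quantity the table reports.
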

  6. Robert Butler

    Hmmm the table doesn’t copy worth a darn.

    Kurtosis    Skewness 0    Skewness 1    Skewness 2
    -1.5        5.36
    0           5.00          5.1           5.2
    2           4.52          4.62          4.72

    I tried retyping it above. If this doesn’t work go over to the discussion forum and under search discussion forum type in "robert butler non normal" and sort by date. The message is #6 and is titled Re: ANOVA. I’m sorry for cluttering up your comments section like this.

  7. Robin Barnwell

    Thanks Robert, the link looks like it has been put in place (thanks Michael?). Good debate, will go through in more depth over the week.
    Regards Robin

  8. Ed Lim

    Robin, I too was struggling with the normality and equal variances problem in my quest to use ANOVA. I found this article very helpful for analyzing my data. Give it a read and see if it helps:
    Handling Non-Normal Data By Shree Padnis

    It helped me understand my data more. You may want to set a lower spec limit or upper spec limit. In practical terms, my data can’t be above or below this. It’s a good way to eliminate outliers which may have been wrong from the get go.

    The article also gives you the options of transformations which will make your data normal. If you use Minitab, the Individual Distribution Identification is great!
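
As a rough idea of what such a transformation does, here is a small sketch of the Box-Cox family with the lambda chosen by hand. That is an assumption of this example: Minitab and the article estimate lambda from the data, which is not shown here. The function names and data are illustrative.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive data for a given lambda.
    lam == 0 is the log transform; otherwise (x**lam - 1) / lam."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox needs strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def skewness(values):
    """Moment-based sample skewness (0 for perfectly symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed data: each value doubles the previous one.
raw = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
raw_skew = skewness(raw)
log_skew = skewness(box_cox(raw, 0))
print(f"skewness before: {raw_skew:.2f}")
print(f"skewness after log transform: {log_skew:.2f}")
```

Any conclusions drawn after transforming apply to the transformed scale, so results usually need translating back before they are presented.
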

    Good luck!

  9. Robin Barnwell

    Hi Ed

    Thanks for the link, good article.

    In the end I have written a small training course for our Black Belts that I presented at our lunch and learn sessions. This has helped them understand and deal with non-normal data in an environment where non-normality is the norm.


  10. Learner

    Dear Robin,

    I too have been struggling in application of the statistical tools to the analysis of data using the six sigma rigour.

    I would appreciate any help in getting good references in the practical approach to data analysis using Statistical Analysis.

    I also do not wish to be a bookworm and just reading books and gaining no experience. I have undergone six sigma trainings but have yet not been able to use the tools appropriately.

    Any help would be highly appreciated to clear my ambiguity.


  11. Robin Barnwell

    Hello Learner

    Happy to help, will make contact off-line to better understand the issues.


  12. William Reith


    Did you ever finish the table you were working on? I have a problem that deals with unequal variances and skewed data. I am limited by the software we are using so a non parametric test is not an option and I need to know if ANOVA is still ok to use.

    Really hope you get this,


  13. MBBinWI

    I know this thread is a bit stale, but I’m in a bit of a quandary. I don’t have my references handy and need to refresh on the sensitivity of Levene’s Test for equal variances to differences in sample size. The sample with more data points (and larger variation) is a bit more than 5 times the size of the other (with less variation): 181 vs. 33 samples.
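
This does not answer the sensitivity question, but for reference the statistic itself is straightforward to compute for unequal sample sizes. The sketch below uses the median-centred form of Levene's test (often called Brown-Forsythe). The data are randomly generated stand-ins for the 181 vs. 33 situation, and the critical value of roughly 3.89 for F with (1, 212) degrees of freedom is an approximation supplied for this example.

```python
import random
import statistics

def levene_median_w(*groups):
    """Levene's W using absolute deviations from each group's median."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    devs = []
    for g in groups:
        centre = statistics.median(g)
        devs.append([abs(x - centre) for x in g])
    group_means = [sum(d) / len(d) for d in devs]
    grand_mean = sum(sum(d) for d in devs) / n_total
    between = sum(len(d) * (m - grand_mean) ** 2
                  for d, m in zip(devs, group_means))
    within = sum((z - m) ** 2
                 for d, m in zip(devs, group_means) for z in d)
    if within == 0:
        raise ValueError("no variation in the deviations; W is undefined")
    return (n_total - k) / (k - 1) * between / within

# Hypothetical data: 181 points with more spread, 33 points with less.
rng = random.Random(7)
big_sample = [rng.gauss(50.0, 1.5) for _ in range(181)]
small_sample = [rng.gauss(50.0, 1.0) for _ in range(33)]

w_stat = levene_median_w(big_sample, small_sample)
F_CRIT_1_212 = 3.89  # approximate upper 5% point of F with (1, 212) df
print(f"W = {w_stat:.3f}, reject equal variances: {w_stat > F_CRIT_1_212}")
```

Wrapping the data generation in a loop, with both samples given the same spread, would give an estimate of how often the test rejects wrongly at these sample sizes.
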
