The distribution of data can be categorized in two ways: normal and nonnormal. If data is normally distributed, it can be expected to follow a certain pattern in which the data tend to be around a central value with no bias left or right (Figure 1). Nonnormal data, on the other hand, does not tend toward a central value. It can be skewed left or right or follow no particular pattern.
Nonnormal data sounds more dire than it may be. The distribution becomes an issue only when practitioners reach a point in a project where they want to use a statistical tool that requires normally distributed data and they do not have it.
Nonnormality is the result of either:
To move forward with analysis, the cause of the nonnormality should be identified and addressed. For example, in the case of the website load time data in Figure 2, once the data was stratified by weekends versus working days, the result was two sets of normally distributed data (Figure 4). Each set of data can then be analyzed using statistical tools for normal data.
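The stratification step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the load times are synthetic stand-ins (evenly spaced normal quantiles, not the data behind Figure 2), the group means and spreads are made up, and the Jarque-Bera statistic is used as one simple normality check among many. Its p-value relies on a large-sample chi-square approximation, so treat it as a rough screen.

```python
import math
from statistics import NormalDist, fmean


def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) as a rough normality check."""
    n = len(values)
    center = fmean(values)
    m2 = sum((v - center) ** 2 for v in values) / n
    m3 = sum((v - center) ** 3 for v in values) / n
    m4 = sum((v - center) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # JB is asymptotically chi-square with 2 degrees of freedom,
    # whose upper-tail probability is exp(-x / 2).
    return jb, math.exp(-jb / 2)


def synthetic_sample(mu, sigma, n):
    """Deterministic stand-in data: evenly spaced quantiles of N(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


# Hypothetical load times in seconds, tagged by day type.
records = [("workday", t) for t in synthetic_sample(3.0, 0.4, 250)]
records += [("weekend", t) for t in synthetic_sample(1.5, 0.3, 100)]

# Check the combined data first, then each stratum on its own.
_, p_combined = jarque_bera([t for _, t in records])

strata = {}
for day_type, seconds in records:
    strata.setdefault(day_type, []).append(seconds)
p_by_stratum = {name: jarque_bera(vals)[1] for name, vals in strata.items()}

print(f"combined: p = {p_combined:.2g}")
for name, p in p_by_stratum.items():
    print(f"{name}: p = {p:.2g}")
```

A small p-value for the combined data alongside large p-values for each stratum is the pattern the article describes: the mix of two groups, not either group alone, produces the nonnormal shape.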
If the data follows an alternative distribution (see table below for common nonnormal distribution types), transforming the data will allow practitioners to still take advantage of the statistical analysis options that are available to normal data. The best method for transforming nonnormal data depends upon the particular situation, and it is unfortunately not always clear which method will work best. A common transformation technique is the Box-Cox transformation.
Common Nonnormal Distribution Types

Distribution   Data Type    Examples
Lognormal      Continuous   Cycle or lead time data
Weibull        Continuous   Mean time-to-failure data, time to repair and material strength
Exponential    Continuous   Constant failure rate conditions of products
Poisson        Discrete     Number of events in a specific time period (defect counts per interval such as arrivals, failures or defects)
Binomial       Discrete     Proportion or number of defectives
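To make the Box-Cox idea concrete, here is a minimal sketch of the power transform itself, y = (x^lambda - 1) / lambda, with the natural log used when lambda is 0. The cycle-time values are synthetic (lognormal quantiles), and choosing lambda by scanning a few candidates for the smallest skewness is a simplification made for this example. Statistical packages normally estimate lambda by maximum likelihood, for instance `scipy.stats.boxcox` in SciPy.

```python
import math
from statistics import NormalDist, fmean


def box_cox(values, lam):
    """Box-Cox power transform; requires strictly positive data."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox needs strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def skewness(values):
    """Moment-based sample skewness (0 for perfectly symmetric data)."""
    n = len(values)
    center = fmean(values)
    m2 = sum((v - center) ** 2 for v in values) / n
    m3 = sum((v - center) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed cycle times: evenly spaced lognormal quantiles.
z = NormalDist()
n = 200
cycle_times = [math.exp(1.0 + 0.6 * z.inv_cdf((i + 0.5) / n)) for i in range(n)]

# Simplified lambda search: keep the candidate whose output is most symmetric.
candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]
best_lam = min(candidates, key=lambda lam: abs(skewness(box_cox(cycle_times, lam))))

print(f"skewness before: {skewness(cycle_times):.2f}")
print(f"chosen lambda:   {best_lam}")
print(f"skewness after:  {skewness(box_cox(cycle_times, best_lam)):.2g}")
```

Because the example data are lognormal by construction, the log transform (lambda = 0) is the natural winner. With real data the chosen lambda is an estimate, and the transformed values should still be checked for normality.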
Another option is to use tools that do not require normally distributed data. Testing for statistical significance can be done with nonparametric tests such as the Mann-Whitney test, Mood’s median test and the Kruskal-Wallis test.
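As a rough illustration of how one of these tests works, the sketch below computes the Mann-Whitney U statistic by counting, for every pair of observations, how often a value from the first group exceeds a value from the second. The two groups are made-up numbers, and the p-value uses the large-sample normal approximation without tie or continuity corrections, which is coarse for samples this small. Library routines such as `scipy.stats.mannwhitneyu`, `scipy.stats.kruskal` and `scipy.stats.median_test` handle those details.

```python
import math
from statistics import NormalDist


def mann_whitney_u(a, b):
    """Return (U for group a, two-sided p-value from the normal approximation)."""
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5  # ties count as half a "win"
    n_a, n_b = len(a), len(b)
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p_value


# Hypothetical cycle times (minutes) for two process variants.
process_a = [12.1, 14.3, 11.8, 15.2, 13.7, 12.9, 16.8, 14.8]
process_b = [18.4, 17.9, 21.3, 19.6, 16.5, 22.8, 20.1, 18.8]

u_stat, p_value = mann_whitney_u(process_a, process_b)
print(f"U = {u_stat}, approximate two-sided p = {p_value:.4f}")
```

The test uses only the ordering of the values, which is why it does not need a normality assumption.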


© Copyright iSixSigma 2000-2014. Any reproduction or other use of content without the express written consent of iSixSigma is prohibited.
Comments
Very good articles, easy description & contains good articles.
This is a great short-and-sweet primer that nicely frames the issue of how to handle nonnormal data. In this context, the article would fit nicely as a “teaser” for a larger introduction.
To the credit of the author, this article focuses on several carefully selected examples that are “clean and simple.” The examples are textbook-like problems and solutions, both of which are devoid of the ambiguous circumstances that usually accompany reality. Of course, this is where most such problems exist: in the grey zone of reality. When a typical practitioner parachutes into this zone, the advice is simple: get a subject-matter expert to assist with the problem.
Achieving the ultimate aims of this article can be quite difficult, even for the experienced practitioner. For example, one would likely want to consider the “robustness” of the statistic being used to analyze the nonnormal data. This means giving full analytical consideration to the Type I error stability, as well as that of Type II errors, not to mention the mitigating effects related to degrees of freedom and delta sigma. If the statistic of choice proves to be reasonably robust, there is no need to transform the data. On the other hand, if the statistic of choice is not robust, then what? Herein lies the “grey zone” of the “circumstantially complex.”
Far too often, practitioners attempt to transform nonnormal data into a state of normality prior to conducting some type or form of analysis that is theoretically dependent on an underlying distribution that is normal. However, there are other types of distributions that can be used as the target of transformation, like the uniform or triangular distribution, just to mention a couple.
To illustrate, consider Figure 2 in the article. This graphic clearly displays the case in which the data are actually associated with two different categories but, for whatever reason, have been inappropriately combined (i.e., blended into a single distribution that appears to be nonnormal). For the case provided in this article, the solution is quite simple and straightforward: just sort the data on the categorical variable and then analyze each distribution separately. However, many times such bimodal-looking distributions are entirely natural, like the family of extreme value distributions (that are known to naturally exist). The proper use of extreme value distributions is an entirely different matter, yet related to the topic at hand.
In summary, it’s one thing to form a panoramic view of the mountain tops, but quite another to farm the fertile soil at the base of that mountain, so to speak. The panoramic view is for sightseers, but the growing of produce is the job of an experienced farmer. Just because you might be able to use a set of binoculars does not mean you can drive a tractor.
The article failed to mention the consequences of using methods such as those of Mann-Whitney and Kruskal-Wallis. Generally, nonparametric methods rely on assumptions that are just as unlikely to hold, and they lose power.
As Dr. Harry has suggested it can be very complex. However, I would add that outliers can be very valuable when we consider what has pushed them to the extremities of our data set.
What rubbish. Read “Normality and the Process Behaviour Chart” by Wheeler.
I’m sorry but the statement “data contains pollution” is a reason for nonnormality is absurd. Is it absurd to have a poor MSA or two overlapping distributions or outliers with special causes?
Solve the causes for the various peaks and you’ll improve the process. Don’t bother using Box-Cox. A customer won’t say, “Hey, I’ll just Box-Cox your incoming process results and I’ll feel better.” You still have the same % defective in the process, even with a proper transformation.
I agree with Dr. MH’s advice to not just transform for the sake of transformation.
Nice!
While the two reasons (although I would have preferred some other word than “pollution”) for nonnormality are accurate, few people realize that the same two reasons occur FOR normality.
Did you know that a distribution may appear normal because it is a combination of two overlapping distributions? This occurs, for example, when two normal distributions with approximately the same standard deviation have means that are about one standard deviation different. If you have a small enough sample size, you may not recognize that there are actually two distributions.
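The overlap effect in this comment can be checked numerically. The sketch below assumes an equal-weight mixture of two normal distributions with the same standard deviation (an assumption added for the example) and counts the peaks of the mixture density on a grid. For that equal-weight case the combined curve has a single peak whenever the means are no more than two standard deviations apart, so a one-sigma gap leaves no visible second hump.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    u = (x - mu) / sigma
    return math.exp(-0.5 * u * u) / (sigma * math.sqrt(2 * math.pi))


def mixture_pdf(x, mu1, mu2, sigma):
    """Equal-weight mixture of two normals sharing one standard deviation."""
    return 0.5 * (normal_pdf(x, mu1, sigma) + normal_pdf(x, mu2, sigma))


def count_modes(mu1, mu2, sigma, points=2001):
    """Count strict local maxima of the mixture density on an even grid."""
    lo = min(mu1, mu2) - 4 * sigma
    hi = max(mu1, mu2) + 4 * sigma
    step = (hi - lo) / (points - 1)
    ys = [mixture_pdf(lo + i * step, mu1, mu2, sigma) for i in range(points)]
    return sum(1 for i in range(1, points - 1) if ys[i - 1] < ys[i] > ys[i + 1])


# Means one standard deviation apart: the two groups blend into one hump.
print("1 sigma apart:", count_modes(0.0, 1.0, 1.0), "mode(s)")
# Means three standard deviations apart: two humps become visible.
print("3 sigma apart:", count_modes(0.0, 3.0, 1.0), "mode(s)")
```

A single-peaked curve is not the same as a normal one, but with a small sample the difference can be hard to detect, which is the commenter's point.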
The author raises a critical issue: “The distribution becomes an issue only when practitioners reach a point in a project where they want to use a statistical tool that requires normally distributed data and they do not have it.” This is backward.
Why do practitioners want to use a particular tool rather than answer a specific question? Start with the question, then determine the tool that best provides the answer. Unfortunately, too many practitioners in my experience are tool-oriented rather than purpose-, deliverable- or question-oriented.
I would suggest not transforming to get a normal distribution.
The problem with nonlinear transformations (e.g., Box-Cox) is that the characteristics of the distribution change. So, what is true of the transformed data may not be true of the original. And, supposedly, it is the original in which you have an interest in describing or analyzing.
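A small worked example shows the kind of change this comment has in mind. It uses the log transform (the lambda = 0 case of Box-Cox) on five made-up measurements: averaging on the log scale and transforming back yields the geometric mean, which is not the arithmetic mean of the original values.

```python
import math
from statistics import fmean

# Hypothetical measurements on the original scale.
measurements = [1.0, 2.0, 4.0, 8.0, 16.0]

# Mean computed directly on the original scale.
mean_original = fmean(measurements)

# Mean computed on the log scale, then transformed back.
log_values = [math.log(x) for x in measurements]
back_transformed_mean = math.exp(fmean(log_values))

print(f"arithmetic mean of original data:  {mean_original:.2f}")
print(f"back-transformed mean of log data: {back_transformed_mean:.2f}")
```

The second figure is the geometric mean, so a conclusion about "the average" on the transformed scale answers a different question than one about the average of the raw data.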
I don’t totally agree with your opening comments that non normal data doesn’t tend towards a central value. Of course it does. You can calculate means, medians and modes of non normal continuous data which are basic descriptors of central tendency. But, depending on the degree of non normality, the central tendency will not be symmetrically located which is, I believe, the point you were trying to make.
As has been pointed out, while this is just an over-simplistic article, there are some statements that aren’t quite spot on. First, we cannot prove that data is normal; we can only state that it is statistically different or not different from normal. That is the value of the p-value and testing for normality. A small point in interpretation, but nevertheless important. I also agree that the first step is NOT to transform but to first understand the distribution and why it might be nonnormal, or to use the less powerful but useful nonparametrics. Transformation should be the last step since it involves an output that is difficult to understand due to the change in the values of the original data.
As Dr. Burns suggests, a quick read of the Wheeler booklet on the subject explains his position, and by association Shewhart’s, regarding normality as it applies to control charts. But normality is also a consideration in doing Process Capability or certain Hypothesis Tests. The good news is, as Dr. Harry points out, that the normality assumption is often robust to departures from normal.
Finally, although it is probably way outside your intent in this primer, you tease us with some discrete distributions yet fail to mention the normal approximation to the Binomial/Poisson and the fact that large counts may also be treated as continuous data with certain caveats.
Thanks for taking a shot at a tough and oft misunderstood topic.
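The normal approximation to the binomial mentioned in this comment can be compared against the exact calculation. The sketch below uses a continuity correction of 0.5, and the two parameter sets are illustrative choices, not values from the article. The approximation tracks the exact probability closely when the expected counts of defectives and non-defectives are both reasonably large, and poorly when they are not.

```python
import math
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k) with a 0.5 continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)


# Larger counts: n * p = 30 and n * (1 - p) = 70.
exact_large = binomial_cdf(35, 100, 0.3)
approx_large = normal_approx_cdf(35, 100, 0.3)

# Small counts: n * p = 0.5, where the approximation is a poor fit.
exact_small = binomial_cdf(0, 10, 0.05)
approx_small = normal_approx_cdf(0, 10, 0.05)

print(f"n=100, p=0.30: exact {exact_large:.4f}, approx {approx_large:.4f}")
print(f"n=10,  p=0.05: exact {exact_small:.4f}, approx {approx_small:.4f}")
```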
@Darth. Good points about measures of central tendency for normal and nonnormal distributions, robustness of the normality assumptions for some tests, and the normal approximations to various discrete distributions.
To elaborate on two of your comments (to make sure they are more “spot on”):
“First, we cannot prove that data is normal.” True, simply because we can prove that actual data IS never normal: the normal distribution contains an infinite number of values and actual datasets are finite. Testing for normality is not testing whether the data are normally distributed but whether they represent a sample from a (theoretical or hypothetical) normal distribution. That is a key distinction. Another way to look at distributional tests is to recognize that distributions (normal and the others listed) are models. So, we test whether the data have sufficiently similar characteristics to the model to use the model to make inferences.
“use the less powerful but useful non parametrics.” I have always wondered why this misleading (false, in many cases) statement got into Six Sigma. Nonparametric tests are more powerful than parametric tests for many distributions and many (actually infinite) situations. Proofs can be found in peer-reviewed journals (e.g., Journal of the American Statistical Association) going back more than half a century.
Kicab,
I agree on the first of your further comments. I try to emphasize the Null and Alternate hypotheses to state that “the data is not different from normal” or “data is different from normal” rather than “data is normal” or “data is not normal”.
As to your second follow-up comment, nonparametrics are “less efficient” or have less power than the parametric test, for the same sample size, if the data does actually adhere to the assumed parametric distribution. Are you in agreement with this statement?
@Darth. Not completely.
First, a parametric test may not be testing the same thing as a nonparametric test (e.g., means vs. medians). So, the means may be different while the medians are not, or vice versa. Thus, if one test showed a difference in means and the other didn’t show a difference in medians, that doesn’t show or prove that the former is more powerful.
Second, the factors that determine sample size are alpha, beta, delta=difference to be detected, sigma=population standard deviation. If the first three are determined and the same for both types of tests, then the only difference is the standard deviation (SD). For the statement to be true, the SD for parametric values must be smaller than for nonparametric. This is not always the case.
Third, does the statement claim that parametric tests are always more powerful than nonparametric (given the conditions you stated)? If not, then why make the general claim without specifying the conditions when it’s true and when it’s false?
Fourth, the published articles show that for some distributions nonparametric tests have greater efficiency (power) than parametric tests.
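The second point in the list above names four inputs to a sample size: alpha, beta, delta and sigma. One common textbook formula that uses exactly those inputs is the normal-approximation sample size for a two-sided comparison of two means, n per group = 2 * ((z_(1-alpha/2) + z_(1-beta)) * sigma / delta)^2. The sketch below assumes that formula and made-up input values. It is not taken from the discussion itself, and it only illustrates how strongly the standard deviation drives the result once the other three inputs are fixed.

```python
import math
from statistics import NormalDist


def sample_size_per_group(alpha, beta, delta, sigma):
    """Normal-approximation sample size per group for comparing two means.

    alpha: two-sided Type I error rate
    beta:  Type II error rate (power = 1 - beta)
    delta: difference in means to be detected
    sigma: common population standard deviation
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(1 - beta)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Same alpha, beta and delta; only the standard deviation changes.
n_sigma_1_0 = sample_size_per_group(alpha=0.05, beta=0.20, delta=0.5, sigma=1.0)
n_sigma_1_2 = sample_size_per_group(alpha=0.05, beta=0.20, delta=0.5, sigma=1.2)

print(f"sigma = 1.0 -> {n_sigma_1_0} per group")
print(f"sigma = 1.2 -> {n_sigma_1_2} per group")
```

A 20 percent increase in sigma raises the required sample size by roughly 44 percent, because sigma enters the formula squared.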
Kicab,
I guess we won’t be resolving this issue this year. So, have a Happy Healthy New Year and we will speak again in a year.
Super primer!
Minitab’s Quality Trainer is a great resource to learn more about this topic, and other statistical topics that are difficult to understand/interpret. Great examples and walk thru lesson to help you go thru a lesson/exercise yourself after you have learned the theory.