Data distributions can be sorted into two broad categories: normal and non-normal. If data is normally distributed, it can be expected to follow a certain pattern in which the values cluster around a central value with no bias left or right (Figure 1). Non-normal data, on the other hand, does not tend toward a central value. It can be skewed left or right or follow no particular pattern.

Non-normal data sounds more dire than it may be. The distribution becomes an issue only when practitioners reach a point in a project where they want to use a statistical tool that requires normally distributed data and they do not have it.

Non-normality is the result of either:

- Data that contains “pollution,” such as outliers, the overlap of two or more processes (Figure 2), inaccurate measurements, etc.
- Data that follows an alternative distribution, such as cycle time data, which has a natural limit of zero (Figure 3).

To move forward with analysis, the cause of the non-normality should be identified and addressed. For example, in the case of the website load time data in Figure 2, once the data was stratified by weekends versus working days, the result was two sets of normally distributed data (Figure 4). Each set of data can then be analyzed using statistical tools for normal data.

If the data follows an alternative distribution (see the table below for common non-normal distribution types), transforming the data can allow practitioners to still take advantage of the statistical analysis options that are available for normal data. The best method for transforming non-normal data depends on the particular situation, and unfortunately it is not always clear which method will work best. A common transformation technique is the Box-Cox transformation.
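The Box-Cox transformation raises each value to a power λ, computing (y^λ − 1)/λ, and uses ln(y) when λ = 0; it only applies to strictly positive data. The Python sketch below is a minimal illustration under stated assumptions, not a substitute for a statistics package: it picks λ by a simple grid search over the profile log-likelihood, and the function names and the −2 to 2 search range are choices made for this example.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform: (y**lam - 1) / lam, or ln(y) when lam == 0."""
    if any(y <= 0 for y in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(y) for y in values]
    return [(y ** lam - 1) / lam for y in values]


def profile_log_likelihood(values, lam):
    """Log-likelihood of lam (up to a constant), assuming the transformed data are normal."""
    n = len(values)
    z = box_cox(values, lam)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n  # maximum-likelihood variance (divide by n)
    jacobian = (lam - 1) * sum(math.log(y) for y in values)
    return -n / 2 * math.log(var) + jacobian


def best_lambda(values, lo=-2.0, hi=2.0, step=0.01):
    """Grid-search the lam in [lo, hi] that maximizes the profile log-likelihood."""
    count = int(round((hi - lo) / step))
    # Rounding keeps the grid point at zero exactly 0.0, so the log branch is used.
    grid = [round(lo + i * step, 10) for i in range(count + 1)]
    return max(grid, key=lambda lam: profile_log_likelihood(values, lam))


if __name__ == "__main__":
    import random

    random.seed(1)
    # Hypothetical right-skewed cycle times drawn from a lognormal distribution.
    cycle_times = [random.lognormvariate(1.0, 0.5) for _ in range(500)]
    print(best_lambda(cycle_times))  # lognormal data should give a lambda near 0
```

For lognormal data such as cycle times, the chosen λ should land near 0, which corresponds to simply taking logarithms.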

Common Non-normal Distribution Types

| Distribution | Data Type | Examples |
|---|---|---|
| Lognormal | Continuous | Cycle or lead time data |
| Weibull | Continuous | Time-to-failure, time-to-repair and material strength data |
| Exponential | Continuous | Constant failure rate conditions of products |
| Poisson | Discrete | Number of events in a specific time period (counts per interval, such as arrivals, failures or defects) |
| Binomial | Discrete | Proportion or number of defectives |
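To make the table more concrete, here is a minimal Python sketch of how the parameters of four of these distributions could be estimated from data by maximum likelihood. It assumes clean, independent observations and uses only the standard library; the function names are illustrative. Weibull is left out because its maximum-likelihood estimates have to be found iteratively.

```python
import math
import statistics


def fit_lognormal(times):
    """Lognormal: the logs of the data are normal, so estimate their mean and SD."""
    logs = [math.log(t) for t in times]  # requires every t > 0
    return statistics.fmean(logs), statistics.pstdev(logs)  # MLE uses the population SD


def fit_exponential(times):
    """Exponential: the MLE of the failure rate is 1 / sample mean."""
    return 1 / statistics.fmean(times)


def fit_poisson(counts):
    """Poisson: the MLE of the mean count per interval is the sample mean."""
    return statistics.fmean(counts)


def fit_binomial(defectives, sample_size):
    """Binomial: pooled proportion defective across samples of equal size."""
    return sum(defectives) / (sample_size * len(defectives))
```

For example, three hypothetical samples of 50 units with 1, 2 and 0 defectives give `fit_binomial([1, 2, 0], 50)`, a pooled proportion defective of 0.02.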

Another option is to use tools that do not require normally distributed data. Testing for statistical significance can be done with nonparametric tests such as the Mann-Whitney test, Mood’s median test and the Kruskal-Wallis test.
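As an illustration of how the first of these works, the Mann-Whitney test replaces the data with ranks, so it does not assume the data are normal. The Python sketch below computes the U statistic and a two-sided p-value from the large-sample normal approximation, with a tie correction but no continuity correction. It is meant to show the mechanics; for small samples an exact test from a statistics package would be preferable, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # 0-based positions -> 1-based average rank
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value) via the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: subtract sum(t^3 - t) over each group of t tied values.
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, min(p, 1.0)
```

Because only ranks enter the calculation, a single extreme value shifts U no more than any other value in the same position would, which is part of why such tests tolerate non-normal data.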

To learn more about non-normal data and related topics, refer to the following articles and discussions on iSixSigma.com:

- Are You Sure Your Data Is Normal?
- Dealing with Non-normal Data: Strategies and Tools
- Non-normal Data Needs Alternate Control Chart Approach
- Process Capability Calculations with Non-Normal Data
- Tips for Recognizing and Transforming Non-normal Data
- Making Data Normal Using Box-Cox Power Transformation
- Trying to Make Sense of Non-normal Data
- Non-normal Data: Transform?
- Non-normal Data: Use Control Charts?
- Non-normal Data on Control Charts – Transformation Versus Percentile Methods
- Non-parametric Data – Which Factor Has the Greatest Influence?
- The Cox-Box: Data Transformation

Non-normal data is a typical subject in Green Belt training. To learn more about non-normal data and hypothesis testing, purchase the Six Sigma Green Belt Training Course available at the iSixSigma Store.

Very good article, with an easy-to-follow description and links to good related articles.

This is a great short-and-sweet primer that nicely frames the issue of how to handle non-normal data. In this context, the article would fit nicely as a “teaser” for a larger introduction.

To the author’s credit, this article focuses on several carefully selected examples that are “clean and simple.” The examples are textbook-like problems and solutions, both of which are devoid of the ambiguous circumstances that usually accompany reality. Of course, this is where most such problems exist: in the grey zone of reality. When a typical practitioner parachutes into this zone, the advice is simple: get a subject-matter expert to assist with the problem.

Achieving the ultimate aims of this article can be quite difficult, even for the experienced practitioner. For example, one would likely want to consider the “robustness” of the statistic being used to analyze the non-normal data. This means giving full analytical consideration to the Type I error stability, as well as that of Type II errors, not to mention the mitigating effects related to degrees-of-freedom and delta sigma. If the statistic of choice proves to be reasonably robust, there is no need to transform the data. On the other hand, if the statistic of choice is not robust, then what? Herein lies the “grey zone” of the “circumstantially complex.”

Far too often, practitioners attempt to transform non-normal data into a state of normality prior to conducting some type or form of analysis that is theoretically dependent on an underlying distribution that is normal. However, there are other types of distributions that can be used as the target of transformation, like the uniform or triangular distribution, just to mention a couple.

To illustrate, consider Figure 2 in the article. This graphic clearly displays the case where the data are actually associated with two different categories but, for whatever reason, have been inappropriately combined (i.e., blended into a single distribution that appears to be non-normal). For the case provided in this article, the solution is quite simple and straightforward: just sort the data on the categorical variable and then analyze each distribution separately. However, many times such bimodal-looking distributions are entirely natural, like the family of extreme value distributions (which are known to exist naturally). The proper use of extreme value distributions is an entirely different matter, yet related to the topic at hand.

In summary, it’s one thing to form a panoramic view of the mountain tops, but quite another to farm the fertile soil at the base of that mountain, so to speak. The panoramic view is for sightseers, but the growing of produce is the job of an experienced farmer. Just because you might be able to use a set of binoculars does not mean you can drive a tractor.

The article failed to mention the consequences of using methods such as Mann-Whitney and Kruskal-Wallis. Generally, nonparametric methods rely on assumptions that are just as unlikely to be met, and they suffer a loss of power.

As Dr. Harry has suggested, it can be very complex. However, I would add that outliers can be very valuable when we consider what has pushed them to the extremes of our data set.

What rubbish. Read “Normality and the Process Behavior Chart” – Wheeler.

I’m sorry, but the statement that data containing “pollution” is a reason for non-normality is absurd. Is it absurd to have a poor MSA, two overlapping distributions or outliers with special causes?

Solve the causes of the various peaks and you’ll improve the process. Don’t bother using Box-Cox. A customer won’t say, “Hey, I’ll just Box-Cox your incoming process results and I’ll feel better.” You still have the same % defective in the process after a proper transformation.
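The point about % defective can be checked directly: any strictly increasing transformation (Box-Cox applied to positive data is one) preserves the order of the data, so if the specification limit is transformed the same way, the fraction of results beyond it does not change. A minimal Python sketch, using made-up cycle times and a hypothetical upper specification limit of 10:

```python
import math


def fraction_above(values, limit):
    """Fraction of results strictly above an upper specification limit."""
    return sum(v > limit for v in values) / len(values)


def fraction_above_transformed(values, limit, transform):
    """Same fraction after applying one strictly increasing transform to data and limit."""
    return fraction_above([transform(v) for v in values], transform(limit))


# Hypothetical cycle times (days) and upper specification limit.
cycle_times = [2.1, 3.4, 4.0, 5.5, 6.2, 7.9, 9.8, 11.3, 14.6, 21.0]
usl = 10.0
print(fraction_above(cycle_times, usl))                        # 0.3 on the raw scale
print(fraction_above_transformed(cycle_times, usl, math.log))  # 0.3 on the log scale
```

The transformation changes how the data look, not how many results fall outside the limit.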

I agree with Dr. MH’s advice not to transform just for the sake of transformation.

While the two reasons given for non-normality are accurate (although I would have preferred a word other than “pollution”), few people realize that the same two reasons can also produce data that looks normal.

Did you know that a distribution may appear normal because it is a combination of two overlapping distributions? This occurs, for example, when two normal distributions with approximately the same standard deviation have means that differ by about one standard deviation. With a small enough sample size, you may not recognize that there are actually two distributions.
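The overlapping-distribution point can be checked numerically. For an equal mix of two normal distributions with the same standard deviation, the combined density has a single peak as long as the means are no more than two standard deviations apart, so with a one-standard-deviation gap the mixture shows one hump. The Python sketch below counts local maxima of the mixture density on a grid; the grid range and resolution are arbitrary choices for illustration.

```python
from statistics import NormalDist


def mixture_pdf(x, first, second, weight=0.5):
    """Density of a two-component mixture of normal distributions."""
    return weight * first.pdf(x) + (1 - weight) * second.pdf(x)


def count_peaks(first, second, lo, hi, steps=2000):
    """Count local maxima of the mixture density on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [mixture_pdf(x, first, second) for x in xs]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] >= ys[i + 1])


# Means one standard deviation apart: one peak, easy to mistake for a single normal.
print(count_peaks(NormalDist(0, 1), NormalDist(1, 1), -4, 5))  # 1
# Means three standard deviations apart: two distinct peaks.
print(count_peaks(NormalDist(0, 1), NormalDist(3, 1), -4, 7))  # 2
```

A single peak does not make the mixture exactly normal, but with a small sample the difference is hard to see in a histogram.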

The author raises a critical issue: “The distribution becomes an issue only when practitioners reach a point in a project where they want to use a statistical tool that requires normally distributed data and they do not have it.” This is backward.

Why do practitioners want to use a particular tool rather than answer a specific question? The question should come first; then determine the best tool to provide the answer. Unfortunately, in my experience too many practitioners are tool-oriented rather than purpose-, deliverable- or question-oriented.

I would suggest not transforming to get a normal distribution.

The problem with nonlinear transformations (e.g., Box-Cox) is that they change the characteristics of the distribution. So, what is true of the transformed data may not be true of the original. And, presumably, it is the original data that you are interested in describing or analyzing.
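A small example of this point: if data is analyzed on the log scale and the mean is then back-transformed, the result is the geometric mean, not the arithmetic mean of the original data. The Python sketch below uses three made-up, strongly skewed values to show the gap.

```python
import math
from statistics import fmean


def back_transformed_log_mean(values):
    """Average on the log scale, then convert back: this is the geometric mean."""
    return math.exp(fmean([math.log(v) for v in values]))


lead_times = [1.0, 10.0, 100.0]  # hypothetical, strongly right-skewed values
print(fmean(lead_times))                      # arithmetic mean: 37.0
print(back_transformed_log_mean(lead_times))  # about 10, far from 37
```

A statement such as “the average is about 10” is true on the transformed scale but not of the original data.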

I don’t totally agree with your opening comment that non-normal data doesn’t tend toward a central value. Of course it does. You can calculate the mean, median and mode of non-normal continuous data, which are basic descriptors of central tendency. But, depending on the degree of non-normality, the central tendency will not be symmetrically located, which is, I believe, the point you were trying to make.

As has been pointed out, while this is just an oversimplified article, some statements aren’t quite spot on. First, we cannot prove that data is normal; we can only state whether or not it is statistically different from normal. That is the value of the p-value when testing for normality. It is a small point of interpretation, but nevertheless an important one. I also agree that the first step is NOT to transform, but first to understand the distribution and why it might be non-normal, or to use the less powerful but useful nonparametrics. Transformation should be the last step, since it produces output that is difficult to understand because the values of the original data have changed.
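To illustrate the point that a normality test can only reject or fail to reject normality, here is a minimal Python sketch of one such test, the Jarque-Bera test, which compares sample skewness and kurtosis with the values 0 and 3 expected for a normal distribution. The choice of this particular test is an assumption of the example (the comment does not name one), and its chi-square p-value with 2 degrees of freedom is only a large-sample approximation, so it is a rough screen for small data sets.

```python
import math
import random


def jarque_bera(values):
    """Return (JB statistic, approximate p-value) for the null hypothesis of normality."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)  # upper tail of a chi-square with 2 df
    return jb, p_value


random.seed(0)
# Hypothetical right-skewed data: a tiny p-value means "statistically different from normal".
skewed = [random.expovariate(1.0) for _ in range(500)]
print(jarque_bera(skewed))
```

A large p-value for some other data set would only mean the test found no significant departure from normality, not that the data have been shown to be normal.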

As Dr. Burns suggests, a quick read of the Wheeler booklet on the subject explains his position and, by association, Shewhart’s regarding normality as it applies to control charts. But normality is also a consideration in Process Capability and certain Hypothesis Tests. The good news is that, as Dr. Harry points out, methods that rely on the normality assumption are often robust to departures from normality.

Finally, although it is probably way outside your intent in this primer, you tease us with some discrete distributions yet fail to mention the normal approximation to the Binomial/Poisson and the fact that large counts may also be treated as continuous data with certain caveats.
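For readers who want to see the normal approximation mentioned above, here is a minimal Python sketch comparing an exact binomial probability with its normal approximation (mean np, standard deviation √(np(1−p)), plus a 0.5 continuity correction). The sample size and defective rate are made-up values; a common rule of thumb is to rely on the approximation only when np and n(1−p) are both reasonably large (at least 5 to 10).

```python
import math
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with a 0.5 continuity correction."""
    approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return approx.cdf(k + 0.5)


# Hypothetical: 100 units inspected, 10% defective rate, P(at most 10 defectives).
exact = binomial_cdf(10, 100, 0.10)
approx = normal_approx_cdf(10, 100, 0.10)
print(f"exact={exact:.4f}  normal approximation={approx:.4f}")
```

Here the two values should agree to within a few hundredths; the gap shrinks as n grows and widens for rare defects or small samples.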

Thanks for taking a shot at a tough and oft misunderstood topic.

Super primer!

Minitab’s Quality Trainer is a great resource for learning more about this topic and other statistical topics that are difficult to understand or interpret. It has great examples and walk-through lessons that let you work through an exercise yourself after you have learned the theory.