The distribution of data can be categorized in two ways: normal and non-normal. If data is normally distributed, it can be expected to follow a certain pattern in which the data tend to be around a central value with no bias left or right (Figure 1). Non-normal data, on the other hand, does not tend toward a central value. It can be skewed left or right or follow no particular pattern.

Figure 1: Normally Distributed Data
Figure 1: Normally Distributed Data

Non-normal data sounds more dire than it may be. The distribution becomes an issue only when practitioners reach a point in a project where they want to use a statistical tool that requires normally distributed data and they do not have it.

Non-normality is the result of either:

  1. Data that contains “pollution,” such as outliers, the overlap of two or more processes (Figure 2), the result of inaccurate measures, etc.
  2. Data that follows an alternative distribution, such as cycle time data, which has a natural limit of zero (Figure 3).
Figure 2: Website Load Data Time
Figure 2: Website Load Data Time
Figure 3: Cycle Time Data
Figure 3: Cycle Time Data

To move forward with analysis, the cause of the non-normality should be identified and addressed. For example, in the case of the website load time data in Figure 2, once the data was stratified by weekends versus working days, the result was two sets of normally distributed data (Figure 4). Each set of data can then be analyzed using statistical tools for normal data.

Figure 4: Website Load Time Data After Stratification
Figure 4: Website Load Time Data After Stratification

If the data follows an alternative distribution (see table below for common non-normal distribution types), transforming the data will allow practitioners to still take advantage of the statistical analysis options that are available to normal data. The best method for transforming non-normal data depends upon the particular situation, and it is unfortunately not always clear which method will work best. A common transformation technique is the Box-Cox.

Common Non-normal Distribution Types
Distribution Type Data Examples
Lognormal Continuous Cycle or lead time data
Weibull Continuous Mean time-to-failure data, time to repair and material strength
Exponential Continuous Constant failure rate conditions of products
Poisson Discrete Number of events in a specific time period (defect counts per interval such as arrivals, failures or defects)
Binomial Discrete Proportion or number of defectives

Another option is to use tools that do not require normally distributed data. Testing for statistical significance can be done with nonparametric tests such as the Mann-Whitney test, Mood’s median test and the Kruskal-Wallis test.

To learn more about non-normal data and related topics, refer to the following articles and discussions on iSixSigma.com:

Non-normal data is a typical subject in Green Belt training. To learn more about non-normal data and hypothesis testing, purchase the Six Sigma Green Belt Training Course available at the iSixSigma Store.

About the Author