The distribution of data can be categorized in two ways: normal and non-normal. If data is normally distributed, it can be expected to follow a certain pattern in which the data tend to be around a central value with no bias left or right (Figure 1). Non-normal data, on the other hand, does not tend toward a central value. It can be skewed left or right or follow no particular pattern.
Non-normal data sounds more dire than it may be. The distribution becomes an issue only when practitioners reach a point in a project where they want to use a statistical tool that requires normally distributed data and they do not have it.
Non-normality is the result of either:
To move forward with analysis, the cause of the non-normality should be identified and addressed. For example, in the case of the website load time data in Figure 2, once the data was stratified by weekends versus working days, the result was two sets of normally distributed data (Figure 4). Each set of data can then be analyzed using statistical tools for normal data.
If the data follows an alternative distribution (see table below for common non-normal distribution types), transforming the data will allow practitioners to still take advantage of the statistical analysis options that are available to normal data. The best method for transforming non-normal data depends upon the particular situation, and it is unfortunately not always clear which method will work best. A common transformation technique is the Box-Cox.
|Common Non-normal Distribution Types|
|Lognormal||Continuous||Cycle or lead time data|
|Weibull||Continuous||Mean time-to-failure data, time to repair and material strength|
|Exponential||Continuous||Constant failure rate conditions of products|
|Poisson||Discrete||Number of events in a specific time period (defect counts per interval such as arrivals, failures or defects)|
|Binomial||Discrete||Proportion or number of defectives|
Another option is to use tools that do not require normally distributed data. Testing for statistical significance can be done with nonparametric tests such as the Mann-Whitney test, Mood’s median test and the Kruskal-Wallis test.
To learn more about non-normal data and related topics, refer to the following articles and discussions on iSixSigma.com:
Non-normal data is a typical subject in Green Belt training. To learn more about non-normal data and hypothesis testing, purchase the Six Sigma Green Belt Training Course available at the iSixSigma Marketplace.