J. DeLayne Stroud
February 26, 2010
Many consultants remember the hypothesis testing roadmap, which was a great template for deciding what type of test to perform. However, think about the type of data one gets. What if there is only summarized data? How can that data be used to draw conclusions? Having the raw data is the best-case scenario, but if it is not available, there are still tests that can be performed.
To interpret data, and not merely look at it, consultants need to understand distributions. This article discusses how to:
Six Sigma Green Belts receive training focused on shape, center and spread. The concept of shape, however, is limited to the normal distribution for continuous data. This article expands on the notion of shape, as described by the distribution (for both the population and the sample).
With probability, statements are made about the chances that certain outcomes will occur, based on an assumed model. With statistics, observed data is used to determine a model that describes the data; that model takes the form of a distribution. Statistics moves from the sample to the population, while probability moves from the population to the sample.
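To make the two directions concrete, here is a minimal Python sketch using the standard library's `statistics.NormalDist`. A normal model is assumed throughout, and the model parameters (mean 10, standard deviation 2) and the three observations are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Probability: start from an assumed population model and reason about outcomes.
# Hypothetical model: normal with mean 10 and standard deviation 2.
model = NormalDist(mu=10, sigma=2)
p_at_most_12 = model.cdf(12)  # P(X <= 12) under the assumed model
print(f"P(X <= 12) = {p_at_most_12:.4f}")

# Statistics: start from observed data and estimate a model that describes it.
# Hypothetical sample; from_samples() uses the sample mean and sample standard deviation.
sample = [8, 10, 12]
fitted = NormalDist.from_samples(sample)
print(f"estimated mean = {fitted.mean}, estimated sd = {fitted.stdev}")
```

The first half moves from population to sample, the second from sample to population.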
Inferential statistics is the science of describing population parameters based on sample data. Inferential statistics can be used to:
Many common inferential statistics methods are based on the normal distribution.
Figure 1: Normal Curve and Probability Areas
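The probability areas under a normal curve can be computed from its cdf. The minimal sketch below, using Python's standard-library `statistics.NormalDist`, reproduces the familiar areas within one, two and three standard deviations of the mean. Because these areas depend only on the distance measured in standard deviations, the standard normal (mean 0, standard deviation 1) is used.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def area_within(k):
    """Probability that a normal variate falls within k standard deviations of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/-{k} sd: {area_within(k):.4f}")
# within +/-1 sd: 0.6827
# within +/-2 sd: 0.9545
# within +/-3 sd: 0.9973
```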

The normal distribution is a starting point for learning about other distributions. The appropriate distribution can be chosen by combining an understanding of the process being studied with the type of data being collected and the dispersion or shape of that data. Choosing the right distribution helps determine the best analysis to perform.
Distributions are classified the same way data is classified, as continuous or discrete:
Probability mass function (pmf) - For discrete variables, the pmf is the probability that a variate takes the value x.
Probability density function (pdf) - For continuous variables, the pdf f(x) is not itself a probability; the probability that a variate falls between two points is the integral of the pdf between those points.
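In the continuous sense, one cannot assign a nonzero probability to a specific x on a continuum; probability applies only to some specific (and small) range. For additional insight, think of the interval from x to x + Δx, where Δx is small: the probability of landing in that interval is approximately f(x)Δx.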
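A short numerical check of this idea, assuming a standard normal variate (any continuous distribution would do): the exact probability of landing between x and x + Δx, computed from the cdf, is close to f(x)Δx when Δx is small, and it shrinks to zero as Δx does. The last line shows that a density value is not a probability, since it can exceed 1. The helper name `interval_prob` is hypothetical.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal, chosen for illustration

def interval_prob(dist, x, dx):
    """Exact P(x < X <= x + dx), taken from the cdf."""
    return dist.cdf(x + dx) - dist.cdf(x)

x = 0.5
for dx in (0.1, 0.01, 0.001):
    exact = interval_prob(z, x, dx)
    approx = z.pdf(x) * dx  # density at x times the interval width
    print(f"dx={dx}: exact={exact:.6f}, f(x)*dx={approx:.6f}")

# A zero-width interval (a single point) carries zero probability.
print(interval_prob(z, x, 0.0))  # 0.0

# A narrow normal curve has a peak density above 1, so f(x) itself is not a probability.
print(NormalDist(0, 0.1).pdf(0))  # about 3.99
```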
The notation for the pdf is f(x). For discrete distributions:
f(x) = P(X = x)
Some refer to this as the probability mass function, since it evaluates the probability at that one discrete point (mass). For continuous distributions, no single point carries a nonzero probability mass.
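As a small discrete illustration (a hypothetical example, not from the original article), the pmf f(x) = P(X = x) for the total of two fair six-sided dice can be built by counting equally likely outcomes. Python's `fractions.Fraction` keeps the probabilities exact, so the check that they sum to 1 is exact too.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die1, die2) outcomes give each total.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

print(pmf[7])             # 1/6, the most likely total
print(sum(pmf.values()))  # 1, as every pmf must
```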
Cumulative distribution function (cdf) - The probability that a variable takes a value less than or equal to x.
Figure 2: Normal Distribution Cdf

The cdf never decreases and approaches a value of 1, because a probability cannot be greater than 1. Once again, the cdf is F(x) = P(X ≤ x). This holds for both continuous and discrete distributions.
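The two cases can be compared directly. In the minimal Python sketch below (both example distributions are chosen only for illustration), the discrete cdf is a step function built by adding up pmf values, while the cdf of a standard normal rises smoothly. Both never decrease and approach 1.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete case: a fair six-sided die (illustrative), f(x) = 1/6 for x = 1..6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def die_cdf(x):
    """F(x) = P(X <= x): add up the pmf over every face value <= x."""
    return sum((p for face, p in die_pmf.items() if face <= x), Fraction(0))

print([str(die_cdf(x)) for x in (0, 1, 2.5, 6, 10)])  # ['0', '1/6', '1/3', '1', '1']

# Continuous case: the standard normal cdf rises smoothly toward 1.
z = NormalDist()
normal_values = [z.cdf(x) for x in (-3, -1, 0, 1, 3, 6)]
print([round(v, 4) for v in normal_values])
```

Note that F(2.5) equals F(2) for the die: between face values the discrete cdf stays flat.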
A parameter is a description of a population. Consultants rely on parameters to characterize distributions. There are three parameters:
Not all distributions have all three parameters. For example, the normal distribution has just two, the mean and the standard deviation; knowing those two is enough to describe a normal population.
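Because the mean and standard deviation fully describe a normal population, any probability about it can be computed from those two numbers alone, by converting to a standard score z = (x - μ)/σ and reading the standard normal cdf. A minimal sketch; the process mean of 50 and standard deviation of 4 are hypothetical, and `prob_below` is an illustrative name.

```python
from statistics import NormalDist

def prob_below(x, mu, sigma):
    """P(X <= x) for a normal population, using only its two parameters."""
    z = (x - mu) / sigma        # standard score: distance from the mean in sd units
    return NormalDist().cdf(z)  # standard normal cdf

# Hypothetical process: mean 50, standard deviation 4.
print(f"P(X <= 58) = {prob_below(58, 50, 4):.4f}")  # z = 2
print(f"P(X <= 50) = {prob_below(50, 50, 4):.4f}")  # z = 0, so 0.5000
```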
The rest of this article summarizes the shapes, basic assumptions and uses of several common distributions. Keep in mind that each one has its own pdf (or pmf) and its own parameters.
Figure 3: Normal Distribution Shape
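The bell shape in Figure 3 comes from the normal pdf, f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)). The sketch below writes the formula out by hand (the function name `normal_pdf` is hypothetical) so it can be checked against the standard library's `NormalDist.pdf`; it also shows the curve is symmetric about the mean. The parameter values are illustrative.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density written out from the formula."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

mu, sigma = 100.0, 15.0  # illustrative parameters
for x in (70.0, 85.0, 100.0, 115.0, 130.0):
    # Equal distances above and below the mean give equal heights; the peak is at mu.
    print(x, round(normal_pdf(x, mu, sigma), 5))

print(math.isclose(normal_pdf(85.0, mu, sigma), NormalDist(mu, sigma).pdf(85.0)))  # True
```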

Basic assumptions:
Uses include:
Figure 4: Exponential Distribution Shape
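A sketch of the exponential model, assuming a constant failure rate λ (the value 0.002 failures per hour is hypothetical): the pdf is f(t) = λe^(-λt), the probability of surviving past t is e^(-λt), and the mean is 1/λ. The final check illustrates the distribution's memoryless property, a standard result: having already survived s hours does not change the chance of surviving another t hours.

```python
import math

lam = 0.002  # hypothetical constant failure rate, failures per hour

def exp_pdf(t):
    """Exponential density f(t) = lambda * exp(-lambda * t)."""
    return lam * math.exp(-lam * t)

def survival(t):
    """P(T > t) = exp(-lambda * t)."""
    return math.exp(-lam * t)

print(f"mean time to failure = {1 / lam:.0f} hours")  # 500 hours
print(f"P(T > 500) = {survival(500):.4f}")            # exp(-1), about 0.3679

# Memoryless property: P(T > s + t | T > s) equals P(T > t).
s, t = 300.0, 200.0
conditional = survival(s + t) / survival(s)
print(math.isclose(conditional, survival(t)))         # True
```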

Basic assumptions:
Uses include probabilistic assessments of:
Figure 5: Lognormal Distribution Shape
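If ln X follows a normal distribution with mean μ and standard deviation σ, then X is lognormal. The sketch below (parameter values hypothetical) uses that relationship to get the lognormal cdf from the standard library's `NormalDist`, and uses the standard formulas for the median, e^μ, and the mean, e^(μ + σ²/2), to show the positive skew: the mean sits above the median. Note that μ and σ describe ln X, not X itself.

```python
import math
from statistics import NormalDist

mu, sigma = 1.0, 0.5  # hypothetical parameters of ln(X), not of X

def lognormal_cdf(x):
    """P(X <= x) = P(ln X <= ln x); zero for x <= 0 since X is bounded below by zero."""
    if x <= 0:
        return 0.0
    return NormalDist(mu, sigma).cdf(math.log(x))

median = math.exp(mu)
mean = math.exp(mu + sigma ** 2 / 2)
print(f"median = {median:.3f}, mean = {mean:.3f}")  # mean > median: positive skew
print(round(lognormal_cdf(median), 6))              # 0.5
```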

Basic assumptions:
An asymmetrical, positively skewed distribution that is bounded below by zero.
Uses include simulations of:
Figure 6: Weibull Distribution Pdf
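A sketch of the two-parameter Weibull, using the common reliability notation of shape β and scale η (the symbol names are an assumption; texts vary). The pdf is f(t) = (β/η)(t/η)^(β-1) e^(-(t/η)^β). The shape parameter is what makes the Weibull flexible: β < 1 gives a decreasing failure rate, β = 1 reduces to the exponential distribution, and β > 1 gives an increasing failure rate. The numeric values are illustrative.

```python
import math

def weibull_pdf(t, beta, eta):
    """Two-parameter Weibull density: shape beta, scale eta, t >= 0."""
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(-((t / eta) ** beta))

def weibull_hazard(t, beta, eta):
    """Instantaneous failure rate h(t) = f(t) / R(t) = (beta/eta) * (t/eta)**(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)

eta = 1000.0  # illustrative characteristic life, in hours
for beta in (0.5, 1.0, 3.0):
    rates = [weibull_hazard(t, beta, eta) for t in (100.0, 500.0, 900.0)]
    if rates[0] > rates[-1]:
        trend = "decreasing"
    elif rates[0] < rates[-1]:
        trend = "increasing"
    else:
        trend = "constant"
    print(f"beta={beta}: failure rate is {trend}")

# With beta = 1 the Weibull density equals the exponential density with rate 1/eta.
print(math.isclose(weibull_pdf(250.0, 1.0, eta), (1 / eta) * math.exp(-250.0 / eta)))  # True
```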

Basic assumptions:
Uses include:
Figure 7: Binomial Distribution Shape
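For the binomial, the pmf is P(X = k) = C(n, k) p^k (1 - p)^(n - k): the probability of exactly k "successes" (for example, defective units) in n independent trials that each have the same probability p. A minimal sketch using Python's `math.comb`; the sample size n = 20 and defect rate p = 0.05 are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 20, 0.05  # hypothetical: 20 units sampled, 5% defect rate
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))

print(f"P(no defectives) = {pmf[0]:.4f}")
print(f"P(2 or fewer)    = {sum(pmf[:3]):.4f}")
print(f"mean = {mean:.2f} (n*p = {n * p:.2f})")
```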

Basic assumptions:
Uses include:
Figure 8: Geometric Distribution Pdf
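A sketch of the geometric pmf under one common convention (an assumption, since some texts count failures instead of trials): X is the number of the trial on which the first success occurs, so P(X = k) = (1 - p)^(k - 1) p for k = 1, 2, ..., with mean 1/p. The value p = 0.2 is hypothetical.

```python
def geometric_pmf(k, p):
    """P(first success on trial k), k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

p = 0.2  # hypothetical per-trial success probability
at_most_5 = sum(geometric_pmf(k, p) for k in range(1, 6))
print(f"P(X = 1) = {geometric_pmf(1, p):.3f}")  # 0.200
print(f"P(X <= 5) = {at_most_5:.4f}")

# The mean is 1/p; a long truncated sum gets very close to it.
approx_mean = sum(k * geometric_pmf(k, p) for k in range(1, 500))
print(f"mean ~ {approx_mean:.4f} (1/p = {1 / p:.4f})")
```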

Basic assumptions:
Uses include:
Figure 9: Negative Binomial Distribution Pdf
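The negative binomial generalizes the geometric. Under the same trial-counting convention (again an assumption), X is the trial on which the r-th success occurs, so P(X = k) = C(k - 1, r - 1) p^r (1 - p)^(k - r) for k = r, r + 1, ..., with mean r/p. Setting r = 1 recovers the geometric pmf. The values are illustrative.

```python
from math import comb, isclose

def neg_binomial_pmf(k, r, p):
    """P(the r-th success occurs on trial k), for k >= r."""
    return comb(k - 1, r - 1) * p ** r * (1 - p) ** (k - r)

r, p = 3, 0.2  # hypothetical: wait for the 3rd success, p = 0.2 per trial
approx_mean = sum(k * neg_binomial_pmf(k, r, p) for k in range(r, 1000))
print(f"mean ~ {approx_mean:.4f} (r/p = {r / p:.4f})")

# With r = 1 the pmf reduces to the geometric pmf (1 - p)**(k - 1) * p.
print(all(isclose(neg_binomial_pmf(k, 1, p), (1 - p) ** (k - 1) * p) for k in range(1, 20)))  # True
```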

Basic assumptions:
Uses include:
Figure 10: Poisson Distribution Pdf
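The Poisson pmf gives the probability of k events in a fixed interval (of time, area, units and so on) when events occur independently at an average rate λ: P(X = k) = e^(-λ) λ^k / k!. A distinctive property is that the mean and the variance both equal λ. A minimal sketch; λ = 2 defects per unit is a hypothetical value.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 2.0  # hypothetical: average of 2 defects per unit
pmf = [poisson_pmf(k, lam) for k in range(50)]  # terms beyond k = 49 are negligible here

mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
print(f"P(0 defects) = {pmf[0]:.4f}")  # exp(-2), about 0.1353
print(f"mean = {mean:.4f}, variance = {variance:.4f}")  # both about 2.0
```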

Basic assumptions:
Uses include:
The shape is similar to that of the binomial and Poisson distributions.
Basic assumptions:
There are other distributions, for example, sampling distributions and the χ² (chi-square), t and F distributions.
A distribution describes the behavior of a process by plotting how often a variable takes a specific value or range of values, rather than by plotting the values themselves. It is often said that a picture is worth a thousand words, and viewing data graphically has a much greater impact on an audience. Becoming familiar with the various distributions can help consultants better interpret their data.
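The idea of plotting how often values occur, rather than the values themselves, can be sketched in a few lines of Python: tally each value with `collections.Counter` and print a simple text histogram. The measurements below are hypothetical.

```python
from collections import Counter

# Hypothetical measurements, e.g., number of defects found on each of 20 units.
data = [0, 1, 1, 2, 0, 3, 1, 2, 2, 1, 0, 1, 4, 2, 1, 0, 2, 1, 3, 1]

freq = Counter(data)
for value in sorted(freq):
    # One '*' per occurrence: the bar length shows how often the value appears.
    print(f"{value}: {'*' * freq[value]}")
```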
© Copyright iSixSigma 2000-2013. Any reproduction or other use of content without the express written consent of iSixSigma is prohibited.