Tips for Recognizing and Transforming Non-normal Data

    By Peter J. Sherman

    Six Sigma professionals should be familiar with normally distributed processes: the characteristic bell-shaped curve that is symmetrical about the mean, with tails approaching plus and minus infinity (Figure 1).

    Figure 1: Normally Distributed Data 

    When data fits a normal distribution, practitioners can make statements about the population using common analytical techniques, including control charts and capability indices (such as sigma level, Cp, Cpk, defects per million opportunities and so on).

    But what happens when a business process is not normally distributed? How do practitioners know the data is not normal? How should this type of data be treated? Practitioners can benefit from an overview of normal and non-normal distributions, as well as familiarizing themselves with some simple tools to detect non-normality and techniques to accurately determine whether a process is in control and capable.

    Spotting Non-normal Data

    There are some common ways to identify non-normal data:

    1. The histogram does not look bell shaped. Instead, it is skewed positively or negatively (Figure 2).

    Figure 2: Positively and Negatively Skewed Data

    2. A natural process limit exists. Zero is often the natural process limit when describing cycle times and lead times. For example, when a restaurant promises to deliver a pizza in 30 minutes or less, zero minutes is the natural lower limit.
    3. A time series plot shows large shifts in data.
    4. There is known seasonal process data.
    5. Process data fluctuates (e.g., product mix changes).
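    The first two signs in the list above can be checked numerically as well as visually. The following is a minimal sketch, assuming a small set of hypothetical cycle times (the values and variable names are illustrative and do not come from the article). It computes the adjusted sample skewness and compares the mean with the median; a clearly positive skewness and a mean above the median both point to a right-skewed distribution.

```python
import statistics


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (0 for perfectly symmetric data)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    cubed = sum(((v - mean) / sd) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * cubed


# Hypothetical cycle times in hours (illustrative only, not the article's data)
times = [0.5, 0.8, 1.0, 1.1, 1.3, 1.5, 1.8, 2.2, 3.0, 4.5, 7.0]

skew = sample_skewness(times)
mean_above_median = statistics.fmean(times) > statistics.median(times)
bounded_at_zero = min(times) >= 0  # a natural lower limit of zero

print(f"skewness = {skew:.2f}")
print(f"mean above median: {mean_above_median}")
print(f"all values at or above zero: {bounded_at_zero}")
```

    A skewness statistic is only a screening aid; the histogram and a probability plot remain the primary checks.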

    Transactional processes and most metrics that involve time measurements tend to follow non-normal distributions. Some examples:

    • Mean time to repair HVAC equipment
    • Admissions cycle time for college applicants
    • Days sales outstanding
    • Waiting times at a bank or physician's office
    • Time being treated in a hospital emergency room

    Example: Time in a Hospital Emergency Room

    A sample hospital's target time for processing, diagnosing and treating patients entering the ER is four hours or less. Historical data is shown in Figure 3.

    Figure 3: Time Spent in ER

    An individuals chart shows several data points above the upper control limit (Figure 4). Based on control chart rules, these points signal special causes, which would indicate the process is not in control (i.e., not stable or predictable). But is this the correct conclusion?

    Figure 4: Individuals Chart of Time Spent in ER

    There are three signs that the data may not be normal. First, the histogram is skewed to the right (positively). Second, the control chart shows the lower control limit is less than the natural limit of zero. Third, there are a number of high points and no real low points. These tell-tale signs indicate the data may not be normally distributed enough for an individuals control chart. When control charts are used with non-normal data, they can give false special-cause signals. Therefore, the data should either be transformed to follow the normal distribution or be modeled with a distribution that fits it. Once the data is transformed, standard control chart calculations can be used on the transformed data.
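    The second sign, a lower control limit below the natural limit of zero, follows directly from the standard individuals chart arithmetic. The sketch below uses the usual formula (center line at the mean, limits at the mean plus or minus 2.66 times the average moving range) on hypothetical ER times in hours; the data values are assumptions made for illustration, not the hospital's data.

```python
import statistics


def individuals_limits(values):
    """Return (LCL, center line, UCL) for an individuals (X) chart.

    Limits are mean +/- 2.66 * average moving range, where 2.66 = 3 / 1.128.
    """
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    center = statistics.fmean(values)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


# Hypothetical right-skewed ER times in hours (illustrative only)
er_hours = [1.2, 0.6, 2.9, 0.8, 1.5, 6.4, 0.9, 2.1, 0.5, 3.8, 1.1, 7.2]

lcl, center, ucl = individuals_limits(er_hours)
print(f"LCL = {lcl:.2f}, center = {center:.2f}, UCL = {ucl:.2f}")
if lcl < 0:
    print("LCL is below zero, which is impossible for a time measurement")
```

    Because the formula treats the data as symmetric about the mean, skewed data with a floor at zero tends to produce a negative lower limit.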

    A Closer Look at Non-normal Data

    There are two types of non-normal data:

    • Type A: Data that follows another distribution
    • Type B: Data that contains a mixture of multiple distributions or processes

    Type A data - One way to properly analyze the data is to identify the distribution it follows (e.g., lognormal, Weibull, exponential and so on). Some common distributions, data types and examples associated with these distributions are in Table 1.

    Table 1: Distribution Types
    Distribution | Data Type  | Examples
    Normal       | Continuous | Useful when it is equally likely the readings will fall above or below the average
    Lognormal    | Continuous | Cycle or lead time data
    Weibull      | Continuous | Mean time-to-failure data, time to repair and material strength
    Exponential  | Continuous | Constant failure rate conditions of products
    Poisson      | Discrete   | Number of events in a specific time period (defect counts per interval such as arrivals, failures or defects)
    Binomial     | Discrete   | Proportion or number of defectives

    A second way is to transform the data so that it follows the normal distribution. A common transformation technique is the Box-Cox. The Box-Cox is a power transformation because the data is transformed by raising the original measurements to a power lambda (λ). Some common lambda values, the transformation equation and resulting transformed value assuming Y = 4 are in Table 2.

    Table 2: Lambda Values and Their Transformation Equations and Values
    Lambda (λ) | Transformation Equation | Transformed Value (Y = 4)
    -2.0       | 1/Y²                    | 1/4² = 0.0625
    -1.0       | 1/Y                     | 1/4 = 0.25
    -0.5       | 1/√Y                    | 1/√4 = 0.5
    0.0        | ln(Y) (natural log)     | ln(4) = 1.3863
    0.5        | √Y                      | √4 = 2
    1.0        | Y                       | 4
    2.0        | Y²                      | 4² = 16

    Note on λ = 0: The natural log (ln) is the logarithm having base e, where e is the constant equal to approximately 2.71828. The natural log of any positive number, n, is the exponent, x, to which e must be raised so that e^x = n. For example, 2.71828^x = 4 when x = 1.3863, so the natural log of 4 is 1.3863.
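    The values in Table 2 can be reproduced with a few lines of code. This is a minimal sketch of the simple power form shown in the table (Y raised to lambda, with the natural log used when lambda is zero); the function name is an assumption chosen for illustration.

```python
import math


def power_transform(y, lam):
    """Simple power form from Table 2: y ** lam, or ln(y) when lam is 0."""
    if y <= 0:
        raise ValueError("the transformation requires positive data")
    return math.log(y) if lam == 0 else y ** lam


# Reproduce the Table 2 column for Y = 4
for lam in (-2, -1, -0.5, 0, 0.5, 1, 2):
    print(f"lambda = {lam:>4}: {power_transform(4, lam):.4f}")
```

    The check for positive values matters in practice: zero or negative measurements must be shifted or handled separately before a power transformation is applied.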

    Type B data - If none of the distributions or transformations fit, the non-normal data may be "pollution" caused by a mixture of multiple distributions or processes. Examples of this type of pollution include complex work activities; multiple shifts, locations, or customers; and seasonality. Practitioners can try stratifying or breaking down the data into categories to make sense of it. For example, the cycle time required for attorneys to complete contract documents is generally not normally distributed. Nor does it have a lognormal distribution. Stratifying the data can reveal that some contract documents, such as residential real estate closings, are much simpler to research, draft and execute than more complex contract documents. Hence, the complex contracts account for the longer times, while the simpler contracts have shorter times. Another approach is to convert all the process data into a common denominator, such as contract draft time per page. Afterward, all the data can be recombined and tested for a single distribution.
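    Both approaches to Type B data, stratifying by category and converting to a common denominator, are simple data manipulations. The sketch below assumes hypothetical contract records of the form (contract type, pages, drafting hours); the records and field layout are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records: (contract type, pages, drafting hours)
records = [
    ("residential closing", 10, 2.0),
    ("residential closing", 12, 2.4),
    ("commercial lease", 40, 10.0),
    ("commercial lease", 60, 13.8),
]


def stratify(rows):
    """Group drafting hours by contract type so each group can be analyzed alone."""
    groups = defaultdict(list)
    for kind, _pages, hours in rows:
        groups[kind].append(hours)
    return dict(groups)


def hours_per_page(rows):
    """Convert every record to a common denominator: drafting hours per page."""
    return [hours / pages for _kind, pages, hours in rows]


print(stratify(records))
print([round(v, 3) for v in hours_per_page(records)])
```

    Each stratum, or the recombined per-page values, can then be tested against a single candidate distribution.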

    Revisiting the Hospital Example

    Because the hospital ER data is non-normal, it can be transformed using the Box-Cox technique in statistical analysis software. The optimum lambda value of 0.5, which corresponds to a square-root transformation, minimizes the standard deviation (Figure 5).

    Figure 5: Box-Cox Plot of Time Spent in ER
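    The article does not say how the software arrived at the optimum lambda. One common approach, sketched here as an assumption rather than as the method behind Figure 5, is to evaluate a standardized Box-Cox transformation over a grid of lambda values and keep the one with the smallest standard deviation. The standardization, which divides by a power of the geometric mean, makes standard deviations comparable across different lambdas. The data values are hypothetical.

```python
import math
import statistics


def standardized_box_cox(values, lam):
    """Standardized Box-Cox transform; comparable across lambda values."""
    logs = [math.log(v) for v in values]
    g = math.exp(statistics.fmean(logs))  # geometric mean of the data
    if abs(lam) < 1e-12:
        return [g * x for x in logs]
    return [(v ** lam - 1) / (lam * g ** (lam - 1)) for v in values]


def best_lambda(values, candidates):
    """Pick the candidate lambda whose standardized transform has the smallest spread."""
    return min(
        candidates,
        key=lambda lam: statistics.stdev(standardized_box_cox(values, lam)),
    )


# Candidate grid from -2.0 to 2.0 in steps of 0.1
candidates = [x / 10 for x in range(-20, 21)]

# Hypothetical right-skewed ER times in hours (illustrative only)
er_hours = [1.2, 0.6, 2.9, 0.8, 1.5, 6.4, 0.9, 2.1, 0.5, 3.8, 1.1, 7.2]
print("selected lambda:", best_lambda(er_hours, candidates))
```

    In practice the selected value is usually rounded to a convenient lambda such as 0, 0.5 or 1, provided the rounded value falls within the confidence interval reported by the software.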

    Notice that the histogram of the transformed data (Figure 6) is much closer to normal (bell-shaped, symmetrical) than the histogram in Figure 3.

    Figure 6: ER Time Data after Transformation

    An alternative to transforming the data is to find a non-normal distribution that does fit the data. Figure 7 shows probability plots for the ER waiting time using the normal, lognormal, exponential and Weibull distributions.

    Figure 7: Various Distributions of Time in ER Data

    Statistical software scaled the x- and y-axes of each probability plot so the data points would follow the blue, perfect-model line if that distribution were a good fit for the data. Looking at the various distributions, the exponential distribution appears to be a poor model for hospital ER times. In contrast, data points in the lognormal and Weibull probability plots follow the model line well. But which one is the better distribution?

    The Anderson-Darling test can be used as an indicator of goodness-of-fit. It produces a p-value, which is a probability that is compared to the decision criterion, the alpha (α) risk. Assume α = 0.05, meaning there is a 5 percent risk of rejecting the null hypothesis when it is true. For a test of normality, the hypotheses are:

    Null (H0) = The data is normally distributed

    Alternate (H1) = The data is not normally distributed

    If the p-value is equal to or less than alpha, there is evidence that the data does not follow a normal distribution. Conversely, a p-value greater than alpha means there is not enough evidence to reject normality, so the normal distribution is treated as a reasonable model. The same test applies to other candidate distributions, such as the lognormal or Weibull, by substituting that distribution in the null hypothesis.
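    To make the test more concrete, the sketch below computes the Anderson-Darling statistic for normality with only the Python standard library. Instead of a p-value, it compares the small-sample adjusted statistic with 0.752, the commonly tabulated critical value for α = 0.05 when the mean and standard deviation are estimated from the data. Statistical packages report a p-value instead; the function name and data are assumptions made for illustration.

```python
import math
import statistics

CRITICAL_VALUE_5_PERCENT = 0.752  # tabulated value for alpha = 0.05, parameters estimated


def anderson_darling_normal(values):
    """Adjusted Anderson-Darling statistic for a test of normality."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    std_normal = statistics.NormalDist()
    # Normal CDF of the sorted, standardized observations
    cdf = [std_normal.cdf((v - mean) / sd) for v in sorted(values)]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a_squared = -n - total / n
    # Small-sample adjustment used with the tabulated critical values
    return a_squared * (1 + 0.75 / n + 2.25 / n ** 2)


# Hypothetical right-skewed ER times in hours (illustrative only)
er_hours = [1.2, 0.6, 2.9, 0.8, 1.5, 6.4, 0.9, 2.1, 0.5, 3.8, 1.1, 7.2]
statistic = anderson_darling_normal(er_hours)
print(f"adjusted A-squared = {statistic:.3f}")
print("reject normality at alpha = 0.05:", statistic > CRITICAL_VALUE_5_PERCENT)
```

    A statistic above the critical value corresponds to a p-value below alpha, and therefore to rejecting the null hypothesis of normality.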

    The p-value for the lognormal distribution is 0.058 while the p-value for the Weibull distribution is 0.162. Both are above the 0.05 alpha risk, so neither fit is rejected. The Weibull distribution is the better choice because its larger p-value indicates weaker evidence against the hypothesis that the data follows that distribution.

    Now the Weibull distribution can be used to construct the proper individuals control chart (Figure 8). Notice all of the data points are within the control limits; hence, the process is stable and predictable.

    Figure 8: Individuals Control Chart Using Weibull Distribution
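    The article does not describe how the software derives control limits from the Weibull distribution. One widely used approach, shown here as an assumption rather than as the method behind Figure 8, is to place the limits at the 0.135th and 99.865th percentiles of the fitted distribution, which are the tail probabilities that correspond to plus and minus three standard deviations under a normal distribution. The shape and scale values are hypothetical, not fitted to the hospital's data.

```python
import math


def weibull_percentile(p, shape, scale):
    """Inverse CDF of a two-parameter Weibull distribution."""
    return scale * (-math.log(1 - p)) ** (1 / shape)


# Hypothetical fitted parameters (illustrative only)
shape, scale = 1.5, 2.0

lcl = weibull_percentile(0.00135, shape, scale)
center = weibull_percentile(0.5, shape, scale)  # median as the center line
ucl = weibull_percentile(0.99865, shape, scale)

print(f"LCL = {lcl:.3f}, center = {center:.3f}, UCL = {ucl:.3f}")
```

    Because the Weibull distribution is defined only for positive values, the lower limit computed this way cannot fall below zero, and the upper limit sits farther from the center line to allow for the long right tail.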

    Now that the process is in control, its capability can be assessed using indices such as Cpk (Figure 9). Overall, this is a predictable process with 8.85 percent of ER visit times out of specification.

    Figure 9: Process Capability of Time in ER 

    A similar assessment can be made with a probability plot, which shows this is a predictable process and that 91 percent of the ER waiting times are within four hours. Put another way, only 9 percent of the patients will take longer than the four-hour target to be processed, diagnosed and treated in the hospital ER. This is an explanation that management can readily understand.
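    The percentage of visits beyond the target comes from the upper tail of the fitted distribution. For a two-parameter Weibull distribution, the fraction above a target t is exp(-(t / scale)^shape). The sketch below applies that formula with hypothetical shape and scale values; they are assumptions for illustration and are not intended to reproduce the article's 8.85 percent figure.

```python
import math


def weibull_fraction_over(target, shape, scale):
    """Fraction of a Weibull-distributed process that exceeds the target."""
    return math.exp(-((target / scale) ** shape))


# Hypothetical fitted parameters and the four-hour target from the example
shape, scale, target_hours = 1.5, 2.0, 4.0

over = weibull_fraction_over(target_hours, shape, scale)
print(f"within target: {1 - over:.1%}")
print(f"over target:   {over:.1%}")
```

    Reporting the result as a percentage of patients over the target keeps the conclusion in terms management can act on.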

    Figure 10: Probability Plot of Time Spent in ER

    Better Knowledge, Better Decisions

    Non-normal data may be more common in business processes than many people think. When control charts are used with non-normal data, they can give false signals of special cause variation, leading to inaccurate conclusions and inappropriate business strategies. Given this reality, it is important to be able to identify the characteristics of non-normal data and know how to properly transform the data. In doing so, practitioners will make better decisions about their business and save time and resources in the process.

    About the Author: Peter J. Sherman is a certified Lean Six Sigma Master Black Belt and an ASQ-certified Quality Engineer with 22 years of experience, including serving as senior Black Belt for AT&T's Product Development Group. He has a master's degree in engineering from the Massachusetts Institute of Technology (MIT) and an MBA from Georgia State University. As a visiting scholar to Japan while at MIT, he worked with quality expert W. Edwards Deming. Sherman is the lead instructor at Emory University's Six Sigma Certificate Program in Atlanta, and is a member of the American Society for Quality and the International Society of Six Sigma Professionals. He can be reached at psherm1@bellsouth.net.

     
    Rate This Article:  Current Rating: 4.63
      Poor    Excellent     
              1    2    3     4    5
    Copyright © 2000-2009 iSixSigma. All Rights Reserved.
    Reproduction Without Permission Is Strictly Prohibited.