Interpreting Anomalies Correctly Can Help Avoid Waste

By Peter Sherman

People say that Six Sigma is sometimes like using a rocket ship engine in an automobile. The techniques and statistical software tools are so powerful they can lead to anomalies in the data or produce “bad” results. These include:

  • Histograms that do not appear normal
  • Scatter plot diagrams that do not fit a straight line
  • Control charts that appear to be in control, but are not
  • Control charts reflecting a process not in control, even though it is operating within limits
  • Control charts with wide and highly irregular control limits

Without the benefit of extensive project experience and strong statistical interpretive skills, a Green Belt or even Black Belt may improperly diagnose these results, leading to poor decisions. Poor decisions waste time, resources and capital – specifically through re-sampling data, re-analyzing data and improving processes not in control.

Fortunately, there are some reliable, quick and cost-effective techniques that Six Sigma professionals can use in these situations to help them more properly analyze the data to make better decisions.

Histogram Anomalies

A practitioner has collected some data that represents average call-handling time for a help desk (Table 1).

Table 1: Samples of Average Call-handling Times for Help Desk
Minute Measurements by Operator
Sample   Operator 1   Operator 2   Operator 3   Operator 4
1        9            10           10           5
2        4            7            7            6
3        5            11           9            7
4        6            9            12           12
5        7            7            8            4
6        7            4            5            11
7        6            4            4            5
8        5            8            9            4
9        2            12           7            8
10       7            10           9            6
11       8            8            8            4
12       11           8            9            7

Using a statistical software package, the practitioner produced the following histogram (Figure 1).

Figure 1: Average Call-handling Time for Help Desk

At first glance, the practitioner might conclude the data is not normally distributed, because the histogram lacks the classic bell-shaped, symmetrical curve. They may decide to collect more data or even perform another round of sampling. But before going to such extremes, practitioners should try calculating the measures of central tendency, which describe how the data clusters around particular values. There are three measures of central tendency:

  1. Mean – the sum of all the data values divided by the number of data values
  2. Median – the middle value when the data values are arranged in ascending or descending order
  3. Mode – the number that occurs most frequently in a data set

If these three values are equal or approximately the same, the distribution is roughly symmetric, which is consistent with normally distributed data. A quick calculation of the mean, median and mode in this case shows they are 7.3, 7.0 and 7.0, respectively. Thus, the data can reasonably be treated as normally distributed.
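The quick check described above can be sketched in a few lines of Python. This is a minimal illustration rather than the author's own procedure: the values are the 48 readings as read from Table 1, and the variable names are chosen here for illustration.

```python
from statistics import mean, median, multimode

# Call-handling times in minutes, as read from Table 1 (one list per operator).
times_by_operator = {
    "Operator 1": [9, 4, 5, 6, 7, 7, 6, 5, 2, 7, 8, 11],
    "Operator 2": [10, 7, 11, 9, 7, 4, 4, 8, 12, 10, 8, 8],
    "Operator 3": [10, 7, 9, 12, 8, 5, 4, 9, 7, 9, 8, 9],
    "Operator 4": [5, 6, 7, 12, 4, 11, 5, 4, 8, 6, 4, 7],
}

# Pool all operators into one data set, as the histogram does.
all_times = [t for column in times_by_operator.values() for t in column]

summary = {
    "mean": round(mean(all_times), 1),
    "median": median(all_times),
    "modes": multimode(all_times),  # a list, since a data set can have ties
}
print(summary)
```

If the three measures sit close together, the data is at least roughly symmetric; a formal normality test would be the next step if more certainty were needed.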

Another simple but effective technique is to increase or decrease the number of classes in the histogram. In this example, reducing the number of classes from 6 to 5 produces the following normal-looking histogram (Figure 2).

Figure 2: Normal Histogram
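To see how the class count changes the shape, the sketch below bins the Table 1 readings into equal-width classes and prints a text histogram for 6 and then 5 classes. The helper `bin_counts` is hypothetical, and its binning convention (equal widths from the minimum to the maximum, with the maximum placed in the last class) is an assumption; statistical packages may choose class boundaries differently, so the counts need not match Figures 1 and 2 exactly.

```python
# The 48 call-handling times (minutes) as read from Table 1, row by row.
all_times = [
    9, 10, 10, 5,   4, 7, 7, 6,    5, 11, 9, 7,   6, 9, 12, 12,
    7, 7, 8, 4,     7, 4, 5, 11,   6, 4, 4, 5,    5, 8, 9, 4,
    2, 12, 7, 8,    7, 10, 9, 6,   8, 8, 8, 4,    11, 8, 9, 7,
]


def bin_counts(values, num_classes):
    """Count values in equal-width classes spanning min(values)..max(values)."""
    low, high = min(values), max(values)
    width = (high - low) / num_classes
    counts = [0] * num_classes
    for v in values:
        # The maximum value goes into the last class instead of opening a new one.
        index = min(int((v - low) / width), num_classes - 1)
        counts[index] += 1
    return counts


for k in (6, 5):
    print(f"{k} classes:")
    for count in bin_counts(all_times, k):
        print("  " + "#" * count)
```

Comparing the two printouts side by side shows how much of a histogram's apparent irregularity can come from the choice of class boundaries rather than from the data.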

Sometimes practitioners will produce histograms that are skewed in either the positive or negative direction. Before jumping to conclusions and re-sampling the data, think about the type of process being measured. A positively skewed histogram could represent accounts receivables days outstanding or late deliveries, in which case a practitioner should expect the distribution to be one-sided with a tail to the right. Conversely, a negatively skewed histogram could reflect accounts payable days outstanding, where a one-sided distribution with a tail to the left could be expected.

Scatter Plot Anomalies

Simple linear regression is a useful statistical tool for determining whether there is a relationship between two variables. A scatter plot shows this relationship graphically, with a straight line fitted through the data points. The equation for that line is Y = mx + b, where Y is the dependent variable, m is the slope, b is the intercept and x is the independent variable.

If there is linear correlation (i.e., the points fall close to a straight line), it is possible to make predictions about Y given x. Correlation (r) ranges from +1 (perfect direct, or positive, relationship) to -1 (perfect inverse, or negative, relationship). Further, R² (the coefficient of determination) is the proportion of variation in Y that can be explained by the regression equation. A higher R² value means that x is a better predictor of Y, and a correlation close to +1 or -1 indicates a strong relationship. Taking the square root of R² produces the magnitude of r; the sign of r matches the sign of the slope.
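The quantities above can be computed directly from their least-squares definitions. The following is a minimal sketch, not tied to any particular statistics package; `linear_fit` is a hypothetical helper and the data points are invented for illustration.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Least-squares fit of Y = m*x + b; returns (m, b, r, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx                # slope
    b = mean_y - m * mean_x      # intercept
    r = sxy / sqrt(sxx * syy)    # correlation, carries the sign of the slope
    return m, b, r, r * r


# Invented example data: y rises by roughly 2 for each unit of x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, b, r, r2 = linear_fit(xs, ys)
print(f"Y = {m:.2f}x + {b:.2f}, r = {r:.3f}, R^2 = {r2:.3f}")
```

Because R² is the square of r, a falling line and a rising line with the same scatter have the same R²; only r tells them apart.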

Sometimes, however, the relationship between two variables may be represented by a curve instead of a straight line. Seeing the data is not linear, a practitioner may attempt to calculate more complex regression models such as polynomial, power or exponential functions. They might even conclude there is no strong relationship between the two variables and, therefore, decide to select another input variable. Before going down these paths, practitioners should consider transforming the data so that a straight line fits it better. A simple yet effective approach is to transform the x values (for example, by squaring them), refit the line and recalculate R², repeating with different transformations until R² reaches a maximum.
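The transform-and-compare idea can be sketched as follows. The data is invented to follow a curve, and the set of candidate transformations (x, x squared, square root of x) is an assumption made here for illustration; in practice the candidates would be chosen to suit the process being studied.

```python
from math import sqrt


def r_squared(xs, ys):
    """Coefficient of determination for a straight-line fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


# Invented data that curves upward (roughly y = 0.5 * x**2).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0.6, 2.1, 4.4, 8.2, 12.4, 18.1, 24.3, 32.2]

# Candidate transformations of x; each is refit with a straight line.
transforms = {
    "x": lambda x: x,
    "x squared": lambda x: x ** 2,
    "square root of x": sqrt,
}
scores = {name: r_squared([f(x) for x in xs], ys) for name, f in transforms.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name:>18}: R^2 = {score:.4f}")
print("best straight-line fit uses:", best)
```

Once a transformation is chosen, predictions are made by applying the same transformation to any new x value before plugging it into the fitted line.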

Control Chart Anomalies

Control charts are powerful techniques to determine if a process is in control or out of control. Six Sigma professionals are all familiar with the general rules of control charts:

  • A process in control typically contains all of its data points within the upper and lower control limits. It is stable and predictable.
  • If one or more data points lie on or outside the control limits, the process is out of control (unstable, unpredictable).

Consider the control chart shown in Figure 3.

Figure 3: Sample Control Chart

None of the data points are touching or exceeding the control limits. The process appears to be in control, in which case the Six Sigma practitioner should continue to gradually improve the process. But think again. Even though the points are within the control limits, this control chart shows a trend (six or more successive points in ascending or descending direction) that is evidence of a special cause. Consequently, the practitioner should stop, identify and eliminate any special causes before improving the process. Other examples of control charts exhibiting distinct patterns within the control limits are:

  • Cycle – Variation caused by regular changes in the process inputs or methods (e.g., time of day, seasonal)
  • Repeats – A pattern where every nth item is different (e.g., one station out of alignment)
  • Jumps – Distinct changes from low to high values attributed to a change, such as an operator shift or different material
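The trend rule mentioned above (six or more successive points moving in one direction) is easy to check mechanically. The sketch below is one possible implementation under that rule; the function names and the sample readings are hypothetical, and other run rules or thresholds are used in practice.

```python
def longest_monotonic_run(points):
    """Length of the longest run of consecutive points that strictly rise or strictly fall."""
    if not points:
        return 0
    longest = current = 1
    direction = 0  # +1 rising, -1 falling, 0 no direction yet
    for prev, cur in zip(points, points[1:]):
        step = (cur > prev) - (cur < prev)
        if step != 0 and step == direction:
            current += 1          # the run continues in the same direction
        elif step != 0:
            direction, current = step, 2  # a new run starts with these two points
        else:
            direction, current = 0, 1     # equal values break the run
        longest = max(longest, current)
    return longest


def has_trend(points, min_run=6):
    """True if any run of min_run or more successive points rises or falls."""
    return longest_monotonic_run(points) >= min_run


# Invented readings that stay inside hypothetical control limits but drift upward.
readings = [50, 52, 51, 49, 50, 51, 53, 54, 56, 57, 55]
print(longest_monotonic_run(readings), has_trend(readings))
```

A check like this complements the control limits: a chart can pass the limit test and still fail the trend test.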

Here is another call center example: A practitioner is trying to assess how well calls are resolved during the first call, and they collect the data in Table 2. In this scenario, p represents the proportion of calls not resolved.

Table 2: Call Center Data Related to First-call Resolutions
Week   Help Desk Calls   Calls Not Resolved   p        p-bar
1      37,374            4,204                0.1125   0.143744
2      32,612            3,371                0.1034   0.143744
3      38,972            4,080                0.1047   0.143744
4      35,045            3,680                0.1050   0.143744
5      35,411            3,924                0.1108   0.143744
6      50,938            5,449                0.1070   0.143744
7      52,970            5,761                0.1088   0.143744
8      53,408            6,245                0.1169   0.143744
9      38,581            4,180                0.1083   0.143744
10     36,863            5,207                0.1413   0.143744
11     30,810            6,426                0.2086   0.143744
12     28,677            4,331                0.1510   0.143744
13     26,214            5,839                0.2227   0.143744
14     23,776            4,492                0.1889   0.143744
15     24,335            4,641                0.1907   0.143744
16     24,311            4,606                0.1895   0.143744
17     26,863            4,871                0.1813   0.143744
18     32,052            5,764                0.1798   0.143744
19     31,026            5,431                0.1750   0.143744
20     37,191            6,750                0.1815   0.143744
21     30,109            5,327                0.1769   0.143744

From this data, a p-chart is produced (Figure 4).

Figure 4: Customer Calls Not Resolved

The practitioner’s first reaction is that the entire process is out of control. Virtually every data point is either below or above the control limits – this should raise red flags. They might want to stop to identify and resolve all the special causes before proceeding to improve the process. Or, the practitioner may think there are errors in the data and decide to repeat the data collection. But before doing so, it is important to look again at the data to see if there are any shifts occurring. In this example, there are two distinct sets of p values: Weeks 1-9 and Weeks 10-21 (Table 3). Accordingly, two separate p-bar values are needed: 0.10896 for Weeks 1-9 and 0.18236 for Weeks 10-21.

Table 3: Customer Calls Not Resolved, P-bar Divided into Weeks 1-9 and 10-21
Week   Help Desk Calls   Calls Not Resolved   p        p-bar
1      37,374            4,204                0.1125   0.10896
2      32,612            3,371                0.1034   0.10896
3      38,972            4,080                0.1047   0.10896
4      35,045            3,680                0.1050   0.10896
5      35,411            3,924                0.1108   0.10896
6      50,938            5,449                0.1070   0.10896
7      52,970            5,761                0.1088   0.10896
8      53,408            6,245                0.1169   0.10896
9      38,581            4,180                0.1083   0.10896
10     36,863            5,207                0.1413   0.18236
11     30,810            6,426                0.2086   0.18236
12     28,677            4,331                0.1510   0.18236
13     26,214            5,839                0.2227   0.18236
14     23,776            4,492                0.1889   0.18236
15     24,335            4,641                0.1907   0.18236
16     24,311            4,606                0.1895   0.18236
17     26,863            4,871                0.1813   0.18236
18     32,052            5,764                0.1798   0.18236
19     31,026            5,431                0.1750   0.18236
20     37,191            6,750                0.1815   0.18236
21     30,109            5,327                0.1769   0.18236

The resulting p-chart (Figure 5) indicates the process is in control during Weeks 1-9. During Week 10, however, there is a process shift that significantly increases the p values. During Weeks 10-21, the average proportion of calls not resolved hovers around 18 percent. Thus, efforts to identify special causes should be focused only on what happened during Week 10 and beyond, in order to save considerable time and resources.

Figure 5: Customer Calls Not Resolved, Highlighting Process Shift
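Splitting the center line at a suspected shift can be sketched as below. This assumes the standard three-sigma p-chart limits, p-bar ± 3·sqrt(p-bar·(1 − p-bar)/n), and a pooled p-bar (total calls not resolved divided by total calls) for each segment; the function names are hypothetical. Computed this way, the overall and Weeks 1-9 values agree with the figures quoted above, while the Weeks 10-21 value comes out near 0.181, so the figure in the text may have been derived slightly differently.

```python
from math import sqrt

# (help desk calls, calls not resolved) for Weeks 1-21, from Table 2.
weeks = [
    (37374, 4204), (32612, 3371), (38972, 4080), (35045, 3680), (35411, 3924),
    (50938, 5449), (52970, 5761), (53408, 6245), (38581, 4180), (36863, 5207),
    (30810, 6426), (28677, 4331), (26214, 5839), (23776, 4492), (24335, 4641),
    (24311, 4606), (26863, 4871), (32052, 5764), (31026, 5431), (37191, 6750),
    (30109, 5327),
]


def p_bar(segment):
    """Pooled proportion not resolved across all weeks in the segment."""
    return sum(d for _, d in segment) / sum(n for n, _ in segment)


def p_chart_limits(segment):
    """Center line plus per-week (LCL, UCL), since limits depend on each week's n."""
    center = p_bar(segment)
    limits = []
    for n, _ in segment:
        margin = 3 * sqrt(center * (1 - center) / n)
        limits.append((max(0.0, center - margin), min(1.0, center + margin)))
    return center, limits


first, second = weeks[:9], weeks[9:]  # split at the suspected shift after Week 9
print(f"overall p-bar:     {p_bar(weeks):.6f}")
for label, segment in (("Weeks 1-9", first), ("Weeks 10-21", second)):
    center, limits = p_chart_limits(segment)
    print(f"{label} p-bar: {center:.5f}, first-week limits: "
          f"{limits[0][0]:.4f} to {limits[0][1]:.4f}")
```

With tens of thousands of calls per week the limits are very tight, which is why a single center line drawn through both periods leaves almost every point outside them.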

Here is another example of a strange control chart, this time involving the number of calls per customer. The practitioner collects the following data over a period of 13 weeks (Table 4), and decides to use a u-chart because there could be more than one call per person.

Table 4: Number of Calls Per Customer
Week   Callers   Calls   u
1      10        11      1.1
2      2         2       1
3      10        12      1.2
4      14        17      1.214286
5      9         13      1.444444
6      7         8       1.142857
7      22        32      1.454545
8      29        38      1.310345
9      139       186     1.338129
10     242       303     1.252066
11     353       440     1.246459
12     346       408     1.179191
13     327       396     1.211009

Figure 6 shows the resulting u-chart. While the process appears in control, the control limits during the initial eight weeks are highly irregular and quite wide. During Weeks 9-13, the control limits appear more normal (regular and narrower).

Figure 6: Number of Calls Per Caller

Before jumping to conclusions that the data is invalid, the practitioner should consider the type of chart they are working with. U-charts and p-charts are control charts for data containing defects and defectives, respectively. The control limits for these types of charts are highly sensitive to the size of each sample. Notice the average sample for Weeks 1 to 8 is nearly 17 calls. The average for Weeks 9 to 13 is 347 calls. The larger the sample, the more regular and narrow the control limits. In this example, it appears the process had not reached a “steady state” of calls until Week 9. Consequently, the practitioner might consider using only the data for Weeks 9 to 13, rather than re-sampling the entire 13 weeks. Again, this can save considerable time and resources.
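The effect of sample size on the limits can be seen by computing them week by week. The sketch below assumes the standard three-sigma u-chart limits, u-bar ± 3·sqrt(u-bar/n), with n taken as the number of callers in each week and u-bar as total calls divided by total callers; the function name is hypothetical.

```python
from math import sqrt

# (callers, calls) for Weeks 1-13, from Table 4.
weeks = [
    (10, 11), (2, 2), (10, 12), (14, 17), (9, 13), (7, 8), (22, 32),
    (29, 38), (139, 186), (242, 303), (353, 440), (346, 408), (327, 396),
]

# Center line: total calls divided by total callers.
u_bar = sum(calls for _, calls in weeks) / sum(callers for callers, _ in weeks)


def u_chart_limits(n, center):
    """Three-sigma limits for one week with n callers; the lower limit is floored at 0."""
    margin = 3 * sqrt(center / n)
    return max(0.0, center - margin), center + margin


print(f"u-bar = {u_bar:.4f}")
for week, (callers, calls) in enumerate(weeks, start=1):
    lcl, ucl = u_chart_limits(callers, u_bar)
    print(f"Week {week:2d}: n = {callers:3d}, u = {calls / callers:.3f}, "
          f"limits {lcl:.3f} to {ucl:.3f}")
```

The printed limits are wide and jumpy while n is in the single or low double digits and settle into a narrow, nearly constant band once n reaches the hundreds, which matches the pattern described for Weeks 9 to 13.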

Making Proper Interpretations

Poor decisions waste time, resources and capital. This waste occurs in re-sampling data, collecting more data, re-analyzing data or improving a process that is not in control. Generally, the problem is not related to the tool itself, but how the results are interpreted. In the absence of extensive project experience, Six Sigma professionals can use the techniques described here to identify these anomalies and transform the data in order to make better quality decisions.

About the Author: Peter J. Sherman is a Black Belt and quality engineer with 21 years of experience, including serving as senior Black Belt for AT&T’s Product Development Group. He has a master’s degree in engineering from the Massachusetts Institute of Technology (MIT) and an MBA from Georgia State University. As a visiting scholar to Japan while at MIT, he worked with quality expert W. Edwards Deming. Sherman is lead Instructor at Emory University’s Six Sigma Certificate Program in Atlanta, and is a member of the American Society for Quality (ASQ) and the International Society of Six Sigma Professionals (ISSSP). He can be reached at psherm1@bellsouth.net.
