# Capability calculation for non-normal data

Six Sigma – iSixSigma Forums › Old Forums › General

#30281

Hi,
How should I handle a set of non-normally distributed data when calculating a capability index?
I have learned that there may be ways to transform the data, such as the Box-Cox transformation. Can someone tell me what it is and how it works?
Thanks.

#78690 Hemanth
Participant

#78728 Robert Butler
Participant

The question that you raised concerning problems with non-normal data has been raised in other posts to this forum. Many of the replies to questions such as yours are helpful and offer good advice. There are a few points in some of that advice that need clarification.
Box-Cox transform: This transform has been recommended by many, and it is indeed very useful. It lets the investigator search a large family of power transforms of the data, a family that includes 1/y, the square root of y, and log y. If you have a package that supports this transform and your analysis indicates that no transform is necessary, this does not mean that your data are normal; it only means that a power transform will not be of value to you. There are, of course, other options, such as the Johnson transform and Weibull estimation. However, if the issue is one of process capability, it is possible to assess capability without having to worry about the actual distribution.
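For readers who want to see what the Box-Cox search does: the family is y(λ) = (y^λ − 1)/λ for λ ≠ 0 and log y for λ = 0, so λ = −1, 0.5 and 0 give affine versions of 1/y, √y and log y. The sketch below, assuming strictly positive data, picks λ by maximising the standard profile log-likelihood −(n/2)·ln σ̂²(λ) + (λ − 1)·Σ ln y. The grid over [−2, 2] in steps of 0.01 and the function names are illustrative choices, not any package's API.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of strictly positive data y for a given lambda."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]


def profile_log_likelihood(y, lam):
    """Normal-theory log-likelihood of the transformed data, maximised over mean and variance."""
    t = box_cox(y, lam)
    n = len(t)
    mean = sum(t) / n
    var = sum((v - mean) ** 2 for v in t) / n  # maximum-likelihood variance
    # The (lam - 1) * sum(log y) term is the Jacobian of the transform.
    return -0.5 * n * math.log(var) + (lam - 1) * sum(math.log(v) for v in y)


def best_lambda(y, grid=None):
    """Grid search for the lambda with the largest profile log-likelihood."""
    if grid is None:
        grid = [k / 100 for k in range(-200, 201)]  # illustrative grid, includes 0 exactly
    return max(grid, key=lambda lam: profile_log_likelihood(y, lam))
```

For data that are exactly the exponential of a symmetric set of values, the search lands at λ = 0, the log transform. A λ close to 1 means a power transform buys little. As the post says, that does not prove the data are normal.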
If you plot your data on normal probability paper and identify the .135 and 99.865 percentile values (Z = ±3), the difference between these two values is the span containing the middle 99.73% of the process output. This is the equivalent 6-sigma spread, and the capability goal is for this spread to be no more than 0.75 × tolerance (that is, an equivalent Pp of at least 1/0.75 ≈ 1.33). If you have software that fits Johnson distributions, it will find these values for you; if you don’t, the procedure above lets you do it by hand.
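The by-hand procedure can be mimicked in code. This is a minimal sketch, assuming (i − 0.5)/n plotting positions and straight-line interpolation between neighbouring plotted points on the z scale. Both are assumptions, not conventions stated in the thread, and the function names are hypothetical. It only returns an answer when the plot actually reaches Z = ±3 and returns None otherwise.

```python
from statistics import NormalDist

# Z = -3 and Z = +3 correspond to the .135 and 99.865 percentiles.
Z_LOW, Z_HIGH = -3.0, 3.0


def normal_scores(n):
    """Standard-normal scores for the assumed plotting positions (i - 0.5)/n."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def percentile_from_plot(data, z_target):
    """Read the measurement at z_target off the normal probability plot.

    Interpolates linearly between neighbouring plotted points. Returns None
    when the plotted points do not reach z_target (extrapolation needed).
    """
    xs = sorted(data)
    zs = normal_scores(len(xs))
    if not (zs[0] <= z_target <= zs[-1]):
        return None
    for k in range(1, len(xs)):
        if zs[k] >= z_target:
            z0, z1, x0, x1 = zs[k - 1], zs[k], xs[k - 1], xs[k]
            return x0 + (x1 - x0) * (z_target - z0) / (z1 - z0)
    return None


def equivalent_pp(data, lsl, usl):
    """(USL - LSL) / (x99.865 - x.135), or None if the plot falls short."""
    x_low = percentile_from_plot(data, Z_LOW)
    x_high = percentile_from_plot(data, Z_HIGH)
    if x_low is None or x_high is None:
        return None
    return (usl - lsl) / (x_high - x_low)
```

Consider 1,000 points lying exactly on a normal line with mean 10 and sigma 2, with LSL = 1 and USL = 19. The estimated span is 12, and the equivalent Pp is about 1.5. With only 30 such points the function returns None, because the smallest plotted point sits near Z ≈ −2.1 rather than −3.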
If you would like a further reference, try Measuring Process Capability by Bothe. Chapter 8 is titled “Measuring Capability for Non-Normal Variable Data.”

#102924 John Noguera
Participant

Hi Robert,
I’m pulling up this old thread, to ask you a question about your use of Bothe’s technique for handling non-normal data.  I am planning to implement a similar approach in a software tool.
When the sample size is small (say < 100), do you extrapolate the curve on the normal probability plot out to the .135 and 99.865 percentiles, or do you simply take the minimum and maximum values? If you extrapolate, do you draw a line through the last two data points, or do you use "smoothing"?
Thanks!

#102981 Robert Butler
Participant

You raise an interesting question for which, as far as I can tell, there isn’t a particularly satisfactory answer. In his book, Bothe assumes that your plot will cross the .135 and 99.865 percentiles. I do know that for small data sets (30 points or fewer) I have run into several situations where this did not occur.
On p. 444 Bothe states: “One disadvantage is having to extend the NOPP out far enough to obtain estimates of the .135 and 99.865 percentiles. This task is especially difficult for highly capable processes and/or those studies with small amounts of data.” In the instances where the data do not cross these percentiles, Bothe recommends using a French curve to extend the plot line, and this is the method I use.
This method does work, but it has its drawbacks. For the kinds of small data sets I’ve had to work with (very short production runs, lots of changes, a maximum of 10 to 20 measurements), I have had situations where fitting with a French curve gives an extrapolated line that simply will not cross one or the other of those percentiles.
In instances of this type, I take the last two points and use them to predict a straight line intercept to the percentile of interest. I then go back to the historical data (assuming it is available) and the people on the line and ask if the intercepts I’ve “identified” with this less than elegant approach are reasonable.  If it’s acceptable, we go ahead and use these numbers for our initial capability estimates but we make sure everyone knows how these estimates were reached and we also make sure everyone understands it is ok to question our estimates at any time.
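The last-two-points fallback can be sketched as a straight line through the two outermost plotted points on the normal probability scale, extended to Z = −3 or Z = +3. This is a minimal sketch, assuming (i − 0.5)/n plotting positions. The function name is hypothetical.

```python
from statistics import NormalDist


def normal_scores(n):
    """Standard-normal scores for the assumed plotting positions (i - 0.5)/n."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def last_two_points_extrapolation(data, z_target):
    """Extend a straight line through the two plotted points nearest the tail.

    Works on the normal probability scale (measurement versus z).
    """
    if len(data) < 2:
        raise ValueError("need at least two points")
    xs = sorted(data)
    zs = normal_scores(len(xs))
    if z_target < zs[0]:
        (z0, x0), (z1, x1) = (zs[0], xs[0]), (zs[1], xs[1])
    elif z_target > zs[-1]:
        (z0, x0), (z1, x1) = (zs[-2], xs[-2]), (zs[-1], xs[-1])
    else:
        raise ValueError("z_target lies inside the plotted range; interpolate instead")
    slope = (x1 - x0) / (z1 - z0)
    return x0 + slope * (z_target - z0)
```

If the two outermost values are tied, the slope is zero and the estimate collapses to the minimum or maximum, the optimistic answer. Tied tails therefore deserve a look before the number is trusted.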
I would recommend extrapolation over the simple choice of minimum and maximum values because, at least for the data with which I’ve had to work, the simple min and max will result in capability estimates that are much too optimistic.

#102989 John Noguera
Participant

Thanks Robert.
In looking for an alternative suited to software automation, what are your thoughts on using Johnson curves versus Box-Cox?

#103046 Robert Butler
Participant

Both transforms have their strengths and weaknesses, and I think it would be appropriate to include both in a package. The real issue is: what do you mean by automation?

One of the big shortcomings of all the computer packages I’ve seen is the apparent desire to automate the choice between these two without providing any kind of brake on the transformation process. By this I mean forcing the user to really look at the data at hand and to understand the assumptions being made and the validity of the chosen transform.

With small data sets, or with large ones containing known and deliberate mean/variance shifts, blind application of the Box-Cox transform lets you go wrong with great assurance, because the resulting lambda estimates are unstable. I’ve seen case after case where someone generated a lambda and then applied it to all subsequent measurements. Ultimately, the statistics based on the transformed data, which are used to describe the process, fail to aid in its control, and when everything goes out of control no one has a clue why it happened.

Similar issues surround the application of the Johnson transform. It is important to recognize that there are three families of Johnson transforms (SB, SL and SU). I’ve seen several instances where a computer package chose a single Johnson family and blindly applied it to all incoming data. The results of such an application are easy to imagine. To determine which family applies to your situation, you need to compute the beta1 and beta2 values for your data and look up their point of intersection on a beta1-beta2 plot. An excellent discussion of the issues surrounding the use of the Johnson transform, and of the steps one should take before and after applying it, can be found in Chapter 6 of Statistical Models in Engineering by Hahn and Shapiro.
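The family check can be automated as a first screen, though not as a decision. Hahn and Shapiro read (β1, β2) off a chart. The sketch below instead computes the lognormal (SL) boundary from its standard parameterisation in w = exp(σ²): β1 = (w − 1)(w + 2)² and β2 = w⁴ + 2w³ + 3w² − 3. It reports SB when the sample β2 is smaller than the lognormal line's β2 at the same β1 and SU when it is larger. The function names and the 1% "on the line" tolerance are hypothetical, illustrative choices. Sample moments are also noisy for small n.

```python
def moment_ratios(data):
    """Return (beta1, beta2): squared skewness and kurtosis from sample moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2


def lognormal_line_beta2(beta1):
    """beta2 on the lognormal (SL) line at the given beta1, via bisection on w."""

    def beta1_of(w):
        return (w - 1) * (w + 2) ** 2  # increasing for w >= 1

    lo, hi = 1.0, 2.0
    while beta1_of(hi) < beta1:
        hi *= 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if beta1_of(mid) < beta1:
            lo = mid
        else:
            hi = mid
    w = 0.5 * (lo + hi)
    return w ** 4 + 2 * w ** 3 + 3 * w ** 2 - 3


def johnson_family(data, rel_tol=0.01):
    """Classify data as "SB", "SL" or "SU" relative to the lognormal line."""
    beta1, beta2 = moment_ratios(data)
    line = lognormal_line_beta2(beta1)
    if abs(beta2 - line) <= rel_tol * line:
        return "SL"
    return "SU" if beta2 > line else "SB"
```

Uniform-looking data (β2 near 1.8) come out as SB, and heavy-tailed symmetric data (β2 well above 3) come out as SU. The normal point (0, 3) sits at the start of the lognormal line. A borderline result is a prompt to look at the data, not to pick a family automatically.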

#103109 John Noguera
Participant

Robert,
Thanks for the insight and the Hahn and Shapiro reference.
John

#103127 John Noguera
Participant

Hi Robert:
I am attaching an e-mail I received from Davis Bothe on this topic. By the way, he has read some of your responses on iSixSigma and said that you were “quite knowledgeable and experienced in many statistical methods”!
Regards, John
The issue you raise about non-normal distributions is an important one, especially when dealing with small sample sizes (throughout the following discussion, I am assuming the data came from a process that is in statistical control because even just one out-of-control reading in a small sample will greatly influence the apparent shape of the distribution).
I don’t think defaulting to the minimum and maximum values of the data set for the .135 and 99.865 percentiles is a good idea. Doing so would produce an artificially high estimate of process capability, because the “true” .135 percentile would lie well below the minimum measurement of the data set and the “true” 99.865 percentile would lie well above its maximum value.
For example, in a sample of 30, the smallest measurement would be assigned a percentage plot point of around 2 percent (depending on how you assign percentages).  This point represents the 2.00 percentile (x2.00), which is associated with a much higher measurement value than that associated with the .135 percentile (x.135).
The largest measurement would be assigned a percentage plot point of somewhere around 98 percent.  This point represents the 98.00 percentile (x98.00), which would be associated with a lower measurement than that associated with the 99.865 percentile (x99.865).
The difference between x98.00 and x2.00 will therefore be much smaller than the difference between x99.865 and x.135.  Thus, the Equivalent Pp index using the first difference will be much larger than the one computed with the second difference.
Equivalent Pp = (USL – LSL) / (x98.00 – x2.00) is greater than
Equivalent Pp = (USL – LSL) / (x99.865 – x.135)
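A little arithmetic makes the size of this effect concrete. The sketch assumes the common (i − a)/(n + 1 − 2a) family of plotting positions. Purely for illustration, it also assumes normally distributed data; for non-normal data the factor will differ. The smallest of n points plots at some percentage with z-score z_min, so the min-to-max span covers about 2|z_min| sigma instead of 6 sigma. The equivalent Pp is then inflated by roughly 6 / (2|z_min|). The function names are hypothetical.

```python
from statistics import NormalDist


def smallest_point_percent(n, a=0.5):
    """Plotting percentage of the smallest of n points: (i - a)/(n + 1 - 2a) with i = 1.

    a = 0.5 gives (i - 0.5)/n, a = 0.3 gives the (i - 0.3)/(n + 0.4)
    median-rank approximation, and a = 0 gives i/(n + 1).
    """
    return 100.0 * (1 - a) / (n + 1 - 2 * a)


def min_max_optimism(n, a=0.5):
    """Approximate factor by which min/max inflates the equivalent Pp (normal data only)."""
    z_min = NormalDist().inv_cdf(smallest_point_percent(n, a) / 100.0)
    return 6.0 / (2.0 * abs(z_min))
```

For n = 30, a = 0.5 puts the smallest point at about 1.7% and a = 0.3 at about 2.3%, close to the "around 2 percent" above. The corresponding inflation factors are roughly 1.4 and 1.5. The factor shrinks as n grows.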

So I think it would be best to avoid this approach as it produces overly optimistic results.

So what would be better? I recommend fitting a curve through the points and extrapolating out to the .135 and 99.865 percentiles.  However, this can be difficult when there are few measurements.  First, there should be at least 6 different measurement values in the sample.  Often, with a highly capable process and a typical gage (15% R&R), most measurements will be identical.  For a sample size of 30, I have seen 6 repeated readings of the smallest reading, 19 of the middle reading, and 5 of the largest reading.  With only three distinct measurement values, it is impossible to use curve fitting or NOPP.

Second, with really small sample sizes (less than 20), you must extrapolate quite a distance from the first plotted point down to the .135 percentile and quite a ways from the last plotted point to 99.865 percentile.  This presents an opportunity for a sizable error in the accuracy of the extrapolation.  However, I believe this error would always be less than that of using the smallest value as an estimate of the .135 percentile.  This method is always wrong whereas the extrapolation method might produce the correct value.

You had asked about my preference between using the last two plot points for the extrapolation or relying on a smoothing technique.  I would prefer the smoothing technique as this approach takes into consideration all of the plot points.  If there is any type of curvature in the line fitted through the plot points (which there will be since we are talking about non-normal distributions), then this method would extend a curved line out to estimate the .135 and 99.865 percentiles.

The “last-two-points” method will always extend a straight line out to the .135 and 99.865 percentiles (through any two points, there exists a straight line).  Because we are dealing with non-normal distributions, we know a straight line is not the best way to extrapolate.  Thus, the smoothing approach will produce better estimates.
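No specific smoothing method is named here, so the sketch below substitutes a simple one: a least-squares quadratic in z fitted through all plotted points and evaluated at Z = ±3. It is a stand-in for a French curve or other smoother, not Bothe's procedure. It also applies the "at least 6 different measurement values" rule from above. The (i − 0.5)/n plotting positions and the function names are assumptions.

```python
from statistics import NormalDist


def normal_scores(n):
    """Standard-normal scores for the assumed plotting positions (i - 0.5)/n."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def quadratic_tail_estimates(data, z_low=-3.0, z_high=3.0, min_distinct=6):
    """Fit x = c0 + c1*z + c2*z**2 to all plotted points; evaluate at z_low and z_high.

    Returns None if the fitted curve turns back between the two targets.
    """
    if len(set(data)) < min_distinct:
        raise ValueError("too few distinct values for curve fitting")
    xs = sorted(data)
    zs = normal_scores(len(xs))
    # Normal equations for least squares in the basis 1, z, z**2.
    sz = [sum(z ** p for z in zs) for p in range(5)]
    sxz = [sum(x * z ** p for x, z in zip(xs, zs)) for p in range(3)]
    a = [[sz[i + j] for j in range(3)] for i in range(3)]
    c0, c1, c2 = solve_linear(a, sxz)
    # A quadratic is not monotone across its vertex; reject such a fit.
    if c2 != 0 and z_low < -c1 / (2 * c2) < z_high:
        return None

    def fitted(z):
        return c0 + c1 * z + c2 * z * z

    return fitted(z_low), fitted(z_high)
```

Unlike the last-two-points line, every plotted point influences the fitted curvature. When the quadratic turns back between Z = −3 and Z = +3, the function returns None rather than a number. This is the programmatic counterpart of a French curve that will not cross the percentile, and the data then need a human look.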

One note of caution: be careful about using a purely automated approach. It’s always best to look at the data for each process rather than relying solely on an automated computerized analysis.
