Tagged: minitab cpk transform
Hi, why is it that when you do a capability analysis using a nonnormal distribution, Minitab shows only Ppk and no Cpk?
I found this post, near the end of which someone says:
“The Cp metrics only apply to normal cases, because only in a normal distribution will the 6 sigma spread cover the probability range. For most distributions, sigma is not a parameter. Although it can be calculated on any data set, it has no predictive value in the sense of +/-x sigma covers some known probability spread”
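To make the quoted point concrete, here is a small Python sketch (my own illustration, not from that post) comparing how much probability lies within the mean ± k standard deviations for a normal distribution and for an exponential distribution. The exponential is just an assumed example of a skewed distribution; its standard deviation equals its mean, so the coverage does not depend on which mean you pick.

```python
import math
from statistics import NormalDist

def normal_coverage(k: float) -> float:
    """P(mean - k*sd < X < mean + k*sd) for any normal distribution."""
    z = NormalDist()  # standard normal; the answer does not depend on mean or sd
    return z.cdf(k) - z.cdf(-k)

def exponential_coverage(k: float, mean: float = 1.0) -> float:
    """The same probability for an exponential distribution, whose sd equals its mean."""
    def cdf(x: float) -> float:
        # 1 - exp(-x/mean); the exponential has no probability below 0
        return -math.expm1(-x / mean) if x > 0 else 0.0
    sd = mean
    return cdf(mean + k * sd) - cdf(mean - k * sd)

for k in (1, 2, 3):
    print(f"+/-{k} sd  normal: {normal_coverage(k):.4f}  "
          f"exponential: {exponential_coverage(k):.4f}")
```

For the normal case ±3 sd covers about 99.73%, while for the exponential it covers only 1 − e⁻⁴ (about 98.17%), and all of the missing probability is in the upper tail.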
However, don’t both Cpk and Ppk use sigma in their calculations? Why can it still do Ppk?
We have access to the Minitab training, and as best I can tell after reviewing the capability analysis section, the only difference is whether it looks at the variability within a subgroup (Cpk) or at the variability of the entire sample (Ppk). However, if your subgroup size is 1, how is that different?
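A quick Python sketch may help with the subgroup-size-1 question. It assumes the common individuals-chart convention: within (short-term) sigma = average moving range / 1.128, and overall (long-term) sigma = ordinary sample standard deviation. Both indices use a sigma; they just estimate it differently, so they can differ even with subgroups of 1. The data, the drift, and the spec limits are made up for illustration.

```python
import random
import statistics

D2_N2 = 1.128  # d2 constant for moving ranges of length 2 (subgroup size 1)

def within_sd(data):
    """Short-term sigma for individuals: average moving range divided by d2."""
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    return statistics.mean(moving_ranges) / D2_N2

def overall_sd(data):
    """Long-term sigma: the ordinary sample standard deviation of every point."""
    return statistics.stdev(data)

def capability(mean, sigma, lsl, usl):
    """Distance from the mean to the nearer spec, in units of 3 sigma."""
    return min(usl - mean, mean - lsl) / (3 * sigma)

def cpk_and_ppk(data, lsl, usl):
    mean = statistics.mean(data)
    return (capability(mean, within_sd(data), lsl, usl),
            capability(mean, overall_sd(data), lsl, usl))

rng = random.Random(42)  # seeded so the run is repeatable
stable = [10 + rng.gauss(0, 1) for _ in range(50)]
drifting = [x + 0.1 * i for i, x in enumerate(stable)]  # hypothetical slow drift

for name, data in (("stable", stable), ("drifting", drifting)):
    cpk, ppk = cpk_and_ppk(data, lsl=0, usl=20)
    print(f"{name:9s} Cpk = {cpk:.2f}  Ppk = {ppk:.2f}")
```

With stable data the two estimates land close together; with the drift, neighbouring points still look alike (small moving ranges) while the whole sample spreads out, so Ppk drops well below Cpk.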
If your study actually uses short-term data, then just change the display to Cpk. Minitab assumes the data are long-term data with special causes inherently included.
Any reason why you’re asking…just curious.
I did a presentation on my summary/cheat sheet from my Minitab training, and some higher-ups noted that once I used a distribution, it no longer showed Cpk, only Ppk, and that Ppk is not the same as Cpk.
We use Cpk to decide whether something can be transferred to manufacturing, so just relabeling Ppk as Cpk wouldn’t be right, and I would also like to explain why Cpk is omitted.
I’ve already given you an explanation, but again: if you’ve done a truly short-term capability study, just change the Minitab output to Cpk.
I find the statement about being able to be transferred to mfg kind of interesting. Although most capability studies are with short term data, I’d hate to think someone actually wouldn’t accept a long term process capability study.
So it sounds like a suspicion of mine is true: Ppk is always less than or equal to Cpk?
Thanks, but you’ve told me the difference between Cpk and Ppk, not why Minitab doesn’t calculate Cpk for distributions.
I don’t work for Minitab or have financial ties with them but I’ve used them and am aware of some of their past guiding principles.
Yes. Long-term capability analysis (summed up in Ppk) includes many more sources of variation (often referred to as special causes), so its standard deviation is larger. Therefore, unless the mean has shifted much closer to the center (assuming two-sided specs), Ppk is always smaller than the Cpk of a properly done short-term capability analysis.
Does anyone else know why Minitab does this?
I’ll give the short answer and then provide the long one in a link.
Think about the formulas for Cpk and Ppk: with Cpk, you are comparing to the closest spec, whereas with Ppk you are comparing to the entire range of the specs. For Cpk to be useful, landing a given distance from the mean has to be equally likely on either side, so you need a symmetric distribution. However, almost every nonnormal distribution you come across is NOT symmetric, so Cpk is useless.
As a quick example, generate some exponential data with a mean of 1. Imagine your specs are 0 and 3, and make a histogram of the data. An exponential distribution’s standard deviation equals its mean, so sigma is also about 1. If you used Cpk, your value would be based on the mean being about 1 unit from the lower spec, even though it is obvious you will never see a single data point below the lower spec. Your Cpk will be about 0.33 (based on CPL), whereas only CPU is realistic and would give a value of about 0.67. Cpk assumes that one unit away from the mean in either direction has equal probability, whereas for a skewed distribution like this that is not even remotely close to reality.
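The exponential example above can be sketched in Python. This is a hypothetical simulation, not Minitab output: for independent data the within and overall sigma estimates are close, so the sketch simply uses the sample standard deviation in the normal-theory CPL and CPU formulas, then compares what a normal model implies with what the data actually do.

```python
import math
import random
import statistics
from statistics import NormalDist

rng = random.Random(1)  # seeded so the run is repeatable
data = [rng.expovariate(1.0) for _ in range(100_000)]  # exponential, mean 1
lsl, usl = 0.0, 3.0

mean = statistics.mean(data)
sd = statistics.stdev(data)  # close to 1, since an exponential's sd equals its mean

cpl = (mean - lsl) / (3 * sd)  # normal-theory lower index
cpu = (usl - mean) / (3 * sd)  # normal-theory upper index

observed_below = sum(x < lsl for x in data) / len(data)  # always 0: no negative values
observed_above = sum(x > usl for x in data) / len(data)  # close to exp(-3)
normal_below = NormalDist().cdf(-3 * cpl)  # what a normal model implies from CPL

print(f"CPL = {cpl:.2f}  CPU = {cpu:.2f}  Cpk = {min(cpl, cpu):.2f}")
print(f"below LSL: observed {observed_below:.4f}, normal model {normal_below:.4f}")
print(f"above USL: observed {observed_above:.4f}, exp(-3) = {math.exp(-3):.4f}")
```

The normal model behind CPL ≈ 0.33 predicts roughly 16% of parts below the lower spec, while the simulated data never go below it; the real risk sits entirely above the upper spec.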
You can get a longer explanation at http://www.minitab.com/en-US/support/documentation/Answers/NoWithinSubgroupCapability.pdf.
Let me know if that doesn’t make sense!
@joelatminitab Always a pleasure reading your posts. Hope all is well.
OK, I’ve slept on it and I’m still confused.
“The first method of calculating the standard deviation is to assume all measurements make one big sample, and calculate its sample standard deviation. … The second method is to use the within-subgroup standard deviation to calculate the capability indices.”
Sounds to me like they both need to be able to calc standard deviation. Why can you calc standard deviation for “all measurements”, but not “within-subgroup”?
“the properties of the normal distribution that make these methods possible are not shared by nonnormal distributions”
Isn’t the whole point of using a distribution to make the data normal? Can’t you make a secondary, normal x-axis and calc the standard deviation on that scale, now that the data look normal when plotted against the nonnormal distribution?
“However, the within-subgroup standard deviation does not account for the variation between subgroups. Therefore, capability indices that use the within-subgroup standard deviation represent better performance than the actual one.”
How does this work when you set your subgroup size to 1?
SigmaXL includes Cpk for non-normal capability using a normalized within standard deviation. Statgraphics also does this.
John, the whole point of using a distribution isn’t to make something normal. It’s to fit a curve that best describes your data. A non-normal curve will describe data that’s NOT bell shaped.
Cpk is used for short term process capability with all sources of special causes removed (e.g. within only one subgroup) while Ppk describes capability with more sources of variation, including across subgroups. Keep it simple.
There’s more in-depth stuff that Joel and I haven’t gone into because it’s often case-by-case specific. For example, given your original post about using Cpk for product acceptance, I’d never advocate reporting process capability for a single subgroup.
FYI, I’d caution you NOT to just use Cpk “to get the best number”. You should at least still be looking across subgroups when making product-launch decisions.
My two cents.
@Mike-Carnell Yes, a transform or distribution should only be used if it fits the data better, which you can tell using “Individual Distribution Identification”. If the data are not normal, then the resulting Cpk on a normal graph can be just as misleading as transforming data that shouldn’t be transformed. It’s not about maximizing Cpk, it’s about maximizing fit.
In fact I almost had a Cpk of 9.43, until someone asked if it was normal, and I had to transform it which brought it down to a pitiful 1.37 ;P.
Also, this is more of a theoretical question, so that I can explain to my co-workers why there is no Cpk for a distribution and understand the difference more clearly myself.
What we’ll do with that information is another question. I suspect it will simply be “since Ppk is always worse than Cpk, as long as your Ppk is better than our minimum requirement for transfer, you’re good”. But without fully understanding the differences, I don’t feel I can bring this to them or answer follow-up questions, not to mention satisfy my own curiosity.
@cseider and/or @joelatminitab, based on @jnoguera’s post, other software does calculate Cpk for nonnormal distributions, so why doesn’t Minitab? Did they decide it was misleading (and if so, why), or is the software just not that sophisticated?
The whole subgroup thing still doesn’t explain the difference if you usually set your subgroup size to 1.
Note that Minitab does calculate both Ppk and Cpk when using the Box-Cox transformation. It does not do so for the Johnson transformation or nonnormal distributions.
An important assumption in this discussion is the data are inherently non-normal, not non-normal due to outliers, bimodality or a “chunky data” measurement problem.
Given that we have inherently non-normal data, there is a benefit to reporting both Cpk and Ppk, in that the Cpk is a potential capability if the process is stable (same principle for Cpk with normal data).
To illustrate this, do the following in Minitab or SigmaXL:
1. Generate 100 rows of random lognormal data (use location = 0, scale = 1).
2. Add 2 to the random numbers 51 to 100. This denotes a large shift in the process, an assignable cause that has nothing to do with the inherent lognormal distribution. Stack the data if necessary so that it is one column.
3. Now run a process capability (normal in Minitab) but use the Box-Cox transformation with Lambda = 0. Use USL = 5, subgroup size=1.
4. Note the large difference in Cpk and Ppk due to the assignable cause!
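For readers without Minitab or SigmaXL, the four steps above can be mimicked in Python. This is a sketch, not a reproduction of either package: it assumes within sigma = average moving range / 1.128 and overall sigma = sample standard deviation of the transformed data, and Minitab’s unbiasing options could change the numbers slightly. With only a USL, Cpk reduces to CPU and Ppk to PPU.

```python
import math
import random
import statistics

D2_N2 = 1.128  # d2 for moving ranges of length 2 (subgroup size 1)

rng = random.Random(7)  # seeded so the run is repeatable
# Step 1: 100 lognormal values with location 0 and scale 1
data = [rng.lognormvariate(0.0, 1.0) for _ in range(100)]
# Step 2: add 2 to rows 51-100 (an assignable-cause shift, not part of the lognormal)
data = data[:50] + [x + 2 for x in data[50:]]
usl = 5.0

# Step 3: Box-Cox with lambda = 0 is the natural log; transform the data AND the USL
logged = [math.log(x) for x in data]
t_usl = math.log(usl)

mean = statistics.mean(logged)
moving_ranges = [abs(b - a) for a, b in zip(logged, logged[1:])]
sigma_within = statistics.mean(moving_ranges) / D2_N2
sigma_overall = statistics.stdev(logged)

# Step 4: compare the indices (upper spec only)
cpk = (t_usl - mean) / (3 * sigma_within)
ppk = (t_usl - mean) / (3 * sigma_overall)
print(f"Cpk = {cpk:.2f}, Ppk = {ppk:.2f}")
```

Only one moving range spans the shift, so the within estimate barely sees it, while the overall standard deviation absorbs the whole step change.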
From SigmaXL’s appendix:
Process Capability Indices (Nonnormal)
Z-Score Method (Default)
Transformed z-values are obtained by using the inverse cdf of the normal distribution on the cdf of the nonnormal distribution. Normal based capability indices are then applied to the transformed z-values. This approach offers two key advantages: the relationship between the capability indices and calculated defects per million is consistent across the normal and all nonnormal distributions, and short term capability indices Cp and Cpk can be estimated using the standard deviation from control chart methods on the transformed z-values. The Z-Score method was initially developed by Davis Bothe and expanded on by Andrew Sleeper. For further details, see Sleeper, Six Sigma Distribution Modeling.
I do need to add that SigmaXL only supports nonnormal capability for individuals not subgroups, so that may be a limiting factor.
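The Z-score idea quoted above (essentially the “secondary normal x-axis” asked about earlier) can be sketched in Python. Assumptions, all mine: an exponential distribution fitted by maximum likelihood (theta = sample mean), an upper spec of 3, individuals data, and normal formulas applied to the sample mean and standard deviations of the z-values. SigmaXL’s exact conventions may differ; for example, it may use the fitted distribution’s theoretical z-mean of 0 and standard deviation of 1 for Ppk.

```python
import math
import random
import statistics
from statistics import NormalDist

STD_NORMAL = NormalDist()
D2_N2 = 1.128  # d2 for moving ranges of length 2

def exp_cdf(x, theta):
    """CDF of an exponential distribution with mean theta."""
    return -math.expm1(-x / theta)  # 1 - exp(-x/theta), accurate for small x

def to_z(x, theta):
    """Z-score transform: inverse normal CDF of the fitted distribution's CDF."""
    return STD_NORMAL.inv_cdf(exp_cdf(x, theta))

rng = random.Random(3)  # seeded so the run is repeatable
data = [rng.expovariate(1.0) for _ in range(200)]  # hypothetical individuals data
usl = 3.0

theta = statistics.mean(data)  # maximum-likelihood estimate of the exponential mean
z = [to_z(x, theta) for x in data]
z_usl = to_z(usl, theta)  # the spec is transformed the same way as the data

z_mean = statistics.mean(z)
moving_ranges = [abs(b - a) for a, b in zip(z, z[1:])]
sigma_within = statistics.mean(moving_ranges) / D2_N2  # control-chart sigma on z-values
sigma_overall = statistics.stdev(z)

cpk = (z_usl - z_mean) / (3 * sigma_within)
ppk = (z_usl - z_mean) / (3 * sigma_overall)
print(f"Z-score method: Cpk = {cpk:.2f}, Ppk = {ppk:.2f}")
```

Because the z-values are approximately normal, a moving-range sigma on that scale is meaningful, which is what lets this approach report a short-term Cpk for nonnormal data.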
I think we’re getting a little side tracked here.
At this point I understand the main difference between Cpk and Ppk, but I still don’t understand why Minitab doesn’t calc Cpk for certain transforms and for distributions.
I also still don’t understand the difference between Cpk and Ppk when your sub-group size is 1 (or individuals, as @jnoguera calls it).
John, StdDev for non-normal Cpk uses MR-bar/1.128 of the z-scores.
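Where the 1.128 comes from: it is the d2 constant for ranges of two consecutive points. For normally distributed data, X1 − X2 has standard deviation σ√2, and the mean of its absolute value is σ√2·√(2/π) = 2σ/√π ≈ 1.128σ. Since the constant is derived assuming normality, that is presumably why it is applied to the (approximately normal) z-scores rather than to the raw skewed data. A short Python check:

```python
import math
import random

# For two independent N(mu, sigma) values, X1 - X2 ~ N(0, sigma * sqrt(2)),
# and E|X1 - X2| = sigma * sqrt(2) * sqrt(2 / pi) = sigma * 2 / sqrt(pi).
D2_N2 = 2 / math.sqrt(math.pi)
print(f"d2 for n = 2: {D2_N2:.4f}")

# Monte Carlo check with sigma = 1 (seeded so the run is repeatable)
rng = random.Random(0)
pairs = 100_000
mean_mr = sum(abs(rng.gauss(0, 1) - rng.gauss(0, 1)) for _ in range(pairs)) / pairs
print(f"simulated mean moving range / sigma: {mean_mr:.3f}")
```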
I’d like to share my practical experience. Whichever software you use to calculate Cpk and Ppk, the intent is as follows.
Ppk is a short-term but important study. It helps you decide on process capability and establish tool life, any offsets, and how long you can run the process without altering it or intervening. Its sigma is calculated from how far each individual reading deviates from the average, so it is accurate, and the study is done on continuous readings/data. Cpk, in turn, is based on subgroups of data collected at a defined frequency, which helps you monitor ongoing process stability, and its sigma uses the range as the key parameter. So although they look alike, the sigma in Ppk and the sigma in Cpk are calculated differently. If you are using Minitab, just check your input selections.