Monthly Report Cpk or Ppk?


  • #32922

    migs
    Participant

    We are currently organizing a system to report monthly performance status of critical characteristics under SPC implementation. The purpose is to review monthly capability indices and thereby generate actions for improvement.
    My question is, do we review Cpk or Ppk or both? For info, our SPC system is capable of generating both.
    By the way, I work for a semicon assembly company.
    Thanks for all your help.

    #88458

    Karimkonda
    Participant

    Hi migs,
It’s like this: Cpk and Ppk differ in just one fundamental aspect, the kind of variation used to compute them. Cpk uses sigma estimated from Rbar, whereas Ppk uses the standard deviation calculated from the data itself. And, as the indices are named, Cpk is for capability and Ppk is for performance.
    It all depends on what you want to do with the indices. I recall reading somewhere on this forum that for a process in statistical control (no special causes), Cpk provides a sort of glimpse into the future of the process.
    If you’re doing long-term analysis, I would say go for Cpk. That’s not to say ignore Ppk completely.
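    For concreteness, here is a minimal sketch of the two calculations (Python with numpy; the subgroup size of 5, the normal stand-in data, and the spec limits of 0 and 10 are all hypothetical):

    import numpy as np

    LSL, USL = 0.0, 10.0                         # hypothetical spec limits
    data = np.random.normal(5, 1, 100)           # stand-in for 100 measurements
    sub = data.reshape(20, 5)                    # 20 rational subgroups of size 5
    xbar = data.mean()

    # Ppk: sigma is the standard deviation calculated from all the data
    s_total = data.std(ddof=1)
    Ppk = min(USL - xbar, xbar - LSL) / (3 * s_total)

    # Cpk: sigma is estimated from the average subgroup range, Rbar/d2
    d2 = 2.326                                   # table constant for subgroups of 5
    Rbar = (sub.max(axis=1) - sub.min(axis=1)).mean()
    Cpk = min(USL - xbar, xbar - LSL) / (3 * (Rbar / d2))

    print(Cpk, Ppk)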
    I hope this helps.
    Ashwin.

    #88465

    Rick Pastor
    Member

    Migs:
     
    I do not like the way the meanings of Cpk and Ppk have evolved.  Therefore, I hope I got your attention by saying that “Ppk is junk.” 
     
    The definition of Cpk is that it uses an estimate of sigma, while Ppk uses the calculated sigma.  In this definition, the term estimate of sigma means sigma = R_bar/d2, where R_bar is the average range and d2 is a constant that computer programs apply automatically, or that you find in a table, without the user necessarily knowing what it is all about.  Cpk is called the short-term capability.  In contrast, Ppk uses the calculated sigma and hence is called the long-term capability. 
     
    This all seems too confusing to me.  As an engineer, I would say that the true statisticians have lost touch with the concept of evolution.  The old-timers who did not use computers could subtract two numbers much more easily than they could calculate a standard deviation.  Therefore, control charts with a subgroup of 5 would be X_bar, R charts. 
     
    Think about it.  If you wanted to produce an X_bar, sigma control chart in the old days with a subgroup size of 5, you had to get out your pencil, paper, and slide rule and start to work.  Wow, back in the dark ages, making a control chart using X_bar and sigma required lots of work.  In general, calculating sigma in the dark ages was not much fun unless you had no friends and skipped breakfast and lunch.  
     
    Engineers did have a life, and since most of them had some kind of life, they developed shortcuts.  A table of d2 values, one subtraction, and one division allowed them to quickly estimate the value of sigma.  Today we have computers that do sigma calculations in a flash, without the need for a shortcut.
     
    What does the lack of evolution bring us today?  Two confusing concepts!  I say stop the confusion.  First, Ppk is not a measure of long-term process capability.  If I report a weekly Ppk using the calculated sigma based on, say, 100 measurements (subgroup size 1, i.e., individual measurements), is that really a measure of long-term capability from a manufacturing point of view?  No!  It is a snapshot of that process step.
     
    Here is the solution: get rid of Ppk and use Cpk only.  More managers have heard of Cpk than of Ppk.  Just make sure that you let them know how you calculated sigma (subgroup size 1).  To show how the process behaves as a function of time, a long-term look at the process, I would create a bar chart showing the Cpk on a weekly or monthly basis.  If Cpk goes up, you need to explain why?  If Cpk goes down, should I get a bonus? OK, forget about the bonus. 
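    A rough sketch of that weekly bar chart (Python with numpy and matplotlib; the 12 weeks of data and the spec limits are made up):

    import numpy as np
    import matplotlib.pyplot as plt

    LSL, USL = 0.0, 10.0                                      # hypothetical spec limits
    weeks = [np.random.normal(5, 1, 100) for _ in range(12)]  # 12 weeks of stand-in data

    # one Cpk per week, sigma calculated from that week's individual values
    def cpk(x):
        s = x.std(ddof=1)                   # calculated sigma, subgroup size 1
        return min(USL - x.mean(), x.mean() - LSL) / (3 * s)

    plt.bar(range(1, 13), [cpk(w) for w in weeks])
    plt.xlabel("Week")
    plt.ylabel("Cpk")
    plt.show()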

    #88501

    Karimkonda
    Participant

    Rick,
    The long-term distinction between Cpk and Ppk often becomes irrelevant if your process is under statistical control. If your process is in control, has been for a while, and shows no sign of deviating from control (as per the control charts and so forth), then your actual process standard deviation, calculated from the raw data, and sigma_hat, the estimated standard deviation (from Rbar/d2), become more or less similar.
    As a result, you can use either one to give a fair picture of short- and reasonably long-term process capability. One thing to note: I have worked with companies that require Ppk as well as Cpk, presented side by side.
    Also, I don’t mean to start a flame war here or anything, but from my experience I don’t think Cpk is going to go out of use soon.
    Also, if I may quote you from your previous post:
    “If Cpk goes up, you need to explain why?  If Cpk goes down, should I get a bonus? “
    I find this rather perplexing. The aim of studying your process is to increase Cpk (1.33 is a nice number, but the higher the better). Your statement seems to read otherwise. Cpk is obtained by dividing by the variation, so the smaller the variation, the larger the Cpk, and the better.
    As for the ‘snapshot’ comment: a process under statistical control will report similar (or, if a source of common-cause variation is found and eliminated, increased) values of both Cpk and Ppk. So Cpk and Ppk can both be used as long-term indicators of process capability and performance if and only if your process is under statistical control. Each becomes just a snapshot if your process is not under statistical control, because the process is then subject to assignable causes of variation (a.k.a. special causes).
    Since the use of these indices is statistical in nature in the first place, and there are robust theories behind them, I think you should not say such things about statisticians (I am an engineer myself). One more thing about the statistics: sigma = Rbar/d2 generally gives an overestimate of the sigma_hat calculated from the raw data. This is a consequence of Shewhart’s methods.
    Personally, I think Cpk is the better one to use, simply because (as Rick said) more people know about it and have a better idea of what it stands for.

    #88513

    Rick Pastor
    Member

    Ashwin:
     
    Let me clarify a statement that I made.  I wanted to say, “If Cpk goes down, you need to explain why!  If Cpk goes up, should I get a bonus? OK, forget the bonus.” 
     
    I believe that we fundamentally agree, and I do not think that we shall have a flame war.  In addition, I do think very highly of statisticians.  Nevertheless, I still believe that Ppk is a bad concept, and that Cpk should be reported using the standard deviation of individual measurements. 
     
    The use of the term long-term capability for Ppk is misleading.  By hiding the fundamental assumption used in the calculation of Cpk and Ppk, we confuse people.  We make them think we can see into the future.  I believe that Ppk is the correct way to calculate the weekly snapshot view of a process.  Unless someone improves the process, both Cpk and Ppk will fluctuate around some typical value.  A bar chart of the weekly results over time will reveal that fluctuation.  Get rid of Ppk, use only the term Cpk, and tell your people how you did the calculation.
     
    I will ask you the following question: what real information does the comparison of Cpk and Ppk provide?  Here is an example for you to use.
     
    Cpk, calculated in Excel:

    Generate 100 random numbers between 0 and 1.  Uniform distribution
    X_bar=0.457
    R_bar=0.641
    d2=2.33 for a subgroup of 5
    s_est = R_bar/d2 = 0.275
    Cpk = Min(1.65, 1.97)
    Cpk = 1.65 (the Central Limit Theorem makes things look better)
     
    Ppk, calculated in JMP:

    Using the same 100 random numbers generated above
    X_bar = 0.457
    s = 0.284
    Ppk = Min(0.536,0.63)
    Ppk = 0.536
     
    Note:  both s_est and s are estimates of the population’s standard deviation.  In other words, they are both snapshots of the process over the period of time in which the data was collected. 
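    The same comparison in a quick sketch (Python with numpy; spec limits of 0 and 1 are implied by the uniform data). As the posts below point out, with the division by three included the two indices come out close:

    import numpy as np

    rng = np.random.default_rng(1)
    data = rng.uniform(0, 1, 100)               # 100 uniform random numbers
    LSL, USL = 0.0, 1.0                         # spec limits implied by the example
    sub = data.reshape(20, 5)                   # subgroups of 5
    xbar = data.mean()

    d2 = 2.326                                  # d2 for a subgroup of 5 (2.33 above)
    s_est = (sub.max(axis=1) - sub.min(axis=1)).mean() / d2   # Rbar/d2
    s = data.std(ddof=1)                                      # calculated sigma

    Cpk = min(USL - xbar, xbar - LSL) / (3 * s_est)
    Ppk = min(USL - xbar, xbar - LSL) / (3 * s)
    print(Cpk, Ppk)     # similar values (Gabriel gets 0.554 and 0.536 on Rick's data)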
     
    Engineers in the early 1970s produced control chart limits, GR&Rs, Cpk, etc. using estimates of sigma because it could be done quickly.  The trick made the calculation quicker, but it did not make the estimate of the population sigma more accurate.  On the other hand, I do not believe that any textbook written then suggested that the estimate was the one correct way.  It was a trick to get to the needed values. 
     
    Computers have made the calculation of process capability easy.  I believe that Ppk is the original definition of Cpk.  I think Ppk was introduced to allow people stuck without computers (being polite, rather than saying stuck in the past) to continue their use of statistics.  

    #88548

    migs
    Participant

    Thanks to all!
    However, if I may add: the goal of Cpk = 2 (which has been the standard goal, at least in the semicon industry) is based, I think, on the concept of short-term Cpk, using Rbar/d2.
    Since all our customers require a Cpk of 2, would that justify using Rbar/d2?
     
     

    #88586

    Hong
    Participant

    Good question. Our company has been doing that at the division level for a number of years. It is not as simple a question as it appears. I can say that it is almost always Cpk, but how you do it depends on what you want to track and what you want the information for. Please email or call me directly; I would be more than happy to discuss it with you. I travel quite a bit, so please leave a message when you call.
    Hong
    Kaiser Aluminum
    [email protected]
    9493466700 x203 (Office)

    #88654

    Rick Pastor
    Member

    If your customer requires Cpk, you have no choice but to report it.  On the other hand, I would still track Ppk, which uses the calculated s.  Whenever you report Cpk or Ppk, you need to report how s was determined.     
     
    Let me just emphasize one point in the calculation of Cpk and Ppk.  Assume that you examine the same set of data and calculate both Cpk and Ppk.  Both require the calculation of X_bar, and for that set of data X_bar is the grand mean, so the value of X_bar is the same for both.  The difference between Cpk and Ppk is how you determine the value of s.
     
    Minitab’s Six Pack gives both Cpk and Ppk: Cpk is called the within-group capability and Ppk the overall capability.  I tried to find out how Minitab calculates Cpk, but the index in the Help menu did not reveal the defining equation. 
    The Help index of SAS Institute’s JMP shows how JMP calculates a short-term capability for a fixed subgroup size.  JMP uses the grand mean of all the subgroups, and the average of each subgroup, to calculate s.  I “think” that JMP’s short-term Cpk is Minitab’s within-group capability. 
    The only package I know of that calculates Cpk using s_est = R_bar/d2 is PQ Systems’ Chart.  Either way, it is important to know the assumptions behind any reported metric.

    #88662

    Ozarski
    Participant

    Rick,
    If you go to the Help menu for Capability Analysis, there is a “See Also” tab which lists the calculations.  Under “Estimate” you can also change the method of estimating the standard deviation.
    Stat->Quality Tools->Capability Analysis->Help->See Also->Calculations
    Jeff

    #88690

    Rick Pastor
    Member

    Jeff:
    I use SAS Institute’s JMP for my statistical analysis, so I do not have a copy of Minitab.  I had someone follow the path
    Stat->Quality Tools->Capability Analysis->Help->See Also->Calculations
    and they gave me a copy of what they found.  The “Sigma within” is the standard deviation estimate based on within-subgroup variation.  I assume this calculates a sigma using the group averages and the grand mean.  JMP calls the Cpk calculated from this sigma the short-term Cpk.  I don’t think that either JMP or Minitab uses the old method of R_bar/d2 to estimate sigma.

    #88692

    Ozarski
    Participant

    Rick,
    The default in Minitab is to use the pooled standard deviation, but there is an option under Estimate to use the Rbar method.
    Jeff

    #88694

    Savage
    Participant

    You stated, “I don’t think that either JMP or Minitab uses the old method of R_bar/d2 to estimate sigma.” Would you elaborate on what you mean by “old method?”

    #88719

    Rick Pastor
    Member

    If you go back to, say, 1950, when computers were not available and calculators were mechanical, engineers needed a method for calculating sigma in a reasonable amount of time.  Since the range is a subtraction and R_bar is mostly addition, it was easy to estimate sigma from R_bar/d2.  In today’s world, there is no reason to estimate Cpk using R_bar/d2; therefore, I called it the old method.  
     

    #88724

    Savage
    Participant

    I accept and mostly agree with this statement. One issue I see is that computing Cpk with two different sigmas causes confusion, and it does not make sense to calculate one index using two different methods. Remember, before Ppk was developed, Cpk was calculated using three different methods? That was a mess.
    The AIAG SPC manual shows Cpk being calculated using Rbar/d2. Are you suggesting that we can also calculate Cpk using Sbar/c4, where Sbar is the average of the sample standard deviations? If you are suggesting this, how should Cpk be expressed so that one knows which method was used?
    Matt

    #88744

    jatin
    Participant

    Please let me know how to calculate Rbar in Excel.
    I am having difficulty getting Cpk values, since sigma = Rbar/d2.
    Also, what is d2?
    I have 100 figures (data for 100 batches). And what is a subgroup? I have no subgroups.
    Please help.
     
    Thanks a lot,
    jatin

    #88769

    Hemanth
    Participant

    Hi Jatin,
    I am not sure if any of the other messages in this thread answer your question. d2 is a constant in the formula for the standard deviation used in control charts. It can be obtained from a table (try searching this site and Google for tables of control chart constants). Lots of books have these tables at the end; try Statistical Quality Control by Grant, which will answer all your questions, including the one about sample size in your case. It is available in pretty much any technical book store.
    Happy control charting….

    #88770

    Hemanth
    Participant

    The customer is god. Go ahead and report it.

    #88775

    Rick Pastor
    Member

    First you have to determine your subgroup size.  The size of the subgroup is something that you decide before you start the calculation.  A subgroup size of 5 is used a lot. 
     
    A subgroup size of 5 (5 batches with one data point per batch) gives d2 = 2.33.  If you want a different subgroup size, let me know and I can look up the number in a statistics book. 
     
    Assuming that you have your 100 data points from 100 batches in a column, with the first data point in A2, you would do the following:
     

    In cell B2 calculate   =MAX(A2:A6)-MIN(A2:A6). 
    Copy and paste cell B2 into cell B7.  Cell B7 then holds the range for the next subgroup. 
    Continue the copy and paste until you have 20 subgroups. 
    R_bar is the average of the 20 subgroup ranges. 
     
    This may not be the most efficient method of calculating R_bar, but it will get you there.  If you have more than one data point per batch, then you need to do things a little differently.
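    The same arithmetic as the Excel steps above, sketched in Python with numpy (the file name is hypothetical; any column of the 100 batch values will do):

    import numpy as np

    data = np.loadtxt("batches.txt")       # hypothetical file: 100 batch values
    sub = data.reshape(20, 5)              # consecutive subgroups of 5 (A2:A6, A7:A11, ...)

    ranges = sub.max(axis=1) - sub.min(axis=1)   # MAX - MIN for each subgroup
    Rbar = ranges.mean()                         # R_bar = average of the 20 ranges
    d2 = 2.326                                   # table constant for subgroup size 5
    sigma = Rbar / d2                            # estimated sigma for the Cpk calculation
    print(Rbar, sigma)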
     
    With the 100 batches, you may have been running X-bar and R control charts.  The subgroup size is normally determined when you establish the control charts.  For example, someone may decide to measure 5 random samples per batch.  The sample size is determined by balancing several factors:

    Materials cost of the measurement
    Human resources available to make the measurements
    Time to make the measurement
    Consequences (cost/liability) of making a bad batch
    And there are probably more.
     
    Hope this helps.
     
     
     

    #88776

    Rick Pastor
    Member

    Matt:
    What I am suggesting is that we use the term Cpk for all the different calculation methods and then specify how we did the calculation.  If you want to calculate s_est using R_bar/d2, just tell me that that is how you did it.  If you are my supplier, I may ask you to calculate Cpk using a different method that I would then specify. 
    Even if I am making a control chart in Excel, I would not calculate s using R_bar/d2.  I would use the standard formula for calculating s (which is still an estimate of the population’s s).  Of course, I would be using the x_bars of the subgroups to calculate s. 
    My original point involved the following question: what is the probability that a manager picked at random from all the companies in the USA knows the difference between Cpk and Ppk?  My guess is that the probability is well below 50%.  

    #88790

    Gabriel
    Participant

    Rick, there is an error in the calculation of Cpk in your example. It is 0.554 (not 1.65), and that is comparable to Ppk = 0.536.
    Besides that, I find this discussion very interesting. I want to say a few things:
    1) Summary: what to report, Cpk or Ppk? I go for both. They show different things, and the comparison between them can be enlightening.
    2) Both Cpk and Ppk are based on estimations of the process sigma, since both are computed from a sample and not from the whole process population. The main difference is NOT that one is calculated as Rbar/d2 and the other as the standard deviation of the raw data. In fact, the sigma estimation for Cpk can also be calculated using not the subgroup ranges but the subgroup standard deviations, in which case you use Sbar/c4 instead of Rbar/d2. As you said, in the old days calculating a standard deviation was a problem, especially on the shop floor. Now calculators and computers make it no more difficult than calculating a range. That was certainly one reason (not the only one, I think) for the use of R in those days. But we can switch to the subgroup S instead of R, and it will still be Cpk, not Ppk.
    The main difference between the sigma estimations used for Cpk and Ppk is that those used for Cpk use ONLY data from WITHIN THE SUBGROUP. If the subgroups are rational (as they should be), there will typically be no variation due to special causes within a subgroup (at least, not a special cause that affects only some of the individuals within the same subgroup). This sigma is called the “within-subgroup standard deviation”, a measure of the “within-subgroup variation”, which I think is the most accurate name for it (as you said, “short term” is, at best, misleading). With Ppk, on the other hand, you forget about subgroups. If anything, it is not subgroups of size 1 (as I understood from what you said) but one big subgroup of size N (all the raw data). The standard deviation calculated in this way is the “total standard deviation”, the standard deviation of the “total variation” (again, “long-term variation” is misleading). So we have S(within) and S(total); a sketch of both calculations follows at the end of this post.
    Now the typical question: can Ppk be greater than Cpk? No, since S(total) can never be smaller than S(within). At best, S(total) = S(within) if there is no between-subgroup variation (in fact I should say “no more between-subgroup variation than is expected for that within-subgroup variation”, but for the sake of simplicity I will leave it like that). As said before, in a stable process there is no “special variation” between subgroups, so S(total) = S(within) = no special variation = stable process. Can I “calculate” a Cpk and Ppk and find Ppk to be greater than Cpk? Yes, that may happen, because the calculated Cpk and Ppk are not the real Cpk and Ppk. They are estimations that use two different estimations of sigma, so, due to sampling variation, the estimation of sigma(within) can be different (smaller or greater) than the estimation of sigma(total), even when, in fact, the process was perfectly stable and the true process sigma(within) = sigma(total).
    But let’s come back to Earth. No real process is perfectly stable. Sometimes you will not see that as an out-of-control signal on the chart (there is a big chance that when you see an OOC signal there is special variation around, but the inverse is not true, especially if the variation due to special causes is not very big). Ppk is “Process Performance”. It describes the distribution of the process parts in a given span of time, so it is in line with “actual customer satisfaction”. If the process is stable, however, you can go further and say that you expect that Ppk to be sustained in time.
    On the other hand, Cpk is not performance, since it does not include all the process variation unless the process is perfectly stable (which, as said, just doesn’t happen). It could be said that Cpk is “what the Process Performance would be if we eliminated all the special variation”, and suddenly the term “Process Capability” matches Cpk perfectly. In this way, Cpk can be related to “potential customer satisfaction”.
    3) I’ll tell you what we do. We have SPC charts. The control limits give you what we could call the “assumed” process average and process standard deviation, and you test the process data against those limits. Periodically (for example, monthly) we take the last chart from each process and compute Xbarbar, S(within), S(total), Cp, Cpk, Pp and Ppk. Then we plot three charts as time series, where each point is one month. In one chart we plot Cp(assumed), Cp(computed for this last chart), and the 95% confidence interval for that computed Cp (see the sketch at the end of this post). While the control limits are not changed, Cp(assumed) shows a straight horizontal line. Typically, Cp(computed) will oscillate around Cp(assumed) due to sampling variation, and Cp(assumed) will be inside the confidence interval for Cp(computed), meaning that the difference between Cp(computed) and Cp(assumed) is not unexpected. If you see a trend, or Cp(assumed) falls outside the confidence interval, then it is time to question whether the process has changed and to recalculate the limits. That need not be a problem; the goal is for the process to improve, so you want to see the trend and recalculate the limits. By the way, we are now thinking of adding the Pp plot to the same chart. We plot exactly the same chart for Cpk; comparing the Cp and Cpk charts shows whether process location is an issue. The third chart is for S. We plot S(within), S(total) and S(assumed). Again, as long as you don’t change the control limits, S(assumed) is a horizontal straight line. In a stable process with well-calculated control limits, S(within) and S(total) will both independently oscillate around S(assumed), but the typical case is that S(within) oscillates around S(assumed) while S(total) sits somewhat higher, with a downward trend as special causes are eliminated.
    4) How do you use that information? Is Ppk (customer satisfaction) good enough? If yes, that’s it. Go for “continuous improvement”, but this process will not be a priority unless you have no worse case to work on. If not, why? Is Pp good enough? If yes, center the process to get a good Ppk (if it can be centered, of course). If not, is Cp good enough? If yes, then work on the special variation: use SPC to find and eliminate the special causes of variation one by one. That typically requires “local actions” that can be taken by operators and supervisors. If not, then the system is not capable of meeting the required level of customer satisfaction. Centering the process and eliminating special causes will help, but you will not reach the desired level. The system itself must be improved, which may mean, for example, buying a new machine or overhauling and/or upgrading the existing one, changing the process specifications, changing the product specification (a tolerance, a material), or just using a different process (for example, grinding instead of turning). This typically requires management action and is not something an operator can handle. Either way, once the action plan is in place, monitor its effectiveness and adjust the control limits as needed.
    So, what to report, Cpk or Ppk? Again, I go for both. They show different things, and the comparison between them can be enlightening.
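    A sketch of the quantities in points 2) and 3), assuming Python with numpy and scipy, subgroups of 5, and hypothetical spec limits of 0 and 10; the interval is the usual chi-square confidence interval for Cp, simplified here to n-1 degrees of freedom:

    import numpy as np
    from scipy.stats import chi2

    LSL, USL = 0.0, 10.0                      # hypothetical spec limits
    sub = np.random.normal(5, 1, (20, 5))     # 20 subgroups of 5 (stand-in data)
    data = sub.ravel()
    n = data.size
    xbar = data.mean()

    c4 = 0.9400                               # c4 constant for subgroup size 5
    S_within = sub.std(axis=1, ddof=1).mean() / c4   # Sbar/c4 (Rbar/d2 also works)
    S_total = data.std(ddof=1)                       # all the raw data, no subgroups

    Cp  = (USL - LSL) / (6 * S_within)
    Cpk = min(USL - xbar, xbar - LSL) / (3 * S_within)
    Pp  = (USL - LSL) / (6 * S_total)
    Ppk = min(USL - xbar, xbar - LSL) / (3 * S_total)

    # 95% confidence interval for the computed Cp (chi-square interval on sigma)
    lo = Cp * np.sqrt(chi2.ppf(0.025, n - 1) / (n - 1))
    hi = Cp * np.sqrt(chi2.ppf(0.975, n - 1) / (n - 1))
    print(Cp, (lo, hi), Cpk, Pp, Ppk)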

    #88794

    Gabriel
    Participant

    “The AIAG SPC manual shows Cpk being calculated using Rbar/d2. Are you suggesting that we can also calculate Cpk using Sbar/c4?”
    That’s wrong. AIAG’s SPC manual says to calculate Cpk using the estimation of the inherent process variation, which can be Rbar/d2 or Sbar/c4. See pages 67, 79 and 80, and don’t let the word “typically” used on page 80 confuse you.
    If you unwind the thread of definitions, you will find that Cp is the “capability index defined as the tolerance width divided by the process capability”, that the “process capability” is “the 6 sigma range of the inherent process variation”, and that the “inherent process variation” is “the portion of process variation due to common causes only”, which can be ESTIMATED by Rbar/d2 or Sbar/c4. Once you use one of them, it is just an estimation, not the real inherent process variation; from it you get just an estimation of the process capability, and from that just an estimation of the capability index Cpk. So the definition of Cpk does not change; only the way of estimating it changes. I therefore don’t agree that different ways of estimating the same thing should lead to different names for the thing itself.
    But with Cpk and Ppk, it is not just two ways of estimating the same thing. You are estimating different things (inherent variation vs. total variation), and hence two different names are needed.
    And for the sake of sanity: every Cp, Cpk, Pp and Ppk that you can write down on paper is an estimation, and these estimations are all calculated (they are not guesstimates). The difference between Cpk and Ppk is NOT that one uses an estimated sigma while the other uses a calculated sigma.
    OK, not everybody thinks like that, so I guess that is just my view. My very strong view, as you can see.

    #88801

    Savage
    Participant

    If we use the same term, Cpk, and have different methods to calculate it, this may lead to confusion. Imagine two people calculating “Cpk” for the same data set, one using R-bar and the other S-bar. The results may differ, and I can just imagine the arguments over which is correct. (Both would be correct; they just used different methods.)
    Also, my guess is less than 10%.

    #88929

    Rick Pastor
    Member

    Matt:
    It has been a while since I have had a chance to follow the discussion.  First, it was pointed out that I made a mistake in the calculation of Cpk using R_bar/d2.  When I went back and checked the results, I found that I had failed to divide by three.  The Cpk and Ppk for a uniform distribution of 100 points were much closer in value. 
    Nevertheless, I know of at least two ways that Cpk is being calculated, and we do not spell out the assumptions.  Both methods use a fixed subgroup size.  One uses R_bar/d2 to estimate s.  The second method calculates s around the grand mean of all the subgroups, using the standard formula s = sqrt[ sum( (Xi - X_bar_bar)^2 ) / (n - 1) ].  The latter of the two is called the short-term sigma, and it matches the sigma in control charts. 
    If a Cpk is reported without specifying the calculation method, I see confusion.      
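    For what it’s worth, a sketch of the two calculations as I read them (Python with numpy). The second is written as the pooled within-subgroup standard deviation, Minitab’s default per Jeff’s post above; Minitab may also apply an unbiasing constant, omitted here:

    import numpy as np

    sub = np.random.normal(5, 1, (20, 5))        # 20 subgroups of 5 (stand-in data)

    # Method 1: sigma estimated from the average range, Rbar/d2
    d2 = 2.326
    s_rbar = (sub.max(axis=1) - sub.min(axis=1)).mean() / d2

    # Method 2: pooled within-subgroup sigma; deviations are taken from each
    # subgroup's own mean, then pooled over all subgroups
    ss = ((sub - sub.mean(axis=1, keepdims=True)) ** 2).sum()
    s_pooled = np.sqrt(ss / (sub.size - sub.shape[0]))       # dof = N - k
    print(s_rbar, s_pooled)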

    #89633

    Rick Pastor
    Member

    I have been out of the loop on this discussion for roughly a month and changed e-mail addresses during that time.  Now I am back, and I shall ask two questions.  Gabriel, you said, “But with Cpk and Ppk, it is not just two ways of estimating the same thing. You are estimating different things (inherent variation vs. total variation), and hence two different names are needed.” 
    You claim that Cpk reflects inherent variation: 

    Question: Does s_est = Rbar/d2 estimate the s calculated from the equation that defines s, “the calculated s”?
    Question: What property of the data makes Cpk inherent?
     Assumption: Both Cpk and Ppk are based upon the same subpopulation.

