Long and short term capability studies
 This topic has 13 replies, 6 voices, and was last updated 20 years, 2 months ago by Mike Carnell.

June 7, 2002 at 9:08 pm #29606
Some industry standards say “Both long and short capability studies shall be performed”. What are the definitions of long and short capability studies? Are the calculation methods different between the two?
If we want to combine the Cpk values of different machines for one process, what would the calculation be? Thank you for your help.

June 7, 2002 at 9:50 pm #76205
Gabilan (@Gabilan)
Please open Statistics & Analysis, Process capability; then you will have an understanding of this issue.

June 7, 2002 at 10:08 pm #76206
Gabriel (@Gabriel)
This is not “the definition”, but my understanding:
A long study is one where all potential causes of variation (normal and special) have the time or chance to show (different operators, machine setups, raw material batches, gages, perishable tools, day/night, etc.). If not, it is short. The length depends on the process, but typically a long study is about one week or more and a short study ranges from 100 consecutive pieces (which can be a few minutes) to two shifts.
The Cp/Cpk formula is not time or “study length” dependent. The same is valid for the Pp/Ppk formula. But I do not agree with using Cp/Cpk on short studies, because a prerequisite for Cp/Cpk is stability, i.e. absence of special causes of variation. And given what I said a short study is, the process may have special causes of variation that had no time to show in a short study. If so, the Cp/Cpk figures would be meaningless. Pp/Ppk can be calculated in both studies, but be aware that those figures will not say anything about “capability” in the sense of “what can I expect the performance of the process to be”; they will tell you “how your process performed during the study”. Try searching posts by keyword using “Cp, Cpk” and “Cpk against Pp”.
Just my opinion…
Gabriel
PS: I don’t understand the question about “combine Cpk values of different machines”.

June 8, 2002 at 3:59 am #76211
Thank you very much for your comments, which are very beneficial. I will look into them more as you suggested.
By the way, I saw a short-term capability formula written as Cp = (USL - LSL)/8s, but I am not sure that this formula is common.
Regarding my second question, let’s say there is an etching process using five machines to process many products at one time. Although all machines do the same thing, each machine shows a slightly different mean and standard deviation, which means that each machine shows a different data distribution and each has a different Cpk. I want to combine the five Cpks into one to present as a Cpk for the etching process. How would you calculate it? Average them?
I appreciate your help.

June 8, 2002 at 3:31 pm #76213
Chris Seider (@cseider)
The formula you posted should use 6s, not 8s. Also, with regard to “combining” the Cpks, the following should be considered.
I have heard others discuss calculating an average Cpk and find this amusing. One would only want to combine data from 5 different machines if they produced the same component or product, or ran a similar process. The way to combine would be to first make sure that each machine is represented by the same percentage of data points as it contributes to the needs of the downstream customer. In other words, if machine #1 contributes only 10% of the volume, don’t combine 30 data points from that machine with 30 points from each of the other 4 machines, or else you will get a misleading capability for the process (which is defined as all 5 machines). This does not mean cherry-picking data points to fit; get more samples from the other machines to compensate, or select randomly to get the appropriate weighting.
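The proportional sampling described above can be sketched in Python as follows. This is only an illustration: the machine names, weekly volumes, and the total of 200 samples are hypothetical, and the largest-remainder rounding is one possible way to make the counts add up.

```python
def allocate_samples(volume, total_samples):
    """Split total_samples across machines in proportion to their volume.

    volume maps a machine label to an integer production volume.
    Largest-remainder rounding keeps the counts summing to total_samples.
    """
    total_volume = sum(volume.values())
    counts = {}
    remainders = {}
    for machine, v in volume.items():
        # Integer arithmetic avoids floating-point rounding surprises.
        counts[machine], remainders[machine] = divmod(total_samples * v, total_volume)
    leftover = total_samples - sum(counts.values())
    # Hand any leftover samples to the machines with the largest remainders.
    for machine in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[machine] += 1
    return counts

# Hypothetical weekly volumes: machine M1 supplies 10% of the parts.
weekly_volume = {"M1": 100, "M2": 225, "M3": 225, "M4": 225, "M5": 225}
plan = allocate_samples(weekly_volume, 200)
print(plan)
```

With these made-up volumes, M1 gets 10% of the samples and the other four machines share the rest equally.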
After getting all the data points together, be sure to do an Anderson-Darling normality test to confirm that the assumption of a normal distribution holds for the formula you described. If it fails and you cannot justify it (granularity, lack of samples, etc.), then you should consider another analysis OR be aware of the assumption when reporting a Cp. When doing the Cp analysis, be sure to use all the data points and one set of specifications (common to all machines, or else they cannot be part of the same process as I defined above). Be aware that you are starting to blur the short-term capability definition when you combine the effects of different operators, machines, etc.; you are moving closer to a long-term capability definition (though I doubt you have accounted for potential environmental or supplier variation in your study).
Do not overlook the power of knowing the Cp or Cpk of the 5 machines on an individual basis, because this may be an important source for solving whatever business issue you have embarked upon. Do be aware that, just as the standard deviation you have calculated for each machine has a confidence interval, the Cps calculated also have confidence intervals. In other words, do not get too excited if one machine seems to be 0.1 higher or lower than all the rest.

June 11, 2002 at 6:16 pm #76320
Thank you very much.
I learned a lot!

June 11, 2002 at 8:26 pm #76328
Gabriel (@Gabriel)
A few words more…
As the colleague said, the right formula for Cp is with 6s and not 8s. But “s” is not just any standard deviation. It is a standard deviation obtained using only “within subgroup” variation (no subgroup-to-subgroup variation is included). This is typically estimated as Rbar/d2 if you are using an Xbar/R control chart. The formula for Pp is exactly the same as for Cp, except that in this case s is the total variation, calculated with all data together. For Cpk and Ppk you replace (USL - LSL)/6s by min(USL - Xbarbar; Xbarbar - LSL)/3s.
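A minimal Python sketch of the formulas just described, assuming equal-size subgroups of 2 to 5 parts. The function name, the measurements, and the spec limits (9 and 11) are hypothetical; the d2 values are the usual control-chart constants.

```python
import statistics

# d2 constants relating the average range to sigma, by subgroup size.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}

def capability(subgroups, lsl, usl):
    """Return Cp, Cpk (within-subgroup sigma) and Pp, Ppk (total sigma)."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size")
    all_points = [x for g in subgroups for x in g]
    xbarbar = statistics.mean(all_points)
    rbar = statistics.mean([max(g) - min(g) for g in subgroups])
    s_within = rbar / D2[n]                  # Rbar/d2, within subgroups only
    s_total = statistics.stdev(all_points)   # all data together
    nearest = min(usl - xbarbar, xbarbar - lsl)
    return {
        "Cp": (usl - lsl) / (6 * s_within),
        "Cpk": nearest / (3 * s_within),
        "Pp": (usl - lsl) / (6 * s_total),
        "Ppk": nearest / (3 * s_total),
    }

# Hypothetical data: tight subgroups whose averages drift between subgroups.
subgroups = [
    [9.9, 10.1, 10.0, 10.2, 9.8],
    [10.4, 10.6, 10.5, 10.7, 10.3],
    [9.4, 9.6, 9.5, 9.7, 9.3],
]
result = capability(subgroups, lsl=9.0, usl=11.0)
print(result)
```

Because the subgroup averages drift, the total s is larger than the within-subgroup s, so Pp comes out well below Cp for this made-up data.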
About combining different processes in one Cp: for me this has a meaning only if the processes have comparable distributions. But you said that the machines have “slightly different averages and s”. I imagine that this is the result of measuring a sample. How different are they? Can this difference be attributed to sampling variation, or is it due to real process differences? You must have averages that are not significantly different and s values that are not significantly different between processes (i.e. you must not be able to reject Ho in the corresponding hypothesis tests). Another way is to fill in a control chart with parts from all the machines, but where each subgroup comes from one machine. If you find out-of-control signals, then the “special cause” may be “different machines”. If you find no OOC signal, then you can assume that all parts come from the same process (or processes with the same distribution). Put a different way, if you take a big sample from one of the machines and you can tell from the measurements which machine it came from, forget about combining the Cp. Why? If you combine in one population individuals from several populations with different averages, then you can get a multimodal distribution (even if the individual distributions are normal). If you combine distributions with different sigmas, then you get a combined distribution that does not match the normal (to visualize it, imagine superimposing two Gauss curves, one with a very high sigma and one with a very small sigma). What is the meaning of Cp in such a case? AIAG’s SPC handbook dedicates a full appendix to sampling from a multiple-stream process (many machines, multispindle turning, an injection mould with many cavities, etc.). See Appendix A (pages 131 to 135). I think it will really help you.
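The control-chart check mentioned above (each subgroup taken from a single machine) could look roughly like this in Python. It is a sketch under assumptions: subgroups of exactly 5 parts, the usual Xbar-R constants for that size, and made-up measurements for three hypothetical machines A, B and C.

```python
import statistics

# Shewhart Xbar-R chart constants for subgroups of size 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def out_of_control(subgroups):
    """Return the machine labels of subgroups outside the Xbar or R limits.

    subgroups is a list of (machine_label, [5 measurements]) pairs, each
    subgroup taken from a single machine.
    """
    if any(len(values) != 5 for _, values in subgroups):
        raise ValueError("this sketch only handles subgroups of size 5")
    means = [statistics.mean(values) for _, values in subgroups]
    ranges = [max(values) - min(values) for _, values in subgroups]
    xbarbar = statistics.mean(means)
    rbar = statistics.mean(ranges)
    lcl_x, ucl_x = xbarbar - A2 * rbar, xbarbar + A2 * rbar
    lcl_r, ucl_r = D3 * rbar, D4 * rbar
    flagged = []
    for (label, _), m, r in zip(subgroups, means, ranges):
        if not (lcl_x <= m <= ucl_x) or not (lcl_r <= r <= ucl_r):
            flagged.append(label)
    return flagged

# Hypothetical data: machines A and B run near 10.0, machine C near 11.0.
similar = [
    ("A", [9.9, 10.1, 10.0, 10.2, 9.8]),
    ("A", [10.0, 9.9, 10.1, 10.2, 9.8]),
    ("B", [10.1, 9.9, 10.0, 10.3, 9.9]),
    ("B", [9.8, 10.0, 10.1, 10.2, 9.9]),
]
shifted = similar + [
    ("C", [10.9, 11.1, 11.0, 11.2, 10.8]),
    ("C", [11.0, 10.9, 11.1, 11.2, 10.8]),
]

print(out_of_control(similar))   # A and B alone
print(out_of_control(shifted))   # A, B and C mixed on one chart
```

In this made-up data the chart with only A and B shows no signals, while adding machine C produces out-of-control points, which is the hint that "different machines" is a special cause. A real study would use many more subgroups than this.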
Finally, I don’t agree with the method of weighting the s of each process to get one “combined” s. Never mind if one machine produces 90% of the production and another 10%. If you get as far as calculating the combined s, it is because you have reasons to assume that the distribution is the same in all machines. Then, statistically, they can be considered as “one population”. If you have the s values from 3 samples of equal size from the same population (or from 3 populations that are mathematically the same), the best estimator of the population’s s is:
s^2 = (s1^2 + s2^2 + s3^2)/3 (never average standard deviations; average variances instead)
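A short Python sketch of averaging variances rather than standard deviations. The three samples are made up. The function weights each variance by its degrees of freedom, which is the standard pooled-variance generalisation and is an addition here, not something stated in the post; with equal sample sizes it reduces to the plain average above. Note that these weights are sample sizes, not production volumes.

```python
import math
import statistics

def pooled_sd(samples):
    """Pool sample variances, weighting each by its degrees of freedom.

    With equal sample sizes this is the plain average of the variances.
    """
    numerator = sum((len(s) - 1) * statistics.variance(s) for s in samples)
    denominator = sum(len(s) - 1 for s in samples)
    return math.sqrt(numerator / denominator)

# Hypothetical samples with variances 1, 4 and 9 (sd 1, 2 and 3).
samples = [[1, 2, 3], [2, 4, 6], [0, 3, 6]]

sd_from_variances = pooled_sd(samples)
sd_from_averaged_sds = statistics.mean([statistics.stdev(s) for s in samples])

print(sd_from_variances)      # square root of (1 + 4 + 9) / 3
print(sd_from_averaged_sds)   # (1 + 2 + 3) / 3, which is smaller
```

Averaging the standard deviations directly understates the pooled value whenever the individual s values differ.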
That is my opinion
Good luck
Gabriel

June 11, 2002 at 9:00 pm #76331
Mike Carnell (@MikeCarnell)
I agree with most of what Gabriel said. You need to remember that the stats you put together are meant to translate some group of data into information, not just to be mechanical calculations. When you are trying to figure out how to do a calculation (what data to use), you should start with 2 questions:
1. What do I want to know?
2. How do I want to see it?
If we use your 5 machines as an example and they are all making the same product and your customer can have product shipped from any one machine or a combination of machines and you want to describe the material as it hits his dock you pool all the data (what is the definition of a run on sentence). That is what your customer will see.
If you have dedicated machines then you don’t pool the data.
If you are trying to figure out if the machines all run the same (internal focus) you keep them separate.
The stats calculations are mechanical and should be relatively simple. Figuring out what story you are trying to tell with the stats is a function of understanding what it is you need to know.
Just my opinion.
Good luck.

June 11, 2002 at 9:37 pm #76333
I have a suggestion that might help you discern long-term and short-term capability. Take your entire data set and run the capability analysis; this will give you your long-term capability. Analyze the same data on a run chart, “brush/identify” the consecutive points that are behaving ideally, and rerun your capability analysis on them to get a short-term capability. The goal of a short-term capability is to understand the best your process can do. This is a brute-force way of doing it, but hopefully it will give you a stratification of your best short-term process (shift/material lot/operators). After all, that is what the subgroup standard deviation of the short-term capability calculation is trying to accomplish. Once you identify your best process and practices, standardize them and drive your long-term capability to the same value as your short-term one.
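One way to mimic the "brushing" step in code is sketched below in Python. It assumes that "behaving ideally" can be approximated by the run of consecutive points with the smallest standard deviation, which is an interpretation and not something the post specifies. The data, the window width of 5, and the spec limits are hypothetical.

```python
import statistics

def best_window(data, width):
    """Return (start_index, sd) of the consecutive run with the smallest sd."""
    best = None
    for start in range(len(data) - width + 1):
        sd = statistics.stdev(data[start:start + width])
        if best is None or sd < best[1]:
            best = (start, sd)
    return best

# Hypothetical run-chart data: a quiet stretch followed by a noisy one.
data = [10.0, 10.1, 9.9, 10.0, 10.1, 10.6, 9.4, 10.7, 9.5, 10.0]
lsl, usl = 9.0, 11.0

long_term_sd = statistics.stdev(data)            # entire data set
start, short_term_sd = best_window(data, 5)      # "brushed" points only

capability_long = (usl - lsl) / (6 * long_term_sd)
capability_short = (usl - lsl) / (6 * short_term_sd)
print(start, capability_long, capability_short)
```

Since the window is chosen because it looks best, the short-term figure from this approach is an optimistic "best the process can do" number rather than a prediction.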

June 12, 2002 at 1:48 am #76336
Chris Seider (@cseider)
Gabriel,
I appreciate and agree with your comments, except that I want to clarify a point I had made, which you helped give more details on.
I did not mean to convey that one would average the 5 machines’ s.d. to get a combined s.d. I meant that by combining the data (if they still passed the normal distribution test) one could calculate a combined process sigma (again assuming all 5 machines give the same intended product).
Given the above clarification, would you still disagree with my statement that weighting of machines (as described in my earlier comment) is important if they don’t equally supply the downstream customer? My thinking is that if one machine runs 3 shifts on 5 days but the other 4 run only 1 shift/day, and all have equal hourly output, this should be considered in the shape of the population given to the downstream customer.
I will take a look at AIAG’s SPC book.

June 12, 2002 at 12:14 pm #76339
Gabriel (@Gabriel)
Seider:
I didn’t state it, but I also assumed that all machines were producing the same product. Another consideration (a point from Mike) is that they should all be supplying the same customer too.
I still think (and I am saying “think”; it is an opinion) that, once you have proved that the processes have the same distribution, the best sd will come from averaging the individual variances. That is because what you are trying to do is to reduce the “sampling error” in the calculation of sd (different samples from the same population will have different sd and Xbar). If you weight each sd you are also weighting the sampling error. If one machine provides 90% of the parts and the sd of the sample from this machine is different from the “true” sd of the process distribution (proven equal for all machines), this “error” will have a weight of 90%, thus reducing the “compensating errors” effect of averaging. Another way, again once you have proved that all machines can be considered as one, would be to calculate one unique s with the data from all machines together, thus increasing the sample size and reducing the sampling error.
Now let’s assume that the “same distributions” condition is not proven. You said to check whether the combined data meet the normal distribution. Be aware that adding distributions with different sigmas and different averages can sometimes lead to a new distribution that looks pretty normal but is very different from any of the original distributions. To make it visible with an extreme example, imagine adding two distributions each shaped like half of a normal distribution (one left and one right). In that case the calculations for the normal distribution can be done, for example to estimate global PPM. But I feel uncomfortable calling it a Cp/Cpk calculation. If you have a situation like this and make an Xbar-R chart keeping parts from the same machine within each subgroup, then you will find instability. Then Cp/Cpk are meaningless. In such a case I see two possibilities. One would be to take a representative random sample from a manufactured batch where parts from all machines are mixed (or the sampling is randomized in some other way so that every part has the same chance of being chosen); then you can make Z, PPM and even Pp/Ppk calculations (not Cp/Cpk). The other one (I like this one more) is to consider each machine as a different process and calculate Cp/Cpk and PPM for each one. You cannot combine Cp/Cpk (my opinion), but you can combine PPM. If this batch contains 10% of parts from each of machines 1 to 4 and 60% from machine 5, then PPM = 0.1 x PPM1 + … + 0.1 x PPM4 + 0.6 x PPM5.
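The PPM weighting at the end of the paragraph above is simple enough to show in a few lines of Python. The shares (10% each for machines 1 to 4, 60% for machine 5) are from the post; the per-machine PPM values and the function name are hypothetical.

```python
def combined_ppm(share, ppm):
    """Weight each machine's PPM by its share of the batch."""
    if abs(sum(share.values()) - 1.0) > 1e-9:
        raise ValueError("shares must add up to 1")
    return sum(share[machine] * ppm[machine] for machine in share)

# Shares from the post; PPM values are made up for illustration.
share = {"M1": 0.1, "M2": 0.1, "M3": 0.1, "M4": 0.1, "M5": 0.6}
ppm = {"M1": 100, "M2": 200, "M3": 150, "M4": 50, "M5": 1000}

total_ppm = combined_ppm(share, ppm)
print(total_ppm)  # approximately 650
```

The batch PPM is dominated by machine 5 because it supplies most of the parts, which is the point of weighting by share.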
What do you think about what I am saying? I would like your feedback.
Gabriel

June 12, 2002 at 6:36 pm #76350
Chris Seider (@cseider)
Gabriel,
I agree with your caution on the weighting of the sampling error for s.d. The last sentence of your second paragraph succinctly implies what I would ideally want to do: get at least 30 samples from each machine to get a best estimate of that subgroup’s variation. But I still contend that if a large discrepancy exists among the machines which supply the common customer or downstream process, it would be prudent to gather enough samples (assuming cost is minimal and measurement is nondestructive) to get 300 total samples if one machine really gives 10% of the volume (again assuming the total distribution looks normal and is not the extreme case, as you correctly described, of two abnormal distributions combining to make a normal-looking one). Now, if cost wasn’t minimal or measurement was destructive, and if forced by business needs, I think I would use the approach of weighting the ppm levels with the fewer samples available (as you discussed in paragraph 3), keep in mind the combined distribution’s look, AND convert that weighted ppm level to a Ppk (Pp) if management absolutely needed one.
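Converting a PPM level into an equivalent Ppk could be sketched as below in Python. It assumes the usual shortcut of treating all defects as falling in one tail of a normal distribution and dividing the resulting Z by 3; that shortcut and the function names are assumptions for illustration, not something the post spells out.

```python
from statistics import NormalDist

def ppm_to_z(ppm):
    """Z value whose one-sided normal tail area equals ppm per million."""
    return NormalDist().inv_cdf(1 - ppm / 1_000_000)

def equivalent_ppk(ppm):
    """Ppk that a normal process would need to produce this PPM in one tail."""
    return ppm_to_z(ppm) / 3

# 1350 PPM in one tail corresponds to Z of about 3, i.e. Ppk of about 1.
print(equivalent_ppk(1350))

# A hypothetical weighted PPM of 650 gives a somewhat higher equivalent Ppk.
print(equivalent_ppk(650))
```

The result is an "equivalent" index only; it says nothing about whether the combined distribution is actually normal.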
Of course, someone’s earlier comments (Mike’s, I believe) said to use the business needs to understand why a Cpk, Ppk, etc. is needed. It is always easy to get engrossed in getting a proper description without remembering the business case for looking at the machines.
I do agree that once you combine the 5 machines, you have just added one source of variation, so one would be hard pressed to truly call the Cpk calculations based on short-term variability, as I hinted at in my first comments to the original writer.
I appreciate the constructive, detailed discussion we’ve had on this topic. Any more thoughts on this or other items on iSixSigma would be appreciated.

June 12, 2002 at 9:10 pm #76352
Gabriel (@Gabriel)
A few comments:
The extreme example was just for easy visualization. You can have other cases. Imagine 5 machines. The averages of the processes are 7, 9, 10, 11, 13 for machines 1 to 5, and all have an sd around 2. You can imagine that, almost whatever the individual distribution shapes look like, and depending on sample size and class width, a combined histogram will look pretty normal. And this is not an extreme case if the target is 10 and there are some differences between the averages of the machines.
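The five-machine example above can be worked through numerically. The Python sketch below assumes each machine contributes an equal share of parts and takes the sd as exactly 2; the spec limits of 4 and 16 are hypothetical and chosen only to make the comparison visible.

```python
import math
import statistics

machine_means = [7, 9, 10, 11, 13]   # from the post
sd_within = 2.0                       # "around 2", taken as exactly 2 here

# For an equal-share mixture, total variance is the within-machine variance
# plus the variance of the machine averages around the overall average.
var_between = statistics.pvariance(machine_means)
sd_combined = math.sqrt(sd_within ** 2 + var_between)

lsl, usl = 4.0, 16.0                  # hypothetical spec limits
cp_single_machine = (usl - lsl) / (6 * sd_within)
pp_combined = (usl - lsl) / (6 * sd_combined)

print(sd_combined)         # about 2.83
print(cp_single_machine)   # 1.0
print(pp_combined)         # about 0.71
```

So a combined histogram that looks "pretty normal" would have an sd of roughly 2.83, noticeably wider than any single machine, and an index computed from it describes none of the five processes.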
PPM weighting does not require normality, either in the individual distributions or in the combined one. And if management needs a metric, based on PPM you can get a sigma level, which also does not need normality and will look much nicer than Ppk! :)

June 12, 2002 at 10:24 pm #76355
Mike Carnell (@MikeCarnell)
All,
While this is somewhat interesting from a statistical view, I think you have gone down a rathole (or whatever rodent is your favorite – Billybob) that is making Kiki’s solution more complex than it needs to be.
If I am producing common product on 5 machines, they should all be relatively similar in their performance. If you do an analysis of variances and means by machine, you should be able to tell whether they show a significant difference (be very careful with sample size so you do not force a difference). If they show no significant difference, the weighting is an exercise that will be of no real relevance in the real world. Statistical world, maybe.
Unless you are selling these parts to statisticians, do the individual tests. If they are equal, roll it up. The effect of the weighting is probably negligible.
Just my opinion.