Subgroups in Capability Analysis

Six Sigma – iSixSigma Forums Old Forums General Subgroups in Capability Analysis



    I’m looking at data collected on a per-shift basis.  For conducting a capability analysis, I have an option to use subgroups.  Whether I use a value of 1 (individual observations) or a value of 3 (for 3 shifts), the numbers generated do not change.  Could someone please explain how subgroups work?  I thought it had to do with grouping individual observations, but the more I think about it, the more I struggle to truly understand how this works.  Any help is appreciated!
    Thank you!


    Kim Niles

    Dear Jason: 
    Since no one else has commented, I’ll throw in my 2 cents worth but take it with a grain of salt as it’s not within my area of expertise. 
    Sub-groups work via the central limit theorem: averages of sub-groups tend toward a normal distribution even when the individual measurements do not. So subgrouping doesn’t change the statistical confidence of most statements formed from groups of those measurements; it mainly improves the measurement accuracy when the underlying distribution is not perfectly normal. 
    Regarding selecting subgroups, that’s really a different subject that depends upon what you want to understand or measure and how non-normal you think the data might be.  Using subgroups of three (one per shift in your case) would have the effect of normalizing the overall process and give you the best overall accuracy for any general statements you might form around your process Cpk.  However, what if one shift is producing more problems than another?  In practical terms, as I understand your situation, if you suspect that your data does form a normal distribution, you might be better off forming and comparing three different Cpk values, made from individual points, one for each shift.
    I hope that helps. 
    KN – –    
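    Kim’s suggestion of one Cpk per shift can be sketched as follows. This is a minimal illustration, not from any package: the spec limits, readings, and the `cpk` helper are all invented for the example.

    ```python
    import statistics

    # Assumed spec limits, purely for illustration
    LSL, USL = 9.0, 11.0

    # Hypothetical readings, one list of individual points per shift
    shift_data = {
        "Shift A": [10.1, 10.3, 9.9, 10.2, 10.0, 10.4],
        "Shift B": [10.6, 10.8, 10.5, 10.9, 10.7, 10.6],
        "Shift C": [9.5, 9.8, 9.6, 9.4, 9.7, 9.6],
    }

    def cpk(values, lsl, usl):
        """Cpk from individual values: distance of the mean to the
        nearer spec limit, in units of three standard deviations."""
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation
        return min(usl - mean, mean - lsl) / (3 * sd)

    for shift, values in shift_data.items():
        print(f"{shift}: Cpk = {cpk(values, LSL, USL):.2f}")
    ```

    A shift whose Cpk stands well below the others is the one worth investigating first.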


    Gerry Murphy

    I have come across this before.
    Three shifts often operate as three completely different processes. Consider each individually. Compare the process averages and variation; this will point the way to where the improvement opportunities lie.
    One shift could be running the plant flat out with high average production; the next shift picks up the pieces and has to do a lot of repairs and maintenance; the middle shift then has to adjust everything to get back on track for the cowboys on the first shift to reap the benefit!
    Solution: train, redeploy, etc. the cowboys. Output and quality go up as the differences between the shifts are significantly reduced.
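    Gerry’s “compare the averages and variation” step is simple to do directly; a minimal sketch with made-up output figures:

    ```python
    import statistics

    # Invented daily output figures, one list per shift
    shifts = {
        "Shift 1": [102, 105, 103, 107, 104],  # running flat out
        "Shift 2": [96, 94, 97, 95, 93],       # picking up the pieces
        "Shift 3": [99, 100, 98, 101, 100],    # adjusting back on track
    }

    # Side-by-side mean and spread make the shift differences obvious
    for name, output in shifts.items():
        mean = statistics.mean(output)
        sd = statistics.stdev(output)
        print(f"{name}: mean = {mean:.1f}, std dev = {sd:.2f}")
    ```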



    It seems to me that some confusion is caused by 2 distinctly different uses of the term “sub-group” that often appear in the one discussion without being distinguished.
    1. A sub-group can be used when collecting data, with the average of n individual readings from the same process being used in the later calculations.
    This can help to meet the normality requirement of many statistical tools (see below).
    2. The output being studied can be divided into sub-groups to analyse contributions from multiple processes, e.g. each shift may have different parameters that effectively make them different processes with different capability.
    The sub-groups in Jason’s question seem to be the first kind. I presume software is being used; it may assume that the values are already subgroup averages of n readings, or treat the input as raw data that it averages in sub-groups of size n. Check the formula in the manual to see what is happening. In any case, I think the number of shifts is incidental to the calculation.
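    To illustrate the first kind of sub-group: a rough Python sketch of how capability software typically estimates two different sigmas from subgrouped data. The spec limits and readings are invented, and the pooled-variance estimate here stands in for whatever within-subgroup formula a given package actually uses (often Rbar/d2 or Sbar/c4). If the within and overall sigmas come out similar, the capability numbers barely change with subgroup size, which may be what Jason is seeing.

    ```python
    import math
    import statistics

    LSL, USL = 9.0, 11.0  # assumed spec limits

    # Raw readings grouped into consecutive subgroups of size 3
    subgroups = [
        [10.0, 10.1, 9.9],
        [10.2, 10.3, 10.1],
        [9.7, 9.8, 9.6],
        [10.4, 10.5, 10.3],
    ]

    flat = [x for sg in subgroups for x in sg]
    mean = statistics.mean(flat)

    # Overall (long-term) sigma: ignores the subgrouping entirely
    sigma_overall = statistics.stdev(flat)

    # Within-subgroup (short-term) sigma: pooled across subgroups
    # (equal subgroup sizes, so pooling reduces to a mean of variances)
    pooled_var = statistics.mean([statistics.variance(sg) for sg in subgroups])
    sigma_within = math.sqrt(pooled_var)

    # Within-sigma drives Cpk; overall sigma drives Ppk
    cpk = min(USL - mean, mean - LSL) / (3 * sigma_within)
    ppk = min(USL - mean, mean - LSL) / (3 * sigma_overall)
    print(f"sigma_within = {sigma_within:.3f}, sigma_overall = {sigma_overall:.3f}")
    print(f"Cpk = {cpk:.2f}, Ppk = {ppk:.2f}")
    ```

    Here the subgroup means drift, so the within-subgroup sigma is much smaller than the overall sigma and Cpk exceeds Ppk; with stable data the two sigmas, and hence the two indices, would nearly coincide.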
    Also check out the Article Archives for some good reading – see under Quick Access on left of the screen. These are a few I found that might help:
    Make Valid Control Chart and Subgroup Assumptions
     – the section Control Charts Subgrouping By Machine Nozzle has a great example of the 2 uses of subgroup being intertwined.
    Should You Use A Mean Or Individuals Control Chart?
    Are You Sure Your Data Is Normal? 
    – Page 3 > Methods For Handling Non-Normal Data


    Jonathon L. Andell

    The trouble with subgroups is that we try to anticipate which sampling scheme will tell us what we want to know, before we have gathered the data which contains the real answers. I would advise you to consider a sampling plan called Multi-vari on a trial basis. It will tell you a lot about what subgroup scheme you should use as you go forward. Good hunting.


    Andy Urquhart

    I agree with Jonathon because Cp and Cpk are ‘individual’ metrics by definition. Many years ago I used to calculate Cp and Cpk indirectly from Shewhart charts. This used to confuse the poor production manager, because a low Cpk implies that some parts had been made out of tolerance, which was not the case. (The data came from a multi-normal distribution!) Remember, much of statistical theory assumes independence (randomness), homogeneity (equal variance) and normality, which is often not the case. The great advantage of the Austin Motorola multi-vari chart was that it provided both univariate and multivariate analysis.
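    Andy’s low-Cpk-with-no-bad-parts situation is easy to reproduce. A hedged sketch with invented numbers: two tight groups (say, two shifts or machines) sit at different levels, the combined standard deviation is inflated by the gap between them, and Cpk comes out below 1 even though every individual part is comfortably inside the tolerance.

    ```python
    import statistics

    LSL, USL = 0.0, 10.0  # assumed tolerance, for illustration only

    group_a = [3.0, 3.1, 2.9, 3.0, 3.1]  # one group, tight spread
    group_b = [7.0, 6.9, 7.1, 7.0, 6.9]  # another, equally tight but offset

    combined = group_a + group_b
    mean = statistics.mean(combined)
    sd = statistics.stdev(combined)  # inflated by the gap between groups

    cpk = min(USL - mean, mean - LSL) / (3 * sd)
    print(f"Combined Cpk = {cpk:.2f}")  # low, yet no part is out of tolerance
    assert all(LSL <= x <= USL for x in combined)
    ```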
