iSixSigma

CLT is confusing me


Viewing 8 posts - 1 through 8 (of 8 total)
  • #50981

    newbie
    Participant

    How do you determine when “enough is enough” when invoking the CLT to get to normality?
    For example, if I generate 1000 random data points following a normal distribution and chart them in an IMR chart, I get a lot more than the expected "approx. 3 OOC points out of 1000".  If I start subgrouping at 4, 6, 8, and 10 using an XbarR chart, I get fewer and fewer OOC points until every point falls within the control limits.
    I used a control chart for the example, but my question would apply any time subgrouping is used to "eliminate" variability. Aren't there negative consequences to simply "averaging out" variability at some point?  For example, how would I know when an OOC point really occurred if I simply keep increasing n until my data falls within the limits?
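The subgrouping effect described above is easy to simulate. Here is a minimal sketch (not from the thread; it assumes known population parameters rather than limits estimated from a moving range, and the seed and subgroup sizes are arbitrary): with 3-sigma limits on subgroup means scaled by 1/sqrt(n), larger subgroups both plot fewer points and average extremes away, so fewer points fall out of control.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=0.0, scale=1.0, size=1000)

# Individuals chart with known-sigma 3-sigma limits: expect ~2.7 OOC per 1000.
ooc_individuals = int(np.sum(np.abs(data) > 3.0))

# Subgroup means: limits tighten to 3*sigma/sqrt(n), but extremes are
# averaged away within each subgroup and far fewer points are plotted.
ooc_by_n = {}
for n in (4, 6, 8, 10):
    means = data[: (len(data) // n) * n].reshape(-1, n).mean(axis=1)
    ooc_by_n[n] = int(np.sum(np.abs(means) > 3.0 / np.sqrt(n)))

print("individuals OOC:", ooc_individuals)
print("subgroup-mean OOC by n:", ooc_by_n)
```

Note that fewer OOC signals here is not better detection: with genuinely in-control data, both charts have roughly the same false-alarm rate per point, but the subgrouped chart simply plots fewer points.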

    #176048

    Darth
    Participant

    BINGO!!!  Averaging is a great way to get rid of extremes and fool the reader.  As Jack Welch once said, the customer doesn't experience the average but the variation.  There's a lot of danger in hiding the true variation.  Be careful about mentioning the central limit theorem and control charts in the same sentence; you will unleash a torrent of abuse.  Check out Wheeler if you are not sure.

    #176050

    newbie
    Participant

    Ok, Dr D, so tell me this then… Where do I actually need normality (or need to invoke the CLT through the use of subgrouping)? I am not seeing the need for it in a lot of areas (despite what my training said), if the following is accurate:

    Capability Studies

    Use a yield-based metric like DPU, PPM, DPMO
    Use MTB to first determine the appropriate distribution type and conduct the capability study accordingly
    Transform the data using the appropriate power setting
    Variability Studies

    Control charts are robust to non-normality (I know, I know)
    Analytics

    Tests of means are largely robust
    Non-parametrics are available
    Regression does not require it (residuals only)
    DOE (essentially a data collection method whereby the analytical method used – regression, ANOVA, etc. – is also robust to a violation of normality)
    It appears normality is necessary in power and sample size calculations (unless it is a test of means, I suppose, so t, z, and ANOVA are out), when studying Cp/Cpk/Pp/Ppk, or when calculating confidence intervals (although I suspect there is a way around the latter).
    Where am I wrong?
    THANKS!
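On the "non-parametrics are available" point above, here is a quick sketch (my own, not from the thread; the lognormal parameters and sample sizes are illustrative only) of running a Welch's t-test and its non-parametric counterpart, the Mann-Whitney U test, side by side on skewed data using SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Two heavily skewed (lognormal) samples with shifted locations;
# parameters are made up for illustration.
a = rng.lognormal(mean=0.0, sigma=1.0, size=50)
b = rng.lognormal(mean=0.8, sigma=1.0, size=50)

t_res = stats.ttest_ind(a, b, equal_var=False)             # Welch's t-test
u_res = stats.mannwhitneyu(a, b, alternative="two-sided")  # non-parametric
print("t-test p:", t_res.pvalue)
print("Mann-Whitney p:", u_res.pvalue)
```

With a shift this large both tests will usually flag the difference; the non-parametric test just gets there without any distributional assumption about the populations.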
     

    #176055

    Darth
    Participant

    The CLT is one of those fundamental principles of statistics that is great in the first week of stat class.  In the practical world it has a lot less importance.  Nobody really worries about normality of data, since there are so many alternatives for analysis, and despite the claim that normal data is very common, that's not really the case.  I see a lot of practitioners using the "fat pencil" or "fat forearm" test in lieu of the p-value.  If it looks like a duck, might as well call it a duck, despite how it sounds.

    #176057

    newbie
    Participant

    Very good… ok, I am going with the "torso check" then… thanks, doc!

    #176069

    Dr P Smartt
    Participant

    When will people wake up to the fact that you don't need to know whether a process is normal?  It takes 3200 data points to prove normality out to 2.95 sigma… and by then the process will have changed!
    Use Shewhart charts… they work for any distribution!
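For reference, the individuals-chart limits Dr Smartt is alluding to are computed from the average moving range using the standard constant 2.66 (= 3 / d2, with d2 = 1.128 for a moving range of 2). A minimal Python sketch (the data and function name are my own, for illustration):

```python
import numpy as np

def imr_limits(x):
    """Shewhart individuals (I) chart limits from the average moving range.
    Uses the standard constant 2.66 = 3 / d2, where d2 = 1.128 for n = 2."""
    x = np.asarray(x, dtype=float)
    mr_bar = np.mean(np.abs(np.diff(x)))   # average moving range
    center = x.mean()
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

# Illustrative data: 200 points from a process centered at 10 with sigma 2.
rng = np.random.default_rng(1)
lcl, cl, ucl = imr_limits(rng.normal(10.0, 2.0, size=200))
print("LCL:", lcl, "CL:", cl, "UCL:", ucl)
```

The moving-range estimate of sigma is what makes these limits depend on short-term variation rather than on any assumed distribution shape.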

    #176076

    Ron
    Member

    Can you tell us who is responsible for the poor training you received that continues to rely on normal data for charting?

    #176078

    luke skywalker
    Participant

    One thing to keep in mind with all of these Six Sigma-flavored stats is the timeline of their development. I'm a bit surprised the wizened OLD Darth didn't mention it. The CLT is a practical, though sampling-intensive, strategy to "get normal-behaving data" so you can use it with your limited box of analytic tools, which were historically based on the normal distribution.
    Sometime when you have a few minutes free (ha ha), take a look at when some of these tools were developed, and contrast the traditional stuff against non-parametric tools and, yes, even Shewhart's charts.
    We exist in a cool time, sort of like CGI where the tools and technology actually look good: we have a good number of tools available to work with data in its raw form, without need of transformation (most of the time).
    And Darth is right: in classrooms, the CLT is mostly useful as a way to explain the standard error of the mean, which gets a bit more play within most Six Sigma curricula.
     
    Happy charting.
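The standard-error point above is easy to check empirically. A minimal sketch (mine, with arbitrary parameters): draw subgroup means from a skewed exponential population and compare their observed spread to the CLT's predicted sigma/sqrt(n).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 30                   # subgroup size (arbitrary choice)
pop_sigma = 1.0          # std dev of an exponential(scale=1) population

# 10,000 subgroup means drawn from a skewed (exponential) population.
means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)

observed_se = means.std(ddof=1)
predicted_se = pop_sigma / np.sqrt(n)   # CLT: standard error of the mean
print("observed:", observed_se, "predicted:", predicted_se)
```

The two values agree closely even though the population is strongly skewed, which is exactly the classroom use of the CLT Luke describes.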


The forum ‘General’ is closed to new topics and replies.