iSixSigma

Normal Data

This topic contains 4 replies, has 4 voices, and was last updated by  Mike Carnell 3 years, 4 months ago.

Viewing 5 posts - 1 through 5 (of 5 total)
  • Author
    Posts
  • #55348

    CG
    Participant

    The central limit theorem states that, given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed.

    The issue with this idea is that when I run the normality test in Minitab, it clearly shows that the data is not normal. What should I believe?

    Any suggestions would be greatly appreciated

    #199655

    Robert Butler
    Participant

    The issue is that your statement of the central limit theorem is being applied to the wrong thing. The central limit theorem applies to distributions of averages, not to distributions of individual data points drawn from a population.

    Snedecor and Cochran state it this way:

    “Whatever the shape of the frequency distribution of the original population of X’s, the frequency distribution of the averages in repeated random samples of size n tends to become normal as n increases.”

    Thus the issue is the number of samples per average needed so that, if you generate Q sample averages each based on n samples, the distribution of the Q sample averages will approach normality.

    #199673

    Nik
    Participant

    My students struggle with this too. Let’s say you take a sample from a population (any shape and type) and record the average. Then I take a sample from the same population and record the average. And then Robert, Alfie, Mark…. (i.e., every student in the class) all take their own samples and record the average. If I graph just our averages, the graph will look normal if we each gathered a big enough sample. And the larger the sample size each student uses (we are all gathering the same number of samples), the more normal the graph of our averages will look, independent of the distribution of the population.

    This is a thing you can only do/see if you gather multiple groups of samples. You won’t see it with one sample set of data. Actually, your one sample set should be representative of the population and therefore have a similar shape, etc.
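The classroom exercise above can be simulated. The following is a minimal sketch, assuming an exponential population as the skewed example and using skewness as a rough stand-in for "how normal the graph looks" (neither choice comes from the thread, and the seed, Q, and the values of n are arbitrary). It uses only the Python standard library.

```python
import random
import statistics


def skewness(values):
    """Sample skewness; close to 0 for symmetric (e.g. normal) data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def sample_averages(rng, n, q):
    """Return q averages, each computed from n draws of a skewed population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(q)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable
Q = 2000                 # number of averages (one per "student")

# One sample set of individual values: it keeps the skewed shape of the population.
individuals = [rng.expovariate(1.0) for _ in range(Q)]
skew_individuals = skewness(individuals)

# Distributions of averages for increasing sample size n.
skew_of_averages = {n: skewness(sample_averages(rng, n, Q)) for n in (2, 5, 30)}

print(f"individual values: skewness = {skew_individuals:.2f}")
for n, s in skew_of_averages.items():
    print(f"averages of n={n:>2}: skewness = {s:.2f}")
```

The individual values should stay strongly skewed no matter how many are collected, while the skewness of the averages should shrink toward 0 as n grows. A normality test on the individual values and the central limit theorem are therefore answering different questions.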

    Thankfully, with the advent of computers and additional population distributions, the practical need for normal data has been dramatically reduced. So just because your data isn’t normal doesn’t mean you can’t analyze it.

    #199675

    Robert Butler
    Participant

    In reality there really wasn’t that big a need for normally distributed data at any time.

    Regression: No need for either Y’s or X’s to be normally distributed. The normality requirement is for the residuals only, and the tests that have this requirement, which are used to evaluate term significance, are robust to non-normal data. (See Applied Regression Analysis, Draper and Smith, 2nd Edition, pp. 8-28.)

    Control Charts: No normality requirement here (see p. 65 of Understanding Statistical Process Control, 2nd Edition, Wheeler and Chambers, for an explanation).

    t-tests, ANOVA: Both are robust with respect to non-normal data. See The Design and Analysis of Industrial Experiments, 2nd Edition, Davies, pp. 48-53.

    Process Capability Limits: The basic calculation assumes normal data. For non-normal data you can still compute a surrogate process capability. See Measuring Process Capability, Bothe, Chapter 8, for the details.
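As a rough illustration of the process capability point, here is a minimal sketch of one commonly used surrogate: replace the 6-sigma spread with the distance between the 0.135th and 99.865th percentiles of the data. This is an assumption made for illustration and is not necessarily the method described in Bothe's Chapter 8. The data are simulated, and the specification limits, seed, and helper names (`percentile`, `pp_normal`, `pp_percentile`) are hypothetical.

```python
import random
import statistics


def percentile(sorted_values, p):
    """Empirical percentile (0 <= p <= 1) using linear interpolation."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def pp_normal(data, lsl, usl):
    """Basic calculation: spec width over 6 standard deviations (assumes normality)."""
    return (usl - lsl) / (6 * statistics.stdev(data))


def pp_percentile(data, lsl, usl):
    """Surrogate: spec width over the observed 0.135% to 99.865% spread."""
    ordered = sorted(data)
    spread = percentile(ordered, 0.99865) - percentile(ordered, 0.00135)
    return (usl - lsl) / spread


rng = random.Random(7)  # fixed seed so the run is repeatable
N = 20000               # a large sample is needed to estimate such extreme percentiles

normal_data = [rng.gauss(10.0, 1.0) for _ in range(N)]
skewed_data = [rng.lognormvariate(0.0, 0.5) for _ in range(N)]

# Hypothetical specification limits for each simulated process.
normal_pp_basic = pp_normal(normal_data, 6.0, 14.0)
normal_pp_pct = pp_percentile(normal_data, 6.0, 14.0)
skewed_pp_basic = pp_normal(skewed_data, 0.1, 5.0)
skewed_pp_pct = pp_percentile(skewed_data, 0.1, 5.0)

print(f"normal data: basic = {normal_pp_basic:.2f}, percentile = {normal_pp_pct:.2f}")
print(f"skewed data: basic = {skewed_pp_basic:.2f}, percentile = {skewed_pp_pct:.2f}")
```

For the normal data the two calculations should roughly agree. For the skewed data they should differ, because the long right tail makes the observed spread wider than 6 standard deviations. Estimating percentiles this far into the tails needs a lot of data, so with small samples a fitted distribution may be a better route.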

    #199705

    Mike Carnell
    Participant

    @rbutler Very nice answer. I see you are still educating the masses. We appreciate it. Thank you.
