# Normal Data

Viewing 5 posts - 1 through 5 (of 5 total)
#55348

CG
Participant

The central limit theorem states that, given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed.

The issue with this idea is that when I run a normality test in Minitab, it clearly shows that the data are not normal. What should I believe?

Any suggestions would be greatly appreciated

#199655

Robert Butler
Participant

The issue is that the central limit theorem is misstated here. The central limit theorem applies to distributions of averages, not to distributions of individual data points drawn from a random distribution.

Snedecor and Cochran state it this way:

“Whatever the shape of the frequency distribution of the original population of X’s, the frequency distribution of the averages in repeated random samples of size n tends to become normal as n increases.”

Thus the issue is one of how many samples per average are needed: if you generate Q sample averages, each based on n samples, the distribution of those Q averages will approach normality as n increases.
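A quick simulation sketch of this point, using only the Python standard library (the exponential population is my illustrative choice, not anything from the thread): the individual draws stay heavily skewed, while the Q averages of n draws each come out nearly symmetric.

```python
import random
import statistics

random.seed(42)

def skewness(xs):
    # Standardized third moment: 0 for a symmetric distribution.
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

Q = 5000  # number of sample averages
n = 30    # samples per average

# Draw Q samples of size n from a heavily skewed (exponential) population.
raw = [[random.expovariate(1.0) for _ in range(n)] for _ in range(Q)]
sample_means = [statistics.fmean(s) for s in raw]

all_draws = [x for s in raw for x in s]
print(f"skew of individual draws: {skewness(all_draws):.2f}")   # stays large
print(f"skew of the {Q} averages: {skewness(sample_means):.2f}")  # much closer to 0
```

Testing the raw draws for normality (as in the original post) would correctly reject; it is only the distribution of the averages that the theorem says tends toward normal.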

#199673

Nik
Participant

My students struggle with this too. Let’s say you take a sample from a population (any shape and type) and record the average. Then I take a sample from the same population and record the average. And then Robert, Alfie, Mark… (i.e., every student in the class) all take their own samples and record the average. If I graph just our averages, the graph will look normal, provided we each gathered a big enough sample. And the larger the sample size each student gathers (we are all gathering the same number of samples), the more normal the graph of our averages will look – independent of the distribution of the population.

This is something you can only do/see if you gather multiple groups of samples. You won’t see it with one sample set of data. In fact, your one sample set should be representative of the population and therefore have a similar shape, etc.
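The classroom picture above can be sketched in a few lines of plain Python (standard library only; the 2000-student class size and the exponential population are my own illustrative assumptions): as each "student's" sample size n grows, the class's collection of averages becomes more symmetric, regardless of the population's shape.

```python
import random
import statistics

random.seed(0)

def skewness(xs):
    # Standardized third moment: 0 for a symmetric distribution.
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

# 2000 "students" each average n draws from the same skewed population.
# The bigger each student's sample, the more symmetric the class's
# collection of averages becomes (theoretical skew shrinks like 1/sqrt(n)).
results = {}
for n in (2, 10, 50, 200):
    means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
             for _ in range(2000)]
    results[n] = skewness(means)
    print(f"n = {n:3d}: skew of the class's averages = {results[n]:+.2f}")
```

Note that any single student's sample still looks like the skewed population; only the graph of everyone's averages approaches normal.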

Thankfully, with the advent of computers and analysis methods for additional population distributions, the practical need for normal data has been dramatically reduced. So just because your data aren’t normal doesn’t mean you can’t analyze them.

#199675

Robert Butler
Participant

In reality there was never that big a need for normally distributed data in the first place.

Regression: No need for either the Y’s or the X’s to be normally distributed. The normality requirement applies to the residuals only, and the tests that carry this requirement and are used to evaluate term significance are robust to non-normal data. (See Applied Regression Analysis, Draper and Smith, 2nd Edition, pp. 8–28.)

Control Charts: No normality requirement here. (See p. 65 of Understanding Statistical Process Control, 2nd Edition, Wheeler and Chambers, for an explanation.)

t-tests, ANOVA: Both are robust with respect to non-normal data. (See The Design and Analysis of Industrial Experiments, 2nd Edition, Davies, pp. 48–53.)

Process Capability Limits: The basic calculation assumes normal data. For non-normal data you can still compute a surrogate process capability. (See Measuring Process Capability, Bothe, Chapter 8, for the details.)
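The robustness claim for the t-test can be checked directly with a small simulation (plain Python, standard library only; the exponential population, sample sizes, and trial count are my own illustrative assumptions, and the critical value is an approximation, not from the cited texts). Both groups come from the same heavily skewed population, so any "significant" result is a false positive; a robust test should flag roughly 5% of trials despite the non-normality.

```python
import random
import statistics

random.seed(1)

def welch_t(a, b):
    # Welch's two-sample t statistic (no equal-variance assumption).
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / (va + vb) ** 0.5

trials, n = 4000, 30
crit = 2.002  # approximate two-sided 5% critical t value at ~58 degrees of freedom

hits = 0
for _ in range(trials):
    # Two samples from the same skewed (exponential) population:
    # the null hypothesis of equal means is true by construction.
    a = [random.expovariate(1.0) for _ in range(n)]
    b = [random.expovariate(1.0) for _ in range(n)]
    if abs(welch_t(a, b)) > crit:
        hits += 1

print(f"observed false-positive rate: {hits / trials:.3f}")
```

The observed rate lands close to the nominal 5% even though the data are strongly non-normal, which is what "robust to non-normal data" means in practice.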

#199705

Mike Carnell
Participant

@rbutler Very nice answer. I see you are still educating the masses. We appreciate it. Thank you.

