Normal Distributions – why does it matter?


Viewing 16 posts - 1 through 16 (of 16 total)


    I am not entirely sure why normal distributions are important. During my BB training, the Master BB sometimes says, if you collect data and it is not normal, “you need to collect more data.” What does non-normal data actually say about the process in question? Without any other information, can anything actually be inferred? What do you do if the data is just not normal, other than a Box-Cox transformation? And why can’t we simply analyze the data as it truly is?
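As a concrete illustration of the question above, here is a minimal sketch (not from the original post; the data are simulated lognormal values standing in for skewed process data) of checking normality and applying a Box-Cox transform with SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
cycle_times = rng.lognormal(mean=1.0, sigma=0.5, size=200)  # skewed "process" data

# Shapiro-Wilk: a small p-value is evidence the data are non-normal
stat_raw, p_raw = stats.shapiro(cycle_times)

# Box-Cox requires strictly positive data; it returns the transformed
# values and the lambda that maximizes the log-likelihood
transformed, lam = stats.boxcox(cycle_times)
stat_tx, p_tx = stats.shapiro(transformed)

print(f"raw p={p_raw:.4f}, transformed p={p_tx:.4f}, lambda={lam:.2f}")
```

The transform makes the data more normal-looking, but as the replies below argue, whether that is even necessary depends on what test you intend to run.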



    Actually, normality is vastly overrated.  :-)
    The basic math that drives T tests, and several other tests, was derived from an assumption of a normal distribution.  So texts, especially older ones, sometimes insist that you check for normality before you run your test.  As it turns out, tests of means are usually very robust to this assumption, and you can almost “violate it with impunity” according to one of the best authors I read.
    When you do a 2^K, you make no assumption of normality when you compute sums of squares, degrees of freedom, mean square error, and an F ratio.  The assumption of normality comes in when you assign a T score and a P.  If you have an F ratio (signal to noise ratio) that is high, you don’t need the T and P.  So, if you’re willing to just go with the F ratio, and ignore T and P, you don’t need normality.
    Where you are vulnerable is with small samples, and when you run a one-tailed test.  If you have a large sample size, you are much less vulnerable to error from non-normality, hence the encouragement to “take more data”.
    If your samples are small, you usually can’t tell whether they are non-normal.  Five samples from a 2 d.f. chi-square distribution will fail to test as non-normal in a high percentage of cases.
    Anyway, if you take enough real-world data, they will eventually test non-normal.  As your sample size increases, you can detect progressively smaller and less important deviations from normal.  And you can never show that the data are normal in the first place… you can only say that you failed to show that they were non-normal.
    There are places non-normality is very important, but testing means usually isn’t one of them.
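The small-sample point above is easy to verify by simulation. This sketch (illustrative, not from the post; trial count and test choice are my own) draws many samples of 5 from a chi-square(2) distribution and counts how often a Shapiro-Wilk test actually flags them as non-normal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
trials = 2000
detected = 0
for _ in range(trials):
    sample = rng.chisquare(df=2, size=5)   # clearly non-normal population
    _, p = stats.shapiro(sample)
    if p < 0.05:
        detected += 1

detection_rate = detected / trials
print(f"non-normality detected in {detection_rate:.0%} of n=5 samples")
```

With only 5 points, the test misses the (strong) non-normality most of the time, which is exactly why "the data look normal" means little at small n.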



    Your MBB does not have a clue. Go to your management and tell them you need good training. Analysis of non-normality can be very enlightening. There are some things that should not be expected to be normal (time and one-sided bounded specs, for example). There are non-normal, not naturally occurring distributions that tell a lot. My training at AlliedSignal Automotive spent considerable time teaching us how to look at this, and I consider it one of the most valuable tools I learned.


    Erik L

    The data, in and of themselves, need not be normal. As has been mentioned, there are plenty of situations that generate data that do not conform to normality. However, with that said, we do take great effort to force data to take on normal aspects (e.g., averaging data so that we can use it in control charting).  The F test for the equality of means (for a fixed-effects model) is robust to departures from normality, and robust to unequal variances provided there aren’t large differences in sample size.
    In this case, I believe you’re talking more about residual analysis. The output that we expect from a DOE is a regression relationship. Here, normality of the errors and constancy of the error variance are important factors to consider. The primary reason is that these residuals are our sanity check that the model we’re creating, Y=b0+b1x1+e, is the correct one. If we run a DOE and develop our Y=f(x) equation without looking at them, we may cause people to control factors which truly are not important, and we will most likely miss variables that truly should be deemed KPIVs. Of the two, unequal variance should be considered the more important. Remember, they let us get away with a “fat pencil” approach with regard to a linear NPP and hence normality.
    In regards to the other part of your question, if we run transformations and we still cannot stabilize the variance then we may need to use nonparametric approaches to analyze the data. A recommendation here would be to use the Rank F test.


    Ken K.

    Denton & Stan are right on both counts.
    (WARNING – long answer ahead)
    For many common statistical tests, such as z and t tests and the F-tests in ANOVAs, the assumption is that the means are normally distributed. As Denton mentioned, the raw data don’t really have to be normally distributed (I had posted a question related to this some time ago).
    The Central Limit Theorem (although it sounds “theoretical”, it is important and pretty simple – I usually teach it to people) is the thing that makes these tests work regardless of the distribution of the raw data. It basically says that, regardless of the distribution of the raw data, the mean of the data will tend toward a normal distribution as the sample size for the mean gets larger. For symmetric distributions this can happen with sample sizes as small as 5. For wilder, nonsymmetric distributions, even a sample size of 50 will still result in quite normally distributed means.
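A quick simulation of the Central Limit Theorem described above (a sketch of my own, using exponential data as the "wild" distribution): the raw values are heavily skewed, but means of samples of 50 are nearly symmetric.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
raw = rng.exponential(scale=1.0, size=20000)
means_50 = rng.exponential(scale=1.0, size=(20000, 50)).mean(axis=1)

skew_raw = stats.skew(raw)         # theoretical skewness of exponential is 2
skew_means = stats.skew(means_50)  # shrinks roughly as 1/sqrt(n)
print(f"skew raw={skew_raw:.2f}, skew of means(n=50)={skew_means:.2f}")
```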
    Many textbooks DO say that the raw data have to be normally distributed for t-tests, but to my knowledge that is not true. In my opinion you just need largish sample sizes, and the more non-normal the distribution of the raw data is, the larger the sample sizes you’ll need.
    X-bar control charts work the same way, so are much less affected by nonnormality. Individual charts, on the other hand, DO require normality of the raw data.
    Other tests, such as tests of variances & standard deviations, tend to require a much larger sample size. In the past I’ve found that accurate estimation of the standard deviation can require sample sizes of, say, 100 or higher.
    Other methods, such as process capability indices, rely on the distribution of the raw data rather than the means, so the Central Limit Theorem won’t help there. When working with process capability you really do need to assess the normality of the distribution. As Stan said, some characteristics are just not normally distributed – increased sample sizes do nothing to help that. That is where nonparametric or non-normal methods come into play. For example, MINITAB provides a Weibull distribution-based capability tool (the Weibull is pretty flexible and can model many symmetric and nonsymmetric distributions). Other software uses Pearson Curves, which are a family of distributions that also is very flexible.
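The Weibull-based capability idea above can be sketched in a few lines (my own illustration, not MINITAB's method: the data, the spec limit, and the use of SciPy's `weibull_min` are all assumptions for the example). The point is that the out-of-spec fraction is read from the fitted distribution rather than from a normal table:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.weibull(a=1.5, size=500) * 10.0   # skewed process data

# Fit a 2-parameter Weibull (location fixed at 0)
shape, loc, scale = stats.weibull_min.fit(data, floc=0)

usl = 25.0  # hypothetical upper spec limit
frac_above = stats.weibull_min.sf(usl, shape, loc=loc, scale=scale)
print(f"shape={shape:.2f}, scale={scale:.2f}, P(X > USL)={frac_above:.4f}")
```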
    If you have large samples you can also use nonparametric percentiles to calculate capability indices, but that typically requires VERY big samples (certainly 10,000 and more).
    Another place that distribution assumptions become important is in reliability analyses. Typically you’ll try to find a distribution (such as the Weibull or lognormal) and its parameter estimates that accurately model your failure data.
    Sorry this was so long – it’s late on Friday PM and I’m just having fun.



    This is mostly for Ken…  just sharing one of the fun insights that came my way.
    Wheeler did an investigation into whether I-MR charts need normality in order to work.  His conclusion:  Nope.  He ran simulations using 1143 different non-normal data sets, and found that they worked reasonably well, regardless of distribution shape.  Yes, they do work just a little better if the data are normal, but not much.  He wrote up his results in “Normality and the Process Behavior Chart”.  Since I was originally taught you need normal data, it came as kind of a shock, but I read his stuff and have to believe his result, since the data are there to support it.
    Thought you might enjoy.  Have a great weekend.



    Denton,
    Curious about your response on the efficiency of the I-MR chart and the original question tied to statistical tests. Do you consider evaluating process data using a control chart a statistical test? If so, how is the test formed, and what is the statistic that is computed? Perhaps Wheeler’s intention in evaluating 1143 distributions of varying shapes was to illustrate how robust the control chart is at providing an accurate signal of process change despite the shape of the distribution. At least, that’s how I read his work.
    Ken K.,
    My experience is that some data are naturally non-normal. Examples are the time-of-flight data used with catapult simulations and many chemical analyses of organic and inorganic compounds. In many cases these data are best transformed by taking the natural log or log base 10 of the individual values. In ANOVA tests you are determining differences between means by comparing the ratio of differences between treatment groups to differences within a treatment group. This comparison allows you to estimate an F-value, a statistic that is sensitive to how well the errors within a given treatment group fit a normal distribution.
    Most of my training has supported the above understanding for some time. However, I may be missing something here.



    You know the stereotype of two Rabbis arguing over points of doctrine?  My friend Bill and I used to do that on points of statistics…. miss him now that he has left Iomega.  This is about as close as I’ve come… sure enjoy the interchange.
    Don and I have discussed his work at length, and his intent is to show that normality almost does not matter in an I-MR chart.  The limits were set with an assumption of normality, but it turns out that they work almost as well with any distribution you can imagine.  Bill and I both had to change our thinking on that one.  Bill came close to getting into a debate with Don in front of a class over that one…
    The ANOVA case is also interesting.  When you assign a T and a P value, you are making an assumption of normality.  When you compute an F ratio, you are not.  The F ratio you get in ANOVA is purely signal to noise, and is independent of distribution.  You are comparing Mean Square Between to Mean Square Within, and those are defined for all distributions.  Now, if you want to perform an F test on the F ratio, that’s a different kettle of fish.
    Yes, I consider I-MR charts a statistical procedure.  Your statistics are the mean and the limits, and your result is to know whether your process conforms to the rules or not.  I think it qualifies.
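Denton's F-ratio point can be made concrete: the ratio of between-group to within-group mean squares is just arithmetic on the data, with no distributional assumption until you convert it to a p-value. A minimal sketch (the three groups are made-up illustration data):

```python
import numpy as np

groups = [np.array([4.1, 4.3, 4.0, 4.2]),
          np.array([5.0, 5.2, 4.9, 5.1]),
          np.array([4.5, 4.4, 4.6, 4.7])]

grand_mean = np.concatenate(groups).mean()
k = len(groups)                  # number of groups
n = sum(len(g) for g in groups)  # total observations

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)   # "signal"
ms_within = ss_within / (n - k)     # "noise"
f_ratio = ms_between / ms_within
print(f"F ratio (signal-to-noise) = {f_ratio:.1f}")
```

Turning that ratio into a p-value is the step that invokes the F distribution, and with it the normality assumption.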


    Ken Myers

    Denton,
    You and I have had discussions on points in past postings on this site. I too have enjoyed the interchange. You sound like a very knowledgeable guy, and your guidance is usually accurate. However, between the two of us, a mid-course correction concerning the points raised in this thread might be helpful. I’m not too fluent in religious doctrine, nor am I familiar with two Rabbis arguing over such. So I’ll try to stick to the logic of your answer to the original question posed by “anonymous” on the need for normally distributed data in performing a statistical analysis or test.
    Your answer on 30 Nov attempted to use an I-MR chart and Wheeler’s work to illustrate that normality is not a requirement for many statistical tests. At least that was my understanding of the example. First, I completely agree that the effectiveness of I-MR charts is NOT based on the distribution of the data. This has been a clear understanding of mine for over 20 years and is not disputed. However, I take some exception to the inductive method you used to show that normality of the data and/or the errors is not needed because of Wheeler’s work with 1143 distributions showing the detection efficiency of I-MR charts. This reasoning and the basis for its conclusion appear to be flawed on two points:
    1) Control charts are NOT formal statistical tests. References: “Out of the Crisis”, Deming, pp. 334-335; “Advanced Topics in Statistical Process Control”, Wheeler, pp. 16-17.
    2) It draws a general understanding of statistical theory from a specific observation (the use of I-MR charts) through the inductive process. This is not the usual way of reaching a logical conclusion about general behavior.
    Conclusion: Control charting of process data is not a statistical test. As such, it is not appropriate to use it to show any relationship between statistical test methodology and the assumptions and requirements on the distribution of the population and/or experimental errors of the tested data.
    On the question of whether an ANOVA requires normally distributed data: fine point here, as I tried to allude to in my earlier post. The errors need to distribute normally. In fact, the general assumptions for Analysis of Variance (paraphrased) are as follows:
    1) Input effects are independent of each other.
    2) Outputs are independent of each other.
    3) Means and variances are additive.
    4) Experimental errors are independent.
    5) Variances are homogeneous across all input effects.
    6) The distribution of experimental errors is normal.
    Reference: “Statistics Manual”, E.L. Crow, F.A. Davis, and M.W. Maxfield, pp. 119-120.
    Item six above was my earlier point. While the populations may not need to be normally distributed (possibly your point), the errors do need to be independent and normally distributed. This means that to effectively use ANOVA to identify the KPIVs in a system (Type II test) or make comparisons between means (Type I test), a formal residuals evaluation of the errors is necessary. This again is a fine point, but nonetheless an important requirement of the test.
    Concerning an F-test of variances, the following assumptions apply:
    1) The populations have normal distributions.
    2) The samples are random samples drawn independently from the two populations.
    Same reference as above: pp. 74-75. Item 1 is the key point.
    Conclusion: The F-test of variances DOES require normally distributed data. Therefore, to effectively perform this test, a mathematical transformation of the individuals to ensure the data closely distribute normally may be necessary.
    No, I’m not a statistician! No, I’m not one of these anal-retentive guys who needs to be right all the time… I’m just a guy who works hard to ensure my guidance is as accurate as it can be. In my past work, I’ve gotten myself and others in more trouble than I can say using the same inductive reasoning described above. This is as much a human endeavor as it is an intellectual one. I sure hope I did not offend those Rabbis! Take care.
    Ken Myers
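The residuals evaluation Ken describes can be sketched as follows (my own illustration with simulated data: fit a simple linear model, then test the residuals, not the raw response, for normality):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 120)
y = 2.0 + 0.8 * x + rng.normal(scale=0.5, size=x.size)  # true linear model

# Least-squares fit of y = b0 + b1*x (polyfit returns highest degree first)
b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

# Normality of the *errors* is the ANOVA/regression assumption at issue
_, p_resid = stats.shapiro(residuals)
print(f"b0={b0:.2f}, b1={b1:.2f}, residual normality p={p_resid:.3f}")
```

If the residuals fail this check (or show non-constant spread against the fitted values), the model form, not just the data, is suspect.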



    Ken, I’m sitting here a little cross-eyed from lack of sleep… don’t know if I have fully assimilated all that you have said.
    The big distinction I draw is between doing an F test, where you assign a P value, and creating an F ratio.  My statement is that the F ratio is signal to noise, with or without the assumption of normality.  Wheeler develops that idea in Understanding Industrial Experimentation, page 96.  You can’t assign a P value for an F ratio of 100:1 without assuming a distribution, but if you’re happy with 100:1 and no P value, you don’t have to make that assumption.
    My original point on normality was a little simpler than all that… there are times, such as in I-MR charts, signal/noise ratios, and T tests that normality doesn’t matter much at all.  Of course, there are other times that it matters a lot.  I was simply saying that there is no need for us to get totally tied up in our socks by insisting on normality in all cases… especially since practically all real-world distributions can be shown to be non-normal, if you take enough data.
    Now if Wheeler is saying that Control Charts aren’t statistical tests, that’s interesting…. I hadn’t thought of them that way, which is the fun part.
    Normal residuals:  I have generally taken them to be evidence of having met the other assumptions, rather than as a starting requirement…. have to think about that again.  My view is that if your model sucks all the information out of the data, all that’s left is random noise, which tends to be normally distributed. 
    The reference to the Rabbis was just me laughing at myself and Bill… neither of us are Jewish, but there is this stereotype of Rabbis recreationally arguing very fine points of doctrine.  One day, I realized that he and I were doing the same thing with statistics, and sort of enjoyed the mental image.


    Stan Alekman

    Control charts for individuals do not have the Central Limit Theorem to provide normality, because individual values, not averages of small groups, are used. Must individuals control charts have an underlying normal distribution of the charted variable?
    The formulas for calculating control limits are based on between-group variance and assume that within-group variance is zero or very small. Are these formulas based on distributional assumptions? I am not aware of the derivation of these formulas. Are there good references on this?
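For reference on the mechanics (a sketch with made-up numbers, not Stan's data): the standard individuals-chart limits are computed from the average moving range as X-bar ± 2.66 × MR-bar, where 2.66 ≈ 3/d2 with d2 = 1.128 for subgroups of size 2.

```python
import numpy as np

x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.0, 9.7, 10.4, 10.2])

moving_ranges = np.abs(np.diff(x))   # |x[i] - x[i-1]|
mr_bar = moving_ranges.mean()
center = x.mean()

ucl = center + 2.66 * mr_bar         # upper control limit
lcl = center - 2.66 * mr_bar         # lower control limit
print(f"center={center:.2f}, UCL={ucl:.2f}, LCL={lcl:.2f}")
```

The d2 constant is derived under a normality assumption, which is part of what the Wheeler discussion elsewhere in this thread addresses.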


    Stan Alekman

    Can you provide a reference to Wheeler’s article on his findings regarding normality requirements?
    thank you
    stan alekman


    Dave Strouse

    Stan –
    You posted another message earlier this week where you made the same assertions that the control chart is based on between-group variation and an assumption of within-group variance being zero.
    Please stop propagating this error. You are incorrect in these statements. If you have references that say this, please provide them so that we can all learn this wonderful new way to look at the world.
    All control charts are based on capturing the within-subgroup variation as common cause. This is the concept of rational subgrouping. Once this is established, any between-group variation outside of what is expected based on the normal within-group variation is seen as special-cause variation.
    Please take the time to learn a little about SPC so that you can be more knowledgeable. Here is a suggested reading list:
    The best book I have yet seen on practical SPC is “Understanding Statistical Process Control”, Wheeler and Chambers. Chapter nine has a section on where the control chart factors come from.
    “Economic Control of Quality of Manufactured Product”, W. Shewhart. This is the original work by the master. I don’t have a personal copy, but I know he goes into detail as to how the limit factors were derived.
    “Process Quality Control – Troubleshooting and Interpretation of Data”, Ott, Schilling & Neubauer.
    Grant and Leavenworth and Douglas Montgomery also have standard works on this subject. The AIAG manual on SPC is also helpful.


    John Carder

    Hmm.  Not sure if this discussion is still alive, but it’s on the front page, and it’s an important topic.  Also, I’m not sure if this discussion is supposed to be the ‘definitive’ guide to the question (there are plenty of resources on the web for discussions of normality, including here), but, for what it’s worth, here are some thoughts on the messages I read.
    With regard to the last posts, it’s a bit confusing.  One person seems to be talking about control charts for ‘individuals’ (single data points) and another about ‘within subgroup variation’.  By definition, there can be no variation within a point represented on an ‘individuals’ chart, since it represents just that – a single measurement.  What we are looking for on an individuals chart is:
    1. Is the data distributed normally?  (Use Minitab or your favourite tool to check, plus also just take a look at the raw data – a dotplot is a good place to start.  Check for any extreme points (‘outliers’) and make sure that they are not, for example, measurement errors.  You did do R&R before measuring, right?)
    2. If the data is ‘normal’, (and plenty of times it is not, do some reading), then three things interest us:
    (i) Where the process is centered – that’s what you will get, on average.  (But be careful: on some control charts the centre line is the median, i.e. with 50% of the measured values on one side and 50% on the other.  Otherwise the middle line is the average, a good guide to ‘where your process is aiming’.)
    (ii) The next thing to look at is not, I suggest, for signals that the process is ‘out of control’ (sometimes called ‘special causes’), but rather, what are the values of the ‘control limits’, (the red lines if you are using Minitab, which a lot of people seem to…including me).  How were they calculated?  What does that mean for your process?  How do they compare to customer spec.?  (But let’s not get into process capability here…)
    (iii) Finally I get back to the topic – normality.  Data which is not distributed according to Gauss’s (or Laplace’s, if you are a Francophile) famous curve will give BOTH a false calculation of your control limits (on an individuals chart) and false ‘signals’ of special cause variation.  Hence the need to check for normality when doing individuals charts.  These control limits show what is assumed to be the range of ‘usual’ variation in your process.  Of course, since these are based (on individuals charts) on normal theory, the closer a point is to the control limit, the less likely it is to have been created by your standard ‘in control’ process.  Hence, once you get a point, say, more than three sigma away from your centre line, it is assumed to be highly improbable that the point was created by your standard process.  Again, if your data are not ‘normal’, then this test just does not work.  (By the way, please do not forget the other tests for ‘special’ causes – 14 points up and down, etc.  Note that Minitab does not activate these by default.)  Also, for highly critical applications, consider using 2-sigma control limits.  You’ll get more ‘false positives’, but in, say, the medical world, you’ll also get fewer dead patients…
    As for ‘between group’ and ‘within group’ variation, I agree with the last post (musical reference?) in that, when measuring high-volume processes, or where sampling is costly, it certainly is worthwhile sampling in rational subgroups.  But whilst it is true that we hope the variation within each subgroup represents all the ‘common cause’ variation, and none of the ‘special cause’, please do not take this as fact.  Look at the two types of variation – does one group have a significantly higher variation between the, say, five points measured?  If so, why?  Go check – measurement error, transcription error, or enemy action?  If all is OK, then, and only then, read the ‘upper’ chart showing ‘between group’ variation.  Read it as you would an individuals chart, really.
    Finally, as to the point about normal distribution of the residuals: non-normal residuals most often seem to appear when people are trying to fit a straight (linear) regression line to data where the process is in fact better explained by a non-linear model.



    Some comments:
    1. I am not very sure that we should always care too much about normality when using individuals charts. After all, R, S and P are clearly not normal, but when we chart them we use ±3-sigma limits as if they were. If we should care for individuals, we should care for them too. And, furthermore, we should care for X-bar too. The Central Limit Theorem does not state that the distribution of the averages is normal, but that it tends to normality as the sample size increases. And subgroups of 2 or 3 may not be enough if the individuals are pretty non-normal. Also, d2 (used in the X-bar control limit calculation) assumes a normally distributed process.
    2. (i) The average of the medians of the subgroups is much closer to the process average than to the process median. So the medians’ average is an estimator of the process average, not the process median. But if the process is normal, as you assumed for this point, then all this is moot, since the median of a normal distribution equals its average.
    (iii) “Data which is not distributed according to Gauss’s (…) curve will give BOTH a false calculation of your control limits (on an individuals chart) and false ‘signals’ of special cause variation.  Hence, need to check for normality when doing individuals charts…” Please tell us what a “true” control limit is, so we can derive what a false control limit is. We know what a false signal is: a signal due to chance in a stable process, not due to variation from a special cause. All charts will give you false signals from time to time, even if the process is normal. Furthermore, there are several distributions that have no individuals beyond ±3 sigma, and for them you will get no false “point beyond control limits” signals. The uniform and triangular are just two examples. The normal is not one of them (0.27% of individuals fall beyond those limits).
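Gabriel's last point is easy to check by simulation (a sketch of my own, with the distributions and sample size chosen for illustration): a uniform distribution has no values beyond ±3 sigma at all, while normal data put roughly 0.27% of points out there.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

uniform = rng.uniform(-1.0, 1.0, size=n)  # sigma = 2/sqrt(12) ~ 0.577, so 3*sigma > 1
normal = rng.normal(0.0, 1.0, size=n)

def beyond_3_sigma(data):
    """Fraction of points more than 3 estimated sigmas from the mean."""
    mu, sigma = data.mean(), data.std()
    return np.mean(np.abs(data - mu) > 3 * sigma)

print(f"uniform: {beyond_3_sigma(uniform):.4%} beyond 3 sigma")
print(f"normal:  {beyond_3_sigma(normal):.4%} beyond 3 sigma")
```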



    Hi Gabriel
    So that’s two of us with too much time on our hands, eh?
    Thanks for your reply – do not disagree with anything, see it more as a clarification.  That’s fine, since my post was intended to be more of a base than a stats class…  Reading some of the posts, seemed to me that that is what is appropriate.  For ex. discussion of median vs. mean – most of the people I coach don’t even look! 
    As for ‘correct’ control limits, well, as you know, nothing in stats is certain; as Box said, ‘all models are wrong, but some are useful’.  There are different ways of calculating limits (based on R-tilde and R-bar, and different ways of calculating sigma for simpler models based on ±3 sigma).  What’s the ‘right’ one, you ask?  Well, for me, the one where the chance of detecting a true process deviation (shift) with such limits is greater than the chance of a false positive… and, as you point out, that depends upon your data distribution.  Still, a topic for another forum, methinks.  Normal distributions – why does it matter?  Because it’s the starting point for 99% of the people we have to try and help.

