Test of Normality for Large Sample Size




    I understand that tests of normality (such as Shapiro-Wilk and Kolmogorov-Smirnov) are “quite sensitive in large samples (exceeding 1,000 observations)”. Does anyone know of other tests suitable for large sample sizes? I have a sample of 1,600 respondents and would like to test the independent variables for normality for my PhD dissertation.


    Ken Feldman

    We know that the Gaussian distribution is a theoretical model, and as such its tails approach the horizontal axis asymptotically: they get ever closer to it without ever touching it. With large samples, we tend to get values in those tails. At the same time, a large sample narrows the confidence intervals for these tests, so if there are enough values in the tails, the data will fail the test for normality. It seems to go against logic, but it is what it is.
    Check out this statement and do a little doctoral-type research.
    “The Kolmogorov-Smirnov test, the Shapiro-Wilk test (for sample sizes up to 2000), Stephens’ test (for sample sizes greater than 2000), D’Agostino’s test for skewness, the Anscombe-Glynn test for kurtosis, and the D’Agostino-Pearson omnibus test can be used to test the null hypothesis that the population distribution from which the data sample is drawn is a Gaussian (normal) distribution.”  Also, check out this link.
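The D'Agostino-Pearson omnibus test in that quote combines a skewness z-score (D'Agostino) and a kurtosis z-score (Anscombe-Glynn) into K² = Z_skew² + Z_kurt², which is referred to a chi-squared distribution with 2 degrees of freedom. Below is a minimal pure-Python sketch that mirrors the published transformations (the same test `scipy.stats.normaltest` is based on); it is an illustration, not a validated implementation, and it refuses n < 20, where the kurtosis approximation is poor. It also illustrates the large-sample point above. The "samples" are an assumption made for reproducibility: deterministic quantiles of a hypothetical lognormal population with sigma = 0.1, which has a mild right skew of about 0.3. The same mild skew should pass at n = 30 and be rejected decisively at n = 5000.

```python
import math
from statistics import NormalDist


def dagostino_pearson(data):
    """D'Agostino-Pearson omnibus K^2 test for normality (sketch).

    Returns (K2, p_value); K2 is referred to a chi-squared distribution
    with 2 degrees of freedom.
    """
    n = len(data)
    if n < 20:
        raise ValueError("this sketch requires n >= 20 for the kurtosis z-score")
    mean = math.fsum(data) / n
    m2 = math.fsum((v - mean) ** 2 for v in data) / n
    m3 = math.fsum((v - mean) ** 3 for v in data) / n
    m4 = math.fsum((v - mean) ** 4 for v in data) / n
    skew = m3 / m2 ** 1.5   # sample skewness, sqrt(b1)
    kurt = m4 / m2 ** 2     # sample kurtosis b2 (3 for a normal distribution)

    # D'Agostino's transformation of the skewness to an approximate z-score.
    y = skew * math.sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)))
    beta2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
             / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1 + math.sqrt(2 * (beta2 - 1))
    delta = 1 / math.sqrt(0.5 * math.log(w2))
    alpha = math.sqrt(2.0 / (w2 - 1))
    z_skew = delta * math.asinh(y / alpha)

    # Anscombe-Glynn transformation of the kurtosis to an approximate z-score.
    mean_b2 = 3.0 * (n - 1) / (n + 1)
    var_b2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    x = (kurt - mean_b2) / math.sqrt(var_b2)
    sqrt_beta1 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
                  * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6.0 + 8.0 / sqrt_beta1 * (2.0 / sqrt_beta1
                                  + math.sqrt(1 + 4.0 / sqrt_beta1 ** 2))
    term1 = 1 - 2 / (9.0 * a)
    denom = 1 + x * math.sqrt(2 / (a - 4.0))
    term2 = math.copysign(((1 - 2 / a) / abs(denom)) ** (1 / 3), denom)
    z_kurt = (term1 - term2) / math.sqrt(2 / (9.0 * a))

    k2 = z_skew ** 2 + z_kurt ** 2
    return k2, math.exp(-k2 / 2)   # chi-squared(2) upper tail is exp(-x/2)


def quantile_sample(n, quantile):
    """Deterministic stand-in for a sample: quantiles at (i - 0.5) / n."""
    return [quantile((i - 0.5) / n) for i in range(1, n + 1)]


def mild_skew(u):
    """Quantile function of a hypothetical lognormal with sigma = 0.1."""
    return math.exp(0.1 * NormalDist().inv_cdf(u))


for n in (30, 5000):
    k2, p = dagostino_pearson(quantile_sample(n, mild_skew))
    print(f"n = {n:5d}   K2 = {k2:7.2f}   p = {p:.3g}")
```

The population is identical in both runs. Only the sample size changes, and with it the test's power to detect a departure that may be of no practical consequence.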


    Robert Butler

      Another thing to consider: you said, “(I) would like to test the independent variables for normality.” Why?
      If this is for a regression problem, you should check the requirements for regression: normality is not an issue for either the dependent or the independent variables. The first chapter of Applied Regression Analysis by Draper and Smith has the details.
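To make the point concrete, here is a small sketch that assumes a single-predictor ordinary least squares model. Where OLS inference relies on normality at all, the assumption concerns the error term, and that is usually examined through the residuals rather than through the raw variables. The predictor and errors are hypothetical and deterministic: evenly spaced x values, and standard-normal quantiles shuffled with a fixed seed as the errors. The predictor is far from normal (excess kurtosis near -1.2, as for a uniform distribution), while the residuals look normal.

```python
import random
from statistics import NormalDist, fmean


def excess_kurtosis(values):
    """Sample excess kurtosis m4 / m2**2 - 3 (0 for a normal distribution)."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m4 = fmean([(v - m) ** 4 for v in values])
    return m4 / m2 ** 2 - 3


n = 1600
# Hypothetical predictor: evenly spread over [0, 1], i.e. uniform, not normal.
x = [i / (n - 1) for i in range(n)]
# Hypothetical errors: standard-normal quantiles in a shuffled order.
errors = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
random.Random(42).shuffle(errors)
y = [2.0 + 3.0 * xi + e for xi, e in zip(x, errors)]

# Ordinary least squares fit for a single predictor.
x_bar, y_bar = fmean(x), fmean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"excess kurtosis of x:         {excess_kurtosis(x):+.2f}")
print(f"excess kurtosis of residuals: {excess_kurtosis(residuals):+.2f}")
```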


    Bower Chiel

    Hi Pat,
    How about a good old-fashioned chi-squared goodness-of-fit test? See for further information.
    Good luck!
    Bower Chiel
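A chi-squared goodness-of-fit test for normality can be sketched in pure Python as follows. The choices here are assumptions, not something stated in the thread: k = 10 bins that are equiprobable under a normal distribution fitted with the sample mean and standard deviation, and the common convention of subtracting one degree of freedom per estimated parameter (df = k - 3). `chi2_sf` is a helper written for this sketch using the closed-form series for the chi-squared upper tail with integer degrees of freedom. The two data sets are deterministic quantiles chosen for illustration.

```python
import bisect
import math
from statistics import NormalDist, fmean, stdev


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-squared with integer df >= 1."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^r / r! for r = 0 .. df/2 - 1.
        term = total = math.exp(-x / 2)
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return total
    # Odd df: 2 * normal tail at sqrt(x), plus 2*phi(sqrt x) * series.
    root = math.sqrt(x)
    total = math.erfc(root / math.sqrt(2))
    term = 2 * root * math.exp(-x / 2) / math.sqrt(2 * math.pi)
    for r in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * r + 1)
    return total


def chi2_normality(data, k=10):
    """Chi-squared test of a fitted normal with k equiprobable bins.

    Returns (statistic, df, p_value).
    """
    fitted = NormalDist(fmean(data), stdev(data))
    cuts = [fitted.inv_cdf(j / k) for j in range(1, k)]   # k - 1 interior edges
    observed = [0] * k
    for v in data:
        observed[bisect.bisect_right(cuts, v)] += 1
    expected = len(data) / k
    stat = sum((o - expected) ** 2 / expected for o in observed)
    df = k - 3   # two parameters (mean, sd) were estimated from the data
    return stat, df, chi2_sf(stat, df)


n = 1600
normal_like = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
skewed = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]  # exponential
for name, data in (("normal", normal_like), ("exponential", skewed)):
    stat, df, p = chi2_normality(data)
    print(f"{name:12s} chi2 = {stat:8.2f}   df = {df}   p = {p:.3g}")
```

With large samples this test has the same sensitivity discussed earlier in the thread, and its result also depends on how the bins are chosen.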



    It would be useful to know what you are measuring (your mention of “1,600 respondents” implies some sort of survey). If you are using an ordinal scale (as in satisfaction surveys), in my experience these data are usually highly skewed, and normality is not really an issue.
    If, on the other hand, you are using ratio data such as household income, then the previous comments regarding normality tests are on the mark.
    I’m guessing that you are also collecting some categorical data, such as age and gender, which lend themselves only to descriptive summaries and may well be subject to selection bias depending on which respondents were available.
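For ordinal and categorical survey items, a descriptive summary is typically a frequency table plus a median or mode rather than a normality test. A minimal sketch follows; the 5-point ratings and the gender answers are made-up illustration data, not from the thread, and `frequency_table` is a hypothetical helper written for this example.

```python
from collections import Counter
from statistics import median

# Hypothetical 5-point ratings (1 = very dissatisfied, 5 = very satisfied).
responses = [5, 4, 5, 3, 5, 4, 4, 5, 2, 5, 4, 5, 1, 4, 5]
# Hypothetical categorical answers from the same respondents.
gender = ["F", "M", "F", "F", "M", "M", "F", "F", "M", "F", "M", "F", "M", "F", "F"]


def frequency_table(values):
    """Map each category (sorted) to (count, percentage of all responses)."""
    counts = Counter(values)
    total = len(values)
    return {cat: (counts[cat], 100.0 * counts[cat] / total) for cat in sorted(counts)}


for cat, (count, pct) in frequency_table(responses).items():
    print(f"rating {cat}: {count:2d} ({pct:5.1f}%)")
print("median rating:", median(responses))
print("most common rating:", Counter(responses).most_common(1)[0][0])
for cat, (count, pct) in frequency_table(gender).items():
    print(f"{cat}: {count:2d} ({pct:5.1f}%)")
```

The made-up ratings pile up at 4 and 5, the kind of skew described above.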



    Hi Pat,
    another issue that is often forgotten is “numerical calculation error”.
    Data from a perfect Gaussian distribution are real numbers (with infinitely many digits if you write them out). The data a computer uses are not: they are rounded-off numbers (to 13 or 23 or however many decimals). When calculating the outcome of a data analysis:

    all those rounding-offs can add up, and oops: the difference between your data and perfectly normally distributed data becomes too big (i.e., the conclusion is that it is not normal). The principle is comparable to poor resolution of your gage.
    Calculating with many data points also introduces numerical errors (I learned this at university a long time ago but have forgotten how to calculate it exactly). Depending on how badly the software is written, these errors can even reinforce each other.
    An exercise I did in the past: take a pocket calculator, type in the largest number possible (without E-notation), add it to itself, and press =. Then subtract that same number again. The outcome is not the original number, because of rounding in the calculations. (I don't know if this still works with newer calculators; maybe only with the very cheap ones.)
      Good luck with your dissertation.
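The rounding effects described in this post can be reproduced in Python. The sketch below has three parts, all using illustrative values. (1) The calculator exercise, assuming a 10-digit display, modeled with the standard `decimal` module at 10 significant digits. (2) Rounding error building up over repeated additions of 0.1, compared with `math.fsum`. (3) The textbook one-pass variance formula applied to data with a large offset, compared with `statistics.variance`. The accumulation loops are written out by hand because the built-in `sum()` in recent Python versions compensates for rounding error.

```python
import math
import statistics
from decimal import Decimal, localcontext

# 1) Calculator exercise with an assumed 10-digit display: the doubled value
#    needs 11 digits, gets rounded, and the original number does not come back.
with localcontext() as ctx:
    ctx.prec = 10
    largest = Decimal("9999999999")
    result = (largest + largest) - largest
print(result == largest, result)

# 2) Rounding errors accumulating over many additions.
total = 0.0
for _ in range(10):
    total += 0.1
print(total, math.fsum([0.1] * 10))   # plain loop vs. correctly rounded sum

# 3) One-pass variance formula on data with a large offset (true value: 30).
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
n = len(data)
s = s2 = 0.0
for v in data:
    s += v
    s2 += v * v
naive = (s2 - s * s / n) / (n - 1)   # subtracts two huge, nearly equal numbers
print(naive, statistics.variance(data))
```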

