Test of Normality for Large Sample Size
This topic has 5 replies, 6 voices, and was last updated 14 years ago by Remi.
July 24, 2008 at 9:30 pm #50600
Hi,
I understand that tests of normality (such as Shapiro-Wilk and Kolmogorov-Smirnov) are “quite sensitive in large samples (exceeding 1,000 observations)”. Does anyone know of any other tests suitable for a large sample size? I have a sample size of 1,600 respondents and would like to test the independent variables for normality for my PhD dissertation.
July 24, 2008 at 10:02 pm #174155
Ken Feldman, Participant (@Darth)
We know that the Gaussian distribution is a theoretical model whose tails are asymptotic to the horizontal axis: they approach it but never reach it. With large samples, we tend to get values in those tails. At the same time, a large sample narrows the confidence intervals for those tests, so if there are enough values in the tails, you will fail the test for normality. It seems to go against logic, but it is what it is.
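To put rough numbers on the point about large samples, here is a minimal sketch using only the Python standard library. It assumes the common large-sample approximation 1.36 / sqrt(n) for the 5% critical value of the Kolmogorov-Smirnov statistic D, which applies when the reference normal distribution is fully specified in advance. The helper name `ks_critical_value` and the fixed gap of 0.05 are made up for illustration.

```python
import math

def ks_critical_value(n, coefficient=1.36):
    """Approximate 5% critical value of the one-sample KS statistic D.

    1.36 / sqrt(n) is the usual large-sample approximation when the
    reference distribution is fully specified in advance (an assumption;
    estimating the mean and SD from the data calls for other tables).
    """
    return coefficient / math.sqrt(n)

# Hypothetical fixed departure from normality: the largest gap between
# the empirical CDF and the normal CDF is 0.05 at every sample size.
observed_gap = 0.05

for n in (100, 400, 1600, 6400):
    critical = ks_critical_value(n)
    verdict = "reject normality" if observed_gap > critical else "do not reject"
    print(f"n={n:5d}  critical D={critical:.3f}  ->  {verdict}")
```

The same departure from normality that passes at n = 100 is rejected at n = 1,600, simply because the cutoff shrinks with the square root of the sample size.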
Check out this statement and do a little doctoral type research.
“The Kolmogorov-Smirnov test, the Shapiro-Wilk test (for sample sizes up to 2000), Stephens’ test (for sample sizes greater than 2000), D’Agostino’s test for skewness, the Anscombe-Glynn test for kurtosis, and the D’Agostino-Pearson omnibus test can be used to test the null hypothesis that the population distribution from which the data sample is drawn is a Gaussian (normal) distribution.” Also, check out this link.
http://www.basic.northwestern.edu/statguidefiles/n-dist_exam_res.html
July 24, 2008 at 10:51 pm #174158
Robert Butler, Participant (@rbutler)
Another thing to consider: you said, “(I) would like to test the independent variables for normality.” Why?
If this is for a regression problem, you should check the requirements for regression: normality is not an issue for the dependent or the independent variables. Applied Regression Analysis by Draper and Smith (first chapter) has the details.
July 25, 2008 at 12:13 am #174161
Bower Chiel, Participant (@Bower-Chiel)
Hi Pat,
How about a good old-fashioned chi-squared goodness-of-fit test? See http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm for further information.
Good luck!
Bower Chiel
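As a rough illustration of the chi-squared suggestion, here is a minimal sketch in standard-library Python. The function name `chi_squared_normal_gof`, the choice of 10 equal-probability bins, and the simulated samples are all assumptions made for the example. Using bins - 3 degrees of freedom is the usual textbook choice when the mean and SD are estimated from the data, so treat the cutoff as approximate.

```python
import bisect
import random
from statistics import NormalDist, fmean, stdev

def chi_squared_normal_gof(data, bins=10):
    """Chi-squared statistic for normality using equal-probability bins.

    The mean and SD are estimated from the data, so the statistic is
    compared with a chi-squared distribution on bins - 3 degrees of freedom.
    """
    fitted = NormalDist(fmean(data), stdev(data))
    # Interior bin edges at the 1/bins, 2/bins, ... quantiles of the fit.
    edges = [fitted.inv_cdf(i / bins) for i in range(1, bins)]
    counts = [0] * bins
    for x in data:
        counts[bisect.bisect_right(edges, x)] += 1
    expected = len(data) / bins  # same expected count in every bin
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    return statistic, bins - 3

# Textbook 5% critical value of chi-squared with 7 degrees of freedom.
CRITICAL_5PCT_DF7 = 14.067

random.seed(1)  # arbitrary seed, only to make the demo repeatable
samples = {
    "normal": [random.gauss(50.0, 10.0) for _ in range(1600)],
    "exponential": [random.expovariate(1.0) for _ in range(1600)],
}
for name, sample in samples.items():
    statistic, df = chi_squared_normal_gof(sample)
    print(f"{name}: chi-squared = {statistic:.1f} on {df} df "
          f"(5% critical value {CRITICAL_5PCT_DF7})")
```

A statistic above the critical value is evidence against normality. The result depends on how the bins are chosen, which is one of the known drawbacks of this test.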
July 28, 2008 at 5:14 pm #174241
Mundorff, Member (@Stratocaster)
It would be useful to know what you are measuring (your message regarding “1,600 respondents” implies some sort of survey). If you are using some sort of ordinal scale (as in satisfaction surveys), these data are typically highly skewed in my experience, and normality is not really an issue.
If, on the other hand, you are using some type of ratio data such as household income, then the previous comments regarding normality tests are on the mark.
I'm guessing that you are also surveying some types of categorical data such as, say, age and gender, which would lend themselves well only to descriptive summary and may very well be subject to selection bias based on the availability of respondents.
August 7, 2008 at 3:55 pm #174670
Hi Pat,
Another issue that is often forgotten is numerical calculation error.
Data from a perfect Gaussian distribution are real numbers (infinitely many digits if you write them out). The data a computer uses are not: they are rounded-off numbers (to 13 or 23 or however many decimals). When calculating the outcome of a data analysis, all those round-offs can add up, and oops: the difference between your data and perfectly normally distributed data is too big (i.e., the conclusion is that the data are not normal). The principle is comparable to poor resolution of your gage.
Calculating with many data points also gives numerical errors (I had this at university a long time ago but forgot how to calculate it exactly). Depending on how badly the software was written, these could even reinforce each other.
An exercise I did in the past: take a pocket calculator. Type in the largest number possible (without E-notation). Add it to itself. Press =. Subtract that same number again. The outcome is not that number, due to rounding-off in the calculations. (I don't know if it still works with the new generation of calculators; maybe only with the very cheap ones.)
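The calculator exercise has a close analogue in double-precision floating point. The following is a minimal sketch in standard-library Python; the specific numbers are just convenient examples, and it shows only the rounding effect itself, not how much it matters for any particular normality test.

```python
import math

# Adding and then subtracting the same value does not always return
# the starting number once the intermediate result has been rounded.
start = 0.1
round_trip = (start + 0.2) - 0.2
print("round trip equals start:", round_trip == start)

# A small term can vanish entirely next to a very large one.
big = 1e16
print("(big + 1) - big =", (big + 1.0) - big)

# Rounding errors accumulate over many additions. math.fsum keeps track
# of the lost low-order bits and returns a correctly rounded total.
values = [0.1] * 10
running_total = 0.0
for v in values:
    running_total += v
print("naive loop gives 1.0:", running_total == 1.0)
print("math.fsum gives 1.0:", math.fsum(values) == 1.0)
```

Well-written statistical software uses compensated sums or similar techniques for this reason.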
Good luck with your dissertation.