Sample Size for Anderson-Darling Normality Test
This topic contains 12 replies, has 1 voice, and was last updated by mvpetrovich 7 years, 5 months ago.
Quick question from a statistics newbie:
I was using the Anderson-Darling test (in Minitab) to determine whether different sets of data were normal. I noticed that small samples had much higher p-values than larger ones. Visually, a data set with N=200 looked more normal than a set with N=30 (which did not look normal at all), but the smaller set had a p-value of .362, while the larger set had a p-value of .013.
What is the minimum sample size for this test to be meaningful? (I know that there is no magical value, but how can I determine the approximate sample size necessary?) What do I need to take into account when determining an adequate sample size for a specific purpose?
Thanks for your help!
John
First of all, remember that the test ASSUMES normality, and is looking for sufficient evidence to reject that null hypothesis. So, for very small sample sizes, there needs to be striking evidence of deviations from normality in order to yield a low p-value. Samples of fewer than 15 or 20 will usually produce test results that do not reject the null, even in cases where the samples are taken from a moderately non-normal population. There’s also a flip side to the Anderson-Darling test. It was developed to be especially sensitive to deviations from normality in the distribution tails, and it is usually not regarded as the best test for very large (say, >1000) sample sizes either. With very large n, use the Kolmogorov-Smirnov test, or better yet, the eyeball test. I don’t know off the top of my head how to calculate the minimum sample size to avoid a type 2 error in normality testing. Maybe someone else out there?
To add to George’s comments…
Use the Anderson-Darling values to compare the fit of competing distributions as opposed to an absolute measure of how a particular distribution fits the data. Smaller Anderson-Darling values indicate that the distribution fits the data better.
Other alternatives include Ryan-Joiner or Shapiro-Wilk tests. They have good power and are based on the correlation between the sample data and the data one would expect from a normal distribution.
Mestre
Here is the easy answer to your question: my Minitab instructors told me that you must have a sample size of at least 25 data points for the AD normality test to have any credibility. Using that as a rule of thumb, have more than 25 data points. The other side of this discussion is centered around the strengths of the Anderson-Darling versus the Kolmogorov-Smirnov test (which is based on the largest distance between the empirical and the hypothesized cumulative distribution functions). The AD works well for sample sizes between 25 and 1000 data points (my experience-based observation). Larger samples increase the sensitivity of the test so much that any minute deviation from a perfect normal bell curve will return a non-normal outcome. Here is where the other tests come into play; the KS is more useful for larger data sets. Several statistical papers have been written about the value of using the KS normality test over the other for specific data characteristics. You might want to check that out. Last but not least, remember that it is very easy to run a normality test on an Excel spreadsheet full of data, but remember the value of sampling. Hope this helps…
Let me add a few comments. First, to use Anderson-Darling, you will need a sample size greater than 2. (With a sample of size two, you will get the same value no matter what the data, as long as the two values are different.) How much greater than two depends upon your purpose. Generally, when using this test, you are asking if the population sampled can be adequately modeled with a normal distribution or some other distribution. Sometimes, this might be used simply as a screening tool.
The question of sample size deals with power. I know of no means by which this power can be calculated, but it is very straightforward to simulate it. So, given a specific departure from normality, say an exponential distribution, you can determine the power of the test for a specific sample size.
I have had quite a bit of experience using this particular test over the years, and have tested thousands of distributions. I have seen this test reject normality with sample sizes as low as 7 or 8. That does not mean I had lots of power, but extreme situations will reject the assumption. Now, quite often this can be the result of bad data. So, in these cases, the test becomes more of a screening exercise. Testing cells in experimental designs with low sample sizes is one example.
You will find published results for the power of this test given different alternative distributions. I have done quite a bit of this work myself.
Let me conclude with two additional thoughts. First, keep in mind that in industry, data is always discrete at some level of resolution. This will affect your test (in fact most statistical procedures), and the determination of any power. This is especially critical as the resolution interval gets larger than a standard deviation. And lastly, keep in mind that fitting data does not equal fitting a population.
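For anyone who wants to try the power simulation described above, here is a minimal sketch in plain Python, not anyone's actual method from this thread. It assumes the usual small-sample adjustment A2* = A2 (1 + 0.75/n + 2.25/n^2) and uses 0.752 as the 5% critical value, taken from the table posted later in this thread. The function names (`ad_statistic`, `simulated_power`, `exponential`, `normal`), the replication count, and the seed are all illustrative choices.

```python
import math
import random
import statistics


def ad_statistic(sample):
    """Adjusted Anderson-Darling statistic A2* for normality (mean and sd estimated)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    phi = statistics.NormalDist().cdf
    # Standardize the sorted data and clamp the CDF values so the logs stay finite.
    u = [min(max(phi((x - mean) / sd), 1e-12), 1 - 1e-12) for x in sorted(sample)]
    total = sum((2 * i + 1) * (math.log(u[i]) + math.log(1 - u[n - 1 - i]))
                for i in range(n))
    a2 = -n - total / n
    return a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # assumed small-sample adjustment


def simulated_power(n, draw, reps=2000, critical=0.752, seed=1):
    """Fraction of simulated samples of size n whose A2* exceeds the critical value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        if ad_statistic(sample) > critical:
            hits += 1
    return hits / reps


def exponential(rng):
    """One draw from the alternative distribution (exponential, rate 1)."""
    return rng.expovariate(1.0)


def normal(rng):
    """One draw from the null distribution (standard normal)."""
    return rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    # Rejection rate under the exponential alternative (power) and under
    # the normal null (should sit near the nominal 5%).
    for n in (8, 15, 30, 100):
        print(n, simulated_power(n, exponential), simulated_power(n, normal))
```

Swapping `exponential` for a different draw function gives the power against whatever departure from normality matters for your purpose.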
I am confused about the significance level. I ran an AD test in Minitab and got A* = .891 with a p-value of .023. I also found this table of critical values on the Internet:
Alpha (%)    A2*
.25          0.472
20           0.509
15           0.561
10           0.631
5            0.752
2.5          0.873
1            1.035
0.6          1.159
As you can see, my A* is less than 1.035, the critical value for an alpha of 1%. What does this mean, and what should my conclusion be?
Can I declare normality, and at what level? Is it appropriate to use an alpha greater than 5%?
Thanks, and I'm sorry about my English...
Greetings from Mexico
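As a rough illustration of how the posted numbers fit together, the sketch below brackets A* = 0.891 between the critical values from the table above, using only the rows whose alpha is unambiguous in the post. It also evaluates a commonly quoted approximation for the p-value (attributed to D'Agostino and Stephens, and valid only for A* of 0.6 or more). Whether Minitab uses exactly this formula is an assumption, and the function names are made up for this example.

```python
import math

# Critical values as posted in the thread: (alpha in percent, A2* threshold).
# Only the rows with an unambiguous alpha are included.
CRITICAL_VALUES = [(10.0, 0.631), (5.0, 0.752), (2.5, 0.873), (1.0, 1.035)]


def bracket_alpha(a_star):
    """Return (smallest alpha% that rejects normality, largest alpha% that does not)."""
    rejected = [alpha for alpha, crit in CRITICAL_VALUES if a_star > crit]
    kept = [alpha for alpha, crit in CRITICAL_VALUES if a_star <= crit]
    return (min(rejected) if rejected else None,
            max(kept) if kept else None)


def approx_p_value(a_star):
    """Approximate p-value for the adjusted statistic; upper branch only (assumed formula)."""
    if a_star < 0.6:
        raise ValueError("this sketch only covers a_star >= 0.6")
    return math.exp(1.2937 - 5.709 * a_star + 0.0186 * a_star ** 2)


if __name__ == "__main__":
    print(bracket_alpha(0.891))             # (2.5, 1.0)
    print(round(approx_p_value(0.891), 3))  # 0.023
```

So normality is rejected at 2.5% (and at 5% and 10%) but not at 1%, which agrees with the reported p-value of .023 lying between 0.01 and 0.025.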
In a normality test, the hypotheses are:
Ho: The data are normal
Ha: The data are not normal
So with a chosen alpha value of 0.05, a p-value below 0.05 leads you to reject the null, while a p-value above 0.05 will not allow you to reject the null.
Regards,
sophos9
That's correct, sophos9, but I am going to change the question. If I decide to test normality at an alpha of 10%, what is the critical value of the AD test? Is it correct to report a 10% significance level in a normality test?
I read that a computed value less than 1.027 generally means a good fit.
Hi, I believe that the critical value for AD at alpha=0.10 is 1.062. I’m not sure I fully understand your question. In Minitab, if the p-value is less than your chosen alpha, you should reject the null hypothesis regardless of the AD value.
Minor point:
“Ho: The data are normal Ha: The data are not normal”
Actually, the null hypothesis to be tested is that the POPULATION can be adequately modeled with a normal distribution.
mvpetrovich
A minor point: the word ‘adequate’ has no place in a hypothesis statement. Actually, I would hate to see any belts using the word ‘adequate’ in ANY of their hypotheses.
A major point here is that you will never know if the POPULATION can fit a Gaussian distribution. You are only statistically INFERRING this from the sample; you may know what has happened in the past, but you cannot predict the future.
If you want to split hairs, get a sharper tool….
Hi Gab Diaz
You might find it helpful to look at: – http://www.itl.nist.gov/div898/handbook/eda/section3/eda35e.htm
If not then post me a reply.
Greetings from Scotland!
Bower Chiel
Thanks for the response. First, I view hypothesis testing as something used to make inferences regarding populations, not samples. That was the previous point.
The research question prompting the use of the A-D test, as I have taught for 20 years, is whether or not the population can be “adequately” modeled with a Normal/Gaussian distribution.
The reason this is worded using the term “adequately” is that in real life there is no such thing as a perfectly normal distribution. This is unfortunately the nature of distribution testing. What you are generally looking for in these cases is whether or not you can apply parametric statistical procedures, or perform capability assessment methods, with the assumption of normality.
In hypothesis testing, of course, one never knows whether or not the null hypothesis is true. Hypothesis testing is set up to generate evidence. Should the assumptions of the test be met and the sampling methods be appropriate, one may then look to determine if the evidence generated supports or rejects the null hypothesis. Using test statistics calculated from samples, with the assumptions met, we determine the probability that samples with test statistics at least that extreme could have been drawn from a population described by the null hypothesis.
So, in testing distributions, one can look at the sample, calculate test statistics such as A2* for Anderson-Darling, and make a decision. If one looks at the p-value of the A2* statistic, and it is much larger than the selected alpha, one might conclude that the distribution is normal. Technically, what you have to conclude is that you have no evidence to reject the assumption of normality.
What makes this a little more tricky is when you reject the assumption of normality. Anderson-Darling is a fairly powerful test, and with large sample sizes, it may reject with the detection of only slight discrepancies. In fact, one may also find that other tests such as the Skewness-Kurtosis indexes, Shapiro-Wilk test, Lin-Mudholkar test, or other tests may at the same time pass. So, then what needs to be asked is, “What’s the research question that prompted this study?” Should one run a t-test, or a non-parametric equivalent, with samples from this population? Should a Cpk be generated with the standard formula, or should we attempt to fit this population with some other distribution?
To illustrate this, and on a topic pertaining to this thread, I recently generated 100 million samples of size 100, using normal random numbers, and calculated the Anderson-Darling test statistic (A2*) for each sample. If we let X = ln(A2* - 0.04), the random sampling distribution of this statistic (X) will be VERY close to a normal distribution. Actually, quite adequate for most cases, but it is not perfect. Using a normal distribution, I get a maximum error of 0.00167 in the estimate of the p-value. So, in this case, I will need to determine if this is adequate for my use.
Thank you so much for your response, and for continuing to challenge. From working through disagreements comes real learning.
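A much smaller version of the simulation described in the post above can be sketched as follows. This is only an illustration under stated assumptions: it uses a few thousand samples rather than 100 million, assumes the adjustment A2* = A2 (1 + 0.75/n + 2.25/n^2), and skips any sample whose A2* is not above 0.04 so that the logarithm is defined. The function names and the seed are made up for this example.

```python
import math
import random
import statistics


def ad_statistic(sample):
    """Adjusted Anderson-Darling statistic A2* for normality (mean and sd estimated)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    phi = statistics.NormalDist().cdf
    # Standardize the sorted data and clamp the CDF values so the logs stay finite.
    u = [min(max(phi((x - mean) / sd), 1e-12), 1 - 1e-12) for x in sorted(sample)]
    total = sum((2 * i + 1) * (math.log(u[i]) + math.log(1 - u[n - 1 - i]))
                for i in range(n))
    return (-n - total / n) * (1 + 0.75 / n + 2.25 / n ** 2)


def log_shifted_ad(reps=2000, n=100, shift=0.04, seed=7):
    """Simulate X = ln(A2* - shift) for normal samples of size n."""
    rng = random.Random(seed)
    xs = []
    for _ in range(reps):
        a2 = ad_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
        if a2 > shift:  # guard: the log is undefined otherwise
            xs.append(math.log(a2 - shift))
    return xs


def skewness(values):
    """Sample skewness; values near 0 are consistent with a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


if __name__ == "__main__":
    xs = log_shifted_ad()
    print(len(xs), statistics.fmean(xs), statistics.stdev(xs), skewness(xs))
```

A skewness close to zero is only a crude check. Judging a p-value error as small as the one quoted would need far more replications and a direct comparison of tail probabilities.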
© Copyright iSixSigma 2000-2014. User Agreement. Any reproduction or other use of content without the express written consent of iSixSigma is prohibited.