Testing for normality is often a first step in analyzing your data. Many statistical tools you might use have normality as an underlying assumption. If your data fails that assumption, you may need to use a different statistical tool or approach. This article will explore what normality of the data means and how the Anderson-Darling (AD) test can be used to check whether your data satisfies the assumption of normality. We will also explain the benefits of the AD test and offer a few best practices for understanding when and how to use it.
Overview: What is the Anderson-Darling Normality Test (AD test)?
The Anderson-Darling test is used to test if a sample of data comes from a population with a specific distribution. Its most common use is for testing whether your data comes from a normal distribution.
But what does that mean?
Normality refers to a specific statistical distribution called a normal distribution, or sometimes the Gaussian distribution, or bell-shaped curve. The normal distribution is a symmetrical continuous distribution defined by the mean and standard deviation of the data.
The normal distribution is a theoretical distribution. What you are really testing with the AD test is not whether your data is exactly consistent with a normal distribution, but whether your data is close enough to normal that you can use your statistical tool without concern.
In some cases, a statistical tool may be robust to the normality assumption, which means it is not overly sensitive to moderate violations of that assumption. The normal distribution is popular because it describes many real-life situations reasonably well, such as the distribution of people’s heights and weights. Not every measurement behaves this way: income, for example, is typically right-skewed.
The AD test is really a hypothesis test. The null hypothesis (Ho) is that your data is not different from normal. Your alternate or alternative hypothesis (Ha) is that your data is different from normal. You will make your decision about whether to reject or not reject the null based on your p-value.
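The test statistic for the AD test is:
$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(Y_i) + \ln\bigl(1 - F(Y_{n+1-i})\bigr)\right]$$
Here $n$ is the sample size, $Y_1 \le Y_2 \le \dots \le Y_n$ are the data sorted from smallest to largest, and $F$ is the cumulative distribution function of the distribution being tested. For a normality test, $F$ is the normal distribution with the mean and standard deviation estimated from your data.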
Yes, I know it looks really scary, but don’t worry. All the computations can be done by statistical software on your computer. The output you get will include a p-value.
Assuming you selected your alpha risk to be 0.05, you will reject the null if the p-value is less than 0.05. That allows you to claim that your data is statistically different from a normal distribution. On the other hand, if your p-value is higher than 0.05, you can state that your data is not statistically different from a normal distribution.
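To make the decision rule concrete, here is a minimal Python sketch of what statistical software does behind the scenes. It uses only the standard library and computes the standard A² statistic, a small-sample adjustment, and an approximate p-value. The p-value formulas are the piecewise approximation attributed to D’Agostino and Stephens; that choice, the function name `anderson_darling_normal`, and the synthetic samples are assumptions made for illustration, and your software may use a slightly different method.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(data):
    """Return (A2, adjusted A2, approximate p-value) for a normality check."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    x = sorted(data)
    # The normal distribution is fitted with the sample mean and standard deviation.
    fitted = NormalDist(mean(x), stdev(x))
    # Clamp the CDF away from 0 and 1 so the logarithms below stay finite.
    cdf = [min(max(fitted.cdf(v), 1e-12), 1.0 - 1e-12) for v in x]

    # A2 = -n - (1/n) * sum_{i=1..n} (2i - 1) * [ln F(x_i) + ln(1 - F(x_{n+1-i}))]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)  # i is zero-based here, so 2i + 1 plays the role of 2i - 1
    )
    a2 = -n - total / n

    # Small-sample adjustment used when mean and standard deviation are estimated.
    a2_star = a2 * (1.0 + 0.75 / n + 2.25 / n**2)

    # Piecewise p-value approximation (assumed: D'Agostino and Stephens).
    if a2_star > 13.0:
        p = 0.0  # far outside the range of the approximation
    elif a2_star >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_star + 0.0186 * a2_star**2)
    elif a2_star >= 0.34:
        p = math.exp(0.9177 - 4.279 * a2_star - 1.38 * a2_star**2)
    elif a2_star >= 0.2:
        p = 1.0 - math.exp(-8.318 + 42.796 * a2_star - 59.938 * a2_star**2)
    else:
        p = 1.0 - math.exp(-13.436 + 101.14 * a2_star - 223.73 * a2_star**2)
    return a2, a2_star, min(max(p, 0.0), 1.0)


# Synthetic illustration data (not the data behind the article's plots).
n = 40
probs = [(i + 0.5) / n for i in range(n)]
normal_like = [NormalDist(50, 5).inv_cdf(q) for q in probs]
skewed = [-10.0 * math.log(1.0 - q) for q in probs]  # exponential-shaped

alpha = 0.05
for name, sample in (("normal_like", normal_like), ("skewed", skewed)):
    a2, a2_star, p = anderson_darling_normal(sample)
    verdict = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: A2={a2:.4f}  p={p:.4f}  -> {verdict}")
```

The bell-shaped sample should produce a small AD value and a large p-value, while the skewed sample should produce a larger AD value and a p-value below 0.05.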
Here is an example of a probability plot that provides the results for the AD test.
Note that the value of the AD statistic is 0.2307 and the p-value is 0.805. The 0.2307 was calculated from the AD formula above. With a p-value of 0.805, you would fail to reject the null and conclude that your data is not statistically different from normal. This would satisfy any assumption of normality you might need for a statistical test.
Let’s look at another example.
This time, notice that the p-value is 0.0047 based on the AD statistic of 1.1697. In this case, you would reject the null hypothesis and say that your data is different from normal.
3 benefits of the Anderson-Darling Normality Test (AD test)
Knowing the underlying distribution of your data is important so you can apply the most appropriate statistical tools for your analysis.
1. Confirms your data distribution
The AD test helps you determine whether your data is not normal; it cannot prove that it is normal. Since the normal distribution is a theoretical distribution, no real data set can be shown to follow it exactly. The test can only tell you whether your data is statistically different from normal or not statistically different from normal.
2. Helps guide your decision
The p-value, which is based on the value of the AD statistic, will provide you guidance on whether to reject or not reject your null hypothesis.
3. Can be simple
In many cases, the computer software you use will provide you a graphical representation of the data along with the AD value and p-value. This will give you some visual and logical confirmation about your data.
Why is the AD test important to understand?
Different statistical tools for analysis have different assumptions regarding the underlying distribution of the data that you are analyzing.
For example, the t-test assumes that the data is normally distributed. Linear regression assumes that the residuals are normally distributed. Binary logistic regression assumes that the response follows a binomial distribution. Other tools rely on test statistics that follow the F or chi-square distributions.
You need to understand what these assumptions are regarding your data.
What is your hypothesis test?
Since the AD test is a form of hypothesis testing, you want to correctly state your null and alternative or alternate hypotheses. In the case of the AD test, the null is that your data is not different from a normal distribution.
This is what you would want since it is the underlying distribution for your desired statistical tool. The alternate is that it is different from the normal distribution.
Impact of sample size
As the sample size of your data increases, your chances of discovering non-normality increase. Small sample sizes may give you a false reading of normality. If you are using a probability plot, don’t be deceived by the impact of the sample size. Let your decision be guided by the p-value.
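The effect of sample size can be seen in a small simulation. The sketch below repeatedly draws mildly skewed samples of different sizes and counts how often the AD test flags them as non-normal at an alpha of 0.05. The log-normal shape, the sample sizes, the number of trials, and the helper names are all arbitrary choices made for illustration. The cutoff of 0.752 is the commonly tabulated 5% critical value for the adjusted statistic when the mean and standard deviation are estimated from the data.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def adjusted_a2(data):
    """Anderson-Darling statistic for normality with the small-sample adjustment."""
    n = len(data)
    x = sorted(data)
    fitted = NormalDist(mean(x), stdev(x))
    # Clamp the CDF so that log() never receives 0.
    cdf = [min(max(fitted.cdf(v), 1e-12), 1.0 - 1e-12) for v in x]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    return (-n - total / n) * (1.0 + 0.75 / n + 2.25 / n**2)


def rejection_rate(n, trials=300, seed=0):
    """Share of simulated samples of size n flagged as non-normal at alpha = 0.05."""
    rng = random.Random(seed)
    critical_value = 0.752  # tabulated 5% critical value for the adjusted statistic
    rejections = 0
    for _ in range(trials):
        # Mildly right-skewed data: log-normal with sigma = 0.3 (an arbitrary choice).
        sample = [rng.lognormvariate(0.0, 0.3) for _ in range(n)]
        if adjusted_a2(sample) > critical_value:
            rejections += 1
    return rejections / trials


rates = {n: rejection_rate(n) for n in (10, 50, 200)}
for n, rate in rates.items():
    print(f"n={n:>3}: non-normality detected in {rate:.0%} of samples")
```

Every sample here comes from the same non-normal population, yet small samples should pass the test far more often than large ones, which is the false reading of normality described above.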
Interpreting your p-value
The p-value of your AD test will indicate, at your chosen level of alpha risk, whether you can reject your null hypothesis. It’s important that you know what that means so the next action you take is appropriate.
An industry example of the AD test
A manufacturing manager wanted to confirm whether the recent overhaul of his printing press resulted in an increase in production rate as promised by the vendor. He had daily run speed data for 15 runs prior to the overhaul and 17 runs after the overhaul. He wanted to compare the average run speed pre- and post-overhaul.
He decided to consult with his Lean Six Sigma Black Belt on how to analyze the data.
The LSSBB advised that, since the manager was interested in comparing two sets of continuous data, the appropriate test was the 2-sample t-test. An underlying assumption was that the sample data be normally distributed. The LSSBB was concerned that the sample sizes of 15 and 17 were small, so the normality assumption couldn’t just be ignored.
Upon checking the normality of the data with the Anderson-Darling test, the LSSBB found the data not to be normally distributed. Therefore, he was not comfortable just doing the 2-sample t-test. He then also ran a 2-sample Mood’s Median test, which tests for the difference between two medians and has no assumption of distribution.
Both the t-test and the Mood’s Median test resulted in p-values less than 0.05, which indicated that the overhaul did have an impact and that run speed had increased.
3 best practices when thinking about the AD test
It’s unlikely you’ll do the hand calculations for the AD test. The important issue will be how you collect the data and interpret the results of your AD test. Here are a few thoughts to keep in mind.
1. Alternative analytical tool
Your data may not be normal, so have a plan B, or alternate analytical tool, that will still answer your statistical question but doesn’t have the same underlying assumptions of the data distribution.
If your data fails the normality assumption for the t-test, you might consider using a median-based test, such as Mood’s Median test, instead.
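As an illustration of such a plan B, here is a rough sketch of a two-sample Mood’s Median test written with only the Python standard library. It counts how many values in each group fall above the combined median and applies a chi-square test to the resulting 2x2 table, without a continuity correction. The function name `moods_median_test` and the run-speed numbers are made up for this example; they are not the data from the industry example above.

```python
import math
from statistics import median


def moods_median_test(sample_a, sample_b):
    """Two-sample Mood's median test; returns (chi-square statistic, p-value)."""
    groups = (list(sample_a), list(sample_b))
    grand_median = median(groups[0] + groups[1])
    # 2x2 table: rows are above / not above the grand median, columns are the groups.
    above = [sum(v > grand_median for v in g) for g in groups]
    not_above = [len(g) - a for g, a in zip(groups, above)]
    total = len(groups[0]) + len(groups[1])
    chi2 = 0.0
    for row in (above, not_above):
        if sum(row) == 0:
            raise ValueError("all values fall on one side of the grand median")
        for observed, group in zip(row, groups):
            expected = sum(row) * len(group) / total
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical run speeds, invented for illustration only.
before = [101, 98, 105, 99, 102, 97, 110, 100, 96, 103, 99, 98, 104, 101, 95]
after = [108, 112, 104, 109, 115, 107, 111, 106, 120, 110, 103, 113, 109, 108,
         114, 105, 111]

chi2, p_value = moods_median_test(before, after)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.5f}")
print("medians differ" if p_value < 0.05 else "no evidence the medians differ")
```

Statistical packages often apply a continuity correction or treat ties differently, so their p-values may not match this sketch exactly.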
2. Proper sampling
Random sampling of a statistically valid size will help you get a truer picture of what your data distribution is. This will give you more confidence in the results of your AD test.
3. Plot your data
Often, a simple plot of the data on either a histogram or probability plot will provide you enough insight into how your data looks. This will keep you from having to do more complicated analysis.
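If you want to see what a normal probability plot is made of, the sketch below builds its points by hand: each sorted data value is paired with the normal quantile expected at its rank. The plotting position (i - 0.5) / n is one common convention among several, and the helper names, the `straightness` measure, and the sample values are assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean


def probability_plot_points(data):
    """Pair each sorted value with the standard normal quantile expected at its rank."""
    x = sorted(data)
    n = len(x)
    # (i - 0.5) / n for i = 1..n, written with a zero-based index.
    z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(z, x))


def straightness(points):
    """Pearson correlation of the points; values near 1 suggest a straight line."""
    zs, xs = zip(*points)
    mz, mx = mean(zs), mean(xs)
    cov = sum((a - mz) * (b - mx) for a, b in points)
    var_z = sum((a - mz) ** 2 for a in zs)
    var_x = sum((b - mx) ** 2 for b in xs)
    return cov / math.sqrt(var_z * var_x)


# Made-up measurements for illustration only.
sample = [9.8, 10.1, 10.4, 9.6, 10.0, 10.2, 9.9, 10.3, 9.7, 10.0, 10.6, 9.5]
skewed = [1, 1, 1, 1, 2, 2, 3, 4, 6, 10, 18, 40]

points = probability_plot_points(sample)
for z_value, x_value in points:
    print(f"{z_value:+.2f}  {x_value:.1f}")
print(f"straightness (sample): {straightness(points):.3f}")
print(f"straightness (skewed): {straightness(probability_plot_points(skewed)):.3f}")
```

Plotting the first column against the second gives the probability plot. Points that hug a straight line are consistent with normality, while a curved pattern, as in the skewed sample, points to a departure from it.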
Frequently Asked Questions (FAQ) about the AD test
What does the Anderson-Darling statistic value mean?
The AD statistic value tells you how well your sample data fits a particular distribution. The smaller the AD value, the better the fit.
Besides the AD test, are there other tests for normality?
Yes, there are a number of other tests. One of the more popular tests is the Kolmogorov-Smirnov (K-S) test. Other commonly used tests are the Ryan-Joiner and Shapiro-Wilk tests.
Who developed the Anderson-Darling Normality Test?
The Anderson-Darling Test was developed in 1952 by Theodore Anderson and Donald Darling.
The AD test in a nutshell
Many statistical tools have an assumption that your data is approximately normally distributed. If it’s not, you may need to use a different tool to answer your statistical question.
The AD test starts with a null statement that your data is not statistically different from normal. The alternate statement is that it is different from normal. Your results will lead you to either reject the null or fail to reject the null. From there, you can decide how to proceed.