When conducting the 2-sample t-test to compare the averages of two groups, the data in both groups must be sampled from a normally distributed population. If that assumption does not hold, the nonparametric Mann-Whitney test is a better safeguard against drawing wrong conclusions.
The Mann-Whitney test compares the medians of two populations and works when the Y variable is continuous, discrete-ordinal or discrete-count, and the X variable is discrete with two attributes. Of course, the Mann-Whitney test can also be used for normally distributed data, but in that case it is less powerful than the 2-sample t-test.
Uses for the Mann-Whitney Test
Examples of uses for the Mann-Whitney test include:
- Comparing the medians of manufacturing cycle times (Y = continuous) of two different production lines (X).
- Comparing the medians of the satisfaction ratings (Y = discrete-ordinal) of customers before and after (X) improving the quality of a product or service.
- Comparing the medians of the number of injuries per month (Y = discrete-count) at two different sites (X).
Project Example: Reducing Call Times
A team wants to find out whether a project to reduce the time to answer customer calls was successful. Time is measured before and after the improvement. A dot plot (Figure 1) of the data shows a lot of overlap between the call times – it is hard to tell whether there are significant differences.
Therefore, the team decides to use a hypothesis test to determine if there are “true differences” between before and after. Because the data is not normally distributed (p < 0.05) (Figure 2), the 2-sample t-test cannot be used. The practitioners will use the Mann-Whitney test instead.
For the test, the null hypothesis (H_{0}) is: The samples come from the same distribution, or there is no difference between the medians of the call times before and after the improvement. The alternative hypothesis (H_{a}) is: The samples come from different distributions, or there is a difference.
Passing Mann-Whitney Test Assumptions
Although the Mann-Whitney test does not require normally distributed data, that does not mean it is assumption-free. For the Mann-Whitney test, the data from each population must be an independent random sample, and the population distributions must have equal variances and the same shape.
Equal variances can be tested. For non-normally distributed data, Levene's test is used to make a decision (Figure 3). Because the p-value for this test is 0.243, there is no evidence that the variances of the before and after groups in the customer call example differ.
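As a sketch of how such an equal-variance check works, the median-centered Levene statistic (the Brown-Forsythe variant, commonly used for non-normal data) can be computed by hand. The sample data below is made up for illustration, not the call-time data, and in practice a statistics package (for example, scipy.stats.levene) would also report the p-value:

```python
# Median-centered Levene statistic (Brown-Forsythe variant), hand-rolled.
# Illustrative sketch with made-up data, not the article's call times.
from statistics import mean, median

def levene_stat(*groups):
    """F-like Levene statistic for k groups (statistic only, no p-value)."""
    k = len(groups)
    # Absolute deviations from each group's median
    z = [[abs(x - median(g)) for x in g] for g in groups]
    zbar_i = [mean(zi) for zi in z]                 # per-group mean deviation
    zbar = mean([v for zi in z for v in zi])        # grand mean deviation
    N = sum(len(g) for g in groups)
    num = sum(len(g) * (zb - zbar) ** 2 for g, zb in zip(groups, zbar_i))
    den = sum((v - zb) ** 2 for zi, zb in zip(z, zbar_i) for v in zi)
    return (N - k) / (k - 1) * num / den

# Similar spreads give a small statistic; very different spreads a large one.
print(round(levene_stat([1, 2, 3, 4], [0, 10, 20, 30]), 3))  # → 9.624
```

A large statistic (relative to the F distribution) would indicate unequal variances; identical groups give a statistic of exactly zero.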
Ideally, a probability plot can be used to look for a similar distribution. In this case, the probability plot (Figure 4) shows that all the data follows an exponential distribution (p > 0.05).
If the probability plot does not identify a distribution that matches all the groups, a visual check of the data may help. When examining the plot, a practitioner might ask: Do the distributions look similar? Are they all left- or right-skewed, with only some extreme values?
Completing the Test
Because the assumptions are now verified, the Mann-Whitney test can be conducted. If the p-value is below the usually agreed alpha risk of 5 percent (0.05), the null hypothesis can be rejected and a significant difference can be assumed. For the call times, the p-value is 0.0459 – less than 0.05. The median call time of 1.15 minutes after the improvement is therefore significantly shorter than the 2-minute median before the improvement.
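The decision rule amounts to a simple comparison against alpha; the sketch below uses the p-value reported above:

```python
# Decision rule: reject the null hypothesis when p < alpha.
alpha = 0.05
p_value = 0.0459  # Mann-Whitney p-value for the call times (from the text)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(decision)  # → reject H0
```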

How the MannWhitney Test Works
Another name for the Mann-Whitney test is the 2-sample rank test, and that name indicates how the test works.
The Mann-Whitney test can be completed in four steps:
1. Combine the data from the two samples into one.
2. Rank all the values, with the smallest observation given rank 1, the second smallest rank 2, and so on.
3. Calculate and assign the average rank for the observations that are tied (the ones with the same value).
4. Calculate the sum of the ranks of the first sample (the W-value).
Table 1 shows Steps 1 through 4 for the call time example.
Table 1: Sum of the Ranks of the First Sample (the W-Value)
Call time  Improvement  Rank  Rank for ties 
0.1  Before  1  4 
0.1  Before  2  4 
0.1  After  3  4 
0.1  After  4  4 
0.1  After  5  4 
0.1  After  6  4 
0.1  After  7  4 
0.2  Before  8  11 
0.2  Before  9  11 
0.2  Before  10  11 
0.2  After  11  11 
0.2  After  12  11 
0.2  After  13  11 
0.2  After  14  11 
…  …  …  … 
7.5  Before  173  173 
8  After  174  174 
8.5  After  175  175 
8.6  Before  176  176 
10.3  Before  177  177 
11.3  Before  178  178 
11.9  After  179  179 
18.7  Before  180  180 
Sum of ranks (W-value) for before  9,743.5
Because Ranks 1 through 7 are related to the same call time of 0.1 minutes, the average rank is calculated as (1 + 2 + 3 + 4 + 5 + 6 + 7) / 7 = 4. Other ranks for ties are determined in a similar fashion.
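The ranking procedure of Table 1 can be sketched in Python. The call times below are made up for illustration, not the article's 180 observations:

```python
# Steps 1-4 on a small, made-up set of call times (illustration only,
# not the article's 180 observations).
def rank_with_ties(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j + 1) / 2  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

before = [0.1, 0.2, 0.2, 1.5]
after = [0.1, 0.3, 0.9]

combined = before + after            # Step 1: pool the two samples
ranks = rank_with_ties(combined)     # Steps 2-3: rank, averaging ties
w = sum(ranks[:len(before)])         # Step 4: rank sum of the first sample
print(w)  # → 15.5
```

The two tied values of 0.1 would occupy ranks 1 and 2, so each receives the average rank of 1.5, exactly as in the table's tie-handling step.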
Based on the W-value, the Mann-Whitney test now determines the p-value of the test using a normal approximation, which is calculated as follows:

$$Z_W = \frac{\left|W - \dfrac{n(n+m+1)}{2}\right| - 0.5}{\sqrt{\dfrac{nm(n+m+1)}{12}}}$$

where,
W = The Mann-Whitney test statistic, here: 9,743.5
n = The size of sample 1 (Before), here: 100
m = The size of sample 2 (After), here: 80
(The 0.5 in the numerator is a continuity correction.)
The resulting Z_{W} value is 1.995, which translates for a two-sided test (±Z_{W}) and a normal approximation into a p-value of 0.046.
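Plugging the example's numbers into the normal approximation reproduces the reported values. A 0.5 continuity correction, which many packages apply, is assumed here to match the reported Z of 1.995; without it, Z comes out as roughly 1.996:

```python
# Reproducing the normal approximation with the example's numbers.
# The 0.5 continuity correction is an assumption made to match the
# reported Z of 1.995.
import math

W, n, m = 9743.5, 100, 80
mean_w = n * (n + m + 1) / 2                 # expected W under H0: 9,050
sd_w = math.sqrt(n * m * (n + m + 1) / 12)   # unadjusted std. dev. of W
z = (abs(W - mean_w) - 0.5) / sd_w
# Two-sided p-value from the standard normal distribution
p = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
print(round(z, 3), round(p, 3))  # → 1.995 0.046
```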
If there are ties in the data, as in this example, the p-value is adjusted by replacing the denominator of the above Z statistic with

$$\sqrt{\frac{nm}{12}\left((n + m + 1) - \frac{\sum_{i=1}^{l}\left(t_i^3 - t_i\right)}{(n + m)(n + m - 1)}\right)}$$
where,
i = 1, 2, …, l
l = The number of sets of ties
t_{i} = The number of tied values in the ith set of ties
The unadjusted p-value is conservative if ties are present; the adjusted p-value is usually closer to the correct value, but is not always conservative.
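The tie adjustment can be written as a small helper. The tie-set sizes passed in below are hypothetical (the full tie structure of the 180 call times is not listed in the table), so this only illustrates the direction of the effect:

```python
# Standard deviation of W adjusted for ties, following the tie-correction
# formula above. The tie-set sizes used below are hypothetical.
import math

def tied_sd(n, m, tie_sizes):
    """n, m: sample sizes; tie_sizes: the size t_i of each set of ties."""
    total = n + m
    # The correction term vanishes when there are no ties
    correction = sum(t ** 3 - t for t in tie_sizes) / (total * (total - 1))
    return math.sqrt(n * m / 12 * ((total + 1) - correction))

# With no ties this reduces to the unadjusted denominator;
# ties always shrink the standard deviation slightly, raising |Z|.
print(tied_sd(100, 80, []), tied_sd(100, 80, [7, 7]))
```

A smaller denominator gives a slightly larger |Z| and thus a slightly smaller p-value, which matches the example's move from 0.046 to 0.0459.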
In this example, the p-value changes only slightly with the adjustment; it is 0.0459. This indicates that the probability of observing such a Z_{W} value if there is actually no difference between the call times before and after the improvement is only 4.59 percent. With such a small risk of being wrong, a practitioner can conclude that the after results are significantly different.