In 1920, Sir Ronald A. Fisher invented a statistical way to compare data sets. Fisher called his method the analysis of variance, which was later dubbed an ANOVA. This method eventually evolved into Six Sigma data set comparisons.
The F ratio is the probability information produced by an ANOVA. It was named for Fisher. The orthogonal array, the Results Project, DMAIC (Define, Measure, Analyze, Improve, Control)designed experiment’s cube were also his inventions.
An ANOVA can be, and ought to be, used to evaluate differences between data sets. It can be used with any number of data sets, recorded from any process. The data sets need not be equal in size. Data sets suitable for an ANOVA can be as small as three or four numbers, to infinitely large sets of numbers.
The difficulty of calculating ANOVAs by hand prevented most people from using this Six Sigma tool until the 1990s. Now, using software like Microsoft Excel, anyone and everyone can quickly determine whether differences in a set of counts or measurements were most likely due to chance variation. Or, can we say it should be more likely attributed to a “combination of factors.” These variables are often labeled factor X, Y, or Z.
Here is how you could use an Excel ANOVA to determine who is a better bowler. You could and can use an ANOVA to compare any scores. Lengths of stay, days in AR, the number of phone calls, readmission rates, stock prices and any other measure are all fair game for an ANOVA. Below are six game scores for three bowlers. Which bowler is best? If there is a best bowler, is the difference between bowlers statistically significant?
Step 1. Recreate the columns using Excel. Each bowler’s name is the field title.
Step 2. Go to Tools and select Data Analysis as shown. If Data Analysis does not appear as the last choice on the list in your computer, you must click AddIns and click the Analysis ToolPak options.
Step 3. Click OK to the first choice, ANOVA: Single Factor.
Step 4. Click and drag your mouse from Pat’s name to the last score in Sheri’s column. This automatically completes the Input Range for you:$F$1:$H$7. Click the box labeled “Labels in First Row.” Click Output Range. Then either type in an empty cell location, or mouse click an empty cell, $I$8, as illustrated by the dotted cell below. Click OK.
Step 5. Interpret the probability results by evaluating the F ratio. If the F ratio is larger than the F critical value, F crit, there is a statistically significant difference. If it is smaller than the F crit value, the score differences are best explained by chance.
The F ratio 12.57 is larger than the F crit value 3.68. Mark is a better bowler. The difference between him and the other two bowlers is statistically significant. Excel automatically calculated the average, the variance – which is the standard deviation, s, squared – and the essential probability information instantly. You can use this technique to compare physicians, nurses, hospital lengths of stay, revenue, expense, supply cost, days in accounts receivable or any other factor of interest.
Excel takes care of the first three Six Sigma rules for completing an analysis. Unfortunately, it does not create a meaningful, analytic graph. As mentioned earlier, most Excel graphs are descriptive, rather than analytic.
As you advance in your Six Sigma learning you may want to learn to use a more advanced Six Sigma software program. One such program is called StatEase DesignExpert. StatEase calculates an ANOVA and graphically shows statistical differences between sets of data. It is all achieved with mouse clicks. You won’t have to look at, or calculate an equation.
.
Each IBar has a black square in its center. This square identifies the average score for each bowler. The top and bottom of each IBar extends two standard deviations, 2s, above and below the mean. Think of these as the upper control limit (UCL) and lower control limit (LCL) for each bowler’s score. Each IBar covers 95 percent of an imaginary, onitsside bell curve for each bowler.
This Six Sigma data array of fields and records would tell us a little about each observation. The more fields, the richer our understanding can be. For example, in the same amount of space the following table has twice as much data. Rich data, meaning each column/field has a crystal clear operational definition, can yield rich information. Many times we collect dozens of fields for each recorded observation. Since data collection is time consuming and expensive, design your collection plan with care before you begin.
Note the overlapping values in red circle markers. On occasion both Pat and Sheri could have bowled a better game than Mark. But, when the data are viewed using the mean value (), the standard deviation (), the probability information provided by the F ratio, and a meaningful analytic graph, Mark is obviously a better bowler. In fact, as we mentioned before, he is a better bowler in a statistically significant way.


Comments
well done!
quite useful and an easy explanation
Perfect.
I thought minitab is the only software to calculate the F ratio and Pvalue.
Thanks for your sharing.
Hi, I was just wondering… If i have created a oneway withinsubjects ANOVA, what is the best way to present this data in terms of graphs? Does SPSS or excel have any options? thanks!
wonderfull
Slightly misleading here as the ANOVA does not tell you WHICH person in your sample is causing the test to be significant. Though it’s obvious from looking at the scores, strictly speaking you would need to do a ttest for each person to find out where the difference lies