For purposes of learning, using, or teaching design of experiments (DOE), one can argue that an eight run array is the most practical and universally applicable array that can be chosen. Eight run arrays come in several forms and under several names (e.g., 2^3 Full Factorial, Taguchi L8, 2^4-1 Half Fraction, Plackett-Burman 8-run, etc.), but they are all very similar.
A free Microsoft Excel spreadsheet with a 2^3 Full Factorial array showing the mathematical calculations accompanies this article (click below to download it). Generic steps for using the spreadsheet, precautions, and additional advice are included below.
There are many different articles in the literature that outline steps that should be taken to complete a DOE. The following steps are recommended for using the accompanying spreadsheet:
Note that in the eight run array, each factor is tested at its high level in four runs and at its low level in the other four. We therefore have the equivalent of eight data points for the effect of each high level (4 high + 4 not high = 8 relative to high), and likewise for each low level, for every factor and for the interactions between the three factors. Using this balanced multifactor DOE array, our eight run test becomes the statistical equivalent of a 96 run, one-factor-at-a-time (OFAT) test [(8 Ahigh)+(8 Alow)+(8 Bhigh)+(8 Blow)+(8 Chigh)+(8 Clow)+(8 ABhigh)+(8 ABlow)+(8 AChigh)+(8 AClow)+(8 BChigh)+(8 BClow)]. Other advantages of DOE include the ability to use statistical software to make predictions about any combination of the factors between, and slightly beyond, the tested levels, and to generate various types of informative two- and three-dimensional plots. DOEs therefore produce far more information than OFAT tests at the same or lower test cost.
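The balance described above can be checked with a short script. This is a minimal sketch, not the spreadsheet's own layout: it assumes the usual coded levels (-1 for low, +1 for high), and the variable names are illustrative only.

```python
from itertools import product

# Coded levels (an assumption of this sketch): -1 = low, +1 = high.
# Build all eight combinations of the three factors A, B and C.
runs = [dict(zip("ABC", levels)) for levels in product((-1, 1), repeat=3)]

# Interaction columns are the element-wise products of the factor columns.
for run in runs:
    run["AB"] = run["A"] * run["B"]
    run["AC"] = run["A"] * run["C"]
    run["BC"] = run["B"] * run["C"]

columns = ["A", "B", "C", "AB", "AC", "BC"]

# Balance check: count the runs at +1 and at -1 in every column.
high_counts = {c: sum(1 for r in runs if r[c] == 1) for c in columns}
low_counts = {c: sum(1 for r in runs if r[c] == -1) for c in columns}

# Article's bookkeeping: each column contributes 8 "high" and 8 "low"
# comparisons, so 6 columns x 2 levels x 8 runs.
ofat_equivalent_runs = len(columns) * 2 * len(runs)

for r in runs:
    print(" ".join(f"{r[c]:+d}" for c in columns))
print(high_counts)
print(low_counts)
print(ofat_equivalent_runs)
```

Every column, including the interaction columns, ends up with four runs at each level, which is what allows all eight results to contribute to every effect estimate.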
DOEs don’t directly compare results against a control or standard. They evaluate all effects and interactions and determine whether there are statistically significant differences among them. They also calculate a statistical confidence level for each measured effect. A large measured effect should not be trusted if its statistical confidence is low. On the other hand, a small measured effect with high confidence tells us that the effect is real but too small to be important.
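How each effect is evaluated can be sketched as the average response at the high level minus the average response at the low level. The sketch below assumes coded levels of -1 and +1; the function names and the response values are hypothetical, invented only to show the arithmetic, and are not taken from the accompanying spreadsheet. It does not compute confidence levels, which require an estimate of experimental error.

```python
from itertools import product

# All eight factor combinations in coded units (-1 = low, +1 = high).
design = list(product((-1, 1), repeat=3))


def effect(signs, responses):
    """Average response where the column is +1 minus average where it is -1."""
    high = [y for s, y in zip(signs, responses) if s == 1]
    low = [y for s, y in zip(signs, responses) if s == -1]
    return sum(high) / len(high) - sum(low) / len(low)


def estimate_effects(responses):
    """Return main effects and two-factor interaction effects by name."""
    a = [run[0] for run in design]
    b = [run[1] for run in design]
    c = [run[2] for run in design]
    columns = {
        "A": a,
        "B": b,
        "C": c,
        "AB": [x * y for x, y in zip(a, b)],
        "AC": [x * y for x, y in zip(a, c)],
        "BC": [x * y for x, y in zip(b, c)],
    }
    return {name: effect(signs, responses) for name, signs in columns.items()}


# Hypothetical responses, one per run, in the same order as `design`.
example_responses = [60.0, 72.0, 54.0, 68.0, 52.0, 83.0, 45.0, 80.0]

for name, value in estimate_effects(example_responses).items():
    print(f"{name}: {value:+.2f}")
```

Because every column is balanced, each effect estimate uses all eight responses, four on each side of the comparison.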
Selecting a test array requires balancing your test objectives, test conditions, test strategy, and available resources. Therefore, it is usually more advantageous to run several DOEs testing only a few factors at once than one large DOE. Comparing the statistical power of this array (its inherent ability to resolve differences between test factors; 1 − β) with the cost of performing the experiment (number of runs needed) also shows how this array is advantageous, since it requires only eight runs and yields successful results in most situations.
Acceptable confidence depends upon your needs. If health could be affected, then you may want more than 99 percent confidence before making a decision. This author’s rule of thumb is that 51-80 percent is considered low but in some cases is worth considering (see precautions below). A confidence level of 80-90 percent is considered moderate with the results likely to be at least partially correct and a confidence level greater than 90 percent is considered high with the results usually considered very likely to be correct.
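The rule of thumb above can be written as a small lookup. This is only a sketch of the author's bands: the function name is hypothetical, and the handling of the shared boundary at exactly 80 percent (treated here as moderate) and of values below 51 percent is an assumption, since the text does not specify them.

```python
def confidence_band(confidence_pct):
    """Map a confidence level in percent to the article's rule-of-thumb band."""
    if confidence_pct > 90:
        return "high"
    if confidence_pct >= 80:
        # Assumption: exactly 80 percent is counted as moderate.
        return "moderate"
    if confidence_pct >= 51:
        return "low"
    # Assumption: the article gives no band below 51 percent.
    return "below the rule-of-thumb range"


for pct in (55, 85, 95):
    print(pct, confidence_band(pct))
```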
Since statistical confidence doesn’t increase with additional samples per run (replications are needed to have that effect), it is important to remember that additional samples per run are only needed when there are concerns about non-normal data (in accordance with the central limit theorem) and/or to improve the accuracy of the measured effects (significance of changes between test levels). Since test costs and/or knowing the statistical confidence in our effects are usually more important than statistical significance and normality effects, running two to three samples per run is usually ideal.
Since this array has less power than others available, we need to remember that when optimizing a process that isn’t critical to human safety, using test results with a low confidence level can often be much better than not knowing which way to go with machine settings, etc. Assuming all the error in one’s experiment is evenly distributed (random distribution of error), a confidence level of 60 percent (measured to be true via DOE), for example, might seem horrible but really means the equivalent of 80 percent, since the 40 percent that we are unsure of could go either way [60 percent + (40 percent / 2) = 80 percent].
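The arithmetic in the brackets above can be expressed as a one-line function. This sketch simply encodes the author's heuristic, which rests on the stated assumption that the error is evenly distributed; the function name is hypothetical.

```python
def adjusted_confidence(measured_pct):
    """Apply the article's heuristic: add half of the unexplained remainder.

    Assumes, as the article does, that the portion we are unsure of is
    equally likely to go either way.
    """
    return measured_pct + (100.0 - measured_pct) / 2.0


print(adjusted_confidence(60))  # 60 + (40 / 2)
```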
The accompanying spreadsheet cannot easily be changed. It should be used while training others (it shows the math), or when you want to perform a quick experiment and are away from statistical software. In its current form, it cannot handle replicated runs, and center points cannot be added (center points increase statistical confidence by improving measurements of error in the experiment).