Most Practical DOE Explained (with Template)

For purposes of learning, using, or teaching design of experiments (DOE), one can argue that an eight run array is the most practical and universally applicable array that can be chosen. There are several forms of and names given to the various types of these eight run arrays (e.g., 2^3 Full Factorial, Taguchi L8, 2^4-1 Half Fraction, Plackett-Burman 8-run, etc.), but they are all very similar.

A free Microsoft Excel spreadsheet with a 2^3 Full Factorial array showing the mathematical calculations accompanies this article (click below to download it). Generic steps for using the spreadsheet, precautions, and additional advice are included below.

Click here to download template


Generic Steps For Using The Attached Spreadsheet

There are many different articles in the literature that outline steps that should be taken to complete a DOE. The following steps are recommended for using the accompanying spreadsheet:

Determine the acceptance criteria you need (i.e., the acceptable alpha error or confidence level that you will accept as passing criteria). This is typically alpha = 0.05, or 95 percent confidence, for solid decisions; see additional advice below.
Pick 2-3 factors to be tested and assign them to columns A, B and C as applicable (using the key provided is advised).
Pick 2 different test levels for each of the factors you picked (e.g., low/high, on/off).
Determine the number of samples per run (room for 1-8 only; affects normality and effect accuracy, not confidence).
Randomize the run order to the extent possible.
Run the experiment and collect data. Keep track of everything you think could be important (e.g., people, material lot numbers). Keep all other possible control factors as constant as possible, as these may affect the validity of the conclusions.
Analyze the data by entering the data into the yellow boxes of the spreadsheet and reading the results. A review of the ANOVA table will show you those effects that meet the acceptance criteria established in the first step. If the alpha value in the table is less than or equal to your acceptance criterion (for example, 0.05), treat the effect as statistically significant; if it is greater, do not. Similarly, the higher the confidence, the more likely it is that the factor's effect is real. Signal-to-noise measurements are helpful when selecting factors for re-testing in subsequent experiments.
Confirm your results by performing a separate test, another DOE, or in some other way before fully accepting any results. You may want to more closely define results that are close to your acceptance criteria by retesting the factor using larger differences between the levels.
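
The design, randomization, and analysis steps above can be sketched in a few lines of Python. This is only an illustration and not the spreadsheet's own calculation: the response values are made up, the names `runs`, `samples`, `sign`, and `effect` are hypothetical, and the ANOVA table (alpha and confidence values) is not reproduced. It estimates each effect as the mean response at the high level minus the mean response at the low level.

```python
import itertools
import random
from statistics import mean

FACTORS = ("A", "B", "C")

# Eight runs in standard order; -1 is the low level, +1 the high level.
runs = [dict(zip(FACTORS, levels))
        for levels in itertools.product((-1, 1), repeat=len(FACTORS))]

# Randomized execution order (seeded only so the sketch is repeatable).
run_order = list(range(len(runs)))
random.Random(0).shuffle(run_order)

# Hypothetical responses, two samples per run, listed in standard order.
samples = [
    [16.3, 16.7], [18.4, 18.6], [15.4, 15.6], [17.3, 17.7],
    [21.4, 21.6], [23.3, 23.7], [22.4, 22.6], [24.3, 24.7],
]
run_means = [mean(s) for s in samples]


def sign(run, term):
    """Coded level of a main effect ("A") or interaction ("AB") for one run."""
    product = 1
    for factor in term:
        product *= run[factor]
    return product


def effect(term):
    """Mean response where the term is high minus mean where it is low."""
    high = [m for run, m in zip(runs, run_means) if sign(run, term) == 1]
    low = [m for run, m in zip(runs, run_means) if sign(run, term) == -1]
    return mean(high) - mean(low)


effects = {term: effect(term)
           for term in ("A", "B", "C", "AB", "AC", "BC", "ABC")}
for term, value in effects.items():
    print(f"{term:>3}: {value:+.2f}")
```

With these made-up numbers, factor A shows the largest effect, followed by C and the AB interaction. Whether any of them meets your acceptance criteria still has to be read from the ANOVA table.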

How DOEs Work

Note that using the eight run array, each factor is tested at its high level in four runs and at its low level in the other four. We therefore have the equivalent of eight data points for judging the effect of each high level (4 high + 4 not high = 8 relative to high), and vice versa for each low level, for each factor and for the two-factor interactions among the three factors. Using this balanced multifactor DOE array, our eight run test becomes the statistical equivalent of a 96 run, one-factor-at-a-time (OFAT) test [(8 Ahigh)+(8 Alow)+(8 Bhigh)+(8 Blow)+(8 Chigh)+(8 Clow)+(8 ABhigh)+(8 ABlow)+(8 AChigh)+(8 AClow)+(8 BChigh)+(8 BClow)]. Other advantages to using DOE include the ability to use statistical software to make predictions about any combination of the factors in between and slightly beyond the different levels, and to generate various types of informative two- and three-dimensional plots. Therefore, DOEs produce roughly an order of magnitude more information than OFAT tests (12 times as many equivalent runs in this example) at the same or lower test costs.
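
The balance described above can be checked directly. The following is a minimal sketch (the names `TERMS`, `contrast`, and `columns` are hypothetical) that builds the coded columns for A, B, C, AB, AC, and BC, confirms that each has four high and four low runs, confirms that the columns are mutually orthogonal, and reproduces the 96-run count using the article's own counting convention.

```python
import itertools

# Eight runs of the 2^3 full factorial; -1 is low, +1 is high.
runs = list(itertools.product((-1, 1), repeat=3))

# Main effects and two-factor interactions, as positions within a run tuple.
TERMS = {"A": (0,), "B": (1,), "C": (2,),
         "AB": (0, 1), "AC": (0, 2), "BC": (1, 2)}


def contrast(positions):
    """Coded column for a term: the product of its factors' levels per run."""
    column = []
    for run in runs:
        sign = 1
        for i in positions:
            sign *= run[i]
        column.append(sign)
    return column


columns = {name: contrast(pos) for name, pos in TERMS.items()}

# Balanced: every column has four high runs and four low runs.
balanced = all(col.count(1) == 4 and col.count(-1) == 4
               for col in columns.values())

# Orthogonal: the dot product of any two different columns is zero.
orthogonal = all(
    sum(x * y for x, y in zip(columns[p], columns[q])) == 0
    for p, q in itertools.combinations(columns, 2)
)

# Article's count: 6 terms x 2 levels x 8 data points each.
ofat_equivalent_runs = len(columns) * 2 * len(runs)

print(balanced, orthogonal, ofat_equivalent_runs)
```

Orthogonality is what lets all eight runs be reused for every effect estimate without one factor's effect leaking into another's.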

DOEs don’t directly compare results against a control or standard. They evaluate all effects and interactions and determine whether there are statistically significant differences among them. They also calculate a statistical confidence level for each measured effect. A large measured effect with low statistical confidence should not be believed. On the other hand, a small measured effect with high confidence tells us that the effect really is small and so isn’t important.

Why This Array?

Selecting a test array requires balancing your test objectives, test conditions, test strategy, and available resources. It is usually more advantageous to run several DOEs that each test only a few factors than to run one large DOE. Comparing the statistical power of this array (its inherent ability to resolve differences between test factors; 1 − β) with the cost of performing the experiment (the number of runs needed) also shows why this array is advantageous: it requires only eight runs and yields successful results in most situations.

Picking Acceptance Criteria

Acceptable confidence depends upon your needs. If health could be affected, then you may want more than 99 percent confidence before making a decision. This author’s rule of thumb is that 51-80 percent is considered low but in some cases is worth considering (see precautions below). A confidence level of 80-90 percent is considered moderate with the results likely to be at least partially correct and a confidence level greater than 90 percent is considered high with the results usually considered very likely to be correct.
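
The rule of thumb above can be written as a small lookup. This is a sketch only: the function name `confidence_band` is hypothetical, and because the ranges as stated share the endpoints 80 and 90, the code assumes that exactly 80 and exactly 90 percent both count as moderate.

```python
def confidence_band(confidence_percent):
    """Classify a confidence level using the author's rule of thumb.

    Assumption: 80 and 90 percent are both treated as "moderate",
    since the stated ranges overlap at those endpoints.
    """
    if not 0 <= confidence_percent <= 100:
        raise ValueError("confidence must be between 0 and 100 percent")
    if confidence_percent > 90:
        return "high"
    if confidence_percent >= 80:
        return "moderate"
    if confidence_percent >= 51:
        return "low"
    return "below the range covered by the rule of thumb"


for level in (60, 85, 95):
    print(level, confidence_band(level))
```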

Calculating Samples Per Run

Since statistical confidence doesn’t increase with additional samples per run (replications are needed to have that effect), additional samples per run are only needed when there are concerns about non-normal data (in accordance with the central limit theorem) and/or to improve the accuracy of the measured effects (the significance of changes between test levels). Since test cost and knowing the statistical confidence in our effects are usually more important than statistical significance and normality, running two to three samples per run is usually ideal.

Precautions

Since this array has less power than others available, we need to remember that when optimizing a process that isn’t critical to human safety, using test results with a low confidence level can often be much better than not knowing which way to go with machine settings, etc. Assuming all the error in one’s experiment is evenly distributed (random distribution of error), a confidence level of 60 percent (measured to be true via DOE), for example, might seem horrible but really means the equivalent of 80 percent, since the 40 percent that we are unsure of could go either way [60 percent + (40 percent / 2) = 80 percent].
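
The arithmetic in the bracket above is simple enough to express as a one-line function. The sketch below encodes only this article's rule of thumb under its stated assumption of evenly distributed error; it is not a standard statistical formula, and the name `adjusted_confidence` is hypothetical.

```python
def adjusted_confidence(measured_percent):
    """Measured confidence plus half of the remaining uncertainty.

    Encodes the article's rule of thumb, which assumes the error in the
    experiment is evenly distributed; not a standard statistical result.
    """
    if not 0 <= measured_percent <= 100:
        raise ValueError("confidence must be between 0 and 100 percent")
    return measured_percent + (100 - measured_percent) / 2


print(adjusted_confidence(60))  # 60 + (40 / 2) = 80.0
```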

The accompanying spreadsheet cannot easily be changed. It should be used when training others (it shows the math), or when you want to perform a quick experiment and are away from statistical software. In its current form, the experiment can’t be replicated and center points can’t be added (center points increase statistical confidence by improving the measurement of error in the experiment).