DOE – How to Design Experiment with Multiple Levels

Viewing 19 posts - 1 through 19 (of 19 total)

    Lucas Xu

    As far as I know, we usually design experiments with 2 levels and multiple factors, which is the typical DOE format. But what about designs with more than two levels; can we do that? For example, I want to design an experiment with 4 levels and 5 factors. How can I set this up in Minitab? I hope someone can help me.


    Robert Butler

    Two level designs are “typical” because they are the lowest level designs that meet the basic criteria of designs – efficiency with respect to number of runs and ability to identify significant effects. A central concept of design is that if an effect is going to be observed your best chance of seeing it will be by contrasting responses recorded at the extremes of the variables of interest.

    The main drawback to two level designs is that they can only describe linear (straight line) trends. The next step up from a two level design would be a two level design with center points. The design will still only be able to describe straight line trends with respect to the variables of interest but it will also allow for an overall check for the existence of curvilinear behavior somewhere in the design space. It won’t be able to assign the curvilinear response to any one variable but it will tell the researcher if one exists.
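    As a rough illustration of the center point check described above, here is a minimal sketch in Python. The design size, the number of center points, and the response values are all made up for the example; only the idea (compare the average of the corner runs with the average of the center runs) comes from the post.

```python
import itertools
import numpy as np

# 2**2 factorial in coded units plus three center points (sizes are illustrative)
k = 2
factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
center = np.zeros((3, k))
design = np.vstack([factorial, center])

# Hypothetical responses, invented for illustration (same row order as `design`)
y = np.array([10.1, 14.0, 12.2, 15.9, 15.2, 14.8, 15.0])
y_fact, y_center = y[:4], y[4:]

# Curvature contrast: average of the corner runs minus average of the center runs
curvature = y_fact.mean() - y_center.mean()

# Pure-error standard deviation estimated from the replicated center points
s = y_center.std(ddof=1)

# Standard error of the contrast and the resulting t statistic
se = s * np.sqrt(1 / len(y_fact) + 1 / len(y_center))
t_stat = curvature / se

print(design)
print(f"curvature = {curvature:.2f}, t = {t_stat:.2f} on {len(y_center) - 1} df")
```

    A large t statistic says some curvature exists in the design space; it does not say which variable is responsible.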

    After we pass 2 level designs we come to a fork in the road. We can go after 3 level designs or we can go after composite designs. Both of these will allow for a check of main effects curvilinear behavior (straight line and single bend curvature), however, 3 level designs will also allow for the examination of interaction terms involving squares of the factors of interest.

    If you are going to try for something beyond 3 level designs or composites with specific 5 level settings you will have to build them yourself. However, before you do you really need to think about what it is that you are doing.

    If you want 4 levels what you are saying is that you want to investigate CUBIC response trends. I’m sure there are some processes around somewhere where cubic responses are the norm but I must admit I don’t recall ever having to deal with this situation. I won’t pass judgment on what you are doing but based on what I’ve seen and done I think it is overkill – both with respect to number of experiments needed and the time and effort involved in the work.

    Ok, so much for the op ed. If you want designs above 3 levels (here we are ignoring composites) you will have to build them yourself. The easiest way to do this would be to generate the complete matrix of design points (in your case 4**5 = 1024 combinations), put them in a file, and use them as the search space for a D-optimal (or A-Optimal) design generation algorithm. For your 4 level, 5 factor design, the basic model you would need to specify to the D-optimal package would be either

    Model = A,B,C,D,E, A**2, B**2, C**2, D**2, E**2, A**3, B**3, C**3, D**3, E**3

    or

    Model = A,B,C,D,E, A**3, B**3, C**3, D**3, E**3

    which one would depend on the instructions concerning term definition for your stat package’s D-optimal routine.
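    For readers without a D-optimal package at hand, the idea can be sketched in a few lines of Python. This is not the algorithm any particular stat package uses; it is a simple greedy (sequential) selection that adds, one run at a time, the candidate point that most increases det(X'X). The coded levels, the run budget of 24, and the small ridge term are assumptions made for the example, and the model is the first of the two listed above.

```python
import itertools
import numpy as np

levels = np.array([-1.0, -1 / 3, 1 / 3, 1.0])   # four equally spaced coded levels (assumed)
candidates = np.array(list(itertools.product(levels, repeat=5)))  # 4**5 = 1024 rows

def model_matrix(x):
    # Intercept plus linear, squared and cubed terms for each of the five factors
    return np.hstack([np.ones((len(x), 1)), x, x**2, x**3])

F = model_matrix(candidates)
n_runs = 24                                      # illustrative run budget, not a recommendation
M = 1e-6 * np.eye(F.shape[1])                    # small ridge so the inverse exists at the start
chosen = []
available = np.ones(len(F), dtype=bool)

for _ in range(n_runs):
    Minv = np.linalg.inv(M)
    # x' M^-1 x for every candidate; adding x multiplies det(M) by (1 + this value)
    gain = np.einsum("ij,jk,ik->i", F, Minv, F)
    gain[~available] = -np.inf                   # do not pick the same point twice
    best = int(np.argmax(gain))
    chosen.append(best)
    available[best] = False
    M += np.outer(F[best], F[best])

design = candidates[chosen]
X = F[chosen]
sign, logdet = np.linalg.slogdet(X.T @ X)
print(design)
print("log det(X'X) =", logdet)
```

    A real exchange algorithm would also try swapping points in and out after the initial selection, so treat the result as a starting design rather than an optimum.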

    One additional thought (let me pull out the soapbox again) – if the point of having more than 3 levels is because you want to have a “better definition” of the trend of a suspected curve shape then there are two other possibilities.

    First – if the suspected curvature is the very common situation where the curvilinear response is not symmetric (that is, the center point and one or other of the extremes will have essentially the same values, meaning that you will have missed the inflection point) and if you have some idea as to where in the design space the inflection might occur, you can take a standard 3 level design and locate the “center” values at these levels. Your matrix will no longer be a series of -1,0,1 levels. Rather you will have to “normalize” your chosen “center” values to whatever it is they scale to. (For example: if the -1,0,1 levels of the perfect space corresponded to 0,5,10 and if you actually wanted to run 0,2,10 then the “normalized” values would be -1, -0.6, 1.)

    The design will be a tad non-orthogonal but it shouldn’t matter and it will allow you to use standard 3 level designs for your investigation.
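    The normalization in the example is just a shift and rescale, which can be sketched as follows. The function names are made up for the illustration; the numbers are the 0, 2, 10 example from the text. The second part shows what "a tad non-orthogonal" looks like for a single factor: the linear and squared columns become correlated once the middle level is moved.

```python
import numpy as np

def code_levels(actual, low, high):
    """Map actual settings onto the coded scale where low -> -1 and high -> +1."""
    mid = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return (np.asarray(actual, dtype=float) - mid) / half_range

coded_ideal = code_levels([0, 5, 10], low=0, high=10)     # the usual -1, 0, 1
coded_shifted = code_levels([0, 2, 10], low=0, high=10)   # middle level moved to 2
print(coded_ideal, coded_shifted)

def lin_quad_correlation(levels):
    # Correlation between the linear column and the squared column of one factor
    return np.corrcoef(levels, levels**2)[0, 1]

# Symmetric levels give uncorrelated columns; the shifted middle level does not
print(lin_quad_correlation(coded_ideal), lin_quad_correlation(coded_shifted))
```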

    Second – If you really just want to have more points to define a curve you can take your standard 2 level design, add a couple of center points, and then toss in some additional one-at-a-time runs to flesh out the curve shape for whatever factor(s) you are interested in.

    This would amount to augmenting a two level design with a series of ladder experiments where you vary the one variable while freezing the others as levels inside the design. This is a very cumbersome way to go and before you do it you should have some assurance that you actually need to look for curvature in the cubic and higher orders. If you choose this route you will have to give some thought to the analysis. If everything is inside the design space (it should be) then you ought to run the analysis with and without the ladder values and see what you see.


    Chris Seider

    And don’t forget about EVOP.



    @lucasxu83 – Besides the excellent advice provided by Robert Butler, the only other reason to perform more than a 3 level design is if you have discrete variables.


    Lucas Xu

    @rbutler Thanks. Maybe for a linear equation, 2 levels will be enough in most cases…


    Lucas Xu

    @cseider what is EVOP?


    Chris Seider

    See below. It’s a great technique without perturbing the process much.


    Nur Fariha

    Which type of experimental design can be used for two factors with different numbers of levels?
    For example, I have these factors:
    time with 21 levels
    nanofluids with 7 levels


    Robert Butler

    You’ll have to provide more information about time and the nanofluids – are we talking multiple kinds of fluids whose levels can be changed independently of one another or are we talking multiple levels of a single fluid or is it something else?

    As for time are we talking taking measurements across time or are we talking about running something for specified intervals of time or are we talking about something else?

    If it is a simple case of having both time and nanofluids at different intervals/concentrations and if you want to check curvilinear effects and two way interactions then everything I said in the earlier post concerning 3 level/composite designs and the lack of any need to go beyond 3 levels applies.




    If the experiment is preliminary, which one is better to conduct?

    1. two level factorial with center point replicates
    2. go directly to three level factorial

    Thank you!


    Robert Butler

    Which one is better depends on many things. If it is preliminary and you have no prior information concerning a possible curvilinear response over the range of variables you are examining then it comes down to a matter of time/money/effort. If we are discussing the example of your other post – 2 variables, then you would be running 6 vs 10 experiments – 2**2 + two center points vs 3**2 and a single replicate of one of the design points.

    One compromise is to run the 2**2 plus the two center points and, if the analysis indicates curvilinear behavior, augment the existing set of experiments with a couple of additional design points to test the curvature with respect to the two variables of interest. To do this you would need to use one of the computer generated design methods such as D-Optimal. You would force in the existing design and allow the program to choose additional runs from all of the possible design points in a 3**2 design.
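    A minimal sketch of that augmentation step, under stated assumptions: the existing six runs are forced in, the candidate set is the full 3**2 grid, the target model is the full quadratic in two factors, and the selection rule is a simple greedy one (add the candidate that most increases det(X'X)) rather than the exchange algorithm a commercial package would use. The choice of two extra runs is illustrative.

```python
import itertools
import numpy as np

def quad_model(x):
    # Columns: 1, A, B, A*B, A**2, B**2
    a, b = x[:, 0], x[:, 1]
    return np.column_stack([np.ones(len(x)), a, b, a * b, a**2, b**2])

corners = np.array(list(itertools.product([-1.0, 1.0], repeat=2)))
existing = np.vstack([corners, np.zeros((2, 2))])     # runs already made: 2**2 + 2 center points
candidates = np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=2)))  # all 3**2 points

# With only corners and center points, A**2 and B**2 are identical columns
rank_before = np.linalg.matrix_rank(quad_model(existing))

design = existing.copy()
n_extra = 2                                           # illustrative number of added runs
Fc = quad_model(candidates)
for _ in range(n_extra):
    X = quad_model(design)
    Minv = np.linalg.inv(X.T @ X + 1e-6 * np.eye(X.shape[1]))
    gain = np.einsum("ij,jk,ik->i", Fc, Minv, Fc)     # det(X'X) grows by the factor (1 + gain)
    design = np.vstack([design, candidates[int(np.argmax(gain))]])

rank_after = np.linalg.matrix_rank(quad_model(design))
print("added runs:")
print(design[len(existing):])
print("rank before / after:", rank_before, rank_after)
```

    The rank going from 5 to 6 is the point: the added runs are what allow the curvature to be assigned to one variable or the other.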


    Shamshul othman

    You can first study the relationship of each input variable to the output and understand its pattern. If the relationships are linear, then 2 levels are sufficient. But if you recognize a curvilinear relationship, then center points at the inflection point will do. This can save a lot of experimental runs. However, if you still find curvature, you can optimize further with RSM…



    Hi every one

    I am new to the theory of Design of Experiments (DOE). I describe my project below, but I don’t know how to apply DOE to it. The project concerns an offshore pipe laid on the seabed.

    I have 2 factors including “seabed stiffness” and “pipe motion” and then my output (response) is a “max of stress” at the specific point in the pipe. The numerical simulation has been used to find out the “max of stress”. I choose several values for “seabed stiffness”, such as 5, 10,100,500,1000 kN/m, and also consider several values for “pipe motion”, such as 1.2,1.5,1.8,2.4,2.7,3,3.3,3.6,3.9,4.2,4.5 m.

    Through numerical simulation, the values of stress will be determined by having input values.

    The purpose of this project is to determine an appropriate model or formula that fits my inputs and output. In other words, I want to predict the output (max stress) from the two factors (stiffness and pipe motion). As mentioned earlier, the two factors have different numbers of levels: 5 levels for “stiffness” and 11 levels for “pipe motion”. How can I use DOE to arrive at an appropriate formula?

    I should mention I can’t remove levels for each factor since the relation between “stiffness” and “pipe motion” is complex and I must provide these levels for each factor.

    In fact, I’m only looking for a method to find an appropriate formula relating the inputs and the output, both of which are supplied by me. I’m not sure whether DOE is the right approach for my project!

    I would be thankful for any help with this.

    Thanks in advance



    Robert Butler

    Based on what you have posted the short answer to your question is – no – DOE won’t/can’t work in this situation.

    The factors in an experimental design require either the ability to change the levels of each factor independently of one another or, in the case of mixture designs, to vary ratios of the variables in the mix.  In your situation you do not have the ability to change stiffness and pipe motion independently of one another – indeed it does not sound like you have the ability to change either of these variables in the real world setting.

    The point of a DOE is to provide outputs that are the result of controlled, organized changes in inputs.  The DOE does not predict anything it only acts as a means of data gathering and once that data has been gathered you have to apply analytical tools such as regression to the existing data in order to understand/model the outputs of the experiments that were part of the DOE.



    Thanks for your reply @rbutler.

    I think some parts of my previous post were not clear.

    I need to mention I can change my “stiffness” or “pipe motion” values, but the provided values (such as 5,10,100,500,1000,5000 for stiffness and 1.2,1.5,1.8,2.4,2.7,3,3.3,3.6,3.9,4.2,4.5 m for pipe motion) are my critical points and I want to use these data to have an accurate model. However, there is not any problem if additional points will be added/defined by the DOE method.

    My question is that: how can I define different levels for each factor (e.g. 6 levels for factor 1, and 11 levels for factor 2) in the DOE method? which method in this theory is beneficial for my project?

    I have searched around these topics and some guys said me: ‘the “I-Optimal” method of the RSM is helpful for your project. because in this method you can define several levels for each factor and there are not any issues if levels of each factor are not the same as other factor’s levels’.

    I have a deadline for my project and still not sure the optimal method is profitable or not.

    In addition, I have attached a picture that illustrates my problem. It shows a simple case from my project.

    according to this attached pic, I want to find out the accurate formulation/model which will be able to predict the relationship between “stiffness” and “pipe motion” as input variables to give a “vessel motion” as an output.

    In fact, I want to determine a formulation/function as same as below:

    vessel motion =f (stiffness, pipe motion)

    Thanks for your consideration





    Robert Butler

    I’m missing something.

    The idea of an experimental design is that you have little or no idea of the functional relationship (if any) between variables you believe will have an effect on the output and the level(s) of the output itself.  You put together a design which consists of a series of experimental combinations of the independent variables of interest and you then go out into the field/factory/seabed floor/ hospital OR/whatever and physically construct the various experimental combinations, run those combinations, get whatever output you may get from the experimental run, and then use that data to construct a model.

    It sounds like you already have a model which (I assume) is based on physical principles and mirrors what is known about seabed stiffness/pipe motion behavior.  If you already have a model (which your graph suggests is the case) then all that is left is either a situation where you want to go out on the seabed and run some actual experiments with pipe motion to see if the current model matches (within prediction limits) what is actually observed or a situation where you want to run some matrix of combinations of seabed stiffness and pipe motion just to see what the model predicts.

    If it is the former situation then the quickest way to test your model would be to actually run a simple 2×2 design where you have two seabed conditions and two levels of pipe motion and where you try to find settings that are as extreme as you can make them.  If it is the latter then, unless it is very costly to run your model, I don’t see why you wouldn’t just run all of the combinations and see what you see.  The problem with the latter is, if you already have an acceptable model, I don’t see the point of running all of the calculations since it doesn’t sound like you have any “gold standard” (actual experimental data) for purposes of comparison.

    Shifting subjects for a minute.  Let’s just talk about DOE.

    You said, “My question is that: how can I define different levels for each factor (e.g. 6 levels for factor 1, and 11 levels for factor 2) in the DOE method? which method in this theory is beneficial for my project?

    I have searched around these topics and some guys said me: ‘the “I-Optimal” method of the RSM is helpful for your project. because in this method you can define several levels for each factor and there are not any issues if levels of each factor are not the same as other factor’s levels’.”

    If what you have written above is an accurate summary of what you were told then it is just plain wrong.

    If the graph you have provided really mirrors what is going on then you need to ask yourself the following question:

    Why would I need all of those levels of stiffness and pipe motion?  Consider this – two data points determine a straight line, 3 data points will define a curve that is quadratic in nature, 4 data points will define a cubic shaped curve, 5 data points will define a quartic curve, and 6 will give you a quintic.  The lines in your graph are simple curves – no inflections, no double or triple inflections – in short no reason they couldn’t be adequately described with measurements at three different levels and thus, no reason to bother with more than three levels.
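    The point-counting argument can be checked numerically. The sketch below uses a made-up smooth, single-bend response (a saturating exponential chosen only for illustration, not anything from the poster's simulation), fits the quadratic that passes exactly through three levels, and then compares it with the "true" curve at 21 intermediate settings.

```python
import numpy as np

# Hypothetical smooth single-bend response, invented for illustration
def response(x):
    return 10.0 * (1.0 - np.exp(-0.2 * x))

x3 = np.array([0.0, 5.0, 10.0])                  # three levels: low, middle, high
coeffs = np.polyfit(x3, response(x3), deg=2)     # three points determine a quadratic exactly

# The fit reproduces the three design points
fit_at_levels = np.polyval(coeffs, x3)

# How far off is the three-level quadratic at 21 settings across the same range?
x_check = np.linspace(0.0, 10.0, 21)
max_err = np.max(np.abs(np.polyval(coeffs, x_check) - response(x_check)))
span = response(10.0) - response(0.0)

print("quadratic coefficients:", coeffs)
print("max error between levels:", max_err, "over a response span of", span)
```

    For a curve with no inflections the error between levels stays small relative to the overall change in the response, which is the argument for not running more than three levels.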

    As for the notion concerning I-optimality (or any of the other optimalities) – they are just computer aided designs.  The usual reason one will opt for a computer aided design is because there are combinations of independent variables that are known to be hazardous, impossible to run, or are known to provide results that will be of zero interest (for example – we want to make a solid – we know the combination of interest will only result in a liquid) and if we try to examine the region of interest using one of the standard designs we will wind up with one or more experiments of this type in the DOE matrix.

    Finally, I’ve never heard of any design that requires all variables to have the same levels.  If this was said then my guess is what was meant was when you are running a design it is not necessary for, say, all of the high levels of a particular variable to be at exactly the same level.

    If that was the case then, yes, this is true – but this is true for all designs – basically, as long as the range of “high” values for the “high” setting of a given variable do not cross over into the range of the “low” settings for that variable you will be able to use the data from the design to run your analysis.  Of course, if you do have this situation then you will need to run the analysis with the actual levels used and not with the ideal levels of the proposed design.






    Thanks for your valuable information. I’m not familiar very much with DOE.

    I’m still confused about my problem! Let me put my question another way:

    Apart from the DOE, just consider the simple following problem.

    I run some numerical analysis with different stiffness (5,10,100,500 kN/m) and different pipe motion (0.6,0.9,1.2,…,5.1 m) and through these 64 simulations, the vessel motion determined. Then I plot the previous curve. I should mention all other variables (except stiffness and pipe motion) are constant in my problem.

    Now I want to find out the relationship between inputs and output, actually, finding out appropriate formulation which represents:

    vessel motion = f (stiffness , pipe motion)

    If I will be able to find this specific equation, then I can represent output (vessel motion) by giving stiffness and pipe motion.

    For example, with this equation I could calculate the vessel motion for any other stiffness (e.g. k = 1000 kN/m) and any other pipe motion (e.g. 6 m) not shown in the plot, without running a numerical simulation.

    I mention again: please forget DOE and do you have any idea regarding this question?

    I have attached an Excel file with some simple data. As you can see, as stiffness increases, vessel motion has a strongly linear relationship with pipe motion, but at lower stiffness the relationship is not linear. It seems a logical relationship exists between these variables (two inputs: stiffness and pipe motion; one output: vessel motion), and I’m trying to express it as an appropriate equation.

    any comments from you will be valuable for me and I can use it to solve my problem.

    Thanks for your guidance


    Robert Butler

    So if I’m understanding correctly the question you are asking is how to generate a fit to the data in your plot that will be in the form of a predictive equation.

    If that’s the case then, based on just looking at the plots, my first try would be a simple linear regression using the terms pipe motion, the square of pipe motion, stiffness and the interaction of stiffness and pipe motion and/or the interaction of stiffness and pipe motion squared.
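    A minimal sketch of that first try, using ordinary least squares in Python. The grid matches the 4 stiffness values and 16 pipe motion values mentioned in the question, but the vessel motion numbers are synthetic stand-ins generated by a made-up function, since the attached data is not available here. Both interaction terms are included.

```python
import numpy as np

rng = np.random.default_rng(0)
stiffness = np.array([5.0, 10.0, 100.0, 500.0])          # kN/m, from the question
pipe_motion = np.linspace(0.6, 5.1, 16)                  # m, 0.6 to 5.1 in steps of 0.3
K, P = np.meshgrid(stiffness, pipe_motion, indexing="ij")
k, p = K.ravel(), P.ravel()                              # 64 combinations

# Synthetic stand-in for the simulated vessel motion (NOT the poster's data)
vessel = (p * (1 - 0.5 * np.exp(-k / 50)) + 0.04 * p**2 * np.exp(-k / 50)
          + rng.normal(scale=0.02, size=p.size))

# Terms: intercept, pipe motion, pipe motion squared, stiffness,
# stiffness x pipe motion, stiffness x pipe motion squared
X = np.column_stack([np.ones_like(p), p, p**2, k, k * p, k * p**2])
beta, *_ = np.linalg.lstsq(X, vessel, rcond=None)

fitted = X @ beta
ss_res = np.sum((vessel - fitted) ** 2)
ss_tot = np.sum((vessel - vessel.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print("coefficients:", beta)
print("R^2:", r2)
```

    A high R^2 on its own says little about whether the form of the model is right, so the number printed here is only a first screen.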

    The other first try option could be a form of one of Hoerl’s special functions with an additional interaction term.

    The linear form would be ln(vessel motion) = function of ln(pipe motion), pipe motion, stiffness and an interaction term of either stiffness with pipe motion or stiffness with ln(pipe motion).
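    The same least squares machinery handles this form, because the model is linear in its coefficients once the response is logged. The sketch below again uses synthetic, strictly positive data generated by a made-up function (not the poster's results), and picks the stiffness x ln(pipe motion) interaction as one of the two options mentioned.

```python
import numpy as np

rng = np.random.default_rng(1)
stiffness = np.array([5.0, 10.0, 100.0, 500.0])
pipe_motion = np.linspace(0.6, 5.1, 16)
K, P = np.meshgrid(stiffness, pipe_motion, indexing="ij")
k, p = K.ravel(), P.ravel()

# Synthetic, strictly positive stand-in for vessel motion (NOT the poster's data)
vessel = (0.8 * p**1.2 * np.exp(-0.05 * p + 0.0004 * k)
          * np.exp(rng.normal(scale=0.02, size=p.size)))

# ln(y) = b0 + b1*ln(p) + b2*p + b3*k + b4*k*ln(p): linear in the coefficients
X = np.column_stack([np.ones_like(p), np.log(p), p, k, k * np.log(p)])
coef, *_ = np.linalg.lstsq(X, np.log(vessel), rcond=None)

predicted = np.exp(X @ coef)                 # back-transform to the original scale
rel_err = np.abs(predicted - vessel) / vessel
print("coefficients:", coef)
print("median relative error:", np.median(rel_err))
```

    Note that fitting on the log scale minimizes relative rather than absolute error, which may or may not be what is wanted for the final model.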

    For every attempt at model building you will need to run a full residual analysis.  The plots of the residuals vs the predicted, independent variables, etc. will tell you what you need to know as far as things like model adequacy,  missed terms, influential data points, goodness of fit, etc.
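    To make the idea concrete without plots, here is a small numeric sketch. The data are invented: a response with a squared term is fitted first with a straight line and then with the squared term included. The residuals from the straight-line fit are much larger and show a systematic pattern, which is the kind of signal the residual plots would give for a missed term. Leverage values are included as one simple screen for influential points.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.6, 5.1, 16)
# Made-up response with a squared term plus small noise
y = 1.0 + 0.5 * x + 0.3 * x**2 + rng.normal(scale=0.05, size=x.size)

def fit(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Leverage: diagonal of the hat matrix X (X'X)^-1 X'
    leverage = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    return beta, resid, leverage

X_line = np.column_stack([np.ones_like(x), x])          # straight-line model (term missing)
X_quad = np.column_stack([np.ones_like(x), x, x**2])    # model including the squared term

_, resid_line, lev_line = fit(X_line, y)
_, resid_quad, _ = fit(X_quad, y)

# Correlation between residuals and a candidate term hints at a missed term
corr_line = np.corrcoef(resid_line, x**2)[0, 1]
corr_quad = np.corrcoef(resid_quad, x**2)[0, 1]
sd_line = resid_line.std(ddof=2)
sd_quad = resid_quad.std(ddof=3)

print("corr(residual, x^2): line", corr_line, "quad", corr_quad)
print("residual SD: line", sd_line, "quad", sd_quad)
print("largest leverage (line model):", lev_line.max())
```

    These summaries are a supplement to the plots, not a replacement; the plots are still what you should look at.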

    Repeating this exercise with other terms that the residuals might suggest should ultimately result in a reasonable predictive model – be aware – any model of this type will have an error of prediction associated with it and your final decision with respect to model accuracy will have to take this into account.

    If you don’t know much about regression methods you will need to borrow some books through inter-library loan.

    I’d recommend the following:

    Applied Regression Analysis – Draper and Smith – read, understand, and memorize the first chapter – the second chapter is just the first chapter in matrix notation and may not be of much use to you. You will need to read, thoroughly understand and do everything listed in Chapter 3 – The Examination of the Residuals.

    Regression Analysis by Example – Chatterjee and Price – an excellent companion to the above and, as the book title says – it provides lots of examples.

    Fitting Equations to Data – Daniel and Wood – from your standpoint the most useful pages of the book would probably be pages 19-27 (in the second edition – page numbers might be slightly different in later editions) for the chapter titled “One Independent Variable” – yes, I know, you have 2 variables – but the plots and the methods will help get you where you want to go.

    You will also want to follow what these books have to say about models with two independent variables and interaction terms (also known as cross-product terms).




    Thanks a lot @rbutler

    I have found some references you mentioned and I will use them.


