2-Factor 4-Level Full Factorial Design Help

Six Sigma – iSixSigma Forums › General Forums › Methodology

#236157

ce_seek
Participant

I’m new to DOE. I’m doing a full factorial design.

In my case, I have two factors each with 4 levels.

I want to know the main effects and the interaction effects, and establish an equation to predict the response.

However, most of the examples I have seen use only 2 levels (high and low).

So I would like to ask: is it possible to run a multiple linear regression with four levels?

If not, do I need to reduce my levels to 2?

Thank you!

#236158

Robert Butler
Participant

The issue is this: the basic philosophy of experimental design is that if you are going to see a difference in response when you change conditions, your best chance of seeing that difference is by comparing the results at the extremes of the variables of interest. This is the point of two-level factorial designs. Thus, if you take the extremes of the settings of your two variables, run the 4-point experimental design, and replicate just one of those experiments (5 experiments in total), you can build a single model that includes your main effects and the interaction of the two terms.
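A minimal sketch of that 5-run design and the single model it supports, assuming coded units (−1 = low extreme, +1 = high extreme) and made-up response values. The replicated corner supplies the one degree of freedom left over for error once the four model terms are estimated.

```python
import numpy as np

# Coded 2x2 factorial: each factor at its low (-1) and high (+1) extreme,
# plus one replicate of the (+1, +1) corner -> 5 runs in total.
x1 = np.array([-1.0, 1.0, -1.0, 1.0, 1.0])
x2 = np.array([-1.0, -1.0, 1.0, 1.0, 1.0])

# Hypothetical responses, for illustration only.
y = np.array([10.2, 14.1, 11.0, 19.8, 20.4])

# Model matrix for y = b0 + b1*x1 + b2*x2 + b12*x1*x2
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

for name, b in zip(["b0", "b1", "b2", "b12"], coef):
    print(f"{name} = {b:.3f}")
```

Because the model has four terms and the design has four distinct points, the fit passes exactly through the three unreplicated corners and through the average of the two replicate runs.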

You could custom-build a 4-level factorial design, but the big question is: why would you want to? A 4-level design means you believe you have responses that map to a cubic polynomial. Unless you happen to have something like a phase change within the design space, I can’t think of any measured response of any kind of variable that would plot as a cubic function (and if you do have a phase change, then you really need to go back and revisit a number of process issues before considering a design).
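To make the link between number of levels and polynomial order concrete: with four distinct levels of a factor, the columns 1, x, x², x³ are linearly independent, so a regression can estimate up to a cubic in that factor, while an x⁴ column adds no new information. A minimal sketch, assuming equally spaced coded levels −3, −1, 1, 3 (a hypothetical choice; any four distinct values behave the same way):

```python
import numpy as np

# Four equally spaced levels of one factor, in coded units (assumed spacing).
levels = np.array([-3.0, -1.0, 1.0, 3.0])


def poly_columns(x, degree):
    """Columns 1, x, x^2, ..., x^degree for a single factor."""
    return np.column_stack([x ** k for k in range(degree + 1)])


for degree in range(1, 5):
    rank = np.linalg.matrix_rank(poly_columns(levels, degree))
    print(f"degree {degree}: {degree + 1} terms, rank {rank}")
```

The rank stops growing at 4, which is why four levels per factor can support at most a cubic term in that factor.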

If you believe you have a response that is curvilinear (i.e., not a simple straight line), then you could run a three-level design for the two variables of interest. This would give you a total of 9 experiments and would allow you to look at all main effects, all squared terms, and the interaction of the two variables. If the cost per experiment is low and there are no time constraints, you could opt for the 9-point run.
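A short sketch, assuming coded levels −1, 0, +1, that builds the 9-run three-level design and checks that every term of the full quadratic model (main effects, squared terms, and the interaction) can be estimated, with a few degrees of freedom left over for error:

```python
import itertools

import numpy as np

# 3x3 full factorial in coded units: every combination of -1, 0, +1.
runs = np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=2)))
x1, x2 = runs[:, 0], runs[:, 1]

# Quadratic model: y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2
X = np.column_stack([np.ones(len(runs)), x1, x2, x1**2, x2**2, x1 * x2])

print("runs:", X.shape[0])                      # 9
print("terms:", X.shape[1])                     # 6
print("rank:", np.linalg.matrix_rank(X))        # 6, so every term is estimable
print("residual df:", X.shape[0] - X.shape[1])  # 3
```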

If time, money, or effort is an issue, then a better approach would be to run the two-level factorial design along with a center point (you haven’t mentioned the kinds of variables you are considering, but we are assuming they are continuous in some fashion, either interval or ordinal) and then run a replicate of that center point. This would allow you to build a model with the main effects and the interaction, and it would allow you to check for curvilinear behavior. If curvature is present, you won’t be able to determine which of the two variables is generating it, but you will at least know it exists, and you can augment your existing work with a couple of additional design points to identify the variable that is generating the curvilinear behavior.
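One standard way to put a number on that curvature check, taken from textbook DOE practice rather than from this thread, is a single-degree-of-freedom comparison of the average factorial response with the average center-point response, using the replicated center points as the estimate of pure error. A minimal sketch with hypothetical responses:

```python
import numpy as np

# Hypothetical responses: four factorial corners and two center-point replicates.
y_factorial = np.array([10.2, 14.1, 11.0, 19.8])
y_center = np.array([16.9, 17.3])

n_f, n_c = len(y_factorial), len(y_center)
curvature = y_factorial.mean() - y_center.mean()

# Single-degree-of-freedom sum of squares for curvature.
ss_curvature = n_f * n_c * curvature**2 / (n_f + n_c)

# Pure error comes only from the replicated center points (n_c - 1 df).
ms_pure_error = y_center.var(ddof=1)

f_ratio = ss_curvature / ms_pure_error
print(f"factorial mean - center mean = {curvature:.3f}")
print(f"F = {f_ratio:.1f} on (1, {n_c - 1}) df")
```

The F ratio would be compared against a tabled F(1, n_c − 1) critical value. With only two center points the pure-error estimate has a single degree of freedom, so the test has little power; as the post says, it tells you curvature exists, not which factor causes it.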

#236180

ce_seek
Participant

Hello! Thanks for the reply. Right now, I’m pursuing a two-level full factorial with a center point. And yes, my variables are continuous.

However, I’m not really sure of the process. Is the model I come up with still linear? And then will I just do a follow-up check for curvature? I think this is the sequence:

1. Evaluate the main effects
2. Evaluate the interaction effects
3. Come up with a model (equation) using multiple linear regression
4. Test for curvature?

Does this make sense? Or will the equation I come up with contain a term for the curvature? I’m not really sure. If you have any reference material or examples, that would be a great help!

#236181

ce_seek
Participant

@rbutler

These are my current objectives:

1. Investigate the main effects and interaction effect of (1st factor) and (2nd factor) on the response;
2. Obtain mathematical models that can predict the response through multiple linear regression analysis, using (1st factor) and (2nd factor) as predictor variables;
3. Test the model for curvilinear behaviour using centre point replicates.

QUESTION 1: is this already okay?
QUESTION 2: Do I still need to validate my model using RMSE or R^2?

THANKS!

#236184

Robert Butler
Participant

QUESTION 1: is this already okay?

Yes, except that, as written, your point #2 sounds like you are just going to look at a model of X1 and X2 and not bother including the interaction. The full model would be response = fn(X1, X2, X1*X2). What you want to do is run backward elimination on the full model to see which terms remain in the reduced model (the model containing only significant terms).
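A minimal numpy-only sketch of backward elimination on the full model response = fn(X1, X2, X1*X2). Everything specific here is an assumption for illustration: coded units, made-up responses from a 2x2 factorial with a replicated center point, and a two-sided 5% cutoff applied through tabled Student t critical values (equivalent to a p < 0.05 rule) instead of a statistics package. The sketch always keeps the intercept and does not enforce model hierarchy.

```python
import numpy as np

# Two-sided 5% critical values of Student's t, keyed by residual degrees of freedom
# (standard table values; extend the table for larger designs).
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447}


def fit_ols(X, y):
    """Return least-squares coefficients, their t-statistics, and the residual df."""
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    df = n - p
    mse = resid @ resid / df
    se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
    return coef, coef / se, df


def backward_eliminate(X, y, names):
    """Repeatedly drop the least significant non-intercept term (|t| below the 5% cutoff)."""
    keep = list(range(X.shape[1]))
    while True:
        coef, t, df = fit_ols(X[:, keep], y)
        # Position 0 is the intercept, which this sketch always keeps.
        candidates = [(abs(t[i]), i) for i in range(1, len(keep))]
        if not candidates:
            break
        weakest_t, pos = min(candidates)
        if weakest_t >= T_CRIT_05[df]:
            break
        del keep[pos]
    return [names[i] for i in keep], coef


# Hypothetical data: 2x2 factorial plus a replicated center point, coded units.
x1 = np.array([-1.0, 1.0, -1.0, 1.0, 0.0, 0.0])
x2 = np.array([-1.0, -1.0, 1.0, 1.0, 0.0, 0.0])
y = np.array([10.0, 20.0, 10.4, 19.8, 15.3, 14.7])

# Full model: response = fn(X1, X2, X1*X2)
X_full = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
terms, coef = backward_eliminate(X_full, y, ["intercept", "x1", "x2", "x1*x2"])
print("reduced model terms:", terms)
print("coefficients:", np.round(coef, 3))
```

With these made-up numbers, X2 is dropped first, then the interaction, leaving a reduced model in X1 alone.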

QUESTION 2: Do I still need to validate my model using RMSE or R^2?

I’m not sure what you are suggesting here. You test for model adequacy by looking at a plot of the residuals vs. the predicted values. If the residuals of the center-point replicates are split by the 0 line (one above, one below), that would strongly indicate that, over the region you have examined, the relationship between Y and the X’s is linear (a straight line). If the two center-point residuals both plot above or both plot below the 0 line, that would suggest a curvilinear component, which would mean you should consider augmenting what you have done with a couple of additional runs to identify the variable(s) driving the curvature.
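A minimal sketch of the numbers behind that residuals-vs-predicted plot, assuming a hypothetical 2x2 factorial with a replicated center point in coded units and made-up responses. It prints the (predicted, residual) pairs you would plot and then checks whether the two center-point residuals fall on opposite sides of the 0 line.

```python
import numpy as np

# Hypothetical 2x2 factorial plus a replicated center point (coded units).
x1 = np.array([-1.0, 1.0, -1.0, 1.0, 0.0, 0.0])
x2 = np.array([-1.0, -1.0, 1.0, 1.0, 0.0, 0.0])
y = np.array([10.0, 20.0, 10.4, 19.8, 15.3, 14.7])
is_center = (x1 == 0) & (x2 == 0)

# Full model with main effects and interaction.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
predicted = X @ coef
residuals = y - predicted

# These are the points you would plot: residual (y-axis) vs predicted (x-axis).
for p, r, c in zip(predicted, residuals, is_center):
    print(f"predicted {p:7.3f}  residual {r:+.3f}{'  (center)' if c else ''}")

# Do the center-point residuals straddle the 0 line?
center_signs = np.sign(residuals[is_center])
split = center_signs.min() < 0 < center_signs.max()
print("center-point residuals straddle 0:", split)
```

In practice you would draw the actual plot and, as the addendum below stresses, judge the overall scatter of all residuals, not only the center points.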

As for model validation: provided the residual analysis says everything is OK, use the regression equation to identify the settings of X1 and X2 that predict the best Y, then go out, run that experimental combination, and see what you get. If the resulting Y is within the prediction limits of the model, you have your model validation. If it isn’t, you will have some more work to do.
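A minimal sketch of that confirmation check, using the usual least-squares prediction limits for a single new observation, ŷ₀ ± t·√(MSE·(1 + x₀ᵀ(XᵀX)⁻¹x₀)). The data, the candidate setting (X1 = +1, X2 = −1, which gives the highest predicted Y among the corners here, assuming higher Y is better), and the confirmation result are all hypothetical. The t value is the tabled two-sided 95% value for the 2 residual degrees of freedom this design leaves.

```python
import numpy as np

# Hypothetical data: 2x2 factorial plus a replicated center point, coded units.
x1 = np.array([-1.0, 1.0, -1.0, 1.0, 0.0, 0.0])
x2 = np.array([-1.0, -1.0, 1.0, 1.0, 0.0, 0.0])
y = np.array([10.0, 20.0, 10.4, 19.8, 15.3, 14.7])

X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
n, p = X.shape
resid = y - X @ coef
mse = resid @ resid / (n - p)
xtx_inv = np.linalg.inv(X.T @ X)

# Two-sided 95% Student t critical value for n - p = 2 residual df (table value).
T_CRIT = 4.303


def prediction_interval(x1_new, x2_new):
    """Predicted Y and 95% prediction limits for a single new run."""
    x0 = np.array([1.0, x1_new, x2_new, x1_new * x2_new])
    y_hat = x0 @ coef
    half_width = T_CRIT * np.sqrt(mse * (1.0 + x0 @ xtx_inv @ x0))
    return y_hat, y_hat - half_width, y_hat + half_width


y_hat, lower, upper = prediction_interval(1.0, -1.0)
confirm_y = 20.1  # hypothetical result of the confirmation run
print(f"predicted {y_hat:.2f}, 95% prediction limits [{lower:.2f}, {upper:.2f}]")
print("validated" if lower <= confirm_y <= upper else "more work to do")
```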

As for using RMSE or R^2 to assess the model – don’t. R^2 is an easily fooled statistic and RMSE by itself doesn’t tell you much either.

There are any number of posts, blogs, and even articles that tell you that you must have an R^2 of X before the model is of any use. The often-cited value for X is 70% or greater. By itself this statement is of no value. If your process is naturally noisy, the odds of your being able to build a model that explains 70% of the variation are very low. Rather than waste time chasing some level of R^2, just test the equation using the method described above. If the results are as indicated above, then you have a model that will address your needs with respect to predicting the behavior of your process over the ranges of X1 and X2 you used.
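A small simulation, entirely hypothetical, that illustrates the point: the fitted model below has exactly the right form, and its coefficients land close to the true values, yet because run-to-run noise is large compared with the factor effects, R^2 comes out far below 70%. The true model, noise level, number of replicates, and random seed are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical process: the true relationship really is y = 5 + 1.0*x1 + 0.5*x2,
# but run-to-run noise (sd = 3) is large compared with the factor effects.
corners = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
runs = np.repeat(corners, 50, axis=0)  # each corner replicated 50 times -> 200 runs
x1, x2 = runs[:, 0], runs[:, 1]
y = 5.0 + 1.0 * x1 + 0.5 * x2 + rng.normal(0.0, 3.0, size=len(runs))

# Fit the correctly specified model and compute R^2.
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

print("estimated coefficients:", np.round(coef, 2))
print(f"R^2 = {r2:.2f}")
```

With these settings the explained share of variance is roughly (1 + 0.25) / (1 + 0.25 + 9), about 12%, even though the model is the right one.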

As a counter to the claim that R^2 must be such-and-such before the model is useful, I’ll offer the following: a number of years ago the company I worked for made additives that were used in the hand lotion industry. One of our customers asked if we could run a series of experiments that would give them some level of predictive capability with respect to outcomes. We signed a non-disclosure agreement, they paid for the research, and, after a lot of discussion, I built a design that I thought would address their concerns.

Hand lotions are evaluated by a team of people who have been trained on control compounds to assess various characteristics of lotion. The ratings for the various measures are ordinal and, depending on the characteristic, the rating scales can be 1-5, 1-7, or 1-10. There were about 10 different characteristics of interest. After they had built the compounds corresponding to each of the experimental runs and got the responses from their evaluation teams, I built separate models for each of the characteristics and then used all of the models together to identify the best trade-offs with respect to product properties. None of the models had an R^2 greater than 30% and, because of the ordinal nature of the responses, the prediction errors were rounded to the nearest .5.

We generated a series of predictions of product properties based on the regression models, and the manufacturer went out and built the formulations that matched the predicted combinations. When they gave the confirming runs to their panels for evaluation, every one of the responses was well within the prediction limits associated with its predicted response. The company found the models extremely useful and used them to guide research into ways to further improve their product.

#236185

Robert Butler
Participant

Addendum: With respect to the residual plot, I left out a piece. You want to look at the overall pattern of the residuals, not just where the center points fall. If the model is adequate, you should see a random scatter of points above and below the 0 residual line. As part of that assessment, you will want to see how the center points behave.

#236192

ce_seek
Participant

If I am going to replicate everything, I will have 5 experimental runs (multiplied by 3), right?