Simulate Responses for Fractional Factorial Design
- March 30, 2018 at 2:36 pm #55967
I am learning DoE and want to simulate responses for fractional factorial design.
As an input for simulation I want to use estimated effect sizes of main factors and some interactions.
I know there is a formula for calculating effects from obtained responses:
“For each of the main effects, the estimated effect consists of the mean at the high level of the factor minus the mean at the low level of the factor.”
Here is the screenshot of fractional factorial design from a book:
And here are the formulas for calculating the effects when the responses are known:
But I want the inverse:
1. Input values of estimated main effects and interactions (A, B, C, AB, AC, etc.)
2. Derive simulated responses for all rows (1, a, b, ab, ...) so that they lead to those predefined effects.
Could you please help me with how to approach this problem?
R or JMP code would be extremely helpful.
Ultimately I want to use the derived responses in this SAS JMP code for simulating binomial responses. I will also use it with a higher number of factors.
- March 30, 2018 at 4:47 pm #202411
In order to solve simultaneous equations you have to have the same number of equations as unknowns and this isn’t the case for what you want to do. If we look at a two variable two level factorial then the equations for A,B, and AB are:
A = 1/2[(a+ab) – (1+b)]
B = 1/2[(b+ab) – (1+a)]
AB = 1/2[(1+ab) – (a+b)]
and the unknowns are a, b, ab, and 1. You could, of course, specify a value for “1” but then the values of a, b, and ab would be unique only to that chosen value.
- March 31, 2018 at 3:59 am #202414
@rbutler, thank you!
As I understand it, 1 in this case is the response when all factors are at -1. If so, I can specify this value.
Is it possible to solve the equations in that case?
- March 31, 2018 at 6:59 am #202417
Sure, because now it is 3 equations and 3 unknowns but, as I said, the solution will depend on your specification of “1” and will change every time you change that value. There are a couple of other things you will need to consider.
1. The tone of your post suggests you want to solve for the value for any number of experimental conditions. In that case you will need to use matrix algebra to run the analysis.
2. Your simulation is assuming perfection. That is, you will get a single exact value for each experimental condition, and that is not the case in the real world.
3. You stated you were interested in binomial responses. This means your regression model will be on the log-odds scale, with coefficients that exponentiate to odds ratios, so you will have to work your way through expressing your effect sizes in terms of log odds.
- March 31, 2018 at 8:09 am #202418
@rbutler, again, thank you.
At this point I can accept perfection, or I can try adding noise to the simulation later.
I have used this code to simulate a full factorial with a binomial response (0 or 1):
New Table( "Untitled",
    Add Rows( 10 ),
    New Column( "A", Numeric, "Nominal", Format( "Best", 12 ), Formula( Random Integer( 0, 1 ) ) ),
    New Column( "B", Numeric, "Nominal", Format( "Best", 12 ), Formula( Random Integer( 0, 1 ) ) ),
    New Column( "C", Numeric, "Nominal", Format( "Best", 12 ), Formula( Random Integer( 0, 1 ) ) ),
    New Column( "Success?", Numeric, "Nominal", Format( "Best", 12 ),
        Formula(
            If(
                :A == 0 & :B == 0 & :C == 0, Random Binomial( 1, 0.05 ),
                :A == 0 & :B == 0 & :C == 1, Random Binomial( 1, 0.05 ),
                :A == 0 & :B == 1 & :C == 0, Random Binomial( 1, 0.065 ),
                :A == 0 & :B == 1 & :C == 1, Random Binomial( 1, 0.065 ),
                :A == 1 & :B == 0 & :C == 0, Random Binomial( 1, 0.055 ),
                :A == 1 & :B == 0 & :C == 1, Random Binomial( 1, 0.055 ),
                :A == 1 & :B == 1 & :C == 0, Random Binomial( 1, 0.075 ),
                :A == 1 & :B == 1 & :C == 1, Random Binomial( 1, 0.075 ),
                Random Binomial( 1, 0.05 )
            )
        )
    )
)
I would be thankful if you could help me solve these equations for a fractional factorial.
I am from the psychology/marketing field and I am learning statistics and DoE by myself. Unfortunately I am not familiar with matrix algebra, but I do understand that it would be useful in such a case.
As an example, for a full factorial design where I take Y as the success rate, my prediction equation is:
0.12125
+ Match( :A, -1, -0.04875, 1, 0.04875, . )
+ Match( :B, -1, -0.06375, 1, 0.06375, . )
+ Match( :C, -1, -0.00375, 1, 0.00375, . )
+ Match( :A, -1, Match( :B, -1, 0.02625, 1, -0.02625, . ), 1, Match( :B, -1, -0.02625, 1, 0.02625, . ), . )
+ Match( :A, -1, Match( :C, -1, 0.00625, 1, -0.00625, . ), 1, Match( :C, -1, -0.00625, 1, 0.00625, . ), . )
+ Match( :B, -1, Match( :C, -1, 0.00625, 1, -0.00625, . ), 1, Match( :C, -1, -0.00625, 1, 0.00625, . ), . )
If I want to simulate the same responses for a fractional factorial, is solving a matrix equation the only way? Are there other statistical techniques available in stat software like JMP or Minitab?
- March 31, 2018 at 9:33 am #202419
I’m afraid I can’t offer much help with respect to the programming. What coding I need to do is done in SAS and I’ve never had to build a program to run matrix algebra calculations. To the best of my knowledge you will have to use matrix algebra to generate solutions for simultaneous equations when the count is greater than 4.
The one thing you will have in your favor is that as long as you use factorial designs the equations for the various effects will be independent of one another so you won’t have to worry about having equations that are not independent.
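The independence mentioned above can be checked directly. The following is a minimal Python sketch (not from the original posters, and using only the standard library) that builds the coded contrast columns of a 2^3 full factorial and computes the dot product of every pair of columns. The function and variable names are illustrative assumptions.

```python
from itertools import product

# Runs of a 2^3 full factorial in coded units (-1 = low, +1 = high).
runs = list(product([-1, 1], repeat=3))

def contrast_columns(a, b, c):
    """Return the coded value of every model term for one run."""
    return {"A": a, "B": b, "C": c,
            "AB": a * b, "AC": a * c, "BC": b * c, "ABC": a * b * c}

# Collect one column of +1/-1 values per term.
columns = {}
for run in runs:
    for term, value in contrast_columns(*run).items():
        columns.setdefault(term, []).append(value)

terms = list(columns)

# Dot product of every pair of contrast columns.
dot = {(s, t): sum(x * y for x, y in zip(columns[s], columns[t]))
       for s in terms for t in terms}

for s in terms:
    print(s, [dot[(s, t)] for t in terms])
```

Each column has a dot product of 8 with itself and 0 with every other column, which is what makes the effect equations independent and easy to invert.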
One thing is puzzling me. You say you are in psychology/marketing so what is the point of trying to “reverse engineer” an experimental design? I’m in the medical field and I have built and run designs where the response was some measured output such as hospital length of stay or patient satisfaction. In those instances there wouldn’t have been any point in estimating an effect size and then determining the result of a specific experiment because there isn’t any way to ensure our effect size estimate has anything to do with reality. What I have done, and this in itself is extremely difficult, is find patients within the population who individually meet the criteria for each of the experimental conditions and then run the analysis on their responses. The drawback to this approach is the issue of finding a sufficient number of people for each of the experimental conditions. The organization I work for has huge resources at its disposal but even with those resources we found it was impossible to find people who met the experimental condition criteria for more than 4 variables.
Rather than try to fill a design with people meeting specific design criteria what I usually do is take the provided patient population data and run co-linearity diagnostics on that data (VIF and condition indices) to determine which variables of interest to the clinician are sufficiently independent of one another to permit their inclusion in a multivariable model building effort.
The key point of this approach is that, within the block of data, the variables that pass the co-linearity check are sufficiently independent of one another. However, it is impossible to say what other variables, either as part of the recorded data set or as part of a larger family of unknown/lurking variables, are confounded with each of the terms used in the analysis.
Consequently, we cannot claim the variables used in the analysis are independent of any other variable out there in the universe but we can/do claim (and this is a BIG improvement on what is usually done) that within the block of data used in the analysis the variables we considered were sufficiently independent of one another to allow us to make claims about their importance relative to one another with respect to the outcome of interest.
- March 31, 2018 at 9:39 am #202420
I found some simultaneous equation calculators, so I hope they will solve the problem.
Thank you very much for the direction!
- March 31, 2018 at 9:55 am #202421
I currently have three reasons for reverse engineering a DoE:
1. With the help of simulation it is easier to learn DoE.
2. Simulation is useful for estimating power and sample sizes and for choosing between different designs.
3. Currently I want to create some software that will check whether an online-analytics setup works correctly. I want to simulate human behavior (Yes/No) by feeding the software a simulated trial execution sequence, so the measured outcome is a reaction.
You are working on very interesting projects. I often think about how DoE can improve operational processes and help people in different fields. Medicine is definitely a field that is extremely important for human beings.
- March 31, 2018 at 11:00 am #202422
You are, of course, welcome to do what you think will help you the most, but as someone who is formally trained as a statistician and has built and analyzed literally hundreds of designs of all types in my career in both industry and medicine I would submit the following:
1. Based on your posts you already know the basic concept of DOE – namely the fact that all experiments in a design are used to assess the impact of a given variable and that assessment is achieved via averaging of the results of the experiments with the respective high and low settings of the variable of interest. This is the basis of all designs. All that is left is knowing the strengths and weaknesses of the various design types and that information is available in any good book on design (recommended reference – Understanding Industrial Designed Experiments – Schmidt and Launsby)
2. There have been a couple of times when I was ordered to provide a sample size estimate for a design effort and, with the exception of a single instance where it was written in the contract, we never bothered running experiments that were anywhere near the number “required” by the sample calculations. In every case the size of the observed effects after running the complete design and a couple of single experiment replicates was either very large, in which case additional experimentation would have been a waste of time, or very small, in which case there was no point in further experimentation because any statistically significant effect we might find by refining our estimate of experimental error would have no physical/clinical value.
For all of the other designs I’ve built the main focus was on the strengths of what a design has to offer – parsimony with respect to time/money/effort. In every case the entire experimental effort consisted of the basic design augmented with a replication of one or two of the design points and in every instance we were able to provide definitive answers to the questions the DOE was built to answer.
3. In the past, when I’ve had to check the performance of some kind of external software what I’ve done is take an experimental design, assign random response values to each of the experiments in the design, choose high and low levels for each of the factors in the design and then analyze that design to identify the significant terms. Armed with that information I fed the same factor level/responses into the software program of interest and checked it to see if it identified the same significant variables.
- April 2, 2018 at 2:57 pm #202431
Thank you for the reference. I found it on Amazon; I wish there were a Kindle version.
Currently I am using Barker’s book “Quality by Experimental Design” for learning.
Thank you for sharing your experience. I understand your point regarding sample size and I agree with the importance of the time/money/effort balance. I understand what you are saying about testing the performance of some software, but in my case I am testing a script that is intended to generate different treatment combinations.
- April 2, 2018 at 8:12 pm #202432
George W. Chollar (Participant)
I believe that you may want to use least squares regression to create a mathematical model for the response that is a function of the factors. Most computer tools (Excel, Minitab, R, JMP, SAS, etc.) include a regression function. For example, the following equation (developed using Minitab) will predict the response for the factorial experiment you provided in your original post:
Resp = 39.44 + 2.698 A + 0.5775 B + 4.243 C – 0.5875 AB + 1.768 AC – 0.6325 BC + 0.7725 ABC
For the factor A, B, C variables in the above, plug in -1 or +1 and you will see that it will predict the original response. You can apply regression methods to your original data to possibly develop a model.
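As a rough illustration of "plug in -1 or +1", here is a small Python sketch that evaluates the equation above at all eight factor combinations. The coefficients are copied from the equation; the dictionary and function names are assumptions made for the example.

```python
from itertools import product

# Coefficients copied from the regression equation quoted above.
coef = {"const": 39.44, "A": 2.698, "B": 0.5775, "C": 4.243,
        "AB": -0.5875, "AC": 1.768, "BC": -0.6325, "ABC": 0.7725}

def predict(a, b, c):
    """Predicted response for one run; a, b, c are coded as -1 or +1."""
    return (coef["const"]
            + coef["A"] * a + coef["B"] * b + coef["C"] * c
            + coef["AB"] * a * b + coef["AC"] * a * c + coef["BC"] * b * c
            + coef["ABC"] * a * b * c)

# Predicted response for each of the 8 treatment combinations.
predictions = {run: predict(*run) for run in product([-1, 1], repeat=3)}

for (a, b, c), y in predictions.items():
    print(f"A={a:+d} B={b:+d} C={c:+d}  predicted response = {y:.4f}")
```

With -1/+1 coding each regression coefficient is half of the corresponding effect (high mean minus low mean), and the constant is the mean of all eight responses, so a set of predefined effects can be turned into responses the same way.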
Hope this helps.
- April 3, 2018 at 9:30 am #202435
My purpose was to derive responses for each row from effects and (1).
Do you mean that I can calculate responses for each row (treatment combination) from predefined effects (A, B, C, (1)) using linear regression? Unfortunately, I could not understand how.
In my case (1), A, B, C, AB, BC, AC, ABC are known.
a, ab, ac, abc, b, c, bc are unknown.
What I did for the full factorial:
1. Created Yates’ order in a design table.
2. Created equations where the values of (1), A, B, C, AB, BC, AC, ABC were predefined and a, ab, ac, abc, b, c, bc were unknown.
3. Solved the simultaneous equations in order to find the response for each row, that is a, ab, ac, abc, b, c, bc.
Because I took the example where responses were already known I could compare the results and they were correct.
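The steps above can be sketched in Python without an equation solver, because for a two-level full factorial each response equals the overall mean plus half of each effect times that term's coded value. This is only a sketch under stated assumptions: the effect values are twice the half-effects in the prediction equation quoted earlier in the thread, the ABC effect is assumed to be zero because that equation has no ABC term, and the (1) value is the one implied by that equation. All function names are hypothetical.

```python
from itertools import product
from math import prod

FACTORS = "ABC"
TERMS = ["A", "B", "C", "AB", "AC", "BC", "ABC"]

def coded(term, run):
    """Coded value (+1 or -1) of a term, e.g. 'AB', for a run {factor: level}."""
    return prod(run[f] for f in term)

def responses_from_effects(base, effects):
    """Derive the response of every run from the (1) response and the effects."""
    runs = [dict(zip(FACTORS, levels))
            for levels in product([-1, 1], repeat=len(FACTORS))]
    all_low = {f: -1 for f in FACTORS}
    # base = mean + sum(effect / 2 * coded value at the all-low run)
    mean = base - sum(effects[t] / 2 * coded(t, all_low) for t in TERMS)
    return [(run, mean + sum(effects[t] / 2 * coded(t, run) for t in TERMS))
            for run in runs]

def effect_from_responses(term, table):
    """Mean response at the high level of a term minus mean at the low level."""
    high = [y for run, y in table if coded(term, run) == 1]
    low = [y for run, y in table if coded(term, run) == -1]
    return sum(high) / len(high) - sum(low) / len(low)

# Assumed inputs: effects = 2 x the half-effects of the earlier equation.
effects = {"A": 0.0975, "B": 0.1275, "C": 0.0075,
           "AB": 0.0525, "AC": 0.0125, "BC": 0.0125, "ABC": 0.0}
base = 0.04375  # response at (1), all factors low

table = responses_from_effects(base, effects)
for run, y in table:
    print(run, round(y, 5))

# Round trip: the derived responses reproduce the predefined effects.
for t in TERMS:
    print(t, round(effect_from_responses(t, table), 5))
```

The round trip at the end plays the same role as the comparison described above: the effects recomputed from the derived responses should match the predefined ones.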
May I use the same process for deriving responses for a fractional factorial?
- April 3, 2018 at 3:05 pm #202436
@gustavjung @gchollar is telling you the same thing I did but perhaps I wasn’t clear enough. So again – define the response values for each of the experiments, (1,a,b, etc.) define your low and high values for each of the factors and fill in the factorial table and then build a regression equation which contains only the significant terms (use backward elimination and stepwise to see if the regressions converge to the same model – with a design they should more often than not).
With the equation in hand, first see how well it predicts each of the experimental outcomes and then feed the design and the outcomes into your unknown program and test it on that block of data to see if the unknown program can match the values generated by your regression equation and identify the same significant terms.
- April 4, 2018 at 3:51 am #202438
@rbutler, thank you very much for your help once again!
I am sorry, I think I was not clear enough. I used the data in 1st post just for example.
The actual case is that I do not have response values.
My input is:
1. Fractional factorial design with 15 factors resolution IV
2. Only effects are known, 8 effects
Factor % of Successes
When all factors are at the low “-” level, the success rate is 1.2%. (1)
3. Responses for each row (a, b, c, etc) are unknown
I want to find these response values in order to use them in a script that simulates binomial responses.
As far as I understand, in order to use regression I need to enter the effect sizes in one column? But how do I define which row corresponds to a specific factor?
I am now reading about Yates’ algorithm for fractional factorials so that I can define these equations.
Is my direction wrong?
- April 5, 2018 at 6:13 am #202440
I’m still puzzled by “simulation” and the context.
- April 5, 2018 at 7:21 am #202441
Let’s try for a re-cap.
In your initial post you said, “I am learning DoE and want to simulate responses for fractional factorial design. As an input for simulation I want to use estimated effect sizes of main factors and some interactions. I know there is a formula for calculating effects from obtained responses:”
If you want to do this then the only way this can be accomplished is to use matrix algebra to solve an array of simultaneous equations. However, in order to do this you do not get to just use some estimates of “effect sizes of main factors and some interactions”. For this to work you will have to come up with estimates of an effect size for every single main effect and every other term that would be supported by the design in question (interactions, curvilinear terms etc.) What you will get will be predictions of the results of each of the experiments in the design.
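To make this concrete for a fractional design, here is a hedged Python sketch for an 8-run half fraction of a 2^4 design with the generator D = ABC (resolution IV). It is not the poster's 15-factor design. The effect values and the (1) response are made-up numbers for illustration, and one effect is supplied for every term the 8 runs can estimate. The arithmetic stands in for the matrix solution because the contrast columns are orthogonal.

```python
from itertools import product
from math import prod

# Half fraction of a 2^4 design: D is generated as D = A*B*C.
runs = [{"A": a, "B": b, "C": c, "D": a * b * c}
        for a, b, c in product([-1, 1], repeat=3)]

def coded(term, run):
    """Coded value (+1 or -1) of a term such as 'AB' for one run."""
    return prod(run[f] for f in term)

# Hypothetical effects, one per estimable term. With D = ABC the two-factor
# interactions are aliased in pairs: AB = CD, AC = BD, AD = BC.
effects = {"A": 4.0, "B": -2.0, "C": 1.0, "D": 3.0,
           "AB": 1.5, "AC": 0.0, "AD": -0.5}
base = 10.0  # hypothetical response at the all-low run (1)

all_low = runs[0]  # A = B = C = -1 gives D = -1, so (1) is in this fraction
mean = base - sum(e / 2 * coded(t, all_low) for t, e in effects.items())
responses = [mean + sum(e / 2 * coded(t, run) for t, e in effects.items())
             for run in runs]

def estimate(term):
    """Effect estimate: mean response at high level minus mean at low level."""
    high = [y for run, y in zip(runs, responses) if coded(term, run) == 1]
    low = [y for run, y in zip(runs, responses) if coded(term, run) == -1]
    return sum(high) / len(high) - sum(low) / len(low)

for run, y in zip(runs, responses):
    print(run, y)
print("AB estimate:", estimate("AB"), " CD estimate:", estimate("CD"))
```

The last line shows the cost of the fraction: the AB and CD estimates are identical, so an effect entered for AB is really the combined AB + CD effect.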
As I see it the problem with your approach is that designs are not used in this manner – ever. As a result, I really don’t see how what you want to do will help you understand experimental designs.
The reason for running a design is to determine the effect sizes of the factors that were studied. You do this because, independent of all other factors in your system, you really don’t know what those effect sizes are.
Typically, before you run a design, you have some prior data on the effect size of SOME of the factors you want to include in your study (indeed it is the possibility that these variables might really matter that drove you to include them in the design in the first place). The problem with this preliminary data is there is no guarantee that the effect size you have measured using this data is actually providing an INDEPENDENT measure of the effect size of the factor. In addition, you and your crew will most likely have a list of some other factors which common sense, a report from the latest trade journal, your boss’s insistence, a Dirty Harry – do-you-feel-lucky gut feel, etc. indicate might be worth examining and for which there is no prior data for effect size estimation.
So, you choose a design to look at these factors and you run the experiments in that design. By definition, these runs are “experiments” and therefore their results are not guaranteed. Once you have the results in hand you analyze them using methods of regression analysis. The final regression model will not only tell you the effect sizes of all of the variables/interactions/curvilinear factors of interest, independent of one another, it will also tell you if they are greater than the ordinary noise of your system.
In my estimation, if you really want to gain an understanding of designs by running what-if analyses, I would choose one of the smaller designs, say a full factorial 3 variable design, look at the +,- settings for the various factors, set response levels to match these patterns, and run the analysis to see the impact of these ordered changes in response outcome.
For example – choose a range of large responses and a range of small responses and assign the large and small values to match the pattern of one of the interactions and then run the analysis to see not only if the interaction is significant but if the main effect terms associated with the interaction are also significant. If you do this you can also include a couple of replicate points and vary the replicate error and see how this impacts your “initial” findings with respect to term significance and effect size.
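A minimal Python sketch of this what-if exercise follows, assuming a 2^3 full factorial, a "large" response of 20 and a "small" response of 10 assigned by the sign of the AB column, and optional Gaussian noise standing in for replicate error. The numbers and names are illustrative assumptions, and the sketch only computes effect sizes, not significance tests.

```python
import random
from itertools import product

runs = list(product([-1, 1], repeat=3))  # (A, B, C) in coded units

TERMS = {
    "A": lambda a, b, c: a, "B": lambda a, b, c: b, "C": lambda a, b, c: c,
    "AB": lambda a, b, c: a * b, "AC": lambda a, b, c: a * c,
    "BC": lambda a, b, c: b * c, "ABC": lambda a, b, c: a * b * c,
}

def what_if(noise_sd=0.0, seed=0):
    """Large response where AB = +1, small where AB = -1, plus optional noise."""
    rng = random.Random(seed)
    return [(20.0 if a * b == 1 else 10.0) + rng.gauss(0.0, noise_sd)
            for a, b, c in runs]

def estimate_effects(responses):
    """Effect of each term: mean at its high level minus mean at its low level."""
    result = {}
    for name, column in TERMS.items():
        high = [y for run, y in zip(runs, responses) if column(*run) == 1]
        low = [y for run, y in zip(runs, responses) if column(*run) == -1]
        result[name] = sum(high) / len(high) - sum(low) / len(low)
    return result

effects_clean = estimate_effects(what_if(noise_sd=0.0))
effects_noisy = estimate_effects(what_if(noise_sd=2.0, seed=1))

print("no noise:  ", effects_clean)
print("with noise:", effects_noisy)
```

Without noise only the AB effect is non-zero, so the main effects A and B do not appear even though their interaction is large. Raising `noise_sd` shows how the other estimates drift away from zero.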
As for the case where you want to test some other system to see if it is in agreement with respect to effect size – just assign values for the outcomes for each of the experiments in your design, run the analysis, identify the effect sizes using regression methods and then feed the outcome values to the program of interest and see if what it gets matches what you got using regression.
- April 5, 2018 at 11:46 am #202442
@rbutler , thank you very much for clarification and suggestion on learning DoE.
I agree with the approach regarding software that you suggested, so I will go that way.
Also, by the way, I found that it is possible to simulate responses using JMP’s Simulate Responses feature, and it works well.
The only difficulty I encountered is how to translate the linear model that I have in mind to the binomial model 1/(1 + exp(-L(x, β))).
- April 5, 2018 at 12:27 pm #202444
For a binomial outcome the regression method you would use is logistic regression. The final model is a log odds model of the form:
log (odds Y) = b0 +b1*x1 +b2*x2+…etc. which means
odds Y = exp(b0 +b1*x1 +b2*x2+…etc.)
Depending on the program you are using the output will be either in the form of the beta’s, in which case you will need to exponentiate them to get the odds ratio associated with the X’s or the exponentiation will have been done for you and the output will be in the form of odds ratio point estimates for each of the X’s.
So, if the interest is in the odds of something happening where something happening is coded as a “1” and not happening is coded as a “0” then the output will be a table listing each of the X’s and there will be an associated point estimate of the odds ratio along with a Wald confidence interval for that point estimate.
For example, let’s take one of your percentages and say we find the coefficient for X1 is .1906. The point estimate will be 1.21 and, assuming it is statistically significant, the way you would interpret the results would be to say that for every unit increase in X1 the odds of Y occurring increase by 21%. If X1 is binary and you are using the “0” setting as the reference then you would say that when X1 = 1 the odds of Y occurring are 1.21 times that for the situation where X1 = 0.
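The arithmetic in this example, and the step from log odds back to a probability, can be sketched in Python as follows. The coefficient 0.1906 comes from the example above; pairing it with the 1.2% baseline success rate mentioned earlier in the thread is an assumption made purely for illustration, as are the function names and the number of simulated trials.

```python
import math
import random

def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Probability corresponding to a log-odds value: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

b1 = 0.1906                   # coefficient for X1 from the example above
odds_ratio = math.exp(b1)     # about 1.21

p0 = 0.012                    # assumed baseline success rate when X1 = 0
b0 = logit(p0)                # intercept on the log-odds scale
p1 = inv_logit(b0 + b1 * 1)   # success probability when X1 = 1

print(f"odds ratio = {odds_ratio:.4f}")
print(f"P(success | X1=0) = {p0:.4f}, P(success | X1=1) = {p1:.4f}")

# Simulate 0/1 outcomes at X1 = 1 with a seeded generator.
rng = random.Random(1)
draws = [1 if rng.random() < p1 else 0 for _ in range(10000)]
print("simulated success rate:", sum(draws) / len(draws))
```

The odds ratio multiplies the odds, not the probability, so with a small baseline rate the probability rises by slightly less than 21%.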