DOE Quandary


Viewing 12 posts - 1 through 12 (of 12 total)


    I’ve just completed a 1/2 fraction DOE with 4 factors and 3 replicates.  I have an interaction that’s significant and 2 main effects that are close, with p-values of .055 and .088.  My concern is that the R^2 value is 48.18% and the adjusted R^2 is 25.45%.
    Am I to interpret this as meaning I’ve found some significant factors, but they aren’t the only culprits? Has anyone seen this before, or does anyone have thoughts on how to proceed?


    Robert Butler

      In answer to your question – has anyone had this problem before – the answer is yes – many times.
     Your post gives the impression that all you have done is run a regression and taken a look at the summary statistics.  If this is the case then you need to think about doing a regression analysis.
    This means – plot your data and look at the results.
    Run plots of residuals vs. predicted – what kinds of patterns do you see – trends – linear or curvilinear, fan shaped, shotgun blast?
    Run plots of residuals against your X values – again – what kinds of patterns.
    Run plots of responses by experimental number – what do you see – do all of the experiments exhibit roughly the same amount of variability or are there big differences from experiment to experiment?
    Run plots of responses by experimental number with the experiments ordered in their run order – what do you see – trending, cycling, shotgun blast, etc.
      If there is any kind of trending it would suggest you have one or more lurking variables – which kind will depend on which plots exhibited trending. If the responses by experimental number show uniformly excessive variation but you do not see any trending it would suggest that you have a process that is very noisy – you may want to look into reasons for this.  If the plots of responses by experimental number show huge differences in variation from experiment-to-experiment you should give some thought as to why this is so.
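    The trend checks above can also be done numerically. Below is a minimal Python sketch; the factor levels, run order, and responses are invented for illustration, and a correlation is only a crude stand-in for actually looking at the plots:

```python
import numpy as np

# Invented illustration data: a 2^3 full factorial with a fabricated response.
A = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
B = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
C = np.array([-1, -1, -1, -1, 1, 1, 1, 1], dtype=float)
run_order = np.array([3, 0, 6, 2, 7, 1, 4, 5])   # order the runs were executed
y = 1.0 + 0.5 * A + 2.0 * C + np.array([0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.05])

# Fit the main effects by ordinary least squares and compute residuals.
X = np.column_stack([np.ones(8), A, B, C])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
predicted = X @ coef
residuals = y - predicted

# The plots themselves are best done in your stats package; as a crude
# numeric stand-in, a large correlation between residuals and run order
# would hint at a lurking time-dependent variable.
print("residuals in run order:", residuals[np.argsort(run_order)])
print("corr(residuals, run order):", np.corrcoef(run_order, residuals)[0, 1])
```

    The same residual vector, plotted against predicted values and against each X, gives the fan-shape and curvature checks Butler describes.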


    Mr IAM

    Robert B provides some great advice as usual… I would give his ideas a shot.
    My interpretation of your post is this – with an adjusted R^2 of 25% you have identified some significant effects, either main or interaction. If an interaction is significant, make sure you pay attention to that before the main effects.
    There are a number of reasons why your adjusted R^2 could be low: 1) you are missing some important factors; 2) you did not test the factors at levels that were extreme enough; 3) you have a lot of measurement error.
    If I were you, I would look closely at the interaction plot, and maybe run a full factorial with the factors that were significant in the interaction. Maybe even expand the factor levels a little... it depends on the situation. Which direction do the interaction plots tell you to go?
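    To make the "look at the interaction plot" advice concrete: an interaction plot is just the mean response in each cell formed by two factors, with non-parallel lines indicating an interaction. A small sketch with made-up numbers:

```python
import numpy as np

# Hypothetical results for two coded factors A and B, two replicates
# per cell (numbers invented for illustration).
A = np.array([-1, -1, 1, 1, -1, -1, 1, 1])
B = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
y = np.array([10., 12., 11., 20., 9., 13., 12., 19.])

# Cell means of y at each (A, B) combination: these are the points
# you would plot, one line per level of B traced over A.
cell_means = {}
for a in (-1, 1):
    for b in (-1, 1):
        mask = (A == a) & (B == b)
        cell_means[(a, b)] = y[mask].mean()
print(cell_means)

# Slope of y over A at each level of B; unequal slopes mean the lines
# are not parallel, i.e. A and B interact.
slope_low = cell_means[(1, -1)] - cell_means[(-1, -1)]
slope_high = cell_means[(1, 1)] - cell_means[(-1, 1)]
print("slope at B=-1:", slope_low, " slope at B=+1:", slope_high)
```

    Here the effect of A is much larger when B is high, which is exactly the pattern that would tell you which corner of the design space to explore next.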



    Good remarks, although I just wanted to add that the R-square and adjusted R-square values are just an indication of how well the regression equation coming out of the DOE (in other words, your regression line) fits the data points. If you want to make predictions with this ‘model’, then your R-square value should be as high as possible. If not, I personally don’t really look at this value.
    A low value could perhaps show that you have taken too many variables into the equation. Try squeezing out some of the less important factors; you’ll see that the R-square and adjusted R-square values go up.
    If taking out the less important factors doesn’t give you a higher value, then I agree with the remark that you should look for other significant factors. In other words, the pre-analysis phase for the DOE was not carried out correctly. In my opinion this mostly comes down to Ishikawa diagrams that weren’t taken deep enough (you didn’t ask why, why, why often enough).
    Always open for remarks on this one …


    Fred Newruck

    I always thought that R-square always increased (or at least never DECREASED) as you added more variables into the regression equation.  The adjusted R-square could go down if the added variables only marginally contributed to the increase in R-square.  How can removing variables cause R-square to increase?
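    Fred's point is easy to demonstrate: for nested least-squares models, adding a regressor can never lower R-square, even when the regressor is pure noise, while adjusted R-square penalizes the extra degree of freedom. A quick sketch with simulated data (all numbers invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
x1 = rng.normal(size=n)
junk = rng.normal(size=n)            # a predictor unrelated to y
y = 2.0 + 1.5 * x1 + rng.normal(size=n)

def r2(predictors, y):
    """R^2 of an ordinary least-squares fit with intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

r2_small = r2([x1], y)
r2_big = r2([x1, junk], y)           # nested model: same terms plus junk
print(r2_small, r2_big)              # r2_big >= r2_small, always

def adj_r2(r2_val, n, p):            # p = number of predictors (no intercept)
    return 1 - (1 - r2_val) * (n - 1) / (n - p - 1)

print(adj_r2(r2_small, n, 1), adj_r2(r2_big, n, 2))
```

    The raw R-square of the larger model is guaranteed to be at least as high; only the adjusted value can drop when the extra term carries no information.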



    Thanks for your remark.
    As I stated, the R-squared value is an indication of how well your regression line fits the data points. So adding variables that don’t belong in the equation will decrease your R-square value if the added points don’t really fit the regression line.
    It all depends on your starting point for saying R-square never decreases. If you start with all the variables, then it never does. If you start with the wrong ones, then it can only increase. If you start with a mix, it can go up or down.
    Another point, already stated if I’m not mistaken: who says the equation of the line going through your data points is a straight line? Perhaps non-linear equations fit the data points better.
    Nevertheless, good remark. I’ll do some research and see if I can find something to show my point. I’m not a native English writer, so sometimes I confuse terms when I read things in English; I’m sorry if that’s the case here.


    Robert Butler

    R2 will increase as you add variables regardless of whether or not those variables are significant.  If you are running stepwise regression using backward elimination, the R2 will only decrease.  The only time R2 will increase and then decrease, and perhaps increase again, is if you are doing stepwise regression with replacement.
    To test this try the following – take a full factorial design in 3 variables and, for the response just number the experiments 1-8
     A   B   C  resp
    -1  -1  -1    1
     1  -1  -1    2
    -1   1  -1    3
     1   1  -1    4
    -1  -1   1    5
     1  -1   1    6
    -1   1   1    7
     1   1   1    8
    Run the full design
    The final model will be
    resp = 4.5 +.5*A +1*B +2*C +0*AB +0*AC +0*BC +0*ABC;
    the R2 = 1 and the model predicts the points perfectly.
    Now, to permit some testing, throw in a replicate – take the last point and for the replicate give it a value of 9.  The model will no longer exactly fit the data because the replicate does not have the same value as the original run (8).
    Now, with all terms in the model you have
    resp = 4.5625 +.5625*A +1.0625*B +2.0625*C +.0625*AB +.0625*AC +.0625*BC +.0625*ABC
    and the R2 is .9916
    The significance of the individual terms in this model are as follows:
    Term  P Value
    A      .26
    B      .14
    C      .07
    AB     .84
    AC     .84
    BC     .84
    ABC    .839
    Thus, in the full model R2 = .9916 and none of the terms are significant at P < .05.
    Now run stepwise backward elimination on the model (no replacement).  The R2 values for the successive models are
    .9916 (full model)
    .9911 (ABC removed)
    .9905 (AB removed)
    .9897  (AC removed)
    .9889  (BC removed)
    and the final model is
    resp = 4.5833 +.5833*A +1.0833*B +2.083*C with all terms significant at P < .05
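    These numbers are easy to reproduce: both fits are ordinary least squares, so a few lines of Python with NumPy re-derive the coefficients and R2 values quoted above.

```python
import numpy as np

# Butler's example: 2^3 full factorial, response = run number 1..8,
# plus a replicate of the last point with response 9.
A = np.array([-1, 1, -1, 1, -1, 1, -1, 1, 1], dtype=float)
B = np.array([-1, -1, 1, 1, -1, -1, 1, 1, 1], dtype=float)
C = np.array([-1, -1, -1, -1, 1, 1, 1, 1, 1], dtype=float)
y = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9.])

def fit(cols):
    """Least-squares fit with intercept; returns coefficients and R^2."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return beta, r2

# Full model: all main effects and interactions.
beta_full, r2_full = fit([A, B, C, A*B, A*C, B*C, A*B*C])
print(np.round(beta_full, 4))  # 4.5625, .5625, 1.0625, 2.0625, then .0625 x4
print(round(r2_full, 4))       # 0.9917 (the post's .9916 is truncated)

# Reduced model: main effects only, as after backward elimination.
beta_red, r2_red = fit([A, B, C])
print(np.round(beta_red, 4))   # 4.5833, .5833, 1.0833, 2.0833
print(round(r2_red, 4))        # 0.9889
```

    Note that with the replicate included the design columns are no longer orthogonal, which is why every coefficient shifts slightly rather than just the intercept.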



    I read all your suggestions and advise you to check:
    1. Your gage R&R result. If the measurement system isn’t adequate, you can’t trust the design results.
    2. Maybe you didn’t find the influential factors. You can check this by looking at the residuals; if so, you’ll see a pattern, as somebody mentioned before.
    3. I also advise you to check the MS error to see the standard deviation.
    4. Did you put center points in your design? Then you’ll be able to see if there is curvature. If so, consider moving to a Box-Behnken design.
    5. Check the constraints that you put on your design.
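    On point 4, the center-point check can be sketched numerically: under a purely planar (first-order) model the center points should land near the average of the factorial points, so a large gap is evidence of curvature. The numbers below are made up for illustration:

```python
import numpy as np

# Hypothetical responses: factorial corner points and a few
# center points from the same design (illustration only).
corner_y = np.array([10., 12., 11., 14.])
center_y = np.array([14.5, 15.0, 14.8])

# If the surface were planar, center points would average out to roughly
# the mean of the corner points. A large gap suggests curvature, and a
# second-order design (e.g. Box-Behnken) would be the next step.
curvature_gap = center_y.mean() - corner_y.mean()
print("corner mean:", corner_y.mean())
print("center mean:", center_y.mean())
print("curvature estimate:", curvature_gap)
```

    A formal version of this comparison is the standard curvature F-test that most DOE software reports when center points are present.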



    With an adjusted R-squared of only 25% you have nothing but an interesting set of data points.
    The factors you selected do not represent a significant portion of the variation, and as such are of no value.


    Robert Butler

      Ron,  as written, your assessment of Dave’s situation appears to be based on nothing except his adjusted R2.  If that is true then I would have to strongly disagree with your conclusion. 
      As I pointed out in my first post to this subject, the value could be due to any number of things and Dave needs to check out all of them before he can come to the conclusion that his factors are of little or no interest.



    Ron: We identified some key drivers of credit quality that were very minor, yet significant. The benefit to the client was in the tens of millions. Some models have a lot of noise, some don’t.
    Manufacturing, engineering, and scientific applications usually have extremely high R-squared values; I was spoiled for years by having neatly defined processes. The really interesting problems I get now are the ones where we wade through an enormous amount of information and enjoy the benefits of extremely small changes (I hate to use the term ‘data mining’ because the techniques are usually used so blindly).
    We also, as RB implies, test whether an increase in R-squared is simply the result of adding more degrees of freedom. We used Hamilton’s r-factor ratio test in the past. There are other tests; some are built into iterative procedures for defining minimal regression models.
    Cheers, BTDT



    Fred and Ron got it right, and I can’t believe no one mentioned the controversial epsilon-squared test.  Check the SS of the factors divided by the total SS.  This tells you, on a percentage basis, how much of the total variation was consumed by the factors.  If a lot is left over, or there is a bunch in the error term, then you haven’t found much.
    And remember that two-factor interaction term you think is significant?  It is aliased if you ran a half fraction.  What was the resolution of the design?
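    The SS-ratio check described above is easy to compute from any ANOVA table. A tiny sketch with hypothetical sums of squares (this is the SS/SS-total ratio as the post describes it, not the bias-corrected epsilon-squared formula):

```python
# Hypothetical ANOVA sums of squares for illustration only.
ss_factors = {"A": 42.0, "B": 18.0, "A*B": 25.0}
ss_error = 115.0
ss_total = sum(ss_factors.values()) + ss_error

# Share of the total variation consumed by each factor, and overall.
share = {k: v / ss_total for k, v in ss_factors.items()}
explained = sum(share.values())
print({k: round(v, 3) for k, v in share.items()})
print("fraction explained:", round(explained, 3),
      " left in error:", round(ss_error / ss_total, 3))
```

    With more than half the variation left in the error term, as in this made-up table, the conclusion would match Ron's: the chosen factors are not telling most of the story.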

