iSixSigma

Variance Inflation Factor


This topic contains 7 replies, has 3 voices, and was last updated by  lin 12 years, 4 months ago.

Viewing 8 posts - 1 through 8 (of 8 total)
  • Author
    Posts
  • #47290

    lin
    Participant

    Hello,
I have a question on experimental design. I can understand checking what the VIF values are for the factors (e.g., A, B, C). But why do you get a VIF number for interactions (like AB) in the output from DOE software?
    Thanks

    0
    #157570

    Robert Butler
    Participant

If you are entertaining a model that includes A, B, C, and AB, and the VIF for AB is too high, your data is telling you that you can't use AB in the model along with A, B, and C because it is confounded with one or more of those variables.

    0
    #157575

    lin
    Participant

Thanks for the reply Robert. Then I don't understand how VIF is calculated. From an earlier post of yours (I did try to find the answer), you said:
“When you are looking at the R2 in the VIF you are looking at the relationship between the X’s. In the VIF case R2 is the square of the multiple correlation coefficient from the regression of the ith X on all of the other X’s in the equation.  An R2 of 1, in this case, would indicate a perfect linear relationship between two X’s (perfect confounding).”
Suppose I have two factors, A and B. They are independent. But AB is the product of the two. Isn't it impossible for the VIF for AB to be 1 when it is determined by the product of A and B? Isn't the VIF for AB based on the R2 value obtained by treating AB as if it were Y and regressing it on A and B?
Thanks

    0
    #157577

    Craig
    Participant

The VIF is an indicator of whether the regressors are orthogonal (no linear dependence). When you regress AB on the remaining terms, hopefully the resulting R-squared will be low, and 1 / [1 - R-squared] will be close to 1, the ideal VIF. Remember that we are only looking at regression of the X's here.
In geeky terms, we are looking at the diagonal elements of the (X'X) inverse matrix when we evaluate multicollinearity.
In non-geeky terms, when you have a term in your model that exhibits a high VIF, your prediction equation will be somewhat poor if you leave this term in the model.
Multicollinearity with DOEs is somewhat confusing to me because you are controlling the X's, and in many cases you are using an orthogonal design (VIFs = 1). With regression on "happenstance" data, it is easy to see how some X's could depend on others. I suppose it boils down to the point above: we are only looking at regression of the X's.
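The 1 / [1 - R-squared] formula and the diagonal-of-(X'X)-inverse view are the same computation. A minimal Python sketch (assuming numpy is available; the data here are invented purely for illustration):

```python
import numpy as np

# Invented "happenstance" data: x2 partly depends on x1, so the
# regressors are correlated rather than orthogonal.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=50)

# VIF via 1 / (1 - R^2), where R^2 comes from regressing x1 on x2
r = np.corrcoef(x1, x2)[0, 1]
vif_from_r2 = 1.0 / (1.0 - r ** 2)

# VIF via the diagonal of the inverse correlation matrix of the X's
# (the (X'X)-inverse route, after centering and scaling the X's)
R = np.corrcoef(np.vstack([x1, x2]))
vif_from_inverse = np.linalg.inv(R)[0, 0]

print(vif_from_r2, vif_from_inverse)  # identical values, both > 1
```

With orthogonal X's the correlation matrix is the identity and both routes give a VIF of exactly 1.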
HACL

    0
    #157579

    Robert Butler
    Participant

The answer is it depends on how "independent" A and B are. If they are completely independent (orthogonal) then AB will be completely independent of either A or B. To check this, try the following:

    Response    A    B   AB
        1      -1   -1    1
        2       1   -1   -1
        3      -1    1   -1
        4       1    1    1

(The response is just a simple numbering – all you want it for is to give yourself a Y value, since some programs require a Y as part of the input needed to generate a VIF.) The VIFs for A, B, and AB are all 1.
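This check is easy to reproduce. A small sketch (assuming numpy; the vif helper is illustrative, not any particular package's routine) regresses each column on the other two:

```python
import numpy as np

# The orthogonal 2x2 factorial above
A = np.array([-1.0,  1.0, -1.0, 1.0])
B = np.array([-1.0, -1.0,  1.0, 1.0])
AB = A * B

def vif(target, others):
    """1 / (1 - R^2) from regressing one X on the remaining X's."""
    X = np.column_stack([np.ones_like(target)] + others)  # intercept + other X's
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    r2 = 1.0 - resid.var() / target.var()
    return 1.0 / (1.0 - r2)

print(vif(A, [B, AB]), vif(B, [A, AB]), vif(AB, [A, B]))  # all ~1.0
```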
Now, change the value of A in the second experiment to -1. The new matrix is:

    Response    A    B   AB
        1      -1   -1    1
        2      -1   -1    1
        3      -1    1   -1
        4       1    1    1

and the VIFs for A and B are 1.8, while the VIF for AB won't even compute – the computer spits out a message that says "Singularities or near singularities caused grossly large variance calculations." In other words, while A and B are "independent enough" with respect to each other, AB is not, and if you forced AB into a regression model it would not be independent of A and B.
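The singularity message makes sense once you notice that, after the change, AB equals 1 + A - B on every run – an exact linear dependence on the intercept, A, and B. A quick numpy check (illustrative sketch):

```python
import numpy as np

# The altered matrix: A in run 2 changed to -1
A = np.array([-1.0, -1.0, -1.0, 1.0])
B = np.array([-1.0, -1.0,  1.0, 1.0])
AB = A * B  # 1, 1, -1, 1

# AB is an exact linear combination of the intercept, A, and B,
# so a model containing A, B, and AB has a singular X'X matrix
dependent = np.allclose(AB, 1.0 + A - B)
X = np.column_stack([np.ones(4), A, B, AB])
print(dependent, np.linalg.matrix_rank(X))  # True, rank 3 instead of 4
```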

    0
    #157583

    lin
    Participant

Thanks Robert. Two more questions. It appears to be the case as long as you look at the coded factors (-1 low, +1 high). If I use actual values, I get strong correlations. For example:

    A      AB     B
    0.8    160    200
    0.8    160    200
    1.2    240    200
    0.8    100    125
    0.8    160    200
    1.2    240    200
    1.2    150    125
    0.8    160    200
    0.8    100    125
    1.2    150    125
    1.2    240    200
    1.2    150    125
    1.2    150    125
    0.8    100    125
    0.8    100    125
    1.2    240    200

The R squared for AB in this case is about 0.96, which leads to a high VIF. So, do you have to stay with the coded levels?
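For what it's worth, this is easy to reproduce (a sketch assuming numpy; r_squared is an illustrative helper). With these 16 runs the raw-unit R² for AB regressed on A and B comes out very high – the same story as the ~0.96 above – while in coded units it is zero:

```python
import numpy as np

# The four distinct design points from the table, each run four times
A = np.array([0.8, 1.2, 0.8, 1.2] * 4)
B = np.array([200.0, 200.0, 125.0, 125.0] * 4)
AB = A * B

def r_squared(y, others):
    """R^2 from regressing y on the given X's (with intercept)."""
    X = np.column_stack([np.ones_like(y)] + others)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

r2_raw = r_squared(AB, [A, B])          # raw units: very high

a = (A - 1.0) / 0.2                     # code A to -1..1
b = (B - 162.5) / 37.5                  # code B to -1..1
r2_coded = r_squared(a * b, [a, b])     # coded units: zero

print(round(r2_raw, 3), round(r2_coded, 3))
```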
Second question: if you just do the classical two-level designs (full or fractional), do you really even need to check the VIF? Won't it always be 1?
    Thanks a lot.

    0
    #157585

    Robert Butler
    Participant

Yes, you need to run the test on the coded values. You also need to run your regression analysis using the coded values as well.
There is a caution here which must be carefully observed – you have to generate the code so that it actually corresponds to what you did. You can't just blindly assign a -1 to whatever was supposed to be your low level and a +1 to whatever value you got for your high level. For example, let's take a couple of lines from your list:

    A     B
    0.8   200
    1.2   200
    0.8   125

which, when coded, would be

    -1    1
     1    1
    -1   -1

Let's say 0.8, 1.2, 200, and 125 were the highs and lows you planned to use. During the course of running your experiment the actual settings were

    A     B
    0.9   200
    1.2   180
    0.8   125

The actual code (assuming the planned highs and lows for the run were as before) would be

    -0.5   1
     1     0.46
    -1    -1

and you would run your VIF and other X-matrix checks, such as the condition indices, on the coded matrix that corresponds to what you actually did.
If deviations from plan (such as the above) were severe enough, or if one or more of the horses died (experiments failed to run, couldn't be run, etc.), then a check of the VIFs and the condition indices will tell you if your design matrix can still deliver what you thought it could.
Since it is rare that you will always hit the exact value you planned for any given level of any given variable, I always recommend checking the design matrix whether it is a formal design (classical, Taguchi, whatever) or not.
By the way, to center and scale to a -1 to 1 range, use the following:
scaled value = (actual value - midpoint) / half-range
where midpoint = (min actual value + max actual value)/2
and half-range = (max actual value - min actual value)/2
So for your A above this would be
scaled A = (actual A - 1)/0.2
The only exception to this kind of scaling is mixture designs – for those the scaling needs to be from 0 to 1.
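That centering-and-scaling step as a tiny helper (plain Python; the function name is just for illustration):

```python
def code(actual, low, high):
    """Center and scale an actual setting to the -1..1 range."""
    midpoint = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return (actual - midpoint) / half_range

# The planned levels were 0.8/1.2 for A and 125/200 for B
print(code(0.9, 0.8, 1.2))   # ~ -0.5  (the off-plan A setting)
print(code(180, 125, 200))   # ~  0.467
print(code(125, 125, 200))   # ~ -1.0
```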

    0
    #157587

    lin
    Participant

    Thanks so much for sharing your knowledge. 

    0

The forum ‘General’ is closed to new topics and replies.