How MiniTab deals with missing data



    My Master’s thesis includes a 32-run, 2-level, 6-factor DoE to
    examine the effects of the plunge process on friction-stir welds.
    One of the things we’re really looking at is the set of
    temperature profiles within the tool and on the weld plate. 2 of the
    7 channels have a complete list of results for all of the runs. The
    rest do not. A couple of the channels are missing up to 4 results.
    The missing data is totally at random. I plugged all the values I
    know into MiniTab and found it can still analyze the DoE and give
    me some answers. Does anyone know how it does this? Can you
    point me to some sources so I can read up about this? I’m hoping I
    can figure out if MiniTab’s method is legit for my purposes and also
    be able to explain what’s going on in my defense so I look smart!

    Thanks,


    Eric Maass

    Hello Matt,
    Well, first off – I’d recommend that you check out whether your local library or your university library has copies of any of a series of books by George A. Milliken and Dallas E. Johnson that start with the title, “Analysis of Messy Data…”
    Since your design is probably a 2^(6-1) for 32 runs, it would probably have a resolution of V or VI if there were no missing data. If each cell had replicates, and no complete set of replicates was lost, you probably have no real loss of information after all. Even if some data is lost, it is possible that you might be okay…
    If you entered the design and entered asterisks for the missing data, then Minitab will analyze it using the multiple linear regression tool that it uses to analyze DOE, fitting the model to the runs that do have a response.
    DOE – full and fractional factorial – is “orthogonal” to allow you to separate the effects of each factor. With runs missing, that orthogonality is no longer guaranteed, so I would suggest that you check whether any of your main effects are correlated… you can do this by simply doing Stat/Basic Statistics/Correlation and selecting all of the main effect factors. You might have one or more main effects highly correlated with others – so that those main effects are confounded.
    You could then generate all of the two-factor interactions using the calculator in Minitab, and then check the correlations between main effects and 2-way interactions and between 2-way interactions and other 2-way interactions.
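
The correlation check described above can also be tried outside Minitab. What follows is a minimal Python sketch, not Minitab's own procedure. It assumes the design is a 2^(6-1) half fraction whose sixth factor is the product of the other five (the thread does not state the actual generator), pretends that four randomly chosen runs are missing, and lists the most strongly correlated pairs among the main effects and two-factor interactions. The function names and the random seed are illustrative choices.

```python
import itertools
import math
import random


def half_fraction():
    """32-run 2^(6-1) design; the generator v6 = v1*v2*v3*v4*v5 is an assumption."""
    return [list(lv) + [math.prod(lv)]
            for lv in itertools.product((-1, 1), repeat=5)]


def model_columns(runs):
    """Named columns for the 6 main effects and the 15 two-factor interactions."""
    cols = {f"v{i + 1}": [r[i] for r in runs] for i in range(6)}
    for i, j in itertools.combinations(range(6), 2):
        cols[f"v{i + 1}{j + 1}"] = [r[i] * r[j] for r in runs]
    return cols


def correlation(x, y):
    """Pearson correlation of two equal-length columns."""
    n = len(x)
    sxy = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    sxx = n * sum(a * a for a in x) - sum(x) ** 2
    syy = n * sum(b * b for b in y) - sum(y) ** 2
    return sxy / math.sqrt(sxx * syy)


def worst_pairs(runs, top=5):
    """Largest absolute correlations between distinct model columns."""
    cols = model_columns(runs)
    pairs = [(abs(correlation(cols[a], cols[b])), a, b)
             for a, b in itertools.combinations(cols, 2)]
    return sorted(pairs, reverse=True)[:top]


full = half_fraction()
rng = random.Random(1)                    # arbitrary seed, only for repeatability
missing = set(rng.sample(range(32), 4))   # pretend 4 responses were lost
kept = [run for i, run in enumerate(full) if i not in missing]

print("complete design, largest |r|:", worst_pairs(full, top=1)[0][0])
for r, a, b in worst_pairs(kept):
    print(f"{a} vs {b}: |r| = {r:.3f}")
```

With the complete design every pair is uncorrelated. Once runs are removed, the pairs printed at the top of the list are the terms whose estimates are most entangled with each other.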
    Again, please check out one of the books on Analysis of Messy Data. It will give you more ideas, and also help you prepare for your defense of a thesis involving analysis of some “messy data”.
    Good luck!


    Idiotin math.

    Congratulations, Eric, on your excellent, comprehensive reply. A real expert answer, not like those few pretenders who just post comments without any value.


    Eric Maass

    Why, thank you, “idiotin math”!  It’s great to hear nice, generous comments and feedback like this!
    Incidentally – I hope that you are just joking with the “idiotin math” name… I know several people who convinced themselves that they were not good at math – but, after spending a little time with a much better instructor or a good tutor, they discovered that they were much better at math than they ever thought they could be. Math can be made easy to understand, and fun… or it can be made to seem daunting and downright impossible through the convoluted efforts of a poor instructor or an instructor whose real motives are to impress and overwhelm rather than to teach.
    Best of luck! And – thanks again!


    Robert Butler

    A couple of questions:
    You said that “2 of the 7 channels have a complete list of results for all of the runs.”  Does this mean that you ran a full half rep of a 2^6 for each channel?
    If so, are you running a separate analysis for each channel or (assuming a full half rep for each channel) are you trying to combine all of the data and include channel as another variable?
    Eric has given you good advice.  The Messy Data books are very good and based on what you have said I’d recommend starting with Volume I, which focuses on data from designed experiments.
    When data goes missing in a design, what usually goes missing along with it is information about one of the interactions.  It’s been my experience that you have to lose far more than 4 design points before you will lose information on a main effect.
    If you want to get an empirical sense of how much of a loss is too much, take some time and set up a design matrix for a half rep of a 2^6 and make your Y value equal the count (i.e. results for experiment #1 = 1, experiment #2 = 2, etc.). Then just randomly delete 4, 5, 6, 7 etc. experiments and run VIFs (which I understand Minitab can do) as well as simple plots and correlations of your various main effects and see how many experiments you have to lose before you impact your main effects.
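
The deletion exercise described above can be sketched in a few lines of Python. This is an illustration under assumptions, not output from Minitab or SAS: the half fraction is built with an assumed generator (sixth factor equal to the product of the other five), only the six main effects are included, and the VIFs are taken as the diagonal of the inverse of the predictors' correlation matrix. VIFs depend only on the design columns, so the dummy Y values are not needed here. All function names and the seed are illustrative.

```python
import itertools
import math
import random


def half_fraction():
    """32-run 2^(6-1) design; the generator v6 = v1*v2*v3*v4*v5 is an assumption."""
    return [list(lv) + [math.prod(lv)]
            for lv in itertools.product((-1, 1), repeat=5)]


def correlation(x, y):
    """Pearson correlation of two equal-length columns."""
    n = len(x)
    sxy = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    sxx = n * sum(a * a for a in x) - sum(x) ** 2
    syy = n * sum(b * b for b in y) - sum(y) ** 2
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular: some terms are completely confounded")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def main_effect_vifs(runs):
    """VIF of each main effect: diagonal of the inverse correlation matrix."""
    columns = [[run[i] for run in runs] for i in range(6)]
    corr = [[correlation(x, y) for y in columns] for x in columns]
    inverse = invert(corr)
    return [inverse[i][i] for i in range(6)]


full = half_fraction()
rng = random.Random(2)                    # arbitrary seed for repeatability
results = {0: main_effect_vifs(full)}
for n_lost in (4, 5, 6, 7):
    lost = set(rng.sample(range(32), n_lost))
    kept = [run for i, run in enumerate(full) if i not in lost]
    try:
        results[n_lost] = main_effect_vifs(kept)
    except (ValueError, ZeroDivisionError):
        results[n_lost] = None            # a main effect can no longer be separated

for n_lost, vifs in results.items():
    print(n_lost, "runs lost:", vifs and [round(v, 3) for v in vifs])
```

A VIF of 1 means the main effect is still orthogonal to the others. Repeating the loop with different seeds, or with more runs removed, gives a feel for how quickly the values climb.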
    Even better than this would be to rummage around your university and find someone who has SAS and whose package includes proc reg.  Under this proc there are two options on the model statement – vif and collin – which will really allow you to look at the X matrix and assess the impact of losing experiments.
    The collin option allows you to run an eigenvector analysis on the X matrix and, in addition to the eigenvector matrix, it also generates condition indices.  These two allow you to assess the issues of correlations of many against many as opposed to one-to-one.
    If you can find someone with SAS, the commands you would need to run would be the following:
    First the data.
     data mmoore1;
     input v1 v2 v3 v4 v5 v6;
     yval = _n_;     /* dummy response: the run number */
     v12 = v1*v2;
     v13 = v1*v3;    /* ...and so on: these are the expressions for all of your 2 way interactions */
     datalines;
    -1 -1 -1 -1 -1 -1
      ….etc ( your design matrix)
     ;
     run;
     proc reg data = mmoore1;
     model yval = v1 v2 v3 v4 v5 v6 v12 v13 …../vif collin;
     run;
    First run this with all data present so you can see what the VIFs and the eigenvector matrix look like, and then start dropping design points from the matrix and watch what happens to your diagnostics.
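
For readers without SAS, the condition-index part of this diagnostic can be approximated in plain Python. The sketch below is an assumption-laden stand-in for the collin output, not a reproduction of it: it scales X'X (intercept, six main effects, fifteen two-factor interactions) to a unit diagonal, finds the eigenvalues with a hand-written Jacobi routine, and reports sqrt(largest eigenvalue / each eigenvalue). The design generator, the function names, and the seed are all illustrative.

```python
import itertools
import math
import random


def half_fraction():
    """32-run 2^(6-1) design; the generator v6 = v1*v2*v3*v4*v5 is an assumption."""
    return [list(lv) + [math.prod(lv)]
            for lv in itertools.product((-1, 1), repeat=5)]


def symmetric_eigenvalues(a, sweeps=50, tol=1e-22):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                d = a[q][q] - a[p][p]
                # Small-angle rotation that zeroes a[p][q].
                if d >= 0:
                    theta = 0.5 * math.atan2(2 * a[p][q], d)
                else:
                    theta = 0.5 * math.atan2(-2 * a[p][q], -d)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):        # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):        # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def condition_indices(runs):
    """sqrt(largest eigenvalue / eigenvalue) for the unit-diagonal X'X."""
    cols = [[1] * len(runs)] + [[r[i] for r in runs] for i in range(6)]
    cols += [[r[i] * r[j] for r in runs]
             for i, j in itertools.combinations(range(6), 2)]
    m = len(cols)
    cross = [[sum(a * b for a, b in zip(x, y)) for y in cols] for x in cols]
    scaled = [[cross[i][j] / math.sqrt(cross[i][i] * cross[j][j])
               for j in range(m)] for i in range(m)]
    eig = symmetric_eigenvalues(scaled)
    # An eigenvalue at (numerical) zero means an exact linear dependency.
    return [math.sqrt(eig[0] / lam) if lam > 1e-12 else math.inf for lam in eig]


full = half_fraction()
rng = random.Random(3)                    # arbitrary seed for repeatability
lost = set(rng.sample(range(32), 4))
kept = [run for i, run in enumerate(full) if i not in lost]

print("complete design, largest index:", max(condition_indices(full)))
print("4 runs lost, largest index:", max(condition_indices(kept)))
```

With the complete design every condition index is 1. A large or infinite index after dropping runs signals a near or exact dependency among several terms at once, which pairwise correlations alone can miss.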



    You should give our technical support department a call.  They will be able to give you some answers.  The number is 814-231-2682 in the US.  If you are outside of the US, check for the number to call.
